Who pays for high-demand data?

So I was reading this very good post on what markets might adopt the SAFE network: Market Research on market size and growth.

It came up that the top 100 torrents are like 150GB of total data but account for at least 450TB of data transfer. Now correct me if I am wrong, but it is my understanding that safecoin goes into the farmer pool based on the size of your PUT, while farmers get paid based on how many GET requests they fulfill. There seems to be a large imbalance here, where some data might have a much lower price to PUT compared to how much will be paid out to farmers for GET requests of it. Is there a way that this gets balanced out, or is it like one big communism where everyone is basically always picking up the slack of these leechers?

I mean, that’s how torrents work, and somehow it’s still worth it for a small percentage of people to pay for a mob of leechers, so maybe the issue is null? It’s a bit different in that there the “farmers” are being taken advantage of, whereas on the SAFE network they will be paid fairly; it’s the end user that might get taken advantage of. It’s really a question of whether a person who buys safecoin and then storage space on the network is still getting a utility that outweighs their costs. Take the guy who just puts some pics of his family there, for them and him to get at most every couple of months. It might be that there is some communism in there and it’s not the very BEST price, but it’s still one the user is willing to accept for what they get.

Don’t get me wrong… I want the SAFE network to be my source for free porn… but I want it done in a sustainable way that doesn’t take an unfair share of network resources from other people who are also paying for them.

1 Like

Caching kicks in for high-demand data, and there are no farming rewards for cached data. It’s just the last hops of the route that a full GET would have taken.

Caching happens on the path to the requester, so high-demand data becomes widespread and the load is handled by more and more nodes as it is requested more. The opposite of a server model.

2 Likes

I don’t really understand how “caching” works here. I mean, if more people want to download the data, that’s that much more pressure on those pipes, even if you have it all ready to go…

So I guess my real question is: who is paying those more and more nodes? To me it seems like this cost is put on the backs of anyone doing a PUT for lower-volume data. Maybe I am way off here, but that is how I understand it now.

1 Like
  • A chunk that is requested is cached in the nodes that it passes through.
  • If the same chunk is requested again, any node on the path that has it cached will return it.
  • This is no extra work for the nodes, since without caching all the nodes in the return path would have to move the chunk. With caching, the closest node in the return path returns the chunk, so there is less work overall for that chunk request.
  • Because there is no extra work, there are no special rewards.

Caching is a function of the nodes. All nodes do it. And caching reduces load overall for every node because the chunk does not need to pass through the whole path. So in effect caching reduces the cost of bandwidth.
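To make that concrete, here is a toy model of opportunistic caching on the request path (the names and structure are mine, not the actual vault code): the first node on the path that holds the chunk answers, and a full trip populates the caches along the way.

```python
# Toy model of opportunistic caching on the request path.
# Hypothetical names/structure; not the actual SAFE vault implementation.

class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}                # chunk_id -> chunk bytes

def get_chunk(path, vault, chunk_id):
    """path: relay nodes ordered nearest-to-requester first.
    vault: dict standing in for the vault that actually stores the chunk.
    Returns (chunk, hops taken)."""
    for hops, node in enumerate(path, start=1):
        if chunk_id in node.cache:     # nearest cached copy wins
            return node.cache[chunk_id], hops
    chunk = vault[chunk_id]            # full trip to the storing vault
    for node in path:
        node.cache[chunk_id] = chunk   # populate caches along the path
    return chunk, len(path) + 1

path = [Node(f"n{i}") for i in range(10)]
vault = {"chunk-A": b"..."}
print(get_chunk(path, vault, "chunk-A")[1])   # 11 hops: full path
print(get_chunk(path, vault, "chunk-A")[1])   # 1 hop: served from nearest cache
```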

Now consider a high-volume chunk: 1,000,000 requests of the chunk, with an average of 10 hops per request.
Without caching:

  • 1,000,000 * 10 nodes need to pass on the chunk
  • 10 million * size of chunk in bandwidth costs

With caching (high volume means a cache return occurs within, say, 3 hops):

  • 1,000,000 * 3 nodes need to pass on the chunk
  • 3 million * size of chunk in bandwidth costs

From this, across the whole network:

  • we see lower bandwidth requirements
  • we see lower node costs (i.e. lower average cost per node)

Thus, when considering a large number of such chunks, the overall effort for nodes is reduced by having caching. This is why there are no caching rewards: caching actually reduces the effort/costs/bandwidth on the node over time.
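A quick sanity check of those numbers in code (the 1MB chunk size is just an assumed figure for illustration):

```python
# Sanity check of the example above (chunk size assumed 1 MB).
requests = 1_000_000
chunk_mb = 1                     # assumed figure, for illustration only

no_cache_hops = 10               # average path length without caching
cached_hops = 3                  # cache hit within ~3 hops for hot data

no_cache_tb = requests * no_cache_hops * chunk_mb / 1_000_000   # MB -> TB
cached_tb = requests * cached_hops * chunk_mb / 1_000_000

print(no_cache_tb)               # 10.0 TB relayed without caching
print(cached_tb)                 # 3.0 TB with caching: a 70% saving
```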

2 Likes

I do like to hear there is some way to “mass produce” here… so at least for high-volume data the cost per unit goes down. That is quite a bit better than nothing! But I still don’t see it as fully avoiding the fact that high-volume data costs more network resources while the price for all PUTs is the same. I think the question will be how much this discrepancy costs those with low-volume data. Will it be a drop in the bucket, or will it be like too much communism, to the point it’s unsustainable?

1 Like

Maybe think in terms of what it’d cost on a VM server, where you pay for every GB of data transmitted.

SAFE spreads the load across the network.

Now consider the original “It came up that the top 100 torrents are like 150GB of total data but account for at least 450TB of data transfer”, which is obviously spread across weeks and months.

Now, just as torrent networks do not collapse even though torrents offer no rewards, SAFE will not either. In fact it has a much better opportunity to exceed the torrent network many times over, since it is not just about torrents, and with rewards more of the torrent community are likely to remain online rather than download and not seed.

Back to the figure. Usenet (NNTP) came before torrents and the web, and even today it sees much more than 3TB of new data posted per day. That is >100TB per month, so it is definitely on par with torrents. There are no good stats on the volume downloaded, but it is on the order of tens of TB per day, or PBs per month. A tad more than torrents.

SAFE will also replace the functionality of NNTP servers so we have that community too.

While these may not swap over to SAFE quickly, I’d expect a year should see many/most.

The point is that torrents show people are willing to donate bandwidth to help others, which helps themselves in the long run. Imagine now that they can be rewarded for having nodes and sharing chunks, even if some highly requested chunks are being cached. It is unlikely that many of the chunks in one’s vault will be highly requested ones. If they were, that would mean only highly requested files were being stored, wouldn’t it? And over time even highly requested files become less so as others become popular.

2 Likes

Well, are we talking about the same kind of person seeding torrents, who gains enough from the network to offset the altruism, compared to just the dude that wants a better Dropbox? Is he gonna pay more to store his files just because it contributes to his source of free porn? I am not sure enough people would make that connection and then choose to act in a way that costs them but benefits the group they are in.

1 Like

Considering that the upload will remain forever, this is better than torrents. NNTP shows people are willing to pay, since it costs to connect to an NNTP server.

2 Likes

OK, people are willing to pay the costs. But a lot of the time, even when people see a positive cost/benefit, that doesn’t mean it’s necessarily the maximum; it’s just the best option that is available.

I wish I had a solution to offer that’s not “go back to the drawing board and rework the whole farming economy”, delaying the release of an MVP. The best I could offer is maybe some upper limit on GETs per PUT, like at some point you basically have to PUT again if you want more GETs available. That’s inefficient though… or maybe some kind of mechanism so that when the network needs more nodes to serve up that high-quality free porn, those nodes aren’t working for free. I know that invites complications… like it’s not exactly immutable data if someone can just request it enough times that it becomes inaccessible to everyone…

1 Like

I think you are trying to make a point but trying to have it made for you by pointing out problem after problem.

First, it is expected that the cost of storing on the network will be on the same order as going and buying a disk to store your info on your own hardware.

Second, there are already limits on how often a file will be fetched. It’s called human nature: how often something will actually be retrieved in a set time period.

Third, cultural differences mean that usually it’s not the whole world going after one piece of info.

Fourth, as something gets popular the hop distance to that material drops due to caching, while the number of nodes caching it increases exponentially, with the limit being all nodes.

Fifth, dedup means there is only one “copy” of a chunk stored on the network. So PUTting again is only going to cost the person doing the PUTs; the data is not stored again. Yes, the network does work to attempt the PUT, so the charge is justified.

Sixth, even if you could PUT the same file again, it would not help much, because caching is already doing that job.

Why? Say caching has to handle a file (e.g. some movie star is photographed/filmed doing something really stupid) and 1 billion people want to see the video today. Guess what: a whole lot of nodes will be serving up the chunks from their caches all day. Assuming the network is maturing, that would be 100 to 300 thousand nodes each serving the same chunk up to 10,000 times in one day, for a way-over-the-top example. 1/7 of the world’s population (1/3 of the people with fast enough internet) is way, way over the top, but the SAFE network would cope fine.

I really am not seeing a massive problem. You took the 450TB for the top 100 torrents, and that is obviously not per day but all-time for those 100. Let’s say 3 months as a very conservative figure. That is 5TB per day. Now if you try to translate this to the SAFE network, then the network must be in wide use by the population, otherwise the torrenting would still be done over torrents as well.

Thus for your first example we have a mature network with the estimated 1/4 of internet connections that did some torrenting before the SAFE network. What, over 1 billion? Even if the stats were lying and it’s 1/10 of internet connections, that’s 300 million who torrented.

Let’s say 1/10 of those become nodes, and there are at least an equal number of NNTP users and an equal number who are from neither group. So that could be as high as 90 million nodes.

Now let’s say there is crossover, the 90 million figure is too high, and it’s 25 million nodes.

Then that 5TB per day is spread across 25 million, and up to say 50 million, nodes.

That is an average of 200,000 bytes, i.e. 200KB, per node per day (or less).

Even with only 10 million nodes, that is 500KB per node per day for caching.

But this is assuming the top 100 files were only torrented for 3 months and not 1 or 2 or 3 years.
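Here are those estimates in code, so the assumptions are explicit (the 3-month window, node counts, and the viral-video figures are the guesses from above):

```python
# The back-of-envelope figures above, with the guesses made explicit.
top100_transfer_tb = 450          # all-time transfer for the top 100 torrents
window_days = 90                  # very conservative: assume it took 3 months
per_day_tb = top100_transfer_tb / window_days        # 5.0 TB per day

for nodes in (50_000_000, 25_000_000, 10_000_000):
    per_node_kb = per_day_tb * 1e12 / nodes / 1e3
    print(f"{nodes:>11,} nodes -> {per_node_kb:,.0f} KB per node per day")
# 50,000,000 nodes -> 100 KB per node per day
# 25,000,000 nodes -> 200 KB per node per day
# 10,000,000 nodes -> 500 KB per node per day

# The over-the-top viral video: 1 billion views served from caches.
views = 1_000_000_000
caching_nodes = 100_000
print(views / caching_nodes)      # 10,000 serves per caching node per day
```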

2 Likes

I agree with this for the common, everyday “store my files for only me to retrieve someday” case; it can’t be more expensive than just having it stored on some central server.

Let’s not assume limits on human consumption of some particular data. It could stay very usable over many reproductions. Say you have something like a Second Life 2.0 game. Those files are gonna get accessed way more than just my random shit I thought was worth immortalizing.

I’m not saying everyone is after one piece of info, just that one particular piece could become heavily accessed among a particular demographic, skyrocketing its costs compared to Bob’s family photos.

OK, I get that the cost for GETs of one particular set of data follows a logarithmic curve and not a linear one, but that doesn’t mean it isn’t always increasing the more requests there are.

True, there is no actual utilitarian point in storing data more than once. I am only saying that when data causes more than a certain number of GETs, there should be an additional cost for that. I am sure there are more elegant ways than “each PUT only has so many available GETs”.

That’s great, and there should be some discount for the efficiency of mass-producing the same data. Just not totally free, because it is not totally free for the network.

The numbers are somewhat speculative, but the point of my argument was that some data will be more in demand for GETs than other data, and it is not fair that everyone pays the same for PUTs in that case.

1 Like

Yes, this is expected, and that is where the factors are all included in the rewards for farming. Every node is farming and being rewarded for that, and included in that reward is doing the other functions of the node. Rewards are not set in stone, so maybe there is some room for the section to keep tabs on each node’s relative work and inform the rewards mechanism, but this would assume divisible coin is being used so that the reward amount can be finely tuned.

So really, at the end of the day, it’s important to ensure the farming reward is covering all the costs of running the node.

As to unfairness due to some data being more popular: data is stored fairly randomly, so one day my node might have higher-than-normal caching or GETting, but next week it might not. Then there is a regular relocation of nodes, and thus a different set of chunks and so a different load profile.

But if you look at my semi-realistic figures above based on the torrent stats, you see that the volume of data being pulled is not all that great when it is spread across the network.

3 Likes

I would say in the current model farmers are paid in a very fair and consistent way for the amount of work they do. I would not change that in any way if it can be avoided. It’s the users paying for PUTs that seem to get variable costs depending on their use case. Maybe they had to make a decision here: do we want maximum fairness for farmers or for users? I think farmers will be more business-like in their evaluation, so it might be more important to make sure their game is fair, compared to users who will take something that is, on balance, value gained, even if it should/could have been more value.

1 Like

There really is no way to determine whether one file will be more popular than another at the time it’s being uploaded.

Also the user uploading the (public) file is unlikely to know how popular it will be when uploaded.

Important: to have perpetual data, you have a major problem trying to charge more if a file becomes popular. What are you going to do? Remove it? Then you break the goal of perpetual data.

No, this seems to be one area where we just have to tweak the algorithm that charges for PUTs to cover both the popular and the unpopular. That is, charge an amount for any upload that will ensure there is enough for the downloads of the material.

We can have maximum fairness for both farmers and users. Remember that the network will determine the actual costs for PUTting and the reward rate. These are not directly connected: the network charges for PUTs without regard for the current GET rate, and the reward rate does not look at the current PUT rate. They are not dependent on each other in a direct sense, so the network has a lot of flexibility to be fair to both.

And before we get into the hundreds of posts about how this is so, there is a topic somewhere about it. What I will say is that the current RFC has a factor in the reward payment system that accounts for the “kitty” of coin addresses still available for coin creation. Only a ratio of creation attempts, equal to available/total coins, will succeed. This ratio means that as available coins drop, so does the rate at which coin issuance attempts succeed. This has the effect of scarcity, the market will react, and in effect we will probably see the fiat reward amount not suffering.

Also, the charge and reward rates are based on similar parameters, so yes, there is an indirect connection through one significant variable. This means they will not get too far off track in the longer term, and it ensures the “kitty” is never too depleted for the state of the network: a large “kitty” early on and a smaller “kitty” towards a steady state much later on.
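For anyone who wants that mechanism concrete, here is a toy model of the ratio gate as I read the RFC (parameter names are mine, not the real API):

```python
import random

def farm_attempt_succeeds(available_coins, total_coins):
    """Toy model of ratio-gated coin creation: an attempt pays out with
    probability available/total, so issuance slows as the kitty empties.
    My reading of the RFC; names are illustrative, not the real API."""
    return random.random() < available_coins / total_coins

total = 2**32
for available in (total * 9 // 10, total // 2, total // 10):
    wins = sum(farm_attempt_succeeds(available, total) for _ in range(10_000))
    print(f"kitty {available / total:.0%} full -> ~{wins / 100:.0f}% of attempts pay")
```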

tl;dr
The proposed approach to rewards and PUT costs does not exclude being fair to both uploaders and farmers. The tweaking of the algorithms will determine whether the network-determined prices/rewards are fair. The market will also react, and typically the interaction (due to the algorithms being designed that way) will mean that fiat prices will be fair too (except for bull/bear manipulations/runs). Obviously no guarantees :wink:

Anyhow, this is where the beta tests will help determine the best algorithms to start off with. And there is no reason we cannot tweak the algorithms in a later release if they are found wanting.

2 Likes

No disrespect to your long reply with a short reply… it’s basically because I give a thumbs up to all of this and have nothing more to add lol. Yeah, we need to see what happens when we combine home vaults, uploaders with more diverse use cases, and test-safecoin. There are without a doubt great, OK, and bad settings that could be chosen. It will be all about testing, then adjusting those dials, then testing some more.

2 Likes

Yep, it is not going to be an easy exercise to ensure the algorithm is correct. But if you look at the current one, it’s not bad, with its control loop and negative feedback path. Max/min costs could do with special tweaking, but overall it does a reasonable job, and if you want, there is a reasonable amount of discussion in the forum.
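For readers who haven’t followed those discussions, this is the general shape of such a control loop (a sketch of the idea only, not the actual RFC algorithm): the PUT cost rises as the network fills past a target, falls as it empties, and is clamped between a floor and a ceiling.

```python
def next_put_cost(cost, used, capacity, target_fill=0.5,
                  gain=0.1, min_cost=1.0, max_cost=1_000_000.0):
    """One step of a negative-feedback price controller (illustrative only,
    not the actual RFC algorithm). Fuller than target -> raise the PUT cost
    to slow uploads and attract nodes; emptier -> lower it. Clamp to bounds."""
    error = used / capacity - target_fill      # positive when too full
    cost *= 1 + gain * error                   # negative feedback
    return max(min_cost, min(max_cost, cost))

cost = 100.0
for used_tb in (40, 55, 70, 60, 50):           # network fill over time (TB)
    cost = next_put_cost(cost, used_tb, capacity=100)
    print(round(cost, 2))                      # cost dips, then climbs as fill passes the target
```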

no disrespect seen by me.

I gave a longer answer than needed, for others reading the topic and hopefully to give you an insight into my take on the issues involved. That way new ideas/concepts may be seen by others or yourself, and thus improve the understanding or prompt a better way.

2 Likes

@neo has covered most of the points, but one other thing to consider with caching is that it stops the request reaching the ‘final’ vault and thus stops a farm attempt from happening (thus reducing the rate at which safecoins are issued). This is an advantage to the cacher, since less overall farmed safecoin means a higher chance of success for their own farm attempts (in addition to saving bandwidth by not having to pass and receive messages from further along the route).

Cache is a very useful but complex aspect of the network. I consider caching to be a somewhat adversarial mechanism since it’s a way to prevent other vaults being rewarded.

My feeling is most vaults will be 99% cache and 1% chunks, i.e. ‘caches which store some chunks’ rather than ‘chunk stores that have some cache’. I suspect cache is going to be extremely heavily utilised by vaults wherever possible.


Just thinking out loud now about cheating with farming, not at all related to the OP: presumably most clients will have multiple connections to the network and will request chunks from the connection closest to the destination (i.e. fewest hops). Fewer hops also means a lower chance of hitting a cached copy of the chunk along the route. I dunno if this impacts ‘cheating’ on farming… it might need some sort of farming rule that only allows farm attempts if the request has passed through at least 3 hops. If I run a vault and keep reconnecting my client until it’s directly connected to my vault, I can request my vault’s chunks knowing every request will be eligible for a farm attempt, because there’s no chance of hitting cache along the way… I haven’t thought too hard about it, so maybe there are holes in this logic.
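To make that concrete, a minimum-hops rule could look something like this (purely hypothetical, nothing from an RFC):

```python
MIN_HOPS_FOR_FARM_ATTEMPT = 3    # hypothetical threshold from the post above

def eligible_for_farm_attempt(hop_count):
    """Hypothetical rule, not in any RFC: only GETs that travelled at least
    MIN_HOPS qualify for a farming attempt, so a client that reconnects until
    it sits next to its own vault gains nothing from cache-free requests."""
    return hop_count >= MIN_HOPS_FOR_FARM_ATTEMPT

print(eligible_for_farm_attempt(1))   # False: client directly beside the vault
print(eligible_for_farm_attempt(5))   # True: normally routed request
```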

1 Like

I was thinking about this as I wrote about cache and hops.

You will still have the issue of the section determining who is rewarded. The requested chunk still has to be retrieved from all the vaults holding it, and the best one, as judged by the section, will be eligible for a coin attempt. Thus it may not be your vault, even though it’s closest to you.

But yes, it’s a good question for @maidsafe as to how many hops are forced. See @mav’s post.

1 Like

As I understand it, in torrenting only the interested parties get the torrent file and contribute to serving it, but on the SAFE network the load is spread across everyone. And if people find that all this bandwidth is not something they want to “donate”, i.e. it is not covered by their vault’s cache payback, then they will fix it and include it.

@c0dr All the rewards a Node gets for farming go towards all the other functions of the Node, such as bandwidth, CPU usage, etc. Basically the Node is paid for being a Node using the rewards from farming.

Users of the SAFE network do not perform any functions of the network: no caching, no bandwidth, nothing except what they use to actually receive the data.

In the BitTorrent scenario, the suppliers would become the Nodes and be paid for supplying, and the leechers would become the clients.

The difference is that the pool of potential suppliers becomes all the nodes in the network, and they are paid for what they retrieve for the network. Really a win-win.

1 Like