Next step of safecoin algorithm design

I like it, it’s really easy to sample.

I’m not sure how I feel about the fact that rewards are not based on performance alone (returning a chunk when asked) but also on the space provided.

As I understand it, shaping the reward is supposed to fight centralization by punishing very large vaults, but it will lead to the opposite result. If someone has a lot of resources at their disposal, they can get higher rewards by just running multiple smaller vaults on the same computer. The catch is, the more sophisticated we get at stopping that from happening, the fewer attackers will have the sophistication to circumvent it, and so the more centralized the network will become.

4 Likes

Valid concern and one we do need to think deeply about.

Kind of, but it’s not so much punishing as encouraging vaults of the size the network wishes. So very small is not great, very large is also not great (for now).

This is not so easy when you look at all of the connections they must make as well, though (just for info). In any case, it is (possibly) better that multiple nodes fail across the network than that very large silos of data fail in a concentrated space. Again, a worthwhile discussion.

3 Likes

Well, this is kind of the point. We would just be selecting the most sophisticated with the most resources. Instead of getting into an arms race where we’re always one step behind, it may be better to make running multiple vaults on the same computer (by multiplexing bandwidth, file descriptors, storage) a default feature.

Let me join the above with something I wrote before. If a section allows infant vaults to join with a limited size (and vaults would be allowed to double their storage whenever they reach a higher age), then it’s not such a big problem if one computer runs multiple vaults. Even if it fails with all the small vaults it hosts, it can’t cause much damage, because its storage is spread across several different sections. However, once the vaults it hosts become older and bigger, some of them will need to be either abandoned or migrated to another physical computer (for example, by 3rd party tools), so the potential damage is again limited.
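
To make that concrete, here’s a minimal sketch of the cap rule I have in mind. The base size and the exact doubling schedule are just illustrative assumptions, not anything specified:

```python
BASE_GB = 8  # assumed storage cap for a freshly joined infant vault

def storage_cap_gb(age: int) -> int:
    """Allowed storage for a vault of a given age: doubles each
    time the vault reaches a higher age (illustrative rule only)."""
    return BASE_GB * (2 ** age)

for age in range(6):
    print(f"age {age}: up to {storage_cap_gb(age)} GB")
```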

5 Likes

I agree with this sentiment and as a route for the future. The issue right now is the complexity of such patterns, i.e. the groups then need to know who stores what and how to select who should deliver the data, as opposed to knowing everyone has everything (or else we kill them). This mapping of data to individual nodes adds complexity. It is made worse when we then say, OK, we need X copies of each data item: a smaller vault can hold 1 of these, but it is likely to drop off, so a smaller vault doesn’t really hold 1 copy and can instead be regarded as holding a fraction of a copy (from one angle). Rather than 4 copies we may then have more than that if all the vaults are smaller, and fewer if they are larger. But can we do with fewer? If we do, we could have less than 1/3 of the group holding the data and break our “as long as 2/3 are honest we are OK” assumption.

So we can look further into the complexity and try to resolve at least some of it. What if the section elders were the data/node managers and held little or no data, but managed the groups in the section as well as the internals of the groups and who holds what? That means they hold less data but do more work: finding who should hold each chunk and who should deliver it (ignore fastest etc. for now, as that makes it more complex). Then the farming rate needs to consider the elders’ cut of the payment for a node’s ability to store and deliver data.

I hope this helps a little when we look at differing size vaults (which we do debate in house a lot, well certainly Viv and I do :slight_smile: )

7 Likes

Could you describe this part in a little more detail? I thought there were vault managers who mapped out (or otherwise knew) which vault has what chunk at the last hop of the routing chain. Not true?

1 Like

Vault manager?

There is the Data Manager - they hold chunks and know who has them.
Also the Node Manager - the closest nodes to a node; they can reward/penalise a node.

Data managers in sections can still happen; however, which node specifically holds data is managed by closeness to a chunk address. If we add in a node’s ability to store more or less data, then the algorithm for which chunk goes where becomes more complex. It may not necessarily be the closest nodes any more. If these new groups cross a section boundary, then it becomes even more complex.
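
For context, my understanding is that “closeness to a chunk address” means XOR distance between addresses. Roughly like this sketch, with toy 1-byte addresses, illustrative names, and k = 4 copies assumed:

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style XOR distance between two addresses."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_nodes(chunk_addr: bytes, node_addrs: list[bytes], k: int = 4) -> list[bytes]:
    """The k nodes whose addresses are XOR-closest to the chunk address."""
    return sorted(node_addrs, key=lambda n: xor_distance(chunk_addr, n))[:k]

# e.g. with 1-byte toy addresses:
nodes = [bytes([x]) for x in (0x12, 0x57, 0x5A, 0x91, 0xE3)]
print(closest_nodes(bytes([0x55]), nodes, k=2))  # the two XOR-closest nodes
```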

3 Likes

From the perspective of the section it wants a certain ratio of spare space depending on how much flexibility it thinks it needs to meet future demand (this is true regardless of the specific curve or mechanism to achieve ongoing spare space).

From the perspective of the vault, it would receive requests for storing sacrificial chunks and would drop them either a) if it has no space or b) strategically in a way that optimizes the farm rate. Ditto with periodic tests.

So the ratio is enforced by both the section (in setting the ratio) and the vault (because they can choose their degree of compliance).

2 Likes

Hi, I would just like to add that if we want most vaults run by single users, the bandwidth bottleneck becomes more serious with a relocation period of several days. The real upload speed for people on LTE or xDSL is pretty low and not guaranteed, while those are the most common Internet connections in some countries. Even cable usually offers an upload speed 1/10 of the download speed.

To relocate 1 TB every 30 days at approx 10 Mbps will take almost 10 days!
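
For reference, the arithmetic behind that figure (assuming a decimal terabyte):

```python
tb_bits = 1e12 * 8      # 1 TB (decimal) in bits
upload_bps = 10e6       # ~10 Mbps upload
days = tb_bits / upload_bps / 86400
print(days)             # ~9.26, i.e. "almost 10 days"
```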

Then people will be pushed to have small vaults, while with smaller vaults they will still pay the same amount for 24/7 electricity and receive fewer GETs.

We should really set a target: either a vault is limited to one IP address so that storage is split across as many places as possible, or we allow more centralized vaults with a demanding requirement of a 1 Gbps connection.

If one vault per IP address is our target, then we have to reduce relocation to a minimum or, if possible, abandon the idea.

Edit: adding the current global speed index: http://www.speedtest.net/global-index

Mobile: DL/UL 22/9 Mbps
Fixed: DL/UL 48/23 Mbps

1 Like

Some really broad thoughts and ideas about price to store.

No payment

Why not remove the price to store data? Upload is free. It sounds crazy, but I think it’s not as crazy as it first sounds.

Pros:

  • Account creation chicken / egg problem solved.
  • Same consumer experience as existing web, ie free to participate (free as in beer).
  • No consumer angst for engagement costing money.
  • Safecoin algorithm is extremely simplified.
  • Safecoin scarcity is easier to understand.

Cons:

  • Spam needs managing
    • maybe by rate limiting or some other non-economic force which is possibly harder to enforce or reason about.
    • maybe by fees (discussed below).
    • maybe by some other mechanism… ideas?
  • Participant behaviour cannot be managed by making more coins available via recycling.
  • Safecoin is never recycled, so it must increase in value to keep paying for farmer resources as it becomes harder to farm.

User chooses

How about having an option for a storage price chosen by the user (like bitcoin fees are chosen by the user), where lower fees mean more chance of being rate limited?

I really dislike this option since it puts cognitive load onto the user, risk onto the user, more power to the farmers, more power to rich users. But I put it here because maybe it has some value in a similar but improved form.

I don’t know if rate limiting can be meaningful against an extremely clever client, so it probably ends up punishing the wrong people.

Floating price

This is the current mechanism proposed in rfc-0012.

StoreCost (SC) = farm_rate (FR) * number_of_clients (NC) / GROUP_SIZE (8)

Price changes according to the farm rate and the number of clients. The general idea is that the price gets cheaper: “a safecoin will purchase an amount of storage equivalent to the amount of data stored (and active) and the current number of vaults and users on the network”. The bigger the network grows, the more storage each safecoin can purchase.

Some example prices for the rfc-0012 proposal:

Early in the network:
NC = 10K (currently 9099 users on this forum).
FR = 0.2 (8 sacrificial chunks for every 10 primary chunks)
SC = 250
ie 1 safecoin buys 250 PUTS or 1 GB storage costs about 4 safecoin

Late in the network:
NC = 700M (10% of the world uses safe network. 7B * 0.1 = 700M)
FR = 0.6 (4 sacrificial chunks for every 10 primary chunks)
SC = 52.5M
ie 1 safecoin buys 52.5M PUTS or 1 TB storage costs about 0.02 safecoin

Gaming the farm rate:
NC = 10K
FR is 1 (0 sacrificial chunks for every 10 primary chunks)
SC is 1250
ie 1 safecoin buys 1250 PUTS or 1 GB storage costs about 0.8 safecoin
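
A minimal sketch that reproduces those three examples, reading SC as PUTs per safecoin (that unit is my interpretation; more on the ambiguity below):

```python
GROUP_SIZE = 8

def store_cost(farm_rate: float, num_clients: int) -> float:
    """StoreCost per rfc-0012: SC = FR * NC / GROUP_SIZE."""
    return farm_rate * num_clients / GROUP_SIZE

print(store_cost(0.2, 10_000))        # 250.0      (early network)
print(store_cost(0.6, 700_000_000))   # 52500000.0 (late network)
print(store_cost(1.0, 10_000))        # 1250.0     (gamed farm rate)
```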

I think these examples look pretty reasonable!

One nice aspect is that a lower farm rate means fewer rewards, but it also means users are spending more safecoin per GB. This is a cool bit of natural interplay for farmers. It has a natural balancing mechanism, as described below:

A farmer is making a decision to maximise their chance of reward. The chance is a combination of the farm rate (FR) and the portion of coins remaining (CR). Ideally the farmer wants both to be high, but a higher farm rate is usually accompanied by fewer coins remaining, so the two are in some conflict with each other. Farmers will usually have to prioritise one or the other, not both.

The options to improve the chance of reward are

  • increase the rate of reward by increasing the farm rate.
  • increase the coins remaining by making storage more expensive by reducing the farm rate.

What should a farmer do?! Both actions are beneficial. They must work out whether they gain more by reducing or by increasing the farm rate, then manipulate the farm rate in that optimal direction. How do they manipulate the farm rate? By choosing to store or drop more sacrificial chunks.

This graphic captures it visually. Any move left or right will naturally have some corresponding move down or up (and vice versa). Should a farmer aim to move right (and live with the corresponding move up), or should they aim to move down (and live with the corresponding move left)?

It would be logical (in this overly simple example) to follow the diagonal. Any move away from the diagonal to the right comes with a corresponding move up to an overall lesser value, and any move down from the diagonal comes with a corresponding move to the left to an overall lesser value. There is presumably an inbuilt mechanism to force an overall push toward the top left as the network grows, to overcome the farmer’s desire to move to the bottom right.
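
As a toy model only (my own simplification, not anything from rfc-0012): suppose the chance of reward per GET is proportional to FR times CR, and CR falls as FR rises. A greedy farmer then just picks the FR that maximises the product:

```python
def coins_remaining(fr: float) -> float:
    # Illustrative assumption: coins remaining fall as the farm rate rises.
    return 1.0 - 0.8 * fr

def reward_chance(fr: float) -> float:
    # Toy model: chance of reward per GET ~ farm rate * coins remaining.
    return fr * coins_remaining(fr)

# The FR a greedy farmer would steer toward, searched over a coarse grid.
best_fr = max((i / 100 for i in range(1, 101)), key=reward_chance)
print(best_fr, reward_chance(best_fr))
```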

RFC-0012 is a really simple mechanism to implement but is (at least to me) INCREDIBLY complex to reason about. This post took a long time for me to come to terms with, and maybe there are still mistakes in it?! Would love to hear more ideas for how to break the rfc-0012 mechanism.

This has a pretty clear answer now - sometimes the benefit of a higher storage price outweighs the benefit of a higher reward rate.

My main concern is that early creation of new safecoin could be quite rapid, since there would be a higher chance of being rewarded (there are many coins remaining) than of increasing the total coins remaining (there are already so many remaining). This has flow-on effects for the motivation to store sacrificial chunks and thus to accurately measure spare space and network stress.

I realise this conflicts with my previous look at the sigmoid and KD curves, but such is the learning process! I’m overall positive about RFC-0012 again, although with the caveat that the rate of coins being issued may need more investigation, especially early in the life of the network.

My current questions are:

  • Does this eventually balance out (and what is the equilibrium state)?
  • Does it end up forming cyclical behaviours?
  • Would the hypothetical cycles be more prone to stabilising or expanding?
  • How does the farming balance impact farmers?
  • How rational will farmers be?
  • What can farmers and consumers do to mess with the natural balance intended by the farm rate?
  • How much can farmers know about the current state, and how much force can they exert toward the equilibrium state?

7 Likes

NC is the number of clients currently connected to the section (it was actually a group before evolving into sections), and not the total number of clients or accounts existing on the network, which is what your examples use.

So NC for an early network and a mature network will not be all that different.

1 Like

Is there a source for this? The exact definition of NC is something I’ve never been clear on. From the rfc:

the total number of client (NC) accounts (active, i.e. have stored data, possibly paid)

which seems ambiguous to me.

1 Like

It’s from a response to my asking this very question.

Think about it: the network has absolutely no way of knowing how many accounts exist, does it? Even for the ones whose Account record the section is responsible for, it only knows the ones that are active in that section.

So it has to be the ones actively connected to that section. A section cannot know what is connected to other sections.

All the figures are for that section, not network-wide, and as such it doesn’t really matter how big the actual network is.

4 Likes

No, that means 1 PUT costs 250 of the PUT balance.

When you spend a coin, it buys a PUT balance of 2^64, and the storecost is subtracted from that balance for each PUT. So as SC rises, the PUT balance decreases more for each PUT. The RFC actually has a mistake in it where the PUT balance is double dipped, and so it is confusing.
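
If I follow this correctly, the accounting looks roughly like the sketch below. This is my reading of the description above rather than the RFC text, and the storecost number is illustrative:

```python
PUT_BALANCE_PER_COIN = 2 ** 64   # spending one safecoin buys this PUT balance

balance = PUT_BALANCE_PER_COIN
store_cost = 250                 # illustrative SC at the time of a PUT

balance -= store_cost            # each PUT subtracts the current storecost
puts_at_this_cost = PUT_BALANCE_PER_COIN // store_cost
print(puts_at_this_cost)         # how many PUTs one coin covers at SC == 250
```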

1 Like

It would be nice to create a pull request for some clearer definitions in rfc-0012 (number of clients, storecost etc.) and to remove the mistake… to some degree it’s academic in the end, since there’s so much work still to do, but it’s hard to reason with ambiguous or incorrect terminology. Thanks for clarifying.

Is my understanding of this process correct: I have some safecoin but no PUT balance. I want to upload 1 GB. I transfer some safecoin into PUT balance and immediately consume the PUT balance by uploading the 1 GB. Isn’t this a longer and more detailed way to say ‘x safecoin buys y GB’? Is the additional detail important for the safecoin algorithm? It’s important for the implementation, but does it affect the algorithm itself?

2 Likes

Well, it’s fine to say x safecoin buys y space.

Storecost though is a cost, not how much your coin will buy, which is the inverse. Like: that item costs so much, i.e. that PUT costs 250 of the PUT balance. The PUT balance is like dividing a safecoin by 2^64.

Well, sort of - only one coin at a time.

And a farming rate of 0.2 is very high, from my understanding. That is like one coin for every 5 GETs.

SC = 0.2 * 10000 / 8 == 250 from the PUT balance, and not 250 PUTs/coin. But this doesn’t make sense either, since that is way too low.

But as FR rises so too does the PUT cost.

e.g. an FR of 0.5 gives SC == 625 from the equation, i.e. each PUT costs 625 of the PUT balance, which makes sense: farmers are getting more, so the cost to PUT should be higher too.

1 Like

I like the outside the box thinking @mav!

4 Likes

Because if you do that, then safecoin has no demand, and therefore no value. You’ve destroyed one of the very elegant internal economic mechanisms of the network.

In the end someone has to pay, the only option that makes sense is pay to PUT, free to fetch/GET.

This part bothers me as being too convoluted. I can see why it was set up this way for accounting needs. With proper safecoin divisibility in play, it sure would be nice to simplify it down to buying as many 1 MB PUT chunks as you can afford for a given quantity of safecoin.

5 Likes

The local number of accounts is known by counting the MDs with type tag 0. Then an approximate value can be extrapolated for the global network from the local density.

The approximation would be the same as the one based on actively connected accounts.

So both metrics could be used with the same error margin.
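
Something like this hypothetical back-of-envelope, where the section’s share of the address space stands in for the local density (all numbers made up):

```python
local_accounts = 120        # MDs with type tag 0 counted in this section
section_fraction = 1 / 512  # assumed share of the address space this section manages

estimated_global = local_accounts / section_fraction
print(estimated_global)     # 61440.0 accounts network-wide, under these assumptions
```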

1 Like

What are the units for the storecost formula? (the units are not stated in the rfc but we might be able to make a good guess)

StoreCost = FR * NC / GROUP_SIZE

Therefore a safecoin will purchase an amount of storage equivalent to the amount of data stored (and active) and the current number of vaults and users on the network.

I have assumed the units for storecost are PUTs per safecoin. Does that sound right?

So I would say storecost is how much your coin will buy (specifically, how many PUTs a coin will buy), as the quote seems to also suggest. But it’s ambiguous, so please correct my interpretation if needed!

Agreed: by the formula, if FR rises then storecost rises too. But is that a rise in PUTs per safecoin, or a rise in the inverse? Quite an important distinction that we need to clarify.

Anyway… I think we agree but are butting heads over terminology. The rfc should be more specific about:

  • how NC is measured
  • what the units for the storecost calculation are
  • whether storecost applies to the purchase of PUT balance, the spending of PUT balance, or both

I somewhat agree with this, but to really stretch the imagination: what if the primary value of the network was actually a free (as in speech) internet? Vault operators run vaults because they want to be able to download interesting new stuff, and any payments are merely a side-effect? I don’t think that’s too far a stretch to imagine, considering the popularity of the non-economic torrent ecosystem, as well as the move of almost all consumption toward simpler and easier content distribution.

2 Likes

Love these kinds of questions from well thought out sources @mav Makes for an interesting morning think in the shower :smiley:

Yes, I agree, but I remain convinced it is the best shot so far.

The equilibrium value will likely be difficult to reach; it would be when there are no more PUTs or new clients on the network and no farmer leaves. I am not sure this is formal enough, or indeed actually moves towards an answer to your question.

Yes, but I believe that is actually correct, as long as the cycles are representative of the cyclic nature of supply and demand.

After the initial network start they should be steady, but influenced by external sources, like the attraction and loss of clients etc.

I hope they act very much like real farmers, who can decide on farming X (data in our model) when it is valuable enough for them to profit.

In a market, I suppose you want them to be as greedy as possible for stability, but I suspect a huge number of people will farm and possibly not even worry about rewards. Skype/Tor etc. showed that putting something in to get something out, like free calls or secure browsing, was in fact enough. Tor even more so, as its setup cost for users is incredibly high in terms of complexity today.

I suspect consumers more than farmers can influence things. It is easier to be a consumer, which reduces the friction of getting on the network to do “something”. It is an area that is a blind spot at times: all the security we place on farmers is great, but the more freedom clients want (like deleting data etc.), the more likely the attack is to come from there (IMO). I suppose Ethereum showed that as well, in a weird way.

I think they will have near full knowledge of the current state.

This part I cannot really say we have fully described or reasoned about. I mean, we think we have that covered, but we do need to try it out and model the assumptions much better. It is hard though, as the assumptions in general are based on what we think humans will do as clients and farmers.

4 Likes