Next step of safecoin algorithm design

This post is about building some more intuition around farm rate and the RFC-0012 farming algorithm.

Rather than talk about farm rate (FR), I’ll use farm divisor (FD) which is 1/FR, since it’s easier to talk about really big numbers than really small numbers.

The reason why FD matters (especially large FD) is it determines a) the rate of reward for vaults and b) the cost of storage. A large FD means cheap storage. But a large FD also means less frequent rewards. So consumers want the largest possible FD, farmers want the smallest possible FD, and hopefully most (or at least some) farmers double as consumers so they have some motivation to resolve the tension between these forces.

The FD is proposed to be a 64 bit unsigned int.

The full range of FD is between 0 and 18446744073709551615.

FD is calculated as
total_primary_chunks / (total_primary_chunks - total_sacrificial_chunks),
unless sacrificial chunks equal or exceed primary chunks, in which case FD is the largest possible value (ie MAX_U64). This is because “we also want to ensure that farming stops if the sacrificial count is greater than or equal to the primary count.” (I think farming should never stop and this should be changed from MAX_U64 to total_primary_chunks).
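As a quick sanity check of the formula, here's a minimal sketch in Python. It is not the RFC's code: the function name farm_divisor and the use of integer division (chosen because FD is an unsigned int) are my own assumptions.

```python
MAX_U64 = 2**64 - 1  # largest value of a 64 bit unsigned int


def farm_divisor(total_primary_chunks, total_sacrificial_chunks):
    """Sketch of FD = TP / (TP - TS), set to MAX_U64 when TS >= TP."""
    if total_sacrificial_chunks >= total_primary_chunks:
        # Sacrificial count has caught up with the primary count,
        # so farming effectively stops.
        return MAX_U64
    # Integer division is an assumption; the post only gives the ratio.
    return total_primary_chunks // (total_primary_chunks - total_sacrificial_chunks)


print(farm_divisor(1000, 0))     # 1
print(farm_divisor(1000, 900))   # 10
print(farm_divisor(1000, 999))   # 1000
print(farm_divisor(1000, 1000))  # 18446744073709551615
```

Note how FD stays small until sacrificial chunks get very close to primary chunks, then jumps sharply.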

FD and Storage

To get an idea of how FD might look in real life, consider ‘what would be a realistic very big in-real-life FD?’

Let’s take an existing-but-unrelated very big in-real-life number as an example: bitcoin mining difficulty is currently 923233068449, which is about 923 billion.

What would be needed to actually have FD as large as the current bitcoin mining difficulty?

The largest possible FD for a given number of primary chunks (TP) is to be had when the number of sacrificial chunks (TS) is almost (but not exactly) the same, ie TS=TP-1. The formula for maximum FD for a given TP is FD=TP/(TP-(TP-1))=TP

This means to have a really big FD (ie FD = 923B) the section must be storing at least 923B primary chunks and 923B-1 sacrificial chunks.

I’ll assume every chunk is 1MB (not a good assumption but probably true in the end due to cost savings by aggregating several small files into a single full 1MB chunk).

This means the section is storing approx 860 PB of primary chunks (and same again for sacrificial). If the section has 1000 nodes, that’s about 0.86 PB, or roughly 880 TB in binary units, per vault. Pretty big vaults.

To give an idea how hard it is to have FD=923B, consider what happens if just one fewer sacrificial chunk is stored for the entire 860 PB of primary chunks: FD would now be approx 460B, ie half the desired FD. So storing 860 PB of primary chunks is the bare minimum to get FD=923B, and in practice it would probably require an order of magnitude more storage per section.
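The storage figures can be reproduced with a few lines of Python. This is only a sketch under the assumptions above (every chunk is a full 1 MB, binary units, 1000 vaults per section); the helper name primary_storage_bytes is mine.

```python
MIB = 2**20  # bytes per chunk, assuming every chunk is a full 1 MB
TIB = 2**40
PIB = 2**50


def primary_storage_bytes(fd, chunk_bytes=MIB):
    # The largest FD for a given TP is TP itself (when TS = TP - 1),
    # so reaching a given FD needs at least FD primary chunks.
    return fd * chunk_bytes


fd = 923233068449
total = primary_storage_bytes(fd)
print(round(total / PIB))         # 860 (PB of primary chunks per section)
print(round(total / 1000 / TIB))  # 880 (TB per vault with 1000 vaults)

# One fewer sacrificial chunk (TS = TP - 2) halves the divisor.
tp = fd
print(tp // (tp - (tp - 2)))      # 461616534224
```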

FD and Rewards

Rewards are given at an average rate of 1 safecoin per FD requests. If FD is 10, one safecoin is given approximately every 10th request.

Using the previous very big in-real-life number of 923233068449, that means one safecoin every 923B requests.

How often would this be?

We’ll have to assume some numbers, but let’s use an average request rate of 10 Mbps per vault for this example. If each chunk is 1 MB that means there’s about 1.2 chunks being requested each second. This takes 774464030385 seconds to complete the 923B requests, ie 24558 years. So a 10 Mbps connection is presumably not viable for that FD.

Either connections will need to be much much faster than 10 Mbps or FD will need to be much much lower than 923B.

FD and Price

Cheapness of storage (permanent storage) is one of the big promises of SAFE.

How cheap would storage be if FD was a very big in-real-life number, such as 923233068449?

Store cost is defined as StoreCost = FR * NC / GROUP_SIZE or, reworked in terms of FD, StoreCost = NC / GROUP_SIZE / FD

GROUP_SIZE is currently fixed at 8, giving StoreCost = NC / (8 * FD)

NC is “the total number of client (NC) accounts (active, i.e. have stored data, possibly paid)”

The aim of this calculation in plain English is: “a safecoin will purchase an amount of storage equivalent to the amount of data stored (and active) and the current number of vaults and users on the network.”

The cheapest possible StoreCost is when NC = 1 (since presumably there must be at least one active client).

1/(8*923233068449) gives a StoreCost of 0.00000000000013539 safecoin per PUT, ie a rate of 7385864547592 PUTs per safecoin. If every PUT is 1 MB that makes 6878 PB of storage for 1 safecoin.

Obviously the number of clients will be larger than 1, so if it were 1000 then it would be 6878 TB of storage for 1 safecoin. Or for 1 million clients it would be 6878 GB of storage for 1 safecoin.
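For anyone wanting to try other values of FD, NC and GROUP_SIZE, here's a small sketch of the StoreCost formula as I've interpreted it. The function names are mine, and I've used Fraction just to avoid rounding on such small numbers.

```python
from fractions import Fraction

GROUP_SIZE = 8


def store_cost(fd, nc, group_size=GROUP_SIZE):
    # StoreCost = NC / (GROUP_SIZE * FD), in safecoin per PUT
    return Fraction(nc, group_size * fd)


def puts_per_safecoin(fd, nc, group_size=GROUP_SIZE):
    # Inverse of the store cost: how many PUTs one safecoin buys.
    return 1 / store_cost(fd, nc, group_size)


fd = 923233068449
print(float(store_cost(fd, 1)))       # roughly 1.35e-13
print(int(puts_per_safecoin(fd, 1)))  # 7385864547592

# PB per safecoin, assuming every PUT is a full 1 MB chunk (binary units).
print(int(puts_per_safecoin(fd, 1) * 2**20 / 2**50))  # 6878
```

Because NC sits in the numerator, the amount of storage per safecoin shrinks in direct proportion to the number of clients.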

I think it’s fair to say that a FD of 923233068449 does indeed make for cheap storage, but I feel it would be too cheap for farmers to support.

Note that GROUP_SIZE may be larger if PARSEC is efficient enough, or may be smaller if PARSEC is secure enough.

A new MAX_FD?

923233068449 fits within 40 bits. Since u32 is the next smallest after u64 it would seem that u64 is indeed a good choice for FD if such high values are needed. But is it possible to fit the largest realistic FD within u32?
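The bit width is easy to confirm in Python (just a quick check, nothing specific to the RFC):

```python
fd = 923233068449

print(fd.bit_length())   # 40
print(fd <= 2**32 - 1)   # False, too big for u32
print(fd <= 2**64 - 1)   # True, fits in u64
```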

Using the same methodology as above, FD of u32 (ie 4294967295) would result in:

Storage: the section is storing approx 4 PB. If the section has 1000 nodes, that’s about 4 TB per vault. That’s definitely too low. I imagine most vaults could store more than that.

Rewards: on a 10 Mbps connection it would take 3602879701 seconds per safecoin, or 114 years - far too slow. Either connections must be faster than 10 Mbps or FD lower than u32.

Price: the cheapest possible price would be 32 PB per safecoin. That seems really cheap, although with 1000 clients it becomes 32 TB and with 1 million clients it’s 32 GB per safecoin. So I’d say u32 is inadequate for FD based on price.

Considerations

Churn is not accounted for. It would reduce the available bandwidth for rewards via GET requests.

Cache is not accounted for. It would reduce the opportunities for vaults storing primary chunks to be rewarded if the chunk is instead served from cache. It would also reduce the cache-provider bandwidth available for serving primary chunks. It would also reduce the storage space available for the cache-provider. I keep saying it, but cache is a really interesting aspect of the network and I think it’ll come out as one of the key aspects of how the network is structured and rewarded.

Gossip is not accounted for, which will take resources away from responding to GET requests and have the effect of reducing the rate of reward.

I’ve assumed the statement “a safecoin will purchase an amount of storage equivalent to the amount of data stored (and active) and the current number of vaults and users on the network” means

  • more data stored = less expensive prices (ie the network is large so storage should be cheap)
  • more clients = more expensive prices (ie demand is high so storage should be expensive)

But that seems like an incorrect interpretation. If someone could do their own calculations of storecost and price for a specific FD, NC and GROUP_SIZE I’d appreciate seeing that for improving my own understanding.

Collisions between newly rewarded coins and the currently issued safecoins are not accounted for, but they would have the effect of making rewards less frequent than calculated in this post.

I wonder if a natural cycle might evolve where farmers alternate between trying to get very cheap storage at one time and then very high reward rates at another time. I don’t know how it would evolve or what time period it would be over or what magnitude the cycle could have, but it would certainly be possible to create a cycle if vaults varied their TP:TS ratio roughly in sync with each other.

Summary

Overall it feels like the reverse of the approach I’ve taken here will be used by farmers to try to ‘set’ a suitable farming divisor.

First they’ll choose a storage cost that’s roughly what they’d like for themselves (ie the highest satisfactory FD).

Then they’ll balance it by choosing a reward rate that’s going to allow them to stay happy (ie the lowest satisfactory FD).

And finally they’ll provision enough storage so they can achieve the required FD (high FD requires lots of storage to achieve).

Would be keen to know your thoughts on the approach taken in this post.
