next 20 years
I asked because it was difficult to parse this information from the update. We know the network has been able to store files already, and the connectivity issues are much better after NatNet. So it sounds like there's not much left to do.
It seems to me that this part might be something to do before the large data network:
And things like these I'd rather see done later:
Now that you have declared the large data network as a next step, it would be nice to have a view of what’s to be done before that.
As of yet, not in terms of incentivising. We need to come up with some of those parts.
The basics are that we have 2 archive node types:
- DAG or DBC Audit nodes
- Data archive
These will store as much as they can and cover as much of the address space as they can.
These will become more important as we go on, I believe. Static or mostly static unread data would be good to push off the frontline nodes. Then we can keep those small and nimble. The DAG nodes will be a really good defence against bad actors in the currency space.
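To make "cover as much of the address space as they can" a bit more concrete, here's a rough Rust sketch of the difference as I read it (all names, types and numbers are my own invention, nothing from the actual codebase): a frontline node only keeps chunks close to it in XOR space, while an archive node keeps whatever it still has capacity for.

```rust
// Hypothetical sketch: a frontline node keeps only chunks close to its own id
// in XOR space, while an archive node keeps anything it has capacity for,
// so it ends up covering a much wider slice of the address space.

type XorAddress = [u8; 32];

/// XOR distance between two addresses; lexicographically smaller = closer.
fn xor_distance(a: &XorAddress, b: &XorAddress) -> XorAddress {
    let mut d = [0u8; 32];
    for i in 0..32 {
        d[i] = a[i] ^ b[i];
    }
    d
}

struct FrontlineNode {
    id: XorAddress,
    max_distance: XorAddress, // only keep chunks within this distance
}

struct ArchiveNode {
    used_bytes: u64,
    capacity_bytes: u64,
}

impl FrontlineNode {
    /// A frontline node stores only chunks close to it in XOR space.
    fn should_store(&self, chunk_addr: &XorAddress) -> bool {
        xor_distance(&self.id, chunk_addr) <= self.max_distance
    }
}

impl ArchiveNode {
    /// An archive node ignores distance and stores anything that fits.
    fn should_store(&self, _chunk_addr: &XorAddress, chunk_len: u64) -> bool {
        self.used_bytes + chunk_len <= self.capacity_bytes
    }
}

fn main() {
    let archive = ArchiveNode { used_bytes: 0, capacity_bytes: 20_000_000_000_000 }; // ~20 TB
    let frontline = FrontlineNode { id: [0u8; 32], max_distance: [0x0f; 32] };
    let far_chunk = [0xffu8; 32];
    // The frontline node refuses a far-away chunk; the archive node takes it.
    println!("frontline keeps it: {}", frontline.should_store(&far_chunk));
    println!("archive keeps it:   {}", archive.should_store(&far_chunk, 1_000_000));
}
```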
We do need to incentivise those nodes for sure, but some folk may run them as they are heavily invested in Safe (cash, time or resources) and wish to keep it running. I think a few will do that. However, incentivising that behaviour would be very nice and feel right.
When we get them up then I think we will find solid and simple ways to incentivise them
This is true, similarish to running a bitcoin node.
Then I wonder: is one more important than the other, and if they're equal, should the reward be equal?
If the cost and effort is greater for one than the other, then a few fanatics will jump in for sure.
But will there be enough?
Interesting times!
Awesome update. I'd be one of those people (fanatics) with a serious machine that is up 24/7 wanting to run an archive node. Sunk costs etc. A while ago, with SAFE partially in mind, I built a vastly over-serious NAS (Xeon 12-core, 30 TB ZFS storage, 128 GB ECC RAM, 10 GbE connection)… it can run large data science models, serve as the home media system, and still retain 4-6 cores for running an archive node… If we keep going down the road of lots of light frontline nodes, my small stack of Pi4s would be more fit for purpose for that, and the archive node would be free on the margin for me.
I think the issue here is how to define equal. By which parameter? And in which proportion?
If 3 archive nodes are covering the one chunk being uploaded and there are the 4 normal nodes for that chunk, how is the available pie (the “farming” rewards) split? Do
- all 7 get 1/7 of the available pie each, or
- the 4 nodes get 1/5 each and the archive nodes get 1/3 of 1/5 each, or
- the 4 nodes get 1/4 of 1/2 each and the archive nodes get 1/3 of the other 1/2 each, or
- …
What if in the first 12 months there are 10 archive nodes covering the uploaded chunk? That's now 14 nodes to be paid.
One point to consider is that in the early network there is likely to be an overabundance of archive nodes, simply because the people most attracted to SAFE are likely to include the highest percentage with sizeable storage capacity to dedicate to SAFE.
My initial thought is that collectively all the archive nodes handling a particular chunk upload be considered as one additional node and share that one node's proportion. That is, if there are 4 nodes and “X” archive nodes, then each archive node receives 1/X of 1/5 of the available pie.
My reasoning is that the archive nodes will be receiving chunks from multiple (XOR) ranges where normal nodes receive from one range. Also, as the network grows, the number of (XOR) ranges an archive node can handle reduces, so each archive node will receive a greater proportion for each chunk upload, thus incentivising more archive nodes just when the need for more is rising.
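Just to make the arithmetic of that proposal concrete, a quick Rust sketch (the function and its names are mine, not anything from the real reward code): each normal close-group node takes one share, and all archive nodes together split one extra share.

```rust
/// Hypothetical sketch of the split proposed above: each normal close-group
/// node takes one share, and all archive nodes together count as one extra
/// share which they divide equally between themselves.
fn reward_per_node(pie: f64, normal_nodes: u32, archive_nodes: u32) -> (f64, f64) {
    // Total shares: one per normal node, plus one collective archive share.
    let shares = normal_nodes as f64 + if archive_nodes > 0 { 1.0 } else { 0.0 };
    let per_normal = pie / shares;
    let per_archive = if archive_nodes > 0 {
        // The single archive share is split between all archive nodes.
        per_normal / archive_nodes as f64
    } else {
        0.0
    };
    (per_normal, per_archive)
}

fn main() {
    // 4 normal nodes, 3 archive nodes: each normal node gets 1/5,
    // each archive node gets 1/3 of 1/5.
    let (n, a) = reward_per_node(1.0, 4, 3);
    println!("normal: {n:.4}, archive: {a:.4}"); // normal: 0.2000, archive: 0.0667

    // The 14-node case: 4 normal + 10 archive. Normals still get 1/5 each,
    // each archive node now gets only 1/10 of 1/5.
    let (n, a) = reward_per_node(1.0, 4, 10);
    println!("normal: {n:.4}, archive: {a:.4}"); // normal: 0.2000, archive: 0.0200
}
```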
As to the cost of running an archive node over a normal node: well, the aim has been to make it as easy as possible to run a node using spare resources (almost zero cost). An archive node is likely to mean someone dedicating extra resources to the task, and thus a cost significantly higher than near zero.
Ideally the network knows if it has too few or too many archive nodes and adjusts rewards accordingly, but separately from regular nodes.
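Purely as an illustration of that last point (the target count, the clamp values and how the network would estimate any of this are all made up and hand-waved), something like this could nudge the archive share up when archive nodes are scarce and down when they are overabundant, without touching the regular node reward:

```rust
/// Illustrative only: scale the collective archive share by how far the
/// current archive-node count is from some target, clamped so rewards never
/// swing too violently. How the network would estimate the target (e.g. from
/// stored-data volume) is hand-waved here.
fn archive_share_multiplier(current_archive_nodes: u64, target_archive_nodes: u64) -> f64 {
    if current_archive_nodes == 0 {
        return 2.0; // maximum boost when there are none at all
    }
    let ratio = target_archive_nodes as f64 / current_archive_nodes as f64;
    // Too few archive nodes (ratio > 1) => pay more; too many => pay less.
    ratio.clamp(0.5, 2.0)
}

fn main() {
    // Twice as many archive nodes as wanted -> halve their collective share.
    println!("{}", archive_share_multiplier(200, 100)); // 0.5
    // Half as many as wanted -> double it.
    println!("{}", archive_share_multiplier(50, 100)); // 2
}
```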
Shoulda been called NoNatNet
Still awaiting QiNet! (for @qi_ma)
This is it, at least what I anticipate too.
If they require additional robustness, it is probably most cost-efficient to run them on the likes of AWS or DigitalOcean etc.
In my case extended power outages are a real concern.
Big storms cause major disruption here, no problem though if after ~3 days I can power it up and it can carry on from where it left off.
I seem to remember some of the talk being that when nodes go offline for a period, then when they come back they can validate their storage to the network. Even if not done for normal nodes, I see that as a necessity for archive nodes. Internet outages will happen from time to time, even if only for 5 minutes every 6 months for a high-uptime data centre. We don't see it because large sites have multiple servers, and when one server goes down for minutes (or even a day) we simply do not notice it. For a single server/VM site, 5 minutes in 6 months is OK.
For an archive node that 5 minutes would otherwise mean the loss of it, unless of course the network & archive node can somehow validate the data. One way is that it retrieves the correct data when asked; it cannot fake that, since the hash has to match the address.
In essence, if this is done then a home system with 20 TB as an archive node can work fine and be a lot cheaper to run than a data centre server with 20 TB.
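Since a chunk's address is derived from the hash of its content, that validation can be a simple check: serve the data and see whether it hashes back to the address asked for. A minimal sketch, assuming a plain hash-of-content naming scheme (SHA-256 via the sha2 crate here, as a stand-in for whatever hash the network actually uses):

```rust
// Minimal sketch, not the real Safe code: assumes a chunk's network address
// is simply the hash of its content (SHA-256 as a stand-in).
use sha2::{Digest, Sha256};

type ChunkAddress = [u8; 32];

fn chunk_address(content: &[u8]) -> ChunkAddress {
    let mut addr = [0u8; 32];
    addr.copy_from_slice(&Sha256::digest(content));
    addr
}

/// The network (or the archive node itself after an outage) asks for a chunk
/// and checks that whatever comes back hashes to the address it asked for.
/// A node cannot fake this without actually holding the data.
fn validates(requested: &ChunkAddress, returned_content: &[u8]) -> bool {
    &chunk_address(returned_content) == requested
}

fn main() {
    let data = b"some archived chunk";
    let addr = chunk_address(data);
    assert!(validates(&addr, data));
    assert!(!validates(&addr, b"tampered data"));
    println!("chunk validated against its content address");
}
```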
How about not archive nodes, but some data marked as archived data? It could have its own rewards and still be distributed all around?
Defeats lightweight and nimble frontline nodes then?
It is really encouraging: if at this stage you can consider running many nodes on one device, then we are talking about a real anthill operating in the test micro network!
And what opportunities does this give in the future?!
For me these archive / service nodes seem like a comeback of the last complicated design with elder nodes etc., which was abandoned for the simplicity of the new design. And this simplicity is now being lost. Why do we need these archive nodes in the first place?
Efficiency.
Not all data is equal; most data is never accessed again, so it makes sense to focus more resources on frequently accessed data.
This was always envisaged and is not a significant extra complication.
And the most efficient is a central server with all the data inside. No p2p network needed at all!
Perhaps just having different replication rates for different chunks based on access date would be OK? Like for archive data – 5 replicas, and for recent ones – 20? And each node periodically selecting one of its stored chunks, checking with the network whether there are still enough copies, and requesting new replicas if necessary?
Just some notes to think about.
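A rough sketch of what that could look like (the thresholds and replica counts are placeholders, not real network parameters): pick a replica target from how recently a chunk was touched, then periodically compare that against what the network reports and tally the top-ups to request.

```rust
// Illustrative sketch only: replica counts chosen by how recently a chunk was
// accessed, plus a periodic pass that asks the network to top up copies.
use std::time::{Duration, SystemTime};

/// How many replicas a chunk "should" have, based on its last access time.
fn target_replicas(last_access: SystemTime, now: SystemTime) -> u32 {
    let age = now.duration_since(last_access).unwrap_or(Duration::ZERO);
    if age < Duration::from_secs(30 * 24 * 3600) {
        20 // recently accessed: keep plenty of copies around
    } else {
        5 // archive data: fewer replicas is enough
    }
}

struct StoredChunk {
    last_access: SystemTime,
    known_replicas: u32, // what the network reported when we last asked it
}

/// Periodic maintenance pass: for each chunk, compare the replica count the
/// network reports against the target and tally how many extra copies to request.
fn maintenance_pass(chunks: &[StoredChunk]) -> u32 {
    let now = SystemTime::now();
    let mut replication_requests = 0;
    for chunk in chunks {
        let target = target_replicas(chunk.last_access, now);
        if chunk.known_replicas < target {
            // In a real node this would send replication requests to peers.
            replication_requests += target - chunk.known_replicas;
        }
    }
    replication_requests
}

fn main() {
    let now = SystemTime::now();
    let chunks = vec![
        StoredChunk { last_access: now, known_replicas: 12 }, // recent: wants 20
        StoredChunk { last_access: now - Duration::from_secs(365 * 24 * 3600), known_replicas: 5 },
    ];
    println!("extra replicas to request: {}", maintenance_pass(&chunks)); // 8
}
```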
Interestingly, if you look at the earliest large-scale p2p network done commercially (Amazon Dynamo), they did say it would be impossible to run the Amazon infra on centralised servers. Same now for Netflix and more, AFAIK.
So the bottleneck is generally not the data, it's serving it across geographies and handling millions of client accesses. Add the compute per user request etc., and you see that decentralising space/CPU/memory all makes sense.
Even on-prem storage systems are now distributed! The large ones that scale into PB and throughput in the GB/s range, that is. You can't get a single server with enough CPU, RAM and network connectivity to satisfy demanding modern workloads. For user file access you can still get away with systems with a single ‘head’ or, even better, 2 heads in an active-active configuration. But for anything big these days it is multiple storage servers with some kind of clever stuff which enables any data on any of the servers to be accessed by the client through any of them. The servers, networking and disk controllers are all stock items and the clever engineering is at the software level. Sounds familiar!
If we could see this through an analogy to mechanical engineering, you are talking about power, not efficiency. Efficiency is power output GIVEN the power input (like electricity). It's obvious that at some point a single server cannot supply the needed power, but it's still the most efficient.
The efficiency comes from the lack of resources wasted on synchronization. And with archive nodes, from what I grasped reading archive (sic!) posts from 2015 till now, there arise many complications and additional questions, like different incentives/rewards etc., so I assume lots of additional “energy” – communication overhead. Also, I think we can add developer time to the resources that have to be spent.
The question is – is it worth it? Especially at this stage? It's still not clear to me why we need these service nodes so badly.