Some thoughts came to me this morning. How hard would it be to implement archive HDDs connected to nodes, to help preserve the network with restore points if it were to suffer attacks in the future? That way the network could always recover, no matter what. Could the Maidsafe foundation, after launch, build or rent a few server locations around the world with large-capacity tape drives to help back up the network with restore points?
As Tim Berners-Lee has shifted focus to decentralization lately with his foundation, would he go beyond an Internet Archive to a Decentralized Internet Archive? As David has some connection with Tim, I'm hoping for a partnership. Since many different types of malicious actors could have incentives to cause harm in the future, will we see some kind of backup, archive, or restore points? Maybe it could be a feature that node operators could choose to implement for free in the early stages, if they want to, to help secure the network, like adding large HDDs to their nodes to function as archive/restore points.
That would also be a good reason for the Maidsafe foundation to have a significant amount of funds: to secure the network. @Bux @JimCollinson @rusty.spork @dirvine
Virtual tape drive systems have existed for decades, and they still suck. It's a never-ending fight with OS updates, software updates, firmware updates, and hardware updates.
You would have to geo-locate them in multiple locations, which adds the further bonus of network latency and of depending on third-party fibre providers to keep your networks working.
Having nodes able to offer their stored data back to the network when they restart would have an archive effect as well, so major blackouts, cable cuts and so on would cause data to be lost only temporarily, not permanently. I thought this was supposed to be the case already, but my experience of nodes restarting is that they just download the records all over again.
Archive nodes, though, have been talked about by many people, including some developers. In my opinion the best approach is to work on specifying how such nodes would work and how they would compensate the node operator for providing the additional resources, maybe simply by giving out quotes, since an archive node would be one of the closest nodes to many more records than a normal node.
That way someone with plenty of spare space on their NAS could allocate many TB to this task. And if enough people do, this provides a greater backup than just having the 5+ nodes storing each record.
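As a hedged illustration of that last point (the names, the 32-byte address type and the "closest 5" figure are assumptions drawn from this discussion, not the real protocol): a node holding far more records simply ends up among the k closest nodes to many more addresses, and so in a position to quote for many more of them.

```rust
/// Illustrative 256-bit xor address, for the sketch only.
type XorName = [u8; 32];

/// Xor distance between two addresses; comparing the resulting byte
/// arrays lexicographically orders peers by closeness to a record.
fn xor_distance(a: &XorName, b: &XorName) -> XorName {
    let mut d = [0u8; 32];
    for i in 0..32 {
        d[i] = a[i] ^ b[i];
    }
    d
}

/// Count for how many record addresses `me` is among the `k` closest of
/// the peers it knows about. An archive node close to many more records
/// would see this number grow, and with it the quotes it could give out.
fn quotable_record_count(me: &XorName, peers: &[XorName], records: &[XorName], k: usize) -> usize {
    records
        .iter()
        .filter(|record| {
            let my_dist = xor_distance(me, record);
            // How many known peers are strictly closer to this record than we are?
            let closer_peers = peers
                .iter()
                .filter(|p| xor_distance(p, record) < my_dist)
                .count();
            closer_peers < k // fewer than k closer peers means we are in the closest k
        })
        .count()
}
```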
In the early stages my thinking was that archiving could be voluntary: besides my nodes running on a system with a 6TB drive, I could also choose to add one or a few 12TB drives which would collect and archive data from the network. The large drives would crawl the network for data to archive. It adds complexity, though.
It is there until the space is needed. By "old data" I mean records that the node is no longer responsible for.
The algorithm, as of the latest we've been told (a rough sketch in code follows the list):

- Every hour (AFAIK) any record "too far away" in xor space from the node's xor address is deleted. This removes records picked up by the node before it had a full picture of its neighbourhood, when it thought the network was a lot smaller and that it was responsible for a lot more of it.
- The record_store keeps a 10% free record count (i.e. keeps up to 90% of the maximum record count) by removing records it is not responsible for. One assumes the records removed are the ones furthest away from the node's xor address.
- Records the node is not responsible for are kept for caching and for attack protection against the case where one node runner has all 5 closest nodes to some records. Keeping them also reduces network traffic during churn, when the node becomes responsible for some of those records again.
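Putting those two rules together, here is a minimal, hypothetical sketch in Rust. The types and thresholds (XorName as a 32-byte address, the "too far away" cut-off passed in as a distance range, the 90% figure) come from the description above, not from the actual node code:

```rust
use std::collections::HashMap;

/// Illustrative 256-bit xor address.
type XorName = [u8; 32];

/// Xor distance between two addresses; lexicographic comparison of the
/// result orders records by closeness to the node.
fn xor_distance(a: &XorName, b: &XorName) -> XorName {
    let mut d = [0u8; 32];
    for i in 0..32 {
        d[i] = a[i] ^ b[i];
    }
    d
}

/// Rule 1 (hourly): drop any record "too far away" in xor space,
/// i.e. outside the distance range the node is responsible for.
fn prune_distant_records(
    node: &XorName,
    responsibility_range: &XorName, // max distance the node is responsible for
    records: &mut HashMap<XorName, Vec<u8>>,
) {
    records.retain(|key, _| xor_distance(key, node) <= *responsibility_range);
}

/// Rule 2 (on capacity pressure): keep ~10% headroom by removing the
/// records furthest from the node's own address first.
fn prune_to_headroom(
    node: &XorName,
    records: &mut HashMap<XorName, Vec<u8>>,
    max_records: usize,
) {
    let target = max_records * 9 / 10; // keep up to 90% of capacity
    if records.len() <= target {
        return;
    }
    let mut keys: Vec<XorName> = records.keys().copied().collect();
    // Sort furthest-first so the most distant records are dropped first.
    keys.sort_by(|a, b| xor_distance(b, node).cmp(&xor_distance(a, node)));
    for key in keys {
        if records.len() <= target {
            break;
        }
        records.remove(&key);
    }
}
```

The exact definition of "too far away" is whatever the node uses internally; the point is that both rules boil down to ordering records by xor distance from the node's own address.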
That is far too slow a method. The records in the record_store are mainly the ones it was responsible for at one time, plus some of its close neighbourhood nodes' records.
Better to run plenty of nodes.
But remember the records on disk are encrypted by the node, so you need the node to decrypt them first. This is a secondary encryption applied by the node when storing the record. The record should already have been encrypted by the client before sending it, and this secondary encryption is to ensure the data is unreadable** while at rest.
**The way it's implemented at the moment is not as good as the initial plan was. The way to decrypt the records is to grab the secret key from the node's data directory and use it to seed the decryption. The secondary encryption was supposed to use an in-memory key pair so that no other app could decrypt the record_store at any time, but hey, why quibble about an ineffectual secondary encryption \s. This should be dealt with when the fix is made to ensure a new secret key every time a node is started.
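For illustration only (this is not how the node actually does it, and the choice of the aes-gcm crate is just an assumption for the sketch): the originally intended property, a key that exists only in the node's memory and is regenerated on every start, could look roughly like this.

```rust
use aes_gcm::{
    aead::{Aead, AeadCore, KeyInit, OsRng},
    Aes256Gcm, Nonce,
};

/// Illustrative at-rest cipher for the record_store.
/// The key is generated fresh on node start and lives only in memory.
struct RecordCipher {
    cipher: Aes256Gcm,
}

impl RecordCipher {
    fn new() -> Self {
        // New random key every start; never written to the data directory.
        let key = Aes256Gcm::generate_key(OsRng);
        Self { cipher: Aes256Gcm::new(&key) }
    }

    /// Encrypt an (already client-encrypted) record before writing it to disk.
    /// The random 96-bit nonce is prepended so the record can be decrypted later.
    fn seal(&self, record: &[u8]) -> Result<Vec<u8>, aes_gcm::Error> {
        let nonce = Aes256Gcm::generate_nonce(&mut OsRng);
        let mut out = nonce.as_slice().to_vec();
        out.extend(self.cipher.encrypt(&nonce, record)?);
        Ok(out)
    }

    /// Decrypt a record read back from disk.
    fn open(&self, stored: &[u8]) -> Result<Vec<u8>, aes_gcm::Error> {
        let (nonce, ciphertext) = stored.split_at(12);
        self.cipher.decrypt(Nonce::from_slice(nonce), ciphertext)
    }
}
```

With a scheme like this, nothing that only has access to the node's data directory can decrypt the record_store, which is the property the current grab-the-key-from-disk approach gives up.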
An update on this: it is scrubbed for a new version (i.e. reset) of the network (dunno about a plain release), and it was mentioned, and seen in the logs, that this was being done. As for a restart of a node, I have seen metrics show a lot of records held very quickly, without enough downloading to account for that many.
I’d love a concise description of what happens in each scenario.