Some thoughts came to me this morning. How hard would it be to implement archive HDDs connected to nodes, to help preserve the network with restore points if it were to suffer attacks in the future? That way the network could always recover, no matter what. Could the Maidsafe foundation, after launch, build or rent a few server locations around the world with large-capacity tape drives to help back up the network with restore points?
As Tim Berners-Lee has shifted focus to decentralization lately with his foundation, would he go beyond an Internet Archive to a Decentralized Internet Archive? As David has some connection with Tim, I'm hoping for a partnership. Since many different types of malicious actors could have incentives to cause harm in the future, will we see some kind of backup, archive, or restore points? Maybe it could be a feature that node operators could choose to implement for free in the early stages, if they want to, to help secure the network, like adding large HDDs to their nodes to function as archive/restore points.
That would also be a good reason for the Maidsafe foundation to have a significant amount of funds: to secure the network. @Bux @JimCollinson @rusty.spork @dirvine
Virtual tape drive systems have existed for decades, and they still suck. It's a never-ending fight with OS updates, software updates, firmware updates, and hardware updates.
You would have to geo-locate them in multiple locations, which adds the further bonus of network latency and of depending on third-party fibre providers to keep your networks working.
Having nodes able to offer their stored data back to the network when they restart would have an archive effect as well, so major blackouts, cable cuts and so on would cause data to be lost only temporarily, not permanently. I thought this was supposed to be the case already, but my experience of nodes restarting is that they just download the records all over again.
Archive nodes, though, have been talked about by many people, including some developers. In my opinion the best approach is to work on specifying how such nodes would work and how they would compensate the node operator for providing the additional resources, maybe simply by giving out quotes, since an archive node would be one of the closest nodes to many more records than a normal node.
That way someone with plenty of spare space on their NAS could allocate many TB to this task. And if enough people do, this provides a greater backup than just having the 5+ nodes storing each record.
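As a hedged illustration of that last point (the names, the 32-byte address type and the "closest 5" figure are assumptions drawn from this discussion, not the real protocol): a node holding far more records simply ends up among the k closest nodes to many more addresses, and so in a position to quote for many more of them.

```rust
/// Illustrative 256-bit xor address, for the sketch only.
type XorName = [u8; 32];

/// Xor distance between two addresses; comparing the resulting byte
/// arrays lexicographically orders peers by closeness to a record.
fn xor_distance(a: &XorName, b: &XorName) -> XorName {
    let mut d = [0u8; 32];
    for i in 0..32 {
        d[i] = a[i] ^ b[i];
    }
    d
}

/// Count for how many record addresses `me` is among the `k` closest of
/// the peers it knows about. An archive node close to many more records
/// would see this number grow, and with it the quotes it could give out.
fn quotable_record_count(me: &XorName, peers: &[XorName], records: &[XorName], k: usize) -> usize {
    records
        .iter()
        .filter(|record| {
            let my_dist = xor_distance(me, record);
            // How many known peers are strictly closer to this record than we are?
            let closer_peers = peers
                .iter()
                .filter(|p| xor_distance(p, record) < my_dist)
                .count();
            closer_peers < k // fewer than k closer peers means we are in the closest k
        })
        .count()
}
```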
In the early stages my thinking was that archiving could be voluntary: besides my nodes running on a system with a 6TB drive, I could also choose to add one or a few 12TB drives which would collect and archive data from the network. The large drives would crawl the network for data to archive. It adds complexity, though.
It is there until the space is needed. By "old data" I mean records that the node is no longer responsible for.
The algorithm, as of the latest we've been told (a rough sketch in code follows the list):

- Every hour (AFAIK) any record "too far away" in xor space from the node's xor address is deleted. This removes records picked up by the node before it had a full picture of its neighbourhood, when it thought the network was a lot smaller and that it was responsible for a lot more of it.
- The record_store keeps a 10% free record count (i.e. keeps up to 90% of the maximum record count) by removing records it is not responsible for. One assumes the records removed are the ones furthest away from the node's xor address.
- Records the node is not responsible for are kept for caching and for attack protection against the case where one node runner has all 5 closest nodes to some records. Keeping them also reduces network traffic during churn, when the node becomes responsible for some of those records again.
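Putting those two rules together, here is a minimal, hypothetical sketch in Rust. The types and thresholds (XorName as a 32-byte address, the "too far away" cut-off passed in as a distance range, the 90% figure) come from the description above, not from the actual node code:

```rust
use std::collections::HashMap;

/// Illustrative 256-bit xor address.
type XorName = [u8; 32];

/// Xor distance between two addresses; lexicographic comparison of the
/// result orders records by closeness to the node.
fn xor_distance(a: &XorName, b: &XorName) -> XorName {
    let mut d = [0u8; 32];
    for i in 0..32 {
        d[i] = a[i] ^ b[i];
    }
    d
}

/// Rule 1 (hourly): drop any record "too far away" in xor space,
/// i.e. outside the distance range the node is responsible for.
fn prune_distant_records(
    node: &XorName,
    responsibility_range: &XorName, // max distance the node is responsible for
    records: &mut HashMap<XorName, Vec<u8>>,
) {
    records.retain(|key, _| xor_distance(key, node) <= *responsibility_range);
}

/// Rule 2 (on capacity pressure): keep ~10% headroom by removing the
/// records furthest from the node's own address first.
fn prune_to_headroom(
    node: &XorName,
    records: &mut HashMap<XorName, Vec<u8>>,
    max_records: usize,
) {
    let target = max_records * 9 / 10; // keep up to 90% of capacity
    if records.len() <= target {
        return;
    }
    let mut keys: Vec<XorName> = records.keys().copied().collect();
    // Sort furthest-first so the most distant records are dropped first.
    keys.sort_by(|a, b| xor_distance(b, node).cmp(&xor_distance(a, node)));
    for key in keys {
        if records.len() <= target {
            break;
        }
        records.remove(&key);
    }
}
```

The exact definition of "too far away" is whatever the node uses internally; the point is that both rules boil down to ordering records by xor distance from the node's own address.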
That is far too slow a method. The records in the record_store are mainly the ones it was responsible for at one time, plus some of its close neighbourhood nodes' records.
Better to run plenty of nodes.
But remember the records on disk are encrypted by the node, so you need the node to decrypt them first. This is a secondary encryption applied by the node when storing the record. The record should already have been encrypted by the client before sending it, and this secondary encryption is to ensure the data is unreadable** while at rest.
**The way it's implemented at the moment is not as good as the initial plan was. The way to decrypt the records is to grab the secret key from the node's data directory and use it to seed the decryption. The secondary encryption was supposed to use an in-memory key pair so that no other app could decrypt the record_store at any time, but hey, why quibble about an ineffectual secondary encryption \s. This should be dealt with when the fix is made to ensure a new secret key every time a node is started.
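For illustration only (this is not how the node actually does it, and the choice of the aes-gcm crate is just an assumption for the sketch): the originally intended property, a key that exists only in the node's memory and is regenerated on every start, could look roughly like this.

```rust
use aes_gcm::{
    aead::{Aead, AeadCore, KeyInit, OsRng},
    Aes256Gcm, Nonce,
};

/// Illustrative at-rest cipher for the record_store.
/// The key is generated fresh on node start and lives only in memory.
struct RecordCipher {
    cipher: Aes256Gcm,
}

impl RecordCipher {
    fn new() -> Self {
        // New random key every start; never written to the data directory.
        let key = Aes256Gcm::generate_key(OsRng);
        Self { cipher: Aes256Gcm::new(&key) }
    }

    /// Encrypt an (already client-encrypted) record before writing it to disk.
    /// The random 96-bit nonce is prepended so the record can be decrypted later.
    fn seal(&self, record: &[u8]) -> Result<Vec<u8>, aes_gcm::Error> {
        let nonce = Aes256Gcm::generate_nonce(&mut OsRng);
        let mut out = nonce.as_slice().to_vec();
        out.extend(self.cipher.encrypt(&nonce, record)?);
        Ok(out)
    }

    /// Decrypt a record read back from disk.
    fn open(&self, stored: &[u8]) -> Result<Vec<u8>, aes_gcm::Error> {
        let (nonce, ciphertext) = stored.split_at(12);
        self.cipher.decrypt(Nonce::from_slice(nonce), ciphertext)
    }
}
```

With a scheme like this, nothing that only has access to the node's data directory can decrypt the record_store, which is the property the current grab-the-key-from-disk approach gives up.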
An update on this: it is scrubbed for a new version (i.e. reset) of the network (dunno about a plain release), and it was mentioned, and seen in the logs, that this was being done. As for a restart of a node, I have seen metrics show a lot of records held very quickly, without enough downloading to account for that many.
I’d love a concise description of what happens in each scenario.