What about a catastrophic event that wipes out millions of nodes?

We have to be careful to benchmark against current alternatives. This sort of event would be catastrophic with or without SAFENetwork; Amazon, Google, etc. also have to keep their nodes on this planet.

Arguably, SAFENetwork would fare better, as all data will be widely dispersed. Many current IT systems are far more centralised and therefore more at risk.

In fact, Maidsafe have mentioned licensing commercial versions of the technology to companies who want to increase data resilience. Running internal vaults for data storage can help them scale with resilience.

Edit: Just a thought; nested networks would actually allow a different mix of resilience and performance. A fast private MaidSafe network that stores data to the public SAFENetwork could be a powerful combination.

5 Likes

The private/public network cloud model is touted by the big service providers, but I don’t think HTTPS would be the right protocol for that with regard to SAFEnet. Local secured storage networks typically rely on SSH with RSA keys for security. Projects like Ceph, Gluster, BeeGFS, etc. are active in this space and achieve similar goals to SAFE with regard to data redundancy and performance within the local network, but they also have means for mirroring to remote locations and keeping all the networks in sync. This is mostly used for HPC.

EDIT:

I need to add to the above comment. The distributed filesystems I mentioned can be a pain to set up (although BeeGFS is relatively user friendly) and are sometimes challenging to maintain. The current and planned autonomous features of SAFE are superior in every way (my opinion). My statement about HTTPS was mostly due to SSH and command-line bias, thoughts about latency, and bad assumptions about how SAFE works under the hood with regard to TCP/UDP and XOR addresses. I’m still not clear on a lot of aspects and am trying to learn more. I tend to agree with the notion that the fastest way to learn something online is to make a statement on a forum and then wait to be told why it’s wrong… selfish, I know. :blush:

Can you please link to this discussion? I was under the distinct impression that the number of chunk copies is four at the moment and that it will be 8 later on. It went from 6 to 8.

So if the plan is now 4 on the live system then I would like to know. It seems a very backward step and insufficient for a secure system.

4 Likes

This is a horrible idea. Nothing personal, just my $0.02…

For a concept like SAFEnet, you want network protocols and network filesystems and all the tools to be completely open source in order to spur innovation, collaboration, and user buy-in. If people want to pay for more data resilience or faster performance, then those features should be built into the protocol, and people can pay more SafeCoin for them as they see fit. Given a “net-neutrality” type argument, the network would need to balance the priority of each packet. As an analogy, consider everyday life when drivers pull over to the side of the road to let an ambulance speed by… “road neutrality” is life threatening. I may want 32 encrypted copies of my birth certificate, but a blurry picture of my cat doesn’t need more than 2, and since I’m the only one who can see this data unencrypted, I’m the only one who can tell the network which packet is worth more (to me). If a solar flare wipes out half the network and I lose my blurry cat photos, I don’t really care.

As far as MAIDsafe as a company is concerned, since they are the most experienced with the code, they are the ones best positioned to offer “premier services” to people who need troubleshooting or other support, or want to pay for specific features to be developed faster.
This is usually because most users are too busy, lazy, or impatient to take the time away from their own projects to modify the code to fit their needs. The use of the GPL makes sure all improvements are available to everyone. I’m not exactly sure how Red Hat manages their business model… but there are a variety of ways to monetize and make good money from a completely open source ecosystem. Fame first, then fortune, as you might say. The lack of any proprietary or commercial black box or binary blob maintains trust in the network and eliminates the possibility for “bad” code to be introduced.

2 Likes

I think I read somewhere about “archive” nodes that record data to tape or optical disk?
At least the optical disk would survive the type of electrical disturbance you are talking about, but the drives might not. I agree that planning for a big event like that is important, but it requires more than just software solutions. You need underground facilities with electrical isolation and Faraday cages. Doable, but only the truly paranoid will pay for this, and it is not cheap. The protocol would need to have some way to take a site’s hardware into account and be able to determine whether the extra safeguards (which people will be paying more for) are working correctly…

1 Like

Still to be developed but David has mentioned this quite a few times so I am sure it will be a high priority once the network code is more complete.

The XOR addressing that SAFE uses distributes the data randomly throughout the network and, since the network is global, geographically.
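
To make that concrete, here is a minimal sketch of the general idea (a Kademlia-style XOR distance, not MaidSafe’s actual routing code; the hash and ID sizes are just illustrative). The point is that a chunk’s storage location depends only on the hash of its content, not on where any node physically sits:

```python
import hashlib
import os

def xor_distance(a: bytes, b: bytes) -> int:
    """XOR distance between two 256-bit addresses, compared as big-endian integers."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_nodes(chunk: bytes, node_ids: list[bytes], copies: int) -> list[bytes]:
    """Pick the `copies` node IDs nearest to the chunk's address in XOR space."""
    chunk_address = hashlib.sha3_256(chunk).digest()  # content-addressed: the hash is the location
    return sorted(node_ids, key=lambda nid: xor_distance(nid, chunk_address))[:copies]

# Toy example: 10 000 random node IDs, one chunk, 8 copies.
nodes = [os.urandom(32) for _ in range(10_000)]
holders = closest_nodes(b"one chunk of some file", nodes, copies=8)
```

Since neighbouring IDs in XOR space have nothing to do with physical location, the copies of a chunk end up scattered wherever those randomly-addressed nodes happen to be.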

1 Like

Random means there is still a small chance that all copies end up in one place.

1 Like

Yes, but a vanishingly small one once there are hundreds of thousands or millions of nodes.

It’s not about the number of nodes, it’s about the geographical locations.

1 Like

Sure, but let’s say users are spread across North and South America, Europe, India, China, Australia - that’s pretty representative of this forum, and there will be other countries too - then those nodes will be geographically widely spread, and the more users there are the lower the chance that data is replicated across the same nodes.
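
A simple back-of-envelope model illustrates this (my numbers, not anything official): if a single region holds a fraction f of all nodes and copy placement is effectively uniform-random, as XOR addressing suggests, the chance that every copy of a given chunk lands in that region is roughly f^k.

```python
# Back-of-envelope only: assumes copies land on nodes chosen
# uniformly at random, independent of geography.
def all_copies_in_one_region(f: float, k: int) -> float:
    """Chance that all k copies of a chunk land in a region holding fraction f of nodes."""
    return f ** k

print(all_copies_in_one_region(0.10, 4))  # 10% of nodes, 4 copies -> 1e-4
print(all_copies_in_one_region(0.10, 8))  # 10% of nodes, 8 copies -> 1e-8
```

That is per chunk, of course; across billions of chunks some will always be unluckily placed, which is the concentration concern raised below.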

1 Like

Seeing as humans tend to concentrate in very small areas called cities, we’re always going to have unhealthy concentration geographically.

Tokyo,
New York,
Sao Paulo,
Seoul,
Mexico City,
Osaka/Kobe/Kyoto,
Manila,
Mumbai
etc…

and, on top of that, Southeast Asia is overrepresented.

However, it is said that the largest population growth in the coming 50 years will not be in that part of the world but in Africa, going from 1 billion to 3 billion (or so…). And Africa is quite large. So there will be at least some balancing, continent-wise.
(Not that such a (human) population increase is very positive in general, though.)

1 Like

It is the contrary: the replication factor is going from 8 to 4.

Here are some links about current value:

and some links about its future decrease:

I completely agree with you, when you say that 4 copies are insufficient for a secure system.

My other issue is that the replication factor should be an independent parameter. If tests or simulations show that it should be increased, this can be done without impacting anything else (like the number of elders in a section).

1 Like

In case this is of interest to anyone, I made a spreadsheet to calculate the file size at which file loss would become more likely than not:

Anyone can edit it, so please don’t mess around with it too much, otherwise other people won’t be able to use it…

It only works for files consisting of quite a few chunks, and assumes a completely synchronised node outage disaster.

It shows that 6.93 GB is the size at which a file stands a 0.5 chance of survival. This file size is increased by a factor of roughly 10 for every additional chunk copy, but chunk loss is still inevitable for the biggest files. The ‘prognosis per chunk’ however is always good.
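
For anyone who doesn’t want to open the spreadsheet, the arithmetic is just repeated multiplication (this sketch assumes ~1 MB chunks, which is how 6.93 GB becomes 6 930 chunks, and that each chunk survives the outage independently):

```python
# Whole-file survival when every chunk must survive independently.
def file_survival(n_chunks: int, p_chunk: float = 0.9999) -> float:
    """Chance that a file of n_chunks survives if each chunk survives with probability p_chunk."""
    return p_chunk ** n_chunks

print(file_survival(6_930))   # ~0.50  -> the 6.93 GB break-even point
print(file_survival(69_300))  # ~0.001 -> loss is near-certain for very large files
```

The “factor of roughly 10 per extra copy” follows if you read the 1-in-10 000 chunk loss as a disaster wiping out about 10% of nodes with 4 independently placed copies per chunk (0.1^4 = 1 in 10 000); each additional copy then divides the per-chunk loss by 10. That reading is my assumption, not something stated in the spreadsheet.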

It seems unavoidable that upload services will pre-chunk files regardless of the number of chunk copies.

4 Likes

There is a lot of confusion here and a bit of speculation being taken as hard fact. So here are a few points.

  1. We calculate and test everything. So if 4 is enough then that will be the case; right now it’s over 8 actually (it is whole sections, so there can be 20 or more copies).
  2. It may decrease to 4, and I think it will, but, and this is very important, only if we can prove that it is secure. (tl;dr I will always guess forward, but the team forces testing, and certainly where possible, on these guesses.) That’s innovation: what seems mental, like going to Mars, may in fact not be mental given enough thought, and 4 copies may be less mental than terraforming another planet :wink:

Then the actual issue of catastrophic events. Here is my take on this without too much thinking about the edge cases.

  • The network should contain the knowledge of all humanity and with luck it will.
  • The addition of knowledge is extremely important (theories → laws etc.)
  • The manipulation of data is very important (mutation of knowledge, transactions etc.)

So we have immutable and mutable data, as we all know. Immutable data can be stored anywhere and secured; here archive nodes can help. So what would these archive nodes hold, and how much would they hold? We go back to my guesses: I claim that storage is becoming cheaper and more capable very quickly, and I expect we will be able to store the world’s data on a single device sometime not far away. Until then I expect significant increases in small devices, particularly IoT types.

Then mutable data; now that is more difficult, but we have data chains, where we can show a version was secured on the network at some stage, although we cannot be sure it is the latest version. For that we need more info, if it becomes available. So silos can hold this as well; it may or may not be the latest data.

However, as the network restarts, even with new hardware (if it was a huge EMP-type catastrophe), these silos can connect, compare their data and data chains, and find the latest known version of any data. Even if only a single peer comes online, it can show a later version (this is powerful) that the restarting network can accept.
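
A rough sketch of that merge step might look like the following (the names and the verify hook are hypothetical, not the real data-chains API): each surviving silo offers the versions it knows about together with their proofs, entries that don’t fit the chain are rejected, and for every data item the highest verifiable version wins, even if only one peer holds it.

```python
from typing import Callable, Dict, List, Tuple

Entry = Tuple[int, bytes]  # (version, proof that this version was secured in a data chain)

def merge_silos(silos: List[Dict[str, Entry]],
                verify: Callable[[str, int, bytes], bool]) -> Dict[str, Entry]:
    """Combine silo snapshots after a restart, keeping the latest provable version of each item."""
    latest: Dict[str, Entry] = {}
    for silo in silos:
        for data_id, (version, proof) in silo.items():
            if not verify(data_id, version, proof):
                continue  # reject data that does not fit the chain
            if data_id not in latest or version > latest[data_id][0]:
                latest[data_id] = (version, proof)  # a single peer can supply a later version
    return latest
```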

Anyway, this is a snippet of how such autonomous networks can restart after a catastrophe. It is not about losing a chunk; it’s about keeping them all so that they can be re-inserted on the network. So that is the data chain plus some data that fits the chain. It goes a lot deeper, but I think it is pretty clear we can be confident this is a solved problem here, and post launch it is not an impossible thing to solve; it just needs some thought as to exactly what the fundamentals of the data types and the proofs of valid data are. Where the data was stored or held is then a simpler issue.

tl;dr archive nodes will not be difficult, may never be needed but may be the norm depending on advances in storage tech.

12 Likes

Interesting - can you explain your workings?

Everyone is free to use the public net. If companies want a private net, that is a different proposition.

I was actually just highlighting that SAFENetwork is a good way of adding data resilience and performance relative to their current private systems. Take it as you wish though.

2 Likes

It’s the reverse equation for finding the prognosis (chance of survival) for a file, given its size.

The green cells are the input ones (edit these values)
The teal cells are the output ones (don’t edit these)

6.93 GB = 6 930 chunks
1 in every 10 000 chunks is destroyed
each chunk has a 0.9999 chance of survival
the chance of all 6 930 chunks surviving is 0.5
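
In code, that reverse direction is just solving p^n = 0.5 for n:

```python
import math

def break_even_chunks(p_chunk: float = 0.9999, target: float = 0.5) -> float:
    """Chunk count at which whole-file survival falls to `target`."""
    return math.log(target) / math.log(p_chunk)

print(break_even_chunks())  # ~6 931 chunks, i.e. ~6.93 GB at ~1 MB per chunk
```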

2 Likes

All good to know, and I trust that the network is being tested properly and thoroughly. I just think that pre-chunking will be included in upload clients if it reduces the tiny risk even a tiny amount.

I don’t get why pre-chunking is any better? You will still need all (pre-chunked & normal-chunked) chunks to rebuild a file.
Or are you referring to some kind of additional parity by “pre-chunking”?
The parity approach is actually pretty nice, as it can be implemented completely on the client side without adding a potentially buggy feature to SAFEnet (as opposed to a configurable number of chunk copies). Cons: it would require a client “fsck” to repair the missing chunks.
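
For illustration only (this is not an existing SAFE client feature, just a sketch of the parity idea): the simplest client-side scheme is one XOR parity block per group of chunks, which lets a “fsck”-style repair rebuild any single missing chunk in that group. A real implementation would more likely use Reed-Solomon erasure codes so that several losses per group can be tolerated.

```python
from functools import reduce
from typing import List, Optional

def parity_block(chunks: List[bytes]) -> bytes:
    """XOR all chunks in a group into one parity block (shorter chunks are zero-padded)."""
    size = max(len(c) for c in chunks)
    padded = [c.ljust(size, b"\x00") for c in chunks]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), padded)

def repair(group: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Rebuild exactly one missing chunk from the rest of the group plus the parity block."""
    missing = [i for i, c in enumerate(group) if c is None]
    assert len(missing) == 1, "plain XOR parity can only repair one missing chunk per group"
    present = [c for c in group if c is not None]
    group[missing[0]] = parity_block(present + [parity])
    return group  # note: a real client would also store original chunk lengths to strip padding
```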