Perhaps a good way to increase the SafeCoin recycling rate is to offer extra redundancy at a price. I’ve been thinking a lot about SafeCoin and the MaidSafe cloud storage service lately, and I’m getting more and more convinced that bandwidth is going to be a very significant factor, perhaps even more significant than storage space itself.
I’m not sure if the network already works this way, but since by default every slice of data is stored in 4 different locations on the network, when downloading a slice the client could connect to all 4 vaults at the same time and request a different part of the slice from each vault. The client would thus get a download speed equal to the combined upload speeds of the 4 vaults. Following from this, if data were stored in 8 different locations instead of 4, the client would theoretically get double the download speed.
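Something like this rough sketch is what I have in mind; `fetch_range()` and the vault handles are made-up placeholders, not the real client API:

```python
from concurrent.futures import ThreadPoolExecutor

SLICE_SIZE = 1024 * 1024  # 1 MB

def fetch_range(vault, slice_id, offset, length):
    """Hypothetical call: ask this vault for bytes [offset, offset+length) of the slice."""
    raise NotImplementedError

def parallel_fetch(slice_id, vaults):
    # Split the slice into one byte range per replica-holding vault.
    part = SLICE_SIZE // len(vaults)
    ranges = [(v, i * part, part if i < len(vaults) - 1 else SLICE_SIZE - i * part)
              for i, v in enumerate(vaults)]
    with ThreadPoolExecutor(max_workers=len(vaults)) as pool:
        futures = [pool.submit(fetch_range, v, slice_id, off, ln)
                   for v, off, ln in ranges]
        # Reassemble in order: total speed is roughly the sum of the vaults' upload speeds.
        return b"".join(f.result() for f in futures)
```

With 8 replicas instead of 4, each vault would serve half as many bytes, which is where the doubled download speed would come from.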
In addition to higher download speeds, the data would be even safer through the extra redundancy. Even if significant parts of the network were to fail suddenly for whatever reason, the odds that at least one copy of every slice survives are significantly higher with 8 copies than with only 4.
For the average user and smaller files the default of 4 copies with a relatively low download speed is enough, but I can imagine rich/corporate users would be willing to pay for more redundancy and better download speeds.
Perhaps this source of SafeCoin recycling could be enough, allowing the network to give virtually unlimited storage to everyone, but charging extra for better speeds and extra redundancy. More recycling → bigger farming rewards → more farmers/vaults → more storage space.
Whoa… hang on a minute, is this not contrary to net neutrality ideas… exactly what we don’t want? Not a techie, and thanks for simplifying how this other technical aspect works, I get it now. No problem with doubling the doodah as long as it’s available to all users. Ahhh… I think it just clicked that it would technically have to be universal… in which case, if it’s technically easy/quick to do now or to implement later, it would be a brilliant idea… just ignore me.
Not related to your general point about offering premium performance for a price, but I don’t think it works like that. I think you’re assuming a whole file is stored in a Vault and replicated to three other Vaults, each containing the whole file. Whereas it is each 1MB chunk of a file that is stored on a Vault and replicated to three other Vaults. So you can’t request one chunk from Vault 1, another from Vault 2 and so on; or rather, that is what always happens already.
So SAFE already delivers the “get each chunk from a different Vault” performance feature, and there is no benefit from upping redundancy beyond 4 Vaults.
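To illustrate the distinction (the names here are illustrative, not the actual API): a file is split into 1MB chunks, and each chunk lives on its own group of ~4 vaults, so a multi-chunk download already comes from many vaults at once:

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB

def split_into_chunks(data: bytes):
    # A file becomes a list of 1MB chunks before it ever reaches the network.
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def download_file(chunk_ids, lookup_vaults, get_chunk):
    # lookup_vaults(chunk_id) -> the ~4 vaults holding that chunk (a different
    # group per chunk), so each chunk of the file is already served by a
    # different vault; any one holder is enough to retrieve it.
    return b"".join(get_chunk(lookup_vaults(cid)[0], cid) for cid in chunk_ids)
```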
I was calling the 1MB parts slices and a chunk a part of that slice, perhaps I should’ve used the terms differently.
There’s no technical reason why a client wouldn’t be able to request only a part of that 1MB chunk, though it may not currently be implemented. If so, I think it should be; it would give a huge boost to download speeds, and the idea in my OP would become possible.
My understanding (I think David Irvine said this recently) is that yes, the API can request a small piece of a 1MB chunk, but it is always the whole chunk that is returned (and presumably vice versa). I can see how that might seem inefficient, but efficiencies in systems like this are often counter-intuitive, so it may not be.
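If that’s right, the behaviour would look something like this (purely a sketch, the function names are hypothetical): the caller asks for a small piece, the whole 1MB chunk comes back, and the slicing happens client-side:

```python
def read_piece(get_chunk, chunk_id, offset, length):
    # The request can name a small piece, but the vault still sends back the
    # whole 1MB chunk; the requested bytes are only cut out locally afterwards.
    chunk = get_chunk(chunk_id)
    return chunk[offset:offset + length]
```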
I don’t particularly care about extra download speed, but what if I do want greater redundancy? Is there a way to designate a particular file for double or triple redundancy?
I thought there are at least four copies of each chunk.
Let’s say Vaults A to D contain the same replicated chunk. When Vault B goes offline, a new vault, Vault E, will replicate and start serving the chunk. So there are always at least four replicated copies of the chunk on the Network. As I understand it, when Vault B comes back online there will be five copies of the chunk.
Over time, it is likely that there will be more than four replicated copies of each chunk, but the network will always try to keep four copies available at all times.
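In other words, something along these lines (an illustrative sketch, not the actual vault code):

```python
MIN_COPIES = 4

def on_vault_offline(chunk_id, holders, pick_new_vault, replicate):
    # Drop holders that have gone offline, then top the replica count back up
    # to MIN_COPIES by copying the chunk to freshly chosen vaults
    # (e.g. Vault E taking over from Vault B above).
    holders = [v for v in holders if v.is_online()]
    while len(holders) < MIN_COPIES:
        new_vault = pick_new_vault(exclude=holders)
        replicate(chunk_id, new_vault)
        holders.append(new_vault)
    # If Vault B later comes back online, the network briefly has 5 copies.
    return holders
```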
The network will calculate the minimum number needed to never lose any data! This is important: we are guessing 4, but it will be dynamic later. There should be no data loss of any kind whatsoever, but more importantly a human should not try to calculate this.
[edit] 5->4
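A toy sketch of what “the network calculates the minimum number” could look like; the probabilities are invented inputs, not measured SAFE figures:

```python
def min_replicas(p_node_loss_per_window, target_loss_prob, k_max=20):
    # Smallest k such that the chance of *all* k copies vanishing inside one
    # repair window (p^k, assuming independent failures) stays below the target.
    k = 1
    while p_node_loss_per_window ** k > target_loss_prob and k < k_max:
        k += 1
    return k

# e.g. if 1% of replica holders drop per repair window and we want < 1e-9 odds
# of losing a chunk: min_replicas(0.01, 1e-9) == 5, while min_replicas(0.001, 1e-9) == 3
```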
I can understand why the network would not allow someone to go below a minimum set by the network, but why would it be bad to allow someone to go above it, assuming they are charged more?
Ah indeed, but if it’s a waste of my resources and the network is properly compensated, why should the network care about that?
Unless this is an image decision, a guarantee that everyone’s data is equally safe? Or equally valued?
Except that I don’t value all of my data equally. There are certain files that I like but could recreate from other sources, and certain files that I value far more.
This is a common problem in data redundancy on servers. Since a chunk is 1 MB, it is easy to replicate on demand. And since the nodes storing the data are so far apart from each other, an event that causes an outage on one node is unlikely to be correlated with outages on the others.
Plus, a data chunk that needs a strong node to serve it goes to the strong nodes, while a chunk that is not accessed so frequently goes to a node with probably less bandwidth, so that node is relied upon less often. It is also important to note that data chunks get stored first on the strongest nodes, and only last on weak nodes, if at all.
That being said, a data redundancy of four is sufficient on this dynamic network. One must remove the server idea from the equation: there is not just one node available to store the data chunk, there is a whole planet of nodes at its disposal to replicate it immediately. The fact that small data pieces replicate quickly is what allows this to take place.
Storing more copies of each chunk would simply use more space. That extra fault tolerance would make sense in huge server setups, but even there they can hardly afford to replicate beyond three times. Four is a gift, not to mention that it is done in a distributed manner, which further adds to the reliability of the data storage.
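As a purely illustrative sketch of that placement idea (the “strength” score is made up, this is not the real routing logic):

```python
def rank_by_strength(nodes):
    # "Strength" here is an invented score combining bandwidth and uptime.
    return sorted(nodes, key=lambda n: (n.bandwidth, n.uptime), reverse=True)

def place_chunk(chunk_id, nodes, copies=4):
    # Chunks are stored on the strongest candidates first, and reach the
    # weakest nodes only if nothing stronger is available.
    ranked = rank_by_strength(nodes)
    return [(chunk_id, node) for node in ranked[:copies]]
```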
Hmmm, I guess at an emotional level, I just don’t get how reliable a 4X copy really is. I’m game to be convinced.
Though at a conceptual level, if I try to explain why 4X is enough copies, I don’t have an answer other than that people smarter than me (such as @dirvine) tell me so.
Not smarter ;-), by far. It is a guess to be tested, based on older kademlia networks like gnutella/emule where 8/20 replicas were enough, but there all connections were very light, i.e. not checked for many hours/weeks/months between churn events. As we are milliseconds between churn events, the chance of 4 nodes going down in the average churn event seems unrealistic. This is good, but potentially too good; we may not need 4 copies (kademlia republish is 24 hours, refresh == 60 mins). 4 copies may be way too much IMHO.
The bottom line for us is that we lose no data; beyond that it is just more caching really, and not necessary.
I think it’s part of the deduplication process: if the file content is exactly the same, then only one file will exist in the mist… the name doesn’t alter that.
However, if you slightly changed the file itself, it would pass as unique.
There are discussions on here that debate this topic from the perspective of video files, different compression schemes, throwing in an extra frame, etc. It’s going to be interesting to experiment with this; it should provide some good threads on here.
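A hedged sketch of the deduplication intuition (SHA-256 stands in for the real self-encryption scheme, which this does not reproduce): chunks addressed by a hash of their content mean byte-identical data is only stored once, while a one-byte change produces entirely new addresses:

```python
import hashlib

def chunk_address(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

store = {}

def put_chunk(chunk: bytes) -> str:
    addr = chunk_address(chunk)
    store.setdefault(addr, chunk)   # identical content lands on the same key: no duplicate copy
    return addr

assert put_chunk(b"same bytes") == put_chunk(b"same bytes")   # deduplicated
assert put_chunk(b"same bytes") != put_chunk(b"same bytes!")  # tiny change => new address
```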
It would be an interesting experiment to use, say, ZeroVM or Docker to fire up as many vaults as the hardware permits (rough sketch at the end of this post).
I could also see this being done commercially in data centres with something like OpenStack, to offer massive redundancy and security for internet 1 data.
i.e. using SAFE open source on servers via containers/VMs, effectively competing against the global SAFE… unless the patents guard against this outcome?
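A minimal sketch of that Docker experiment, assuming a containerised vault build exists; `maidsafe/vault` is a placeholder image name and the volume layout is a guess:

```python
import subprocess

NUM_VAULTS = 8  # as many as the hardware comfortably permits

for i in range(NUM_VAULTS):
    name = f"safe-vault-{i}"
    subprocess.run([
        "docker", "run", "-d",
        "--name", name,
        "-v", f"{name}-data:/vault/data",  # separate named volume per vault (guessed layout)
        "maidsafe/vault",                  # placeholder image name
    ], check=True)
    print(f"started {name}")
```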
There’s a project called TripleO (OpenStack On OpenStack) whose mission is to build a seeding machine that could spawn and commission a whole data centre in an hour or less, automatically.
I was thinking out loud about the local SAFE situation in a business context here: