MaidSafe Pre-Dev-Update Update :safe: 25th April 2016

Sorry I can’t. I specialize in making screenshots from GH ;-).

7 Likes

Reduced complexity = greater efficiency and therefore increased security. Unless we lose a feature; in this case we go from 6 copies of data to possibly 8, but that limitation is likely short-lived as we introduce N+P sharing (perhaps)

11 Likes

I agree with @dirvine about the reduced complexity, but my kudos go to the professionalism behind those kinds of decisions. People don’t like to destroy what they spent time building, so when somebody is willing to say “hey, I found a better way, so let’s scrap all that stuff”, that means they care more about quality than their feelings :smirk_cat: Another example of this was the rewrite in Rust, after they were sure it was going to be better in the long run.

EDIT: Another thing is that we don’t like to change how we think about something we already decided (“this is how we do things here”) so when a team goes “well, we’ll think about this another way from now on” that shows unusual flexibility.

9 Likes

What is N+P sharing?

1 Like

Forward error correction, like Reed-Solomon, Shamir’s secret sharing, or Rabin’s IDA protocol (all slightly different); see things like fec++ etc.

So split a file into N parts and any P (number of parts) is enough to rebuild the entire file. Similar to a multisig type thing in some ways.
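To make the “any P of N” idea concrete, here is a toy sketch (not MaidSafe code) of the simplest possible such scheme, a single XOR parity part: split the data into P parts, add one parity part, and any P of the P+1 parts can rebuild the original. Reed-Solomon and Rabin’s IDA generalise this to tolerate more than one lost part.

```rust
// Toy sketch only: single XOR parity. With P data parts plus 1 parity part,
// losing any one part still lets us rebuild the original data.

fn make_parts(data: &[u8], p: usize) -> Vec<Vec<u8>> {
    // Split data into p equal parts (assumes data.len() is divisible by p).
    let part_len = data.len() / p;
    let mut parts: Vec<Vec<u8>> =
        data.chunks(part_len).map(|c| c.to_vec()).collect();

    // Parity part: byte-wise XOR of all data parts.
    let mut parity = vec![0u8; part_len];
    for part in &parts {
        for (x, b) in parity.iter_mut().zip(part) {
            *x ^= b;
        }
    }
    parts.push(parity);
    parts // p + 1 parts, any p of which suffice
}

fn rebuild(parts: &[Option<Vec<u8>>], p: usize) -> Vec<u8> {
    // Recover at most one missing data part by XOR-ing all surviving parts.
    let part_len = parts.iter().flatten().next().unwrap().len();
    let mut data = Vec::new();
    for i in 0..p {
        match &parts[i] {
            Some(part) => data.extend_from_slice(part),
            None => {
                let mut missing = vec![0u8; part_len];
                for part in parts.iter().flatten() {
                    for (x, b) in missing.iter_mut().zip(part) {
                        *x ^= b;
                    }
                }
                data.extend_from_slice(&missing);
            }
        }
    }
    data
}

fn main() {
    let chunk = b"eight by".to_vec(); // 8 bytes -> 4 parts of 2 bytes each
    let mut parts: Vec<Option<Vec<u8>>> =
        make_parts(&chunk, 4).into_iter().map(Some).collect();
    parts[1] = None; // lose any one of the 5 parts
    assert_eq!(rebuild(&parts, 4), chunk);
}
```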

6 Likes

YAY!! MY OLD FAVORITE! :smiley_cat: That would introduce ridiculously high levels of redundancy; 4-5 years ago we exchanged one or two emails about this actually :joy_cat:

Can it be a problem that with this we would need to know the other pieces that make up the file to restore a lost piece? In the current (plain copy) system, fewer losses are acceptable, but they can all be restored trivially. In a P-of-N setup, the losses could add up over time, because you need somebody with enough information (i.e. the datamap or the keys or idk) to put it all back together and then re-generate the missing pieces.

EDIT: How about doing the P-of-N on the chunk level? Nothing needs to be secret about the list of pieces that make up a chunk, and if the FEC part comes after the encryption, we’re back on “trivially restorable” land. Or maybe is that what you meant all along?

2 Likes

Yes :smiley: so this is a mix, we use replication for security and N of P for consensus and efficiency of xfer. So each node can hold the whole chunk, but only transfer the part they are responsible for.

Yes, for this and caching we will identify chunk pieces by an index. So we can ask anyone for that index, if that makes sense. Need to watch injection attacks (a bad piece) though (we have a neat solution there: digitally signed parts)

Not in place yet, but we will be testing this out as a parallel task, it has many advantages and can also provide extra security in fact.
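For the “digitally signed parts” idea, a rough sketch of how a receiver could reject an injected bad piece might look like the following. This is only an illustration, not the vault implementation: a std-library hash over (chunk name, part index, data) stands in here for a real signature, which in practice would be a proper scheme such as ed25519 tied to the sender’s node identity.

```rust
// Sketch only: detect an injected (tampered) piece before accepting it.
// A real signature scheme replaces the hash "tag" used here.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct SignedPart {
    chunk_name: [u8; 32], // name/hash of the whole (encrypted) chunk
    index: u8,            // which piece of the chunk this is
    data: Vec<u8>,        // the piece itself
    tag: u64,             // stand-in for a digital signature
}

fn tag_for(chunk_name: &[u8; 32], index: u8, data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    chunk_name.hash(&mut h);
    index.hash(&mut h);
    data.hash(&mut h);
    h.finish()
}

fn verify(part: &SignedPart) -> bool {
    // If a peer tampered with the data, the tag no longer matches; we can
    // then re-request this index from another node and report the sender.
    part.tag == tag_for(&part.chunk_name, part.index, &part.data)
}

fn main() {
    let name = [7u8; 32];
    let mut part = SignedPart {
        chunk_name: name,
        index: 3,
        data: b"piece bytes".to_vec(),
        tag: tag_for(&name, 3, b"piece bytes"),
    };
    assert!(verify(&part));
    part.data[0] ^= 0xff; // injected corruption
    assert!(!verify(&part));
}
```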

1 Like

Nice! I understand. Btw I added a question to my previous post within seconds of you posting this, but now I see it would indeed be done on the chunk and not the file level.

Yes, I realized my concern was only relevant if the FEC is done on the documents, where the other chunks are unknown; doing numbered parts for the chunks seems completely fine. I got confused because you used the word “file” in your original comment :smile_cat:

2 Likes

@dirvine:

So each node can hold the whole chunk, but only transfer the part they are responsible for.

Need to watch injection attacks (a bad piece) though (we have a neat solution there: digitally signed parts)

Hmm, I’m lost in understanding this:

I picture an infinite recursion being needed to keep out such injection attacks, with chunks divided into littler, signed chunks to keep out an attack, and then again on the smaller scale, and so on. What is the error in my thinking?

1 Like

Not sure what you mean. So we get 4 chunks and one is bad but signed. We ask another node for that piece (index) and report the initial sender to the group. I am not sure where you are seeing recursion, but I may be missing something here.

1 Like

Ugh sry to post so much. I just realized I do NOT understand :joy_cat:

My understanding about FEC is that one can use a LOT less space to store the same amount of data while maintaining the same level of assurance that it’s not going to get lost (if the pieces are stored truly independently.) So, I had always thought that the primary benefits would be more efficient utilisation of storage space (i.e. instead of 1 MB => 4 MB something like 1 MB => 1.25 MB or less.) However, if all storage nodes store all the pieces of the chunk, then we’re actually a bit worse off, because now we’re on the 1 MB => 5 MB plan.
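As a back-of-the-envelope check of those numbers (the parameters are illustrative only, not the network’s actual settings):

```rust
// Rough storage-overhead comparison for the figures discussed above.

fn replication_overhead(copies: u32) -> f64 {
    copies as f64 // 1 MB stored as `copies` full copies
}

fn fec_overhead(n_total: u32, k_needed: u32) -> f64 {
    // Each piece is 1/k of the chunk; n pieces are stored in total.
    n_total as f64 / k_needed as f64
}

fn main() {
    println!("4 full copies:      {} MB per 1 MB", replication_overhead(4)); // 4 MB
    println!("10-of-8 FEC pieces: {} MB per 1 MB", fec_overhead(10, 8));     // 1.25 MB
    // If every node in a group of 4 also kept all 10 pieces, we'd be at
    // 4 * 1.25 = 5 MB per 1 MB, i.e. the "1 MB => 5 MB plan" above.
}
```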

1 Like

Is this a bit like .par and .rar files on Usenet? Using the .par files to fix the missing/damaged .rar files?

1 Like

Similar concept, useful in many environments such as xfer of data across networks / RAID drives etc.

2 Likes

Yes. Let’s say you have an 8 MB chunk. You can say “let’s generate 10 x 1 MB pieces in a way that any 8 of them would return my original 8 MB data.” Now you can lose any 2 of those 10 pieces and still have everything.

Imagine for example a quadratic function: if you give me any 3 points that are on it, I can pick any 2 of them to compute the parameters for the curve. EDIT: make that 4 and 3 :joy_cat:
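The curve analogy can be made concrete with Lagrange interpolation: sample a quadratic at 4 points, throw any one of them away, and the remaining 3 still reproduce the whole curve. A small toy sketch (not network code):

```rust
// A degree-2 curve is fixed by any 3 points on it, so publishing 4 points
// lets you lose one and still recover the curve everywhere.

fn lagrange_eval(points: &[(f64, f64)], x: f64) -> f64 {
    let mut y = 0.0;
    for (i, &(xi, yi)) in points.iter().enumerate() {
        let mut basis = 1.0;
        for (j, &(xj, _)) in points.iter().enumerate() {
            if i != j {
                basis *= (x - xj) / (xi - xj);
            }
        }
        y += yi * basis;
    }
    y
}

fn main() {
    // f(x) = 2x^2 - 3x + 1, sampled at four x values ("4 pieces").
    let f = |x: f64| 2.0 * x * x - 3.0 * x + 1.0;
    let pieces: Vec<(f64, f64)> =
        [1.0, 2.0, 3.0, 4.0].iter().map(|&x| (x, f(x))).collect();

    // Lose any one piece; the remaining three still reproduce f everywhere.
    let surviving = &pieces[1..]; // dropped the first point
    for x in [0.0, 5.0, 7.5] {
        assert!((lagrange_eval(surviving, x) - f(x)).abs() < 1e-9);
    }
}
```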

2 Likes

That’s cool. Now, I’m not a coder or anything, but if you are on TCP with your chunks, shouldn’t you always get the complete chunk after some time, without a single broken bit? So is this idea for UDP then? Otherwise, why not make the network store some more chunks instead of doing it this way? I remember the devs talked about reliable UDP about a year ago. So is this to fix the UDP corrupt-data problem?

1 Like

No, I’m more likely to be missing the point. I just need to think it through some more; I didn’t intend to throw a spanner in the works.

TCP uses retransmission for error correcting: if you miss something, you can get it again.

Forward error correction, on the other hand, does it the other way around: it sends (or stores) more data expecting some of it will get lost. This is what your mobile does, for example (on numerous levels, actually: how the audio encoding stream spreads out the bits within a packet, then on the level of the packets themselves, and probably on a couple more.)

Now, this doesn’t have to be about transfer, it can be about storage as well, which includes your .par example and, as @dirvine mentioned, this is what RAID (well, some of the RAID levels) is about.

The awesome thing about FEC is that with it you can gain high levels of redundancy for less storage or bandwidth than if you simply duplicated or retransmitted your data. The SAFE network stores multiple identical copies of the chunks, but it could reach similar redundancy by storing slightly more (but much smaller) FEC packets for a chunk. But I think @dirvine has other benefits in sight, maybe in addition to this.

2 Likes

So are we talking on a file basis or chunk basis?

If you work on a file basis and each part is a full 1 MB chunk, then the processing can take minutes for a 4 GB file on many computers and be practically impossible on ARM.

If on the other hand you are splitting up each chunk into parts then this keeps processing (time) requirements quite low.

Yes, this is all encrypted chunks we are talking about here. It’s not decided yet though; there are a few parts to work through, and we will do that publicly as usual :wink:

A neat trick is to use this for sending data to the client (instead of X copies, X parts). The main issues here are:

  1. Caching: do we cache parts? (a few issues there)
  2. Bad chunks: identifying bad parts is not so simple
3 Likes

Just to be clear, you are talking of taking a chunk and splitting it up into parts???

Aux Question: Say you use 8 parts, resulting in, say, 12 after redundancy. You then store a mixture of the parts in each vault and only retrieve the parts needed to make up the original chunk. And this is done over a number of vaults to reduce the time for a whole chunk to reach the destination?

1 Like