SAFE Network - TEST 6

Sorry for this long post.

This is certainly an annoying issue. It is currently caused by the way the self_encryption crate works, and it is something we're working on addressing quickly. A few people seem to have spotted the actual issue already, but to give some context on what's happening here and make it clear:

Simple scenario of uploading a file (a few MB):
(a larger file or a folder/DNS setup only wraps this problem in more layers)

Demo app calculates its progress based on the amount of data it has sent to launcher via launcher's streaming API.
Launcher correspondingly calls an NFS module function via FFI to stream these bytes. The NFS module then calls self_encryption's write, which accepts the data and returns immediately.

Now the issue is that nothing (no chunks to store) actually gets written to the network until self_encryption's close function is called. By the time the launcher invokes the close fn from self_encryption (via the NFS module), demo app considers all bytes sent to launcher and therefore shows its progress as 99%, waiting on just the close fn to complete. As you can guess, close does not return immediately, since it is only now going to start sending chunks to the network.

So the current (not correct) approach has the progress bar updating as data is sent from demo app to self_encryption, and then at 99% all the resulting data is sent to the network. So while it appears like "it's stuck at 99%", it's actually just starting the long process of sending all chunks to the network at that stage, and that entire process is represented by 1% of the progress bar.
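
To make the 99% behaviour concrete, here is a minimal Python sketch of the flow described above. It is not the real self_encryption or launcher code: `BufferedEncryptor`, `upload_with_naive_progress` and the 99% cap are hypothetical stand-ins, and "storing to the network" is just appending to a list.

```python
class BufferedEncryptor:
    """Hypothetical stand-in: write() only buffers, close() stores everything."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.buffer = bytearray()
        self.stored_chunks = []  # stands in for chunks written to the network

    def write(self, data):
        # Accepts the data and returns immediately; nothing is stored yet.
        self.buffer.extend(data)

    def close(self):
        # Only now are the chunks produced and "sent to the network".
        for start in range(0, len(self.buffer), self.chunk_size):
            chunk = bytes(self.buffer[start:start + self.chunk_size])
            self.stored_chunks.append(chunk)


def upload_with_naive_progress(data, piece_size, chunk_size):
    """Report progress from bytes handed over, as the demo app is described doing.

    Assumes non-empty data. Returns a list of (percent, chunks_stored) pairs.
    """
    encryptor = BufferedEncryptor(chunk_size)
    history = []
    sent = 0
    for start in range(0, len(data), piece_size):
        piece = data[start:start + piece_size]
        encryptor.write(piece)
        sent += len(piece)
        # Assumed cap: the last 1% is reserved for close() to return.
        percent = min(99, sent * 100 // len(data))
        history.append((percent, len(encryptor.stored_chunks)))
    encryptor.close()  # the slow part, hidden behind the final 1%
    history.append((100, len(encryptor.stored_chunks)))
    return history
```

With 4000 bytes sent in 1000-byte pieces, the history reaches 99% while zero chunks have been stored, and all four chunks are only stored during close.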

So the part 1 fix for this (currently getting sorted by @anon86652309 in self_encryption) is to write to the network from self_encryption as and when it's possible, and not wait to do it all from the close fn. This means demo app will not just go to 99% immediately; it will advance only when the corresponding chunks get sent to the network, and the close fn itself will only have to write the last chunks, as per the self_encrypt algorithm, rather than all chunks of the file. So while the process of storing a file isn't going to get any faster, the progress indication should reflect something close to the real picture with this change.
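
A rough Python sketch of the part 1 idea, under simplifying assumptions: `StreamingEncryptor` and `upload_with_real_progress` are hypothetical names, and the real self_encrypt algorithm holds back its last chunks for its own reasons, whereas this sketch only holds back an incomplete tail.

```python
class StreamingEncryptor:
    """Hypothetical stand-in: stores each complete chunk as soon as it is available."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.buffer = bytearray()
        self.stored_bytes = 0  # bytes that have actually reached the "network"

    def _store(self, chunk):
        # Stand-in for writing one chunk to the network.
        self.stored_bytes += len(chunk)

    def write(self, data):
        self.buffer.extend(data)
        # Flush every complete chunk immediately instead of waiting for close().
        while len(self.buffer) >= self.chunk_size:
            self._store(bytes(self.buffer[:self.chunk_size]))
            del self.buffer[:self.chunk_size]

    def close(self):
        # Only the leftover tail remains to be written.
        if self.buffer:
            self._store(bytes(self.buffer))
            self.buffer.clear()


def upload_with_real_progress(data, piece_size, chunk_size):
    """Report progress from bytes stored, not bytes handed over. Assumes non-empty data."""
    encryptor = StreamingEncryptor(chunk_size)
    history = []
    for start in range(0, len(data), piece_size):
        encryptor.write(data[start:start + piece_size])
        history.append(encryptor.stored_bytes * 100 // len(data))
    encryptor.close()
    history.append(encryptor.stored_bytes * 100 // len(data))
    return history
```

The total work is the same as before; the difference is that the percentage now moves only when chunks are actually stored, and close has little left to do.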

Unfortunately this isn't the only issue here. If it was, we could, until this progress indicator stuff is patched, just advise waiting indefinitely at 99%, and the upload should eventually succeed once all chunks get written to the network. Currently the crust module is set up to drop non-critical messages if it's got too high a load to handle. This non-critical bracket includes client PUT/POST/GET. This drop can occur not just at the local client (launcher) but also in transit across the network, at vaults.
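
As an illustration only, load shedding of this kind could look like the Python sketch below. How crust actually measures load is not described here, so the queue-length threshold, the message kind names and `filter_under_load` are all assumptions.

```python
# Hypothetical kind names; the post only says client PUT/POST/GET are non-critical.
NON_CRITICAL = {"client_put", "client_post", "client_get"}


def filter_under_load(queue, capacity):
    """Return (kept, dropped) for a batch of (kind, payload) messages.

    Assumed rule: when the batch exceeds capacity, every non-critical
    message is dropped and everything else is kept.
    """
    if len(queue) <= capacity:
        return list(queue), []
    kept = [msg for msg in queue if msg[0] not in NON_CRITICAL]
    dropped = [msg for msg in queue if msg[0] in NON_CRITICAL]
    return kept, dropped
```

The point for the client is that a dropped PUT produces no error by itself; the sender simply never hears back.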

Routing has a recovery mechanism in place for this: when routing sends a message out, it expects the receiver to send an acknowledgement. If it doesn't get this "ack", it tries to send the same message via a different network route. However, after trying GROUP_SIZE (8) routes, if it realises the message isn't getting ack'd (someone in the route(s) is dropping the message), routing "gives up" on this message.
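
The retry-then-give-up behaviour can be sketched in Python as follows. Only the GROUP_SIZE value of 8 comes from the description above; `send_with_ack`, the `try_route` callback and the returned tuples are hypothetical, and real routing waits on ack timeouts rather than a boolean.

```python
GROUP_SIZE = 8  # number of routes tried before giving up, as quoted above


def send_with_ack(message, routes, try_route):
    """Try one route after another until an ack arrives.

    try_route(route, message) is a hypothetical callback that returns True
    when the receiver acknowledged the message over that route.
    Returns ("acked", attempts) or ("gave_up", attempts).
    """
    candidates = routes[:GROUP_SIZE]
    for attempt, route in enumerate(candidates, start=1):
        if try_route(route, message):
            return ("acked", attempt)
    # No route produced an ack: routing gives up on this message.
    return ("gave_up", len(candidates))
```

Without any extra notification, the "gave_up" outcome is invisible to the client, which is what leaves a request hanging.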

This scenario now means clients (launcher) could end up with a request they are trying that gets no response (success/failure) from the network. Ofc if this happens, the progress bar is just going to remain where it is, and no reply is ever going to come from the network. This is the scenario that really causes the progress bar to be "stuck" and not update at all.

For the part 2 fix, @AndreasF and others from the routing team are working to provide feedback for this case: when routing is about to "give up", it notifies the client (launcher) that the network is busy and that the client should maybe try later, since routing is not currently able to send the message across. To complicate things further, routing does not send a client request as a single message to the destination but splits it into multiple smaller messages to increase the speed at which the message is sent across hops. So when a part message is about to be "given up", routing is going to flag to the client that the corresponding request has "timed out".
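
A small Python sketch of how a failure of one part could surface as a failure of the whole request. The function names, the `send_part` callback and the result dictionary are hypothetical; the actual routing notification may look quite different.

```python
def split_into_parts(payload, part_size):
    """Split one client request into smaller part messages."""
    return [payload[i:i + part_size] for i in range(0, len(payload), part_size)]


def send_request(payload, part_size, send_part):
    """Send every part; flag the whole request as timed out if any part is given up.

    send_part(index, part) is a hypothetical callback returning True when the
    part was acknowledged and False when routing gave up on it.
    """
    for index, part in enumerate(split_into_parts(payload, part_size)):
        if not send_part(index, part):
            # One lost part is enough: the client is told the request timed out.
            return {"status": "timed_out", "failed_part": index}
    return {"status": "ok", "failed_part": None}
```

With a result like this, the launcher has something definite to act on (for example offering a retry) instead of waiting on a reply that will never arrive.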

With these two updates, we can then hope for the expected behaviour from demo app progress, where it doesn't just go to 99% immediately. It should move along its progress bar as and when data is stored to the network, and if it stays at the same progress it's fine to wait, since if the request is going to get dropped, launcher can expect an error from routing indicating a "timed out / network busy, retry later" sort of message.

A separate issue, which again maybe isn't directly from demo app/launcher, is the problem of a failed operation, such as a DNS folder setup, not being recoverable. @ustulation is currently looking into this, as safe_core was expected to provide some degree of retrying capability. That certainly needs patching too, since without it, a request that failed or got ignored just because the network was busy at a certain point blocks the user from setting up the same data in the future.

If you have a local vault, then the client (launcher) can be using your local vault as its proxy node to the network. The proxy node serves as the entry point to the network. This will eventually not be limited to 1 proxy but use multiple, for performance and security benefits, but currently it's just a single vault.

So yeh, if you then kill that vault, your client (launcher) loses its ability to communicate with the network. Now, without restarting the vault, if we choose "retry" from the launcher, it should ofc get a new proxy node, which will not be the local node, and be able to resume its client operations.

It would be great to have a client where a user who sees an upload stop, or gets frustrated partway through a large upload, even to the point of seeing the network drop entirely, can return and continue from where it left off. Avoiding redundant repeats of an action can only help the user experience.

Great stuff @Viv :slight_smile:

What about the problem of being blocked once this has happened? People have reported what sound like broken states when they've retried after giving up before completion.

This is what spandan is looking into right now. I summarised it a bit in the previous post

This seems to be a bug, or a case not handled, in safe_core recovery right now, so hopefully we can get this addressed too.

Again, a developer goes above and beyond to include the community in what's going on.

Much appreciation to this awesome team and all that they are creating with the SAFE Network!

Well this is what all this testing is for. To find all these bugs and get rid of them.

I want a Maidsafe NETFLIX, when! wheeen! ;´(

That one's prolly down the pike a ways, as the network needs to become far more efficient with its routing messages etc., but I'm sure something like popcorn time will pop up at some point utilizing the streaming API that was recently released :slight_smile:

Popcorntime > Netflix. I'd take popcorntime over netflix any day.

i'll prove it- twenty characters

cool cool cool those were the ones i wanted to upload too :slight_smile: :hugging:

@Pierce ... i started to watch hellsing ultimate -.-" ... >.< but at least i had to work only 12h today :rolling_eyes: :robot:

I can't connect to the network... is the test already over?

http://video.vault.safenet/ <<< i just started big buck bunny Oo seems to work for me

Test Net 6 is still working fine for me.

EDIT:Added later.

Can't connect at all. Deleted all Little Snitch rules for Safe Launcher but still nothing.

hmmm - looks like mac to me - with ubuntu 14.04 everything fine as far as i can tell Oo

and as mentioned ... sites load ...

Yes, i have it always when i (re)start it. No matter what.

(and i got an email with your answer before you edited it ;))
