Sorry for this long post.
This is certainly an annoying issue. It's currently caused by the way the self_encryption crate works and is something we're working on addressing quickly. Just to give some context on what's happening here (a few people seem to have spotted the actual issue already, but just to make it clear):
Simple scenario of uploading a file (a few MB):
(a larger file, or a folder/DNS upload, only wraps this problem in more layers)
The demo app calculates its progress based on the amount of data it has sent to the launcher via the launcher's streaming API.
The launcher correspondingly calls an NFS module function via FFI to stream these bytes. The NFS module then calls self_encryption's write, which accepts the data and returns immediately.
Now the issue is that nothing (i.e. the chunks to store) actually gets written to the network until self_encryption's close function is called. By the time the launcher invokes the close fn from self_encryption (via the NFS module), the demo app considers all bytes sent to the launcher and therefore shows its progress as 99%, waiting on just the close fn to complete. As you can guess, close does not return immediately, since it is only now starting to send chunks to the network.
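To make the shape of the problem concrete, here's a minimal sketch in Rust. The names (SelfEncryptor, write, close, upload) mirror the description above but are illustrative stubs, not the exact self_encryption/NFS API:

```rust
struct DataMap; // placeholder for the real data map returned on close
struct SelfEncryptor { buffered: Vec<u8> }

impl SelfEncryptor {
    fn write(&mut self, data: &[u8], _offset: u64) {
        // Returns almost immediately: data is only kept/encrypted locally.
        self.buffered.extend_from_slice(data);
    }

    fn close(self) -> DataMap {
        // Only here would the encrypted chunks actually be PUT to the
        // network, which is the slow part -- hence "stuck at 99%".
        DataMap
    }
}

fn upload(file_bytes: &[u8]) -> DataMap {
    let mut enc = SelfEncryptor { buffered: Vec::new() };
    let mut offset = 0u64;
    for piece in file_bytes.chunks(1024 * 1024) {
        enc.write(piece, offset); // fast, local only
        offset += piece.len() as u64;
        // The demo app updates its progress per piece handed to the launcher,
        // so the bar reaches ~99% as soon as the last piece is accepted here.
    }
    enc.close() // the real network work only starts now
}
```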
So the current (not correct) approach for the progress bar has it updating as data is sent from the demo app to self_encryption, and then at 99% all the resulting data is sent to the network. So while it appears "it's stuck at 99%", it is actually just starting the long process of sending all chunks to the network at that stage, and that entire process is represented by 1% of the progress bar.
So the part 1 fix for this (currently getting sorted by @anon86652309 in self_encryption) is to write to the network from self_encryption as and when possible, and not wait to do it all from the close fn. This will mean the demo app's progress will no longer jump to 99% immediately, but will advance only as the corresponding chunks get sent to the network, and the close fn itself will only have to write the last chunks (as required by the self-encryption algorithm) rather than every chunk of the file. The process of storing a file isn't going to get any faster, but the progress indication should reflect something close to the real picture with this change.
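Roughly, the part 1 change shifts the network PUTs from close() into write(). Again, a hedged sketch with made-up names rather than the actual patch:

```rust
struct Chunk(Vec<u8>);
struct DataMap;

struct SelfEncryptor {
    pending: Vec<u8>,
    chunk_size: usize,
}

impl SelfEncryptor {
    fn write(&mut self, data: &[u8], _offset: u64) {
        self.pending.extend_from_slice(data);
        // As soon as a full chunk's worth of data is buffered, encrypt and
        // PUT it now instead of deferring everything to close().
        while self.pending.len() >= self.chunk_size {
            let chunk: Vec<u8> = self.pending.drain(..self.chunk_size).collect();
            put_to_network(Chunk(chunk));
        }
    }

    fn close(mut self) -> DataMap {
        // Only the trailing chunks, which the self-encryption algorithm can't
        // finalise until it has seen all the data, are written here.
        if !self.pending.is_empty() {
            let rest = std::mem::replace(&mut self.pending, Vec::new());
            put_to_network(Chunk(rest));
        }
        DataMap
    }
}

fn put_to_network(_chunk: Chunk) {
    // stand-in for the real PUT of an immutable chunk via routing
}
```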
Unfortunately this isn't the only issue here. If it was, then until this progress indicator stuff is patched we could just advise waiting indefinitely at 99%, and the upload should eventually succeed once all chunks get written to the network. Currently the crust module is set up to drop non-critical messages if it has too high a load to handle, and this non-critical bracket includes client PUT/POST/GET. This drop can occur not just at the local client (launcher) but also across the network at vaults while messages are in transit.
Routing has a recovery mechanism in place for this: when routing sends a message out, it expects the receiver to send an acknowledgement. If it doesn't get this "ack", it tries to send the same message via a different network route. However, after trying GROUP_SIZE (8) routes, if it realises the message isn't getting ack'd (someone along the route(s) is dropping the message), routing "gives up" on this message.
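In pseudo-Rust, that recovery behaviour looks roughly like the loop below. The real routing crate is event driven rather than a blocking loop, and these names are illustrative only:

```rust
const GROUP_SIZE: usize = 8;

enum SendOutcome {
    Acked,
    GaveUp,
}

fn send_with_retries(msg: &[u8]) -> SendOutcome {
    for route in 0..GROUP_SIZE {
        send_via_route(msg, route);
        if wait_for_ack() {
            return SendOutcome::Acked;
        }
        // No ack: someone along this route dropped the message (e.g. crust
        // shedding non-critical load), so try the next route.
    }
    // All GROUP_SIZE routes tried without an ack: routing "gives up", and
    // today the client never hears anything back about the request.
    SendOutcome::GaveUp
}

fn send_via_route(_msg: &[u8], _route: usize) { /* stand-in for routing */ }
fn wait_for_ack() -> bool { /* stand-in for the ack wait */ false }
```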
This means clients (the launcher) could end up with a request they are trying that never gets a response from the network (success or failure), and of course if this happens the progress bar is just going to remain where it is, since no reply is actually going to come back. This scenario will then genuinely cause the progress bar to be "stuck" and not update at all.
The part 2 fix is that @AndreasF, with others from the routing team, is working to provide feedback for this case: when routing is about to "give up", it will notify the client (launcher) that the network is busy and the client should maybe try later, since routing is currently not able to get the message across. To complicate things further, routing does not send a client request as a single message to the destination but splits it into multiple smaller messages to increase the speed at which it is sent across hops. So when a part message is about to be "given up" on, routing is going to flag to the client that the corresponding request has "timed out".
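Once that lands, the launcher could handle the new notification along these lines. The event name and shape below are my assumption, not the final routing API:

```rust
enum ClientEvent {
    Response(Vec<u8>),
    RequestTimedOut { request_id: u64 },
}

fn handle_event(event: ClientEvent) {
    match event {
        ClientEvent::Response(data) => {
            // Normal success/failure path; progress advances as before.
            let _ = data;
        }
        ClientEvent::RequestTimedOut { request_id } => {
            // Routing was about to give up on (part of) this request, so the
            // launcher can surface "network busy, try again later" instead of
            // letting the progress bar sit there indefinitely.
            println!("request {} timed out - network busy, retry later", request_id);
        }
    }
}
```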
With these two updates, we can then hope for the expected behaviour from the demo app's progress, where it doesn't just go to 99% immediately. It should advance along its progress bar as and when data is actually stored to the network, and if it stays at the same progress it's fine to wait, since if the request is going to get dropped the launcher can expect an error from routing indicating a "timed out / network busy, retry later" sort of message.
A separate issue, which again maybe isn't directly in the demo app/launcher, is that a failed operation such as a DNS folder setup is not recoverable. @ustulation is currently looking into this, as safe_core was expected to provide some degree of retrying capability. That certainly needs to be patched too, as without it we have the situation where, just because the network was busy at a certain point and the request failed or got ignored, the user is blocked from setting up the same data in the future.
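For what it's worth, the kind of retry capability meant here could be as simple as a generic wrapper like the sketch below (the function name and shape are made up, not necessarily what safe_core will expose):

```rust
use std::{thread, time::Duration};

fn with_retries<T, E>(max_attempts: u32, mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(val) => return Ok(val),
            Err(err) => {
                attempt += 1;
                if attempt >= max_attempts {
                    return Err(err);
                }
                // Back off a bit before retrying; a busy network is the most
                // likely reason the previous attempt was dropped.
                thread::sleep(Duration::from_secs(2 * u64::from(attempt)));
            }
        }
    }
}
```

A DNS folder setup could then be wrapped as something like `with_retries(3, || create_dns_dir(...))` (again, a hypothetical helper name), so one busy spell on the network doesn't permanently block the operation.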