Uploading performance

I’ve seen some threads like Profiling vault performance, and from that thread the upload speed seems a bit slow, but I’m not sure what to make of it in terms of what actual upload speeds might be once the network is live, when uploading might also be much more optimized and, I suppose, parallelized. Assuming you have a fast connection and computer, could you have 1000 connections to the network all uploading simultaneously? When you upload a file, the self-encryption process splits it into lots of chunks; could all of those chunks in theory be uploaded in parallel?

Are there any ballpark estimates of how long it might take to upload a 1 GB file, a 1 TB file, or the contents of a 10 TB hard drive full of random crap, say for someone with a 100 Mbit line or a 1 Gbit line? Would there be much difference, or would the network be the bottleneck?
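For a rough lower bound, here’s a back-of-envelope sketch (raw line-rate arithmetic only; it ignores self-encryption overhead, redundant copies and ISP contention, so real figures would be slower):

```rust
// Raw line-rate transfer time: bytes * 8 bits / line rate.
fn transfer_secs(bytes: f64, line_mbit: f64) -> f64 {
    (bytes * 8.0) / (line_mbit * 1_000_000.0)
}

fn main() {
    let gb = 1_000_000_000.0_f64;
    for (label, size) in [("1 GB", gb), ("1 TB", 1_000.0 * gb), ("10 TB", 10_000.0 * gb)] {
        for line in [100.0, 1_000.0] {
            let hours = transfer_secs(size, line) / 3_600.0;
            println!("{label} over {line} Mbit/s: {hours:.2} h");
        }
    }
}
```

So at raw line rate: 1 GB is ~80 s on 100 Mbit and ~8 s on gigabit, 1 TB is ~22 h vs ~2.2 h, and 10 TB is ~9 days vs ~22 h. At those sizes the line itself dominates unless the network adds serious overhead.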

3 Likes

Good question, and not one that has an answer yet as far as I know. There was some discussion in ‘the other place’ about optimising vault performance. I could be wrong but I think I read that one machine won’t be able to have multiple parallel upload connections to the network (can’t find that post now). The other unknown, of course, is how much latency XOR networking will introduce, if any.

4 Likes

I’m quite sure that, after the network starts, there will be download manager apps that make several different connections at once to speed up file downloads.

In the Safe Vault, yes, and with the Launcher, maybe. But with the authenticator paradigm you cannot prevent someone from doing that.

4 Likes

That was for the vault upload/download (not sure which now), and it is a limitation that is planned to be lifted once the code is written.

For client uploads there should be no such limitations since an APP could simply open multiple connections, one for each chunk.

In the past a “blocking” mechanism was used, which meant that for a connection between client and vault the code waited for each chunk/packet to be sent before continuing, which as you can imagine was a drag.

So for now we do not have a “speed” estimate, but I’d say that if an APP can chunk up your files and send them off over different connections to the network, then you should be able to nearly max out the upload link speed that your ISP allows. We of course have to contend with ISP contention ratios and what the actual upload speed is at the time you measure it. Your 1 Gbit/sec upload might only be able to sustain 400 Mbit/sec because of the ISP and their contention ratios/network layout.
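As a sketch of what such an APP could look like client-side (`upload_chunk` here is a hypothetical stand-in, not an actual SAFE API): each chunk gets its own connection, so the round-trips overlap instead of serialising.

```rust
use std::thread;

// Hypothetical stand-in for whatever the real client library exposes:
// open a connection, send one self-encrypted chunk, await "stored OK".
fn upload_chunk(chunk: Vec<u8>) -> Result<(), String> {
    let _ = chunk; // ... network I/O elided ...
    Ok(())
}

// Upload every chunk on its own thread/connection instead of blocking
// on each ack before sending the next chunk.
fn upload_parallel(chunks: Vec<Vec<u8>>) -> Result<(), String> {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|c| thread::spawn(move || upload_chunk(c)))
        .collect();
    for h in handles {
        h.join().map_err(|_| "upload thread panicked".to_string())??;
    }
    Ok(())
}
```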

4 Likes

At least it sounds like upload speed won’t really be an issue (or download speed for that matter) :slightly_smiling_face:

1 Like

Upload speeds should depend on your link speed and any lag through the network. The lag is because the “chunk stored OK” signal will take noticeable time (from a networking perspective).

Download speeds, though, will be interesting in that you are now dependent on other vaults’ upload link speeds. In theory your (say) 10 chunks could come from vaults with an effective upload speed of only 1 Mbit/sec, so your 1 Gbit/sec link sits mostly idle, with the parallel download of the 10 chunks coming in at 10 Mbit/sec combined.
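Put another way (a simplification that ignores caching and chunk scheduling), the effective download rate is bounded by the smaller of your own link and the combined upload rates of the serving vaults:

```rust
// min(your link, sum of the serving vaults' upload rates).
fn effective_download_mbit(my_link_mbit: f64, vault_upload_mbit: &[f64]) -> f64 {
    let combined: f64 = vault_upload_mbit.iter().sum();
    my_link_mbit.min(combined)
}

fn main() {
    // Ten 1 Mbit/sec vaults against a 1 Gbit/sec line: 10 Mbit/sec.
    println!("{} Mbit/sec", effective_download_mbit(1_000.0, &[1.0; 10]));
}
```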

3 Likes

I’ve been considering a proposal to ‘improve’ upload speed. It’s still too soon and would only distract from the tasks at hand, but here’s the idea in basic form:

Chunks are uploaded by the client to the first node as fast as possible, and that first node then distributes them asynchronously to the network at a rate more appropriate to the network.

There are two upload counters: one for the data ‘arriving’ at the first node, another for it being fully ‘synced’ and stored. The user only sees upload progress as the ‘arrival’ amount (the faster one). The secondary background syncing progress is simply for confirmation the network has not had any hiccups.

In bitcoin terms, it’s quite similar to how a client perceives zero confirmation vs confirmed transactions.

Currently each chunk must be fully synced before the next one is uploaded, which means there’s no way to split this progress into two different measurements.

This proposal adds quite a bit of complexity to the frontend and also to the vaults, and to possible failure modes, but I think the increase in perceived upload speed will be worth it. It would make the upload seem as fast as the client can upload (or as fast as the first node can receive chunks, whichever is slower).
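A minimal sketch of the two counters (hypothetical types, not existing vault or client code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// One counter for chunks that have merely *arrived* at the first node,
// one for chunks the network reports as fully *synced*.
pub struct UploadProgress {
    total: usize,
    arrived: AtomicUsize,
    synced: AtomicUsize,
}

impl UploadProgress {
    pub fn new(total: usize) -> Self {
        Self { total, arrived: AtomicUsize::new(0), synced: AtomicUsize::new(0) }
    }

    pub fn chunk_arrived(&self) {
        self.arrived.fetch_add(1, Ordering::Relaxed);
    }

    pub fn chunk_synced(&self) {
        self.synced.fetch_add(1, Ordering::Relaxed);
    }

    /// What the user sees: the fast 'arrival' figure.
    pub fn shown_percent(&self) -> f64 {
        100.0 * self.arrived.load(Ordering::Relaxed) as f64 / self.total as f64
    }

    /// Background confirmation, akin to confirmations in bitcoin.
    pub fn confirmed_percent(&self) -> f64 {
        100.0 * self.synced.load(Ordering::Relaxed) as f64 / self.total as f64
    }
}
```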

8 Likes

Maybe you can confirm/dash the idea of an APP sending each chunk on a separate connection so that they are effectively uploaded in parallel.

Would this be the “relay” node? Isn’t there going to be a (small) number of relay nodes for any one client so the one used is not always the same?

I think I read that async is being implemented now. Is that right though?

I would have thought that would only require the client side to send chunks on multiple connections and keep track of each (when it was sent and when the “stored OK” signal is received). Honestly, I would have thought this is simply a client-side operation that could use the upcoming async comms being implemented in the core.

3 Likes

I see two interpretations here:

send each chunk on a separate connection [to the network / to the client manager] so that they are effectively uploaded in parallel.

I’m assuming the first interpretation is what you mean. Is this correct?

The client connection is currently handled by the maid manager persona (see wiki). Looking at the structure of the maid manager, there is an accounts field, i.e. data entering the network is aimed at the accountname xor space rather than the chunkname xor space.

Could it be redesigned so data can aim at the chunk level rather than the account level? I guess it could… but the authentication mechanism would probably add some overhead since the chunk must be checked against the account anyhow, which presumably means checking back with the maid manager.

The idea has merit though, since it would distribute the network load for the main chunk content across many nodes, leaving only the authentication handling to the maid manager. Could be an improvement… but I’m not familiar enough with the design of the personas to fully grasp the feasibility!
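For context on what ‘aiming at an xor space’ means (a generic illustration of Kademlia-style addressing, not MaidSafe’s actual routing code): every name is a 256-bit address, and a message is routed toward the nodes whose IDs are closest to the target name by XOR distance, whether that target is the account name or the chunk name.

```rust
// XOR 'distance' between two 256-bit names; compared as big-endian
// integers, smaller means closer in the address space.
fn xor_distance(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut d = [0u8; 32];
    for i in 0..32 {
        d[i] = a[i] ^ b[i];
    }
    d
}

// Is node x closer to the target than node y?
fn closer(target: &[u8; 32], x: &[u8; 32], y: &[u8; 32]) -> bool {
    xor_distance(target, x) < xor_distance(target, y)
}
```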

On the other hand, having clients interface with the network via just one node (the maid manager) provides a degree of consistency, whereas interfacing with many points on the network may mean inconsistent responses, which could increase the complexity of the user experience and the chance of upload failure if something goes wrong.

Not sure about this, my understanding is a client interacts with a single entry point on the network. Like on TOR, once the network establishes your entry point it stays that way for the session. Maybe someone more knowledgeable can clarify?

I’m not sure. There’s an async branch of safe_client_libs but I haven’t been following it so don’t know if it represents actual work on the async feature.

So I admit overall there’s not a lot of clarity from me, I’ve not followed the code as closely as I used to.

4 Likes

uTorrent uses a piece-completion progress bar that fills each segment as pieces arrive, not linearly. It perfectly represents progress while being easy to understand. Might be worth considering.

5 Likes

Oh yes, and if I recall it also uses progressively darker shades for each segment: starting, partially complete, checking, complete.
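Something in this shape would do it (a hypothetical sketch, one character per chunk, with ‘darker’ characters for later states):

```rust
// uTorrent-style piece map: each chunk is drawn by state rather than
// filling a single linear bar.
#[derive(Clone, Copy)]
enum ChunkState {
    Pending,
    Arriving,
    Checking,
    Complete,
}

fn render(states: &[ChunkState]) -> String {
    states
        .iter()
        .map(|s| match s {
            ChunkState::Pending => ' ',
            ChunkState::Arriving => '░',
            ChunkState::Checking => '▒',
            ChunkState::Complete => '█',
        })
        .collect()
}

fn main() {
    use ChunkState::*;
    println!("[{}]", render(&[Complete, Complete, Checking, Arriving, Pending]));
    // prints: [██▒░ ]
}
```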

2 Likes

Yes, this is what I was meaning. And thanks for your evaluation.

[quote=“mav, post:9, topic:14404”]
Not sure about this, my understanding is a client interacts with a single entry point on the network. Like on TOR, once the network establishes your entry point it stays that way for the session. Maybe someone more knowledgeable can clarify?
[/quote]

My understanding is that during the early testnets there is (will be) only one relay node, but for the later tests/live network it will be multiple nodes, so that no one node can completely monitor another’s activity, even if it doesn’t know what is actually being sent.

2 Likes

@mav Found one such reference to multiple relay nodes in the later tests/live network. Other references use the singular/plural for relay nodes interchangeably.

1 Like

Would this mean that someone who has stored data they want quick access to might repeatedly get, but not store, their data, to keep it cached and thus ensure faster download speeds? Perhaps there should be a way to pay extra to ensure fast access to private data, and perhaps there should be some limit on how often you can get your private data without paying anything. I’m not saying it should be expensive, just enough to discourage people from basically spamming the network.

Wasteful really. If you really wanted super fast access to certain data then store it on your drive.

Perhaps, but that goes against the desire to level the playing field. You’d get a network where money talks and everyone else gets the slowest speeds.

In any case my example is likely to be a worst-case scenario. If the current 6 Mbit/sec requirement for vaults remains, then the above would be 60 Mbit/sec rather than 10 Mbit/sec. But normally you would expect the fastest vaults to respond faster than the minimum, and it is unlikely that all 10 chunks would come from minimum-speed vaults.

Also, as time progresses, speeds increase. If we had been discussing this 3 years ago, for example, it would have been rare for home internet connections to offer 1 Gbit/sec up and down. But now a significant portion of (Europe?) has 1 Gbit/sec. We in Australia are either in the 30% that can get 100/40 Mbit/sec nominally (less in reality), or the group that gets 100/2 Mbit/sec, or the majority that get a maximum of 20/1 Mbit/sec.

1 Like

It’s common in some Asian countries like South Korea, and in Europe it’s certainly available in many places. I don’t know many people who have gigabit lines though; where I live people usually have 50-100 Mbit. A gigabit line is about 2x the cost per month, but most people currently don’t see the need to pay the extra as Netflix and whatever works fine without it. Perhaps people would upgrade if they could make the money back and more by farming.

1 Like