funny that there wasn’t a single testnet in the last 3 months, since the switch to 4MB, that used a smaller chunk size (just to test whether that was the main thing causing trouble for people)
There has been an update to the discussion about this on GitHub by the Quinn people. @bzee
And it seems that they have confirmed some things @dirvine, while not fully understanding node comms and their peer-to-peer nature. They are viewing this through the glasses of datacentre server-to-client comms, where it’s the server controlling things. For Autonomi the “server” is just another node and the client is another node (a node here could be an uploader/downloader app too), and both can be behind a potato router.
max_stream_data
doesn’t apply to a connection, but instead to a stream on the connection. One connection may have multiple streams, in which case the max allowed unacknowledged data on the connection is # of streams × max_stream_data
max_connection_data
applies an additional maximum to the whole connection.
But this detail is probably not super important for you if you have a smaller number of streams.
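For concreteness, here is a minimal sketch of how those two knobs map onto Quinn’s `TransportConfig` (method names are per the Quinn API; the 128 KiB / 256 KiB values are only illustrative, not the network’s actual settings):

```rust
use std::sync::Arc;
use quinn::{TransportConfig, VarInt};

fn small_window_transport() -> Arc<TransportConfig> {
    let mut transport = TransportConfig::default();

    // Per-stream receive window (QUIC max_stream_data): the most
    // unacknowledged data the remote peer may send us on ONE stream.
    transport.stream_receive_window(VarInt::from_u32(128 * 1024));

    // Connection-wide receive window (QUIC max_data): an additional
    // cap across ALL streams on the connection.
    transport.receive_window(VarInt::from_u32(256 * 1024));

    Arc::new(transport)
}
```

So with, say, four concurrent streams, the worst-case unacknowledged data is min(4 × 128 KiB, 256 KiB) = 256 KiB, which is the “# of streams × max_stream_data” point being made above.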
Yes, I did conflate the two a bit in the OP, but later posts separate them. The point made in the OP related to the stream being part of the connection. As mentioned, the conflation is not such an important issue with a small number of streams per connection.
max_stream_data
is not the maximum outgoing data that we can send unacknowledged on a stream. Instead it is the max unacknowledged data that we allow the remote peer to send to us before they must stop sending (else it’s a protocol violation).
In fact this just confirms my findings and clarifies them. Each node (or downloader app) can be the requester, and there is a sender node (or uploader app); it is indeed the sender whose data needs to be chopped up into smaller parts so that, on its link/router, other data can flow and the router buffer doesn’t overflow. So they have confirmed the OP while correcting the wording.
Basically this is a peer-to-peer network, and nodes can and will at certain stages be a sender or a receiver. The requester limiting the max window size helps the sender avoid router issues or link congestion. The comments above are written more from the requester’s side.
@riddim This explains why setting the max_window_size on your end had no effect: the trouble only occurs when your nodes are sending, and the size you set only takes effect when you are the requester (receiving). It also explains another reason the setting has to be network-wide.
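To make the directionality concrete: in Quinn it is the receiver’s transport config that advertises the window, so a requester can throttle what is sent *to* it, but nothing in its own config smooths what it sends out. A hedged sketch (assumes Quinn’s `ClientConfig::transport_config`; the 128 KiB figure is illustrative):

```rust
use std::sync::Arc;
use quinn::{ClientConfig, TransportConfig, VarInt};

// As the requester (receiver) we advertise a small per-stream window,
// which throttles how much the remote sender may have in flight toward
// us. It does NOT smooth our own outgoing traffic when we are the
// sender; that is governed by the windows the far end advertises.
fn throttle_inbound(mut cfg: ClientConfig) -> ClientConfig {
    let mut transport = TransportConfig::default();
    transport.stream_receive_window(VarInt::from_u32(128 * 1024));
    cfg.transport_config(Arc::new(transport));
    cfg
}
```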
Also a potential attack vector, albeit a tiny, largely ineffectual one: in future, when the max window size is reduced to 128KB or 256KB, a person could set their env vars to request the larger window size and cause some routers to have buffer overflows. A good reason not to allow env vars to change the max_window_size.
I’m very curious to see how the settings you recommend would affect the challenges I’m having in running nodes from home. But there has not been much acknowledgement of them from the team, so it’s hard to say if they have even read your last post or not.
I wonder, would it be technically possible to run a Comnet with the window size set to 128 KiB? Could all the participants set all the required parameters, now that the PR below is in?
I think that if the official release is going to be tweaked towards your suggestions, they may still adjust the window just a little, but I’d really like to see how it works in the range you proposed, and especially in the low end of the range rather than the higher end.
If we had a Comnet to try this out, it might speed up the research quite a lot. Personally, I would be keen to participate, as the small number of nodes I can run earns me almost nothing. For a long time the earnings were so low that I wasn’t even eligible for the lottery. So helping the development on a parallel track would be a better use of my limited resources, especially if the results would help end the discrimination against the smallest participants.
Though, unfortunately, I’ll be traveling for the next couple of weeks, so I’m not actually able to participate during that time. But maybe the idea of a Comnet needs some time to brew, and perhaps the official settings won’t have come down to “your levels” by then?
EDIT:
Forgot to link the PR I mentioned earlier. (Also changed the number 125KiB to 128KiB)
I wonder how many people who usually participate in a comm net have upgraded their routers as well. Maybe just you and riddim have not yet.
The problem with people trying it is that it isn’t their end that needs the lower max window size, because it is the uploading that causes issues and it’s the other end (the requester) that specifies the desired window size.
So yes, a commnet is needed, or a Maidsafe testnet inviting a few people like yourself.
But basically, if lowering the value to something like 128KB or 256KB does not adversely** affect datacentre test nodes, then it’s a good basic flow-control setting for the lower-grade routers that typically make up the majority of ISP-supplied routers.
**I would perhaps expect a few percent reduction in speed tests between datacentre nodes due to the extra handshaking, but unless it’s a “flat out” test I would be surprised at anything beyond normal testing variance. And since we need home nodes to be favoured, a couple of percent loss for datacentre nodes is overshadowed by the increased percentage of healthy home nodes communicating in a healthy way.
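A rough sanity check on that footnote: a single stream’s throughput is capped at about window ÷ RTT, so a smaller window mainly bites on fast, low-latency paths. Back-of-envelope figures only (the RTTs are assumed, not measured):

```rust
fn main() {
    let window_bits = 128.0 * 1024.0 * 8.0; // 128 KiB window

    // (path, assumed round-trip time in seconds)
    for (path, rtt) in [("datacentre-datacentre", 0.002), ("home-home", 0.050)] {
        let mbit_s = window_bits / rtt / 1_000_000.0;
        println!("{path}: per-stream cap ~{mbit_s:.0} Mbit/s");
    }
}
```

That gives roughly 524 Mbit/s per stream between datacentres versus roughly 21 Mbit/s between typical home RTTs, so the cap is rarely the bottleneck in a datacentre, which is consistent with expecting only a small reduction there.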
It seems the recent updates have helped the NAT table issue due to the lower number of connections required, even to the point that the Starlink router handles 80 nodes without issue. Trying 2 SBCs and 120 nodes.
Now it’s time for the max window size to be reduced to 128KB for good flow-control health. There is a reason IP packets are still capped around 1500 bytes and TCP keeps anywhere from 1 to 10 segments in flight before an ACK is required. Flow control is essential for peer-to-peer between home systems with potato routers. Experience of over 50 years tells me this. Now, will max window size in QUIC actually implement this right? Dunno, but I hope it does.
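And the flip side, why the big window hurts the potato router: the advertised window bounds how much a sender can have queued at the bottleneck, and on a home uplink that queue is pure delay. Another back-of-envelope sketch (the 20 Mbit/s uplink is an assumed figure):

```rust
fn main() {
    let uplink_bps = 20_000_000.0; // assumed 20 Mbit/s home uplink

    // Worst-case queued bytes per stream = advertised window.
    for kib in [4096u64, 256, 128] {
        let delay_ms = (kib * 1024) as f64 * 8.0 / uplink_bps * 1000.0;
        println!("{kib:>4} KiB window -> up to {delay_ms:.0} ms queued at the router");
    }
}
```

Roughly 1.7 s of potential queueing per stream at a 4 MiB window versus about 50 ms at 128 KiB, which is the flow-control instinct above in numbers.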
Just to clarify, by “uploading” you mean “a node sending data to requester”?
And it is the requesting side that decides how big the lumps of data my node is going to send are. So now, when anyone requests a somewhat larger chunk, like the max 4MB, my router edges towards congestion. And there is actually no setting I could use to tone it down?
That is a great improvement for sure!
From the comment in the GitHub thread it is confirmed to be that. And when you consider it, it should be this way too, since it’s the receiver that has to supply the resources to handle the data.
I did not notice one, and if you think about it, it is the receiver (requester) that has to be concerned with buffers and its ability to process the incoming data. The sender is just pumping out data, so there are no buffers to consider.
The concept/code does not seem to consider intermediate devices in the communications path. Thus the need for Autonomi node/client code to do this.
In most cases, apart from malicious code, this will never be an issue, since all nodes are using the same code, and if the window size is reduced it’ll be all good. And all malicious code does is cause dropped packets, hiccups if you will. It’d require a bot farm of large size (compared to network size) to cause communications disruption, and even then it would be hard to cause real trouble.
Example usage for the CLI:
```
bzee@demo:~$ ANT_MAX_STREAM_DATA=250000 ant file upload README.md
```
It would seem the implementation has not gone into core, just a CLI option on the client side. If correct, this does nothing for the spikes on home nodes.
Correct. It is the receiving node that controls the max_window_size. So the node setting it would only be helping another node sending to it.
So…we should consider this best practice for uploading now?
If so, it really needs to be in the docs and widely known.
It also needs to be in the client GUI as an option (something like ‘smooth upload’ checkbox)
It is unlikely to help right now. The setting will be removed soon; it just got left in as an artefact.
We have to do large-scale tests and really set this network-wide in an upgrade.
And I got bogged down this week helping in Discord with the TGE.
Will get those theory charts done and posted this week, so hopefully they’ll help a little with testing.