Discussion on low level data flow and home networks. Found a solution to allow a 4MB max chunk size to run as smoothly as 1/2MB - it's a setting in QUIC

Foreword

Discussion on data flows and router buffer issues for home setups using "bring what you have".
@joshuef If you want to check the setting I suggest might solve buffer exhaustion in home routers then skip to the end.

An examination of how home setups and "bring what you have" impact communications in the network.

Introduction

The Autonomi network is at its heart a distributed data and file storage system using the spare storage capacity that people have on their computers. One of the goals has always been to have people use resources that they already own, which helps satisfy two other goals.

  1. an extremely distributed network, unseen anywhere in the world to date, where people around the world can participate not only by using the network but, importantly, by providing the nodes and storage that bring it about.
  2. a decentralised network that is owned by no one and by everyone on earth equally; one that is not controlled by any person, organisation, government, or corporation, yet is collectively owned by everyone equally.

Low level communications, if not measured, can cause effects that appear to have other causes, through such things as packet loss and transfer delays. These may show up in metrics as nodes not being responsive, or as the computer the node runs on being overloaded; CPU usage rising due to errors looks the same as an overloaded computer.

This summary considers the impact that "bring what you have" will have on the quality and stability of low level communications across the network. The compute/storage provided is assumed to be a typical computer, from SBC to desktop, since their comms will be similar. The examination of communications will look into such things as

  1. What elements people are bringing along in terms of network setup, switches, and routers
  2. How data moves through the user's networking
  3. A touch on how data will flow through the internet (not done, since this discussion is big and there are time constraints; it is to do with packet errors causing whole records to be resent)
  4. How UDP affects how data flows, and data sizing effects

Discussion

Background considerations

Autonomi is relying on home users being able to use the hardware they have now, not hardware they might buy in the future. So while the network software has to be written with the future in mind, it still needs to work in a satisfactory fashion when outages of large sections of the network happen.

Outages could include, or be caused by, but not be limited to: earthquakes, large power outages, fibre cables being cut or damaged (anchors, exit-point damage, acts of war, maintenance, and so on), political actions, border router bugs/misconfiguration, and solar events.

For the network to survive it needs to be stable, not lose data, "feel safe/secure to users", not be so slow that people stop using it, not cost an excessive amount, and give people easy ways to buy and use tokens.

This summary of low level communications is not concerned with most of the requirements for a network that will be adopted by the world. It is concerned with low level communications, which has consequences for the stability of the network and affects communications speed. Speed of record flow has a number of components, and low level communications is just one of the factors affecting record transfer speeds. This examination does not consider the processing time within the user's nodes nor remote nodes, only the communications.

Network elements

Home networks started off with the LAN being 10Mbps, then 100Mbps, and now 1Gbps for most home LANs. Slower speeds are rare except for WiFi connections in the home. WiFi, being slower and closer to the household's internet connection speed, may actually benefit the household's node stability because of the effects that will be outlined later.

For this examination the typical household situation is considered: computers connected to one or more switches and an ISP supplied router, with some notes on WiFi and mobile devices. This is not meant to be an exhaustive discussion but to bring understanding of the underlying communications, with a view to good decisions being made for parameters affecting the communications in Autonomi.

The following descriptions are not meant to be detailed; they provide an overview of each device's purpose.

Hubs (L1 device)

These basically just connect the network twisted pairs together electrically, without any buffering or flow control. Used in LANs. A hub cannot control flow, so packets from different computers can overlap (collide), and all computers receive all packets sent by any computer connected to the hub.

Hopefully no one is using these devices, because switches provide much better connectivity and are so inexpensive as to make hubs inappropriate.

Switches (L2 device)

These allow multiple computers to be connected together without packets clashing and causing errors. Used in LANs (Local Area Networks).

The switch has memory buffers and uses the MAC address to do port-to-port switching. The advantage of using the MAC address is that packets are only sent to the device that is the packet's destination. For switches used in the typical home network the ports are all the same speed, so the memory buffer does not need to be large.

Routers (L3 device)

These devices connect a LAN (Local Area Network) to a WAN (Wide Area Network). When a packet is destined for a device not on the LAN, the router sends the packet over the WAN connection (typically the internet connection).

Most homes will only have one LAN, but some may have multiple LANs with one or more routers handling the transfer of packets between LANs and the WAN(s).

A router is much slower (relatively) than a switch, since routing is a lot more expensive in processing and is often done by a microprocessor rather than a very fast switching chip. Routers all have memory buffers to queue up packets to/from the WAN and for holding the NAT table.

Routers are used throughout the internet and corporate networks, and those are much more powerful than any home router. The need for large memory buffers really depends on the networks being connected and the speeds of the ports. For instance a 1Gbps to 1Gbps connection does not need as large a buffer as a 10Gbps LAN to a 100Mbps WAN (or LAN).

ISP & home routers (L3 & L2)

These are routers with a switch in the same unit. The switch is handled by a specialised switch chip with its own small internal buffer memory. Any packets destined for the internet are buffered/queued in the router's memory buffers and sent in turn. Packets received from the internet are stored in a buffer, and using its NAT table the router either rejects each packet or passes it on to the destination device on the LAN.

Typically the home LAN is 1Gbps, and the WAN (internet connection) is 40Mbps or slower.

The average upload speed obtained from the speed test sites is a pure average. Because 1Gbps and higher upload rates are possible in selected regions of the world, the average is skewed upwards, yet it still comes out at a relatively slow 48Mbps.

For this discussion an uplink speed of 40Mbps will be used, and it must be remembered that most average home internet connections will be at this speed or below.

TCP & UDP

TCP is not used for Autonomi at this time. The notable property of this protocol is its error detection and built-in flow control, because each packet (block) is acknowledged as OK or not OK.

UDP is a protocol that is faster than TCP, but it has no error detection or acknowledgement as part of the protocol. Correct packet ordering is also not guaranteed across the routers or routes used.


www.cloudflare.com/en-gb/learning/ddos/glossary/user-datagram-protocol-udp/

QUIC

This is used by libp2p, the networking library that Autonomi uses.

There is flow control within QUIC, and the max in-flight data per stream is set to 10MB:

            // Ensure that one stream is not consuming the whole connection.
            max_stream_data: 10_000_000,

Basic data flow

Data flows into and out of the device the node is running on. The node receives data in the form of messages and records. Messages are small and form part of the protocol that makes Autonomi work as a distributed data store. These can be operational messages and responses, or requests for a record to be sent to or from the node. For this examination the operational messages can essentially be ignored, since they are small and do not significantly cause communications issues. Records on the other hand are large in comparison and exercise the network infrastructure to a much greater extent.

When a record is sent to the node it arrives at the router, which buffers the packets making up the record, checks the destination of each packet, and forwards it to the device the node is running on. Because in the typical home network the WAN (internet) connection is significantly slower than the internal LAN, in normal operation the packet is forwarded without much delay, i.e. the buffer queue remains small.

When a record is being sent from the node, the record is broken down into network packets and forwarded through one or more switches to the router, which buffers the packets while sending other packets from the buffer queue. Those other packets may or may not be the earlier packets from that data block (record). The node will send up to "max_stream_data" before waiting for an acknowledgement back from the receiving node. This is currently set to 10MB.

Since Autonomi is using UDP, the packets are sent rapid fire from the device to the router as fast as the LAN will transfer them, typically at 1Gbps. The router forwards the packets across the WAN (internet) at 40Mbps or lower for the majority of home connections. If the router is not sending any other packets, the buffer fills at approx 120MBytes per second and empties at approx 5MBytes per second. Thus for 1.2MB of data it takes approx 0.01 seconds to transfer the data into the router and 0.24 seconds to forward the data block to the internet.
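
To make that arithmetic reusable, here is a back-of-envelope sketch in Rust. The 1Gbps/40Mbps speeds are the assumptions used throughout this write up, and the names are purely illustrative:

// Back-of-envelope router buffer arithmetic, assuming a 1Gbps LAN
// (~120 MBytes/s into the router) and a 40Mbps WAN (~5 MBytes/s out).
const LAN_BYTES_PER_SEC: f64 = 120_000_000.0;
const WAN_BYTES_PER_SEC: f64 = 5_000_000.0;

fn main() {
    let record_bytes = 1_200_000.0; // a 1.2MB record

    // Time for the node to hand the whole record to the router.
    let fill_secs = record_bytes / LAN_BYTES_PER_SEC;
    // Time for the router to forward it all out over the internet.
    let drain_secs = record_bytes / WAN_BYTES_PER_SEC;
    // Peak buffer occupancy: nearly the whole record, since data
    // arrives ~24x faster than it leaves.
    let peak_bytes = record_bytes * (1.0 - WAN_BYTES_PER_SEC / LAN_BYTES_PER_SEC);

    println!("fill: {fill_secs:.2}s, drain: {drain_secs:.2}s, peak buffer: {peak_bytes:.0} bytes");
    // prints: fill: 0.01s, drain: 0.24s, peak buffer: 1150000 bytes
}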

Now with QUIC there is the possibility of flow control on the data being uploaded to another node. But since the value is currently 10MB, up to 10MB will be sent in one go.

Single node considerations

For one node, records will be sent across the device's network to the router and the above considerations on buffering will apply. One or multiple records can be requested at the "same" time. This can occur if a new node appears in the network close enough to records held by the operator's node. Two simultaneous records mean the router's buffer has to handle approx twice the data.

Multiple node considerations

The requests for records will now be happening for multiple nodes in an independent manner. This may or may not result in multiple records being sent to the router for forwarding to other nodes. When there is little load on the network this will not happen often, unless there is an influx of many new nodes and more than one is close to some nodes on the operator's device.

Multiple nodes on multiple devices on the one LAN

The switch may now have to buffer packets being sent towards the router, since the router is only one port off the switch (internal or not). A lot of packets being sent by multiple devices at full speed will require buffering. This is because the router has no way to flow control the UDP packets, whereas it can for TCP due to the TCP protocol.

Review of basic data flow

Autonomi uses UDP as the low level protocol with QUIC on top. Because UDP has no flow control of its own, the packets for a requested record, up to "max_stream_data", will be sent at full LAN speed through the switch to the router. If the switch is also switching packets to the router from other devices then buffer space in the switch will be used. The router buffers these packets while it forwards them over the internet to the other nodes.

All is fine if there is no buffer filling up.

Buffer sizing vs record sizing

The basic data flow discussion showed that routers (and switches) have buffers to queue up packets to be sent. The buffers are essential, yet in certain circumstances they can be exhausted, at which point the only remedy is for packets to be dropped.

For Autonomi one of the basic parameters for storage is the max record (chunk) size. For any file or record being uploaded, the maximum parcel of data in a record is the max record size. Many records may be smaller than this.

Thus when a file is requested there will be a minimum of 3 nodes supplying the various records (chunks) making up the file. This is due to SE (Self Encryption) producing a minimum of 3 chunks, with each chunk having a maximum size of the max record size.

From this, the average number of records held by any one node on the network will be a function of the total data stored and the max record size. It is not an exact function, due to the minimum of 3 chunks per file, so there will always be more records stored than total data stored divided by max chunk size.

The significance of this is that the number of nodes sending data when a given file is downloaded is higher (or the same, for a small file) when the max record size is smaller. The follow-on effect is that for each home node the buffers will fill less than with larger max record sizes.
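
As a rough illustration of that relationship, here is a sketch in Rust. Self Encryption's real chunking rules have more detail; this only captures the "minimum of 3 chunks, each at most max record size" point above, and the names are illustrative:

// Rough estimate of how many records (chunks) a file becomes.
fn chunk_count(file_bytes: u64, max_record_size: u64) -> u64 {
    // Ceiling division: how many max-sized chunks cover the file.
    let by_size = (file_bytes + max_record_size - 1) / max_record_size;
    by_size.max(3) // SE always produces a minimum of 3 chunks
}

fn main() {
    // A 10MB file: 3 chunks at a 4MB max, 20 chunks at a 512KB max,
    // so a smaller max record size spreads a download over more nodes.
    println!("{}", chunk_count(10_000_000, 4_000_000)); // 3
    println!("{}", chunk_count(10_000_000, 512_000));   // 20
}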

How can max record size cause issues with home networks?

Issues arise when more records are requested from one or multiple nodes on the LAN than the router has memory buffers for. When there is no more buffer space available then, this being UDP, the following packets are simply dropped.

What is the buffer size in a router?

This depends on the router model and the purpose of the router. Here is an example of 2 routers with different purposes. Mikrotik routers were chosen because they actually publish the amount of RAM. This RAM is split up into send & receive buffers, indexes, the NAT table, operational variables and any other requirements.

(Semi) Infrastructure router


From the specifications the RAM is 64MB.
For high speed in and high speed out the buffering will always be minimal, and thus the RAM requirement is not as high as for applications where there is a much greater mismatch between LAN & WAN.

Specialised “home lab” type of router


Here the RAM is much higher to allow large buffers and NAT tables. It was specially made for many types of operations.

ISP supplied router

No specs are given here, since there are hundreds of possible routers, ranging in RAM from 16MB to 64MB.

This is one reason why their NAT tables are relatively small and, connection wise, they only support 5 to maybe 40 or 50 nodes.

From the discussions above it can be seen that the limited buffer size on most ISP supplied routers will also be a limiting factor on the number of nodes that can be run.

What does this all mean?

This discussion has deliberately not gone into too much depth and detail, but provides enough of a brief to examine what is happening on home networks when max record size is changed. It is not meant to provide a comprehensive set of reasons for a recent collapse of the network, but to give developers some tools to work out this one area of sizing when planning for the current home internet setup, one that will persist for years to come in the average home.

Home ISP supplied routers are woeful in their internal specifications, in such things as

  • routing throughput “pps” (packets per second) and data (Mbps)
  • RAM 16MB to 64MB with most not at 64MB
    • NAT table space
    • send & receive buffers
  • etc

For most home situations the overall data flow is receive, which requires little buffer space. Upload is rarely used in comparison, for anything more than requesting download data/packets and occasionally a 5Mbps type of video "call". With internet connection speeds much lower than LAN speeds there is no significant buffering for received packets, and if upload is above 8Mbps then there is no buffering for those video chat sessions either.

This is why ISPs get away with supplying these low spec routers. For those with fibre, the router typically has better specs. In those cases, though, with the WAN (internet) connection being on par with the LAN speeds, router buffering will be very minimal as well; thus the ISPs get away with relatively low specs even for a premium internet connection.

In all cases though, the users who have reported back all show that their ISP supplied router had issues with the NAT table size being unable to support enough connections for more than about 20 to 50 nodes.

From this discussion it can be seen that for ISP supplied routers the record size can also be a limiting factor on the number of nodes possible, and may explain why the 4MB max chunk size meant many who could run 5 nodes were unable to reliably run even 1 node continuously, and mobile connections could not run any at all.

So how can a good size for max record be determined?

It depends on the target makeup of the network desired.

If a network is wanted with nodes mainly in professional locations, such as data centres or large businesses with premium networking/internet, then the larger the max chunk size the better, since the switches and routers will have the RAM, the network infrastructure having WAN speeds comparable with any LAN speeds.

If a network is wanted with nodes mainly in homes with the average ISP supplied router/switch, then a smaller max chunk size is needed, one that will not overflow the internal router buffers whenever more than a couple of chunks are being uploaded.

What size is suitable for maximising home nodes? From the test networks, 1/2MB worked well for the loads being placed on the network. Compared to a 4MB max chunk size there is 1/8 the buffer needed for the same number of chunks, and coinciding chunk uploads through the one router are also less likely since a chunk is fully uploaded 8 times quicker. This represents up to 1/64th the buffer load on the home router.
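
To make the 1/8 and 1/64 figures concrete, a small sketch under the same ~40Mbps uplink assumption used earlier:

// Compare router buffer load for 4MB vs 1/2MB max chunk sizes.
fn main() {
    let (big_mb, small_mb) = (4.0_f64, 0.5_f64);
    let wan_mb_per_sec = 5.0; // ~40Mbps uplink

    // Buffer needed per in-flight chunk scales with chunk size: 1/8.
    println!("buffer per chunk ratio: {}", big_mb / small_mb); // 8

    // A smaller chunk also drains 8x sooner, shrinking the window in
    // which two uploads can coincide by the same factor...
    println!("drain: {:.2}s vs {:.2}s",
             big_mb / wan_mb_per_sec, small_mb / wan_mb_per_sec); // 0.80s vs 0.10s

    // ...compounding to the "up to 1/64" buffer load figure.
    println!("combined: 1/{}", (big_mb / small_mb).powi(2)); // 1/64
}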

Conclusions

Data flow through the home network to the internet (upload) relies heavily on the buffer in the router, because with UDP the router is not able to regulate the flow, and it drops packets if the buffer space is exhausted.

Multiple nodes uploading chunks at approximately the same time will use more of that buffer space. The larger the chunk, the more buffer space required.

A typical ISP router with 16MB to 64MB of memory will not allocate all of that memory to buffer space, since many tables and buffers have to live in that RAM. As most ISP routers do not have upmarket specs and likely have only 16 or 32MB of RAM, the buffer space could be as low as 6MB on a 16MB router and 12MB on a 32MB router. It is pretty obvious that any network with chunk sizes above 2MB will max out many home routers once 2 chunks have to be uploaded at the same time, and a better ISP router at maybe 4 or even 6 chunks at the same time.

Thus it can be seen that 4MB max chunk sizing will exclude many home routers.
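
A sketch of that capacity estimate; the 6MB and 12MB buffer figures are the assumptions stated above, not measured values:

// How many chunks can be in flight before an assumed router buffer
// overflows and, UDP having no flow control, packets get dropped?
fn max_concurrent_chunks(buffer_bytes: u64, chunk_bytes: u64) -> u64 {
    buffer_bytes / chunk_bytes
}

fn main() {
    // (total RAM in MB, assumed buffer share in bytes)
    for (ram_mb, buffer) in [(16, 6_000_000u64), (32, 12_000_000u64)] {
        println!(
            "{}MB router: {} x 4MB chunks, {} x 512KB chunks",
            ram_mb,
            max_concurrent_chunks(buffer, 4_000_000),
            max_concurrent_chunks(buffer, 512_000),
        );
    }
    // 16MB router: 1 x 4MB chunks, 11 x 512KB chunks
    // 32MB router: 3 x 4MB chunks, 23 x 512KB chunks
}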

Addendum, and a solution if I read the QUIC code correctly

From the discussion it can be seen that the issue is flow control on the record data blocks being sent over UDP. If there were a way to tell nodes to only send the record in small amounts then the buffer issue reduces to a manageable amount.

Solution?

As mentioned in the write up, there is a QUIC constant that controls the maximum amount of in-flight stream data.

            // Ensure that one stream is not consuming the whole connection.
            max_stream_data: 10_000_000,

Just set it to 500KB, or even 250KB for potato routers, and a 4MB or even 8MB max chunk size should be fine without causing routers to exhaust buffers under higher activity like replication & churning.

Being at the default of 10MB is the problem.
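
For reference, a minimal sketch of what the change might look like, assuming the max_stream_data quoted above is the public field on rust-libp2p's quic::Config (I have not verified the exact API surface the node code exposes):

use libp2p::{identity::Keypair, quic};

// Sketch: build a QUIC config with a smaller per-stream flow-control
// window, so a sender can only have ~500KB in flight on a stream
// before it must wait for credit from the receiving node.
fn home_friendly_quic_config(keypair: &Keypair) -> quic::Config {
    let mut config = quic::Config::new(keypair);
    // Default is 10_000_000 (10MB); 500KB (or 250KB for potato
    // routers) keeps home router buffers from being exhausted.
    config.max_stream_data = 500_000;
    config
}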

42 Likes

For the record, on setting max_stream_data, Brave AI Assisted research returned the following for Rust on the MS WIN11 build…

Configuring Quic’s Max Stream Data

Configuring Max Stream Data for QUIC Protocol on MS Win11 using Rust

To configure the max_stream_data variable for the QUIC protocol on MS Win11 using the Rust programming language, you’ll need to utilize the msquic library, a cross-platform, C implementation of the IETF QUIC protocol exposed to C, C++, C#, and Rust.

msquic Library

First, ensure you have the msquic library installed and imported in your Rust project. You can do this by adding the following dependency to your Cargo.toml file:

[dependencies]
msquic = "0.4.0"

Configuring Max Stream Data

To configure the max_stream_data variable, you’ll need to use the QuicClientConnectionOptions struct provided by the msquic library. Specifically, you’ll set the max_bidirectional_streams and max_unidirectional_streams parameters to control the maximum number of streams allowed for bidirectional and unidirectional communication, respectively.

Here’s an example code snippet demonstrating how to configure max_stream_data:

use msquic::{QuicClientConnectionOptions, QUIC_PARAM_CONN_PEER_BIDI_STREAM_COUNT};

let mut options = QuicClientConnectionOptions::new();
options.set_param(QUIC_PARAM_CONN_PEER_BIDI_STREAM_COUNT, 128); // Set bidirectional stream limit to 128
options.set_param(QUIC_PARAM_CONN_PEER_UNIDI_STREAM_COUNT, 256); // Set unidirectional stream limit to 256

// Create a Quic client connection with the configured options
let connection = msquic::QuicClientConnection::new(options);

In this example, we’re setting the max_bidirectional_streams parameter to 128 and max_unidirectional_streams parameter to 256. You can adjust these values according to your specific requirements.

Note: The max_stream_data variable is not a direct configurable option in the msquic library. Instead, you control the maximum stream count using the max_bidirectional_streams and max_unidirectional_streams parameters, as demonstrated above.

Additional Considerations

When working with QUIC streams, it’s essential to consider the following:

  • The QUIC protocol allows a maximum number of streams equal to 2^62. As there are 4 unique stream types, the maximum number of streams is 2^60, per stream type.
  • MsQuic takes care of updating the maximum stream count for the peer as old streams get shut down.
  • You can configure the unidirectional and bidirectional limits separately using the set_param method.
  • The default value for these limits is 0. If you want to allow the peer to open any streams, you must set a value.

By following these guidelines and configuring the max_stream_data variables using the msquic library, you’ll be able to control the maximum number of streams for your QUIC-based application on MS Win11 using Rust.

Follow up

  • doc.rust-lang.org: Data Types - The Rust Programming Language
  • docs.quic.tech: Connection in quiche - Rust
  • quicwg.org: Applicability of the QUIC Transport Protocol
  • github.com: [QUIC] Decide how to handle QUIC stream limits · Issue #32079 · dotnet/runtime
  • github.com: msquic/docs/Streams.md at m

7 Likes

Boom!

That’s the benefit right there of an active, involved and passionate community!

I have nothing to add except that something felt 'a little off' about the recent 4MB chunk driven journey to the edge of the cliff and beyond, and this might be the best explanation and analysis of how we arrived at the bottom of the canyon last testnet. Twice.

A humble suggestion would be that this is read and digested by anyone associated with the networking design.

My personal feeling is that results of testnets are being skewed by what works - or seems to work until things go bad - for large node runners, and this is pulling results away from what is realistic for home node runners.

13 Likes

Yes, utilising QUIC's flow control seems to have been the missing element. Set at the 10MB default it was effectively never exercised for our UDP traffic. @joshuef Flow control gives TCP-style regulation but in amounts larger than 1500 bytes (10.5KB if using 7-packet blocks). 10MB is great for high performance devices and internet (aka data centre quality) but no good for home networks. 512KB max chunk sizing seems to have been OK from the earlier tests, but without quality metrics even a 512KB window for QUIC flow control might be too high for optimal home setups.

And what's great about the flow control is you get TCP-style error recovery without the whole chunk/record being resent, just the window's worth. This means the time taken to send a 4MB, 8MB or even 16MB chunk does not increase significantly, yet there is a great saving in time if an error (packet loss) does occur.
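
A quick sketch of that saving, using the ~40Mbps uplink from the write up and treating a loss event as costing one full resend of either the whole chunk or just one 512KB window (real QUIC loss recovery is finer-grained than this):

// Time cost of one loss event: resend the whole chunk vs resend
// only one flow-control window's worth of data.
fn main() {
    let wan_bytes_per_sec = 5_000_000.0; // ~40Mbps
    let window = 512_000.0;              // 512KB QUIC window

    for chunk in [4_000_000.0, 8_000_000.0, 16_000_000.0] {
        let clean = chunk / wan_bytes_per_sec;
        let resend_chunk = clean * 2.0; // whole chunk sent again
        let resend_window = clean + window / wan_bytes_per_sec;
        println!(
            "{:.0}MB chunk: {:.1}s clean, {:.1}s resend chunk, {:.1}s resend window",
            chunk / 1_000_000.0, clean, resend_chunk, resend_window
        );
    }
    // 4MB:  0.8s clean, 1.6s resend chunk, 0.9s resend window
    // 8MB:  1.6s clean, 3.2s resend chunk, 1.7s resend window
    // 16MB: 3.2s clean, 6.4s resend chunk, 3.3s resend window
}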

20 Likes

A few more facts on QUIC flow control to consider on MS WIN 11, to optimize safenode uploads to the Internet.

I asked the question of Brave AI Assisted Search:

“What parameter settings in MS Win 11 OS best optimize QUIC protocol UDP uploads from LAN local 1gigabit/sec Ethernet to ISP 50mbit/sec Internet Service?”

and got this comprehensive answer, which looks accurate given the sources referenced. The results are displayed below as a 're-combulated' Mistral AI generated answer. :wink:

Optimizing QUIC UDP Uploads Windows 11

Based on the provided search results and considering the specific scenario of QUIC protocol UDP uploads from a LAN local 1gigabit/sec Ethernet to an ISP 50mbit/sec Internet Service in MS Win 11 OS, here are the recommended parameter settings for optimization:

  1. MsQuic library configuration:
  • Enable MsQuic by setting the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\Quic to 1 (DWORD).
  • Set the QUIC_MAX_CONCURRENT_STREAMS registry value to a reasonable number, such as 32 or 64, to balance concurrent streams and avoid overwhelming the ISP's 50mbit/sec connection.
  2. Network interface card (NIC) settings:
  • Ensure the NIC is configured for jumbo frames (MTU 9000) to maximize Ethernet bandwidth. This can be done through the NIC's advanced settings or by setting the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Network\{4D36E96D-E325-11CE-BFC1-08002BE10318}\001\Parameters\TcpWindowSize to 14600 (decimal).
  3. TCP/IP settings:
  • Set the ReceiveWindow (RWND) value to a reasonable size, such as 128KB or 256KB, to avoid congestion and optimize QUIC performance. This can be done through the Windows Network and Sharing Center or by setting the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Network\{4D36E96D-E325-11CE-BFC1-08002BE10318}\001\Parameters\TcpWindowSize to the desired value.
  • Disable TCP window scaling (set TcpWindowScaling to 0) to prevent TCP from reducing the window size based on the remote endpoint's advertised window size.
  4. QUIC-specific settings:
  • Enable QUIC congestion control by setting the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\Quic\QuicCongestionControl to 1 (DWORD). This will allow QUIC to adapt to network conditions and avoid congestion.
  • Set the QUIC_MAX_RETRANS_TIMEOUT registry value to a reasonable value, such as 100-200 milliseconds, to balance retransmission timeouts and avoid excessive retransmissions.

Note: These settings are based on the provided search results and may require experimentation to find the optimal combination for your specific environment. Additionally, it’s essential to monitor network performance and adjust these settings as needed to avoid congestion and ensure optimal QUIC protocol UDP uploads.

Here’s a summary of the recommended settings:

Registry Key                | Value           | Description
MsQuicEnabled               | 1               | Enable MsQuic library
QUIC_MAX_CONCURRENT_STREAMS | 32 or 64        | Balance concurrent streams
TcpWindowSize               | 14600 (decimal) | Enable jumbo frames
ReceiveWindow               | 128KB or 256KB  | Optimize QUIC performance
TcpWindowScaling            | 0               | Disable TCP window scaling
QUICCongestionControl       | 1               | Enable QUIC congestion control
QUIC_MAX_RETRANS_TIMEOUT    | 100-200 ms      | Balance retransmission timeouts

Please verify these settings and adjust as necessary to ensure optimal QUIC protocol UDP uploads from your LAN to the ISP’s 50mbit/sec Internet Service.

Follow up

  • techcommunity.microsoft.com: What's QUIC? - Microsoft Community Hub
  • en.wikipedia.org: QUIC - Wikipedia
  • auvik.com: What is QUIC? Everything You Need to Know | Auvik
  • helpdeskgeek.com: How to Optimize TC

5 Likes

imo this is a job for the Maidsafe team, to at least consider the collective wisdom of Mistral AI, as it is the source of the Brave AI Search, which imo does a pretty credible job of curating decent sources on matters of MS WIN 11 sorcery.

Which gets me thinking creepy Nathan Myhrvold, professed architect of MS Windows, purveyor of Epstein Island several times, (Bill Gates buddy buddy too) surely has his spot reserved, when the time comes for him to depart this earth, right beside the coal furnace with his name engraved on his own personal coal shovel.

A deserving job for such a being, serving his satanic master for the rest of eternity, non? :wink:

2 Likes

Remember the AI is crawling the Google docs and the multitude of browser orientated settings. Autonomi is a specialised case that doesn't operate like a browser. What the team has now seems good, except for the lack of flow control because the max window size is bigger than anything they are transferring.

We are after non-technical people using this, so no setting of HW etc.

Also that AI has not considered buffers in the router either.

Best to get it working without requiring any Windows "optimisation" by the user. If a user wants to optimise settings then go for it.

Those TCP/IP settings are meaningless since it's using UDP, and there is no need to set jumbo frames: UDP is a rapid fire protocol and the savings would be in the tiny percentage range, considering internet routing is not optimised for jumbo packets but for the 99% of traffic sized at the default size.

From my reading it is the receiving node that dynamically sets the concurrent streams for each connection; only the max is configurable. But yes, for a node with 200+ peers the concurrent streams should be 1, or maybe 2 for each connection if testing shows it's OK.
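
If that reading is right, the stream cap would presumably sit alongside the window size in the same config. A sketch only, assuming the max_concurrent_stream_limit field that accompanies max_stream_data in rust-libp2p's quic::Config:

use libp2p::{identity::Keypair, quic};

// Sketch: cap streams per connection as well as the per-stream window.
fn capped_quic_config(keypair: &Keypair) -> quic::Config {
    let mut config = quic::Config::new(keypair);
    config.max_stream_data = 500_000;       // small per-stream window
    config.max_concurrent_stream_limit = 2; // few streams per peer connection
    config
}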

6 Likes

keep in mind QUIC handles both TCP and UDP concurrently (likely round robin), it’s also bidirectional in its flow control configuration, having different settings for each direction.

imo the safenode 'home' network setting needs to have its own YAML/JSON configuration file.

What is clear to me is that in order to get 5 or 10 nodes working on an 8GB RAM MS Win 11, 4 core 8 thread i5 10th gen slow horse, the payload size needs to be set to 256 kilobit in both directions,

given the mix of traffic the safenode generates as it first sets up, learns about neighbours, processes a lot of relays supported by GOSSIP msg exchanges, and then occasionally writes to and reads from disk, dealing with copy jobs and eventually quotes and storing for instances of file uploads, after getting paid.

In short, a lot of different comms types with different payload size characteristics for each safenode comms 'channel' as it applies to QUIC UDP egress/ingress.

Thoughts on this size setting of 256kb 'both ways' for QUIC to help optimize the 'Home' setting, and on node-launchpad perhaps having separate YAML/JSON configuration files for each type of meta-setting (ie 'home' et al)?

Anyone?

3 Likes

Yes but Autonomi uses only UDP

Well, I had suggested 512KB or 256KB, so I am already in agreement here. 512KB was shown to work with 512KB max chunks, so …

8 Likes

The safenode code supporting TCP for transfers inside, say, a colo DC, to reach colo operator shared storage appliances operating in a SAN array, might be an option at some point, as the price can be quite cheap on long term storage contracts. ROCE/RDMA over iSCSI IP is the other option in the DC.

It means a node operator could rent, for instance, an 8 or 12 array of low power Pi5s placed on a secured shelf, and have the Pi5s, via say OMV or similar, store files on cheap storage as a service, where the DC operator plugs a couple of leaf connections at 10Gbit into a small in-tray local switch, which then dual homes those connections into a separate pair of spine TOR switches, to rent cheap RAID protected block storage to the shelf tenant.

Another way to cost effectively scale up a 'pod' of safenodes, especially if one is running a container setup, which makes updates relatively easy.

One newer provisioning program to look at, for potentially provisioning containerized safenodes to say an array of Pi5s running Docker in the colo DC, is dstack.

dstack was birthed a couple of years ago out of TUM in Munich, and the dstack author is now running a commercially supported FOSS project for dstack out of Berlin. It's a pure Docker play, no Kubernetes required. Very good at scheduling 'fleets of GPUs' to service a mix of different workloads, but dstack can be used for anything, so why not use it to deploy containerized safenodes.

1 Like

Yes, the plan is to use TCP and IPv6 down the track, but not in the foreseeable future while they are still debugging current features and implementing other needed features like folders, the native token etc.

2 Likes

Do you really want them to make it easier and more viable to run safenodes in datacentres? That’s the kind of centralisation they are trying to avoid.

3 Likes

Seriously impressed here @neo This is incredible work and a great explainer at the same time. Amazing cc @joshuef

20 Likes

Considering all my grammatical errors and all the "then" instead of "than", you seem to have gotten through my rough "report".

Thank you for the kind words. This was definitely a report designed to help bring understanding (wisdom) as a result of working through these various problems in “another life” :smile:

I was attracted to SAFE way back because of the similarities to the 5 year project back in the 70's/80's for a major telco, doing store and forward of (small) data blobs between computers. Basically a node network, with routing etc, once many of them were installed. Many of these issues were just a part of making it all work: the protocol writing, the hardware design, the RTOS written, and so on. This is perhaps one reason I have quite an understanding of what was perhaps happening behind the scenes in the home network.

21 Likes

There is a twist here though:

Only ghetto-tier-datacenter autonomi-operators would survive.

Kind of similar to how you can't profitably mine bitcoin in an Equinix DC because you're paying (a lot) for redundant power, storage, expensive cooling solutions, 24/7 smart hands, Juniper core routers etc etc that are just overkill for running autonomi nodes.

If I were to consider rackspace for running autonomi nodes my only consideration would be price. If someone would offer a half burnt down cow stable with 10/40gbps connectivity to the nearest internet I'd go for it, and there's no way that Google or AWS would be able to compete with absolute rock bottom prices. They have way too much overhead to do something as specific as running bitcoin miners or autonomi nodes.

7 Likes

I realise it will be very difficult to run nodes economically on the live network. It’s more this testing phase I’m worried about. I looked at it. The numbers don’t work out well.

My point to the parent poster was doing something that makes it easier to concentrate nodes in DCs would be counterproductive in my view.

3 Likes

Agreed, the network should be actively optimised for home nodes. If something also benefits DC nodes then fine, don’t deliberately work against DC nodes, but also don’t do something deliberately for DC nodes too in my opinion.

10 Likes

Nice one. Just had a wee read now, thanks @neo

This should be something we can get in in the not-too-distant future.

Do you perchance have an idea of how to best measure the change?

12 Likes

As suggested, it’ll show up in a few metrics and typically when there is some change, like churning, replication with more than just a slightly more chance of nodes behind a home router sending data than what would be when basically idle.

I doubt very much you'll see it when testing in-house. The DO droplets are just too well connected, with higher performance networking.

I expect it will show up in shunning rates, in how the network survives when 10% of a >50K home node network goes offline within minutes, and in other metrics that show various connectivity stats.

It's very difficult (impossible really) to measure the home router dropping packets and its buffer filling, since those metrics do not exist for nearly all of them. But that is where the effect is showing.

If you can see the data block retry figures then these should help determine when it's happening. But again, most unlikely in DO droplets. My Mikrotik router did show some dropped packets, but it probably has well over 200MB it can use for buffering (packets destined for the internet).

Maybe try one network for 2 days and ask Mightyfool to drop his 10K nodes and see the results over 2 hours; then try one network with a small QUIC window size for 2 days and get him to drop his 10K nodes again, both times while uploading plenty of files. This is a sort of brute force method.

In any case the QUIC max window size should be at most 512KB, and maybe even 128KB as suggested by another. This should be done no matter the results of a test, since it's good practice when using home nodes. If the nodes were all VPS then a much higher value would be fine, but we are optimising for home nodes, and home routers have NAT sizing issues and, just as important with >1/2MB chunk sizes, buffer limits.

I expect that if you wanted to test 8MB or 16MB max chunk sizes then these would now work when the window size is small, which may also be a way to test the window size effect.

One effect I didn’t get around to pointing out is that setting a small window size <= 512MB is that it also allows the router to adequately handle multiple nodes better and other traffic going through the home routers. That also is another reason for 128KB

16 Likes

Put some nodes behind any Cisco enterprise router with a 20Mbit line (or whatever speed you want to test). Cisco routers can give you a lot of statistics about buffers, drops, etc. Many of them even allow you to set different buffer sizes, so you can simulate devices with different amounts of memory.
It is probably possible with other brands too; Cisco is what I have worked with.

6 Likes