For those encountering problems with QuicNet, I offer some pictorial advice spotted in the Budapest subway
What bandwidth usage are you seeing? Currently my 200 nodes are using more bandwidth than 600+ nodes in the last testnet.
30 nodes chewed through about 500 GB in 24 hours in my setup
Average for the 30-node machines:
24hr: RX 140.39 GB, TX 161.85 GB, total 302.24 GB
I have one machine (also 30 nodes) at:
24hr: RX 288.14 GB, TX 317.20 GB, total 605.34 GB
I am thinking that nodes at home are going to become extremely important in the upcoming testnets, for the simple reason that quotas in datacentres are going to be exceeded very quickly because of the fast uploads we now have, i.e. expense.
Nodes at home allow the load to be spread across a lot more machines (homes), and those on "unlimited" internet plans effectively have larger quotas too.
@joshuef ?
Something is not right with routing updates, I think. I stopped all the nodes on one of my machines more than an hour ago and I am still getting around 4 Mbps (100 pps) of Safe Network traffic towards that machine, from many different source IP addresses.
I have only one public IP, so I forward port ranges to different machines. Is it possible that something somewhere only checks whether the IP is alive instead of the IP:port?
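To illustrate what I'm asking (this is just a hypothetical sketch, not the actual routing code), the difference would be something like:

```rust
use std::collections::HashSet;
use std::net::{IpAddr, SocketAddr};

// Hypothetical liveness table, purely to illustrate the question:
// is a peer tracked by its full IP:port, or just by its IP?
#[derive(Default)]
struct LivenessTable {
    alive_addrs: HashSet<SocketAddr>, // distinguishes nodes on separate forwarded ports
    alive_ips: HashSet<IpAddr>,       // lumps everything behind one public IP together
}

impl LivenessTable {
    fn mark_alive(&mut self, peer: SocketAddr) {
        self.alive_addrs.insert(peer);
        self.alive_ips.insert(peer.ip());
    }

    // If routing only ever checks this, traffic keeps flowing to a stopped
    // node as long as any sibling behind the same IP still responds.
    fn ip_is_alive(&self, peer: &SocketAddr) -> bool {
        self.alive_ips.contains(&peer.ip())
    }

    // The check I'd expect instead: the exact IP:port must still be live.
    fn node_is_alive(&self, peer: &SocketAddr) -> bool {
        self.alive_addrs.contains(peer)
    }
}

fn main() {
    let mut table = LivenessTable::default();
    let running: SocketAddr = "203.0.113.7:12001".parse().unwrap(); // node still up
    let stopped: SocketAddr = "203.0.113.7:12002".parse().unwrap(); // node on the machine I shut down

    table.mark_alive(running);

    assert!(table.ip_is_alive(&stopped));    // looks "alive" if only the IP is checked
    assert!(!table.node_is_alive(&stopped)); // correctly seen as gone if IP:port is checked
}
```

If it's the first kind of check, that would explain why my stopped machine keeps receiving traffic while other machines behind the same router are still running.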
Having trouble uploading, log attached.
safe.zip (2.3 MB)
Yeh, I have that same feeling.
We're also encrypting now, which means chunks are hanging around in nodes longer than before. Which also means that replication might be a significantly heavier process now.
Aye, it may basically be fewer nodes per droplet to sort this, alongside most likely further mem improvements.
We're close to having the omni distribution there. At that point we'll switch off the faucet (at least for a while).
For sure.
FYI all, it's looking like the higher-than-anticipated mem is causing node restarts, and there's a bug there w/r/t nodes' keys, which is essentially what's causing the upload issues.
We can't really manage that on this testnet, but we'll be away to fix the restart issues (which should hopefully solve uploads), and then focusing on mem a bit on the quic front, I think.
So I'll be bringing this testnet down later today! Thanks everyone for getting so deeply into it and all the log sharing and debugging!
Took 23 minutes to upload 100 x 5 KB files for me… something is inefficient somewhere.
Sadly there are lots of parts that could be causing problems here. As noted, uploads are shaky due to the node restart issue. I'm pretty doubtful it's the encryption layer that's causing issues here. (Could be though!)
Is this the same problem we had last year?
No, a different one.
With regard to: feat(faucet): fetch maid pubkeys to enable encryption of distributions
At what stage does the safe folder (Windows) get fully encrypted, since most of the keys and wallet files are plain text?
I don't understand what the reboot/restart issue is. Why are nodes restarting and causing the incorrect keys issue? Nobody else appears to be asking, so did I miss a memo?
The nodes participating in the testnet are configured to run as services via systemd. The service definition instructs that the service be restarted if it dies. So I think the processes are dying because of a memory issue, then the service manager restarts them. I think I'll have to follow up regarding an issue with keys, though. The nodes should restart with the same root directory, and hence the same key. Unless this is some other problem altogether.
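For anyone curious, the restart behaviour is just the standard systemd directive, roughly along these lines (unit name, paths and flags here are placeholders, not the exact unit we deploy):

```ini
# Illustrative only -- not the deployed unit file.
[Unit]
Description=Safe Network node %i

[Service]
# Placeholder binary path and flags.
ExecStart=/usr/local/bin/safenode --root-dir /var/safenode/%i
# The directive in question: if the process dies (e.g. OOM-killed),
# systemd restarts it with the same root dir, so it should come back
# with the same key.
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```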
This and an initially misconfigured genesis node were enough to trip us up.
So:
- our tolerance to a bad node at the client is too low
- node restarts w/ systemd need a fix
- memory issues to be addressed
Thanks gentlemen, that makes far more sense now.
True, and cool that with these tests in the wild we already have "sabotaging nodes".
Unintentionally cool, but yeh! Good to be seeing this and so nice to be able to see what's up pretty readily.
We're also eyeballing encryption as a potential cause of the mem issues. @qi_ma noted MaidSafe nodes were actually running the release prior to the OP, which had no encryption enabled! And we have not seen as wild an increase in mem as other nodes (albeit an increase, but no nodes seen above ~600 MB, which is substantially different to the 4 GB @josh reports!)
So a PR is in to disable that for any next-net, which we'll get up probably as soon as that restart issue is nailed down and we can see where we are!
Don't sell me short here, Josh, I have a sixer too!