QuicNet [30/01/24 Testnet] [Offline]

For those encountering problems with QuicNet, I offer some pictorial advice spotted in the Budapest subway

5 Likes

What bandwidth usage are you seeing? Currently my 200 nodes are using more bandwidth than 600+ nodes did in the last testnet.

4 Likes

30 nodes chewed through about 500 GB in 24 hours in my setup

7 Likes

Average for 30-node machines:
24 hr: RX 140.39 GB, TX 161.85 GB, total 302.24 GB

I have one machine (also 30 nodes) at:
24 hr: RX 288.14 GB, TX 317.20 GB, total 605.34 GB
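
(For anyone wanting to gather comparable per-interface totals, vnstat is one option; the interface name below is an assumption, so swap in your own.)

```
# install the collector (Debian/Ubuntu shown) and let it run for a while
sudo apt install vnstat

# daily RX/TX/total for a given interface (replace eth0 with yours)
vnstat -i eth0 -d
```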
5 Likes

I am thinking that nodes at home are going to become extremely important in the upcoming testnets, for the simple reason that quotas in datacentres are going to be exceeded very quickly because of the fast uploads we now have, i.e. expense.

Nodes at home not only allow this to be spread across a lot more machines (homes), but also bring larger quotas for those on "unlimited" internet plans.

@joshuef ?

6 Likes

Something is not right with routing updates, I think. I stopped all nodes on one of my machines more than an hour ago and I am still getting around 4 Mbps (100 pps) of safenetwork traffic towards that machine, from many different source IP addresses.

I have only one public IP, so I forward port ranges to different machines. Is it possible that something somewhere only checks whether the IP is alive, instead of the full IP:port?

10 Likes

Having trouble uploading, log attached.

safe.zip (2.3 MB)

3 Likes

Yeh, I have that same feeling.

We’re also encrypting now, which means chunks are hanging around in nodes longer than previously, too.

Which also means that replication might be a significantly heavier process now :thinking:.

Aye, it may basically come down to fewer nodes per droplet to sort this, alongside, most likely, further memory improvements.

We’re close to having the omni distribution there. At that point we’ll switch off the faucet (at least for a while).

For sure. :+1:


FYI all, it’s looking like the higher-than-anticipated memory use is causing node restarts, and there’s a bug there w/r/t nodes’ keys, which is essentially what’s causing the upload issues.

We can’t really manage that on this testnet, but we’ll be away to fix the restart issues (which should hopefully solve uploads), and then be focusing on memory a bit on this QUIC front, I think.

So I’ll be bringing this testnet down later today! Thanks everyone for getting so deeply into it and all the log sharing and debugging! :muscle: :bowing_man:

18 Likes

Took 23 minutes to upload 100 x 5 KB files for me… so something is inefficient somewhere.

2 Likes

Sadly there are lots of parts that could be causing problems here. As noted, uploads are shaky due to the node restart issue. I’m pretty doubtful it’s the encryption layer that’s causing issues here. (Could be though!)

7 Likes

Is this the same problem we had last year?

No, a different one.

1 Like

With regard to: feat(faucet): fetch maid pubkeys to enable encryption of distributions

At what stage does the safe folder (Windows) get fully encrypted, since most of the keys and wallet files are plain text?

2 Likes

I don’t understand what the reboot/restart issue is. Why are nodes restarting and causing the incorrect-keys issue? Nobody else appears to be asking, so did I miss a memo?

3 Likes

The nodes participating in the testnet are configured to run as services via systemd. The service definition instructs that the service be restarted if it dies, so I think the processes are dying because of a memory issue and the service manager then restarts them. I think I’ll have to follow up regarding the issue with keys, though. The nodes should restart with the same root directory, and hence the same key, unless this is some other problem altogether.
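
For reference, a minimal sketch of that kind of unit definition, assuming a service named safenode1 with illustrative paths and user (the actual testnet unit may well differ):

```
[Unit]
Description=safenode instance 1
After=network-online.target

[Service]
# restart the process whenever it exits abnormally, e.g. after being OOM-killed
Restart=on-failure
RestartSec=5
User=safe
# the working directory holds the node's root dir (and hence its key) across restarts
WorkingDirectory=/var/safenode/node1
ExecStart=/usr/local/bin/safenode

[Install]
WantedBy=multi-user.target
```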

7 Likes

This, and an initially misconfigured genesis node, were enough to trip us up.

So:

  1. Our tolerance to a bad node at the client is too low.
  2. Node restarts w/ systemd need a fix.
  3. Memory issues to be addressed.
17 Likes

Thanks, gentlemen :+1: it makes far more sense now.

10 Likes

:face_with_monocle: True, and cool that with these tests in the wild we already have "sabotaging nodes".

6 Likes

Unintentionally cool, but yeh! Good to be seeing this and so nice to be able to see what’s up pretty readily.

We’re also eyeballing encryption as a potential cause of the memory issues. @qi_ma noted MaidSafe nodes were actually running the release prior to the OP, which had no encryption enabled! And we have not seen as wild an increase in memory as other nodes (albeit an increase, but no nodes seen above ~600 MB, which is substantially different to the 4 GB @josh reports!)

So a PR is in to disable that for any next-net, which we’ll get up probably as soon as that restart issue is nailed down and we can see where we are!
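
(For anyone comparing per-node memory on their own boxes, a quick way to eyeball resident set size; the process name "safenode" is an assumption, adjust to your binary.)

```
# list resident memory (RSS, in KB) of running safenode processes, largest first
ps -C safenode -o pid,rss,etime,args --sort=-rss
```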

14 Likes

Don’t sell me short here, Josh, I have a sixer too! :stuck_out_tongue_winking_eye:

4 Likes