IntolerantNodeNet [18/09/23 Testnet] [ Offline ]

It does indeed! :slightly_smiling_face:

3 Likes

Here is a suggestion based on something I’ve encountered: if you are trying to start up a lot of nodes on a machine, bear in mind that there is a ‘hump’ in the amount of RAM that a safenode uses as it starts up.

I’d been running 50 nodes on an AWS t4g.medium, which has 4GB of RAM. There was plenty left after a whole day of running, because the memory leak seems to be partially fixed. So when it crashed because it ran out of disk space (doh!), I thought I’d start up 100 this time, figuring it could accommodate that. I started the 100 and went to bed.

I found in the morning that there were only 61 nodes running. It looked like the cull of nodes had happened really soon after they were started. I started them again and watched what happened. As a node starts, its RAM usage balloons from about 40MB to more than 100MB over a few minutes, before falling back to a much more reasonable level after about 5 minutes.

I’d been starting them with a 5-second delay using:-

#!/bin/bash

# Start 100 safenodes, each with its own log and root dir, pausing between launches.
for i in {1..100}
do
    SN_LOG=all "$HOME/.local/bin/safenode" \
        --log-output-dest="$HOME/.local/share/safe/node/$i" \
        --root-dir="$HOME/.local/share/safe/node/$i" &
    sleep 5   # pause before launching the next node
done

to be kind to it, because I thought that would be plenty of time for each node to open its log file and deal with the initial flood of activity. But that didn’t take into account the ballooning of RAM usage. So I’m now starting 100 with a whole 60-second delay between each. That seems to be enough to stop the machine running out of RAM and killing processes.
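
On machines where RAM really is tight, a fixed delay could also be replaced with a check on free memory. Here is a minimal sketch of that idea, assuming a Linux /proc/meminfo layout; the 300MB threshold, the 60-second pause and the node count are arbitrary placeholders of mine, not recommended values:

#!/bin/bash
# Sketch only: stagger node launches and wait out the start-up RAM 'hump'.
MIN_FREE_MB=300   # arbitrary safety margin, tune for your machine

for i in {1..50}
do
    # block until MemAvailable (converted to MB) is above the threshold
    while [ "$(awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo)" -lt "$MIN_FREE_MB" ]
    do
        sleep 10
    done

    SN_LOG=all "$HOME/.local/bin/safenode" \
        --log-output-dest="$HOME/.local/share/safe/node/$i" \
        --root-dir="$HOME/.local/share/safe/node/$i" &
    sleep 60   # let the new node's RAM balloon and settle before starting the next
done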

Here is the output of top, showing safenodes ordered by process ID. It shows this ballooning up and down as each safenode ages over the first few minutes, because each one was started at a 1-minute interval here. ‘RES’ is the column to look at.

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                         

  13837 ubuntu    20   0   59244  43624   6528 S  87.0   1.1   0:05.78 safenode                                        
  13830 ubuntu    20   0   78544  66884   6528 S   0.0   1.7   0:18.17 safenode                                        
  13816 ubuntu    20   0  112452 100128   6528 S   0.0   2.5   0:42.10 safenode                                        
  13809 ubuntu    20   0  112492 100628   6528 S   0.0   2.6   0:38.65 safenode                                        
  12766 ubuntu    20   0  103424  91604   6400 S   0.3   2.3   0:28.22 safenode                                        
  11882 ubuntu    20   0  106312  94508   6400 S   7.7   2.4   0:26.45 safenode                                        
   6344 ubuntu    20   0   64932  53956   6528 S   0.7   1.4   0:20.39 safenode                                        
   6291 ubuntu    20   0   63276  51708   6528 S   0.3   1.3   0:36.89 safenode                                        
   6282 ubuntu    20   0   70672  59036   6400 S   1.0   1.5   0:49.90 safenode                                        
   6276 ubuntu    20   0   73744  62116   6528 S   1.0   1.6   0:45.07 safenode                                        
   6270 ubuntu    20   0   64040  52580   6400 S   0.7   1.3   0:35.59 safenode                                        
   6264 ubuntu    20   0   76448  65168   6528 S   0.3   1.7   0:52.16 safenode                                        
   6258 ubuntu    20   0   75416  64100   6528 S   1.0   1.6   0:39.72 safenode                                        
   6252 ubuntu    20   0   57748  46464   6400 S   0.0   1.2   0:24.11 safenode                                        
   6246 ubuntu    20   0   61132  50020   6528 S   0.0   1.3   0:34.66 safenode                                        
   6240 ubuntu    20   0   68976  57648   6400 S   1.0   1.5   0:40.03 safenode                                        
   6234 ubuntu    20   0   65084  53900   6528 S   3.0   1.4   0:35.32 safenode                                        
   6227 ubuntu    20   0   61732  50552   6400 S   3.3   1.3   0:32.65 safenode                                        
   6207 ubuntu    20   0   65612  53920   6400 S   0.3   1.4   0:43.39 safenode                                        
   6201 ubuntu    20   0   64272  52956   6528 S   0.3   1.3   0:44.75 safenode                                        
   6195 ubuntu    20   0   69116  57788   6400 S   0.3   1.5   0:47.00 safenode                                        
   6188 ubuntu    20   0   62420  51244   6400 S   0.7   1.3   0:37.43 safenode                                        
   6181 ubuntu    20   0   68720  57352   6400 S   0.3   1.5   0:43.04 safenode                                        
   6174 ubuntu    20   0   65032  53936   6528 S   1.0   1.4   0:45.15 safenode                                        
   6168 ubuntu    20   0   64776  53820   6528 S   0.0   1.4   0:37.86 safenode                                        
   6162 ubuntu    20   0   68604  56940   6528 S   0.3   1.4   0:50.35 safenode

EDIT:
I don’t know what I was thinking! How were 100 nodes using about 50MB each ever going to fit into 4GB of RAM?! I’m no more alert now than when I went to bed! I’ll start again with a lower number. Sorry for the churn!

But I think there is still value in a longer delay between starting nodes if RAM is at all ‘tight’, because of this ballooning effect.

9 Likes

I think I’ve spotted a UX bug in ‘safe wallet send’.

safe wallet send -h

states that the amount is in nanos, so ‘send 1’ should send 1 nano:-

Usage: safe wallet send [OPTIONS] <amount> <to>

Arguments:
  <amount>  The number of nanos to send
  <to>      Hex-encoded public address of the recipient

But it seems to send a whole 1 SNT (toy money) instead:-

ubuntu@ip-172-30-1-42:~/scripts$ safe wallet balance
Built with git version: 26c3d70 / main / 26c3d70
97.997821493

ubuntu@ip-172-30-1-42:~/scripts$ safe wallet send 1 81d9fcd8f4bce7b8d30becc15d989f707f7dfb28aedc0d06145d90d8592ef854a15a0e9e8d832f6017feaa9b946f79fd
Built with git version: 26c3d70 / main / 26c3d70
Instantiating a SAFE client...
🔗 Connected to the Network
Sent Token(1000000000) to PublicAddress(PublicKey(01d9..e0ca))
Successfully stored wallet with new balance 96.997821493.
Successfully stored new dbc to wallet dir. It can now be sent to the recipient, using any channel of choice.

ubuntu@ip-172-30-1-42:~/scripts$ safe wallet balance
Built with git version: 26c3d70 / main / 26c3d70
96.997821493
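
To make the size of the discrepancy explicit, here is a quick check of what the balance should have done if the argument really were nanos; the 10^9 nanos-per-token denomination is inferred from the Token(1000000000) line above, not taken from any docs:

# back-of-the-envelope check, assuming 1 SNT = 10^9 nanos as Token(1000000000) implies
awk 'BEGIN {
    before = 97.997821493
    printf "expected after sending 1 nano: %.9f\n", before - 1 / 1e9
    printf "observed after the send:       %.9f\n", before - 1
}'
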
6 Likes

Good spot.

5 Likes

Idk, I have a fairly decent connection (500/500) and it seems to me that I get the worst mileage here of all participants.

For instance, in the last test I tried many times to upload buck bunny without any success, but while I was trying @Aragorn succeeded.

In this test I couldn’t get anything larger than 10MB up (I tried all kinds of concurrency/batch-size combos).

Not moaning, I know it is being worked on :slightly_smiling_face:

5 Likes

Available CPU power and bandwidth can change drastically over time, so a test at startup doesn’t do much (especially for mobile devices).

Packet loss, jitter, latency, antivirus software… there are a lot of factors that can make a “good” node perform badly.

4 Likes

That doesn’t make sense. A test at setup can still inform a lot, and in future the concurrency and batch-size could be increased even further for greater performance. Ignoring this pre-test will lead either to ultra-conservative settings for everyone, or to failures.

BTW, these tests are EASY to do.
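
For what it’s worth, a pre-test along those lines could be as small as the sketch below; the test URL, the measurements chosen, and the idea of mapping them onto concurrency/batch-size are placeholders of mine, not part of any safe tooling:

#!/bin/bash
# Rough sketch of a startup self-test: core count plus a crude downlink estimate.
cores=$(nproc)
# curl reports average download speed in bytes/sec; convert to Mbit/s
dl_mbps=$(curl -so /dev/null -w '%{speed_download}' https://speed.hetzner.de/10MB.bin \
          | awk '{printf "%.0f", $1 * 8 / 1000000}')
echo "cores=$cores downlink=${dl_mbps}Mbit/s"
# a client could then pick less conservative concurrency/batch-size defaults
# when both numbers are comfortably high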

1 Like

I have to say, I’d rather opt for decent defaults, but give users the control to change things. Adding tests, confirming/validating them, and applying the results to settings is so many more steps that can go wrong.


PRs are very welcome

1 Like

Grandma can’t. KISS is great for engineers, but needs to apply to average users too.

After launch I will support this if it’s needed.

“Grandma doesn’t need to” would be the point here, I think?

Honestly, I think it’s too early to worry about needing to do this. There will be reasonable defaults that work for the vast majority of users. We’re still dealing with bugs around storecost in this testnet, and trying to tune around anything that’s occurring because of those bugs doesn’t make a heap of sense.

I imagine it should only be power users looking to eke out maximum speed, etc., that would need to tweak any settings.

8 Likes

The point I’m making is that average users need something that just works but is also competitively performant. Otherwise they won’t use the product, and the network won’t grow.

I agree, who knows how things will develop between now and launch.

After launch we will see where things are and where we can add enhancements to bring in more users.

4 Likes

Indeed, me too.

6 Likes

I just downloaded everything I had uploaded. The files I had uploaded several times also downloaded several times: the same file, with the same address, repeatedly. Why is this? Is it just that it is mentioned several times in a list of files? Or is deduplication not working?

1 Like

I started an upload of a large file last night with -c 200 --batch-size 2, which is what has been working for me consistently, and it worked again with the large file.
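
For context, the full command would have looked roughly like this; the files upload subcommand, the example path, and the long-form --batch-size spelling are assumptions on my part, while the values are the ones from above:

# hypothetical full invocation; subcommand and path are assumptions, flags as quoted
safe files upload ./big-file.mp4 -c 200 --batch-size 2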

So now I’ve uploaded a lot of small files, a few medium-sized files and one large file, all without any problems or need to retry (everything went up on the first round).

First round of upload completed, verifying and repaying if required...
======= Verification: 1659 chunks to be checked and repayed if required =============
======= Verification Completed! All chunks have been paid and stored! =============
Uploaded all chunks in 832 minutes 8 seconds

Weird. I will try downloading now and see what happens.

1 Like

I think that each time you upload, it creates a file in the “uploaded files” folder that contains the info used to download. So I believe what is happening is that, since you uploaded several times, it has created multiple copies of that file, and when you download it’s just downloading what is in those files, hence you are downloading multiple times.

None of this is related to deduplication though.
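
If you want to see those per-upload record files for yourself, something like this should list them newest first; the directory path is a guess at the default location, not a documented one:

# path is an assumption and may differ on your setup; adjust as needed
ls -lt "$HOME/.local/share/safe/client/uploaded_files" | head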

2 Likes

Yeah, I think that is the case. BTW, it would be cool to have that list in a human-readable form.

3 Likes

I think this. It’s not the network, but how we’re storing what you write to the network on your local system. Each upload command writes a file at the moment.

There’s work in flight to improve this.

also this

4 Likes

I’m not sure if it’ll be today or tomorrow, as I’m battling a broken release process just now, but I intend to bring this down as soon as we can get another one up to test fixes to the client upload / not-enough-storecost bug.

(We’ve a fix in that should make using updated clients smoother with future networks too; right now they are very strictly version-locked, which prevents us using a new one here, so we’ll have to start fresh :frowning: )

11 Likes

I tried downloading all of my uploads and all went well except the one large file.

Chunks error Not all chunks were retrieved, expected 1658, retrieved 1514, missing ... (list of chunks omitted)

Given it took quite a long time to do the download, I’m not going to try again.

There doesn’t seem to be any log for the download, so I’m not sure what went wrong.

2 Likes

I found it too.
15 days ago…

3 Likes