ReplicationNet [June 7 Testnet 2023] [Offline]

Does this disprove the theory that 2000 nodes are required for everything to work well?

4 Likes

I restarted the node at 13:38 (local time), and during the two minutes from 13:39 to 13:40 it received 209 chunks.
One idea is that logging can prevent race conditions in the code from happening.
Not much chance of that, however.

3 Likes

I’m wondering whether @maidsafe already uses AI for helpful suggestions in their development process?

2 Likes

Compiled for ARM and am now running on an Oracle Cloud VPS.

Successfully uploaded a 2 GB movie. If anyone wants to watch The Crow, give this a go:

safe files download crow.mkv e4f7b027f7daa7e5f97d8a09b99a5279f2c0fe6fe7ce8c91b0f9f15935ffbd4f
4 Likes

Nice! Thank you for the explanation; it seems to align with what was seen in the charts above.

The peak memory of the safenode pid in this testnet was higher than in the stats table provided in the June 8th summary of the internal tests carried out by your team, which is expected since this is likely a larger testnet running for a longer time.

As more testnets progress and more data is written both to the network and to individual nodes (raw size), I am curious where the memory of the safenode pid ends up flat-lining, even as lazy pruning continues to occur while the pid maintains high uptime.

Maybe the team already knows the answer on the peak memory limits from a baseline spin-up of the safenode pid, based on the exact cache sizes (upper limits) of some of these data structures within your code base? Either way, it's a wait-and-observe from my point of view with each testnet, :smile: .

One thing is for sure: I hope to get more generic pid-level and OS metrics flowing inside the LXC from external programs running inside the LXC alongside the safenode pid.

I just saw this feature request on the repo as well:

In addition, it would be nice to have even more chunk info in a JSON format, along with the feature request above (nice!):

{"process":{"chunks":{"chunk_writes":1,"total_chunk_reads":1,"chunk_read":1,"total_chunk_written":1}}}

Though this may all come for free, once the --json-output parameter is ready for show time.
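
If it helps, here is a minimal sketch of how counters in that shape could be serialized with serde; the struct and field names simply mirror the JSON above and are hypothetical, not the actual safenode types.

use serde::Serialize;

// Hypothetical counter structs mirroring the JSON shape suggested above;
// illustrative only, not the real safenode types.
#[derive(Serialize)]
struct ChunkCounters {
    chunk_writes: u64,
    total_chunk_reads: u64,
    chunk_read: u64,
    total_chunk_written: u64,
}

#[derive(Serialize)]
struct ProcessStats {
    chunks: ChunkCounters,
}

#[derive(Serialize)]
struct Stats {
    process: ProcessStats,
}

fn main() -> serde_json::Result<()> {
    let stats = Stats {
        process: ProcessStats {
            chunks: ChunkCounters {
                chunk_writes: 1,
                total_chunk_reads: 1,
                chunk_read: 1,
                total_chunk_written: 1,
            },
        },
    };
    // Prints the nested JSON object shown above.
    println!("{}", serde_json::to_string(&stats)?);
    Ok(())
}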

8 Likes

The peak memory usage so far mainly depends on the ParallelFetching factor, which we currently set to a low number.
This is to avoid surges, flatten the traffic, and avoid potential choking.
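
To illustrate the principle for anyone following along (a toy sketch only, not the actual safenode implementation; the constant name and numbers are made up): bounding the number of parallel fetches with a semaphore caps how many chunks are buffered in memory at once.

use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Semaphore;

// A low "parallel fetching" factor: at most this many fetches are in flight
// (and buffered in memory) at any one time. The value is made up.
const PARALLEL_FETCHES: usize = 4;

// Stand-in for a real network fetch of one chunk.
async fn fetch_chunk(_name: u64) -> Vec<u8> {
    tokio::time::sleep(Duration::from_millis(50)).await;
    vec![0u8; 1024]
}

#[tokio::main]
async fn main() {
    let limit = Arc::new(Semaphore::new(PARALLEL_FETCHES));
    let mut tasks = Vec::new();

    for name in 0..32u64 {
        // Waits here whenever PARALLEL_FETCHES fetches are already running,
        // which flattens the traffic instead of letting it surge.
        let permit = limit.clone().acquire_owned().await.unwrap();
        tasks.push(tokio::spawn(async move {
            let chunk = fetch_chunk(name).await;
            drop(permit); // frees a slot for the next fetch
            chunk.len()
        }));
    }

    let mut total = 0;
    for task in tasks {
        total += task.await.unwrap();
    }
    println!("fetched {total} bytes with at most {PARALLEL_FETCHES} fetches in flight");
}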

However, the cache of chunk_names held in the node itself currently doesn't have a hard threshold, and no pruning is defined for it so far.
So the memory usage of the safenode process will keep growing (slowly) as long as chunks flow in, with small spikes.

We will consider putting a cap on the max entries to hold, together with a way of pruning, later on.
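
Roughly what such a cap could look like, as a toy sketch rather than the actual safenode code: a fixed-capacity cache that prunes the oldest chunk_names once the limit is reached (the capacity and the u64 name type are placeholders).

use std::collections::{HashSet, VecDeque};

// Made-up cap on how many chunk names the cache may hold.
const MAX_ENTRIES: usize = 100_000;

// Bounded cache of chunk names with oldest-first pruning.
struct ChunkNameCache {
    order: VecDeque<u64>, // insertion order, oldest at the front
    seen: HashSet<u64>,   // fast membership checks
}

impl ChunkNameCache {
    fn new() -> Self {
        Self { order: VecDeque::new(), seen: HashSet::new() }
    }

    fn insert(&mut self, name: u64) {
        if !self.seen.insert(name) {
            return; // already cached
        }
        self.order.push_back(name);
        // Prune the oldest entries once the cap is exceeded, so memory stays bounded.
        while self.order.len() > MAX_ENTRIES {
            if let Some(old) = self.order.pop_front() {
                self.seen.remove(&old);
            }
        }
    }

    fn contains(&self, name: &u64) -> bool {
        self.seen.contains(name)
    }
}

fn main() {
    let mut cache = ChunkNameCache::new();
    for name in 0..(MAX_ENTRIES as u64 + 10) {
        cache.insert(name);
    }
    assert!(!cache.contains(&0)); // the 10 oldest entries were pruned
    assert!(cache.contains(&(MAX_ENTRIES as u64))); // recent entries are kept
    println!("cache holds {} entries", cache.order.len());
}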

Also, really appreciate the suggestion on the JSON-format chunk-related info; will record it and take it into consideration later on as well.

12 Likes

This could very well change the strategy on how much storage space to allocate to a single safenode pid for safe farmers on one physical machine, based on the RAM-required vs. chunk/storage-used ratio.

For instance, in my case, say I provide a mount point with 100 TB of storage space (some sort of NAS or clustered file system) for a single safenode pid, but the metadata that has to be held in memory for all the chunks exceeds the physical RAM over time; then clearly it will impact the performance of the node as swap usage starts to increase.

In that case, I would want to run enough safenode pids spread across multiple physical machines so that they individually never exceed the max physical memory available to each pid, while still being backed by the same mount point, with sub-folders for each safenode pid on that central mount point.
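
As a back-of-the-envelope illustration of that ratio, and nothing more: assuming, purely hypothetically, roughly 1 MB chunks, about 64 bytes of in-memory metadata per chunk name, and a 2 GB RAM budget per pid, the split works out as below. Every constant is a guess, not a measured safenode figure.

// Back-of-the-envelope capacity sketch; every constant is a guess,
// not a measured safenode figure.
const STORE_BYTES: u64 = 100 * 1024 * 1024 * 1024 * 1024; // 100 TB mount point
const CHUNK_BYTES: u64 = 1024 * 1024;                     // ~1 MB per chunk (assumed)
const META_BYTES_PER_CHUNK: u64 = 64;                     // in-memory metadata per chunk (assumed)
const RAM_BUDGET_PER_PID: u64 = 2 * 1024 * 1024 * 1024;   // RAM allowed per safenode pid (assumed)

fn main() {
    let chunks = STORE_BYTES / CHUNK_BYTES;
    let total_meta = chunks * META_BYTES_PER_CHUNK;
    // Ceiling division: how many pids keep the per-pid metadata under the RAM budget.
    let pids_needed = (total_meta + RAM_BUDGET_PER_PID - 1) / RAM_BUDGET_PER_PID;

    println!("chunks stored:        {chunks}");
    println!("metadata in RAM:      {:.1} GB", total_meta as f64 / 1e9);
    println!("safenode pids needed: {pids_needed}");
}

With those made-up numbers, a 100 TB store holds roughly 100 million chunks, whose metadata alone comes to about 6.7 GB, so it would take 4 pids to stay under a 2 GB budget each.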

I am sure some folks here that are planning to be safe farmers have low-end machines, high-end machines, or both, and some are also planning to provide NAS-based storage to the safenode, as opposed to local storage.

Overall, very excited to see how this all plays out over time, since everyone will try to make the best use of their existing hardware to be as efficient a safe farmer as they can be, :hugs: .

Seems like a very reasonable future step later on, if and when needed.

9 Likes

Download died after 3 mins for me

Edit. Ah: “Inbound traffic warning” from Linode

3 Likes

6 minutes on an Oracle Cloud node

Removed old logs from directory: "/tmp/safe-client"
Logging to directory: "/tmp/safe-client"
Built with git version: 8c6ab75 / main / 8c6ab75
Instantiating a SAFE client...
🔗 Connected to the Network
Downloading file "crow.mkv" with address e4f7b027f7daa7e5f97d8a09b99a5279f2c0fe6fe7ce8c91b0f9f15935ffbd4f
Successfully got file crow.mkv!
Writing 2315684476 bytes to "/home/ubuntu/.safe/client/crow.mkv"

real	6m8.785s
user	3m40.429s
sys	2m37.542s

But it is on a pretty good connection:

Retrieving speedtest.net configuration...
Testing from Oracle Cloud (xx.xx.xx.xx)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by DSM Group (Peterborough) [118.56 km]: 4.333 ms
Testing download speed................................................................................
Download: 1034.17 Mbit/s
Testing upload speed......................................................................................................
Upload: 644.49 Mbit/s



4 Likes

Woah - that’s quick - took about 30 mins on my home connection, which is normally pretty fast

3 Likes

I forgot to time this download, but it was a good bit more than 30 minutes.

4 Likes

30 nodes humming away on a 2 GB Hetzner instance.

All nodes got chunks, some much more than others:


safe@ubuntu-2gb-nbg1-1:~/.safe/node$ du -hS|grep store
79M	./safenode_4/record_store
335M	./safenode_1/record_store
52M	./safenode_23/record_store
71M	./safenode_2/record_store
206M	./safenode_19/record_store
58M	./safenode_5/record_store
82M	./safenode_28/record_store
29M	./safenode_29/record_store
38M	./safenode_17/record_store
229M	./safenode_14/record_store
225M	./safenode_6/record_store
41M	./safenode_21/record_store
6.1M	./safenode_8/record_store
68M	./safenode_3/record_store
42M	./safenode_11/record_store
47M	./safenode_22/record_store
213M	./safenode_13/record_store
50M	./safenode_26/record_store
359M	./safenode_9/record_store
95M	./safenode_16/record_store
16M	./safenode_12/record_store
52M	./safenode_18/record_store
214M	./safenode_25/record_store
2.5M	./safenode_10/record_store
20M	./safenode_24/record_store
225M	./safenode_27/record_store
71M	./safenode_30/record_store
49M	./safenode_20/record_store
154M	./safenode_7/record_store
69M	./safenode_15/record_store
6 Likes

How are you starting the nodes? Are you using a port forward?

I just tried a cheeky node on Oracle Cloud and got this:

[2023-06-09T16:34:25.046292Z WARN sn_node::api] NAT status is determined to be private!
[2023-06-09T16:34:25.046334Z INFO safenode] Node is stopping in 1s...

I then tried with the old port forward from previous testnets and got the same result :frowning:

2 Likes

It couldn’t really get much simpler: no port forwarding, no messing with firewalls, just run this wee script:

#!/bin/bash

# Export so each safenode child process inherits the bootstrap peer.
export SAFE_PEERS=/ip4/142.93.33.47/tcp/43691/p2p/12D3KooWATuSWjt61DUoqhVmHXpfenMnqMoPmrhnREQkZ25xrDGQ

# Start 30 nodes, each with its own log and root directory.
for i in {1..30}
do
        SN_LOG=all /usr/local/bin/safenode --log-dir=/home/safe/.safe/node/safenode_$i --root-dir=/home/safe/.safe/node/safenode_$i &
        sleep 2
done

I haven't tried from home, other than a quickie yesterday to prove that it was a no-go from behind NAT.
The nodes from home will come soon enough; for now we will learn plenty from small cloud instances. @Josh I think mentioned 1/2-gig micro instances from AWS, so I should look at that next. I just had some credit to burn on Hetzner. It's cost me less than €0.30 so far.

I have used about 8 GB of disk so far; depending on how long @joshuef keeps it running, I may add another 20 GB volume if I have to.

4 Likes

Dunno, I saw that a couple of times last night, even after I remembered to set SAFE_PEERS=/ip4/142.93.33.47/tcp/43691/p2p/12D3KooWATuSWjt61DUoqhVmHXpfenMnqMoPmrhnREQkZ25xrDGQ …

so please keep trying.

@joshuef @qi_ma Is there any particular log message we should look for in nodes which have zero or few chunks?

3 Likes

DigitalOcean; it was just to see how it goes.
For $2 more on the next size up you get better value.
Haven't really had time to poke much, though.
What is interesting to me is what the monthly transfer needs to be for a long-running node; CPU/memory may not be the issue, given that they allow a bunch to run on one instance.

5 Likes

Awesome work, team MaidSafe :clap:

I could download The Crow in less than 10 minutes :fire:

I could also upload a 15 MB MP3 on the first attempt, but I have not been able to download it so far.

I tried to download it just after the upload; it could retrieve only 27 chunks out of 31.
Then 10 minutes later I tried again: “expected 31, retrieved 13”.
Then again 1 minute later: “expected 31, retrieved 16”.
Then again 1 minute later: “expected 31, retrieved 16”, the same.

Oh, and now I could get 18 out of 31!
I’ll try my luck again later.

If anyone else wants to try:
test.mp3 e306123ba3f26c175b2410c9a350d46689b84672c6c60d36a50ab6ac4b75892a

9 Likes

Not all chunks were retrieved, expected 31, retrieved 26

willie@gagarin:~/projects/maidsafe/safe_network$ safe files download test.mp3 e306123ba3f26c175b2410c9a350d46689b84672c6c60d36a50ab6ac4b75892a
Removed old logs from directory: "/tmp/safe-client"
Logging to directory: "/tmp/safe-client"
Current build's git commit hash: cea98bcca21d075970e4fae72090da49a2af348f
🔗 Connected to the Network
Downloading file "test.mp3" with address e306123ba3f26c175b2410c9a350d46689b84672c6c60d36a50ab6ac4b75892a
Did not get file "test.mp3" from the network! Chunks error Not all chunks were retrieved, expected 31, retrieved 26, missing [643a71(01100100).., 45ef30(01000101).., 029996(00000010).., 704fb5(01110000).., 15810e(00010101).., 459f83(01000101).., 71a91d(01110001).., b4dcb8(10110100).., e0aab3(11100000).., d1ad5d(11010001).., 4ff725(01001111).., 88fbf8(10001000).., 72c1d8(01110010).., 268de1(00100110).., d75c74(11010111).., c72134(11000111).., fbca8d(11111011).., eba0d3(11101011).., 0ea689(00001110).., 729ece(01110010).., a2cb3b(10100010).., 4d1a7d(01001101).., 928cff(10010010).., ad3dbe(10101101).., 7b2746(01111011).., 687731(01101000)..]..

And the same 26 of 31 when I try from my Hetzner instance.

4 Likes

Same here, got 26 of the 31 chunks.

BTW, within the error msg:

Chunks error Not all chunks were retrieved, expected 31, retrieved 26, missing ...

The listed chunks are actually the fetched ones. It's a bug in the release build, but it got resolved with the current main head.
The missing chunks are actually:

missing [5a1c9e(01011010).., 73b703(01110011).., 6be651(01101011).., 63594e(01100011).., ab8031(10101011)..]

in case anyone by chance sees those chunk_names appear in their node logs.
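
For anyone curious what the fix amounts to conceptually: the missing set should be the expected chunk names minus the retrieved ones. A toy sketch of that set difference, with placeholder names rather than real safenode types:

use std::collections::HashSet;

// Derive the missing chunk names as the set difference expected \ retrieved.
fn missing_chunks<'a>(expected: &HashSet<&'a str>, retrieved: &HashSet<&'a str>) -> Vec<&'a str> {
    expected.difference(retrieved).copied().collect()
}

fn main() {
    let expected: HashSet<&str> = ["5a1c9e", "73b703", "6be651", "63594e", "ab8031", "643a71"]
        .into_iter()
        .collect();
    let retrieved: HashSet<&str> = ["643a71"].into_iter().collect();

    // Reporting `retrieved` here instead of this difference is the kind of
    // mix-up described above; the correct "missing" list is:
    let mut missing = missing_chunks(&expected, &retrieved);
    missing.sort();
    println!("missing: {missing:?}"); // ["5a1c9e", "63594e", "6be651", "73b703", "ab8031"]
}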

7 Likes

Yes, confirmed here with the latest from GH

Built with git version: d0b547a / main / d0b547a

willie@gagarin:~/projects/maidsafe/safe_network$ safe files download test.mp3 e306123ba3f26c175b2410c9a350d46689b84672c6c60d36a50ab6ac4b75892a
Removed old logs from directory: "/tmp/safe-client"
Logging to directory: "/tmp/safe-client"
Built with git version: d0b547a / main / d0b547a
Instantiating a SAFE client...
🔗 Connected to the Network
Downloading file "test.mp3" with address e306123ba3f26c175b2410c9a350d46689b84672c6c60d36a50ab6ac4b75892a
Did not get file "test.mp3" from the network! Chunks error Not all chunks were retrieved, expected 31, retrieved 26, missing [5a1c9e(01011010).., 73b703(01110011).., 6be651(01101011).., 63594e(01100011).., ab8031(10101011)..]..
7 Likes