ReplicationNet [June 7 Testnet 2023] [Offline]

We are thrilled to announce the upcoming launch of our new testnet, which builds upon the success of our popular NatNet. This testnet, codenamed ReplicationNet, is set to explore the exciting field of Data Replication and Health Metrics. We’ll be concentrating on analysing network health under a considerable data load.

Our prior data storage and replication system relied on the libp2p::record method. Although this method has greatly simplified network discovery and routing, record replication by its nature broadcasts every record to all CLOSE_GROUP_SIZE peers. This broadcasting can trigger a data storm during churn, potentially leading to network congestion as data volume increases.

To overcome these challenges, we have implemented targeted replication over the libp2p::record facility. Unlike the previous method, which broadcast to all CLOSE_GROUP_SIZE peers, our updated approach seeks out missing data amongst the CLOSE_GROUP_SIZE peers and requests each piece of data only once (unless there was an issue). This strategy significantly reduces network traffic and, in local testing, has shown far lower memory and CPU usage while increasing resilience.
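The "request only once" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the safenode implementation: the class name `ReplicationTracker`, its methods, and the use of plain strings for record keys and peer names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationTracker:
    """Hypothetical helper: tracks records held locally and records already requested."""

    stored: set = field(default_factory=set)      # keys we already hold
    pending: dict = field(default_factory=dict)   # key -> peer we asked

    def keys_to_fetch(self, peer, advertised_keys):
        """Return keys from a peer's replication list that we lack and have not requested yet."""
        wanted = []
        for key in advertised_keys:
            if key in self.stored or key in self.pending:
                continue  # already held, or a request is already in flight
            self.pending[key] = peer
            wanted.append(key)
        return wanted

    def on_received(self, key):
        """Mark a requested record as stored."""
        self.pending.pop(key, None)
        self.stored.add(key)

    def on_failed(self, key):
        """Forget a failed request so the key can be requested again."""
        self.pending.pop(key, None)
```

With this sketch, a second peer advertising the same missing key does not trigger a second request; only a call to `on_failed` (the "unless there was an issue" case) makes the key eligible again.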

The primary objectives of the ReplicationNet testnet are to:

  • Verify the replication method functions as intended, minimising data loss.
  • Gather network/process health metrics for a more comprehensive understanding of node performance.
  • Provide a baseline for further improvements to data replication flows.

Participation guidelines

If you are interested in participating, we kindly request that you:

  • Use a client to upload files. (The client can be found here.)
  • If possible, share the uploaded files with the community by giving out the name-address.
  • Periodically download the shared files to verify they’re retained by the network.
  • Run a node from a cloud VM, as home nodes will likely fail and be shut down. (Nodes can also be found here.)
  • Keep the node running for as long as possible.
  • Share the health metric logs of your node.

Connecting to ReplicationNet

To join ReplicationNet with a cloud node, set the network address using the SAFE_PEERS env variable or use the --peer= argument. You can use any of the following addresses to connect to the network:

export SAFE_PEERS="/ip4/142.93.33.47/tcp/43691/p2p/12D3KooWATuSWjt61DUoqhVmHXpfenMnqMoPmrhnREQkZ25xrDGQ"

# fall-back addresses
/ip4/165.227.231.8/tcp/43103/p2p/12D3KooWByBQokh2D8Y7ATzXVCzKGiN5g4L8GKXBmBoZhAVeW4At
/ip4/165.232.106.150/tcp/41989/p2p/12D3KooWRYaapMU4i4zwNT3zQhcYPrkCd5QoHuLe7Z4PfGs3hQj5
/ip4/165.22.125.99/tcp/43473/p2p/12D3KooWAd3oubsP5yqU3gBnPw27zDqSZu4LW1F1hqkoojXBL7Rd
/ip4/165.22.119.173/tcp/38933/p2p/12D3KooWAvDMcv39DDsNd8kFSEMCc3cx5ye5VjYFhe43RRLw63rz
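Because a malformed address is rejected by the client, it can help to sanity-check a value before exporting it. The sketch below is an illustration only: `parse_peer_addr` is a hypothetical helper that handles just the `/ip4/<ip>/tcp/<port>/p2p/<peer id>` layout used above, not multiaddrs in general, and it does not validate the peer ID itself.

```python
import ipaddress


def parse_peer_addr(addr):
    """Split '/ip4/<ip>/tcp/<port>/p2p/<peer id>' into (ip, port, peer_id).

    Raises ValueError for any other layout, including stray characters
    such as quotation marks wrapped around the address.
    """
    parts = addr.split("/")
    # A well-formed address starts with '/', so the first element is empty.
    if len(parts) != 7 or parts[0] != "":
        raise ValueError(f"unexpected address layout: {addr!r}")
    if (parts[1], parts[3], parts[5]) != ("ip4", "tcp", "p2p"):
        raise ValueError(f"unexpected protocol labels: {addr!r}")
    ip = str(ipaddress.IPv4Address(parts[2]))  # raises a ValueError subclass if invalid
    port = int(parts[4])
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    if not parts[6]:
        raise ValueError("missing peer id")
    return ip, port, parts[6]
```

For example, `parse_peer_addr("/ip4/142.93.33.47/tcp/43691/p2p/12D3KooWATuSWjt61DUoqhVmHXpfenMnqMoPmrhnREQkZ25xrDGQ")` returns the IP, port and peer ID, while the same string with literal quotation marks included in the value raises `ValueError`.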

Initial network

We have 100 droplets running 2001 nodes in total (each droplet has 2 vCPUs and 2 GB of memory). We haven’t run a network this large with the community (at least not for very long), so this will be testing out the testnet tool to some degree too!

Using the Client

To put/get files you’ll need to use the safe client, which you can grab for your platform from GitHub. Once you have the client, you need to either set the SAFE_PEERS environment variable or use the --peer= argument with any of the above network addresses.

Now to upload a directory/file to the network, use the following command:

# using the SAFE_PEERS variable
export SAFE_PEERS=/ip4/142.93.33.47/tcp/43691/p2p/12D3KooWATuSWjt61DUoqhVmHXpfenMnqMoPmrhnREQkZ25xrDGQ
safe files upload -- <path>

# alternatively, use the --peer argument; it must be passed with every command
safe --peer=/ip4/142.93.33.47/tcp/43691/p2p/12D3KooWATuSWjt61DUoqhVmHXpfenMnqMoPmrhnREQkZ25xrDGQ files upload -- <path>

The file addresses of the content you’ve uploaded are saved locally, which enables automatic downloads. To download the content you’ve just uploaded back to the ~/.safe/client/downloaded_files folder, use the following command:

$ safe files download

Running a Node

You can find the safenode binaries for your platform here.

Connect your node to the network using the SAFE_PEERS environment variable or the --peer argument, similar to a client. Consider keeping your logs in a directory for convenience:

$ SN_LOG=all safenode --log-dir=/tmp/safenode --root-dir=/tmp/safenodedata

Windows:

$ set SN_LOG=all
$ safenode --log-dir=/tmp/safenode --root-dir=/tmp/safenodedata

Please note that if you are running from home in a NAT environment, the node should automatically shut down after a few minutes. If this occurs, kindly share the peer ID found in the initial log lines.

Error: We have been determined to be behind a NAT. This means we are not reachable externally by other nodes. In the future, the network will implement relays that allow us to still join the network.

It should be possible to run more than one node per cloud VM (we’ve successfully run 10 on a 1 GB 1 vCPU droplet), depending on its size, CPU and memory. Please note though that logs and data will build up and may exceed the storage capacity.

Interesting Log Lines

For this testnet run, log lines containing the following keywords are important to us. If anyone running nodes can periodically pull logs and share information around these keywords, that would be amazing.

  • PeerAdded:
  • Detected dead peer
  • Sending a replication list
  • Replicate list received from
  • Fetching replication
  • Replicating chunk
  • Chunk received for replication
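One way to summarise these keywords before sharing is to count how often each appears in a node's log. The following is a rough sketch, not an official tool: `count_keywords` is a hypothetical helper, and it uses plain substring matching on each line.

```python
from collections import Counter

# Keywords taken from the list above.
KEYWORDS = [
    "PeerAdded:",
    "Detected dead peer",
    "Sending a replication list",
    "Replicate list received from",
    "Fetching replication",
    "Replicating chunk",
    "Chunk received for replication",
]


def count_keywords(lines):
    """Count how many lines contain each keyword (substring match)."""
    counts = Counter({keyword: 0 for keyword in KEYWORDS})
    for line in lines:
        for keyword in KEYWORDS:
            if keyword in line:
                counts[keyword] += 1
    return counts
```

The function accepts any iterable of lines, so an open log file can be passed directly, for example `count_keywords(open(path))`, where `path` points at a log file inside the directory given to --log-dir.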

We’ve also enabled the node/client to regularly log specific metrics about the system, network, and the running process. These metrics are logged as JSON objects to the usual log file and can therefore be parsed and piped to other applications for analysis. Below is a sample log line containing the metrics:

[2023-06-05T12:29:59.680321Z TRACE sn_logging::metrics] {"physical_cpu_threads":12,"system_cpu_usage_percent":11.24604,"system_total_memory_mb":33517.777,"system_memory_used_mb":16400.195,"system_memory_usage_percent":48.929844,"network":{"interface_name":"enp0s31f6","bytes_received":1774,"bytes_transmitted":37947,"total_mb_received":2367.518,"total_mb_transmitted":717.744},"process":{"cpu_usage_percent":1.2671595,"memory_used_mb":35.004417,"bytes_read":0,"bytes_written":8192,"total_mb_read":0.0,"total_mb_written":0.1024}}
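Since the metrics are plain JSON after the log prefix, they can be extracted with a few lines of Python. This is a minimal sketch that assumes every metrics line has the same shape as the sample above, with the `sn_logging::metrics]` tag followed by a single JSON object; `parse_metrics_line` is a hypothetical helper name.

```python
import json

# Tag that precedes the JSON payload in the sample line above.
METRICS_TAG = "sn_logging::metrics]"


def parse_metrics_line(line):
    """Return the metrics as a dict, or None if the line is not a metrics line."""
    tag_at = line.find(METRICS_TAG)
    if tag_at == -1:
        return None
    brace_at = line.find("{", tag_at)
    if brace_at == -1:
        return None
    try:
        return json.loads(line[brace_at:])
    except json.JSONDecodeError:
        return None  # truncated or otherwise malformed payload
```

Fields can then be read by name, for example `metrics["process"]["memory_used_mb"]` or `metrics["network"]["total_mb_received"]`, and non-metrics lines are simply skipped because the function returns `None` for them.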

Known problems

We’re aware that some messages are being dropped, and so the replication flow is still imperfect. We’re continuing to dig into this and have some leads. Hopefully we’ll learn more from this testnet too.

45 Likes

First!

Then to read and try…

17 Likes

second to get here

I dub 2023 as the year of the testnets

18 Likes

Hmmm…

C:\Users\topia>safe files upload test.jpeg
Logging to directory: "C:\\Users\\topia\\AppData\\Local\\Temp\\safe-client"
Current build's git commit hash: 0be2ef056215680b02ca8ec8be4388728bd0ce7c
Error: invalid multiaddr

Location:
    sn_peers_acquisition\src\lib.rs:55:24
6 Likes

Have you set up the SAFE_PEERS env variable?
If not, it’s better to use the --peer option, like:

safe --peer=/ip4/142.93.33.47/tcp/43691/p2p/12D3KooWATuSWjt61DUoqhVmHXpfenMnqMoPmrhnREQkZ25xrDGQ files upload -- <path>
9 Likes

Please clarify whether the argument name should be peers or peer, or whether both can be used.

4 Likes

Awesome, going to be a while before I can join but whoever wrote the instructions needs a gold star :star2: :clap: excellent.

13 Likes

I did this (on Win):

set SAFE_PEERS="/ip4/142.93.33.47/tcp/43691/p2p/12D3KooWATuSWjt61DUoqhVmHXpfenMnqMoPmrhnREQkZ25xrDGQ"

That didn’t give any error.

I didn’t do any cleaning before though, can there be some leftovers from NatNet on my computer that interfere now?

This worked.

EDIT: and after removing the “quotation marks” from here, setting the environment variable seems to have worked as well.

8 Likes

Awesome stuff! I can actually join this one later if it’s up long enough :slight_smile:

10 Likes

Let’s check if it will work:
safe --peer=/ip4/142.93.33.47/tcp/43691/p2p/12D3KooWATuSWjt61DUoqhVmHXpfenMnqMoPmrhnREQkZ25xrDGQ files download xor.html 5f7ff8546e1d226d2e436b657a611a9c32f7f6a58024a8340a4b141d3f2e66aa
(this file generates XOR texture)


A few notes:

  1. It would be better to say which steps a user should perform to share files this way. I can guess them, but other users may not.
  2. The (console) API for file upload/download looks convoluted; I liked the API from previous testnets much more. I like to control where my files are located, and I like to download files into the same directory where I store the safe binary, but I can’t without hacks. Look at how curl and wget work: that is common and comfortable.
1 Like

‘--peers’ not peer

EDIT: Nope ignore me, it is apparently --peer

4 Likes
error: unexpected argument '--peers' found

  tip: a similar argument exists: '--peer'

Usage: safe <--peer <PEERS>> <COMMAND>
1 Like

thanks @Vort was just trying to figure that out myself :slight_smile:

here is old beg blag to get the ball rolling.

safe files download ~/begblag.mp3 3eb0873bd425e5599d72c2873ca6e691d5de5c75bbc89f2e088fa95e3390927a
6 Likes

@qi_ma is looking now

3 Likes

Just checked, it is --peer :+1:

3 Likes

Such a command is not Windows-friendly.

It may also make sense for safe to check the path before downloading:

Successfully got file ~/begblag.mp3!
Writing 15766382 bytes to "C:\\Users\\Vort\\.safe\\client\\~/begblag.mp3"
Failed to create file "~/begblag.mp3" with error Os { code: 3, kind: NotFound, message: "The system cannot find the path specified." }

OP updated to --peer

9 Likes
netstat -anp
tcp        0      0 192.168.X.X:50246    159.65.62.47:35929      ESTABLISHED 37137/safenode
tcp        0      0 192.168.X.X:57614    143.110.174.164:36013   ESTABLISHED 37137/safenode
tcp        0      0 192.168.X.X:57064    209.97.139.13:34615     ESTABLISHED 37137/safenode
...
netstat -anp | grep safenode | grep ESTABLISHED | wc -l
326

I used --port 12000 to connect with a NAT port forward rule, and the logs did reveal:
Connected to the Network

Is the --port flag only used for the initial connection to get a list of more peers, after which the node is expected to open new local IP/port combinations for the different peer nodes?

Were we expecting 300+ TCP connections on a network of this size per safenode pid?

Additional note: the binaries from the GitHub link worked out of the gate on an Alpine LXC, so the prior testnet’s issue of not being able to run safenode is no longer a problem, yay.

5 Likes

I’ve started a cloud node and am not seeing any metrics in the log. Do we need to enable TRACE? If so, it would be better to add that to the OP.

3 Likes