[Offline] Fleming Testnet v6.2 Release - Node Support

:trumpet: Fleming Testnet v6.2 is LIVE! :trumpet:

We are excited to announce that Fleming testnet v6.2 is now LIVE, following on quickly from last week’s v6/6.1 release.
Released versions:

  • safe_network v0.4.0
  • sn_api v0.31.0
  • sn_cli v0.31.0

We know we said that we didn’t anticipate another testnet for at least 4 weeks, buuuut… after some intensive detective work, we have a new network with several fixes bedded in, and we are looking to put them through the wringer! This should be of special interest to anyone who was connecting from a public IP or cloud computing instance.

Client changes

The CLI now has a default query timeout of 10 minutes (see the DataNotFound changes in the changelog below for the reasoning). This can be overridden with the env var SN_CLI_QUERY_TIMEOUT, which takes a number of seconds. For example, SN_CLI_QUERY_TIMEOUT=60 safe cat <safe::url> sets the timeout to 60 seconds.
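As a rough illustration of the override, the sketch below wraps safe cat in a small shell function so that the timeout can be given in minutes. The function names minutes_to_seconds and cat_with_timeout are hypothetical helpers, not part of the CLI; only SN_CLI_QUERY_TIMEOUT and safe cat come from this post.

```shell
# Hypothetical helper: convert whole minutes to seconds.
minutes_to_seconds() {
    echo $(( $1 * 60 ))
}

# Hypothetical wrapper: fetch a safe:// URL with a per-call timeout in minutes.
# The variable is set for this one invocation of the CLI only.
# Usage: cat_with_timeout <minutes> <safe-url>
cat_with_timeout() {
    SN_CLI_QUERY_TIMEOUT="$(minutes_to_seconds "$1")" safe cat "$2"
}
```

With these defined, cat_with_timeout 2 <safe::url> would run the query with a 120 second timeout.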

Continuing IGD connection issues

As highlighted in the original Fleming Testnet v6 release post, we have observed some IGD issues which we believe are in our IGD implementation. These issues can mean that trying to join the testnet with your node will sometimes incorrectly fail with an IGD error, unless you have a public facing IP address, or have set up port forwarding on your router and launched your node specifying your internal & external IP and port numbers to reflect this.

If you do receive IGD errors when you try to join your node, the first thing to do is to retry a few times, as the issue seems intermittent. If you don’t have a public facing IP address, which the majority of us would not, but you feel that you are advanced enough to set up port forwarding on your router, you can also give this a try. We can’t provide instructions for all the different variations of routers out there, but we will put instructions below on how to launch a node when you are port forwarding.

If you are getting constant IGD errors and you don’t feel that you are advanced enough to set up port forwarding then please continue to use the testnet as a client. No matter whether you run a node or not, you can still follow the instructions in the Joining the Testnet section below and use the CLI as per the User Guide.

We are working hard to fully identify and resolve any IGD issues.

v6.2 Changelog

1. Fixed payment issue at the genesis node

In the previous testnet we noticed that processing payments during data upload would sometimes fail. This was because the genesis node had opened too many ports.
When we do a connectivity check for new nodes, temporary endpoints are created. These were not closed properly, so the file descriptors needed to create a new port mapping were never released. As a result, too many file descriptors were in use, which prevented the node from saving the payment to disk.
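For anyone curious whether their own node is approaching a similar limit, here is a minimal sketch for inspecting file descriptor usage. It assumes Linux, where /proc/<pid>/fd holds one entry per open descriptor; the helper names are hypothetical, and the limit shown is the current shell's soft limit, which a node launched from that shell would normally inherit.

```shell
# Hypothetical helper: count the file descriptors a process holds open.
# Assumes Linux; prints 0 if /proc/<pid>/fd cannot be read.
count_open_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical helper: show usage next to this shell's soft limit.
# Usage: fd_report <pid>
fd_report() {
    echo "pid $1: $(count_open_fds "$1") open, soft limit $(ulimit -n)"
}
```

Running fd_report with the PID of a running sn_node process gives a quick view of how close it is to the limit.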

2. Fixed error that returned NodeNotReachable even for nodes from a static IP address

The file descriptor exhaustion described above also caused the connectivity check to fail, so we returned NodeNotReachable to new nodes. This should now be fixed as well.

3. No more error responses from Nodes for DataNotFound

After we removed some accumulation of messages at elders to prevent messaging overload, Error or Success messages have been sent to the client directly. When sections split, or were out of sync, the client could receive enough DataNotFound errors to error out, even though the data existed and a success response might have arrived later.

To combat this (and to prevent range attacks that repeatedly query xorname space), we’re no longer sending DataNotFound errors, and it is up to clients to time out requests appropriately.
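Since a missing response now surfaces as a timeout rather than an error, scripts that call the CLI may want a bounded retry. The sketch below is one possible approach, not something the CLI provides; retry is a hypothetical helper that works with any command.

```shell
# Hypothetical helper: run a command up to <attempts> times, stopping at the
# first success. Returns 0 on success, 1 if every attempt fails.
# Usage: retry <attempts> <command> [args...]
retry() {
    attempts="$1"
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        n=$(( n + 1 ))
    done
    return 1
}
```

For example, after export SN_CLI_QUERY_TIMEOUT=60, calling retry 3 safe cat <safe::url> would give each of up to three attempts 60 seconds before giving up.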

4. Blsttc

We’ve updated our internal BLS library to use the new BLSTTC which @mav pioneered. This uses a faster BLS library under the hood of threshold crypto to perform signatures and validations, while maintaining the same API and functionality!

What happens to previous testnet iterations?

All previous Fleming testnet iterations have been taken offline, with all data removed.

This is to allow us to concentrate on any issues that arise from the new testnet iteration, and avoid any community confusion over which testnet they are connected to, or any errors as a result of contamination from previous configurations and data. We ask that everyone who attempts to use this latest testnet iteration removes their $HOME/.safe folder before trying to interact with this testnet, i.e. $ rm -rf $HOME/.safe/ to ensure there is no contamination.

All of the bootstrap nodes for previous iterations that we had on Digital Ocean have been destroyed, with new, clean D.O. nodes created for this testnet iteration - please follow the full list of instructions below to remove all previous testnet settings and data from your machine and to get you started with the latest testnet iteration.

IMPORTANT - please be aware that this is a testnet and therefore any data added to it will be wiped once we are finished testing. We do not recommend uploading any important or sensitive information as IT WILL be lost.

Are We Working on Other Fixes?

Yes. We have a list of features and would like as many of them as possible in place for v7:

  1. Integrate DBC for payments and rewards (means ripping out current AT2 flows).
  2. Finalise message simplification (which will kill a lot of silly bugs and extra unneeded serialization).
  3. Integrate a faster threshold_crypto (thanks @mav).
  4. Formalise message flows for GET/PUT and mutate (along with costs).
  5. Switch on network versions in messages (to prevent people using the wrong version of x, y, z).
  6. Measure bandwidth use and memory issues more accurately (which will mean formalising logging a bit more).

Where can I report any issues found?

If you come across any issues in your testing, start by checking the Known Issues section of this post (see below) to see if it has already been acknowledged. If it has not yet been added, you can post your issue in the comments of this post. You can also report issues in the Online Spreadsheet for Testnet Results and Issues - see the section below.

We will monitor and investigate reported issues as soon as we can.

Online Spreadsheet for Testnet Results and Issues

If you will be uploading data to the testnet and/or providing safes (nodes), please consider posting your data and results at:

SN Testnet Review
(Massive thank you to @VaCrunch for creating this spreadsheet :clap: )

This is a new version of this spreadsheet, with the versions used for previous iterations now locked.

For those posting:
• No need to Sign In. The white cells are for your data entry.
• One row for each of your devices.
• Scroll down until you get to the first empty row.
• You do not need to use your Forum name for your ID, but please use the same ID for all your devices.
• Supporting/analysis tabs are at bottom of screen: Error_Msgs, Summary, Thumbnails, Charts, Matrix, Top 10, Map, Resources, Lists, and Comments.
• Use the View/Zoom feature at top to reduce the visible size of the spreadsheet to match your screen. 75% will be the best fit for most people.
• If you choose not to record the name of your country please select “Unspecified” at the bottom of the list.
• If you change the number of nodes (safes) you are running, just edit the existing figure, do not add another line for the same device.

Known Issues

Any reproducible issues that we, or the community, come across will be added to this section:

  1. Although we’ve started making some optimisations, we still expect to see performance issues, particularly around $ safe files put .... operations with larger files/folders. There has been very little optimisation of write times, nor any general analysis of speed so far, with the majority of our efforts concentrated on getting features in place and fully functional before we spend more time looking at it from a performance perspective.

  2. We have removed authd from the stack for normal users right now, instead focussing on Safe CLI. To that end, we have updated the CLI to be able to do all things that previously required authd. This means no GUI right now, mostly. The focus is on network stability and then speed. As we optimise for speed, the team will also focus on authd again, perhaps removing it (our goal) in favour of a simpler approach. So test as much as possible with the CLI for now. It reduces user error, reduces the potential bug surface, and allows total focus on the prize - a fully autonomous decentralised network.

  3. We have discovered a bug with rewards not being paid out and therefore decided to disable rewards so as not to block the testnet release. We will not be fixing rewards or transfer based issues as it seems the DBC work will transform those processes to be simpler and more privacy-aware.

  4. We believe there is a bug in our IGD implementation. This bug means that trying to join the testnet with your node seems to intermittently fail with an IGD error, unless you have a public facing IP address, or have set up port forwarding on your router and launched your node to reflect this.

    If you don’t have a public facing IP address, which the majority of us would not, but you feel that you are advanced enough to set up port forwarding on your router, you can give this a try. We can’t provide instructions for all the different variations of routers out there, but we will put instructions below on how to launch a node when you are port forwarding.

  5. It’s been noticed after v6 launch that large file uploads can hang as transfers get out of sync. A potential fix for this is known and already planned, as per here.

Let’s try it out!

Please read and follow the instructions below carefully.

IMPORTANT - please be aware that this is a testnet and therefore any data added to it will be wiped once we are finished testing. We do not recommend uploading any important or sensitive information as IT WILL be lost.

Installing/Updating to Latest

To avoid any pollution from previous testnet iteration settings and data, we ask that everyone removes their $HOME/.safe/ directory, then installs the CLI, authd and node again.

$ rm -rf $HOME/.safe

Note that Windows users may be required to install specific software before being able to install the CLI using the command below - see full instructions for Windows here.

Now download and install the verified latest CLI binary via our install script with the following command:

$ curl -so- https://sn-api.s3.amazonaws.com/install.sh | bash

This script downloads the correct CLI binary for your OS (Windows, macOS or Linux), installs it in the correct directory, creating that directory if it doesn’t exist, and adds it to your system PATH.

You may need to restart your terminal window at this point for any changes to your system PATH to take effect. You can now confirm whether the CLI is installed and set up correctly. The output of this command should match the cli version from the release post:

$ safe -V
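If you would rather script this check, the following is a minimal sketch. The exact banner printed by safe -V is not shown in this post, so the hypothetical helper version_matches only looks for the expected version string (0.31.0 for sn_cli in this release) somewhere in the output.

```shell
# Hypothetical helper: succeed if a version banner mentions the expected version.
# Usage: version_matches "<banner>" "<expected version>"
version_matches() {
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only query the CLI if it is actually on the PATH.
if command -v safe >/dev/null 2>&1; then
    if version_matches "$(safe -V)" "0.31.0"; then
        echo "CLI version matches this release"
    else
        echo "CLI version differs from 0.31.0; try re-running the install script"
    fi
fi
```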

Next, you should install the latest node with the command below.

$ safe node install

Before proceeding we’ll just make sure we kill any old sn_node processes which have been left running:

$ safe node killall

You should now be equipped with the latest CLI and node.

Note that you can find full installation instructions in our user guide.

Joining the Testnet

As with previous public testnets, we are hosting some Elders and Adults on Digital Ocean to kick off the Network. These act as hardcoded contacts which bootstrap you to the network, therefore you will need a network configuration file to inform your CLI which network to bootstrap to. We store and update these connection details on S3 for you to easily point your configuration to.

If you have followed the instructions above to remove your $HOME/.safe folder before installing everything again, you should have no existing network configurations - you can confirm this with the $ safe networks command.

You can add a profile for the latest testnet that points to our S3 location using $ safe networks add, like so:

$ safe networks add fleming-testnet https://sn-node.s3.eu-west-2.amazonaws.com/config/node_connection_info.config
Network 'fleming-testnet' was added to the list. Connection information is located at 'https://sn-node.s3.eu-west-2.amazonaws.com/config/node_connection_info.config'

Now you need to ensure that you are set to use this newly added fleming-testnet configuration. We can use $ safe networks switch fleming-testnet for this:

$ safe networks switch fleming-testnet
Switching to 'fleming-testnet' network...
Fetching 'fleming-testnet' network connection information from 'https://sn-node.s3.eu-west-2.amazonaws.com/config/node_connection_info.config' ...
Successfully switched to 'fleming-testnet' network in your system!
If you need write access to the 'fleming-testnet' network, you'll need to restart authd (safe auth restart), unlock a Safe and re-authorise the CLI again

(Ignore the “…you’ll need to restart authd…” advice in there)

Now, to get write access to the network, run the following CLI command before uploading files or transferring testcoins:

$ safe keys create --test-coins --for-cli
New SafeKey created: "safe://hyryyyy8ogt4yxmtgqd5yj9kuim911yuchbz77mhcrsdqh6dc5noypi9nqa"
Preloaded with 1000.111 testcoins
Key pair generated:
Public Key = f0347407ae2670f604fd53aaff29026ce06fdeaf8c2586ee786cd8a006d7e276
Secret Key = 7c2839cfda20f3c0f7872372d20a28080c7713688f16d9199c1ec02a7b0d185b
Setting new SafeKey to be used by CLI...

We now have our CLI and node components up-to-date, the latest hardcoded contact details to bootstrap to the public testnet, and we’ve generated some test Safe Network Tokens that we can use later to upload data.

To Add Your Node To The Testnet

Despite the IGD bug described at the beginning of this post, we encourage all users who want to connect a node to first try the usual method, $ safe node join, as follows:

$ safe node join
Creating '/Users/maidsafe/.safe/node/local-node' folder
Storing nodes' generated data at /Users/maidsafe/.safe/node/local-node
Starting a node to join a Safe network...
Launching with node executable from: /Users/maidsafe/.safe/node/sn_node
Node started with hardcoded contacts: <A list of network contacts will be output here>
Launching node...
Node logs are being stored at: /Users/maidsafe/.safe/node/local-node/sn_node_rCURRENT.log
(Note that log files are rotated, and subsequent files will be named sn_node_r[NNNNN].log, with values starting at 00000 and up to 99999.)

BUT we’ve found during in-house testing that this will fail intermittently with an IGD error, when it shouldn’t. We’re working to resolve this now. You can retry joining your node multiple times if it fails unexpectedly for you.

The alternative/workaround is to set up port forwarding on your router, then launch your node with:

$ $HOME/.safe/node/sn_node --public-addr <public ip:port> --local-addr <localnet ip:port>

We’ve had success with this in-house, but it’s not for everyone.
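To make the placeholders in the launch command above more concrete, here is a sketch that fills them in and prints the resulting command for review instead of running it. The addresses are example values only (a documentation-range public IP and a typical LAN address), and looks_like_ip_port and node_launch_cmd are hypothetical helpers; only the sn_node path and the --public-addr and --local-addr flags come from this post.

```shell
# Example values only (assumptions): replace both with your own addresses.
PUBLIC_ADDR="203.0.113.10:12000"
LOCAL_ADDR="192.168.1.50:12000"

# Hypothetical helper: loose check that a value looks like <ipv4>:<port>.
looks_like_ip_port() {
    case "$1" in
        *[!0-9.:]*) return 1 ;;   # reject anything but digits, dots and colons
        *.*.*.*:?*) return 0 ;;   # four dotted parts followed by :port
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the launch command rather than executing it.
node_launch_cmd() {
    if looks_like_ip_port "$1" && looks_like_ip_port "$2"; then
        echo "$HOME/.safe/node/sn_node --public-addr $1 --local-addr $2"
    else
        echo "expected <ip:port> for both addresses" >&2
        return 1
    fi
}

node_launch_cmd "$PUBLIC_ADDR" "$LOCAL_ADDR"
```

The port in the public address should be the one you forwarded on your router, and the local address should be the machine the node runs on.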

We’re working hard in the background to resolve the IGD issue which should allow more people to join first time.

If the above steps are successful for you, your node will now launch and attempt to connect to the public network. Keep in mind that it can only join the testnet if the Network is accepting new nodes at that time (see NOTE below). You can keep an eye on its progress via its logs, which can be found at $HOME/.safe/node/local-node/sn_node_rCURRENT.log.

If there is no space on the testnet for your node to join, your node will automatically try to rejoin until it is successful, or until you kill the sn_node process ($ safe node killall). This is an anti-Sybil-attack feature: new nodes are only accepted onto the testnet when resources are required. You may therefore be attempting to join the Network while your logs tell you that the Network is not accepting new nodes at this time:

The network is not accepting nodes right now. Retrying after 3 minutes

This is expected behaviour, which will happen each time you try to connect until the testnet detects that resources are running low. Your node will automatically retry joining in a loop, so you do not need to attempt to join again manually or via a script. We recommend keeping a watch on your node log file to check whether you have been accepted - $ tail -f $HOME/.safe/node/local-node/sn_node_rCURRENT.log
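As a small complement to tailing the log, the sketch below counts how many times the "not accepting nodes" message quoted above appears in a log file, which gives a rough idea of how long the node has been waiting. The helper name count_join_retries is hypothetical.

```shell
# Hypothetical helper: count how often the node was told the network is full.
# Usage: count_join_retries <log file>
count_join_retries() {
    # grep -c prints the count even when it is 0 (and then exits non-zero),
    # so the exit status is deliberately ignored.
    grep -c "not accepting nodes right now" "$1" || true
}
```

For example, count_join_retries $HOME/.safe/node/local-node/sn_node_rCURRENT.log prints the number of retries recorded in the current log file. At one retry every 3 minutes, 20 retries would be about an hour of waiting.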

You can help to speed up the process of the network needing new resources by adding some data yourself - you don’t need to run a node to upload data! See the Do I need to run a node to participate? section below.

To Connect to the Testnet as a Client

Following the steps above gives your Safe some test Safe Network Tokens to use before you work your way through the CLI commands that perform actions on the network. This means that there is no need to farm first to earn rewards before being able to try operations such as uploading to the testnet. You can then proceed with the various CLI commands to perform operations on the network.

You can even connect to the testnet in read-only mode using the CLI, i.e. once you’ve installed the CLI you can try fetching content uploaded by other users. For example, try downloading the following image and opening it locally afterwards:

$ safe cat safe://hygoyeyx768nkst7qqjjk8khjm67tr1n4otbctcts5j7dten43zae3ga83y > ~/safe-the-planet.png

IMPORTANT - please be aware that this is a testnet and therefore any data added to it will be wiped once we are finished testing. We do not recommend uploading any important or sensitive information as IT WILL be lost.

Do I need to run a node to participate?

No, you can join with just the CLI to experiment with data, tokens, etc. You can follow the instructions in the Joining the Testnet section above, but you don’t need to run $ safe node join - at this point, just go straight to using the CLI as per the User Guide.

Further Information

Where are my node logs?

When you launch your node you should see the location of your log file printed on screen - this will be $HOME/.safe/node/local-node/sn_node_rCURRENT.log. You can tail your logs with a command such as $ tail -f $HOME/.safe/node/local-node/sn_node_rCURRENT.log

Note that as a result of these log changes, the log file you were used to tailing in previous testnet iterations has been renamed to sn_node_rCURRENT.log. You will also notice that the log files now rotate when they hit a capped size limit, i.e. the contents of sn_node_rCURRENT.log are copied over to sn_node_r00000.log when it fills and the current log is clean again, next it will copy to sn_node_r00001.log, and so on. Historic log entries may therefore no longer be in sn_node_rCURRENT.log, so check the rotated files as well.
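Because entries can move into rotated files, a search across all of them is often more useful than looking at the current file alone. The sketch below is one way to do that. search_node_logs is a hypothetical helper, and it relies on the file naming described above (sn_node_r00000.log and so on, plus sn_node_rCURRENT.log).

```shell
# Hypothetical helper: search every node log, rotated and current, for a
# pattern. Matching lines are prefixed with the file they came from.
# Usage: search_node_logs <pattern> [log directory]
search_node_logs() {
    dir="${2:-$HOME/.safe/node/local-node}"
    grep -H -- "$1" "$dir"/sn_node_r*.log
}
```

For example, search_node_logs "Bootstrapping a new node" lists each join attempt recorded in any of the log files in the default log directory.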

Where are my rewards?

As per the Known Issues section above, in the build up to today’s release, we discovered a bug with rewards not being paid out and therefore decided to disable rewards so as not to block the testnet release. We will not be fixing rewards or transfer based issues as it seems the DBC work will transform those processes to be simpler and more privacy-aware.

       ____
      /  \ \
     / /\ \ \
    / / /\ \ \
   / / /__\_\ \
  / / /________\
  \/___________/
39 Likes

First, but got to go to work now :weary: happy testing people.

20 Likes

Yay, pleasant surprise!
Missus, get off the mac so that I can try this out!!

11 Likes

sascha@Knut:~$ time safe files put Safe_Put_Cat/Safe_Nugget.png

Illegal instruction (core dumped)

real 0m11,233s
user 0m0,211s
sys 0m0,078s

5 Likes

First time for this series of testnets my computer and router played together and I have IGD (via uPnP) working first time with no issues. My router even shows the ports being opened and used.

Now waiting to get into a section.

On the 3rd attempt it got a timeout error, but it is retrying in 3 minutes.

ALSO noticed that each try leaves a new port open in the router. The poor router will eventually have too many connections. Not sure how long before they time out.

after the 2nd try there were 2 open ports
after the 3rd try there were 3 open ports
after the 4th try there were 4 open ports.

@joshuef this might be an issue. Can sn_node reuse the same port (if there was no previous error) to avoid overloading home routers with open ports? Especially if people have multiple nodes trying to join.

At 16 attempts to join it’s 16 open ports. This also increases the attack surface for the PC.

18 Likes

My cloud node is behaving much better - waited for over three hours without exiting and then got in and stored 200MB of chunks (see here) :clap:

I thought for a moment I was getting IGD errors but now see these are because I’m also logging INFO messages now. Confirmed by Lionel below, and can be skipped using --skip-igd for nodes which have a fixed public IP (i.e. not behind a router).

tail ~/.safe/node/local-node/sn_node_rCURRENT.log 
[tokio-runtime-worker] ERROR 2021-06-30T11:17:14.278854595+02:00 [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.12.4/src/connections.rs:300] Failed reading from a bi-stream for peer 159.65.27.95:60797 with error: StreamRead(ReadError(ConnectionClosed(LocallyClosed)))
[sn_node] INFO 2021-06-30T11:20:14.291391442+02:00 [src/routing/routing_api/mod.rs:136] cbeab3.. Bootstrapping a new node. 
[sn_node] INFO 2021-06-30T11:20:24.392686203+02:00 [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.12.4/src/endpoint.rs:272] IGD request failed: Could not find the gateway device for IGD - IgdSearch(IoError(Custom { kind: TimedOut, error: "search timed out" }))
[sn_node] INFO 2021-06-30T11:20:24.471130183+02:00 [src/routing/core/bootstrap/join.rs:292] Sending JoinRequest { section_key: PublicKey(021a..354f), resource_proof_response: None } to [(cbeab3(11001011).., 209.97.176.127:55047)] 
[tokio-runtime-worker] ERROR 2021-06-30T11:20:24.620136285+02:00 [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.12.4/src/connections.rs:300] Failed reading from a bi-stream for peer 209.97.176.127:59685 with error: StreamRead(ReadError(ConnectionClosed(ApplicationClosed(ApplicationClose { error_code: 0, reason: b"" }))))
[sn_node] ERROR 2021-06-30T11:23:14.276204284+02:00 [src/node/bin/sn_node.rs:146] Encountered a timeout while trying to join the network. Retrying after 3 minutes.
[sn_node] INFO 2021-06-30T11:26:14.280223043+02:00 [src/routing/routing_api/mod.rs:136] 0b9f5d.. Bootstrapping a new node. 
[sn_node] INFO 2021-06-30T11:26:24.381509434+02:00 [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.12.4/src/endpoint.rs:272] IGD request failed: Could not find the gateway device for IGD - IgdSearch(IoError(Custom { kind: TimedOut, error: "search timed out" }))
[sn_node] INFO 2021-06-30T11:26:24.461194250+02:00 [src/routing/core/bootstrap/join.rs:292] Sending JoinRequest { section_key: PublicKey(13b2..0aba), resource_proof_response: None } to [(0b9f5d(00001011).., 159.65.94.13:57349)] 
[tokio-runtime-worker] ERROR 2021-06-30T11:26:24.604935238+02:00 [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.12.4/src/connections.rs:300] Failed reading from a bi-stream for peer 159.65.94.13:55218 with error: StreamRead(ReadError(ConnectionClosed(ApplicationClosed(ApplicationClose { error_code: 0, reason: b"" }))))

My mobile broadband node is showing an IGD error, which is to be expected I think, so just noting. Here’s the log:

Running safe_network v0.4.0
===========================
[sn_node] INFO 2021-06-30T10:16:54.223668511+01:00 [src/routing/routing_api/mod.rs:136] a7ff3d.. Bootstrapping a new node. 
[sn_node] INFO 2021-06-30T10:17:05.338561985+01:00 [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.12.4/src/endpoint.rs:272] IGD request failed: Could not find the gateway device for IGD - IgdSearch(IoError(Custom { kind: TimedOut, error: "search timed out" }))
[sn_node] INFO 2021-06-30T10:17:06.042551904+01:00 [src/routing/core/bootstrap/join.rs:292] Sending JoinRequest { section_key: PublicKey(0fa9..40e0), resource_proof_response: None } to [(a7ff3d(10100111).., 209.97.176.91:33956)] 
[sn_node] ERROR 2021-06-30T10:18:09.176484864+01:00 [src/routing/core/bootstrap/join.rs:132] Node cannot join the network since it is not externally reachable: 82.132.214.81:19623 
[sn_node] ERROR 2021-06-30T10:18:09.176760533+01:00 [src/node/bin/sn_node.rs:140] Unfortunately we are unable to establish a connection to your machine (82.132.214.81:19623) either through a public IP address, or via IGD on your router. Please ensure that IGD is enabled on your router - if it is and you are still unable to add your node to the testnet, then skip adding a node for this testnet iteration. You can still use the testnet as a client, uploading and downloading content, etc. https://forum.autonomi.community/
11 Likes

Gah, a brutally busy day for me. Looking forward to trying this out this evening. Great job team.

8 Likes

Same, woke at 4:30am to get a head start.
Now I’m running late :tired_face:

8 Likes

I’d assume we can fix that. Cc @lionel

9 Likes

Yes, paging @lionel.faber Nice catch @neo :muscle:

9 Likes

I’m at 24 open ports now. LOL
Dead router by morning unless the router starts pruning the old ones. Or is the sn_node trying to use all of them and thus keeping them open?

Router says I can configure 32 of them, but that may be user defined. Maybe it can handle more

4 Likes

It’s because IGD is attempted by default. It can be skipped by passing --skip-igd to the node bin. Since it’s a cloud node even though IGD failed, it’s still reachable and the node keeps running.

4 Likes

Thanks for flagging this :+1:

5 Likes

We should only flag that fail if all mechanisms fail IMO. So if echo service works and we are contactable then no error.

3 Likes

It isn’t an error (my bad) but an INFO message.

3 Likes

Oooo I’m in! Waited to join for over three hours without exiting and then got in and stored 200MB of chunks. Not much happening for a while now.

I’ve updated vdash (v0.6.3) and it shows both PUTs and GETs so long as you start the node with sufficient logging as follows:

RUST_LOG=info,quinn=off safe node join

To update and start vdash:

cargo install vdash
vdash ~/.safe/node/local-node/sn_node_rCURRENT.log

Zoom the timelines in and out using ‘i’ and ‘o’. To quit press ‘q’.

12 Likes

Sounded like this can be removed from the v7 Todo list already?

1 Like

On an old CPU the node crashes in the same way as the CLI :frowning:
It was expected however.

2 Likes

All libs are being up(?)graded to use the downgraded portable blst as we type. Later we will try and see if we can detect the OS and supply optimised binaries per CPU etc. For now we fall back.

4 Likes

Only 3 hours? Been 4 and still waiting.

Maybe there could be a signed message that the waiting node receives from the section on each try, which could be used to verify it’s been waiting for “X” period of time and thus give it some preference over later nodes.