Small file worked OK here
Gaze upon my handsome features, you lucky people!!!
"willie-holiday.png" 681b0f64e82a186dbc82a7e387a76983d12cedccc60b5a8096654b4a2dc9246f
Don’t shout though, I have lost my ears. Dunno why the hat is not over my eyes…
One of my nodes is charging slightly above average, so if you have 25 coin feel free to go for an upload.
Oooooh - just 0.025 coin needed! More or less a bargain to upload chunks to me!
Edit/PS:
Hmhmmm… it happened right after the record count went from 1478 to 1479.
PPS:
Hmmm - but maybe it’s just a coincidence, when I look at the next record increase and store cost increase there. (Precise timing is not the strength of my monitoring, I guess.)
I got this new (to me) set of errors when trying to upload from one of my VPS boxes:
safe@snawthisyineither:~$ safe files upload -p cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
Logging to directory: "/home/safe/.local/share/safe/client/logs/log_2024-05-11_17-02-01"
safe client built with git version: 16f3484 / stable / 16f3484 / 2024-05-09
Instantiating a SAFE client...
Connecting to the network with 49 peers
🔗 Connected to the Network
Chunking 1 files...
"cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb" will be made public and linkable
Splitting and uploading "cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb" into 6586 chunks
Error:
0: Failed to upload chunk batch: Wallet Error MsgPack deserialisation error:: invalid length 2, expected struct Output with 3 elements.
Location:
/home/runner/work/safe_network/safe_network/sn_cli/src/files/files_uploader.rs:171
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
safe@snawthisyineither:~$ safe wallet balance
Logging to directory: "/home/safe/.local/share/safe/client/logs/log_2024-05-11_17-03-24"
safe client built with git version: 16f3484 / stable / 16f3484 / 2024-05-09
Error:
0: MsgPack deserialisation error:: invalid length 2, expected struct Output with 3 elements
1: invalid length 2, expected struct Output with 3 elements
Location:
sn_cli/src/bin/subcommands/wallet.rs:32
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
safe@snawthisyineither:~$ safe wallet address
Logging to directory: "/home/safe/.local/share/safe/client/logs/log_2024-05-11_17-03-34"
safe client built with git version: 16f3484 / stable / 16f3484 / 2024-05-09
Error:
0: Wallet Error MsgPack deserialisation error:: invalid length 2, expected struct Output with 3 elements.
1: MsgPack deserialisation error:: invalid length 2, expected struct Output with 3 elements
2: invalid length 2, expected struct Output with 3 elements
Location:
sn_cli/src/bin/subcommands/wallet/hot_wallet.rs:131
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
safe@snawthisyineither:~$ safe wallet get-faucet 188.166.171.13:8000
Logging to directory: "/home/safe/.local/share/safe/client/logs/log_2024-05-11_17-03-46"
safe client built with git version: 16f3484 / stable / 16f3484 / 2024-05-09
Instantiating a SAFE client...
Connecting to the network with 49 peers
🔗 Connected to the Network
Error:
0: Wallet Error MsgPack deserialisation error:: invalid length 2, expected struct Output with 3 elements.
1: MsgPack deserialisation error:: invalid length 2, expected struct Output with 3 elements
2: invalid length 2, expected struct Output with 3 elements
Location:
sn_cli/src/bin/subcommands/wallet/helpers.rs:52
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
safe@snawthisyineither:~$
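For what it’s worth, my guess at that error class (an assumption, not confirmed from the safe_network source): serde encodes structs as fixed-length MessagePack arrays by default, so a wallet file written by a build whose Output struct had 2 fields cannot be read back by a build that expects 3, and deserialisation fails with exactly this shape of message. A minimal sketch with rmp-serde, using illustrative struct names rather than the real safe_network types:

```rust
// Cargo.toml (assumed): serde = { version = "1", features = ["derive"] }
//                       rmp-serde = "1"
use serde::{Deserialize, Serialize};

// Hypothetical "old" on-disk shape with 2 fields (illustrative names only).
#[derive(Serialize)]
struct OldOutput {
    unique_pubkey: String,
    amount: u64,
}

// Hypothetical "new" shape that expects 3 fields.
#[derive(Deserialize, Debug)]
struct Output {
    unique_pubkey: String,
    amount: u64,
    reason: u64,
}

fn main() {
    let old = OldOutput { unique_pubkey: "abc".into(), amount: 42 };
    // serde writes structs as fixed-length MessagePack arrays here,
    // so this produces a 2-element array...
    let bytes = rmp_serde::to_vec(&old).unwrap();
    // ...which cannot be read back as the 3-field struct:
    let err = rmp_serde::from_slice::<Output>(&bytes).unwrap_err();
    // Prints something like:
    // "invalid length 2, expected struct Output with 3 elements"
    println!("{err}");
}
```

If that’s what happened here, the wallet on disk predates a format change in the client, and resetting the local wallet directory would likely clear it (at the cost of the test tokens in it).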
I started reading up on LibP2P docs to better understand the layers involved and get more acquainted with the terminology and the processes involved here.
Below is a summary (feel free to revise or correct me if I incorrectly stated any items):
AutoNAT (determine whether a node is public or private based on its peers’ dial attempts/responses against its assumed public address)
AutoRelay (discover & bind to the closest relay nodes on the network)
Circuit Relay (connect to & request reservations with discovered relay nodes)
Circuit Relay (establish a connection via relay to a remote peer node)
DCUTR (a successful simultaneous coordinated dial from nodes A & B results in a successful hole punch; attempts are repeated via the relay until DCUTR takes place properly; see the sketch below)
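For the Rust implementation, these map onto libp2p behaviours composed into one swarm. A rough sketch of the client-side composition, modelled on the upstream rust-libp2p dcutr example rather than on safenode’s actual code (module paths and features vary between libp2p versions):

```rust
// Cargo.toml (assumed):
// libp2p = { version = "0.53", features = ["dcutr", "identify", "relay", "macros"] }
use libp2p::{dcutr, identify, relay, swarm::NetworkBehaviour};

#[derive(NetworkBehaviour)]
struct Behaviour {
    // Circuit Relay client: requests reservations and dials peers via relays.
    relay_client: relay::client::Behaviour,
    // Identify: lets peers exchange observed/external addresses, which the
    // hole-punching coordination depends on.
    identify: identify::Behaviour,
    // DCUTR: upgrades a relayed connection to a direct one via a
    // coordinated simultaneous dial (the actual hole punch).
    dcutr: dcutr::Behaviour,
}
```

Whether safenode wires AutoNAT into this composition yet is exactly the open question below.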
Statement quoted from an IPFS website:
Note: There are situations in which hole punching will not work, most notably when one of the nodes is behind a symmetric NAT. In such cases, nodes can instead explicitly add port mappings, either manually or by using UPnP. As a last resort, nodes can leverage external relay nodes.
It seems from earlier charts that for the last phase of this hole punching, DCUTR, I am seeing only a 1-3% success rate for safenodes with the --home-network flag (i.e. going through the whole Phase 1 & Phase 2 steps for a successful hole punch).
Granted, AutoNAT is not in play yet, and therefore I am not sure if it’s already smart enough to attempt to connect only to public nodes that can act as relay nodes, or is attempting to reach out to both types of nodes (public and private). Is the component ‘AutoRelay’ or equivalent already in play, and does it work without AutoNAT integrated successfully?
For reasons as yet unknown with Group A (--home-network flag), the alternative routes, i.e. manual NAT port forwarding (Group B) and UPnP (Group C), are resulting in far fewer logged errors and overall fewer connection errors thus far.
If a DCUTR is not successful, is communicating via a relay node through ‘Circuit Relay’ from Step 1 of Phase 2 good enough for safenode processes (continuing to use the relay as a proxy, i.e. a bi-directional channel via the relay between nodes A & B), without the final A <=> B direct link?
At that point, is it really a successful hole punch, or is it successful communication maintained by relays only?
It seems being dependent on relay nodes (lacking static NAT port forwarding, unable to do UPnP, and also not getting to the final successful DCUTR stage of the hole punch sequence) would be the last resort - not that it’s a bad option (and it is an important option to have!) - but that is the order of preference according to the IPFS statement.
I wanted to ensure the objective for a given testnet, and the terminology being used, is properly understood by all of us (if possible) in the simplest terms, hence this post.
I know the phases are evolving as the team is continuing to do work in the background for AutoNAT, UPnP, and other integrations/components to give us the smoothest experience possible.
Just chucked up a Mint image, 3GB.
Took two attempts due to the dreaded
Failed to upload chunk batch: The maximum specified repayments were made for the address: ec2881(11101100)
error
🔗 Connected to the Network "linuxmint-21.2-mate-64bit.iso" will be made public and linkable
Splitting and uploading "linuxmint-21.2-mate-64bit.iso" into 2260 chunks
**************************************
* Uploaded Files *
**************************************
"linuxmint-21.2-mate-64bit.iso" d239d34f77f73f4dc60ef3e8a12ffc87ded440dd886536d5aaaea0b2c15e2963
Among 2260 chunks, found 45 already existed in network, uploaded the leftover 2215 chunks in 11 minutes 30 seconds
**************************************
* Payment Details *
**************************************
Made payment of NanoTokens(270446) for 2215 chunks
Made payment of NanoTokens(46628) for royalties fees
New wallet balance: 0.999682926
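The accounting reconciles, for what it’s worth (assuming 1 token = 10^9 nanos and a wallet that started at exactly 1.000000000): 270,446 + 46,628 = 317,074 nanos spent in total, and 1.000000000 - 0.000317074 = 0.999682926, which matches the reported balance. That works out to roughly 122 nanos per chunk, with royalties at about 17% of the storage payment.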
What is the cause of this error?
Pretty sure it’s trying to pay a node that you can’t connect to.
In my case it is impossible to upload such large files without getting the error (maximum specified repayments) even if I reset the wallet.
For example, I deleted the safe directory and started completely from scratch by uploading larger and larger files.
This is the result:
1. File 216KB (4 chunks) -> OK
2. File 3.2MB (8 chunks) -> OK
3. File 28.9MB (57 chunks) -> OK (stuck on a chunk for about three minutes)
4. File 44.2MB (86 chunks) -> OK (stuck for a couple of minutes on two chunks)
5. File 84.1MB (162 chunks) -> Error: maximum specified repayments (1 chunk without uploading)
New attempt to upload the last file -> Error (maximum specified repayments)
After a reset I try to upload the last file. This time I can finish, even though the only pending chunk takes about four minutes.
6. File 149.3MB (285 chunks) -> Error (maximum specified repayments, 5 chunks without uploading)
New attempt -> Error (maximum specified repayments)
New reset and new attempt -> Error (maximum specified repayments). I try with another file.
7. File 289.4MB (553 chunks) -> Error (maximum specified repayments). I try several times and I always get the error, even if I reset the wallet.
It is as if it is impossible to connect correctly to some groups. In the previous testnet the same thing happened to me.
Did you upload from home or from VPS?
If it is from home, do you have open ports?
I’m uploading from home. No open ports for the client - they shouldn’t be needed for the client. I have open ports for running safenodes, but I’m not using them just now because I have the 40 nodes set to use --home-network.
Are you running safenodes as well?
No. Only client.
There’s probably nothing wrong with your setup then if you can get chunks to upload at all. It could be the issue we’ve seen before that some nodes that would have a chunk destined for them can’t take the payment for some reason so those chunks keep failing.
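To make that concrete, here is my rough mental model of the repayment loop as a sketch - not the actual client code; MAX_REPAYMENTS, the function names, and the node labels are all made up for illustration. The idea: pay a candidate node, try to store the chunk there, and after the repayment limit give up with the error seen above.

```rust
// Mental model only - not the actual safe_network client code.
const MAX_REPAYMENTS: usize = 3; // illustrative limit

// Stub: a real client would transfer tokens and send the chunk; this can
// fail if the target node is unreachable or won't take the payment.
fn pay_and_store(_chunk: &[u8], _node: &str) -> bool {
    false
}

fn upload_chunk(chunk: &[u8], candidates: &[&str]) -> Result<(), String> {
    for (attempt, node) in candidates.iter().take(MAX_REPAYMENTS).enumerate() {
        if pay_and_store(chunk, node) {
            return Ok(());
        }
        eprintln!("repayment attempt {} to {node} failed", attempt + 1);
    }
    // After the limit, the client bails with the error seen above.
    Err("The maximum specified repayments were made for the address".into())
}

fn main() {
    let err = upload_chunk(b"...", &["nodeA", "nodeB", "nodeC"]).unwrap_err();
    println!("{err}");
}
```

Under that model, retrying or resetting the wallet doesn’t help if the same unreachable nodes keep being selected for the same chunk addresses, which would fit the pattern reported above.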
Tried running nodes with “home network” and with good old port forwarding. One thing I noted: in the last hour or so, the number of GETs in vdash was rising really quickly on the port-forwarded nodes. Much slower with “home network”.
I started looking at the logs to see if there are any patterns in Peer IDs vs DCUTR success/failure, and the frequency of those successes or failures, for a single safenode service - in this case safenode1 (part of Group A).
Attempt #1:
Attempt #2:
Attempt #3:
Result:
Additional Note:
Note:
Additional Note:
What’s especially interesting in the example above is that the Peer ID above is the same Peer ID that failed to do a DCUTR literally 2 mins prior to a proper success, even though on the failed attempts it had tried 3 times with the whole workflow to get to a successful DCUTR (OK) in the first batch attempt.
Reflecting over the timeline of 18 hours for a single safenode1 service with the --home-network flag:
There were 20,481 DCUTR (ERR) log entries with ‘3 attempts exceed’ messages.
There were only 201 DCUTR (OK) log entries.
Pivoting by Peer IDs across the 20,682 messages of DCUTR attempts (OK vs ERR) - a sketch of the counting follows the list:
Total Unique Peers: 3393
Total Unique Peers with ERR: 3377
Total Unique Peers with at least 1 OK: 104
Total Unique Peers with at least 1 OK and with ERR: 88
Total Unique Peers with at least 1 OK and without ERR: 16
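(For anyone wanting to reproduce this pivot, a rough sketch of the counting is below; the ‘DCUTR (OK)’ / ‘DCUTR (ERR)’ markers and the peer= extraction are assumptions about my own post-processed log lines, not the raw safenode log format.)

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

fn main() {
    // (ok_count, err_count) per peer id.
    let mut peers: HashMap<String, (u32, u32)> = HashMap::new();

    for line in io::stdin().lock().lines().flatten() {
        // Assumed line shape: "... DCUTR (OK) ... peer=12D3KooW..."
        let peer = match line.split("peer=").nth(1) {
            Some(rest) => rest.split_whitespace().next().unwrap_or("").to_string(),
            None => continue,
        };
        let entry = peers.entry(peer).or_default();
        if line.contains("DCUTR (OK)") {
            entry.0 += 1;
        } else if line.contains("DCUTR (ERR)") {
            entry.1 += 1;
        }
    }

    println!("Total unique peers: {}", peers.len());
    println!("  with ERR: {}", peers.values().filter(|(_, e)| *e > 0).count());
    println!("  with at least 1 OK: {}", peers.values().filter(|(o, _)| *o > 0).count());
    println!("  with at least 1 OK and with ERR: {}",
        peers.values().filter(|(o, e)| *o > 0 && *e > 0).count());
    println!("  with at least 1 OK and without ERR: {}",
        peers.values().filter(|(o, e)| *o > 0 && *e == 0).count());
}
```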
I find it interesting that there are a few buckets of different combinations of my safenode1 attempting DCUTR against the same remote Peer IDs, in terms of 1 or more attempts and different outcomes over the 18 hours:
FWIW, DCUTR may not be the primary focus for the team if UPnP is a huge success and external relay communication continues to work even without a high DCUTR success rate (the final phase of a successful hole punch, I believe). All TBD.
cc @joshuef
I found port forwarding has way fewer errors than home-network, on the order of 5-10 times fewer.
These errors may also slow down the rate of successful GETs and PUTs.
So the short answer:
If it’s being flagged, it’s not considered normal.
ConnectionIssue could be any number of things; essentially your node did not respond fast enough to a message (so it could be overwhelmed/CPU-starved, e.g.). Or it could be normal dropped packets.
“What is normal” is something we need to dial in on. So if we’re seeing a lot of people reporting healthy nodes being flagged, we’ll need to be more tolerant. This is an ongoing process. So we’ll have to see what’s happening here and make some proposals and try them out I think. (Suggestions welcome!).
This should be a UTC timestamp, and we’re not being super accurate here, are we? We’re measuring to ~10 mins if I recall?
I suspect we can set that as a default for folks to use, and then the need for home-network should be even less, really.
If you can open a bug report with the details, that’d be awesome.
(It may be related to something @qi_ma is looking at, but I’m not sure.)
I’ll cede to @bzee on this stuff!
Looks like we’re being too intolerant of chunk failure (as part of the larger process) and bailing too early (we can try again later, sort of thing), and that’s more likely to bring down larger uploads, really.
That’s being looked at now.
cc @qi_ma this might well be something we’re seeing with the added autonat/holepunching etc complexities!
Aye, hopefully it’s essentially a last resort (for non-technical peeps).
I think you nailed it here; working out the correct number of nodes a machine can handle is not easy. X nodes will run fine for hours, then the next thing you know the load average is nearly double what the CPU can handle.
It happens in bursts, presumably when the network gets busy, as it will typically affect several nodes at a time.
I have been running enough nodes to be just on the verge of a queue; I suspect these complaints come through when the CPU gets behind.
You are likely right on the money. I don’t get many bad reports, but I am pushing the limits and occasionally going over.
Same results here; trying to get a number is very difficult.
And the way it is will lead to lots of bad nodes on the network. With the node Olympics coming up, everyone is going to be trying to get as many nodes running as they can.
Yeah, I thought about this today; those who run too many will suffer the consequences.
I am going to aim to run at 80% instead of 100%. Hopefully that gets me GOOD instead of BAD reports.
Kid you not, having a log message telling people they have good nodes may just be the incentive we need.
That’s what I’m aiming for, but it’s nearly impossible to get it to stay at 80%.
Would be cool if the safe node manager could terminate or add nodes depending on the available resources - see the sketch below.
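Something like this loop is what I have in mind - a minimal sketch assuming a Linux box, and assuming hypothetical stop-one/start-one subcommands that the real safenode-manager CLI does not necessarily have:

```rust
use std::fs;
use std::process::Command;
use std::thread::sleep;
use std::time::Duration;

fn main() {
    // Budget: keep the 1-minute load average under 80% of the core count.
    let cores = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let target = cores as f64 * 0.8;

    loop {
        // /proc/loadavg looks like "0.52 0.58 0.59 ..." (Linux only).
        let load1: f64 = fs::read_to_string("/proc/loadavg")
            .unwrap_or_default()
            .split_whitespace()
            .next()
            .and_then(|s| s.parse().ok())
            .unwrap_or(0.0);

        if load1 > target {
            // Over budget: drop a node. "stop-one" is a made-up subcommand.
            let _ = Command::new("safenode-manager").arg("stop-one").status();
        } else if load1 < target * 0.75 {
            // Comfortably under budget: room for one more. Also made up.
            let _ = Command::new("safenode-manager").arg("start-one").status();
        }

        sleep(Duration::from_secs(60));
    }
}
```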
Just to be clear, I meant running at 80% of what I consider possible without the spikes.
Just in case someone read that as running at 100% CPU.