My test of downloading one file and uploading another was successful.
I changed my peer and am now successful at uploading.
Seems like it should be an obvious first thing to check when an upload fails, but it wasn't for me at least; my first thought was that the network was buggered.
My other observation, that memory wasn't returning to baseline after the spike I saw, was also wrong. It did return, it just took a while, though that baseline is still heading up and to the right.
Easier to see here
This is working for me
export SAFE_PEERS="/ip4/167.99.68.228/tcp/45415/p2p/12D3KooWNzKi3djwB9tXLjMm9EdskadS1P6j1E6RtQPtXYvxn7nc"
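With that exported, the upload itself is just the usual command (the file path here is only an example, use whatever you want to store):

safe files upload ./some_file.mp4   # example path; the client picks up SAFE_PEERS from the environment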
No luck after changing over to your lucky peer.
Hmmm, smaller files too? I was just successful with a 34.2 MB mp4:
Successfully stored 'drama.mp4' to 10959f6563e4ee916f1270efb5674e25185ea81704c4efd2fba9000404c45558
But I am not getting a 184.9 MB mp4 to upload after multiple tries:
Failed to store all chunks of file 'bunny.mp4' to all nodes in the close group: Network Error Could not retrieve the record after storing it: a4ac123d7eb729e82da04b8d36085349a73e3befec80f963e6a4447e421381fc
Seems OK here:
Built with git version: 794fca7 / main / 794fca7
Instantiating a SAFE client...
Connected to the Network Loaded wallet from "/home/safenet/.local/share/safe/client/wallet" with balance Token(99995632384)
Preparing (chunking) files at 'upload/big_buck_bunny_720p_stereo.ogg'...
Making payment for 386 Chunks that belong to 1 file/s.
Successfully made payment of 12352 for 386 records. (At a cost per record of Token(32).)
Successfully stored wallet with cached payment proofs, and new balance 99.995620032.
Successfully paid for storage and generated the proofs. They can now be sent to the storage nodes when uploading paid chunks.
Storing file 'big_buck_bunny_720p_stereo.ogg' of 196898674 bytes (386 chunk/s)...
Successfully stored 'big_buck_bunny_720p_stereo.ogg' to 54f3a6dcf2d6f4d0612a37368121ab5218937d2a0c6d643112a2d51f673de72c
Thank you
Specifying the port seems to have worked both from home and on AWS (kinda)
Fairly certain I tried both specifying and not specifying the port the other day and it made no difference, but it certainly seems to have worked today. Or at least I get further…
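For reference, this is roughly how I'm launching the node with an explicit port. It assumes this build's safenode takes a --port flag (worth double-checking with safenode --help), and the peer multiaddr below is just a placeholder:

export SAFE_PEERS="/ip4/<peer-ip>/tcp/<peer-port>/p2p/<peer-id>"   # placeholder, use a known-good peer
safenode --port 12021   # assumed flag; 12021 also needs to be forwarded/open on the router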
On the AWS instance I got this in the logs, though:
[2023-09-02T16:39:06.214344Z TRACE sn_node::replication] replication: close for NetworkAddress::RecordKey( - 5dc6161a5f30c9b293381f231591bb3b53acbc195d477d8ef12208067522d42a) are: [PeerId("12D3KooWN78YfXNVTEzHjFPMmomZnmWDj3mvFEtX1ExGGBF9aR5z"), PeerId("12D3KooWD9DrFY3awTHMowVfVpKjbjx95kscmGdzArqSaWGKbK2s"), PeerId("12D3KooWSfDhVzZczQcq6RzSuDYHC1UDxKibmZEfwu4DjzLh8zCK"), PeerId("12D3KooWHPZpAMQnfWiXK8KJ9ir56SrngyRhCdcMF1bbUXCKsyq6"), PeerId("12D3KooWDU47LeqxD6h7Jwcjd1bCPR6cqK8SSRkZcLYupb5JNv2Y"), PeerId("12D3KooWNLvV8mFzfX9yDznRv5RfHu8rD7Y2yYxKfquXw9cQyP9Y"), PeerId("12D3KooWCX2q3YwYWJV7PBrWsHLTgxdmpxyiMCH2KPnaaGTWeFcA"), PeerId("12D3KooWPouAcAEqH8WVM1eBs48Tmx2MWV3jP5FgFZKnEVHcwc8K"), PeerId("12D3KooWHCCxcgXfW5SSidRxNKMDP8yJvJM6UxjEE935SfnKniiV")]
[2023-09-02T16:39:06.216016Z TRACE sn_node::replication] replication: close for NetworkAddress::RecordKey( - 1cabfe2742e6552edaa397a6f1395c8343d9ea49f5c9287f4998d1a8b47ac0e7) are: [PeerId("12D3KooWN78YfXNVTEzHjFPMmomZnmWDj3mvFEtX1ExGGBF9aR5z"), PeerId("12D3KooWSfDhVzZczQcq6RzSuDYHC1UDxKibmZEfwu4DjzLh8zCK"), PeerId("12D3KooWD9DrFY3awTHMowVfVpKjbjx95kscmGdzArqSaWGKbK2s"), PeerId("12D3KooWHPZpAMQnfWiXK8KJ9ir56SrngyRhCdcMF1bbUXCKsyq6"), PeerId("12D3KooWDU47LeqxD6h7Jwcjd1bCPR6cqK8SSRkZcLYupb5JNv2Y"), PeerId("12D3KooWNLvV8mFzfX9yDznRv5RfHu8rD7Y2yYxKfquXw9cQyP9Y"), PeerId("12D3KooWCX2q3YwYWJV7PBrWsHLTgxdmpxyiMCH2KPnaaGTWeFcA"), PeerId("12D3KooWPouAcAEqH8WVM1eBs48Tmx2MWV3jP5FgFZKnEVHcwc8K"), PeerId("12D3KooWHCCxcgXfW5SSidRxNKMDP8yJvJM6UxjEE935SfnKniiV")]
[2023-09-02T16:39:06.217549Z TRACE sn_node::replication] replication: close for NetworkAddress::RecordKey( - e313e0c51075a47444c1cabfc5c9e1095d180854fbe2eb71e2d96a69f2b5f734) are: [PeerId("12D3KooWHPZpAMQnfWiXK8KJ9ir56SrngyRhCdcMF1bbUXCKsyq6"), PeerId("12D3KooWDU47LeqxD6h7Jwcjd1bCPR6cqK8SSRkZcLYupb5JNv2Y"), PeerId("12D3KooWNLvV8mFzfX9yDznRv5RfHu8rD7Y2yYxKfquXw9cQyP9Y"), PeerId("12D3KooWN78YfXNVTEzHjFPMmomZnmWDj3mvFEtX1ExGGBF9aR5z"), PeerId("12D3KooWD9DrFY3awTHMowVfVpKjbjx95kscmGdzArqSaWGKbK2s"), PeerId("12D3KooWSfDhVzZczQcq6RzSuDYHC1UDxKibmZEfwu4DjzLh8zCK"), PeerId("12D3KooWHqG8AhBjHpW7diQiXGFtA2mYkQCHj3bksSkQifUS5wNi"), PeerId("12D3KooWQErjJaMkcXYm8huBfCGjJYJrmgcj7kn2twM5KLi12Njf"), PeerId("12D3KooWPouAcAEqH8WVM1eBs48Tmx2MWV3jP5FgFZKnEVHcwc8K")]
[2023-09-02T16:39:06.219084Z TRACE sn_node::replication] replication: close for NetworkAddress::RecordKey( - 4e10814db620ec4df34d567906df57e531759d039649e297e3b92bb117a423bb) are: [PeerId("12D3KooWN78YfXNVTEzHjFPMmomZnmWDj3mvFEtX1ExGGBF9aR5z"), PeerId("12D3KooWD9DrFY3awTHMowVfVpKjbjx95kscmGdzArqSaWGKbK2s"), PeerId("12D3KooWSfDhVzZczQcq6RzSuDYHC1UDxKibmZEfwu4DjzLh8zCK"), PeerId("12D3KooWHPZpAMQnfWiXK8KJ9ir56SrngyRhCdcMF1bbUXCKsyq6"), PeerId("12D3KooWDU47LeqxD6h7Jwcjd1bCPR6cqK8SSRkZcLYupb5JNv2Y"), PeerId("12D3KooWNLvV8mFzfX9yDznRv5RfHu8rD7Y2yYxKfquXw9cQyP9Y"), PeerId("12D3KooWCX2q3YwYWJV7PBrWsHLTgxdmpxyiMCH2KPnaaGTWeFcA"), PeerId("12D3KooWPouAcAEqH8WVM1eBs48Tmx2MWV3jP5FgFZKnEVHcwc8K"), PeerId("12D3KooWHCCxcgXfW5SSidRxNKMDP8yJvJM6UxjEE935SfnKniiV")]
[2023-09-02T16:39:06.220642Z TRACE sn_node::replication] replication: close for NetworkAddress::RecordKey( - f7d670cde3638e1f562e023d8f0e993d90e090ea95078bc697238584dbf8200d) are: [PeerId("12D3KooWN78YfXNVTEzHjFPMmomZnmWDj3mvFEtX1ExGGBF9aR5z"), PeerId("12D3KooWSfDhVzZczQcq6RzSuDYHC1UDxKibmZEfwu4DjzLh8zCK"), PeerId("12D3KooWD9DrFY3awTHMowVfVpKjbjx95kscmGdzArqSaWGKbK2s"), PeerId("12D3KooWHPZpAMQnfWiXK8KJ9ir56SrngyRhCdcMF1bbUXCKsyq6"), PeerId("12D3KooWDU47LeqxD6h7Jwcjd1bCPR6cqK8SSRkZcLYupb5JNv2Y"), PeerId("12D3KooWNLvV8mFzfX9yDznRv5RfHu8rD7Y2yYxKfquXw9cQyP9Y"), PeerId("12D3KooWCX2q3YwYWJV7PBrWsHLTgxdmpxyiMCH2KPnaaGTWeFcA"), PeerId("12D3KooWPouAcAEqH8WVM1eBs48Tmx2MWV3jP5FgFZKnEVHcwc8K"), PeerId("12D3KooWHCCxcgXfW5SSidRxNKMDP8yJvJM6UxjEE935SfnKniiV")]
thread 'tokio-runtime-worker' panicked at 'range end index 3 out of range for slice of length 0', sn_protocol/src/storage/header.rs:73:32
which is a shame cos it was happily storing chunks
ubuntu@DialNetNodesouthside01:~$ du -h ~/.local/share/safe/node/record_store/
31M /home/ubuntu/.local/share/safe/node/record_store/
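Since records land as individual files under record_store, a quick count alongside the size gives a rough tally of the chunks held:

ls ~/.local/share/safe/node/record_store/ | wc -l   # number of records currently held by this node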
I’m glad you’re back on the road!
I am also getting the
thread 'tokio-runtime-worker' panicked at 'range end index 3 out of range for slice of length 0'
messages, but they don't seem to be stopping nodes from working and storing records.
The messages don't seem to be going into the logs though! They only appear on the console, which makes it difficult to tie them to any time or event.
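A rough workaround until the panic message makes it into the log files, assuming it is printed on the node's stderr: timestamp the console output yourself and tee it to a file (add your usual flags and env such as SAFE_PEERS before running):

safenode 2>&1 | while IFS= read -r line; do
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $line"
done | tee -a safenode-console.log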
OK I will restart the AWS node.
Bit distracted today, MrsSouthside has issued an edict that my man cave is to be moved upstairs as she WILL have French doors out the back and a proper patio for next spring and my shit will just have to get out of the way. Pleas about network wiring, radiators in the wrong place, general upheaval and the fact that the only French she knows is “Deux verres rouge, s'il vous plaît” so WTF is she going to do with not one but two French doors did not go down well. C’est la vie.
Finally joined in on the fun.
Just reporting as well that I too saw this message at the launch of my safenode pid just a few minutes ago.
The node had received roughly 176 files in the record_store in under 5 minutes. Happy to see files being stored under record_store rather quickly this time around.
I will report back later with an updated dashboard image, but for now I'm going to let the logs collect on disk for a little bit and analyze them afterwards.
Having another go at uploading.
Your lucky peer works fine from another machine, but when I try to upload from my main VPS, which has 20 nodes running, the upload fails.
If any of the team have a minute, here's the log:
safe.zip (8.9 KB)
root@sp1:~/safe# safe --log-output-dest data-dir files upload test.txt
Logging to directory: "/root/.local/share/safe/client/logs"
Built with git version: 794fca7 / main / 794fca7
Instantiating a SAFE client...
🔗 Connected to the Network Loaded wallet from "/root/.local/share/safe/client/wallet" with balance Token(199999987036)
Preparing (chunking) files at 'test.txt'...
Making payment for 1 Chunks that belong to 1 file/s.
Error: Failed to send tokens due to Network Error Could not retrieve the record after storing it: 61f5a0b3072a979aff8b2ef30dead666d464f17fb18cdb4cace1e5149fc10da9.
Last night my Ubuntu laptop, with one node and one big (1.2 GB) upload in progress, dropped the wifi connection with a message:
activation of network connection failed
I turned the wifi off and back on, and managed to reconnect after a while. But during the night the connection dropped again, and this morning I was not able to connect before rebooting. Then it connected fine again.
I have a hunch that this might somehow be related to all this testnet stuff going on on my machine. But it is hard to tell, since I very rarely use this machine outside our testnets.
My guess is that this is a local fault that has only become noticeable because of the node. The fault might also be triggered by continuous use, but I think it is unlikely to be a fault in sn_node. That's all speculation, obviously.
My wild guess is too many ports and too many streams having to be multiplexed by the wifi chipset; it overheated, corrupting its ability to function until a reboot.
QUIC should presumably fix this.
I tried it and 2 other peers to upload buck_bunny, probably 6 times in total; none finished successfully. Everything else, albeit smaller files, did complete.
The odd thing is that @Aragorn did successfully upload buck_bunny (in a different format) while I was trying.
Failed to store all chunks of file 'bunny.mp4' to all nodes in the close group: Network Error Could not retrieve the record after storing it
Mine was mp4 vs Aragorn’s ogg but surely that can’t be the cause of different outcomes.
The two versions are of similar size.
mp4: 184866348 bytes (363 chunk/s)
ogg: 196898674 bytes (386 chunk/s)
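Quick arithmetic backs that up; both average out to roughly half a megabyte per chunk, so the container format isn't producing meaningfully different chunking:

echo $((184866348 / 363))   # ~509273 bytes per chunk for the mp4
echo $((196898674 / 386))   # ~510100 bytes per chunk for the ogg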
WiFi is convenient but I still try to wire most of my connections.
Observations:
- Initial # of peers connected on spin-up (~1520) during initial discovery was almost 3x the steady-state average (522), though it might have been due to the influx of ~175 records being received to disk (I am not fully sure).
- I can't believe we had over 2965 unique Peer IDs discovered already! Possibly up to 50% more nodes were added to the mix by the community on top of MaidSafe's 2000 peer nodes? Impressive!
- Personal observation: for me, an easier interpretation of CPU usage for the safenode pid from the metrics JSON line was to divide it by the number of logical CPUs detected. It showed > 100% for a brief period if you are running on more than 1 CPU, but by default Grafana metrics for % units are set to a 0 to 100% range (see the sketch after this list):
[2023-09-03T05:21:46.090103Z DEBUG sn_logging::metrics] {"physical_cpu_threads":4,"system_cpu_usage_percent":73.694984,"system_total_memory_mb":6895.4355,"system_memory_used_mb":205.68063,"system_memory_usage_percent":2.982852,"process":{"cpu_usage_percent":241.7605,"memory_used_mb":102.350845,"bytes_read":0,"bytes_written":1671168,"total_mb_read":0.004096,"total_mb_written":31.567871}}
- The number of 'Wrote record <record_key_name> to disk!' messages (216) doesn't match the # of files in the record_store (198), whereas 'Chunk with ... validated and stored. Stored Successfully' is at 181. Is this all okay, or how should I be interpreting this? (A rough way to count these is sketched after this list.)
- The messages below were logged as 'ERROR', but was this an error on my node or on the remote peer node? I am guessing it was a peer node error? Am I interpreting it properly?
Note: searching for, say, 5cf3be in the record_store does find a single record with that file name prefix.
[2023-09-03T05:23:13.351388Z ERROR sn_node::api] Error while handling NetworkEvent::ResponseReceived Protocol(ChunkNotStored(5cf3be(01011100)..))
[2023-09-03T05:23:19.828431Z ERROR sn_node::api] Error while handling NetworkEvent::ResponseReceived Protocol(ChunkNotStored(aa5bf3(10101010)..))
[2023-09-03T05:23:32.265645Z ERROR sn_node::api] Error while handling NetworkEvent::ResponseReceived Protocol(ChunkNotStored(7a059c(01111010)..))
[2023-09-03T05:23:36.284023Z ERROR sn_node::api] Error while handling NetworkEvent::ResponseReceived Protocol(ChunkNotStored(0ac6ff(00001010)..))
- New messages that I haven't parsed into the dashboard yet; I still need to review whether they provide additional value or not (they seem to be logging the # of attempts or # of copies received):
[2023-09-03T05:23:13.349044Z INFO sn_networking::event] Getting record f41ff46adf4fd4ca896bfba59653d24470b04e9138fe1c41b6d954f578640022 early completed with 4 copies received
[2023-09-03T05:23:09.886990Z DEBUG sn_networking] Getting record of f41ff46adf4fd4ca896bfba59653d24470b04e9138fe1c41b6d954f578640022 attempts 7/30
...
[2023-09-03T05:23:09.887291Z TRACE sn_networking::record_store] Record not found locally
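For anyone wanting to do the same CPU normalisation, a minimal sketch that divides process.cpu_usage_percent by the physical_cpu_threads count reported in the same JSON line. It assumes jq is installed, and the log path is hypothetical, so point it at wherever your safenode log actually lives:

LOG=~/.local/share/safe/node/logs/safenode.log   # hypothetical path, adjust to your setup
grep 'sn_logging::metrics' "$LOG" \
  | sed 's/.*sn_logging::metrics] //' \
  | jq '.process.cpu_usage_percent / .physical_cpu_threads'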
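And a rough way to reproduce the 'Wrote record' vs 'validated and stored' vs record_store comparison (same assumed log path, record_store path as shown earlier in the thread):

grep -c 'Wrote record' "$LOG"                       # times a record was written to disk
grep -c 'validated and stored' "$LOG"               # times a chunk passed validation and was stored
ls ~/.local/share/safe/node/record_store/ | wc -l   # files currently sitting in the record_store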
Overall, the node seems to be operating well!!
Congrats to the team for getting this build out there, and having it be tested and validated by the community!
Just too many things that can go wrong with wifi as opposed to a lump of Cat5/6. And the things that can go wrong with a wired connection are usually easy to spot/fix/prevent.
Still, if we are heading for all sorts of devices and the general public, it's going to be wifi. If it's only the node, not the network, taking the hit, then it's all OK. (As long as it doesn't kill the motivation of an average Joe.)
I think my system is limping in many ways and I should update to newest Ubuntu at least. It’s throwing some weird errors every now and then. For example the laptop’s integrated keyboard is dead in every other boot or so, and I need to plug in an external one.
I have a bit of a strange one. Unless it’s been encountered before.
I've started 9 nodes from an RPi4 at home. They ran for about 4 mins and got some records each. Then they started getting the messages about NAT:
Error: We have been determined to be behind a NAT. This means we are not reachable externally by other nodes. In the future, the network will implement relays that allow us to still join the network.
Which is fair enough, as I have a horrible double-NAT situation going on thanks to my insistence on not using the garbage equipment sent by my ISP.
However, one of the Safenodes is merrily still running.
I thought the process was completely deterministic? Is it kind of probabilistic? Is it that the node got a lot of records and the network is cool with it because it seems to be doing well?!
Also, the ones that failed seem to have been detected as both Public and Private:
1/safenode.log:[2023-09-03T19:55:47.538667Z DEBUG sn_networking::event] AutoNAT outbound probe: Error { probe_id: ProbeId(1), peer: Some(PeerId("12D3KooWMG4WunwePLXD5wuRqF2GjnWbVreYse2JLayVsPF8LcTX")), error: Response(DialError) }
1/safenode.log:[2023-09-03T19:55:47.538733Z INFO sn_networking::event] AutoNAT status changed: Public("/ip4/xxx.xxx.xxx.xxx/tcp/12021/p2p/12D3KooWPdi7DsGBaKcutnStbNQVb2msUyKqP8WajiYDUpE2zMk6") -> Private
1/safenode.log:[2023-09-03T19:55:47.538893Z WARN sn_node::api] NAT status is determined to be private!
That just seems odd!
As is the fact that the network is fine with one of the nodes but not the other eight.
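A quick way to pull each node's latest AutoNAT verdict, using the same directory layout as the grep output above:

for d in {1..9}; do
  echo "=== node $d ==="
  grep 'AutoNAT status changed' "$d/safenode.log" | tail -n 1
done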
That looks like a bug actually, unless one of them managed to somehow be port forwarded? Seems sus though
I'm glad it's baffling you as well! They should all be forwarded, but it is a double-NAT situation, so I wasn't surprised when they started failing. But this neither-one-thing-nor-the-other result does seem odd.
The setup is:-
BT ADSL > Draytek Vigor 167 (ADSL modem) > Turris Omnia (proper router)
I've set up the Draytek Vigor with a DMZ for the Turris.
The Turris has port forwarding set up for ports 12021 to 12029 to the Pi's IP.
I know the Draytek to Turris DMZ and port forwarding is working because I’m able to get to my VPN server.
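One more check that might be worth doing from a host outside your LAN, since the multiaddrs show the nodes on TCP: a plain TCP connect test against the forwarded range (OpenBSD netcat syntax; the placeholder IP mirrors the masked address in the log above):

PUBLIC_IP=xxx.xxx.xxx.xxx   # your WAN IP
for p in {12021..12029}; do
  nc -vz -w 3 "$PUBLIC_IP" "$p"
done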