I promised you only small uploads
I got @aatonnomicc's BBS, but it was quite slow.
I'll time a 5MB upload.
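(If anyone wants to repeat this, the upload is just an arbitrary file of that size, so something like the following would do. The dd approach is only a suggestion here, not necessarily how the file was originally made.)
$ dd if=/dev/urandom of=5MBtest bs=1M count=5   # 5 x 1 MiB of random data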
I'm just trying to get a second node online, one that I have on Oracle Cloud.
It's 5 in the morning where I am, so I need to get at least a few hours' sleep. I'll leave my nodes online and see if they are still alive in the morning, and I can pick this up again tomorrow.
Getting a couple of timeouts now and a connection loss:
WARN 2021-12-19T01:03:34.536832Z [sn/src/routing/core/comm.rs:L280]:
➤ send {recipients=[Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: Some(Connection { id: 93825003924048, remote_address: 64.227.42.158:12000, .. }) }] delivery_group_size=1 wire_msg=WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MessageId(4c29..12d2), msg_kind: NodeAuthMsg(NodeAuth { section_pk: PublicKey(10f6..f568), public_key: PublicKey(ec69..972e), signature: Signature(8d81..7f0a) }), dst_location: Node { name: f6f327(11110110).., section_pk: PublicKey(10f6..f568) } } } }}
➤ during sending, received error Connection(TimedOut)
WARN 2021-12-19T01:04:34.538086Z [sn/src/routing/core/comm.rs:L280]:
➤ send {recipients=[Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: Some(Connection { id: 93825003924048, remote_address: 64.227.42.158:12000, .. }) }] delivery_group_size=1 wire_msg=WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MessageId(591f..a903), msg_kind: NodeAuthMsg(NodeAuth { section_pk: PublicKey(10f6..f568), public_key: PublicKey(ec69..972e), signature: Signature(3f14..9201) }), dst_location: Node { name: f6f327(11110110).., section_pk: PublicKey(10f6..f568) } } } }}
➤ during sending, received error Connection(TimedOut)
WARN 2021-12-19T01:05:28.365463Z [sn/src/routing/core/comm.rs:L480]:
➤ bootstrap {bootstrap_nodes=[64.227.35.39:12000, 64.227.37.201:12000, 64.227.37.214:12000, 64.227.37.216:12000, 64.227.37.217:12000, 64.227.37.220:12000, 64.227.41.78:12000, 64.227.42.158:12000, 134.209.20.194:12000, 134.209.182.191:12000, 134.209.186.158:12000, 134.209.186.174:12000, 157.245.36.36:12000, 157.245.46.136:12000, 167.172.56.121:12000, 167.172.58.77:12000, 167.172.60.21:12000, 178.128.45.32:12000, 206.189.18.226:12000, 209.97.187.109:12000]}
➤ handle_incoming_connections
➤ handle_incoming_messages {connection=Connection { id: 93825000299248, remote_address: 167.172.60.21:12000, .. }}
➤ error on connection with 167.172.60.21:12000: ConnectionLost(TimedOut)
WARN 2021-12-19T01:05:34.539584Z [sn/src/routing/core/comm.rs:L280]:
➤ send {recipients=[Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: Some(Connection { id: 93825003924048, remote_address: 64.227.42.158:12000, .. }) }] delivery_group_size=1 wire_msg=WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MessageId(77d0..9c21), msg_kind: NodeAuthMsg(NodeAuth { section_pk: PublicKey(10f6..f568), public_key: PublicKey(ec69..972e), signature: Signature(5adc..590e) }), dst_location: Node { name: f6f327(11110110).., section_pk: PublicKey(10f6..f568) } } } }}
➤ during sending, received error Connection(TimedOut)
Got lots of chunks still:
willie@gagarin:~/.safe/node$ du -h
322M ./root_dir/db/chunks/blobs
322M ./root_dir/db/chunks
4.0K ./root_dir/db/register/blobs
16K ./root_dir/db/register
322M ./root_dir/db
322M ./root_dir
353M .
It might be down. Is name: f6f327(11110110).., addr: 64.227.42.158 one of your nodes?
$ time safe files put 5MBtest
Error:
0: ClientError: Failed to obtain any response
1: Failed to obtain any response
Location:
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/result.rs:1914
Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
real 18m28.687s
user 0m0.131s
sys 0m0.023s
INFO 2021-12-19T00:55:12.840769Z [sn/src/dbs/kv_store/mod.rs:L159]:
➤ store
➤ Used space: 14991422504
INFO 2021-12-19T00:55:12.840778Z [sn/src/dbs/kv_store/mod.rs:L160]:
➤ store
➤ Max capacity: 1099511627776
INFO 2021-12-19T00:55:12.840780Z [sn/src/dbs/kv_store/mod.rs:L161]:
➤ store
➤ Used space ratio: 0.013634619339427445
The directory I gave as --root-dir as part of sn_node join shows only 7GB being used:
# du -h ./
0 ./db/register/blobs
513.0K ./db/register
7.0G ./db/chunks/blobs
7.0G ./db/chunks
7.0G ./db
7.0G ./
Note: I forgot to delete the old safe node root directory before joining this testnet iteration, so I see older chunks in there, but either way the directory size doesn't add up to 14GB.
Why do the safe node logs think there are ~14 GB in use on the node? Am I misinterpreting some units, or not including other folder paths?
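For what it's worth, the logged figures are raw bytes, so the conversion itself checks out; this is just my own arithmetic on the numbers above, not anything extra the node reports:
$ echo 'scale=2; 14991422504/1024^3' | bc          # used space in GiB
$ echo 'scale=4; 14991422504/1099511627776' | bc   # used/max ratio
That comes to roughly 13.96 GiB used against a 1 TiB max capacity, and a ratio of ~0.0136, matching the logged 0.01363. So it doesn't look like a GB/GiB mix-up; the node's counter is simply about double what du reports for the directory.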
I am seeing the same IP having issues in my node logs as well:
send {recipients=[Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: "<locked>" }]
...
➤ during sending, received error Connection(TimedOut)
Yes, it is. I appear to have lost a few nodes.
As I noted above, this was on very low-spec droplets, 1vcpu-1gb.
They were running at about 60% memory capacity most of the time, but I now see the remaining nodes have spiked to 90%.
The low spec may have been the cause, but it was worth a try.
Will revisit with higher-spec droplets, but I think my build is using more memory than it should, and I'd like to figure out why instead.
We need to get a proper application into the BGF so there is a reasonable budget for this kind of testing.
Here we think we are failing because limited budgets mean low-spec initial nodes, when we should be failing because we are pushing the edge and confirming/denying the other day's results.
My current sn_node is using more than the 1GB of RAM on the droplets you used:
pmap <pid> | tail -n 1
mapped: 1847692K
The Alpine container currently has 8GB of RAM allocated to it.
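One caveat on that pmap figure (my own reading of it, not something from the logs): the "mapped" total is virtual address space, not memory actually resident in RAM, so the resident set size is probably the fairer number to compare against a 1GB droplet:
$ ps -o rss=,vsz= -p <pid>   # resident vs virtual size, in KiB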
Well, yes and no. The reason for low-spec nodes in this run was not budget; it was to compare with the Maidsafe Playground, which was on pretty beefy droplets but did not fare significantly better. I think there is definite value in determining the minimum requirement.
The next run will be a step up, and if that gets maxed out, a step up again.
Fair enough.
Can I ask just what number and spec of nodes you launched?
I'm still trying to rejoin, but I don't think it's going to happen.
➤ Aggregating received ApprovalShare from Peer { name: 8ddcee(10001101).., addr: 209.97.187.109:12000, connection: Some(Connection { id: 93824998797328, remote_address: 209.97.187.109:12000, .. }) }
WARN 2021-12-19T01:49:30.027099Z [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.27.2/src/connection.rs:L492]:
➤ Stopped listener for incoming bi-streams from 206.189.18.226:12000 due to error: Reset
WARN 2021-12-19T01:49:30.027111Z [sn/src/routing/core/comm.rs:L480]:
➤ send {recipients=[Peer { name: 0c33bd(00001100).., addr: 167.172.60.21:12000, connection: None }, Peer { name: 35b668(00110101).., addr: 64.227.37.220:12000, connection: None }, Peer { name: 7533ef(01110101).., addr: 64.227.37.214:12000, connection: None }, Peer { name: 797d7f(01111001).., addr: 64.227.35.39:12000, connection: Some(Connection { id: 93824995714384, remote_address: 64.227.35.39:12000, .. }) }, Peer { name: 8ddcee(10001101).., addr: 209.97.187.109:12000, connection: None }, Peer { name: b40378(10110100).., addr: 206.189.18.226:12000, connection: None }, Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: None }] delivery_group_size=7 wire_msg=WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MessageId(c258..b51b), msg_kind: NodeAuthMsg(NodeAuth { section_pk: PublicKey(10f6..f568), public_key: PublicKey(2e19..6f28), signature: Signature(55c3..210a) }), dst_location: Section { name: 2e1977(00101110).., section_pk: PublicKey(10f6..f568) } } } }}
➤ handle_incoming_messages {connection=Connection { id: 93824998930096, remote_address: 206.189.18.226:12000, .. }}
➤ error on connection with 206.189.18.226:12000: ConnectionLost(Reset)
INFO 2021-12-19T01:49:30.151850Z [sn/src/routing/core/bootstrap/join.rs:L236]:
➤ join {network_genesis_key=PublicKey(0f13..1f57) target_section_key=PublicKey(0f13..1f57) recipients=[Peer { name: 8af009(10001010).., addr: 64.227.35.39:12000, connection: Some(Connection { id: 93824995714384, remote_address: 64.227.35.39:12000, .. }) }]}
➤ Aggregating received ApprovalShare from Peer { name: b40378(10110100).., addr: 206.189.18.226:12000, connection: Some(Connection { id: 93824995673584, remote_address: 206.189.18.226:12000, .. }) }
Encountered a timeout while trying to join the network. Retrying after 3 minutes.
Anyhow - we move on, we’ve learned another way it stops working.
The node I added had plenty of RAM. Will you be able to tell how many nodes joined other than yours?
20 droplets, 1vcpu-1gb-amd, amd being the premium option which runs on faster/newer CPUs and NVMe SSDs. I think I'll bump that to 2GB for the next try and see what happens then.
Thanks for taking part.
No, it's me that should be thanking you for the opportunity to take part.
Keep your billing info so we can work out an informed figure to ask BGF for …
Some charts from the monitoring of this lxc in particular, during the time window while the testnet was live:
PID Start Time: 00:35
Note: I did also run safe files put -r ./uploads/, which contained 3 512MB files; it eventually failed (probably because the network went down while it was running).
I don't know how much that influenced the memory and CPU of the container along with sn_node itself.
Next time I will carry out some more tests from the CLI on another container and host, versus the host that the safe node itself is running on.
My thanks to all of you. I wasn't able to have a go myself this time, but I always follow what's happening, and I think it's very helpful to the project.