Offline - Just a quicky (run 1)

ss

3 Likes

I promised you only small uploads :slight_smile:

I got @aatonnomicc’s BBS but it was quite slow

I'll time a 5MB upload
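
Something like this should do it, assuming a file of random data is representative enough (the file name and dd parameters are just my choice here):

$ dd if=/dev/urandom of=5MBtest bs=1M count=5    # create a 5MB file of random data
$ time safe files put 5MBtest                    # time the upload via the safe CLI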

3 Likes

I'm just trying to get a second node online that I have on Oracle Cloud

3 Likes

It's 5 in the morning where I am, so I need to get at least a few hours' sleep. I'll leave my nodes online and see if they are still alive in the morning, and I can pick this up again tomorrow.

4 Likes

getting a couple of timeouts now and a connection loss

WARN 2021-12-19T01:03:34.536832Z [sn/src/routing/core/comm.rs:L280]:
	 ➤ send {recipients=[Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: Some(Connection { id: 93825003924048, remote_address: 64.227.42.158:12000, .. }) }] delivery_group_size=1 wire_msg=WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MessageId(4c29..12d2), msg_kind: NodeAuthMsg(NodeAuth { section_pk: PublicKey(10f6..f568), public_key: PublicKey(ec69..972e), signature: Signature(8d81..7f0a) }), dst_location: Node { name: f6f327(11110110).., section_pk: PublicKey(10f6..f568) } } } }}
	 ➤ during sending, received error Connection(TimedOut)
 WARN 2021-12-19T01:04:34.538086Z [sn/src/routing/core/comm.rs:L280]:
	 ➤ send {recipients=[Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: Some(Connection { id: 93825003924048, remote_address: 64.227.42.158:12000, .. }) }] delivery_group_size=1 wire_msg=WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MessageId(591f..a903), msg_kind: NodeAuthMsg(NodeAuth { section_pk: PublicKey(10f6..f568), public_key: PublicKey(ec69..972e), signature: Signature(3f14..9201) }), dst_location: Node { name: f6f327(11110110).., section_pk: PublicKey(10f6..f568) } } } }}
	 ➤ during sending, received error Connection(TimedOut)
 WARN 2021-12-19T01:05:28.365463Z [sn/src/routing/core/comm.rs:L480]:
	 ➤ bootstrap {bootstrap_nodes=[64.227.35.39:12000, 64.227.37.201:12000, 64.227.37.214:12000, 64.227.37.216:12000, 64.227.37.217:12000, 64.227.37.220:12000, 64.227.41.78:12000, 64.227.42.158:12000, 134.209.20.194:12000, 134.209.182.191:12000, 134.209.186.158:12000, 134.209.186.174:12000, 157.245.36.36:12000, 157.245.46.136:12000, 167.172.56.121:12000, 167.172.58.77:12000, 167.172.60.21:12000, 178.128.45.32:12000, 206.189.18.226:12000, 209.97.187.109:12000]}
	 ➤ handle_incoming_connections 
	 ➤ handle_incoming_messages {connection=Connection { id: 93825000299248, remote_address: 167.172.60.21:12000, .. }}
	 ➤ error on connection with 167.172.60.21:12000: ConnectionLost(TimedOut)
 WARN 2021-12-19T01:05:34.539584Z [sn/src/routing/core/comm.rs:L280]:
	 ➤ send {recipients=[Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: Some(Connection { id: 93825003924048, remote_address: 64.227.42.158:12000, .. }) }] delivery_group_size=1 wire_msg=WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MessageId(77d0..9c21), msg_kind: NodeAuthMsg(NodeAuth { section_pk: PublicKey(10f6..f568), public_key: PublicKey(ec69..972e), signature: Signature(5adc..590e) }), dst_location: Node { name: f6f327(11110110).., section_pk: PublicKey(10f6..f568) } } } }}
	 ➤ during sending, received error Connection(TimedOut)

got lots of chunks still

willie@gagarin:~/.safe/node$ du -h
322M	./root_dir/db/chunks/blobs
322M	./root_dir/db/chunks
4.0K	./root_dir/db/register/blobs
16K	./root_dir/db/register
322M	./root_dir/db
322M	./root_dir
353M	.
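
A rough count of the stored chunks, just counting files under the blobs directory shown above:

$ find ./root_dir/db/chunks/blobs -type f | wc -l    # number of chunk files on disk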
3 Likes

might be down

is name: f6f327(11110110).., addr: 64.227.42.158 one of your nodes?

$ time safe files put 5MBtest 
Error: 
   0: ClientError: Failed to obtain any response
   1: Failed to obtain any response

Location:
   /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/result.rs:1914

Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

real	18m28.687s
user	0m0.131s
sys	0m0.023s
INFO 2021-12-19T00:55:12.840769Z [sn/src/dbs/kv_store/mod.rs:L159]:
    ➤ store 
    ➤ Used space: 14991422504
INFO 2021-12-19T00:55:12.840778Z [sn/src/dbs/kv_store/mod.rs:L160]:
    ➤ store 
    ➤ Max capacity: 1099511627776
INFO 2021-12-19T00:55:12.840780Z [sn/src/dbs/kv_store/mod.rs:L161]:
    ➤ store 
    ➤ Used space ratio: 0.013634619339427445

I see the directory I gave as --root-dir as part of sn_node join has only 7GB in use:

# du -h ./
0	./db/register/blobs
513.0K	./db/register
7.0G	./db/chunks/blobs
7.0G	./db/chunks
7.0G	./db
7.0G	./

Note: I forgot to delete the previous safe node root directory before joining this testnet iteration, so I see older chunks in there, but either way the directory size doesn't add up to 14GB.

Why do the safe node logs think there are ~14 GB in use on the node? Am I misinterpreting some units, or not including other folder paths?
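
For what it's worth, the log values are consistent with each other if they are plain bytes: 14991422504 / 1099511627776 ≈ 0.01363, which matches the reported ratio, and the max capacity is exactly 1 TiB, so "Used space" would be about 14 GiB. One guess on my part: if du is counting allocated blocks rather than file sizes (e.g. sparse files), the two can disagree, which comparing these should show:

$ du -sh ./db/chunks                    # space actually allocated on disk
$ du -sh --apparent-size ./db/chunks    # sum of file sizes, which can be larger for sparse files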

3 Likes

I am seeing the same IP having issues from my node logs as well.

send {recipients=[Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: "<locked>" }]
...
➤ during sending, received error Connection(TimedOut)
4 Likes

Yes, it is. I appear to have lost a few nodes.
As I noted above, this was on very low-spec droplets, 1vcpu-1gb.
They were running at about 60% memory capacity most of the time, but I now see the remaining nodes have spiked to 90%.
That may have been the cause, but it was worth a try.

Will revisit with higher-spec droplets, but I think my build is using more memory than it should, and I'd rather figure out why.

4 Likes

We need to get a proper application into the BGF so there is a reasonable budget for this kind of testing.
Here we think we are falling over because limited budgets mean low-spec initial nodes, when we should be failing because we are pushing the edge and confirming/denying the other day's results.

3 Likes

My current sn_node is using more than the 1GB of RAM on the droplets you used:

pmap <pid> | tail -n 1
mapped: 1847692K

The Alpine container currently has 8GB of RAM allocated to it.
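
Worth noting that pmap's "mapped" total is address space mapped, not memory actually resident, so it can overstate real usage; something like this shows the resident set instead (pid being whichever sn_node process you're looking at):

$ grep VmRSS /proc/<pid>/status    # resident memory of the process
$ ps -o rss=,vsz= -p <pid>         # resident set size and virtual size, in KiB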

5 Likes

Well, yes and no. The reason for low-spec nodes in this run was not budget; it was to compare with the Maidsafe Playground, which was on pretty beefy droplets but did not fare significantly better. I think there is definite value in determining the minimum requirement.

The next will be a step up, and if that gets maxed out, a step up again.

7 Likes

Fair enough.
Can I ask just what number and spec of nodes you launched?

I'm still trying to rejoin but I don't think it's going to happen

 ➤ Aggregating received ApprovalShare from Peer { name: 8ddcee(10001101).., addr: 209.97.187.109:12000, connection: Some(Connection { id: 93824998797328, remote_address: 209.97.187.109:12000, .. }) }
 WARN 2021-12-19T01:49:30.027099Z [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.27.2/src/connection.rs:L492]:
	 ➤ Stopped listener for incoming bi-streams from 206.189.18.226:12000 due to error: Reset
 WARN 2021-12-19T01:49:30.027111Z [sn/src/routing/core/comm.rs:L480]:
	 ➤ send {recipients=[Peer { name: 0c33bd(00001100).., addr: 167.172.60.21:12000, connection: None }, Peer { name: 35b668(00110101).., addr: 64.227.37.220:12000, connection: None }, Peer { name: 7533ef(01110101).., addr: 64.227.37.214:12000, connection: None }, Peer { name: 797d7f(01111001).., addr: 64.227.35.39:12000, connection: Some(Connection { id: 93824995714384, remote_address: 64.227.35.39:12000, .. }) }, Peer { name: 8ddcee(10001101).., addr: 209.97.187.109:12000, connection: None }, Peer { name: b40378(10110100).., addr: 206.189.18.226:12000, connection: None }, Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: None }] delivery_group_size=7 wire_msg=WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MessageId(c258..b51b), msg_kind: NodeAuthMsg(NodeAuth { section_pk: PublicKey(10f6..f568), public_key: PublicKey(2e19..6f28), signature: Signature(55c3..210a) }), dst_location: Section { name: 2e1977(00101110).., section_pk: PublicKey(10f6..f568) } } } }}
	 ➤ handle_incoming_messages {connection=Connection { id: 93824998930096, remote_address: 206.189.18.226:12000, .. }}
	 ➤ error on connection with 206.189.18.226:12000: ConnectionLost(Reset)
 INFO 2021-12-19T01:49:30.151850Z [sn/src/routing/core/bootstrap/join.rs:L236]:
	 ➤ join {network_genesis_key=PublicKey(0f13..1f57) target_section_key=PublicKey(0f13..1f57) recipients=[Peer { name: 8af009(10001010).., addr: 64.227.35.39:12000, connection: Some(Connection { id: 93824995714384, remote_address: 64.227.35.39:12000, .. }) }]}
	 ➤ Aggregating received ApprovalShare from Peer { name: b40378(10110100).., addr: 206.189.18.226:12000, connection: Some(Connection { id: 93824995673584, remote_address: 206.189.18.226:12000, .. }) }
Encountered a timeout while trying to join the network. Retrying after 3 minutes.

Anyhow - we move on; we've learned another way it stops working.
The node I added had plenty of RAM. Will you be able to tell how many nodes joined other than yours?

2 Likes

20 droplets, 1vcpu-1gb-amd, amd being the premium option which runs on faster/newer CPUs and NVMe SSDs. I think I'll bump that to 2GB for the next try and see what happens then.
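
If it helps for the next run, a quick way to eyeball memory across all the droplets from one place (droplets.txt with the IPs and root SSH access are assumptions on my part):

$ for ip in $(cat droplets.txt); do echo "== $ip =="; ssh root@$ip 'free -m | grep Mem'; done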

Logs

Thanks for taking part.

6 Likes

No, it's me who should be thanking you for the opportunity to take part :slight_smile:

Keep your billing info so we can work out an informed figure to ask BGF for …

4 Likes

Some charts from the monitoring of this LXC in particular during the time window while the testnet was live:

PID Start Time: 00:35

Note: I did also run safe files put -r ./uploads/, which contained three 512MB files, and it eventually failed (probably because the network went down while it was running).

I don’t know how much that influenced the memory and cpu of the container along with sn_node itself.

Next time I will carry out some more tests from the CLI on another container & host vs. the host that the safe node itself is running on.

9 Likes

My thanks to all of you :clap: I wasn't able to have a go myself this time, but I always follow what's happening and I think it's very helpful to the project.

12 Likes