I promised you only small uploads
I got @aatonnomicc's BBS, but it was quite slow.
I'll time a 5MB upload.
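(If anyone wants to repeat this, the upload is just an arbitrary file of that size, so something like the following would do. The dd approach is only a suggestion here, not necessarily how the file was originally made.)
$ dd if=/dev/urandom of=5MBtest bs=1M count=5   # 5 x 1 MiB of random data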
I'm just trying to get a second node online, one that I have on Oracle Cloud.
It's 5 in the morning where I am, so I need to get at least a few hours' sleep. I'll leave my nodes online and see if they are still alive in the morning, and I can pick this up again tomorrow.
Getting a couple of timeouts now and a connection loss:
WARN 2021-12-19T01:03:34.536832Z [sn/src/routing/core/comm.rs:L280]:
➤ send {recipients=[Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: Some(Connection { id: 93825003924048, remote_address: 64.227.42.158:12000, .. }) }] delivery_group_size=1 wire_msg=WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MessageId(4c29..12d2), msg_kind: NodeAuthMsg(NodeAuth { section_pk: PublicKey(10f6..f568), public_key: PublicKey(ec69..972e), signature: Signature(8d81..7f0a) }), dst_location: Node { name: f6f327(11110110).., section_pk: PublicKey(10f6..f568) } } } }}
➤ during sending, received error Connection(TimedOut)
WARN 2021-12-19T01:04:34.538086Z [sn/src/routing/core/comm.rs:L280]:
➤ send {recipients=[Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: Some(Connection { id: 93825003924048, remote_address: 64.227.42.158:12000, .. }) }] delivery_group_size=1 wire_msg=WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MessageId(591f..a903), msg_kind: NodeAuthMsg(NodeAuth { section_pk: PublicKey(10f6..f568), public_key: PublicKey(ec69..972e), signature: Signature(3f14..9201) }), dst_location: Node { name: f6f327(11110110).., section_pk: PublicKey(10f6..f568) } } } }}
➤ during sending, received error Connection(TimedOut)
WARN 2021-12-19T01:05:28.365463Z [sn/src/routing/core/comm.rs:L480]:
➤ bootstrap {bootstrap_nodes=[64.227.35.39:12000, 64.227.37.201:12000, 64.227.37.214:12000, 64.227.37.216:12000, 64.227.37.217:12000, 64.227.37.220:12000, 64.227.41.78:12000, 64.227.42.158:12000, 134.209.20.194:12000, 134.209.182.191:12000, 134.209.186.158:12000, 134.209.186.174:12000, 157.245.36.36:12000, 157.245.46.136:12000, 167.172.56.121:12000, 167.172.58.77:12000, 167.172.60.21:12000, 178.128.45.32:12000, 206.189.18.226:12000, 209.97.187.109:12000]}
➤ handle_incoming_connections
➤ handle_incoming_messages {connection=Connection { id: 93825000299248, remote_address: 167.172.60.21:12000, .. }}
➤ error on connection with 167.172.60.21:12000: ConnectionLost(TimedOut)
WARN 2021-12-19T01:05:34.539584Z [sn/src/routing/core/comm.rs:L280]:
➤ send {recipients=[Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: Some(Connection { id: 93825003924048, remote_address: 64.227.42.158:12000, .. }) }] delivery_group_size=1 wire_msg=WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MessageId(77d0..9c21), msg_kind: NodeAuthMsg(NodeAuth { section_pk: PublicKey(10f6..f568), public_key: PublicKey(ec69..972e), signature: Signature(5adc..590e) }), dst_location: Node { name: f6f327(11110110).., section_pk: PublicKey(10f6..f568) } } } }}
➤ during sending, received error Connection(TimedOut)
Got lots of chunks still:
willie@gagarin:~/.safe/node$ du -h
322M ./root_dir/db/chunks/blobs
322M ./root_dir/db/chunks
4.0K ./root_dir/db/register/blobs
16K ./root_dir/db/register
322M ./root_dir/db
322M ./root_dir
353M .
It might be down. Is name: f6f327(11110110).., addr: 64.227.42.158 one of your nodes?
$ time safe files put 5MBtest
Error:
0: ClientError: Failed to obtain any response
1: Failed to obtain any response
Location:
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/result.rs:1914
Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
real 18m28.687s
user 0m0.131s
sys 0m0.023s
INFO 2021-12-19T00:55:12.840769Z [sn/src/dbs/kv_store/mod.rs:L159]:
➤ store
➤ Used space: 14991422504
INFO 2021-12-19T00:55:12.840778Z [sn/src/dbs/kv_store/mod.rs:L160]:
➤ store
➤ Max capacity: 1099511627776
INFO 2021-12-19T00:55:12.840780Z [sn/src/dbs/kv_store/mod.rs:L161]:
➤ store
➤ Used space ratio: 0.013634619339427445
The directory I gave as --root-dir as part of sn_node join shows only 7GB being used:
# du -h ./
0 ./db/register/blobs
513.0K ./db/register
7.0G ./db/chunks/blobs
7.0G ./db/chunks
7.0G ./db
7.0G ./
Note: I forgot to delete the old safe node root directory before joining this testnet iteration, so I see older chunks in there, but either way the directory size doesn't add up to 14GB.
Why do the safe node logs think there are ~14 GB in use on the node? Am I misinterpreting some units, or not including other folder paths?
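For what it's worth, the logged figures are raw bytes, so the conversion itself checks out; this is just my own arithmetic on the numbers above, not anything extra the node reports:
$ echo 'scale=2; 14991422504/1024^3' | bc          # used space in GiB
$ echo 'scale=4; 14991422504/1099511627776' | bc   # used/max ratio
That comes to roughly 13.96 GiB used against a 1 TiB max capacity, and a ratio of ~0.0136, matching the logged 0.01363. So it doesn't look like a GB/GiB mix-up; the node's counter is simply about double what du reports for the directory.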
I am seeing the same IP having issues in my node logs as well:
send {recipients=[Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: "<locked>" }]
...
➤ during sending, received error Connection(TimedOut)
Yes, it is. I appear to have lost a few nodes.
As I noted above, this was on very low-spec droplets, 1vcpu-1gb.
They were running at about 60% memory capacity most of the time, but I now see the remaining nodes have spiked to 90%.
The low spec may have been the cause, but it was worth a try.
Will revisit with higher-spec droplets, but I think my build is using more memory than it should, and I'd like to figure out why instead.
We need to get a proper application into the BGF so there is a reasonable budget for this kind of testing.
Here we think we are failing because limited budgets mean low-spec initial nodes, when we should be failing because we are pushing the edge and confirming/denying the other day's results.
My current sn_node is using more than the 1GB of RAM on the droplets you used:
pmap <pid> | tail -n 1
mapped: 1847692K
The Alpine container currently has 8GB of RAM allocated to it.
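One caveat on that pmap figure (my own reading of it, not something from the logs): the "mapped" total is virtual address space, not memory actually resident in RAM, so the resident set size is probably the fairer number to compare against a 1GB droplet:
$ ps -o rss=,vsz= -p <pid>   # resident vs virtual size, in KiB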
Well, yes and no. The reason for low-spec nodes in this run was not budget; it was to compare with the Maidsafe Playground, which was on pretty beefy droplets but did not fare significantly better. I think there is definite value in determining the minimum requirement.
The next run will be a step up, and if that gets maxed out, a step up again.
Fair enough.
Can I ask just what number and spec of nodes you launched?
I'm still trying to rejoin, but I don't think it's going to happen.
➤ Aggregating received ApprovalShare from Peer { name: 8ddcee(10001101).., addr: 209.97.187.109:12000, connection: Some(Connection { id: 93824998797328, remote_address: 209.97.187.109:12000, .. }) }
WARN 2021-12-19T01:49:30.027099Z [/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/qp2p-0.27.2/src/connection.rs:L492]:
➤ Stopped listener for incoming bi-streams from 206.189.18.226:12000 due to error: Reset
WARN 2021-12-19T01:49:30.027111Z [sn/src/routing/core/comm.rs:L480]:
➤ send {recipients=[Peer { name: 0c33bd(00001100).., addr: 167.172.60.21:12000, connection: None }, Peer { name: 35b668(00110101).., addr: 64.227.37.220:12000, connection: None }, Peer { name: 7533ef(01110101).., addr: 64.227.37.214:12000, connection: None }, Peer { name: 797d7f(01111001).., addr: 64.227.35.39:12000, connection: Some(Connection { id: 93824995714384, remote_address: 64.227.35.39:12000, .. }) }, Peer { name: 8ddcee(10001101).., addr: 209.97.187.109:12000, connection: None }, Peer { name: b40378(10110100).., addr: 206.189.18.226:12000, connection: None }, Peer { name: f6f327(11110110).., addr: 64.227.42.158:12000, connection: None }] delivery_group_size=7 wire_msg=WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MessageId(c258..b51b), msg_kind: NodeAuthMsg(NodeAuth { section_pk: PublicKey(10f6..f568), public_key: PublicKey(2e19..6f28), signature: Signature(55c3..210a) }), dst_location: Section { name: 2e1977(00101110).., section_pk: PublicKey(10f6..f568) } } } }}
➤ handle_incoming_messages {connection=Connection { id: 93824998930096, remote_address: 206.189.18.226:12000, .. }}
➤ error on connection with 206.189.18.226:12000: ConnectionLost(Reset)
INFO 2021-12-19T01:49:30.151850Z [sn/src/routing/core/bootstrap/join.rs:L236]:
➤ join {network_genesis_key=PublicKey(0f13..1f57) target_section_key=PublicKey(0f13..1f57) recipients=[Peer { name: 8af009(10001010).., addr: 64.227.35.39:12000, connection: Some(Connection { id: 93824995714384, remote_address: 64.227.35.39:12000, .. }) }]}
➤ Aggregating received ApprovalShare from Peer { name: b40378(10110100).., addr: 206.189.18.226:12000, connection: Some(Connection { id: 93824995673584, remote_address: 206.189.18.226:12000, .. }) }
Encountered a timeout while trying to join the network. Retrying after 3 minutes.
Anyhow - we move on, we’ve learned another way it stops working.
The node I added had plenty of RAM. Will you be able to tell how many nodes joined other than yours?
20 droplets, 1vcpu-1gb-amd, amd being the premium option which runs on faster/newer CPUs and NVMe SSDs. I think I'll bump that to 2GB for the next try and see what happens then.
Thanks for taking part.
No, it's me that should be thanking you for the opportunity to take part.
Keep your billing info so we can work out an informed figure to ask BGF for …
Some charts from the monitoring of this lxc in particular, during the time window while the testnet was live:
PID Start Time: 00:35
Note: I did also run safe files put -r ./uploads/, which contained 3 512MB files; it eventually failed (probably because the network went down while it was running).
I don't know how much that influenced the memory and CPU of the container along with sn_node itself.
Next time I will carry out some more tests from the CLI on another container and host, versus the host that the safe node itself is running on.
My thanks to all of you. I wasn't able to have a go myself this time, but I always follow what's happening, and I think it's very helpful to the project.