Surely it only needs restarting if it falls over?.. which is what I was suggesting above:
while true; do safe_vault; sleep 2; done
Restarting, I wonder, is what saw my vaults disconnect from the network and then stop dead.
But it fairly consistently fails, and I can’t be watching it; it needs some program of operation that gives a baseline of recovery.
Weirdness: an hour ago my vaults were at least trying to connect, then they bombed out almost immediately, and now they try and then bomb out…
willie@leonov:~/projects/maidsafe/community/community1-amd64-latest$ ./safe_vault
INFO 19:40:48.354825041 [safe_vault safe_vault.rs:96]
Running safe_vault v0.9.0
INFO 19:40:50.561084502 [routing::core core.rs:1171] Bootstrapping(PeerId(2455…), 0)(8d5d81…) Connection failed: Proxy node needs a larger routing table to accept clients.
INFO 19:40:51.824185599 [routing::core core.rs:1032] Client(8d5d81…) Running listener.
INFO 19:40:51.883811981 [routing::core core.rs:1555] Client(8d5d81…) Sending GetNodeName request with: PublicId(name: 8d5d81…). This can take a while.
INFO 19:40:51.986922194 [routing::core core.rs:1371] Client(bf3c15…) Added ac0af4… to routing table.
INFO 19:40:51.989781962 [routing::core core.rs:411] ---------------------------------------------------------
INFO 19:40:51.989815750 [routing::core core.rs:413] | Node(bf3c15…) PeerId(06b9…) - Routing Table size: 1 |
INFO 19:40:51.989843921 [routing::core core.rs:414] ---------------------------------------------------------
INFO 19:42:30.389659521 [routing::core core.rs:2340] Node(bf3c15…) Dropped ac0af4… from the routing table.
INFO 19:42:30.389705985 [routing::core core.rs:411] ---------------------------------------------------------
INFO 19:42:30.389715042 [routing::core core.rs:413] | Node(bf3c15…) PeerId(06b9…) - Routing Table size: 0 |
INFO 19:42:30.389724106 [routing::core core.rs:414] ---------------------------------------------------------
WARN 19:42:30.389886608 [safe_vault::vault vault.rs:171] Restarting Vault
willie@leonov:~/projects/maidsafe/community/community1-amd64-latest$ ./safe_vault
INFO 19:43:33.465241550 [safe_vault safe_vault.rs:96]
Running safe_vault v0.9.0
INFO 19:43:35.734560768 [routing::core core.rs:1032] Client(02adf6…) Running listener.
INFO 19:43:35.857563868 [routing::core core.rs:1555] Client(02adf6…) Sending GetNodeName request with: PublicId(name: 02adf6…). This can take a while.
INFO 19:44:35.857465847 [routing::core core.rs:1807] Client(02adf6…) Failed to get GetNodeName response.
WARN 19:44:35.857614587 [safe_vault::vault vault.rs:171] Restarting Vault
INFO 19:44:38.131931967 [routing::core core.rs:1032] Client(fb6798…) Running listener.
INFO 19:44:38.191221070 [routing::core core.rs:1555] Client(fb6798…) Sending GetNodeName request with: PublicId(name: fb6798…). This can take a while.
INFO 19:44:38.236749852 [routing::core core.rs:1371] Client(941cec…) Added f1b4e0… to routing table.
INFO 19:44:38.238031543 [routing::core core.rs:411] ---------------------------------------------------------
INFO 19:44:38.238116380 [routing::core core.rs:413] | Node(941cec…) PeerId(3690…) - Routing Table size: 1 |
INFO 19:44:38.238140699 [routing::core core.rs:414] ---------------------------------------------------------
INFO 19:44:46.856411702 [routing::core core.rs:1371] Node(941cec…) Added 94df6a… to routing table.
INFO 19:44:46.856518520 [routing::core core.rs:411] ---------------------------------------------------------
INFO 19:44:46.856554682 [routing::core core.rs:413] | Node(941cec…) PeerId(3690…) - Routing Table size: 2 |
INFO 19:44:46.856579423 [routing::core core.rs:414] ---------------------------------------------------------
INFO 19:46:30.503030553 [routing::core core.rs:2340] Node(941cec…) Dropped f1b4e0… from the routing table.
INFO 19:46:30.503139274 [routing::core core.rs:411] ---------------------------------------------------------
INFO 19:46:30.503170877 [routing::core core.rs:413] | Node(941cec…) PeerId(3690…) - Routing Table size: 1 |
INFO 19:46:30.503188719 [routing::core core.rs:414] ---------------------------------------------------------
WARN 19:46:30.503364138 [safe_vault::vault vault.rs:171] Restarting Vault
willie@leonov:~/projects/maidsafe/community/community1-amd64-latest$ ./safe_vault
INFO 19:50:30.557848087 [safe_vault safe_vault.rs:96]
Running safe_vault v0.9.0
willie@leonov:~/projects/maidsafe/community/community1-amd64-latest$ ./safe_vault
INFO 20:25:20.674644176 [safe_vault safe_vault.rs:96]
Running safe_vault v0.9.0
INFO 20:25:22.942460243 [routing::core core.rs:1032] Client(9a18be…) Running listener.
INFO 20:25:23.001196356 [routing::core core.rs:1555] Client(9a18be…) Sending GetNodeName request with: PublicId(name: 9a18be…). This can take a while.
INFO 20:25:23.446142475 [routing::core core.rs:1371] Client(837657…) Added 6a0c93… to routing table.
INFO 20:25:23.447839885 [routing::core core.rs:411] ---------------------------------------------------------
INFO 20:25:23.447874349 [routing::core core.rs:413] | Node(837657…) PeerId(efcf…) - Routing Table size: 1 |
INFO 20:25:23.447900466 [routing::core core.rs:414] ---------------------------------------------------------
INFO 20:25:57.752707152 [routing::core core.rs:2340] Node(837657…) Dropped 6a0c93… from the routing table.
INFO 20:25:57.752818949 [routing::core core.rs:411] ---------------------------------------------------------
INFO 20:25:57.752844721 [routing::core core.rs:413] | Node(837657…) PeerId(efcf…) - Routing Table size: 0 |
INFO 20:25:57.752879904 [routing::core core.rs:414] ---------------------------------------------------------
WARN 20:25:57.753123163 [safe_vault::vault vault.rs:171] Restarting Vault
willie@leonov:~/projects/maidsafe/community/community1-amd64-latest$
Apologies for the flood.
If @bluebird is restarting every two minutes, then it’s to be expected it’ll be hit and miss. I don’t understand why the seed node going offline would kill the network though… I would have expected that if it’s a prompt restart then it wouldn’t matter.
If the network, while small, cannot maintain connections beyond the seed, then perhaps that suggests it needs more nodes initially to truly start a stable network. It would be interesting to know whether the seed going offline gives a different kind of error.
The other solution, I guess, would be to have more seed nodes… but surely in theory only one good seed is required. I expect bluebird’s upload bandwidth isn’t so limited that it could be an issue here… a handful of nodes booting surely isn’t a heavy load.
I’m open to ideas. Should I lengthen the restart period?
The other hard-coded contact has been taken offline, so we could use some more.
EDIT: The seed vault (started with --first) is only significant because it is the only hard-coded contact right now. If I don’t use that flag then, as we know from a previous test, it exits. The --first flag guarantees that it sits and listens. More hard-coded contacts (or caching) are required to have a stable network.
AFAIK this is all using master, so expect scenarios that haven’t been network-tested.
There is a limit to the number of joining nodes on any node.
Each node starts as a client type and then tries to promote itself (asks the network to do that).
If many nodes try to connect to a single bootstrap node they will fail and try the next bootstrap node (to prevent connection-flood attacks etc.). Restarting the seed node during the initial start will kill the network, as nodes fall below group size and try to restart, and there is nothing to restart to. You want bootstrap nodes to come back up on the same port they were initially on if they do restart (otherwise you need new config files).
I would caution against making changes to the code; of course, please do play around and do that, but it’s a bit of work to keep up with changes and the devs are changing a lot, fast. So play at your own risk. Hopefully as we progress there will be more help. Just now we are disabling some parts to test others, and these networks are running disabled to an extent.
The best way to start a network would be to start a single seed node with --first. Then start another few (say 10) nodes, all directly connectable. Have all these nodes’ IP:PORT in the bootstrap list of all nodes.
Then you should be OK.
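As a rough sketch of that recipe (the directory layout, timings, and names here are assumptions, not something tested against current master):

#!/bin/sh
# Rough sketch only. Assumes one directory per vault, each with its own config,
# and that every config lists the seed's IP:PORT in hard_coded_contacts.
( cd seed && ./safe_vault --first ) &
sleep 5                              # give the seed a moment to start listening
for i in 1 2 3 4 5 6 7 8 9 10; do
    ( cd "vault$i" && ./safe_vault ) &
    sleep 2
done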
I’m deep in network/firewall reconfiguration right now. When I get that sorted I can give you a static IP for a few days/weeks for another seed vault at the end of half-decent fibre…
But that is unlikely before tomorrow night.
Need to get the main job done right first.
Just to note: There should only be one vault with the --first flag, and its only purpose is to get it started when there are no others.
Perhaps in the config you could list a number of them, each with their own port on your IP… so start up to 10, if your bandwidth can handle that, and then any node that finds a seed full cascades to the next in line until the network has moved beyond the seeds. Something like that?..
What I suggested above as a while loop restarts only on failure… when the process ends, it loops around and restarts it. A fixed restart interval is asking for instability… especially if that is the only seed.
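Something like this slightly fleshed-out version of that loop, for example (the log file name is just an illustration):

#!/bin/sh
# Relaunch safe_vault only when the process ends; no fixed restart timer
# beyond the short pause. The log file name is just an example.
while true; do
    echo "$(date): starting safe_vault" >> vault-restarts.log
    ./safe_vault
    echo "$(date): safe_vault exited with status $?" >> vault-restarts.log
    sleep 2
done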
What distro? For vanilla Debian, here are my notes on making firewall rules persistent:
$ sudo apt install iptables-persistent
That creates /etc/iptables/rules.v4 and rules.v6.
Then create /etc/network/if-pre-up.d/iptables and make it executable.
That file should contain:
#!/bin/sh
iptables-restore < /etc/iptables/rules.v4
That reloads the rules on each reboot.
You can then edit the rules.v4. It’s pretty self-explanatory.
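Collected into one go, for reference (the ACCEPT rule for port 5483 is only a placeholder; substitute whatever ports your vault actually uses):

sudo apt install iptables-persistent        # creates /etc/iptables/rules.v4 and rules.v6
sudo iptables -A INPUT -p tcp --dport 5483 -j ACCEPT     # placeholder: open the vault port
sudo iptables-save | sudo tee /etc/iptables/rules.v4 > /dev/null
sudo tee /etc/network/if-pre-up.d/iptables > /dev/null <<'EOF'
#!/bin/sh
iptables-restore < /etc/iptables/rules.v4
EOF
sudo chmod +x /etc/network/if-pre-up.d/iptables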
The process doesn’t end. It simply becomes non-responsive.
or just busy?.. you did seem to do a restart when it was working. I don’t know if it necessarily declares all the work it’s doing.
You might be right, but if the process is eating CPU and RAM then I’d suspect it’s just maxed out.
@dirvine thanks for your input. That scenario describes how I got the first community1 going, although the client nodes were actually on a LAN and I faked direct connectability by redirecting a range of ports and configuring each client node to listen on a different port. So now you know.
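For illustration, that kind of redirection can be done on the gateway with iptables along these lines (the LAN addresses and ports here are placeholders, not my actual setup):

# Forward one external port per LAN vault, each vault listening on its own
# tcp_acceptor_port, and masquerade the replies. Addresses are placeholders.
sudo iptables -t nat -A PREROUTING -p tcp --dport 5483 -j DNAT --to-destination 192.168.1.101:5483
sudo iptables -t nat -A PREROUTING -p tcp --dport 5484 -j DNAT --to-destination 192.168.1.102:5484
sudo iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -j MASQUERADE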
It probably wasn’t doing anything. What is there to do on a network of fewer than ten vaults? When we have another hard-coded contact I’ll go back to non-cycling.
Bug finding… one odd error has already been found, above.
If there’s a place for finding issues, it’s at the very limit of what is possible.
Build it and they will come… once there is a working something, more people will be inclined to try it in the absence of anything else. The world without any SAFE network doesn’t feel right.
I was responding to your statement that it was busy. I meant: what is there for the seed vault to do that would make it so busy as to be unresponsive, on such a small network?
I know the small network itself is a very valuable resource, which is why I’m doing this.
So, for fun, I just tried to boot a network myself, to the point where the launcher no longer says the proxy needs a larger routing table… and in frustration ended up starting 16!.. all because I didn’t think to update the launcher config too! Given the surprisingly low CPU usage, I’m letting them all run, expecting I might pare them down if they don’t just fall down under overload.
I’ve added my IP to the config, then; if you’re still having trouble, it’ll be interesting to see whether my contribution helps.
{ "hard_coded_contacts": [ "91.121.173.204:5483", "185.16.37.156:5483", "109.147.231.101:5483" ], "tcp_acceptor_port": 5496, "service_discovery_port": null, "bootstrap_cache_name": null, "network_name": "community1" }
Do you have a different port for each instance?.. Initially mine fell down because I had just copied the folder without updating the tcp_acceptor_port in the vault config.
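Something along these lines gives each copy its own port (the directory names and the config file name are guesses; adjust them to whatever your vault actually reads):

# Illustrative only: run several vaults from copies of the same release folder,
# giving each copy its own tcp_acceptor_port.
for i in 1 2 3 4 5; do
    cp -r community1-amd64-latest "vault$i"
    sed -i "s/\"tcp_acceptor_port\": 5496/\"tcp_acceptor_port\": $((5496 + i))/" \
        "vault$i"/safe_vault.crust.config
done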
On this iteration of community1 (i.e., this weekend, and last weekend as well) my vaults all have exactly the same config file.
I added your IP (on port 5483) to the config of the seed vault and my local vaults.
I am running five local vaults. Two have tables of 17-19, one has a table of 2, and the other two are saying it will take a while.
I have also turned off the restarts.