MaidSafe Dev Update :safe: 17th May 2016 - TEST 3 - Update (20th May 13:30 BST) Now Complete

I had two segmentation faults from vaults over the last 24 hours. One other vault continued, apparently OK, but that was pushing the limit on CPUs.

Currently I have only been able to log in to one of my old accounts, and not reliably. I can create new ones, but either way the DNS lookup seems weaker today: I could only do a few (<10) before the launcher stalled. Now I get nowhere, so I don’t know whether it is fluctuating between good and bad. There is also some complaint about a refused connection to a bootstrap peer, which I think is new.

So I’ve not been able to check all the URLs from the list above; I got through about half before not being able to do more. The only one I can see that is still alive is http://Token.safenet. I don’t know if that’s an effect of that link being more popular than others or of it having been created later, but well done for surviving.

I wonder about the effect of bringing new vaults online and closing old ones… but if the bootstrap peers are not available now, perhaps there’s not much that can be done?

Hi All,

INFO 14:48:39.537964677 [safe_vault::personas::data_manager data_manager.rs:369] Stats : Client Get requests received 40 ; Data stored - ID 71 - SD 16 - total 20601277 bytes
DEBUG 14:48:45.859070733 [routing::core core.rs:519] Node(37cd..) - RefusedFromRoutingTable
INFO 14:50:29.034559315 [routing::stats stats.rs:127] Stats - Sent 8500 messages in total, 49 uncategorised
INFO 14:50:29.036085892 [routing::stats stats.rs:130] Stats - Direct - NodeIdentify: 64, NewNode: 507, ConnectionUnneeded: 64
INFO 14:50:29.036102986 [routing::stats stats.rs:134] Stats - Requests - Get: 283, Put: 155, Post: 13, Delete: 0, GetNetworkName: 21, ExpectCloseNode: 197, GetCloseGroup: 600, Refresh: 2107, Connect: 151, ConnectionInfo: 1665, GetPublicId: 71, GetPublicIdWithConnectionInfo: 0
INFO 14:50:29.036110508 [routing::stats stats.rs:149] Stats - Responses - GetSuccess: 974, GetFailure: 52, PutSuccess: 109, PutFailure: 7, PostSuccess: 227, PostFailure: 1, DeleteSuccess: 0, DeleteFailure: 0, GetCloseGroup: 905, GetPublicId: 243, GetPublicIdWithConnectionInfo: 0, GetNetworkName: 35
DEBUG 14:52:03.738738857 [routing::core core.rs:1403] Received ConnectionUnneeded from PeerId(4d24..).
INFO 14:52:45.067436297 [safe_vault::personas::data_manager data_manager.rs:283] Stats : Client Get requests received 41 ; Data stored - ID 71 - SD 16 - total 20601277 bytes

I’m assuming this looks about right?

There don’t seem to be too many grumbles or problems…

Rup

Looks just fine. You should see some Routing Table size INFO lines at times as well.

A bit like:

INFO 15:17:02.893778559 [routing::core core.rs:414]  ------------------------------------------------------- 
INFO 15:17:02.893787199 [routing::core core.rs:416] | Node(5105..) PeerId(a666..) - Routing Table size:  54 |
INFO 15:17:02.893791727 [routing::core core.rs:417]  ------------------------------------------------------- 

Rup

Something happened in the last ten minutes where my log filled with messages saying it can’t find other nodes, then it added thirty or so new nodes to the routing table.

EDIT: Just a speculation: Maidsafe might be taking down their droplets, and we’ll get an announcement to that effect.

EDIT1: The reason I say that is the lines like this, from earlier on:

Stats - 45 connections to 42 peers - direct: 11, punched: 34

Compared to now:

Stats - 57 connections to 53 peers - direct: 8, punched: 49

“Punched” connections would be home/office computers, on NAT, i.e., community nodes, while “direct” would be machines in the cloud. The ratio has gone way down, from 1:3 to 1:6, suggesting that cloud machines are disappearing.

EDIT2: Now the ratio has suddenly gone back up:

Stats - 55 connections to 50 peers - direct: 19, punched: 36

So maybe they’re bouncing their network instead of taking it down permanently. This is on my list of numbers to plot.
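
For anyone who wants to track the same number, here is a minimal sketch that pulls the direct and punched counts out of that Stats line and prints the ratio. It assumes the line format is exactly as quoted above; the function name `punched_per_direct` is made up for illustration and is not part of any MaidSafe tool.

```python
import re

# Matches the summary line quoted in the post, e.g.
# "Stats - 45 connections to 42 peers - direct: 11, punched: 34"
STATS_RE = re.compile(
    r"Stats - (\d+) connections to (\d+) peers - direct: (\d+), punched: (\d+)"
)


def punched_per_direct(line):
    """Return punched connections per direct connection, or None if the
    line does not match or there are no direct connections."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    direct, punched = int(match.group(3)), int(match.group(4))
    if direct == 0:
        return None
    return punched / direct


# The three lines quoted in the post above.
samples = [
    "Stats - 45 connections to 42 peers - direct: 11, punched: 34",
    "Stats - 57 connections to 53 peers - direct: 8, punched: 49",
    "Stats - 55 connections to 50 peers - direct: 19, punched: 36",
]

for sample in samples:
    ratio = punched_per_direct(sample)
    print(f"direct:punched = 1:{ratio:.1f}")
# prints 1:3.1, 1:6.1 and 1:1.9 for the three samples
```

Feeding it every matching line from a vault log, together with the timestamp, would give the series to plot.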

I just started a new Vault and couldn’t connect. Seems like the Droplets are offline.

Well, my data has gone. I can’t even get into my account anymore. It seems that the test will be over soon.

What I’ve learnt this test is simply that putting down the firewall and having the router port-forward seems to clear up most of the log clutter: it’s a lot more INFO and almost no DEBUG or WARN. I wondered whether the network was quiet this afternoon, and whether more activity would have produced more information than the many routing table summaries that I saw. I’m also more inclined to the thought that less is more. Having seen segmentation faults on vaults left running for ~5 hours+, I think I pushed the hardware too hard, especially given the churn and grit in the system on this pass. The next test, I expect, will be stable enough to warrant a community testnet on the back of it. This one I think is designed to fail as per the OP, so there is not much more to be learned. Hoping the devs got something useful from it. I’m still unsure what debug information is useful, if any… I don’t know if the sum of logs would show anything more interesting, and I can’t make much sense of the detail that I can see. /walloftext!

I opened 5483 TCP and things improved nicely. All outgoing traffic is allowed; not sure if I need to open anything else up for incoming?

I get a combination of INFO and DEBUG messages with next to zero WARNs, this being the only one I see:

WARN 20:46:15.749587904 [routing::core core.rs:1539] Accepting connection anyway, since node_id_cache is disabled.

Though as it’s ignored, it doesn’t seem to be a big issue.

Does anyone know how long this test will run?

I’m happy to leave the vault running, just interested…

rup

Alternate community net is up now… see the other thread for the IP and network name to put in the crust config file in the safe_vault folder.

Should there not be an official announcement regarding the end of test 3 before moving to a community network?

People will get different experiences from them… decentralise the decision making I say… let the People decide :slight_smile:

All works for me now, except the private file uploading.

Probably expected behavior, but I’m seeing a lot of the following error codes on Win7:

10048 - Address already in use.

Typically, only one usage of each socket address (protocol/IP address/port) is permitted. This error occurs if an application attempts to bind a socket to an IP address/port that has already been used for an existing socket, or a socket that was not closed properly, or one that is still in the process of closing.

10060 - Connection timed out.

A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because the connected host has failed to respond.

10061 - Connection refused.

No connection could be made because the target computer actively refused it. This usually results from trying to connect to a service that is inactive on the foreign host—that is, one with no server application running.

I can connect but have been unable to login or register for two or so hours on testnet3.

Is anyone finding this still working?

Can you give us an update on status @Ross?

I need a functioning Launcher to test with :slight_smile: and would like to avoid building a mock network if possible.

Just tried it.

  • Vault won’t come up. Gets stuck in “Received Event::GetNetworkNameFailed. Restarting Vault”
  • Launcher eventually ends up in “Registration Failed!”
  • Launcher claims it’s “Connected to the SAFE Network OKAY”, which feels like a lie.

Vault appears to be idle, got some statistics from where it paused:

INFO 09:05:03.652143800 [routing::stats stats.rs:127] Stats - Sent 174000 messages in total, 28316 uncategorised

INFO 09:05:03.653143800 [routing::stats stats.rs:130] Stats - Direct - NodeIdentify: 194, NewNode: 759, ConnectionUnneeded: 112

INFO 09:05:03.653143800 [routing::stats stats.rs:134] Stats - Requests - Get: 1290, Put: 1951, Post: 58, Delete: 0, GetNetworkName: 2536, ExpectCloseNode: 32032, GetCloseGroup: 10012, Refresh: 24050, Connect: 1301, ConnectionInfo: 45526, GetPublicId: 543, GetPublicIdWithConnectionInfo: 0

INFO 09:05:03.653143800 [routing::stats stats.rs:149] Stats - Responses - GetSuccess: 1907, GetFailure: 5, PutSuccess: 2080, PutFailure: 45, PostSuccess: 369, PostFailure: 48, DeleteSuccess: 0, DeleteFailure: 0, GetCloseGroup: 17374, GetPublicId: 1803, GetPublicIdWithConnectionInfo:0, GetNetworkName: 1689


Last entries:

WARN 09:15:28.024855900 [routing::core core.rs:1717] Tunnel to PeerId(b63e…) via PeerId(d5ba…) closed.
INFO 09:15:28.024855900 [routing::core core.rs:399] Node(f978…) - Indirect connections: 11, tunneling for: 2
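
To put numbers like these in perspective, a rough sketch such as the one below can turn the "Stats - Responses" line into failure rates per operation. It assumes the line keeps the `Name: count` layout shown above; `parse_responses` and `failure_rates` are names invented for this example, not anything shipped with the vault.

```python
import re

MARKER = "Stats - Responses - "


def parse_responses(line):
    """Return {counter name: value} for a 'Stats - Responses' log line.

    Only the text after the marker is scanned, so the timestamp and the
    'stats.rs:149' source location are not mistaken for counters.
    """
    _, found, tail = line.partition(MARKER)
    if not found:
        return {}
    return {name: int(value)
            for name, value in re.findall(r"([A-Za-z]+):\s*(\d+)", tail)}


def failure_rates(counters):
    """Return failures / (successes + failures) per operation, or None
    when the operation was never answered."""
    rates = {}
    for op in ("Get", "Put", "Post", "Delete"):
        ok = counters.get(op + "Success", 0)
        failed = counters.get(op + "Failure", 0)
        total = ok + failed
        rates[op] = failed / total if total else None
    return rates


# The Responses line from the post above, copied as-is.
line = ("INFO 09:05:03.653143800 [routing::stats stats.rs:149] Stats - Responses - "
        "GetSuccess: 1907, GetFailure: 5, PutSuccess: 2080, PutFailure: 45, "
        "PostSuccess: 369, PostFailure: 48, DeleteSuccess: 0, DeleteFailure: 0, "
        "GetCloseGroup: 17374, GetPublicId: 1803, "
        "GetPublicIdWithConnectionInfo:0, GetNetworkName: 1689")

for op, rate in failure_rates(parse_responses(line)).items():
    print(op, "n/a" if rate is None else f"{rate:.2%}")
```

Whether a given failure rate is a concern is a question for the devs; the sketch only does the arithmetic.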

@Ross do you need the test participants to send their vault logs once Test 3 is finished, or do you have all you need on your side (maybe you already collect them automatically or use other means to get the same info)?

I will speak to the Dev team this morning to get the latest status and post an update here.

I just tried connecting a few Vaults on a few different machines (various OSes) and was successful. In this test phase that is the main area we are focused on, so we will continue running TEST 3 and look at bringing it down later today.

We do not need the full logs, but it would be useful if folks searched through their Node.log for “Insufficient upstream bandwidth” and shared the average number of Dropped messages their node recorded.

The line looks like the one below. For example, in the log I was checking, from a Vault that I had running since Wednesday, there were 16 occurrences of this line and the average number of Dropped messages was 10.

DEBUG 00:06:04.778869455 [crust::tcp_connections tcp_connections.rs:115] Insufficient upstream bandwidth. Dropped 14 messages with priority >= 2.

If your number is particularly high the Devs may want to follow up and get a copy of your full log.
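
For anyone who would rather not count by hand, a small script along these lines should do it. This is only a sketch that assumes the message wording matches the DEBUG line above; the helper names `dropped_summary` and `summarise_file` are made up for the example, and the path to Node.log is whatever it is on your machine.

```python
import re

# Matches e.g. "Insufficient upstream bandwidth. Dropped 14 messages with priority >= 2."
DROPPED_RE = re.compile(r"Insufficient upstream bandwidth\. Dropped (\d+) messages")


def dropped_summary(lines):
    """Return (occurrences, average dropped messages) over the given lines."""
    counts = []
    for line in lines:
        match = DROPPED_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    if not counts:
        return 0, 0.0
    return len(counts), sum(counts) / len(counts)


def summarise_file(path):
    """Run dropped_summary over a log file, e.g. summarise_file("Node.log")."""
    with open(path, errors="replace") as log:
        return dropped_summary(log)


# Demo on the single example line from the post.
example = ("DEBUG 00:06:04.778869455 [crust::tcp_connections tcp_connections.rs:115] "
           "Insufficient upstream bandwidth. Dropped 14 messages with priority >= 2.")
occurrences, average = dropped_summary([example])
print(f"{occurrences} occurrence(s), average {average:.1f} dropped")
```

Calling `summarise_file` on your own log gives the two numbers asked for above: how many times the line appears and the average Dropped count.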

Thanks Ross
