MaidSafe Dev Update - March 2, 2017 - Test 12c

David, would any logs from people’s home vaults be of use to the team?

At the moment we are OK. If there were issues we would try to collect some logs, but it all looks good for now.

16 Likes

I had an idea for a simple monotonic (i.e. not absolute) network-wide “clock.” The current value of this clock would be sent as one or two extra header fields within the regular SAFE network packets. Nodes would collect the opinions of their close nodes, but they would only distribute what their disjoint section as a whole thinks is the correct time. Basically, each disjoint section would aggregate the time information received and filtered by its members, and increase the value once it’s plausible that most of the network is already in sync. There would never be a 100% correct value, but we could always be almost sure that nobody on the network thinks the current value is much higher or much lower than what we ourselves think it is.
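For anyone who wants to play with the idea, here is a minimal Python sketch of how one section might maintain such a value. Everything in it is an assumption made for illustration: the `SectionClock` name, the use of a median to filter outliers, and the rule "tick by one when the median of observed values equals our own" are not taken from any SAFE code or from the simulator mentioned in this post.

```python
from statistics import median_low


class SectionClock:
    """Hypothetical per-section tick counter that never moves backwards."""

    def __init__(self, start=0):
        self.value = start

    def update(self, observed):
        """Fold in the clock values that section members saw in packet headers."""
        if not observed:
            return self.value
        # The median ignores a minority of wildly wrong (or malicious) values.
        agreed = median_low(observed)
        if agreed > self.value:
            # Most peers are ahead of us: catch up.
            self.value = agreed
        elif agreed == self.value:
            # Most peers appear to be in sync with us: advance by one tick.
            self.value += 1
        # If agreed < self.value we wait for the others; we never decrease.
        return self.value


clock = SectionClock(start=5)
clock.update([5, 5, 5, 4, 100])  # peers in sync, one outlier: tick
clock.update([5, 5, 6, 5, 4])    # peers still behind: hold
clock.update([9, 9, 8, 9, 0])    # peers ahead: catch up
```

The median is only one possible filter; the point of the sketch is that the value is monotonic and that a single outlier cannot drag it far in either direction.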

There’s a very badly explained version of this idea somewhere on this forum, but I’m not sure it’s worth reading. Anyway, I started writing a simulator for it on my day off a week or two ago, but I never got to finish it. But it’s coming.

3 Likes

hiya,

late joiner… been travelling so not able to join the fun til today…

running up 0.13.1 now, anything to watch out for?

should appear as normal on 185.16.37.149

cheers

rup

wow…

bit of a struggle to join network, but the vault manfully fought the good fight and is now up and running…

Running safe_vault v0.13.1
==========================
I 17-03-07 12:01:38.018787 Node(f131bf..()) Requesting a relocated name from the network. This can take a while.
I 17-03-07 12:07:38.018438 Node(f131bf..()) Failed to get relocated name from the network, so restarting.
W 17-03-07 12:07:38.018520 Restarting Vault
I 17-03-07 12:07:41.324076 Node(8f2c95..()) Requesting a relocated name from the network. This can take a while.
I 17-03-07 12:13:41.324210 Node(8f2c95..()) Failed to get relocated name from the network, so restarting.
W 17-03-07 12:13:41.324378 Restarting Vault
I 17-03-07 12:13:44.681683 Node(245a17..()) Requesting a relocated name from the network. This can take a while.
I 17-03-07 12:19:44.681757 Node(245a17..()) Failed to get relocated name from the network, so restarting.
W 17-03-07 12:19:44.681951 Restarting Vault
I 17-03-07 12:19:47.965235 Node(559f7e..()) Requesting a relocated name from the network. This can take a while.
I 17-03-07 12:25:47.965360 Node(559f7e..()) Failed to get relocated name from the network, so restarting.
W 17-03-07 12:25:47.965644 Restarting Vault
I 17-03-07 12:25:51.312384 Node(303b40..()) Requesting a relocated name from the network. This can take a while.
I 17-03-07 12:31:51.312549 Node(303b40..()) Failed to get relocated name from the network, so restarting.
W 17-03-07 12:31:51.313049 Restarting Vault
I 17-03-07 12:31:54.590790 Node(a0d5eb..()) Requesting a relocated name from the network. This can take a while.
I 17-03-07 12:37:54.590919 Node(a0d5eb..()) Failed to get relocated name from the network, so restarting.
W 17-03-07 12:37:54.591130 Restarting Vault
I 17-03-07 12:37:57.875846 Node(90cff4..()) Requesting a relocated name from the network. This can take a while.
I 17-03-07 12:43:57.875802 Node(90cff4..()) Failed to get relocated name from the network, so restarting.
W 17-03-07 12:43:57.876066 Restarting Vault
I 17-03-07 12:44:01.287947 Node(5a9b77..()) Requesting a relocated name from the network. This can take a while.
I 17-03-07 12:44:01.853548 Node(d20973..()) Received relocated name. Establishing connections to 15 peers.
I 17-03-07 12:44:02.021376 Node(d20973..()) Starting approval process to test this node's resources. This will take at least 300 seconds.
I 17-03-07 12:44:31.854333 Node(d20973..()) 1/15 resource proof response(s) complete, 59% of data sent. 380/410 seconds remaining.
I 17-03-07 12:45:01.854387 Node(d20973..()) 14/15 resource proof response(s) complete, 95% of data sent. 350/410 seconds remaining.
I 17-03-07 12:45:31.854577 Node(d20973..()) 14/15 resource proof response(s) complete, 96% of data sent. 320/410 seconds remaining.
I 17-03-07 12:46:01.854740 Node(d20973..()) 14/15 resource proof response(s) complete, 98% of data sent. 290/410 seconds remaining.
I 17-03-07 12:46:31.854880 Node(d20973..()) 14/15 resource proof response(s) complete, 99% of data sent. 260/410 seconds remaining.
I 17-03-07 12:47:01.855026 Node(d20973..()) All 15 resource proof responses fully sent. 230/410 seconds remaining.
I 17-03-07 12:47:31.855247 Node(d20973..()) All 15 resource proof responses fully sent. 200/410 seconds remaining.
I 17-03-07 12:48:01.855345 Node(d20973..()) All 15 resource proof responses fully sent. 170/410 seconds remaining.
I 17-03-07 12:48:31.855472 Node(d20973..()) All 15 resource proof responses fully sent. 140/410 seconds remaining.
I 17-03-07 12:49:01.901910 Node(d20973..()) All 15 resource proof responses fully sent. 110/410 seconds remaining.
I 17-03-07 12:49:01.929652 Managing 1 client accounts.
I 17-03-07 12:49:01.981794 Managing 2 client accounts.
I 17-03-07 12:49:01.993868 Managing 3 client accounts.
I 17-03-07 12:49:02.031005 Node(d20973..(110)) Resource proof challenges completed. This node has been approved to join the network!
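If you want to put numbers on a log like the one above, the following Python sketch counts the restarts and measures the time from the first log line to approval. It assumes the line layout seen in this log (level letter, `yy-mm-dd` date, time with microseconds, message); the function name `summarise_join` and the matched phrases are choices made for this example, not part of safe_vault.

```python
import re
from datetime import datetime

# Level letter, timestamp, then the message text.
LINE = re.compile(r"^([A-Z]) (\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (.*)$")


def summarise_join(lines):
    """Return (restart_count, time_until_approval or None)."""
    first = approved = None
    restarts = 0
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # banner lines such as "Running safe_vault ..."
        stamp = datetime.strptime(match.group(2), "%y-%m-%d %H:%M:%S.%f")
        text = match.group(3)
        if first is None:
            first = stamp
        if "Restarting Vault" in text:
            restarts += 1
        if "approved to join the network" in text:
            approved = stamp
            break
    waited = approved - first if approved is not None else None
    return restarts, waited


# A few lines copied from the log above.
sample = [
    "Running safe_vault v0.13.1",
    "I 17-03-07 12:01:38.018787 Node(f131bf..()) Requesting a relocated name from the network. This can take a while.",
    "W 17-03-07 12:07:38.018520 Restarting Vault",
    "I 17-03-07 12:49:02.031005 Node(d20973..(110)) Resource proof challenges completed. This node has been approved to join the network!",
]
restarts, waited = summarise_join(sample)
print(restarts, waited)
```

Feeding it the whole log file instead of the short sample gives the full restart count.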

phew!!

rup

8 Likes

+1 :+1: to this!
No more starting and stopping countless vaults.
Modeling human behavior :wink:

2 Likes

Got a bunch of warnings a few hours ago, anything to worry about?

W 17-03-07 12:05:12.984201 Node(48b987..(010)) Not enough signatures in SignedMessage { content: RoutingMessage { src: Section(name: d00000..), dst: ManagedNode(name: 2f2743..), content: RoutingTableResponse { Prefix(1110), {PublicId(name: e42946..), PublicId(name: ee50ad..), PublicId(name: e6d742..), PublicId(name: e254e3..), PublicId(name: e656ac..), PublicId(name: ed6073..), PublicId(name: ec3b9f..), PublicId(name: ee3e2b..), PublicId(name: e17ac6..), PublicId(name: e47727..), PublicId(name: e084d4..)}, MessageId(494b15..) } }, sending nodes: [SectionList { prefix: Prefix(110), pub_ids: {PublicId(name: d39990..)} }], signatures: [PublicId(name: d39990..)] }.
W 17-03-07 12:05:12.986690 Node(48b987..(010)) Not enough signatures in SignedMessage { content: RoutingMessage { src: Section(name: d00000..), dst: ManagedNode(name: 2f2743..), content: RoutingTableResponse { Prefix(1111), {PublicId(name: f02451..), PublicId(name: f43400..), PublicId(name: f826b2..), PublicId(name: f4a53a..), PublicId(name: fa4d6e..), PublicId(name: fb3cec..), PublicId(name: fad72a..)}, MessageId(494b15..) } }, sending nodes: [SectionList { prefix: Prefix(110), pub_ids: {PublicId(name: d39990..)} }], signatures: [PublicId(name: d39990..)] }.
W 17-03-07 12:06:51.731277 Node(48b987..(010)) Not enough signatures in SignedMessage { content: RoutingMessage { src: Section(name: d00000..), dst: ManagedNode(name: 2f2743..), content: RoutingTableResponse { Prefix(010), {PublicId(name: 47e036..), PublicId(name: 430aee..), PublicId(name: 48b987..), PublicId(name: 577fb4..), PublicId(name: 45cc45..), PublicId(name: 54be8c..), PublicId(name: 469cf5..), PublicId(name: 488806..), PublicId(name: 50e368..), PublicId(name: 494625..), PublicId(name: 541e4c..)}, MessageId(8ccbca..) } }, sending nodes: [SectionList { prefix: Prefix(110), pub_ids: {PublicId(name: d39990..)} }], signatures: [PublicId(name: d39990..)] }.
1 Like

I had the same issue on March 3, a few hours after I started my vault. I believe it was caused by some network hiccup on my server, because for a period of time I was not even able to ssh into it (even though websites that I host there were still being served). Once I was finally able to ssh into the server, I noticed error messages like yours. I eventually restarted the vault, and all has gone perfectly since then.

You can perhaps post your log for somebody to examine. You can do this from the bash command line with https://transfer.sh/, which is what I did. Perhaps between both of our logs, the team may be able to learn something.

To answer your question, I do not see anything bad happening on my vault. It’s happy and showing an estimated network size of 160.

2 Likes

My vault recovered from this by itself - I didn’t restart and haven’t seen these warnings for over 6h now.

6 Likes

I hate to put a curse on the good ship 12c but it’s been amazingly solid so far. I have one vault running on Ubuntu in AWS and one on Windows at home which I take offline periodically and there’s not been a murmur of complaint from either. I wonder when it will be time to dial back the entrance criteria? Do we need to wait for datachains or will there be an interim stage for testing with slower nodes?

7 Likes

@mav waiting patiently :wink:… no pressure!

4 Likes

I suddenly have an estimated network size of 76. How can that be if MaidSafe have 100 droplets?

2 Likes

My vault agrees. My estimated network size is now 74.

I just quickly went from “managing 10 client accounts” to “managing 18 client accounts”. Hey, I’m moving up in the world!

4 Likes

steady drop from 150 in a short time.

W 17-03-08 01:22:58.583262 Node(6e2e86..(011)) Not enough signatures in SignedMessage { content: RoutingMessage { src: NaeManager(name: f4870e..), dst: Client { client_name: 4cef4d.., proxy_node_name: 541e4c.., peer_id: PeerId(74a2f837..) }, content: UserMessagePart { 1/1, priority: 3, cacheable: false, d59521e78c949a6 } }, sending nodes: [SectionList { prefix: Prefix(1110), pub_ids: {PublicId(name: e6d742..), PublicId(name: e254e3..), PublicId(name: ed6073..)} }], signatures: [PublicId(name: e6d742..), PublicId(name: e254e3..)] }.
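These warnings are hard to read by eye, so here is a small Python sketch that pulls out which nodes are listed under "sending nodes" and which of them actually signed. It is only a log-reading helper: it assumes the text layout seen in the warnings in this thread, the `signature_summary` name is made up for the example, and it does not try to reproduce whatever quorum rule routing applies.

```python
import re

# Captures the abbreviated hex name inside "PublicId(name: ...)".
NAME = re.compile(r"PublicId\(name: ([0-9a-f]+)")


def signature_summary(warning):
    """Return (listed, signed, listed_but_unsigned) as sets of short names."""
    # Skip the message content; only the tail lists senders and signatures.
    tail = warning.split("sending nodes:", 1)[1]
    listed_part, signed_part = tail.split("signatures:", 1)
    listed = set(NAME.findall(listed_part))
    signed = set(NAME.findall(signed_part))
    return listed, signed, listed - signed


# Tail of the warning above; the message content is left out for brevity.
warning = (
    "Not enough signatures in SignedMessage { content: RoutingMessage { }, "
    "sending nodes: [SectionList { prefix: Prefix(1110), pub_ids: "
    "{PublicId(name: e6d742..), PublicId(name: e254e3..), PublicId(name: ed6073..)} }], "
    "signatures: [PublicId(name: e6d742..), PublicId(name: e254e3..)] }."
)
listed, signed, missing = signature_summary(warning)
print(sorted(missing))
```

For the warning above, the listed node that has not signed is `ed6073`.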

There is a bug in crust. It is not serious; we can see it and have been tracking it this afternoon. Instead of logging a particular error (too many timers), it disconnects the node. We can sort out the symptom easily enough, but we are trying to find out why too many timers are established before committing to a fix. At the moment this is causing connection loss, but we are leaving it as is for now to analyse the network under this loss. It would be a weird attack to be able to do this, but it’s good info.

So not a big worry, however we do need to patch it.

18 Likes

Are we kind of at a place now where we always have a community net going?

Looks like all the TEST 12’s so far have been allowing people to successfully run p2p networks on their own now.

Have we kind of passed the threshold where there is now always a somewhat stable p2p SAFE net running? Would be very cool!

I doubt it will be worth it; this bug really needs to be cleared. I do not think it will take long to patch, so it may be worth letting us do that and restart the testnet.

14 Likes

Fix and explanation of the crust error that we have suffered. Curiously, if the network had been more active the bug would have remained latent. A good demonstration that live tests are fundamental.

18 Likes

Is there a measure for data retention, i.e. 12c vs pre-12 series vaults from home?

Can’t wait to get home tonight and have a muck around.

2 Likes