We found a couple of issues (this is what tests are for). The ongoing routing refactor is the probable cause: we have seen requests that get no reply, which points to routing or at least the network. We will now confirm this specifically. To resolve it, we will immediately do the following:
1. Stop the current TEST7 network for now.
2. Backport routing to the latest stable version plus a few cherry-picked fixes.
3. Randomise the bootstrapping of clients. All nodes were bootstrapping from only the first few nodes, which was a bug, so the clients were basically DDoSing the seed nodes.
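To illustrate step 3, here is a minimal sketch of randomised bootstrap selection. It is not the actual crust code: the function name `pick_bootstrap_contacts`, the `SEED_NODES` list and its addresses are all hypothetical, and it only shows the general idea of choosing contacts at random rather than always taking the first few.

```python
import random

# Hypothetical seed list, for illustration only (not real network addresses).
SEED_NODES = [f"seed-{i}.example.net" for i in range(10)]


def pick_bootstrap_contacts(contacts, count, rng=None):
    """Return up to `count` distinct contacts chosen uniformly at random.

    Always taking the first `count` entries would send every client to the
    same few nodes; sampling spreads the load across the whole list.
    """
    if rng is None:
        rng = random.Random()
    count = min(count, len(contacts))
    return rng.sample(contacts, count)


if __name__ == "__main__":
    # Each client would get its own random subset of the seed list.
    print(pick_bootstrap_contacts(SEED_NODES, 3))
```

With every client sampling independently, connection attempts should spread roughly evenly across the seed list rather than piling onto the first few entries.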
So we will keep all the client code as is (it will need to be recompiled against crust/routing though) and continue to test the client code (safe_core/launcher/demo_app). I am happy about that, as those guys worked through the night for this test to happen. They deserved their sleep.
So we will restart this network as TEST7b and not give out vaults initially. This will allow testing of all the new client code to continue. After a short time we will issue vaults, to check that this was a routing issue (with lots of smaller nodes).
While this happens, the team, most of whom are working on the routing refactor, can continue and get that completed, which lets us get the routing table changes and data chains in place much sooner. This is something we do not want to push back.
Life in the fast lane at the bleeding edge. Seriously though, we said we would roll out as fast and as often as possible, and we will continue to. It’s critical to success.
Thanks again folks, hopefully you will get the clients back today.