First thing to say is that we’re massively happy about how ReplicationNet panned out. It was a bit of a gamble chucking up so many nodes - our biggest testnet yet by some margin - and we were expecting some major wobbles, but it took everything we could throw at it in its stride and without complaint, until full nodes stopped play. Most encouraging of all, this stability was despite some messaging errors around data replication that might have been expected to bring it down. Instead it swatted them away like flies. Heartfelt thanks once again to everyone who took part, and a special mention to @shu for his fantastic dashboard work.
ReplicationNet - findings and actions
So, having gone through the logs, both those kindly shared by the community and our own, we can report the following.
- The slowly rising memory issue is almost certainly due to nodes reaching capacity. We do not see this behaviour until a number of nodes get full (1024 chunks in this case). Once the network is operational we shouldn’t see this, as new nodes will be incentivised to join.
- Out-of-memory issues seem to be caused by too much data being held in cache as a node approaches capacity (and in this case it seems we had too many nodes on too small a machine). That’s not a bug per se: libp2p should disperse that cache, and the data would be stored as more nodes joined.
- We’ve identified and squashed a bug whereby data replication was causing connection closures, and consequently a lot of dropped messages around replication. This is the sort of thing that would normally spell doom, and it’s a testament to the underlying stability of the network that it had so little impact.
- Another bug fix was to do with Outbound Failure errors.
- Data distribution across nodes is pretty uniform. Again, great news, because it means we can use percentage space used as a trigger for reward pricing as planned. The issue of some nodes not filling up is a bug, likely something to do with new nodes not promoting themselves into others’ routing tables strongly enough.
- There are a few anomalies in the logs where the put requests and chunks stored metrics don’t seem to match up. We need to work on clarifying those.
- To give users with lower bandwidth more control, we’ve added the ability for the client to set the timeout duration for requests and responses from the network. We’ve also increased the default timeout duration from 10 to 30 seconds.
- We’re now thinking about payment flows and rewards for the different scenarios: new data, replicating data and republishing data (where valid data has been lost for whatever reason).
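For anyone wondering what the client-side timeout change in the list above amounts to, here is a minimal sketch in Python. It is an illustration only, not the project's code: `ClientConfig` and `request_with_timeout` are made-up names, and the only detail taken from this update is the 30 second default.

```python
import asyncio
from dataclasses import dataclass

# The new default mentioned in the update (previously 10 seconds).
DEFAULT_TIMEOUT_SECS = 30.0


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical client settings; not the real Safe client API."""
    request_timeout_secs: float = DEFAULT_TIMEOUT_SECS

    def __post_init__(self):
        if self.request_timeout_secs <= 0:
            raise ValueError("timeout must be positive")


async def request_with_timeout(request, cfg: ClientConfig):
    """Await a network request, giving up after the configured timeout.

    Raises asyncio.TimeoutError if no response arrives in time.
    """
    return await asyncio.wait_for(request, timeout=cfg.request_timeout_secs)
```

A user on a slow link would then pass something like `ClientConfig(request_timeout_secs=120.0)`, while everyone else gets the 30 second default without doing anything.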
The next testnet will help us test these suppositions and fixes, as well as validating some work around user experience.
General progress
All eyes are now on DBCs, with @bochaco and @Anselme working on securing and verifying the payment process for storing chunks, including checking that the parents of the payment DBC are spent, and ensuring their reason-hash matches the payment proof info provided for each chunk. Anselme has fixed a flaw whereby the faucet and wallet were not marking DBCs as spent. It turned out this was down to synchronous activity by the checking nodes causing a read-write lock, whereas we need it to be async.
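The two payment checks described above can be sketched roughly as follows. This is a toy model in Python under loud assumptions: `PaymentDbc`, `expected_reason_hash` and `payment_is_valid` are hypothetical names, and deriving the reason-hash as a SHA-256 of the proof info is a placeholder, not the actual derivation used by the network.

```python
from dataclasses import dataclass
from hashlib import sha256


@dataclass(frozen=True)
class PaymentDbc:
    """Hypothetical stand-in for a payment DBC."""
    parent_ids: tuple   # ids of the DBCs this payment was created from
    reason_hash: bytes  # hash tying the payment to what it pays for


def expected_reason_hash(proof_info: bytes) -> bytes:
    # Assumption: plain SHA-256 over the proof info. The real scheme may differ.
    return sha256(proof_info).digest()


def payment_is_valid(dbc: PaymentDbc, spent_ids: set, proof_info: bytes) -> bool:
    """Accept a chunk payment only if both checks from the update pass."""
    # Check 1: every parent of the payment DBC must already be marked spent.
    parents_spent = bool(dbc.parent_ids) and all(
        parent in spent_ids for parent in dbc.parent_ids
    )
    # Check 2: the reason-hash must match the proof info supplied for the chunk.
    hash_matches = dbc.reason_hash == expected_reason_hash(proof_info)
    return parents_spent and hash_matches
```

A node would refuse to store the chunk if either check fails, which is why the faucet and wallet not marking DBCs as spent mattered: the first check could never pass.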
Similarly, @roland is eliminating a deadlock in PUT and GET operations to ensure they can be performed - and paid for - concurrently. Parallelisation is the name of the game. He’s also ensuring our data validations occur regardless of when the data comes in to a node, preventing some “sideloading” of data via libp2p/kad protocols (which would essentially have allowed free data).
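The idea behind closing the sideloading hole is that there should be one validation chokepoint, whichever way data reaches a node. A rough Python sketch of that shape follows; the `Node` class, its method names and the `is_paid` callback are all invented for illustration and do not reflect the real node code or the libp2p API.

```python
class Node:
    """Toy node in which every storage path goes through one validation step."""

    def __init__(self, is_paid):
        # Hypothetical callback: (name, data) -> bool, True if payment checks out.
        self._is_paid = is_paid
        self.chunks = {}

    def _validate_and_store(self, name, data) -> bool:
        # The single place where data is checked before it is written.
        if not self._is_paid(name, data):
            return False
        self.chunks[name] = data
        return True

    def on_client_put(self, name, data) -> bool:
        # Data uploaded directly by a client.
        return self._validate_and_store(name, data)

    def on_kad_record(self, name, data) -> bool:
        # Data arriving through the DHT layer gets exactly the same checks,
        # so it cannot be used as a side door for unpaid storage.
        return self._validate_and_store(name, data)
```

With this layout, adding a new way for data to arrive means adding another thin handler, and there is no path that writes to `chunks` without passing the payment check.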
@bzee is still tinkering with the innards of libp2p, currently tweaking the initial dialling of the bootstrap peers.
@Joshuef and @qi_ma have been mainly working through the findings of the last testnet and fixing as they go.
@chriso has been hard at work getting safeup updated; more on that soon.
And @aed900 has completed a testnet launch tool to automate the creation of testnets on AWS and Digital Ocean.
Onwards!
Useful Links
Feel free to reply below with links to translations of this dev update and moderators will add them here:
Russian; German; Spanish; French; Bulgarian
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!