This week we’ve been looking at the results of the testnet and working on fixes for bugs. The first thing to note is that, to make debugging easier, the testnet was deliberately unforgiving, with a chunk having to be replicated to all eight close group nodes to be considered valid. That said, it did unearth some strangeness around missing chunks which @qi_ma and @joshuef have been digging into.
Downloading large files was occasionally failing due to one or more chunks being missing from the download. Thanks to all who reported this. One reason we suspect for this is caching. When fetching a record from Kademlia we have some choices: `Quorum::One` means we take the first reply we get; `Quorum::All` means we wait for all answers to come in and check that they match. Since chunks are self-verifiable (content addressed), one answer should be enough, as we can check its validity on the spot.

However, it seems that Kademlia caching, which should cache chunks at closer nodes when using `Quorum::One`, does not work in quite the way we thought… It appears only to ensure that one node holds the data (as opposed to still ensuring it goes to all data holders, even though we only require one copy back). So we're disabling caching for now and reverting to `Quorum::All` to see how we get on.
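For the curious, the self-verification point can be shown in a few lines. A minimal sketch, assuming a chunk's network address is simply the hash of its content (the hash function here is illustrative, not necessarily what the network uses):

```rust
use sha2::{Digest, Sha256};

/// Returns true if `bytes` hash to the address we asked Kademlia for.
/// Because the address *is* the content hash, a single valid reply is
/// as trustworthy as eight matching ones.
fn chunk_is_valid(requested_address: &[u8], bytes: &[u8]) -> bool {
    Sha256::digest(bytes).as_slice() == requested_address
}
```

This local check is what makes `Quorum::One` attractive for chunks in the first place; the caching behaviour, not the verification, is what let us down.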
Incorrect store costs are another possible reason for missing chunks. Sometimes the client requests store costs from nodes different from those that end up storing the chunks, which means the storing nodes don't get paid. We are also seeing some replication failures, so we're looking into those too. (This may well be related to the `Quorum`/caching work above!)
To make life easier, we've made it so that if one chunk fails to download, the whole process stops with a `MissingChunk` error message, rather than waiting 'til the end. We're also improving logging to debug each batch of uploads and downloads. And since logs are providing valuable debugging info, we're now logging output from clients and nodes by default. The logs are quite verbose, so be aware that small instances will likely fill up more quickly.
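For illustration, the new fail-fast behaviour amounts to something like the sketch below (the names `fetch_chunk`, `DownloadError` and `ChunkAddress` are ours, not the actual client API):

```rust
type ChunkAddress = [u8; 32];

#[derive(Debug)]
enum DownloadError {
    MissingChunk(ChunkAddress),
}

fn download_file(addresses: &[ChunkAddress]) -> Result<Vec<u8>, DownloadError> {
    let mut file = Vec::new();
    for addr in addresses {
        match fetch_chunk(addr) {
            Some(bytes) => file.extend_from_slice(&bytes),
            // Stop immediately instead of waiting 'til the end.
            None => return Err(DownloadError::MissingChunk(*addr)),
        }
    }
    Ok(file)
}

// Stand-in for the real network fetch.
fn fetch_chunk(_addr: &ChunkAddress) -> Option<Vec<u8>> {
    None
}
```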
And we have added hardcoded bootstrap peers to the node and client code, so there's no more having to set the `SN_PEERS` variable.
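In effect that means something like the sketch below. The multiaddrs are made up for the example, and we're assuming here that the env var still acts as an optional override:

```rust
// Illustrative only: the real peer addresses are compiled into the binaries.
const BOOTSTRAP_PEERS: &[&str] = &[
    "/ip4/203.0.113.10/tcp/12000/p2p/12D3KooW...",
    "/ip4/203.0.113.11/tcp/12000/p2p/12D3KooW...",
];

fn initial_peers(env_override: Option<String>) -> Vec<String> {
    match env_override {
        // An explicit SN_PEERS value still wins if set, but is no longer required.
        Some(peers) => peers.split(',').map(str::to_owned).collect(),
        None => BOOTSTRAP_PEERS.iter().map(|p| p.to_string()).collect(),
    }
}
```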
The `batch-size` and `concurrency` settings seem to have some effect, with larger batch sizes significantly speeding up downloads and larger concurrency settings (up to 40 or so) doing the same. We're now experimenting with the effects on performance of reducing the Close Group size from 8 to 5, which should bring quicker downloads and lower memory usage.
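On the concurrency point, here's a rough sketch of what that setting controls, using the `futures` crate to bound the number of in-flight requests; `fetch_all` and `fetch_chunk` are illustrative names, not the real client API:

```rust
use futures::stream::{self, StreamExt};

// Fetch chunks with at most `concurrency` requests in flight at once;
// raising this (up to ~40) is where downloads seemed to speed up.
async fn fetch_all(addresses: Vec<[u8; 32]>, concurrency: usize) -> Vec<Option<Vec<u8>>> {
    stream::iter(addresses)
        .map(|addr| async move { fetch_chunk(&addr).await })
        .buffer_unordered(concurrency)
        .collect()
        .await
}

// Stand-in for the real network fetch.
async fn fetch_chunk(_addr: &[u8; 32]) -> Option<Vec<u8>> {
    None
}
```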
Thanks as usual to everyone who got stuck in and put the thing through its paces. As a reward, you got to enjoy the fun of living in a hyper-inflationary environment, plus one or two folks got extremely SNT-rich. Don’t spend it all at once guys.
CashNotes
CashNotes is the new name for DBCs, one that better reflects the way payments are actually made. The underlying code has not changed.
Essentially, they are a local representation of tokens in a wallet. Recipients can spend them on the network in exchange for new ones of the same value (total transaction input/output value).
CashNotes are created in transactions and assigned to derived public keys. The derived keys are created from the recipient's public key plus a random index. A different derived key is used for each transaction, making each CashNote unique and unlinkable to the owner's original public key.
The recipient needs the secret random index, along with the parent transaction information, to generate the corresponding derived private key and redeem the CashNote.
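To make the derivation concrete, here's a minimal sketch using BLS child-key derivation as provided by the `blsttc` crate (the exact types and derivation the network uses may differ):

```rust
use blsttc::SecretKey;

fn main() {
    // Recipient's long-term keypair.
    let recipient_sk = SecretKey::random();
    let recipient_pk = recipient_sk.public_key();

    // The sender picks a random index and derives a one-off public key to
    // assign the CashNote to. Observers can't link it back to recipient_pk.
    let index: [u8; 32] = rand::random();
    let derived_pk = recipient_pk.derive_child(&index);

    // Given the index (shared via the transaction details), the recipient
    // derives the matching secret key and can redeem the CashNote.
    let derived_sk = recipient_sk.derive_child(&index);
    assert_eq!(derived_sk.public_key(), derived_pk);
}
```

The useful property is that deriving a child of the public key and a child of the secret key with the same index yields a matching keypair, so only someone holding both the recipient's secret key and the index can spend the CashNote.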
General progress
@joshuef and @qi_ma have been the main team members looking at incorrect store costs, failed replication and missing chunks. Josh raised a PR to fail fast as soon as chunks are found to be missing during a download, and temporarily removed Kademlia caching, switching back to `Quorum::All` to address the missing chunks issue.
As well as helping with the debugging, Qi continues to research `libp2p` caching to understand it better, and has also studied the `GossipSub` pub/sub implementation that @bochaco is working on. That is now at the testing stage, and they are tracing how messages propagate between nodes. A little way to go yet on that. @bochaco has also been working on rewards notifications in the node.
@Anselme cleaned up unused code in the `sn_transfers` crate, minimised security risk by making modules private, and rewrote benchmarks using the high-level transfer APIs to handle this change.
@bzee is working on reusing transfers when node costs change. If a storing node increases its price between the time the client requests a store cost and when it makes the payment, that payment is insufficient. Rather than starting again from scratch, we want to retry with the original CashNote and then, if that fails again, top it up with an additional CashNote, which is quicker. A hypothetical sketch of that flow follows.
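Here's that retry-then-top-up logic in outline; `CashNote` here is a toy stand-in and `send_payment` is not a real API:

```rust
#[derive(Clone)]
struct CashNote {
    value: u64,
}

fn pay_for_store(quoted_cost: u64, current_cost: u64) -> Result<(), String> {
    let original = CashNote { value: quoted_cost };

    // First, retry with the CashNote from the original transfer.
    if send_payment(&[original.clone()], current_cost).is_ok() {
        return Ok(());
    }

    // If the price rose in the meantime, top up with a small extra CashNote
    // for the difference rather than rebuilding the whole transfer.
    let top_up = CashNote {
        value: current_cost.saturating_sub(quoted_cost),
    };
    send_payment(&[original, top_up], current_cost)
}

// Toy payment check: accept if the notes cover the required amount.
fn send_payment(notes: &[CashNote], required: u64) -> Result<(), String> {
    let total: u64 = notes.iter().map(|n| n.value).sum();
    if total >= required {
        Ok(())
    } else {
        Err("insufficient payment".to_string())
    }
}
```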
@Chriso added support for versioned binaries in the automated testnet deploy tool and has been working on getting that running, alongside some nice improvements to the `safe` UX.
@roland also worked on fixes in response to the testnet findings, and @dirvine reduced Close Group size from 8 to 5 to improve performance. David has also been doing some further thinking on a secure upgrade mechanism for Safe.
Useful Links
Feel free to reply below with links to translations of this dev update and moderators will add them here:
Russian;
German;
Spanish;
French;
Bulgarian
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!