In what’s been a good week all round for tracking down niggling anomalies and bugs, we ask @joshuef to explain how we are tackling issues due to message serialisation and the stress that puts on the network.
General progress
@davidrusu has been tidying up an issue where adults were skipping Anti-Entropy probes before splits, so they did not always have the necessary up-to-date information about the sections.
Meanwhile, @bochaco is working through a list of improvements to how we store registers (mutable data - CRDTs). These include changing some internal store APIs to avoid cloning some objects, writing individual operations to disk (rather than one file which could be overwritten by another incomplete CRDT), removing some unused storage error types and adding more contextual information to others, and allowing storage of register edit commands in all cases (so the order in which commands arrive should matter less, i.e. there’s no race condition requiring a `CreateRegister` cmd to arrive before an `EditRegister`, which there is on `main`).
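To give a feel for that last point, here’s a minimal sketch of order-independent op storage. This is not the actual `sn_node` code: the types and the `RegisterOpStore` are made-up stand-ins, just illustrating the idea of keeping every command as its own entry and resolving ordering only when the register is rebuilt.

```rust
// Sketch only: hypothetical stand-ins for the real register command types.
use std::collections::HashMap;

#[derive(Clone, Debug)]
enum RegisterCmd {
    Create { policy: String },
    Edit { entry: Vec<u8> },
}

#[derive(Default)]
struct RegisterOpStore {
    // One entry per operation, keyed by register name, instead of a single
    // file per register that a later incomplete write could clobber.
    ops: HashMap<String, Vec<RegisterCmd>>,
}

impl RegisterOpStore {
    /// Append the op unconditionally; ordering is resolved when the
    /// register is materialised, not at write time.
    fn store(&mut self, name: &str, cmd: RegisterCmd) {
        self.ops.entry(name.to_string()).or_default().push(cmd);
    }

    /// Rebuild a register by applying the create first (if present),
    /// then all edits, regardless of the order they arrived in.
    fn materialise(&self, name: &str) -> Option<Vec<RegisterCmd>> {
        let ops = self.ops.get(name)?;
        let mut ordered: Vec<RegisterCmd> = ops
            .iter()
            .filter(|op| matches!(op, RegisterCmd::Create { .. }))
            .cloned()
            .collect();
        ordered.extend(
            ops.iter()
                .filter(|op| matches!(op, RegisterCmd::Edit { .. }))
                .cloned(),
        );
        Some(ordered)
    }
}

fn main() {
    let mut store = RegisterOpStore::default();
    // An edit can land before the create without being lost.
    store.store("my-register", RegisterCmd::Edit { entry: b"hello".to_vec() });
    store.store("my-register", RegisterCmd::Create { policy: "public".into() });
    println!("{:?}", store.materialise("my-register"));
}
```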
And @chriso has been improving error handling in `sn_node`, including the sort of error messages that get sent to the client, to make it easier for users to see what’s going on.
Message serialisation
We’ve seen from @southside’s probing that the network seems to come under stress with larger data PUTs. His tests of 2GB+ uploads have shone a light on an issue there that mayyy be part of the reason… and it could just be too many messages.
That doesn’t mean we’re sending too many (although we could certainly be sending fewer). But it looks like the strain of forming messages, and the rate at which we do so, is too damn high.
This has been something we’ve seen in heaptraces of the node’s memory use for a while, but a path forward to fix it wasn’t really clear.
We serialise each message to bytes, and given that we need to make a different `MsgHeader` for each node, there wasn’t much way around that.
But what if we didn’t?
@southside’s poking brought the question to the fore again, and @joshuef, who’s been annoyed at the amount of memory used by serialisation for a while now, decided to bang his head against it again.
And this time another idea came up. We’d previously tried to remove the need for `Dst` (destination: information on where the message is intended to go)… but we can’t do that and keep our Anti-Entropy flows alive. So that was a non-starter.
But after some hacky attempts to update the `Dst` bytes in a pre-serialised message, to avoid doing all that work again, we realised we were forcing a square peg into a round hole. Namely, the limitation of providing only one `Bytes` per message did not actually make sense for us.
SOooOooooo…
So after a bit of a refactor in `qp2p`, our network wrapper, we can now send three different sets of `Bytes` over our connections, and of these, only one (our `Dst`) actually needs to change if we’re sending the same message to different nodes!
This means instead of 7x the work when sending a message to section elders, it’s 1x: we reuse the `Bytes` for our `MsgHeader` and payload, and only need to re-encode the `Dst` each time.
Neat.
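Here’s a rough sketch of the shape of that change. The types and helpers here are made up for illustration (this is not the real `sn_node`/`qp2p` API): the point is simply that the header and payload bytes are produced once, and only the small `Dst` portion is re-encoded per recipient.

```rust
// Sketch only: illustrative types, not the real message definitions.
use bytes::Bytes;

struct MsgHeader { msg_id: u64 }
struct Dst { recipient: [u8; 32] }

// Stand-in serialisers; in reality these would be proper encoders.
fn serialise_header(h: &MsgHeader) -> Bytes { Bytes::from(h.msg_id.to_le_bytes().to_vec()) }
fn serialise_dst(d: &Dst) -> Bytes { Bytes::copy_from_slice(&d.recipient) }

// Placeholder for the connection-level send, which now takes the three
// parts separately; cloning `Bytes` only bumps a reference count.
fn send_over_connection(header: Bytes, dst: Bytes, payload: Bytes) {
    let _ = (header, dst, payload);
}

fn send_to_all(header: &MsgHeader, payload: Bytes, recipients: &[Dst]) {
    // Done once, however many recipients there are.
    let header_bytes = serialise_header(header);

    for dst in recipients {
        // Only this small piece differs per recipient.
        let dst_bytes = serialise_dst(dst);
        send_over_connection(header_bytes.clone(), dst_bytes, payload.clone());
    }
}

fn main() {
    let elders: Vec<Dst> = (0..7).map(|i| Dst { recipient: [i as u8; 32] }).collect();
    let payload = Bytes::from(vec![0u8; 1024 * 1024]); // ~1MB chunk, serialised once
    send_to_all(&MsgHeader { msg_id: 42 }, payload, &elders);
}
```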
But wait… there’s more!
Now, this is a decent reduction in the computational cost of sending a message already. But it also has another knock-on effect in terms of memory. Previously, during `MsgHeader` serialisation we formed our one set of `Bytes` by copying the payload (the actual message we’re sending… so ~1MB per chunk). That’s memory allocation work, and it meant each message had its own unique set of `Bytes` representing the same payload. So sending one chunk to four adults would put five copies of that chunk in memory.
But now we use a cheap copy of `Bytes` (which is a pointer type to the underlying data…) so no duplication of memory is needed! Sending one chunk to four adults should only need the one copy of the data now.
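For anyone curious why that clone is cheap, here’s a tiny standalone example using the `bytes` crate: cloning a `Bytes` handle shares the same reference-counted buffer rather than copying the ~1MB of data.

```rust
// Why cloning `Bytes` is cheap: the clones share one underlying buffer.
use bytes::Bytes;

fn main() {
    let chunk = Bytes::from(vec![0u8; 1024 * 1024]); // ~1MB "chunk"
    let for_adult_1 = chunk.clone();
    let for_adult_2 = chunk.clone();

    // All three handles point at the same allocation.
    assert_eq!(chunk.as_ptr(), for_adult_1.as_ptr());
    assert_eq!(chunk.as_ptr(), for_adult_2.as_ptr());
    println!("one buffer, three handles");
}
```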
En fin
Here’s what `main` looks like. Here we see three runs of the 250 client tests (one PUT of 5MB and 250 clients concurrently trying to get that data), and then 10 runs of the full standard `sn_client` test suite.
And this is on the pending PR:
You can see that the peaks for these tests seem to top out faster, and with less overall memory (~900MB vs 1,800MB). And with that, our new benchmark measures the throughput of sending one `WireMsg` to 1,000 different `Dst`s:
- `main` throughput: 7.5792 MiB/s
- PR throughput: 265.25 MiB/s
Which is pretty pleasing too.
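The actual benchmark from the PR isn’t reproduced here, but for a flavour of how measuring that kind of throughput might look, here’s a rough criterion-style sketch. The sizes and the `encode_dst` helper are illustrative assumptions, not the real benchmark code.

```rust
// Sketch of a criterion throughput benchmark: one message, many Dsts.
use bytes::Bytes;
use criterion::{criterion_group, criterion_main, Criterion, Throughput};

// Illustrative stand-in for encoding a per-recipient Dst.
fn encode_dst(i: u32) -> Bytes {
    Bytes::copy_from_slice(&i.to_le_bytes())
}

fn bench_send_to_many_dsts(c: &mut Criterion) {
    let payload = Bytes::from(vec![0u8; 1024 * 1024]); // ~1MB payload, serialised once
    let header = Bytes::from(vec![1u8; 128]); // pretend header, serialised once

    let mut group = c.benchmark_group("wire_msg");
    // Report MiB/s in terms of total payload bytes "sent".
    group.throughput(Throughput::Bytes(1_000 * payload.len() as u64));
    group.bench_function("one_msg_to_1000_dsts", |b| {
        b.iter(|| {
            for i in 0..1_000u32 {
                // Only the Dst is re-encoded; header/payload are cheap clones.
                let parts = (header.clone(), encode_dst(i), payload.clone());
                criterion::black_box(parts);
            }
        })
    });
    group.finish();
}

criterion_group!(benches, bench_send_to_many_dsts);
criterion_main!(benches);
```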
The branch is not yet merged; there’s some final tidying to be done before we get it in. But it is looking like a promising change that may help nodes run on leaner hardware (or hopefully let @southside upload more on his local testnets!?)
Useful Links
Feel free to reply below with links to translations of this dev update and moderators will add them here:
Russian; German; Spanish; French; Bulgarian
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!