Update 01 September, 2022

In what’s been a good week all round for tracking down niggling anomalies and bugs, we ask @joshuef to explain how we are tackling issues due to message serialisation and the stress that puts on the network. Something of a :bulb: this one …

General progress

@davidrusu has been tidying up an issue where adults were skipping Anti-Entropy probes before splits, so they did not always have the necessary up-to-date information about the sections.

Meanwhile, @bochaco is working through a list of improvements to how we store registers (mutable data, i.e. CRDTs), including:

  • changing some internal store APIs to avoid cloning some objects
  • writing individual operations to disk, instead of as one file which could be overwritten by another, incomplete CRDT
  • removing some unused storage error types, and adding more contextual information to others
  • allowing storage of register edit commands in all cases, so the order in which commands arrive matters less; unlike on main, there’s no race condition requiring a CreateRegister cmd to arrive before an EditRegister (see the sketch below)
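
To make that last point concrete, here’s a minimal sketch of the one-file-per-op idea, with hypothetical names rather than the real sn_node storage API: each register command lands in its own file, so nothing is ever overwritten and an EditRegister that arrives before its CreateRegister can simply wait on disk.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical stand-ins for the real register command types.
enum RegisterCmd {
    Create { reg_id: String, op_bytes: Vec<u8> },
    Edit { reg_id: String, op_hash: String, op_bytes: Vec<u8> },
}

// Write each op to its own file under the register's directory. No single
// file is ever rewritten, so a partially applied CRDT can't clobber earlier
// ops, and ops can land in any order.
fn store_op(root: &Path, cmd: &RegisterCmd) -> io::Result<()> {
    let (reg_id, file_name, bytes) = match cmd {
        RegisterCmd::Create { reg_id, op_bytes } => (reg_id, "create".to_string(), op_bytes),
        RegisterCmd::Edit { reg_id, op_hash, op_bytes } => {
            (reg_id, format!("edit-{op_hash}"), op_bytes)
        }
    };
    let dir = root.join(reg_id);
    fs::create_dir_all(&dir)?;
    fs::write(dir.join(file_name), bytes) // one file per op; arrival order is irrelevant
}
```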

And @chriso has been improving error handling in sn_node, including the kinds of error messages that get sent to the client, to make it easier for users to see what’s going on.

Message serialisation

We’ve seen from @southside’s probing that the network seems to come under stress with larger data PUTs. His tests of 2GB+ uploads have shone a light on an issue that mayyy be part of the reason… and it could just be too many messages.

That doesn’t mean that we’re sending too many (although we could be sending fewer). But it looks like the strain of forming messages, and the rate at which we do that, is too damn high.

This is something we’ve seen in heap traces of the node’s memory use for a while, but a path to a fix wasn’t really clear.

We serialise each message to bytes, and given that we need to make a different MsgHeader for each node, there wasn’t much way around that.

But what if we didn’t?

@southside’s poking brought the question to the fore again, and @joshuef, who’s been annoyed at the amount of memory used by serialisation for a while now, decided to bang his head against it again.

And this time another idea came up. We’d tried to remove the need for Dst (destination, information on where the message is intended to go) previously… but we can’t do that and keep our Anti-Entropy flows alive. So that was a non-starter.

But after some hacky attempts to update the Dst bytes in a pre-serialised message, to avoid doing all that work again, we realised that we were forcing a square peg into a round hole. Namely, the limitation of providing a message as only one contiguous Bytes buffer did not actually make sense for us.

SOooOooooo…

So after a bit of a refactor in qp2p, our network wrapper, we can now send three different sets of Bytes over our connections, and of these, only one (our Dst) actually needs to change if we’re sending the same message to different nodes!

This means that instead of 7x the work when sending a message to section elders, it’s 1x: we reuse the Bytes for our MsgHeader and Payload, and only need to re-encode the Dst each time.
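
In sketch form, the new send looks something like this (names are illustrative, not the actual qp2p API): the MsgHeader and Payload are serialised once, and only the small Dst is rebuilt per recipient.

```rust
use bytes::Bytes;

// A hypothetical Dst: where the message should end up.
struct Dst {
    name: [u8; 32],        // XorName of the recipient
    section_key: [u8; 33], // the recipient section's key
}

fn serialise_dst(dst: &Dst) -> Bytes {
    let mut buf = Vec::with_capacity(32 + 33);
    buf.extend_from_slice(&dst.name);
    buf.extend_from_slice(&dst.section_key);
    Bytes::from(buf)
}

// Serialise header and payload once, then reuse those two handles for every
// recipient; producing the Dst bytes is the only per-node work.
fn send_to_all(recipients: &[Dst], header: Bytes, payload: Bytes) {
    for dst in recipients {
        let dst_bytes = serialise_dst(dst);
        // Bytes::clone is a refcount bump, not a copy of the data.
        send_over_wire(header.clone(), dst_bytes, payload.clone());
    }
}

// Stand-in for the real connection send of three independent sets of Bytes.
fn send_over_wire(_header: Bytes, _dst: Bytes, _payload: Bytes) {}
```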

Neat.

But wait… there’s more!

Now, this is already a decent reduction in the computational cost of sending a message. But it also has a knock-on benefit for memory. Previously, during MsgHeader serialisation we formed our one set of Bytes by copying the payload (the actual message we’re sending… so ~1MB per chunk). That meant memory-allocation work, and each outgoing message had its own unique set of Bytes representing the same payload. So sending one chunk to four adults would hold five copies of that chunk in memory. :frowning:

But now we use a cheap copy of Bytes (which is a pointer type to the underlying data…), so no duplication of memory is needed! Sending one chunk to four adults should only need the one copy of the data now :tada:
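
(Bytes here is the Bytes type from the bytes crate. A quick demonstration of why that copy is cheap: cloning just bumps a reference count, and every handle points at the same underlying allocation.)

```rust
use bytes::Bytes;

fn main() {
    // ~1MB payload, allocated once.
    let payload = Bytes::from(vec![0u8; 1024 * 1024]);

    // "Copies" for four adults: each clone is a reference-count bump.
    let copies: Vec<Bytes> = (0..4).map(|_| payload.clone()).collect();

    // All handles point at the same memory; no megabytes were duplicated.
    assert!(copies.iter().all(|c| c.as_ptr() == payload.as_ptr()));
    println!("one allocation, {} handles", copies.len() + 1);
}
```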

En fin

Here’s what main looks like. Here we see three runs of the 250-client tests (one PUT of 5MB, with 250 clients concurrently trying to get that data), and then 10 runs of the full standard sn_client test suite.

And this is on the pending PR:

You can see that the peaks for these tests seem to top out faster, and with less overall memory (~900MB vs ~1,800MB). And with that, our new benchmark measures the throughput of sending one WireMsg to 1,000 different Dsts:

  • main throughput: 7.5792 MiB/s
  • PR throughput: 265.25 MiB/s

Which, at roughly a 35x improvement, is pretty pleasing too.
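
For the curious, a benchmark along those lines could look roughly like this Criterion sketch; the names and message layout here are illustrative, not the actual sn_node benchmark code.

```rust
use bytes::Bytes;
use criterion::{criterion_group, criterion_main, Criterion, Throughput};

fn bench_one_msg_many_dsts(c: &mut Criterion) {
    let header = Bytes::from(vec![1u8; 128]);          // pre-serialised MsgHeader
    let payload = Bytes::from(vec![0u8; 1024 * 1024]); // ~1MB chunk payload

    let mut group = c.benchmark_group("wire_msg");
    // Report throughput as payload bytes "sent" across all 1,000 Dsts.
    group.throughput(Throughput::Bytes(payload.len() as u64 * 1_000));
    group.bench_function("one_wire_msg_1000_dsts", |b| {
        b.iter(|| {
            for i in 0u32..1_000 {
                // Only the small Dst is encoded per recipient...
                let dst = Bytes::from(i.to_be_bytes().to_vec());
                // ...header and payload are cheap refcount clones.
                let _wire = (header.clone(), dst, payload.clone());
            }
        })
    });
    group.finish();
}

criterion_group!(benches, bench_one_msg_many_dsts);
criterion_main!(benches);
```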

The branch is not yet merged; there’s some final tidying to be done before we get it in. But it’s looking like a promising change that may help nodes run on leaner hardware (or hopefully let @southside upload more on his local testnets!? :crossed_fingers:)


Useful Links

Feel free to reply below with links to translations of this dev update and moderators will add them here:

:russia: Russian ; :germany: German ; :spain: Spanish ; :france: French; :bulgaria: Bulgarian

As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!

I take this week’s flag, now read :partying_face: @bones is the real winner

Thx Maidsafe devs

Now truly read :stuck_out_tongue_closed_eyes:

Second will do… must try harder!

Third! First time on the podium in a while.

Honorable Mention today. Need to train harder.

Thanks so much to the entire Maidsafe team for all of your hard work! :racehorse:

6th! yes!!!

Great work! It’s nice to see things becoming more stable.

Any idea when the revised White Papers will be available?

Great update, great minds finding solutions in simplicity!

Props to @Southside and @joshuef! Good stuff. Whole team kicking ass and taking names.

Thank you for the heavy work, team MaidSafe! I’ve added the translations to the first post :dragon:


Privacy. Security. Freedom

Sounds like music!

Thanks for the update all, and Southside you nailed it!!! Community is strong here!

@JimCollinson What is the ETA on the White Paper revisions? Thanks.

Think this is one of the best improvements so far, right?

We are working on some Swiss-regulator-specific ones at the moment (which aren’t terribly interesting), but there’s no ETA on the more public-facing revisions just now, I’m afraid.

I’m not technically proficient, but I can see how a fix like this moves SafeNet towards full success. Hats off to the MaidSafe team! :ok_hand: :wink:
