[RFC] Data Types Refinement

Changelog Data Types RFC

2020-01-21

  • Replaced Guarded types with per-request-configuration of concurrency control.

Note: If you strongly disagree with any of the above updates, please discuss it in this forum topic so we can consider a revert or other change.


Simplify the optimistic concurrency

(…and the data types, even further)

Background

Currently we have two distinct type configurations for the optimistic concurrency check. This means that once you create an instance, it is fixed to one of those configurations for its entire lifetime.

One could ask why this design was chosen.

Let’s look at what the reason for optimistic concurrency is:

When multiple writers operate on the same location, you want to avoid overwriting a value that you didn’t intend to overwrite. That is, if you read the value 5 and want to increment it and write 6, you only want to store that 6 as long as the original value 5 has not changed in the meantime. Otherwise, if another writer stored their operation in between, you would overwrite that change and lose information.
This is called a race condition.
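The check described above can be sketched as a versioned cell that rejects a write when the caller’s expected version is stale. This is a minimal illustration of the mechanism, not SAFE Network code; `VersionedCell` and its methods are hypothetical names:

```rust
/// A value paired with a version counter; illustrative only.
struct VersionedCell {
    value: u64,
    version: u64,
}

#[derive(Debug, PartialEq)]
enum WriteError {
    /// The caller's expected version no longer matches the stored one.
    VersionMismatch { actual: u64 },
}

impl VersionedCell {
    fn new(value: u64) -> Self {
        VersionedCell { value, version: 0 }
    }

    /// Store `new_value` only if the caller still holds the current version.
    /// On success, the version is bumped and the new version is returned.
    fn write(&mut self, new_value: u64, expected_version: u64) -> Result<u64, WriteError> {
        if expected_version != self.version {
            return Err(WriteError::VersionMismatch { actual: self.version });
        }
        self.value = new_value;
        self.version += 1;
        Ok(self.version)
    }
}
```

A writer who read version 0 succeeds once, but a second write that still claims version 0 is rejected, because another write landed in between.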

Now, let’s look at this in our context.

  • Who are we doing this for?
  • What are we protecting?

In our capacity as one of the writers, we do this for ourselves, because we don’t want to overwrite some value unintentionally. The optimistic concurrency doesn’t protect the data from being corrupted by the other writers. They can still just write anything they want: they just reload until they have the correct version and can send it in with whatever value they want.

So, with optimistic concurrency control, we are working under the assumption that all writers share the same desire to not overwrite the data of someone else, and therefore carry out the correct operations to do so.

The problem

In the example above, we describe how a writer wanting to bypass the optimistic concurrency check just has to keep fetching the version and retrying the write, each time the request is rejected, until it passes.

In fact, we have an application doing exactly this in our code today: the Authenticator. It does this because the need for optional concurrency control is there, but we have designed it as if it isn’t. So we circumvent it in code instead. As explained here:

“[For the safe authenticator] we may authorise two apps from different devices at the same time. App auth is versioned. So one will pass and one will fail. From an authenticator (not API) POV we’d like to recover from the error by retrying using the next version.”
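The retry pattern described in that quote can be sketched as a loop that turns a versioned write into an unconditional one. This is an illustration of the workaround, not the actual Authenticator code; the types and function names are hypothetical:

```rust
/// A target with a version-checked write; illustrative only.
struct Versioned {
    version: u64,
}

impl Versioned {
    fn current_version(&self) -> u64 {
        self.version
    }

    /// Succeeds only if `expected` matches; otherwise returns the
    /// actual current version so the caller can retry.
    fn try_write(&mut self, expected: u64) -> Result<(), u64> {
        if expected != self.version {
            return Err(self.version);
        }
        self.version += 1;
        Ok(())
    }
}

/// Effectively disables the concurrency check by reloading the version
/// and retrying until the write passes. Returns the resulting version.
fn write_bypassing_check(target: &mut Versioned) -> u64 {
    let mut expected = target.current_version();
    loop {
        match target.try_write(expected) {
            Ok(()) => return target.current_version(),
            Err(actual) => expected = actual, // reload and retry
        }
    }
}
```

This is exactly the loop any writer can run, which is why the check only protects writers who choose to honor it.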

So, we have introduced the concept of data type flavours, implemented two additional data type configurations denoted by these flavours, increased the API footprint, and bloated the end-user (i.e., developer) experience. All of this, to try to enforce a certain usage pattern based on the false assumption that it’s the only one they need.

Even before release, we have proved ourselves wrong with the Authenticator implementation, which has to bypass the concurrency control to work correctly.

So, what could we do differently?

Solution

Do not fix the concurrency control of an instance over its entire lifespan; instead, pass the optimistic concurrency check as a parameter to operations.

It’s as simple as adding an ExpectedVersion enum parameter, with one of the variants being Any, which would indicate that we will write regardless of the version.

pub enum ExpectedVersion {
    Any, // this means concurrency check is OFF
    Specific(u64), // this means concurrency check is ON
}
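A sketch of how an operation could consume this parameter follows. The `Entry`, `Error`, and `set` names are illustrative assumptions, not the actual API; only the `ExpectedVersion` enum comes from the proposal above:

```rust
pub enum ExpectedVersion {
    Any,           // concurrency check is OFF
    Specific(u64), // concurrency check is ON
}

/// A hypothetical versioned entry; illustrative only.
pub struct Entry {
    version: u64,
    value: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum Error {
    VersionMismatch { expected: u64, actual: u64 },
}

impl Entry {
    /// The caller chooses per request whether the version check applies.
    /// On success, the version is bumped and the new version is returned.
    pub fn set(&mut self, value: Vec<u8>, expected: ExpectedVersion) -> Result<u64, Error> {
        if let ExpectedVersion::Specific(v) = expected {
            if v != self.version {
                return Err(Error::VersionMismatch { expected: v, actual: self.version });
            }
        }
        self.value = value;
        self.version += 1;
        Ok(self.version)
    }
}
```

With this shape, the same instance can be written with `ExpectedVersion::Specific(n)` when the caller cares about races, and with `ExpectedVersion::Any` when it does not, removing the need for separate type flavours.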

Why is this safe to do?

Because of what we described above: the concurrency control is meant for us - the current writer. We are not preventing someone else from circumventing it. Thus we can just as well simplify and widen the capability, and the end result is the same; the other writers still have to execute code correctly for the concurrency control to work. If that is not in their interest, then the original concurrency control was no help either; they are rogue players with write access (so you messed up, or were hacked). And we will do it because it is in our interest.

Implementation status

Until we order mutations with PARSEC at the data handlers (on the “after Phase2a” list), the versions might not be the same in all the vaults.

That is to say, version handling at the network level requires more work, and until then it might not work as expected.

While this is the case, the ExpectedVersion::Any variant might be disabled, or implemented later, if we want to avoid that uncertainty while waiting for the work to finish.

Existing implementations

In EventSourcing databases, where concurrent writers operate on versioned streams, this has been standard practice since the concept was introduced in programming. Streams have optional concurrency control on every write, and the writer decides each time whether it wants to preserve concurrent changes or overwrite them regardless.

Results

Let’s look at what it results in:

The current types…

UnpublishedUnsequencedMutableData
UnpublishedSequencedMutableData

UnpublishedUnsequencedAppendOnlyData
PublishedUnsequencedAppendOnlyData
UnpublishedSequencedAppendOnlyData
PublishedSequencedAppendOnlyData

UnpublishedImmutableData
PublishedImmutableData

…are currently on the way to becoming:

PrivateMap
PublicMap
PrivateGuardedMap
PublicGuardedMap

PrivateSequence
PublicSequence
PrivateGuardedSequence
PublicGuardedSequence

PrivateBlob
PublicBlob

…and with this proposal become:

PrivateMap
PublicMap

PrivateSequence
PublicSequence

PrivateBlob
PublicBlob

Less code

Additionally, we will be able to cut down on the code and on various more or less pointless distinctions that are nice exercises in ninja coding (kudos for that), but do not solve a real problem.

The implementations of Map and Sequence will simply require less code, and the API footprint will be smaller, leaving a much more digestible impression on newcomer developers, as well as a more ergonomic long-term experience working with the SAFE Network.
