Data Hierarchy Refinement
Note:
Bear in mind that this is very much a work in progress, and there are inconsistencies as well as unfinished parts in this proposal. We have chosen to share it with the public at an early stage of iteration, to allow for an earlier exchange of ideas. If the proposal turns out to be desired and accepted, the earliest time of implementation would be post-Fleming.
Summary
This is a proposal describing a system where all data types are built of chunks, which, together with decoupled metadata, gives uniform data handling for all types of data structures. This also solves the sustainability issues of the AppendOnlyData / MutableData types.
All content is built up from chunks.
The chunk storage is organised in various types of data structures (also referred to as data types), where the data held in the structure is a chunk or a set of chunks (via indirection, using the chunks' network address(es), wrapped as a Pointer).
The names of the data structures are the following:
- Blob, whose structure is just a single blob of data.
- Sequence, whose structure is a sequence of data.
- Map, whose structure is a set of unique keys and a set of data values, where a key maps to a value.
A Shell instance holds information about a data type, such as the actual structure with the pointer(s) to the chunk(s), name, type tag, ownership and permissions.
Self-encryption could be applied to the chunks regardless of which data type they belong to.
In other words, a user can put any type of data in any structure; the chunking of the data works in the same way regardless of data type.
Background
To chunk or not to chunk
In SAFE Network there are currently three data types, all handling data differently. Being a distributed network of data storage, some solutions are more sustainable than others. Blob storage, for example, is inherently sustainable, as it splits the data up into smaller pieces - chunks - and distributes them over the nodes in the network. What is kept is a reference, a map of the chunk addresses. If the data map is too large, the process is applied recursively. This way, data is always spread out over the network, regardless of its size.
Map and Sequence (the result of splitting up AppendOnlyData and merging MutableData) inherited the design from AD and MD whereby all the data is stored within the group of 8 nodes closest to the data in xor-space. This is, for obvious reasons, not a sustainable solution. Previously, a hard-coded limit on entry size and count acted as a forced distribution over multiple instances, by simply not allowing the user to store more in the instance, at the cost of limiting the data types' use cases and utility.
The most recent design removes the hard-coded limit (i.e. entry size and count are now unlimited), but does nothing to solve the problem: a group of 8 nodes will typically have a very limited storage size, which will act much the same as a hard cap on entry size and count.
The problems to solve include allowing Map and Sequence to have a sustainable data storage design, just like Blob.
Moving around data
As described above, data is stored with the nodes closest to the data, i.e. the ids of the nodes and the hash of the data are close in xor-space. As data in the network is only added or deleted, there is no change to actual data. It either exists, or does not exist. The older design of the data types complicated this concept.
Making a Blob Public would today change the XorName of the data (as a private XorName is hash(owners + contents) while for public it's just hash(contents)), i.e. require the data to move in the network.
An instance of AppendOnlyData or MutableData was more or less to be considered a chunk, since it was limited to 1 MiB in size. But these chunks were thus mutable, and in addition all the metadata was held in the chunks.
In a distributed network, where there are already uncertainties in how much time nodes will spend syncing data (forum discussions with estimates of the time required have not had entirely comforting results), it should be a high priority, a clearly expressed goal, that data should not move around more than necessary. Anything else is a negligence of physical reality and a waste of resources which, considering the rudimentary analysis of sync times, in the end could also prove infeasible.
First priority should be an acceptable UX, after which we can relax the requirements a bit, compromise and prioritize other values higher. The current design, however, did not put this problem at the forefront.
It was considered, and arguments have been formulated around the rationale for the current design, along the lines of the following:
I.e. problematic to adjust the rules for when and how xorname change?
Yeah, I believe so, because in case of Blob it is required because of the ability of an owner to delete it.
And now imagine we both have the same piece of data (which amounts to the same XorName), unbeknown to both of us, and I delete my piece of data which has the same XorName as yours. What would happen in this case? You won't be able to access it anymore, hence we use a diff hashing mechanism for private Blob.
But the effort to solve the problem more or less ended there, which would be a natural result of not defining this problem as a high priority.
The idea of symlink is born
In the search for a simple unified way of handling data at the lowest level, and a way to decouple that from metadata changes, the focus was on extracting the structure of the data, and have everything at the lowest level be separate, as chunks. It would be more or less necessary if we were going to handle large amounts of data in all our data types, and keep the data truly distributed in the network.
Later, discussions led to symlinks.
If instead of data moving, or indeed changing, references to data can be put in other namespaces. Like symlink, to a specific version of private data, (with whatever would be needed to access only that version).
[…]
How it works now: it's stored at different locations, because hash("hello" + Bob) results in 123 and hash("hello" + Carol) results in 345.
I decide that I don't need this file anymore, so I delete it (which is a fundamental property of private ImmutData - otherwise why have it in the first place?), so it's not available at location 123 anymore.
However, your file stored at 345 remains unaffected since it's your file stored at a diff location.
Now, imagine we use some sort of pointers or symlink.
I store my hello file and the location of actual data is now just hash("hello") (as it works with public data), so let's say xorname 666.
I store a symlink data-loc: 666, owner: Bob at location 123 and you do the same, storing a symlink data-loc: 666, owner: Carol at loc 345.
You make your file public, and it's all fine as it suffices to just move a symlink to a new location, from 345 to 678.
However, now I want to delete my file ... and what happens then? Do I delete my pointer only? Do I delete the actual data at loc. 666, so that your symlink doesn't work anymore? Or do I not delete the actual file at all, meaning that this data becomes undeletable? Which in the latter case does defeat the purpose of having private ImmutableData in the first place.
[…]
In combination with the symlink proposed we could have something like data is at place where it would be if published, each owner of private data has a symlink at hash(data + owner) pointing to it. When published data is not moved, but the xor_name used is no longer the unpublished one but the published one + ref_count is gone. If owner delete and refcount exists and reach 0 data deleted.
That would at least avoid having to move data when publishing, as well as deduplicate unpublished data.
But that probably has other issues.
[…]
I agree, reference counting is a common/usual way to solve this sort of problem. When the last reference (symlink) is dropped, then delete the item it points to. btw, this "symlink" proposal sounds to me more like a hardlink in filesystem layout, where it is hardlinks (filenames) that point to inodes, and symlinks point at hardlinks. info here
Metadata and chunks
The data stored to the network, i.e. chunks, doesn't change. Chunks are added or deleted. What changes is metadata. So, a specific chunk should only ever have one XorName. Basically, it is hash(chunk.payload) and so resides in one place in xor-space regardless of Private / Public scope.
The metadata is basically a light-weight wrapper around Private / Public scope, owners, permissions, what structure the data is organised in, and the Pointers to the chunks, i.e. the registry of where the chunks reside in the network, etc.
Metadata is what changes: the owner history is extended, the permissions history is extended (so those two, if we are nitpicky, are also only appended to), Private can change to Public, and the structure can be reorganized (adding, removing and swapping Pointers to chunks).
The conclusion of this is the following:
Data has no reason to move around, other than section membership changes that would require it to be copied over to new members.
Not only in code, but even physically, what changes often should be separated from what doesn't. This begs for metadata and data being separated.
Terminology
Word list
- Gateway nodes: A category of nodes performing validation and acting as a barrier and gateway to further access into the core of System nodes.
- System nodes: A category of nodes performing core functionality of the system, such as storage and retrieval of data.
- Client nodes: The subset of Gateway nodes that connect to clients and manage their balances. Corresponds more or less to ClientHandler at Elders.
- Shell nodes: The subset of Gateway nodes that hold Shell instances, and validate access to them and the chunks they point to. Corresponds more or less to DataHandler at Elders.
- Chunk nodes: The subset of System nodes that hold data in the network. Corresponds more or less to DataHolder at Adults. Should only care about storage and retrieval of chunks, and be oblivious to anything else.
- Chunk: A piece of data, with a size of at least 1 KiB and at most 1 MiB.
- RefCount: A number used for counting the number of unique clients referencing an individual (private) chunk.
- Data: Loosely defined as that which content is comprised of, and which chunks hold.
- Content: Digital information input to the network by clients. On the network it is contained in a chunk or a set of chunks, or even in a structure of a combination of the aforementioned. The structure itself can be an essential part of the semantics of the content - even though, strictly speaking, the structure holds the content.
- DataStructure: Comes in different types, DataTypes, that define the structure in which data is held, such as Sequence, Map or Blob. Accessed through a Shell, which is held by Gateway nodes. A DataStructure is light-weight as it only holds Pointers (as well as versions and keys), and not the actual Data.
- Blob: A structure of a single Pointer. (Usually used for a single large file, which is stored in a chunk or a set of chunks, hence only a single Pointer is necessary.)
- Map: Key-value structure of Pointers.
- Sequence: Append-only structure of Pointers.
- Pointer: A pointer to data, i.e. an address to a single chunk, or a set of addresses to chunks.
- Shell: Contains a higher layer representation of the data held by chunks, as well as all its metadata. Without the information in the Shell, it would be impossible to locate the chunks that build up a certain piece of content stored to the network. It is like a map to the chunks, and a blueprint for how to reconstruct those chunks into a meaningful representation. This map and blueprint is protected from unauthorized access, using the permissions specified by the owner/user, and held in the Shell. This means that access to the chunks, and the content they are part of, is protected by the Shell. (This is why the nodes holding a Shell are part of Gateway nodes, since they act as a gateway to the data.) A Shell is always located in the network using its Id, which is based on an arbitrary name specified by the creator (owner) in combination with a type tag. In case of private data, the owner is also included in the derivation of the Shell Id. A Shell is light-weight, as it only holds the light-weight metadata components, such as name, type tag, owner, permissions and pointers to data. When key components of the metadata change (those that its Id and location are derived from), it is therefore a light-weight operation to move the Shell from one group of nodes to another.
- ClientNodes(id): The 8 Client nodes closest to the client_id.
- ShellNodes(id): The 8 Shell nodes closest to the shell_id.
- ChunkNodes(id): The 8 Chunk nodes closest to the chunk_id.
- Scope: The Scope of the data is defined as Private or Public. Public data is always accessible by anyone, but permissions for modification can be restricted. Private data is initially only accessible by the owner. Permissions can be added for specific users or groups thereof. However, it is not possible to add permissions for the User::Anyone category, because the Private data instance would then be indistinguishable from Public data in that regard. Private data can be deleted.
Gateway nodes and System nodes
The Gateway nodes subspecialize in various validation areas, such as payment for operations, or permissions to data.
System nodes specialize in the actual core handling of the system functionality, such as storage and retrieval of chunks.
The distinction is meant to allow for additional subsets of Gateway nodes or System nodes to follow the same architecture.
Quick explanation of the word list
All content is held in the network as data in one or more chunks.
The chunks are stored individually at the nodes. The references to the chunks are organised in data structures of various data types. These hold the network address(es) to the chunks, wrapped as Pointers.
The data structures are the following:
- Blob, whose structure is just a single blob of data.
- Sequence, whose structure is a sequence of data.
- Map, whose structure is a relation between a set of keys and a set of data values.
A Shell instance holds information about a data structure instance, such as the actual structure with the pointer(s) to the chunk(s), name, type tag, ownership and permissions.
Self-encryption could be applied to the chunks regardless of which data type they belong to.
The Shell holds the metadata
The Shell
The Shell for a Private data structure instance is found at the XorName address hash(owner + name + datatype + tag). The Shell for Public instances is found at hash(name + datatype + tag).
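As a concrete illustration of this addressing rule, a minimal sketch is given below. It assumes SHA-256 (via the sha2 crate) as the hash and a plain byte concatenation of the fields; neither choice is fixed by this proposal, and the function name is invented for the example.

use sha2::{Digest, Sha256};

type XorName = [u8; 32];

/// Sketch only: hash(owner + name + datatype + tag) for Private,
/// hash(name + datatype + tag) for Public.
fn shell_id(owner: Option<&[u8]>, name: &str, datatype: &str, tag: u64) -> XorName {
    let mut hasher = Sha256::new();
    if let Some(owner) = owner {
        // Only Private instances include the owner in the derivation.
        hasher.update(owner);
    }
    hasher.update(name.as_bytes());
    hasher.update(datatype.as_bytes());
    hasher.update(tag.to_be_bytes());
    hasher.finalize().into()
}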
The Shell consists of:
- Id: (hash(name + datatype + tag) | hash(owner + name + datatype + tag))
- Name: (string, arbitrary name)
- TypeTag: (u64, reserved ranges exist)
- Scope: (Private | Public)
- OwnerHistory: (Vec<Owner>)
- PermissionsHistory: (Vec<Permissions>)
- DataStructure: (Blob | Map | Sequence | Shell) Enum holding a Pointer value in case of variant Blob, a structure of Pointers in case of Map or Sequence (i.e. BTreeMap<Key, Vec<Pointer>> and Vec<Pointer>), and a tuple of Vec<Pointer> and Box<DataStructure> when Shell (where the Vec<Pointer> is for the previous Shells, and the DataStructure is for the data appended since the previous one was blobified).
Pointer holds either a ChunkMap or an XorName. It represents a Blob, Map or Sequence value, which can be either just a chunk, or a set of chunks, if the stored value was big enough.
pub enum Pointer {
/// Points directly to
/// another Shell instance.
Shell(XorName),
/// From large content.
ChunkSet {
/// Locations of the chunks in the network
/// (the same as chunk post-encryption hashes)
chunk_ids: Vec<XorName>,
/// An encrypted ChunkMap.
chunk_map: Vec<u8>,
},
/// From small content.
SingleChunk(XorName),
}
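To make the field list above more concrete, here is a rough Rust sketch of how a Shell and its DataStructure could be laid out. It reuses the Pointer enum defined just above; Owner, Permissions, Key and XorName are placeholders, and none of this is meant as the final layout.

use std::collections::BTreeMap;

type XorName = [u8; 32];
type Key = Vec<u8>;

pub struct Owner;        // placeholder for the real owner type
pub struct Permissions;  // placeholder for the real permissions type

pub enum Scope {
    Private,
    Public,
}

pub enum DataStructure {
    /// A single value.
    Blob(Pointer),
    /// Key-value structure of Pointers.
    Map(BTreeMap<Key, Vec<Pointer>>),
    /// Append-only structure of Pointers.
    Sequence(Vec<Pointer>),
    /// Pointers to blobified previous versions of the Shell, plus the
    /// data appended since the previous version was blobified.
    Shell(Vec<Pointer>, Box<DataStructure>),
}

pub struct Shell {
    /// hash(name + datatype + tag), or hash(owner + name + datatype + tag) when Private.
    pub id: XorName,
    /// Arbitrary name chosen by the creator.
    pub name: String,
    /// u64, reserved ranges exist.
    pub type_tag: u64,
    pub scope: Scope,
    pub owner_history: Vec<Owner>,
    pub permissions_history: Vec<Permissions>,
    pub data: DataStructure,
}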
Any request on data goes through such a Shell, which is handled at Gateway nodes, ShellNodes(id) (essentially corresponding to the DataHandlers at section Elders).
The request is validated against owner, permissions, etc.
If the request is valid, it is forwarded to the Chunk nodes at ChunkNodes(id), for each chunk in the requested data, while a receipt of the request is returned to the client.
This is the Scatter-Gather pattern, where the request is scattered over the set of ChunkNodes(id), and the aggregator is the client, which will asynchronously receive the Chunk node responses and match them by the correlation ids also present in the receipt it received from the ShellNodes(id).
In other words:
- ShellNodes(id) has the metadata and by that can send all the necessary requests to ChunkNodes(id).
- ShellNodes(id) responds to the client with a receipt of what responses it needs to expect from ChunkNodes(id), and how to restore that into the requested data.
- ChunkNodes(id) receive information from ShellNodes(id) of where to send the requested chunks, but have no idea what Shell the chunk is part of (a chunk-address can exist in any number of Shell instances).
- The recipient client could be the owner or could be someone with read permissions on the chunk, for all that ChunkNodes(id) knows.
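To illustrate the client-side aggregation, a hypothetical sketch follows; ReadReceipt, ChunkResponse, the correlation id type and the gather function are names invented for this example, not part of the proposal.

use std::collections::HashMap;

type XorName = [u8; 32];
type CorrelationId = u64;

/// Returned to the client by ShellNodes(id).
pub struct ReadReceipt {
    /// One correlation id per chunk request that was scattered out.
    pub expected: Vec<(CorrelationId, XorName)>,
}

/// A chunk response arriving asynchronously from some ChunkNodes(id).
pub struct ChunkResponse {
    pub correlation_id: CorrelationId,
    pub payload: Vec<u8>,
}

/// The client acts as aggregator: it matches incoming responses against
/// the receipt, and is done once every expected chunk has arrived.
pub fn gather(receipt: &ReadReceipt, responses: Vec<ChunkResponse>) -> Option<HashMap<XorName, Vec<u8>>> {
    let mut gathered = HashMap::new();
    for response in responses {
        if let Some((_, chunk_id)) = receipt
            .expected
            .iter()
            .find(|(id, _)| *id == response.correlation_id)
        {
            gathered.insert(*chunk_id, response.payload);
        }
    }
    (gathered.len() == receipt.expected.len()).then(|| gathered)
}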
Shell data growth
If the Shell size exceeds some pre-defined size, the Shell is blobified, i.e. stored as a Blob (with scope corresponding to the Shell scope), and the current Shell is updated as follows:
- Id: (no change)
- Name: (no change)
- TypeTag: (no change)
- Scope: (no change)
- OwnerHistory: (Vec<Owner> with only the last entry, new entries go here)
- PermissionHistory: (Vec<Permissions> with only the last entry, new entries go here)
- DataStructure: (Set to the Shell enum variant. The value is a tuple of a Vec<Pointer> and a Box<DataStructure>. The vector holds pointers to previous versions of the Shell, now stored as Blobs. The Box<DataStructure> will just be the Blob when the structure is Blob. In case of Map/Sequence, the latest versions of the Pointers will be kept in the DataStructure, and new entries go here.)
This Shell is now the current version Shell. Any changes to the metadata take place in this instance. Previous versions are now immutable, as they have been blobified.
Previous versions are kept as references in a vector, which point to the Blobs containing the serialized Shells.
In case of Map / Sequence, every previous version of the Shell holds earlier versions of Map keys, or Sequence values.
Since a Blob doesn't have a growing data structure, the only way the Shell would exceed the max size is if the owner or permissions history has grown beyond that size.
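A sketch of this blobification step, building on the Shell layout sketched earlier, might look as follows. The caller is assumed to have detected that the serialized Shell exceeds the size limit and to have already stored the old version as a Blob, obtaining previous_version as a Pointer to it; the size check, payment and upload are elided.

/// Sketch only: trim the histories and switch the data structure to the
/// Shell variant, accumulating pointers to all blobified previous versions.
fn blobify(shell: &mut Shell, previous_version: Pointer) {
    // Keep only the latest owner and permissions entries; new entries go here.
    let keep_last = |len: usize| len.saturating_sub(1);
    shell.owner_history.drain(..keep_last(shell.owner_history.len()));
    shell.permissions_history.drain(..keep_last(shell.permissions_history.len()));

    // Collect pointers to previously blobified versions, if any.
    let placeholder = DataStructure::Sequence(Vec::new());
    let (mut previous_versions, current) =
        match std::mem::replace(&mut shell.data, placeholder) {
            DataStructure::Shell(pointers, current) => (pointers, *current),
            other => (Vec::new(), other),
        };
    previous_versions.push(previous_version);

    // For Blob, `current` is just the Blob; for Map/Sequence, the latest
    // Pointers stay in `current` and new entries go there.
    shell.data = DataStructure::Shell(previous_versions, Box::new(current));
}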
"Modifying" data
In the current version Shell (the only one held at Gateway nodes; earlier versions have been offloaded as Blobs), you can append new data to a Sequence, or delete, insert or update in a Map (NB: which internally is also appending, regardless of Private / Public). This is done by storing to the network the data as a chunk or as a set of chunks, and storing to the data structure held in the Shell the reference to the chunk (the XorName) or to the set of chunks (the ChunkMap) - i.e. a Pointer.
Deletion
In the case of Scope being Private, we allow deletion of the actual data, by refcounting the chunks and only deleting the actual chunk if the decrement results in 0. The end result will be exactly the same as if two copies of the chunk were maintained on the network. A client never accesses the raw data; it always accesses a Shell, so the fact that the network stores copies of the same data from different users at the same location in the network is an implementation detail. Either way (multiple copies or deduplication), if more than one user has the data, the data still exists somewhere in the network when it is deleted by one user. The only difference between keeping two copies or one is where in the network, which is a pure technicality, completely opaque to the user, since they always access it through Gateway nodes handling the Shell.
pub struct PublicChunk {
payload: Vec<u8>
}
pub struct PrivateChunk {
ref_count: u64, // Only private chunks have ref count, since they can be deleted.
payload: Vec<u8>
}
pub enum Chunk {
Public(PublicChunk),
Private(PrivateChunk),
}
The XorName of the chunk is hash(payload).
If an owner deletes a chunk, the ref_count of the chunk is decremented. If the ref_count by that becomes zero, the chunk is deleted from the network.
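A minimal sketch of this refcount rule is shown below, mirroring the PrivateChunk above; the ChunkStore type and its methods are stand-ins invented here for whatever storage the Chunk nodes actually use.

use std::collections::HashMap;

type XorName = [u8; 32];

pub struct PrivateChunk {
    ref_count: u64,
    payload: Vec<u8>,
}

pub struct ChunkStore {
    private: HashMap<XorName, PrivateChunk>,
}

impl ChunkStore {
    /// Called when an owner deletes a private chunk: decrement the refcount
    /// and remove the chunk only when no owner references it anymore.
    pub fn delete_private(&mut self, id: XorName) {
        if let Some(chunk) = self.private.get_mut(&id) {
            chunk.ref_count = chunk.ref_count.saturating_sub(1);
            if chunk.ref_count == 0 {
                self.private.remove(&id);
            }
        }
    }

    /// Called when another owner stores the same private chunk: dedupe by
    /// incrementing the refcount instead of storing a second copy.
    pub fn put_private(&mut self, id: XorName, payload: Vec<u8>) {
        self.private
            .entry(id)
            .and_modify(|chunk| chunk.ref_count += 1)
            .or_insert(PrivateChunk { ref_count: 1, payload });
    }
}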
If an owner deletes a data structure instance, delete is called on all of its chunks (using the Pointer(s) in the DataStructure) and then the Shell is deleted from the network (along with all blobified versions of it). Such a process could be a longer-running task, and we could store the deletion process state in the actual Shell, to allow for monitoring of the process.
Storing data
Clients do the chunking of content and organize the encrypted ChunkMaps (held in Pointers) in some data structure. They then upload the structure and the chunks to the Gateway nodes closest to the client_id - ClientNodes(id). These validate the balance, and then forward the chunks to the Chunk nodes - ChunkNodes(id). ClientNodes(id) then continues with updating the Shell specified in the request, at the corresponding ShellNodes(id). They could be updating a Map with an insert, a Blob with a create, or a Sequence with an append, all with the same value - which would be an address to data, a Pointer.
- Client chunks the data, and acquires a ChunkMap.
- Client stores to the network by sending all chunks to ClientNodes(id).
- ClientNodes(id) subtracts a StoreCost_a.
- ClientNodes(id) sends the chunks to ChunkNodes(id).
- Client encrypts the ChunkMap, and calls ClientNodes(id) to store it in some data structure.
- ClientNodes(id) subtracts a StoreCost_b.
- ClientNodes(id) calls ShellNodes(id) to update the DataStructure held by its Shell, inserting/appending the encrypted ChunkMap.
If the content size is too small to undergo self-encryption, the client has to encrypt the data before producing the ChunkMap.
A wrapper around the self-encryptor could handle this case, to ensure that any data leaving the client is always encrypted, as per the fundamentals.
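Putting the steps above together, a client-side sketch could look like the following. The ClientNodes trait and its two methods are placeholders for the client's connection to ClientNodes(id), and the chunks, chunk ids and encrypted ChunkMap are assumed to have been produced locally (steps 1 and 3 above); none of these names are an existing API.

/// Placeholder for the client's connection to its ClientNodes(id).
trait ClientNodes {
    /// Upload chunks; StoreCost_a is charged and the chunks are
    /// forwarded on to the relevant ChunkNodes(id).
    fn put_chunks(&mut self, chunks: Vec<Vec<u8>>);
    /// Update a Shell; StoreCost_b is charged and ShellNodes(id)
    /// insert/append the Pointer into the DataStructure.
    fn update_shell(&mut self, shell_id: XorName, pointer: Pointer);
}

/// Sketch only: store already-chunked content and register it in a Shell.
fn store_content<C: ClientNodes>(
    nodes: &mut C,
    shell_id: XorName,
    chunks: Vec<Vec<u8>>,          // produced by self-encryption (step 1)
    chunk_ids: Vec<XorName>,       // hash of each chunk's payload
    encrypted_chunk_map: Vec<u8>,  // ChunkMap encrypted locally (step 3)
) {
    // 2. Upload all chunks via ClientNodes(id).
    nodes.put_chunks(chunks);

    // 4. Ask ClientNodes(id) to register the Pointer in the Shell.
    let pointer = Pointer::ChunkSet {
        chunk_ids,
        chunk_map: encrypted_chunk_map,
    };
    nodes.update_shell(shell_id, pointer);
}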
Retrieving data
With a reference to a data structure instance, i.e. to its Shell, the actual data is retrieved and reconstructed by calling ClientNodes(id) for the relevant Pointer (a single one in the case of Blob, the one mapped by a key for a Map, or, for example, the current version of a Sequence). The ClientNodes(id) look up the Pointer and send a request to all relevant ChunkNodes(id) for the XorNames found in the chunk_ids field of the Pointer, asking them to retrieve the chunk. Finally, ClientNodes(id) returns the encrypted ChunkMap held in the chunk_map field of the Pointer, to the client.
At the client, the encrypted ChunkMap is decrypted and used to reconstruct the content from all its chunks, which asynchronously drop in from the network as they traverse it from their respective data holders.
- Client calls ClientNodes(id), requesting access to some entry in some data structure instance identified by shell_id.
- ClientNodes(id) retrieves the entry from the ShellNodes(id) nodes and A. requests the chunks from the data holders (using the chunk_ids part of the Pointer) B. returns the entry to the client.
- The client decrypts the chunk_map of the Pointer part of the entry, and using the decrypted content (ChunkMap) reconstructs the content from all chunks that come in as a result of the request.
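For the client-side reconstruction, a hedged sketch follows. It reuses the Pointer and XorName definitions from the earlier sketches; decrypt_chunk_map and self_decrypt are supplied by the caller and stand in for whatever ChunkMap decryption and self-encryption reversal the client actually uses, and the gathered chunks are assumed to have already arrived (e.g. via the receipt matching shown earlier).

use std::collections::HashMap;

/// Sketch only: turn a retrieved entry (a Pointer) plus the gathered chunks
/// back into the original content.
fn reconstruct<M>(
    pointer: &Pointer,
    gathered: HashMap<XorName, Vec<u8>>,
    decrypt_chunk_map: impl Fn(&[u8]) -> M,
    self_decrypt: impl Fn(&M, HashMap<XorName, Vec<u8>>) -> Vec<u8>,
) -> Option<Vec<u8>> {
    match pointer {
        Pointer::ChunkSet { chunk_ids, chunk_map } => {
            // Only the client can decrypt the ChunkMap; the network only
            // ever sees the chunk ids.
            let chunk_map = decrypt_chunk_map(chunk_map.as_slice());
            // All referenced chunks must have arrived before reconstruction.
            if chunk_ids.iter().all(|id| gathered.contains_key(id)) {
                Some(self_decrypt(&chunk_map, gathered))
            } else {
                None
            }
        }
        // Single-chunk and nested-Shell pointers elided in this sketch.
        _ => None,
    }
}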
What we achieve
The above achieves the following:
- All data is represented as chunks in the network.
- All such chunks are deduplicated (depends on implementation details though).
- Metadata is separated from data.
- Modifying metadata (e.g. Private to Public) does not move data around.
- We make it clear that we have two layers, with one protecting validation layer, and one core system layer.
- We sustainably store an "unlimited" count of entries of "unlimited" size in Map and Sequence.
- It unifies the data handling and solves the problem at the system design level instead of the code level.
- Any additional data types, and how they are handled, are implemented in the Shell nodes, while Chunk nodes are oblivious to them, and only deal uniformly in chunks.
- Any additional validation roles are implemented as a subset of Gateway nodes.
- Any additional core system roles are implemented as a subset of System nodes.
Oblivious nodes, Data flows
When storing data to the network, the client sends:
- Chunks, which the client nodes forward to Chunk nodes.
- Pointers (in some structure), which the client nodes forward to Shell nodes.
It follows from this that:
A. It is the responsibility of the client to ensure that the chunks referenced in the Pointers are stored to the network. Otherwise, a GET request for the Pointers would give an error.
B. It is the responsibility of the client to ensure that Pointers are stored to a Shell in the network. Otherwise the client will not be able to retrieve the chunks again, and restore the content.
C. Sharing of the contents requires encryption of the ChunkMap using a key which allows the other party to decrypt it.
pub enum Pointer {
/// Points directly to
/// another Shell instance.
Shell(XorName),
/// From large content.
ChunkSet {
/// Locations of the chunks in the network
/// (the same as chunk post-encryption hashes)
chunk_ids: Vec<XorName>,
/// An encrypted ChunkMap.
chunk_map: Vec<u8>,
},
/// From small content.
SingleChunk(XorName),
}
pub type ChunkMap = Vec<ChunkDetails>;
pub struct ChunkDetails {
/// Index number (starts at 0)
pub chunk_num: u32,
/// Post-encryption hash of chunk
pub hash: Vec<u8>,
/// Pre-encryption hash of chunk
pub pre_hash: Vec<u8>,
/// Size before encryption (compression alters this as well as any possible padding depending
/// on cipher used)
pub source_size: u64,
}
Client full flow
(This is a simplified draft)
- Client wants to store content.
- The content is self-encrypted (if large enough).
- Each chunk is uploaded to the client Elders (ClientNodes(id)), who in turn send them to the ChunkNodes closest to the chunk_id.
- Client produces Pointers, with chunk ids (from the self-encryption step) and the encrypted ChunkMap (also from the self-encryption step).
- Client requests a Shell to be created or updated with the Pointer(s), by sending them to the client Elders (ClientNodes(id)), where the interaction with the ShellNodes closest to the shell_id takes place, with the corresponding operation on the Shell.
(Figures: Chunk, Shell)
Unresolved questions
Caching
If Get requests go via Gateway nodes, and are then relayed to Chunk nodes, who finally call the client, we are most likely passing the network in a one-way circle, rather than on the same path back and forth. This complicates caching.
If it came to be critical, it should be possible to ensure the response path is the same as the request path (although that would not be a step in the preferred direction, in terms of network design).
However, it is probably not required; even if responses do not take the same path as requests, as happens now, when we talk about popular chunks, they should be popular with people randomly distributed over sections, so basically a section on the response path of some will be on the request path of others.
Hidden chunks
(The hidden chunks property of this system has received some initial positive responses, but it remains an unresolved question until it has been examined deeper.)
The location of the chunks is known by the nodes holding a Shell (that's how they can fan out to the ChunkNodes, to request that they send the chunks to the requesting client). The locations are stored in the Pointer field chunk_ids. But how to combine them into the original content is only known using the ChunkMap. The ChunkMap is stored encrypted in the Pointer field chunk_map.
This way, although intermediate nodes can correlate chunks and clients, only the client can read the raw content of chunks uploaded to the network.
Explanation
Let's say we wanted the feature of Hidden chunks (not saying we in the end would, but in case we did)... (disregarding encryption of content pre-self-encryption, as that would disable deduplication and complicate sharing).
...we would then:
A. Using self-encryption, we produce the chunks, getting their Ids and the corresponding ChunkMap out of some content.
B. Upload the chunks to the network (through ClientNodes(id)).
C. Encrypt the ChunkMap locally.
D. Place the ids of the chunks, and the encrypted ChunkMap, into a Pointer.
E. Choose a structure for the Pointer to be stored in.
F. Create or update a Shell instance with the given Pointer (through a call to ClientNodes(id)).
(The final update of a Shell can include a multitude of Pointer operations, in a transaction.)
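As an illustration of steps C and D, a sketch follows, reusing the Pointer and XorName definitions from the earlier sketches. The encrypt closure (with whatever client-held key it wraps) and the already-serialized ChunkMap bytes are assumptions of this example, not something specified by the proposal.

/// Sketch only: encrypt the ChunkMap locally and wrap it, together with the
/// chunk ids, into a Pointer (steps C and D above).
fn make_hidden_pointer(
    chunk_ids: Vec<XorName>,
    serialized_chunk_map: &[u8],         // e.g. a serialized ChunkMap
    encrypt: impl Fn(&[u8]) -> Vec<u8>,  // client-held key baked in
) -> Pointer {
    // Intermediate nodes can see which chunks belong together (chunk_ids),
    // but only the client can decrypt how to recombine them (chunk_map).
    Pointer::ChunkSet {
        chunk_ids,
        chunk_map: encrypt(serialized_chunk_map),
    }
}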
Consequences
Since all interaction with data goes through Gateway nodes, by interacting with a Shell (referenced by its name and tag), checking for the existence of a chunk on the network will not be possible.
This means that it will not be possible to self-encrypt some file and check whether that file exists on the network.
This effect could be considered an improvement to privacy.
Scenario: Let's say you are suspicious of some leak of information, a whistle-blower for example, and so you plant some documents where you know they will be found by someone you suspect. Later on, you could poll the network for the existence of these documents, and once they are found you will have revealed the whistle-blower.
A pro SAFE Network user would know to attach some minimal information to any data uploaded, so as to completely change the ChunkMap, and protect against the above setup. Nonetheless, it is not unlikely that people would regularly fall into such traps, for some reason.
Storage cost payments
In the above examples, one payment is made when chunks are uploaded, and another payment is made for operations on the Shell (metadata).
This allows for differentiating the pricing. Maybe the cost per chunk could be variable based on the chunk size. Or maybe it is good to have one cost per chunk. Regardless of that, the Shell operations could also be priced differently. It could be argued that updating the Shell should be a cheaper operation.