Are Erasure Codes (Storj) better than Replication for the SAFE network?

mav · November 28, 2018, 12:05am

The storjv3 whitepaper linked at the top of this topic is fantastic, really a great read. The ideas and design are fun and innovative. I read it right through three times and some sections probably twice that. The bibliography is also full of great material.

I’ve taken some notes about aspects of the whitepaper that relate to SAFE. I could write twice as much again about the specifics of storj but it wouldn’t really belong in this forum.

In summary, Storj is about storage more than about a new internet.

p7 “With an anticipated 44 zettabytes of data expected to exist by 2020 and a market that will grow to $92 billion USD in the same time frame” - ie about data storage (source)

p10 “Cloud computing is estimated to be a $186.4 billion dollar market in 2018, and is expected to reach $302.5 billion by 2021” - ie about cloud computing (source)

Interesting figures: storage accounts for roughly a third to a half of the total value of cloud computing.
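A quick check of that ratio, using only the three figures quoted above. Note the years don't line up (the storage figure is for 2020, the cloud figures are for 2018 and 2021), so this is a rough comparison rather than a proper market share. The constant and function names are mine.

```python
# Figures quoted from the whitepaper, in billions of USD.
STORAGE_MARKET_2020 = 92.0
CLOUD_MARKET_2018 = 186.4
CLOUD_MARKET_2021 = 302.5


def share(part: float, whole: float) -> float:
    """Return part as a fraction of whole."""
    return part / whole


if __name__ == "__main__":
    # Storage compared with the smaller (2018) and larger (2021) cloud figures.
    print(f"vs 2018 cloud market: {share(STORAGE_MARKET_2020, CLOUD_MARKET_2018):.0%}")
    print(f"vs 2021 cloud market: {share(STORAGE_MARKET_2020, CLOUD_MARKET_2021):.0%}")
```

That comes out at roughly 49% and 30%, which is where the "third to a half" range comes from.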

p10 “We have found that in aggregate, enough small operator environments exist such that their combination over the internet constitutes significant opportunity and advantage for less-expensive and faster storage.”

I guess these findings are from the prior storj networks, since there’s no source for this. But sounds positive for both storj and SAFE viability.

p11 “Fixed costs are born by the network operators, who invest billions of dollars in building out a network of data centers and then enjoy significant economies of scale. The combination of large upfront costs and economies of scale means that there is an extremely limited number of viable suppliers of public cloud storage (arguably, fewer than five major operators worldwide). These few suppliers are also the primary beneficiaries of the economic return.”

This points to a problem for all decentralized storage: the competitive advantage that centralized services gain from economies of scale. They achieve this advantage through their ability to organize themselves efficiently.

Hopefully the SAFE network allows efficient enough organization of decentralized entities that it can provide similar economies of scale but without the centralization.

p13 “decentralized systems are susceptible to high churn rates where participants join the network and then leave for various reasons… Rhea et al. found that in many real world peer-to-peer systems, the median time a participant lasts in the network ranges from hours to mere minutes” (source, for some reason not linked in the whitepaper bibliography)

This churn rate is for an altruistic network, not an economically incentivised network. That would probably make a big difference to the participant behaviour.

Diving deeper into the source for this statistic, section “3.1 Empirical studies” says “Elsewhere we have surveyed published studies of deployed file-sharing networks” which links to this paper that says they present a DHT “able to function effectively for median node session times as short as 1.4 minutes, while using less than 900 bytes/s/node of maintenance bandwidth in a 1000-node system. This churn rate is faster than that observed in real file-sharing systems such as Gnutella, Kazaa, Napster, and Overnet.”

So the short duration time is observed for four different altruistic networks. I don’t think this prior research into high churn rates is necessarily applicable here.

p13 “any distributed system intended for high performance applications must continuously and aggressively optimize for low latency not only on an individual process scale but also for the system’s entire architecture.”

But not at the cost of geographical centralization. A tough balance to meet but one that’s ultimately calculable. It feels like decentralized storage will need to be a two step UX, where the user initially uploads to their ‘closest’ node for best speed and latency, and the upload appears essentially complete to the user at that time. But in the background the network geographically distributes the data for redundancy (which takes time and should not affect performance from the client perspective). This is just my guess about the future direction of UX for decentralized storage. ‘Uploaded’ will probably come to mean ‘to the nearest point’ rather than ‘as finally distributed’. Like the surface of an ocean vs the undercurrents.

p14 “access to high-bandwidth internet connections is unevenly distributed across the world”

I wonder if this assumption will break in the near future. I suspect it may. I suspect networks such as SAFE and storj will be the motivation for the changes that lead to that assumption breaking.

It’s a bit like saying ‘bitcoin works because CPUs are evenly distributed across the world’ - well, that assumption broke a few years later because bitcoin itself incentivised ASIC chips and now they’re not evenly distributed as per the original assumption. The network modified the world it exists in.

p15 “…we classify a “large” file as a few megabytes or greater in size”
“The initial product offering by Storj Labs is designed to function primarily as a decentralized object store for larger files.”
“We made protocol design decisions with the assumption that the vast majority of stored objects will be 4MB or larger. While smaller files are supported, they may simply be more costly to store.”
“Users can address this [ie managing lots of files smaller than a megabyte] with a packing strategy by aggregating and storing many small files as one large file.”
“The protocol supports seeking and streaming, which will allow users to download small files without requiring full retrieval of the aggregated object.”

The seeking and streaming is cool. It only adds a little complexity to the retrieval metadata. Could be nice to have a standard for this considering the optimum chunk size in SAFE is 1MB so it will likely want to have a similar packing feature.
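To make the packing idea concrete, here is a minimal sketch of how I imagine it: many small files concatenated into one large object, plus a small index of offsets so a single file can be read back with a ranged read. This is my own illustration, not the storj or SAFE implementation. The names `pack`, `read_range` and `fetch` are hypothetical, and the in-memory slice stands in for whatever seek/stream call the real protocol offers.

```python
from typing import Dict, Tuple

# name -> (offset, length) inside the packed object
Index = Dict[str, Tuple[int, int]]


def pack(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and record where each one lives."""
    index: Index = {}
    parts = []
    offset = 0
    for name, data in files.items():
        index[name] = (offset, len(data))
        parts.append(data)
        offset += len(data)
    return b"".join(parts), index


def read_range(blob: bytes, offset: int, length: int) -> bytes:
    """Stand-in for a ranged (seek + stream) read against the stored object."""
    return blob[offset:offset + length]


def fetch(blob: bytes, index: Index, name: str) -> bytes:
    """Retrieve one small file without reading the whole packed object."""
    offset, length = index[name]
    return read_range(blob, offset, length)


if __name__ == "__main__":
    blob, index = pack({"a.txt": b"hello", "b.txt": b"world!"})
    print(index)
    print(fetch(blob, index, "b.txt"))
```

The extra retrieval metadata is just the index, which is why the added complexity is small.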

I would have to ask why chunks in SAFE are 1MB (and not, say, 2MB or 512KB), and likewise why objects in storj are 4MB or larger (rather than, say, 1MB or larger). This doesn’t seem to be justified via calculations in either network.

I mainly wonder this with respect to possible future bandwidth developments. Will these chunk sizes seem short sighted? Can they be upgraded later? Is the chunk size going to be like IPv4 short-sightedness?

p16 “Note that creating a system that is robust in the face of Byzantine behaviour does not require a Byzantine fault tolerant consensus protocol—we avoid Byzantine consensus. See sections 4.9, 6.2, and appendix A for more details”

Important to understand storj is not really trying to detect malice in a distributed manner. The details get a bit specific to storj so I’ll leave it there.

This difference has a significant impact on the structure of the storj nodes, and storj functions at a different level of trust and security to SAFE. Not necessarily more or less trust and security, just very different.

p17 “To get to exabyte scale, minimizing coordination is one of the key components of our strategy.”

Exabyte scale is a nice target. I’m impressed they have such a tangible goal.

p19 “Storage nodes are selected to store data based on various criteria: ping time, latency, throughput, bandwidth caps, sufficient disk space, geographic location, uptime, history of responding accurately to audits, and so forth.”
“node selection is an explicit, non-deterministic process in our framework. This means that we must keep track of which nodes were selected for each upload via a small amount of metadata”

This is a really important aspect to understand about the storj network and one of the major differences to SAFE.

Clients choose their storage destination (maybe via automatic decision algorithms).

This means the structure of the storj network ends up in two distinct layers - a metadata layer and a storage layer.

SAFE combines both these layers using XOR space.

Because storj has a metadata layer it can more easily track files for repair via erasure coding.

SAFE can’t do it as easily since the file metadata is not available in the first place, and if it were it would be distributed across xor space.

The secure messaging algorithm for traversing xor space makes it much less practical to track and repair files via erasure coding.

For this reason I think erasure codes are fundamentally unsuited to being used at the network layer of the SAFE network. However they may still be useful at the client / app layer.
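For anyone unfamiliar with what "using XOR space" means, here is a simplified Kademlia-style sketch. The storage location of a chunk is derived from its address by XOR distance to node ids, so no per-file record of "which nodes hold this" needs to be kept. This is only an illustration of the general idea, not SAFE's routing code. The function names are mine and the SHA-256 addressing is an assumption for the example.

```python
import hashlib
from typing import List


def address(data: bytes) -> int:
    """Derive a 256-bit address from content (SHA-256 is an assumption here)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Distance between two points in XOR space."""
    return a ^ b


def closest_nodes(target: int, node_ids: List[int], count: int) -> List[int]:
    """Return the `count` node ids nearest to `target` by XOR distance."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:count]


if __name__ == "__main__":
    # Tiny 4-bit ids so the distances are easy to check by hand.
    nodes = [0b0000, 0b1011, 0b1000, 0b0110]
    print(closest_nodes(0b1010, nodes, 2))
```

Compare this with storj, where the selection is explicit and non-deterministic, so the chosen nodes have to be written down in metadata. That metadata is exactly what makes tracking and repairing erasure-coded pieces straightforward there.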

p19 “provides peer reachability, even in the face of firewalls and NATs where possible. This may require techniques like STUN [29], UPnP [30], NAT-PMP [31], etc.”

Equivalent of the crust project within maidsafe.

I’m not sure of the exact intended use of STUN but one thing I’ve always been wary about (from when I was exploring webrtc) is “the protocol requires assistance from a third-party network server (STUN server) located on the opposing (public) side of the NAT, usually the public Internet.” (source). This seems like a potential privacy leak or DoS target etc.

p19 “provides authentication as in S/Kademlia, where each participant cryptographically proves the identity of the peer with whom they are speaking to avoid man-in-the-middle attacks.”

Equivalent of the MaidSafe-DHT project.

p19 “3.4 Redundancy”
p35 “4.7 Redundancy”
p63 “6.1 Hot files and content delivery”
p65 “7.1 Object repair costs”
p69 “7.3 Choosing erasure parameters”

These sections cover the main points being discussed in this topic about erasure codes. Quite cool that they use it at the network layer but I think it isn’t practical for SAFE due to differences in network structure.
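For a feel of why erasure codes are attractive in the first place, here is a back-of-envelope comparison with replication. It assumes each piece (or copy) is lost independently with probability p and that no repair happens in the meantime. Both are big simplifications and are my assumptions, not the whitepaper's model. The parameters in the example are illustrative only, not the ones storj chose.

```python
from math import comb


def replication_loss(p: float, copies: int) -> float:
    """Data is lost only if every copy is lost."""
    return p ** copies


def erasure_loss(p: float, k: int, n: int) -> float:
    """Data is lost if fewer than k of n pieces survive (more than n-k fail)."""
    return sum(
        comb(n, failed) * p ** failed * (1 - p) ** (n - failed)
        for failed in range(n - k + 1, n + 1)
    )


def expansion(k: int, n: int) -> float:
    """Stored bytes per byte of original data."""
    return n / k


if __name__ == "__main__":
    p = 0.1  # illustrative per-piece loss probability
    # Both schemes below store 2 bytes for every byte of data.
    print("2 copies       :", replication_loss(p, 2))
    print("2-of-4 erasure :", erasure_loss(p, 2, 4), "expansion", expansion(2, 4))
```

Replication is just the k=1 case of the same formula. At equal expansion the erasure-coded version loses data less often under these assumptions, but it needs something to track and repair pieces, which is the part I think doesn't fit SAFE's structure.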

I looked at the Blake paper that’s used to justify the redundancy scheme. It’s a great paper with valuable insights and ideas. But it bases the real world examples on altruistic networks rather than incentivised networks - “We apply a simple resource usage model to measured behavior from the Gnutella file-sharing network to argue that large-scale cooperative storage is limited by likely dynamics and cross-system bandwidth — not by local disk space.” (source, for some reason not linked in the whitepaper).

The table on p5 for hardware trends is really interesting. It shows 15 years of data, with disk increasing much more rapidly than bandwidth. Would be good to extend it with the next 13 years of data that have become available since then.

1990 - 60 MB Disk and 9.6 Kbps home access bandwidth
2005 - 0.5 TB Disk and 384 Kbps home access bandwidth
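Turning those two rows into annual growth rates makes the gap clearer. This is a rough calculation from the endpoints only, and it treats 0.5 TB as 500,000 MB (decimal units), which is my assumption.

```python
def annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth factor between two endpoints."""
    return (end / start) ** (1 / years)


YEARS = 2005 - 1990

# Disk capacity in MB; 0.5 TB taken as 500,000 MB (decimal, an assumption).
disk_growth = annual_growth(60, 0.5 * 1_000_000, YEARS)

# Home access bandwidth in Kbps.
bandwidth_growth = annual_growth(9.6, 384, YEARS)

if __name__ == "__main__":
    print(f"disk:      x{disk_growth:.2f} per year")
    print(f"bandwidth: x{bandwidth_growth:.2f} per year")
```

So disk grew by roughly 1.8x per year over that period and home bandwidth by roughly 1.3x per year, which is the trend the paper's argument rests on.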

p24 “Encryption should use a pluggable mechanism that allows users to choose their desired encryption scheme.”

Great to have a pluggable mechanism.

MaidSafe is also considering a pluggable hash structure. Variable encryption schemes may be something that can be added to self_encryption or safe_crypto.

p26 “Storage nodes in our framework should limit their exposure to untrusted payers until confidence is gained that those payers are likely to pay for services rendered.”

This is going to be a limiting factor to the ability to scale.

Either scale happens fast and trust is assumed, or scale is slow and trust is earned.

It’s probably not a big deal in real life but I feel this is one of those edges which is ripe for social engineering, causing uproar and damage to confidence due to deliberately negligent trust of payment.

p26 “While we intend for the STORJ token to be the primary form of payment, in the future other alternate payment types could be implemented, including Bitcoin, Ether, credit or debit card, ACH transfer, or even physical transfer of live goats.”

The ‘transfer of live goats’ comment indicates there are out-of-band ways to make payments, so trust is involved.

It’s worth clarifying some missing context - there are two independent payment flows. One from the client and a second to the storage nodes. Client pays with goats [to the middleman] and the storage nodes receive payment [from the middleman] in storj tokens. This relationship is (to my perception) extremely dubious. The protocol is interesting but timing and trust factors seem to present too many edge cases for my tastes.

p28 “Users have accounts on and trust specific Satellites [ie metadata handlers]. Any user can run their own Satellite, but we expect many users to elect to avoid the operational complexity and create an account on another Satellite hosted by a trusted third party such as Storj Labs, a friend, group, or workplace.”

A satellite can be interpreted as part of the user client software or as part of the broader distributed network ecosystem; both are valid. This makes storj both a trusted and a trustless system at the same time, depending how the client uses satellite infrastructure. It’s a really interesting design.

p30 “there are three major actors in the network: metadata servers, object storage servers, and clients.”

This is a good starting point (as well as the related projects GFS and Lustre file systems) for anyone wanting to understand the structure of storj.

p31 “Storage nodes can choose with which Satellites to work.”

Another difference from SAFE. Vaults do not get to choose which parts of the network they interact with. Clients do not get to choose which vaults they interact with. But on storj, clients and storage nodes get to choose which metadata services they interact with.

This has pros and cons, but is getting a bit specific to storj so I’ll leave it at that.

p31 “Storage nodes are not paid for the initial transfer of data to store (ingress bandwidth). This is to discourage storage nodes from deleting data only to be paid for storing more, which became a problem with our previous version.”

Same as SAFE - pay for retrieval (GET) not for storage (PUT). Nice to see some precedent from real life tests on this concept.

p40 “The most trivial implementation for the metadata storage functionality we require will be to simply have each user use their preferred trusted database, such as MongoDB, MariaDB, Couchbase, PostgreSQL, SQLite, Cassandra, Spanner, or CockroachDB, to name a few.”

To me this removes a lot of the benefit of the storage network. Having to track metadata in a trusted non-distributed way is a substantial barrier. The whitepaper has a good list of justifications for the pros (Control, Simplicity, Coordination Avoidance) and cons (Availability, Durability, Trust) of this design, and the authors are actively trying to improve it - “We expect and look forward to new systems and improvements specifically in this component of our framework”. And p64 “we plan to architect the Satellite out of the platform”.

p47 “The second subsystem slowly allows nodes to join the network.”

Would be interesting to do some rough calculations about how much time would be required to reach the goal of exabyte scale based on this slowness aspect.

p77 “B.4 Honest Geppetto. In this attack, the attacker operates a large number of “puppet” storage nodes on the network, accumulating reputation and data over time”

Interesting (and I think preferable) name for what has been labelled “The Google Attack” on the SAFE network.

p79 “The previous version of the Storj network had over 150,000 independently operated nodes”

Valuable bit of insight about the market.
