What Is Storj?

I read the post made by one of the co-founders of Storj on the Google development forum, but I am still a little confused as to what Storj is. He mentions that Storj is an open source software company building applications on the MaidSafe protocol. They are crowdfunding at the moment for the development of a p2p storage application, but doesn't MaidSafe have their own storage solution on the SAFE network?

Where does Storj fit in to all of this?

1 Like

Yes, there is a storage app for MaidSafe that stores data and earns safecoin etc. (Viv did a blog post showing it).

I think Storj is a kind of front end for storage systems, where they aggregate different platforms into a single front-end mechanism. I looked through the code to figure out authentication and encryption but could only find some JavaScript and a Python script. I know they intend to create server-type nodes where folk can sell their space to others or similar. I am not sure of the login or negotiation of resource to buyer etc. @lowry_jim is on the forum here so may be able to give a better explanation than I can, though.

In terms of MaidSafe integration, I think the idea is to use MaidSafe as a hard drive and sell that space. I doubt this part would be a good mechanism, though, and have told them so: the MaidSafe space will be free or extremely close to free, so it would be hard to get between people and their data and charge for that.

4 Likes

Thanks for the explanation, that helped to clear things up for me.

1 Like

I'll address some of the misconceptions. What you are describing is Metadisk, our blockchain-based file storage web application. We currently use HMAC-SHA256 convergent encryption. We think logins and authentication create unnecessary overhead, and instead use disposable token identifiers.

The cool thing about Metadisk is that it offers APIs that allow traditional services to integrate with it without downloading any additional software. We have working prototypes for the file storage app and a decentralized video player as well.

We will be making an announcement about it sometime this week, but here you can take an early look at the whitepaper: http://metadisk.org/metadisk.pdf and website http://metadisk.org/.

Metadisk serves as a multi-platform frontend. We are looking at integration with Maidsafe, Siacoin, Permacoin, Filecoin, and our own data platform Storj. If the space is free or close to free, then great. We will make sure it gets used through the many tools and applications we are building.

Our approach is simple. Decentralized data is happening. Let's start building applications now, not later. We have two functioning app prototypes, and more on the way.


As far as Storj is concerned, we are taking a completely different approach to the same problem. I'm more focused on some of the concerns raised by Peter Todd:

“I visited Maidsafe a few weeks ago and left with very similar concerns. Essentially I’d describe it as the “Google Attack” where the financial incentive to offer data storage gives a company like Google the opportunity to sell nearly 100% of the capacity. Even if they have incentives to be reliable they’re still a single point of failure that can go down at any moment.
The best I could come up with was to use multiple, trusted, geographically distinct auditing nodes. With the limited speed of light you can prove that more than one copy of the data is being held, although again, you can’t prove the nodes aren’t being run by the same organization. Similarly the auditing nodes are trusted, which means most users will be relying on centralized servers anyway.”

I'm finishing up my paper on the subject for Storj, which addresses these concerns more directly. Probably best if you just wait to read that. Perhaps you could enlighten me more about how you are handling Sybil attacks?

http://maidsafe.net/SystemDocs/attacks/birthday_paradoxsybil_attack.html

That is interesting and alarming. I sometimes see people talk of SHA as encryption, which is weird (and completely wrong); you use an HMAC and think it is encryption (see HMAC - Wikipedia). Can you explain how you encrypt with this? Do you not use a symmetric mechanism like AES, or some asymmetric thing like RSA or ECC?

I am not sure you are encrypting at all if you are just sending hashes around the place; this seems to me to be a pretty significant misunderstanding of cryptography at the very basis. Please tell us you are not hashing data and thinking it is encrypted?

As for logins, that is interesting. Can you explain how people can, with complete anonymity, buy a token, and where do they store them? If you walk up to a machine somewhere, then how do you log in, or get one of these tokens? I am interested.

Also can you tell me
1: Do you have servers?
2: Is it possible to know where the data is?
3: Can others then delete this data or hand it over to agencies?
4: How do you identify nodes in a decentralised manner (i.e. cryptographically and with no central control)?
5: What does a node look like?
6: If nodes are long running, what are they coded in?
7: What p2p networking have you used and why?
8: How are people automatically part of the network and contributing?

I think I have a ton of questions that would be helpful to know the answers to. It seems Storj will run on everything, but some of the projects you mention are diametrically opposed to your approach for many reasons.

If MaidSafe is used as storage alone, like some random disk (which is a sign of a gross misunderstanding), and the data is also spread elsewhere, then you run a large risk of leaking information in a manner harmful to any of your users. This would be very bad.

There may be things Storj does that are interesting, and I would like to know about and look at that code etc. The logins could be an example of an interesting feature; it would be nice to see the design of how this is better than a login (I do not like logins, so I am always searching for this answer).

Not really sold on HMAC; it's just what we are using for our Metadisk prototype. As stated, we are using convergent encryption, which leads nicely to good deduplication, and the user can add any salt they want for privacy. Users and applications are free to use any encryption methods they choose.

So to answer your question, no. We believe the user or applications should have the power to choose their own algorithms, in replacement of or in conjunction with ours. In this way they are not entirely reliant on our codebase for security. You never know when something fun like Heartbleed will pop up.


Did you read the Metadisk whitepaper linked above? The overview section provides a quick 10,000-foot view of how we handle that at the application level.

Take a look at that first, and then I'll start trying to tackle those questions one by one.


“We identified through both simulation, and mathematical prediction, that it would be possible to carry out a ‘birthday paradox’ attack on the network, whereby an attacker simply wishing to cause harm or disruption could flood the network with nodes it controlled, knowing they only need to surround a single address with 3 or more malicious nodes, in order to exert control over that node.”

Do you have that data or models published somewhere? Would be useful to me.

You can build and run the address_space_tool target in our Common module to see the simulation and play with the variables. You’d need to run it multiple times to get statistically significant results.
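
For anyone who cannot build that target, here is a toy Monte Carlo in the same spirit (my own sketch, not the actual address_space_tool; the node count, attacker fraction, and group size are illustrative assumptions): it estimates how often an attacker controlling a given fraction of node IDs ends up with three or more nodes in a target address's close group.

```python
import random

def surround_probability(total_nodes=10_000, attacker_fraction=0.10,
                         group_size=4, threshold=3, trials=20_000):
    """Estimate P(attacker has >= threshold nodes in a target's close group)."""
    malicious_count = int(attacker_fraction * total_nodes)
    hits = 0
    for _ in range(trials):
        # Node IDs are random, so the close group around any address is
        # effectively a uniform random sample of the whole node population.
        group = random.sample(range(total_nodes), group_size)
        hits += sum(1 for n in group if n < malicious_count) >= threshold
    return hits / trials

print(surround_probability())  # rough estimate; repeat runs for significance
```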

No worries Shaun, I was making the point that a hash is not encryption, so I cannot see how you are doing this. A hash is a digital representation of a data element and does not contain the data element, so how can you do this and call it convergent encryption? Can you point out the code? I am truly interested, as it would be a major breakthrough in cryptography if you have made a hash into an encryption algorithm.

If you are saying users can choose their own encryption algorithm then how is that different from today’s broken systems?

I have read the paper and cannot see the answers to any of the questions in there. It seems it will be possible for folk to know where data is, recognise it, delete or corrupt it, and it requires constant pings or similar (heartbeats) to keep watching it. So if you take a normal user with 100 GB of data and this is chunked into, say, 1 MB chunks, then you are talking a huge overhead on pings alone.
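
For scale (a back-of-the-envelope figure, not from the paper): 100 GB in 1 MB chunks is roughly 100,000 chunks, so even a single heartbeat per chunk per hour works out to around 28 messages per second for just one user.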

If users mutate data (which they will), then blockchain bloat will become much more of an issue, and they will need to remove the chunks they put up to make room for new ones. So you need reference counting and ownership semantics; otherwise, who owns the chunk in a convergent encryption scheme, unless you alter the salt every time, which makes convergent encryption useless as a de-duplication mechanism?
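
To illustrate the reference-counting point (a minimal sketch of my own, not MaidSafe or Storj code): deduplicated chunks need an owner count so that one user's delete cannot destroy another user's data.

```python
from collections import defaultdict

class ChunkStore:
    def __init__(self):
        self.data = {}                    # content hash -> encrypted chunk
        self.refs = defaultdict(int)      # content hash -> owner count

    def put(self, chunk_hash: bytes, chunk: bytes) -> None:
        # Identical (convergently encrypted) chunks hash the same, so a
        # second upload just bumps the count instead of storing a duplicate.
        if chunk_hash not in self.data:
            self.data[chunk_hash] = chunk
        self.refs[chunk_hash] += 1

    def delete(self, chunk_hash: bytes) -> None:
        # One owner releasing the chunk must not remove it for other owners.
        self.refs[chunk_hash] -= 1
        if self.refs[chunk_hash] <= 0:
            self.data.pop(chunk_hash, None)
            self.refs.pop(chunk_hash, None)
```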

I think we have come across most of these issues and could only solve them with a fully decentralised approach, which requires a secure DHT implementation. I am really keen to see how you can do this using servers or nodes owned and controlled by people. Good to see different approaches to these issues; your Sybil attack landscape must be very significant, as well as potential Spartacus attacks, unless you also use centralised PKI-based systems or tie in namecoin-type things and DNS or the .bit sites etc. (Spartacus attacks in this area are a disaster). Anyhow, well done for attempting this, but there is a minefield of issues to resolve in these systems, and after years we have uncovered an awful lot. If you read the papers and code you will see the tip of the iceberg of issues and fixes.

If you decide on a pure p2p approach (I think any other approach is centralisation), then you could fire up OMNeT++ simulators, grab our simulations repo, and run some fine-grained tests to check the maths etc., especially with heartbeat pulses across peered connections, borders, and firewalls. It will help tremendously I think, although it's pretty in-depth and took a few doctors over a year to create here. It should be quicker to evaluate, though, as most of the simulations are there. I hope it helps.

This makes us investors very concerned. I keep seeing this repeated, which, for me, means safecoin will keep its low value or even decrease with time. Thoughts?

I'm not sure "free" is the correct wording to use. My understanding is that the network will reward people with coins for providing services to the network (i.e. data storage). Providing storage will have real-world costs for the user in buying the equipment (hard disks, CPUs, memory, etc.) and maintaining the connection (internet connection fee, electricity). The advantage is that the user is paying many of these costs anyway, so the network fees are amortized. Of course, you should be able to purchase coin outright and provide no services. Then you are effectively paying the other users in the network for the services you want. Basically, if you are a net consumer of services you should have to acquire coin, and if you are a net producer of services you should be given coin.

I hope I understand this correctly, because I find this is to be a highly attractive feature of the network. If someone doesn’t have the capability of setting up a proper environment for quick data storage, then they can pay someone to do it for them.

What I'm not sure of are the probabilities of getting paid (do low-latency server farms have a significant advantage in this network?). Someone else will have to speak to that; I haven't grokked all that material yet.

@filipehdbr
I think @vtnerd gives a good description. The network will pay what it needs to in Safecoin to farmers in order to meet demand for storage, and it will charge users in order to pay for this, so there's a balance here. Safecoin's value, however, floats as well; it is not tied to storage like a "gold standard" but will obtain value with respect to fiat according to several factors (obviously supply and demand) - see the FAQ: What gives Safecoin value and differentiates it from altcoins?

@filipehdbr PS how many investors do you speak for? :wink:

Oh I see where the confusion lies (this is what happens when you respond at 5am). So we are simply using the hash as the encryption key (plus any salt the user wants to add). We are currently using AES128-CTR for the file encryption. Everything in the Storj ecosystem is supposed to be modular. Don't like the encryption module? Drop it and put in a new one. Don't like the interface? Drop it and put in a new one. Trying to make it as modular as possible. If you have found anything you don't like about AES128-CTR, feel free to suggest something, and we can swap it out.
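
A minimal sketch of that scheme, assuming the SHA-256 of the salted content supplies both the AES-128 key and the CTR nonce (illustrative only, using the Python 'cryptography' package; this is not Storj's actual module):

```python
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def convergent_encrypt(plaintext: bytes, salt: bytes = b"") -> tuple[bytes, bytes]:
    # The content hash is the secret: identical files (with the same salt)
    # produce identical keys and ciphertexts, which is what enables dedup.
    secret = hashlib.sha256(salt + plaintext).digest()       # 32 bytes
    enc = Cipher(algorithms.AES(secret[:16]),                 # AES-128 key
                 modes.CTR(secret[16:])).encryptor()          # 16-byte nonce
    return secret, enc.update(plaintext) + enc.finalize()

def convergent_decrypt(secret: bytes, ciphertext: bytes) -> bytes:
    dec = Cipher(algorithms.AES(secret[:16]), modes.CTR(secret[16:])).decryptor()
    return dec.update(ciphertext) + dec.finalize()
```

Whoever needs to decrypt later has to keep (or re-derive) that 32-byte secret, which is essentially the question raised at the end of this thread.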

There are still another 20 pages that cover that in the Storj whitepaper, but that gives a nice little introduction that I wanted you to see first. If you have many chunks, the trick is to pass the Merkle roots of the challenges over the network, not each one individually, or, as you described, you would kill the network with ping overhead.
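
A rough sketch of that batching idea (my illustration, not the whitepaper's actual construction): many per-chunk challenge responses collapse into one Merkle root, so only a single value needs to cross the network per audit round.

```python
import hashlib

def merkle_root(leaves: list[bytes]) -> bytes:
    # Hash each leaf, then combine pairwise until a single root remains.
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # duplicate last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# One root can stand in for thousands of per-chunk heartbeat responses:
# root = merkle_root([response_for_chunk_0, response_for_chunk_1, ...])
```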

The Sybil attacks pretty much come from the consensus mechanisms. We believe decentralized and distributed audits are the way to go. In this way you can't fall victim to 51% attacks that would kill an early network. As long as you have one honest node, he/she/it can prove a thousand other nodes incorrect.
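
A hedged sketch of what such an audit can look like (illustrative only, not Storj's actual protocol): the auditor precomputes the answer to a salted-hash challenge, and any single honest auditor holding that answer can catch a node that no longer stores the chunk.

```python
import hashlib
import os

def make_challenge(chunk: bytes) -> tuple[bytes, bytes]:
    # The auditor picks a random seed and precomputes the expected answer
    # while it still has access to the chunk.
    seed = os.urandom(16)
    return seed, hashlib.sha256(seed + chunk).digest()

def respond(seed: bytes, stored_chunk: bytes) -> bytes:
    # A storage node can only produce this value if it really holds the chunk.
    return hashlib.sha256(seed + stored_chunk).digest()

def audit(expected: bytes, response: bytes) -> bool:
    # One honest auditor with the expected digest can refute any number of
    # dishonest nodes: the hash cannot be faked without the data.
    return expected == response
```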

Yeah, I'd love to run through your sims. We don't have to worry about borders and firewalls too much. We built Metadisk first because it could act as a gateway/relay. We both know our good friends at all the ISPs will try their very best to throttle our networks once we gain some traction (Meshnet next, right?)


Applications have higher profit margins, especially as, like you said, data prices will trend to zero as networks like Storj and MaidSafe commoditize storage. Dropbox just takes Amazon S3 and sells it at a 10x-100x markup, which is why our model is more about building applications AND protocols.

It is like how it is free to walk up the stairs in Walmart but the goods are paid for, or free to use Google, but you pay with privacy. Each system should have incentives and rewards, but that does not mean charging for everything. Dropbox is 'free' for X GB and their investors were not too upset, or Facebook etc.

Part of a system being free does not mean there is no value; in fact, in many areas it provides the value. So in SAFE the value of storing data is paid for by the network. As safecoins increase in value, the reward per coin is better, but fewer are required. The external value of safecoin has little to do with internal charges.

Parts of the network will incur a charge in safecoin, but not all parts at all times for all actions :smiley: HTH

Thanks for the clarification, I figured it was some kind of typo thing happening there.

How do you intend to do this? I always wonder "who watches the watchmen" in these situations. If the network is autonomous then I can understand it.

PS It seems everyone talks of building protocols; I think the term is overused, as a protocol can mean pretty much any rule-set. I think of data transfer mechanisms as protocols (TCP, UDP, rUDP etc.). In terms of other things, there are algorithms (crypto etc.). "Protocol" seems to be what people are calling algorithms and code these days. I do not see MaidSafe as a protocol, but it defines one in rUDP; even when we use protocol buffers I do not see those as defining a protocol as such (except internally). So I welcome you to join me in this push against "developing protocols", as well as Bitcoin 2.0, Web 3.0 and all that stuff; it gets in the way of actual meaning, I think.

3 Likes

There are a few: client controlled, federated servers, open servers, notary chains, buried keys, oracles, smart contracts, P2P networks, blockchains, and quorums. As stated before, we want everything to be modular, including our "distributed audits." Each of them is valid in its own right; some exist and some are being worked on.

So in essence the answer to "who watches the watchmen" is everyone. The system must be open and auditable at all times. Just as I can't magically create an extra 1 million Bitcoins, I can't fake a hash.

I like the idea of heterogeneous decentralized data networks, which is why I’m not really a fan of competition between any of the decentralized data protocols. Build toolsets and APIs to make the data flow freely between them (obviously there should be no plaintext exposure between networks).

Want to try a Sybil attack? Good luck getting through 11 unique algorithms and multiple networks.


Yeah, I agree. Some time needs to be spent standardizing the names. Perhaps get all the decentralized data guys on a call to hash some things out. If we can agree to call a widget a widget, it would make things easier for everyone. If not, you end up with DAC/DAO/DAPPs to describe the same thing. Agreed, too many buzzwords, so count me in.

1 Like

That's a lot of consensus; it would be really good to see how the level of trust is calculated with so many mechanisms, ranging from full trust of a single person to distributed trust in a blockchain. Interesting problem to solve, for sure. I suppose you need to work that out, as if one of these is hacked or targeted by any agency, then a leak will happen across them all if you share data between them.

None of those mechanisms hold any keys to the data. They all operate independently. It ranges from you doing the audits yourself, to consensus mechanisms, and then you start to remove the human elements and start to experiment with AI and independent systems.

As far as the heterogeneous networks go, the keys should probably stay with the originating network to avoid data leakage.

Yip, you have lost me there; if they do not know the data, how can they provide auditing?

1 Like

Yep, I thought that is what you meant… But it raises more questions for me… like, where do you store the message digest needed to decrypt the original message in the future?

All the best,