General Questions on Licensing, Network Security,

With the post I started the other day “Is the SAFE Network Safe?” and now this one I hope people don’t think I’m just here to cause trouble. Believe it or not I’m actually one of the most easy going guys around! Also don’t worry, I’m not going to be the guy who’s constantly trying to challenge you.

As I mentioned in my last post I’ve been following MaidSafe’s progress for a number of years now and find the technology very exciting (but admit I don’t know the details as intimately as I maybe should). If it can work, and also does no harm (I worry it may facilitate harm but will keep this post free of that) it will be fantastic but I recognise that the task is a mammoth one. The technology will no doubt improve over the years however certain core principles will have to be “set in stone” from the start as it seems backtracking may not be an option in certain areas.

I know very well that it’s much easier to ask “awkward” questions than it is to create things! However, if the SAFE Network (or another like it) becomes as ubiquitous as you all no doubt want it to be then my life, the life of my children, their children, … could be affected. So the network is very important and it’s only right that questions that may not be popular are asked at the outset.

I’m sure you can understand that it’s hard to sift through all information that’s available, so I apologise if the answers to my queries are available elsewhere.

→ How is the licensing model enforced? How do you track who is and isn’t paying the 1% fee and then make sure those that aren’t paying up can’t use the network? I’m guessing that safecoin factors in here somewhere? Maybe all/certain safecoin transactions attract a fee that goes straight to MaidSafe? What happens if a service on the network doesn’t ask its customers to pay with safecoin but gains revenue by other means? Can a service that uses the network be deliberately starved of safecoin? If you can halt a service for not paying the fee then what’s to stop you stopping a service for some other reason?

→ How are nodes within the network kept updated? Suppose I found a way to exploit the network in some way; created a worm, found a potential buffer overflow, etc. then how do you ensure that all nodes are patched quickly? I’m guessing you use an auto-update mechanism so that node owners don’t have to worry about this. However can you guarantee that there won’t be a problem with the auto-update process and some nodes could remain unpatched indefinitely, potentially putting the entire network at risk?

→ I assume I can determine the IP addresses of other nodes on the network that I connect to (even if I just fire up WireShark)? I’m not too up to speed on your NAT traversal techniques but could I use your own open source code against the network? I’ll know which sockets the other nodes are listening on, I can scour your client node code for weak points (like opportunities for buffer overflow exploits), figure out how NAT traversal works and then launch an attack on the nodes I know about.

→ How do users authenticate themselves on the network and can developers of services on the network mess this up, say by caching credentials? Could I potentially find files on a node and then transfer them across to mine and masquerade as them? This is of course possible on the Internet however I don’t have a single Internet identity so if someone discovers my Gmail credentials that doesn’t mean they can access my Dropbox account…unless I’m mistaken I’d share my SAFE account across services (but have the option to have multiple identities if I wish – but guess most people wouldn’t do this).

→ I understand that the network uses techniques like Dropbox, etc. so that it doesn’t need to store too many copies of the same data…there’s no point in storing 1m copies of the same MP3. However how well does it cope with capacity reductions? I’ll give two examples of what I mean:

  1. I have a 1 PB drive and offer it up to the network. I fill the entire drive meaning potentially 1 PB of my unique data is within the network. Now my drive breaks down and I can’t afford to replace it, I’m back to a 1/2 TB drive. Can I still access the PB worth of data that I stored on the network previously since I’m no longer offering this much in return? If I can then does this mean there’s scope for abuse? If I can’t then is that fair?

  2. I am a large corporation and am offering a crazy amount of storage to the network and I also use my entire quota within the network. At some point I leave the network, taking my disks with me. I believe that my data remains on the network and also I’ve just reduced the overall capacity significantly. A similar thing could happen if a new distributed network became available and people migrate en masse from SAFE to this other one…think Facebook poaching everyone from the previous generation of social networking services. Could the people that still use the network end up losing data?

→ How does the bootstrapping process work? When I first create a node how do I get onto the network without knowing where to look? I guess after the initial install and bootstrap you could store a cache of the nodes you’ve seen in the past and then use whichever are available for bootstrapping subsequently but I’m not sure how you would achieve it after the initial install. I suppose you could dynamically allocate a set of current good node IP addresses during the installation process (assuming install happens at the same time as downloading the installer) but you very likely do something else…I’d be interested to hear.

I’m sure I could spend all day thinking up questions, the licensing model was the one I really wanted to ask but figured why not throw in a couple extra…I’ll leave you in peace now and try not to become a nuisance!

No worries, yes these are all answered, but all over the place (although many times). We are changing our web site to reflect documentation and centralise it for easy access. Many of the tech points will be answered in the dev wiki, the code and http://maidsafe.net/SystemDocs/. Reddit is full of these (search for me, dirvine) as well as this forum (a large discussion on the 1% is in the thread “1% dev fee goes to Maidsafe the company or maidsafe the charity”).

There is so much documentation and so many answers now that we are flooded with it all. There are many papers on Google Scholar (search for maidsafe) and a recently published paper on attacks was posted on Google Groups.

I hope some of this helps.

For a quick rundown of some of this, even just read this page alone: maidsafe.net/SystemDocs/system_components/autonomous_network.html

Thanks for the links. Have started reading the “SystemDocs”, I think even with these it’ll be hard for me to work out the answers to some of the more subtle points to my questions but will give it a go.

In terms of the 1% discussion you linked to, I’ve read it a couple of times now but am still pretty vague. @happybeing suggests that there may be some form of community voting and possibly even the MS Foundation would have a final say on licencing issues…I think you go on to say that this wouldn’t necessarily be needed, but I’m not sure why. This does fit more with what I’ve heard in my other thread though about it being impossible to shut things down.

From what I’ve gathered I can develop applications without an API key. How do MaidSafe know that I’m releasing an application that uses the SAFE Network and therefore expect to collect the 1%? How do you know if I go crypto and/or open source or not? How do you stop me from breaching the licence?

We take the MySQL-type route: if you steal, please steal from us, and when you are large enough you will start to obey laws etc. Exposure as an IP thief is not good for large businesses, and small non-crypto closed companies may not be able to afford a license (though at 1% it’s not hard). We do not want to stop anyone innovating; this is paramount to me in particular. We will not monitor or attempt to either; it’s against the wishes of everyone, I believe.

We want to encourage strongly open source innovations and crypto/decentralised based companies. So we compromise.


Very good, that clears that question up nicely for me :smile:

I’ve spent quite a bit of time reading through the documentation. I’ve very briefly delved into code but I’m sure you’ll understand that it’s not going to be easy for me to answer my questions without spending weeks/months looking at it.

In terms of bootstrapping and NAT traversal I’ve read the “DHT NAT Traversal” paper and the little bit of info that’s in “SystemDocs”. It seems that “SystemDocs” contains a slightly updated form of a snippet from the paper but I’m a little unclear on what it’s saying - have you decided to start using TURN now to get around the symmetric router problem you were having before or not? Anyway, I’ve not seen any mention of how bootstrapping nodes are initially identified, i.e. immediately after node installation and you want to get onto the network for the very first time.

Unfortunately the “Security of the MaidSafe Vault Network” paper doesn’t really cover the questions I had; things like poorly designed applications offering the opportunity to masquerade, the system update procedure, exploiting peers, …

Also I’m still unclear as to how things would work when people leave the network. I’ve read that there have been considerations on how to try and prevent people from deceiving the system into thinking they’re offering more space than they have but I guess my questions are more about the consequences of honest people who leave or reduce their network offering after storing large volumes of data.

Finally before tonight I pretty much assumed (without any consideration) that the algorithm used to de-duplicate data in the network was like a basic compression algorithm - not that I know much about either type of algorithm so what I was thinking is quite probably completely impractical! I was thinking that files would be chunked in such a way so that common elements between UniqueFileA and UniqueFileB could be identified and stored as chunks. The comparison with basic compression is where all the common elements from a file being compressed are found. From reading up on this I’m now thinking that in order for de-duplication to occur the files have to be exactly the same. Where did the estimate that 75% of all files within the network would be duplicates come from? I suppose this could be true in a file sharing P2P network but with systems like Dropbox people will be primarily storing data that’s unique to them and not shared; holiday snaps, spreadsheets, etc. If everyone has mostly unique data and they all use up most of their allowed storage space then will this cause problems?

Thanks


No, because each file is encrypted on the client. And also you can’t tell where each “file” is on the network.

I/O flows over the network over intermediate nodes.
You can’t find out IP addresses of hosts which have chunks of data you’re reading.
But feel free to try.

You would have to pay safecoin to farmers to download that data.
Of course it’s fair, no one on the network is in any way beholden to you.

Look in Examples folder on Maidsafe Github.

The s/w will make random checks of vault conditions and cheaters will be economically punished.

Depends on what is stored: if people store movies, ISO files and so on, the dedup ratio for those will be 99%. For other files it’ll be 0%. It’s impossible to know what the ratio will be for each file, so it’s pointless to discuss that. Of course this dedupe method won’t be very efficient but it’s better than nothing and will have to do until a better approach is devised.
Of course that won’t cause any problems for anyone willing to pay for more.
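
To make the “identical files only” point concrete, here is a minimal sketch of whole-file de-duplication by content hash. It is only an illustration under my own assumptions: `DedupStore` and its methods are hypothetical names, each file is treated as a single unit, and the client-side encryption and chunking the real network uses are not modelled at all.

```python
import hashlib


class DedupStore:
    """Toy store (hypothetical) that keeps one copy per distinct content."""

    def __init__(self):
        self.blobs = {}    # SHA-256 hex digest -> stored bytes
        self.uploads = 0   # how many times put() was called

    def put(self, data: bytes) -> str:
        """Store data and return its key; identical data is stored once."""
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)
        self.uploads += 1
        return key

    def dedup_ratio(self) -> float:
        """Fraction of uploads that did not need any new storage."""
        if self.uploads == 0:
            return 0.0
        return 1 - len(self.blobs) / self.uploads


store = DedupStore()
iso = b"the same ISO image uploaded by four different people"
for _ in range(4):
    store.put(iso)
ratio_shared = store.dedup_ratio()      # only 1 of 4 uploads used space

store.put(b"somebody's unique holiday snap")
ratio_mixed = store.dedup_ratio()       # 2 stored copies for 5 uploads
```

Popular shared files push the ratio up and every unique file pulls it back down, which is why the overall figure depends entirely on what people actually store.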

If it didn’t then, it does now to a decent extent (at least in terms of exploits/attacks/security).
There’s no need to obsess around system update procedures. Systems will be updated as needed (meaning, as the development decides; normally it’ll be a simple program restart, and in bad cases it may require “defrag-like” reorganization of vaults and so on). Does this matter to non-developers and how?

Thanks for the input @janitor

Here I was talking more about if I had physical access to a node. Say my friend uses “SAFEBook” and this app stores details of my friend’s SAFE credentials to a file somewhere. Could I take this file, stick it on my computer and then masquerade as them? This question was an extension to the one that immediately preceded it - can app developers mess up and compromise security? Especially as you’ve a single account on the network that all apps share, it could obviously be pretty dangerous.

I’m not sure if we’re talking about the same thing here. You can find out the IP address of the nodes you are immediately connected to, i.e. the ones delivering the files to you, i.e. the last hop.

You’re not answering my question. That was a statement within the question, I know the network caters for this.

No, but it may well cause problems if many people leave the network and there are swathes of data floating around that nobody owns. This is just wasting network capacity.

Yes, it’s a potential security risk. See my question in the original post about system update.

We can’t say it’s impossible - same rules always apply - do not install random apps… As you know even (well, maybe I shouldn’t say “even”) Google distributed apps that were deliberately poorly designed (or worse).
In technical terms, apps can and should be written so that credentials cannot be “remembered” (stored). You can see that in some of the MaidSafe videos on YouTube. You would have to enter a PIN and then your password (I saw that in one of those videos, probably the one linked from the top sticky post on this forum).

Yes, those you can, but those aren’t the same nodes that are hosting the files. I think we agree on that, right?

You can launch an attack on your neighboring nodes, but since the s/w isn’t available to try, I can’t guess the extent of damage that can possibly be done that way.
In any networked service you always have neighboring nodes and you can always try to attack them. Because of that all networking services have some measures that prevent exactly that.
I didn’t read the API or protocol details and IIRC there are several types of services (network layout, file transfer, etc.) so yes, you could send malformed packets or try to flood them with requests, but that’s probably on top of their list of things they’d defend from.
You send too many fake requests or a single malformed one and they drop you for 1 minute. You do it again, they increase to 4, then 16, 64… Or they simply ban your IP from accessing the network for 60 minutes. MaidSafe can already defend itself from many copycat attacks that have been seen before.
I’d be surprised if they didn’t check all recent attacks that worked against BitTorrent and other file sharing networks and made sure they’re not exposed to the same vectors.
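
The escalating drop I described is easy to picture in code. This is a rough sketch and nothing more: I haven’t checked how (or whether) MaidSafe does this, so `PeerPenalty`, the quadrupling schedule and the 60-minute cap are all my own assumptions.

```python
class PeerPenalty:
    """Hypothetical tracker that drops misbehaving peers for longer each time."""

    def __init__(self, base_minutes=1, factor=4, cap_minutes=60):
        self.base = base_minutes     # first penalty, in minutes (assumed)
        self.factor = factor         # multiplier per repeat offence (assumed)
        self.cap = cap_minutes       # longest single penalty (assumed)
        self.offences = {}           # peer id -> offences recorded so far
        self.blocked_until = {}      # peer id -> minute when the block ends

    def report_offence(self, peer, now):
        """Record a bad request at minute `now`; return the block length."""
        count = self.offences.get(peer, 0)
        minutes = min(self.base * self.factor ** count, self.cap)
        self.offences[peer] = count + 1
        self.blocked_until[peer] = now + minutes
        return minutes

    def is_blocked(self, peer, now):
        """True while the peer's most recent block has not yet expired."""
        return now < self.blocked_until.get(peer, 0)


penalties = PeerPenalty()
first = penalties.report_offence("peer-a", now=0)     # 1 minute
second = penalties.report_offence("peer-a", now=1)    # 4 minutes
third = penalties.report_offence("peer-a", now=5)     # 16 minutes
fourth = penalties.report_offence("peer-a", now=21)   # capped at 60 minutes
```

A well-behaved peer never shows up in the table, so the cost falls only on whoever keeps sending junk.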

Some of the most common are covered here:
https://github.com/maidsafe/SystemDocs/tree/master/attacks

(I think if you run MaidSafe on Tor/Onion, then you can’t know your immediate neighbors).

I’m not sure they need to do that (although I can’t say they don’t).
They can simply send one of four chunks your way. The first time you can’t actually save the chunk you’ve accepted claiming you have enough space, they can bust your ass (they’d probably demote you to a lower priority level and do so every time you get smart until you stop getting any business; there’s only 4 levels and you have to work hard to get to the top level, so the idea is self-defeating).
And of course they know (and you don’t) the hash you’re supposed to return because the client who owns the chunk knows it, and all 4 nodes that want to store a copy of the chunk must return the same hash.
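
As a sketch of that idea, assuming SHA-256 as the hash and a four-level rank (the function name, the vault ids and the one-level demotion are made up for illustration, not taken from the MaidSafe code):

```python
import hashlib

MIN_RANK = 1   # assumed lowest priority level
MAX_RANK = 4   # assumed highest priority level ("only 4 levels")


def audit_chunk(chunk: bytes, reported: dict, ranks: dict) -> dict:
    """Return new ranks after comparing each vault's reported hash.

    `reported` maps vault id -> hex digest the vault claims for the chunk.
    Vaults whose digest does not match the expected one drop one level.
    """
    expected = hashlib.sha256(chunk).hexdigest()
    updated = dict(ranks)
    for vault, digest in reported.items():
        if digest != expected:
            current = updated.get(vault, MIN_RANK)
            updated[vault] = max(MIN_RANK, current - 1)
    return updated


chunk = b"one chunk of somebody's file"
good = hashlib.sha256(chunk).hexdigest()
reported = {"vault-a": good, "vault-b": good, "vault-c": good,
            "vault-d": "0" * 64}   # vault-d never really stored the chunk
ranks = {vault: MAX_RANK for vault in reported}
new_ranks = audit_chunk(chunk, reported, ranks)
```

The cheater can’t guess the right digest without the data, and each failed check costs rank that takes real work to earn back.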

Yes they are wasting, but that has been discussed on the forum: if you pay for something, there’s no expiration.
They could make a default (say, 5 years) and delete if you don’t pay up to extend the life of those files.
This feature (as I understand it) has not been implemented yet, but they don’t have to do it now - they can start in year 4 :smile:
As time goes by they’ll see what proportion of files haven’t been accessed at all, and so on, and based on many factors (the prices, supply/demand, etc.) decide what needs to be done and how/when. It’s a known concern and they’ve been aware of it.

It’s not a new trick. All networked software does the same thing - you can just copy GPL code from various clients that implement that (and it’s not only file sharing networks, but any services):

  • Implement version info in API
  • Implement version checking on the network
  • Configure services to accept (partially or completely) or not (in case of security problems) communication from older clients

So if a security bug is found in v1.17, they release v1.18 and set it to refuse connections from v1.17 and lower and (if they add this functionality - I haven’t checked, maybe they already have it) have all clients on the network automatically update and restart like Windows does these days when a restart is required.
In case of a responsible disclosure you’ll have your client auto-updated before you even hear about any security problem with the SAFE network. If you refuse, you won’t be able to connect anywhere.
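
The version check itself is tiny. Here is a minimal sketch, assuming plain “vMAJOR.MINOR” version strings like the ones in my example; `parse_version` and `accept_peer` are hypothetical helpers, not anything from the MaidSafe code.

```python
def parse_version(text: str) -> tuple:
    """Turn a string such as 'v1.18' into a comparable tuple like (1, 18)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def accept_peer(peer_version: str, min_version: str) -> bool:
    """Accept a connection only if the peer is at least `min_version`."""
    return parse_version(peer_version) >= parse_version(min_version)


MIN_VERSION = "v1.18"   # assumed floor, set after a bug is found in v1.17

old_client = accept_peer("v1.17", MIN_VERSION)     # refused
patched_client = accept_peer("v1.18", MIN_VERSION) # accepted
ancient_client = accept_peer("v1.9", MIN_VERSION)  # refused: 9 < 18
```

Comparing tuples of integers rather than raw strings matters here, because as text “v1.9” would sort after “v1.18”.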

But even in case of new exploits unknown to MaidSafe, they could maybe take the network down or something like that.
Personally I think it’s reasonable to expect roughly 99% uptime from the SAFE network (99% allows about 3.65 days of downtime a year), so if the network is down up to 3 days a year, I wouldn’t care. For hackers to do significant damage they’d have to take control of users’ accounts and thereby data, not just crash their service or prevent them from downloading some silly video. But that’s not easy to do.

I think Windows and bitcoin are good examples of what to expect - of course millions of careless users whose computers are already hacked today will become zombie MaidSafe nodes the moment they install the s/w, but people who follow sensible security guidelines most likely won’t.

You can poke around the code or even try to install the s/w (see the maidsafe-archive/MaidSafe wiki on GitHub) and see how much damage you can do to it.

Security bugs appear in almost all software all the time, and are usually disclosed to the developers in a responsible manner. I don’t expect any major disasters.

Simple personal policy:

  1. Private & personal data: do not put on the SAFE network
  2. Semi-private & semi-personal: encrypt before putting on the SAFE network
  3. Other: put on the SAFE network
  4. As the s/w matures, relax the policy as needed