Calculating the probability of data persistence

happybeing · February 6, 2024, 10:03am

I don’t think you would enter a password.

I imagine that, on starting a node, the software would offer a recovery phrase to preserve chunks after a shutdown.

The user doesn’t have to write it down but if they do they can use it to resume without losing chunks.

Southside · February 6, 2024, 10:39am

Some years ago, the Scotcoin project investigated using TPMs to enable PoET (Proof of Elapsed Time). Then we found out most TPMs cannot really be trusted…

Now perhaps things have changed, but for now I’d be very suspicious of TPMs.

EDIT: and it seems I’d be right to be suspicious: “Trusted platform module security defeated in 30 minutes, no soldering required” (Ars Technica)

neo · February 6, 2024, 10:42am

Nah, it is phrased as a Node Wallet Password or, as @happybeing says, a node recovery phrase, and it just happens to also encrypt the temp key.

Without the node encrypting before storing to disk, bad actors can send in unencrypted data that could cause one of many things, like bringing the network into disrepute and turning away good people, or even attracting legislation against running nodes in the worst case. And the international police would be involved if it were illegal images (images can be < 1MB).

Also, small files and documents could end up not being self-encrypted and be stored in the clear. This would mean no one could trust Safe to store confidential documents, even larger ones that are encrypted. Perception is key here, and the legal and business arenas will steer clear of something that can expose their confidential data.

peca · February 6, 2024, 11:10am

Not for protecting wallet keys, but good enough for deniability - “I had no way to see what is in those encrypted chunks.”

neo · February 6, 2024, 11:32am

Talking to a consultant who worked with Google (& Microsoft) back in the day on dealing with illegal files stored on their cloud services, I was able to gather a good deal of info on how illegal data stored on your “cloud” is handled. Google was trying to resist having procedures etc. But the FBI can be forceful.

The law in countries like the UK, USA, AU etc. is about knowingly storing illegal files. So basically Google only had to have procedures in place to deal with a file once it becomes known to be illegal material. I am not sure if they also had to scan files to see if they match the international police’s list of illegal files. The list is a list of hashes, so giving the list to cloud services was not giving them the actual files.

For Safe this would protect the node operators, since files are chunked and a node operator never gets the actual file, even if it is never encrypted. With self-encryption they are even more protected, since they have no clue what those chunks hold.

The temp key was introduced to plug the edge cases: a bad-actor client, a small file that doesn’t get self-encrypted, or a bad actor bypassing their client’s encryption process and then alerting authorities or the media to the existence of such data.

So with temp key encryption, the node operator can truthfully and morally say they do not know what is being stored on their node. Even with a pass phrase encrypting the temp key on disk, the node operator is unaware of it and of how to extract the temp key without an excellent knowledge of how it all works and the capability to reverse it. So unless an app exists on their machine to do this for them and decrypt all the chunks, in the hope that there is a chunk left unencrypted by the client that also happens to be something juicy, the operator is very unlikely to even be concerned.

The process of accusing the operator becomes extremely tenuous when everything is encrypted and the operator has no suitable means to get at the original data.

loziniak · February 6, 2024, 2:01pm

Then you just restart the node afresh, get a new ID, and new, clean data.

Even with no encryption, the operator also can say that he does not know about storing any questionable content. And if he is told about this, he simply restarts the node with another key/ID, problem solved.

This app can ask node operator for password as well, and get access to clientside-unencrypted data. This app can break any simple protection as well, reversed bytes, brute-force keys etc., because Safenet’s source is open and the app can do anything Safenet node does.

loziniak · February 6, 2024, 2:07pm

Isn’t it possible to store “bad content” in small files? Like passwords to child-porn sites, crypto keys to Monero accounts, Al-Qaida’s internal chat messages… Probably someone could come up with some more interesting examples

loziniak · February 6, 2024, 2:17pm

And this complicates the safenet protocol even more. More bugs, slower development etc.

This assumes China going offline for some reason for long enough for people to turn off their computers and reset nodes keys/IDs.

Entire country’s consumer power outage could potentially happen. But perhaps it’s so unlikely, that it’s not a problem for Safenet. And if this happens, well… shit just happens sometime. And people live on.

But in China’s case this could be done on purpose. They are getting more and more independent from the rest of the world, so they could probably do more harm to other countries with such a move than to themselves?

And it wouldn’t have to be power outage, simple network cut-off would do.

I wouldn’t go for the second, because it will make small files (far more frequent I think) occupy even more storage space.

TylerAbeoJordan · February 6, 2024, 3:05pm

If larger files are more likely to fail over time via chunk loss, then would it make sense to have a function that increases replication count as files get larger in size? I’d presume higher charge rate too as maybe a perverse incentive otherwise.
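A rough way to see why larger files are more exposed, as a sketch rather than the network’s actual model: assume each copy of a chunk is lost independently with the same probability, and that a file is lost as soon as any one of its chunks has lost all of its copies. The function names and the numbers in the usage loop are illustrative assumptions.

```python
def chunk_loss_probability(node_loss: float, copies: int) -> float:
    """Probability that every copy of one chunk is lost at the same time.

    Assumes copies fail independently, which real outages may violate.
    """
    return node_loss ** copies


def file_survival_probability(chunks: int, node_loss: float, copies: int) -> float:
    """A file survives only if each of its chunks keeps at least one copy."""
    return (1.0 - chunk_loss_probability(node_loss, copies)) ** chunks


if __name__ == "__main__":
    # Illustrative numbers only: 15% of nodes vanish at once, 5 copies per chunk.
    for chunks in (3, 100, 10_000):
        print(chunks, file_survival_probability(chunks, node_loss=0.15, copies=5))
```

Under these assumptions the survival probability falls as the chunk count grows and rises with each extra copy, which is the trade-off the question is pointing at.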

digipl · February 6, 2024, 4:00pm

What I was considering is to let the node encrypt the chunk but forget the key and let the client do the decryption on GETs.

If we can get this encryption key to be derived from the data, without a bad actor being able to choose it, we could have, apart from the increased work, the perfect solution.
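A minimal sketch of that idea, under loud assumptions: the key is a SHA-256 hash of the chunk with a made-up domain prefix, the cipher is a toy XOR keystream from SHAKE-256 (standard library only, not a vetted construction), and the client is assumed to have recorded the key at upload time, for example alongside its data map. None of these names or choices come from the Safe codebase.

```python
import hashlib


def derive_key(chunk: bytes) -> bytes:
    """Key derived from the data itself, so the uploader cannot pick it freely."""
    # "node-at-rest-key" is a hypothetical domain-separation prefix.
    return hashlib.sha256(b"node-at-rest-key" + chunk).digest()


def _keystream(key: bytes, length: int) -> bytes:
    # SHAKE-256 used as a simple keystream generator (illustration only).
    return hashlib.shake_256(key).digest(length)


def node_store(chunk: bytes) -> bytes:
    """Node side: encrypt the chunk; the key is a local variable and is not kept."""
    key = derive_key(chunk)
    stream = _keystream(key, len(chunk))
    return bytes(a ^ b for a, b in zip(chunk, stream))


def client_decrypt(stored: bytes, key: bytes) -> bytes:
    """Client side: undo the node's encryption with the key recorded at upload."""
    stream = _keystream(key, len(stored))
    return bytes(a ^ b for a, b in zip(stored, stream))
```

Because the key depends on the whole chunk, an uploader who wants specific bytes to land on disk would have to find a chunk whose own derived keystream produces them, which is the property being asked for here.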

loziniak · February 6, 2024, 4:59pm

If you want the data to be confidential, you can encrypt it at the application layer.

Isn’t it the case that a self-encrypted chunk is not decryptable without knowing the address of the file?

dirvine · February 6, 2024, 5:00pm

This might be key to us not having to address the issue? If we have chunks encrypted by default and some bad actor did do something then I think the node operator, if challenged on that chunk just needs to delete it.
We could get into banned lists etc. but it’s more work and they don’t work anyway.

Answer could be it is a non problem and I hope that to be the case and allow the network to fully halt and restart if needed.

TylerAbeoJordan · February 6, 2024, 5:07pm

e.g.: INT(LN(filesize/minsize))+mincopy; where minsize might be 1000MB and mincopy is 5.

This yields:
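The formula can be tabulated with a short script. This is a sketch, assuming sizes are given in MB and that files smaller than minsize simply get mincopy copies (the raw formula would give them fewer); `replication_count` is an illustrative name.

```python
import math


def replication_count(filesize_mb: float, minsize_mb: float = 1000.0, mincopy: int = 5) -> int:
    """INT(LN(filesize/minsize)) + mincopy, never dropping below mincopy."""
    if filesize_mb <= 0:
        raise ValueError("filesize_mb must be positive")
    extra = math.floor(math.log(filesize_mb / minsize_mb))
    # Assumption: small files keep the base count instead of losing copies,
    # which is what the raw formula would produce for filesize < minsize.
    return mincopy + max(0, extra)


if __name__ == "__main__":
    for size_mb in (500, 1_000, 10_000, 100_000, 1_000_000):
        print(f"{size_mb:>9} MB -> {replication_count(size_mb)} copies")
```

Because the growth is logarithmic, each extra copy requires the file to be roughly 2.7 times larger than the previous threshold.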

Profess · February 6, 2024, 5:24pm

I’ve been thinking about running “premium nodes”, and this morning @abbu400 came up with an idea that improves on what I had in mind.

neo:

abbu400:

On setup, the node operator is asked for a password, which is then stored in RAM.

This isn’t the method used at this time. The node generates a random key so that the node operator cannot do anything either, which is why data is lost on restart.

BUT it is an excellent solution to the problem of the temp key + node info being decrypted by hackers or bad actors when on disk: use the “password” to encrypt the random temp key (+ node info) on disk.

And if the node operator forgets the password, then tough: the node has to restart afresh. Also, tests on the supplied password can ensure it is strong enough to slow down any attempts to brute-force it.

@dirvine Maybe @abbu400 has the compromise that solves the problem of storing the temp key on disk, at the cost of a possible inconvenience to node operators in providing a “password”.
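A minimal sketch of wrapping a random temp key with an operator password, using only the Python standard library. The assumptions are loud ones: a 32-byte temp key, PBKDF2-HMAC-SHA256 as the password stretcher, an XOR mask plus an HMAC tag in place of a proper authenticated cipher, and an arbitrary iteration count. A real node would more likely use a vetted AEAD from a crypto library; the function names here are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; higher slows brute forcing


def _derive(password: str, salt: bytes, iterations: int) -> tuple[bytes, bytes]:
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=64
    )
    return material[:32], material[32:]  # (mask, mac_key)


def wrap_temp_key(temp_key: bytes, password: str, iterations: int = ITERATIONS) -> bytes:
    """Return salt || masked key || tag, which can be written to disk."""
    if len(temp_key) != 32:
        raise ValueError("this sketch only wraps 32-byte keys")
    salt = secrets.token_bytes(16)  # fresh salt, so the mask is never reused
    mask, mac_key = _derive(password, salt, iterations)
    masked = bytes(a ^ b for a, b in zip(temp_key, mask))
    tag = hmac.new(mac_key, salt + masked, hashlib.sha256).digest()
    return salt + masked + tag


def unwrap_temp_key(blob: bytes, password: str, iterations: int = ITERATIONS) -> bytes:
    """Recover the temp key, or raise ValueError on a wrong password or damaged blob."""
    if len(blob) != 80:
        raise ValueError("unexpected blob length")
    salt, masked, tag = blob[:16], blob[16:48], blob[48:]
    mask, mac_key = _derive(password, salt, iterations)
    expected = hmac.new(mac_key, salt + masked, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong password or corrupted blob")
    return bytes(a ^ b for a, b in zip(masked, mask))
```

On restart the node would call `unwrap_temp_key` with the password the operator types in; a failure means starting afresh, which matches the “forgot the password, then tough” behaviour described above.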

(Sorry for the acronyms).
The problem could be solved by using a “hardware password” - what I mean is that when the node starts up, the operator starts an external device (hardware device with password generating software), the temporary key is stored in RAM and the generated password will encrypt the temporary key and the node data on disk. The password is stored in the external device, which can be disconnected from the computer.

This solves several problems:

The use of an external device to generate and store a strong password removes the need for the operator to memorise/store it,
in the event of a physical takeover of the computer by a hacker/thief/authorities, the absence of a password hardware device makes it difficult to read the fragments stored on the node,
in case of accusations of illegal data storage, the user is protected (by full encryption and no technical requirements for running/recovering data fragments on the node),
if the authorities ask for the password, the operator can say that he has lost the device and the cost of losing the data is low,
when the computer is disconnected, the node can be easily restored, which increases the number of copies and, for example, in the case of large files, increases the chances of them working (less chance of losing a chunk that causes the whole file to be lost),
Perhaps these nodes can receive more ranges (XOR) and provide an intermediate link between a regular node and an archive node?
Simplicity and savings (bandwidth and power consumption).

The cost of the hardware device could be very low, e.g. a USB drive, which is relatively cheap even in poor countries, and open-source software for generating passwords is unlikely to be a problem.

This could make “premium nodes” more accessible, and in the early stages of SafeNet they could ‘manually’ provide more stability, reliability (data integrity) and safety, until experience of autonomous network operation comes along.

Is this possible?

to7m · February 6, 2024, 6:56pm

could you make a version, where you set the file size and calculate the survival prognosis based on that?

Sure, Safe network file prognosis for given size (editable) - Google Sheets

to7m · February 6, 2024, 7:23pm

Which would inevitably lead to pre-splitting the files in the clients to keep costs to the user down

to7m · February 6, 2024, 7:25pm

We could get into banned lists etc. but it’s more work and they don’t work anyway.

Could you elaborate on this? Blacklists of bad chunk hashes are essential to making sure we can avoid hosting bad content. I can’t see governments accepting the network without this kind of functionality in place.
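As a sketch of what subscribing to such a list could look like on a node, assuming the list holds SHA-256 hex digests of chunks in the form the node actually stores them (the real network’s hash function and list format are not specified here, and the names are illustrative):

```python
import hashlib


def chunk_digest(chunk: bytes) -> str:
    """Hex digest used as the lookup key (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()


def is_blocked(chunk: bytes, blocklist: set[str]) -> bool:
    """True if the stored form of this chunk appears on the subscribed list."""
    return chunk_digest(chunk) in blocklist


if __name__ == "__main__":
    # Placeholder bytes stand in for a chunk someone has reported.
    blocklist = {chunk_digest(b"reported chunk bytes")}
    print(is_blocked(b"reported chunk bytes", blocklist))
    print(is_blocked(b"some other chunk", blocklist))
```

A check like this only matches exact bytes, so the list would have to be built from chunks as stored rather than from the original files.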

Next post because I’m not allowed 4 consecutive replies:

I still don’t see the point in node data encryption by default.

If you have 2 nodes, one which hosts a bad chunk and one which doesn’t, can anyone explain to me why law enforcement would go after 1 and not the other when the intention is clearly the same?

If the intention isn’t the same, that would imply one of them was refusing to use a blacklist to filter out bad chunks, in which case they could opt-in to node data encryption with a RAM key without having a significant impact on wider network data persistence.

danda · February 6, 2024, 7:37pm

Others may elaborate more.

to7m · February 6, 2024, 8:01pm

The existence of a blacklist and the ability to subscribe to it is not censorship. What’s your alternative then?

neo · February 6, 2024, 11:55pm

I was referring to the case where LEA is investigating. A bad actor could upload bad stuff, but it’s another thing for a bad actor to install such an app on the node’s machine and run it, so if it’s there then LEA might have some reason to question further. It wouldn’t be some hacker putting it there, since they would just drop the bad files instead. Just so you know the context of my statement and the point of it.

Nothing to the protocol, and was meant as an addon to the client code or as a 3rd party app. It simply uses the normal process and adds extra chunks that are uploaded and adds entries to the datamap. So those using the original datamap can do so (it remains where it is) and the people with the extended datamap can use it for extra protection.

Yep, and as I said I chose 15% for illustrative purposes. The 15% could include all users of a particular windows version that gets an update and stops working. Or malware designed to turn off all computers at 0:00 UTC and there is at least 15% of computers compromised with that malware.

Which is why I used it as a possible reason 15% go offline at once.

This is the point, bad actors who want to get sensitive data can set up a lot of nodes, say in the order of 100,000 to a million or more (ie gov agencies or big business) and have a heyday sucking up the unencrypted data before doing any of the normal node stuff like encrypting it. And then they start piecing together whatever they can. This is a major reason the client does the encryption in the first place.

Additionally as David says the bad actor encrypts the bad chunk and sends the decrypt key instead of an encrypt key. So then the node is decrypting the bad chunk before storing it. So a complete fail at achieving what it was intended to do.

And the ones you want using Safe (including big business) will ignore this, since it’s extra steps, and the perception that Safe is unsafe will reign supreme. Ordinary users do not want to problem-solve their storage needs; they want the path of least resistance. Having to run apps etc. is not least resistance. We might be fine and happy to do that, but not most (nearly all) of the future users we are trying to attract.

Better safe than sorry. I’ve heard a case in Germany where the game writer was accused of having something on his disk and the police seized all his gear and will take 6 months before even mirroring his drives then similar length before checking the files and possibly returning his gear.

Perception is key. If it’s perceived that files and/or chunks can be stored unencrypted, then that will be a factor in people’s decision to use Safe. It may take people a year longer to decide to use Safe, if they ever do.

Let’s be safe; having the temp key is brilliant.

And having a node recovery pass phrase seems to be a simple way to save the temp key (and other node info required for restart) and to recover it. Also better because it can be optional. With the pass phrase used the node can recover and without the pass phrase the node starts afresh.

@dirvine One extra thing I think would help if bad chunks are possible (not using the temp key) is that the node wipes the previous chunks before starting. This way there is no chance of bad chunks surviving as deleted files.

The problem I see is who will create it? That requires having the bad files in the first place to then pass them through the self encryption process to generate the list. Do you expect the international police to do it? Would they?
