Calculating the probability of data persistence

digipl · February 5, 2024, 10:13am

Yes, the node performs the final encryption with the data sent by the client, but it’s information that has been previously encrypted. The sent key could be derived from the data itself, thus maintaining deduplication.
It would add one more layer, avoiding keys held in memory while still allowing a node to restart with all its data.

dirvine · February 5, 2024, 10:34am

How then though does client 2 read the data?

neo · February 5, 2024, 10:38am

Also, the data is unencrypted in transit, allowing nodes to record everything, and since there is a lot of textual data, the network then becomes insecure for confidential documents.

digipl · February 5, 2024, 11:07am

The datamap should contain the necessary information to download and decrypt the chunks.

The data is pre-encrypted by clients. The final layer of encryption is not meant to protect the data, which is already secured, but rather to protect the nodes.

dirvine · February 5, 2024, 11:08am

Client 1 needs to give that to client 2 though? The nodes should not have access to it.

digipl · February 5, 2024, 11:58am

The nodes do not have access to the datamap but to a key that is sent along with a data chunk. This operation is represented as encrypt(key, chunk) = chunk2, which is what the node stores. If this key is derived from the data itself, deduplication would be maintained.
On a GET, the client asks for chunk2 and decrypts it.
I don’t see the way Datamaps are managed changing with respect to the current configuration.
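
A minimal sketch of the encrypt(key, chunk) = chunk2 idea described above, assuming the key is a hash of the chunk itself. The `keystream`, `derive_key` and `encrypt` helpers are hypothetical, and the SHA-256 counter-mode XOR is a toy stand-in for a real cipher, not what the network uses.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def derive_key(chunk: bytes) -> bytes:
    # Assumption: the key depends only on the chunk content,
    # so identical chunks always produce identical keys.
    return hashlib.sha256(b"node-layer-key" + chunk).digest()

def encrypt(key: bytes, chunk: bytes) -> bytes:
    # XOR the chunk with the keystream; this is what the node would store.
    return bytes(a ^ b for a, b in zip(chunk, keystream(key, len(chunk))))

decrypt = encrypt  # XOR with the same keystream undoes itself

chunk = b"already self-encrypted chunk bytes"
key = derive_key(chunk)
chunk2_a = encrypt(key, chunk)                 # uploaded by client 1
chunk2_b = encrypt(derive_key(chunk), chunk)   # same chunk from client 2
```

Because both uploads yield the same chunk2, the node can still deduplicate, and any client that knows the key can reverse the layer after a GET.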

It’s just a brainstorm, but in my view, the current idea of keys in memory is something that must be avoided.

dirvine · February 5, 2024, 12:14pm

There are ways to do this, but attackers can pass a key that rather than encrypting the chunk actually decrypts it into bad data.
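
The concern above can be sketched under one assumption: the node applies whatever key the client sends and does not check how that key was produced. With a toy XOR stream cipher (hypothetical `keystream` and `node_store` helpers, not the network's real code), the attacker simply pre-processes the bad data so that the node's "encryption" step restores it.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def node_store(key: bytes, chunk: bytes) -> bytes:
    # What the node believes is "encrypting" the incoming chunk.
    return bytes(a ^ b for a, b in zip(chunk, keystream(key, len(chunk))))

bad = b"plaintext the attacker wants on disk"
key = b"attacker-chosen key"

# The attacker runs the same operation locally first...
crafted = node_store(key, bad)
# ...so the node's own step turns the crafted chunk back into the bad data.
stored = node_store(key, crafted)
```

With a block cipher the attacker would send decrypt(key, bad) instead, but the effect is the same: a client-supplied key gives the client control over what ends up on disk.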

I feel the same, it’s the yin-yang part.

Profess · February 5, 2024, 8:04pm

dirvine:

neo:

I implore you to implement the safe storage of the temp key & any required info so that the node can restart where it left off. This also makes archive nodes more easily restartable. Maybe using the weak encryption of the temp key and brute forcing it on restart.

I don’t feel we have a good solution here. The yin-yang is this:

no encryption
Hackers try and store bad data on nodes to poison the network

in ram key
On reboot you lose the data you held.

safe storage of the temp key
I am still not getting this, but if it’s a key on disk that is guessable then it’s the same problem as no encryption as folk will provide apps to read the data and we are back at square 1 again.

As I say I am not 100% on any track just yet, but it’s not my decision either. It’s worth chatting for sure. I was looking at encrypted volumes and such like, all messy and crappy. I have not found a simple solution here.

Another thing I was pondering was chunks so small that you could not really hide any bad images, and certainly not video, in them. Then perhaps we can use entropy checks for detecting text, but that only works for plain text, and many text formats have a lot of entropy. You can also check for file header information, but then again random data will sometimes contain valid-looking file header info too.
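
A rough sketch of the two checks mentioned above, assuming a node wanted to flag chunks that look like plain text or start with a known file signature. The function names and the short signature table are illustrative, not part of any existing node code.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

# A few well-known file signatures (magic bytes).
MAGIC = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
}

def looks_like_known_file(data: bytes):
    """Return the matching format name, or None if no signature matches."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return None

text = b"the quick brown fox jumps over the lazy dog " * 50
random_like = os.urandom(4096)  # stands in for encrypted or compressed data

text_entropy = shannon_entropy(text)
random_entropy = shannon_entropy(random_like)
```

Plain English text scores well below 8 bits per byte while encrypted or compressed data sits close to 8, which is why the check only catches unencoded text. The signature check has the weakness noted in the post: random bytes can occasionally begin with a valid-looking header.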

So a few things to think of.

It seems to me that there is a strong fraction of proponents in the community (among whom I include myself) of “restarting the node where it left off”, e.g. after a loss of power.

I firmly believe that solving this problem would allow the network to gain in several important areas (which I don’t think need to be enumerated).

I have an idea in mind… but first I’d like to make sure - do we have a current definition of the conditions an archive node must meet?

This is probably the most up-to-date discussion about archive nodes:

riddim · February 5, 2024, 9:07pm

Well… TPM2 wouldn’t get erased on restart and would enable encryption without knowing the exact key, if I’m not mistaken (?) I just don’t know how access to this functionality would be prevented for other applications (is this really something we need to consider?), and for Raspberry Pis this would require additional hardware…

Another point about encryption is that it does sound like nodes currently encrypt all data with one global key via AES or Blowfish or something… and decrypt it on replication/data delivery… Doesn’t ransomware encrypt files with a single-use random seed/key via AES (because that’s fast and not as compute-intensive as asymmetric encryption) and then append this key+seed, encrypted via RSA (this time with a global key), to the file? That last part could be the one en/decrypted via TPM (?), and on data replication the recipients could even let the sender know their public keys, so they could get the data already encrypted, with the sender just replacing the last bytes, which would lower the computation cost for nodes?

One could even send the data then encrypted to the client with the same systematic to lower the compute cost on nodes further…

Just some thoughts on data encryption and nodes…

PS: I hope this doesn’t result in premature optimization… In the end this node data encryption is just so the data is not lying around readable to any editor/image viewer… Just writing the files from end to start in reversed bit order would already result in invalid JPEGs/text file encoding… and could survive reboots too.
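
The "end to start in reversed bit order" idea can be sketched in a few lines. This is obfuscation only, with no key at all; the `scramble` name and the sample bytes are made up for illustration.

```python
# Lookup table: for each byte value, the same byte with its 8 bits reversed.
REV = bytes(int(f"{b:08b}"[::-1], 2) for b in range(256))

def scramble(data: bytes) -> bytes:
    # Reverse the byte order, then reverse the bits inside every byte.
    return data[::-1].translate(REV)

# Applying the same transform twice restores the original.
unscramble = scramble

jpeg_like = b"\xff\xd8\xff\xe0" + b"payload"
on_disk = scramble(jpeg_like)
restored = unscramble(on_disk)
```

The stored bytes no longer start with the JPEG signature, so an image viewer would reject the file, yet the node needs no secret to read it back after a reboot. Anyone who knows the scheme can undo it just as easily.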

riddim · February 5, 2024, 9:35pm

True, but every PC has had TPM2 for years now AFAIK, and even Raspberry Pis can be equipped with one… Virtual machines can have virtual TPMs as well… So not necessarily a huge limitation…

abbu400 · February 5, 2024, 11:12pm

I’ve never posted before, so bear with me if I make a fool of myself.
I’ve been reading all this stuff about the dangers of storing keys on disk versus the problem of data loss if they are stored in RAM. As for temp keys, I don’t understand how they are generated or how they would solve the dilemma. But a solution occurs to me which, in my ignorance, seems to be obvious, so it’s probably a non-starter, in which case I’d be interested to know why:
On setup, the node operator is asked for a password, which is then stored in RAM. All keys are also stored in RAM. Then, using the password, the computer encrypts the keys and stores these on disk. After e.g. a power outage the computer forgets the keys and the password. But it still has the encrypted keys, so the node operator need only tell it the password, with which it can decrypt the keys and store them in RAM again. At no time would the node operator have access to the actual key values. And it would take ages for a third party to brute-force the password, because erroneous decryption of the keys would not be immediately obvious.
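
A minimal sketch of this proposal, assuming the password is stretched with PBKDF2 (from Python's standard `hashlib`) and the result is XORed over the random node key. The `wrap` helper, the iteration count and the sample passwords are illustrative only.

```python
import hashlib
import os

def wrap(password: str, salt: bytes, key: bytes) -> bytes:
    # Stretch the password into a mask as long as the key, then XOR.
    # The iteration count is illustrative; higher means slower guessing.
    mask = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, 100_000, dklen=len(key)
    )
    return bytes(a ^ b for a, b in zip(key, mask))

unwrap = wrap  # XOR with the same mask restores the key

# Setup: the random node key lives in RAM, only salt + blob go to disk.
node_key = os.urandom(32)
salt = os.urandom(16)
blob = wrap("correct horse battery", salt, node_key)

# After a restart the operator types the password again.
recovered = unwrap("correct horse battery", salt, blob)
wrong_guess = unwrap("password123", salt, blob)
```

A wrong password raises no error here; it just yields 32 different random-looking bytes, which matches the point that a failed guess is not immediately obvious. An authenticated scheme would be safer against tampering but would also tell an attacker straight away whether a guess was right.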

neo · February 5, 2024, 11:22pm

What about the RPi, and the many other small computers not based on Intel? Even 3-year-old CPUs from AMD didn’t have TPM on the motherboard, although it could be installed.

Never. The only foolish question is one that isn’t asked.

This isn’t the method used at this time. The node generates a random key so that the node operator cannot do anything either, and that’s why data is lost on restart.

BUT it is an excellent solution to the problem of the temp key+node info being decrypted by hackers or bad actors when on disk. Using the “password” to encrypt the random temp key (+node info) on disk.

And if the node operator forgets the password then tough, the node has to restart afresh. Also, tests on the supplied password can ensure that it is strong enough to slow down any attempts to brute-force it.

@dirvine Maybe @abbu400 has the compromise that solves the problem of storing the temp key on disk. It may be an inconvenience for node operators to provide a “password” (maybe a better name is needed) when starting up their nodes.

happybeing · February 5, 2024, 11:30pm

Excellent first post. There’s probably no perfect solution to this but that seems close.

TylerAbeoJordan · February 6, 2024, 1:15am

While a password is a nice option, I’d still vote for a config that allows all decent options. That makes it difficult for a bad actor to predict the outcome and so increases the theoretical attack cost.

abbu400 · February 6, 2024, 3:02am

Glad to have been of service!

mav · February 6, 2024, 8:37am

Here is a good discussion of a very similar feature in Bitcoin: obfuscation keys, which ‘encrypt’ Bitcoin data on disk to prevent it being removed by antivirus software.

github.com/bitcoin/bitcoin

Obfuscate database files

opened 02:03PM - 01 Sep 15 UTC

closed 03:51PM - 06 Oct 15 UTC

laanwj

Windows UTXO Db and Indexes

To avoid problems on Windows with Anti-Virus software, there needs to be an option to obfuscate the keys/values written to the database files, especially the UTXO database. See #4069 for discussion. It should be really simple, just enough to make it useless to put AV signatures in transactions. E.g. generate a random key on first start, store the key in the database, then XOR all subsequent data read/written to the database with that. Possibly this obfuscation could include the block files as well, although I've never heard of problems with those - the most likely explanation is that AV software doesn't consider files above a certain size.
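
The scheme described in that issue (random key on first start, stored next to the data, XOR on every read and write) might look like the sketch below. The function name, the 8-byte key length and the sample record are assumptions for illustration, not Bitcoin Core's actual code.

```python
import os
from itertools import cycle

def xor_obfuscate(key: bytes, data: bytes) -> bytes:
    # Repeating-key XOR: the same call obfuscates and de-obfuscates.
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

# Generated once on first start and kept alongside the database.
obfuscation_key = os.urandom(8)

record = b"bytes that happen to match an AV signature"
on_disk = xor_obfuscate(obfuscation_key, record)
restored = xor_obfuscate(obfuscation_key, on_disk)
```

Since the key sits on disk in the clear, this hides nothing from a determined reader. It only stops byte-pattern scanners from matching stored data, and it survives restarts.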

github.com/bitcoin/bitcoin

SAV reported bitcoind infected w Silly.218, crashed bitcoind (likely false positive)

opened 04:00AM - 19 Apr 14 UTC

closed 03:54PM - 06 Oct 15 UTC

ghost

Windows

OS: Win 7 x64 Bitcoin core: 0.9.1 While running bitcoind.exe --reindex: `c:\Program Files\Bitcoin\daemon>bitcoind.exe --reindex` `Error: System error: Database I/O error` Symantec Anti Virus detected Silly.218 in `chainstate\052878.sst` directory (other names under which the malware is known: Virus.DOS.Dutch_Tiny.163.a (Kaspersky), Silly.218 (Symantec), Tiny-Family #3 (Avira)). Hash of the pattern detected: `E272F4FF4AD99D1C48C4888990893FC6193DB1CB9849C69B1710069BBD047E0D` As the file was automatically quarantined and deleted (I can't change SAV default settings), that crashed bitcoind and corrupted the bitcoin DB. Internet search on this yielded no results. This looks like a false positive.

digipl · February 6, 2024, 8:49am

I believe that in some countries (Australia, UK, Canada…) authorities may demand decryption keys under threat of imprisonment.

happybeing · February 6, 2024, 8:56am

That’s true, but it’s not the threat this is trying to mitigate, which is people discovering dodgy content themselves and others using stories of that to discourage node running.

Those countries don’t generally use those laws to entrap innocents, and none of the solutions presented so far would stop the kind of attack you mention (heavy handed / sophisticated law enforcement). So in terms of effectiveness this seems a good option.

digipl · February 6, 2024, 9:19am

I don’t believe that people typically make a habit of checking whether the chunks they store contain anything illegal without encryption.

The problem lies primarily with the authorities in the event that this is the case, and that isn’t resolved if they can demand the decryption key.

The best solution still remains for nodes not to store any keys and for clients to handle this task.

peca · February 6, 2024, 9:22am

I am not saying it’s the wrong solution, just a thought… Entering a password could be seen as actively hiding something, whereas with automatically generated keys it is more “it just works that way”.
