Calculating the probability of data persistence

Yes, the node performs the final encryption with the data sent by the client, but it’s information that has been previously encrypted. The sent key could be derived from the data itself, thus maintaining deduplication.
It would be adding one more layer to avoid the existence of keys in memory and the possibility of a node restart with all the data.

How, then, does client 2 read the data?

Also, the data is unencrypted in transit, which allows nodes to record everything. Since there is a lot of textual data, the network then becomes insecure for confidential documents.

1 Like

The datamap should contain the necessary information to download and decrypt the chunks.

The data is pre-encrypted by clients. The final layer of encryption is not meant to protect the data, which is already secured, but rather to protect the nodes.

Client 1 needs to give that to client 2, though? The nodes should not have access to it.

1 Like

The nodes do not have access to the datamap but to a key that is sent along with a data chunk. This operation is represented as encrypt(key, chunk) = chunk2, which is what the node stores. If this key is derived from the data itself, deduplication would be maintained.
In GETs, the clients ask for chunk2 and decrypt it back into the chunk.
I don’t see the way datamaps are managed changing with respect to the current configuration.
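To make the deduplication argument concrete, here is a minimal sketch of encrypt(key, chunk) = chunk2 with the key derived from the chunk itself. Everything in it is an assumption for illustration: `derive_key`, the `chunk-key:` prefix, and the SHAKE-256 XOR keystream are hypothetical stand-ins, not what the network uses, and the toy cipher should not be mistaken for a real one such as AES.

```python
import hashlib

def derive_key(chunk: bytes) -> bytes:
    """Derive the key from the chunk itself (hypothetical scheme)."""
    return hashlib.sha256(b"chunk-key:" + chunk).digest()

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream built from SHAKE-256; a stand-in for a real cipher."""
    return hashlib.shake_256(key).digest(length)

def encrypt(key: bytes, chunk: bytes) -> bytes:
    """encrypt(key, chunk) = chunk2, which is what the node would store."""
    return bytes(a ^ b for a, b in zip(chunk, keystream(key, len(chunk))))

# XOR with the same keystream is its own inverse, so decryption is the same call.
decrypt = encrypt

chunk = b"already self-encrypted chunk bytes"

# Two independent uploaders of the same chunk derive the same key...
key_a = derive_key(chunk)
key_b = derive_key(chunk)

# ...so the node ends up storing identical bytes for both uploads.
chunk2_a = encrypt(key_a, chunk)
chunk2_b = encrypt(key_b, chunk)
```

Because both uploads produce the same chunk2, the node can still deduplicate, and a client that re-derives the key can turn chunk2 back into the chunk after a GET.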

It’s just a brainstorm, but in my view, the current idea of keys in memory is something that must be avoided.

1 Like

There are ways to do this, but attackers can pass a key that, rather than encrypting the chunk, actually decrypts it into bad data.

I feel the same; it’s the yin-yang part.

4 Likes

It seems to me that there is a strong faction in the community (in which I include myself) in favour of “restarting the node where it left off”, e.g. after a loss of power.

IMHO, finding a solution to this problem would allow the network to gain in several important areas (which I don’t think need to be enumerated).

I have an idea in mind… but first I’d like to make sure - do we have a current definition of the conditions an archive node must meet?

This is probably the most up-to-date discussion about archive nodes:

2 Likes

Well… TPM2 wouldn’t get erased on restart and would enable encryption without knowing the exact key, if I’m not mistaken (?). I just don’t know how access to this functionality would be prevented for other applications :face_with_monocle: (is this really something we need to consider?), and for Raspberries this would require additional hardware…

Another point about encryption: it sounds like nodes currently encrypt all data with one global key via AES or Blowfish or something, and decrypt it on replication or data delivery. Doesn’t ransomware encrypt each file with a single-use random seed/key via AES (because that’s fast and less compute-intensive than asymmetric encryption) and then append this key and seed, encrypted via RSA (this time with a global key), to the file? That last part could be the one encrypted and decrypted via the TPM (?). On data replication, the recipients could even let the sender know their public keys, so they would receive the data already encrypted with only the last bytes replaced, which would lower the computation cost for nodes.

One could even send the data to the client encrypted under the same scheme, to lower the compute cost on nodes further…
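A rough sketch of that "replace only the last bytes" idea, under loud assumptions: the post talks about AES for the body and RSA (possibly via a TPM) for the trailer, but to stay within the Python standard library this uses a toy SHAKE-256 XOR stream for both, so the wrap here is symmetric and purely illustrative. The names `seal`, `rewrap`, `open_record` and the 48-byte trailer layout are hypothetical.

```python
import hashlib
import secrets

TRAILER_LEN = 48  # 16-byte nonce + 32-byte wrapped data key (assumed layout)

def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHAKE-256 keystream (stand-in for AES/RSA)."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def seal(chunk: bytes, wrap_key: bytes) -> bytes:
    """Encrypt the chunk with a single-use key, then append that key wrapped."""
    data_key = secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)
    body = xor_stream(data_key, b"body", chunk)
    return body + nonce + xor_stream(wrap_key, nonce, data_key)

def rewrap(record: bytes, old_wrap_key: bytes, new_wrap_key: bytes) -> bytes:
    """Re-target a record at another holder by replacing only the trailer."""
    body, trailer = record[:-TRAILER_LEN], record[-TRAILER_LEN:]
    data_key = xor_stream(old_wrap_key, trailer[:16], trailer[16:])
    nonce = secrets.token_bytes(16)
    return body + nonce + xor_stream(new_wrap_key, nonce, data_key)

def open_record(record: bytes, wrap_key: bytes) -> bytes:
    """Unwrap the data key from the trailer, then decrypt the body."""
    body, trailer = record[:-TRAILER_LEN], record[-TRAILER_LEN:]
    data_key = xor_stream(wrap_key, trailer[:16], trailer[16:])
    return xor_stream(data_key, b"body", body)

node_a = secrets.token_bytes(32)
node_b = secrets.token_bytes(32)
chunk = b"chunk bytes held by node A" * 100

record = seal(chunk, node_a)
moved = rewrap(record, node_a, node_b)  # the body is never re-encrypted
```

The cost of `rewrap` depends only on the trailer size, not on the chunk size, which is where the saving for nodes would come from. With a real asymmetric wrap, the sender would need only the recipient’s public key rather than a shared secret.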

Just some thoughts on data encryption and nodes…

PS: I hope this doesn’t result in premature optimization… In the end, this node data encryption is just so the data is not lying around readable to any editor or image viewer… Just writing the files from end to start in reversed bit order would already result in invalid JPEGs and broken text file encoding… and could survive reboots too.
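For what it’s worth, the "end to start in reversed bit order" idea fits in a few lines. This is only a sketch of keyless obfuscation, not encryption: anyone who knows the trick can undo it. The names `reverse_bits` and `scramble` are made up for the example.

```python
def reverse_bits(byte: int) -> int:
    """Reverse the order of the 8 bits in one byte, e.g. 0x01 -> 0x80."""
    return int(f"{byte:08b}"[::-1], 2)

# Lookup table: position i holds the bit-reversed value of byte i.
REVERSED = bytes(reverse_bits(b) for b in range(256))

def scramble(data: bytes) -> bytes:
    """Write the data end to start with every byte bit-reversed.

    Applying scramble twice returns the original bytes, and no key is
    involved, so nothing is lost on reboot.
    """
    return data[::-1].translate(REVERSED)

jpeg_start = b"\xff\xd8\xff\xe0" + b"JFIF"  # typical JPEG header bytes
stored = scramble(jpeg_start)
```

The stored bytes no longer begin with the JPEG marker, so an image viewer would reject the file, while `scramble(stored)` restores it exactly.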

2 Likes

True, but AFAIK every PC has had TPM2 for years now, and even Raspberries can be equipped with one… Virtual machines can have virtual TPMs as well… So not necessarily a huge limitation…

2 Likes

I’ve never posted before, so bear with me if I make a fool of myself.
I’ve been reading all this stuff about the dangers of storing keys on disk versus the problem of data loss if they are stored in RAM. As for temp keys, I don’t understand how they are generated or how they would solve the dilemma. But a solution occurs to me which, in my ignorance, seems to be obvious, so it’s probably a non-starter, in which case I’d be interested to know why:
On setup, the node operator is asked for a password, which is then stored in RAM. All keys are also stored in RAM. Then, using the password, the computer encrypts the keys and stores the encrypted copies on disk. After, e.g., a power outage, the computer forgets the keys and the password. But it still has the encrypted keys, so the node operator need only tell it the password, with which it can decrypt the keys and store them again in RAM. At no time would the node operator have access to the actual key values. And it would take ages for a third party to brute-force the password, because an erroneous decryption of the keys would not be immediately obvious.
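A minimal sketch of that flow, assuming PBKDF2 from Python’s `hashlib` as the password-stretching step and a plain XOR as the wrap. The function names, the iteration count, and the 16-byte salt are all assumptions for illustration; this is not how the node works today.

```python
import hashlib
import secrets

ITERATIONS = 200_000  # assumed work factor; higher slows brute force further

def kek_from_password(password: str, salt: bytes) -> bytes:
    """Stretch the operator's password into a 32-byte key-encryption key."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS, dklen=32)

def wrap(node_key: bytes, password: str) -> tuple[bytes, bytes]:
    """Return (salt, wrapped_key); this pair is what gets written to disk."""
    salt = secrets.token_bytes(16)
    kek = kek_from_password(password, salt)
    return salt, bytes(a ^ b for a, b in zip(node_key, kek))

def unwrap(salt: bytes, wrapped: bytes, password: str) -> bytes:
    """Recover the node key into RAM after a restart."""
    kek = kek_from_password(password, salt)
    return bytes(a ^ b for a, b in zip(wrapped, kek))

node_key = secrets.token_bytes(32)  # random key that lives only in RAM
salt, wrapped = wrap(node_key, "correct horse battery staple")

# After a power loss: the operator types the password again.
restored = unwrap(salt, wrapped, "correct horse battery staple")

# A wrong guess raises no error; it just yields a different 32-byte value.
wrong_guess = unwrap(salt, wrapped, "letmein")
```

No checksum or tag is stored with the wrapped key on purpose, so a wrong password produces a plausible-looking key rather than a failure, which matches the point about erroneous decryption not being obvious. The flip side is that an operator’s typo would not be caught at this step either.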

11 Likes

What about the RPi, and the many other small computers not based on Intel? Even three-year-old CPUs from AMD didn’t come with a TPM on the motherboard, although one could be installed.

Never. The only foolish question is one that isn’t asked.

This isn’t the method used at this time. The node generates a random key so that the node operator cannot do anything either, and that’s why data is lost on restart.

BUT it is an excellent solution to the problem of the temp key + node info being decrypted by hackers or bad actors when on disk: use the “password” to encrypt the random temp key (+ node info) on disk.

And if the node operator forgets the password, then tough: the node has to restart afresh. Also, checks on the supplied password can ensure that it is strong enough to slow down any attempts to brute-force it.

@dirvine Maybe @abbu400 has the compromise that solves the problem of storing the temp key on disk, at the cost of a possible inconvenience to node operators, who would have to provide a “password” (maybe a better name is needed) when starting up their nodes.

6 Likes

Excellent first post. There’s probably no perfect solution to this but that seems close.

6 Likes

While a password is a nice option, I’d still vote for a config that allows all decent options. That makes it difficult for a bad actor to predict the outcome and so increases the theoretical attack cost.

3 Likes

Glad to have been of service!

7 Likes

Good discussion of a very similar feature for bitcoin obfuscation keys which ‘encrypts’ bitcoin data on disk to prevent it being removed by antivirus

4 Likes

I believe that in some countries (Australia, UK, Canada…) authorities may demand decryption keys under threat of imprisonment.

2 Likes

That’s true, but it’s not the threat this is trying to mitigate, which is people discovering dodgy content themselves and others using stories of that to discourage node running.

Those countries don’t generally use those laws to entrap innocents, and none of the solutions presented so far would stop the kind of attack you mention (heavy handed / sophisticated law enforcement). So in terms of effectiveness this seems a good option.

1 Like

Even without encryption, I don’t believe people typically make a habit of checking whether the chunks they store contain anything illegal.

The problem lies primarily with the authorities in the event that this is the case, and that isn’t resolved if they can demand the decryption key.

The best solution still remains for nodes not to store any keys and for clients to handle this task.

I am not saying it is a wrong solution, just a thought… Entering a password could be seen as actively hiding something, whereas with automatically generated keys it is more “it just works that way”.

1 Like