Zstd, igzip, and libdeflate are interesting compression utilities.
libdeflate can be even faster than zstd, another performant library, for certain types of data. It is optimized for compressing data in chunks (e.g. 1 MB), so it could be perfect for self-encryption. It has Rust bindings.
"Libdeflate is optimal in applications that have the input data up-front, or when (large) input datasets can be split into smaller chunks."
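A minimal sketch of the "split into smaller chunks" approach, assuming Python's standard `zlib` module as a stand-in. `zlib` produces the same DEFLATE format that libdeflate does, but it is not libdeflate, and none of libdeflate's Rust bindings are used here. The 1 MiB chunk size follows the example in the post; `compress_chunks` and `decompress_chunk` are hypothetical helper names.

```python
from __future__ import annotations

import zlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB, the example chunk size from the post; not a SAFE constant


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE, level: int = 6) -> list[bytes]:
    """Split data into fixed-size chunks and DEFLATE-compress each one independently."""
    return [
        zlib.compress(data[i:i + chunk_size], level)
        for i in range(0, len(data), chunk_size)
    ]


def decompress_chunk(blob: bytes) -> bytes:
    """Each chunk is a complete zlib stream, so it decodes without its neighbours."""
    return zlib.decompress(blob)


if __name__ == "__main__":
    data = b"example line of fairly repetitive text\n" * 100_000  # about 3.9 MB
    chunks = compress_chunks(data)
    print(len(chunks), "chunks,", sum(map(len, chunks)), "compressed bytes")
    # The second chunk decodes on its own, with no access to the first.
    assert decompress_chunk(chunks[1]) == data[CHUNK_SIZE:2 * CHUNK_SIZE]
```

Because every chunk is a whole buffer known up-front, a one-shot (non-streaming) compressor fits this pattern well, which matches the quoted description.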
Update: the igzip library from Intel is fast. I wonder how it performs on AMD.
Does SAFE currently use file compression? I didn’t think it did. If it does, I’m guessing compression would have to come first, then chunking, then self-encryption.
IIRC, compression is built into the self-encryption concept/process, though it may not be implemented yet. It makes no sense not to compress the chunks: bandwidth is almost always the limiting factor, and compute is cheap by comparison. From the docs:
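The two orderings discussed above (compress the whole file then chunk it, versus chunk first then compress each chunk) can be compared with a small sketch. It assumes Python's `zlib` as the compressor; the function names are hypothetical. It shows that chunk-then-compress yields pieces that each decode on their own, while pieces cut from one compressed stream only decode when joined back together.

```python
from __future__ import annotations

import hashlib
import zlib


def compress_then_chunk(data: bytes, chunk_size: int) -> list[bytes]:
    """Compress the whole file as one DEFLATE stream, then cut the stream into pieces."""
    stream = zlib.compress(data)
    return [stream[i:i + chunk_size] for i in range(0, len(stream), chunk_size)]


def chunk_then_compress(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut the file into pieces first, then compress each piece independently."""
    return [zlib.compress(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def decodes_alone(piece: bytes) -> bool:
    """True if this single piece is a complete, valid zlib stream by itself."""
    try:
        zlib.decompress(piece)
        return True
    except zlib.error:
        return False


if __name__ == "__main__":
    # ~320 KB of hash output: deterministic and essentially incompressible,
    # so the single compressed stream spans several 100 KB pieces.
    data = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(10_000))
    print([decodes_alone(p) for p in compress_then_chunk(data, 100_000)])
    print([decodes_alone(p) for p in chunk_then_compress(data, 100_000)])
```

The trade-off: each independently compressed chunk restarts with an empty 32 KiB DEFLATE window, so the total is typically a little larger, but at 1 MB chunk sizes that loss is usually small, and each chunk stays self-contained for hashing and encryption.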
→ Chunk the file
→ Compress each chunk
→ Take its hash
→ Use the next chunk's hash to encrypt the next chunk
→ Use the previous 2 hashes to XOR the next chunk
→ Hash the content and make this hash the name of the chunk
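The listed steps can be sketched for a single chunk as follows. This is a toy under loudly stated assumptions: SHA3-256 as the hash, a SHA3-based counter-mode keystream standing in for a real cipher (it is not what the self_encryption library uses and is not meant to be secure), and a hypothetical split where the previous chunk's hash keys the "encrypt" step and both previous hashes derive the XOR pad. Since both layers here are XORs, they collapse into one mathematically; a real implementation uses a proper cipher for the encrypt step. All function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import zlib


def chunk_hash(chunk: bytes) -> bytes:
    """'Take hash' step, applied to the compressed chunk (SHA3-256 assumed)."""
    return hashlib.sha3_256(zlib.compress(chunk)).digest()


def keystream(seed: bytes, length: int) -> bytes:
    """Toy keystream: SHA3-256(seed || counter), block after block. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha3_256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_chunk(chunk: bytes, prev1: bytes, prev2: bytes) -> tuple[str, bytes]:
    """Compress, 'encrypt' keyed by prev1, XOR with a pad from prev1 and prev2,
    then name the stored bytes by their own hash."""
    compressed = zlib.compress(chunk)
    encrypted = xor(compressed, keystream(b"key" + prev1, len(compressed)))
    stored = xor(encrypted, keystream(b"pad" + prev1 + prev2, len(encrypted)))
    return hashlib.sha3_256(stored).hexdigest(), stored


def decrypt_chunk(stored: bytes, prev1: bytes, prev2: bytes) -> bytes:
    """Undo the steps in reverse order: remove the pad, decrypt, decompress."""
    encrypted = xor(stored, keystream(b"pad" + prev1 + prev2, len(stored)))
    compressed = xor(encrypted, keystream(b"key" + prev1, len(encrypted)))
    return zlib.decompress(compressed)


if __name__ == "__main__":
    chunks = [b"alpha " * 1000, b"beta " * 1000, b"gamma " * 1000]
    hashes = [chunk_hash(c) for c in chunks]
    n = len(chunks)
    for i, chunk in enumerate(chunks):
        # Indices wrap around, so chunks 0 and 1 use hashes from the end of the file.
        prev1, prev2 = hashes[(i - 1) % n], hashes[(i - 2) % n]
        name, stored = encrypt_chunk(chunk, prev1, prev2)
        assert decrypt_chunk(stored, prev1, prev2) == chunk
        print(i, name[:16], len(stored), "bytes")
```

A reader holding the list of chunk hashes (what a data map would record) can reverse every step; a node holding only the stored bytes and their name cannot.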
So this cyclic process (where the previous 2 chunks' hashes are used to encrypt and XOR) continues to the end of the file. At the end, the first 2 chunks can be finalised and uploaded too.
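Why the first 2 chunks finalise last can be shown by tracking which hashes each chunk depends on during one front-to-back pass. A minimal sketch, assuming the wrap-around described above (chunk 0 uses the hashes of the last two chunks, chunk 1 uses chunk 0 and the last chunk) and at least 3 chunks; both function names are hypothetical.

```python
from __future__ import annotations


def key_sources(i: int, n: int) -> tuple[int, int]:
    """Indices of the two 'previous' chunks whose hashes chunk i needs.
    Indices wrap modulo n, so chunks 0 and 1 borrow from the end of the file."""
    return (i - 1) % n, (i - 2) % n


def finalisation_order(n: int) -> list[int]:
    """Order in which chunks can be encrypted when their hashes become known
    one at a time during a single front-to-back pass (n >= 3 assumed)."""
    known: set[int] = set()
    order: list[int] = []
    for i in range(n):
        known.add(i)  # chunk i has now been read and hashed
        for j in sorted(known):
            if j not in order and all(k in known for k in key_sources(j, n)):
                order.append(j)
    return order


if __name__ == "__main__":
    print(finalisation_order(5))  # prints [2, 3, 0, 1, 4]
```

Chunks 2 onward can be encrypted and uploaded as soon as the two hashes before them are known; chunks 0 and 1 must wait for the hashes at the end of the file, together with the last chunk.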
From Perplexity:
MaidSafe’s self-encryption works by employing a system that automatically splits, renames, encrypts, and compresses data using algorithms. This process is based on the data itself, requiring no user intervention or passwords. The encrypted data is then dynamically stored at locations selected by the network, aiming to provide a high level of security without the need for user involvement [1][3]. The self-encryption feature is part of the MaidSafe network, which is characterized by its innovative approach to privacy, security, and freedom for its users [3]. Additionally, the self-encryption system is implemented through a library that provides secure encryption of data, with the encrypted chunks considered as safe as those encrypted by any other modern method [4].
As long as it is the client who performs the self-encryption, it is not possible for the nodes to know whether the chunk is encrypted or not.
What the nodes do, in the latest testnet, is encrypt the chunk before storing it on disk.
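The point that nodes cannot tell whether a client-side chunk is encrypted can be illustrated with a content-addressing check. This is a hypothetical sketch, assuming SHA3-256 names as in the step "hash content and make this hash the name of the chunk"; it is not the network's actual node code, and the node-side encryption at rest mentioned above is a separate layer not shown here.

```python
import hashlib
import zlib


def chunk_name(content: bytes) -> str:
    """Content address: the chunk is named by the hash of its bytes (SHA3-256 assumed)."""
    return hashlib.sha3_256(content).hexdigest()


def node_accepts(name: str, content: bytes) -> bool:
    """Hypothetical node-side check. It only compares the name with the hash of the
    bytes, so it behaves identically whether or not the client encrypted them."""
    return chunk_name(content) == name


if __name__ == "__main__":
    plain = zlib.compress(b"hello " * 100)            # compressed, not encrypted
    opaque = hashlib.sha3_256(b"some seed").digest()  # stands in for ciphertext
    print(node_accepts(chunk_name(plain), plain))     # True
    print(node_accepts(chunk_name(opaque), opaque))   # True
    print(node_accepts(chunk_name(plain), opaque))    # False: name does not match bytes
```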
It looks like Brotli edges out DEFLATE in this study, but this may not be an accurate comparison for the optimized library by E. Biggers mentioned in the op…