Impressive chunk-based compression libraries

jlpell · February 3, 2024, 1:16pm

Zstd, igzip, and libdeflate are interesting compression utilities.

libdeflate can be even faster than zstd, another performant library, for certain types of data. It is optimized for compressing data in chunks (e.g. 1 MB), so it could be a good fit for self-encryption. It has Rust bindings.

"Libdeflate is optimal in applications that have the input data up-front, or when (large) input datasets can be split into smaller chunks"
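To make the "split into smaller chunks" idea concrete, here is a minimal sketch of per-chunk compression. It does not call libdeflate: `rle_compress` is a toy run-length encoder used purely as a stand-in so the example needs only the standard library, and `CHUNK_SIZE`, `compress_chunks` and the other names are hypothetical. The point is only that each chunk is compressed independently, so any one of them can be decompressed without the others.

```rust
/// Chunk size used in this sketch (1 MB, with MB = 1,000,000 bytes).
const CHUNK_SIZE: usize = 1_000_000;

/// Toy run-length encoder: emits (run_length, byte) pairs.
/// Stand-in for a real compressor such as libdeflate; illustration only.
fn rle_compress(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let byte = input[i];
        let mut run = 1usize;
        // Extend the run while the byte repeats, capped at 255 per pair.
        while run < 255 && i + run < input.len() && input[i + run] == byte {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Inverse of `rle_compress`.
fn rle_decompress(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in input.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Split `data` into fixed-size chunks and compress each one independently.
/// `compress` is pluggable, so a real codec could be swapped in.
fn compress_chunks(
    data: &[u8],
    chunk_size: usize,
    compress: impl Fn(&[u8]) -> Vec<u8>,
) -> Vec<Vec<u8>> {
    data.chunks(chunk_size).map(|c| compress(c)).collect()
}
```

Calling `compress_chunks(&data, CHUNK_SIZE, rle_compress)` on 2.5 MB of input yields three independently compressed chunks. Because the compressor is passed in as a closure, the chunking logic stays the same whichever library does the real work.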

Update: The igzip library from Intel is fast. I wonder how it performs on AMD?

jlpell · February 3, 2024, 1:31pm

Here are some benchmarks. Libdeflate is rather impressive but igzip was not bested when it comes to brute speed. I added igzip to the OP.

github.com/zlib-ng/zlib-ng

Benchmark: zlib-ng vs isa-l, zlib, libdeflate, brotli

opened 05:06PM - 10 May 23 UTC

powturbo

discussion

[TurboBench](https://github.com/powturbo/TurboBench): Build or download **[exec…utables](https://github.com/powturbo/TurboBench/releases)** and test with your own data.

**Benchmark 1:** [TurboBench: Dynamic/Static web content compression benchmark](https://github.com/powturbo/TurboBench/issues/43)

**Benchmark 2:** [turbobench](https://github.com/powturbo/TurboBench) silesia.tar -eigzip,0,1,2,3/zlib_ng,1,3,6,9/libdeflate,1,3,6,9,12/zlib,1,3,6,9/memcpy

Hardware: Lenovo Ideapad 5 pro - Ryzen 6600hs / (bold = pareto) MB=1.000.000

|C Size|ratio%|C MB/s|D MB/s|Name|
|--------:|-----:|--------:|--------:|----------------|
|64677910| 30.5|**7.47**|**1133.66**|**libdeflate 12**|
|66715898| 31.5|**43.04**|1116.39|**libdeflate 9**|
|67511452| 31.9|**119.35**|1127.36|**libdeflate 6**|
|67644075| 31.9|15.55|483.82|zlib 9|
|68152563| 32.2|27.79|734.94|zlib_ng 9|
|68228660| 32.2|37.74|478.69|zlib 6|
|68914854| 32.5|92.33|735.24|zlib_ng 6|
|70166917| 33.1|**185.35**|1110.18|**libdeflate 3**|
|71068342| 33.5|**203.57**|1085.40|**libdeflate 2**|
|72490921| 34.2|138.61|694.45|zlib_ng 3|
|72968832| 34.4|86.50|480.09|zlib 3|
|73505577| 34.7|**288.56**|1075.72|**libdeflate 1**|
|75138353| 35.5|271.18|1080.75|igzip 3|
|76571415| 36.1|**598.69**|1047.07|**igzip 2**|
|77260023| 36.5|127.47|448.95|zlib 1|
|78154519| 36.9|**615.11**|1020.09|**igzip 1**|
|87551010| 41.3|**638.49**|969.43|**igzip 0**|
|100929713| 47.6|329.63|651.73|zlib_ng 1|
|211948544|100.0|16146.00|16117.76|memcpy|
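For anyone wondering how the "bold = pareto" marks and the ratio column can be derived, here is a small sketch that recomputes both from the benchmark figures above. It assumes the Pareto criterion is "no other codec is both smaller and faster at compression"; the function and constant names are hypothetical, and the memcpy baseline is left out of the row list.

```rust
/// One benchmark row: (name, compressed size in bytes, compression MB/s).
type Row = (&'static str, u64, f64);

/// Rows copied from the benchmark above, sorted by ascending compressed size.
const ROWS: [Row; 18] = [
    ("libdeflate 12", 64_677_910, 7.47),
    ("libdeflate 9", 66_715_898, 43.04),
    ("libdeflate 6", 67_511_452, 119.35),
    ("zlib 9", 67_644_075, 15.55),
    ("zlib_ng 9", 68_152_563, 27.79),
    ("zlib 6", 68_228_660, 37.74),
    ("zlib_ng 6", 68_914_854, 92.33),
    ("libdeflate 3", 70_166_917, 185.35),
    ("libdeflate 2", 71_068_342, 203.57),
    ("zlib_ng 3", 72_490_921, 138.61),
    ("zlib 3", 72_968_832, 86.50),
    ("libdeflate 1", 73_505_577, 288.56),
    ("igzip 3", 75_138_353, 271.18),
    ("igzip 2", 76_571_415, 598.69),
    ("zlib 1", 77_260_023, 127.47),
    ("igzip 1", 78_154_519, 615.11),
    ("igzip 0", 87_551_010, 638.49),
    ("zlib_ng 1", 100_929_713, 329.63),
];

/// Uncompressed size of silesia.tar, taken from the memcpy row.
const ORIGINAL_SIZE: u64 = 211_948_544;

/// Compressed size as a percentage of the original (the "ratio%" column).
fn ratio_percent(compressed: u64) -> f64 {
    100.0 * compressed as f64 / ORIGINAL_SIZE as f64
}

/// Names of rows not dominated on (smaller size, higher compression speed).
/// Assumes `rows` is sorted by ascending compressed size.
fn pareto_frontier(rows: &[Row]) -> Vec<&'static str> {
    let mut best_speed = f64::NEG_INFINITY;
    let mut frontier = Vec::new();
    for &(name, _size, speed) in rows {
        // A row survives only if it beats every smaller-output row on speed.
        if speed > best_speed {
            frontier.push(name);
            best_speed = speed;
        }
    }
    frontier
}
```

Under that assumption, `pareto_frontier(&ROWS)` returns the six libdeflate levels plus igzip 2, 1 and 0, which matches the rows whose compression speed is bold. It only considers compression speed; decompression speed would need its own pass.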

TylerAbeoJordan · February 3, 2024, 1:37pm

Does SAFE currently use file compression? I didn’t think it did. If it does then guessing compression would have to come first, then chunking, then self-encryption.

jlpell · February 3, 2024, 1:40pm

IIRC, compression is built into the self-encryption concept/process. It may not be implemented yet, though. It makes no sense not to compress the chunks. Bandwidth is almost always the limiting factor, and compute is cheap by comparison. Read the docs
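A rough back-of-the-envelope sketch of the bandwidth argument, in case it helps. The 10 Mbit/s uplink is a made-up assumption, the 119.35 MB/s and 31.9% figures are the libdeflate 6 row from the benchmark earlier in the thread (measured on silesia.tar, so real data will differ), and the function names are hypothetical. It also assumes compression and upload run one after the other rather than overlapping.

```rust
/// Seconds to send `bytes` over a link of `link_mbit_s` megabits per second.
fn send_secs(bytes: f64, link_mbit_s: f64) -> f64 {
    bytes * 8.0 / (link_mbit_s * 1_000_000.0)
}

/// Seconds to compress `bytes` at `compress_mb_s` (MB = 1,000,000 bytes)
/// and then send the compressed result, one step after the other.
fn compress_then_send_secs(
    bytes: f64,
    ratio_percent: f64,
    compress_mb_s: f64,
    link_mbit_s: f64,
) -> f64 {
    let compress = bytes / (compress_mb_s * 1_000_000.0);
    let compressed_bytes = bytes * ratio_percent / 100.0;
    compress + send_secs(compressed_bytes, link_mbit_s)
}
```

With those numbers a 1 MB chunk takes 0.8 s to send raw, versus roughly 0.26 s to compress and send (about 0.008 s of compute plus 0.255 s on the wire), so the compute cost is small next to the transfer time saved.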

dirvine · February 3, 2024, 2:54pm

Yes, self-encryption does this:

→ Chunk file
→ Compress chunk
→ Take hash
→ Use the next chunk hash to encrypt the next chunk
→ Use previous 2 hashes and XOR the next chunk
→ Hash content and make this hash the name of the chunk

So this cyclic process (where the previous 2 chunks' data are used to encrypt and XOR) continues to the end of the file. At the end, the first 2 chunks can be finalised and uploaded too.
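The cyclic dependency described above can be sketched in a few lines. This is a simplified illustration, not the self_encryption crate's code: it skips the compression and encryption steps entirely, `toy_hash` is a non-cryptographic FNV-1a stand-in for a real hash, and deriving the pad from the two preceding chunks (wrapping around at the start of the file) is an assumption based on the description. All names are hypothetical.

```rust
/// Toy 64-bit FNV-1a hash. Stand-in for a cryptographic hash; NOT secure.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// XOR `data` with a repeating pad. Applying it twice restores the input.
fn xor_with_pad(data: &[u8], pad: &[u8]) -> Vec<u8> {
    data.iter().zip(pad.iter().cycle()).map(|(d, p)| d ^ p).collect()
}

/// Pad for chunk `i`, built from the hashes of the two preceding chunks.
/// Indices wrap, so the first two chunks depend on the last ones.
/// Requires at least 3 hashes.
fn pad_for(hashes: &[u64], i: usize) -> Vec<u8> {
    let n = hashes.len();
    let prev1 = hashes[(i + n - 1) % n];
    let prev2 = hashes[(i + n - 2) % n];
    let mut pad = prev1.to_le_bytes().to_vec();
    pad.extend_from_slice(&prev2.to_le_bytes());
    pad
}

/// Split `file` into chunks, XOR each with its neighbour-derived pad, and
/// name each output chunk by the hash of its final content.
fn obfuscate(file: &[u8], chunk_size: usize) -> Vec<(u64, Vec<u8>)> {
    let chunks: Vec<&[u8]> = file.chunks(chunk_size).collect();
    assert!(chunks.len() >= 3, "sketch needs at least 3 chunks");
    // Hash every source chunk first; the pads depend on these hashes.
    let hashes: Vec<u64> = chunks.iter().map(|c| toy_hash(c)).collect();
    chunks
        .iter()
        .enumerate()
        .map(|(i, c)| {
            let out = xor_with_pad(c, &pad_for(&hashes, i));
            (toy_hash(&out), out)
        })
        .collect()
}
```

Because of the wrap-around, the first two chunks cannot be produced until the last chunks have been hashed, which is why they are finalised at the end. Reversing a chunk needs the source-chunk hashes of its neighbours, which in this sketch the caller would have to keep.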

From Perplexity:
MaidSafe’s self-encryption works by employing a system that automatically splits, renames, encrypts, and compresses data using algorithms. This process is based on the data itself, requiring no user intervention or passwords. The encrypted data is then dynamically stored at locations selected by the network, aiming to provide a high level of security without the need for user involvement [1][3]. The self-encryption feature is part of the MaidSafe network, which is characterized by its innovative approach to privacy, security, and freedom for its users [3]. Additionally, the self-encryption system is implemented through a library that provides secure encryption of data, with the encrypted chunks considered as safe as those encrypted by any other modern method [4].

jlpell · February 3, 2024, 5:28pm

Do the current test nets check if a chunk has been encrypted?

digipl · February 3, 2024, 7:57pm

As long as it is the client who performs the self-encryption, it is not possible for the nodes to know whether the chunk is encrypted or not.
What the nodes do, in the latest testnet, is encrypt the chunk before storing it on disk.

As for compression, self-encryption uses Brotli.

github.com

maidsafe/self_encryption/blob/431382c8ad2edd068d5cfbfa676af058c1ffd685/src/encrypt.rs#L98


      
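          /// Compress the chunk with Brotli, encrypt it, then XOR it with the pad.
          pub(crate) fn encrypt_chunk(content: Bytes, pki: (Pad, Key, Iv)) -> Result<Bytes> {
              let (pad, key, iv) = pki;
              // Step 1: Brotli-compress the plaintext chunk into `compressed`.
              let mut compressed = vec![];
              let enc_params = BrotliEncoderParams {
                  quality: COMPRESSION_QUALITY,
                  ..Default::default()
              };
              let _size = brotli::BrotliCompress(
                  &mut Cursor::new(content.as_ref()),
                  &mut compressed,
                  &enc_params,
              )
              .map_err(|_| Error::Compression)?;
              // Step 2: encrypt the compressed bytes with the supplied key and IV.
              let encrypted = encryption::encrypt(Bytes::from(compressed), &key, &iv)?;
              // Step 3: XOR the ciphertext with the pad to produce the stored chunk.
              Ok(xor(&encrypted, &pad))
          }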

jlpell · February 7, 2024, 2:01am

5 posts were split to a new topic: Proof of Encryption

jlpell · February 4, 2024, 3:34am

Looks like Brotli edges out deflate in this study, but this may not be an accurate comparison for the optimized library by E. Biggers mentioned in the OP…
