KeibiSoft/NotesOnCryptographicPrimitives

Practical Information Security Tradeoffs: choose the right primitive for immediate value

“There’s no such thing as 100% security.” There is always a tradeoff between Confidentiality, Integrity and Availability when it comes to data. For files in production, always choose the tool that matches their risk profile. Do you need extremely fast checks to see whether files were modified? Cryptographic-strength integrity checks for anti-tampering? Do you care most about the confidentiality of the files? Or some combination of all three?

This post explores techniques and tradeoffs (with runnable benchmarks) for file confidentiality, integrity and availability.

Premise / audience

Target readers: engineers, DevOps, SREs, security architects and CTOs who want actionable guidance. Problem: teams often pick the “strongest” crypto (SHA‑512, the longest key lengths, slow checks) or the simplest, fastest check without matching it to the risk. That leads to over‑investment (high infra cost), unmanaged complexity, or false confidence.

Goal: show when to use SHA‑512, AES‑GCM, ChaCha20‑Poly1305, or a fast non‑cryptographic hash (Murmur) for real tasks: availability checks, tamper detection, and encryption. We provide four different approaches and benchmark them in two programming languages: Haskell and Rust.

People say "make this as secure as Fort Knox"

People say "make this as secure as Fort Knox", but security is about risk management and operational capacity. For file pipelines (CDN, object storage, backups), the useful questions are:

  1. Is the file available for download? (Availability)
  2. Has the file been tampered with / changed? (Integrity)
  3. Is the file encrypted (confidentiality)? (Confidentiality + authenticity)

These are different requirements. We will incrementally build up the techniques used in the next sections to address them, along with one more question: what is the cost of processing this file? (Not in monetary terms, but in computing resources.)

The availability of the files

If you only care about the availability of the files, then you just need to store them and serve them on each request. This is usually fine for files meant for public use: no need to encrypt them, hash them, or anything else. The End.

The integrity of the files

If you care about the integrity of the files, then hash each file: if any bit changes, the hash also changes. The End. Or not?

Here there are two different types of integrity checks:

* **Cryptographic integrity checks**, which are collision-resistant: given a file and its hash, an attacker finds it computationally infeasible to modify the file so that the modified file still hashes to the same value.

* **Non-cryptographic integrity checks**, which are fast and catch accidental corruption, but offer no resistance against an adversary (covered further below).

The industry standards for such hash functions are usually the ones approved and recommended by NIST. More details about the strength of different cryptographic hash functions can be found at the NIST Computer Security Resource Center.

Let's take SHA-512 for a closer inspection. It is the "strongest" on the NIST list in terms of how hard it is to find a collision.

With SHA-512, the trade-off is usually CPU time, so it might not work very well in low-latency environments.

For example, just run OpenSSL's built-in speed test: `openssl speed sha512`

Note: openssl speed uses in-memory buffers and reports MB/s for various block sizes.

The "type" columns represent blocks of bytes.

```
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha512           34411.15k   134991.65k   318518.91k   565828.85k   752341.73k   768283.57k
```

We also benchmarked two SHA-512 implementations of our own, written in Rust and Haskell.

For a 1 GiB file of random data:

1. Load the full file into memory and hash it: (Rust) SHA-512 full avg over 10 runs: 2.750s; (Haskell) SHA-512 full: 2.957566s

2. Be RAM-conscious and read the file in filesystem-block-size chunks (4096 bytes in my case — on macOS you can find it with `diskutil info /`): (Rust) SHA-512 4K blocks avg: 2.733s; (Haskell) SHA-512 4K blocks: 3.950727s

3. Be a bit less RAM-conscious and use the Linux `cp` default block size (128 KiB): (Rust) SHA-512 128K blocks avg: 2.409s; (Haskell) SHA-512 128K blocks: 3.365536s

4. Be even less RAM-conscious and use a 10 MiB block size (which I found works better for read/write speed on macOS and Windows): (Rust) SHA-512 10M blocks avg: 2.484s; (Haskell) SHA-512 10M blocks: 2.770903s
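The chunked-read pattern used above can be sketched in a few lines. Here is an illustrative Python version using the standard library's `hashlib` (this is not the benchmarked Rust/Haskell code, just the same idea):

```python
import hashlib

def sha512_file(path: str, chunk_size: int = 128 * 1024) -> str:
    """Hash a file with SHA-512 while reading it in fixed-size chunks,
    so memory use stays constant regardless of file size."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

The same loop covers every variant in the benchmark (4 KiB, 128 KiB, 10 MiB); only `chunk_size` changes, and the resulting digest is identical regardless of chunking.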

OK, but what if you do not need a cryptographic-strength hash function? What if you mostly care about checking whether a file changed or data went missing, or you just want to use the hash as a hashtable key — not about detecting an attacker's modifications?

There is a class of non-cryptographic hash functions that fills this use case, and my favorite is MurmurHash — mostly because there is also an unrelated Romanian clothing brand named murmur.
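To show how little work Murmur does per byte compared to a cryptographic hash, here is a pure-Python reference sketch of the 32-bit x86 MurmurHash3 variant (the benchmarks below use optimized library implementations; this is only for illustration):

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit variant): a few multiplies, rotates
    and XORs per 4-byte word -- fast, but not adversary-resistant."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    length = len(data)
    rounded = length & ~3
    # Body: process 4-byte little-endian words.
    for i in range(0, rounded, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Tail: up to 3 remaining bytes (falls through like the C switch).
    tail, k = data[rounded:], 0
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization: mix in the length and avalanche the bits.
    h ^= length
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h
```

Note that the whole function is shift/multiply/XOR arithmetic with no compression rounds, which is why it is so much cheaper than SHA-512.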

Now let's run the same benchmark for Murmur.

We also benchmarked two Murmur implementations, written in Rust (MurmurHash3) and Haskell (MurmurHash2).

For a 1 GiB file of random data:

1. Load the full file into memory and hash it: (Rust, MurmurHash3) Murmur full avg: 0.790s; (Haskell, MurmurHash2) Murmur full: 3.036555s

2. Be RAM-conscious and read the file in filesystem-block-size chunks (4096 bytes in my case, via `diskutil info /`): (Rust, MurmurHash3) Murmur 4K blocks avg: 0.602s; (Haskell, MurmurHash2) Murmur 4K blocks: 3.234807s

3. Be a bit less RAM-conscious and use the Linux `cp` default block size (128 KiB): (Rust, MurmurHash3) Murmur 128K blocks avg: 0.492s; (Haskell, MurmurHash2) Murmur 128K blocks: 3.022566s

4. Be even less RAM-conscious and use a 10 MiB block size: (Rust, MurmurHash3) Murmur 10M blocks avg: 0.557s; (Haskell, MurmurHash2) Murmur 10M blocks: 2.951578s

Aside from my Haskell skills being bad, the gap between the Rust and Haskell results is large; I attribute it to the difference in language paradigms and in how well the two implementations are compiled and optimized.

Let's compare SHA-512 to Murmur to see the huge difference in wall-clock time.

| Block Size | SHA-512 (Rust) | SHA-512 (Haskell) | Murmur (Rust) | Murmur (Haskell) |
| --- | --- | --- | --- | --- |
| Full file (1 GiB) | 2.750s | 2.957566s | 0.790s | 3.036555s |
| 4K blocks | 2.733s | 3.950727s | 0.602s | 3.234807s |
| 128K blocks | 2.409s | 3.365536s | 0.492s | 3.022566s |
| 10M blocks | 2.484s | 2.770903s | 0.557s | 2.951578s |

This is the tradeoff; draw your own conclusions when you need to mitigate risks to integrity and availability.

Now moving on to Confidentiality.

The Confidentiality of the files

For confidentiality alone, you would use a symmetric cipher to encrypt the files. If you do not care about integrity at all, then a plain stream cipher is quite fast — but you would not know if bits were flipped or data was modified.
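The malleability of unauthenticated stream encryption is easy to demonstrate. The following sketch uses a toy keystream built from SHA-512 in counter mode — for illustration only, NOT a vetted cipher — to show that flipping a ciphertext bit flips the corresponding plaintext bit without any error being raised:

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-512 over key||nonce||counter. Illustrative only."""
    out, counter = bytearray(), 0
    while len(out) < length:
        out += hashlib.sha512(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR with the keystream; the same call decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

key, nonce = b"k" * 32, b"n" * 12
ct = xor_encrypt(key, nonce, b"pay $100 to Alice")

tampered = bytearray(ct)
tampered[4] ^= 0x08                     # flip one ciphertext bit
pt = xor_encrypt(key, nonce, bytes(tampered))
# pt is now b"pay ,100 to Alice" -- the '$' silently became ','
```

Nothing in the decryption path detects the change; that detection is exactly what the authenticated modes below add.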

In the realm of symmetric encryption where integrity matters, you would use an authenticated encryption mode (AEAD).

More details can be found in the NIST Cryptographic Standards and Guidelines.

But for this post let's limit ourselves to the following two:

AES-256-GCM and ChaCha20-Poly1305.

I chose them because of the following considerations:

If your target hardware has hardware AES support (the AES-NI instruction set on x86 — not to be confused with Advanced Vector Extensions), then AES should be faster than ChaCha20; otherwise, the other way around.

(On Linux you can check with `grep aes /proc/cpuinfo`; on an Intel Mac, `sysctl machdep.cpu.features`.)

I will not go into detail on how the symmetric ciphers work, but will note one operational limit:

You cannot encrypt unlimited data under one key. For AES-GCM, a single message is capped at about 64 GiB, and if you encrypt many messages under one key with random nonces, the probability of a nonce collision grows — and a repeated nonce breaks the encryption badly.
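The nonce-collision risk follows the birthday bound. A quick back-of-the-envelope calculation for the standard 96-bit GCM nonce (the approximation n(n-1)/2 / 2^96 is my sketch, valid while the probability is small):

```python
def nonce_collision_probability(n: int, nonce_bits: int = 96) -> float:
    """Approximate probability of at least one collision among n
    uniformly random nonces (birthday bound, valid for small p)."""
    return n * (n - 1) / 2 / 2 ** nonce_bits

# After 2^32 messages under one key, collision odds are already
# around 2^-33, which is why standards bound the message count.
p = nonce_collision_probability(2 ** 32)
```

This is why using a counter-based nonce (or rotating keys) is the usual advice for high-volume encryption under a single key.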

How it works internally: the file is split into blocks and each block is encrypted and authenticated — by authentication I mean a MAC (authentication tag) is computed over the ciphertext using the key. This way, if a block is tampered with, i.e. bits are flipped, the MAC verification will fail and catch it.

A cool trick for large files is to split the file into chunks and encrypt each chunk symmetrically, possibly under different keys if the file is very big. But now an attacker can remove chunk-aligned data or permute the chunks, which is not good. So you also need to take the MAC (authentication tag) of each chunk, in exact order, and perform an integrity check over them — e.g. via SHA-512, or another AES-256-GCM call over the tags in the exact order of the chunks. Each chunk's tag is usually 16 bytes, so the total extra payload is small.
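That chunk-and-authenticate pattern can be sketched as follows. This is an illustrative design of my own, not the post's benchmarked code, and it assumes the third-party `cryptography` package: each chunk gets a unique counter-based nonce, its index is bound as associated data, and an encrypted footer authenticates the chunk count so both reordering and truncation are detected:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

CHUNK = 64 * 1024  # chunk size; the post benchmarks 4K / 128K / 10M

def encrypt_chunked(key: bytes, plaintext: bytes):
    aead = AESGCM(key)
    prefix = os.urandom(8)                       # random per-file nonce prefix
    blobs, idx = [], 0
    for off in range(0, len(plaintext), CHUNK):
        nonce = prefix + idx.to_bytes(4, "big")  # unique 96-bit nonce per chunk
        aad = idx.to_bytes(8, "big")             # bind the chunk to its position
        blobs.append(aead.encrypt(nonce, plaintext[off:off + CHUNK], aad))
        idx += 1
    # Footer authenticates the chunk count, so truncation is detected.
    # (Sketch assumes fewer than 0xffffffff chunks, so this nonce is unique.)
    footer = aead.encrypt(prefix + b"\xff\xff\xff\xff",
                          idx.to_bytes(8, "big"), b"footer")
    return prefix, blobs, footer

def decrypt_chunked(key: bytes, prefix: bytes, blobs, footer) -> bytes:
    aead = AESGCM(key)
    count = int.from_bytes(
        aead.decrypt(prefix + b"\xff\xff\xff\xff", footer, b"footer"), "big")
    if count != len(blobs):
        raise ValueError("chunk count mismatch: truncated or padded stream")
    out = bytearray()
    for idx, blob in enumerate(blobs):
        # A reordered or substituted chunk fails tag verification here,
        # because both the nonce and the AAD encode the chunk index.
        out += aead.decrypt(prefix + idx.to_bytes(4, "big"), blob,
                            idx.to_bytes(8, "big"))
    return bytes(out)
```

Binding the index into the nonce and AAD means no separate MAC-of-MACs pass is needed for reorder detection; the footer handles truncation.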

Enough talk! Give me the benchmarks!

Here you go!

AES256-GCM

`openssl speed -evp aes-256-gcm`

Note: openssl speed uses in-memory buffers and reports MB/s for various block sizes.

```
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-GCM      69207.00k   255371.29k   825316.51k  2208601.77k  4060318.38k  4421708.46k
```

We also benchmarked two AES-256-GCM implementations, written in Rust and Haskell.

For a 1 GiB file of random data:

1. Load the full file into memory and encrypt it: (Rust) AES-256-GCM full avg: 2.389s; (Haskell) AES-256-GCM full: 4.929817s

2. Be RAM-conscious and read the file in filesystem-block-size chunks (4096 bytes in my case, via `diskutil info /`): (Rust) AES-256-GCM 4K blocks avg: 2.048s; (Haskell) AES-256-GCM 4K blocks: 5.440893s

3. Be a bit less RAM-conscious and use the Linux `cp` default block size (128 KiB): (Rust) AES-256-GCM 128K blocks avg: 1.897s; (Haskell) AES-256-GCM 128K blocks: 4.88268s

4. Be even less RAM-conscious and use a 10 MiB block size: (Rust) AES-256-GCM 10M blocks avg: 1.850s; (Haskell) AES-256-GCM 10M blocks: 4.621191s

ChaCha20-Poly1305

`openssl speed -evp chacha20-poly1305`

Note: openssl speed uses in-memory buffers and reports MB/s for various block sizes.

```
The 'numbers' are in 1000s of bytes per second processed.
type                16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
ChaCha20-Poly1305   283875.88k   574783.89k  1213670.23k  2217616.18k  2424297.74k  2418109.10k
```

We also benchmarked two ChaCha20-Poly1305 implementations, written in Rust and Haskell.

For a 1 GiB file of random data:

1. Load the full file into memory and encrypt it: (Rust) ChaCha20-Poly1305 full avg: 1.910s; (Haskell) ChaCha20-Poly1305 full: 2.789824s

2. Be RAM-conscious and read the file in filesystem-block-size chunks (4096 bytes in my case, via `diskutil info /`): (Rust) ChaCha20-Poly1305 4K blocks avg: 1.999s; (Haskell) ChaCha20-Poly1305 4K blocks: 4.417005s

3. Be a bit less RAM-conscious and use the Linux `cp` default block size (128 KiB): (Rust) ChaCha20-Poly1305 128K blocks avg: 1.416s; (Haskell) ChaCha20-Poly1305 128K blocks: 3.160261s

4. Be even less RAM-conscious and use a 10 MiB block size: (Rust) ChaCha20-Poly1305 10M blocks avg: 1.326s; (Haskell) ChaCha20-Poly1305 10M blocks: 2.878134s

And here are the tables:

OpenSSL Benchmark

| Type | 16 bytes | 64 bytes | 256 bytes | 1024 bytes | 8192 bytes | 16384 bytes |
| --- | --- | --- | --- | --- | --- | --- |
| AES-256-GCM | 69207.00k | 255371.29k | 825316.51k | 2208601.77k | 4060318.38k | 4421708.46k |
| ChaCha20-Poly1305 | 283875.88k | 574783.89k | 1213670.23k | 2217616.18k | 2424297.74k | 2418109.10k |

And the Rust and Haskell benchmarks:

| File Read Method | AES-256-GCM (Rust avg) | AES-256-GCM (Haskell) | ChaCha20-Poly1305 (Rust avg) | ChaCha20-Poly1305 (Haskell) |
| --- | --- | --- | --- | --- |
| Load full file in memory & encrypt | 2.389s | 4.929817s | 1.910s | 2.789824s |
| Read file in 4K blocks | 2.048s | 5.440893s | 1.999s | 4.417005s |
| Read file in 128K blocks (Linux `cp`) | 1.897s | 4.88268s | 1.416s | 3.160261s |
| Read file in 10M blocks | 1.850s | 4.621191s | 1.326s | 2.878134s |

Which makes me think I did something wrong in both the Rust and Haskell implementations: they contradict the OpenSSL results, with ChaCha20-Poly1305 beating AES-256-GCM even though my CPU has hardware AES support.

Closing remarks:

  1. For speed + scalability, a fast, non‑crypto hash (Murmur) is an excellent hot‑path detector for accidental corruption and availability checks. Use it where adversary resistance isn’t required.
  2. For strong integrity (legal/audit/forensic), SHA‑512 remains a clear choice: expensive but auditable and collision‑resistant.
  3. For confidentiality + integrity, AEAD is mandatory; AES‑GCM will win on servers with hardware AES support, while ChaCha20‑Poly1305 performs better on CPUs without it (although my implementations and system info say the opposite).
  4. Blockwise AEAD + final metadata AEAD is a pragmatic pattern to enable streaming encryption while retaining detection against reorders/drops. It’s slightly more complex but gives operational benefits for large datasets.
  5. Practical security is always a tradeoff: pick the primitive that matches your risk, performance budget, and operational capacity.

Annex

Code repository
