Category: Cryptography

On hybrid signatures

Introduction

It has come to my attention that people are wagering dinner invitations to anyone who can convince me to publicly state my opinions and reasonings on hybrid signatures. So in the interest of not having people starve, here is a blog post on this topic and the recent controversies surrounding it.

This is going to be a somewhat lengthy rant, so in case you want to check if you owe somebody dinner, here the takeaway upfront:

Using pure ML-DSA for keys that can be rotated (even if that rotation is under great pain), is perfectly fine. Using the ML-DSA44 parameter set for such keys is also fine. I won’t go as far as telling you that you should never use hybrid signatures, but I will say that hybrid signatures have a hidden cost that often makes them not worthwhile.

Why hybrid signatures are easy

The point of a hybrid construction (in this context, there are about 3 different things in cryptography called a hybrid) is to take two schemes and create a new scheme of the same type that is as secure as the more secure of the two schemes. In the case of KEMs, you need key derivation functions for this, and if you are not careful you run into weird edge cases with IND-CCA that make your combined scheme technically (although not practically) insecure. In the case of signatures, constructing a hybrid is dead simple: You simply concatenated the public keys and signatures of the component schemes, and add two ampersands between the two component verification calls. Done. You know have a EUF-CMA hybrid signature scheme.

Why hybrid signatures are difficult

It turns out that in practice, creating a hybrid signature scheme is surprisingly difficult, mostly because people cannot agree which color the two ampersands should have.

Part of this has to do with understanding what motivations people have to advocate for hybrid schemes. The first (and in my opinion only permissible) motivation is the one I outlined above, wanting to have a signature scheme that has the security of both component schemes. However, a second motivation that many people have is to use hybrids in the transition itself. The scheme level is in my opinion the wrong layer to switch schemes, but what people essentially are arguing for is to continue using their old classical keys, just now augmented with an additional PQC key.

The less harmful version of a similar motivation is to use two completely separate implementations of the schemes, for example using an HSM for the classical part and a software implementation for the PQC part. While this is less dangerous, it forces the implementation to be crossing several layers, again leading to the wish of being able to use the keys somewhat separably. And once keys are used separately, an attacker starts to be able to take a hybrid signature and rip it apart into a separate, valid signature, or take a non-hybrid signature and a broken secondary scheme and glue them together into a hybrid signature the signer has never signed.

This property is called separability, and our simple concatenation scheme is very much separable. So are pretty much all constructions, to varying degree. And that varying degree allows for discussion of the color of ampersands. For example, we could introduce a domain separator into our hybrid scheme, and hope that that would make component signatures unusable. However, domain separators need to be chosen from a prefix-free set. And if you are only adding the domain separator to the hybrid, and not the already existing classical use, you run into the problem that the empty string is a prefix of every string, and as such your set of domain separators is very much not prefix-free.

Next, we have the issue of the context. Both ML-DSA and SLH-DSA have a context parameter in their specification, and if you are trying to describe a hybrid, you have to define around that, resulting in even more possible shades for your ampersands. You either ignore it and upset some people, or you retrofit it into the classical signature scheme and upset a bunch of other people. In either case, you are dealing with far more cryptography than you originally thought.

Next, we have our old friend, prehashing. ML-DSA with external-µ uses external µ, and with it a very specific construction using SHAKE256 as its compressing hash function, while, say, ECDSA-P256-SHA256 uses SHA256. Now you could sign µ with ECDSA, inventing ECDSA-P256-external-µ in the process, but that would be a different signature scheme and now you’ve upset the people who wanted to reuse HSMs. You can add yet another compressing hash function into the mix, and now you have something that starts to resemble the current LAMPS draft, arriving at a scheme that everybody hates the least.

This still leaves probably the biggest issue with hybrids: The combinatorical explosion. Given that you can combine any two schemes, you can create a lot of potential combinations. And people have created a lot of combinations.

This is an except of the aforementioned IETF signature combiner draft. 18 different combinations. Various different variants of RSA, and ECDSA combined with various different variants of ML-DSA. Some people want to match security levels, some people wanted to use a higher pqc security level, so both options go in. Some people wanted PKCS1, some people wanted PSS versions of RSA. So they go in. Some people wanted Brainpool curves, some people wanted Curve25519, some people wanted NIST curves. So they all go in. Rainbow ampersands for everyone!

This means implementers will now have to choose: Support all of the schemes? Probably not feasible. Support only some of them? Now you risk incompatibilities. It also means that regulators don’t have to choose: Don’t like the NIST curves? Just mandate the use of Curve25519, or better yet, Brainpool after all, it’s in the standard so surely everyone will support all of them, and systems of various compliance regimens will never interact with each other anyways, right?

All in all, the thing that makes hybrid signatures difficult is pretty much entirely due to social coordination reasons, and not technical reasons, but social coordination is a cost that cannot be neglected when looking at cryptographic primitives, and it has become clear that the social coordination costs of hybrid signatures is very high, much higher than many other standardization efforts have.

Are hybrid signatures schemes actually useful?

This brings us to the second question: Are hybrid signatures actually all that useful? After all, while the social coordination problem might be annoying, it is something that can usually be resolved, eventually at least. But here comes the kicker: Hybrid constructions are, by design, a temporary measure. So if the resolving of the social problems takes to long, it can result in the stable equilibrium point just moving past them entirely.

This brings us to a few important differences between key agreement schemes and signature schemes in practical deployments.

When it comes to key agreement, or more generally encryption schemes, your vulnerability begins when you create a ciphertext, and ends, well, never really. At some point your plaintext might only be of interest to historians, but don’t underestimate the lengths historians will go to in order to recover it. (I try to exclude them from my threat models, they are too scary).

For signatures on the other hand, your vulnerability begins and ends with your verifier starting and stopping to trust the public key. The bad news is that you don’t even need to create a signature to be potentially vulnerable, the good news is that your window of vulnerability is extremely finite. And even better yet: We have some methods to end the window of vulnerability early, by revoking the public key in question. Revocation is of course not without its problems, but even its theoretical existence can make us relax a lot more when it comes to novel cryptographic algorithms.

There are some keys that live virtually forever and can neither be revoked in an emergency, nor rotated in a normal key lifecycle management. These are keys such as firmware signing keys which are baked into silicon. At least on this mitigation aspect, those keys behave more like encryption keys (maybe sans the historian threat actors), and they need a separate analysis.

For all other keys, though, the time limit and revocation options mean that, opposed to encryption keys, the breakage of a signing key can be actively mitigated.

Signing keys differ from encryption keys in another important way: They are long lived. Where encryption keys are often ephemeral, there is no use case that would require an ephemeral signing key, that would not be better served with a cryptographic hash instead. After all, conceptually a signature scheme moves authentic channels through time, by using an existing authentic channel to transfer a public key which then can be used in the future to create a new authentic channel. If the channels existed at the same time, transfering a hash in the authentic channel would serve the same purpose.

The longer lifetime already increases compromise risks by completely classical means, so having robust rotation and revocation procedures in place for signing keys is already best practice.

Taken together, this means we have a rather large difference in threat model between signing keys and encryption keys: Where encryption keys should be hybrid, since there is no action possible if a break in the newer algorithm becomes known, a signing key can remediate this scenario.

Additionally, while an encryption key’s breakage is completely silent, in order to use a compromised signing key the attacker has to convince some endpoint to interact with the forgery they produced. This means, while still difficult to detect, signature keys being broken is detectable, especially if this type of breakage happens at scale.

This only leaves the question of how likely a break of ML-DSA is, and whether this break would likely happen in public or not.

ML-DSA mythbusting

A few months ago, somewhat similar circumstances led to me writing a blog post on ML-KEM mythbusting. Most of the answers in that blog equally apply to ML-DSA. Like ML-KEM, ML-DSA was created by a group of European researchers, was standardized under strict scrutiny of the international cryptographic community, and has a parameter space that is way too small to hide a NOBUS backdoor. While the entropy of the parameters (the calculation of which I leave as an exercise to the reader) is higher than that of ML-KEM, it still does not reach cryptographical relevance. I have included the parameter set below:

There is one thing where ML-KEM and ML-DSA differ: While ML-KEM uses the Fujisaki-Okamoto transform to create its KEM, and the changes to that transform was the one change that sparked some debate that I referred to in my mythbusting blog post, ML-DSA uses a variant of the Fiat-Shamir transform to construct its signature scheme. Fiat-Shamir requires the use of a nonce and is extremely sensitive to reuse. The nonce is used to mask the private key, which is otherwise part of the signature, and reusing a nonce leads to the same mask being used twice, leaking the key. A similar problem also occurs with ECDSA, which is technically not a Fiat-Shamir transform, but somewhat related. ECDSA is usually used with random nonces, relying on the random number generator at signing time to prevent nonce reuse. This has led to key compromises in the past, when the random number generator had trouble being random. For EdDSA (and deterministic ECDSA), the random number generator is instead replaced with a PRF, which uses the private key and the message to deterministically compute a nonce for a given message. This makes those schemes robust against random number generator failures at signing time, but, at least in the case of EdDSA, comes at the cost of a higher risk for fault attacks. As the nonce is computed from the private key and the message, but the challenge hash is computed from the public key and the message, any fault in the public key can lead to two different signatures being computed for the same nonce.

For ML-DSA, the designers went all out and put every mitigation into the algorithm that we are aware of. ML-DSA has randomness in its private key that it always uses to compute the nonce, but it will additionally accept external randomness, which reduces the chance of faults occurring. Additionally when computing the nonce and the challenge hash µ is used instead of the raw message, making it so that both nonce and signature depend on the public key, eliminating the potential faults that can break EdDSA. As a result, ML-DSA is substantially more robust against Fiat-Shamir related attacks than ECDSA or EdDSA are. Of course, if you manage to introduce a fault deep in the algorithm that leads to a reused nonce, ML-DSA will still fail, but the attack surface is substantially smaller. I will leave it as an exercise to the reader to determine the motivations of anyone who drags up Fiat-Shamir related attacks on ML-DSA as an argument against the algorithm, but stays quiet on the same attacks on, say, EdDSA.

Cryptography

So you want to deploy FN-DSA

Post author By Sophie Schmieg
Post date May 13, 2026
1 Comment on So you want to deploy FN-DSA

FN-DSA (née Falcon) is a proposed post-quantum signature standard that keeps polarizing engineers. Cryptography engineers on the one hand, tasked with potentially implementing this monstrosity, hate it with a passion. Protocol engineers on the other hand, especially when working with UDP, see a glimmer of hope that they might not have to deal with packet fragmentation and really would like to use it.

For the most part, I agree with my fellow cryptography engineers that the best approach to using FN-DSA is to not. However, prohibition is a failed approach, whether we are talking about drugs or about cryptography, so, in the interest of harm reduction, here are the essential caveats for anyone who wants to use FN-DSA, so that they can do so as safely as possible.

State of standard

The biggest problem with FN-DSA is that the standard currently does not exist. NIST has prepared an initial public draft (ipd) in August 2025, but this draft, at least as of writing this article in May 2026, is stuck somewhere in NIST’s publishing pipeline with unclear status so it continues to be more of an initial private draft for the time being.

Not having the ipd out means that all the points I’m making here are based on the round 3 candidate named Falcon that will eventually turn into FN-DSA, and it is very much possible that some or all of these points will be addressed in the actual ipd. Or the actual standard for that matter. Importantly, going from ipd to standard will take around a year, if the same scrutiny is applied as was with ML-KEM/ML-DSA/SLH-DSA, and an argument can be made that if anything FN-DSA might require substantially more scrutiny due to the complexity of the algorithm. There are usually breaking changes introduced between ipd and actual standard, so this means that anyone hoping that they can just start using FN-DSA to solve all their size related problems will have to wait for at least one, and possibly multiple years for the standard to actually be ready to implement.

Prehashing FN-DSA

Prehashing is a very strange topic. Half of the cryptographic community insists that it’s absolutely essential, while the other half has never heard of it. Usually the half that has never heard of it is theoretical cryptographers, which are very much not the target audience of this blog post, so I’ll keep the explanation of what prehashing is short. A much more thorough discussion can be found in my blog post on prehashing ML-DSA.

In a nutshell, prehashing turns signature generation into a multi-party algorithm, with one party (the signer) holding the private key, but having only limited computing resources, while the other party (the hasher) holds the message and has no limit on its computing power. In this scenario, one wants to limit the amount of data that is transferred to the signer, ideally to just a hash. In order to make this process transparent to any verifying parties, people usually want the resulting signature to be a fully standard compliant signature of the message, with no extra hash that the standard didn’t mention applied to the message.

Classical signature algorithms, in particular for RSA and ECDSA signatures usually decompose into a hashing step and a signing step. Practitioners, seeing this decomposition, took this as a natural way to introduce prehashing, with the hasher computing the hash and the signer than operating on this hash to compute the rest of the signature. This goes so far as X509 certificates defining the hash function to be used for a given message with that message, instead of with the corresponding public key, making X509 certificates technically not a fully defined signature scheme.

For RSA and ECDSA, this is mostly, but not entirely benign. RSA in particular has a forgery attack, where the hasher can forge a signature for a message the signer has never seen the hash of. However, as the signer only ever sees hashes of messages to begin with, it cannot actually do any meaningful verification of the message itself, and forgery attacks by the hasher are not considered relevant in the prehashing threat model. Key recovery attacks, on the hand, very much are. The point of prehashing is, after all, to have the private key in some environment that makes exfiltration substantially harder than if it was on the hasher, so any attack that allows the hasher to recover the key would be catastrophic and defeat the point of employing prehashing to begin with.

And it turns out Falcon has just one such key recovery attack. To understand why, one needs to realize that hash functions are used for two different purposes in signature schemes: As message compression and as a model for a random function. Prehashing works fine as long as the hash function is only used for compression, and starts to break when it also has to serve other security guarantees.

For Fiat-Shamir transformed signature schemes, the security relevant hash function will always be buried somewhere deeper in the signing algorithm, as the hash needs to include both the message and the commitment, which is only computed later in the algorithm. ML-DSA has a separate message compression hash (which computes the infamous µ), which can be used for prehashing. ECDSA is a YOLO signature scheme that doesn’t really have a clear security reduction, and just doesn’t hash the commitment into its challenge, and as far as we know nothing horrible happened.

RSA however, just like FN-DSA, is what is called a hash-and-sign scheme. Hash and sign schemes generally work by having some trapdoor function $f$ which is easy to evaluate in one direction, but require the private key to invert, and a hash function $h$ . A signature $s$ is then some value in the domain of $f$ for which $f(s) = h(m)$ holds. We can see that this hash serves a crucial security purpose by recognizing that without it, we would trivially find an existential forgery: Just pick a random $s$ from the domain, and compute $f(s)$ . Congratulations, you have forged a signature for $f(s)$ . With the presence of the hash function however, this does not get us very far, since we would still need a way to compute a preimage of $f(s)$ under $h$ , which we cannot do without breaking the hash function.

And now for the kicker: FN-DSA not only allows for forgeries when you take away its hash function, it flat out starts leaking its private key. The only way to have a secure FN-DSA implementation is for the signer to pick a random value, concatenate it with the message, hash and then compute the signature for that essentially random value. Signing non-random values will break the scheme.

At least in the round three algorithm description, there is no way of safely prehashing FN-DSA, but there is a very tempting looking hash function call that invites careless implementers to do so anyway. The only viable option to prehash FN-DSA is to introduce a separate message compression hash, and double hash the message.

About those floating points

Probably the most frequently talked about property of FN-DSA is that the spec uses floating point numbers. The floating point numbers are intrinsic to the mathematics of the algorithm (see the section below, where there will be copious mentions of $\mathbb{C}$ , the complex numbers). It is not possible to implement FN-DSA without floating points.

It is, however, possible to implement floating point operations in constant time. To do so, we just have to crack open our numerical analysis textbooks and implement floats by hand, by tracking two integers per floating point, one for mantissa and one for exponent. Multiplication is very straightforward to constant-timefy, in fact it’s likely already constant time and requires just two integer operations. Addition is slightly trickier, but also not the end of the world. You will need some constant time compares to properly shift both numbers to align, but not a big deal and only a relatively minor performance penalty. Division, however, is a nightmare. The standard division algorithm used for floating points is an iterative process using multiple multiplications and additions every iteration and returning when the result is accurate enough.

For a constant time algorithm, that will not do, and we can only return after having iterated for the same number of steps in any case, turning everything into worst case performance. We can calculate this iteration limit without much difficulty, but no currently existing hardware and only very limited existing software implementing things this way exist. On top of that, this forces us to have worst case complexity for every case, and, especially in the case of software implementations, it makes things terrifyingly slow.

When I implemented Falcon in my bad computer algebra system, a good portion of the time was spend refactoring my existing arbitrary precision floating point logic to switch to doubles when the arbitrary precision was arbitrarily set to 64 bits, solely because my non-constant time software implementation of floating point division logic was so painfully slow that I got to tired waiting on it while debugging the algorithm.

This has important performance implications for your signers. While FN-DSA’s verification logic is especially fast and simple, signing will either be horrendously slow or not constant time, in each case it might disqualify your application from having live signing.

Speaking of debugging the algorithm, another fun quirk of floating point logic is that it is only defined up to epsilon. Different implementations of the same IEEE floating point standard will give different results on some inputs. Usually that does not matter much, since the differences are only in the least significant bits, they are rounding errors, after all, so people working with floating points usually do not care about it, they either have done a proper error calculation, if they are numerical analysts, or they just accept things being subtly wrong. For cryptographers, on the other hand, this means that test vectors don’t quite work. In fact, Falcon was the first and only time I ever had to use a statistical test to debug an especially gnarly error in the implementation.

Variable signature sizes

Another fun quirk of FN-DSA is that the signature may or may not be variable in size. This is not as unusual as it first seems, in fact, if you had a look at your average ML-DSA signature, you’ve likely noticed the suspicious amount of zero bytes a valid signature has at the end. This is because ML-DSA signatures are also variable in size, the standard just pads them out to all be the same size.

For FN-DSA we don’t know yet what NIST will write in the standard, but they at least considered not doing that. After all, the whole point of FN-DSA is to be small, and padding signatures is the opposite of that. So be aware that signature may or may not be variable in size.

There is classical precedent for variable size signatures: ECDSA signatures, when encoded in ASN.1 format, are not constant in size. If ECDSA is used in the superior IEEE format, this is not the case however. And as you might guess by me calling it superior, the variable size signatures tend to cause some problems, to the point that if a team complains to me about a flaky verification failure, I pretty much already can tell you before looking at the artifacts that it happens because zeroes are not added or removed properly from the signature, leading to a 1/256 chance of ASN.1 signatures being malformed. All in all, this is a workable property, but definitely not one that has never caused a bug.

The math of FN-DSA

Lastly, a bit on the mathematics of FN-DSA. I will keep this section short and not go very deep here, given that the target audience is application developers and not cryptographers, maybe I will do so in a follow-up. But in short: The mathematics of FN-DSA are stunningly beautiful. It is based on NTRU instead of LWE, that means that instead of computing $As + e = t$ with s and e short, we compute $fG - gF = q$ with all for elements being short, and then set the public key to $h = f/g \text{ mod } q$ . These two problems are closely related (they are both p-adic rational reconstruction problems, very visibly so in the case of NTRU where our public key is literal a rational number, rounded to one q-adic digit), with the main difference being that NTRU computes a full lattice basis, while LWE only computes a single short lattice vector.

With this short lattice basis, we can now solve the closest lattice problem. And in order to sign, we do just that: We sign an integer $r$ by computing $s_1, s_2$ such that $s_1 + h\cdot s_2 = r$ and $s_1, s_2$ are short. In order to turn this into a signature scheme, you simply hash your message and sign the resulting hash.

However, if we actually really solved the closest vector problem when signing, we would leak information about our short basis, so what we do instead is use our basis to find a “close enough, but not necessarily the closest” vector. This is where all the tricky computation comes in, and where Falcon is deeply intertwined with the number field logic that ML-DSA just uses as a convenient speed-up. In order to compute this point, we look at the complex embeddings of the number field, use the fact that those are reflected in Fast Fourier Transforms, and that those FFTs translate to NTTs when the number field order is quotiented out by $(q)$ .

In order to be secure, as mentioned in the first paragraph, we need to ensure that we never sign the same message twice, and that we never let the attacker chose the integer we sign. We do this by, instead of hashing our message directly, hashing our message together with a random integer, i.e. that means we sign (and verify) a message by computing (or checking) $s_1, s_2$ such that $s_1 + h\cdot s_2 = h(r || m)$ and publishing $s_2, r$ as the signature ( $s_1$ can be recomputed from these two values, and we need to verify that they are both small).

That is the very abridged version of how the mathematics works, there is a lot more to say about how those Fourier transforms work and help with computing close but not too close lattice points, but if I ever end up explaining that part of the math, in needs to be in a separate blog post, so I can still tag this one as comprehensible cryptography.

Tags cryptography

Cryptography

A very unscientific guide to the security of various PQC algorithms

Post author By Sophie Schmieg
Post date December 13, 2025
No Comments on A very unscientific guide to the security of various PQC algorithms

After publishing my series on UOV, one feedback I got was that my blog posts made people feel more confident in the security of the scheme, because “at least someone is looking into these things”. I don’t necessarily know if that is the takeaway I would make from my posts, but it gave me the idea to write my extremely subjective, and very much biased guesstimates for how secure I consider various approaches and problem families within PQC.

Since unfortunately I do not possess infinite wisdom or the gift of time travel, these are at best informed guesses, and I take no responsibility for being wrong on any of them.

Generalities

There is a somewhat popular saying in cryptography “attacks only get better”. It’s a vacuously true statement, since obviously an attacker will always use the most powerful technique currently known, but I think it is also at least slightly misleading, implying that progress on attacks is not only inevitable, but also somewhat continuous.

Instead, what we are seeing is usually something like this: Initially, when a certain technique is first seriously discussed, attacks come in quickly and parameters have to be adjusted to account for them. With time, as our understanding of the space grows, we tend to refine those attacks, but it is a process of diminishing returns. It is possible that some novel mathematical technique starts a new spurt in advances in attacks, but importantly, there is usually no continuous improvement in attacks.

As an example, if we look at RSA, we first have the naive factoring algorithms such as trial division and Fermat’s method, which predate cryptographic use. Then, in the seventies, they get joined by the first major improvement in the space, Pollard’s rho. In the 80s, we get the quadratic sieve, as the first subexponential algorithm, joined by various lattice methods. Finally in the 90s, more than 30 years ago, we get the current best factoring algorithm, the general number field sieve, a refinement of the quadratic sieve, as well as further improvements on lattice techniques. Quantum algorithms also first enter the scene, with Shor’s algorithm. After that, successes die down substantially, mostly confined to relatively minor improvements to the general number field sieve.

This is not because we stopped working on factoring algorithms, but most of the effort shifted to other targets such as The Montes’ algorithm for factoring polynomials over discrete valuation rings.

If we look at elliptic curves, the story of attacks is even less exciting. There is, to this date, no known generic classical attack against elliptic curves that is better than a space-time traded off version of a brute force search. This is again not because the topic isn’t studied, elliptic curves are one of the most fundamental building blocks of algebraic geometry, and we know them in great depth. In fact, we know them well enough that we can even start to explain this lack of attacks: They are the most generic form of Diffie-Hellman out there.

All in all, this makes our job predicting the future of which algorithm is likely to break and which ones are likely to last, very, very hard. We are not looking at nice, predictable trends, but instead are mostly looking at a process that jumps in huge steps every few decades.

A different view to look at the same trends is to say that a scheme gets more trustworthy every time it survives an attack. From that point of view, attacks that fail teach us something about the scheme itself, adjusting our priors, making it more trustworthy. This is particularly true for attacks that tell us something fundamental about the underlying problem; the more general the attack, the more it can teach us why a scheme is resiliant.

But, now, without further ado, my personal list about how safe I think various approaches to PQC are, together with how familiar I am personally with the space and how much I think it has been studied.

1st Place: Hash-based Signatures

There isn’t much to say about hash-based signatures. They have a security reduction to the properties of the hash function used. Any signature scheme, and pretty much any public key encryption scheme requires a hash function somewhere in its construction, be it to compress the message, act as a random oracle, a key derivation function, or as a one-way function. If we cannot construct a secure hash function, we cannot do cryptography. In fact, if we consistently failed in creating secure hash functions, we would most likely live in a universe where P equals NP.

Hash-based signature schemes have reduction proofs that reduce their security to that of their underlying hash function. As such, hash-based signature schemes are at least as secure as any other asymmetric (or symmetric) cryptographic primitive. They have plenty of drawbacks, but lack of security is not one of them. While I haven’t studied them to great depth, there is also just not much to say about their security. They are secure.

Note that one of the drawbacks that some hash-based signature schemes have is the necessity to keep state (LMS/XMSS). While these schemes are as secure as their hash function if used correctly, the same is not true if the state is not managed correctly, i.e. if one-time-signatures are used more than once. While I have extremely high confidence in the mathematics of hash-based signatures, I also have extremely low confidence in our collective ability to not corrupt state once in a while.

2nd Place: Lattices

It is hard to overstate my confidence in lattices. General lattices, such as used in FrodoKEM, being broken is pretty much all but equivalent to proving P = NP, at which point all cryptography vanishes (since symmetric cryptography reduces to boolean satisfiability very easily), and it is time to find another career.

Lattices feature heavily in arithmetic number theory, as they arise very naturally when studying number fields. As such, lattice algorithms are actually far more central to mathematics than factoring algorithms. The number of problems an efficient lattice reduction algorithm solves is far higher than that of an efficient factoring algorithm. The main reason for that is that lattice problems are the simplest form of Diophantine equation problem, the linear Diophantine equation. You can see an example of this in one of my previous blog posts. This makes lattice reduction one of the most useful algorithm to calculate pretty much about anything in discrete mathematics.

Far from being constrained to just algebraic number theory, they also show up in algebraic geometry, in the description of Abelian varieties over the complex numbers. Or, as it turns out, p-adic numbers, as studied in my PhD thesis. Given how central they are to mathematics, I would be extremely surprised if someone, somehow, found a way to improve on generic lattice reduction. Even when it comes to quantum algorithms, lattice reduction is probably one of the most studied one, and so far, no generic improvement has been found, and several fundamental looking obstructions have been identified.

Lattices, as a mathematical object, have been studied pretty much for the same time as elliptic curves have been, since both arise from the same underlying questions about the circumference of an ellipsis. In this study, certain integrals arise naturally, defining a function that has two periods in the complex plane. In other words, functions that can be seen as defined on the complex numbers modulo a lattice. And the simplest of these functions $\wp$ , obeys a differential equation $\wp'^2 = 4\wp^3 + g_2\wp + g_3$ . In other words, $\wp$ and its derivative define a elliptic curve.

In cryptography, lattices also have been studied about as long as elliptic curve have. First as an attack, due to their mentioned ability to solve Diophantine equations, and soon after as cryptosystem themselves, by increasing the lattice rank to the point that the reduction becomes impossible to compute. The main reason you might not have heard of them before is their generally larger overhead compared to elliptic curves and RSA, making them unappealing in a world where elliptic curves and RSA are unbroken.

But we are not using generic lattices, we are specifically using module lattices. Those are the lattices coming from number field orders. A number field is a field extension of $\mathbb{Q}$ (such as adding the imaginary unit i to the rational numbers), and an order in such a number field is a generalization of the integers (such as adding the imaginary unit i to the integers, to obtain the number field order called the Gaussian integers). These number field orders are canonically lattices themselves, and any finitely generated module (I.e. vector space, but for rings) over them is again a lattice in a canonical way.

If there is a break of ML-KEM or ML-DSA, my money would be on exploiting this additional structure. However, even when it comes to this additional structure, it is very well understood and studied.

Looking at MLWE and NTRU specifically, both problems are deeply related to the p-adic rational reconstruction problem. In the case of MLWE, we need to switch to RLWE, but a number field order can be seen as a module over an order of some subfield, so this doesn’t really change the picture all that much.

So what is the rational reconstruction problem? Recall that, in order to attack LWE, we needed to find $s, e$ such that $As + e = t \, \text{mod} \, q$ , which mainly boils down to describing the kernel, the solutions to $As + e = 0\, \text{mod} \,q$ . For RLWE (or indeed, for NTRU), we need to switch to a number field order, which we mainly do by replacing the capital $A$ with a lower case $a$ . We can, of course, without much consequence, switch the sign of the error term, and write $as - e = 0 \, \text{mod} \, q$ , for the lattice we need to reduce. With a slight reordering, this is equivalent to $a = e/s \, \text{mod} \, q$ . Since $e$ and $s$ are small in some metric, this means that what we are asking is given a fraction with bounded numerator and denominator, which is only known modulo some ideal (or more generally a number of finite places), find the numerator and denominator.

We all know this problem when we replace the finite places with infinite places, especially over $\mathbb{Q}$ , albeit usually less dressed up in formal mathematics lingo: This is the question of which fraction fits best with some given limited precision decimal expansion, such as the question of whether an output of 1.666 came from an actual result that was 5/3, or 1666/1000.

This problem (over finite places, i.e. modulo a prime) arises relatively naturally when studying number fields, and the only way we know for solving it is lattice reduction.

This is a very common pattern in arithmetic number theory, you usually take problems that arise there and reformulate them until you can express them as a lattice problem, and then proceed to reduce the lattice when the number field is small enough. The opposite, where you can use the number theoretic properties of the number field to say something about a lattice without reducing it on the other hand is very rare.

That being said, we are not using a random number field when it comes to lattice cryptography, but a fairly small set of very specific ones, which have properties that are not usually encountered in many number fields, such as having a class number of 1, and an easy to calculate group of units (up to some finite cofactor easy to calculate, that is, but still this is usually a hard lattice problem for a random number field, but is easy for the cyclotomic fields heavily ramified over 2 that we want for our cryptographic purposes).

That being said, even with these blemishes, when it comes to module lattice cryptography, we are talking about a very well understood and explored part of mathematics, that should be very safe to use for cryptographic purposes.

3rd Place: Codes

I know a lot less about codes than I do about lattices, I’ve always considered them as the smaller sibling of lattices. Both schemes fundamentally work via underdetermined linear systems, where the solution has certain special properties. Being small in the case of lattices, and having lots of zeroes (i.e. being small in the Hamming metric) in the case of codes. Their construction has many similarities, to the point that code based cryptography can be attacked with the same lattice reduction techniques that lattice cryptography has to deal with. Compared to lattices, codes are far less central to mathematics, but whether that is a good or a bad thing is hard to say. But really, I haven’t studied codes to any necessary detail to have much of an opinion on them, other than that they are fine, probably, at least as long as lattices are fine. They are also less efficient than lattices in pretty much all of their instantiations, and at least I do not know how to think of them as a more general mathematical problem (akin to the p-adic rational reconstruction problem that governs MLWE/NTRU).

4th Place: Isogenies

Now to a bit of a controversial placement: Isogenies. What, even though SIKE was broken? Yeah, well obviously I don’t place SIKE at 4th place, it’s somewhat lower, right above Vigenère ciphers, and only because the attack is more interesting.

SQISign on the other hand is a different story. The main reason to place it ever so slightly above multivariate cryptography in my opinion is that we much better understand the underlying hard problem and how it relates to the scheme itself.

I am not ashamed to admit that I have a bias towards pretty mathematics, and SQISign does some of the most beautiful mathematics I know of. That being said, the scheme is for now too slow to actually be used in practice, and while it can be reduced to the endomorphism problem, we cannot currently rule out that the endomorphism problem ends up being easy, especially given that it is far less central to mathematics than lattices are. It has been studied somewhat extensively, though, but I am somewhat worried that the best experts on the endomorphism problem in algebraic geometry are just now slowly even learning about the existence of isogeny based cryptography. After all, the SIKE attack is based on a theorem discovered in 1997, and yet wasn’t discovered until 2022, showing a huge gap between academic algebraic/arithmetic geometry and cryptographers working on isogeny based crypto.

5th Place: Multivariate Cryptography

I’ve written a whole series on Unbalanced Oil and Vinegar, probably the most basic of the multivariate schemes. Since then, a new attack has come out, leveraging wedge products. While the attack is far from catastrophic, it also feels very arbitrary, similar to the Kipnis–Shamir attack on Balanced Oil and Vinegar, it seems to me that we are missing something to really have a full understanding of the space.

Humorously enough, even before the paper, I had tried unsuccessfully to attack UOV using wedge products, more precisely I tried to figure out if there is a structure in the cotangent space that can be exploited, so the fact that wedge products were a meaningful attack vector is not surprising per se, but still, if we want to trust UOV, we need to, in my opinion, have a better understanding of what the hard problem here actually is.

It is easy to point to Gröbner bases here, but in my opinion the gap from generic Gröbner basis computation to the specific UOV problem is quite large. While all NP-complete problems necessarily reduce to each other, reducing to a Gröbner basis computation is one of the easier reductions, just like you can reduce a computer program to a boolean circuits satisfiability problem by literally translating the instructions, you can reduce a problem about polynomials to a Gröbner basis computation.

One thing that particularly stands out to me about Multivariate Cryptography is that variations that have tried to reduce the size of the public key ended up broken quite often. To me, there is something missing about fully understanding what makes this problem hard to fully trust it, but my progress in understanding the problem space better has at least given me a glimpse of why basic UOV should be secure.

That being said, realistically, I should place them above isogenies, mostly because we have had more survived attacks in this space, but this my list, and if it doesn’t contain at least one upsetting placement, it wouldn’t be very subjective now, would it?

Bonus: Why RSA and Elliptic Curves both fall together

One question that I got asked recently was why RSA and elliptic curves, while looking so different as cryptosystems, are both susceptible to Shor’s attack, when all these other schemes barely spend a word talking about why Shor’s does not apply to them. While it is true that at first glance, RSA and elliptic curves do look very different, they are actually far more related than one might think, some of it is even already visible in classical attacks.

As I described in my post on why elliptic curves are really the only option for discrete logarithm problems, elliptic curves contain the multiplicative discrete logarithm as a subcase (at least if you allow for stable models). And for multiplicative discrete logarithm problems, we already have the same attacks working on RSA and DLOG. From that perspective it might be less surprising that an attack that is polynomial on RSA also solves ECC.

More concretely, the thing that Shor’s algorithm actually solves is the Abelian Hidden Subgroup problem: Given a group $G$ , a function $f \colon G \to X$ is said to hide the subgroup $H$ of $G$ if $f$ is constant on each coset, but different for different cosets. In particular, if $H$ is a normal subgroup, this means that $f$ is defined and injective on $G/H$ . The hidden subgroup problem is Abelian if the group in question is Abelian. This is a bit of a mouthful, so let’s look at a trivial example first, using $\mathbb{Z}$ as our group and try to hide $3\mathbb{Z}$ as a subgroup. A function would hide this subgroup if it has a different value on the cosets, for example, if the function was just the value of the integer modulo 3. For a slightly more interesting function, which actually meaningfully hides something, we can look at the world of variant Sudoko, where we often see the concept of a modular line or modular mirror or similar, which requires certain digits to have the same residue mod 3 (For example this one or that one). Solving these puzzles is usually done by coloring the corresponding digits in one of three colors, indicating the residue class mod 3. Importantly, it is (at least initially), not known which color corresponds to which residue class, which starts to show why the function is considered hiding this subgroup. Of course, even if you just mapped integers to colors, the hidden subgroup would still be pretty easy to find by anyone who can count to three (and importantly, solving the Sudoko has nothing to do with solving the hidden subgroup problem), but you can imagine that for a larger modulus, this becomes an actually hard problem.

While not necessary, it is very useful to know the classification problem for Abelian groups when looking at this question for Abelian groups in particular. All finitely generated Abelian groups can be written as the product $\mathbb{Z}^r \times \mathbb{Z}/m_1\mathbb{Z} \times \mathbb{Z}/m_2\mathbb{Z}\times \dots \times \mathbb{Z}/m_n\mathbb{Z}$ , where $m_1 | m_2 | \dots | m_n$ . Knowing this means we know very well how, at least in theory, any subgroup of an Abelian group looks like, which is going to make the next bits a bit easier to grasp in their generalities.

Knowing that Shor’s algorithms can solve the Abelian Hidden Subgroup problem, and now knowing what the Abelian Hidden Subgroup problem is, all that is left to do is to show where the subgroup is hiding, for both RSA and elliptic curves. As discussed, elliptic curves are more or less the most generic of all DLOG groups, so we don’t really need to concern ourselves with the intrinsics of how elliptic curves work, and can instead just take a generic group G (and as a bonus, this allows me to use multiplicative notation without feeling dirty). In fact, let’s start with DLOG.

So given two elements $a, b \in G$ , we are looking for $k$ such that $a^k = b$ . Instead of working with G as domain, we use two copies of $\mathbb{Z}/n\mathbb{Z}$ , and define our function $f \colon \mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z} \to G$ as $(m_1, m_2) \mapsto a^{m_1}b^{-m_2}$ . Since $b=a^k$ , this is equal to $(m_1, m_2) \mapsto a^{m_1-k\cdot m_2}$ , i.e. it’s a linear transform on $\mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z}$ followed by a discrete exponentiation.

But the discrete exponentiation is a group isomorphism, so we can basically ignore it for the purposes of hidden groups, since the hidden group definition does not really care about the range of the function to begin with. As a linear function, it is easy to see where $f$ maps to the unit, namely exactly for vectors generated by $(k, 1)$ .

Since $f$ is a group homomorphism, we can use the group isomorphism theorem to know that $f$ is constant on each of the cosets and injective on the quotient, i.e. $f$ hides an Abelian subgroup. Applying Shor’s algorithm, and obtaining a generator of this subgroup, we can recover k, since all elements of this subgroup have the form $(k\cdot m, m)$ .

Reformulating RSA into an Abelian Hidden Subgroup problem is even easier: The security of RSA is build on the attacker not knowing the order of the group, since the order of $\left(\mathbb{Z}/n\mathbb{Z}\right)^\times$ is $\varphi(n) = (p-1)(q-1)$ , from which we can recover n’s factors p and q easily. So how is order finding an Abelian Hidden Subgroup Problem? Just take a random element $a\in \left(\mathbb{Z}/n\mathbb{Z}\right)^\times$ and define $f$ as $f \colon \mathbb{Z} \to \left(\mathbb{Z}/n\mathbb{Z}\right)^\times\;;\;x\mapsto a^x$ . This function has the same result exactly for all the multiples of the order of a, in other words it hides $ord(a)\mathbb{Z}$ as a subgroup of $\mathbb{Z}$ . And the order of an element is always a divisor of the order of a group, so we can use this to find factors of n.

Hidden Subgroup Problems are more general than just this, and are mostly just a framework to restate problems to. In fact, we can restate lattice reduction as a hidden dihedral subgroup problem. But importantly, quantum computers are really good at operating on Abelian groups, but have, at least so far, have not shown any success whatsoever on non-Abelian groups. This does make sense, given their construction, and gives us some data on why lattices have withstood quantum cryptanalytic attacks so far.

Cryptography

ML-KEM Mythbusting

What is this?

There have been some recent concerns about ML-KEM, NIST’s standard for encryption with Post-Quantum Cryptography, related standards of the IETF, and lots of conspiracy theories about malicious actors subverting the standardization process. As someone who has been involved with this standardization process at pretty much every level, here a quick debunking of the various nonsense I have heard. So let’s get started, FAQ style.

Did the NSA invent ML-KEM?

No. It was first specified by a team of various European cryptographers, whom you can look up on their website.

Okay, but that was Kyber, not ML-KEM, did the NSA change Kyber?

No. The differences between Kyber and ML-KEM are pretty minute, mostly editorial changes by NIST. The only change that could be seen as actually interesting was a slight change to how certain key derivation mechanics worked. This change was suggested by Peter Schwabe, one of the original authors of Kyber, and is fairly straightforward to analyze. The reason for this change was that originally, Kyber was able to produce shared secrets of any length, by including a KDF step. But applications usually need their own KDF to apply to shared secrets, in order to bind the shared secret to transcripts and similar, so you would end up with two KDF calls. Since Kyber only uses the KDF to stretch the output, removing it slightly improves the performance of the algorithm without having any security consequences. Basically, there was a feature that turned out to not actually be a feature in real world scenarios, so NIST removed it, after careful consideration, and after being encouraged to do so by the literal author of the scheme, and under the watchful eyes of the entire cryptographic community. Nothing untoward happened here.

Okay but what about maybe there still being a backdoor?

There is no backdoor in ML-KEM, and I can prove it. For something to be a backdoor, specifically a “Nobody but us backdoor” (NOBUS), you need some way to ensure that nobody else can exploit it, otherwise it is not a backdoor, but a broken algorithm, and any internal cryptanalysis you might have will be caught up eventually by academia. So for something to be a useful backdoor, you need to possess some secret that cannot be brute forced that acts as a private key to unlock any ciphertext generated by the algorithm. This is the backdoor in DUAL_EC_DRBG, and, since the US plans to use ML-KEM themselves (as opposed to the export cipher shenanigans back in the day), would be the only backdoor they could reasonably insert into a standard.

However, if you have a private key, that cannot be brute forced, you need to have a public key as well, and that public key needs to be large enough to prevent brute-forcing, and be embedded into the algorithm, as a parameter. And in order to not be brute forceable, this public key needs to have at least 128 bits of entropy. This gives us a nice test to see whether a scheme is capable of having cryptographic NOBUS backdoors: We tally up the entropy of the parameter space. If the result is definitely less than 128 bits, the scheme can at most be broken, but cannot be backdoored.

So let’s do that for ML-KEM:

This is the set of parameters, let’s tally them up, with complete disregard for any of the choices being much more constrained than random integers would suggest (actually, I am too much of a nerd to not point out the constraints, but I will use the larger number for the tally).

Degree of the number field: 8 bits (actually, it has to be a power of two, so really only 3 bits)
Prime: 12 bits (actually, it has to be a prime, so 10.2 bits (Actually, actually, it has to be a prime of the form $k\cdot n + 1$ , and it has to be at least double the rank times degree, and 3329 is literally the smallest prime that fits that bill))
Rank of the module: 3 bits (well, the rank of the module is the main security parameter, it literally just counts from 2 to 4)
Secret and error term bounds: 2 + 2 bits (really these come from the size of the prime, the module rank, and the number field degree)
Compression strength: 4 + 3 bits

In total, this gives us 34 bits. Counted exceedingly generously. I even gave an extra bit for all the small numbers! Any asymmetric cryptosystem with a 34 bit public key would be brute forceable by a laptop within a few minutes, and ML-KEM would not be backdoored, but rather be broken. There is no backdoor in ML-KEM, because there simply is no space to hide a backdoor in ML-KEM.

And just to be sure, if you apply this same counting bits of parameters test to the famously backdoored DUAL_EC_DRBG, you indeed have multiple elliptic curve points defined in the standard without any motivation, immediately blowing our 128 bits of entropy budget for parameters. In fact, it would be trivial to fix DUAL_EC_DRBG by applying what’s called a “Nothing up my sleeves” paradigm: Instead of just having the elliptic curves points sit there, with no explanation, make it so that they are derived from digits of π, e, or the output of some hash function on some published seed. That would still not pass our test, but that is because I designed this test to be way too aggressive, as the remarks in the brackets show, there is not really any real choice to these parameters, they are just the smallest set of parameters that result in a secure scheme (making them larger would only make the scheme slower and/or have more overhead).

So no, there is no backdoor in ML-KEM.

But didn’t NIST fail basic math when picking ML-KEM?

No. In fact, I wrote an entire blog post about that topic, but “no” is an accurate summary of that post.

I thought ML-KEM was broken, something about a fault attack?

There are indeed fault attacks on ML-KEM. This is not super surprising, if you know what a fault attack (also called glitch attack) is. For a fault attack, you need to insert a mistake – a fault – in the computation of the algorithm. You can do this via messing with the physical hardware, things like ROWHAMMER that literally change the memory while the computation is happening. It’s important to analyze these types of failures, but literally any practical cryptographic algorithm in existence is vulnerable to fault attacks.

It’s literally computers failing at their one job and not computing very well. CPU and memory attacks are probably one of the most powerful families of attacks we have, and they have proven to be very stubborn to mitigate. But algorithms failing in the face of them is not particularly surprising, after all, if you can flip a single arbitrary bit, you might as well just set “verified_success” to true and call it a day. Technically, this is the strongest form of fault, where the attacker choses where it occurs, but even random faults usually demolish pretty much any cryptographic algorithm, and us knowing about these attacks is merely evidence of an algorithm being seen as important enough to do the math of how exactly they fail when you literally pull the ground out beneath them.

But what about decryption failure attacks? Those sound scary!

ML-KEM has a weird quirk: It is, theoretically, possible to create a ciphertext, in an honest fashion, that the private key holder will reject. If one were to successfully do so, one would learn information about the private key. But here comes the kicker: The only way to create this poisoned ciphertext is by honestly running the encapsulation algorithm, and hoping to get lucky. There is a slight way to bias the ciphertexts, but to do so, one still has to compute them, and the advantage would be abysmal, since ML-KEM forces the hand of the encapsulating party on almost all choices. The probability of this decapsulation failure can be computed with relatively straight-forward mathematics, the Cauchy-Schwartz inequality. And well, the parameters of ML-KEM are chosen in such a way that the actual probability is vanishingly small, less than $1:2^{100}$ . At this point, the attacker cannot really assume that they were observing a decapsulation failure anymore, as a whole range of other incredibly unlikely events, such as enough simultaneous bit flips due to cosmic radiation to evade error detection are far more likely. It is true that after the first decapsulation failure has been observed, the attacker has much more abilities to stack the deck in their favor, but to do so, you first need the first failure to occur, and there is not really any hope in doing so.

On top of this, the average ML-KEM key is used exactly once, as such is the fate of keys used in key exchange, further making any adaptive attack like this meaningless, but ML-KEM keys are safe to use even with multiple decapsulations.

But wasn’t there something called Kyberslash?

Yeah. It turns out, implementing cryptographic code is still hard. My modest bragging right is that my implementation, which would eventually morph into BoringSSL’s ML-KEM implementation, never had this problem, so I guess the answer here is to git gud, or something. But really, especially initially, there are some rough edges in new implementations as we learn the right techniques to avoid them. Importantly, this is a flaw of the implementation, not of the mathematics of the algorithm. In fact, the good news here is that implementationwise, ML-KEM is actually a lot simpler than elliptic curves are, so these kinds of minor side channel issues are likely to be rarer here, we just haven’t implemented it as much as elliptic curves.

Okay, enough about ML-KEM, what about hybrids and the IETF?

Okay, this one is a funny one. Well funny if you likely deeply dysfunctional bikeshedding, willful misunderstanding, and drama. First of, what are hybrids? Assume you have two cryptographic schemes that do the same thing, and you distrust both of them. But you do trust the combination of the two. That is, in essence, what hybrids allow you to do: Combine two schemes of the same type into one, so that the combined scheme is at least as secure as either of them. The usual line is that this is perfect for PQC, as it allows you to combine the well studied security of classical schemes with the quantum resistance of PQC schemes. Additionally, the overhead of elliptic curve cryptography, when compared with lattice cryptography, is tiny, so why not throw it in there. And generally I agree with that stance, although I would say that my trust in lattice cryptography is pretty much equal to my trust in elliptic curves, and quite a bit higher than my trust in RSA, so I would not see hybrids as absolutely, always and at every turn, superduper essential. But they are basically free, so why not? In the end, yes, hybrids are the best way to go, and indeed, this is what the IETF enabled people to do. There are various RFCs to that extent, to understand the current controversy, we need to focus on two TLS related ones: X25519MLKEM768 aka 0x11EC, and MLKEM1024. The former is a hybrid, the latter is not. And, much in line with my reasoning, 0x11EC is the default key exchange algorithm used by Chrome, Firefox, and pretty much all other TLS clients that currently support PQC. So what’s the point of MLKEM1024? Well it turns out there is one customer who really really hates hybrids, and only wants to use ML-KEM1024 for all their systems. And that customer happens to be the NSA. And honestly, I do not see a problem with that. If the NSA wants to make their own systems inefficient, then that is their choice. Why inefficient? It turns out that, due to the quirks of how TLS works, the client needs to predict what the server will likely accept. They could predict more things, but since PQC keys are quite chonky, sending more than one PQC key is making your handshakes slower. And so does mispredicting, since it results in the server saying “try again, with the right public key, this time”. So, if everyone but the NSA uses X25519MLKEM768, the main effect is that the NSA has slower handshakes. As said, I don’t think it’s reasonable to say their handshakes are substantially less secure, but sure, if you really think ML-KEM is broken, then yes, the NSA has successfully undermined the IETF in order to make their own systems less secure, while not impacting anyone else. Congratulations to them, I guess.

But doesn’t the IETF actively discourage hybrids?

No. To understand this, we need to look at two flags that come with TLS keyexchange algorithms: Recommended and Mandatory To Implement. Recommended is a flag with three values, Yes, No, and Discouraged. The Discouraged state is used for algorithms known to be broken, such as RC4. Clearly ML-KEM, with or without a hybrid, is not known to be broken, so Discouraged is the wrong category. It is true that 0x11EC is not marked as Recommended, mostly because it started out as an experimental combination that then somehow ended up as the thing everybody was doing, and while lots of digital ink was spilled on whether or not it should be recommended, nobody updated the flag before publishing the RFC. (technically the RFC is not published yet, but the rest is pretty much formality, and the flag is unlikely to change) So yes, technically the IETF did not recommend a hybrid algorithm. But your browsers and everybody else is using it, so there is that. And just in case you were worried about that, the NSA option of MLKEM1024 is also not marked as recommended.

Edited to add (2026-06-30): In the meantime, the IETF has ended up updating the IANA registry flipping the bit for 0x11EC to recommended. Pure MLKEM1024 continues to not be recommended.

Lastly, Mandatory To Implement is an elaborate prank by the inventors of TLS to create more discussions on mailing lists. As David Benjamin once put it, the only algorithm that is actually mandatory to implement is the null algorithm, as that is the name of the initial state of a TLS connection, before an algorithm has been negotiated. Otherwise, at least my recommendation, is to respond with this gif

whenever someone requests a MTI algorithm you don’t want to support. The flag has literally zero meaning. Oh and yeah, neither of the two algorithms is MTI.

Edited to Add: Bonus Rounds

I got a few additional questions after this blog post was published, so I’m adding them here.

But won’t IETF/browser/server support open us up to downgrade attacks?

No. TLS key negotiation is robust against downgrade attacks. If your browser is some 10 years out of date, it might be vulnerable to downgrade attacks to SSL3, but in that case, I would also not worry about PQC.

In more technical detail, both the client’s list of supported ciphers and the server’s pick of cipher are added to the key derivation, so any manipulation of these choices will not result in a successfully negotiated TLS 1.3 key. TLS 1.3’s downgrade protections are even more fun, a server that could support TLS 1.3, but is negotiating TLS 1.2 needs to set a specific server random, to signal to the client that it would really have preferred TLS 1.3. Combined, this protects TLS 1.3 key negotiations, and so adding additional key agreements that neither the clients nor the server prefers will never be picked.

But why does the NSA not like hybrids?

This is a point I initially left out because all I can tell you is what the NSA says are their reasons to not like hybrids, and evaluate whether or not they are sensible, but I cannot tell you whether these are the actual reasons.

When it comes to their stated reasons, we have basically three that come up again and again:

Fear of combinatorial explosion leading to interoperability issues.
Lack of need due to using double encryption already.
A wish to signal to the rest of the world (including, apparently, some parts of the US government itself), that the NSA trusts lattice cryptography.

When it comes to the first point, their assessment is somewhat half right: We have seen a huge combinatorial explosion because people desperately wanted support for their favorite classical scheme and their favorite PQC scheme, leading to somewhat ridiculous numbers of hybrids defined by the LAMPS, CFRG and PGP working groups. However, TLS key negotiation has not fallen into this trap. There is only one hybrid algorithm, and everybody seems to like and use it. Note that CNSA2, the guidance by the NSA that favorites not using hybrids, well predates even the first experiments with Kyber that eventually morphed into 0x11EC.

For the second point, it is important to know that apparently, in case of very sensitive information, it already is common practice for intelligence operations to encrypt twice, with stacks implemented by completely separate vendors. This makes sense, it protects both against insider threats and against implementation mistakes, albeit at a substantial performance penalty (I don’t know if they just don’t have performance critical super sensitive stuff). With this capability already being present, you can create a hybrid cipher by having one layer classical and the other one PQC, and do not need a hybrid key exchange.

The last point is something that is more between the lines and has come up in some more informal conversations: The NSA needs to show that they, and the oodles of mathematicians working for it, actually trust the PQC algorithms. While for any other organization using hybrids might just seem like a prudent choice, if the NSA does so, we would rightfully question whether they actually trust PQC enough for it to be deployed at all. By mandating the use of pure lattice based cryptography, but only for their own systems, they show that they are willing to put their money where their mouth is. In the end, this means that if lattice cryptography/ML-KEM turns out to be broken it is their systems only that are at risk.

As I said, these to me are reasonable arguments, but they of course do not mean that they are the actual motivations of the organization. Also keep in mind, that, like any large organization, there is not necessarily a single NSA opinion, but rather just different individuals that have different motivations

But could you add a backdoor to MLWE, if you wanted to, and how would that look like?

Okay, nobody asked this, but I’m going to answer it anyways, since this is my blog post.

MLWE is the underlying mechanism behind ML-KEM, the more general mathematical description of the problem instead of the very concrete description of which bytes go where. And if you wanted to backdoor it, you actually could: I’m going to describe this for RLWE, the ring version of MLWE, the principle is the same, but for rings it’s easier to write up. The public key $a, t$ for RLWE is computed as $t = as + e$ with $a$ being any number field integer, and $s, e$ being short number field integers. To attack this, one first constructs the lattice consisting of all pairs $(p, q)$ of number field integers that fulfill $0 = ap + q$ , and finds the closest lattice vector in this lattice to a special solution $t = as' + e'$ . The difference of this closest lattice vector from our special solution $(s',e')$ is by definition of closest lattice point, short, and is equal to the original secret (or at least equivalent in power to it). The security of RLWE comes from the fact that this closest vector computation is hard for a randomly chosen $a$ . So if you wanted to backdoor RLWE, all you need to do is to not chose $a$ at random, and first chose short $p, q$ such that $a = -q/p$ modulo the prime. To any external party, it would be impossible to distinguish this type of $a$ from a genuinely created one, and the attacker could break any public key created from it (In fact, this is exactly how NTRU, a slightly different lattice based cryptography scheme, works).

So if you wanted to backdoor an MLWE type algorithm, all you would need to do is to specify a single matrix $A$ (the higher dimensional equivalent of our integer $a$ of RLWE) to be used for all keygen. This would be secure if the matrix was created randomly, and it would be impossible to distinguish a randomly created matrix from one containing the backdoor, unless all of lattice cryptography was broken, unless a nothing-up-my-sleeves transform was used, such as hashing a public seed to the matrix.

But: This would fail the test I outlined above; The matrix $A$ has 4 to 16 number field integer elements modulo (3329) (the ML-KEM prime), each having 256 elements (the number field degree), so such a parameter would add whopping 12288 to 49152 bits to our tally, showing that our test would successfully detects this kind of problem.

In case you wondered, the way ML-KEM choses it’s matrix is simple: Each public key gets their own, random matrix. Only the person running the keygen could potentially backdoor the matrix, and they have the private key anyways. And actually, they cannot even do that, since ML-KEM compresses the matrix by simply giving a derivation seed as the public key.

But are cryptographic NOBUS backdoors really the only option?

This was a question that I received several times. Fine, I proved that there cannot be a cryptographic NOBUS backdoor, but is that really the only option. In short, yes.

I think what threw people off here is that implementations can have much more sophisticated backdoors, since no test can guarantee the correct response to all queries, only to the queries tested. However, algorithms are different.

Any backdoor you want to insert into an algorithm needs to be inserted in the mathematical description of the algorithm itself, since anyone can chose to implement a given algorithm from scratch and expect it to interoperate. And this leaves you with only two options: A backdoor that theoretically anyone can find and exploit, aka a broken algorithm, or a backdoor that, even if found, can only be exploited by the person who put it there, aka a NOBUS backdoor. The former would make the algorithm unfit for your own use, as any such backdoor is likely to be discovered by interested parties sooner rather than later, given the intense scrutiny these algorithms are put under. The second type has to have some form of secret with enough entropy, otherwise an attacker can simply try all possible values that the backdooring party might possess to find the correct value, in other words, the backdoor would again be of the first type.

Edited to add (2026-07-01): A variant of this argument that I’ve seen is the question of whether there could be a break that only affects a certain percentage of keys (a so called weak keys issue). A state actor in possession of this break could then guard their own implementations against this break by adding some additional code to the key generation algorithm, while still being able to break a non-insignificant amount of traffic. However, this is also not a NOBUS backdoor: Everyone can find this problem by just reading the spec. That means it is only a matter of time until this issue is found, either by other adversaries or by the public. An adversary that only cares about their own traffic, and can highly control every single endpoint producing and consuming this traffic, might indeed not be overly motivated to share such a weakness. However, since the secure algorithm now no longer follows the spec (even if that deviance is not visible to the outside), this adversary can now no longer contract implementations out to a third party, as that third party would implement the original, flawed spec. In fact, as cryptography is usually indistinguishable from random (albeit, in the case of ML-KEM, not uniform random), they could run a completely different algorithm internally, and would not have to care about defensive uses of the standard at all. Even worse, if the adversary does not control all endpoints, or cares about traffic beyond their own, i.e. has any defensive interest in the standard at all, they would now again allow others to break this encryption. In the case where everyone but the adversary uses a hybrid scheme, not much of anything would happen as long as the classical cryptography remains secure. In general, the question of “what if there are some weak keys that only the NSA knows about” is more or less the same as “what if the NSA is pulling an elaborate bait and switch and just saying and contracting a bunch of ML-KEM work, but secretly using something else entirely”.

Cryptography Cryptography (Incomprehensible)

There is no Diffie-Hellman but Elliptic Curve Diffie-Hellman

Post author By Sophie Schmieg
Post date May 23, 2025
No Comments on There is no Diffie-Hellman but Elliptic Curve Diffie-Hellman

When I first learned about Diffie-Hellman and especially elliptic curve Diffie-Hellman, I had one rather obvious question: Why elliptic curves? Why use this strange group that seems rather arbitrary, with its third intersection of a line and then reflected? Why not use, say, the Monster Group? Surely a monster is better equipped to guard your secrets than some curve thing named after, but legally distinct from, a completely different curve thing!

I wouldn’t have a satisfying answer to this question until way into my graduate studies, and the answer makes a perfect blog post for the “Cryptography (Incomprehensible)” category of this blog, so here we go.

Groups

First, as a recap, what do we mean by Diffie-Hellman? Well, we need a group $G$ , and element of this group $g \in G$ with some order $p$ , so $p$ is the smallest positive integer with $g^p = e$ where $e \in G$ is the neutral element of the group. The we take our private key $s \in \mathbb{Z}/p\mathbb{Z}$ and compute our public key as $p = g^s$ . We can now compute a shared secret with some other public key $q = g^r$ as $K = q^s = g^{rs} = g^{sr} = p^r$ .

This should work for any group $G$ , no? So why doesn’t it work for the monster? Let’s take a closer look at the map that computes public keys from private keys (in fact, the rest of this blog will only focus on that map, the whole key exchange thing doesn’t really matter much, you could also compute Schnorr signatures or whatever). Given any group element $g \in G$ , we can always create the exponentiation map $\mathbb{Z} \to G\colon k \mapsto g^k$ . This map is a group homomorphism and is not necessarily injective. In fact, for finite groups, it is never injective. There is an old trick in algebra, where you can make any group homomorphism injective, by dividing the domain by its kernel. The kernel in question here is the ideal $(p)$ of all multiples of our order $p$ , so setting our key space to $\mathbb{Z}/p\mathbb{Z}$ is a sensible first step with no drawbacks, you don’t really get new keys by allowing all integers. The map is also not necessarily surjective, though. In fact, image of the map is exactly $\langle g \rangle$ , the subgroup of $G$ consisting of all multiples of the generator $g$ (This is why we call $g$ the generator, it generates the whole thing).

So we can somewhat simplify our private to public map by looking at $\mathbb{Z}/p\mathbb{Z} \to \langle g \rangle \colon k \mapsto g^k$ . With that our private to public key map is not just a group homomorphism, but also injective and surjective, in other words, a group isomorphism.

A tangent on finite simple groups

This gives us the first answer to “why not the monster group?”, although it is an answer that leads to even more questions. To get to that answer we have to ask an even deeper question, namely, what groups even are there? We want to work on a computer, so finite would be nice, and we have heard of small subgroup attacks, so something without normal subgroups, please. (If you don’t know what a normal subgroup is, it’s things you can divide out of a group). A finite simple group, in other words.

One of the most impressive results in 20th century mathematics has to be the classification of finite simple groups. The (not quite) ultimate answer to the question of “what groups even are there?”

To classify groups, we need to figure out what we mean by two groups being the same. Clearly, the numbers on a clock and $\mathbb{Z}/12\mathbb{Z}$ are the same thing, even if the clock uses Roman numerals. So simple relabeling of group elements should not count as a separate thing. In other words, we want to classify finite simple groups up to isomorphism.

The answer to this classification problem is absolutely mind boggeling: First, there are infinite families of groups. For example $C_p$ , the cyclic groups with prime order. Another such family are the alternating groups, from $A_5$ onwards. There are 16 more such infinite families, the others being said to be of “Lie type”. And then there are 26 groups that just, sort, of hang out by themselves. They are not part of an infinite family, they just exist, and there is no way to make them bigger or anything. The monster group mentioned above is the largest of them, and I totally did not chose it in the intro because it has a cool name and let’s me go on a tangent about sporadic groups.

But with this classification out of the way, we can now spot a problem (we could have spotted the problem before stating the classification result, but come on, it’s such a cool result!). Namely that, in order to even answer the question “what groups are there”, we had to say that we want to look at groups up to isomorphism. But our private to public map is a group isomorphism! To a group theorist, the idea that $\mathbb{Z}/p\mathbb{Z}$ somehow differs from $\langle g \rangle$ is preposterous, they both clearly are just $C_p$ the cyclic group of order p!

Group Objects

What where we talking about again, oh right Diffie-Hellman. So with the language of category theory under our belt, we can now point exactly to why “any group should works” is not working. As long as we are only looking at groups, we have to look at groups up to isomorphism (otherwise relabeling things would be a different thing), but our private to public key map is a group isomorphism. We are not looking at the right category. Diffie-Hellman does not work with groups.

We have to go deeper.

Enter group objects. I love this topic. So much in fact that I’ve written an entire 90 page rambling novel about it once and then people said I am now allowed to call myself a doctor.

The relevant part is in chapter 3, wedged in there between copious amounts of abstract nonsense.

To go through it in a bit more detail, we have some category $\mathcal{C}$ that we use as our base. We want to make some object $G$ in that category into a group. But remember, this is category theory, so $G$ is not necessarily a set. I mean, it will be a set in any practical situation, but we can pretend that we don’t know that, for some really, really cool math. And you should always pretend that you don’t know something when it leads to some really, really cool math. So the only thing we have is homomorphisms. In the cropped away parts of the chapter, we did narrow things down to locally small categories that have a terminal object, and named that terminal object $0$ . So, if we don’t have elements, how can we describe the group axioms? Well the first one, associativity, is about multiplication. Multiplication is a map, mapping two elements of the group to one element, so we can just say that we require that map to be a homomorphism in $Hom(G \times G, G)$ (Do not ask what $G\times G$ is if $G$ is not a set. It’s something about universal properties, and I promised there would be no universal properties). To define associativity, we need that $(G \times G) \times G$ is the same as $G \times (G \times G)$ . This is due to the universal property thing that we do not talk about, but it gives us this canonical isomorphism, i.e. those two constructions are the same, and they are the same in some very obvious way. Now associativity is just saying that the multiplication morphism, applied twice in one order, is the same as the multiplication morphism, applied twice, in some other order, what you see in the screenshot as the first property. Similarly, commutativity is also just a statement on the multiplication morphism, with yet another canonical morphism, explaining the fourth, optional property.

Way more interesting and cooler is the second property, which defines the neutral element. So how does one define a neutral element, when you try to pretend very hard that you do not have any elements? This is where our terminal object comes in. We designate an element of $e \in Hom(0, G)$ and call it the neutral morphism. Note that 0 is the terminal object, not the initial object, so e is not necessarily unique. In fact, if we take the category of sets as our base (which will give us back the groups we know and love), the terminal object, as discussed above, is the singleton set. A map from the singleton set into a set can be seen as pointing at one specific element in our target set. To put differently, an element of $Hom(0, G)$ in the category of sets is just an element of $G$ . Now we need to describe what makes the neutral element, well, neutral. It should interact with our multiplication morphism in some way. We now perform the most miraculous part of our definition. We want to define the neutral property, i.e. that $x \cdot e = x$ , but without mentioning $x$ , and instead phrase it as a series of maps. Let’s stay with $x$ for a bit, though, to come up with a battle plan. Since we want to talk about $x \cdot e$ , we want to make a statement about $m(x, e)$ , using the prefix notation instead of the more familiar infix. So we need to construct $(x, e)$ , ideally starting from $x$ , so we can have a nice equality in the end. So starting from a single element, we want two. There is a way to do this, by using the diagonal morphism $G \to G\times G \colon x \mapsto (x, x)$ . That’s not quite $(x, e)$ , but it at least has two components. The first component we can leave unchanged, in other words, we apply the identity to it. But how do we get rid of the second component? Well, we can map it to our terminal object 0. There is exactly one way to do that. Now that we are in 0, we can apply our neutral morphism $e$ to get back to $G$ . If we write $0$ for the only element in our terminal set 0, and write $e$ for the neutral element in our group, what we can do is a chain of maps like this: $G \to G \times G \to G \times 0 \to G \times G \colon x \mapsto (x, x) \mapsto (x, 0) \mapsto (x, e)$ . Now we can apply $m$ and expect to end up at the same place. In other words $m \circ (id_G \times (e \circ 0)) = id_G$ . And suddenly, we have described what a neutral element does, without having, you know, elements.

The last property, about the inverse, uses a similar construction, this time with another morphism that maps from the group object to itself, to find the inverse. I leave it to the reader to go through why it has to look this exact way.

Finding our category

And with that, we now finally have the tools to actually answer our original question. What is the thing that we map into and what options do we have for that? We have established that it is not a group, as groups would not be able to distinguish private and public key. We need some additional structure, something that tells us how we can operate on the elements of our thing, without saying that the $g \cdot g = g^2$ and nothing else. We still want something that behaves like a group, though, since the whole stuff with agreeing on keys and stuff fundamentally relied on there being some group things going on. We need a group object. But what is our category $\mathcal{C}$ we use as a base? It can’t be SET, since that would lead straight back to groups. We probably want something with finite objects, because computers aren’t very good at dealing with things that don’t fit into their little memories and disks and registers and stuff. We do want to be able to compute at the very least the multiplication morphism $m$ . We will need a tiny amount of handwaving, but I think it is not too hard to see that taking tuples of elements of finite fields as a base might be a good idea. Computations on these tuples would involve addition, multiplication, typical field stuff. Adding and multiplying elements of a field? It kind of looks like we want our morphisms to be polynomials! And, as it turns out, there is a category just like that, the category of algebraic varieties. The category of some tuples of finite field elements, where you map stuff around by applying polynomials. So our public key space should be the group objects in the category of algebraic varieties, usually called algebraic groups.

We have classified those. There are two main types, the linear groups and the abelian varieties. You can multiply them together, too but we are really interested in simple groups, so we want them in their most basic forms. First, the linear groups. They come in two flavors, the additive group $\mathbb{G}_a$ and the multiplicative group $\mathbb{G}_m$ . The additive group would work for Diffie-Hellman, if it wasn’t for this very efficient algorithm that can solve the discrete log problem in this group, named “division”. The multiplicative group we set aside and get to in the next chapter. That leaves us with abelian varieties. Varieties have an important invariant, called the dimension. So what are the abelian varieties of dimension 1? Elliptic curves. And they work great. We could go to higher dimensions (and in fact people have explored this), but it turns out you don’t really get anything but headaches from that. They result in secure constructions, but point counting is hard, and you get these “special divisors” which make constant time programming a nightmare, and get nothing in return. So that is why elliptic curves, even though at first glance looking like a quite arbitrary and odd choice for a “group” are actually pretty much the only choice there is.

What about finite field Diffie-Hellman?

This only leaves the multiplicative group. It showed up in our classification, and you probably know that it has been used in cryptography (under the name finite field Diffie-Hellman), so shouldn’t the title of this blog post be “There is no Diffie-Hellman but Elliptic Curve Diffie-Hellman and also Finite Field Diffie-Hellman”?

I mean yeah, but it would be an awful title, way less clickbaity, so let’s end this blog post by showing that Finite Field Diffie-Hellman is also just Elliptic Curve Diffie-Hellman. Well “Elliptic Curve” Diffie-Hellman, but I’m getting ahead of myself.

We’ve classified a lot of objects in this blog post, so what about one more? How can we describe all elliptic curves (up to isomorphism)?

To do this, we need the so called j invariant, defined as $j = 1728 \frac{4a^3}{4a^3 - 27b^2}$ , for an elliptic curve given by $Y^2 = X^3 + aX + b$ . Two elliptic curves are isomorphic (over the algebraic closure) if and only if they have the same j invariant.

How many j invariants are there? In other words, does every element of the base field show up? It turns out, yes. Given an element $j$ , if $j = 0$ , we can use the curve $Y^2 = X^3 + 1$ . For $j = 1728$ , we can use the curve $Y^2 = X^3 + X$ . For all other cases, we can just use $Y^2 = X^3 + \frac{4(1728 - j)}{27j}X + \left(\frac{4(1728 - j)}{27j}\right)^2$ . You can check that this will have the right j invariant.

In other words, two elliptic curves are isomorphic (over the algebraic closure) if they share the same j invariant, and there is one j invariant for every element of our base field. So we can think of our elliptic curves as points on the affine line, and somehow, the collection of all elliptic curves, themselves behaves like an algebraic variety. (This is an extremely powerful concept called the moduli space). Affine varieties always feel like they are incomplete, you really want to see if there are any points of infinity you could add to them. The affine line in particular really wants to be a projective line, and add its one point at infinity it is due. And it turns out, we can do this in a sensible way (constructing the moduli of stable curves of genus 1). This point at infinity? It belongs to the curve $Y^2 = X^3 - X^2$ . It kind of looks like an elliptic curve at first glance. Sure it’s not in Weierstraß form, but you can rectify that and give an isomorphic curve that has that form (I’m too lazy to do that right now). However, this is not an elliptic curve. In order for a curve to be elliptic, it needs to not have any singularities, and this curve has, …, well, here is a plot of it:

When a curve intersects itself in this awkward way, that’s a singularity. But other than that, it is perfectly fine. In fact, you can even use the very same formulas as you use for elliptic curves to do point addition and everything, just stay away from that singularity at $(0, 0)$ . And when you do that, you notice something strange. You can simplify the formula. It’s a bit on the lengthier side of algebra when doing it by hand the long way around, but very much doable. And when you simplify it, you notice that the points on the nodal curve (save for the singularity) can be described by a single non-zero field element, and that adding two points is the same as multiplying the corresponding field elements. In other words, this curve, the somewhat awkward reject elliptic curve, gives us the multiplicative group. Finite field Diffie-Hellman is also just Elliptic Curve Diffie-Hellman, just over a curve that accidentally twisted itself a bit. And we found that curve naturally, trying to hang out with its elliptic curve friends on the same moduli space (as opposed to the cuspial curve $Y^2 = X^3$ , with its way nastier singularity, which happens to give you the additive group instead, and does not hang out on this moduli space).

And so, far from being arbitrary, there was never an actual choice for Diffie-Hellman. There is only Elliptic Curves.

Tags ellipticcurves, incomprehensible

Cryptography Standardization

How not to format a private key

Post author By Sophie Schmieg
Post date February 19, 2025
No Comments on How not to format a private key

Last time I had an issue with standardization going into the, in my opinion, wrong direction, I wrote a blog post about it. Much to my own surprise, that actually worked, and as a reward you get more blog posts about somewhat convoluted standardization issues.

In order to entice you to at least skim the background section, here a bit of a teaser: Do you like undefined behavior? Do you enjoy two standard conforming implementations agreeing on all input bytes, and all entropy the system produced, but disagreeing on the output bytes? Then you might be happy with the proposed “compromise”. Otherwise, read on to find out what this is all about.

The Problem

The current conundrum is about the format to use for the private key of ML-KEM/ML-DSA when using these keys for certificates. It turns out that NIST left a tiny hole in their definition of these formats, which now causes some considerable problems. When defining the key generation algorithm of ML-KEM, NIST added this statement in the standard:

Excerpt of FIPS 203, describing that the seed used to generate a key can be used instead of the expanded format also described in the standard.

Similarly, for ML-DSA, there is the almost identical statement:

Screenshot of FIPS 204, slightly edited to remove page break

This means there are fundamentally, and NIST approvedly, two different representations for ML-KEM/ML-DSA private keys, in that you can either store the random seed used as input for the respective key generation algorithm or store the output of that algorithm.

Before we watch how we, as a community, managed to turn this rather benign issue into an absolute mess, let’s look at the two private key formats NIST has defined a little bit more closely.

Seeds

First, there are the seeds, 64 bytes in the case of ML-KEM, and 32 bytes in the case of ML-DSA. Aside from being really small, especially when compared to anything else in PQC, seeds have a number of other advantages. What the key generation does, in both cases, is to feed this seed into a number of cryptographic hash functions and eXtensible Output Function (XOF), basically a cryptographic hash function that produces any number of output bytes, instead of a fixed number as would be expected from a hash function. These outputs are then used to generate the actual key material.

This has a number of advantages. I already mentioned their compact size, but another property is that it is impossible to create a malformed key out of a seed. Since the seed is the input in hash functions and XOFs, the actual key material derived from it will depend on every bit of the seed equally, and changing a single bit changes the key in its entirety, there is no way of independently chose parts of the private key. This is why seeds defend against the attacks outlined in Kemmy Schmidt, and their main advantage over other formats (small caveat here is that for ML-KEM, the seed actually consists of 2 separate, concatenated seeds, and the format still allows for one of the two Kemmy Schmidt attacks. We could avoid that by deriving the ML-KEM seed from an even higher level seed, but NIST opted not to do that). Module Lattice schemes in particular have private keys $(s, e)$ such that the public key is $t = As + e\,\text{mod}\, q$ . s and e have to be drawn from a specific distribution in order for the key to be secure, with the key generation algorithm describing how to do so, meaning that the seed enforces this specific distribution, which can not easily be proven to be case afterwards.

But also, like with any cryptographic hash function, there is no going back. The seed is high entropy, so there is no way of reversing the hash functions and XOFs, and the only way to get the seed later is by saving it.

Semi-Expanded Private Keys

So if seeds are so great, why aren’t we using them everywhere? For elliptic curves, the reason is that the private key is extremely simple. Any integer in the correct range is a valid private key, so there isn’t really much of a key generation algorithm. The parameters for popular curves are set to be very close to powers of two, so there isn’t even much rejection sampling to be done. X25519 goes even further and simply ignores the extremely small bias introduced by not rejection sampling and declares every 32 byte string to be a valid private key.

For RSA, the situation is the other way around. RSA key generation has to compute a substantial amount of variables, including rejection sampling not one, but two large primes. Primality testing is fairly expensive, and numbers in this range (for RSA-2048) are only prime with a probability of roughly 1 in 700 (though this improves if you don’t test even numbers). All in all, the RSA key generation algorithm is painfully slow, and you really want to avoid running it from scratch every time you load a key, which has lead to RSA storing the results of the key generation algorithm as a cached computation. That being said, there would be a way to add seeds to RSA, by only noting down the succeeding roles for p and q, but all in all this ship has sailed, circumnavigated the globe several times, and is about to be decommissioned.

This brings us to the semi-expanded key formats of ML-KEM/ML-DSA. As I mentioned, we need to sample some parts of the private key from a specific distribution, but unlike with RSA, this distribution is rather simple (centered binomial in ML-KEM’s case, uniform in ML-DSA’s case). In particular, no rejection sampling of the private key material takes place, as the parameters are chosen in such a way that everything works out with powers of two. This makes ML-KEM’s/ML-DSA’s key generation algorithm very fast, and not much of a burden to run on load. In fact, the slowest part of the key generation algorithm is sampling the matrix $A$ . But this matrix is stored as a seed in both the public key and private key format anyways, so you don’t get around that step as is, and this is also the reason why I keep calling this key format semi-expanded. It expanded some aspects of the key, but left others to be recomputed.

As far as I can tell, the only reason this format exists to begin with, is as a compromise to make performance metrics look better in the competition, even though they do not apply in practice.

I discussed how cryptographic primitives mathematically speaking are tuples of functions before, with KEMs in particular being a triple $(G, E, D)$ and signatures being a triple $(G, S, V)$ and expanded how this mathematical definition warps and bends when hitting reality. To understand the difference between performance in competition conditions and performance in real life, we need to again look at the code that would implement such a triple, starting at the same code snippet from the last blog post:

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();
Signature sign(PrivateKey sk, Message m);
bool verify(PublicKey pk, Message m, Signature s);

As you might have noticed when reading that previous blog post, I left some objects undefined in this API. While I specify that messages and signatures are binary strings, I say nothing about the keys, and leave them as generic objects. The competition simply defined these keys as byte strings as well, and used this API as their performance benchmark.

However, when we in practice want to perform multiple operations using the same key material, we would usually use some in-memory representation that fixes things like alignment and similar at the very minimum, so forcing these objects to be byte arrays would be overly limiting for actual implementations. What we really want is a function to parse and serialize objects, as common in object oriented programming, leaving us with this type of API:

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();
Signature sign(PrivateKey sk, Message m);
bool verify(PublicKey pk, Message m, Signature s);

std::vector<uint8_t> serialize_private_key(PrivateKey sk);
PrivateKey parse_private_key(std::vector<uint8_t>  sk);
std::vector<uint8_t> serialize_public_key(PublicKey sk);
PublicKey parse_public_key(std::vector<uint8_t>  sk);

This is the API many libraries use in practice, and with this API, you are usually not very concerned about slight performance regressions in the parsing operations, as they amortize. You are of course not completely isolated from them, so completely regenerating RSA keys is not going to fly, but the rather straight-forward LWE keygen algorithm is not a problem. This leads to a situation where the competition produced a key format that is inferior in practical applications, and a good lesson in “you get what you measure”.

The Options

With the problem now explained to in unnecessary detail, we can finally discuss what is happening at IETF. In case you skip everything above, to recap, we have two formats, one short (seed) and one long (semi-expanded) one, and a one-way map between the two, mapping from seeds to semi-expanded keys, but no way to invert this mapping.

Which brings us to IETF now having to define a key format. The most elegant choice would be to just always use the seed, it is the more general format, and call it a day. But some folks have already started implementing HSMs before the standard was done, and they forgot to add the ability to read seeds. I have to admit, I don’t have particularly large amounts of sympathy for that position, implementing a pre-standard comes with risks, and given that the vast majority of implementations of ML-KEM and ML-DSA, including in HSMs, is yet to be written, we should not bend over backwards to accommodate over-eager adopters. But I do have some sympathy, and there are ways that we could make both sides happy.

The simplest would be to define two separate private key formats, using separate algorithm identifiers (OID) in PKCS8. The algorithm identifier in PKCS8 is shared between private key and public key, so this leads to the rather awkward outcome of having two public keys that use the exact same algorithm using different IDs. I still think that that is a acceptable outcome, and will expand on why in the last section, but for now let us just accept that there was significant pushback to this idea.

The next simple thing would be to have a way to switch between seed and semi-expanded key, using a construction like ASN.1 CHOICE. In other words, have a two value enum as first byte, followed by either the seed or the semi-expanded key, depending on the enum value. This format is somewhat incredibly awkward, but it is workable. It is how we define elliptic curve points for example, which contains an encoding for both compressed and uncompressed points. In all likelihood all that would happen is that we would see seeds winning out in the end as the more flexible option, with the occasional old hardware having to have its keys rewritten by a software layer.

Which brings us to the last option, which prompted the reason for me writing this blog post. Making decisions is hard, so what about a standard that just, like, did not make one? The suggestion is to use a list of two optional fields, one for the seed and one for the semi-expanded key, and leave the interpretation up to the importing library. This has two edge cases, one that is easily fixed, and one that is inherent to this lack of decisiveness. First, the key could be empty. This can be fixed by disallowing that option, and only allow seed, semi-expanded, or both to be present. But this leaves the other bad choice: both. If there is both a seed and a semi-expanded key present, you have not one, but two equivalent keys in your format. And nothing guarantees that they are the same key. Sure you could run the key generation algorithm and check, but if you do that, then why did you include the semi-expanded key to begin with, it is literally the output of this check.

In case you are wondering, NIST agrees with me on this one, and recommends the following check to be run when you have both seed and semi-expanded key available. However, they different from all other consistency checks, they opted to not make this check mandatory, mostly due to private key import being a rare occurrence, if I understand them correctly.

And of course, since the semi-expanded key is superfluous when the seed is present and this seed consistency check is run, the proposal is that the recipient does not perform it, leaving us with implementation defined behavior in the case of an inconsistent key.

On Well-Definedness

So why am I so vehemently opposed to any format that supports both seed and semi-expanded key material to be present in a single key blob without checking consistency? It boils down to the mathematical concept of well-definedness. Funnily enough, the concept of being well-defined is in itself not really well-defined, and depends on the context. In general it means that a definition should be internally consistent, and depending on context, the object defined should actually exist and be unique. For example, “the smallest integer bigger than 3.4” is a well-defined statement. This integer exists and is uniquely determined to be 4, so the definite article in the statement and the superlative are warranted. Compare this to the statement “the smallest real number bigger than 3.4” and you can see a statement that is not well defined. Real numbers are not well-ordered, and for any real number bigger than 3.4 there are infinitely many more real numbers, equally bigger than 3.4 that are smaller than the chosen number. There is no “smallest” number. In other words, when a mathematician says “this statement is not well-defined” they are implying that what is said is simply nonsensical, and not a legal statement to begin with, so it can be considered no further.

But aside from offending mathematicians, for standards there is a very practical concern here: The whole point of a standard is to describe how to interpret something. If two implementations of the same standard having vastly different behaviors would be acceptable, then why did you write a standard to begin with?

Implementation defined behavior has concrete security ramifications. The point of a private key format in particular is to transfer private key material from one system to another, which is already a fairly rare and not well-tested situation. It is easy to imagine one part of a system writing one field, and the other part of the system reading the other. In particular, since the seed is unrecoverable after the fact, a system that only has the semi-expanded key available might, instead of writing NULL in the seed field, write “NULL”, which the other system then prefers over the semi-expanded key that contains the actual key material, both compromising the system (as “NULL” is very brute-forcable) and potentially losing the actual private key forever.

This brings us to the last topic, the aforementioned resistance against using separate OIDs. The reason for this resistance has a historic predecessor, RSA. RSA keys have two different OIDs, one for “RSA” and the other for “RSA encryption”. This causes lots of pain, and with the lens of well-definedness, it is somewhat immediately obvious why: “RSA” is not a well-defined algorithm, but is actually four separate algorithms, and those algorithms aren’t even fully specified by the key format. We have RSA-PKCS1 for signatures, RSA-PKCS1 for encryption, RSA-PSS for signatures, and RSA-OAEP for encryption. The “RSA” key type is used for all four, and while the “RSA encryption” key type is used only for two of them. This is a classic problem of not properly defining your algorithm and then being surprised by the confusion it causes. Compare that to the ML-KEM/ML-DSA situation, where, if we used two separate OIDs, we merely would have an alias, with the explicit definition of the public key being the same for both OIDs. While it would be annoying to write an explicit “handle this case like the other case” everywhere, it would be nowhere near as bad as “handle this case somewhat, but not exactly like that other case”.

Cryptography

Unbalanced Oil and Vinegar, Part III

Post author By Sophie Schmieg
Post date January 8, 2025
No Comments on Unbalanced Oil and Vinegar, Part III

In Part I, we looked at the problem we want attackers of UOV to solve. In Part II we had plenty of oil and vinegar, but did not really discussed the whole unbalanced part of the scheme. So in this third part of this three part series, I will discuss why we are using Unbalanced Oil and Vinegar in particular, and discuss the Kipnis–Shamir attack on the original Oil and Vinegar scheme.

The paper that presented this attack is formulated in the language of matrices, but in my opinion, and showing my biases as a algebraic geometer, one can’t fully understand the problem when in this language, and I will instead present it in the language of bilinear forms, that I have build over the last two blog posts. Using bilinear forms is useful, since it allows us to present things without having to chose an explicit basis, which means that we do not really have to distinguish between the public and the private key all that much.

Preliminaries

Most of this section is a mini course in linear algebra and in particular bilinear forms and dual vector spaces. I will try to also formulate the attack purely in terms of matrices, so if this is going too far for you, you might still follow along with the attack, but as mentioned, using the language of bilinear forms in my opinion gives a deeper understanding.

First, as a quick refresher, as discussed in Part II, a UOV public key is a set of $m$ bilinear forms $(\cdot, \cdot)_i$ , which have the property that there exists a subspace $O\subset X$ such that $(x, y)_i = 0$ for all $x, y \in O, i = 1, \dots, m$ . This subspace $O$ is the private key we are looking for. The attack in question is a key recovery attack working with the public key only, so we don’t really need to repeat how signing works, and we will focus solely on how to recover this subspace given the bilinear forms, with Part II discussing the signature scheme.

We also need some additional vocabulary to talk about bilinear forms. Bilinear forms in positive characteristic behave somewhat weirdly sometimes, so it makes sense to prove stuff from scratch anyways.

First, we need to discuss non-degenerate bilinear forms. A bilinear form is called non-degenerate, if, for all vectors $x \in X, x\neq 0$ there is at least one vector $y \in X$ such that $(x, y) \neq 0$ . If you look at the matrix defining the bilinear form, this is equivalent of saying that this matrix has full rank.

The reason we care about non-degenerate bilinear forms is because they allow us, on a finite dimensional vector space at least, to “transpose” vectors, i.e. turn column vectors into row vectors. If you know quantum physics, what a non-degenerate bilinear form does is turn a ket vector into a bra vector. If you don’t know quantum physics, don’t worry about it, those are just terms the quantum physicists invented because they didn’t know enough about bilinear forms and dual vector spaces.

To understand dual vector spaces, all you need to know here is that linear forms, i.e. linear functions that output a single scalar, themselves also form a vector space over the same field, since you can add linear forms and multiply them with scalars. Conceptually, these are literally just row vectors, as mentioned above, since those can be multiplied with column vectors to give elements of the base field, i.e. row vectors define all the linear forms that exists (for a finite dimensional vector space. Infinite dimensional vector spaces are very, very different, and to take back my comment on quantum physicists a bit, those are what quantum physics deals with, but still, understanding dual vector spaces would make their field a whole lot more approachable). We write $X^*$ for this dual vector space.

To tie this to our bilinear forms, the important thing a bilinear form allows you to do is to “curry”, or partially evaluate it, that is given a bilinear form $(\cdot,\cdot)$ and an element $x \in X$ , we can look at the linear form $(x, \cdot)$ .

And for non-degenerate forms, doing this partial evaluation gives you an extra property, namely that, if $\sum_{i=1}^n \alpha_i (b_i, \cdot) = 0$ , we know that $\sum_{i=1}^n \alpha_i b_i = 0$ as well, since the linear form 0 is defined by being 0 for all vectors in the vector space, and linearity in the first argument means that $\sum_{i=1}^n \alpha_i (b_i, \cdot) = \left(\sum_{i=1}^n \alpha_i b_i, \cdot\right)$ , so by definition of non-degeneracy it follows that $\sum_{i=1}^n \alpha_i b_i = 0$ . What this means is that, if $(b_1, \dots, b_n)$ is linear independent, then $(b_1, \cdot), \dots, (b_n, \cdot)$ is linear independent as well. This proves the statement I made above of finite dimensional vector spaces having the same dimension as their dual, and means that every non-degenerate bilinear form gives us a way of mapping column vectors to row vectors. This might seem like a lot of work for literally just turning a page 90 degrees, but as I said, quantum physicists still pay the price for thinking they could turn papers better all by themselves.

Next, we need to talk about orthogonality. This term really only makes geometric sense in characteristic 0, but we can still use it algebraically: Two vectors $x, y \in X$ are called orthogonal if $(x, y) = 0$ . For a subvectorspace $V \subset X$ we define $V^\perp := \{x \in X\;;\;(v, x) = 0\, \forall v \in V\}$ . With this definition, we can rephrase the trapdoor of our bilinear forms as there being a subspace $O \subset X$ such that $O \subset O^{\perp_i}$ . Basically, a subspace that is perpendicular to itself in all bilinear forms of the public key.

While this is somewhat obviously nonsensically in a geometric way, we can rescue at least one property of perpendicularity into positive characteristic, and that is the dimension of the orthogonal space:

For a non-degenerate bilinear form $(\cdot, \cdot)$ , and a subvectorspace $V\subset X$ , we have $\text{dim}\, V^\perp = \text{dim}\, X - \text{dim}\,V$ . To see this, take a basis of $V$ . For every basis vector $b_i$ , we again look at the linear form $(b_i, \cdot)$ . This gives us $\text{dim}\,V$ many linearly independent linear forms, all of which vanishing on $V^\perp$ . We can consider these forms as a linear function $X \to K^{\text{dim}\,V}$ , and since they are linear independent, we know that this linear function has full rank. Vanishing on all forms makes $V^\perp$ the kernel of this function, and by the kernel-rank theorem, this implies that $\text{dim}\, V^\perp = \text{dim}\, X - \text{dim}\,V$ .

The Attack

With all of this out of the way, we can now look at the attack itself. So in the following, we have $m$ bilinear forms $(\cdot, \cdot)_i$ , defined over an $n$ -dimensional vector space $X$ with $n = 2m$ , i.e. there are as many oil vectors as there are vinegar vectors, and everything is “balanced”. We also have a secret, $m$ -dimensional sub vector space $O\subset X$ such that $O\subset O^{\perp_i}$ for all $i = 1, \dots, m$ , as discussed above. One thing to note here is that $\text{dim}\, O^{\perp_i} = n - m = m = \text{dim}\, O$ , so in this balanced case, and only in this balanced case, we further get $O = O^{\perp_i}$ , since a subvectorspace that has the same dimension as the space is equal to the space.

The main idea behind the Kipnis–Shamir attack is that bilinear forms are hard, but linear functions are easy, and that for two non-degenerate bilinear forms $(\cdot, \cdot)_1$ and $(\cdot, \cdot)_2$ there is exactly one linear function $\varphi_{12}$ such that $(x, y)_1 = (\varphi(x), y)_2$ for all $x, y \in X$ . We know this, because both $(\cdot, \cdot)_1$ and $(\cdot, \cdot)_2$ are non-degenerate, so both can map a basis of $X$ to a basis of the dual vector space $X^*$ , so $\varphi$ is the map given by this base change. More concretely, if $(\cdot, \cdot)_i$ is represented by the matrix $A_i$ , i.e. $(x, y)_i = x^T A_i y$ , then $\varphi$ , as a linear function, is represented by $(A_1A_2^{-1})^T$ , since $(\varphi(x), y) = ((A_1A_2^{-1})^Tx)^TA_2y = x^T A_1 A_2^{-1}A_2y = x^T A_1y$ . If we hadn’t done all this work about bilinear forms, we would now need to add a two line proof that this is invariant under base change. I instead opted for writing several paragraphs worth of linear algebra, which, as surely we will all agree, is far preferable.

So far, so unremarkable. You have a bunch of bilinear forms, you can define this linear function. But now, we can ask what $\varphi(O)$ is. We know that for any vector $x \in O$ , $(x, y)_1 = 0$ for all vectors $y \in O$ , so we know that $(\varphi(x), y)_2 = (x, y)_1 = 0$ as well, so we know that $\varphi(x) \in O^{\perp_2}$ . But as pointed out above, in the balanced case $O = O^{\perp_2}$ , so $\varphi(x) \in O$ . This means that $\varphi(O) = O$ (technically, I have not proven equality, but that follows pretty easily by applying non-degeneracy again). This means $\varphi|_O$ , $\varphi$ restricted to the subvector space $O$ is an endomorphism.

But for a linear function, restricting to a subvectorspace as an endomorphism is the same as saying that this subvectorspace is a product of eigenspaces. And we can compute the eigenspaces of a linear function. By itself that is only moderately helpful, since we don’t know which eigenspaces of $\varphi$ are part of $O$ and which aren’t. But we don’t just have one $\varphi$ , we have a linear function for every pair of bilinear forms. And we can intersect eigenspaces, and find the common eigenspace of multiple linear functions. Two random linear functions are very unlikely to share an eigenspace, and we have $m(m-1)/2$ linear functions sharing an eigenspace. So that common eigenspace is just $O$ . I describe how to compute that shared eigenspace below, but see the paper for more details.

To put it together, the attack algorithm works as follows:

Discard all singular matrices, as they correspond to degenerate bilinear forms.
Compute $A_i A_j^{-1}$ for enough $i, j$ .
Take an arbitrary linear combination of the computed matrices $C = \sum_{i, j = 1}^m \alpha_{ij}A_i A_j^{-1}$
Compute the characteristic polynomial of $C$ and factorize it.
Due to the common eigenspace, the characteristic polynomial will always have a factor of degree m. But it might have more factors. If there are more factors, discard and try a new linear combination. If you do not find such a matrix and factorization, continue at step 7.
By Cayley–Hamilton, the kernel of factor of the characteristic polynomial, evaluated at the matrix, is the eigenspace of the matrix. Since our matrix has a characteristic polynomial which only has two factors, one of the two eigenspaces is the oil space, which we can quickly verify.
Read the paper for why finding such a characteristic polynomial is very likely, so you never reach this step.

A geometric interpretation

I have to admit, when I first read about this attack, I found it extremely devastating. A polynomial time attack on one member of a family of schemes is a very frightening thing to see, and while the attack clearly requires n = 2m to work (Otherwise the whole eigenspace argument falls apart), at least I would like to see some form of an argument why n > 2m somehow is not just mitigating this specific attack, but also fundamentally has a structure that is somehow more complex than the balanced case is.

I think I have found such an argument, but as a fair word of warning, what follows will be some take no prisoners, deep, deep end of the algebraic geometric pool. Well the shallow end of the deep end, but you get my drift. I will try to keep the argument understandable for more general audiences, but you’ll have to take a bunch of things I say at face value. The whole argument is part of a deeper investigation I did on UOV, but that should properly be a different blog post, or maybe a CFAIL paper or something like that.

With the disclaimers out of the way, here we go:

If an algebraic geometer is given m polynomials, even if they are just quadratic forms, our basic instincts kicks in, and we study the geometric object obtained by setting all these polynomials to zero. I will call this the oil-and-vinegar variety, i.e. $S = \text{Spec} K[X]/((X, X)_i)$ . Note that even though these polynomials are homogenous, we need to keep our ~~animalistic~~geometric urges in check and not look at $\text{Proj}$ for now. This variety is a subvariety of $\mathbb{A}^n$ , and since it is given by m polynomials, it, assuming the polynomials are complete intersection, will be $n - m$ dimensional. For the non-geometers: This is similar how having m linear equations creates an n – m dimensional vector space, with “complete intersection” being the equivalent to things being linearly independent, only that stuff can go wrong in a lot more interesting ways, and in general stuff bends a lot more, but importantly, if you take random polynomials, they are very likely to be complete intersection, so if our oil and vinegar polynomials were not, that in itself would be remarkable. It’s unfortunately not really possible to check whether polynomials are complete intersection without computing Gröbner bases, see part I, but when I tried it the tiny cases I could compute Gröbner bases for, things were complete intersection up until n = 2m (for n < 2m, it cannot be complete intersection as we will see). But I didn’t compute how likely complete intersection is in general.

The other thing to notice is that $O$ , seen as a linear subvariety of $\mathbb{A}^n$ , is a subvariety of $S$ , since the quadratic forms, by definition, evaluate to 0 on $O$ . And $O$ is m dimensional. This shows that for n < 2m, $S$ cannot be defined by polynomials in complete intersection, since it’s dimension m is greater than n – m.

So in the balanced case, where n = 2m, $S$ is a m dimensional algebraic variety, with an m dimensional linear subvariety. In other words, an irreducible component of $S$ . The Kipnis–Shamir attack isolates this linear component from the other, nonlinear components of $S$ .

Artist’s impression of the OV-variety for m = 1, n = 2. You can see the linear subspace as irreducible component. Note that this is a polynomial of degree 3, so it is not actually a valid OV-variety, but it would look boring otherwise.

In the unbalanced case however, n > 2m, so $\text{dim}\,S = n - m > m = \text{dim}\,O$ . Which makes $O$ a closed subvariety with codimension greater than 0, i.e. $O$ lies on some, itself non-linear, irreducible component of $S$ . It does at least conceptually make sense to me that there might be an algorithm to tease out a linear irreducible component, but that linear closed subvarieties of non-linear components are much harder to pry away.

Plot of a m = 1, n = 3 OV-variety. There are plenty of linear subspaces to chose from (which would not usually be the case), but they are all located on a single irreducible component which is decisively non-linear. It’s still not the greatest picture, since the plotting software limits me to three dimensions only.

Tags cryptography

Cryptography

Unbalanced Oil and Vinegar Part II

Post author By Sophie Schmieg
Post date December 30, 2024
No Comments on Unbalanced Oil and Vinegar Part II

After Part I looked at the hard problem underlying Unbalanced Oil and Vinegar, it is now finally time to talk about the algorithm itself.

Verify

As with many signature algorithms, looking at the verification routine first is a good idea. The verification algorithm is usually simpler, and gives you an idea of what the signature scheme is trying to do, with the signing algorithm then showing you how it does it.

In the case of UOV, we use a paradigm called “hash and sign”, in many ways the simplest signing scheme paradigm, not using any fancy transforms. All we need is a trapdoor one-way function $f$ , i.e. a function that everybody can evaluate efficiently at any point $x$ , computing $y = f(x)$ , but where, given $y$ , finding an $x$ such that $f(x) = y$ is difficult, unless one knowns some extra information $k$ , in which case it is easy. In other words, there is a function $g$ that is easy to evaluate such that $f(g(k, y)) = y$ .

Given such a trapdoor one-way function, we can construct a signature scheme by simply making $x$ the signature of the (hash of the) message $y$ , i.e. the signature of a message $r$ is the value $x$ such that $f(x) = h(r)$ . The most prominent example of a signature scheme using this paradigm is RSA-PKCS1, where modular exponentiation is the one-way function, and the factorization of the modulus is the trapdoor information making it reversible.

As we discussed in part I, we want to use quadratic forms as our hard problem. At this point, it makes some sense to discuss different ways to write down quadratic forms. They are all interchangeable, and really which one is used is mainly a matter of what is most useful in a given context.

The most straightforward way of writing down a quadratic form is by writing it as a polynomial, i.e. by writing $f = \sum_{i = 1}^n \sum_{j = i}^n a_{ij} X_i X_j$ . Note that the second sum starts at $i$ , since we are operating over commutative fields, so we can always reorder things so that $i \le j$ . This will be useful later. Also note that we only use a homogenous polynomial, i.e. a polynomial without linear or constant components. This is both a simplification and something that does not matter in practice, as you could always homogenize the polynomial.

The second way to notate a quadratic form is using matrix-vector multiplications. If we put our variables $X_i$ into a vector $x = (X_1, \dots, X_n)$ , then $x^T A x$ will evaluate to the exact same polynomial as above, with $A$ being a triangular matrix. We can shift around the coefficients of $A$ that are not on the diagonal, as long as $A + A^T$ remains the same (if the characteristic is not equal to two, which will be a common caveat in this post).

Finally, looking at $x^T A x$ it makes sense to ask what would happen if we did not use the same vector twice, but instead kept two different ones. This gives us $x^T A y = (x, y)_A$ , a bilinear form, i.e. a function that is linear in both arguments and evaluates to a single field element. In other words, given a bilinear form $(\cdot, \cdot)_A$ , we get the associated quadratic form by evaluating at the diagonal. Note that for the bilinear form, the order very much matters, since $x_i y_j \neq x_j y_i$ , so in this case, we can no longer simply assume upper triangular matrices. In fact, usually, when studying quadratic forms, mathematicians restrict themselves to symmetric bilinear forms as the associated bilinear forms, i.e. forms with $(x, y)_A = (y, x)_A$ , corresponding to symmetric matrices $A$ . This gives a one-to-one correspondence between quadratic forms and bilinear forms, and is super convenient, with the slight downside of not working in characteristic 2. As mentioned, this is a more general thing, having to do with the fact that the degree of the polynomial equals the characteristic, something algebraic geometers call the wild case. And guess which base field UOV uses? That’s right $\mathbb{F}_{256}$ . Super nice for computers, since a single byte corresponds to a field element, but really inconvenient for the more theoretical analysis. We can mostly ignore this, but keep in mind that pretty much everything that uses bilinear forms might have an unwritten asterisk that goes into unreasonable detail about the situation in characteristic 2.

With all this on notation out of the way, we can finally write down the public key and the verification algorithm:

A UOV public key consists of $m$ quadratic forms, given as symmetric matrices $A_i$ , of dimension $n \times n$ . A signature of a message $r$ is a pair $(x, s)$ , with $x$ being an $n$ dimensional vector, and $s$ some salt, such that $x^T A_i x = h(s || r)_i$ .

Written this way, it is clear that the generic way of solving this problem would be equivalent of solving the multivariate quadratic equation problem. Indeed, the signature is literally given by a solution to the multivariate quadratic equation problem defined by public key and message. But keep in mind that just because the problem is formulated as a multivariate quadratic equation, that does not mean that you have to use generic solvers to attack it.

Note that, different from RSA, every message has more than one possible preimage, so even without the salt would allow for multiple signatures. By adding the salt we gain some additional security guarantees, in that it no longer matters whether the signer has signed the exact same message before, since with a different salt it results in a completely different challenge anyways.

Sign

It is now finally time to discuss the trapdoor used for UOV. After all, if our public key was completely random, it would indeed be impossible to forge a signature, but it would be equally impossible to actually compute a signature for the signer as well. So we need to first construct a special system of quadratic equations that can be solved easily, and then find a way to transform it into one that hopefully is indistinguishable from a randomly chosen one. We use the matrix notation discussed above for this. In particular, if the matrices $A_i$ have the form $\begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}$ , with $V_i$ being a symmetric $(n-m) \times (n-m)$ matrix, and $M_i$ being a $(n-m) \times m$ matrix. Written in this block fashion, it makes sense to distinguish the first $n - m$ variables, called the vinegar variables from the last $m$ variables, called the oil variables. If we look at what this form of matrix does to the polynomial, we see that having that zero block in the lower right means that while vinegar variables are multiplied with all other variables, oil variables are only ever multiplied with vinegar variables, and never with each other. We also conveniently have exactly as many oil variables as we have equations to solve. And solving the equations is now very easy: First, randomly chose the value of the vinegar variables. Partially evaluating the polynomials, we will now have constant terms (where vinegar variable was multiplied with vinegar variable), and linear terms (where vinegar variable was multiplied with oil variable), but crucially, we do no longer have any quadratic terms, as we did not allow oil variables to be multiplied with other oil variables. In other words, we have $m$ linear equations with $m$ variables to solve. Which we can do by applying Gaussian elimination. This system will not always have a unique solution, but that is reasonably unlikely, and if it doesn’t, we can simply chose a different set of values for the vinegar variables.

This gives us a system of quadratic equations that is easy to solve, and this will be our private key, and what we actually solve when signing.

So how do we switch from this private system of quadratic equations to the public system? Well, with a base change, of course. This is a linear function on our $n$ variables, mixing them around, explaining the name of the scheme. We can write this using a $n \times n$ matrix $B$ , which allows us to fully write down the private key and the signing algorithm:

A UOV private key is a $n \times n$ matrix $B$ such that the public key $A_i$ can be expressed as $B^T A_i B = \begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}$ . To sign, we first compute the hash of the salted message, chose random values for the vinegar variables, giving us $(v, X)$ as preliminary signature, leaving variables for the last m coefficients. We then compute $(v^T, X^T)\begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}\begin{pmatrix}v\\X\end{pmatrix} = v^TV_iv + v^TM_iX + X^TM_i^Tv = v^TV_iv + 2v^TM_iX$ , giving us $m$ linear equations, which we solve for $X$ , giving us a private signature $(v, x)$ . Since $A_i$ can be expressed as $B^T A_i B = \begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}$ , and we just computed $(v, x)$ such that $(v^T, x^T)\begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}\begin{pmatrix}v\\x\end{pmatrix} = h(s || r)_i$ , we now know that $h(s || r)_i = (v^T, x^T)\begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}\begin{pmatrix}v\\x\end{pmatrix} = (v^T, x^T)B^T A_i B\begin{pmatrix}v\\x\end{pmatrix}$ , so $B\begin{pmatrix}v\\x\end{pmatrix}$ is our signature.

Bonus: Self verifying public keys

In my opinion, the best way to understand an algorithm is to have fun with it, looking for interesting properties and silliness that it allows for or thwarts. I’ve already discussed how one can reconstruct a UOV public key from $n(n+1)/2$ message signature pairs, but UOV is an endless pit of funny properties, so here we can discuss how to construct amusing public keys, such as, for example, a public key that is an executable python script verifying an input signature using its own code as the public key. This, as a side benefit, will also explain how we can drastically compress UOV public keys as well.

To do this, we need to take the approach algebraists love to take, and get rid of the basis altogether, i.e. we now talk about some n dimensional vector space $X$ . So how can we describe the special property of our quadratic form, or its associated bilinear form, without explicitly referencing its matrix structure (and thus the specific basis)? Well, we made the matrix symmetric (since we ignore the whole “it’s actually characteristic 2, so it’s all weird” thing), and we know that symmetric matrices lead to symmetric bilinear forms, so we know that we are looking for m symmetric bilinear forms. The main property of the form in our special basis was that 0 in the lower right corner. The coefficients of a matrix of a bilinear form describe the result of the bilinear form applied to the corresponding basis vectors, i.e. $a_{ij} = (e_i, e_j)_A$ , so having a 0 in this position means that there are m linear independent vectors which, when put into the bilinear form in both positions, will evaluate to 0. So to get rid of the basis, we need an m-dimensional subvectorspace $O$ of $X$ , with the property that $(x, y)_i = 0 \;\;\forall x, y \in O$ for i from 1 to m. It is this subvectorspace which makes up the private key. Since it corresponds to the last m variables when choosing a basis, it makes sense to call this the oil space, containing vectors consisting of only nonzero oil variables above.

To continue our description of UOV without choosing a basis, we need to extend our oil space to the full vector space, by finding some space $V$ such that $V \oplus O = X$ . Thankfully, there are tons of such vector spaces, and, as it will turn out later that the choice of vinegar space does not matter at all.

With the vinegar space chosen, we can now reformulate the full signing algorithm, without the use of base changes:

We know that all vectors $x \in X$ have a unique representation as $v + o = x$ , with $v \in V, o \in O$ . So we know that $(v+o, v+o)_i = (v, v)_i + (v, o)_i + (o, v)_i + (o, o)_i = (v, v)_i + 2(v, o)_i$ , given the property of our bilinear forms. Choosing $v \in V$ randomly, we again get a linear equation per bilinear form to compute $o$ , and in particular, if $v$ is not very unluckily chosen, there is now only one $o$ that will give the right value for all $(\cdot, \cdot)_i$ . Since the decomposition is unique, if we had chosen a different vinegar space, there would be a different vinegar vector that would have resulted in the same signature (assuming again that we do not get unlucky and end up with an underdefined system of equations and reroll). But we chose the vinegar vector at random anyways, so other than determining the lower dimensional space of unlucky vinegar vectors, the choice of the vinegar space has no consequence for our signing algorithm, as long as it intersects the oil space trivially to form the direct product, and even the basis we chose makes no difference, since we only use this vector space for choosing a random element from it, an operation that is independent from the choice of basis.

Let’s now look at the oil space, and what consequences different choices of basis have here. We assume that the vinegar space and oil space are fixed as above, and that we have chosen a random vinegar vector such that the system of equations $(v, v)_i + 2(v, \cdot)_i$ has full rank. But then there is exactly one oil vector that solves for a particular right hand side, and that oil vector again does not depend on the basis we use for the oil space, just on the fact that it is in this oil space. So we can also chose the basis of the oil space basis as convenient. Note that, different from the vinegar space, this is not true for the oil space itself, which we had to secretly determine and now have no way of changing, just its basis.

Lastly, we can ask how likely it is for a given random $n-m$ dimensional sub space and $m$ dimensional sub space to intersect trivially. It turns out, this is again very likely (the chance of having a non trivial intersection is about one over the number of field elements). This means we don’t restrict our choice in oil space too much if we require it to intersect trivially with a specific vinegar space.

So, to put everything together: The choice of vinegar space and its basis has no real relevance to the signing algorithm, so we might as well chose the first $n-m$ unit vectors to span this space. This does not really restrict the choice of oil space, and within that space the choice of basis again does not matter. We obviously cannot chose this part of the basis as the unit basis, but we can chose the basis such that the last m coefficients agree with with the last m unit vectors, i.e. chose the basis of $O$ as $(o_i, e_i)$ , with $e_i$ being the m dimensional unit vectors. All in all, this means our base change matrix now looks like this: $B = \begin{pmatrix}I_{n-m}&B_{12}\\0&I_m\end{pmatrix}$ . Applying this base change to our public key as above, we get $\begin{pmatrix}I_{n-m}&0\\B_{12}^T&I_m\end{pmatrix}\begin{pmatrix}A_{11}&A_{12}\\A_{12}^T&A_{22}\end{pmatrix}\begin{pmatrix}I_{n-m}&B_{12}\\0&I_m\end{pmatrix} = \begin{pmatrix}A_{11}&A_{11}B_{12} + A_{12}\\B_{12}^TA_{11} + A_{12}^T&B_{12}^TA_{11}B_{12} + B_{12}^TA_{12} + A_{12}^TB_{12} + A_{22}\end{pmatrix}$ . So in order for the form to have the required shape, we need only that $A_{22} = -B_{12}^TA_{11}B_{12} - B_{12}^TA_{12} - A_{12}^TB_{12}$ .

All of this taken together, this means that the only part of the public key that actually depends on the private key is the lower right $m \times m$ block of the m quadratic forms. Which means that the rest of the public matrices can be chosen at random. The UOV competition entry cleverly uses this to make the public key much smaller, from $m\cdot n(n+1)/2$ bytes for the expanded public keys, to $m\cdot m(m+1)/2 + 16$ bytes for the compressed public key, using the 16 extra bytes as a seed to input into an XOF, to generate $A_{11}$ and $A_{12}$ . I, in turn, can use this property to keep the public key large, but hide amusing message in them, like the aforementioned python script.

Tags cryptography

Cryptography Cryptography (Incomprehensible)

Unbalanced Oil and Vinegar, Part I

Post author By Sophie Schmieg
Post date December 24, 2024
No Comments on Unbalanced Oil and Vinegar, Part I

Introduction

While there are many schemes discussed in the currently ongoing second onramp for PQC signatures, Unbalanced Oil and Vinegar (UOV) is both one of the most serious contenders and also an extremely curious scheme. In this three part blog series, we will take a closer look.

In a previous blog post, I looked at how to recover an UOV public key from loads of valid signatures. That blog post very intentionally left out a lot of details about the actual scheme, like why it is believed to be secure, or how the private key actually works.

It belongs to a larger family of multivariate schemes, which is most famous for having most of their members broken, with paper titles ranging from the rather dry “Cryptanalysis of the oil and vinegar signature scheme” to the very memeable “Breaking Rainbow Takes a Weekend on a Laptop“. Interestingly enough, for a field with this many broken schemes, Unbalanced Oil and Vinegar has so far stood the test of time. In this first part, we will try to understand why the underlying hard problem, multivariate quadratic equations, is hard.

What solves polynomial equations

If you want to cut this blog post short, you can just read the last section, about why the problem is NP-complete. It is independent from the rest of the blog post, and I think slightly easier to understand. Doing so however, misses out on building some deeper understanding of what it means to solve systems of polynomial equations.

If you attended any form of primary eduction, you have probably learned the formula for “solving” quadratic equations, $ax^2 + bx + c = 0$ is $\frac{-b\pm\sqrt{b^2-4ac}}{2a}$ . In fact, in German high school, we called this formula the “midnight formula” on account that you should be able to recite it without hesitation if your math teacher wakes you up at midnight and asks you how to solve quadratic equations. This approach is called “solving by radicals”. It solves a polynomial equation by expressing the roots of the polynomial with field operations and a special function $\sqrt[n]{a}$ , which returns the single positive real solution of $X^n - a = 0$ .

However, a problem soon starts occurring with this idea, without even making our polynomials multivariate (i.e. univariate, i.e. with a single variable): It does not work. While there are formulas for degree three and degree four univariate polynomial equations, with a thrilling history involving lots of drama and zero-knowledge proofs, there is, for a history equally steeped in very different drama, but without zero-knowledge proofs, no formula for quintic polynomials. As in, there is provably no way of solving a quintic polynomial equation with just the use of field operations and radicals over the rational numbers. (There also isn’t a way to do this over finite field, but for entirely different reasons, funnily also relating back to the same Galois guy).

If you take a step back, this is not super surprising. After all, the way I presented it above, all a radical means is assuming that you already have a way to solve an entirely different polynomial equation, and then making it weird by introducing such ill-defined concepts like the real numbers (yuck).

So what do we (as in algebraists) mean by solving a (univariate) polynomial equation? Well first we need to distinguish between reducible and irreducible polynomials. If the polynomial is reducible, than factoring it will give you new polynomials, and each solution of any factor is a solution of the original polynomial. If the polynomial is irreducible however, we just, sorta, do nothing, and just assume that $x$ is a solution, and try to express things with respect to that solution.

For UOV in particular, we are concerned with rational solutions, i.e. solutions that themselves are part of the base field. Thankfully, despite all of this rambling above, those are always well defined, even if you don’t have a formula for it: They are just the roots of the linear factors of our polynomial. When it comes to finding rational solutions, the good news do not stop here: Factoring polynomials (both over finite fields and over the rationals, and also over all sorts of other fields like $\mathbb{Q}_p((\zeta))$ ) is a problem of polynomial complexity, so there are efficient algorithms for computing the factorization, and by extend finding the rational solutions, of any univariate polynomial. And if you are a German high school student, after reading this section, you are officially allowed to answer with “it’s complicated”, roll over, and get a good night’s rest when your teacher wakes you in the middle of the night.

Gröbner bases

So far for solving univariate polynomial equations. But, as the name of the field “multivariate cryptography” suggests, what we really are interested in are multivariate polynomials, i.e. polynomials with multiple variables. We also need to solve systems of equations, i.e. multiple equations simultaneously. Here we first encounter something really weird about multivariate polynomials, by looking at multivariate monomials.

While univariate monomials, i.e. expressions of the form $X^n$ are a total well-order, i.e. all monomials compare with each other through divisibility, and there is a smallest monomial (1), the same is not true for multivariate monomials: We still get a natural partial order via divisibility (e.g. $X^3Y^5$ is divisible by and ergo bigger than $X^2Y^3$ ), not all monomials compare anymore, (e.g. $X^3Y^5$ is neither divisible by nor divides $X^2Y^6$ ). This quickly becomes a problem, for example division with remainder is no longer well-defined.

We can, however, fix this by extending our existing partial order (which, as an aside, is a lattice, which has nothing to do with lattices of lattice cryptography, which in turn for ML-KEM and friends are an order, which has nothing to do with the comparison thingy called order. Mathematicians are very bad at naming things.) to a total order. The easiest way to extend the order is by using the lexicographical order, where we first compare the degrees of the first variable, and if equal, compare the second variable, and so on, until we find differing degrees, which then decide which monomial is bigger (e.g. $X^3Y^5$ is bigger than $X^2Y^6$ , but smaller than $X^3Y^6$ ). This is not the only possible order, and the choice of order will become very important later.

With the order chosen, we can now go ahead and define division with remainder again, by taking the leading monomial of the divisor, checking whether it divides the leading monomial of the dividend, and if so, subtracting the appropriately multiple of the divisor from the dividend. We can do this repeatedly with a whole set of divisors, to end up with a remainder, but we will notice that the order of these divisors matters, and that in general our division with remainder algorithm lost a lot of the nice properties that it had in the univariate case.

Enter Gröbner bases. These are special sets of polynomials, for which division with remainder regains at least some of the useful properties they had in the univariate case. In particular, when using a Gröbner basis, the remainder is a canonical representative of the ring of all polynomials modulo the ideal generated by the set of polynomials. Why do we care about this? It turns out that there is a rather intimate relationship between the modulo operation, and the set where all polynomials simultaneously vanish, i.e. the solution of our system of multivariate polynomials, called the Nullstellensatz. You just know how important it is because we didn’t translate it from Hilbert’s original German. The details here go far beyond this blog post (as you might have noticed when I started talking about generating ideals etc, without warning the reader that they’ll need commutative algebra to proceed), and are thankfully not necessary to understand the basics.

For us, interested in particular rational solutions though, there is some value in looking more deeply into how a Gröbner basis for the lexicographical order looks like:

Given a system of polynomial equations (of dimension 0, don’t worry about that, though, it’s a technicality), the Gröbner basis with respect to the lexicographical order is an equivalent set of polynomial equations (i.e. each solution of one is a solution of the other), that specifically has the form $(f_1, f_2, \dots, f_n)$ , with $f_n$ being univariate in $X_m$ , $f_{n-1}$ being bivariate in $X_m, X_{m-1}$ , etc.

In particular, given such a Gröbner basis, we can find the rational solution by solving the last polynomial (which is univariate), taking each zero, partially evaluating the second to last polynomial obtaining another univariate polynomial, solving that, and so on. Note that this algorithm by itself might already be exponential: Nothing guarantees us that any given zero will end up with a valid solution after having factored tons of univariate polynomials, but in practice that is not usually the problem.

Finding Gröbner bases

What is the problem though, is finding the Gröbner basis to begin with. In fact, this problem is so complicated and interesting, that I decided to write an entire blog post about it, in case you haven’t noticed that I have not really talked about UOV so far, and won’t really mention the scheme other than as a motivating factor for all this until part 2.

So let’s finally look at the algorithm to find Gröbner bases, Buchbergers’s algorithm (there are slight improvements on this algorithm called F4 and F5, but they don’t drastically change things). The algorithm is simple enough to write it out in its entirety here:

buchberger(set of polynomials S) ->
list of polynomials S'
  for every pair (p1, p2) in S:
    let m1 be the leading monomial of p1
    let m2 be the leading monomial of p2
    let m be the lcm of m1 and m2
    compute p = m/m1 p1 - m/m2 p2
    reduce p by S with quotient and remainder
    if p != 0, add p to S
  reduce every element of S by S without it
  return S if the loop adds no new polynomials

This algorithm is a beauty! Notice that, the way it is written, it is not even clear that it terminates, much less what complexity it has. The “simplest” argument I know as to why it terminates goes something like this: If we look at the ideal generated by the leading monomials of S, then the element p we add in each loop iteration, will either have a new leading monomial (otherwise it would have been reduced), not yet part of this ideal, or it will reduce to 0 and not be added. So for each loop iteration, this ideal of leading monomials strictly grows. But polynomials rings are Noetherian, and as such, every strictly ascending chain of ideals will eventually terminate. At that point, the algorithm returns. While this proves that Buchberger’s algorithm always terminates, it does so by invoking one of the strangest properties of commutative algebra, and it does so without giving a constructive proof which could be used to determine the runtime complexity. One can compute the runtime complexity, by doing things more carefully, and arrives at the wonderful terms like $O(d^{2^n})$ for the degree of the polynomials in the eventual basis, so yeah, it might be a while, and even just writing down the basis is not trivially possible. In fact, in practice it is the space complexity of Buchberger’s algorithm which destroys all hope of ever getting a solution far before the time complexity becomes relevant.

And yet, algebraic geometers are so desperate for Gröbner bases, that they will continue running this algorithm for their current pet problem, based on nothing more than just hope. And it turns out, that hope is not always misplaced. I mentioned monomial orders before, and it turns out that Gröbner basis computation very sensitively depends on which monomial order one choses. A problem which will just run out of memory in one order might finish in seconds in another. You can also vary when you reduce which polynomial by which set, and come up with all sorts of fun distributed computation problems that will greatly influence the runtime of this algorithm. And what is nice is that once you have one Gröbner basis, you can quite efficiently compute other Gröbner bases as well, so nothing stops us from first computing the Gröbner basis in the more efficient graded lexicographical order, and then switching to the desired lexicographical order for cheap.

And this is the biggest weakness of multivariate cryptography. Where lattice cryptography has its nice worst case to average case reduction, multivariate cryptography has a very hard worst case problem, but a very finicky to control average case. So finicky in fact that often authors will just state how much RAM they spend on trying to find a Gröbner basis, since there is no better way of estimating runtime.

Multi-variate quadratic equations solving is NP-complete

Now to the last, and as promised independent section of this blog post, showing that solving multivariate quadratic equations is NP complete. We do this by reducing the problem to SAT, boolean satisfiability. For conciseness, we only do this for multivariate polynomials over $\mathbb{F}_2$ , the field with 2 elements, but showing it for every other field is not substantially different.

We do this, by looking at one multivariate polynomial in particular: $f(X, Y) = XY + 1$ .

Since our field has only two elements, we can quickly check what $f$ evaluates to: $f(0, 0) = 1, f(0, 1) = 1, f(1, 0) = 1, f(1, 1) = 0$ . In other words, if we associate 0 with false and 1 with true, f is equal to NAND. Famously every boolean circuit can be expressed just by using NAND gates, and wiring up gates is the same as chaining function calls, which in turn is the same as substituting polynomials. This means every boolean expression equals one multivariate polynomial, with the degree being exponentially bounded by the circuit depth. In order to get back to a quadratic system of polynomials, we now introduce a new helper variable Y for each variable X that has a degree higher than a square, by adding the polynomial $X^2 - Y$ to our system of polynomials. This allows us to halve the degree of the polynomial again, and taken together, we get a polynomial bounded system of quadratic equations (and if we do both steps together, we never explode the degree of the polynomials in illegitimate ways). This reduces SAT to MVQ. On the other hand, given a possible solution, verifying it can be done simply by evaluating quadratic polynomials, an operation that itself is polynomial in time, so MVQ is in NP, making it NP-complete.

But remember, this only goes for generic systems of multivariate quadratic equations, and is no proof that UOV, which we will discuss in the next part, is even worst case hard, much less does it make statements about the average case complexity.

Tags cryptography

Cryptography

HashML-DSA considered harmful

Post author By Sophie Schmieg
Post date November 5, 2024
No Comments on HashML-DSA considered harmful

I ~~mentioned~~ ranted about this topic as a section of a previous blog post (at the very end), but the topic keeps coming up, so I am escalating to a full blog post, since obviously that will help with all these people who are wrong on the internet standardization.

The Problem

Prehashing is a paradigm often used within the context of digital signature schemes. To understand where the problem is, let’s start with the normal definition of a signature scheme, as used by cryptographers, as a tuple of three functions (G, S, V), with the following mappings:

$G\colon 0 \overset{R}{\rightarrow} \mathcal{K} \times \mathcal{P}$ (Key Generation)

$S\colon \mathcal{K} \times \mathcal{M} \overset{R}{\rightarrow} \mathcal{S}$ (Signing algorithm)

$V\colon \mathcal{P} \times \mathcal{M} \times \mathcal{S} \rightarrow \{\top, \bot\}$ (Verification algorithm)

In order to be secure, you’ll need some more stuff, like saying that a signature produced by S will verify when plugged into V with the same data, but that stuff is boring (not really boring) and Dan Boneh has already written it up in his book, so I’ll skip the details here.

As you can see, in the world of mathematics, where everything is perfect and wonderful, there are no hashes anywhere, so to understand what prehashing is about, we will unfortunately go a layer deeper, and pretend to implement these functions in made-up pseudo-code which happens to vaguely resemble C++.

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();
Signature sign(PrivateKey sk, Message m);
bool verify(PublicKey pk, Message m, Signature s);

While we now specify that messages and signatures are byte arrays, this API is still very much the same as the mathematical description of a signature algorithm, and no hash function is anywhere to be seen. But there is a subtle problem with this API, or at least something that can be a subtle problem: While PrivateKey, PublicKey, and Signature are all types that, for any given algorithm, are constant in the size they take in memory, (or at least bounded, looking at Falcon), the message can be literally anything, from a few bytes to theoretically petabytes worth of data. This is especially a problem if you want to hold the private key in a separate device, such as an HSM or a Cloud service, and cannot easily transmit or process large quantities of data.

We mainly care about signing here, since private keys are the ones that need protecting, so a first attempt could be to introduce a streamed version of sign():

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init(PrivateKey sk);
void sign_update(State* st, MessageChunk m);
Signature sign_finalize(State* st);

bool verify(PublicKey pk, Message m, Signature s);

There is still no hash to be found, and the state should probably be some object oriented thing, but we can now solve the problem: To sign a large piece of data, we call sign_init() to create a new stream, call sign_update as often as we need to, updating our state, and call finalize to get the signature. While this is no longer the exact same set of methods as the mathematical definition had, we can see the init/update/finalize pattern as a way to implement a streaming interface.

And there still is no hash. And while this interface is nice and all, it means that our HSM and/or Cloud service now need to manage state, which while possible, is still a bit of a problem. But we can make one more, somewhat more radical change to our API, by delaying at which point the private key gets involved:

typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init();
void sign_update(State* st, MessageChunk m);
Signature sign_finalize(State* st, PrivateKey sk);

bool verify(PublicKey pk, Message m, Signature s);

Assuming our State object is again bounded in size, we now could do sign_init() and sign_update() directly where the data lives, and only have sign_finalize() happen on the HSM/Cloud Service, with the State object itself being transmitted. Note that we still need to trust the caller of sign_init() and sign_update(), since the remote side would no longer be able to tell what it is signing, but we only have to trust that caller to correctly compute the State, and do not need to expose the private key there.

And if we squint a bit, we can now see the hash: With the addition of just one tiny function, we can make it more explicit:

typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Hash;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init();
void sign_update(State* st, MessageChunk m);
Hash sign_finalize_local(State* st);
Signature sign_finalize_remote(Hash message_identifier, PrivateKey sk);

bool verify(PublicKey pk, Message m, Signature s);

Now we can nicely see that signing can be written as the composition of a hash function call, followed by the actual signature algorithm.

So as long as we just keep our interface like this, we can use HSMs and Cloud services and all that good remote oracle stuff, without running into pesky data transmission problems sending over half a Petabyte.

The Problem with the Problem

This, however, does not readily work with every signature scheme. As stated, the signature scheme must decompose into a hash function call and a call to sign the output of that hash function, and not all signature schemes do. In fact, of the two signature schemes NIST just standardized, exactly zero decompose like that, in part because NIST actively encouraged having a property called “non-resignability”, which guarantees that the message identifier by itself, at least for messages with enough entropy to not be brute-forcable, is not enough to create a signature that verifies under a different, possibly attacker controlled key.

But people need the ability to use HSMs and remote oracles, and pushed for NIST to add it to the schemes, so NIST decided to instead of standardizing two schemes, they actually standardized four: ML-DSA, SLH-DSA, HashML-DSA, and HashSLH-DSA, with the latter two being absent from the draft standard. And those four standards are mutually exclusive, have different properties, and are not very well described as separate algorithms. I will go more into HashML-DSA (and HashSLH-DSA) in the last section of this post (which, just as a teaser, is called “The Bad Place”), but, with the hopefully not as futile as it feels wish to prevent us going there, present the in my opinion far better alternatives first.

The Solution (Part 1)

When introducing the whole problem, I somewhat intentionally named only one variable, the message_identifier, and presented the steps leading up to the requirement of decomposing the signature scheme into a hash function and the signing of the message identifier.

The main trick to get to this decomposition was moving the private key from init to finalize. Obviously, any one roundtrip protocol will need the private key to be moved to finalize, since the signature cannot be computed before all message data is accumulated, forcing finalize to be the remote function call that uses the secret key as a side input.

But do we have to make sign_init() to be a method that is completely independent of the rest of the signature scheme? After all, we do have a very conveniently public parameter that we could feed the signing procedure from the very start:

typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Hash;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init(PublicKey pk);
void sign_update(State* st, MessageChunk m);
Hash sign_finalize_local(State* st);
Signature sign_finalize_remote(Hash messsage_identifier, PrivateKey sk);

bool verify(PublicKey pk, Message m, Signature s);

In other words, the message identifier is no longer independent of the key, but only depends on the public key, and not the private key, so the private key can stay remote, but the resulting message identifier is still unique to our signature. A very convenient property, if say, you wanted to not be able to create a signature for a different public key using only the message identifier for messages with sufficient entropy.

And because it turns out to be such a convenient way to achieve this property, it turns out that ML-DSA, when it was still a little contestant going by Dilithium and trying to impress NIST, did exactly this.

And NIST thought that this was a good idea as well, so they left the following comment in Algorithm 7, line 6 of ML-DSA:

message representative that may optionally be computed in a different cryptographic module

And if you look at the rest of ML-DSA, things indeed only depend on this message identifier µ from there on out, and also notice that tr is a hash of the public key, meaning that it is possible to reorder the program in such a way that µ is computed without any knowledge of the private key at all, and can be passed to the HSM/remote oracle (aka the “cryptographic module”, in NIST speak) while being computed in a different “cryptographic module” such as a software library like BoringSSL.

In other words, and to be very explicit, here is how you could implement the above API for ML-DSA:

typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Hash;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init(PublicKey pk) {
  State st = SHAKE256_init();
  SHAKE256_absorb(&st, SHAKE256_oneshot(pk.serialize(), 64));
  SHAKE256_absorb(&st, "\0\0");
  return st;
}

void sign_update(State* st, MessageChunk m) {
  SHAKE256_absorb(&st, m);
}

Hash sign_finalize_local(State* st) {
  return SHAKE256_squeeze(st, 64);
}

Signature sign_finalize_remote(Hash message_identifier, PrivateKey sk);

bool verify(PublicKey pk, Message m, Signature s);

Or more explicitly: You can prehash ML-DSA with the hash function SHAKE256(SHAKE256(pk, 64) || 0x00 || 0x00 || m), where pk is the public key, and m is the message. Note that this hash function is for the case of using an empty context, but I leave figuring out the hash functions with context to my capable reader (Hint, there are two 0x00’s here for a reason).

The Solution (Part 2)

I want to be very clear: The first part of the solution solves the situation for ML-DSA. Pretty much completely. In my opinion, part 1 of the solution should be the most preferred way for solving this problem whenever we encounter it in its pure form. It allows the signing infrastructure to make choices about the locality of data and private key without having to bother the verifiers with it, allowing for someone to change it later, without having to update every verifier in existence. It also gives us this nice non-resignability bonus property, for whatever that is worth.

There are a few other things to tie up, though. First, this does not work for SLH-DSA, which is a multipass signature scheme, and has to hash the input data multiple times, so even the first API refinement would have led to an unbounded state object. Second, you might want to have resignability, i.e. you might want to produce signatures for the same piece of data with multiple public keys at once, even if that makes some academics sad. And third, you might want to sign the actual petabytes I promised above, and find yourself in need of a tad bit more parallelism than a single output stream can provide.

Thankfully, all three scenarios actually have the same solution: You need a protocol. By protocol I mean any description of the larger context your signature scheme is used in, and not necessarily an interactive protocol of two or more online parties. And the best part is: you already have a protocol, since a signature scheme, by itself, is pretty useless, and needs some larger framework that defines what data in what format is signed by whom, how public keys are distributed and trusted, etc. You never deploy a signature scheme by itself, it is always part of a larger system. And with just a few tweaks to that larger system, you make all the rest of your prehashing woes go away.

For example, when defining what data is signed by your signature algorithm, you could decide to simply sign the hash of some data, instead of the data itself. You could even sign the root of a Merkle tree, if say, you wanted to sign a petabyte of data using massively parallel data stores. You could sign the hash of your large data blob together with some unhashed metadata, which would give the remote side some auditing capabilities. The possibilities are endless. While I often blog about the rather minute details of primitives, it is actually this endlessness of possibilities of the protocol space of cryptographic system that is by far the largest part of my day job. Using the protocol to define a system that does not have to transport immense amount of data over constrained networks or to constrained devices is something that you will have to do in many instances of this problem, and in the simplest case this can be as easy as writing “The signer signs a hash of the data provided”. Yes, this means the data is hashed twice, which is a type of inefficiency, but these kind of tradeoffs are commonplace in protocol design.

The Bad Place

This leaves us with the bad place, HashML-DSA and HashSLH-DSA. Those two signature schemes are, in my opinion, what happens when protocol questions get pushed into the primitive layer, and leave a situation that is either ill-defined, or unnecessary. Note that even when pushing this protocol question into the primitive layer, you do not actually absolve yourself from the need of having a protocol defining the larger deployment of a signature scheme, but you merely make the protocol more complicated to analyze, as the primitive is now diverging from the neatly defined building block it is supposed to be.

The way HashML-DSA and HashSLH-DSA are defined, there are two different ways of signing the hash of some data. Option 1 is to just sign this hash, using the method described in the previous paragraph, by declaring the hash to be the actual data. Option 2 is to use HashML-DSA, and invoke a special version of ML-DSA that flips a bit and adds an OID.

The verifier will need to know which option you used, since, different from the solution (part 1), the resulting signature will differ. So in reality, both scenarios are actually protocol decisions, and nothing has been gained by the use of HashML-DSA.

But since HashML-DSA/HashSLH-DSA are no longer actual signature schemes as defined by the mathematical definition above, but require another parameter, the hash used, as a side input, you now have the problem of how to communicate this side input to the verifier, with two basic options: Either, you make it part of the public key, or equivalently, the verification logic, in which case it is simply superfluous, since your verification logic could have just said to first hash the data and then verify the signature on the hash. Or, you put the choice of hash with the data, and have it as an untrusted side input. The latter is the worst case scenario, because a signature scheme is not guaranteed to be secure if the payload describes how to verify itself. This is the problem with JWT, and the problem of countless security vulnerabilities in existence, and an extremely common mistake to make. The mathematical proofs used to show a signature scheme does not permit forgeries simply do not apply when the signature scheme is not fully defined. In the case of HashML-DSA, this, as far as I can tell, seems to mostly be a theoretical problem, as the hash OID should prevent forgeries, even if a weak hash function has been used in the past, but the mere fact that it encourages sending information necessary to define the verification algorithm through an untrusted channel, is a net negative, given how the whole thing is a completely unnecessary in the first place. We would be best off by ignoring the existence of the HashML-DSA/HashSLH-DSA variants completely, and instead operate on well defined signature schemes at all times.

Side note: A similar thing can be said about the context string, it too breaks the mathematical framework used and cuts through the abstraction layers of what a signature scheme does, but it is pretty easy to ignore or use correctly, so my visceral hatred for the HashML-DSA/HashSLH-DSA variants does not extend quite the same way here. In the end, both boil down to something Cas Cramer said, somewhat jokingly, about the misbinding issues with KEMs: “There should have been a precompetition, where the properties of the candidates for the competition are being set”. We should have clearly defined the APIs as used in practice from the beginning, including being clear about serialization (which causes the misbinding issues), implicit rejection (which causes even more misbinding issues), and prehashing capabilities of signature schemes. That would have avoided the last minute introduction of HashML-DSA, likely caused the use of explicit rejection in KEMs, and made papers like unbindable Kemmy Schmidt unnecessary to be written months before the standards were finalized.

Introduction

Why hybrid signatures are easy

Why hybrid signatures are difficult

Are hybrid signatures schemes actually useful?

ML-DSA mythbusting

State of standard

Prehashing FN-DSA

About those floating points

Variable signature sizes

The math of FN-DSA

Generalities

1st Place: Hash-based Signatures

2nd Place: Lattices

3rd Place: Codes

4th Place: Isogenies

5th Place: Multivariate Cryptography

Bonus: Why RSA and Elliptic Curves both fall together

What is this?

Did the NSA invent ML-KEM?

Okay, but that was Kyber, not ML-KEM, did the NSA change Kyber?

Okay but what about maybe there still being a backdoor?

But didn’t NIST fail basic math when picking ML-KEM?

I thought ML-KEM was broken, something about a fault attack?

But what about decryption failure attacks? Those sound scary!

But wasn’t there something called Kyberslash?

Okay, enough about ML-KEM, what about hybrids and the IETF?

But doesn’t the IETF actively discourage hybrids?

Edited to Add: Bonus Rounds

But won’t IETF/browser/server support open us up to downgrade attacks?

But why does the NSA not like hybrids?

But could you add a backdoor to MLWE, if you wanted to, and how would that look like?

But are cryptographic NOBUS backdoors really the only option?

Groups

A tangent on finite simple groups

Categories

Group Objects

Finding our category

What about finite field Diffie-Hellman?

The Problem

Seeds

Semi-Expanded Private Keys

The Options

On Well-Definedness

Preliminaries

The Attack

A geometric interpretation

Verify

Sign

Bonus: Self verifying public keys

Introduction

What solves polynomial equations

Gröbner bases

Finding Gröbner bases

Multi-variate quadratic equations solving is NP-complete

The Problem

The Problem with the Problem

The Solution (Part 1)

The Solution (Part 2)

The Bad Place