A very unscientific guide to the security of various PQC algorithms

After publishing my series on UOV, one piece of feedback I got was that my blog posts made people feel more confident in the security of the scheme, because “at least someone is looking into these things”. I don’t necessarily know if that is the takeaway I would draw from my posts, but it gave me the idea to write down my extremely subjective, and very much biased, guesstimates for how secure I consider various approaches and problem families within PQC.

Since unfortunately I do not possess infinite wisdom or the gift of time travel, these are at best informed guesses, and I take no responsibility for being wrong on any of them.

Generalities

There is a somewhat popular saying in cryptography: “attacks only get better”. It’s a vacuously true statement, since obviously an attacker will always use the most powerful technique currently known, but I think it is also at least slightly misleading, as it implies that progress on attacks is not only inevitable, but also somewhat continuous.

Instead, what we are seeing is usually something like this: Initially, when a certain technique is first seriously discussed, attacks come in quickly and parameters have to be adjusted to account for them. With time, as our understanding of the space grows, we tend to refine those attacks, but it is a process of diminishing returns. It is possible that some novel mathematical technique starts a new spurt in advances in attacks, but importantly, there is usually no continuous improvement in attacks.

As an example, if we look at RSA, we first have the naive factoring algorithms such as trial division and Fermat’s method, which predate cryptographic use. Then, in the seventies, they get joined by the first major improvement in the space, Pollard’s rho. In the 80s, we get the quadratic sieve, one of the first subexponential algorithms, joined by various lattice methods. Finally, in the 90s, more than 30 years ago, we get the current best factoring algorithm, the general number field sieve, a refinement of the ideas behind the quadratic sieve, as well as further improvements on lattice techniques. Quantum algorithms also first enter the scene, with Shor’s algorithm. After that, successes die down substantially, mostly confined to relatively minor improvements to the general number field sieve.

This is not because we stopped working on factoring algorithms, but because most of the effort shifted to other targets, such as the Montes algorithm for factoring polynomials over discrete valuation rings.

If we look at elliptic curves, the story of attacks is even less exciting. There is, to this date, no known generic classical attack against elliptic curves that is better than a space-time trade-off version of a brute force search. This is again not because the topic isn’t studied; elliptic curves are one of the most fundamental building blocks of algebraic geometry, and we know them in great depth. In fact, we know them well enough that we can even start to explain this lack of attacks: They are the most generic form of Diffie-Hellman out there.

All in all, this makes our job of predicting which algorithms are likely to break and which ones are likely to last very, very hard. We are not looking at nice, predictable trends, but instead are mostly looking at a process that jumps in huge steps every few decades.

A different way to look at the same trends is to say that a scheme gets more trustworthy every time it survives an attack. From that point of view, attacks that fail teach us something about the scheme itself, adjusting our priors, making it more trustworthy. This is particularly true for attacks that tell us something fundamental about the underlying problem; the more general the attack, the more it can teach us about why a scheme is resilient.

But now, without further ado, my personal list of how safe I think various approaches to PQC are, together with how familiar I am personally with the space and how much I think it has been studied.

1st Place: Hash-based Signatures

There isn’t much to say about hash-based signatures. They have a security reduction to the properties of the hash function used. Any signature scheme, and pretty much any public key encryption scheme, requires a hash function somewhere in its construction, be it to compress the message, act as a random oracle, serve as a key derivation function, or act as a one-way function. If we cannot construct a secure hash function, we cannot do cryptography. In fact, if we consistently failed in creating secure hash functions, we would most likely live in a universe where P equals NP.

Hash-based signature schemes have reduction proofs that reduce their security to that of their underlying hash function. As such, hash-based signature schemes are at least as secure as any other asymmetric (or symmetric) cryptographic primitive. They have plenty of drawbacks, but lack of security is not one of them. While I haven’t studied them to great depth, there is also just not much to say about their security. They are secure.
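Since there is so little to say about their security, let me at least show how simple the underlying idea is: below is a minimal sketch of a Lamport one-time signature, the most basic hash-based construction. This is an illustration of the idea, not any standardized scheme, and all names in it are mine.

```python
import hashlib, secrets

H = lambda b: hashlib.sha256(b).digest()

def keygen():
    # 256 pairs of random preimages; the public key is their hashes.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def sign(sk, msg):
    # Reveal one preimage per bit of the message digest.
    digest = H(msg)
    bits = [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]
    return [sk[i][bit] for i, bit in enumerate(bits)]

def verify(pk, msg, sig):
    digest = H(msg)
    bits = [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]
    return all(H(sig[i]) == pk[i][bit] for i, bit in enumerate(bits))

sk, pk = keygen()
sig = sign(sk, b"hello")
assert verify(pk, b"hello", sig)
assert not verify(pk, b"goodbye", sig)
```

The catch is the “one-time” part: revealing preimages for a second message lets an attacker mix and match signatures, which is, roughly speaking, why LMS, XMSS, and SLH-DSA organize many such one-time keys under Merkle trees.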

Note that one of the drawbacks some hash-based signature schemes have is the necessity to keep state (LMS/XMSS). While these schemes are as secure as their hash function if used correctly, the same is not true if the state is not managed correctly, i.e. if one-time signatures are used more than once. While I have extremely high confidence in the mathematics of hash-based signatures, I also have extremely low confidence in our collective ability to not corrupt state once in a while.

2nd Place: Lattices

It is hard to overstate my confidence in lattices. General lattices, such as those used in FrodoKEM, being broken is all but equivalent to proving P = NP, at which point all cryptography vanishes (since symmetric cryptography reduces to boolean satisfiability very easily), and it is time to find another career.

Lattices feature heavily in arithmetic number theory, as they arise very naturally when studying number fields. As such, lattice algorithms are actually far more central to mathematics than factoring algorithms. The number of problems an efficient lattice reduction algorithm would solve is far higher than that of an efficient factoring algorithm. The main reason for this is that lattice problems are the simplest form of Diophantine equation problem, the linear Diophantine equation. You can see an example of this in one of my previous blog posts. This makes lattice reduction one of the most useful algorithms for computing pretty much anything in discrete mathematics.

Far from being constrained to just algebraic number theory, they also show up in algebraic geometry, in the description of Abelian varieties over the complex numbers. Or, as it turns out, over the p-adic numbers, as studied in my PhD thesis. Given how central they are to mathematics, I would be extremely surprised if someone, somehow, found a way to improve on generic lattice reduction. Even when it comes to quantum algorithms, lattice reduction is probably one of the most studied targets, and so far, no generic improvement has been found, and several fundamental-looking obstructions have been identified.

Lattices, as a mathematical object, have been studied for pretty much as long as elliptic curves have, since both arise from the same underlying questions about the circumference of an ellipse. In this study, certain integrals arise naturally, defining functions that have two periods in the complex plane. In other words, functions that can be seen as defined on the complex numbers modulo a lattice. And the simplest of these functions, \wp, obeys a differential equation \wp'^2 = 4\wp^3 - g_2\wp - g_3. In other words, \wp and its derivative define an elliptic curve.

In cryptography, lattices have also been studied about as long as elliptic curves have. First as an attack, due to their aforementioned ability to solve Diophantine equations, and soon after as cryptosystems themselves, by increasing the lattice rank to the point that the reduction becomes impossible to compute. The main reason you might not have heard of them before is their generally larger overhead compared to elliptic curves and RSA, making them unappealing in a world where elliptic curves and RSA are unbroken.

But we are not using generic lattices, we are specifically using module lattices. Those are the lattices coming from number field orders. A number field is a field extension of \mathbb{Q} (such as adding the imaginary unit i to the rational numbers), and an order in such a number field is a generalization of the integers (such as adding the imaginary unit i to the integers, obtaining the number field order called the Gaussian integers). These number field orders are canonically lattices themselves, and any finitely generated module (i.e. a vector space, but over rings) over them is again a lattice in a canonical way.

If there is a break of ML-KEM or ML-DSA, my money would be on exploiting this additional structure. However, even this additional structure is very well understood and studied.

Looking at MLWE and NTRU specifically, both problems are deeply related to the p-adic rational reconstruction problem. In the case of MLWE, we need to switch to RLWE, but a number field order can be seen as a module over an order of some subfield, so this doesn’t really change the picture all that much.

So what is the rational reconstruction problem? Recall that, in order to attack LWE, we needed to find s, e such that As + e = t \, \text{mod} \, q, which mainly boils down to describing the kernel, the solutions to As + e = 0\, \text{mod} \,q. For RLWE (or indeed, for NTRU), we need to switch to a number field order, which we mainly do by replacing the capital A with a lower case a. We can, of course, without much consequence, switch the sign of the error term, and write as - e = 0 \, \text{mod} \, q for the lattice we need to reduce. With a slight reordering, this is equivalent to a = e/s \, \text{mod} \, q. Since e and s are small in some metric, what we are asking is: given a fraction with bounded numerator and denominator, known only modulo some ideal (or, more generally, at a number of finite places), find the numerator and denominator.

We all know this problem when we replace the finite places with infinite places, especially over \mathbb{Q}, albeit usually less dressed up in formal mathematics lingo: This is the question of which fraction fits best with some given limited precision decimal expansion, such as the question of whether an output of 1.666 came from an actual result that was 5/3, or 1666/1000.

This problem (over finite places, i.e. modulo a prime) arises relatively naturally when studying number fields, and the only way we know for solving it is lattice reduction.
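To make the problem concrete in its simplest setting, here is a sketch over \mathbb{Q} with a single finite place, i.e. plain integers modulo a prime q. In this degree-one case the lattice to reduce is only two-dimensional, and its reduction collapses into the extended Euclidean algorithm; all names and numbers below are mine.

```python
from math import isqrt

def rational_reconstruct(a, q):
    """Recover (e, s) with e = a*s (mod q) and |e|, |s| <= sqrt(q/2).

    Runs the extended Euclidean algorithm on (q, a), which maintains the
    invariant r_i = s_i * a (mod q), and stops once the remainder is small.
    """
    bound = isqrt(q // 2)
    r0, r1 = q, a % q
    s0, s1 = 0, 1
    while r1 > bound:
        k = r0 // r1
        r0, r1 = r1, r0 - k * r1
        s0, s1 = s1, s0 - k * s1
    if s1 == 0 or abs(s1) > bound:
        return None  # no reconstruction with both parts below the bound
    return r1, s1    # a = r1 / s1 (mod q), up to sign

q = 1000003
e, s = 123, 456                     # the short numerator and denominator
a = e * pow(s, -1, q) % q           # only the residue a is public
print(rational_reconstruct(a, q))   # recovers (123, 456), up to sign
```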

This is a very common pattern in arithmetic number theory: you usually take problems that arise there and reformulate them until you can express them as a lattice problem, and then proceed to reduce the lattice when the number field is small enough. The opposite, where you use the number theoretic properties of the number field to say something about a lattice without reducing it, is on the other hand very rare.

That being said, we are not using a random number field when it comes to lattice cryptography, but a fairly small set of very specific ones, which have properties not usually encountered in many number fields, such as having a class number of 1 and an easy to calculate group of units (up to some finite, easy to calculate cofactor, that is; computing the unit group is usually a hard lattice problem for a random number field, but is easy for the cyclotomic fields heavily ramified over 2 that we want for our cryptographic purposes).

That being said, even with these blemishes, when it comes to module lattice cryptography, we are talking about a very well understood and explored part of mathematics, that should be very safe to use for cryptographic purposes.

3rd Place: Codes

I know a lot less about codes than I do about lattices; I’ve always considered them the smaller sibling of lattices. Both families fundamentally work via underdetermined linear systems, where the solution has certain special properties: being small in the case of lattices, and having lots of zeroes (i.e. being small in the Hamming metric) in the case of codes. Their constructions have many similarities, to the point that code based cryptography can be attacked with the same lattice reduction techniques that lattice cryptography has to deal with. Compared to lattices, codes are far less central to mathematics, but whether that is a good or a bad thing is hard to say. But really, I haven’t studied codes in enough detail to have much of an opinion on them, other than that they are fine, probably, at least as long as lattices are fine. They are also less efficient than lattices in pretty much all of their instantiations, and at least I do not know how to think of them as a more general mathematical problem (akin to the p-adic rational reconstruction problem that governs MLWE/NTRU).

4th Place: Isogenies

Now to a bit of a controversial placement: Isogenies. What, even though SIKE was broken? Yeah, well, obviously I don’t place SIKE itself at 4th place; it sits somewhat lower, right above Vigenère ciphers, and only because the attack is more interesting.

SQISign on the other hand is a different story. The main reason to place it ever so slightly above multivariate cryptography is, in my opinion, that we understand the underlying hard problem, and how it relates to the scheme itself, much better.

I am not ashamed to admit that I have a bias towards pretty mathematics, and SQISign uses some of the most beautiful mathematics I know of. That being said, the scheme is for now too slow to actually be used in practice, and while it can be reduced to the endomorphism problem, we cannot currently rule out that the endomorphism problem ends up being easy, especially given that it is far less central to mathematics than lattices are. It has been studied somewhat extensively, but I am somewhat worried that the best experts on the endomorphism problem in algebraic geometry are only now slowly learning about the existence of isogeny based cryptography. After all, the SIKE attack is based on a theorem published in 1997, and yet the attack wasn’t found until 2022, showing a huge gap between academic algebraic/arithmetic geometry and the cryptographers working on isogeny based crypto.

5th Place: Multivariate Cryptography

I’ve written a whole series on Unbalanced Oil and Vinegar, probably the most basic of the multivariate schemes. Since then, a new attack has come out, leveraging wedge products. While the attack is far from catastrophic, it also feels very arbitrary; similar to the Kipnis–Shamir attack on Balanced Oil and Vinegar, it seems to me that we are missing something to really have a full understanding of the space.

Humorously enough, even before the paper, I had tried, unsuccessfully, to attack UOV using wedge products; more precisely, I tried to figure out whether there is structure in the cotangent space that can be exploited. So the fact that wedge products were a meaningful attack vector is not surprising per se, but still: if we want to trust UOV, we need, in my opinion, a better understanding of what the hard problem here actually is.

It is easy to point to Gröbner bases here, but in my opinion the gap from generic Gröbner basis computation to the specific UOV problem is quite large. While all NP-complete problems necessarily reduce to each other, reducing to a Gröbner basis computation is one of the easier reductions: just like you can reduce a computer program to a boolean circuit satisfiability problem by literally translating the instructions, you can reduce a problem about polynomials to a Gröbner basis computation.

One thing that particularly stands out to me about multivariate cryptography is that variations that have tried to reduce the size of the public key have ended up broken quite often. To me, there is still something missing in our understanding of what makes this problem hard before we can fully trust it, but my progress in understanding the problem space better has at least given me a glimpse of why basic UOV should be secure.

That being said, realistically, I should place them above isogenies, mostly because we have had more survived attacks in this space, but this is my list, and if it didn’t contain at least one upsetting placement, it wouldn’t be very subjective now, would it?

Bonus: Why RSA and Elliptic Curves both fall together

One question that I got asked recently was why RSA and elliptic curves, while looking so different as cryptosystems, are both susceptible to Shor’s attack, when all these other schemes barely spend a word talking about why Shor’s algorithm does not apply to them. While it is true that at first glance RSA and elliptic curves look very different, they are actually far more related than one might think, and some of that relation is already visible in classical attacks.

As I described in my post on why elliptic curves are really the only option for discrete logarithm problems, elliptic curves contain the multiplicative discrete logarithm as a subcase (at least if you allow for stable models). And for multiplicative discrete logarithm problems, we already have the same attacks working on RSA and DLOG. From that perspective it might be less surprising that an attack that is polynomial on RSA also solves ECC.

More concretely, the thing that Shor’s algorithm actually solves is the Abelian Hidden Subgroup problem: Given a group G, a function f \colon G \to X is said to hide the subgroup H of G if f is constant on each coset, but different for different cosets. In particular, if H is a normal subgroup, this means that f is defined and injective on G/H. The hidden subgroup problem is Abelian if the group in question is Abelian.

This is a bit of a mouthful, so let’s look at a trivial example first, using \mathbb{Z} as our group and trying to hide 3\mathbb{Z} as a subgroup. A function hides this subgroup if it has a different value on each of the cosets, for example, if the function is just the value of the integer modulo 3. For a slightly more interesting function, which actually meaningfully hides something, we can look at the world of variant Sudoku, where we often see the concept of a modular line or modular mirror or similar, which requires certain digits to have the same residue mod 3 (for example this one or that one). Solving these puzzles is usually done by coloring the corresponding digits in one of three colors, indicating the residue class mod 3. Importantly, it is (at least initially) not known which color corresponds to which residue class, which starts to show why the function is considered to be hiding this subgroup. Of course, even if you just mapped integers to colors, the hidden subgroup would still be pretty easy to find by anyone who can count to three (and importantly, solving the Sudoku has nothing to do with solving the hidden subgroup problem), but you can imagine that for a larger modulus, this becomes an actually hard problem.
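A few lines of code make the toy example concrete; the coloring is the hiding function:

```python
# f hides 3Z inside Z: it is constant on each coset n + 3Z,
# and takes different values on different cosets.
colors = ["red", "green", "blue"]
f = lambda n: colors[n % 3]

assert f(7) == f(1) == f(-2)                # all lie in the coset 1 + 3Z
assert len({f(n) for n in range(9)}) == 3   # three cosets, three values
```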

While not strictly necessary, it is very useful to know the classification of finitely generated Abelian groups when looking at this question for Abelian groups in particular. All finitely generated Abelian groups can be written as the product \mathbb{Z}^r \times \mathbb{Z}/m_1\mathbb{Z} \times \mathbb{Z}/m_2\mathbb{Z}\times \dots \times \mathbb{Z}/m_n\mathbb{Z}, where m_1 | m_2 | \dots | m_n. Knowing this means we know very well what, at least in theory, any subgroup of an Abelian group looks like, which is going to make the next bits a bit easier to grasp in their generality.

Knowing that Shor’s algorithm can solve the Abelian Hidden Subgroup problem, and now knowing what the Abelian Hidden Subgroup problem is, all that is left to do is to show where the subgroup is hiding, for both RSA and elliptic curves. As discussed, elliptic curves are more or less the most generic of all DLOG groups, so we don’t really need to concern ourselves with the intricacies of how elliptic curves work, and can instead just take a generic group G (and as a bonus, this allows me to use multiplicative notation without feeling dirty). In fact, let’s start with DLOG.

So given two elements a, b \in G, we are looking for k such that a^k = b. Instead of working with G as domain, we use two copies of \mathbb{Z}/n\mathbb{Z}, and define our function f \colon \mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z} \to G as (m_1, m_2) \mapsto a^{m_1}b^{-m_2}. Since b=a^k, this is equal to (m_1, m_2) \mapsto a^{m_1-k\cdot m_2}, i.e. it’s a linear transform on \mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z} followed by a discrete exponentiation.

But the discrete exponentiation is a group isomorphism, so we can basically ignore it for the purposes of hidden subgroups, since the hidden subgroup definition does not really care about the range of the function to begin with. As a linear function, it is easy to see where f maps to the neutral element: exactly on the vectors generated by (k, 1).

Since f is a group homomorphism, we can use the group isomorphism theorem to know that f is constant on each of the cosets and injective on the quotient, i.e. f hides an Abelian subgroup. Applying Shor’s algorithm, and obtaining a generator of this subgroup, we can recover k, since all elements of this subgroup have the form (k\cdot m, m).
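Here is a classical sanity check of this construction, with toy numbers of my choosing; the quantum part of Shor’s algorithm is only the subgroup finding itself.

```python
# Check that f(m1, m2) = a^m1 * b^(-m2) hides the subgroup <(k, 1)>.
p = 467                       # prime field; 467 = 2*233 + 1
n = 233                       # prime order of our subgroup of F_467^*
a = pow(5, 2, p)              # 5 generates F_467^*, so its square has order 233
k = 71                        # the discrete log we pretend not to know
b = pow(a, k, p)

def f(m1, m2):
    return pow(a, m1, p) * pow(b, -m2, p) % p

# f is constant along the coset (3, 4) + <(k, 1)>:
assert all(f(3 + t * k, 4 + t) == f(3, 4) for t in range(n))
# ...and takes pairwise different values on different cosets:
assert len({f(m1, 0) for m1 in range(n)}) == n
```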

Reformulating RSA into an Abelian Hidden Subgroup problem is even easier: The security of RSA is built on the attacker not knowing the order of the group, since the order of \left(\mathbb{Z}/n\mathbb{Z}\right)^\times is \varphi(n) = (p-1)(q-1), from which we can recover n’s factors p and q easily. So how is order finding an Abelian Hidden Subgroup problem? Just take a random element a\in \left(\mathbb{Z}/n\mathbb{Z}\right)^\times and define f as f \colon \mathbb{Z} \to \left(\mathbb{Z}/n\mathbb{Z}\right)^\times\;;\;x\mapsto a^x. This function takes the same value on two inputs exactly when they differ by a multiple of the order of a, in other words it hides ord(a)\mathbb{Z} as a subgroup of \mathbb{Z}. And the order of an element is always a divisor of the order of the group, so we can use this to find factors of n.
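And a toy version of the classical end of the pipeline: once the order is known (found below by brute force instead of a quantum computer), the factors follow. Numbers are toy values of mine.

```python
from math import gcd

n = 3233                 # toy modulus, 61 * 53
a = 3                    # random-ish base, coprime to n

# Shor's algorithm would find ord(a) on a quantum computer; we brute-force it.
r = next(r for r in range(1, n) if pow(a, r, n) == 1)

# For most bases, r is even and a^(r/2) is a nontrivial square root of 1,
# so gcd(a^(r/2) - 1, n) yields a proper factor of n.
assert r % 2 == 0 and pow(a, r // 2, n) != n - 1
p = gcd(pow(a, r // 2, n) - 1, n)
print(p, n // p)         # prints 61 53
```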

Hidden Subgroup Problems are more general than just this, and are mostly just a framework to restate problems in. In fact, we can restate lattice reduction as a dihedral hidden subgroup problem. But importantly, quantum computers are really good at operating on Abelian groups, but have, at least so far, not shown any success whatsoever on non-Abelian groups. This does make sense, given their construction, and gives us some insight into why lattices have withstood quantum cryptanalytic attacks so far.

ML-KEM Mythbusting

What is this?

There have been some recent concerns about ML-KEM, NIST’s standard for encryption with Post-Quantum Cryptography, related standards of the IETF, and lots of conspiracy theories about malicious actors subverting the standardization process. As someone who has been involved with this standardization process at pretty much every level, here is a quick debunking of the various nonsense I have heard. So let’s get started, FAQ style.

Did the NSA invent ML-KEM?

No. It was first specified by a team of various European cryptographers, whom you can look up on their website.

Okay, but that was Kyber, not ML-KEM, did the NSA change Kyber?

No. The differences between Kyber and ML-KEM are pretty minute, mostly editorial changes by NIST. The only change that could be seen as actually interesting was a slight change to how certain key derivation mechanics worked. This change was suggested by Peter Schwabe, one of the original authors of Kyber, and is fairly straightforward to analyze. The reason for this change was that originally, Kyber was able to produce shared secrets of any length, by including a KDF step. But applications usually need their own KDF to apply to shared secrets, in order to bind the shared secret to transcripts and similar, so you would end up with two KDF calls. Since Kyber only uses the KDF to stretch the output, removing it slightly improves the performance of the algorithm without having any security consequences. Basically, there was a feature that turned out to not actually be a feature in real world scenarios, so NIST removed it, after careful consideration, and after being encouraged to do so by the literal author of the scheme, and under the watchful eyes of the entire cryptographic community. Nothing untoward happened here.

Okay but what about maybe there still being a backdoor?

There is no backdoor in ML-KEM, and I can prove it. For something to be a backdoor, specifically a “Nobody but us” backdoor (NOBUS), you need some way to ensure that nobody else can exploit it; otherwise it is not a backdoor, but a broken algorithm, and any internal cryptanalysis you might have will eventually be caught up to by academia. So for something to be a useful backdoor, you need to possess some secret that cannot be brute forced and that acts as a private key to unlock any ciphertext generated by the algorithm. This is the backdoor in DUAL_EC_DRBG, and, since the US plans to use ML-KEM themselves (as opposed to the export cipher shenanigans back in the day), it would be the only backdoor they could reasonably insert into a standard.

However, if you have a private key that cannot be brute forced, you need to have a public key as well, and that public key needs to be large enough to prevent brute-forcing, and be embedded into the algorithm as a parameter. And in order to not be brute forceable, this public key needs to have at least 128 bits of entropy. This gives us a nice test to see whether a scheme is capable of having cryptographic NOBUS backdoors: We tally up the entropy of the parameter space. If the result is definitely less than 128 bits, the scheme can at most be broken, but cannot be backdoored.

So let’s do that for ML-KEM:

This is the set of parameters; let’s tally them up, with complete disregard for any of the choices being much more constrained than random integers would suggest (actually, I am too much of a nerd not to point out the constraints, but I will use the larger number for the tally).

  • Degree of the number field: 8 bits (actually, it has to be a power of two, so really only 3 bits)
  • Prime: 12 bits (actually, it has to be a prime, so 10.2 bits (Actually, actually, it has to be a prime of the form k\cdot n + 1, and it has to be at least double the rank times degree, and 3329 is literally the smallest prime that fits that bill))
  • Rank of the module: 3 bits (well, the rank of the module is the main security parameter, it literally just counts from 2 to 4)
  • Secret and error term bounds: 2 + 2 bits (really these come from the size of the prime, the module rank, and the number field degree)
  • Compression strength: 4 + 3 bits

In total, this gives us 34 bits. Counted exceedingly generously. I even gave an extra bit for all the small numbers! Any asymmetric cryptosystem with a 34 bit public key would be brute forceable by a laptop within a few minutes, and ML-KEM would not be backdoored, but rather be broken. There is no backdoor in ML-KEM, because there simply is no space to hide a backdoor in ML-KEM.
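For the pedantic, the tally from the list above as a few lines of code, using the generous per-parameter counts:

```python
bits = {"degree": 8, "prime": 12, "rank": 3,
        "secret bound": 2, "error bound": 2, "compression": 4 + 3}
assert sum(bits.values()) == 34   # nowhere near the 128 bits a backdoor needs
```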

And just to be sure, if you apply this same counting-bits-of-parameters test to the famously backdoored DUAL_EC_DRBG, you indeed have multiple elliptic curve points defined in the standard without any motivation, immediately blowing our 128 bits of entropy budget for parameters. In fact, it would be trivial to fix DUAL_EC_DRBG by applying what’s called a “Nothing up my sleeves” paradigm: Instead of just having the elliptic curve points sit there, with no explanation, make it so that they are derived from digits of π, e, or the output of some hash function on some published seed. That would still not pass our test, but that is because I designed this test to be way too aggressive; as the remarks in brackets show, there is not really any real choice for these parameters, they are just the smallest set of parameters that result in a secure scheme (making them larger would only make the scheme slower and/or have more overhead).

So no, there is no backdoor in ML-KEM.

But didn’t NIST fail basic math when picking ML-KEM?

No. In fact, I wrote an entire blog post about that topic, but “no” is an accurate summary of that post.

I thought ML-KEM was broken, something about a fault attack?

There are indeed fault attacks on ML-KEM. This is not super surprising, if you know what a fault attack (also called a glitch attack) is. For a fault attack, you need to insert a mistake – a fault – into the computation of the algorithm. You can do this by messing with the physical hardware, with things like ROWHAMMER that literally change the memory while the computation is happening. It’s important to analyze these types of failures, but literally any practical cryptographic algorithm in existence is vulnerable to fault attacks.

It’s literally computers failing at their one job and not computing very well. CPU and memory attacks are probably one of the most powerful families of attacks we have, and they have proven to be very stubborn to mitigate. But algorithms failing in the face of them is not particularly surprising; after all, if you can flip a single arbitrary bit, you might as well just set “verified_success” to true and call it a day. Technically, this is the strongest form of fault, where the attacker chooses where it occurs, but even random faults usually demolish pretty much any cryptographic algorithm, and us knowing about these attacks is merely evidence of an algorithm being seen as important enough to do the math of how exactly it fails when you literally pull the ground out beneath it.

But what about decryption failure attacks? Those sound scary!

ML-KEM has a weird quirk: It is, theoretically, possible to create a ciphertext, in an honest fashion, that the private key holder will reject. If one were to successfully do so, one would learn information about the private key. But here comes the kicker: The only way to create this poisoned ciphertext is by honestly running the encapsulation algorithm, and hoping to get lucky. There is a slight way to bias the ciphertexts, but to do so, one still has to compute them, and the advantage would be abysmal, since ML-KEM forces the hand of the encapsulating party on almost all choices. The probability of this decapsulation failure can be computed with relatively straightforward mathematics, the Cauchy–Schwarz inequality. And well, the parameters of ML-KEM are chosen in such a way that the actual probability is vanishingly small, less than 1 in 2^{100}. At this point, the attacker cannot really assume that they were observing a decapsulation failure anymore, as a whole range of other incredibly unlikely events, such as enough simultaneous bit flips due to cosmic radiation to evade error detection, are far more likely. It is true that after the first decapsulation failure has been observed, the attacker has many more options to stack the deck in their favor, but to do so, you first need the first failure to occur, and there is not really any hope of that.

On top of this, the average ML-KEM key is used exactly once, as such is the fate of keys used in key exchange, further making any adaptive attack like this meaningless, but ML-KEM keys are safe to use even with multiple decapsulations.

But wasn’t there something called Kyberslash?

Yeah. It turns out, implementing cryptographic code is still hard. My modest bragging right is that my implementation, which would eventually morph into BoringSSL’s ML-KEM implementation, never had this problem, so I guess the answer here is to git gud, or something. But really, especially initially, there are some rough edges in new implementations, as we learn the right techniques to avoid them. Importantly, this is a flaw of the implementation, not of the mathematics of the algorithm. In fact, the good news here is that, implementation-wise, ML-KEM is actually a lot simpler than elliptic curves are, so these kinds of minor side channel issues are likely to be rarer here; we just haven’t implemented it as much as elliptic curves.

Okay, enough about ML-KEM, what about hybrids and the IETF?

Okay, this one is a funny one. Well, funny if you like deeply dysfunctional bikeshedding, willful misunderstanding, and drama. First off, what are hybrids? Assume you have two cryptographic schemes that do the same thing, and you distrust both of them. But you do trust the combination of the two. That is, in essence, what hybrids allow you to do: Combine two schemes of the same type into one, so that the combined scheme is at least as secure as either of them. The usual line is that this is perfect for PQC, as it allows you to combine the well studied security of classical schemes with the quantum resistance of PQC schemes. Additionally, the overhead of elliptic curve cryptography, when compared with lattice cryptography, is tiny, so why not throw it in there. And generally I agree with that stance, although I would say that my trust in lattice cryptography is pretty much equal to my trust in elliptic curves, and quite a bit higher than my trust in RSA, so I would not see hybrids as absolutely, always and at every turn, superduper essential. But they are basically free, so why not?

In the end, yes, hybrids are the best way to go, and indeed, this is what the IETF enabled people to do. There are various RFCs to that effect; to understand the current controversy, we need to focus on two TLS related ones: X25519MLKEM768 aka 0x11EC, and MLKEM1024. The former is a hybrid, the latter is not. And, much in line with my reasoning, 0x11EC is the default key exchange algorithm used by Chrome, Firefox, and pretty much all other TLS clients that currently support PQC.

So what’s the point of MLKEM1024? Well, it turns out there is one customer who really, really hates hybrids, and only wants to use ML-KEM1024 for all their systems. And that customer happens to be the NSA. And honestly, I do not see a problem with that. If the NSA wants to make their own systems inefficient, then that is their choice. Why inefficient? It turns out that, due to the quirks of how TLS works, the client needs to predict what the server will likely accept. They could predict more things, but since PQC keys are quite chonky, sending more than one PQC key makes your handshakes slower. And so does mispredicting, since it results in the server saying “try again, with the right public key, this time”. So, if everyone but the NSA uses X25519MLKEM768, the main effect is that the NSA has slower handshakes. As said, I don’t think it’s reasonable to say their handshakes are substantially less secure, but sure, if you really think ML-KEM is broken, then yes, the NSA has successfully undermined the IETF in order to make their own systems less secure, while not impacting anyone else. Congratulations to them, I guess.

But doesn’t the IETF actively discourage hybrids?

No. To understand this, we need to look at two flags that come with TLS key exchange algorithms: Recommended and Mandatory To Implement. Recommended is a flag with three values: Yes, No, and Discouraged. The Discouraged state is used for algorithms known to be broken, such as RC4. Clearly ML-KEM, with or without a hybrid, is not known to be broken, so Discouraged is the wrong category. It is true that 0x11EC is not marked as Recommended, mostly because it started out as an experimental combination that then somehow ended up as the thing everybody was doing, and while lots of digital ink was spilled on whether or not it should be recommended, nobody updated the flag before publishing the RFC. (Technically, the RFC is not published yet, but the rest is pretty much a formality, and the flag is unlikely to change.) So yes, technically the IETF did not recommend a hybrid algorithm. But your browsers and everybody else are using it, so there is that. And just in case you were worried about that, the NSA option of MLKEM1024 is also not marked as recommended.

Lastly, Mandatory To Implement is an elaborate prank by the inventors of TLS to create more discussions on mailing lists. As David Benjamin once put it, the only algorithm that is actually mandatory to implement is the null algorithm, as that is the name of the initial state of a TLS connection, before an algorithm has been negotiated. Otherwise, my recommendation, at least, is to respond with this gif

whenever someone requests a MTI algorithm you don’t want to support. The flag has literally zero meaning. Oh and yeah, neither of the two algorithms is MTI.

Edited to Add: Bonus Rounds

I got a few additional questions after this blog post was published, so I’m adding them here.

But won’t IETF/browser/server support open us up to downgrade attacks?

No. TLS key negotiation is robust against downgrade attacks. If your browser is some 10 years out of date, it might be vulnerable to downgrade attacks to SSL3, but in that case, I would also not worry about PQC.

In more technical detail, both the client’s list of supported ciphers and the server’s pick of cipher are added to the key derivation, so any manipulation of these choices will not result in a successfully negotiated TLS 1.3 key. TLS 1.3’s downgrade protections are even more fun: a server that could support TLS 1.3, but is negotiating TLS 1.2, needs to set a specific server random, to signal to the client that it would really have preferred TLS 1.3. Combined, this protects TLS 1.3 key negotiations, and any additional key agreement that neither the client nor the server prefers will simply never be picked.
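Here is a toy sketch of why mixing the negotiation into the key derivation works. This illustrates transcript binding only; it is not TLS 1.3’s actual key schedule, which runs HKDF over the full handshake transcript, and all names are mine.

```python
import hashlib

def session_key(shared_secret, offered, chosen):
    # Both sides derive the key from the secret AND the negotiation messages.
    transcript = ",".join(offered) + "|" + chosen
    return hashlib.sha256(shared_secret + transcript.encode()).digest()

secret = b"\x42" * 32
client_key = session_key(secret, ["X25519MLKEM768", "X25519"], "X25519MLKEM768")
# An attacker strips the hybrid from the offer; the server picks plain X25519:
server_key = session_key(secret, ["X25519"], "X25519")
assert client_key != server_key   # keys mismatch: the handshake fails
```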

But why does the NSA not like hybrids?

This is a point I initially left out, because all I can do is tell you what the NSA says their reasons for not liking hybrids are, and evaluate whether or not those are sensible; I cannot tell you whether they are the actual reasons.

When it comes to their stated reasons, we have basically three that come up again and again:

  • Fear of combinatorial explosion leading to interoperability issues.
  • Lack of need due to using double encryption already.
  • A wish to signal to the rest of the world (including, apparently, some parts of the US government itself), that the NSA trusts lattice cryptography.

When it comes to the first point, their assessment is somewhat half right: We have seen a huge combinatorial explosion, because people desperately wanted support for their favorite classical scheme and their favorite PQC scheme, leading to somewhat ridiculous numbers of hybrids defined by the LAMPS, CFRG and PGP working groups. However, TLS key negotiation has not fallen into this trap. There is only one hybrid algorithm, and everybody seems to like and use it. Note that CNSA2, the guidance by the NSA that favors not using hybrids, well predates even the first experiments with Kyber that eventually morphed into 0x11EC.

For the second point, it is important to know that apparently, in the case of very sensitive information, it already is common practice for intelligence operations to encrypt twice, with stacks implemented by completely separate vendors. This makes sense; it protects both against insider threats and against implementation mistakes, albeit at a substantial performance penalty (I don’t know whether they simply don’t have performance critical super sensitive stuff). With this capability already being present, you can create a hybrid cipher by having one layer classical and the other one PQC, and do not need a hybrid key exchange.

The last point is something that is more between the lines and has come up in some more informal conversations: The NSA needs to show that they, and the oodles of mathematicians working for them, actually trust the PQC algorithms. While for any other organization using hybrids might just seem like a prudent choice, if the NSA does so, we would rightfully question whether they actually trust PQC enough for it to be deployed at all. By mandating the use of pure lattice based cryptography, but only for their own systems, they show that they are willing to put their money where their mouth is. In the end, this means that if lattice cryptography/ML-KEM turns out to be broken, it is only their systems that are at risk.

As I said, these are, to me, reasonable arguments, but that of course does not mean that they are the actual motivations of the organization. Also keep in mind that, like any large organization, there is not necessarily a single NSA opinion, but rather just different individuals with different motivations.

But could you add a backdoor to MLWE, if you wanted to, and what would that look like?

Okay, nobody asked this, but I’m going to answer it anyways, since this is my blog post.

MLWE is the underlying mechanism behind ML-KEM, the more general mathematical description of the problem instead of the very concrete description of which bytes go where. And if you wanted to backdoor it, you actually could. I’m going to describe this for RLWE, the ring version of MLWE; the principle is the same, but for rings it’s easier to write up. The public key a, t for RLWE is computed as t = as + e, with a being any number field integer, and s, e being short number field integers. To attack this, one first constructs the lattice consisting of all pairs (p, q) of number field integers that fulfill 0 = ap + q modulo the prime, and finds the closest lattice vector in this lattice to a special solution (s', e') of t = as' + e'. The difference between this closest lattice vector and our special solution (s', e') is, by definition of the closest lattice point, short, and is equal to the original secret (or at least equivalent in power to it). The security of RLWE comes from the fact that this closest vector computation is hard for a randomly chosen a. So if you wanted to backdoor RLWE, all you need to do is to not choose a at random, and instead first choose short p, q such that a = -q/p modulo the prime. To any external party, it would be impossible to distinguish this type of a from a genuinely random one, and the attacker could break any public key created from it (in fact, this is exactly how NTRU, a slightly different lattice based cryptography scheme, works).
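Here is the backdoor in miniature, with the number field degree set to one, so that “number field integers” are plain integers modulo the prime. At degree one the closest-vector step collapses into extended-Euclid bookkeeping, and all concrete numbers below are toy choices of mine.

```python
import random

Q = 3329                           # the ML-KEM prime, reused for flavor
p, q = 5, 7                        # the short backdoor pair, kept secret
a = -q * pow(p, -1, Q) % Q         # a = -q/p mod Q: looks perfectly random

# Honest-looking key generation against the backdoored a:
s = random.randint(-3, 3)          # short secret
e = random.randint(-3, 3)          # short error
t = (a * s + e) % Q                # public key (a, t)

# Backdoor holder: p*t = p*e - q*s (mod Q), and |p*e - q*s| < Q/2,
# so centered reduction recovers that value exactly over the integers.
v = (p * t) % Q
if v > Q // 2:
    v -= Q                         # now v == p*e - q*s exactly

# Solve p*e0 - q*s0 = v via extended Euclid, then slide along the solution
# family (e0 + q*k, s0 + p*k) to the member where both entries are short.
def ext_gcd(x, y):
    if y == 0:
        return x, 1, 0
    g, u, w = ext_gcd(y, x % y)
    return g, w, u - (x // y) * w

_, u, w = ext_gcd(p, q)            # u*p + w*q == 1 (p and q are coprime)
e0, s0 = u * v, -w * v             # satisfies p*e0 - q*s0 == v
k0 = round(-s0 / p)
e_rec, s_rec = min(((e0 + q * k, s0 + p * k) for k in range(k0 - 2, k0 + 3)),
                   key=lambda c: max(abs(c[0]), abs(c[1])))
assert (s_rec, e_rec) == (s, e)
print("recovered secret:", s_rec, "error:", e_rec)
```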

So if you wanted to backdoor an MLWE type algorithm, all you would need to do is to specify a single matrix A (the higher dimensional equivalent of our integer a of RLWE) to be used for all key generation. This would be secure if the matrix was created randomly, and, unless all of lattice cryptography is broken, it would be impossible to distinguish a randomly created matrix from one containing the backdoor, except if a nothing-up-my-sleeves transform was used, such as deriving the matrix by hashing a public seed.

But: This would fail the test I outlined above. The matrix A has 4 to 16 number field integer entries modulo 3329 (the ML-KEM prime), each having 256 coefficients (the number field degree), so such a parameter would add a whopping 12288 to 49152 bits to our tally, showing that our test successfully detects this kind of problem.

In case you wondered, the way ML-KEM chooses its matrix is simple: Each public key gets its own, random matrix. Only the person running the keygen could potentially backdoor the matrix, and they have the private key anyways. And actually, they cannot even do that, since ML-KEM compresses the matrix by only including a derivation seed in the public key.

But are cryptographic NOBUS backdoors really the only option?

This was a question that I received several times. Fine, I proved that there cannot be a cryptographic NOBUS backdoor, but is that really the only option? In short, yes.

I think what threw people off here is that implementations can have much more sophisticated backdoors, since no test can guarantee the correct response to all queries, only to the queries tested. However, algorithms are different.

Any backdoor you want to insert into an algorithm needs to be inserted into the mathematical description of the algorithm itself, since anyone can choose to implement a given algorithm from scratch and expect it to interoperate. And this leaves you with only two options: A backdoor that theoretically anyone can find and exploit, aka a broken algorithm, or a backdoor that, even if found, can only be exploited by the person who put it there, aka a NOBUS backdoor. The former would make the algorithm unfit for your own use, as any such backdoor is likely to be discovered by interested parties sooner rather than later, given the intense scrutiny these algorithms are put under. The second type has to involve some form of secret with enough entropy; otherwise an attacker can simply try all possible values that the backdooring party might possess to find the correct one, and the backdoor would again be of the first type.

There is no Diffie-Hellman but Elliptic Curve Diffie-Hellman

When I first learned about Diffie-Hellman and especially elliptic curve Diffie-Hellman, I had one rather obvious question: Why elliptic curves? Why use this strange group that seems rather arbitrary, with its rule of taking the third intersection with a line and then reflecting? Why not use, say, the Monster Group? Surely a monster is better equipped to guard your secrets than some curve thing named after, but legally distinct from, a completely different curve thing!

I wouldn’t have a satisfying answer to this question until way into my graduate studies, and the answer makes a perfect blog post for the “Cryptography (Incomprehensible)” category of this blog, so here we go.

Groups

First, as a recap, what do we mean by Diffie-Hellman? Well, we need a group G, and an element of this group g \in G with some order p, so p is the smallest positive integer with g^p = e, where e \in G is the neutral element of the group. Then we take our private key s \in \mathbb{Z}/p\mathbb{Z} and compute our public key as P = g^s. We can now compute a shared secret with some other public key Q = g^r as K = Q^s = g^{rs} = g^{sr} = P^r.
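As a toy instantiation (all concrete numbers here are mine, and far too small for real use), here is the exchange in the order-1019 subgroup of the multiplicative group of the field with 2039 elements:

```python
import secrets

q_field = 2039                 # a safe prime: 2039 = 2*1019 + 1
p = 1019                       # the (prime) order of our subgroup
g = pow(3, 2, q_field)         # a nontrivial square, hence of order p

s = secrets.randbelow(p)       # our private key
r = secrets.randbelow(p)       # the other party's private key
P = pow(g, s, q_field)         # our public key P = g^s
Q = pow(g, r, q_field)         # their public key Q = g^r

# Both sides arrive at the same shared secret K = g^(r*s):
assert pow(Q, s, q_field) == pow(P, r, q_field)
```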

This should work for any group G, no? So why doesn’t it work for the monster? Let’s take a closer look at the map that computes public keys from private keys (in fact, the rest of this blog will only focus on that map; the whole key exchange thing doesn’t really matter much, you could also compute Schnorr signatures or whatever). Given any group element g \in G, we can always create the exponentiation map \mathbb{Z} \to G\colon k \mapsto g^k. This map is a group homomorphism and is not necessarily injective. In fact, for finite groups, it is never injective. There is an old trick in algebra, where you can make any group homomorphism injective by dividing the domain by its kernel. The kernel in question here is the ideal (p) of all multiples of our order p, so setting our key space to \mathbb{Z}/p\mathbb{Z} is a sensible first step with no drawbacks; you don’t really get new keys by allowing all integers. The map is also not necessarily surjective, though. In fact, the image of the map is exactly \langle g \rangle, the subgroup of G consisting of all powers of the generator g (this is why we call g the generator, it generates the whole thing).

So we can somewhat simplify our private to public map by looking at \mathbb{Z}/p\mathbb{Z} \to \langle g \rangle \colon k \mapsto g^k. With that, our private to public key map is not just a group homomorphism, but also injective and surjective; in other words, a group isomorphism.

A tangent on finite simple groups

This gives us the first answer to “why not the monster group?”, although it is an answer that leads to even more questions. To get to that answer we have to ask an even deeper question, namely: what groups even are there? We want to work on a computer, so finite would be nice, and we have heard of small subgroup attacks, so something without normal subgroups, please. (If you don’t know what a normal subgroup is, they’re the things you can divide out of a group.) A finite simple group, in other words.

One of the most impressive results in 20th century mathematics has to be the classification of finite simple groups. The (not quite) ultimate answer to the question of “what groups even are there?”

To classify groups, we need to figure out what we mean by two groups being the same. Clearly, the numbers on a clock and \mathbb{Z}/12\mathbb{Z} are the same thing, even if the clock uses Roman numerals. So simple relabeling of group elements should not count as a separate thing. In other words, we want to classify finite simple groups up to isomorphism.

The answer to this classification problem is absolutely mind-boggling: First, there are infinite families of groups. For example C_p, the cyclic groups with prime order. Another such family are the alternating groups, from A_5 onwards. There are 16 more such infinite families, the others being said to be of “Lie type”. And then there are 26 groups that just, sort of, hang out by themselves. They are not part of an infinite family, they just exist, and there is no way to make them bigger or anything. The monster group mentioned above is the largest of them, and I totally did not choose it in the intro because it has a cool name and lets me go on a tangent about sporadic groups.

But with this classification out of the way, we can now spot a problem (we could have spotted the problem before stating the classification result, but come on, it’s such a cool result!). Namely that, in order to even answer the question “what groups are there”, we had to say that we want to look at groups up to isomorphism. But our private to public map is a group isomorphism! To a group theorist, the idea that \mathbb{Z}/p\mathbb{Z} somehow differs from \langle g \rangle is preposterous; they both clearly are just C_p, the cyclic group of order p!

Categories

So, uhm, by asking the innocent question of why we use elliptic curves instead of any other group for Diffie-Hellman, we have kind of accidentally blown a wide hole in the entire concept of Diffie-Hellman. If the private key space and the public key space are one and the same, then how does Diffie-Hellman even exist? How can it be secure, if we cannot even distinguish private keys from public keys?

Well, we are already talking to the group theorists, and are having an existential crisis of sorts, so why not just give in and let them pull us deeper, into the realm of the category theorists? Things can’t possibly get worse, can they?

A category is a class of objects, such that for any two objects X, Y there is a class Hom(X, Y) of homomorphisms between X and Y. Homomorphisms are just that, elements of this class. We are generous enough to require there to be an identity homomorphism in Hom(X, X) and a composition operation, where the existence of f \in Hom(X, Y) and g \in Hom(Y, Z) gives you a homomorphism h = g \circ f, but that’s about it. In particular, we do not even necessarily require our objects to be sets. We definitely do not require our homomorphisms to be functions. In fact, just so that nobody gets any stupid ideas, we sometimes don’t call them homomorphisms at all, but arrows. A category is really more like a directed graph. A graph that has so many vertices that they do not necessarily fit in a set any more, and where two vertices might have so many edges between them that they don’t fit either. This is why we said class above; a class is a collection of things that is potentially larger than any set. For example, there is a class of all sets, but there cannot be a set of all sets, because nobody knows how to shave them or something. In fact, the sets, written as SET, are an example of a category. The objects are sets, and the homomorphisms are the functions between those sets.

In order to not fully drift off into space, we will only look at locally small categories. Locally small categories are categories where the homomorphisms actually do form a set. Which makes life a whole lot simpler, and pretty much encompasses everything we want to know about anyways.

So aside from SET, what are examples of categories? Why does this kinda thin definition turn out to be so useful that it spawned an entire field of mathematics, with so much abstract nonsense? Well, it turns out, pretty much everything can be seen as a category. Graphs? Category. Groups? Category. Rings? Category. Fields? Category. Topological spaces? Category. Analytic varieties? Category. Algebraic varieties? You might have guessed it, also a category. Vector spaces? Category. Modules? Category.

It turns out, if all you require from things is that they somehow have some homomorphisms between them that vaguely act like functions, you can pull in pretty much every mathematical object known to humanity. But is it useful? We didn’t even define our objects to be sets, yet all the examples I gave above are sets. Surely something more concrete than an absurdly infinite directed graph would be better? It turns out, categories are immensely useful. You do have to redefine terms in somewhat awkward ways, but once you have done that, you can make statements that are true for fields, rings, groups, vector spaces, topological spaces, etc. all at once. For example, we can no longer define a kernel as all the elements mapping to zero, because remember, objects aren’t necessarily sets, so they do not necessarily have elements; instead we have to say fancy words like “the kernel of a homomorphism f in a locally small category with zero homomorphisms is a homomorphism k that, when composed with f, is zero, and is universal in that property”. We won’t quite need that full apparatus today (in particular, we won’t need universal properties), but we do need the slightly simpler concept of initial and terminal objects. An initial object I \in \mathcal{C} is an object such that for any X \in \mathcal{C}, Hom(I, X) has exactly one element. A terminal object T \in \mathcal{C} is the dual object, i.e. for every X \in \mathcal{C} we have exactly one element in Hom(X, T). If an object is both initial and terminal, we call this object a zero object, and the unique homomorphism it gives for a given X the zero homomorphism.

Examples are important for this, so let’s first look at SET. The initial object here is the empty set, since there is a unique function (the empty function) from the empty set to any given set. The empty set however is not the terminal object, as there are no functions into the empty set from a non-empty set. Instead, the singleton set with exactly one element takes the place of the terminal object. When we look at GROUP, we see an example of a category with a zero object, the trivial group consisting of only the neutral element. There is exactly one group homomorphism from the trivial group to any given group, since group homomorphisms have to map neutral elements to neutral elements, and there is exactly one group homomorphism from any given group to the trivial group, given by mapping everything to the neutral element.

To tie off our examples, a slightly more interesting case is the category RING of (commutative) rings. Our terminal object is again the trivial ring consisting only of the element 0, but it cannot be the initial object, since a ring homomorphism needs to map 1 to 1, and the trivial ring, having only one element, cannot map that element to both 0 and 1. Instead, the initial object of the category of rings is the integers \mathbb{Z}. The most generic cyclic group.

Group Objects

What were we talking about again? Oh right, Diffie-Hellman. So with the language of category theory under our belt, we can now point exactly to why “any group should work” is not working. As long as we are only looking at groups, we have to look at groups up to isomorphism (otherwise relabeling things would be a different thing), but our private to public key map is a group isomorphism. We are not looking at the right category. Diffie-Hellman does not work with groups.

We have to go deeper.

Enter group objects. I love this topic. So much in fact that I’ve written an entire 90 page rambling novel about it once and then people said I am now allowed to call myself a doctor.

The relevant part is in chapter 3, wedged in there between copious amounts of abstract nonsense.

To go through it in a bit more detail, we have some category \mathcal{C} that we use as our base. We want to make some object G in that category into a group. But remember, this is category theory, so G is not necessarily a set. I mean, it will be a set in any practical situation, but we can pretend that we don’t know that, for some really, really cool math. And you should always pretend that you don’t know something when it leads to some really, really cool math. So the only thing we have is homomorphisms. In the cropped away parts of the chapter, we did narrow things down to locally small categories that have a terminal object, and named that terminal object 0. So, if we don’t have elements, how can we describe the group axioms? Well, the first one, associativity, is about multiplication. Multiplication is a map, mapping two elements of the group to one element, so we can just say that we require that map to be a homomorphism in Hom(G \times G, G) (do not ask what G\times G is if G is not a set; it’s something about universal properties, and I promised there would be no universal properties). To define associativity, we need that (G \times G) \times G is the same as G \times (G \times G). This is due to the universal property thing that we do not talk about, but it gives us this canonical isomorphism, i.e. those two constructions are the same, and they are the same in some very obvious way. Now associativity is just saying that the multiplication morphism, applied twice in one order, is the same as the multiplication morphism, applied twice, in some other order, which is what you see in the screenshot as the first property. Similarly, commutativity is also just a statement on the multiplication morphism, with yet another canonical morphism, explaining the fourth, optional property.

Way more interesting and cooler is the second property, which defines the neutral element. So how does one define a neutral element, when you try to pretend very hard that you do not have any elements? This is where our terminal object comes in. We designate an element e \in Hom(0, G) and call it the neutral morphism. Note that 0 is the terminal object, not the initial object, so e is not necessarily unique. In fact, if we take the category of sets as our base (which will give us back the groups we know and love), the terminal object, as discussed above, is the singleton set. A map from the singleton set into a set can be seen as pointing at one specific element in our target set. To put it differently, an element of Hom(0, G) in the category of sets is just an element of G.

Now we need to describe what makes the neutral element, well, neutral. It should interact with our multiplication morphism in some way. We now perform the most miraculous part of our definition. We want to define the neutral property, i.e. that x \cdot e = x, but without mentioning x, and instead phrase it as a series of maps. Let’s stay with x for a bit, though, to come up with a battle plan. Since we want to talk about x \cdot e, we want to make a statement about m(x, e), using the prefix notation instead of the more familiar infix. So we need to construct (x, e), ideally starting from x, so we can have a nice equality in the end.

So starting from a single element, we want two. There is a way to do this, by using the diagonal morphism \Delta \colon G \to G \times G \colon x \mapsto (x, x). That’s not quite (x, e), but it at least has two components. The first component we can leave unchanged, in other words, we apply the identity to it. But how do we get rid of the second component? Well, we can map it to our terminal object 0. There is exactly one way to do that. Now that we are in 0, we can apply our neutral morphism e to get back to G. If we write 0 for the only element in our terminal set 0, and write e for the neutral element in our group, what we can do is a chain of maps like this: G \to G \times G \to G \times 0 \to G \times G \colon x \mapsto (x, x) \mapsto (x, 0) \mapsto (x, e). Now we can apply m and expect to end up at the same place. In other words, m \circ (id_G \times (e \circ 0_G)) \circ \Delta = id_G, where 0_G \in Hom(G, 0) is the unique morphism to the terminal object. And suddenly, we have described what a neutral element does, without having, you know, elements.

The last property, about the inverse, uses a similar construction, this time with another morphism that maps from the group object to itself, to find the inverse. I leave it to the reader to go through why it has to look this exact way.

Finding our category

And with that, we now finally have the tools to actually answer our original question: what is the thing that we map into, and what options do we have for it? We have established that it is not a group, as groups would not be able to distinguish private and public key. We need some additional structure, something that tells us how we can operate on the elements of our thing, without saying that g \cdot g = g^2 and nothing else. We still want something that behaves like a group, though, since the whole business of agreeing on keys fundamentally relied on there being some group things going on. We need a group object.

But what is our category \mathcal{C} we use as a base? It can’t be SET, since that would lead straight back to groups. We probably want something with finite objects, because computers aren’t very good at dealing with things that don’t fit into their little memories and disks and registers and stuff. We do want to be able to compute at the very least the multiplication morphism m. We will need a tiny amount of handwaving, but I think it is not too hard to see that taking tuples of elements of finite fields as a base might be a good idea. Computations on these tuples would involve addition, multiplication, typical field stuff. Adding and multiplying elements of a field? It kind of looks like we want our morphisms to be polynomials! And, as it turns out, there is a category just like that, the category of algebraic varieties. The category of some tuples of finite field elements, where you map stuff around by applying polynomials. So our public key space should be the group objects in the category of algebraic varieties, usually called algebraic groups.

We have classified those. There are two main types, the linear groups and the abelian varieties. You can multiply them together, too, but we are really interested in simple groups, so we want them in their most basic forms. First, the linear groups. They come in two flavors, the additive group \mathbb{G}_a and the multiplicative group \mathbb{G}_m. The additive group would work for Diffie-Hellman, if it wasn’t for this very efficient algorithm that can solve the discrete log problem in this group, named “division”. The multiplicative group we set aside and get to in the next chapter. That leaves us with abelian varieties. Varieties have an important invariant, called the dimension. So what are the abelian varieties of dimension 1? Elliptic curves. And they work great. We could go to higher dimensions (and in fact people have explored this), but it turns out you don’t really get anything but headaches from that: the resulting constructions are secure, but point counting is hard, you get these “special divisors” which make constant time programming a nightmare, and you get nothing in return. So that is why elliptic curves, even though at first glance they look like a quite arbitrary and odd choice for a “group”, are actually pretty much the only choice there is.

What about finite field Diffie-Hellman?

This only leaves the multiplicative group. It showed up in our classification, and you probably know that it has been used in cryptography (under the name finite field Diffie-Hellman), so shouldn’t the title of this blog post be “There is no Diffie-Hellman but Elliptic Curve Diffie-Hellman and also Finite Field Diffie-Hellman”?

I mean yeah, but it would be an awful title, way less clickbaity, so let’s end this blog post by showing that Finite Field Diffie-Hellman is also just Elliptic Curve Diffie-Hellman. Well “Elliptic Curve” Diffie-Hellman, but I’m getting ahead of myself.

We’ve classified a lot of objects in this blog post, so what about one more? How can we describe all elliptic curves (up to isomorphism)?

To do this, we need the so called j invariant, defined as j = 1728 \frac{4a^3}{4a^3 + 27b^2}, for an elliptic curve given by Y^2 = X^3 + aX + b. Two elliptic curves are isomorphic (over the algebraic closure) if and only if they have the same j invariant.

How many j invariants are there? In other words, does every element of the base field show up? It turns out, yes. Given an element j, if j = 0, we can use the curve Y^2 = X^3 + 1. For j = 1728, we can use the curve Y^2 = X^3 + X. For all other cases, we can just use Y^2 = X^3 + \frac{4(1728 - j)}{27j}X + \left(\frac{4(1728 - j)}{27j}\right)^2. You can check that this will have the right j invariant.
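As a quick sanity check, here is a small brute-force verification of that claim over a toy prime field (p = 101 is an arbitrary choice, and I am implicitly assuming the characteristic is neither 2 nor 3, where Weierstraß forms and the j invariant look different):

#include <cassert>
#include <cstdint>
#include <iostream>

// Toy check over F_p (p = 101) that the construction above realizes
// every j invariant. All arithmetic is mod p.
constexpr uint64_t P = 101;

uint64_t add(uint64_t a, uint64_t b) { return (a + b) % P; }
uint64_t sub(uint64_t a, uint64_t b) { return (a + P - b) % P; }
uint64_t mul(uint64_t a, uint64_t b) { return (a * b) % P; }

// Modular inverse via Fermat's little theorem: a^(p-2) mod p.
uint64_t inv(uint64_t a) {
    uint64_t r = 1, base = a % P;
    for (uint64_t e = P - 2; e > 0; e >>= 1, base = mul(base, base))
        if (e & 1) r = mul(r, base);
    return r;
}

// j invariant of Y^2 = X^3 + aX + b: j = 1728 * 4a^3 / (4a^3 + 27b^2).
uint64_t j_invariant(uint64_t a, uint64_t b) {
    uint64_t fourA3 = mul(4, mul(a, mul(a, a)));
    uint64_t denom = add(fourA3, mul(27, mul(b, b)));
    return mul(1728 % P, mul(fourA3, inv(denom)));
}

int main() {
    // The two special cases from the text.
    assert(j_invariant(0, 1) == 0);          // Y^2 = X^3 + 1
    assert(j_invariant(1, 0) == 1728 % P);   // Y^2 = X^3 + X
    // Everything else: a = b = 4(1728 - j) / (27j).
    for (uint64_t j = 1; j < P; ++j) {
        if (j == 1728 % P) continue;
        uint64_t t = mul(mul(4, sub(1728 % P, j)), inv(mul(27, j)));
        assert(j_invariant(t, mul(t, t)) == j);
    }
    std::cout << "every element of F_101 occurs as a j invariant\n";
    return 0;
}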

In other words, two elliptic curves are isomorphic (over the algebraic closure) if they share the same j invariant, and there is one j invariant for every element of our base field. So we can think of our elliptic curves as points on the affine line, and somehow, the collection of all elliptic curves itself behaves like an algebraic variety. (This is an extremely powerful concept called the moduli space). Affine varieties always feel like they are incomplete, you really want to see if there are any points at infinity you could add to them. The affine line in particular really wants to be a projective line, and gain the one point at infinity it is due. And it turns out, we can do this in a sensible way (constructing the moduli of stable curves of genus 1). This point at infinity? It belongs to the curve Y^2 = X^3 - X^2. It kind of looks like an elliptic curve at first glance. Sure, it’s not in short Weierstraß form, but you can rectify that and give an isomorphic curve that has that form (I’m too lazy to do that right now). However, this is not an elliptic curve. In order for a curve to be elliptic, it needs to not have any singularities, and this curve has, …, well, here is a plot of it:

When a curve intersects itself in this awkward way, that’s a singularity. But other than that, it is perfectly fine. In fact, you can even use the very same formulas as you use for elliptic curves to do point addition and everything, just stay away from that singularity at (0, 0). And when you do that, you notice something strange. You can simplify the formula. It’s a bit on the lengthier side of algebra when doing it by hand the long way around, but very much doable. And when you simplify it, you notice that the points on the nodal curve (save for the singularity) can be described by a single non-zero field element, and that adding two points is the same as multiplying the corresponding field elements. In other words, this curve, the somewhat awkward reject elliptic curve, gives us the multiplicative group. Finite field Diffie-Hellman is also just Elliptic Curve Diffie-Hellman, just over a curve that accidentally twisted itself a bit. And we found that curve naturally, trying to hang out with its elliptic curve friends on the same moduli space (as opposed to the cuspidal curve Y^2 = X^3, with its way nastier singularity, which happens to give you the additive group instead, and does not hang out on this moduli space).
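If you want to skip the lengthy algebra, the computation is easiest to see on the split form of the nodal curve, Y^2 = X^3 + X^2, where the two tangent slopes at the node are \pm 1 instead of \pm\sqrt{-1} (this is the untwisted sibling of the curve above). The smooth points are parametrized by (x, y) = (t^2 - 1, t^3 - t), and the standard computation shows that the map

\psi(x, y) = \frac{y + x}{y - x} = \frac{(t+1)^2(t-1)}{(t-1)^2(t+1)} = \frac{t + 1}{t - 1}, \qquad \psi(\mathcal{O}) = 1

sends chord-and-tangent addition on the curve to plain multiplication of non-zero field elements, \psi(P + Q) = \psi(P)\psi(Q).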

And so, far from being arbitrary, there was never an actual choice for Diffie-Hellman. There is only Elliptic Curves.

Categories
Cryptography Standardization

How not to format a private key

Last time I had an issue with standardization going in the, in my opinion, wrong direction, I wrote a blog post about it. Much to my own surprise, that actually worked, and as a reward you get more blog posts about somewhat convoluted standardization issues.

In order to entice you to at least skim the background section, here is a bit of a teaser: Do you like undefined behavior? Do you enjoy two standard conforming implementations agreeing on all input bytes, and all entropy the system produced, but disagreeing on the output bytes? Then you might be happy with the proposed “compromise”. Otherwise, read on to find out what this is all about.

The Problem

The current conundrum is about the format to use for the private key of ML-KEM/ML-DSA when using these keys for certificates. It turns out that NIST left a tiny hole in their definition of these formats, which now causes some considerable problems. When defining the key generation algorithm of ML-KEM, NIST added this statement in the standard:

Excerpt of FIPS 203, describing that the seed used to generate a key can be used instead of the expanded format also described in the standard.

Similarly, for ML-DSA, there is the almost identical statement:

Screenshot of FIPS 204, slightly edited to remove page break

This means there are fundamentally, and NIST approvedly, two different representations for ML-KEM/ML-DSA private keys, in that you can either store the random seed used as input for the respective key generation algorithm or store the output of that algorithm.

Before we watch how we, as a community, managed to turn this rather benign issue into an absolute mess, let’s look at the two private key formats NIST has defined a little bit more closely.

Seeds

First, there are the seeds, 64 bytes in the case of ML-KEM, and 32 bytes in the case of ML-DSA. Aside from being really small, especially when compared to anything else in PQC, seeds have a number of other advantages. What the key generation does, in both cases, is to feed this seed into a number of cryptographic hash functions and eXtendable-Output Functions (XOFs), an XOF basically being a cryptographic hash function that produces any number of output bytes, instead of the fixed number you would expect from a hash function. These outputs are then used to generate the actual key material.

This has a number of advantages. I already mentioned their compact size, but another property is that it is impossible to create a malformed key out of a seed. Since the seed is the input to hash functions and XOFs, the actual key material derived from it depends on every bit of the seed equally, and changing a single bit changes the key in its entirety; there is no way of independently choosing parts of the private key. This is why seeds defend against the attacks outlined in Kemmy Schmidt, and their main advantage over other formats (a small caveat here is that for ML-KEM, the seed actually consists of 2 separate, concatenated seeds, and the format still allows for one of the two Kemmy Schmidt attacks. We could avoid that by deriving the ML-KEM seed from an even higher level seed, but NIST opted not to do that). Module Lattice schemes in particular have private keys (s, e) such that the public key is t = As + e\,\text{mod}\, q. s and e have to be drawn from a specific distribution in order for the key to be secure, with the key generation algorithm describing how to do so, meaning that the seed enforces this specific distribution, which can not easily be proven to be the case afterwards.

But also, like with any cryptographic hash function, there is no going back. The seed is high entropy, so there is no way of reversing the hash functions and XOFs, and the only way to get the seed later is by saving it.

Semi-Expanded Private Keys

So if seeds are so great, why aren’t we using them everywhere? For elliptic curves, the reason is that the private key is extremely simple. Any integer in the correct range is a valid private key, so there isn’t really much of a key generation algorithm. The parameters for popular curves are set to be very close to powers of two, so there isn’t even much rejection sampling to be done. X25519 goes even further and simply ignores the extremely small bias introduced by not rejection sampling and declares every 32 byte string to be a valid private key.

For RSA, the situation is the other way around. RSA key generation has to compute a substantial number of variables, including rejection sampling not one, but two large primes. Primality testing is fairly expensive, and numbers in this range (for RSA-2048) are only prime with a probability of roughly 1 in 700 (though this improves if you don’t test even numbers). All in all, the RSA key generation algorithm is painfully slow, and you really want to avoid running it from scratch every time you load a key, which has led to RSA storing the results of the key generation algorithm as a cached computation. That being said, there would be a way to add seeds to RSA, by only noting down the succeeding rolls for p and q, but all in all this ship has sailed, circumnavigated the globe several times, and is about to be decommissioned.
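That “1 in 700” is just the prime number theorem at work: the density of primes near a 1024-bit number (one of the two factors of an RSA-2048 modulus) is roughly

\frac{1}{\ln 2^{1024}} = \frac{1}{1024 \ln 2} \approx \frac{1}{710},

which improves to roughly 1 in 355 once you skip the even candidates.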

This brings us to the semi-expanded key formats of ML-KEM/ML-DSA. As I mentioned, we need to sample some parts of the private key from a specific distribution, but unlike with RSA, this distribution is rather simple (centered binomial in ML-KEM’s case, uniform in ML-DSA’s case). In particular, no rejection sampling of the private key material takes place, as the parameters are chosen in such a way that everything works out with powers of two. This makes ML-KEM’s/ML-DSA’s key generation algorithm very fast, and not much of a burden to run on load. In fact, the slowest part of the key generation algorithm is sampling the matrix A. But this matrix is stored as a seed in both the public key and private key format anyways, so you don’t get around that step as is, and this is also the reason why I keep calling this key format semi-expanded. It expanded some aspects of the key, but left others to be recomputed.

As far as I can tell, the only reason this format exists to begin with is as a compromise to make performance metrics look better in the competition, even though they do not apply in practice.

I discussed how cryptographic primitives mathematically speaking are tuples of functions before, with KEMs in particular being a triple (G, E, D) and signatures being a triple (G, S, V), and expanded on how this mathematical definition warps and bends when hitting reality. To understand the difference between performance in competition conditions and performance in real life, we need to again look at the code that would implement such a triple, starting with the same code snippet from the last blog post:

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();
Signature sign(PrivateKey sk, Message m);
bool verify(PublicKey pk, Message m, Signature s);

As you might have noticed when reading that previous blog post, I left some objects undefined in this API. While I specify that messages and signatures are binary strings, I say nothing about the keys, and leave them as generic objects. The competition simply defined these keys as byte strings as well, and used this API as their performance benchmark.

However, when we in practice want to perform multiple operations using the same key material, we would usually use some in-memory representation that, at the very minimum, fixes things like alignment, so forcing these objects to be byte arrays would be overly limiting for actual implementations. What we really want is a pair of functions to parse and serialize objects, as is common in object oriented programming, leaving us with this type of API:

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();
Signature sign(PrivateKey sk, Message m);
bool verify(PublicKey pk, Message m, Signature s);

std::vector<uint8_t> serialize_private_key(PrivateKey sk);
PrivateKey parse_private_key(std::vector<uint8_t> sk);
std::vector<uint8_t> serialize_public_key(PublicKey pk);
PublicKey parse_public_key(std::vector<uint8_t> pk);

This is the API many libraries use in practice, and with this API, you are usually not very concerned about slight performance regressions in the parsing operations, as they amortize. You are of course not completely isolated from them, so completely regenerating RSA keys is not going to fly, but the rather straight-forward LWE keygen algorithm is not a problem. This led to a situation where the competition produced a key format that is inferior in practical applications, and a good lesson in “you get what you measure”.

The Options

With the problem now explained in unnecessary detail, we can finally discuss what is happening at IETF. In case you skipped everything above, to recap: we have two formats, one short (seed) and one long (semi-expanded), and a one-way map between the two, mapping from seeds to semi-expanded keys, but no way to invert this mapping.

Which brings us to IETF now having to define a key format. The most elegant choice would be to just always use the seed, as it is the more general format, and call it a day. But some folks had already started implementing HSMs before the standard was done, and they forgot to add the ability to read seeds. I have to admit, I don’t have particularly large amounts of sympathy for that position, implementing a pre-standard comes with risks, and given that the vast majority of implementations of ML-KEM and ML-DSA, including in HSMs, are yet to be written, we should not bend over backwards to accommodate over-eager adopters. But I do have some sympathy, and there are ways that we could make both sides happy.

The simplest would be to define two separate private key formats, using separate algorithm identifiers (OIDs) in PKCS8. The algorithm identifier in PKCS8 is shared between private key and public key, so this leads to the rather awkward outcome of having two public keys that use the exact same algorithm under different IDs. I still think that that is an acceptable outcome, and will expand on why in the last section, but for now let us just accept that there was significant pushback against this idea.

The next simplest thing would be to have a way to switch between seed and semi-expanded key, using a construction like ASN.1 CHOICE. In other words, have a two-value enum as the first byte, followed by either the seed or the semi-expanded key, depending on the enum value. This format is admittedly somewhat awkward, but it is workable. It is, for example, how we encode elliptic curve points, where a leading byte distinguishes compressed from uncompressed points. In all likelihood, all that would happen is that we would see seeds winning out in the end as the more flexible option, with the occasional old hardware having to have its keys rewritten by a software layer.

Which brings us to the last option, which prompted this blog post. Making decisions is hard, so what about a standard that just, like, did not make one? The suggestion is to use a list of two optional fields, one for the seed and one for the semi-expanded key, and leave the interpretation up to the importing library. This has two edge cases, one that is easily fixed, and one that is inherent to this lack of decisiveness. First, the key could be empty. This can be fixed by disallowing that option, and only allowing seed, semi-expanded, or both to be present. But this leaves the other bad choice: both. If there is both a seed and a semi-expanded key present, you have not one, but two equivalent keys in your format. And nothing guarantees that they are the same key. Sure, you could run the key generation algorithm and check, but if you do that, then why did you include the semi-expanded key to begin with? It is literally the output of this check.
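To make the edge cases concrete, here is a minimal sketch of what importing such a “both fields optional” blob looks like (all names here are hypothetical, and expand() is a toy stand-in for the actual key generation from FIPS 203):

#include <array>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory form of the proposed format: two optional fields.
struct PrivateKeyBlob {
    std::optional<std::array<uint8_t, 64>> seed;
    std::optional<std::vector<uint8_t>> expanded;
};

// Toy stand-in: a real implementation would run the ML-KEM key generation
// algorithm here to derive the semi-expanded key from the seed.
std::vector<uint8_t> expand(const std::array<uint8_t, 64>& seed) {
    return std::vector<uint8_t>(seed.begin(), seed.end());
}

std::vector<uint8_t> import_key(const PrivateKeyBlob& blob) {
    if (!blob.seed && !blob.expanded)
        throw std::invalid_argument("empty key"); // the easily fixed edge case
    if (!blob.seed)
        return *blob.expanded; // semi-expanded only: nothing to cross-check
    std::vector<uint8_t> derived = expand(*blob.seed);
    // The inherent edge case: if both fields are present, nothing but this
    // comparison guarantees they describe the same key. Skip it, and two
    // conforming implementations can disagree on which key this blob means.
    if (blob.expanded && *blob.expanded != derived)
        throw std::invalid_argument("seed and semi-expanded key disagree");
    return derived;
}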

In case you are wondering, NIST agrees with me on this one, and recommends running exactly this kind of consistency check (re-deriving the key from the seed and comparing) when you have both seed and semi-expanded key available. However, different from all other consistency checks, they opted not to make this check mandatory, mostly due to private key import being a rare occurrence, if I understand them correctly.

And of course, since the semi-expanded key is superfluous when the seed is present and this seed consistency check is run, the proposal is that the recipient does not perform it, leaving us with implementation defined behavior in the case of an inconsistent key.

On Well-Definedness

So why am I so vehemently opposed to any format that supports both seed and semi-expanded key material being present in a single key blob without checking consistency? It boils down to the mathematical concept of well-definedness. Funnily enough, the concept of being well-defined is in itself not really well-defined, and depends on the context. In general, it means that a definition should be internally consistent, and, depending on context, the object defined should actually exist and be unique. For example, “the smallest integer bigger than 3.4” is a well-defined statement. This integer exists and is uniquely determined to be 4, so the definite article in the statement and the superlative are warranted. Compare this to the statement “the smallest real number bigger than 3.4” and you can see a statement that is not well-defined. The real numbers are not well-ordered, and for any real number bigger than 3.4 there are infinitely many other real numbers, also bigger than 3.4, that are smaller than the chosen one. There is no “smallest” number. In other words, when a mathematician says “this statement is not well-defined”, they are implying that what is said is simply nonsensical, not a legal statement to begin with, and can be considered no further.

But aside from offending mathematicians, for standards there is a very practical concern here: The whole point of a standard is to describe how to interpret something. If two implementations of the same standard having vastly different behaviors were acceptable, then why did you write a standard to begin with?

Implementation defined behavior has concrete security ramifications. The point of a private key format in particular is to transfer private key material from one system to another, which is already a fairly rare and not well-tested situation. It is easy to imagine one part of a system writing one field, and the other part of the system reading the other. In particular, since the seed is unrecoverable after the fact, a system that only has the semi-expanded key available might, instead of writing NULL in the seed field, write the string “NULL”, which the other system then prefers over the semi-expanded key that contains the actual key material, both compromising the system (as “NULL” is very brute-forceable) and potentially losing the actual private key forever.

This brings us to the last topic, the aforementioned resistance against using separate OIDs. The reason for this resistance has a historic predecessor, RSA. RSA keys have two different OIDs, one for “RSA” and the other for “RSA encryption”. This causes lots of pain, and with the lens of well-definedness, it is somewhat immediately obvious why: “RSA” is not a well-defined algorithm, but actually four separate algorithms, and those algorithms aren’t even fully specified by the key format. We have RSA-PKCS1 for signatures, RSA-PKCS1 for encryption, RSA-PSS for signatures, and RSA-OAEP for encryption. The “RSA” key type is used for all four, while the “RSA encryption” key type is used only for two of them. This is a classic problem of not properly defining your algorithm and then being surprised by the confusion it causes. Compare that to the ML-KEM/ML-DSA situation, where, if we used two separate OIDs, we would merely have an alias, with the explicit definition of the public key being the same for both OIDs. While it would be annoying to write an explicit “handle this case like the other case” everywhere, it would be nowhere near as bad as “handle this case somewhat, but not exactly, like that other case”.

Categories
Cryptography

Unbalanced Oil and Vinegar, Part III

In Part I, we looked at the problem we want attackers of UOV to solve. In Part II we had plenty of oil and vinegar, but did not really discuss the whole unbalanced part of the scheme. So in this third part of this three part series, I will discuss why we are using Unbalanced Oil and Vinegar in particular, and go through the Kipnis–Shamir attack on the original Oil and Vinegar scheme.

The paper that presented this attack is formulated in the language of matrices, but in my opinion, and showing my biases as an algebraic geometer, one can’t fully understand the problem in this language, so I will instead present it in the language of bilinear forms that I have built up over the last two blog posts. Using bilinear forms is useful, since it allows us to present things without having to choose an explicit basis, which means that we do not really have to distinguish between the public and the private key all that much.

Preliminaries

Most of this section is a mini course in linear algebra and in particular bilinear forms and dual vector spaces. I will try to also formulate the attack purely in terms of matrices, so if this is going too far for you, you might still follow along with the attack, but as mentioned, using the language of bilinear forms in my opinion gives a deeper understanding.

First, as a quick refresher, as discussed in Part II, a UOV public key is a set of m bilinear forms (\cdot, \cdot)_i, which have the property that there exists a subspace O\subset X such that (x, y)_i = 0 for all x, y \in O, i = 1, \dots, m. This subspace O is the private key we are looking for. The attack in question is a key recovery attack working with the public key only, so we don’t really need to repeat how signing works, and we will focus solely on how to recover this subspace given the bilinear forms, with Part II discussing the signature scheme.

We also need some additional vocabulary to talk about bilinear forms. Bilinear forms in positive characteristic behave somewhat weirdly sometimes, so it makes sense to prove stuff from scratch anyways.

First, we need to discuss non-degenerate bilinear forms. A bilinear form is called non-degenerate if, for every vector x \in X, x \neq 0, there is at least one vector y \in X such that (x, y) \neq 0. If you look at the matrix defining the bilinear form, this is equivalent to saying that this matrix has full rank.

The reason we care about non-degenerate bilinear forms is because they allow us, on a finite dimensional vector space at least, to “transpose” vectors, i.e. turn column vectors into row vectors. If you know quantum physics, what a non-degenerate bilinear form does is turn a ket vector into a bra vector. If you don’t know quantum physics, don’t worry about it, those are just terms the quantum physicists invented because they didn’t know enough about bilinear forms and dual vector spaces.

To understand dual vector spaces, all you need to know here is that linear forms, i.e. linear functions that output a single scalar, themselves also form a vector space over the same field, since you can add linear forms and multiply them with scalars. Conceptually, these are literally just row vectors, as mentioned above, since those can be multiplied with column vectors to give elements of the base field, i.e. row vectors define all the linear forms that exist (for a finite dimensional vector space. Infinite dimensional vector spaces are very, very different, and to take back my comment on quantum physicists a bit, those are what quantum physics deals with, but still, understanding dual vector spaces would make their field a whole lot more approachable). We write X^* for this dual vector space.

To tie this to our bilinear forms, the important thing a bilinear form allows you to do is to “curry”, or partially evaluate it, that is given a bilinear form (\cdot,\cdot) and an element x \in X, we can look at the linear form (x, \cdot).

And for non-degenerate forms, doing this partial evaluation gives you an extra property, namely that, if \sum_{i=1}^n \alpha_i (b_i, \cdot) = 0, we know that \sum_{i=1}^n \alpha_i b_i = 0 as well: the linear form 0 is defined by being 0 for all vectors in the vector space, and linearity in the first argument means that \sum_{i=1}^n \alpha_i (b_i, \cdot) = \left(\sum_{i=1}^n \alpha_i b_i, \cdot\right), so by definition of non-degeneracy it follows that \sum_{i=1}^n \alpha_i b_i = 0. What this means is that, if (b_1, \dots, b_n) is linearly independent, then (b_1, \cdot), \dots, (b_n, \cdot) is linearly independent as well. This proves the statement I made above of finite dimensional vector spaces having the same dimension as their dual, and means that every non-degenerate bilinear form gives us a way of mapping column vectors to row vectors. This might seem like a lot of work for literally just turning a page 90 degrees, but as I said, quantum physicists still pay the price for thinking they could turn papers better all by themselves.

Next, we need to talk about orthogonality. This term really only makes geometric sense in characteristic 0, but we can still use it algebraically: Two vectors x, y \in X are called orthogonal if (x, y) = 0. For a subvectorspace V \subset X we define V^\perp := \{x \in X\;;\;(v, x) = 0\, \forall v \in V\}. With this definition, we can rephrase the trapdoor of our bilinear forms as there being a subspace O \subset X such that O \subset O^{\perp_i}. Basically, a subspace that is perpendicular to itself in all bilinear forms of the public key.

While this is somewhat obviously nonsensical in a geometric way, we can rescue at least one property of perpendicularity into positive characteristic, and that is the dimension of the orthogonal space:

For a non-degenerate bilinear form (\cdot, \cdot) and a subvectorspace V \subset X, we have \text{dim}\, V^\perp = \text{dim}\, X - \text{dim}\,V. To see this, take a basis of V. For every basis vector b_i, we again look at the linear form (b_i, \cdot). This gives us \text{dim}\,V many linearly independent linear forms, all of which vanish on V^\perp. We can consider these forms together as a linear function X \to K^{\text{dim}\,V}, and since they are linearly independent, we know that this linear function has full rank. Vanishing on all forms makes V^\perp the kernel of this function, and by the rank–nullity theorem, this implies that \text{dim}\, V^\perp = \text{dim}\, X - \text{dim}\,V.

The Attack

With all of this out of the way, we can now look at the attack itself. So in the following, we have m bilinear forms (\cdot, \cdot)_i, defined over an n-dimensional vector space X with n = 2m, i.e. there are as many oil vectors as there are vinegar vectors, and everything is “balanced”. We also have a secret, m-dimensional subvectorspace O \subset X such that O \subset O^{\perp_i} for all i = 1, \dots, m, as discussed above. One thing to note here is that \text{dim}\, O^{\perp_i} = n - m = m = \text{dim}\, O, so in this balanced case, and only in this balanced case, we further get O = O^{\perp_i}, since a subvectorspace that has the same dimension as the surrounding space is equal to that space.

The main idea behind the Kipnis–Shamir attack is that bilinear forms are hard, but linear functions are easy, and that for two non-degenerate bilinear forms (\cdot, \cdot)_1 and (\cdot, \cdot)_2 there is exactly one linear function \varphi such that (x, y)_1 = (\varphi(x), y)_2 for all x, y \in X. We know this, because both (\cdot, \cdot)_1 and (\cdot, \cdot)_2 are non-degenerate, so both can map a basis of X to a basis of the dual vector space X^*, and \varphi is the map given by this base change. More concretely, if (\cdot, \cdot)_i is represented by the matrix A_i, i.e. (x, y)_i = x^T A_i y, then \varphi, as a linear function, is represented by (A_1A_2^{-1})^T, since (\varphi(x), y)_2 = ((A_1A_2^{-1})^Tx)^TA_2y = x^T A_1 A_2^{-1}A_2y = x^T A_1y. If we hadn’t done all this work about bilinear forms, we would now need to add a two line proof that this is invariant under base change. I instead opted for writing several paragraphs worth of linear algebra, which, as surely we will all agree, is far preferable.
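If you want to convince yourself that this formula is right, here is a small brute-force check over \mathbb{F}_7, with two arbitrarily chosen, non-degenerate, symmetric 2 \times 2 matrices (purely a toy verification, not part of any real attack implementation):

#include <array>
#include <cstdint>
#include <iostream>

// Brute-force check over F_7 that phi, represented by (A1 * A2^{-1})^T,
// satisfies x^T A1 y == phi(x)^T A2 y for all vectors x, y.
constexpr int P = 7;
using Mat = std::array<std::array<int, 2>, 2>;

int mod(int a) { return ((a % P) + P) % P; }

int inv(int a) { // a^(P-2) mod P, by Fermat's little theorem
    int r = 1;
    for (int e = 0; e < P - 2; ++e) r = mod(r * a);
    return r;
}

Mat mul(const Mat& a, const Mat& b) {
    Mat c{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                c[i][j] = mod(c[i][j] + a[i][k] * b[k][j]);
    return c;
}

Mat inverse(const Mat& a) {
    int d = inv(mod(a[0][0] * a[1][1] - a[0][1] * a[1][0]));
    return {{{mod(a[1][1] * d), mod(-a[0][1] * d)},
             {mod(-a[1][0] * d), mod(a[0][0] * d)}}};
}

Mat transpose(const Mat& a) {
    return {{{a[0][0], a[1][0]}, {a[0][1], a[1][1]}}};
}

// The bilinear form x^T A y.
int form(const Mat& a, int x0, int x1, int y0, int y1) {
    return mod(x0 * mod(a[0][0] * y0 + a[0][1] * y1) +
               x1 * mod(a[1][0] * y0 + a[1][1] * y1));
}

int main() {
    Mat a1 = {{{1, 2}, {2, 3}}}; // det = -1, so non-degenerate
    Mat a2 = {{{2, 1}, {1, 1}}}; // det = 1, so non-degenerate
    Mat phi = transpose(mul(a1, inverse(a2)));
    for (int x0 = 0; x0 < P; ++x0)
        for (int x1 = 0; x1 < P; ++x1)
            for (int y0 = 0; y0 < P; ++y0)
                for (int y1 = 0; y1 < P; ++y1) {
                    int px0 = mod(phi[0][0] * x0 + phi[0][1] * x1);
                    int px1 = mod(phi[1][0] * x0 + phi[1][1] * x1);
                    if (form(a1, x0, x1, y0, y1) != form(a2, px0, px1, y0, y1)) {
                        std::cout << "mismatch\n";
                        return 1;
                    }
                }
    std::cout << "(x, y)_1 == (phi(x), y)_2 for all 7^4 pairs\n";
    return 0;
}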

So far, so unremarkable. You have a bunch of bilinear forms, you can define this linear function. But now, we can ask what \varphi(O) is. We know that for any vector x \in O, (x, y)_1 = 0 for all vectors y \in O, so we know that (\varphi(x), y)_2 = (x, y)_1 = 0 as well, so we know that \varphi(x) \in O^{\perp_2}. But as pointed out above, in the balanced case O = O^{\perp_2}, so \varphi(x) \in O. This means that \varphi(O) = O (technically, I have only proven inclusion, but equality follows pretty easily by applying non-degeneracy again). This means that \varphi|_O, i.e. \varphi restricted to the subvectorspace O, is an endomorphism of O.

But for a linear function, restricting to an endomorphism of a subvectorspace is the same as saying that this subvectorspace is invariant, i.e. built from (generalized) eigenspaces. And we can compute the eigenspaces of a linear function. By itself that is only moderately helpful, since we don’t know which eigenspaces of \varphi are part of O and which aren’t. But we don’t just have one \varphi, we have a linear function for every pair of bilinear forms. And we can intersect eigenspaces, and find the common eigenspace of multiple linear functions. Two random linear functions are very unlikely to share an eigenspace, and we have m(m-1)/2 linear functions sharing an eigenspace. So that common eigenspace is just O. I describe how to compute that shared eigenspace below, but see the paper for more details.

To put it together, the attack algorithm works as follows:

  1. Discard all singular matrices, as they correspond to degenerate bilinear forms.
  2. Compute A_i A_j^{-1} for enough i, j.
  3. Take an arbitrary linear combination of the computed matrices C = \sum_{i, j = 1}^m \alpha_{ij}A_i A_j^{-1}
  4. Compute the characteristic polynomial of C and factorize it.
  5. Due to the common eigenspace, the characteristic polynomial will always have a factor of degree m. But it might have more factors. If there are more factors, discard and try a new linear combination. If you do not find such a matrix and factorization, continue at step 7.
  6. By Cayley–Hamilton, the kernel of a factor of the characteristic polynomial, evaluated at the matrix, is the corresponding eigenspace (see the decomposition fact after this list). Since our matrix has a characteristic polynomial which only has two factors, one of the two eigenspaces is the oil space, which we can quickly verify.
  7. Read the paper for why finding such a characteristic polynomial is very likely, so you never reach this step.
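The linear algebra fact doing the heavy lifting in step 6 is the primary decomposition: since \chi_C(C) = 0 by Cayley–Hamilton, any factorization into coprime parts splits the whole space,

\chi_C = f \cdot g, \quad \gcd(f, g) = 1 \quad \Longrightarrow \quad K^n = \ker f(C) \oplus \ker g(C),

with both summands invariant under C. With exactly two factors, one of the two kernels has dimension m and is our candidate for the oil space, which is quick to verify against the public forms.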

A geometric interpretation

I have to admit, when I first read about this attack, I found it extremely devastating. A polynomial time attack on one member of a family of schemes is a very frightening thing to see, and while the attack clearly requires n = 2m to work (otherwise the whole eigenspace argument falls apart), I would at least like to see some form of argument for why n > 2m does not just mitigate this specific attack, but fundamentally has a structure that is more complex than the balanced case.

I think I have found such an argument, but as a fair word of warning, what follows will be some take no prisoners, deep, deep end of the algebraic geometric pool. Well the shallow end of the deep end, but you get my drift. I will try to keep the argument understandable for more general audiences, but you’ll have to take a bunch of things I say at face value. The whole argument is part of a deeper investigation I did on UOV, but that should properly be a different blog post, or maybe a CFAIL paper or something like that.

With the disclaimers out of the way, here we go:

If an algebraic geometer is given m polynomials, even if they are just quadratic forms, our basic instincts kick in, and we study the geometric object obtained by setting all these polynomials to zero. I will call this the oil-and-vinegar variety, i.e. S = \text{Spec}\, K[X]/((X, X)_i). Note that even though these polynomials are homogeneous, we need to keep our animalistic geometric urges in check and not look at \text{Proj} for now. This variety is a subvariety of \mathbb{A}^n, and since it is given by m polynomials, it will, assuming the polynomials are a complete intersection, be n - m dimensional. For the non-geometers: This is similar to how having m linear equations creates an n - m dimensional vector space, with “complete intersection” being the equivalent of things being linearly independent, only that stuff can go wrong in a lot more interesting ways, and in general stuff bends a lot more. Importantly, if you take random polynomials, they are very likely to be a complete intersection, so if our oil and vinegar polynomials were not, that in itself would be remarkable. It’s unfortunately not really possible to check whether polynomials are a complete intersection without computing Gröbner bases, see Part I, but when I tried the tiny cases I could compute Gröbner bases for, things were a complete intersection down to n = 2m (for n < 2m, it cannot be a complete intersection, as we will see). But I didn’t compute how likely complete intersection is in general.

The other thing to notice is that O, seen as a linear subvariety of \mathbb{A}^n, is a subvariety of S, since the quadratic forms, by definition, evaluate to 0 on O. And O is m dimensional. This shows that for n < 2m, S cannot be defined by polynomials in complete intersection, since its dimension m is greater than n - m.

So in the balanced case, where n = 2m, S is an m dimensional algebraic variety with an m dimensional linear subvariety. In other words, O is an irreducible component of S. The Kipnis–Shamir attack isolates this linear component from the other, nonlinear components of S.

Artist’s impression of the OV-variety for m = 1, n = 2. You can see the linear subspace as irreducible component. Note that this is a polynomial of degree 3, so it is not actually a valid OV-variety, but it would look boring otherwise.

In the unbalanced case however, n > 2m, so \text{dim}\,S = n - m > m = \text{dim}\,O. Which makes O a closed subvariety with codimension greater than 0, i.e. O lies on some, itself non-linear, irreducible component of S. It does at least conceptually make sense to me that there might be an algorithm to tease out a linear irreducible component, but that linear closed subvarieties of non-linear components are much harder to pry away.

Plot of an m = 1, n = 3 OV-variety. There are plenty of linear subspaces to choose from (which would not usually be the case), but they are all located on a single irreducible component which is decisively non-linear. It’s still not the greatest picture, since the plotting software limits me to three dimensions only.

Categories
Cryptography

Unbalanced Oil and Vinegar Part II

After Part I looked at the hard problem underlying Unbalanced Oil and Vinegar, it is now finally time to talk about the algorithm itself.

Verify

As with many signature algorithms, looking at the verification routine first is a good idea. The verification algorithm is usually simpler, and gives you an idea of what the signature scheme is trying to do, with the signing algorithm then showing you how it does it.

In the case of UOV, we use a paradigm called “hash and sign”, in many ways the simplest signing scheme paradigm, not using any fancy transforms. All we need is a trapdoor one-way function f, i.e. a function that everybody can evaluate efficiently at any point x, computing y = f(x), but where, given y, finding an x such that f(x) = y is difficult, unless one knows some extra information k, in which case it is easy. In other words, there is a function g that is easy to evaluate such that f(g(k, y)) = y.

Given such a trapdoor one-way function, we can construct a signature scheme by simply making x the signature of the (hash of the) message y, i.e. the signature of a message r is the value x such that f(x) = h(r). The most prominent example of a signature scheme using this paradigm is RSA-PKCS1, where modular exponentiation is the one-way function, and the factorization of the modulus is the trapdoor information making it reversible.

As we discussed in part I, we want to use quadratic forms as our hard problem. At this point, it makes some sense to discuss different ways to write down quadratic forms. They are all interchangeable, and really which one is used is mainly a matter of what is most useful in a given context.

The most straightforward way of writing down a quadratic form is as a polynomial, i.e. f = \sum_{i = 1}^n \sum_{j = i}^n a_{ij} X_i X_j. Note that the second sum starts at i, since we are operating over commutative fields, so we can always reorder things so that i \le j. This will be useful later. Also note that we only use a homogeneous polynomial, i.e. a polynomial without linear or constant components. This is both a simplification and something that does not matter in practice, as you could always homogenize the polynomial.

The second way to notate a quadratic form is using matrix-vector multiplications. If we put our variables X_i into a vector x = (X_1, \dots, X_n), then x^T A x will evaluate to the exact same polynomial as above, with A being a triangular matrix. We can shift around the coefficients of A that are not on the diagonal, as long as A + A^T remains the same (if the characteristic is not equal to two, which will be a common caveat in this post).

Finally, looking at x^T A x, it makes sense to ask what would happen if we did not use the same vector twice, but instead kept two different ones. This gives us x^T A y = (x, y)_A, a bilinear form, i.e. a function that is linear in both arguments and evaluates to a single field element. In other words, given a bilinear form (\cdot, \cdot)_A, we get the associated quadratic form by evaluating at the diagonal. Note that for the bilinear form, the order very much matters, since x_i y_j \neq x_j y_i, so in this case, we can no longer simply assume upper triangular matrices. In fact, when studying quadratic forms, mathematicians usually restrict themselves to symmetric bilinear forms as the associated bilinear forms, i.e. forms with (x, y)_A = (y, x)_A, corresponding to symmetric matrices A. This gives a one-to-one correspondence between quadratic forms and bilinear forms, and is super convenient, with the slight downside of not working in characteristic 2. As mentioned, this is a more general thing, having to do with the fact that the degree of the polynomial equals the characteristic, something algebraic geometers call the wild case. And guess which base field UOV uses? That’s right, \mathbb{F}_{256}. Super nice for computers, since a single byte corresponds to a field element, but really inconvenient for the more theoretical analysis. We can mostly ignore this, but keep in mind that pretty much everything that uses bilinear forms might have an unwritten asterisk that goes into unreasonable detail about the situation in characteristic 2.
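To see all three notations side by side, take the tiny example n = 2 and f = X_1^2 + 3 X_1 X_2 + 2 X_2^2 (coefficients chosen arbitrarily, characteristic not 2):

f = x^T \begin{pmatrix}1 & 3\\0 & 2\end{pmatrix} x \;\text{(triangular)}, \qquad (x, y) = x^T \begin{pmatrix}1 & 3/2\\3/2 & 2\end{pmatrix} y \;\text{(symmetric)},

and evaluating the symmetric bilinear form on the diagonal, (x, x), gives back exactly f.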

With all this notation out of the way, we can finally write down the public key and the verification algorithm:

A UOV public key consists of m quadratic forms, given as symmetric matrices A_i, of dimension n \times n. A signature of a message r is a pair (x, s), with x being an n dimensional vector, and s some salt, such that x^T A_i x = h(s || r)_i.

Written this way, it is clear that the generic way of solving this problem would be equivalent to solving the multivariate quadratic equation problem. Indeed, the signature is literally given by a solution to the multivariate quadratic equation problem defined by public key and message. But keep in mind that just because the problem is formulated as a multivariate quadratic equation, that does not mean that you have to use generic solvers to attack it.

Note that, different from RSA, every message has more than one possible preimage, so even without the salt, a message would allow for multiple signatures. By adding the salt we gain some additional security guarantees, in that it no longer matters whether the signer has signed the exact same message before, since a different salt results in a completely different challenge anyways.

Sign

It is now finally time to discuss the trapdoor used for UOV. After all, if our public key was completely random, it would indeed be impossible to forge a signature, but it would be equally impossible for the signer to actually compute a signature as well. So we need to first construct a special system of quadratic equations that can be solved easily, and then find a way to transform it into one that hopefully is indistinguishable from a randomly chosen one. We use the matrix notation discussed above for this. In particular, we require the matrices A_i to have the form \begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}, with V_i being a symmetric (n-m) \times (n-m) matrix, and M_i being a (n-m) \times m matrix. Written in this block fashion, it makes sense to distinguish the first n - m variables, called the vinegar variables, from the last m variables, called the oil variables. If we look at what this form of matrix does to the polynomial, we see that having that zero block in the lower right means that while vinegar variables are multiplied with all other variables, oil variables are only ever multiplied with vinegar variables, and never with each other. We also conveniently have exactly as many oil variables as we have equations to solve.

And solving the equations is now very easy: First, randomly choose the values of the vinegar variables. Partially evaluating the polynomials, we will now have constant terms (where vinegar variable was multiplied with vinegar variable) and linear terms (where vinegar variable was multiplied with oil variable), but crucially, we no longer have any quadratic terms, as we did not allow oil variables to be multiplied with other oil variables. In other words, we have m linear equations with m variables to solve. Which we can do by applying Gaussian elimination. This system will not always have a unique solution, but failing that is reasonably unlikely, and if it happens, we can simply choose a different set of values for the vinegar variables.
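As a toy example (m = 1, n = 2, so one vinegar variable X_1 and one oil variable X_2; all values picked arbitrarily), take f = X_1^2 + 2 X_1 X_2, i.e. V = (1) and M = (1), and some target value t. Fixing X_1 = c turns the quadratic equation f = t into the linear equation

c^2 + 2 c X_2 = t \quad \Longrightarrow \quad X_2 = \frac{t - c^2}{2c},

which is solvable whenever c \neq 0; otherwise we simply reroll the vinegar value.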

This gives us a system of quadratic equations that is easy to solve, and this will be our private key, and what we actually solve when signing.

So how do we switch from this private system of quadratic equations to the public system? Well, with a base change, of course. This is a linear function on our n variables, mixing them around, explaining the name of the scheme. We can write this using a n \times n matrix B, which allows us to fully write down the private key and the signing algorithm:

A UOV private key is an n \times n matrix B such that the public key A_i can be expressed as B^T A_i B = \begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}. To sign, we first compute the hash of the salted message, then choose random values for the vinegar variables, giving us (v, X) as a preliminary signature, with the last m coefficients left as variables. We then compute (v^T, X^T)\begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}\begin{pmatrix}v\\X\end{pmatrix} = v^TV_iv + v^TM_iX + X^TM_i^Tv = v^TV_iv + 2v^TM_iX, giving us m linear equations, which we solve for X, giving us a private signature (v, x). Since A_i can be expressed as B^T A_i B = \begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}, and we just computed (v, x) such that (v^T, x^T)\begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}\begin{pmatrix}v\\x\end{pmatrix} = h(s || r)_i, we now know that h(s || r)_i = (v^T, x^T)\begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}\begin{pmatrix}v\\x\end{pmatrix} = (v^T, x^T)B^T A_i B\begin{pmatrix}v\\x\end{pmatrix}, so B\begin{pmatrix}v\\x\end{pmatrix} is our signature.

Bonus: Self verifying public keys

In my opinion, the best way to understand an algorithm is to have fun with it, looking for interesting properties and silliness that it allows for or thwarts. I’ve already discussed how one can reconstruct a UOV public key from n(n+1)/2 message signature pairs, but UOV is an endless pit of funny properties, so here we can discuss how to construct amusing public keys, such as, for example, a public key that is an executable python script verifying an input signature using its own code as the public key. This, as a side benefit, will also explain how we can drastically compress UOV public keys as well.

To do this, we need to take the approach algebraists love to take, and get rid of the basis altogether, i.e. we now talk about some n dimensional vector space X. So how can we describe the special property of our quadratic forms, or their associated bilinear forms, without explicitly referencing the matrix structure (and thus a specific basis)? Well, we made the matrix symmetric (since we ignore the whole “it’s actually characteristic 2, so it’s all weird” thing), and we know that symmetric matrices lead to symmetric bilinear forms, so we know that we are looking for m symmetric bilinear forms. The main property of the form in our special basis was that 0 in the lower right corner. The coefficients of a matrix of a bilinear form describe the result of the bilinear form applied to the corresponding basis vectors, i.e. a_{ij} = (e_i, e_j)_A, so having a 0 in this position means that there are m linearly independent vectors which, when put into the bilinear form in both positions, will evaluate to 0. So to get rid of the basis, we need an m-dimensional subvectorspace O of X, with the property that (x, y)_i = 0 \;\;\forall x, y \in O for i from 1 to m. It is this subvectorspace which makes up the private key. Since it corresponds to the last m variables when choosing a basis, it makes sense to call this the oil space; it contains the vectors from above whose only nonzero coefficients are oil variables.

To continue our description of UOV without choosing a basis, we need to extend our oil space to the full vector space, by finding some space V such that V \oplus O = X. Thankfully, there are tons of such vector spaces, and, as it will turn out later, the choice of vinegar space does not matter at all.

With the vinegar space chosen, we can now reformulate the full signing algorithm, without the use of base changes:

We know that all vectors x \in X have a unique representation as v + o = x, with v \in V, o \in O. So we know that (v+o, v+o)_i = (v, v)_i + (v, o)_i + (o, v)_i + (o, o)_i = (v, v)_i + 2(v, o)_i, given the property of our bilinear forms. Choosing v \in V randomly, we again get a linear equation per bilinear form to compute o, and in particular, if v is not very unluckily chosen, there is now only one o that will give the right value for all (\cdot, \cdot)_i. Since the decomposition is unique, if we had chosen a different vinegar space, there would be a different vinegar vector that would have resulted in the same signature (assuming again that we do not get unlucky, end up with an underdefined system of equations, and reroll). But we chose the vinegar vector at random anyways, so other than determining the lower dimensional space of unlucky vinegar vectors, the choice of the vinegar space has no consequence for our signing algorithm, as long as it intersects the oil space trivially to form the direct sum, and even the basis we choose makes no difference, since we only use this vector space for choosing a random element from it, an operation that is independent of the choice of basis.

Let’s now look at the oil space, and what consequences different choices of basis have here. We assume that the vinegar space and oil space are fixed as above, and that we have chosen a random vinegar vector such that the system of equations (v, v)_i + 2(v, \cdot)_i has full rank. But then there is exactly one oil vector that solves for a particular right hand side, and that oil vector again does not depend on the basis we use for the oil space, just on the fact that it is in this oil space. So we can also choose the basis of the oil space as convenient. Note that, different from the vinegar space, this is not true for the oil space itself, which we had to secretly determine and now have no way of changing, just its basis.

Lastly, we can ask how likely it is for a given random n-m dimensional subspace and m dimensional subspace to intersect trivially. It turns out, this is again very likely (the chance of having a non-trivial intersection is about one over the number of field elements). This means we don’t restrict our choice in oil space too much if we require it to intersect trivially with a specific vinegar space.

So, to put everything together: The choice of vinegar space and its basis has no real relevance to the signing algorithm, so we might as well choose the first n-m unit vectors to span this space. This does not really restrict the choice of oil space, and within that space the choice of basis again does not matter. We obviously cannot choose this part of the basis as the unit basis, but we can choose the basis such that the last m coefficients agree with the last m unit vectors, i.e. choose the basis of O as (o_i, e_i), with e_i being the m dimensional unit vectors. All in all, this means our base change matrix now looks like this: B = \begin{pmatrix}I_{n-m}&B_{12}\\0&I_m\end{pmatrix}. Applying this base change to our public key as above, we get \begin{pmatrix}I_{n-m}&0\\B_{12}^T&I_m\end{pmatrix}\begin{pmatrix}A_{11}&A_{12}\\A_{12}^T&A_{22}\end{pmatrix}\begin{pmatrix}I_{n-m}&B_{12}\\0&I_m\end{pmatrix} = \begin{pmatrix}A_{11}&A_{11}B_{12} + A_{12}\\B_{12}^TA_{11} + A_{12}^T&B_{12}^TA_{11}B_{12} + B_{12}^TA_{12} + A_{12}^TB_{12} + A_{22}\end{pmatrix}. So in order for the form to have the required shape, we only need that A_{22} = -B_{12}^TA_{11}B_{12} - B_{12}^TA_{12} - A_{12}^TB_{12}.

All of this taken together means that the only part of the public key that actually depends on the private key is the lower right m \times m block of the m quadratic forms. Which means that the rest of the public matrices can be chosen at random. The UOV competition entry cleverly uses this to make the public key much smaller, from m\cdot n(n+1)/2 bytes for the expanded public keys, to m\cdot m(m+1)/2 + 16 bytes for the compressed public key, using the 16 extra bytes as a seed to input into an XOF, to generate A_{11} and A_{12}. I, in turn, can use this property to keep the public key large, but hide amusing messages in them, like the aforementioned python script.
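To put numbers on this (using, if I remember the parameter set correctly, uov-Ip with n = 112 and m = 44 over \mathbb{F}_{256}):

44 \cdot \frac{112 \cdot 113}{2} = 278432 \text{ bytes expanded} \qquad \text{vs.} \qquad 44 \cdot \frac{44 \cdot 45}{2} + 16 = 43576 \text{ bytes compressed},

a more than sixfold reduction, just from noticing which parts of the key actually depend on the secret.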

Categories
Cryptography Cryptography (Incomprehensible)

Unbalanced Oil and Vinegar, Part I

Introduction

While there are many schemes discussed in the currently ongoing second onramp for PQC signatures, Unbalanced Oil and Vinegar (UOV) is both one of the most serious contenders and also an extremely curious scheme. In this three part blog series, we will take a closer look.

In a previous blog post, I looked at how to recover a UOV public key from loads of valid signatures. That blog post very intentionally left out a lot of details about the actual scheme, like why it is believed to be secure, or how the private key actually works.

UOV belongs to a larger family of multivariate schemes, which is most famous for having most of its members broken, with paper titles ranging from the rather dry “Cryptanalysis of the oil and vinegar signature scheme” to the very memeable “Breaking Rainbow Takes a Weekend on a Laptop”. Interestingly enough, for a field with this many broken schemes, Unbalanced Oil and Vinegar has so far stood the test of time. In this first part, we will try to understand why the underlying hard problem, solving multivariate quadratic equations, is hard.

What solves polynomial equations

If you want to cut this blog post short, you can just read the last section, about why the problem is NP-complete. It is independent of the rest of the blog post, and I think slightly easier to understand. Doing so, however, misses out on building some deeper understanding of what it means to solve systems of polynomial equations.

If you attended any form of primary education, you have probably learned the formula for “solving” quadratic equations: the solutions of ax^2 + bx + c = 0 are \frac{-b\pm\sqrt{b^2-4ac}}{2a}. In fact, in German high school, we called this formula the “midnight formula”, on account of the expectation that you should be able to recite it without hesitation if your math teacher wakes you up at midnight and asks you how to solve quadratic equations. This approach is called “solving by radicals”. It solves a polynomial equation by expressing the roots of the polynomial with field operations and a special function \sqrt[n]{a}, which returns the single positive real solution of X^n - a = 0.

However, a problem soon starts occurring with this idea, even before we make our polynomials multivariate (i.e. while they still have a single variable): It does not work. While there are formulas for degree three and degree four univariate polynomial equations, with a thrilling history involving lots of drama and zero-knowledge proofs, there is, for a history equally steeped in very different drama, but without zero-knowledge proofs, no such formula for quintic polynomials. As in, there is provably no way of solving a general quintic polynomial equation with just the use of field operations and radicals over the rational numbers. (There also isn’t a way to do this over finite fields, but for entirely different reasons, funnily also relating back to the same Galois guy.)

If you take a step back, this is not super surprising. After all, the way I presented it above, all a radical means is assuming that you already have a way to solve an entirely different polynomial equation, and then making it weird by introducing such ill-defined concepts as the real numbers (yuck).

So what do we (as in algebraists) mean by solving a (univariate) polynomial equation? Well, first we need to distinguish between reducible and irreducible polynomials. If the polynomial is reducible, then factoring it will give you new polynomials, and each solution of any factor is a solution of the original polynomial. If the polynomial is irreducible however, we just, sorta, do nothing, and just assume that x is a solution, and try to express things with respect to that solution.

For UOV in particular, we are concerned with rational solutions, i.e. solutions that are themselves part of the base field. Thankfully, despite all of the rambling above, those are always well defined, even if you don’t have a formula for them: They are just the roots of the linear factors of our polynomial. When it comes to finding rational solutions, the good news does not stop here: Factoring polynomials (both over finite fields and over the rationals, and also over all sorts of other fields like \mathbb{Q}_p((\zeta))) is a problem of polynomial complexity, so there are efficient algorithms for computing the factorization, and by extension finding the rational solutions, of any univariate polynomial. And if you are a German high school student, after reading this section, you are officially allowed to answer with “it’s complicated”, roll over, and get a good night’s rest when your teacher wakes you in the middle of the night.

Gröbner bases

So much for solving univariate polynomial equations. But, as the name of the field “multivariate cryptography” suggests, what we are really interested in are multivariate polynomials, i.e. polynomials with multiple variables. We also need to solve systems of equations, i.e. multiple equations simultaneously. Here we first encounter something really weird about multivariate polynomials, by looking at multivariate monomials.

While univariate monomials, i.e. expressions of the form X^n, are totally well-ordered, i.e. all monomials compare with each other through divisibility, and there is a smallest monomial (1), the same is not true for multivariate monomials: We still get a natural partial order via divisibility (e.g. X^3Y^5 is divisible by and ergo bigger than X^2Y^3), but not all monomials are comparable anymore (e.g. X^3Y^5 is neither divisible by nor divides X^2Y^6). This quickly becomes a problem; for example, division with remainder is no longer well-defined.

We can, however, fix this by extending our existing partial order (which, as an aside, is a lattice, which has nothing to do with lattices of lattice cryptography, which in turn for ML-KEM and friends are an order, which has nothing to do with the comparison thingy called order. Mathematicians are very bad at naming things.) to a total order. The easiest way to extend the order is by using the lexicographical order, where we first compare the degrees of the first variable, and if equal, compare the second variable, and so on, until we find differing degrees, which then decide which monomial is bigger (e.g. X^3Y^5 is bigger than X^2Y^6, but smaller than X^3Y^6). This is not the only possible order, and the choice of order will become very important later.

With the order chosen, we can now go ahead and define division with remainder again, by taking the leading monomial of the divisor, checking whether it divides the leading monomial of the dividend, and if so, subtracting the appropriate multiple of the divisor from the dividend. We can do this repeatedly with a whole set of divisors, to end up with a remainder, but we will notice that the order of these divisors matters, and that in general our division with remainder algorithm has lost a lot of the nice properties that it had in the univariate case.
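
A classic textbook example: take f = X^2Y + XY^2 + Y^2 in the lexicographical order with X > Y, and divide by the set \{XY - 1, Y^2 - 1\}. Reducing by XY - 1 first whenever possible leaves the remainder X + Y + 1, while reducing by Y^2 - 1 first leaves the remainder 2X + 1. Same dividend, same divisors, different remainders, purely because of the order in which we applied them.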

Enter Gröbner bases. These are special sets of polynomials, for which division with remainder regains at least some of the useful properties it had in the univariate case. In particular, when using a Gröbner basis, the remainder is a canonical representative in the ring of all polynomials modulo the ideal generated by the set of polynomials. Why do we care about this? It turns out that there is a rather intimate relationship between the modulo operation and the set where all polynomials simultaneously vanish, i.e. the solutions of our system of multivariate polynomials, called the Nullstellensatz. You can tell how important it is by the fact that we didn’t even translate it from Hilbert’s original German. The details here go far beyond this blog post (as you might have noticed when I started talking about generating ideals etc. without warning the reader that they’ll need commutative algebra to proceed), and are thankfully not necessary to understand the basics.

For us, being interested in rational solutions in particular, there is some value in looking more deeply into what a Gröbner basis for the lexicographical order looks like:

Given a system of polynomial equations (of dimension 0; don’t worry about that, it’s a technicality), the Gröbner basis with respect to the lexicographical order is an equivalent set of polynomial equations (i.e. each solution of one is a solution of the other) that specifically has the form (f_1, f_2, \dots, f_n), with f_n being univariate in X_m, f_{n-1} being bivariate in X_m, X_{m-1}, etc.

In particular, given such a Gröbner basis, we can find the rational solutions by solving the last polynomial (which is univariate), taking each of its roots, partially evaluating the second to last polynomial to obtain another univariate polynomial, solving that, and so on. Note that this algorithm by itself might already be exponential: Nothing guarantees us that any given root will extend to a full solution, so we might end up backtracking through, and factoring, tons of univariate polynomials, but in practice that is not usually the problem.
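
As a toy example, the set \{X^2 - Y, Y^2 - 1\} happens to be such a lexicographical Gröbner basis (with X > Y). The last polynomial, Y^2 - 1, has the roots Y = \pm 1. Plugging Y = 1 into the first polynomial gives X^2 - 1, with rational roots X = \pm 1, while plugging in Y = -1 gives X^2 + 1, which has no rational roots, so that branch dies. The rational solutions are exactly (\pm 1, 1), and the dead branch is a small preview of the backtracking just mentioned.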

Finding Gröbner bases

What is the problem, though, is finding the Gröbner basis to begin with. In fact, this problem is so complicated and interesting that I decided to write an entire blog post about it, in case you haven’t noticed that I have not really talked about UOV so far, and won’t really mention the scheme other than as a motivating factor for all this until part 2.

So let’s finally look at the algorithm to find Gröbner bases, Buchberger’s algorithm (there are slight improvements on this algorithm called F4 and F5, but they don’t drastically change things). The algorithm is simple enough to write it out in its entirety here:

buchberger(set of polynomials S) -> set of polynomials
  repeat:
    for every pair (p1, p2) in S:
      let m1 be the leading monomial of p1
      let m2 be the leading monomial of p2
      let m be the lcm of m1 and m2
      compute p = (m/m1) p1 - (m/m2) p2
      reduce p by S with quotient and remainder
      if p != 0, add p to S
  until the loop adds no new polynomials
  reduce every element of S by S without it
  return S
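
To see one round of this loop on the toy basis from the previous section, take p_1 = X^2 - Y and p_2 = Y^2 - 1 in the lexicographical order with X > Y. The leading monomials are X^2 and Y^2, their lcm is m = X^2Y^2, so p = Y^2 p_1 - X^2 p_2 = X^2 - Y^3. Reducing by p_1 leaves Y - Y^3, and reducing that by p_2 leaves 0, so nothing is added for this pair, which is exactly why that set already was a Gröbner basis.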

This algorithm is a beauty! Notice that, the way it is written, it is not even clear that it terminates, much less what complexity it has. The “simplest” argument I know as to why it terminates goes something like this: If we look at the ideal generated by the leading monomials of S, then the element p we add in each loop iteration will either have a leading monomial that is not yet part of this ideal (otherwise it would have been reduced further), or it will reduce to 0 and not be added. So for each loop iteration, this ideal of leading monomials strictly grows. But polynomial rings are Noetherian, and as such, every strictly ascending chain of ideals eventually becomes stationary. At that point, the algorithm returns. While this proves that Buchberger’s algorithm always terminates, it does so by invoking one of the strangest properties of commutative algebra, and it does so without giving a constructive proof which could be used to determine the runtime complexity. One can compute the runtime complexity by doing things more carefully, and arrives at wonderful terms like O(d^{2^n}) for the degree of the polynomials in the eventual basis, so yeah, it might be a while, and even just writing down the basis is not trivially possible. In fact, in practice it is the space complexity of Buchberger’s algorithm which destroys all hope of ever getting a solution, far before the time complexity becomes relevant.

And yet, algebraic geometers are so desperate for Gröbner bases that they will continue running this algorithm for their current pet problem, based on nothing more than hope. And it turns out, that hope is not always misplaced. I mentioned monomial orders before, and it turns out that Gröbner basis computation depends very sensitively on which monomial order one chooses. A problem which will just run out of memory in one order might finish in seconds in another. You can also vary when you reduce which polynomial by which set, and come up with all sorts of fun distributed computation problems that will greatly influence the runtime of this algorithm. And what is nice is that once you have one Gröbner basis, you can quite efficiently compute other Gröbner bases of the same ideal, so nothing stops us from first computing the Gröbner basis in the more efficient graded lexicographical order, and then switching to the desired lexicographical order for cheap.

And this is the biggest weakness of multivariate cryptography. Where lattice cryptography has its nice worst case to average case reduction, multivariate cryptography has a very hard worst case problem, but an average case that is very finicky to control. So finicky, in fact, that authors will often just state how much RAM they spent on trying to find a Gröbner basis, since there is no better way of estimating the runtime.

Solving multivariate quadratic equations is NP-complete

Now to the last, and as promised independent, section of this blog post, showing that solving multivariate quadratic equations is NP-complete. We do this by reducing SAT, boolean satisfiability, to our problem. For conciseness, we only do this for multivariate polynomials over \mathbb{F}_2, the field with 2 elements, but showing it for every other field is not substantially different.

We do this by looking at one multivariate polynomial in particular: f(X, Y) = XY + 1.

Since our field has only two elements, we can quickly check what f evaluates to: f(0, 0) = 1, f(0, 1) = 1, f(1, 0) = 1, f(1, 1) = 0. In other words, if we associate 0 with false and 1 with true, f is equal to NAND. Famously, every boolean circuit can be expressed just by using NAND gates, and wiring up gates is the same as chaining function calls, which in turn is the same as substituting polynomials. This means every boolean expression equals one multivariate polynomial, with the degree bounded exponentially in the circuit depth. In order to get back to a quadratic system of polynomials, we introduce a new helper variable W for a product of two variables, say XY, appearing in monomials of degree higher than two, by adding the polynomial XY - W to our system of polynomials. This allows us to halve the degree of the polynomials again, and taken together, we get a polynomially bounded system of quadratic equations (and if we do both steps together, introducing the helper variables gate by gate, we never explode the degree of the polynomials in illegitimate ways). This reduces SAT to MVQ. On the other hand, given a possible solution, verifying it can be done simply by evaluating quadratic polynomials, an operation that is itself polynomial in time, so MVQ is in NP, making it NP-complete.
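
For example, the circuit NAND(NAND(X, Y), Z) substitutes to (XY + 1)Z + 1 = XYZ + Z + 1, a polynomial of degree three. Introducing a helper variable W for the product XY, together with the new equation XY - W = 0 (over \mathbb{F}_2, minus and plus are the same), lets us rewrite the circuit polynomial as the quadratic WZ + Z + 1, and asking whether the circuit can output true becomes asking whether the quadratic system \{WZ + Z, XY - W\} has a common zero.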

But remember, this only holds for generic systems of multivariate quadratic equations, and is no proof that UOV, which we will discuss in the next part, is even worst case hard, much less does it make any statement about the average case complexity.

Categories
Cryptography

HashML-DSA considered harmful

I already ranted about this topic in a section of a previous blog post (at the very end), but the topic keeps coming up, so I am escalating to a full blog post, since obviously that will help with all these people who are wrong on the internet (or, well, in standardization).

The Problem

Prehashing is a paradigm often used within the context of digital signature schemes. To understand where the problem is, let’s start with the normal definition of a signature scheme, as used by cryptographers, as a tuple of three functions (G, S, V), with the following mappings:

G\colon 0 \overset{R}{\rightarrow} \mathcal{K} \times \mathcal{P} (Key Generation)

S\colon \mathcal{K} \times \mathcal{M} \overset{R}{\rightarrow} \mathcal{S} (Signing algorithm)

V\colon \mathcal{P} \times \mathcal{M} \times \mathcal{S} \rightarrow \{\top, \bot\} (Verification algorithm)

In order to be secure, you’ll need some more stuff, like saying that a signature produced by S will verify when plugged into V with the same data, but that stuff is boring (not really boring) and Dan Boneh has already written it up in his book, so I’ll skip the details here.

As you can see, in the world of mathematics, where everything is perfect and wonderful, there are no hashes anywhere, so to understand what prehashing is about, we will unfortunately go a layer deeper, and pretend to implement these functions in made-up pseudo-code which happens to vaguely resemble C++.

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();
Signature sign(PrivateKey sk, Message m);
bool verify(PublicKey pk, Message m, Signature s);

While we now specify that messages and signatures are byte arrays, this API is still very much the same as the mathematical description of a signature algorithm, and no hash function is anywhere to be seen. But there is a subtle problem with this API, or at least something that can be a subtle problem: While PrivateKey, PublicKey, and Signature are all types that, for any given algorithm, are constant in the size they take in memory, (or at least bounded, looking at Falcon), the message can be literally anything, from a few bytes to theoretically petabytes worth of data. This is especially a problem if you want to hold the private key in a separate device, such as an HSM or a Cloud service, and cannot easily transmit or process large quantities of data.

We mainly care about signing here, since private keys are the ones that need protecting, so a first attempt could be to introduce a streamed version of sign():

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init(PrivateKey sk);
void sign_update(State* st, MessageChunk m);
Signature sign_finalize(State* st);

bool verify(PublicKey pk, Message m, Signature s);

There is still no hash to be found, and the state should probably be some object oriented thing, but we can now solve the problem: To sign a large piece of data, we call sign_init() to create a new stream, call sign_update() as often as we need to, updating our state, and call sign_finalize() to get the signature. While this is no longer the exact same set of methods as the mathematical definition had, we can see the init/update/finalize pattern as a way to implement a streaming interface.

And there still is no hash. And while this interface is nice and all, it means that our HSM and/or Cloud service now need to manage state, which while possible, is still a bit of a problem. But we can make one more, somewhat more radical change to our API, by delaying at which point the private key gets involved:

typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init();
void sign_update(State* st, MessageChunk m);
Signature sign_finalize(State* st, PrivateKey sk);

bool verify(PublicKey pk, Message m, Signature s);

Assuming our State object is again bounded in size, we now could do sign_init() and sign_update() directly where the data lives, and only have sign_finalize() happen on the HSM/Cloud Service, with the State object itself being transmitted. Note that we still need to trust the caller of sign_init() and sign_update(), since the remote side would no longer be able to tell what it is signing, but we only have to trust that caller to correctly compute the State, and do not need to expose the private key there.

And if we squint a bit, we can now see the hash: With the addition of just one tiny function, we can make it more explicit:

typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Hash;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init();
void sign_update(State* st, MessageChunk m);
Hash sign_finalize_local(State* st);
Signature sign_finalize_remote(Hash message_identifier, PrivateKey sk);

bool verify(PublicKey pk, Message m, Signature s);

Now we can nicely see that signing can be written as the composition of a hash function call, followed by the actual signature algorithm.
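
Purely for illustration, and not itself part of any of the APIs above, here is the glue function showing that the original one-shot sign() is exactly this composition:

Signature sign(PrivateKey sk, Message m) {
  State st = sign_init();
  sign_update(&st, m);
  Hash message_identifier = sign_finalize_local(&st);
  // Only this last call needs to happen where the key lives.
  return sign_finalize_remote(message_identifier, sk);
}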

So as long as we just keep our interface like this, we can use HSMs and Cloud services and all that good remote oracle stuff, without running into pesky data transmission problems sending over half a Petabyte.

The Problem with the Problem

This, however, does not readily work with every signature scheme. As stated, the signature scheme must decompose into a hash function call and a call to sign the output of that hash function, and not all signature schemes do. In fact, of the two signature schemes NIST just standardized, exactly zero decompose like that, in part because NIST actively encouraged having a property called “non-resignability”, which guarantees that the message identifier by itself, at least for messages with enough entropy to not be brute-forceable, is not enough to create a signature that verifies under a different, possibly attacker controlled key.

But people need the ability to use HSMs and remote oracles, and pushed NIST to add it to the schemes. So instead of standardizing two schemes, NIST actually standardized four: ML-DSA, SLH-DSA, HashML-DSA, and HashSLH-DSA, with the latter two being absent from the draft standard. And those four standards are mutually exclusive, have different properties, and are not very well described as separate algorithms. I will go more into HashML-DSA (and HashSLH-DSA) in the last section of this post (which, just as a teaser, is called “The Bad Place”), but, with the hopefully not as futile as it feels wish of preventing us from going there, present the in my opinion far better alternatives first.

The Solution (Part 1)

When introducing the whole problem, I somewhat intentionally named only one variable, the message_identifier, and presented the steps leading up to the requirement of decomposing the signature scheme into a hash function and the signing of the message identifier.

The main trick to get to this decomposition was moving the private key from init to finalize. Obviously, any one-roundtrip protocol will need the private key to be moved to finalize, since the signature cannot be computed before all message data is accumulated, forcing finalize to be the remote function call that uses the secret key as a side input.

But do we have to make sign_init() a method that is completely independent of the rest of the signature scheme? After all, we do have a very convenient public parameter that we could feed to the signing procedure from the very start:

typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Hash;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init(PublicKey pk);
void sign_update(State* st, MessageChunk m);
Hash sign_finalize_local(State* st);
Signature sign_finalize_remote(Hash message_identifier, PrivateKey sk);

bool verify(PublicKey pk, Message m, Signature s);

In other words, the message identifier is no longer independent of the key; it depends on the public key, but not the private key, so the private key can stay remote, while the resulting message identifier is still unique to our key pair. A very convenient property if, say, you wanted to make sure that the message identifier alone cannot be used to create a signature that verifies under a different public key, at least for messages with sufficient entropy.

And because it turns out to be such a convenient way to achieve this property, it turns out that ML-DSA, when it was still a little contestant going by Dilithium and trying to impress NIST, did exactly this.

And NIST thought that this was a good idea as well, so they left the following comment in Algorithm 7, line 6 of ML-DSA:

message representative that may optionally be computed in a different cryptographic module

And if you look at the rest of ML-DSA, things indeed only depend on this message identifier µ from there on out. You will also notice that tr is a hash of the public key, meaning that it is possible to reorder the program in such a way that µ is computed without any knowledge of the private key at all, and can be passed to the HSM/remote oracle (aka the “cryptographic module”, in NIST speak) after being computed in a different “cryptographic module”, such as a software library like BoringSSL.

In other words, and to be very explicit, here is how you could implement the above API for ML-DSA:

typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Hash;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init(PublicKey pk) {
  State st = SHAKE256_init();
  // Absorb tr, the 64 byte hash of the public key.
  SHAKE256_absorb(&st, SHAKE256_oneshot(pk.serialize(), 64));
  // Absorb the domain separator for pure (non-prehashed)
  // signing, followed by the length of the empty context string.
  SHAKE256_absorb(&st, "\0\0");
  return st;
}

void sign_update(State* st, MessageChunk m) {
  SHAKE256_absorb(st, m);
}

Hash sign_finalize_local(State* st) {
  return SHAKE256_squeeze(st, 64);
}

Signature sign_finalize_remote(Hash message_identifier, PrivateKey sk);

bool verify(PublicKey pk, Message m, Signature s);

Or more explicitly: You can prehash ML-DSA with the hash function SHAKE256(SHAKE256(pk, 64) || 0x00 || 0x00 || m), where pk is the serialized public key, and m is the message. Note that this hash function is for the case of an empty context string, but I leave figuring out the hash functions with context to my capable readers (hint: there are two 0x00’s here for a reason).

The Solution (Part 2)

I want to be very clear: The first part of the solution solves the situation for ML-DSA. Pretty much completely. In my opinion, part 1 of the solution should be the most preferred way for solving this problem whenever we encounter it in its pure form. It allows the signing infrastructure to make choices about the locality of data and private key without having to bother the verifiers with it, allowing for someone to change it later, without having to update every verifier in existence. It also gives us this nice non-resignability bonus property, for whatever that is worth.

There are a few other things to tie up, though. First, this does not work for SLH-DSA, which is a multipass signature scheme, and has to hash the input data multiple times, so even the first API refinement would have led to an unbounded state object. Second, you might want to have resignability, i.e. you might want to produce signatures for the same piece of data with multiple public keys at once, even if that makes some academics sad. And third, you might want to sign the actual petabytes I promised above, and find yourself in need of a tad bit more parallelism than a single output stream can provide.

Thankfully, all three scenarios actually have the same solution: You need a protocol. By protocol I mean any description of the larger context your signature scheme is used in, and not necessarily an interactive protocol of two or more online parties. And the best part is: you already have a protocol, since a signature scheme, by itself, is pretty useless, and needs some larger framework that defines what data in what format is signed by whom, how public keys are distributed and trusted, etc. You never deploy a signature scheme by itself, it is always part of a larger system. And with just a few tweaks to that larger system, you make all the rest of your prehashing woes go away.

For example, when defining what data is signed by your signature algorithm, you could decide to simply sign the hash of some data, instead of the data itself. You could even sign the root of a Merkle tree, if, say, you wanted to sign a petabyte of data using massively parallel data stores. You could sign the hash of your large data blob together with some unhashed metadata, which would give the remote side some auditing capabilities. The possibilities are endless. While I often blog about the rather minute details of primitives, it is actually this endlessness of possibilities in the protocol space of cryptographic systems that is by far the largest part of my day job. Using the protocol to define a system that does not have to transport immense amounts of data over constrained networks or to constrained devices is something that you will have to do in many instances of this problem, and in the simplest case this can be as easy as writing “The signer signs a hash of the data provided”. Yes, this means the data is hashed twice, which is a type of inefficiency, but these kinds of tradeoffs are commonplace in protocol design.

The Bad Place

This leaves us with the bad place, HashML-DSA and HashSLH-DSA. Those two signature schemes are, in my opinion, what happens when protocol questions get pushed into the primitive layer, leaving a situation that is either ill-defined or unnecessary. Note that even when pushing this protocol question into the primitive layer, you do not actually absolve yourself of the need to have a protocol defining the larger deployment of a signature scheme; you merely make the protocol more complicated to analyze, as the primitive is now diverging from the neatly defined building block it is supposed to be.

The way HashML-DSA and HashSLH-DSA are defined, there are two different ways of signing the hash of some data. Option 1 is to just sign this hash, using the method described in the previous paragraph, by declaring the hash to be the actual data. Option 2 is to use HashML-DSA, and invoke a special version of ML-DSA that flips a bit and adds an OID.

The verifier will need to know which option you used, since, different from the solution (part 1), the resulting signature will differ. So in reality, both scenarios are actually protocol decisions, and nothing has been gained by the use of HashML-DSA.

But since HashML-DSA/HashSLH-DSA are no longer actual signature schemes as defined by the mathematical definition above, but require another parameter, the hash used, as a side input, you now have the problem of how to communicate this side input to the verifier, with two basic options: Either you make it part of the public key, or equivalently the verification logic, in which case it is simply superfluous, since your verification logic could have just said to first hash the data and then verify the signature on the hash. Or you put the choice of hash with the data, and have it as an untrusted side input. The latter is the worst case scenario, because a signature scheme is not guaranteed to be secure if the payload describes how to verify itself. This is the problem with JWT, the problem behind countless security vulnerabilities in existence, and an extremely common mistake to make. The mathematical proofs used to show that a signature scheme does not permit forgeries simply do not apply when the signature scheme is not fully defined. In the case of HashML-DSA this, as far as I can tell, seems to mostly be a theoretical problem, as the hash OID should prevent forgeries even if a weak hash function has been used in the past, but the mere fact that it encourages sending information necessary to define the verification algorithm through an untrusted channel is a net negative, given how the whole thing is completely unnecessary in the first place. We would be best off ignoring the existence of the HashML-DSA/HashSLH-DSA variants completely, and instead operating on well defined signature schemes at all times.

Side note: A similar thing can be said about the context string; it too breaks the mathematical framework used and cuts through the abstraction layers of what a signature scheme does, but it is pretty easy to ignore or use correctly, so my visceral hatred for the HashML-DSA/HashSLH-DSA variants does not extend quite the same way here. In the end, both boil down to something Cas Cremers said, somewhat jokingly, about the misbinding issues with KEMs: “There should have been a precompetition, where the properties of the candidates for the competition are being set”. We should have clearly defined the APIs as used in practice from the beginning, including being clear about serialization (which causes the misbinding issues), implicit rejection (which causes even more misbinding issues), and the prehashing capabilities of signature schemes. That would have avoided the last minute introduction of HashML-DSA, likely caused the use of explicit rejection in KEMs, and made it unnecessary for papers like Unbindable Kemmy Schmidt to be written months before the standards were finalized.

Categories
Cryptography

Unbindable Kemmy Schmidt

If you have been terminally online on IETF mailing lists, you might have seen this thread, where, in extremely uncharacteristic fashion for the IETF, everybody just agreed to use only the seed used in key generation as ML-KEM’s private key, instead of the format defined by NIST, something allowed by the NIST standard, but not particularly encouraged. But even if you’re not in the habit of voluntarily reading standardization email lists, the reason behind this is a fun piece of cryptography, a good excuse to explain the Fujisaki-Okamoto transform, and yet another example of my favorite topic: properties primitives never promised to have, but people like to assume anyways.

Misbinding

As the title suggests, this has something to do with binding properties, so first, let’s go into what those are. My favorite example here are invisible salamanders, which Soatok just wrote an excellent blog post about, so read that and come back here if you need a refresher. Or read it because you want an excuse to look at cute blue dhole pictures, also always a valid reason to read Soatok’s blog.

For our purposes, invisible salamanders could be described as such: Can Eve craft a ciphertext that both Alice and Bob decrypt successfully, but with a different resulting plaintext? Or, to introduce the vocabulary we use to express this a bit more succinctly, does the AEAD ciphertext bind the plaintext?

More generally, for a misbinding attack we have two honest parties performing a successful cryptographic operation on some input variables, creating some output variables. These operations are either the same operation, or at least related, so one party might encrypt while the other decrypts, but importantly both parties arrive at the same set of variables, even if the roles of input and output get switched. We say a variable X binds a variable Y if, knowing that Alice’s value for X equals Bob’s value for X, we can prove that Alice’s value for Y has to equal Bob’s value for Y, even if other variables differ. The limitation to successful cryptographic operations is there to get rid of trivial attacks, since the whole point of these misbinding attacks is to confuse deputies, not to alert them of an ongoing attack.

Going back to the invisible salamander example with this theory, we can now neatly formalize it: The operation Alice and Bob are performing is AEAD-decrypt, which has two input variables, the key and the ciphertext, and one output variable, the plaintext. We are interested in whether the ciphertext binds the plaintext, which means that both Alice and Bob are given the same ciphertext, but potentially have different keys. In fact, since AEAD decryption is a deterministic algorithm, if our attacker wants to have any chance of accomplishing their task of producing different plaintexts, this third variable, the key, will indeed have to be different between our two honest participants. We can distinguish three different attacker models with respect to that key.

The first option is that the attacker does not know either key. We call such an attacker honest, and write HON-BIND-CT-PT (HONest Attacker, CipherText binds PlainText) for a primitive where the binding holds.

The second option is that the attacker is aware of the keys, but cannot control them, i.e. the keys were leaked to them. We write LEAK-BIND-CT-PT for an AEAD where no attacker can come up with a ciphertext that would decrypt to two different plaintexts, even if they know both keys. This is the invisible salamander scenario, so we could also say that AES-GCM is not LEAK-BIND-CT-PT. (Since cryptographers love consistency, in reality the property is called key commitment when talking about AEADs, otherwise it would be too easy.)

The third option, featuring the most powerful attacker, would be an attacker who does not only know the keys, but is allowed to choose them for Alice and Bob. In the AEAD scenario, the ability to choose the key is usually not all that interesting, since AEAD keys usually do not have much structure, but it will be crucial for the KEM misbinding attacks we actually want to discuss here.

With all that theory out of the way, we can now define the various misbinding terms that Cas and friends discuss in Keeping Up With The KEMs, the paper that started the whole tradition of naming things after TV show puns. The paper is a good read, so if you want more details, give it a try.

As a quick refresher, a KEM is a primitive consisting of three functions, and is essentially asymmetric encryption, but with the cryptographers choosing the message for you. KeyGen generates a private/public key pair, Encaps takes a public key and produces a shared secret and a ciphertext, and Decaps takes a private key and a ciphertext and recovers the shared secret.

For misbinding, similarly to the AEAD example, we can have Alice and Bob perform the Encaps or Decaps function, collect the input and output variables: PK (public key), SK (private key), CT (ciphertext), and K (shared secret), and ask which of these bind which, with which attacker model. As one slight complication, in order to properly compare the keys, we need to introduce a helper function that maps a private key to its corresponding public key, which I’ll call PrivateToPublic. The existence of this function is guaranteed for honestly generated keys, but it is not clear that such a function is well defined for malformed keys. In the case of ML-KEM, however, the public key is simply part of the expanded private key form of the NIST standard, so this is not a problem.

So what binding properties might one be interested in? Let’s get some trivial ones out of the way. Clearly the public key does not bind the ciphertext, since we can just encapsulate twice and get different ciphertexts. Somewhat more subtly, the ciphertext also does not bind either the shared secret or the public key, since the Kyber team went with implicit rejection: a decapsulation operation never fails, it just gives you some gibberish, so having the same ciphertext says nothing about the keys used, or about the resulting shared secret if those keys differ.

This leaves the question whether the shared secret binds the public key and/or the ciphertext, i.e. BIND-K-PK and BIND-K-CT, in our three attacker models: honest (knows only the public keys involved), leak (knows the private keys involved), and malicious (chooses the private keys involved).

The Fujisaki-Okamoto transform

The neat part about all the misbinding attacks we are going to discuss is that they do not rely on the lattice cryptography that underlies ML-KEM at all, just on a nifty transform used by pretty much all KEMs in the PQC competition, so everything I write here applies just as much to BIKE, HQC, Classic McEliece, etc. The attacks would even have applied to SIKE, were it not horribly broken along the way. All you need to look at to analyze the binding properties are the exact details of the variant of the Fujisaki-Okamoto transform used by the different schemes, without bothering with the actual hard problem they are based on.

The point of the Fujisaki-Okamoto transform is to take an asymmetric encryption scheme that is only secure if everyone behaves nicely (IND-CPA) and turn it into an encapsulation scheme where the attacker is allowed to adaptively choose ciphertexts, learning the decapsulation at each step, and is still not able to break things. Similar to the Fiat-Shamir transform for signatures, it takes one scheme that is much easier to construct and reason about, and transforms it into one that is much more useful, using the power of hash functions. We will also need a PRF, a pseudorandom function, also known as a keyed hash function, for the implicit rejection step.

So how does it work? As mentioned, we start with an asymmetric encryption scheme, i.e. three functions (CPA.KeyGen, CPA.Encrypt, CPA.Decrypt), and construct three functions (FO.KeyGen, FO.Encaps, FO.Decaps), based on those. For KeyGen, we don’t need to do much:

FO.KeyGen() {
  sk, pk = CPA.KeyGen()
  z = Rand(32)
  // keep a copy of pk in the private key, Decaps will need it
  fo-sk = (sk, pk, z)
  return fo-sk, pk
}

In other words, we just use the original KeyGen function, and only add a copy of the public key and some random numbers to the private key, which we will use later.

In Encaps, we can start to see the main trick of the transform, but it still looks fairly straightforward:

FO.Encaps(pk) {
  m = Rand(32)
  // H is a hash function that outputs
  // two 32 byte strings.
  k, r = H(m)
  // derandomized call of CPA.Encrypt
  ct = CPA.Encrypt(pk, m, r)
  return k, ct
}

Written out, we choose a random message, then use a hash function to turn this message into both our shared secret and some entropy, and then encrypt our original message using this entropy obtained from hashing the message itself. Note that you can stretch 32 bytes of randomness into as much randomness as necessary for the encryption algorithm, using a deterministic random bit generator as needed.

With that, we now come to the main course, the decapsulation function, where we use all this setup:

FO.Decaps(sk, ct) {
  // decrypt as normal
  m = CPA.Decrypt(sk, ct)
  // hash the recovered message as in Encaps
  k, r = H(m)
  // Perform the same operation the
  // encapsulating party should have
  // performed, using the copy of the
  // public key stored in the private key
  expected_ct = CPA.Encrypt(sk.pk, m, r)
  if (ct == expected_ct) {
    return k
  } else {
    // The encapsulating party was dishonest.
    // Instead of releasing the potentially
    // tainted shared secret to the caller,
    // return random gibberish depending
    // on the ciphertext
    return PRF(sk.z, ct)
  }
}

The encapsulating party, if they were honest, has included their entire state in the ciphertext, since we derandomized the encryption call based on the message itself. So the decapsulating party can rerun the encapsulation in its entirety, and if they detect any foul play, they discard the shared secret they recovered, and return something completely unrelated. This means, all the adaptive chosen ciphertext abilities of our attackers go nowhere. Sure they can manipulate the ciphertext, and carefully craft something that would leak information, usually whether a closely related ciphertext decrypts to the same or a different shared secret, but any modification of the ciphertext would just lead to a malformed ciphertext, which the decapsulating party then rejects.

We should briefly discuss why the decapsulating party, after discovering the manipulated ciphertext, does not simply fail, and instead does this dance with the PRF. The reason here is kind of strange: it just so happens that this construction is theoretically more secure, since it doesn’t leak the fact that we have detected the manipulation to the attacker. In practice however, this does not really matter, since you usually will want to use the output of a KEM in an AEAD, and the AEAD will very visibly fail if the key was not what was expected, so it’s a weird artifact of the specific incentives the KEM designers are under, not actually holistically the best choice.

Unbindable Kemmy Schmidt

Now, we can finally discuss what this is all about, and look into the misbinding attacks that Fujisaki-Okamoto transformed KEM schemes are and are not vulnerable to. We just need to correct a tiny oversimplification I made in the previous section, and look at the Fujisaki-Okamoto transform that is actually used for ML-KEM:

NIST’s definition of ML-KEM’s encapsulate method is almost the same as what I have written above, except that NIST has the random message m as an input instead of computing it directly, which does not matter, and that the hash function call that computes the shared secret and the randomness for the CPA encryption scheme also includes a hash of the public key, which will matter a lot. At first glance, this sounds like good news for resistance to misbinding attacks: We include the public key in the transform, so if someone were to change the public key, they would mess with the ciphertext, and the Fujisaki-Okamoto transform will catch them red handed. And indeed, that is true for honest or leaking attackers. The Fujisaki-Okamoto transform is incredibly robust, and unintentionally also defended us against a whole class of misbinding attacks!
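
In the pseudocode of the previous section, and glossing over plenty of details of the real standard, the change amounts to this:

MLKEM.Encaps(pk) {
  m = Rand(32)
  // The one structural difference: the hash of the
  // public key is mixed into the derivation of k and r.
  k, r = H(m || Hash(pk))
  ct = CPA.Encrypt(pk, m, r)
  return k, ct
}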

For the malicious attacker though, this very hash will become the Achilles heel of the implementation. And another peculiar incentive KEM designers were under is to blame for it. One way to judge a KEM is using microbenchmarks, i.e. doing lots of decapsulation calls in a row and checking how fast they are, independent of the larger context that they are used in. And if you want your microbenchmarks to look better, you’ll want to avoid unnecessary repetitions of cacheable operations. But you also don’t want to scare off people by caching everything and having a ridiculously sized private key, so you trade off what you cache and what you compute on the fly. In a realistic scenario, you would have a wire format for storing your key and an in-memory format for your expanded key with all cacheable elements precomputed, but in the competition microbenchmark scenario, you’ll want to keep things simple if you can, and use the same format for both. Long story short, what the Kyber team decided to do was to cache the hash of the public key in the private key. The hash is only 32 bytes large, so it doesn’t increase the size of the private key all that much, but it avoids a full hash function call on the kilobyte-sized public key, so it makes an ideal thing to include in the private key. The private key, for your normal IND-CCA2 purposes, is considered authentic, so storing random cached values in there is not a security problem.

But for the misbinding property, we have this whole malicious attacker concept, who is allowed to choose the private key! And that’s exactly what they do.

So we have an ML-KEM private key, which, if honestly formed, contains the CPA private key, the public key, the hash of the public key h, and the rejection secret z. To attack, Kemmy prepares two private/public key pairs, sk1/pk1 and sk2/pk2. We want Alice to encapsulate, so we ignore sk1 and set pk1 aside for her. For Bob, who we want to decapsulate, we take all components of sk2, except for the public key hash h2, which we replace with the hash h1 of pk1 instead, to obtain our manipulated sk2′. Kemmy then gives pk1 to Alice and lets her encapsulate a message, obtaining ct1 and the shared secret k. Since Kemmy knows the private key, she can recover the random message m that Alice used in the Fujisaki-Okamoto transform, and prepares a second ciphertext ct2 by almost following the rules:

Kemmy.Encaps(m, pk2, h1) {
  k, r = H(m || h1)
  ct = CPA.Encrypt(pk2, m, r)
  return k, ct
}

In other words, instead of using the hash of the actual public key when encapsulating, we hash the other public key instead. The ciphertext so obtained will differ from Alice’s ciphertext, since it was produced using pk2, so this attack will misbind both ciphertext and public key to our unsuspecting Bob. Speaking of whom, Bob’s decapsulation, using the manipulated private key sk2′, will go as follows:

FO.Decaps(sk2', ct) {
  // decrypt as normal, since we have
  // not manipulated the CPA private key
  m = CPA.Decrypt(sk2', ct)
  // Remember that sk2'.h equals to
  // the hash of pk1, so both Alice and Bob
  // arrive at the same k and r
  k, r = H(m || sk2'.h)
  // Same operation Kemmy did!
  // (sk2'.pk is simply pk2)
  expected_ct = CPA.Encrypt(sk2'.pk, m, r)
  if (ct == expected_ct) {
    // Since Bob repeated Kemmy's operations,
    // this case will be chosen
    return k
  } else {
    // Not executed
    return PRF(sk.z, ct) 
  }
}

So both Alice and Bob recovered the same shared secret, but they got there via completely different public keys and ciphertexts, because Kemmy messed with an internal cached value of the private key, showing that ML-KEM, when using expanded private keys, is neither MAL-BIND-K-CT nor MAL-BIND-K-PK.

But Kemmy’s exploits don’t stop here. There is a second, very similar attack that uses the other Fujisaki-Okamoto transform related parameter of the private key, the rejection secret z. This attack is even stranger, and only works if both Alice and Bob are decapsulating. We also need to use the same ciphertext for both of them, so we only aim for breaking MAL-BIND-K-PK. We again run this attack by manipulating the private keys. This time, we take the honestly generated private keys sk1 and sk2, and replace their rejection secrets with a shared one, i.e. we set sk1.z = sk2.z. For the ciphertext, we are not going to put in any effort, and just use some random string. A random string, with overwhelming probability, will fail the Fujisaki-Okamoto transform’s test, and move to the rejection case. So the public keys corresponding to Alice and Bob’s private keys do not matter at all; we discard any variable we’ve calculated with them, after all. What does matter, though, is the rejection secret z, which Kemmy replaced to be the same for both keys. Since it is the same value for z, and since we have given both Alice and Bob the same ciphertext ct, both of them will compute the same rejection “shared” secret, which is supposed to be gibberish, and not shared with anyone. But now, through our machinations, what was supposed to be a clever way to fail the decapsulation has actually become a shared value. So we found another way to break MAL-BIND-K-PK.
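
In the same pseudocode as above, with made-up helper names, the whole attack fits into a few lines:

// Kemmy copies one rejection secret into both otherwise
// honestly generated private keys...
sk1.z = sk2.z
// ...and hands Alice and Bob the same random ciphertext,
// which with overwhelming probability fails the
// re-encryption check for both of them:
ct = Rand(ciphertext_length)
// Both land in the implicit rejection branch, so
// FO.Decaps(sk1, ct) = PRF(sk1.z, ct)
//                    = PRF(sk2.z, ct) = FO.Decaps(sk2, ct)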

Mitigations

Both attacks rely on the private key being malformed in a way that no genuinely created private key ever would be. And it turns out, there is a simple trick to avoid any and all manipulations of the private key: Just use the seed that was used to generate the key pair as the private key. Whenever you need the private key, you can just call KeyGen again, and get the expanded private key from there. Somewhat unfortunately, the way the NIST standard is structured, the seed is (d, z), with z being the rejection secret of the second attack, so if we wanted to defend against the second attack as well, we would need to introduce another key derivation step that expands an even more fundamental seed g into both d and z, which, from what we can tell at this time, seems to not be FIPS compliant. But the second attack is also substantially weaker, requiring two decapsulating parties, so even just using (d, z) as the private key should mitigate the worst of things.
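
In pseudocode, and assuming a derandomized FO.KeyGen(seed) variant that deterministically expands the seed (which is how ML-KEM’s key generation actually works), seed-based storage looks like this:

SeedDecaps(seed, ct) {
  // Re-expand the key on every use. No stored component
  // can be inconsistent with the others, since everything
  // is recomputed from the same seed.
  fo-sk, pk = FO.KeyGen(seed)
  return FO.Decaps(fo-sk, ct)
}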

The best part is that we can prove, without ever using any of the internals of the CPA scheme, that any KEM that uses the Fujisaki-Okamoto transform and stores the private key as a seed has both the MAL-BIND-K-PK and MAL-BIND-K-CT properties, and a fortiori also LEAK-BIND-K-PK, LEAK-BIND-K-CT, HON-BIND-K-PK, and HON-BIND-K-CT, since those only allow for weaker attackers. In other words, Kemmy is the worst thing that can happen to a Fujisaki-Okamoto transformed KEM, and the transform actually prevents most misbinding attacks; in particular, it always prevents the much more powerful honest and leak attacks.

Conclusions

You might have noticed that I performed a small trick when introducing the misbinding attacks: The motivating example, invisible salamanders, does not actually apply to KEMs, but to AEADs. We know that invisible salamanders can be quite powerful in real world scenarios, but I completely omitted any motivating example where a malicious misbinding attack on a KEM actually matters. The reason for that is simply that I am not aware of any real life protocol, or even just a hypothetical protocol, where a malicious misbinding attack on a key encapsulation scheme actually matters. This is not to say that such a protocol does not exist, but we are talking about a scenario where the attacker does not only know the private key, but has intentionally manipulated it, and in the case of the more powerful attack, has manipulated it in a way that makes it impossible for an honest party to encapsulate to it. These attacks may well turn out to matter somewhere; it is just that the attacker model here is incredibly strong.

As another point of comparison, we should also look at classical KEMs, i.e. RSA-KEM and ECIES-KEM, and check which binding properties those KEMs have.

For RSA-KEM, which is just “take a random number between 0 and n, hash it for the shared secret, and raise it to the power e mod n for the ciphertext”, we can find a misbinding attack even for an honest attacker: Take a random number smaller than both moduli; the shared secret will then be the same, since it does not depend on the modulus, but the number, when raised to the power e modulo the two different moduli, will produce different ciphertexts. So right off the bat, HON-BIND-K-CT and HON-BIND-K-PK are violated.
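
As a sketch, in fake C++ with Random, Hash, and ModPow meaning the obvious things:

// Kemmy, knowing only pk1 and pk2, picks one message that
// is valid under both moduli:
BigInt msg = Random(0, min(pk1.n, pk2.n));
// The shared secret does not depend on the key at all...
ByteString shared_secret = Hash(msg);
// ...but the two ciphertexts will (almost certainly) differ:
Ciphertext ct1 = ModPow(msg, pk1.e, pk1.n);
Ciphertext ct2 = ModPow(msg, pk2.e, pk2.n);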

For ECIES-KEM, the situation is a bit more complex. ECIES-KEM uses a static-ephemeral key exchange to create a KEM, i.e. the public key is a static ECDH public key, the ciphertext is an ephemerally created, random ECDH public key, and the shared secret between those two public keys is used to compute the shared secret of the KEM. In order to be IND-CCA2 secure, the shared secret needs to also depend on the ephemeral public key, so ECIES-KEM needs to, at the very least, hash the shared secret and the ephemeral public key together. The reason for that is that elliptic curve public keys are not uniquely encoded, and can in particular differ in the sign of the y coordinate without changing the shared secret of an ECDH key agreement. So if the shared secret did not depend on the ephemeral public key, an attacker could query the oracle for the ephemeral public key with the sign of the y coordinate switched, and would learn the shared secret of the challenge ciphertext.

If we just want IND-CCA2 security, though, it is not necessary to include the static public key in the hash that produces the shared secret, and we could theoretically omit it. However, if we do so, an honest attacker could flip the sign of the public key, and produce an (extremely weak) HON-BIND-K-PK violation. If the version of ECIES-KEM used includes both static and ephemeral public keys in the key derivation function call creating the shared secret, this problem goes away, and ECIES-KEM becomes MAL-BIND-K-PK and MAL-BIND-K-CT. Still, the fact that the classical KEMs exhibit some weakness when it comes to misbinding, and that those weaknesses show up in the far weaker attacker models, is probably a good sign that misbinding properties for KEMs are not the worst attack vector imaginable, and far weaker than invisible salamanders when it comes to real life protocols.

But then again, using the seed, especially in the case of ML-KEM where the key generation function is incredibly cheap, is a very straightforward fix to the problem. By deciding to use it, we do not have to care whether or not misbinding attacks matter in practice, and as a neat side benefit we drastically reduce the size of the private key, so it is really nice to see the IETF pretty unanimously agreeing to do so. It is always better to reduce the number of sharp edges in cryptographic libraries, after all, and one attack automatically mitigated is one less attack to worry about.

Categories
Cryptography

PQC for non-cryptographers

Yesterday, Chandler asked about an overview of the new PQC algorithms, including hybrids, for non-cryptographers. And since I’m currently procrastinating writing something about TLS, I might as well write that overview first.

Unfortunately, I am a cryptographer, which means I easily forget that the average person probably only knows what AEADs, Key Agreements, and of course Digital Signatures do.

XKCD no. 2501 “Average Familiarity”

So, I will do my best, but please tell me where a bit more detail is needed.

Overview

NIST has standardized three new algorithms, going by the names ML-KEM (aka FIPS 203 aka Kyber), ML-DSA (aka FIPS 204 aka Dilithium) and SLH-DSA (aka FIPS 205 aka SPHINCS+). The first is a Key Encapsulation Method, the other two are digital signature schemes. One thing they all have in common is their comparatively much larger sizes for public keys, ciphertexts, and signatures.

Additionally, I will also discuss X-Wing, a hybrid Key Encapsulation Method combining both classical and PQ algorithms, which is currently making its way through the IETF.

Key Encapsulation

What is a KEM

We start off this overview of PQC algorithms with a new primitive, key encapsulation methods (KEMs). Technically, KEMs aren’t new; it just turns out that, while they are arguably the superior way of dealing with asymmetric encryption, they haven’t been used in very explicit ways.

So what is a key encapsulation method? Basically it is a primitive with the following interface, in some fake programming language vaguely resembling C++:

// Non-deterministic method
pair<PrivateKey, PublicKey> GenerateKeyPair();

// Non-deterministic method
pair<ByteString, Ciphertext> Encapsulate(PublicKey pk);

// Deterministic method
ByteString Decapsulate(PrivateKey sk, Ciphertext ct);

Here, you can see the close relation to asymmetric encryption; essentially a KEM is cryptographers saying that you shouldn’t actually decide what you want to encrypt, and instead that choice should be made for you. There is a good reason for that: Asymmetric cryptography is far slower than symmetric cryptography, so really you should only ever encrypt a symmetric encryption key anyways, and a KEM just always does this for you. There is another advantage to always encrypting random numbers: It makes designing a cryptographic primitive far easier, because now the cryptographer doesn’t have to account for people encrypting “The eagle has landed” over and over again, and expecting that to be secure.
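To illustrate how a KEM gets used in practice, here is a hedged sketch in the same fake C++; the AEAD functions and the nonce handling are placeholders, not a full protocol:

// Sender: encapsulate to the recipient's public key, then encrypt the
// actual message under the resulting shared secret.
ByteString shared_secret;
Ciphertext kem_ct;
std::tie(shared_secret, kem_ct) = Encapsulate(recipient_pk);
ByteString msg_ct = AeadSeal(shared_secret, nonce, "The eagle has landed");
// Transmit both kem_ct and msg_ct.

// Recipient: recover the same shared secret and decrypt.
ByteString same_secret = Decapsulate(recipient_sk, kem_ct);
ByteString msg = AeadOpen(same_secret, nonce, msg_ct);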

As an example, let’s look at RSA-KEM, a classical KEM that unfortunately never got the popularity it deserved:

pair<ByteString, Ciphertext> Encapsulate(PublicKey pk) {
  // Select a random number between zero and the modulus.
  BigInt msg = Random(0, pk.n);
  // Hash the random number to obtain the shared secret.
  ByteString shared_secret = Hash(msg);
  // Compute msg^e mod n.
  Ciphertext ct = ModPow(msg, pk.e, pk.n); 
  return make_pair(shared_secret, ct);
}

Pseudocode for RSA-KEM. This is an actually secure and standardized way of using RSA.
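For completeness, the matching decapsulation is just as simple; again a sketch rather than a full specification:

ByteString Decapsulate(PrivateKey sk, Ciphertext ct) {
  // Invert the RSA operation: ct^d mod n recovers the random number.
  BigInt msg = ModPow(ct, sk.d, sk.n);
  // Hash it again to obtain the same shared secret.
  return Hash(msg);
}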

Basically, this is RSA as you learned it in high school/college, without all that weird optimal padding stuff that OAEP has to add just to allow you to encrypt your “arbitrary message” which usually is just random numbers anyhow.

So now that we know how to build a KEM out of RSA, let’s quickly also do so with elliptic curves, so we can accurately compare to PQC algorithms. Actually, it’s not even so much elliptic curves, but any Diffie-Hellman like key agreement scheme. So again, in fake C++ inspired pseudocode:

pair<ByteString, Ciphertext> Encapsulate(PublicKey pk) {
  PrivateKey eph_sk;
  PublicKey eph_pk;
  // Create an ephemeral key pair.
  std::tie(eph_sk, eph_pk) = GenerateKeyPair();
  ByteString shared_secret = KeyAgreement(eph_sk, pk);
  return make_pair(shared_secret, eph_pk);
}

Pseudocode for ECIES-KEM. This is not a secure KEM as is, but the details to make it secure are not interesting here.

Basically, to use ECDH as a KEM we create an ephemeral key pair, use the ephemeral public key as the ciphertext, and use the key agreement to create a shared secret. Again we can see how it simplifies the algorithm to not let the caller choose what is encrypted, here by exploiting the fact that we cannot actually control which secret is agreed upon, just that it is random.
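The matching decapsulation is again a one-liner, with the same caveat as above that the details needed for actual security are omitted:

ByteString Decapsulate(PrivateKey sk, Ciphertext ct) {
  // The ciphertext is just the ephemeral public key, so running the
  // key agreement against it recovers the same shared secret.
  return KeyAgreement(sk, ct);
}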

KEMs do have one disadvantage over asymmetric encryption schemes: You can no longer encrypt the same message to multiple recipients, since each encapsulate call will create a new, random shared secret. There is a way to recover this ability, by using something called an mKEM, but that would go beyond the scope of this blog post and is not directly feasible with what NIST has standardized.

This section has gone on long enough without mentioning anything actually post-quantum, so it is high time that we change that, with the introduction of ML-KEM, née Kyber, into the mix. ML-KEM is, as the name implies, a KEM, and comes in three different sizes, called 512, 768, and 1024. Funnily enough, those numbers refer to some abstract mathematical thing (the lattice rank), and never show up in the code itself, so calling them 2, 3, and 4 (the module rank) would probably have been the better choice.

Algorithm          | Public Key Size | Ciphertext Size | Speed                         | Security Scaling
RSA-2048-KEM       | ~256 bytes      | 256 bytes       | Very fast encaps, slow decaps | subexponential*
ECIES-KEM (X25519) | 32 bytes        | 32 bytes        | Fast encaps and decaps        | exponential*
ML-KEM-768         | 1184 bytes      | 1088 bytes      | Fast encaps and decaps        | exponential
X-Wing             | 1216 bytes      | 1120 bytes      | Fast encaps and decaps        | exponential

Overview of different KEMs at the recommended (by me) parameter sets.
*: Unless a quantum computer is built, at which point it becomes polynomial

In this table, you can see an entry I haven’t mentioned so far, X-Wing. X-Wing is what is called a hybrid KEM, combining both X25519 and ML-KEM-768 into a single primitive. You can tell that by noticing that the public key and the ciphertext are just concatenated versions of the two schemes. The real magic of X-Wing happens in its key derivation function, where it combines the two shared secrets in such a way that the resulting primitive is a secure KEM as long as either X25519 or ML-KEM-768 are a secure KEM.
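The combiner is small enough to sketch here; this follows the shape of the construction in the X-Wing draft, but check the actual specification before implementing anything (XWingLabel is a fixed six byte string, drawn as ASCII art of an X-Wing):

ByteString CombinedSharedSecret(ByteString ss_mlkem, ByteString ss_x25519,
                                ByteString ct_x25519, ByteString pk_x25519) {
  // SHA3-256 over a fixed label, both shared secrets, and the X25519
  // ciphertext and static public key. Binding the X25519 values is
  // what lets the proof avoid hashing the much larger ML-KEM values.
  return SHA3_256(Concat(XWingLabel, ss_mlkem, ss_x25519,
                         ct_x25519, pk_x25519));
}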

All in all, the extra cost of the X25519 key agreement is likely so small that you will want to go with X-Wing for most practical applications.

Security Levels

You might have noticed that I only put ML-KEM-768 in the table above, and that X-Wing does not have any parameter choice at all, only being defined for X25519 and ML-KEM-768 as constituent parts. The reason for that is a bit of a longer rant on how security levels are not the most useful tool to describe the security of cryptographic schemes. There are two main arguments for this: First, looking at RSA-2048, we get a security level of around 112 bits, and subexponential scaling means that we have to more than double the key size in order to double the security level. 112 bits is somewhat low, but for all we can currently predict, RSA-2048 will be broken at right around the same time as RSA-4096 will be, namely by a cryptographically relevant quantum computer. In other words, there isn’t really a good reason to use a larger RSA modulus, in terms of security strength.

On the other side, ML-KEM-512 is believed to have 128 bits of security, but the bounds there are quite tight, and it is very possible that future improvements drop the security further. That has led to a sort of consensus among many cryptographers that it is best to use ML-KEM-768 by default, with this consensus being reflected in the choices X-Wing made as well (full disclosure: I am not an X-Wing coauthor, but have been asked by them for my opinion on parameter choices).

Security and public key/ciphertext size scale mostly linearly in ML-KEM’s security parameter, so ML-KEM-512 has an 800 byte public key and a 768 byte ciphertext, and ML-KEM-1024 has a 1568 byte public key and ciphertext. Performance scales quadratically in the security parameter, which makes using larger security levels a lot less painful than with RSA and elliptic curves, which scale worse than cubically. To finish this section on security levels, while there is mostly a consensus around ML-KEM-768 being the recommended choice, there is one important outlier here, with the NSA recommending ML-KEM-1024 for all clearance levels.

Implicit Rejection and Misbinding

There is a weird property that ML-KEM has that warrants its own subsection: implicit rejection. This deals with the question of what happens when an adversary manipulates a ciphertext instead of creating a fresh one, or constructs an invalid ciphertext. What is weird here is that ML-KEM’s decapsulation method will detect such a manipulation, but instead of failing, it will return a “shared” secret that is only known to a party with access to the private key. While this behavior seems weird at first, it mimics what RSA-KEM and ECIES-KEM do involuntarily in this case as well. Still, it is a strange behavior that has some even stranger side effects, and really only exists because of the specific metrics that the NIST competition incentivized, not because it is necessarily the best choice in practice.

One of the even stranger side effects is on ML-KEM’s binding properties, a property related to the invisible salamander properties some AEADs exhibit. Those were first discussed in a paper with the excellent title “Keeping up with the KEMs”, and I also had to chime in on the topic, and, trying to keep with the pun, wrote “Unbindable Kemmy Schmidt”. Compared to invisible salamanders, these attacks are far harder to imagine being exploitable in a real world setting, but there is an easy and FIPS compliant way to deal with the more powerful attack: using the 64 byte seed as the private key instead of the expanded version that ML-KEM also defines. This has led to libraries like BoringSSL making the seed the standard representation of the private key on the wire, which has the side benefit of being very small.
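In pseudocode, implicit rejection looks roughly like this; a simplified sketch of the Fujisaki-Okamoto style transform ML-KEM uses, not the exact FIPS 203 flow:

ByteString Decapsulate(PrivateKey sk, Ciphertext ct) {
  // Recover the encapsulated message and derive the candidate secret.
  ByteString msg = InnerDecrypt(sk, ct);
  ByteString shared_secret = DeriveSecret(msg, sk.pk);
  // Deterministically re-encrypt and compare against the ciphertext.
  if (InnerEncrypt(sk.pk, msg) != ct) {
    // Implicit rejection: instead of returning an error, return a
    // pseudorandom value derived from a secret rejection value z.
    return Hash(Concat(sk.z, ct));
  }
  return shared_secret;
}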

More on Performance

To conclude this section on KEMs, a few more words on performance. I already mentioned that performance scales quadratically in the security parameter, but another thing worth noting is what is needed to make ML-KEM fast in practice. There are two main parts to ML-KEM that influence its performance: Expanding a matrix from a seed using SHAKE (a SHA3 variant) and performing matrix-vector multiplication on this matrix. Different from both RSA and ECC, this is not done modulo a particularly large prime, with the prime used, 3329, comfortably fitting into a 16 bit integer, with 4 bits left to spare. Together, this means that to accelerate performance, you need to accelerate SHA3 and/or use vector instruction sets. If you want to get even more performance, the remaining somewhat costly part of ML-KEM is an NTT, i.e. a Fast Fourier Transform over a finite field, so a vector unit that can speed those up will likely also help ML-KEM.

Another important note on performance is the caching behavior of ML-KEM. As mentioned, one of the most expensive steps in ML-KEM is the expansion of a seed into a matrix. This matrix is part of the public key, and does not change between encapsulate or decapsulate calls as long as the same public key is used, so it can be cached if memory is plentiful. On the wire, the matrix is represented by just a 32 byte seed (which is the reason why the public key does not scale perfectly linearly); expanded, it becomes k^2 \cdot 256 \cdot 2 bytes in size, with k = 2, 3, 4 being the security parameter. So if you have 4.6 KB to spare, and you want to do multiple encapsulation or decapsulation operations with the same key, or you created an ephemeral key that you will use to decapsulate after a roundtrip, keep the fully expanded key in memory between operations and don’t unmarshal from the wire format every time.
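A minimal sketch of what such caching could look like; ExpandMatrix and EncapsulateWithMatrix are made-up names for illustration:

class CachedPublicKey {
 public:
  explicit CachedPublicKey(PublicKey pk)
      // Expand the 32 byte seed into the full matrix once, up front.
      : pk_(pk), matrix_(ExpandMatrix(pk.seed)) {}

  pair<ByteString, Ciphertext> Encapsulate() const {
    // Reuse the expanded matrix instead of re-running SHAKE each time.
    return EncapsulateWithMatrix(pk_, matrix_);
  }

 private:
  PublicKey pk_;
  Matrix matrix_;
};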

The performance of X-Wing is pretty much what you get when you have to run both X25519 and ML-KEM together. You can technically parallelize these calls right up to the key derivation step, but that will likely just result in race conditions and not have a major impact on performance, since both are decently fast, as far as asymmetric cryptography is concerned.

Digital Signature Schemes

With the detailed discussion of KEMs out of the way, we can get to digital signatures. On the primitive side, those are the same digital signatures you are probably familiar with, using the same interface as ECDSA or RSA signatures. For completeness’ sake, here is the interface:

// Non-deterministic method
pair<PrivateKey, PublicKey> GenerateKeyPair();

// Potentially non-deterministic method
Signature Sign(PrivateKey sk, ByteString msg);

// Deterministic method
bool Verify(PublicKey pk, ByteString msg, Signature sig);

API for Digital Signatures, in some weird pseudo C++

This makes the signature schemes NIST standardized far more of a drop-in replacement for classical schemes. Here they are in a comparison table:

Algorithm         | Public Key Size | Signature Size | Speed                              | Security Scaling
RSA-2048-PKCS1    | ~256 bytes      | 256 bytes      | Slow signing, very fast verifying  | subexponential*
ECDSA             | 32 bytes        | 64 bytes       | Fast signing and verifying         | exponential*
ML-DSA-65         | 1952 bytes      | 3309 bytes     | Fast signing and verifying         | exponential
SLH-DSA-SHA2-128s | 32 bytes        | 7856 bytes     | Slow signing, decent verifying     | exponential

Overview of different Digital Signature Schemes at the recommended (by me) parameter sets.
*: Unless a quantum computer is built, at which point it becomes polynomial

Again, we can see that the PQC algorithms, in particular ML-DSA, are comparable in speed, but have far larger overhead. At first, SLH-DSA’s public key size seems like an exception, but if you want a small public key for a signature scheme, you can always just hash the public key and attach the unhashed public key to the signature, in which case ML-DSA would take the lead again. The signature size problem is in fact so large that it poses a significant problem for TLS, necessitating the blog post I’m procrastinating on.
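A sketch of this hash-the-public-key trick; the names are made up, and the point is just that verification now checks the hash first:

// Only the 32 byte hash of the real public key gets distributed.
ByteString pk_hash = Hash(Encode(pk));

// The transmitted signature carries the real public key alongside it.
struct WrappedSignature {
  PublicKey pk;
  Signature sig;
};

bool VerifyWrapped(ByteString pk_hash, ByteString msg, WrappedSignature ws) {
  // Check the attached public key against the distributed hash, then
  // verify the signature under that public key.
  return Hash(Encode(ws.pk)) == pk_hash && Verify(ws.pk, msg, ws.sig);
}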

ML-DSA again comes in three different flavors, ML-DSA-44, ML-DSA-65, and ML-DSA-87. The security parameter here is not a two digit number, but two single digit integers written without a space in between them. Since confusing lay people seems to have been an explicit security goal here, the authors went with the parameters that actually show up in the code (aka the module ranks), instead of ML-KEM’s theory-only lattice rank, so (4, 4), (6, 5), and (8, 7) correspond to ML-KEM’s 2, 3, and 4.

In order to compete on the confusion metric, SLH-DSA comes in way too many flavors. It mainly requires a hash function, with NIST allowing both SHA2 and SHA3/SHAKE as possible choices, in three different sizes each. There are also variants for faster signing at the cost of even larger signatures, and the prehashing thing they added for some reason, which I’ll rant about below, totaling 24 different versions of SLH-DSA, of which maybe 2 are useful in practice as far as I can see.

Security Discussion

I again made the choice of parameters in the table for you, but in this case ML-DSA-44 is not quite as tightly scraping by the security level, making it somewhat more acceptable. The overhead again scales linearly in the security parameter, with ML-DSA-44 having 1312 byte public keys and 2420 byte signatures, and ML-DSA-87 having 2592 byte public keys and 4627 byte signatures. The NSA again requires ML-DSA-87 for all applications (and sadly dislikes SLH-DSA).

SLH-DSA’s security deserves some special discussion. You might have noticed that I recommended the smallest parameter choice here, and that is not just due to it otherwise being pretty much intolerable, with 16 KB to 50 KB signatures at the higher/faster settings. SLH-DSA has a proof that reduces its security to that of the underlying hash function, that is, as long as the hash function is secure, SLH-DSA based on it is secure. Since hash functions are by now very well understood, we can be more aggressive with the parameter choice and use the same 128 bit security level we also use for our current elliptic curve based cryptography.

This fact also makes SLH-DSA uniquely suited as a replacement for use cases where the public key cannot be rotated easily, in particular signatures used to verify the integrity of firmware updates for secure boot. It also helps that all you need to implement SLH-DSA is an implementation of the hash function (which you then proceed to call an ungodly number of times), which makes SLH-DSA uniquely well suited for hardware applications.

For all other use cases, ML-DSA wins on both overhead and performance, so anything that can be updated to account for the unlikely event of potential mathematical breakthroughs is probably better served using ML-DSA.

Performance Discussion

As mentioned, SLH-DSA uses hash function calls. A lot of hash function calls. Accelerating those is probably a good idea if you want to use SLH-DSA.

SLH-DSA has another curious security property: Its security depends on how many signatures have been created with the scheme. The NIST parameter sets are secure as long as “only” 2^{64} signatures are created. This restriction does not apply to ML-DSA, which provably does not degrade in security as more signatures are produced. This property allows for another potential trade-off when it is known that fewer signatures will actually be produced over the lifetime of a key, again something that makes SLH-DSA better suited for firmware signatures. In a way, this can be seen as a sliding scale of statefulness of the signature scheme: stateful hash-based signatures on one end, requiring exact bookkeeping of the signatures that have been produced; keeping a rough count of produced signatures in the modified parameter sets; and SLH-DSA not requiring any such count, unless you can reach 2^{64} signatures per public key in your application (not strictly speaking impossible, but even at Google’s scale very, very unlikely to happen in a practical design). NIST has promised to write an SP about the modified parameter sets, which will help with the security/performance tradeoff even further.

With this, we can move on to discuss ML-DSA. The algorithm used there is closely related to the one used in ML-KEM, so the discussion there applies here as well, with SHA3, vector instruction sets, and the NTT being the biggest things to accelerate. The prime used here, 8,380,417, is slightly larger, but still comfortably fits in a 32 bit integer. This larger prime also makes the cached matrix a lot larger, k \cdot l \cdot 256 \cdot 4 bytes (notice the 4 instead of 2, to account for 32 bit integers instead of 16 bit integers), which means that ML-DSA-65 will need 30 KB of memory to cache its matrix when signing/verifying with the same key multiple times (16 KB for ML-DSA-44, 57 KB for ML-DSA-87).

Another quirk of ML-DSA, important for hardware implementations in particular, is that the raw signing procedure is not always successful, and has to be repeated until a signature can be obtained. This means the implementation technically requires a while loop, although one can alternatively also just calculate an upper bound at which the probability of not being able to find a signature becomes vanishingly small, which NIST has done and added as an allowed alternative to the standard.
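Structurally, signing then looks something like the following sketch, where TrySign stands in for one raw signing attempt:

Signature Sign(PrivateKey sk, ByteString msg) {
  for (int attempt = 0; ; ++attempt) {
    // Each attempt derives fresh candidate randomness from the counter.
    optional<Signature> sig = TrySign(sk, msg, attempt);
    if (sig.has_value()) {
      return sig.value();
    }
    // Most runs succeed after a handful of iterations; alternatively,
    // one may cap the loop at a bound where the failure probability
    // becomes vanishingly small, as NIST allows.
  }
}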

Randomized Signing

Both ML-DSA and SLH-DSA can create signatures in a randomized or a deterministic fashion. Classically, this has been a discussion in the case of ECDSA, with the argument in favor of deterministic signatures being that a signer with insufficient entropy can still produce secure signatures. The argument in favor of randomized signatures is that they are more robust against certain types of fault injection attacks, but all in all, neither argument has been particularly convincing to me. Not having entropy available basically destroys all ability to do secure cryptography, so salvaging one part of one primitive is rarely going to make a difference. On the other hand, fault attacks are essentially asking whether an entirely different algorithm that just happens to share some instructions with the original is secure, which usually isn’t the case.

Thankfully, in the case of PQ signatures, we do not have to choose. The randomized variant in both cases devolves seamlessly into the deterministic variant if no entropy is available, so always using the randomized option is the more secure choice, and has no performance implications. If one knows that no entropy is available, the deterministic algorithm can be used to be more honest about the situation, but really randomized should be the default choice.
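A sketch of why this degrades gracefully; the rnd handling here is hedged pseudocode, not the exact FIPS wording:

Signature Sign(PrivateKey sk, ByteString msg) {
  // With entropy: a fresh 32 byte random value. Without entropy: a
  // fixed value (all zeroes in the case of ML-DSA), which recovers the
  // deterministic variant. Security does not depend on rnd being random.
  ByteString rnd = EntropyAvailable() ? Random(32) : ZeroBytes(32);
  return SignInternal(sk, msg, rnd);
}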

There are some exotic protocols that require signatures to be deterministic, so use the deterministic version in those cases as well, but you are unlikely to run into those by accident.

Hybrid Signatures

Similar to KEMs, signatures can also be hybridized. Where KEMs needed some borderline magical key derivation functions to achieve the “secure if either is secure” property, for signatures that can be done with the somewhat simpler “logical and” between the two verification calls. There are some pending RFC drafts for this as well, but the necessity here is not as pressing. On the ML-DSA side, if you are able to rotate your public key, then given that there are no secrets to leak, just forgeries to avoid, you can simply switch the PQ scheme once weaknesses become known. For SLH-DSA, this argument gets even stronger, since all signature algorithms, including all classical signature algorithms, are only secure if the hash function used in the scheme is secure, so combining SLH-DSA with another scheme is not really useful.
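A sketch of the “logical and” combiner; both component signatures are carried next to each other, and both must verify:

bool HybridVerify(PublicKey classical_pk, PublicKey pq_pk, ByteString msg,
                  Signature classical_sig, Signature pq_sig) {
  // The hybrid signature is valid only if both component signatures
  // are, so a forgery requires breaking both schemes.
  return VerifyClassical(classical_pk, msg, classical_sig) &&
         VerifyPQ(pq_pk, msg, pq_sig);
}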

That being said, a classical ECDSA/EdDSA or even RSA signature is so tiny in comparison that adding it to the ML-DSA signature is not going to change things much, so you might as well, it does protect you against mathematical advances that happen outside of the public view.

Prehashing, a Rant

I very briefly mentioned prehashing above, as a reason why SLH-DSA has another factor of 2 more variants. Prehashing technically also exists for ML-DSA, where it makes even less sense, but I’m getting ahead of myself.

As you might be able to tell, I don’t have the highest opinion of prehashing, but let’s start at the beginning:

In the case of RSA-PKCS1, RSA-PSS, and ECDSA, the very first operation the algorithm does is to hash the message down to a more manageable size. This can be used by HSMs and remote signing oracles, by computing the hash outside of the HSM, or locally, and only transmitting the hash of the message to be signed. However, doing this has some implications for properties beyond forgeability resistance, such as the public key recovery I discussed in the previous blog post, and more importantly something called non-resignability, which NIST encouraged schemes to achieve. This led to both ML-DSA and SLH-DSA departing from the simple hash-as-a-first-step paradigm, and instead mixing the message into the scheme in a more complicated fashion. In the case of ML-DSA, this “more complicated” fashion is just a hash of the public key together with the message, which can still be computed outside of the HSM/remote signing oracle, but in the case of SLH-DSA even that is not possible.

FIPS 204 explicitly allows for ML-DSA prehashing the message by computing this first hash in a different place, and in my opinion that is the way prehashing should be accomplished there.

For SLH-DSA, this isn’t possible. In my opinion, that simply means that it should not be done, and that instead any protocol (where a protocol is not necessarily interactive) that requires the message size to be small should specify the hashing of the message on the protocol level, instead of trying to force a primitive, which intentionally does not support such an operation, to support it anyway. Unfortunately, this is exactly what NIST has done, technically for both SLH-DSA and ML-DSA, with ML-DSA also allowing for the actual prehashing as described above. What this means is that there are two versions for each signature scheme and parameter set, obtained by either prefixing 0x00 to the message, or prefixing 0x01, a hash OID, and the hash of the message, before feeding the so manipulated input into the underlying signature scheme, thus essentially defining a protocol on the primitive level. You should not use the same key for both prehashed and not-prehashed messages, mostly because the primitive otherwise stops being a well-defined signature scheme (i.e. the API defined above would have to change to allow another input when signing/verifying). And if you fix whether a key is always used prehashed or always used non-prehashed, you have just defined a protocol, and adding this step to the primitive was unnecessary.
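In pseudocode, the two variants amount to the following input mangling; this is a simplification that leaves out details like the context string the standards also prepend:

// "Pure" variant: the message is signed with a 0x00 domain separator.
ByteString PureInput(ByteString msg) {
  return Concat(Byte(0x00), msg);
}

// "Prehash" variant: a 0x01 domain separator, the OID identifying the
// hash function, and the hash of the message get signed instead.
ByteString PrehashInput(ByteString hash_oid, ByteString msg) {
  return Concat(Byte(0x01), hash_oid, Hash(msg));
}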

Anyways, it’s a thing the standard allows for, but ideally we should all just collectively ignore that this option exists, always use the non-prehashed variants (with the exception of the true prehashing of ML-DSA), and just define our protocols to correctly work with our use cases.