Categories: Cryptography, Cryptography (Incomprehensible)

There is no Diffie-Hellman but Elliptic Curve Diffie-Hellman

When I first learned about Diffie-Hellman and especially elliptic curve Diffie-Hellman, I had one rather obvious question: Why elliptic curves? Why use this strange group that seems rather arbitrary, with its group law of taking the third intersection with a line and then reflecting? Why not use, say, the Monster Group? Surely a monster is better equipped to guard your secrets than some curve thing named after, but legally distinct from, a completely different curve thing!

I wouldn’t have a satisfying answer to this question until way into my graduate studies, and the answer makes a perfect blog post for the “Cryptography (Incomprehensible)” category of this blog, so here we go.

Groups

First, as a recap, what do we mean by Diffie-Hellman? Well, we need a group G, and an element g \in G of this group with some order p, so p is the smallest positive integer with g^p = e, where e \in G is the neutral element of the group. Then we take our private key s \in \mathbb{Z}/p\mathbb{Z} and compute our public key as P = g^s. We can now compute a shared secret with some other public key Q = g^r as K = Q^s = g^{rs} = g^{sr} = P^r.
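To make this concrete, here is a toy sketch in Python, using a small prime-order subgroup of the integers mod 23 as a stand-in for G; all numbers are illustrative choices of mine and far too small to be secure.

```python
# Toy Diffie-Hellman in a stand-in group: the subgroup of order 11
# inside (Z/23Z)*. All parameters are toy values, not secure choices.
q = 23                  # modulus of our stand-in group
g = 2                   # generator of a subgroup of prime order p
p = 11                  # order of g: pow(2, 11, 23) == 1

s = 7                   # Alice's private key, an element of Z/pZ
r = 4                   # Bob's private key
P = pow(g, s, q)        # Alice's public key P = g^s
Q = pow(g, r, q)        # Bob's public key Q = g^r

K_alice = pow(Q, s, q)  # K = Q^s = g^(rs)
K_bob = pow(P, r, q)    # K = P^r = g^(sr)
assert K_alice == K_bob
```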

This should work for any group G, no? So why doesn’t it work for the monster? Let’s take a closer look at the map that computes public keys from private keys (in fact, the rest of this blog post will only focus on that map; the whole key exchange thing doesn’t really matter much, you could also compute Schnorr signatures or whatever). Given any group element g \in G, we can always create the exponentiation map \mathbb{Z} \to G\colon k \mapsto g^k. This map is a group homomorphism and is not necessarily injective. In fact, for finite groups, it is never injective. There is an old trick in algebra where you can make any group homomorphism injective, by dividing the domain by its kernel. The kernel in question here is the ideal (p) of all multiples of our order p, so setting our key space to \mathbb{Z}/p\mathbb{Z} is a sensible first step with no drawbacks; you don’t really get new keys by allowing all integers. The map is also not necessarily surjective, though. In fact, the image of the map is exactly \langle g \rangle, the subgroup of G consisting of all powers of the generator g (this is why we call g the generator, it generates the whole thing).

So we can somewhat simplify our private-to-public map by looking at \mathbb{Z}/p\mathbb{Z} \to \langle g \rangle \colon k \mapsto g^k. With that, our private-to-public key map is not just a group homomorphism, but also injective and surjective, in other words, a group isomorphism.

A tangent on finite simple groups

This gives us the first answer to “why not the monster group?”, although it is an answer that leads to even more questions. To get to that answer we have to ask an even deeper question, namely, what groups even are there? We want to work on a computer, so finite would be nice, and we have heard of small subgroup attacks, so something without normal subgroups, please. (If you don’t know what a normal subgroup is, it’s things you can divide out of a group). A finite simple group, in other words.

One of the most impressive results in 20th century mathematics has to be the classification of finite simple groups: the (not quite) ultimate answer to the question of “what groups even are there?”

To classify groups, we need to figure out what we mean by two groups being the same. Clearly, the numbers on a clock and \mathbb{Z}/12\mathbb{Z} are the same thing, even if the clock uses Roman numerals. So simple relabeling of group elements should not count as a separate thing. In other words, we want to classify finite simple groups up to isomorphism.

The answer to this classification problem is absolutely mind-boggling: First, there are infinite families of groups. For example C_p, the cyclic groups of prime order. Another such family are the alternating groups, from A_5 onwards. There are 16 more such infinite families, the others being said to be of “Lie type”. And then there are 26 groups that just, sort of, hang out by themselves. They are not part of an infinite family, they just exist, and there is no way to make them bigger or anything. The monster group mentioned above is the largest of them, and I totally did not choose it in the intro because it has a cool name and lets me go on a tangent about sporadic groups.

But with this classification out of the way, we can now spot a problem (we could have spotted the problem before stating the classification result, but come on, it’s such a cool result!). Namely that, in order to even answer the question “what groups are there”, we had to say that we want to look at groups up to isomorphism. But our private-to-public map is a group isomorphism! To a group theorist, the idea that \mathbb{Z}/p\mathbb{Z} somehow differs from \langle g \rangle is preposterous; they both clearly are just C_p, the cyclic group of order p!

Categories

So, uhm, by asking the innocent question of why we use elliptic curves instead of any other group for Diffie-Hellman, we have kinda accidentally blown a wide hole in the entire concept of Diffie-Hellman. If the private key space and the public key space are one and the same, then how does Diffie-Hellman even exist? How can it be secure, if we cannot even distinguish private keys from public keys?

Well, we are already talking to the group theorists, and are having an existential crisis of sorts, so why not just give in and let them pull us deeper, into the realm of the category theorists? Things can’t possibly get worse, can they?

A category is a class of objects, such that for any two objects X, Y there is a class Hom(X, Y) of homomorphisms between X and Y. Homomorphisms are just that, elements of this class. We are generous enough to require there to be an identity homomorphism in Hom(X, X) and a composition operation, where the existence of f \in Hom(X, Y) and g \in Hom(Y, Z) gives you a homomorphism h = g \circ f, but that’s about it. In particular, we do not even necessarily require our objects to be sets. We definitely do not require our homomorphisms to be functions. In fact, just so that nobody gets any stupid ideas, we often don’t even call them homomorphisms, we call them arrows. A category is really more like a directed graph. A graph that has so many vertices that they do not necessarily fit in a set any more, and where two vertices might have so many edges between them that they don’t fit either. This is why we said class above; a class is a collection of things that is potentially larger than any set. For example, there is a class of all sets, but there cannot be a set of all sets, because nobody knows how to shave them or something. In fact, the category of sets, written as SET, is an example of a category. The objects are sets, and the homomorphisms are the functions between those sets.

In order to not fully drift off into space, we will only look at locally small categories. Locally small categories are categories where the homomorphisms actually do form a set. Which makes life a whole lot simpler, and pretty much encompasses everything we want to know about anyways.

So aside from SET, what are examples of categories? Why does this kinda thin definition turn out to be so useful that it spawned an entire field of mathematics, with so much abstract nonsense? Well, it turns out, pretty much everything can be seen as a category. Graphs? Category. Groups? Category. Rings? Category. Fields? Category. Topological spaces? Category. Analytic varieties? Category. Algebraic varieties? You might have guessed it, also a category. Vector spaces? Category. Modules? Category.

It turns out, if all you require from things is that they somehow have some homomorphisms between them that vaguely act like functions, you can pull in pretty much every mathematical object known to humanity. But is it useful? We didn’t even define our objects to be sets, yet all the examples I gave above are sets. Surely something more concrete than an absurdly infinite directed graph would be better? It turns out, categories are immensely useful. You do have to redefine terms in somewhat awkward ways, but once you have done that, you can make statements that are true for fields, rings, groups, vector spaces, topological spaces, etc. all at once. For example, we can no longer define a kernel as all the elements mapping to zero, because remember, objects aren’t necessarily sets, so they do not necessarily have elements. Instead we have to say fancy words like “the kernel of a homomorphism f in a locally small category with zero homomorphisms is a homomorphism k that composes with f to zero, and is universal with that property”.

We won’t quite need that full apparatus today (in particular, we won’t need universal properties), but we do need the slightly simpler concept of initial and terminal objects. An initial object I \in \mathcal{C} is an object such that for any X \in \mathcal{C}, Hom(I, X) has exactly one element. A terminal object T \in \mathcal{C} is the dual notion, i.e. for every X \in \mathcal{C} we have exactly one element in Hom(X, T). If an object is both initial and terminal, we call it a zero object, and the unique homomorphism it gives rise to for a given X the zero homomorphism.

Examples are important for this, so let’s first look at SET. The initial object here is the empty set, since there is a unique function (the empty function) from the empty set to any given set. The empty set, however, is not the terminal object, as there are no functions into the empty set from a non-empty set. Instead, the singleton set with exactly one element takes the place of the terminal object. When we look at GROUP, we see an example of a category with a zero object, the trivial group consisting of only the neutral element. There is exactly one group homomorphism from the trivial group to any given group, since group homomorphisms have to map neutral elements to neutral elements, and there is exactly one group homomorphism from any given group to the trivial group, given by mapping everything to the neutral element. To tie off our examples, a slightly more interesting case is the category RING of (commutative) rings. Our terminal object is again the trivial ring, consisting only of the element 0, but it cannot be the initial object, since a ring homomorphism needs to map 1 to 1, and the trivial ring, having only one element, cannot map that element to both 0 and 1. Instead, the initial object of the category of rings is the ring of integers \mathbb{Z}. The most generic ring there is.

Group Objects

What were we talking about again? Oh right, Diffie-Hellman. So with the language of category theory under our belt, we can now point exactly to why “any group should work” is not working. As long as we are only looking at groups, we have to look at groups up to isomorphism (otherwise relabeling things would count as a different thing), but our private-to-public key map is a group isomorphism. We are not looking at the right category. Diffie-Hellman does not work with groups.

We have to go deeper.

Enter group objects. I love this topic. So much in fact that I’ve written an entire 90 page rambling novel about it once and then people said I am now allowed to call myself a doctor.

The relevant part is in chapter 3, wedged in there between copious amounts of abstract nonsense.

To go through it in a bit more detail, we have some category \mathcal{C} that we use as our base. We want to make some object G in that category into a group. But remember, this is category theory, so G is not necessarily a set. I mean, it will be a set in any practical situation, but we can pretend that we don’t know that, for some really, really cool math. And you should always pretend that you don’t know something when it leads to some really, really cool math. So the only thing we have is homomorphisms. In the cropped away parts of the chapter, we did narrow things down to locally small categories that have a terminal object, and named that terminal object 0. So, if we don’t have elements, how can we describe the group axioms? Well the first one, associativity, is about multiplication. Multiplication is a map, mapping two elements of the group to one element, so we can just say that we require that map to be a homomorphism in Hom(G \times G, G) (Do not ask what G\times G is if G is not a set. It’s something about universal properties, and I promised there would be no universal properties). To define associativity, we need that (G \times G) \times G is the same as G \times (G \times G). This is due to the universal property thing that we do not talk about, but it gives us this canonical isomorphism, i.e. those two constructions are the same, and they are the same in some very obvious way. Now associativity is just saying that the multiplication morphism, applied twice in one order, is the same as the multiplication morphism, applied twice, in some other order, what you see in the screenshot as the first property. Similarly, commutativity is also just a statement on the multiplication morphism, with yet another canonical morphism, explaining the fourth, optional property.

Way more interesting and cooler is the second property, which defines the neutral element. So how does one define a neutral element, when you try to pretend very hard that you do not have any elements? This is where our terminal object comes in. We designate an element e \in Hom(0, G) and call it the neutral morphism. Note that 0 is the terminal object, not the initial object, so e is not necessarily unique. In fact, if we take the category of sets as our base (which will give us back the groups we know and love), the terminal object, as discussed above, is the singleton set. A map from the singleton set into a set can be seen as pointing at one specific element of our target set. To put it differently, an element of Hom(0, G) in the category of sets is just an element of G. Now we need to describe what makes the neutral element, well, neutral. It should interact with our multiplication morphism in some way. We now perform the most miraculous part of our definition. We want to define the neutral property, i.e. that x \cdot e = x, but without mentioning x, and instead phrase it as a series of maps. Let’s stay with x for a bit, though, to come up with a battle plan. Since we want to talk about x \cdot e, we want to make a statement about m(x, e), using the prefix notation instead of the more familiar infix. So we need to construct (x, e), ideally starting from x, so we can have a nice equality in the end. So starting from a single element, we want two. There is a way to do this, by using the diagonal morphism G \to G\times G \colon x \mapsto (x, x). That’s not quite (x, e), but it at least has two components. The first component we can leave unchanged, in other words, we apply the identity to it. But how do we get rid of the second component? Well, we can map it to our terminal object 0. There is exactly one way to do that. Now that we are in 0, we can apply our neutral morphism e to get back to G.
If we write 0 for the only element of our terminal set 0, and write e for the neutral element in our group, what we can do is a chain of maps like this: G \to G \times G \to G \times 0 \to G \times G \colon x \mapsto (x, x) \mapsto (x, 0) \mapsto (x, e). Now we can apply m and expect to end up where we started. In other words, m \circ (id_G \times (e \circ \tau)) \circ \Delta = id_G, where \tau is the unique morphism G \to 0 and \Delta is the diagonal morphism. And suddenly, we have described what a neutral element does, without having, you know, elements.
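To convince ourselves that this chain of maps really does encode the neutral law, we can play it through in Python for the concrete group Z/5Z in SET; the function names below (bang, diag, and so on) are my own ad-hoc modelling for this sketch, not standard terminology.

```python
# Following the chain G -> GxG -> Gx0 -> GxG -> G for G = Z/5Z in SET,
# and checking that the composite is the identity on G.
n = 5
m = lambda pair: (pair[0] + pair[1]) % n  # multiplication morphism GxG -> G
e = lambda _: 0                           # neutral morphism 0 -> G, picks out 0
bang = lambda x: "*"                      # unique morphism G -> 0 (terminal object)
diag = lambda x: (x, x)                   # diagonal morphism G -> GxG

def law(x):
    # first component: identity; second component: G -> 0 -> G via e
    a, b = diag(x)
    return m((a, e(bang(b))))

assert all(law(x) == x for x in range(n))
```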

The last property, about the inverse, uses a similar construction, this time with another morphism that maps from the group object to itself, to find the inverse. I leave it to the reader to go through why it has to look this exact way.

Finding our category

And with that, we now finally have the tools to actually answer our original question. What is the thing that we map into, and what options do we have for that? We have established that it is not a group, as groups would not be able to distinguish private and public key. We need some additional structure, something that tells us how we can operate on the elements of our thing, beyond just saying that g \cdot g = g^2 and nothing else. We still want something that behaves like a group, though, since the whole stuff with agreeing on keys fundamentally relied on there being some group things going on. We need a group object. But what is our category \mathcal{C} we use as a base? It can’t be SET, since that would lead straight back to groups. We probably want something with finite objects, because computers aren’t very good at dealing with things that don’t fit into their little memories and disks and registers and stuff. We do want to be able to compute at the very least the multiplication morphism m. We will need a tiny amount of handwaving, but I think it is not too hard to see that taking tuples of elements of finite fields as a base might be a good idea. Computations on these tuples would involve addition, multiplication, typical field stuff. Adding and multiplying elements of a field? It kind of looks like we want our morphisms to be polynomials! And, as it turns out, there is a category just like that, the category of algebraic varieties. The category of some tuples of finite field elements, where you map stuff around by applying polynomials. So our public key space should be the group objects in the category of algebraic varieties, usually called algebraic groups.

We have classified those. There are two main types, the linear groups and the abelian varieties. You can multiply them together, too, but we are really interested in simple groups, so we want them in their most basic forms. First, the linear groups. They come in two flavors, the additive group \mathbb{G}_a and the multiplicative group \mathbb{G}_m. The additive group would work for Diffie-Hellman, if it wasn’t for this very efficient algorithm that can solve the discrete log problem in this group, named “division”. The multiplicative group we set aside and get to in the next section. That leaves us with abelian varieties. Varieties have an important invariant, called the dimension. So what are the abelian varieties of dimension 1? Elliptic curves. And they work great. We could go to higher dimensions (and in fact people have explored this), but it turns out you don’t really get anything but headaches from that. The resulting constructions are secure, but point counting is hard, and you get these “special divisors” which make constant time programming a nightmare, and you get nothing in return. So that is why elliptic curves, even though they look at first glance like a quite arbitrary and odd choice of “group”, are actually pretty much the only choice there is.
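The “division” attack on the additive group can be spelled out in a few lines of Python; the parameters below are toy values of my choosing.

```python
# "Solving dlog" in the additive group Z/pZ: the private key s satisfies
# P = s*g mod p, so a single division recovers it. Toy numbers, chosen by me.
p = 1009                 # a prime, so every nonzero g is invertible mod p
g = 123                  # "generator" of the additive group
s = 777                  # private key
P = (s * g) % p          # public key: s "additions" of g

recovered = (P * pow(g, -1, p)) % p  # divide by g; that's the whole attack
assert recovered == s
```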

What about finite field Diffie-Hellman?

This only leaves the multiplicative group. It showed up in our classification, and you probably know that it has been used in cryptography (under the name finite field Diffie-Hellman), so shouldn’t the title of this blog post be “There is no Diffie-Hellman but Elliptic Curve Diffie-Hellman and also Finite Field Diffie-Hellman”?

I mean yeah, but it would be an awful title, way less clickbaity, so let’s end this blog post by showing that Finite Field Diffie-Hellman is also just Elliptic Curve Diffie-Hellman. Well “Elliptic Curve” Diffie-Hellman, but I’m getting ahead of myself.

We’ve classified a lot of objects in this blog post, so what about one more? How can we describe all elliptic curves (up to isomorphism)?

To do this, we need the so-called j-invariant, defined as j = 1728 \frac{4a^3}{4a^3 + 27b^2} for an elliptic curve given by Y^2 = X^3 + aX + b. Two elliptic curves are isomorphic (over the algebraic closure) if and only if they have the same j-invariant.

How many j-invariants are there? In other words, does every element of the base field show up? It turns out, yes. Given an element j, if j = 0, we can use the curve Y^2 = X^3 + 1. For j = 1728, we can use the curve Y^2 = X^3 + X. For all other cases, we can just use Y^2 = X^3 + \frac{4(1728 - j)}{27j}X + \left(\frac{4(1728 - j)}{27j}\right)^2. You can check that this will have the right j-invariant.
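One way to check these claims is to compute j-invariants over a small prime field, using the formula j = 1728 \cdot 4a^3 / (4a^3 + 27b^2); the field and the sample j-values below are arbitrary choices of mine.

```python
# Checking that the claimed curves hit their j-invariants, over F_p, p = 101.
p = 101

def j_invariant(a, b):
    # j = 1728 * 4a^3 / (4a^3 + 27b^2) for the curve Y^2 = X^3 + aX + b
    num = (1728 * 4 * pow(a, 3, p)) % p
    den = (4 * pow(a, 3, p) + 27 * pow(b, 2, p)) % p
    return (num * pow(den, -1, p)) % p

assert j_invariant(0, 1) == 0          # Y^2 = X^3 + 1
assert j_invariant(1, 0) == 1728 % p   # Y^2 = X^3 + X
for j in [2, 3, 5, 42, 77]:            # sample j-values other than 0 and 1728
    u = (4 * (1728 - j) * pow(27 * j, -1, p)) % p
    assert j_invariant(u, (u * u) % p) == j % p
```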

In other words, two elliptic curves are isomorphic (over the algebraic closure) if and only if they share the same j-invariant, and there is one j-invariant for every element of our base field. So we can think of our elliptic curves as points on the affine line, and somehow, the collection of all elliptic curves itself behaves like an algebraic variety. (This is an extremely powerful concept called the moduli space.) Affine varieties always feel like they are incomplete; you really want to see if there are any points at infinity you could add to them. The affine line in particular really wants to be a projective line, and gain the one point at infinity it is due. And it turns out, we can do this in a sensible way (constructing the moduli space of stable curves of genus 1). This point at infinity? It belongs to the curve Y^2 = X^3 - X^2. It kind of looks like an elliptic curve at first glance. Sure, it’s not in short Weierstraß form, but you can rectify that and give an isomorphic curve that has that form (I’m too lazy to do that right now). However, this is not an elliptic curve. In order for a curve to be elliptic, it needs to not have any singularities, and this curve has, …, well, here is a plot of it:

When a curve intersects itself in this awkward way, that’s a singularity. But other than that, it is perfectly fine. In fact, you can even use the very same formulas as you use for elliptic curves to do point addition and everything, just stay away from that singularity at (0, 0). And when you do that, you notice something strange. You can simplify the formula. It’s a bit on the lengthier side of algebra when doing it by hand the long way around, but very much doable. And when you simplify it, you notice that the points on the nodal curve (save for the singularity) can be described by a single non-zero field element, and that adding two points is the same as multiplying the corresponding field elements. In other words, this curve, the somewhat awkward reject elliptic curve, gives us the multiplicative group. Finite field Diffie-Hellman is also just Elliptic Curve Diffie-Hellman, just over a curve that accidentally twisted itself a bit. And we found that curve naturally, trying to hang out with its elliptic curve friends on the same moduli space (as opposed to the cuspidal curve Y^2 = X^3, with its way nastier singularity, which happens to give you the additive group instead, and does not hang out on this moduli space).
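For the skeptical, here is a sketch verifying this over F_13, where i = 5 is a square root of −1, so the curve’s group of smooth points really does look like nonzero field elements; the parametrization and the map phi are my own setup for this check, not formulas from the post.

```python
# Chord addition on smooth points of the nodal curve Y^2 = X^3 - X^2 over F_13
# corresponds to multiplication of field elements. Toy setup chosen by me.
p, i = 13, 5                 # 5*5 = 25 = -1 mod 13, so i plays sqrt(-1)
inv = lambda a: pow(a, -1, p)

def point(t):
    # parametrize smooth points by t = y/x:  x = t^2 + 1, y = t*(t^2 + 1)
    x = (t * t + 1) % p
    return (x, (t * x) % p)

def add(P1, P2):
    # the ordinary elliptic-curve chord formula, on y^2 = x^3 - x^2
    (x1, y1), (x2, y2) = P1, P2
    lam = ((y2 - y1) * inv(x2 - x1)) % p
    x3 = (lam * lam + 1 - x1 - x2) % p   # the +1 accounts for the -x^2 term
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def phi(P):
    # send a smooth point to a nonzero field element
    x, y = P
    t = (y * inv(x)) % p
    return ((t - i) * inv(t + i)) % p

P1, P2 = point(2), point(3)
P3 = add(P1, P2)
assert phi(P3) == (phi(P1) * phi(P2)) % p  # point addition = multiplication
```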

And so, far from being arbitrary, there was never an actual choice for Diffie-Hellman. There is only Elliptic Curves.


Unbalanced Oil and Vinegar, Part I

Introduction

While there are many schemes discussed in the currently ongoing second onramp for PQC signatures, Unbalanced Oil and Vinegar (UOV) is both one of the most serious contenders and also an extremely curious scheme. In this three-part blog series, we will take a closer look.

In a previous blog post, I looked at how to recover a UOV public key from loads of valid signatures. That blog post very intentionally left out a lot of details about the actual scheme, like why it is believed to be secure, or how the private key actually works.

UOV belongs to a larger family of multivariate schemes, which is most famous for having most of its members broken, with paper titles ranging from the rather dry “Cryptanalysis of the oil and vinegar signature scheme” to the very memeable “Breaking Rainbow Takes a Weekend on a Laptop“. Interestingly enough, for a field with this many broken schemes, Unbalanced Oil and Vinegar has so far stood the test of time. In this first part, we will try to understand why the underlying hard problem, solving systems of multivariate quadratic equations, is hard.

What solves polynomial equations

If you want to cut this blog post short, you can just read the last section, about why the problem is NP-complete. It is independent of the rest of the blog post, and I think slightly easier to understand. Doing so, however, misses out on building some deeper understanding of what it means to solve systems of polynomial equations.

If you attended any form of primary education, you have probably learned the formula for “solving” quadratic equations: the solutions of ax^2 + bx + c = 0 are \frac{-b\pm\sqrt{b^2-4ac}}{2a}. In fact, in German high school, we called this formula the “midnight formula”, on account that you should be able to recite it without hesitation if your math teacher wakes you up at midnight and asks you how to solve quadratic equations. This approach is called “solving by radicals”. It solves a polynomial equation by expressing the roots of the polynomial with field operations and a special function \sqrt[n]{a}, which returns the single positive real solution of X^n - a = 0.
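As a quick sanity check, the midnight formula in Python (real roots only, example polynomial chosen by me):

```python
# The "midnight formula" for a x^2 + b x + c = 0, over the reals.
import math

def midnight(a, b, c):
    d = math.sqrt(b * b - 4 * a * c)  # the radical: a solution of X^2 = b^2 - 4ac
    return ((-b + d) / (2 * a), (-b - d) / (2 * a))

assert midnight(1, -5, 6) == (3.0, 2.0)  # x^2 - 5x + 6 = (x - 2)(x - 3)
```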

However, a problem soon starts occurring with this idea, even without making our polynomials multivariate (i.e. while still univariate, with a single variable): It does not work. While there are formulas for degree three and degree four univariate polynomial equations, with a thrilling history involving lots of drama and zero-knowledge proofs, there is, for a history equally steeped in very different drama, but without zero-knowledge proofs, no such formula for quintic polynomials. As in, there is provably no way of solving a general quintic polynomial equation with just the use of field operations and radicals over the rational numbers. (There also isn’t a way to do this over finite fields, but for entirely different reasons, funnily also relating back to the same Galois guy.)

If you take a step back, this is not super surprising. After all, the way I presented it above, all a radical means is assuming that you already have a way to solve an entirely different polynomial equation, and then making it weird by introducing such ill-defined concepts like the real numbers (yuck).

So what do we (as in algebraists) mean by solving a (univariate) polynomial equation? Well, first we need to distinguish between reducible and irreducible polynomials. If the polynomial is reducible, then factoring it will give you new polynomials, and each solution of any factor is a solution of the original polynomial. If the polynomial is irreducible, however, we just, sorta, do nothing, and just assume that x is a solution, and try to express things with respect to that solution.

For UOV in particular, we are concerned with rational solutions, i.e. solutions that themselves are part of the base field. Thankfully, despite all of the rambling above, those are always well defined, even if you don’t have a formula for them: They are just the roots of the linear factors of our polynomial. When it comes to finding rational solutions, the good news does not stop here: Factoring polynomials (both over finite fields and over the rationals, and also over all sorts of other fields like \mathbb{Q}_p((\zeta))) is a problem of polynomial complexity, so there are efficient algorithms for computing the factorization, and by extension finding the rational solutions, of any univariate polynomial. And if you are a German high school student, after reading this section, you are officially allowed to answer with “it’s complicated”, roll over, and get a good night’s rest when your teacher wakes you in the middle of the night.
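For a toy-sized finite field we can skip the clever factoring algorithms entirely and find the rational solutions (the roots of the linear factors) by brute force; the polynomial below is an arbitrary example of mine, and real implementations use polynomial-time factoring instead.

```python
# F_p-rational roots of a univariate polynomial over a toy field, by brute
# force. Each root x0 found here corresponds to a linear factor (X - x0).
p = 13
f = [1, 0, -1, -3]  # coefficients of X^3 - X - 3, highest degree first

def evaluate(coeffs, x, p):
    acc = 0
    for c in coeffs:  # Horner's rule
        acc = (acc * x + c) % p
    return acc

roots = [x for x in range(p) if evaluate(f, x, p) == 0]
```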

Gröbner bases

So much for solving univariate polynomial equations. But, as the name of the field “multivariate cryptography” suggests, what we are really interested in are multivariate polynomials, i.e. polynomials with multiple variables. We also need to solve systems of equations, i.e. multiple equations simultaneously. Here we first encounter something really weird about multivariate polynomials, by looking at multivariate monomials.

While univariate monomials, i.e. expressions of the form X^n, form a total well-order under divisibility (all monomials compare with each other, and there is a smallest monomial, 1), the same is not true for multivariate monomials: We still get a natural partial order via divisibility (e.g. X^3Y^5 is divisible by and ergo bigger than X^2Y^3), but not all monomials compare anymore (e.g. X^3Y^5 neither divides nor is divisible by X^2Y^6). This quickly becomes a problem; for example, division with remainder is no longer well-defined.

We can, however, fix this by extending our existing partial order (which, as an aside, is a lattice, which has nothing to do with the lattices of lattice cryptography, which in turn, for ML-KEM and friends, form an order, which has nothing to do with the comparison thingy called an order. Mathematicians are very bad at naming things.) to a total order. The easiest way to extend the order is by using the lexicographical order, where we first compare the degrees of the first variable, and if they are equal, compare the second variable, and so on, until we find differing degrees, which then decide which monomial is bigger (e.g. X^3Y^5 is bigger than X^2Y^6, but smaller than X^3Y^6). This is not the only possible order, and the choice of order will become very important later.
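Both orders are easy to play with in code if we represent a monomial by its tuple of exponents; this representation is my own choice for the sketch.

```python
# Monomials as exponent tuples: X^3 Y^5 becomes (3, 5).
def lex_greater(m1, m2):
    # Python compares tuples lexicographically: first differing exponent decides
    return m1 > m2

assert lex_greater((3, 5), (2, 6))      # X^3 Y^5 > X^2 Y^6 in lex order
assert not lex_greater((3, 5), (3, 6))  # X^3 Y^5 < X^3 Y^6

def divides(m1, m2):
    # the partial order from the text: m1 divides m2
    return all(a <= b for a, b in zip(m1, m2))

assert divides((2, 3), (3, 5))  # X^2 Y^3 divides X^3 Y^5
# X^3 Y^5 and X^2 Y^6 are incomparable under divisibility:
assert not divides((3, 5), (2, 6)) and not divides((2, 6), (3, 5))
```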

With the order chosen, we can now go ahead and define division with remainder again, by taking the leading monomial of the divisor, checking whether it divides the leading monomial of the dividend, and if so, subtracting the appropriate multiple of the divisor from the dividend. We can do this repeatedly with a whole set of divisors, to end up with a remainder, but we will notice that the order of these divisors matters, and that in general our division with remainder algorithm has lost a lot of the nice properties that it had in the univariate case.
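Using sympy’s division with remainder, we can watch the divisor order change the remainder; the system below is a standard textbook-style example, and my choice for this sketch.

```python
# Dividing with remainder by a set of polynomials: the remainder depends
# on the order in which the divisors are tried.
from sympy import symbols, reduced

x, y = symbols("x y")
f = x**2 * y + x * y**2 + y**2
g1, g2 = x * y - 1, y**2 - 1

_, r12 = reduced(f, [g1, g2], x, y, order="lex")  # divide by g1 first
_, r21 = reduced(f, [g2, g1], x, y, order="lex")  # divide by g2 first
assert r12 != r21  # same dividend, same divisors, different remainders
```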

Enter Gröbner bases. These are special sets of polynomials, for which division with remainder regains at least some of the useful properties it had in the univariate case. In particular, when using a Gröbner basis, the remainder is a canonical representative of the ring of all polynomials modulo the ideal generated by the set of polynomials. Why do we care about this? It turns out that there is a rather intimate relationship, called the Nullstellensatz, between the modulo operation and the set where all polynomials simultaneously vanish, i.e. the solutions of our system of multivariate polynomials. You can tell how important it is by the fact that we didn’t translate it from Hilbert’s original German. The details here go far beyond this blog post (as you might have noticed when I started talking about generating ideals etc. without warning the reader that they’ll need commutative algebra to proceed), and are thankfully not necessary to understand the basics.

For us, interested in rational solutions in particular, there is some value in looking more deeply into what a Gröbner basis for the lexicographical order looks like:

Given a system of polynomial equations in m variables (of dimension 0; don’t worry about that, though, it’s a technicality), the Gröbner basis with respect to the lexicographical order is an equivalent set of polynomial equations (i.e. each solution of one is a solution of the other) that specifically has the form (f_1, f_2, \dots, f_n), with f_n being univariate in X_m, f_{n-1} involving only X_m and X_{m-1}, and so on.

In particular, given such a Gröbner basis, we can find the rational solutions by solving the last polynomial (which is univariate), taking each zero, partially evaluating the second-to-last polynomial to obtain another univariate polynomial, solving that, and so on. Note that this algorithm by itself might already be exponential: Nothing guarantees that any given zero will end up in a valid solution until we have factored tons of univariate polynomials, but in practice that is usually not the problem.
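sympy can compute the lex Gröbner basis and let us do the back-substitution by hand; the small system below is my own example.

```python
# A lex Groebner basis is triangular: the last polynomial is univariate,
# so we can solve bottom-up. Example system: x^2 + y^2 = 5, x = y + 1.
from sympy import symbols, groebner, Poly

x, y = symbols("x y")
G = groebner([x**2 + y**2 - 5, x - y - 1], x, y, order="lex")
last = list(G)[-1]                 # the last basis element
assert Poly(last, x, y).degree(x) == 0  # it is univariate in y

ys = Poly(last, y).all_roots()     # solve the univariate equation
# back-substitute each root; here the earlier polynomial gives x = y + 1
solutions = [(yv + 1, yv) for yv in ys]
```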

Finding Gröbner bases

What is a problem, though, is finding the Gröbner basis to begin with. In fact, this problem is so complicated and interesting that I decided to write an entire blog post about it, in case you haven’t noticed that I have not really talked about UOV so far, and won’t really mention the scheme other than as a motivating factor for all this until part 2.

So let’s finally look at the algorithm to find Gröbner bases, Buchberger’s algorithm (there are improvements on this algorithm called F4 and F5, but they don’t drastically change things). The algorithm is simple enough to write out in its entirety here:

buchberger(set of polynomials S) ->
list of polynomials S'
  repeat:
    for every pair (p1, p2) in S:
      let m1 be the leading monomial of p1
      let m2 be the leading monomial of p2
      let m be the lcm of m1 and m2
      compute p = m/m1 * p1 - m/m2 * p2
      reduce p by S with quotient and remainder
      if the remainder is not 0, add it to S
  until an iteration adds no new polynomials
  reduce every element of S by S without it
  return S
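For the curious, here is roughly what this looks like as actual toy code, leaning on sympy for leading terms and reduction. All names are mine, and none of the Buchberger criteria or other optimizations are applied, so only feed it tiny examples:

```python
# A toy rendition of the pseudocode above, using sympy polynomial arithmetic.
from sympy import symbols, expand, lcm, LT, reduced

x, y = symbols('x y')

def s_poly(p, q, gens, order):
    """Cancel the leading terms of p and q against each other."""
    lt_p = LT(p, *gens, order=order)
    lt_q = LT(q, *gens, order=order)
    m = lcm(lt_p, lt_q)
    return expand(m / lt_p * p - m / lt_q * q)

def buchberger(F, gens, order='grevlex'):
    G = list(F)
    # pending pairs of indices into G
    pairs = [(i, j) for i in range(len(G)) for j in range(i + 1, len(G))]
    while pairs:
        i, j = pairs.pop()
        _, r = reduced(s_poly(G[i], G[j], gens, order), G, *gens, order=order)
        if r != 0:
            # new leading monomial: add r and queue its pairs
            pairs += [(k, len(G)) for k in range(len(G))]
            G.append(r)
    return G      # the final interreduction step is omitted for brevity
```

On termination, every S-polynomial of the result reduces to zero modulo the result, which is precisely the Gröbner basis property.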

This algorithm is a beauty! Notice that, the way it is written, it is not even clear that it terminates, much less what complexity it has. The “simplest” argument I know as to why it terminates goes something like this: If we look at the ideal generated by the leading monomials of S, then the element p we add in each loop iteration will either have a new leading monomial not yet part of this ideal (otherwise it would have been reduced further), or it will reduce to 0 and not be added. So with each added polynomial, this ideal of leading monomials strictly grows. But polynomial rings are Noetherian, and as such, every strictly ascending chain of ideals eventually stabilizes. At that point, the algorithm returns. While this proves that Buchberger’s algorithm always terminates, it does so by invoking one of the strangest properties of commutative algebra, and without giving a constructive proof which could be used to determine the runtime complexity. One can compute the runtime complexity by doing things more carefully, and arrives at wonderful terms like O(d^{2^n}) for the degree of the polynomials in the eventual basis, so yeah, it might be a while, and even just writing down the basis is not trivially possible. In fact, in practice it is the space complexity of Buchberger’s algorithm which destroys all hope of ever getting a solution, far before the time complexity becomes relevant.

And yet, algebraic geometers are so desperate for Gröbner bases that they will continue running this algorithm for their current pet problem, based on nothing more than hope. And it turns out that hope is not always misplaced. I mentioned monomial orders before, and it turns out that Gröbner basis computation depends very sensitively on which monomial order one chooses. A problem which will just run out of memory in one order might finish in seconds in another. You can also vary when you reduce which polynomial by which set, and come up with all sorts of fun distributed computation strategies that greatly influence the runtime of this algorithm. And what is nice is that once you have one Gröbner basis, you can quite efficiently compute other Gröbner bases as well, so nothing stops us from first computing the Gröbner basis in the more efficient graded lexicographical order, and then switching to the desired lexicographical order for cheap.

And this is the biggest weakness of multivariate cryptography. Where lattice cryptography has its nice worst-case to average-case reduction, multivariate cryptography has a very hard worst case, but a very finicky-to-control average case. So finicky, in fact, that authors will often just state how much RAM they spent on trying to find a Gröbner basis, since there is no better way of estimating the runtime.

Solving multivariate quadratic equations is NP-complete

Now to the last, and, as promised, independent section of this blog post, showing that solving multivariate quadratic equations is NP-complete. We do this by reducing SAT, boolean satisfiability, to it. For conciseness, we only do this for multivariate polynomials over \mathbb{F}_2, the field with 2 elements, but showing it for every other field is not substantially different.

We do this, by looking at one multivariate polynomial in particular: f(X, Y) = XY + 1.

Since our field has only two elements, we can quickly check what f evaluates to: f(0, 0) = 1, f(0, 1) = 1, f(1, 0) = 1, f(1, 1) = 0. In other words, if we associate 0 with false and 1 with true, f is equal to NAND. Famously, every boolean circuit can be expressed using only NAND gates, and wiring up gates is the same as chaining function calls, which in turn is the same as substituting polynomials. This means every boolean expression equals one multivariate polynomial, with the degree bounded exponentially in the circuit depth. In order to get back to a quadratic system of polynomials, we now introduce a new helper variable Y for each variable X that appears with degree higher than two, by adding the polynomial X^2 - Y to our system of polynomials. This allows us to halve the degree of the polynomial again, and taken together, we get a polynomially bounded system of quadratic equations (and if we do both steps together, we never explode the degree of the polynomials in illegitimate ways). This reduces SAT to MVQ. On the other hand, given a possible solution, verifying it can be done simply by evaluating quadratic polynomials, an operation that is itself polynomial in time, so MVQ is in NP, making it NP-complete.
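A quick way to convince yourself of the NAND claim and of the gate-composition step (a tiny self-written sketch, not part of any reduction machinery):

```python
# Over F_2 the polynomial XY + 1 really is NAND, and substituting it into
# itself builds the other gates.
def nand(x, y):
    return (x * y + 1) % 2

# the full truth table of f(X, Y) = XY + 1
table = {(x, y): nand(x, y) for x in (0, 1) for y in (0, 1)}

def not_(x):
    return nand(x, x)            # x*x + 1 = x + 1 over F_2

def and_(x, y):
    return not_(nand(x, y))      # (xy + 1) + 1 = xy over F_2
```

Chaining `nand` like this is exactly the polynomial substitution described above, which is where the degree growth comes from.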

But remember, this only goes for generic systems of multivariate quadratic equations, and is no proof that UOV, which we will discuss in the next part, is even worst case hard, much less does it make statements about the average case complexity.


Reconstructing public keys from signatures

One weird hobby of mine is collecting reasonable properties of cryptographic schemes that nobody promised they do or don’t have. Whether that’s invisible salamanders or binding through shared secrets, anything that isn’t just boring IND-CCA2 or existential unforgeability is just delightful material to construct vulnerabilities with.

Normally, with a signature scheme, you have the public key and want to know whether a given signature is valid. But what if we instead have a message and a signature, assume the signature is valid, and want to know which public key signed it? A rather delightful property if you want to attack anonymity in some proposed “everybody just uses cryptographic signatures for everything” scheme.

Chapter 1: ECDSA

This one is so famous it even has its own Wikipedia section. In short, yes, it is very much possible to recover a public key from a single valid signature.

An ECDSA signature is a pair of integers (r, s) such that sR = zG + rP_A, where z is the hash of the message and R is one of the two curve points with x coordinate r. To reconstruct the public key, we can just solve for P_A and get P_A = r^{-1}sR - r^{-1}zG, where we compute r^{-1} as the inverse of r modulo the order of the elliptic curve.

Public key recovered.
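To make this concrete, here is a toy run on the small textbook curve y^2 = x^3 + 2x + 2 over \mathbb{F}_{17}, whose group has prime order 19. The code is a from-scratch sketch with made-up names, not a real ECDSA implementation, and the “hash” z and nonce k are fixed so the run is reproducible:

```python
# Toy ECDSA public key recovery on y^2 = x^3 + 2x + 2 over F_17, order n = 19.
p, a, b, n = 17, 2, 2, 19
G = (5, 1)

def add(P, Q):
    if P is None: return Q
    if Q is None: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None                      # P + (-P) = point at infinity
    if P == Q:
        lam = (3 * P[0] ** 2 + a) * pow(2 * P[1], -1, p)
    else:
        lam = (Q[1] - P[1]) * pow(Q[0] - P[0], -1, p)
    x = (lam ** 2 - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def mul(k, P):
    R = None
    while k:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

x_A = 7                                  # private key
P_A = mul(x_A, G)                        # public key
z, k = 11, 13                            # message hash and nonce
R = mul(k, G)
r = R[0] % n
s = pow(k, -1, n) * (z + r * x_A) % n    # textbook ECDSA signing equation

# recovery: for each of the (at most two) points with x coordinate r,
# compute P = r^{-1}(sR - zG) and collect the candidates
candidates = []
for y in range(p):
    if (y * y - (r ** 3 + a * r + b)) % p == 0:
        Rc = (r, y)
        sR_minus_zG = add(mul(s, Rc), mul((n - z) % n, G))
        candidates.append(mul(pow(r, -1, n), sR_minus_zG))
```

The true public key shows up among the (at most two) candidates, one per choice of R, which is exactly the ambiguity real-world recovery resolves with an extra recovery bit.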

Chapter 2: RSA

Here, it gets a bit trickier. Instead of a single signature, we will need two, s_1 and s_2, with corresponding messages m_1 and m_2. RSA usually hashes a message and then pads it in some way, but we don’t really care about that part here, and assume that m is already this hashed and padded message. The only important property is that in order to be valid, an RSA signature has to fulfill s^e = m \mod n. To achieve this property, the signer has calculated s = m^d \mod n, with d\cdot e = 1 \mod \varphi(n). e is usually 65537, and we will just assume this value and only recover n. We do this by computing n' = \gcd(s_1^e - m_1, s_2^e - m_2). Note the lack of a modulus operation: this is a raw exponentiation, and yes, the result will have roughly 65537 times as many digits as the signature, message, or modulus do. The Euclidean algorithm is just so efficient, it doesn’t care. Well, it cares a little, but I was still able to compute this for a 2048 bit public key within just a few minutes. The result isn’t quite equal to n, though, as there is a 1/4 probability that both numbers are even, a 1/9 probability that both are divisible by 3, and so on. But n is supposed to have only two very large prime divisors, so after trial division by the first few hundred primes or so, we can be relatively certain that what is left over is the actual public key.
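A scaled-down sketch of the same computation: tiny primes and e = 3 instead of 65537 keep the numbers manageable, and a third signature makes stray common factors in the gcd vanishingly unlikely. All parameters here are toy values of my own choosing:

```python
# Recover an RSA modulus from signatures via gcd, miniature edition.
from math import gcd

p_, q_, e = 999983, 104729, 3            # toy primes, chosen so gcd(e, phi) = 1
n = p_ * q_
d = pow(e, -1, (p_ - 1) * (q_ - 1))

msgs = [0xdead, 0xbeef, 0xcafe]          # stand-ins for hashed+padded messages
sigs = [pow(m, d, n) for m in msgs]

# no modular reduction here: s**e - m is a plain integer multiple of n
diffs = [s ** e - m for s, m in zip(sigs, msgs)]
candidate = gcd(gcd(diffs[0], diffs[1]), diffs[2])

# n has only large prime factors, so stripping small ones should leave n
for small in range(2, 1000):
    while candidate % small == 0:
        candidate //= small
```

With the real e = 65537 the `s ** e` values become enormous, but the structure of the attack is identical.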

Chapter 3: A slight detour into Schnorr signatures

Before we go into post quantum signature schemes, we should look at one more classical signature scheme, that while not used much in practice (curse you, patents), is going to be very important to understand for PQ schemes. Schnorr signatures are elliptic curve signatures, that use a specific method to transform a zero knowledge proof into a signature scheme, the Fiat-Shamir transform.

For Schnorr, we have the usual setup of an elliptic curve of order n with generator point G, and public key P_A = x_AG

The Zero Knowledge protocol works in the following way:

First, the prover commits to some value R. In the case of Schnorr, this is done by choosing a random value k and setting R = kG.

Next, the verifier issues a challenge c.

The prover now reveals s = k - c\cdot x_A \mod n. The verifier can check that R = sG + cP_A.

This proves that the prover knows x_A, as without foreknowledge of the challenge, the proof s cannot be given for a given commitment without knowing it (or having a quantum computer at hand). However, this proof is zero knowledge, since if one knows the challenge beforehand, it is rather easy to come up with the right commitment and proof, while still keeping the commitment uniformly random, by choosing s randomly, and then computing R' = sG, and “committing” to R = R' + cP_A. This will mean that we do not lose security by playing this game multiple times, and an attacker might as well concentrate on key recovery attacks instead.

To convert this zero knowledge protocol into a signature scheme, we replace the challenger with a hash function. In order to ensure that our commitment was actually committed to, we add it to the hash function as well. In other words the signature scheme looks like this:

Choose k randomly. Set r to the x coordinate of R = kG. Compute c = H(r || m), and set s = k - c\cdot x_A. We now have two variants, either setting the signature to the tuple (r, s) or to the tuple (c, s).

In the first case, the verifier computes c = H(r || m) and checks that sG + cP_A has x coordinate r, in the second variant, the verifier first computes sG + cP_A, takes the x coordinate and checks that c = H(r || m).

Both variants are perfectly secure, and in the case of elliptic curves it really does not matter which one is chosen. But now it gets interesting: Whether or not we can recover the public key depends on the variant. If the signature is (r, s), we can compute c = H(r || m) and perform the same operation as previously on ECDSA, and compute P_A = c^{-1}R - c^{-1}sG. However, if the signature is (c, s), we have no way of finding r, as it is a random x coordinate with enough entropy in order to make finding a preimage of c impossible.
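Here is the (r, s)-variant recovery as runnable code, sketched in a toy multiplicative group instead of an elliptic curve (Schnorr works in any prime-order group; the tiny parameters, the names, and the hash truncation are my own simplifications):

```python
# Toy Schnorr: g generates the subgroup of order q = 11 in Z_23^*.
import hashlib

p, q, g = 23, 11, 4

def H(r, m):
    digest = hashlib.sha256(f"{r}|{m}".encode()).digest()
    # keep the toy challenge nonzero so it stays invertible mod q
    return 1 + int.from_bytes(digest, "big") % (q - 1)

x_A = 7                              # private key
P_A = pow(g, x_A, p)                 # public key

k = 5                                # fixed nonce for reproducibility
r = pow(g, k, p)                     # the commitment, sent as part of (r, s)
c = H(r, "hello")
s = (k - c * x_A) % q

# verification: g^s * P^c must reconstruct the commitment
assert pow(g, s, p) * pow(P_A, c, p) % p == r

# recovery from (r, s) alone: P^c = r * g^{-s}, then undo the exponent c
c = H(r, "hello")                    # the attacker recomputes the challenge
P_rec = pow(r * pow(g, -s % q, p) % p, pow(c, -1, q), p)
```

In the (c, s) variant the attacker never learns r, so this recomputation of the challenge, and hence the recovery, is impossible.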

Chapter 4: Dilithium

And with that, we go to our first post quantum signature scheme. Dilithium really deserves its own full blog post, but since here we are only interested in recovering public keys, we can simplify it quite a bit. Recall that for a learning with errors scheme, the public key is a vector t and a matrix A modulo some prime q such that t = As + e \mod q with some short vectors s and e. The secret key is the short vector s. For Dilithium, we now use the following zero knowledge proof:

First, the prover commits to the high bits w_1 of Ay, where y is shortish. Using the high bits only is similar to adding an error, since the small error also only affects the least significant bits. The challenger then challenges with a small scalar \mathit{c} (this scalar is somehow multidimensional, in a way that I will simply ignore here), and the prover responds with z = y + \mathit{c} s. The challenger can now check that Az - \mathit{c} t = Ay + \mathit{c}As - \mathit{c}As - \mathit{c}e has the same high bits as the commitment w_1.

We are just going to ignore that this protocol would not be zero knowledge, and that Dilithium actually has to use something called Fiat-Shamir with aborts, since, as mentioned, that stuff really should go into its own blog post.

For our purposes, the only question that matters is how this zero knowledge proof is turned into a signature scheme. And here comes the bad news for our public-key-recovering ambitions: It turns out that the commitment w_1 is significantly larger than the challenge \mathit{c}, so Dilithium puts the challenge into the signature instead of the commitment, landing on (\mathit{c}, z), which, as with Schnorr signatures, makes the public key unrecoverable.

A separate, interesting question is if we could recover the public key if w_1 was part of the signature instead. If we knew A, we could at least recover the high bits of t, which are the ones that matter. (As it turns out, Dilithium actually doesn’t even include the least significant bits of t in the actual public key, to save a few bits in there).

Without knowledge of A, this becomes a lot of linear algebra, which I haven’t done; you’d certainly need more than one valid signature, as a single signature does not contain enough information to recover that giant matrix.

Part of the reason why I haven’t done that linear algebra is that if we knew w_1 through some side channel or similar, we still would have no chance recovering the public key since Dilithium rudely hashes the public key with the message when computing the challenge, destroying any hopes of recovery.

Chapter 5: SPHINCS+

This brings us nicely to SPHINCS+. SPHINCS+ is a hash based signature scheme, meaning that it follows the simple logic that if, at the time of public key creation, you already know what you would like to sign, you can just hash that thing and call that your public key instead. If you want to sign a lot of things, you can pack all of these things into a Merkle tree, and declare the root of that tree to be your public key. If you don’t know what you want to sign, you can commit to every possible message by creating commitments for the single bits of a hash. Throw some randomness in there so you don’t by accident reuse your commitments to the point of breaking, and you get SPHINCS+. The scheme probably also deserves its own blog post, but we don’t really need much more for now. A Merkle tree inclusion proof will leak the root node of the Merkle tree, as long as we can observe the initial message of which the inclusion is proven. Unfortunately, the authors of SPHINCS+, loving hashing so much, did the same thing as Dilithium and hash in the public key when computing the initial message hash, making the public key unrecoverable. You only need relatively little extra information to recover it, though, and it is possible that some timing attacks on the verifying logic would suffice to pick back up and continue along the Merkle tree.
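To illustrate why an inclusion proof leaks the root, here is a plain SHA-256 Merkle tree sketch; it bears no relation to SPHINCS+’s actual tweakable hash constructions, and all names are mine:

```python
# Anyone who sees a leaf and its authentication path can recompute the root.
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

# eight toy leaves standing in for one-time public keys
leaves = [h(bytes([i])) for i in range(8)]

def build(nodes):
    layers = [nodes]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers

layers = build(leaves)
root = layers[-1][0]                  # this plays the role of the public key

def auth_path(index):
    """Sibling nodes needed to prove inclusion of leaf `index`."""
    path = []
    for layer in layers[:-1]:
        path.append(layer[index ^ 1])
        index //= 2
    return path

def root_from_proof(leaf, index, path):
    """A verifier (or an eavesdropper!) recomputes the root from the proof."""
    node = leaf
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node
```

The verifier has no secret input here, which is exactly why observing a proof is as good as being handed the root.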

Chapter 6: Unbalanced Oil and Vinegar

Lastly, let’s look at a fun candidate in the second onramp called Unbalanced Oil and Vinegar. It again could use its own blog post, but here it is, slightly condensed and leaving out such unnecessary details as what the private key is and how anyone actually computes a valid signature in the first place.

Remember solving quadratic equations in high school? Like just given a value y, find the value x such that ax^2 + bx + c = y? Now imagine someone turned that into a signature scheme. Of course, even adversaries know that x_{1,2} = \frac{-b\pm \sqrt{b^2 - 4ac}}{2a}, so we will have to increase the difficulty a tiny bit by making things higher dimensional, but the idea remains the same.

A UOV public key is a system of o quadratic equations \left(f_i(x_1, x_2, \dots, x_{o+v})\right)_{i=1,\dots,o} in o + v variables. These equations are homogeneous, so we can also describe them via matrices: f_i(x) = x^TA_ix, where x is a vector. To sign with this, the signer computes (via a miraculous trapdoor this post is too short to contain) a value x such that \left(f_i(x)\right)_{i=1,\dots,o} = H(m). The real trick to UOV is in the process of how the signer computes this preimage, but we do not care about that, we care about recovering public keys. Clearly, since a UOV signature is much shorter than a UOV public key, a single signature will not suffice to recover the public key, but what if we collect more than just one? What we have is an unknown set of polynomials, evaluated at certain known points (the signatures), yielding certain known images (the hashes of the messages). This sounds like a job for polynomial interpolation!

And indeed, with “just” \frac{(o+v)(o+v+1)}{2} message/signature pairs, we will be able to recover the public key:

We take the basis

\left(b_k\right)_{k=1}^{\frac{(o+v)(o+v+1)}{2}}=\left(X_i\cdot X_j\right)_{1\le i\le j\le o+v}

and for each message/signature pair (x_j, y_j), we compute b_k(x_j). Since we know that f_i(x_j)=y_j, and know that each f_i is a linear combination of the b_k, this gives us a linear equation per pair, and the system starts having a unique solution once we have collected \frac{(o+v)(o+v+1)}{2} signatures. On top of that, UOV doesn’t hash in the public key (probably because the public key is ginormous), so this works even on the candidate as published.
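Here is a miniature version of this interpolation for a single quadratic form over the integers. To keep the linear algebra trivial, I “sign” at basis vectors and their pairwise sums instead of collecting truly random signatures; the general case needs (o+v)(o+v+1)/2 generic pairs and an actual linear solve. A self-written sketch, not the real attack setting:

```python
# Recover the coefficients of a quadratic form purely from evaluations.
import random

random.seed(0)
n = 3                                 # o + v, kept tiny

# the secret coefficients of the "public key", upper triangular
C = {(i, j): random.randrange(-5, 6) for i in range(n) for j in range(i, n)}

def f(x):
    """Evaluate the form; these values are all the attacker observes."""
    return sum(c * x[i] * x[j] for (i, j), c in C.items())

def e(i, j=None):
    """Basis vector e_i, or e_i + e_j when j is given."""
    v = [0] * n
    v[i] = 1
    if j is not None:
        v[j] += 1
    return v

recovered = {}
for i in range(n):
    recovered[(i, i)] = f(e(i))                       # diagonal coefficients
for i in range(n):
    for j in range(i + 1, n):
        # f(e_i + e_j) = c_ii + c_jj + c_ij, so the cross term falls out
        recovered[(i, j)] = f(e(i, j)) - f(e(i)) - f(e(j))
```

Each evaluation is one linear equation in the unknown coefficients; the special evaluation points just make the resulting linear system already triangular.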

UOV also features a version with a compressed public key (as mentioned, the public key is a tad big). This blog post is already really long, so I will just remark that we cannot recover this compressed public key, since it involves a seed, and we can’t find preimages of hash functions when the preimage has that much entropy. But the expanded public key is equivalent in every way, so I will still count UOV as a success.

Conclusion

Public keys are public, even when you don’t tell people about them. Signature schemes vary greatly in how easily the keys are recovered, but keep in mind that verification logic is usually not written to be side channel resistant, so even with schemes that do not leak the public key algebraically, there might still be a way for an adversary to recover the public key, and you need to design your protocols to be resilient against that property.


Cartier Divisors

As an obvious first blog post, easily understandable and very relevant to cryptography (/s), here is a description of Cartier Divisors, because Thai asked for it on Twitter.

For this, first some history: A while ago, I taught a Google internal course about the mathematics of elliptic curves. It would probably make sense to start with that content, but I’m going to assume that I’ll come back to it and fix the order later.

Anyways, the objects we are looking at are Divisors and Principal Divisors. They come up when studying curves as a way to describe the zeroes and poles of functions. Over the projective line \mathbb{P}_K^1 (also known as just the base field plus a point at infinity), a rational function (the quotient of two polynomials) can have any selection of zeroes and poles it so pleases, with the only constraint being that there must be (with multiplicity) the same number of zeroes and poles. We can see that by looking at

\frac{(X-a_1)(X-a_2)\dots (X-a_n)}{(X-b_1)(X-b_2)\dots (X-b_n)}

for a function with zeroes at a_1, a_2, \dots, a_n and poles at b_1, b_2, \dots, b_n. If a_i or b_i is \infty, then we ignore the corresponding term, and get a zero/pole at infinity.

On more general curves, we do not have this amount of freedom. The lack of freedom we have in choosing zeroes and poles is tied surprisingly deeply to the curve in question, so describing it turns out to be very useful.

A Weil divisor is a formal sum of points of the curve, that is, we assign an integer to every point of the curve, with all but finitely many points getting assigned the integer zero. The degree of a divisor is the sum of all these integers. The divisor of a function, \operatorname{div}f, is the divisor we get by assigning to each point the order of the function at that point, i.e. setting it to 1 for simple zeroes, -1 for simple poles, and so on. If a divisor is the divisor of a function, we call it a principal divisor.
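As a tiny worked example on the projective line (mine, not from the original course):

```latex
% The function f below has a simple zero at 1 and a simple pole at -1
% (and neither at \infty), so its divisor is principal and, as the
% definitions above demand, has degree 0.
f(X) = \frac{X - 1}{X + 1},
\qquad
\operatorname{div} f = 1\cdot[1] - 1\cdot[-1],
\qquad
\deg(\operatorname{div} f) = 1 - 1 = 0.
```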

With these definitions out of the way, we can get to Thai’s question. It turns out that the thing we are interested in is the divisors of degree 0 modulo the principal divisors. This group in some sense measures how restricted we are in our choice for placing zeroes and poles. It turns out that for elliptic curves, every divisor of degree 0 is equal to a divisor of the form P - O, with O being the point at infinity (or really any fixed (“marked”) point on the curve), up to a principal divisor (equal up to a principal divisor is also called linearly equivalent). So what Thai is asking is: while we can think of principal divisors as a description of rational functions, what are the other divisors? The simple answer to that is that they are just what we said, formal sums of points, just some points with some integer weights. For elliptic curves, the degree-0 divisor classes are conveniently in a 1:1 correspondence with the points of the curve itself, which is why we usually gloss over the whole divisor thing and just pretend to add the points of the curve themselves. But this answer is kind of unsatisfying, and it does not generalize well to higher dimensions or to curves with singularities, so a better concept is needed.

Enter Cartier Divisors. In order to explain these, we’re technically going to need sheaves, but sheaves are a bit much, so I’ll try to handwave some things. The basic idea is, since we want to describe zeroes and poles, why don’t we just use zeroes and poles for that? Of course we can’t use a full function that is defined everywhere for that, that would only give us the principal divisors. But locally, we can use a function to describe zeroes or poles. Now what does locally mean? In algebraic geometry, the topologies we’re using are kind of weird. Here, we are using the Zariski topology, which for curves just means that when we say locally, we mean the whole curve with a finite number of points removed. We use this to remove any zeroes or poles we don’t want for our divisor from our local representative.

All in all that means a Cartier divisor on a curve C is a covering (U_i), i.e. a collection of open sets (curve minus finite amounts of points) such that their union is the whole curve, and a rational function f_i per U_i, defined on U_i. This function’s zeroes and poles are what we understand as the divisor. Obviously, we now need to make sure all these partial functions work well as a whole. We do that by looking at U_i \cap U_j and the functions f_i and f_j restricted to that intersection. If we want this construction to define a consistent divisor, then f_i/f_j can not have any zeroes or poles in U_i \cap U_j. We write this as

f_i/f_j \in \mathcal{O}^\times (U_i \cap U_j)\;\;.

This now describes a consistent global object with zeroes and poles as we want them, getting quite close to describing divisors in a completely different way! We just have one problem, there are way too many functions with a specific pattern of zeroes and poles on our local neighborhood U_i, we need to get rid of all the extra behavior that isn’t just zeroes and poles! To do that, we need to look at two functions f_i and g_i on U_i that have the same pattern of zeroes and poles. What happens when we take f_i/g_i? Well we, as above, get a function without zeroes or poles on U_i. So if we want to forget all that extra structure, we need to take f_i modulo the set of functions without zeroes or poles on U_i. And that’s it.

If we write \mathcal{M}^\times(U_i) for the rational functions that are not equal to zero (so the rational functions that have a multiplicative inverse) and write \mathcal{O}^\times (U_i) for the functions without zeroes or poles on U_i, we can now describe a Cartier divisor as a covering (U_i) together with an element f_i \in \mathcal{M}^\times(U_i)/\mathcal{O}^\times(U_i) such that f_i/f_j\in\mathcal{O}^\times(U_i \cap U_j). A principal Cartier divisor is a Cartier divisor that can be described using just the entire curve C as the only element of the covering.

For extra bonus points (which I will not describe in detail here, because this blog post is already way too long and completely incomprehensible), we can look at what happens if we now take these Cartier divisors modulo principal Cartier divisors. It turns out, that the result can be described again with a covering U_i, but this time, instead of going through all that choosing of rational functions per set, we just use the intersections, and choose an element f_{ij}\in \mathcal{O}^\times (U_i \cap U_j), without even looking at rational functions in U_i at all, with some sheafy/cohomological rules for when two of those things are equal.