
Unbalanced Oil and Vinegar, Part I

Introduction

While many schemes are being discussed in the currently ongoing second onramp for PQC signatures, Unbalanced Oil and Vinegar (UOV) is both one of the most serious contenders and an extremely curious scheme. In this three-part blog series, we will take a closer look.

In a previous blog post, I looked at how to recover a UOV public key from loads of valid signatures. That blog post very intentionally left out a lot of details about the actual scheme, such as why it is believed to be secure or how the private key actually works.

UOV belongs to a larger family of multivariate schemes, a family most famous for having most of its members broken, with paper titles ranging from the rather dry “Cryptanalysis of the oil and vinegar signature scheme” to the very memeable “Breaking Rainbow Takes a Weekend on a Laptop”. Interestingly enough, for a field with this many broken schemes, Unbalanced Oil and Vinegar has so far stood the test of time. In this first part, we will try to understand why the underlying hard problem, solving systems of multivariate quadratic equations, is hard.

What solves polynomial equations

If you want to cut this blog post short, you can just read the last section, about why the problem is NP-complete. It is independent of the rest of the blog post and, I think, slightly easier to understand. Doing so, however, misses out on building some deeper understanding of what it means to solve systems of polynomial equations.

If you attended any form of primary education, you have probably learned the formula for “solving” quadratic equations: the solutions of ax^2 + bx + c = 0 are given by x = \frac{-b\pm\sqrt{b^2-4ac}}{2a}. In fact, in German high school, we called this the “midnight formula”, on account that you should be able to recite it without hesitation if your math teacher wakes you up at midnight and asks you how to solve quadratic equations. This approach is called “solving by radicals”: it solves a polynomial equation by expressing the roots of the polynomial with field operations and a special function \sqrt[n]{a}, which returns the single positive real solution of X^n - a = 0.
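As a quick illustration, the midnight formula transcribes directly into code (a hypothetical little helper in Python; real coefficients only, and no attempt to handle a = 0):

```python
import math

def solve_quadratic(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0, via the midnight formula."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots (we stay away from complex numbers here)
    root = math.sqrt(disc)
    # a set, so that a double root is only reported once
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})

print(solve_quadratic(1, -3, 2))  # x^2 - 3x + 2 = (x - 1)(x - 2): [1.0, 2.0]
```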

However, a problem soon occurs with this idea, even before we make our polynomials multivariate (i.e. while they are still univariate, with a single variable): it does not work. While there are formulas for degree three and degree four univariate polynomial equations, with a thrilling history involving lots of drama and zero-knowledge proofs, there is, after a history equally steeped in very different drama but without zero-knowledge proofs, no such formula for quintic polynomials. As in, there is provably no way of solving a general quintic polynomial equation over the rational numbers using only field operations and radicals. (There also isn’t a way to do this over finite fields, but for entirely different reasons, funnily also relating back to the same Galois guy.)

If you take a step back, this is not super surprising. After all, the way I presented it above, all a radical does is assume that you already have a way to solve an entirely different polynomial equation, and then make it weird by introducing such ill-defined concepts as the real numbers (yuck).

So what do we (as in algebraists) mean by solving a (univariate) polynomial equation? Well, first we need to distinguish between reducible and irreducible polynomials. If the polynomial is reducible, then factoring it will give you new polynomials, and each solution of any factor is a solution of the original polynomial. If the polynomial is irreducible, however, we just, sorta, do nothing: we assume that x is a solution, and try to express things with respect to that solution.

For UOV in particular, we are concerned with rational solutions, i.e. solutions that are themselves part of the base field. Thankfully, despite all of the rambling above, those are always well defined, even if you don’t have a formula for them: they are just the roots of the linear factors of our polynomial. When it comes to finding rational solutions, the good news does not stop here: factoring polynomials (both over finite fields and over the rationals, and also over all sorts of other fields like \mathbb{Q}_p((\zeta))) is a problem of polynomial complexity. So there are efficient algorithms for computing the factorization, and by extension finding the rational solutions, of any univariate polynomial. And if you are a German high school student, after reading this section, you are officially allowed to answer with “it’s complicated”, roll over, and get a good night’s rest when your teacher wakes you in the middle of the night.
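In code, this is mechanical: sympy’s factor_list computes the factorization over the rationals, and the rational roots can then be read off the linear factors (the polynomial below is a made-up example):

```python
from sympy import Poly, factor_list, symbols

x = symbols('x')
f = x**4 - 2*x**3 - x**2 + 2*x  # factors as x*(x - 1)*(x + 1)*(x - 2)

_, factors = factor_list(f)  # factorization over the rationals
rational_roots = []
for g, _multiplicity in factors:
    p = Poly(g, x)
    if p.degree() == 1:  # a linear factor a*x + b has the rational root -b/a
        a, b = p.all_coeffs()
        rational_roots.append(-b / a)

print(sorted(rational_roots))  # [-1, 0, 1, 2]
```

factor_list also accepts a modulus argument, which switches the factorization to a finite field instead.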

Gröbner bases

So much for solving univariate polynomial equations. But, as the name of the field “multivariate cryptography” suggests, what we are really interested in are multivariate polynomials, i.e. polynomials with multiple variables. We also need to solve systems of equations, i.e. multiple equations simultaneously. Here we first encounter something really weird about multivariate polynomials, by looking at multivariate monomials.

Univariate monomials, i.e. expressions of the form X^n, are totally well-ordered by divisibility: all monomials compare with each other, and there is a smallest monomial (1). The same is not true for multivariate monomials: we still get a natural partial order via divisibility (e.g. X^3Y^5 is divisible by and ergo bigger than X^2Y^3), but not all monomials compare anymore (e.g. X^3Y^5 is neither divisible by nor divides X^2Y^6). This quickly becomes a problem; for example, division with remainder is no longer well-defined.

We can, however, fix this by extending our existing partial order (which, as an aside, is a lattice, which has nothing to do with the lattices of lattice cryptography, which in turn for ML-KEM and friends are an order, which has nothing to do with the comparison thingy called an order. Mathematicians are very bad at naming things.) to a total order. The easiest way to extend the order is the lexicographical order, where we first compare the degrees of the first variable, and if they are equal, compare the second variable, and so on, until we find differing degrees, which then decide which monomial is bigger (e.g. X^3Y^5 is bigger than X^2Y^6, but smaller than X^3Y^6). This is not the only possible order, and the choice of order will become very important later.
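Representing monomials by their exponent vectors makes both orders easy to play with; a tiny sketch in Python (the helper names are mine):

```python
def divides(m1, m2):
    """The divisibility partial order: X^m1 divides X^m2 (componentwise)."""
    return all(a <= b for a, b in zip(m1, m2))

def lex_greater(m1, m2):
    """Lexicographical order: Python's tuple comparison is exactly this."""
    return m1 > m2

# X^3*Y^5 and X^2*Y^6, written as exponent vectors:
assert not divides((3, 5), (2, 6)) and not divides((2, 6), (3, 5))  # incomparable
assert lex_greater((3, 5), (2, 6))  # but lex decides: X^3*Y^5 > X^2*Y^6
assert lex_greater((3, 6), (3, 5))  # and X^3*Y^6 > X^3*Y^5
print("orders agree with the text")
```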

With the order chosen, we can now go ahead and define division with remainder again: take the leading monomial of the divisor, check whether it divides the leading monomial of the dividend, and if so, subtract the appropriate multiple of the divisor from the dividend. We can do this repeatedly with a whole set of divisors to end up with a remainder, but we will notice that the order of these divisors matters, and that, in general, our division with remainder algorithm has lost a lot of the nice properties it had in the univariate case.
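sympy’s reduced implements exactly this division with remainder. A sketch with a standard textbook system where the divisor order visibly matters (the defining identity always holds, the remainder does not stay the same):

```python
from sympy import expand, reduced, symbols

x, y = symbols('x y')
f = x**2*y + x*y**2 + y**2

# divide f by the same two polynomials, listed in both orders (lex, x > y)
q1, r1 = reduced(f, [x*y - 1, y**2 - 1], x, y, order='lex')
q2, r2 = reduced(f, [y**2 - 1, x*y - 1], x, y, order='lex')

# the defining property f = sum(quotient * divisor) + remainder always holds...
assert expand(q1[0]*(x*y - 1) + q1[1]*(y**2 - 1) + r1 - f) == 0
assert expand(q2[0]*(y**2 - 1) + q2[1]*(x*y - 1) + r2 - f) == 0
# ...but the remainder depends on the order in which the divisors are tried
print(r1, r2)
```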

Enter Gröbner bases. These are special sets of polynomials for which division with remainder regains at least some of the useful properties it had in the univariate case. In particular, when dividing by a Gröbner basis, the remainder is a canonical representative in the quotient of the polynomial ring modulo the ideal generated by the set of polynomials. Why do we care about this? It turns out that there is a rather intimate relationship, called the Nullstellensatz, between the modulo operation and the set where all polynomials simultaneously vanish, i.e. the solutions of our system of multivariate polynomials. You just know how important it is because we didn’t translate it from Hilbert’s original German. The details here go far beyond this blog post (as you might have noticed when I started talking about generating ideals etc. without warning the reader that they’ll need commutative algebra to proceed), and are thankfully not necessary to understand the basics.

For us, being interested in rational solutions in particular, there is some value in looking more deeply into what a Gröbner basis for the lexicographical order looks like:

Given a system of polynomial equations (of dimension 0; don’t worry about that, though, it’s a technicality), the Gröbner basis with respect to the lexicographical order is an equivalent set of polynomial equations (i.e. each solution of one is a solution of the other) that specifically has the form (f_1, f_2, \dots, f_n), with f_n being univariate in X_m, f_{n-1} being bivariate in X_m, X_{m-1}, and so on.

In particular, given such a Gröbner basis, we can find the rational solutions by solving the last polynomial (which is univariate), taking each zero, partially evaluating the second-to-last polynomial to obtain another univariate polynomial, solving that, and so on. Note that this algorithm by itself might already be exponential: nothing guarantees us that any given zero will end up extending to a valid solution of the full system, so we may factor tons of univariate polynomials along the way; in practice, however, that is usually not the problem.
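With sympy, we can watch this triangular structure appear and do the back-substitution by hand. The system below is a made-up zero-dimensional example that happens to be especially friendly: its lex basis has exactly one polynomial per variable, so the loop structure here is a sketch for this system, not a general solver:

```python
from sympy import Poly, groebner, roots, symbols

x, y = symbols('x y')

gb = groebner([x**2 + y**2 - 5, x*y - 2], x, y, order='lex')
# gb.exprs is triangular: the last polynomial only involves y,
# the first one involves x with coefficients in y

solutions = []
for yv in roots(Poly(gb.exprs[-1], y)):   # solve the univariate part
    partial = gb.exprs[0].subs(y, yv)     # plug in, leaving only x
    for xv in roots(Poly(partial, x)):    # now univariate in x
        solutions.append((xv, yv))

print(sorted(solutions))  # [(-2, -1), (-1, -2), (1, 2), (2, 1)]
```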

Finding Gröbner bases

What is a problem, though, is finding the Gröbner basis to begin with. In fact, this problem is so complicated and interesting that I decided to write an entire blog post about it, in case you haven’t noticed that I have not really talked about UOV so far, and won’t really mention the scheme other than as a motivating factor for all this until part 2.

So let’s finally look at the algorithm to find Gröbner bases, Buchberger’s algorithm (there are improvements on this algorithm called F4 and F5, but they don’t drastically change things). The algorithm is simple enough to write out in its entirety here:

buchberger(set of polynomials S) -> list of polynomials S'
  repeat:
    for every pair (p1, p2) in S:
      let m1 be the leading monomial of p1
      let m2 be the leading monomial of p2
      let m be the lcm of m1 and m2
      compute p = (m/m1) * p1 - (m/m2) * p2
      reduce p by S with quotient and remainder
      if the remainder is not 0, add it to S
  until a full pass adds no new polynomial
  reduce every element of S by S without it
  return S
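The pseudocode transcribes almost line by line into Python on top of sympy’s polynomial helpers. This is a naive sketch: no pair-selection strategy, no final inter-reduction, and it trusts sympy’s LT, lcm and reduced to do the monomial bookkeeping:

```python
from sympy import LT, expand, lcm, reduced, symbols

def buchberger(F, gens, order='lex'):
    """Naive Buchberger: returns a (non-reduced) Groebner basis of F."""
    G = list(F)
    pairs = [(i, j) for i in range(len(G)) for j in range(i + 1, len(G))]
    while pairs:
        i, j = pairs.pop()
        lt_i = LT(G[i], *gens, order=order)
        lt_j = LT(G[j], *gens, order=order)
        m = lcm(lt_i, lt_j)
        # the S-polynomial: multiply so that the leading terms cancel
        s = expand(m / lt_i * G[i] - m / lt_j * G[j])
        r = reduced(s, G, *gens, order=order)[1] if s != 0 else 0
        if r != 0:
            # a new leading monomial appeared: keep it, queue its pairs
            pairs.extend((k, len(G)) for k in range(len(G)))
            G.append(r)
    return G

x, y = symbols('x y')
basis = buchberger([x*y - 1, x**2 - y], (x, y))
print(basis)
```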

This algorithm is a beauty! Notice that, the way it is written, it is not even clear that it terminates, much less what complexity it has. The “simplest” argument I know as to why it terminates goes something like this: if we look at the ideal generated by the leading monomials of S, then the element p we add in each loop iteration will either have a new leading monomial not yet part of this ideal (otherwise it would have been reduced further), or it will reduce to 0 and not be added. So with each added polynomial, this ideal of leading monomials strictly grows. But polynomial rings are Noetherian, and as such, every strictly ascending chain of ideals is finite. At that point, the algorithm returns. While this proves that Buchberger’s algorithm always terminates, it does so by invoking one of the strangest properties of commutative algebra, and without giving a constructive proof that could be used to determine the runtime complexity. One can compute the runtime complexity by doing things more carefully, and arrives at wonderful terms like O(d^{2^n}) for the degree of the polynomials in the eventual basis. So yeah, it might be a while, and even just writing down the basis is not trivially possible. In fact, in practice it is the space complexity of Buchberger’s algorithm that destroys all hope of ever getting a solution, long before the time complexity becomes relevant.

And yet, algebraic geometers are so desperate for Gröbner bases that they will continue running this algorithm for their current pet problem, based on nothing more than hope. And it turns out that hope is not always misplaced. I mentioned monomial orders before, and it turns out that Gröbner basis computation depends very sensitively on which monomial order one chooses. A problem that will just run out of memory in one order might finish in seconds in another. You can also vary when you reduce which polynomial by which set, and come up with all sorts of fun distributed computation problems that will greatly influence the runtime of this algorithm. And what is nice is that once you have one Gröbner basis, you can compute other Gröbner bases quite efficiently, so nothing stops us from first computing the Gröbner basis in the more efficient graded lexicographical order, and then switching to the desired lexicographical order for cheap.
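This order-switching trick is known as the FGLM algorithm (for zero-dimensional ideals), and sympy happens to expose it; a sketch on the same kind of toy system as before:

```python
from sympy import groebner, reduced, symbols

x, y = symbols('x y')
F = [x**2 + y**2 - 5, x*y - 2]

gb_grlex = groebner(F, x, y, order='grlex')  # compute in the cheap graded order
gb_lex = gb_grlex.fglm('lex')                # then convert to lex via FGLM

# the converted basis carries the lex elimination structure: this
# univariate polynomial in y lies in the ideal, so it reduces to zero
_, r = reduced(y**4 - 5*y**2 + 4, list(gb_lex.exprs), x, y, order='lex')
print(r)  # 0
```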

And this is the biggest weakness of multivariate cryptography. Where lattice cryptography has its nice worst-case to average-case reduction, multivariate cryptography has a very hard worst-case problem, but a very finicky-to-control average case. So finicky, in fact, that authors will often just state how much RAM they spent trying to find a Gröbner basis, since there is no better way of estimating the runtime.

Solving multivariate quadratic equations is NP-complete

Now to the last, and as promised independent, section of this blog post, showing that solving multivariate quadratic equations (MVQ) is NP-complete. We do this by reducing SAT, boolean satisfiability, to it. For conciseness, we only do this for multivariate polynomials over \mathbb{F}_2, the field with 2 elements, but showing it for any other field is not substantially different.

We do this by looking at one multivariate polynomial in particular: f(X, Y) = XY + 1.

Since our field has only two elements, we can quickly check what f evaluates to: f(0, 0) = 1, f(0, 1) = 1, f(1, 0) = 1, f(1, 1) = 0. In other words, if we associate 0 with false and 1 with true, f is equal to NAND. Famously, every boolean circuit can be expressed using just NAND gates, and wiring up gates is the same as chaining function calls, which in turn is the same as substituting polynomials. This means every boolean circuit corresponds to one multivariate polynomial, with the degree bounded exponentially in the circuit depth; satisfiability of the circuit then becomes solvability of the corresponding polynomial equation. In order to get back to a quadratic system of polynomials, we introduce, for each variable X appearing with too high a degree, a new helper variable Y, by adding the polynomial X^2 - Y to our system and substituting Y for X^2. This allows us to halve the degree of the polynomial again, and taken together, we get a polynomially bounded system of quadratic equations (and if we interleave both steps, we never blow up the degree of the polynomials in illegitimate ways). This reduces SAT to MVQ. On the other hand, given a candidate solution, verifying it can be done simply by evaluating quadratic polynomials, which takes polynomial time, so MVQ is in NP, making it NP-complete.
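The claim that f is NAND is trivial to check exhaustively; a few lines of Python doing arithmetic mod 2:

```python
def f(X, Y):
    """f(X, Y) = X*Y + 1 over F_2."""
    return (X * Y + 1) % 2

# f is exactly NAND on the whole truth table
for X in (0, 1):
    for Y in (0, 1):
        assert f(X, Y) == (0 if (X and Y) else 1)

# wiring up gates = substituting polynomials:
# AND(a, b) = NAND(NAND(a, b), NAND(a, b))
def AND(a, b):
    return f(f(a, b), f(a, b))

assert all(AND(a, b) == (a & b) for a in (0, 1) for b in (0, 1))
print("f(X, Y) = X*Y + 1 is NAND")
```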

But remember, this only goes for generic systems of multivariate quadratic equations, and is no proof that UOV, which we will discuss in the next part, is even worst case hard, much less does it make statements about the average case complexity.
