Categories
Cryptography

Unbalanced Oil and Vinegar Part II

After Part I looked at the hard problem underlying Unbalanced Oil and Vinegar, it is now finally time to talk about the algorithm itself.

Verify

As with many signature algorithms, looking at the verification routine first is a good idea. The verification algorithm is usually simpler, and gives you an idea of what the signature scheme is trying to do, with the signing algorithm then showing you how it does it.

In the case of UOV, we use a paradigm called “hash and sign”, in many ways the simplest signing scheme paradigm, not using any fancy transforms. All we need is a trapdoor one-way function f, i.e. a function that everybody can evaluate efficiently at any point x, computing y = f(x), but where, given y, finding an x such that f(x) = y is difficult, unless one knowns some extra information k, in which case it is easy. In other words, there is a function g that is easy to evaluate such that f(g(k, y)) = y.

Given such a trapdoor one-way function, we can construct a signature scheme by simply making x the signature of the (hash of the) message y, i.e. the signature of a message r is the value x such that f(x) = h(r). The most prominent example of a signature scheme using this paradigm is RSA-PKCS1, where modular exponentiation is the one-way function, and the factorization of the modulus is the trapdoor information making it reversible.

As we discussed in part I, we want to use quadratic forms as our hard problem. At this point, it makes some sense to discuss different ways to write down quadratic forms. They are all interchangeable, and really which one is used is mainly a matter of what is most useful in a given context.

The most straightforward way of writing down a quadratic form is by writing it as a polynomial, i.e. by writing f = \sum_{i = 1}^n \sum_{j = i}^n a_{ij} X_i X_j. Note that the second sum starts at i, since we are operating over commutative fields, so we can always reorder things so that i \le j. This will be useful later. Also note that we only use a homogenous polynomial, i.e. a polynomial without linear or constant components. This is both a simplification and something that does not matter in practice, as you could always homogenize the polynomial.

The second way to notate a quadratic form is using matrix-vector multiplications. If we put our variables X_i into a vector x = (X_1, \dots, X_n), then x^T A x will evaluate to the exact same polynomial as above, with A being a triangular matrix. We can shift around the coefficients of A that are not on the diagonal, as long as A + A^T remains the same (if the characteristic is not equal to two, which will be a common caveat in this post).

Finally, looking at x^T A x it makes sense to ask what would happen if we did not use the same vector twice, but instead kept two different ones. This gives us x^T A y = (x, y)_A, a bilinear form, i.e. a function that is linear in both arguments and evaluates to a single field element. In other words, given a bilinear form (\cdot, \cdot)_A, we get the associated quadratic form by evaluating at the diagonal. Note that for the bilinear form, the order very much matters, since x_i y_j \neq x_j y_i, so in this case, we can no longer simply assume upper triangular matrices. In fact, usually, when studying quadratic forms, mathematicians restrict themselves to symmetric bilinear forms as the associated bilinear forms, i.e. forms with (x, y)_A = (y, x)_A, corresponding to symmetric matrices A. This gives a one-to-one correspondence between quadratic forms and bilinear forms, and is super convenient, with the slight downside of not working in characteristic 2. As mentioned, this is a more general thing, having to do with the fact that the degree of the polynomial equals the characteristic, something algebraic geometers call the wild case. And guess which base field UOV uses? That’s right \mathbb{F}_{256}. Super nice for computers, since a single byte corresponds to a field element, but really inconvenient for the more theoretical analysis. We can mostly ignore this, but keep in mind that pretty much everything that uses bilinear forms might have an unwritten asterisk that goes into unreasonable detail about the situation in characteristic 2.

With all this on notation out of the way, we can finally write down the public key and the verification algorithm:

A UOV public key consists of m quadratic forms, given as symmetric matrices A_i, of dimension n \times n. A signature of a message r is a pair (x, s), with x being an n dimensional vector, and s some salt, such that x^T A_i x = h(s || r)_i.

Written this way, it is clear that the generic way of solving this problem would be equivalent of solving the multivariate quadratic equation problem. Indeed, the signature is literally given by a solution to the multivariate quadratic equation problem defined by public key and message. But keep in mind that just because the problem is formulated as a multivariate quadratic equation, that does not mean that you have to use generic solvers to attack it.

Note that, different from RSA, every message has more than one possible preimage, so even without the salt would allow for multiple signatures. By adding the salt we gain some additional security guarantees, in that it no longer matters whether the signer has signed the exact same message before, since with a different salt it results in a completely different challenge anyways.

Sign

It is now finally time to discuss the trapdoor used for UOV. After all, if our public key was completely random, it would indeed be impossible to forge a signature, but it would be equally impossible to actually compute a signature for the signer as well. So we need to first construct a special system of quadratic equations that can be solved easily, and then find a way to transform it into one that hopefully is indistinguishable from a randomly chosen one. We use the matrix notation discussed above for this. In particular, if the matrices A_i have the form \begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}, with V_i being a symmetric (n-m) \times (n-m) matrix, and M_i being a (n-m) \times m matrix. Written in this block fashion, it makes sense to distinguish the first n - m variables, called the vinegar variables from the last m variables, called the oil variables. If we look at what this form of matrix does to the polynomial, we see that having that zero block in the lower right means that while vinegar variables are multiplied with all other variables, oil variables are only ever multiplied with vinegar variables, and never with each other. We also conveniently have exactly as many oil variables as we have equations to solve. And solving the equations is now very easy: First, randomly chose the value of the vinegar variables. Partially evaluating the polynomials, we will now have constant terms (where vinegar variable was multiplied with vinegar variable), and linear terms (where vinegar variable was multiplied with oil variable), but crucially, we do no longer have any quadratic terms, as we did not allow oil variables to be multiplied with other oil variables. In other words, we have m linear equations with m variables to solve. Which we can do by applying Gaussian elimination. This system will not always have a unique solution, but that is reasonably unlikely, and if it doesn’t, we can simply chose a different set of values for the vinegar variables.

This gives us a system of quadratic equations that is easy to solve, and this will be our private key, and what we actually solve when signing.

So how do we switch from this private system of quadratic equations to the public system? Well, with a base change, of course. This is a linear function on our n variables, mixing them around, explaining the name of the scheme. We can write this using a n \times n matrix B, which allows us to fully write down the private key and the signing algorithm:

A UOV private key is a n \times n matrix B such that the public key A_i can be expressed as B^T A_i B = \begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}. To sign, we first compute the hash of the salted message, chose random values for the vinegar variables, giving us (v, X) as preliminary signature, leaving variables for the last m coefficients. We then compute (v^T, X^T)\begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}\begin{pmatrix}v\\X\end{pmatrix} = v^TV_iv + v^TM_iX + X^TM_i^Tv = v^TV_iv + 2v^TM_iX, giving us m linear equations, which we solve for X, giving us a private signature (v, x). Since A_i can be expressed as B^T A_i B = \begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}, and we just computed (v, x) such that (v^T, x^T)\begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}\begin{pmatrix}v\\x\end{pmatrix} = h(s || r)_i, we now know that h(s || r)_i = (v^T, x^T)\begin{pmatrix}V_i&M_i\\M_i^T&0\end{pmatrix}\begin{pmatrix}v\\x\end{pmatrix} = (v^T, x^T)B^T A_i B\begin{pmatrix}v\\x\end{pmatrix}, so B\begin{pmatrix}v\\x\end{pmatrix} is our signature.

Bonus: Self verifying public keys

In my opinion, the best way to understand an algorithm is to have fun with it, looking for interesting properties and silliness that it allows for or thwarts. I’ve already discussed how one can reconstruct a UOV public key from n(n+1)/2 message signature pairs, but UOV is an endless pit of funny properties, so here we can discuss how to construct amusing public keys, such as, for example, a public key that is an executable python script verifying an input signature using its own code as the public key. This, as a side benefit, will also explain how we can drastically compress UOV public keys as well.

To do this, we need to take the approach algebraists love to take, and get rid of the basis altogether, i.e. we now talk about some n dimensional vector space X. So how can we describe the special property of our quadratic form, or its associated bilinear form, without explicitly referencing its matrix structure (and thus the specific basis)? Well, we made the matrix symmetric (since we ignore the whole “it’s actually characteristic 2, so it’s all weird” thing), and we know that symmetric matrices lead to symmetric bilinear forms, so we know that we are looking for m symmetric bilinear forms. The main property of the form in our special basis was that 0 in the lower right corner. The coefficients of a matrix of a bilinear form describe the result of the bilinear form applied to the corresponding basis vectors, i.e. a_{ij} = (e_i, e_j)_A, so having a 0 in this position means that there are m linear independent vectors which, when put into the bilinear form in both positions, will evaluate to 0. So to get rid of the basis, we need an m-dimensional subvectorspace O of X, with the property that (x, y)_i = 0 \;\;\forall x, y \in O for i from 1 to m. It is this subvectorspace which makes up the private key. Since it corresponds to the last m variables when choosing a basis, it makes sense to call this the oil space, containing vectors consisting of only nonzero oil variables above.

To continue our description of UOV without choosing a basis, we need to extend our oil space to the full vector space, by finding some space V such that V \oplus O = X. Thankfully, there are tons of such vector spaces, and, as it will turn out later that the choice of vinegar space does not matter at all.

With the vinegar space chosen, we can now reformulate the full signing algorithm, without the use of base changes:

We know that all vectors x \in X have a unique representation as v + o = x, with v \in V, o \in O. So we know that (v+o, v+o)_i = (v, v)_i + (v, o)_i + (o, v)_i + (o, o)_i = (v, v)_i + 2(v, o)_i, given the property of our bilinear forms. Choosing v \in V randomly, we again get a linear equation per bilinear form to compute o, and in particular, if v is not very unluckily chosen, there is now only one o that will give the right value for all (\cdot, \cdot)_i. Since the decomposition is unique, if we had chosen a different vinegar space, there would be a different vinegar vector that would have resulted in the same signature (assuming again that we do not get unlucky and end up with an underdefined system of equations and reroll). But we chose the vinegar vector at random anyways, so other than determining the lower dimensional space of unlucky vinegar vectors, the choice of the vinegar space has no consequence for our signing algorithm, as long as it intersects the oil space trivially to form the direct product, and even the basis we chose makes no difference, since we only use this vector space for choosing a random element from it, an operation that is independent from the choice of basis.

Let’s now look at the oil space, and what consequences different choices of basis have here. We assume that the vinegar space and oil space are fixed as above, and that we have chosen a random vinegar vector such that the system of equations (v, v)_i + 2(v, \cdot)_i has full rank. But then there is exactly one oil vector that solves for a particular right hand side, and that oil vector again does not depend on the basis we use for the oil space, just on the fact that it is in this oil space. So we can also chose the basis of the oil space basis as convenient. Note that, different from the vinegar space, this is not true for the oil space itself, which we had to secretly determine and now have no way of changing, just its basis.

Lastly, we can ask how likely it is for a given random n-m dimensional sub space and m dimensional sub space to intersect trivially. It turns out, this is again very likely (the chance of having a non trivial intersection is about one over the number of field elements). This means we don’t restrict our choice in oil space too much if we require it to intersect trivially with a specific vinegar space.

So, to put everything together: The choice of vinegar space and its basis has no real relevance to the signing algorithm, so we might as well chose the first n-m unit vectors to span this space. This does not really restrict the choice of oil space, and within that space the choice of basis again does not matter. We obviously cannot chose this part of the basis as the unit basis, but we can chose the basis such that the last m coefficients agree with with the last m unit vectors, i.e. chose the basis of O as (o_i, e_i), with e_i being the m dimensional unit vectors. All in all, this means our base change matrix now looks like this: B = \begin{pmatrix}I_{n-m}&B_{12}\\0&I_m\end{pmatrix}. Applying this base change to our public key as above, we get \begin{pmatrix}I_{n-m}&0\\B_{12}^T&I_m\end{pmatrix}\begin{pmatrix}A_{11}&A_{12}\\A_{12}^T&A_{22}\end{pmatrix}\begin{pmatrix}I_{n-m}&B_{12}\\0&I_m\end{pmatrix} = \begin{pmatrix}A_{11}&A_{11}B_{12} + A_{12}\\B_{12}^TA_{11} + A_{12}^T&B_{12}^TA_{11}B_{12} + B_{12}^TA_{12} + A_{12}^TB_{12} + A_{22}\end{pmatrix}. So in order for the form to have the required shape, we need only that A_{22} = -B_{12}^TA_{11}B_{12} - B_{12}^TA_{12} - A_{12}^TB_{12}.

All of this taken together, this means that the only part of the public key that actually depends on the private key is the lower right m \times m block of the m quadratic forms. Which means that the rest of the public matrices can be chosen at random. The UOV competition entry cleverly uses this to make the public key much smaller, from m\cdot n(n+1)/2 bytes for the expanded public keys, to m\cdot m(m+1)/2 + 16 bytes for the compressed public key, using the 16 extra bytes as a seed to input into an XOF, to generate A_{11} and A_{12}. I, in turn, can use this property to keep the public key large, but hide amusing message in them, like the aforementioned python script.

Leave a Reply

Discover more from Key Material

Subscribe now to keep reading and get access to the full archive.

Continue reading