Learning, but with Errors


It’s now been several times that I had a conversation with some security folks that went something like this:

Them: “I wished this lattice stuff was easier, like RSA, I tried to read up on it but it was just so complex”
Me: “I think RSA is actually a bit more complicated than Learning With Errors, the didatics on it are just not as far along”
Them: *Pressing X to doubt*
Me: “Fine, I’ll explain it, against my will. I am just going to note here that I am basically forced into this and derive no joy whatsoever from explaining lattice cryptography.”

So after having done that a few times, I figured I should put some of it into a more scalable, written form.

Prerequisites: A bit of Modular Arithmetic and a bit of Linear Algebra.

So what on Earth is a lattice?

A lattice is a free, finitely generated \mathbb{Z}-module usually embedded into a finite dimensional real vector space.

That’s a bit of a mouthful, so let’s break that down a bit.

The term module is just mathematicians being fancy and using a different word for “vector space, but over a ring” (In case you don’t know what a ring is, for this blog post, just replace that term with \mathbb{Z}, the integers. It’s a numbery thing where you can add, subtract, and multiply, but not necessarily divide. In case you don’t remember vector spaces, it’s where vectors live. You can add vectors, there is a zero vector, and you can multiply vectors with elements of the base ring/field to scale them).

Finitely generated means that there is a finite number of elements that generates it, in other words every element x in our module can be written as x = \sum_{i=1}^n a_i g_i, with a_i \in \mathbb{Z} and g_i being our finite set of generators.

This leaves us with free. A module is free, when it actually behaves like a vector space and doesn’t do any weird ring shit. Modules love doing weird ring shit, so this qualifier is unfortunately very necessary. More formally, it means that our module has a basis, i.e. that we can choose the g_i so that the sum above is unique. In that case, any such set will have the same length, which we call the rank of the module, because calling it the dimension would make too much sense and we can rewrite the module to be \mathbb{Z}^n. In other words, for any ring R all free, finitely generated R-modules are isomorphic to R^n.

So we have some integer vectors, and we want to embed them into a real vector space. To avoid some technicalities, I will only talk about so called full rank lattices, where the lattice rank is the same as the dimension of the vector space. I will also not go into the technical definition of embedding, it’s a fairly decent word as math words go and means pretty much that, the lattice is placed inside the vector space.

One thing to note is that there is more than one way to embed \mathbb{Z}^n into \mathbb{R}^n. Let’s take n=2 as example, so we can draw some pictures.

The lattice \left\langle\left(1, 0\right),\left(0, 1\right)\right\rangle

The simplest way we can embed \mathbb{Z}^2 into \mathbb{R}^2 is by just mapping things trivially and looking at all points with integer coordinates. Unsurprisingly, we get a grid.

The lattice \left\langle\left(1, 0\right),\left(\frac12,\frac{\sqrt{3}}{2} \right)\right\rangle

If we want to be a tad bit more creative, we could try this hexagonal pattern. We can immediately tell that these two lattices are materially different. For example, in the first lattice, every lattice point has four closest neighbors, while in the second lattice we have six closest neighbors. As every board game nerd will tell you, the difference matters. A lot.

There are far more ways to create 2-dimensional lattices. So many in fact that you can prove all sorts of things about elliptic curves just by looking at 2d lattices, but that is a different blog post.

So to recap:

  • A lattice is a copy of the \mathbb{Z}^n, embedded into the \mathbb{R}^n.
  • Lattices have bases. They behave nicely, and don’t do weird ring shit.
  • The way the lattice is embedded matters.

Hard problems in lattices

While lattices are fascinating by themselves, we want to do cryptography, so we need to have hard problems. The two hard problems we will look at are called the Shortest Vector Problem (SVP) and the Closest Vector Problem (CVP).

The Shortest Vector Problem is the question “What vector other than 0 is the shortest vector in this lattice”. The zero vector is in all lattices and has length 0, so if we didn’t exclude it it would not be a very interesting problem.

The Closest Vector Problem on the other hand answers the question “Given a vector x, not necessarily in the lattice, which point in the lattice is the closest point to this vector?”

Answering (pink) the CVP for the vector \left(\frac53, \frac34\right)(red)

Good Basis, Bad Basis

At first, it might seem strange that the CVP and the SVP are considered hard problems. After all, you probably didn’t need to even get out a calculator or measuring tape to solve that example CVP, and I didn’t even bother to prove that 4 neighbors, 6 neighbors stuff which is basically the SVP, so what is supposed to be so hard about this?

The problem is, we only looked at this problem in 2 dimensions. But for lattice cryptography, we use something more like 768 dimensions. Unfortunately, graphics cards have trouble rendering those, so we will have to build some intuitions with numbers.

In the example we found that the closest vector to \left(\frac53, \frac34\right) is the lattice point (2, 1). How did my computer calculate this closest vector? Well I told it to round \frac53 to 2 and \frac34 to 1. Exciting stuff.

Let’s look at a completely, totally different lattice, the one generated by \{(2, 3), (5, 7)\}. If we now want to know what is the closest lattice point to (\frac53, \frac34), we first need to change basis to our lattice basis, round, and change back to the standard basis. We see that (\frac53, \frac34) = -\frac{95}{12}\cdot(2, 3) + \frac72\cdot(5, 7), rounding gives us -8\cdot(2,3)+4\cdot(5,7) = (4, 4). Let’s plot this:

The starting point (red) and the “closest” point (pink)

Wait, what? First of all, it turns out, this completely, totally different lattice is the same lattice as before, because (1,0)=-7\cdot(2,3) + 3\cdot(5,7) and (0, 1) = 5\cdot(2,3) -2\cdot(5,7). Turns out, I “accidentally” just used a different basis of the very same lattice. But why did our rounding trick not work? Let’s work backwards: Our solution should be (2,1) = -9\cdot(2,3)+4\cdot(5,7). So we almost got there, but rounding was still off by one in the first coordinate. In 2 dimensions, that’s not a big deal, just brute force by adding or subtracting 1 from each coordinate and see if you find a closer point. Of course, if the dimension is something like 768, brute forcing isn’t going to help.

This means that there are good bases and bad bases. In a good basis, we can just round the coefficients to solve the closest vector problem. In a bad basis however, if the dimension is high enough, you are hopelessly lost. And it turns out the bad bases outnumber the good ones, and if you have a good basis and want a bad one you just randomly base change a bit. (The example basis was literally my first try).

Now there is a way to go from a bad basis to a good basis, with a method called lattice reduction. But this blog post is already too long, so I won’t go into that. Suffice to say it’s hard to get a good enough basis in high dimensions.

A problem that is hard, but only if you don’t know some secret? This is very promising, let’s do some cryptography!

And how do I encrypt with this?

With all that lattice knowledge, we can dive right into learning with errors.

As parameters, we need a prime q, and lattice rank n. Popular choices are primes like 3329 and ranks like 768. One somewhat sour note is that we will unfortunately only encrypt a single bit.

To generate a key, we need a matrix A, where we take the coefficients uniformly at random between 0 and q. This matrix could theoretically be a public parameter, but choosing it per key makes it impossible to attack all keys at once.

The next thing we need are two vectors, s and e, with small coefficients. And by small we usually mean something like -1, 0, or 1. Maybe the occasional 2, but not larger. We then compute t = A^Ts + e \mod q and make (A, t) the public key, keeping s for the secret key.

To encrypt, we will need two more vectors r and f, also with small coefficients and a small integer g. We kind of do the same thing as with key generation and compute u = Ar + f \mod q as well as v = t^Tr + m\cdot\frac{q-1}{2}+g \mod q. To build some intuition let’s look at what size those terms have. Since we multiplied with a bunch of uniformly distributed stuff, t and u are also uniformly distributed. g as discussed is small. m\cdot\frac{q-1}{2} is either 0, a number generally considered fairly small, or \frac{q-1}{2}, a number which, modulo q is the largest possible (we allow negative numbers, so if we make it any bigger it just wraps around). Since t was uniformly distributed, t^Tr is as well. Which is good, since that means it masks whether our message term is big or small, and with only a single message bit, we should better try protecting it.

So how do we decrypt? Easy, just compute m' = v - s^Tu \mod q and output 0 if it’s small and 1 if it’s large. Since

\begin{align*}m' &= v - s^Tu\\ &= t^Tr + m\cdot\frac{q-1}{2}+g - s^T(Ar + f)\\ &= (A^Ts+e)^Tr - s^T(Ar + f) +m\cdot\frac{q-1}{2}+g\\ &= s^TAr - s^TAr + m\cdot\frac{q-1}{2} + e^Tr-s^Tf +g\end{align*}

The first two terms cancel. The last three terms are small, more precisely not larger than 2n. So m' is large if and only if m was 1, at least as long as we choose our parameters so that the rank is not too large compared to the prime.

Cherchez la lattice

So you might say, that’s nice and all, but where is the freaking lattice? All that stuff about closest vectors and bad bases and so on, it doesn’t look like we actually used any of it. And you’d be right. Lattices are kind of weird in that they are everywhere in arithmetic number theory, but they like to hide a bit sometimes. So let’s find this lattice, by attempting to break this scheme.

We want to recover the secret key s from the public key A, t = A^Ts+e \mod q. Let’s do this in two steps. First, we try to find any vectors s' and e' that solve t=A^Ts'+e'\mod q, without caring whether or not they are short, and then we can try to modify them to be short.

We can rewrite our equation as (A^T, \operatorname{Id}, q\operatorname{Id})(s', e', k) = t with k being some integer vector that we just use so we don’t have to worry about the \mod q part. This is solving a linear equation over the integers. It’s not quite as straightforward as the standard Gaussian algorithm, but a small modification by Smith will solve it and give us both a special solution (s', e', k) and a basis of the kernel of (A^T, \operatorname{Id}, q\operatorname{Id}). Recall that any solution, including the short s and e we are looking for differ from s' and e' by an element of the kernel.

Hmm, looking more closely at that kernel, it is a \mathbb{Z}-module by virtue of being a kernel, it has a basis, so it is free, and the basis is finite, so it is finitely generated. We don’t really care about the last n coefficients though which we only needed for the modulo, but we can just drop them, and the result remains a free, finitely generated \mathbb{Z}-module. Given that we want to find short vectors, and our notion of shortness was just the normal Euclidean norm, we can embed this kernel into \mathbb{R}^{m} trivially, and have found a lattice. In fact, we have found the lattice, since with a good basis we could find the vector (s_0, e_0) closest to (s', e') in the kernel lattice, which means their difference would be short, and indeed it would be (s, e) (even if it was not, what the only property for the decryption algorithm to work is the s and e are short, so it would just decrypt).

Smith unfortunately gives us a very large special solution and kernel basis vectors that very much are a bad basis of the kernel, which means that we won’t be able to find this closest vector, and this property gives Learning With Errors its security.

Closing Remarks

Since we have encrypted a single bit, if we used the scheme as is, it would be incredibly wordy, since we would need to repeat all of this 128 times (or more) to actually transmit a key. Kyber (or ML-KEM in NIST terms) uses some trickery to both make keys and ciphertexts smaller and operations faster, but I won’t go into them in this blog post.

Also, just like textbook RSA, the algorithm I presented here would not be secure in interactive settings, and needs additional modifications to ensure that we cannot extract the secret key by sending several related messages and observing whether the decrypting party use the same or a different key. But this too would need a separate blog post.


Invisible Salamanders in AES-GCM-SIV

By now, many people have run across the Invisible Salamander paper about the interesting property of AES-GCM, that allows an attacker to construct a ciphertext that will decrypt with a valid tag under two different keys, provided both keys are known to the attacker. On some level, finding properties like this isn’t too surprising: AES-GCM was designed to be an AEAD, and nowhere in the AEAD definition does it state anything about what attackers with access to the keys can do, since the usual assumption is that attackers don’t have that access, since any Alice-Bob-Message model would be meaningless in that scenario.

What is interesting to me is that this property comes up more often than one would think, I ran across it several times now during my work reviewing cryptographic designs, it’s far from an obscure property for real world systems. The general situation these systems have in common is that they involve three parties: Alice, Bob, and Trent. Trent is a trusted third party for Bob, who is allowed to read messages and scan them, with details like when and why depending on the crypto system in question. While Trent and Bob agree on the ciphertextsay because Trent hands Bob the ciphertext or because Alice presents Trent’s signature on itAlice has the option of giving Trent and Bob different keys. The challenge for Alice is to come up with a ciphertext that has a valid authentication tag and still decrypts to different messages for Trent and Bob.


Before I dive deeper into how to construct invisible salamanders for AES-GCM and AES-GCM-SIV, a few words on how to defend against these problems. The easiest option here is to add a hash of the key to the ciphertext. This technically violates indistinguishability, as the identity of the key is leaked, i.e. an attacker now knows which key was used for the message. If indistinguishability is necessary, using the IV as a salt for the hash works well, constructions like HMAC-SHA-2(key=IV, message=key) (i.e. aka HKDF-expand) work well here, as long as attention is paid on whether or not this key hash can be used in any other context. In general, it shouldn’t because the key already should only be used for AES-GCM/AES-GCM-SIV, but real world systems sometimes have weird properties.

Constructing Salamanders

With the mitigation out of the way, onto the fun part: Constructing the messages. In order to understand why and how these attacks work, we first have to talk about \mathbb{F}_{2^{128}} and the way AES-GCM and AES-GCM-SIV use this field to construct their respective tags. As a finite field \mathbb{F}_{2^{128}} supports addition, multiplication, and division, following the usual field axioms. The field has characteristic 2, which means addition is just the xor operator, and subtraction is the exact same operation as addition. Multiplication and division is somewhat more complicated and not in scope for this article, it suffices to say that multiplication can be implemented with a very fast algorithm if the hardware supports certain instruction sets (carryless multiplication). The division algorithm uses the Euclidean algorithm and will at most take 256 multiplications in a naive implementation, so while slower than the other operations, it will still be extremely fast. I will use + for the addition operation and \cdot for the multiplication operation. The most important caveat is to not confuse these operations with integer arithmetic.


Next, on to AES-GCM. This AEAD is a relatively straightforward implementation of an AEAD that uses a UHF based MAC for authentication. Our IV is 12 byte long, we use a 4 byte counter and CTR mode to encrypt the message. The slightly odd feature is that we start the counter at 2, for reasons we will see later. For authentication, we first derive an authentication key H by encrypting the zero block (This is why we don’t start the counter at zero, otherwise the zero IV would be invalid). Now, using the ciphertext blocks, additional data blocks (both padded with zeros as needed for the last block), and adding a special length block containing the size of the additional data and the ciphertext, we get a collection of blocks, all of which I will refer to as C_i. To compute the tag, we now compute the polynomial

GHASH(H, C, T) = C_0\cdot H^{n+1} + C_1\cdot H^{n}+\dots + C_{n-1}\cdot H^2+C_n\cdot H+T

The constant term, T is the encrypted counter block associated to the counter variable of 1 (Which is why we started at 2 for the CTR mode). Remember that in characteristic 2 + is xor, so we could equivalently say that we compute the polynomial without the constant term and then encrypt it with CTR mode as the first block.

Now, how do we get two different plaintexts to agree on both ciphertext and tag, we first choose two keys and produce the corresponding keystreams, choosing the plaintexts so that the ciphertexts agree (If you want two plaintext that make sense, this part is the hardest step, you first brute force the first few bytes in order to be valid in one format and a comment opening statement in the other, so that you can switch which parts of the ciphertext will appear as valid plaintext and which parts appear as commented out). We leave one ciphertext block open for now, as a sacrificial block that we will modify in order to make the tags turn out to be the same. Next derive the corresponding authentication keys H_1 and H_2 and our constant terms T_1, T_2. This means, we have C_i fixed, except for a specific index, say j, and can now solve

GHASH(H_1, C, T_1) =GHASH(H_2, C, T_2) \sum_{i=0}^n C_i\cdot H_1^{n+1-i}+T_1=\sum_{i=0}^n C_i\cdot H_2^{n+1-i}+T_2 C_j\cdot\left(H_1^{n+1-j}+H_2^{n+1-j}\right)=\sum_{\substack{i=0\\i\neq j}}^n C_i\cdot \left(H_1^{n+1-i}+H_2^{n+1-i}\right)+T_1+T_2 C_j=\left(H_1^{n+1-j}+H_2^{n+1-j}\right)^{-1}\cdot\left(\sum_{\substack{i=0\\i\neq j}}^n C_i\cdot \left(H_1^{n+1-i}+H_2^{n+1-i}\right)+T_1+T_2\right)

by solving for the sacrificial block C_j.


So far so good, but, what about AES-GCM-SIV? GCM is famous for having many weird properties that make it extremely fragile, like leaking the authentication key on a single IV reuse, or allowing for insecure tags smaller than 128 bits. In many ways, AES-GCM-SIV is how AES-GCM should look like for real world applications, much more robust against IV reuse, only revealing the damaging properties of an UHF with a reused IV if both IV and tag are the same. This is accomplished through using the tag as a synthetic IV, meaning the tag is computed over the plaintext, and then used as IV for CTR mode to encrypt. Even though this kind of SIV construction uses MAC-then-Encrypt, they are secure against the usual downsides due to CTR mode always succeeding in constant time, independent of the plaintext. This means the receiver can decrypt the message and validate the tag without revealing information about the plaintext in case of an invalid tag. The library needs to take care that the plaintext is properly discarded and not exposed to the user in case the tag does not validate.

The actual IV for AES-GCM-SIV is used primarily derive a per message key. This means that if the IV of two messages is different, both encryption and authentication keys will be unrelated and can not be used to infer things about each other.

All in all AES-GCM-SIV works like this:

  • H, K_E = \operatorname{KDF}(K, IV)
  • T=\operatorname{AES}(K_E, P_0\cdot H^{n+1}+\dots+P_n\cdot H)
  • C=\operatorname{AES-CTR}(K_E, IV=T)

where the plaintext blocks P_i again contain additional data and length, and some extra hardening and efficiency tricks having been stripped for clarity.

Our previous approach of first creating the ciphertext and then balancing things out to get the tags to agree clearly cannot work here anymore. The keystream, and therefore the ciphertext, depend on the tag, so if we want to have any chance of finding a salamander, we have to fix the tag before we do any calculation at all. So after having chosen T, we decrypt it under each of our keys to get the result of our polynomial S_i=\operatorname{AES}^{-1}(K_{E,i}, T). What we are left with is finding plaintexts P_1, P_2 such that

S_i=\sum_{j=0}^n P_{j, i} H_i^{n+1-j}

which gives us a system of two linear equations with 2n unknowns. But this isn’t all constraints we need to satisfy, since we still need to encrypt these plaintexts once we have the tag balanced. Here, we are lucky that everything is over characteristic 2: The CTR encryption is just an addition of the plaintext and the encrypted counter block C_i=\operatorname{AES}(K_E, CB_i)+P_i. To say that two plaintexts result in the same ciphertext under two different keys is just fulfilling the equation

\operatorname{AES}(K_{E, 1}, CB_{j, 1})+P_{j, 1}=\operatorname{AES}(K_{E, 2}, CB_{j, 2})+P_{j, 2}.

This, like our two equations for the tag, is a linear equation. So in the end, for a plaintext that has a size of n blocks, we get n+2 linear equations with 2n variables. This means, in almost all cases, we can construct an invisible salamander with only adding two sacrificial blocks, with the same caveat that the two plaintexts need to be partially brute forced.

Test Code

I’ve put this to the test and have written code to produce AES-GCM (Java) and AES-GCM-SIV (C++) salamanders.

Cryptography (Incomprehensible)

Cartier Divisors

As an obvious first blog post, easily understandable and very relevant to cryptography (/s), here a description of Cartier Divisors, because Thai asked for it on Twitter.

For this, first some history: A while ago, I taught a Google internal course about the mathematics of elliptic curves. It would probably make sense to start with that content, but I’m going to assume that I’ll come back to it and fix the order later.

Anyways, the objects we are looking at are Divisors and Principal Divisors. The come up when studying curves as a way to describe the zeroes and poles of functions. Over the projective line \mathbb{P}_K^1 (also known as just the base field plus a point at infinity), a rational function (the quotient of two polynomials) can have any selection of zeroes and poles it so pleases, with the only constraint being that there must be (with multiplicity) the same number of zeroes and poles. We can see that by looking at

\frac{(X-a_1)(X-a_2)\dots (X-a_n)}{(X-b_1)(X-b_2)\dots (X-b_n)}

for a function with zeroes at a_1, a_2, \dots, a_n and poles at b_1, b_2, \dots, b_n. If a_i or b_i is \infty, then we ignore the corresponding term, and get a zero/pole at infinity.

On more general curves, we do not have this amount of freedom. The lack of freedom we have in choosing zeroes and poles is tied surprisingly deeply to the curve in question, so describing it turns out to be very useful.

A Weil divisor is a formal sum of points of the curve, that is, we assign an integer to every number of the curve, with all but finitely many points getting assigned the integer zero. The degree of a divisor is the sum of all these integers. The divisor of a function \operatorname{div}f is the divisor we get by assigning the order of the function in that point to the point, i.e. setting it 1 for simple zeroes, -1 for simple poles, and so on. If a divisor is a divisor of a function, we call the divisor a principal divisor.

With these definitions out of the way, we can get to Thai’s question. It turns out that the thing we are interested in is the divisors of degree 0 modulo the principal divisors. This group in some sense measures how restricted we are in our choice for placing zeroes and poles. It turns out, that for Elliptic curves, all divisors are equal to a divisor of the form P - O, with O being the point at infinity (or really any fixed (“marked”) point on the curve) up to a principal divisor (equal up to principal divisor is also called linearly equivalent). So what Thai is asking is that while we can think of principal divisors as a description of rational functions, what are the other divisors? The simple answer to that is that they are just what we said, formal sums of points, just some points with some integer weights. For elliptic curves, they are conveniently in a 1:1 correspondence with the points of the curve itself, which is why we usually gloss over the whole divisor thing and just pretend to add points of the curve themselves. But this answer is kind of unsatisfying, and it does generalize well in higher dimensions or for curves with singularities in them, so a better concept is needed.

Enter Cartier Divisors. In order to explain these, we’re technically going to need sheaves, but sheaves are a bit much, so I’ll try to handwave some things. The basic idea is, since we want to describe zeroes and poles, why don’t we just use zeroes and poles for that? Of course we can’t use a full function that is defined everywhere for that, that would only give us the principal divisors. But locally, we can use a function to describe zeroes or poles. Now what does locally mean? In algebraic geometry, the topologies we’re using are kind of weird. Here, we are using the Zariski topology, which for curves just means that when we say locally, we mean the whole curve with a finite number of points removed. We use this to remove the any zeroes or poles we don’t want for our divisor from our local representative.

All in all that means a Cartier divisor on a curve C is a covering (U_i), i.e. a collection of open sets (curve minus finite amounts of points) such that their union is the whole curve, and a rational function f_i per U_i, defined on U_i. This function’s zeroes and poles are what we understand as the divisor. Obviously, we now need to make sure all these partial functions work well as a whole. We do that by looking at U_i \cap U_j and the functions f_i and f_j restricted to that intersection. If we want this construction to define a consistent divisor, then f_i/f_j can not have any zeroes or poles in U_i \cap U_j. We write this as

f_i/f_j \in \mathcal{O}^\times (U_i \cap U_j)\;\;.

This now describes a consistent global object with zeroes and poles as we want them, getting quite close to describing divisors in a completely different way! We just have one problem, there are way too many functions with a specific pattern of zeroes and poles on our local neighborhood U_i, we need to get rid of all the extra behavior that isn’t just zeroes and poles! To do that, we need to look at two functions f_i and g_i on U_i that have the same pattern of zeroes and poles. What happens when we take f_i/g_i? Well we, as above, get a function without zeroes or poles on U_i. So if we want to forget all that extra structure, we need to take f_i modulo the set of functions without zeroes or poles on U_i. And that’s it.

If we write \mathcal{M}^\times(U_i) for the rational functions that are not equal to zero (so the rational functions that have a multiplicative inverse) and write \mathcal{O}^\times (U_i) for the functions without zeroes or poles on U_i, we can now describe a Cartier divisor as a covering (U_i) together with an element f_i \in \mathcal{M}^\times(U_i)/\mathcal{O}^\times(U_i) such that f_i/f_j\in\mathcal{O}^\times(U_i \cap U_j). A principal Cartier divisor is a Cartier divisor that can be described with just using just the entire curve C as the only element of the covering.

For extra bonus points (which I will not describe in detail here, because this blog post is already way too long and completely incomprehensible), we can look at what happens if we now take these Cartier divisors modulo principal Cartier divisors. It turns out, that the result can be described again with a covering U_i, but this time, instead of going through all that choosing of rational functions per set, we just use the intersections, and choose an element f_{ij}\in \mathcal{O}^\times (U_i \cap U_j), without even looking at rational functions in U_i at all, with some sheafy/cohomological rules for when two of those things are equal.