Invisible Salamanders in AES-GCM-SIV

By now, many people have run across the Invisible Salamander paper and the interesting property of AES-GCM it describes: an attacker can construct a ciphertext that decrypts with a valid tag under two different keys, provided both keys are known to the attacker. On some level, finding properties like this isn’t too surprising. AES-GCM was designed to be an AEAD, and nowhere does the AEAD definition say anything about what attackers with access to the keys can do. The usual assumption is that attackers don’t have that access, since any Alice-Bob-Message model would be meaningless in that scenario.

What is interesting to me is that this property comes up more often than one would think. I have run across it several times now while reviewing cryptographic designs, so it’s far from an obscure property for real world systems. The general situation these systems have in common is that they involve three parties: Alice, Bob, and Trent. Trent is a trusted third party for Bob, who is allowed to read messages and scan them, with details like when and why depending on the crypto system in question. While Trent and Bob agree on the ciphertext (say because Trent hands Bob the ciphertext, or because Alice presents Trent’s signature on it), Alice has the option of giving Trent and Bob different keys. The challenge for Alice is to come up with a ciphertext that has a valid authentication tag and still decrypts to different messages for Trent and Bob.


Before I dive deeper into how to construct invisible salamanders for AES-GCM and AES-GCM-SIV, a few words on how to defend against these problems. The easiest option here is to add a hash of the key to the ciphertext. This technically violates indistinguishability, as the identity of the key is leaked, i.e. an attacker now knows which key was used for the message. If indistinguishability is necessary, using the IV as a salt for the hash works well. Constructions like HMAC-SHA-2(key=IV, message=key) (i.e. HKDF-Extract with the IV as salt) fit here, as long as attention is paid to whether or not this key hash can be used in any other context. In general, it shouldn’t be, because the key should already only be used for AES-GCM/AES-GCM-SIV, but real world systems sometimes have weird properties.
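A minimal sketch of the salted key hash described above, assuming HMAC-SHA-256 as the concrete HMAC-SHA-2 instance. The function names `key_commitment` and `check_commitment` are made up for this illustration, and how the commitment is attached to the ciphertext is left open.

```python
import hashlib
import hmac
import os


def key_commitment(key: bytes, iv: bytes) -> bytes:
    """HMAC-SHA-256 keyed with the IV, computed over the AEAD key."""
    return hmac.new(iv, key, hashlib.sha256).digest()


def check_commitment(key: bytes, iv: bytes, commitment: bytes) -> bool:
    """Recompute the commitment and compare in constant time."""
    return hmac.compare_digest(key_commitment(key, iv), commitment)


# Hypothetical usage: the sender ships the commitment next to the ciphertext,
# and every recipient checks it against the key they were given.
iv = os.urandom(12)
key_for_bob = os.urandom(32)
key_for_trent = os.urandom(32)  # a different key, as in the attack scenario
commitment = key_commitment(key_for_bob, iv)
```

With this in place, Alice handing Trent a different key than Bob would be caught by `check_commitment`, unless she can find two keys with colliding HMAC outputs.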

Constructing Salamanders

With the mitigation out of the way, onto the fun part: constructing the messages. In order to understand why and how these attacks work, we first have to talk about \mathbb{F}_{2^{128}} and the way AES-GCM and AES-GCM-SIV use this field to construct their respective tags. As a finite field, \mathbb{F}_{2^{128}} supports addition, multiplication, and division, following the usual field axioms. The field has characteristic 2, which means addition is just the xor operator, and subtraction is the exact same operation as addition. Multiplication and division are somewhat more complicated and not in scope for this article. It suffices to say that multiplication can be implemented with a very fast algorithm if the hardware supports certain instruction sets (carryless multiplication). The division algorithm uses the Euclidean algorithm and will take at most 256 multiplications in a naive implementation, so while slower than the other operations, it will still be extremely fast. I will use + for the addition operation and \cdot for the multiplication operation. The most important caveat is to not confuse these operations with integer arithmetic.
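To make the field operations concrete, here is a slow but simple sketch in Python. It makes two assumptions that are not from the text: field elements are plain integers where bit k is the coefficient of x^k (GCM stores bits in the opposite order, so these values are not wire-compatible with real GHASH), and inversion is done by raising to the power 2^{128}-2 rather than with the Euclidean algorithm, simply because it is shorter to write down.

```python
# Reduction polynomial x^128 + x^7 + x^2 + x + 1 (the polynomial GCM uses).
MODULUS = (1 << 128) | 0x87


def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) in characteristic 2 is xor."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Bit-by-bit carryless multiplication with reduction after each shift."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> 128:
            a ^= MODULUS
    return result


def gf_pow(a: int, exponent: int) -> int:
    """Square-and-multiply exponentiation."""
    result = 1
    while exponent:
        if exponent & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        exponent >>= 1
    return result


def gf_inv(a: int) -> int:
    """Inverse via a^(2^128 - 2); every nonzero a satisfies a^(2^128 - 1) = 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    return gf_pow(a, (1 << 128) - 2)
```

Note how `gf_add(a, a)` is always 0, which is exactly the "subtraction is addition" property used throughout the rest of the post.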


Next, on to AES-GCM. This is a relatively straightforward implementation of an AEAD that uses a UHF based MAC for authentication. Our IV is 12 bytes long, and we use a 4 byte counter and CTR mode to encrypt the message. The slightly odd feature is that we start the counter at 2, for reasons we will see later. For authentication, we first derive an authentication key H by encrypting the zero block (this is why we don’t start the counter at zero, otherwise the zero IV would be invalid). Now, taking the ciphertext blocks and the additional data blocks (both padded with zeros as needed for the last block), and adding a special length block containing the size of the additional data and the ciphertext, we get a collection of blocks, all of which I will refer to as C_i. To compute the tag, we now compute the polynomial

GHASH(H, C, T) = C_0\cdot H^{n+1} + C_1\cdot H^{n}+\dots + C_{n-1}\cdot H^2+C_n\cdot H+T

The constant term T is the encrypted counter block associated with the counter value 1 (which is why we started at 2 for the CTR mode). Remember that in characteristic 2, + is xor, so we could equivalently say that we compute the polynomial without the constant term and then encrypt it with CTR mode as the first block.
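The polynomial above can be evaluated with Horner's rule, one multiplication per block. The sketch below treats blocks, H, and T as integers in the simplified bit order (bit k is the coefficient of x^k), which is an assumption for readability and not GCM's actual encoding; `tag_polynomial` is a name made up for this example.

```python
MODULUS = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Carryless multiplication modulo the reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> 128:
            a ^= MODULUS
    return result


def tag_polynomial(h: int, blocks: list[int], t: int) -> int:
    """Compute C_0*H^(n+1) + ... + C_n*H + T using Horner's rule."""
    acc = 0
    for block in blocks:
        # After this step, every earlier block has been multiplied by one more H.
        acc = gf_mul(acc ^ block, h)
    return acc ^ t


# Toy values standing in for the AES outputs H and T and for the blocks C_i.
h = 0x66E94BD4EF8A2C3B884CFA59CA342B2E
tag = tag_polynomial(h, [1, 2, 3], 5)
```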

Now, how do we get two different plaintexts to agree on both ciphertext and tag? We first choose two keys and produce the corresponding keystreams, choosing the plaintexts so that the ciphertexts agree. (If you want two plaintexts that make sense, this part is the hardest step: you first brute force the first few bytes in order to be valid in one format and a comment opening statement in the other, so that you can switch which parts of the ciphertext will appear as valid plaintext and which parts appear as commented out.) We leave one ciphertext block open for now, as a sacrificial block that we will modify in order to make the tags turn out to be the same. Next, we derive the corresponding authentication keys H_1 and H_2 and our constant terms T_1 and T_2. This means we have all C_i fixed except for one specific index, say j, and can now solve

GHASH(H_1, C, T_1) = GHASH(H_2, C, T_2)

\sum_{i=0}^n C_i\cdot H_1^{n+1-i}+T_1=\sum_{i=0}^n C_i\cdot H_2^{n+1-i}+T_2

C_j\cdot\left(H_1^{n+1-j}+H_2^{n+1-j}\right)=\sum_{\substack{i=0\\i\neq j}}^n C_i\cdot \left(H_1^{n+1-i}+H_2^{n+1-i}\right)+T_1+T_2

C_j=\left(H_1^{n+1-j}+H_2^{n+1-j}\right)^{-1}\cdot\left(\sum_{\substack{i=0\\i\neq j}}^n C_i\cdot \left(H_1^{n+1-i}+H_2^{n+1-i}\right)+T_1+T_2\right)

by solving for the sacrificial block C_j.
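Here is a toy sketch of that last step. It does not run AES: H_1, H_2, T_1, T_2 and the blocks are arbitrary constants standing in for the real AES outputs and ciphertext blocks, and elements use the simplified bit order where bit k is the coefficient of x^k. The name `balance_tags` is invented for this example.

```python
MODULUS = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> 128:
            a ^= MODULUS
    return result


def gf_pow(a: int, exponent: int) -> int:
    result = 1
    while exponent:
        if exponent & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        exponent >>= 1
    return result


def gf_inv(a: int) -> int:
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    return gf_pow(a, (1 << 128) - 2)


def tag_polynomial(h: int, blocks: list[int], t: int) -> int:
    acc = 0
    for block in blocks:
        acc = gf_mul(acc ^ block, h)
    return acc ^ t


def balance_tags(blocks, j, h1, t1, h2, t2):
    """Return a copy of blocks with block j chosen so that both tags agree."""
    n = len(blocks) - 1
    zeroed = list(blocks)
    zeroed[j] = 0
    # Sum over i != j of C_i*(H_1^e + H_2^e), plus T_1 + T_2.
    rhs = tag_polynomial(h1, zeroed, t1) ^ tag_polynomial(h2, zeroed, t2)
    coeff = gf_pow(h1, n + 1 - j) ^ gf_pow(h2, n + 1 - j)
    zeroed[j] = gf_mul(gf_inv(coeff), rhs)
    return zeroed


# Stand-ins: two ciphertext blocks, one sacrificial block, one length block.
h1, t1 = 0x0123456789ABCDEF0123456789ABCDEF, 0x1111
h2, t2 = 0xFEDCBA9876543210FEDCBA9876543210, 0x2222
blocks = [0xAAAA, 0xBBBB, 0, 0x80]
forged = balance_tags(blocks, 2, h1, t1, h2, t2)
```

The example picks the second-to-last block as the sacrificial one, so the coefficient is H_1^2 + H_2^2 = (H_1 + H_2)^2, which is nonzero whenever the two authentication keys differ. For other positions the coefficient can in principle vanish, in which case `gf_inv` raises and a different j has to be used.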


So far so good, but what about AES-GCM-SIV? GCM is famous for having many weird properties that make it extremely fragile, like leaking the authentication key on a single IV reuse, or allowing for insecure tags smaller than 128 bits. In many ways, AES-GCM-SIV is what AES-GCM should look like for real world applications: it is much more robust against IV reuse, only revealing the damaging properties of a UHF with a reused IV if both IV and tag are the same. This is accomplished by using the tag as a synthetic IV, meaning the tag is computed over the plaintext and then used as the IV for CTR mode to encrypt. Even though this kind of SIV construction uses MAC-then-Encrypt, it is secure against the usual downsides, because CTR mode decryption always succeeds in constant time, independent of the plaintext. This means the receiver can decrypt the message and validate the tag without revealing information about the plaintext in case of an invalid tag. The library needs to take care that the plaintext is properly discarded and not exposed to the user in case the tag does not validate.

The actual IV for AES-GCM-SIV is used primarily to derive a per message key. This means that if the IVs of two messages are different, both encryption and authentication keys will be unrelated and cannot be used to infer things about each other.

All in all AES-GCM-SIV works like this:

  • H, K_E = \operatorname{KDF}(K, IV)
  • T=\operatorname{AES}(K_E, P_0\cdot H^{n+1}+\dots+P_n\cdot H)
  • C=\operatorname{AES-CTR}(K_E, IV=T)

where the plaintext blocks P_i again contain additional data and length, and some extra hardening and efficiency tricks have been stripped for clarity.

Our previous approach of first creating the ciphertext and then balancing things out to get the tags to agree clearly cannot work here anymore. The keystream, and therefore the ciphertext, depend on the tag, so if we want to have any chance of finding a salamander, we have to fix the tag before we do any calculation at all. So after having chosen T, we decrypt it under each of our keys to get the result of our polynomial S_i=\operatorname{AES}^{-1}(K_{E,i}, T). What we are left with is finding plaintexts P_1, P_2 such that

S_i=\sum_{j=0}^n P_{j, i} H_i^{n+1-j}

which gives us a system of two linear equations with 2n unknowns. But these aren’t all the constraints we need to satisfy, since we still need to encrypt these plaintexts once we have the tag balanced. Here, we are lucky that everything is over characteristic 2: the CTR encryption is just an addition of the plaintext and the encrypted counter block, C_i=\operatorname{AES}(K_E, CB_i)+P_i. To say that two plaintexts result in the same ciphertext under two different keys is just fulfilling the equation

\operatorname{AES}(K_{E, 1}, CB_{j, 1})+P_{j, 1}=\operatorname{AES}(K_{E, 2}, CB_{j, 2})+P_{j, 2}.

This, like our two equations for the tag, is a linear equation. So in the end, for a plaintext that has a size of n blocks, we get n+2 linear equations with 2n variables. This means that in almost all cases we can construct an invisible salamander by adding only two sacrificial blocks, with the same caveat that the two plaintexts need to be partially brute forced.
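The linear system can be sketched in a toy model. No AES is run here: the authentication keys, the decrypted tag values S_1 and S_2, and the keystream blocks are arbitrary constants standing in for the real AES outputs, the additional data and length blocks are left out (they would only add known constants to S_i), and elements use the simplified bit order where bit k is the coefficient of x^k. The ciphertext equations are used to eliminate the second plaintext, leaving a 2x2 system for the two sacrificial blocks, solved with Cramer's rule. All names are invented for this example.

```python
MODULUS = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> 128:
            a ^= MODULUS
    return result


def gf_pow(a: int, exponent: int) -> int:
    result = 1
    while exponent:
        if exponent & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        exponent >>= 1
    return result


def gf_inv(a: int) -> int:
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    return gf_pow(a, (1 << 128) - 2)


def poly(h: int, blocks: list[int]) -> int:
    """P_0*H^m + ... + P_(m-1)*H for m blocks, via Horner's rule."""
    acc = 0
    for block in blocks:
        acc = gf_mul(acc ^ block, h)
    return acc


def solve_salamander(p1, a, b, h1, s1, h2, s2, ks1, ks2):
    """Choose blocks a and b of p1 so both tag equations hold; return both plaintexts."""
    m = len(p1)
    # Equal ciphertexts force P_{j,2} = P_{j,1} + ks1[j] + ks2[j].
    delta = [x ^ y for x, y in zip(ks1, ks2)]
    base1 = list(p1)
    base1[a] = base1[b] = 0
    base2 = [p ^ d for p, d in zip(base1, delta)]
    r1 = s1 ^ poly(h1, base1)
    r2 = s2 ^ poly(h2, base2)
    a1, b1 = gf_pow(h1, m - a), gf_pow(h1, m - b)
    a2, b2 = gf_pow(h2, m - a), gf_pow(h2, m - b)
    # Cramer's rule; in characteristic 2 all signs disappear.
    det_inv = gf_inv(gf_mul(a1, b2) ^ gf_mul(a2, b1))
    out1 = list(base1)
    out1[a] = gf_mul(gf_mul(r1, b2) ^ gf_mul(r2, b1), det_inv)
    out1[b] = gf_mul(gf_mul(a1, r2) ^ gf_mul(a2, r1), det_inv)
    out2 = [p ^ d for p, d in zip(out1, delta)]
    return out1, out2


h1, s1 = 0x0123456789ABCDEF0123456789ABCDEF, 0x1357
h2, s2 = 0xFEDCBA9876543210FEDCBA9876543210, 0x2468
ks1 = [0x11, 0x22, 0x33, 0x44]  # stand-in keystream blocks under key 1
ks2 = [0x55, 0x66, 0x77, 0x88]  # stand-in keystream blocks under key 2
plain1, plain2 = solve_salamander([0xAAAA, 0xBBBB, 0, 0], 2, 3,
                                  h1, s1, h2, s2, ks1, ks2)
```

The last two blocks are used as sacrificial blocks here, which makes the determinant H_1 H_2 (H_1 + H_2), nonzero whenever both authentication keys are nonzero and distinct. In the real scheme the keystreams depend on the tag, which is why the tag has to be fixed before any of these values can be computed.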

Test Code

I’ve put this to the test and have written code to produce AES-GCM (Java) and AES-GCM-SIV (C++) salamanders.


Cartier Divisors

As an obvious first blog post, easily understandable and very relevant to cryptography (/s), here is a description of Cartier Divisors, because Thai asked for it on Twitter.

For this, first some history: A while ago, I taught a Google internal course about the mathematics of elliptic curves. It would probably make sense to start with that content, but I’m going to assume that I’ll come back to it and fix the order later.

Anyways, the objects we are looking at are Divisors and Principal Divisors. They come up when studying curves as a way to describe the zeroes and poles of functions. Over the projective line \mathbb{P}_K^1 (also known as just the base field plus a point at infinity), a rational function (the quotient of two polynomials) can have any selection of zeroes and poles it so pleases, with the only constraint being that there must be (with multiplicity) the same number of zeroes and poles. We can see that by looking at

\frac{(X-a_1)(X-a_2)\dots (X-a_n)}{(X-b_1)(X-b_2)\dots (X-b_n)}

for a function with zeroes at a_1, a_2, \dots, a_n and poles at b_1, b_2, \dots, b_n. If a_i or b_i is \infty, then we ignore the corresponding term, and get a zero/pole at infinity.

On more general curves, we do not have this amount of freedom. The lack of freedom we have in choosing zeroes and poles is tied surprisingly deeply to the curve in question, so describing it turns out to be very useful.

A Weil divisor is a formal sum of points of the curve, that is, we assign an integer to every point of the curve, with all but finitely many points getting assigned the integer zero. The degree of a divisor is the sum of all these integers. The divisor of a function, \operatorname{div}f, is the divisor we get by assigning to each point the order of the function at that point, i.e. setting it to 1 for simple zeroes, -1 for simple poles, and so on. If a divisor is the divisor of a function, we call it a principal divisor.
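As a small worked example of these definitions (the function is chosen purely for illustration), take f = X^2/(X-1) on the projective line. It has a double zero at 0 and a simple pole at 1, and since the numerator has degree one higher than the denominator, it also has a simple pole at infinity:

```latex
f = \frac{X^2}{X-1}, \qquad
\operatorname{div} f = 2\cdot[0] - [1] - [\infty], \qquad
\deg(\operatorname{div} f) = 2 - 1 - 1 = 0
```

The degree comes out as 0, matching the earlier observation that a rational function on the projective line has as many zeroes as poles when counted with multiplicity.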

With these definitions out of the way, we can get to Thai’s question. It turns out that the thing we are interested in is the group of divisors of degree 0 modulo the principal divisors. This group in some sense measures how restricted we are in our choice for placing zeroes and poles. It turns out that for elliptic curves, every divisor of degree 0 is equal to a divisor of the form P - O, with O being the point at infinity (or really any fixed (“marked”) point on the curve), up to a principal divisor (equal up to a principal divisor is also called linearly equivalent). So what Thai is asking is: while we can think of principal divisors as a description of rational functions, what are the other divisors? The simple answer is that they are just what we said, formal sums of points, i.e. some points with some integer weights. For elliptic curves, the classes of degree 0 divisors are conveniently in a 1:1 correspondence with the points of the curve itself, which is why we usually gloss over the whole divisor thing and just pretend to add points of the curve themselves. But this answer is kind of unsatisfying, and it does not generalize well to higher dimensions or to curves with singularities, so a better concept is needed.

Enter Cartier Divisors. In order to explain these, we’re technically going to need sheaves, but sheaves are a bit much, so I’ll try to handwave some things. The basic idea is, since we want to describe zeroes and poles, why don’t we just use zeroes and poles for that? Of course we can’t use a full function that is defined everywhere for that, since that would only give us the principal divisors. But locally, we can use a function to describe zeroes or poles. Now what does locally mean? In algebraic geometry, the topologies we’re using are kind of weird. Here, we are using the Zariski topology, which for curves just means that when we say locally, we mean the whole curve with a finite number of points removed. We use this to remove any zeroes or poles we don’t want for our divisor from our local representative.

All in all, that means a Cartier divisor on a curve C is a covering (U_i), i.e. a collection of open sets (the curve minus finitely many points) such that their union is the whole curve, and a rational function f_i per U_i, defined on U_i. This function’s zeroes and poles are what we understand as the divisor. Obviously, we now need to make sure all these partial functions work well as a whole. We do that by looking at U_i \cap U_j and the functions f_i and f_j restricted to that intersection. If we want this construction to define a consistent divisor, then f_i/f_j cannot have any zeroes or poles in U_i \cap U_j. We write this as

f_i/f_j \in \mathcal{O}^\times (U_i \cap U_j)\;\;.

This now describes a consistent global object with zeroes and poles as we want them, getting quite close to describing divisors in a completely different way! We just have one problem, there are way too many functions with a specific pattern of zeroes and poles on our local neighborhood U_i, we need to get rid of all the extra behavior that isn’t just zeroes and poles! To do that, we need to look at two functions f_i and g_i on U_i that have the same pattern of zeroes and poles. What happens when we take f_i/g_i? Well we, as above, get a function without zeroes or poles on U_i. So if we want to forget all that extra structure, we need to take f_i modulo the set of functions without zeroes or poles on U_i. And that’s it.

If we write \mathcal{M}^\times(U_i) for the rational functions that are not equal to zero (so the rational functions that have a multiplicative inverse) and write \mathcal{O}^\times (U_i) for the functions without zeroes or poles on U_i, we can now describe a Cartier divisor as a covering (U_i) together with an element f_i \in \mathcal{M}^\times(U_i)/\mathcal{O}^\times(U_i) for each U_i such that f_i/f_j\in\mathcal{O}^\times(U_i \cap U_j). A principal Cartier divisor is a Cartier divisor that can be described using just the entire curve C as the only element of the covering.
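A small worked example, chosen for illustration: the single point 0 on the projective line, i.e. the Weil divisor [0], written as a Cartier divisor. Two open sets suffice, one missing the point at infinity and one missing 0:

```latex
U_0 = \mathbb{P}^1_K \setminus \{\infty\},\; f_0 = X, \qquad
U_1 = \mathbb{P}^1_K \setminus \{0\},\; f_1 = 1, \qquad
f_0/f_1 = X \in \mathcal{O}^\times(U_0 \cap U_1)
```

On U_0 the function X has a simple zero at 0 and nothing else, on U_1 the constant 1 has no zeroes or poles, and on the overlap (the line minus 0 and infinity) the quotient X has neither, so the compatibility condition holds. This divisor has degree 1, so it cannot be principal: no single rational function on the whole line has just one simple zero and no pole.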

For extra bonus points (which I will not describe in detail here, because this blog post is already way too long and completely incomprehensible), we can look at what happens if we now take these Cartier divisors modulo principal Cartier divisors. It turns out, that the result can be described again with a covering U_i, but this time, instead of going through all that choosing of rational functions per set, we just use the intersections, and choose an element f_{ij}\in \mathcal{O}^\times (U_i \cap U_j), without even looking at rational functions in U_i at all, with some sheafy/cohomological rules for when two of those things are equal.