
Unbindable Kemmy Schmidt

If you have been terminally online on IETF mailing lists, you might have seen this thread, where, in extremely uncharacteristic fashion for the IETF, everybody simply agreed to use only the seed from key generation as ML-KEM’s private key, instead of the format defined by NIST, something the NIST standard allows but does not particularly encourage. But even if you’re not in the habit of voluntarily reading standardization mailing lists, the reason behind this is a fun piece of cryptography, a good excuse to explain the Fujisaki-Okamoto transform, and yet another example of my favorite topic: properties primitives never promised to have, but people like to assume anyway.

Misbinding

As the title suggests, this has something to do with binding properties, so first, let’s go into what those are. My favorite example here is invisible salamanders, which Soatok just wrote an excellent blog post about, so read that and come back here if you need a refresher. Or read it because you want an excuse to look at cute blue dhole pictures, also always a valid reason to read Soatok’s blog.

For our purposes, invisible salamanders can be described as follows: can Eve craft a ciphertext that both Alice and Bob decrypt successfully, but with a different resulting plaintext? Or, to introduce the vocabulary we use to express this a bit more succinctly: does the AEAD ciphertext bind the plaintext?

More generally, for a misbinding attack we have two honest parties performing a successful cryptographic operation on some input variables, creating some output variables. These operations are either the same operation, or at least related ones, so one party might encrypt while the other decrypts, but importantly both parties arrive at the same set of variables, even if the roles of input and output get switched. We say a variable X binds a variable Y if, knowing that Alice’s value for X equals Bob’s value for X, we can prove that Alice’s value for Y has to equal Bob’s value for Y, even if other variables differ. The limitation to successful cryptographic operations is there to rule out trivial attacks, since the whole point of these misbinding attacks is to confuse deputies, not to alert them of an ongoing attack.

Going back to the invisible salamander example with this theory, we can now neatly formalize it: the operation Alice and Bob are performing is AEAD-decrypt, which has two input variables, the key and the ciphertext, and one output variable, the plaintext. We are interested in whether the ciphertext binds the plaintext, which means that both Alice and Bob are given the same ciphertext, but potentially have different keys. In fact, since AEAD decryption is a deterministic algorithm, if our attacker wants to have any chance of accomplishing their task of producing different plaintexts, this third variable, the key, will indeed have to differ between our two honest participants. We can distinguish three different attacker models with respect to that key.

The first option is that the attacker does not know either key. We call such an attacker honest, and write HON-BIND-CT-PT (HONest Attacker, CipherText binds PlainText) for a primitive where the binding holds.

The second option is that the attacker is aware of the keys, but cannot control them, i.e. the keys were leaked to them. We write LEAK-BIND-CT-PT for an AEAD where no attacker can come up with a ciphertext that would decrypt to two different plaintexts, even if they know both keys. This is the invisible salamander scenario, so we could also say that AES-GCM is not LEAK-BIND-CT-PT. (Since cryptographers love consistency, this property is in reality called key commitment when talking about AEADs; otherwise it would be too easy.)

The third option, featuring the most powerful attacker, would be an attacker who not only knows the keys, but is allowed to choose them for Alice and Bob. In the AEAD scenario, the ability to choose the key is usually not all that interesting, since AEAD keys usually do not have much structure, but it will be crucial for the KEM misbinding attacks we actually want to discuss here.

With all that theory out of the way, we can now define the various misbinding terms that Cas and friends discuss in Keeping Up With The KEMs, the paper that started the whole tradition of naming things after TV show puns. The paper is a good read, so if you want more details, give it a try.

As a quick refresher, a KEM is a primitive consisting of three functions, and is essentially asymmetric encryption, but with the cryptographers choosing the message for you. KeyGen generates a private/public key pair, Encaps takes a public key and produces a shared secret and a ciphertext, and Decaps takes a private key and a ciphertext and recovers the shared secret.
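To make the three-function shape concrete, here is a hypothetical sketch of the classic Diffie-Hellman-style KEM in Python. All names are mine, and the group parameters are far too small and structured to be secure; this only illustrates the KeyGen/Encaps/Decaps interface and its correctness property.

```python
import hashlib
import secrets

# Toy group parameters: a Mersenne prime modulus and a small generator.
# Purely illustrative; do not use for anything real.
p = 2**127 - 1
g = 5

def keygen():
    # private key is a random exponent, public key is g^sk mod p
    sk = secrets.randbelow(p - 2) + 1
    return sk, pow(g, sk, p)

def encaps(pk):
    # the "message" is an ephemeral exponent the cryptographers chose for you
    r = secrets.randbelow(p - 2) + 1
    ct = pow(g, r, p)  # the ciphertext is the ephemeral public value
    k = hashlib.sha256(pow(pk, r, p).to_bytes(16, "big")).digest()
    return k, ct

def decaps(sk, ct):
    # recover the same shared secret from the ciphertext
    return hashlib.sha256(pow(ct, sk, p).to_bytes(16, "big")).digest()

sk, pk = keygen()
k, ct = encaps(pk)
assert decaps(sk, ct) == k  # correctness: both sides derive the same secret
```

This is essentially the ECIES-KEM shape we will meet again at the end of the post, just written multiplicatively.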

For misbinding, similarly to the AEAD example, we can have Alice and Bob perform the Encaps or Decaps function, collect the input and output variables: PK (public key), SK (private key), CT (ciphertext), and K (shared secret), and ask which of these bind which, under which attacker model. As one slight complication, in order to properly compare the keys, we need to introduce a helper function that maps a private key to its corresponding public key, which I’ll call PrivateToPublic. The existence of this function is guaranteed for honestly generated keys, though it is not obvious that it is well defined for malformed keys; in the case of ML-KEM, however, the public key is simply part of the expanded private key format of the NIST standard, so this is not a problem.

So what binding properties might one be interested in? Let’s get some trivial ones out of the way. Clearly the public key does not bind the ciphertext, since we can just encapsulate twice and get different ciphertexts. Somewhat more subtly, the ciphertext also does not bind either the shared secret or the public key, since the Kyber team went with implicit rejection: a decapsulation operation never fails, it just gives you some gibberish, so having the same ciphertext says nothing about the keys used or the resulting shared secret if those keys differ.

This leaves the question whether the shared secret binds the public key and/or the ciphertext, i.e. BIND-K-PK and BIND-K-CT, in our three attacker models: honest (knows only the public keys involved), leak (knows the private keys involved), and malicious (chooses the private keys involved).

The Fujisaki-Okamoto transform

The neat part about all the misbinding attacks we are going to discuss is that they do not rely on the lattice cryptography that underlies ML-KEM at all, just on a nifty transform used by pretty much all KEMs in the PQC competition, so everything I write here applies just as much to BIKE, HQC, Classic McEliece, etc. The attacks would even have applied to SIKE, were it not horribly broken along the way. All you need to look at to analyze the binding properties are the exact details of the variant of the Fujisaki-Okamoto transform used by the different schemes, without bothering with the actual hard problem they are based on.

The point of the Fujisaki-Okamoto transform is to take an asymmetric encryption scheme that is only secure if everyone behaves nicely (IND-CPA) and turn it into an encapsulation scheme where the attacker is allowed to adaptively choose ciphertexts, learning the decapsulation at each step, and still is not able to break things. Similar to the Fiat-Shamir transform for signatures, it takes one scheme that is much easier to construct and reason about, and transforms it into one that is much more useful, using the power of hash functions. We will also need a PRF, a pseudorandom function, also known as a keyed hash function, for the implicit rejection step.

So how does it work? As mentioned, we start with an asymmetric encryption scheme, i.e. three functions (CPA.KeyGen, CPA.Encrypt, CPA.Decrypt), and construct three functions (FO.KeyGen, FO.Encaps, FO.Decaps), based on those. For KeyGen, we don’t need to do much:

FO.KeyGen() {
  sk, pk = CPA.KeyGen()
  z = Rand(32)
  fo-sk = (sk, z)
  return fo-sk, pk
}

In other words, we just use the original KeyGen function, and only add some random numbers to the private key, which we will use later.

In Encaps, we can start to see the main trick of the transform, but it still looks fairly straightforward:

FO.Encaps(pk) {
  m = Rand(32)
  // H is a hash function that outputs
  // two 32 byte strings.
  k, r = H(m)
  // derandomized call of CPA.Encrypt
  ct = CPA.Encrypt(pk, m, r)
  return k, ct
}

Written out, we choose a random message, then use a hash function to turn this message into both our shared secret and some entropy, and then encrypt our original message using this entropy obtained from hashing the message itself. Note that you can stretch 32 bytes of randomness into more randomness as necessary for the encryption algorithm, using a deterministic random bit generator as needed.

With that, we now come to the main course, the decapsulation function, where we use all this setup:

FO.Decaps(sk, ct) {
  // decrypt as normal
  m = CPA.Decrypt(sk, ct)
  // hash the recovered message as in Encaps
  k, r = H(m)
  // Perform the same operation the
  // encapsulating party should have performed
  // (pk is recoverable from, or stored with, sk)
  expected_ct = CPA.Encrypt(pk, m, r)
  if (ct == expected_ct) {
    return k
  } else {
    // The encapsulating party was dishonest.
    // Instead of releasing the potentially
    // tainted shared secret to the caller,
    // return random gibberish depending
    // on the ciphertext
    return PRF(sk.z, ct) 
  }
}

The encapsulating party, if they were honest, has included their entire state in the ciphertext, since we derandomized the encryption call based on the message itself. So the decapsulating party can rerun the encapsulation in its entirety, and if they detect any foul play, they discard the shared secret they recovered and return something completely unrelated. This means all the adaptive chosen-ciphertext abilities of our attackers go nowhere. Sure, they can manipulate the ciphertext, and carefully craft something that would leak information, usually whether a closely related ciphertext decrypts to the same or a different shared secret, but any modification of the ciphertext just leads to a malformed ciphertext, which the decapsulating party then rejects.
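To see the transform run end to end, here is a self-contained Python sketch. The CPA “scheme” is a deliberately insecure stand-in that I made up (the public key literally equals the private key, and encryption is an XOR stream keyed by pk and r); it is only there because it is deterministic given r, which is all the Fujisaki-Okamoto mechanics need.

```python
import hashlib
import secrets

# Toy stand-in for the IND-CPA scheme. Wildly insecure: pk == sk, and
# "encryption" is an XOR stream. Illustration of the FO mechanics only.
def cpa_keygen():
    sk = secrets.token_bytes(32)
    return sk, sk

def cpa_encrypt(pk, m, r):
    # deterministic given r: ciphertext is r alongside an XOR stream
    pad = hashlib.shake_256(pk + r).digest(len(m))
    return r + bytes(a ^ b for a, b in zip(m, pad))

def cpa_decrypt(sk, ct):
    r, body = ct[:32], ct[32:]
    pad = hashlib.shake_256(sk + r).digest(len(body))
    return bytes(a ^ b for a, b in zip(body, pad))

def H(m):
    d = hashlib.sha3_512(m).digest()
    return d[:32], d[32:]          # (k, r)

def fo_keygen():
    sk, pk = cpa_keygen()
    z = secrets.token_bytes(32)    # rejection secret
    return (sk, z), pk

def fo_encaps(pk):
    m = secrets.token_bytes(32)
    k, r = H(m)
    return k, cpa_encrypt(pk, m, r)

def fo_decaps(fo_sk, ct):
    sk, z = fo_sk
    m = cpa_decrypt(sk, ct)
    k, r = H(m)
    # re-encrypt and compare; in our toy scheme pk == sk
    if cpa_encrypt(sk, m, r) == ct:
        return k
    # implicit rejection: gibberish bound to (z, ct)
    return hashlib.sha3_256(z + ct).digest()

fo_sk, pk = fo_keygen()
k, ct = fo_encaps(pk)
assert fo_decaps(fo_sk, ct) == k               # honest case round-trips
tampered = bytes([ct[0] ^ 1]) + ct[1:]
assert fo_decaps(fo_sk, tampered) != k         # any tampering hits rejection
```

Flipping even a single ciphertext bit changes the recovered message, so the re-encryption check fails and the caller only ever sees the PRF output.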

We should briefly discuss why the decapsulating party, after discovering the manipulated ciphertext, does not simply fail, and instead does this dance with the PRF. The reason is kind of strange: it just so happens that this construction is theoretically more secure, since it does not leak to the attacker the fact that we have detected the manipulation. In practice, however, this does not really matter, since you will usually want to use the output of a KEM in an AEAD, and the AEAD will very visibly fail if the key was not what was expected. So implicit rejection is a weird artifact of the specific incentives the KEM designers are under, not actually holistically the best choice.

Unbindable Kemmy Schmidt

Now, we can finally discuss what this is all about, and look into the misbinding attacks that Fujisaki-Okamoto transformed KEM schemes are and are not vulnerable to. We just need to correct a tiny oversimplification I made in the previous section, and look at the Fujisaki-Okamoto transform that is actually used for ML-KEM:

NIST’s definition of ML-KEM’s encapsulation method is almost the same as what I have written above, except that NIST takes the random message m as an input instead of sampling it directly, which does not matter, and that the hash function call computing the shared secret and the randomness for the CPA encryption scheme also includes a hash of the public key, which will matter a lot. At first glance, this sounds like good news for resistance to misbinding attacks: we include the public key in the transform, so if someone were to change the public key, they would mess with the ciphertext, and the Fujisaki-Okamoto transform would catch them red-handed. And indeed, that is true for honest or leaking attackers. The Fujisaki-Okamoto transform is incredibly robust and unintentionally also defended us against a whole class of misbinding attacks!

For the malicious attacker though, this very hash will become the Achilles heel of the implementation. And another peculiar incentive KEM designers were under is to blame for it. One way to judge a KEM is using microbenchmarks, i.e. doing lots of decapsulation calls in a row and checking how fast they are, independent of the larger context they are used in. And if you want your microbenchmarks to look better, you’ll want to avoid unnecessary repetitions of cacheable operations. But you also don’t want to scare off people by caching everything and having a ridiculously sized private key, so you trade off what you cache and what you compute on the fly. In a realistic scenario, you would have a wire format for storing your key and an in-memory format for your expanded key with all cacheable elements precomputed, but in the competition microbenchmark scenario, you’ll want to keep things simple if you can, and use the same format for both. Long story short, what the Kyber team decided to do was to cache the hash of the public key in the private key. The hash is only 32 bytes large, so it doesn’t increase the size of the private key all that much, but it avoids a full hash function call on the kilobyte-sized public key, so it makes an ideal thing to include in the private key. The private key, for your normal IND-CCA2 purposes, is considered authentic, so storing random cached values in there is not a security problem.

But for the misbinding property, we have this whole malicious attacker concept, who is allowed to choose the private key! And that’s exactly what they do.

So we have a ML-KEM private key, which, if honestly formed, contains the CPA private key, the public key, the hash of the public key h, and the rejection secret z. To attack, Kemmy prepares two private/public key pairs, sk1/pk1 and sk2/pk2. We want Alice to encapsulate, so we ignore sk1 and set pk1 aside for her. For Bob, who we want to decapsulate, we take all components of sk2, except for the public key hash h2, which we replace with the hash h1 of pk1 instead to obtain our manipulated sk2′. Kemmy then gives pk1 to Alice and lets her encapsulate a message, obtaining ct1 and the shared secret k. Since Kemmy knows the private key, she can recover the random message m that Alice used in the Fujisaki-Okamoto transform, and prepares a second ciphertext ct2 by almost following the rules:

Kemmy.Encaps(m, pk2, h1) {
  k, r = H(m || h1)
  ct = CPA.Encrypt(pk2, m, r)
  return k, ct
}

In other words, instead of using the hash of the actual public key when encapsulating, we hash the other public key instead. The ciphertext so obtained will differ from Alice’s ciphertext, since it was produced using pk2, so this attack will misbind both ciphertext and public key to our unsuspecting Bob. Speaking of whom, Bob’s decapsulation, using the manipulated private key sk2′, will go as follows:

FO.Decaps(sk2', ct) {
  // decrypt as normal, since we have
  // not manipulated the CPA private key
  m = CPA.Decrypt(sk2', ct)
  // Remember that sk2'.h equals to
  // the hash of pk1, so both Alice and Bob
  // arrive at the same k and r
  k, r = H(m || sk2'.h)
  // Same operation Kemmy did!
  expected_ct = CPA.Encrypt(pk2, m, r)
  if (ct == expected_ct) {
    // Since Bob repeated Kemmy's operations,
    // this case will be chosen
    return k
  } else {
    // Not executed
    return PRF(sk2'.z, ct) 
  }
}

So both Alice and Bob recovered the same shared secret, but they got there via completely different public keys and ciphertexts, because Kemmy messed with an internal cached value of the private key, showing that ML-KEM, when using expanded private keys, is neither MAL-BIND-K-CT nor MAL-BIND-K-PK.
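The whole attack can be run end to end in a few lines of Python. The CPA scheme is a deliberately insecure stand-in of my own (pk == sk, XOR-stream encryption), chosen only because it is deterministic given r; the point is that the attack never touches the CPA internals, just the cached public-key hash.

```python
import hashlib
import secrets

# Deliberately insecure CPA stand-in: pk == sk, XOR-stream encryption.
def cpa_keygen():
    sk = secrets.token_bytes(32)
    return sk, sk

def cpa_encrypt(pk, m, r):
    pad = hashlib.shake_256(pk + r).digest(len(m))
    return r + bytes(a ^ b for a, b in zip(m, pad))

def cpa_decrypt(sk, ct):
    r, body = ct[:32], ct[32:]
    pad = hashlib.shake_256(sk + r).digest(len(body))
    return bytes(a ^ b for a, b in zip(body, pad))

def H(data):
    d = hashlib.sha3_512(data).digest()
    return d[:32], d[32:]

# ML-KEM-style encaps: the public key hash is mixed into H
def encaps(pk, h):
    m = secrets.token_bytes(32)
    k, r = H(m + h)
    return k, cpa_encrypt(pk, m, r)

# Decaps trusts whatever pk hash is cached inside the private key
def decaps(cpa_sk, cached_h, z, ct):
    m = cpa_decrypt(cpa_sk, ct)
    k, r = H(m + cached_h)
    if cpa_encrypt(cpa_sk, m, r) == ct:   # pk == sk in the toy scheme
        return k
    return hashlib.sha3_256(z + ct).digest()

# Kemmy prepares two key pairs ...
sk1, pk1 = cpa_keygen()
sk2, pk2 = cpa_keygen()
h1 = hashlib.sha3_256(pk1).digest()

# ... Alice encapsulates honestly against pk1 ...
k_alice, ct1 = encaps(pk1, h1)

# ... Kemmy, knowing sk1, recovers Alice's m and re-encapsulates
# against pk2 while still mixing in h1 ...
m = cpa_decrypt(sk1, ct1)
_, r2 = H(m + h1)
ct2 = cpa_encrypt(pk2, m, r2)

# ... and Bob decapsulates with sk2 whose cached hash was swapped to h1
k_bob = decaps(sk2, h1, secrets.token_bytes(32), ct2)

assert k_alice == k_bob              # same shared secret,
assert pk1 != pk2 and ct1 != ct2     # different public keys and ciphertexts
```

Bob’s re-encryption check passes, because Kemmy performed exactly the operation Bob expects, just with the wrong public key hash baked into the private key.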

But Kemmy’s exploits don’t stop here. There is a second, very similar attack that uses the other Fujisaki-Okamoto-transform-related parameter of the private key, the rejection secret z. This attack is even stranger, and only works if both Alice and Bob are decapsulating. We also need to use the same ciphertext for both of them, so we only aim at breaking MAL-BIND-K-PK. They again run this attack by manipulating the private keys. This time, we take the honestly generated private keys sk1 and sk2 and replace their rejection secrets with a shared one, i.e. sk1.z = sk2.z. For the ciphertext, we are not going to put in any effort, and just use some random string. A random string will, with overwhelming probability, fail the Fujisaki-Okamoto transform’s re-encryption test and land in the rejection case. So the public keys corresponding to Alice and Bob’s private keys do not matter at all; we discard any variable we’ve calculated with them, after all. What does matter is the rejection secret z, which Kemmy replaced to be the same for both keys. Since z is the same, and since we have given both Alice and Bob the same ciphertext ct, both of them will compute the same rejection “shared” secret, which was supposed to be gibberish, shared with no one. But now, through our machinations, what was supposed to be a clever way to fail the decapsulation has actually become a shared value. So we have found another way to break MAL-BIND-K-PK.
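This second attack is even shorter to demonstrate, since only the rejection path matters. Again the CPA scheme below is an insecure stand-in I made up purely so the snippet runs; the attack itself only uses z and the ciphertext.

```python
import hashlib
import secrets

# Insecure CPA stand-in (pk == sk, XOR stream); illustration only.
def cpa_encrypt(pk, m, r):
    pad = hashlib.shake_256(pk + r).digest(len(m))
    return r + bytes(a ^ b for a, b in zip(m, pad))

def cpa_decrypt(sk, ct):
    r, body = ct[:32], ct[32:]
    pad = hashlib.shake_256(sk + r).digest(len(body))
    return bytes(a ^ b for a, b in zip(body, pad))

def H(data):
    d = hashlib.sha3_512(data).digest()
    return d[:32], d[32:]

def decaps(cpa_sk, z, ct):
    m = cpa_decrypt(cpa_sk, ct)
    k, r = H(m)
    if cpa_encrypt(cpa_sk, m, r) == ct:
        return k
    # implicit rejection: "gibberish" keyed only by z and ct
    return hashlib.sha3_256(z + ct).digest()

# Two unrelated CPA private keys, but Kemmy gives them the same z
sk1 = secrets.token_bytes(32)
sk2 = secrets.token_bytes(32)
z = secrets.token_bytes(32)

# A random ciphertext fails the re-encryption check with overwhelming
# probability, so both parties land in the rejection branch ...
ct = secrets.token_bytes(64)
k_alice = decaps(sk1, z, ct)
k_bob = decaps(sk2, z, ct)

# ... and because z and ct are shared, the "gibberish" is shared too
assert k_alice == k_bob
```

The rejection value depends only on z and the ciphertext, not on the key pair, which is exactly the handle Kemmy needs.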

Mitigations

Both attacks rely on manipulating the private key into a state that no honestly generated private key would ever be in. And it turns out there is a simple trick to rule out any and all manipulations of the private key: just use the seed that was used to generate the key pair as the private key. Whenever you need the private key, you call KeyGen on the seed and get the expanded private key from there. Somewhat unfortunately, the way the NIST standard is structured, the seed is (d, z), with z being the rejection secret of the second attack, so if we wanted to defend against the second attack as well, we would need to introduce another key derivation step that expands an even more fundamental seed g into both d and z, which, from what we can tell at this time, seems not to be FIPS compliant. But the second attack is also substantially weaker, requiring two decapsulating parties, so even just using (d, z) as the private key should mitigate the worst of things.
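The shape of the mitigation can be sketched in a few lines. The function name `expand_key`, the domain-separation label, and the toy derivation are all my own inventions, not ML-KEM’s actual KeyGen; the point is only that every cached field is recomputed from the seed on each use, so nothing in the expanded key can ever be inconsistent.

```python
import hashlib
import secrets

def expand_key(seed):
    # deterministically re-derive all key material from the stored seed
    d = hashlib.shake_256(b"toy-keygen" + seed).digest(64)
    cpa_sk, z = d[:32], d[32:]
    pk = cpa_sk                            # toy stand-in for PrivateToPublic
    h = hashlib.sha3_256(pk).digest()      # cached pk hash: recomputed, never stored
    return cpa_sk, pk, h, z

# The private key at rest is just the seed. Every decapsulation expands it
# fresh, so a cached hash or rejection secret can never be tampered with
# independently of the rest of the key.
seed = secrets.token_bytes(32)
assert expand_key(seed) == expand_key(seed)   # expansion is deterministic
```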

The best part is that we can prove, without ever using any of the internals of the CPA scheme, that any KEM that uses the Fujisaki-Okamoto transform and stores the private key as a seed has both the MAL-BIND-K-PK and MAL-BIND-K-CT properties, and a fortiori also LEAK-BIND-K-PK, LEAK-BIND-K-CT, HON-BIND-K-PK, and HON-BIND-K-CT, since those only allow for weaker attackers. In other words, Kemmy is the worst thing that can happen to a Fujisaki-Okamoto-transformed KEM, and the transform actually prevents most misbinding attacks; in particular, it always prevents the much more dangerous attacks that would work in the honest or leak settings.

Conclusions

You might have noticed that I performed a small trick when introducing the misbinding attacks: the motivating example, invisible salamanders, does not actually apply to KEMs, but to AEADs. We know that invisible salamanders can be quite powerful in real-world scenarios, but I completely omitted any motivating example where a malicious misbinding attack on a KEM actually matters. The reason for that is simply that I am not aware of any real-life protocol, or even just a hypothetical protocol, where a malicious misbinding attack on a public key encapsulation scheme actually matters. This is not to say that such a protocol does not exist, but we are talking about a scenario where the attacker not only knows the private key, but has intentionally manipulated it, and, in the case of the more powerful attack, has manipulated it in a way that makes it impossible for an honest party to encapsulate to it. These attacks may yet turn out to matter; it is just that the attacker model here is incredibly strong.

As another point of comparison, we should also look at classical KEMs, i.e. RSA-KEM and ECIES-KEM, and check which binding properties those KEMs have.

For RSA-KEM, which is just “take a random number between 0 and n, hash it for the shared secret and raise it to e mod n for the ciphertext”, we can find a misbinding attack even for an honest attacker: take a random number smaller than both moduli; the shared secret will then be the same, since it does not depend on the modulus, but the number, raised to the power e modulo the two different moduli, will produce different ciphertexts. So right off the bat, HON-BIND-K-CT and HON-BIND-K-PK are violated.
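This one is short enough to run directly. The toy moduli below are built from Mersenne primes, which makes them far too structured for real use; they are only here so the arithmetic is concrete.

```python
import hashlib
import secrets

# Two honestly generated toy RSA public keys (Mersenne-prime factors,
# purely illustrative, never use moduli like these in practice)
e = 65537
n1 = (2**31 - 1) * (2**61 - 1)
n2 = (2**89 - 1) * (2**107 - 1)

# RSA-KEM: pick x below the modulus, hash x for the shared secret,
# raise x to e for the ciphertext. Pick x below BOTH moduli:
x = secrets.randbelow(min(n1, n2) - 2) + 2
k1 = hashlib.sha256(x.to_bytes(32, "big")).digest()  # Alice's shared secret
k2 = hashlib.sha256(x.to_bytes(32, "big")).digest()  # Bob's shared secret
ct1 = pow(x, e, n1)
ct2 = pow(x, e, n2)

# Same shared secret under different public keys, with (almost surely)
# different ciphertexts: HON-BIND-K-PK and HON-BIND-K-CT both fail.
assert k1 == k2
assert ct1 != ct2
```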

For ECIES-KEM, the situation is a bit more complex. ECIES-KEM uses a static-ephemeral key exchange to create a KEM, i.e. the public key is a static ECDH public key, the ciphertext is an ephemerally created, random ECDH public key, and the shared secret between those two public keys is used to compute the shared secret of the KEM. In order to be IND-CCA2 secure, the shared secret needs to also depend on the ephemeral public key, so ECIES-KEM needs to, at the very least, hash the ECDH shared secret and the ephemeral public key together. The reason for that is that elliptic curve public keys are not uniquely encoded, and can in particular differ in the sign of the y coordinate without changing the shared secret of an ECDH key agreement. So if the shared secret did not depend on the ephemeral public key, an attacker could query the oracle for the ephemeral public key with the sign of the y coordinate switched, and would learn the shared secret of the challenge ciphertext. If we just want IND-CCA2 security though, it is not necessary to include the static public key in the hash that produces the shared secret, and we could theoretically omit it. However, if we do so, an honest attacker could flip the sign of the static public key, and produce an (extremely weak) HON-BIND-K-PK violation. If the version of ECIES-KEM used includes both static and ephemeral public keys in the key derivation function call for creating the shared secret, this problem goes away and ECIES-KEM becomes MAL-BIND-K-PK and MAL-BIND-K-CT. However, the fact that all classical KEMs exhibit some weakness when it comes to misbinding, and that those weaknesses show up in the far weaker attacker models, is probably a good sign that misbinding properties for KEMs are not the worst attack vector imaginable, and far weaker than invisible salamanders when it comes to real-life protocols.
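The y-coordinate sign flip can be seen in action on a toy curve. This is a classroom-sized short Weierstrass curve over F_17 (utterly insecure; all function names are mine), showing that negating a point does not change the x coordinate of the resulting ECDH shared point.

```python
# Toy curve y^2 = x^3 + 7 over F_17, generator (15, 13). Insecure,
# purely to illustrate the sign-flip malleability of EC points.
P = 17

def inv(a):
    return pow(a, P - 2, P)   # modular inverse via Fermat's little theorem

def ec_add(A, B):
    # None represents the point at infinity
    if A is None:
        return B
    if B is None:
        return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if A == B:
        lam = (3 * x1 * x1) * inv(2 * y1) % P
    else:
        lam = (y2 - y1) * inv(x2 - x1) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def ec_mul(k, A):
    # double-and-add scalar multiplication
    R = None
    while k:
        if k & 1:
            R = ec_add(R, A)
        A = ec_add(A, A)
        k >>= 1
    return R

G = (15, 13)
d = 3                       # Bob's static private key
r = 5                       # Alice's ephemeral scalar
eph = ec_mul(r, G)          # the KEM "ciphertext"
neg_eph = (eph[0], (-eph[1]) % P)   # same x, flipped y sign: a different point

# If the KDF hashed only the x coordinate of the ECDH result, both
# ciphertexts would yield the same shared secret:
assert ec_mul(d, eph)[0] == ec_mul(d, neg_eph)[0]
assert eph != neg_eph
```

Hashing the ephemeral public key into the KDF, as real ECIES-KEM variants do, closes exactly this gap.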

But then again, using the seed, especially in the case of ML-KEM, where key generation is incredibly cheap, is a very straightforward fix to the problem. By deciding to do so, we do not have to care whether or not misbinding attacks matter in practice, and as a neat side benefit we drastically reduce the size of the private key. It is really nice to see the IETF pretty unanimously agreeing to do so; it is always better to reduce the sharp edges in cryptographic libraries, after all, and one attack automatically mitigated is one less attack to worry about.
