
HashML-DSA considered harmful

I already ranted about this topic as a section of a previous blog post (at the very end), but the topic keeps coming up, so I am escalating to a full blog post, since obviously that will help with all these people who are wrong on the internet about standardization.

The Problem

Prehashing is a paradigm often used within the context of digital signature schemes. To understand where the problem is, let’s start with the normal definition of a signature scheme, as used by cryptographers, as a tuple of three functions (G, S, V), where \mathcal{K} is the set of private keys, \mathcal{P} the set of public keys, \mathcal{M} the message space, and \mathcal{S} the set of signatures, with the following mappings:

G\colon 0 \overset{R}{\rightarrow} \mathcal{K} \times \mathcal{P} (Key Generation)

S\colon \mathcal{K} \times \mathcal{M} \overset{R}{\rightarrow} \mathcal{S} (Signing algorithm)

V\colon \mathcal{P} \times \mathcal{M} \times \mathcal{S} \rightarrow \{\top, \bot\} (Verification algorithm)

In order to be secure, you’ll need some more stuff, like saying that a signature produced by S will verify when plugged into V with the same message and matching public key, and that an attacker without the private key cannot forge signatures, but that stuff is boring (not really boring) and Dan Boneh has already written it up in his book, so I’ll skip the details here.

As you can see, in the world of mathematics, where everything is perfect and wonderful, there are no hashes anywhere, so to understand what prehashing is about, we will unfortunately go a layer deeper, and pretend to implement these functions in made-up pseudo-code which happens to vaguely resemble C++.

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();  // G
Signature sign(PrivateKey sk, Message m);          // S
bool verify(PublicKey pk, Message m, Signature s); // V

While we now specify that messages and signatures are byte arrays, this API is still very much the same as the mathematical description of a signature algorithm, and no hash function is anywhere to be seen. But there is a subtle problem with this API, or at least something that can become a subtle problem: while PrivateKey, PublicKey, and Signature are all types that, for any given algorithm, are constant in the size they take in memory (or at least bounded, looking at Falcon), the message can be literally anything, from a few bytes to theoretically petabytes worth of data. This is especially a problem if you want to hold the private key in a separate device, such as an HSM or a Cloud service, and cannot easily transmit or process large quantities of data there.

We mainly care about signing here, since private keys are the ones that need protecting, so a first attempt could be to introduce a streamed version of sign():

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init(PrivateKey sk);
void sign_update(State* st, MessageChunk m);
Signature sign_finalize(State* st);

bool verify(PublicKey pk, Message m, Signature s);

There is still no hash to be found, and the state should probably be some object-oriented thing, but we can now solve the problem: To sign a large piece of data, we call sign_init() to create a new stream, call sign_update() as often as we need to, updating our state, and call sign_finalize() to get the signature. While this is no longer the exact same set of methods as the mathematical definition had, we can see the init/update/finalize pattern as a way to implement a streaming interface.
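As a minimal usage sketch (the chunk size and the file-reading loop are mine, purely for illustration), signing a large file with this streaming API could look like this:

Signature sign_file(PrivateKey sk, std::ifstream& file) {
  State st = sign_init(sk);
  std::vector<uint8_t> buf(64 * 1024);  // illustrative 64 KiB chunks
  while (file.read(reinterpret_cast<char*>(buf.data()), buf.size()) ||
         file.gcount() > 0) {
    // Feed exactly the bytes we got, including the final partial chunk.
    sign_update(&st, MessageChunk(buf.begin(), buf.begin() + file.gcount()));
    if (!file) break;  // last partial read processed, stop
  }
  return sign_finalize(&st);
}

Only the bounded State ever needs to live in memory, no matter how large the file is.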

And there still is no hash. And while this interface is nice and all, it means that our HSM and/or Cloud service now needs to manage state, which, while possible, is still a bit of a problem. But we can make one more, somewhat more radical change to our API, by delaying the point at which the private key gets involved:

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init();
void sign_update(State* st, MessageChunk m);
Signature sign_finalize(State* st, PrivateKey sk);

bool verify(PublicKey pk, Message m, Signature s);

Assuming our State object is again bounded in size, we can now do sign_init() and sign_update() directly where the data lives, and only have sign_finalize() happen on the HSM/Cloud service, with the State object itself being transmitted. Note that we still need to trust the caller of sign_init() and sign_update(), since the remote side can no longer tell what it is signing, but we only have to trust that caller to correctly compute the State, and we do not need to expose the private key there.
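To make that split concrete, here is a rough sketch of how the two halves could be deployed; State::serialize()/deserialize() and the hsm_ prefix are hypothetical helpers I am making up for illustration:

// Client side: runs where the data lives, never sees the private key.
std::vector<uint8_t> prepare_signing_request(const Message& m) {
  State st = sign_init();
  sign_update(&st, m);
  return st.serialize();  // small, bounded blob to send to the HSM
}

// HSM side: only ever receives the bounded State, not the full message.
Signature hsm_finish(const std::vector<uint8_t>& wire, PrivateKey sk) {
  State st = State::deserialize(wire);
  return sign_finalize(&st, sk);
}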

And if we squint a bit, we can now see the hash: With the addition of just one tiny function, we can make it more explicit:

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Hash;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init();
void sign_update(State* st, MessageChunk m);
Hash sign_finalize_local(State* st);
Signature sign_finalize_remote(Hash message_identifier, PrivateKey sk);

bool verify(PublicKey pk, Message m, Signature s);

Now we can nicely see that signing can be written as the composition of a hash function call, followed by the actual signature algorithm.
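In fact, a plain one-shot sign() matching the original API falls right out of this decomposition, just by gluing the pieces together:

Signature sign(PrivateKey sk, Message m) {
  State st = sign_init();
  sign_update(&st, m);
  Hash message_identifier = sign_finalize_local(&st);
  return sign_finalize_remote(message_identifier, sk);
}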

So as long as we just keep our interface like this, we can use HSMs and Cloud services and all that good remote oracle stuff, without running into pesky data transmission problems sending over half a petabyte.

The Problem with the Problem

This, however, does not readily work with every signature scheme. As stated, the signature scheme must decompose into a hash function call and a call to sign the output of that hash function, and not all signature schemes do. In fact, of the two signature schemes NIST just standardized, exactly zero decompose like that, in part because NIST actively encouraged having a property called “non-resignability”, which guarantees that the message identifier by itself, at least for messages with enough entropy to not be brute-forceable, is not enough to create a signature that verifies under a different, possibly attacker-controlled key.

But people need the ability to use HSMs and remote oracles, and pushed for NIST to add it to the schemes. So instead of standardizing two schemes, NIST actually standardized four: ML-DSA, SLH-DSA, HashML-DSA, and HashSLH-DSA, with the latter two being absent from the draft standard. And those four standards are mutually exclusive, have different properties, and are not very well described as separate algorithms. I will go more into HashML-DSA (and HashSLH-DSA) in the last section of this post (which, just as a teaser, is called “The Bad Place”), but, in the hopefully-not-as-futile-as-it-feels wish to prevent us from going there, I will present what are, in my opinion, the far better alternatives first.

The Solution (Part 1)

When introducing the whole problem, I somewhat intentionally named only one variable, the message_identifier, and presented the steps leading up to the requirement of decomposing the signature scheme into a hash function and the signing of the message identifier.

The main trick to get to this decomposition was moving the private key from init to finalize. Obviously, any one-roundtrip protocol will need the private key to be moved to finalize, since the signature cannot be computed before all message data has been accumulated, forcing finalize to be the remote function call that uses the secret key as a side input.

But do we have to make sign_init() a method that is completely independent of the rest of the signature scheme? After all, we do have a very conveniently public parameter that we could feed to the signing procedure from the very start:

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Hash;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init(PublicKey pk);
void sign_update(State* st, MessageChunk m);
Hash sign_finalize_local(State* st);
Signature sign_finalize_remote(Hash message_identifier, PrivateKey sk);

bool verify(PublicKey pk, Message m, Signature s);

In other words, the message identifier is no longer independent of the key, but it depends only on the public key, not the private key, so the private key can stay remote, while the resulting message identifier is still bound to our key. A very convenient property if, say, you wanted to make it impossible to create a signature for a different public key using only the message identifier, for messages with sufficient entropy.

And because this is such a convenient way to achieve that property, it turns out that ML-DSA, when it was still a little contestant going by Dilithium and trying to impress NIST, did exactly this.

And NIST thought that this was a good idea as well, so they left the following comment in Algorithm 7, line 6 of ML-DSA:

message representative that may optionally be computed in a different cryptographic module

And if you look at the rest of ML-DSA, things indeed only depend on this message identifier µ from there on out. Notice also that tr is a hash of the public key, meaning that it is possible to reorder the program in such a way that µ is computed without any knowledge of the private key at all, in one “cryptographic module” (NIST speak, here covering a software library like BoringSSL), and then passed to the HSM/remote oracle, which forms a different “cryptographic module”.

In other words, and to be very explicit, here is how you could implement the above API for ML-DSA:

typedef std::vector<uint8_t> Message;
typedef std::vector<uint8_t> MessageChunk;
typedef std::vector<uint8_t> Hash;
typedef std::vector<uint8_t> Signature;

std::tuple<PrivateKey, PublicKey> generate_key();

State sign_init(PublicKey pk) {
  State st = SHAKE256_init();
  // Absorb tr, the 64-byte SHAKE256 hash of the serialized public key.
  SHAKE256_absorb(&st, SHAKE256_oneshot(pk.serialize(), 64));
  // Absorb two zero bytes (there are two of them for a reason, see below).
  SHAKE256_absorb(&st, "\0\0");
  return st;
}

void sign_update(State* st, MessageChunk m) {
  SHAKE256_absorb(st, m);
}

Hash sign_finalize_local(State* st) {
  return SHAKE256_squeeze(st, 64);
}

Signature sign_finalize_remote(Hash message_identifier, PrivateKey sk);

bool verify(PublicKey pk, Message m, Signature s);

Or more explicitly: You can prehash ML-DSA with the hash function SHAKE256(SHAKE256(pk, 64) || 0x00 || 0x00 || m), where pk is the serialized public key, and m is the message. Note that this hash function is for the case of using an empty context; I leave figuring out the hash functions with context to my capable reader (hint: there are two 0x00s here for a reason).
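And to spell out the operational payoff of this implementation: only the 64-byte µ ever has to cross the wire to the HSM/Cloud service, no matter how large the message is. A sketch, reusing the functions above:

// Runs where the data lives; needs only the public key.
Hash compute_message_identifier(PublicKey pk, const Message& m) {
  State st = sign_init(pk);
  sign_update(&st, m);
  return sign_finalize_local(&st);  // 64 bytes, ready to ship remotely
}

// Meanwhile, on the HSM/remote oracle, with sk never leaving:
//   Signature s = sign_finalize_remote(message_identifier, sk);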

The Solution (Part 2)

I want to be very clear: The first part of the solution solves the situation for ML-DSA. Pretty much completely. In my opinion, part 1 of the solution should be the preferred way of solving this problem whenever we encounter it in its pure form. It allows the signing infrastructure to make choices about the locality of data and private key without having to bother the verifiers with it, allowing someone to change those choices later without having to update every verifier in existence. It also gives us this nice non-resignability bonus property, for whatever that is worth.

There are a few other things to tie up, though. First, this does not work for SLH-DSA, which is a multi-pass signature scheme and has to hash the input data multiple times, so even the first API refinement would have led to an unbounded state object. Second, you might want to have resignability, i.e. you might want to produce signatures for the same piece of data with multiple public keys at once, even if that makes some academics sad. And third, you might want to sign the actual petabytes I promised above, and find yourself in need of a tad more parallelism than a single output stream can provide.

Thankfully, all three scenarios actually have the same solution: You need a protocol. By protocol I mean any description of the larger context your signature scheme is used in, and not necessarily an interactive protocol of two or more online parties. And the best part is: you already have a protocol, since a signature scheme, by itself, is pretty useless, and needs some larger framework that defines what data in what format is signed by whom, how public keys are distributed and trusted, and so on. You never deploy a signature scheme by itself; it is always part of a larger system. And with just a few tweaks to that larger system, you can make all the rest of your prehashing woes go away.

For example, when defining what data is signed by your signature algorithm, you could decide to simply sign the hash of some data, instead of the data itself. You could even sign the root of a Merkle tree if, say, you wanted to sign a petabyte of data using massively parallel data stores. You could sign the hash of your large data blob together with some unhashed metadata, which would give the remote side some auditing capabilities. The possibilities are endless. While I often blog about the rather minute details of primitives, it is actually this endlessness of possibilities in the protocol space of cryptographic systems that is by far the largest part of my day job.

Using the protocol to define a system that does not have to transport immense amounts of data over constrained networks or to constrained devices is something that you will have to do in many instances of this problem, and in the simplest case this can be as easy as writing “The signer signs a hash of the data provided”. Yes, this means the data is hashed twice, which is a type of inefficiency, but these kinds of tradeoffs are commonplace in protocol design.
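In code, that simplest case is nothing more than a thin, protocol-defined wrapper around the plain scheme; SHA3_256() here stands in for whatever hash your protocol picks:

// Protocol-level prehashing: the protocol declares that what gets
// signed is the hash of the data, using the unmodified scheme underneath.
Signature sign_data(PrivateKey sk, const Message& data) {
  return sign(sk, SHA3_256(data));
}

bool verify_data(PublicKey pk, const Message& data, Signature s) {
  // The verifier recomputes the same protocol-defined hash.
  return verify(pk, SHA3_256(data), s);
}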

The Bad Place

This leaves us with the bad place, HashML-DSA and HashSLH-DSA. Those two signature schemes are, in my opinion, what happens when protocol questions get pushed into the primitive layer, leaving a situation that is either ill-defined or unnecessary. Note that even when pushing this protocol question into the primitive layer, you do not actually absolve yourself of the need to have a protocol defining the larger deployment of a signature scheme; you merely make the protocol more complicated to analyze, as the primitive is now diverging from the neatly defined building block it is supposed to be.

The way HashML-DSA and HashSLH-DSA are defined, there are two different ways of signing the hash of some data. Option 1 is to just sign this hash, using the method described in the previous paragraph, by declaring the hash to be the actual data. Option 2 is to use HashML-DSA, and invoke a special version of ML-DSA that flips a bit and adds an OID.

The verifier will need to know which option you used, since, unlike with the solution (part 1), the resulting signatures will differ. So in reality, both scenarios are actually protocol decisions, and nothing has been gained by the use of HashML-DSA.
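To make the incompatibility explicit, here is a sketch of the verifier’s dilemma; hash_ml_dsa_verify() and HashAlg are hypothetical stand-ins for the HashML-DSA verification routine and its hash parameter:

// The two options produce mutually incompatible signatures, so the
// verifier has to learn, one way or another, which one the signer used.
bool verify_hashed(PublicKey pk, Message data, Signature s, bool used_hash_ml_dsa) {
  if (used_hash_ml_dsa) {
    // Option 2: the HashML-DSA variant with its flipped bit and OID.
    return hash_ml_dsa_verify(pk, data, s, HashAlg::SHA3_256);
  }
  // Option 1: plain ML-DSA over the hash, declared at the protocol layer.
  return verify(pk, SHA3_256(data), s);
}

Either way, the choice has to be communicated, which is exactly a protocol decision.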

But since HashML-DSA/HashSLH-DSA are no longer actual signature schemes as defined by the mathematical definition above, but require another parameter, the hash used, as a side input, you now have the problem of how to communicate this side input to the verifier, with two basic options. Either you make it part of the public key, or equivalently, the verification logic, in which case it is simply superfluous, since your verification logic could have just said to first hash the data and then verify the signature on the hash. Or you put the choice of hash with the data, and have it as an untrusted side input.

The latter is the worst-case scenario, because a signature scheme is not guaranteed to be secure if the payload describes how to verify itself. This is the problem with JWT, the problem behind countless security vulnerabilities in existence, and an extremely common mistake to make. The mathematical proofs used to show that a signature scheme does not permit forgeries simply do not apply when the signature scheme is not fully defined. In the case of HashML-DSA, this, as far as I can tell, seems to mostly be a theoretical problem, as the hash OID should prevent forgeries even if a weak hash function has been used in the past, but the mere fact that it encourages sending information necessary to define the verification algorithm through an untrusted channel is a net negative, given how the whole thing is completely unnecessary in the first place. We would be best off ignoring the existence of the HashML-DSA/HashSLH-DSA variants completely, and instead operating on well-defined signature schemes at all times.

Side note: A similar thing can be said about the context string; it too breaks the mathematical framework used and cuts through the abstraction layers of what a signature scheme does, but it is pretty easy to ignore or use correctly, so my visceral hatred for the HashML-DSA/HashSLH-DSA variants does not extend quite the same way here. In the end, both boil down to something Cas Cremers said, somewhat jokingly, about the misbinding issues with KEMs: “There should have been a precompetition, where the properties of the candidates for the competition are being set”. We should have clearly defined the APIs as used in practice from the beginning, including being clear about serialization (which causes the misbinding issues), implicit rejection (which causes even more misbinding issues), and the prehashing capabilities of signature schemes. That would have avoided the last-minute introduction of HashML-DSA, likely led to the use of explicit rejection in KEMs, and removed the need for papers like Unbindable Kemmy Schmidt to be written months before the standards were finalized.
