Mitigating Hertzbleed, general timing protection
The emergence of the Hertzbleed attack has underlined how difficult it is to prevent side channel data leaks. Constant time cryptographic algorithms have been standard for a while now, but the real world is invariably messier than an idealised computer model. Even in the absence of boosting CPU cores, there is always technically some data leak through power consumption and other measurable parameters, which we generally assume are too noisy to be of any real concern.
It is however impossible to be certain that there will be no new revelation bringing timing or timing-equivalent attacks back in action. This has lead me to consider the possibility of mitigating timing attacks without relying on constant time code execution.
An important aspect of timing attacks is that they rarely leak a lot of data at once. In typical scenarios each invocation of a cryptographic algorithm will only leak a fraction of a bit of key material. It is only with systematic repeated invocation that an attacker can gather enough info to succeed. So what would happen if we changed the key on each invocation? An attacker could still obtain key information, but not much for each key, so if we can ensure that the key generation method is not itself leaking a master key, and that the fragments an attacker might gather cannot be used to guess the state of the key generator, it should be safe. Most simple CSPRNGs should suffice for the key generation job, as they are usually built with no permanently retained state, and knowing part of the output does not allow an attacker to guess more output.
So for a symmetric cipher, a working construction is seeding a non-state-retaining CSPRNG with a master key, and then use the output as successive keys to encrypt the message blocks. This model has a lot of room for variations. A CSPRNG is built for all of its output to "leak", but a real attacker will have only a small fraction of that data, so there is definitely some room for using faster RNG with a lower leak tolerance. Alternatively keeping the full CSPRNG strength for building a stream cipher is also an option. The key generation could also be integrated into the encryption/decryption step, with each block generating the key for the next.
Ciphers with this kind of linear reliance have generally been shunned due to difficulty of performance scaling, as each block relies at least partially on the previous, it may be impossible to take full advantage of the parallel processing capabilities of a modern CPU. I have found two methods to alleviate this. First of all, design the algorithms to be able to do a lot of work in parallel on a single block, this may require blocks that are somewhat larger than those of current algorithms. Second, any cipher can easily be rearranged to work as multiple instances operating on the same data stream, sliced to fit in vector registers. For instance, an algorithm that works natively with 32 bit numbers could be run in 16 parallel instances, each dealing with 32 bits out of each block of 512 bits of data.
What about asymmetric cryptography? We obviously can't just change a centrally signed TLS key on every use, but we do have other tricks available. First of all, instead of using the master key directly for handshakes, the master key can be used for signing a bunch of locally generated keys, each being used for just a single handshake. Timing attacks on the single use keys are prevented, much like the symmetric keys, by an attacker not getting nearly enough invocations to recover a meaningful amount of each key. But what about the master key? Isn't this just kicking the can down the road? Well, first of all we have prevented an attacker from performing attacks using purposefully crafted keys for the handshake, the master key will not interact with those keys. This is significant as practical timing attacks usually rely on purpose-crafted data that exacerbate the timing differences when interacting with the key. It is however no guarantee against leaks, as the combination of a single use public key and accurate timing of how long it took to sign it, may still reveal a fraction of the master key.
A property of these intermediate single use keys is that they can be generated and signed in advance of their use. Thus instead of letting TLS termination threads generate their own keys ad hoc, a dedicated thread can make large batches of keys. This immediately makes it very hard for a remote attacker to gather any timing information on the signing process, as it is no longer part of the response delay. Furthermore, the batch processing obscures the data to any adversary that can time the batch process, as it is hard to pick out which keys took longer than others.
If the overhead of generating and signing keys is considered too large, reusing each intermediate key a few times instead of once may be a possibility, a timing attack on the intermediate keys should still be very much impossible.
I wouldn't suggest to stop making cryptographic algorithms nominally constant time. There is still a small sliver of a theoretical attack against the batch signing process that scales with any non-constant-timeness. But mostly there is probably not much performance to gain from discarding this principle, as the concessions of constant time code tend to align with those required to utilize vector instructions.
Finally a reminder that there are other side channel attacks, for instance exploiting speculative execution, that this procedure does not deal with.
Daniel J. Bernstein's page on timing attacks, which inspired me to think about alternatives to fixing all the gritty details of constant time computing.