Hash levels

Note: This categorisation system is a work in progress. Please do not hesitate to contact me if you have suggestions or find flaws in the following text.

Until now hash functions have generally been categorised as either cryptographic or non-cryptographic. However, there are several different hash function capabilities that are useful for different purposes. I here try to categorise the most useful capability sets. A complete description consists of a level, which describes what types of capabilities a hash implements, an output size, and possibly a key size. The output size and key size describe how hard it is for an attacker to break the capabilities, either through luck or computation.

Level 1 (n bits of output)

A level 1 hash should, given two different inputs, produce a collision with a probability of 1 in 2ⁿ. This must hold true even if the inputs differ only by minor differences, such as inserted, deleted, altered or moved sections. It may be possible, with relatively little effort, to intentionally design input that collides. In order to avoid signalling level 2 capabilities, a level 1 hash should not accept a key, seed, nonce or similar.

Level 2 (n bits of output) (k bits of key)

A level 2 hash must take a key as part of the input. In addition to level 1 capabilities a level 2 hash must be designed so that an attacker, given no knowledge of the key or any output from the hash, cannot produce a c-way multicollision with a probability greater than 1 in 2^{min(k,n (c-1))}.

Level 3 (n bits of output) (k bits of key)

In addition to level 2 capabilities, a level 3 hash must be designed so that an attacker with access to compute hashes for arbitrary input, without knowing the key, must not be able to recover the key, or any derivative thereof that is useful for computing hashes, with probability 1 in 2^p, using less computations than the equivalent of computing 2^k-p hashes. Producing a collision with certainty must be no easier than recovering the key. Generating a c-way multicollision with probability 1 in 2^q must not be possible for q<min(k,n (c-1)). It must not be possible for an attacker to generate a valid input/hash value pair, without obtaining the key, with probability greater than 1 in 2ⁿ.

Level 4 (n bits of output)

In addition to level 1 capabilities, there must be no faster way of generating a collision than brute force search. Knowing the hash value of an unknown string must not improve the probability of guessing the hash value of a related unknown string, except for the case where an attacker may use the hash value to verify a guess on the contents of the original string. The hash may take a key as part of the input, if it does, it must also have all capabilities of a level 3 hash.

Level 5 (n bits of output)

In addition to level 4 capabilities, there must be no faster way of generating a multicollision than brute force search.

For all hash levels it should be possible to drop part of the output and maintain all capabilities for the new output size.

Hash functions may claim lower capabilities than their implemented output and key size suggest. Authors of such functions should generally advertise output and key sizes according to capabilities.

A level 1 hash is useful for cases where there are absolutely no adversaries. Level 1 is what is usually referred to as a non-cryptographic hash.

Level 2 hashes are useful as internal hash functions for hash maps and similar data structures, where an adversary could use collisions in a level 1 hash to cause a denial of service. Beware that while a practical attack on a level 2 hash seems unlikely, timing attacks could reveal a small amount of information about hash values, thus technically breaking the premise of an attacker having no knowledge of hash values.

Level 3 hashes can also be used for hash maps, and suffer no theoretical deficiency for this job. They can also be used for generating message authentication codes, and are therefore usually called MAC functions.

Level 4 and level 5 functions can be used for identifying and verifying the integrity of a piece of data, without the need for a secret key. Level 4 functions provide only n/2 bits of resistance against preimage attacks, where level 5 provide n bits. Level 5 is what is usually called a cryptographic hash.