Stream ciphers form the basis for simpler encryption and decryption algorithms than traditional block ciphers like AES. In particular, the ChaCha family of stream ciphers form the basis of the encryption functionality in O(1) Cryptography which we’ll explore in more detail. Originally published as the Salsa family (PDF) of ciphers, ChaCha (PDF) makes some small modifications to Salsa for increased security while maintaining equivalent performance. ChaCha has since been widely standardized in various networking standards and programming language standard cryptography libraries as an alternative to AES and other ciphers. Recall that a stream cipher is an algorithm that takes a secret key and an input stream which returns an output stream of the same size of the input stream. Stream ciphers work by taking the secret key (and usually some sort of nonce or initial value which cannot be reused for the same key) and generating a stream of deterministic random bits called the keystream which is used for multiple purposes. The primary use of this keystream is to xor it with an input stream of plaintext or ciphertext to produce an output ciphertext or plaintext respectively. Given this mode of operation, compared to block ciphers which require complicated key scheduling algorithms, it can be hard to imagine why block ciphers have been so popular historically speaking. Surely a stream cipher isn’t that simple, is it?

Overview

ChaCha and Salsa are stream ciphers that expand a 256-bit key into 264 randomly accessible streams of 264 randomly accessible 64-byte (512-bit) blocks. They are parameterized by a round number suffix, recommended at 20 (as in ChaCha20 or Salsa20), but also available in 8 and 12 round variants with reduced security margin. This round number controls the number of times the round function is applied to the cipher’s internal state and must be an even number. Each round applies a sequence of constant-time operations on an array of 16 32-bit words consisting of four addition, xor, and constant-distance left shift and rotate operations each. The choice for these operations relies on the mechanical sympathy of how CPUs physically implement addition, xor, and shift/rotate instructions, all of which are both fast and operate in constant time regardless of input. This set of operations is also functionally complete, so despite seeming simple, they can simulate any other Boolean expression or logic gate which makes them sufficiently powerful.

Internal State

The internal state of a ChaCha cipher consists of a 512-bit block of data addressed as 16 little endian 32-bit unsigned integers. Keeping the entirety of the cipher state inside this buffer along with the operations performed on it allows CPUs to keep the state in its cache which helps maximize software performance while simultaneously preventing timing attacks based on memory access patterns common to optimized software implementations of AES. This state is initialized with a secret key, a constant value, and some input data formed from a nonce and initial counter integer. The constant value is what is known as a nothing up my sleeve number and is needed as part of the standard initialization of the cipher state, and in ChaCha, this constant is the 16 byte ASCII-encoded string “expand 32-byte k” which fills the first four little endian integer values of this state. A “nothing up my sleeve number” is an arbitrarily chosen initialization constant value that is used in a cipher where one is needed such that it’s clear that the choice of constant was arbitrary and is therefore unlikely to have intentional backdoors, a problem faced by the secrecy behind parameter choices in DES recommended by the NSA in the 1990s. Such numbers are usually encodings of mathematical constants or ASCII strings. The next eight integers consist of the 32-byte secret key interpreted as an array of eight little endian 32-bit integers key0, …, key7. Finally, the last four integers consist of the initial counter and nonce encoded similarly. This last aspect has three main variants on how to encode the input data: a 64-bit initial counter with a 64-bit nonce; a 32-bit initial counter with a 96-bit nonce; or a 128-bit nonce. These correspond to the original ChaCha cipher, the IETF standardized variant, and HChaCha respectively, the latter being used in the XChaCha variant of ChaCha which uses an extended nonce. Laying this out in a matrix, the internal state looks like this:

0x61707865 0x3320646E 0x79622D32 0x6B206574
key0 key1 key2 key3
key4 key5 key6 key7
input0 input1 input2 input3

These bottom input values correspond to attacker-controlled input which are positioned to reduce the flexibility attackers have in cryptanalysis of this cipher compared to Salsa which organizes the initial state in a different configuration with attacker-controlled input values in cells 6 through 9. This state is updated by a sequence of invertible operations defined by applying a round function the round number of times.

Round Function

ChaCha defines its round function using a smaller sub-operation known as the quarter-round which is applied four times per round using a specified permutation. This quarter-round is responsible for the entirety of the underlying bit-shuffling taking place to produce a keystream and is defined using the following algorithm.

void quarterRound(int[] state, int a, int b, int c, int d) {
    state[a] += state[b];
    state[d] = Integer.rotateLeft(state[d] ^ state[a], 16);

    state[c] += state[d];
    state[b] = Integer.rotateLeft(state[b] ^ state[c], 12);

    state[a] += state[b];
    state[d] = Integer.rotateLeft(state[d] ^ state[a], 8);

    state[c] += state[d];
    state[b] = Integer.rotateLeft(state[b] ^ state[c], 7);
}

Finally, a round consists of a column round or a diagonal round in alternating sequence. A column round consists of a quarter-round applied to each of the four columns. A diagonal round consists of a quarter-round applied to four diagonal permutations of the state.

void columnRound(int[] state) {
    quarterRound(state, 0, 4,  8, 12);
    quarterRound(state, 1, 5,  9, 13);
    quarterRound(state, 2, 6, 10, 14);
    quarterRound(state, 3, 7, 11, 15);
}

void diagonalRound(int[] state) {
    quarterRound(state, 0, 5, 10, 15);
    quarterRound(state, 1, 6, 11, 12);
    quarterRound(state, 2, 7,  8, 13);
    quarterRound(state, 3, 4,  9, 14);
}

ChaCha20 consists of 20 rounds, and this block shows the application of two subsequent rounds. Therefore, a complete application of ChaCha20 will apply the above two rounds 10 times to produce a 64-byte block in the keystream.

void permute(int[] state) {
    for (int i = 0; i < 10; i++) {
        columnRound(state);
        diagonalRound(state);
    }
}

At the end of each key block, we interpret cells 12 and 13 as a little endian 64-bit counter which is incremented in place before generating the next 64-byte keystream block.

void incrementCounter(int[] state) {
    if (++state[12] == 0) {
        ++state[13];
    }
}

Combined with functions to decode and encode bytes to and from arrays of integers, data streams and keystreams can be fairly easily combined. These ChaCha functions are also used for deriving subkeys in the case of ChaCha20-Poly1305 authenticated encryption and for deriving subkeys and sub-nonce data for implementing XChaCha20-Poly1305. Using the keystream output from ChaCha can be used to implement a deterministic random bit generator. When combined with a message authentication function like Poly1305, ChaCha can form the basis for an authenticated encryption with authenticated data algorithm as standardized in RFC 8439. The ultimate advantage to using a cipher like ChaCha that permits efficient secure software implementations is that it can be widely used by less powerful devices or devices lacking intrinsic AES-related instructions for efficient secure hardware implementations. Many existing software AES implementations are vulnerable to a considerable number of attacks depending on the threat model, and proper software implementations without appropriate operating system or hardware support may suffer performance problems leading to deployment of insecure code in practice. Several of these attacks are detailed in Cache-timing attacks on AES (PDF) along with advice on how to mitigate this in software, though one of the key conclusions you might come to is that AES is something to be avoided where possible.

In a future post, we’ll go over how Poly1305 works, combine it with ChaCha, and ultimately define the authenticated encryption with authenticated data (AEAD) algorithm XChaCha20-Poly1305 used in O(1) Cryptography. In the meantime, you can have a look at some of the academic papers and references cited in O(1) Cryptography.