In the introduction we assume a log that we can snapshot and then efficiently prove that a later snapshot contains all the entries that a given, previous snapshot did. We also require that a batch of certificates can be included in between two, consecutive snapshots. (We call these batches, segments). In this section we'll explain how to realise that log in practice.
Batching certificates into a segment is the easier problem so we'll deal with that first.
We need to be able to attach, to each certificate, an audit proof, signed by the log, that shows that the certificate has been included in a snapshot. Hundreds of certificates might have been added since the previous snapshot but, when the client verifies the signature, it only has a single certificate in the segment. Typical digital signature schemes require the whole signed message before the signature can be verified but we don't want to have to get all the certificates in the segment to the client.
So we use a fairly standard cryptographic construction called a Merkle tree. Consider the tree diagram below. The blue leaves are hashes of certificates that we want to sign (a, b, c and d). Each of the interior, red nodes gets labeled with the hash of the hashes of its children. The root of the tree is thus a hash over all the children.
The "signature" for one of the certificates in the tree then consists of a standard digital signature of the root of the tree, followed by a path up the tree to the root.
So a signature of certificate b would consist of the value of the node labeled x (which allows the value of node y to be calculated) and the value of node z (which allows the value of the root to be calculated). If we assume that our hash function has second-preimage resistance then a signature of the root is a signature of all the certificates in the tree.
That deals with how we sign a whole log segment with a single signature. Now, when a client is verifying an audit proof for a certificate, that audit proof will be based on a previous snapshot of the log (from the time that the signature was issued). Next we have to show how to prove that a previous snapshot of the log is consistent with a later, trusted snapshot. (Otherwise the log could sign a snapshot which hasn't been published.)
As a strawman proposal, consider this diagram:
It represents a log which is taking a snapshot every four certificates. The root of the Merkle tree for each segment hashes in the hash of the previous segment. Thus it forms a structure a little like a series of git commits.
Assume that the client trusts that snapshot s has been verified by an auditor, who has downloaded all the certifiates in the log and checked that they hash to the correct value. The client now has to verify an audit proof and certificate which was created in snapshot p. If it obtains the value of the two children of q then it can calculate the value of q. Likewise, the values of the two children of r and s allow it to calculate the value of r and then s. If the two values of s match (the trusted value and the calculated value) then the client is convinced that 'p' is included in 's'.
However, the amount of data in a proof of consistency grows linearly with the number of snapshots in between. If we assume that the log snapshots once an hour then a proof of consistency spanning six months needs nearly 300KB of hash data.
So we build a tree of snapshots too:
In this diagram, the mauve nodes are roots of the segment Merkle trees again (although I've only drawn in one of the segment trees). The green nodes form a Merkle tree of Merkle trees.
On the left is the state of the log after three snapshots (there are three mauve nodes). On the right is the log after another two segments have been added (z and v).
If we simply wanted to ensure segment x was contained in the right-hand snapshot, then we could simply provide a Merkle from x to the root. However, the number of such proofs increases as O(n**2) as the number of snapshots increases so we would also like a way to prove that an entire snapshot is consistent with a later snapshot. This would allow us to build structure of snapshots so that we could prove consistency from one snapshot to closer and closer snapshots and thus only need a logarithmic number of different proofs.
So now we want to be convinced that state three (on the left) is a subset of state five (on the right). We trust the value of the root of state five and we have the value of x because we verified an audit proof for a certifciate in segment x. The ringed nodes on the left are nodes that have changed in state five because they've had children added.
First we provide a Merkle path from x to r. This just consists of the value of y since x has no sibling. Now we provide a Merkle path from x to r. We don't need to repeat y, but we do need to provide the value of z. At this point we've shown that the subtree at r in the right hand snapshot is equal to the subtree at r in the left, save for the additional nodes. Now, a Merkle path from r' to the root completes the proof of consistency.