Concurrent Merkle Trees - Bubblegum V2

Summary

Concurrent Merkle Trees explains the data structure underlying compressed NFTs. This page covers how merkle trees work, leaf paths, proofs, validation, and the concurrency mechanism that enables parallel writes within the same Solana block.

Merkle trees store cNFT data as hashed leaves with a single root representing the entire tree
Proofs enable verifying a specific cNFT exists without rehashing the entire tree
The concurrent mechanism uses a ChangeLog buffer to handle multiple writes per block
The rightmost proof is stored on-chain, enabling minting without sending proof data

Introduction

A Merkle Tree is a tree data structure in which each leaf node is labeled with a hash representing some data. Adjacent leaves are hashed together, and the resulting hash becomes the label for the node that is the parent of those leaves. Nodes at the same level are hashed together again, and the resulting hash becomes the label for the node that is the parent of those nodes. This process continues until a single hash is created for the root node. This single hash cryptographically represents the data integrity of the entire tree, and is called the Merkle root.

Most Merkle trees are binary trees, but they do not have to be. The Merkle tree used for Bubblegum compressed NFTs (cNFTs) is a binary tree as shown in our diagram.

React Flow

When we talk about storing the state of data on the blockchain, if we store this Merkle root, we can effectively store a single value that represents the data integrity of everything that was previously hashed in order to create the root. If any leaf value were to change on the tree, the existing Merkle root would become invalid and need to be recomputed.

For Bubblegum compressed NFTs, the leaf node hashes are the hash of a leaf schema. The leaf schema contains a leaf ID, owner/delegate information, a creator_hash representing the cNFT's creators, and a data_hash representing the compressed NFT's metadata in general (it again includes the creator array). So all the information we need to cryptographically verify a single compressed NFT is stored in the hashed leaf schema.

Leaf Path

As we learned in the previous section, in a Merkle tree only the leaf nodes represent end-user data. The inner nodes leading up to the hash are all just intermediate values in service to the Merkle root. When we refer to a leaf node's Path, we mean the leaf node hash itself and the inner nodes directly leading to the Merkle root. For example, the Path for leaf 2 is highlighted in the diagram below.

React Flow

Leaf Proof

If we want to prove whether a compressed NFT exists in a Merkle tree, we don't need to rehash all the leaf nodes. As you can see in the diagram below we only need to have certain values to hash together until we calculate our Merkle root. These values are known as the Proof for the leaf. Specifically, the Proof for a leaf node is the adjacent leaf node's hash, and the adjacent inner node hashes that can be used to calculate the Merkle root. The Proof for leaf 2 is highlighted in the diagram below.

React Flow

Leaf Validation

The process for using the leaf node and its Proof to calculate the Merkle root is as follows:

Start with our raw leaf schema, hash it.
Hash the value from step 1 with the sibling leaf node's hash to create the next value up of the leaf's Path.
Hash the path value from step 2 with the next sibling inner node, which is the next value of the Proof.
Continue this process of hashing values with sibling inner node values, up the tree until we calculate the Merkle root.

If the Merkle root we calculate matches the Merkle root we were given for that tree, then we know that our exact leaf node exists in the Merkle tree. Also any time a leaf node is updated (i.e. when the cNFT is transferred to a new owner), a new Merkle root must be calculated and updated onchain.

Concurrency

The onchain Merkle tree used for cNFTs must be able to handle multiple writes occurring in the same block. This is because there can be multiple transactions to mint new cNFTs to the tree, transfer cNFTs, delegate cNFTs, burn cNFTs, etc. The problem is that the first write to the onchain tree invalidates the proofs sent for other writes within the same block.

The solution for this is that the Merkle tree used by spl-account-compression doesn't only store one Merkle root, but also stores a ChangeLog of previous roots and the paths for previously modified leaves. Even if the root and proof sent by the new transaction have been invalidated by a previous update, the program will fast-forward the proof. Note the number of ChangeLogs available is set by the Max Buffer Size used when creating the tree.

Also note that the rightmost proof for the Merkle tree is stored onchain. This allows for appends to the tree to occur without needing a proof to be sent. This is exactly how Bubblegum is able to mint new cNFTs without needing a proof.

Notes

The max buffer size set at tree creation determines how many concurrent changes can be fast-forwarded per block.
If more concurrent changes occur than the buffer size allows, some transactions will fail and need to be retried.
Proofs fetched from the DAS API may become stale if the tree is modified between fetch and use. The ChangeLog mechanism mitigates this for concurrent operations.
The rightmost proof optimization allows minting (appending) without any proof data, since new leaves are always added at the rightmost position.

Glossary

Term	Definition
Merkle Tree	A binary tree where each node is a hash of its children, with leaves representing cNFT data
Merkle Root	The single top-level hash representing the integrity of the entire tree
Leaf Path	The leaf node hash and all intermediate nodes leading to the root
Proof	The sibling hashes needed to recalculate the root from a given leaf
ChangeLog	A buffer of recent tree modifications that enables concurrent writes by fast-forwarding stale proofs
spl-account-compression	The Solana program that manages the on-chain concurrent merkle tree