#consensusProtocol #BFT
Co-authored by Malkhi, from [[Malkhi, Momose, Ren]]
[HotStuff: BFT Consensus in the Lens of Blockchain (arXiv:1803.05069)](https://arxiv.org/abs/1803.05069)
# Abstract
[[HotStuff]] is a leader-based Byzantine fault-tolerant replication protocol for the partially synchronous model.
When synchronous, it allows for a leader to drive the protocol to consensus at the pace of actual, as opposed to maximum, network delay - a property called responsiveness.
Does so with communication complexity linear in the number of replicas
# Introduction
BFT in the context of SMR: the system as a whole provides a replicated service whose state is mirrored across n deterministic replicas, and the protocol ensures that the order of execution for client-initiated service commands is agreed upon.
Uses partially synchronous communication model, and requires n >= 3f+1
Permissioned blockchains, in contrast to permissionless, have a fixed set of replicas, but that number can still be massive.
**The Scaling Challenge**
[[PBFT]] created the two-phase paradigm, meaning a stable leader can drive consensus in two rounds of message exchanges.
Phase one guarantees proposal uniqueness by forming a quorum certificate of (n-f) votes.
Phase two guarantees the next leader can convince replicas to vote for a safe proposal
*view-change algorithm* is the epicenter of replication, allowing a new leader to collect info and propose it to replicas.
Requires leader to relay info from (n-f) replicas, each reporting its own highest known QC. Has scaling challenges.
HotStuff has a three-phase core
Leader picks the highest QC it knows of
A new second phase allows replicas to change their mind after voting without requiring a leader proof - simplifies leader replacement and scaling challenges
Only [[Tendermint]] and [[Casper]] follow this simple leader regime
HOWEVER, they're built around a synchronous core - they accommodate the worst-case time it takes to propagate messages over the p2p [[gossip network]]. This forgoes *responsiveness*
**[[Responsiveness]]** requires that a leader can get consensus in time depending only on the actual message delays, not worrying about any upper bound
**Optimistic Responsiveness** requires responsiveness only in beneficial (and hopefully common) circumstances - here, after GST is reached.
The difficulty in achieving it is that there may exist an honest replica with the highest QC that the leader doesn't know about.
**Contribution of HotStuff is that it's the first BFT SMR protocol to achieve:**
- **Linear View Change:** After GST, any correct leader sends only O(n) authenticators to drive a consensus decision. In the worst case of cascading leader failures, communication cost is O(n^2)
- **Optimistic Responsiveness:** After GST, any correct leader needs to wait for first n-f responses to guarantee it'll create a proposal that will make progress. Includes case where a leader is replaced.
Also allows for frequent succession of leaders, as the cost for a new leader to drive protocol to consensus is no greater than for current leader.
All of this is achieved by adding another phase to each view, slightly increasing latency.
It's also very simple: only two message types and simple rules to determine how a replica treats each. Safety is specified via voting and commit rules over graphs of nodes.
Liveness mechanisms encapsulated within a *pacemaker* separated from safety
Expressive, allowing representation of other protocols as minor variations
This flexibility derives from its operation over a graph of nodes - forms a bridge between classic BFT and blockchains
# Related Work
In the [[asynchrony resilience]] world, we know the [[FLP impossibility]] result means BFT consensus is unsolvable in an asynchronous setting in the face of even a single failure.
We also know a (n-1)/3 resilience bound exists for any asynchronous solution
That is, if you're working in an asynchronous environment and you want to be certain you can still achieve consensus despite some processes being malicious or faulty, then at most (n−1)/3 of the processes can be faulty. If the number of faulty processes exceeds this threshold, consensus can't be guaranteed.
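A quick sanity check on the bound (my own arithmetic, not from the paper):
$$n \ge 3f+1 \iff f \le \left\lfloor \tfrac{n-1}{3} \right\rfloor, \qquad \text{e.g. } n=4 \Rightarrow f \le 1, \quad n=100 \Rightarrow f \le 33$$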
Partial synchrony approach preserves synchrony during asynch periods and then guarantees termination after system becomes synchronous ([[DLS]])
Randomized solutions circumvent this by employing randomized asynchronous Byzantine consensus, with best known communication complexity of O(n^3)
Bitcoin's [[Nakamoto Consensus]] is synchronous with probabilistic safety and no finality.
# Model
**Fixed set** (no [[dynamic participation]]) of n=3f+1 replicas.
[[Partial synchrony]] model: Known bound $\Delta$ and an unknown Global Stabilization Time such that after GST all transmissions between two correct replicas arrive within $\Delta$
Always ensures safety, and guarantees progress if the system remains stable (messages arrive within $\Delta$) for sufficiently long after GST.
**Cryptographic primitives**
Uses *threshold signatures* in a (k,n)-threshold signature scheme
All replicas have single public key and each of the n replicas has a unique private key
Replicas can use their private key to generate a partial signature
Any set of k partial signatures on the same message can be combined to produce a full digital signature on it
A threshold of k=2f+1 is used
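A minimal sketch of the (k, n)-threshold interface implied here, using the paper's tsign/tcombine/tverify names; the bodies are stubs of my own for illustration, not a real crypto library.

```python
from dataclasses import dataclass

F = 1                 # example fault tolerance, so n = 3f+1 = 4
K = 2 * F + 1         # threshold k = 2f+1

@dataclass
class PartialSig:
    signer: int       # replica id
    message: bytes    # message m that was signed
    sig: bytes        # partial signature produced with the replica's private key

def tsign(replica_id: int, message: bytes) -> PartialSig:
    """Each replica signs with its unique private key (stubbed)."""
    return PartialSig(replica_id, message, b"partial-" + bytes([replica_id]))

def tcombine(message: bytes, partials: list[PartialSig]) -> bytes:
    """Combine k = 2f+1 distinct partial signatures on the same message into one full signature."""
    signers = {p.signer for p in partials if p.message == message}
    if len(signers) < K:
        raise ValueError("need at least k = 2f+1 matching partial signatures")
    return b"full-sig-over-" + message

def tverify(message: bytes, full_sig: bytes) -> bool:
    """Anyone can check the combined signature against the single public key (stubbed)."""
    return full_sig == b"full-sig-over-" + message
```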
**Complexity Measure**
*Authenticator complexity* - the sum, over all replicas i, of the number of authenticators received by replica i to reach a consensus decision after GST
An *authenticator* is a signature or partial signature
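Restated as a formula (just the definition above in symbols):
$$\text{authenticator complexity} = \sum_{i=1}^{n} \big(\#\text{authenticators received by replica } i \text{ to reach a decision, after GST}\big)$$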
# Basic HotStuff
Solves [[SMR]] problem
HotStuff works in a succession of numbered views
Each viewNumber has a unique dedicated leader known to everyone
Each replica stores a tree of pending commands
Each tree node contains a proposed command, protocol metadata, and a parent link
The branch led by a given node is the path from the node to the root by visiting parent links
During the protocol, a monotonically growing branch becomes *committed*
This happens when the leader of a particular view proposing the branch collects votes from a quorum of (n-f) replicas in the three phases: PREPARE, PRE-COMMIT, and COMMIT
The collection of (n-f) votes over a leader proposal is key, known as a **quorum certificate**
Associated with a particular node and view number
## Phases
### PREPARE
- Protocol for new leader starts by collecting new-view messages from (n-f) replicas
new-view message sent as a replica transitions into viewNumber and carries the highest prepareQC it received
- Leader processes these messages to select a branch that has the highest preceding view in which a prepareQC was formed
Leader selects prepareQC with the highest view, highQC, among the new-view messages. Since it's the highest among (n-f) replicas, no higher view could have reached a commit decision, thus the branch led by highQC's node is safe
- Leader uses `createLeaf` to extend highQC.node with a new proposal.
Creates a new leaf node as a child
Sent in a `PREPARE` message to all other replicas, carrying highQC for safety justification
- Upon reception of `PREPARE` from the current leader, replica r uses the `SAFENODE` predicate to determine whether to accept it.
- If accepted, replica sends `PREPARE` vote with a partial signature for the proposal to the leader
**`SAFENODE` predicate:** Examines proposal message m carrying QC justification m.justify and determines if m.node is safe to accept.
- Safety rule to accept a proposal is the branch of m.node extends from the currently locked node lockedQC.node
- Liveness rule is to accept m if m.justify has a higher view than lockedQC
- The predicate is true if either of the two rules holds
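A Python sketch of the predicate; the `extends_from` helper and exact field names are my reconstruction of the paper's pseudocode, not the reference implementation.

```python
def extends_from(node, ancestor) -> bool:
    """Walk parent links to check whether `ancestor` lies on the branch led by `node`."""
    while node is not None:
        if node is ancestor:
            return True
        node = node.parent
    return False

def safe_node(node, justify_qc, locked_qc) -> bool:
    """SAFENODE: accept the proposal if either the safety or the liveness rule holds."""
    safety = extends_from(node, locked_qc.node)                 # branch extends lockedQC.node
    liveness = justify_qc.view_number > locked_qc.view_number   # m.justify from a higher view
    return safety or liveness
```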
### PRECOMMIT
When leader receives (n-f) `PREPARE` votes for the current proposal, it combines them into a prepareQC
- This is then broadcasted in `PRE-COMMIT` messages
Replica responds to the leader with `PRE-COMMIT` vote having a signed digest of the proposal
### COMMIT
Similar to PRE-COMMIT
When leader gets (n-f) `PRE-COMMIT` votes, combines them into a precommitQC
- This is broadcasted in `COMMIT` messages.
- Replicas respond with a `COMMIT` vote
Replica becomes *locked* on the precommitQC by setting its lockedQC to precommitQC
### DECIDE
When leader gets (n-f) `COMMIT` votes, combines them into commitQC
- Broadcast in a `DECIDE` message
Upon reception of `DECIDE` message, replica considers the proposal in the commitQC a committed decision and executes the commands in the associated committed branch
Replica increments viewNumber and starts the next view
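The leader does the same thing in every phase: wait for (n-f) matching votes, combine them into a QC, and broadcast it in the next phase's message. A rough sketch of that shared step (the Vote type and the signature combining are simplified stand-ins of mine):

```python
from dataclasses import dataclass

@dataclass
class Vote:
    phase: str            # "prepare" | "pre-commit" | "commit"
    view: int
    node: object          # the proposed leaf node being voted on
    partial_sig: bytes    # the voter's partial signature

def leader_collect(phase: str, cur_view: int, proposal, votes: list[Vote], n: int, f: int):
    """Generic leader step shared by PREPARE / PRE-COMMIT / COMMIT (a sketch, not the paper's code)."""
    matching = [v for v in votes
                if v.phase == phase and v.view == cur_view and v.node is proposal]
    if len(matching) < n - f:
        return None  # quorum not reached yet; the NEXTVIEW timeout may fire instead
    # Stand-in for tcombine over the (n - f) partial signatures:
    combined_sig = b"|".join(v.partial_sig for v in matching[: n - f])
    qc = {"type": phase, "viewNumber": cur_view, "node": proposal, "sig": combined_sig}
    return qc  # the leader broadcasts this QC in the next phase's message
```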
### `NEXTVIEW` interrupt
In all phases, replica waits for a message in the view for a timeout period
If timeout expires, replica goes to next view
## Data Structures
### Messages
A message m has:
- m.viewNumber - the sender's current view number
- m.type - the type/phase (new-view, prepare, pre-commit, commit, decide)
- m.node - the proposed node (leaf node of the proposed branch)
- m.justify (optional) - used by the leader to carry the QC for each phase; replicas use it in new-view to carry their highest prepareQC
- m.partialSig
### [[Quorum Certificate]]
Combines a collection of signatures for the same message signed by (n-f) replicas
Has type, viewNumber, and node
### Tree and Branches
Each command wrapped in a node that contains parent link - hash digest of parent node.
Replica delivers a message only after the branch led by the node is already in its local tree
## Bookkeeping variables
viewNumber
locked quorum certificate, *lockedQC*, for storing the highest QC for which a replica voted `COMMIT`
prepareQC, storing the highest QC for which a replica voted `PRE-COMMIT`
Also maintains the highest node whose branch has been executed in order to incrementally execute a committed log of commands
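Collecting the structures from this section into dataclasses (field names follow the notes above; the hash-link detail is a simplification of mine):

```python
from dataclasses import dataclass
from typing import Optional
import hashlib

@dataclass
class Node:
    command: bytes                    # client command wrapped by this tree node
    parent_hash: Optional[bytes]      # hash digest of the parent node (None for the root)
    justify: Optional["QC"] = None    # QC carried as protocol metadata

    def digest(self) -> bytes:
        return hashlib.sha256(self.command + (self.parent_hash or b"")).digest()

@dataclass
class QC:
    type: str              # phase in which the (n - f) votes were cast
    view_number: int
    node: Node
    sig: bytes             # combined threshold signature

@dataclass
class Msg:
    view_number: int               # sender's current view number
    type: str                      # new-view / prepare / pre-commit / commit / decide
    node: Optional[Node]           # leaf node of the proposed branch
    justify: Optional[QC]          # leader's QC for the phase, or highest prepareQC in new-view
    partial_sig: Optional[bytes] = None

@dataclass
class ReplicaState:
    view_number: int = 1
    prepare_qc: Optional[QC] = None   # highest QC this replica voted PRE-COMMIT for
    locked_qc: Optional[QC] = None    # highest QC this replica voted COMMIT for
    executed: Optional[Node] = None   # highest node whose branch has been executed
```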
![[Pasted image 20230818065536.png]]
![[Pasted image 20230818065823.png]]
## Safety, Liveness, and Complexity
**Safety:** Define a quorum certificate qc to be valid if tverify(<qc.type, qc.viewNumber, qc.node>, qc.sig) is true
**Lemma 1:** For any valid qc1, qc2 in which qc1.type = qc2.type and qc1.node conflicts with qc2.node, their view numbers are not the same
Proof: By contradiction, suppose they do have the same view number v. Since a valid QC can be formed only with n-f = 2f+1 votes (partial sigs), the two vote sets intersect in at least one correct replica who voted twice in the same phase of v. This is impossible because the algorithm allows voting only once for each phase in each view
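The counting step behind "a correct replica voted twice", spelled out (standard quorum-intersection arithmetic, my restatement):
$$|Q_1 \cap Q_2| \;\ge\; |Q_1| + |Q_2| - n \;=\; 2(2f+1) - (3f+1) \;=\; f+1$$
so the two vote sets share at least f+1 replicas, at least one of which is correct.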
**Theorem 2:** If w and b are conflicting nodes, then they cannot both be committed, each by a correct replica.
Proof: By contradiction. Let qc1 denote a valid commitQC such that qc1.node = w, and qc2 a valid commitQC such that qc2.node = b. By Lemma 1 they have different view numbers; w.l.o.g. assume qc1's view $v_{1}$ is lower than qc2's view $v_{2}$.
Denote by $v_{s}$ the lowest view higher than $v_{1}$ for which there is a valid prepareQC, $qc_{s}$ where qcs.node conflicts with w. Formally, define the following predicate for any prepareQC:
$E(prepareQC) := (v_{1} < prepareQC.viewNumber \leq v_{2}) \wedge (prepareQC.node \text{ conflicts with } w)$
Set first switching point qcs: ![[Pasted image 20230818070820.png]]
qcs must exist: the prepareQC formed in view $v_{2}$ (needed to form qc2) satisfies E, so the set is non-empty.
Let r be a correct replica that contributed partial signatures (tsign) to both qc1 and qcs; such a replica must exist, since each QC carries 2f+1 votes and any two quorums intersect in at least one correct replica.
During v1, r updates its lockedQC to a precommitQC on w. By the minimality of $v_{s}$, the lock placed on the branch led by w is not changed before qcs is formed; otherwise r would have voted for some conflicting prepareQC with a view between $v_{1}$ and $v_{s}$.
Consider the invocation of SAFENODE in the PREPARE phase of view $v_{s}$ by r. By assumption, qcs's node conflicts with lockedQC.node, so the safety rule fails; and the proposal's justification cannot carry a view higher than $v_{1}$ without contradicting the minimality of $v_{s}$, so the liveness rule fails too. Hence SAFENODE returns false and r cannot cast a PREPARE vote on the conflicting branch in view $v_{s}$ - a contradiction.
**Lemma 3:** If a correct replica is locked such that lockedQC = precommitQC, then at least f+1 correct replicas voted for some prepareQC matching lockedQC
Proof: Suppose replica r is locked on precommitQC. Then, (n-f) votes were cast for the matching prepareQC in the PREPARE phase, out of which at least f+1 were from correct replicas.
**Theorem 4:** After GST, there exists a bounded time period $T_{f}$ such that if all correct replicas remain in view v during $T_{f}$ and the leader view for v is correct, then a decision is reached.
Now provide constructions for leader and nextView that ensure that after GST, eventually a view will be reached where the leader is correct and all correct replicas remain in this view for $T_{f}$ time.
**Livelessness with two-phases**
Infinite non-deciding scenario for a two-phase hotstuff that explains the necessity for introducing a synchronous delay in Casper and Tendermint (abandoning Optimistic Responsiveness)
In two-phase HotStuff, omit pre-commit and go straight to commit.
Replica becomes locked when it votes on a prepareQC
Suppose in view v a leader proposes b.
It completes the PREPARE phase, and some replica $r_{v}$ votes for the resulting prepareQC qc with qc.node = b. Hence, $r_{v}$ becomes locked on qc.
Asynch network scheduling causes the rest of the replicas to move to view v+1 without receiving qc.
This repeats ad infinitum, following the single-view transcript below.
Start v+1 with only $r_{v}$ holding the highest prepareQC in the system
New leader collects new-view messages from 2f+1 replicas excluding $r_{v}$
Highest prepareQC among these, qc', has view v-1 and b' = qc'.node conflicts with b.
New leader then proposes b'' which extends b' -> 2f honest replicas respond with a vote, and $r_{v}$ rejects because it's locked on qc, b'' conflicts with b, and qc' is lower than qc.
Eventually, 2f replicas give up and move to the next view.
JUST THEN, a faulty replica responds to the v+1 leader's proposal, letting it put together prepareQC(v+1, b''); a single correct replica votes for it and becomes locked on it, and the scenario repeats.
**Complexity**
In each phase, there are O(n) authenticators received in total, with a constant number of phases, overall complexity per view is O(n)
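A rough tally of why each view is linear (my own arithmetic, consistent with the authenticator-complexity measure above): in each phase the leader's broadcast carries a single threshold signature to each of the n replicas, and the leader receives n-f partial signatures as votes.
$$\underbrace{n \cdot 1}_{\text{QC broadcast}} \;+\; \underbrace{(n-f)}_{\text{votes}} \;=\; O(n) \text{ per phase}, \qquad O(1) \text{ phases} \;\Rightarrow\; O(n) \text{ per view}$$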
# Chained HotStuff
Takes three phases for a Basic HotStuff leader to commit a proposal
The phases aren't doing useful work except collecting votes from replicas, and every phase is pretty similar
*Chained HotStuff* improves basic HotStuff while also simplifying it by changing the view on every `PREPARE` phase so each proposal has its own view
This change reduces the number of message types and allows for pipelining of decisions
- Votes over a `PREPARE` phase are collected in a view by the leader into a genericQC.
- The genericQC is relayed to the leader of the next view, essentially delegating responsibility for the next phase to the next leader
- Previously would have been to `PRE-COMMIT`
- New leader doesn't carry out a pre-commit phase, but instead initiates a new prepare phase and adds its own proposal
- The prepare phase for view v+1 simultaneously serves as pre-commit for view v.
- prepare phase for v+2 simultaneously serves as pre-commit for v+1 and as the commit phase for v
- This is possible because all phases have an identical structure
- Only two types of messages in Chained HotStuff: `NEW-VIEW` and generic-phase `GENERIC` message
**Dummy Nodes**
The *genericQC* used by a leader in some viewNumber may not directly reference the proposal of the preceding view
This is because the leader of a preceding view fails to obtain a QC, either because there are conflicting proposals or a crash
To simplify tree structure, createLeaf extends genericQC.node with blank nodes up to the height of the proposing view so view numbers are equated with node heights.
Consequently, the QC embedded in a node b may not refer to its parent
**One-Chain, Two-Chain, and Three-Chain**
When a node b* carries a QC that refers to a direct parent (b*.justify.node = b*.parent), we say it forms a *One-Chain.*
Denote by b'' = b*.justify.node. Node b* forms a *Two-Chain* if in addition to forming a One-Chain, b''.justify.node = b''.parent
It forms a *Three-Chain* if b'' forms a Two-Chain
Looking at chain b = b'.justify.node, b' = b''.justify.node, b'' = b*.justify.node, ancestry gaps might occur at any one of the nodes.
Similar to basic HotStuff failing to complete a view and moving onto nextView
If b* forms a One-Chain, the prepare phase of b'' has succeeded
Hence, when a replica votes for b*, it should remember genericQC <- b*.justify
It's safe to update genericQC even when a One-Chain is not direct so long as it is higher than the current genericQC
If b* forms a Two-Chain, the pre-commit phase of b' has succeeded.
Replica should update lockedQC <- b''.justify
If b* forms a Three-Chain, the commit phase of b has succeeded, and b becomes a committed decision.
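A sketch of how a replica might apply these rules when voting for a new node b*; the direct-parent checks and helper names (`execute_branch`, `state`) are my reconstruction of the chained update, not the paper's exact algorithm, and nodes are assumed to carry resolved `.parent` references.

```python
def update_on_vote(b_star, state, execute_branch):
    """Apply the One-/Two-/Three-Chain rules above when voting for b* (a sketch).

    Naming follows the notes: b'' = b*.justify.node, b' = b''.justify.node,
    b = b'.justify.node; a link is "direct" when justify.node is the parent.
    """
    if b_star.justify is None:
        return
    b2 = b_star.justify.node                                            # b''
    b1 = b2.justify.node if b2.justify else None                        # b'
    b0 = b1.justify.node if (b1 is not None and b1.justify) else None   # b

    # One-Chain (even an indirect one): keep the highest genericQC seen so far.
    if state.generic_qc is None or b_star.justify.view_number > state.generic_qc.view_number:
        state.generic_qc = b_star.justify

    # Two-Chain (b* -> b'' and b'' -> b' both direct): pre-commit of b' succeeded, so lock.
    if b1 is not None and b_star.parent is b2 and b2.parent is b1:
        state.locked_qc = b2.justify

    # Three-Chain (b'' forms a Two-Chain): commit of b succeeded, execute its branch.
    if b0 is not None and b2.parent is b1 and b1.parent is b0:
        execute_branch(b0, state)
```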
*NOTE: One way to think about this concept is that the number of chains a protocol uses is how many concurrent 'chains + 1' are running at the same time. For example, with four phases, Three-Chain HotStuff has four concurrent chains running. ChatGPT described it as the number of phases or chains in the commit rule of the BFT protocol. The more chains there are, the more phases or steps in the commit process.*
![[Pasted image 20230819090328.png]]
# Implementation
![[Pasted image 20230819090701.png]]
### Pacemaker
**Pacemaker** is a mechanism that guarantees progress after GST, achieved through 2 ingredients: synchronization and leader provision
Synchronization brings all correct replicas and a leader into a common height for a sufficiently long period
Can use a rotating leader scheme known by replicas
Also gives the leader a way to choose a proposal that will be supported by correct replicas, by collecting new-view messages to discover the highest QC
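A tiny sketch of those two ingredients: a rotating-leader schedule every replica can compute, and highest-QC selection from (n-f) new-view messages (function names are mine, not the paper's):

```python
def leader_for(view: int, n: int) -> int:
    """Deterministic rotating-leader schedule known to every replica."""
    return view % n

def pick_high_qc(new_view_msgs, n: int, f: int):
    """Pick the prepareQC with the highest view among (n - f) NEW-VIEW messages.

    The new leader extends this highQC's node with its own proposal.
    """
    assert len(new_view_msgs) >= n - f, "need a quorum of new-view messages"
    qcs = [m.justify for m in new_view_msgs if m.justify is not None]
    return max(qcs, key=lambda qc: qc.view_number, default=None)
```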
![[Pasted image 20230819091235.png]]
# One-Chain and Two-Chain BFT Protocols
Examine four BFT replication protocols spanning four decades of research in BFT, casting them into a chained framework similar to Chained HotStuff
## [[DLS]]
Simplest One-Chain commit rule, and first known asynchronous Byzantine consensus solution
Replica becomes locked on the highest node it voted for
Easily leads to deadlock if a leader equivocates and two correct replicas become locked on the conflicting proposals at that height
Relinquishing lock is unsafe unless 2f+1 indicate they didn't vote for locked value
Only the leader of each height can itself reach a commit decision by the One-Chain commit rule, meaning only the leader is harmed if it equivocates.
Replicas can unlock if 2f+1 replicas did not vote or if there are conflicting proposals signed by the leader.
Not practical
## [[PBFT]]
Two-Chain commit rule
When a replica votes for a node that forms a One-Chain, it becomes locked onto it
Conflicting One-Chains at the same height aren't possible, as each has a QC, so deadlock from DLS is avoided
If one replica holds a higher lock than others, a leader may not know about it even if it collects info from n-f replicas. Could prevent leaders from reaching decisions ad infinitum due to scheduling
There's a method for unlocking and unsticking, but it is involved.
Quadratic communication per leader replacement
## [[Tendermint]] and [[Casper]]
Tendermint has Two-Chain commit rule identical to PBFT
Casper has Two-Chain rule where leaf doesn't need to have a QC to direct parent
Leader sends highest One-Chain it knows with a proposal
Replica unlocks a One-Chain if it receives a higher one from the leader
Because correct replicas may not vote for a leader's node, to guarantee progress a new leader must obtain the highest One-Chain by waiting for maximal network delay. Otherwise, if leader only waits for first n-f messages, there's no progress guarantee. Leader delays provide liveness.
HotStuff borrows this leap to linear communication complexity; thanks to the extra QC phase, HotStuff doesn't require the leader to wait the maximal network delay.
# Conclusion
PBFT introduced its core two-phase paradigm. The first phase guarantees proposal uniqueness through a QC. The second phase guarantees that a new leader can convince replicas to vote for a safe proposal. This requires the leader to relay information from (n−f) replicas, each reporting its own highest QC or vote. Generations of two-phase works thus suffer from a quadratic communication bottleneck on leader replacement
HotStuff's three-phase core allows a new leader to pick the highest QC it knows of. Allows for easy pipelining, leader rotation, and simplification.