Constant Latency in sleepy consensus - Single Slot Finality Research

[https://eprint.iacr.org/2022/404.pdf](https://eprint.iacr.org/2022/404.pdf) [[Momose-Ren]] The basis for [[Byzantine Consensus under Fully Fluctuating Participation]] After reading this, go back to ^ and see how it improves ## Abstract Longest-chain protocol and its variants suffer from long latency as a tradeoff Depends on security level and participation level This work presents a protocol for dynamic participation and constant latency by extending classic BFT from static quorum size to *dynamic quorum size* while preserving important properties of static quorum. Also includes a recovery mechanism for rejoining nodes ## Introduction BFT consensus allows parties to reach consensus in the presence of malicious actors Protocols have a latency of at least $\Omega\left( \frac{\kappa \Delta}{\gamma} \right)$ where kappa is security parameter, delta is network delay bound, and gamma is fraction of actual participants compared to anticipated level of participation Dependence on security parameter due to k-confirmation rule, where a block is decided only after k subsequent blocks are generated, where k depends linearly on the desired security level Block interval must be larger than delta, and further increases if participation drops, hence the inverse proportionality to gamma. Classic BFT with static participation can achieve O($\Delta$) This approach uses VRF and PKI to create a BFT atomic broadcast protocol in the sleepy model that tolerates minority faults and has an expected O($\Delta$) latency Uses a quorum-based graded agreement protocol to build an atomic broadcast protocol - Main question is how to set quorum threshold, as the protocol is not aware of how many nodes are awake Maybe use dynamic quorum where the quorum size is defined based on each party's perceived participation level? Not possible with malicious actors who can announce themselves to some but not others **Main technical contribution of this paper is to restore the transferability property with dynamic quorums** Also shows an efficient recovery mechanism for rejoining nodes where only ledger contents and recent protocol messages need to be re-transmitted. Tolerates minority faults, where the number of malicious parties f is at most n/2 where n is the minimum number of active participants over the course of protocol execution Necessary in sleepy model Makes progress only in periods of "Stable participation" - when participants don't change too wildly. Remains safe always ## Model and Definitions See [[Byzantine Consensus under Fully Fluctuating Participation]] for a more rigorous writeup on the conditions N total parties communicating over synchronous network $\Delta$ denotes bounds on communication delay and clock skew Loosely synchronized block Number of awake parties at each global time t is 0 < $n_{t}$ <= N, and at most f < $\frac{n_{t}}{2}$ can be faulty All faulty parties awake all the time Safety (total order) and liveness (censorship resistance/fairness) *eventually stable participation* assumption is needed to ensure liveness **Definition 3.1 (T-eventually stable participation):** There exists a global time $T_{s} \geq 0$ such that for all t >= $T_{s}$, more than $\frac{\alpha}{2}$ honest parties are insomniac during \[t, t+T] where $\alpha$ is the number of parties awake at some time during this interval The protocol assumes $7\Delta$-eventually stable participation Use dig sigs + PKI and VRF ### Sleepy Model with Recovery Original [[sleepy model]] assumes that an asleep party, upon wake, immediately receives all past messages. Impractical. Add in a *recovering* status. They don't have to receive all missed message, only recent ones (formalized later). Grace period of $\Gamma = 2\Delta$ is sufficient for recovering Upon rejoining the execution, the recovering party p multi-casts a recovery request, which is received within $\Delta$ but all awake honest parties. They respond with messages and data p missed, which takes another $\Delta$. In practice, this process can take a lot longer if there's a lot of data, so treat the grace period as an independent parameter per party. Can be thought of as the *crash/recovery* process in distributed computing literature. ### Additional Remark on the Sleepy Model To support dynamic participation it is necessary for us to assume that an asleep party knowingly went to sleep and does not take any actions during its sleep ## [[Graded Agreement]] Each party has an input value (possibly empty) and outputs (possibly multiple) pairs of (b,g) of value b and grade bit g ∈ {0, 1} 1. **Consistency:** If an honest party outputs (b,1), then every honest party awake at time t >= T outputs (b,\*) 2. **Integrity:** If no honest party inputs a value b, no honest party outputs (b,\*) 3. **Validity:** If all honest parties awake at the beginning (time 0) have the same input value b, then all honest parties that stay awake all the time output (b,1) **In the classic static participation model:** Quorum-based protocol with honest majority, N=n=2f+1 In round 1 (time 0), each party multi-casts a vote for its own input f+1 votes for the same value is called a *quorum certificate* for that value If a party receives a certificate for b at the end of round 1 (time $\Delta$), it outputs the value b with grade 1 and forwards the certificate to all other parties If it receives the certificate in round 2 or later (time > $\Delta$), it outputs b with grade 0 Certificate transferrability is crucial for consistency: an honest party who outputs (b,1) will forward the certificate, and all honest parties will also recognize the certificate and output (b,0) ### Warmup: A Lockstep GA A graded agreement in the lockstep round model where parties have access to a common clock - every party's local time equals the global time Each party p runs the following four-round algorithm: ![[Pasted image 20230728103958.png]] **Dynamic Quorum:** Threshold based on perceived participation level, the number of parties that have announced themselves to the network. Ensures that the number of honest parties awake in the voting round can meet the quorum threshold Round 3 is the voting round, and the quorum threshold is based on the number of <awake,3> received To make certificates transferable, parties need to collect votes from a majority of actual participation instead of perceived. Do this by making sure all honest and awake parties (a majority of participation) vote **Time-shifted quorum** In round 1, each party sends an echo message for its input value along with an awake message for round 1. Each party tracks the number of echo for any value b, denoted E(b), and the perceived participation level of round 1, denoted M1. Note that these two variables may change over time. In round 3, if a party observes a majority of echo for a value b, meaning E(b) > M1 /2, it votes for b. After round 3, if the number of echo received by time $\Delta$ still meets the majority, a party can be assured that all honest parties awake in round 3 voted for b and can output (b,1) ### GA Protocol under Loosely Synchronized Clocks Parties have loosely synchronized clocks that can differ by up to $\Delta$ Idea is to reserve 2$\Delta$ for each round In sleepy model, this simple approach could lead to a party not sending a message, like its vote even when it observes a majority echo, if it is asleep when instructed to do so Solution is to give a party a $\Delta$ time window to send a message Making the protocol only forward messages if no other message conflicts with it is important ##### Algorithm 1: Graded agreement in the sleepy model ![[Pasted image 20230728110224.png]] Line-by-line explanation: 7. Set $E_{line}$(b) to the number of parties from which the party has received an echo message for b without any conflict. This happens for every value b received 13/14. For each value b for which E(b) is greater than half of M1, multicast a vote for b 17. If p was awake in round 2, then for all b's that recieved more echos without conflict in round 2 than half of awake parties in round 1, add (b,1) to the output 20. Set V(b) to the number of parties from which the party has received a vote message for b 22. For all bs for which V(b) is greater than half of awake parties in round 3, 23. add (b,0) to output ## Atomic Broadcast in Sleepy Model **View-based construction** Protocol follows standard view-by-view paradigm, where each view lasts for $37\Delta$ time. View v starts at 37(v-1)$\Delta$ and ends at 37v$\Delta$ local time View consists of two phases 1. Each party proposes a new block (batch of tx) and a VRF lottery 2. A proposal from an elected leader is decided through a series of GA instances **Blocks and chaining** Set of values (parties' inputs) are batched into a block b A log is a list of blocks, and each log is uniquely identified by a hash Hash of log $\Lambda'=\Lambda\mid|b$ is defined as $\bar{H}(\Lambda')=H(\bar{H}(\Lambda)\mid|b)$ **Propose and Leader Election** A proposal consists of: a new block b and a hash h of a log $\Lambda$, which proposes the new log $\Lambda$||b A VRF evaluation on the current view number, the leader election lottery In phase 2, each party considers the leader of this view to be the party from whom it has received a proposal (of the current view) with the largest VRF Since there are more awake honest parties than faulty parties, with probability at least 1/2, all honest parties recognize the same honest party as the leader **Deciding Phase** It is possible, with constant probability, that the leader election is unsuccessful, such as if honest parties elect different leaders and proposals The second phase determines if the locally elected leader's proposal can be decided or not A proposal (b,h) (h is the hash of a log) of an elected leader L is processed through a series of 5 graded agreements denoted $GA_{v,i}$ for 1 <= i <= 5 H(h||b) is the input into $GA_{v,1}$ It's output with grade 1 is passed into the next one as input, and so on The deciding phase is further separated into four sub-phases "notary-candidate-lock-decide" to maintain safety and liveness **Notary** A log (or its hash h) is first *notarized* in its proposed view v when $GA_{v,2}$ outputs its hash wither either grade 0 or 1. Maintained by a set called "notarized" of pairs of notarized value h and its view v. The notarization of a log confirms the uniqueness of the log in the view There is no other log notarized in the view, ensuring safety within a view Can be thought of like a record of all the decisions that have been made in the protocol To this end, an output h of the first $GA_{v,1}$ is passed to $GA_{v,2}$ only if there is no other output h' != h with either grade 0 or 1. If the second GA outputs a hash and a view that is greater than or equal to the highest decided view, then the pair (h,v) is added to the set of notarized pairs **Candidate** Each party keeps track of the highest view v such that $GA_{v,3}$ outputs a hash h. They maintain in a pair the candidate of the hash h and the view v **Lock** Each party lock on the highest view v such that $GA_{v,4}$ outputs a hash h and its corresponding notary for h. Each party commits to the highest view that has a notarized hash output from the fourth GA A proposal (b,h) from a leader is input to $GA_{v,1}$ only if the party observes a notary (h,u) in the notarized set of view u that's greater than v (h,u) is a pair consisting of a hash h and a view u that's already in the set of notarized pairs Ensures that only proposals that extend from a notarized hash in a lower view are considered If an honest party decides a log of hash h in a view v, all honest parties will lock on the view and h is only a notarized value in the view In the next view, only proposals extending h can be decided, ensuring safety across views Candidate will always pass the safety guard condition since if an honest party locks onto a view v, then all honest parties will update the candidate in the view, due to the consistency and integrity of GA Therefore, a proposal from an honest leader will always be decided, ensuring liveness **Decide** If $GA_{v,5}$ outputs a hash, the party decides a log identified by the hash. Parties can always obtain the corresponding log because all blocks in the decided log are forwarded in their views ![[Pasted image 20230729124200.png]] ### Correctness of the protocol Proving safety and liveness of Algorithm 2 **Lemma 5.1:** For all 𝑖 ∈ {2, 3, 4, 5} and v, IF $GA_{v,i}$ of an honest party outputs (b,\*), THEN for all honest party awake at its local time $\tau \geq (7i-5)\Delta$ of its view v, its $GA_{v,i-1}$ outputs (b,\*) Proof: If GA𝑣,𝑖 of an honest party outputs (𝑏, ∗), then by the integrity of GA, at least an honest party (say 𝑝) must have input 𝑏 to GA𝑣,𝑖. This implies GA𝑣,𝑖−1 of the party 𝑝 must have output (𝑏, 1). By the consistency of GA, for all honest party awake at its local time 𝜏 ≥ (7𝑖 − 5)Δ of view 𝑣 (i.e., at least 7Δ after GA𝑣,𝑖−1 starts), its GA𝑣,𝑖−1 outputs (𝑏, ∗). Based on the above, the following lemma that says that every decided log (hash) is notarized and once a log is decided no conflicting log can be notarized after **Lemma 5.2:** If an honest party p decided a log $\Lambda$ in view v, THEN (i) p observes (H($\Lambda$),v) ∈ *notarized* at that time (ii) For any $u \geq v$ and log $\Lambda'$ that conflicts with $\Lambda$, no honest party observes (H($\Lambda),v) ∈ *notarized* at any time **Lemma 5.3 (Safety):** Honest parties do not decide two different values at the same log height The following lemma shows that an honest party can always obtain a log that corresponds to a decided hash (the output of the fifth GA) **Lemma 5.4:** If $GA_{v,5}$ of an honest party outputs (h,\*), then all honest parties awake at their local time $\tau \geq 37\Delta$ of view v observe a log $\Lambda$ such that $\bar{H}(\Lambda) = h$ Proof: 1. If the fifth GA of an honest party outputs a hash, then there must be at least one honest party that has inputted this hash to the first GA. This is because the GA is a consensus protocol, and a value can only be outputted if it was inputted by at least one party. 2. This party must have multicast a block message, which includes the hash of the previous block and the current block. This implies that the party must have observed a notarized pair (previous hash, view) for a view less than the current view during a specific time frame. 3. This logic is then applied recursively. For each hash in the sequence, there must be an honest party that inputted it to the first GA and multicast a block message. This means that for each hash, there is a corresponding block in the log. 4. Therefore, all honest parties that are awake at their local time of view observe a log that is identified by the hash. This is because all the blocks in the log have been multicast by honest parties, and therefore can be observed by all other honest parties. This lemma says every honest party's proposal passes the line 8 safety check **Lemma 5.5:** Let ℎ be the candidate.value of an honest party 𝑝 awake at some global time 𝑡 ∈ \[0, Δ] of view 𝑣. Then, any honest party 𝑞 awake at any global time 𝑡 ′ ∈ \[2Δ, 3Δ] of view 𝑣 observes (ℎ, 𝑢) ∈ notarized for some 𝑢 ≥ lock. **Lemma 5.6 (Liveness):** If an honest party inputs a value 𝑥, then there exists a time 𝑡 such that all honest parties awake at global time 𝑡 decides a log that contains 𝑥. **Theorem 5.7 (Latency):** If an honest party inputs a value 𝑥 at global time 𝑡 ≥ 𝑇𝑠 , then there exists a time 𝑡 ′ = 𝑡 + 𝑂(Δ) such that all honest parties awake at global time 𝑡 ′ decide a log that contains 𝑥 in expectation. Proof basically says the honest party's input is disseminated to all honest parties by $\Delta$ ## Atomic Broadcast with Practical Recovery Adds recovery mechanism to prior protocol where newly joined parties recover only necessary info of past views Each party only needs to send messages of recent views plus the decided log contents during the recovering party's sleep Recovering party q sends a recovery request with its latest decided view $\bar{v}$ and each party needs to send the decided log contents of view $\bar{v}$ or higher Only getting some GA messages is difficult since that could lead to incorrect updates to state variables (notary, candidate, lock, decide), key to safety and liveness. Each party needs to identify the oldest view *v* of which it needs to send GA messages to the recovering party q. Only GAs of the highest decided view *v* or later trigger variable updates. Thus, q only requires messages of view v >= *v*, where *v* is the highest view of which q will decide a log immediately after the recovery is completed. One possible issue is that p does not know *v* of q, and it could be different than p's *v* To solve this, add another GA, $GA_{v,6}$ If $GA_{v,6}$ outputs some value, parties update $\bar{v}$ with view v. If an honest party outputs some value from $GA_{v,6}$, then all honest parties must output the value from $GA_{v,5}$. Therefore, the recovering party q always sets $\bar{v}$ to be at least *v* that p observes, and p needs to send only messages of only the recent few views plus log contents to recovering parties ![[Pasted image 20230731081059.png]] ### Correctness of the Protocol We say a party is *up-to-date* with respect to view v at time t if the party has received all messages of v sent by time t - $\Delta$ by all honest parties **Lemma 6.1 (Recovery for Current View):** Any honest party awake at a global time within a view v or v + 1 is up-to-date w.r.t view v at that time. ## Conclusion and Future Directions Paper presented a BFT atomic broadcast protocol that simultaneously supports dynamic participation and achieves expected O($\Delta$) latency. Follows quorum-based design and uses the time-shifted quorum idea Only makes progress during periods of stable participation Open question whether it is possible to make progress during wildly fluctuating periods while achieving O($\Delta$) latency. Possible soln is to adopt multi-chain paradigm without longest-chain protocols. Though existing ones either require PoW or depend on $\kappa$ to decide between conflicting transactions