1. **Definition:** - Byzantine consensus (or Byzantine Fault Tolerance - BFT) is a consensus mechanism aiming to provide agreement within a distributed system in the presence of malicious nodes (Byzantine nodes) that may act arbitrarily. - Derived from the "Byzantine Generals Problem." 2. **Background: Byzantine Generals Problem:** - Thought experiment: Several divisions of the Byzantine army are camped outside a city. They must decide unanimously to either attack or retreat. - Communication only via messengers. - Some generals might be traitors and send conflicting messages. - Goal: Loyal generals must agree on a decision, even with the presence of traitors. 3. **Significance:** - In distributed systems, nodes might fail or be compromised. - Ensuring that the system continues to function correctly, even when some nodes are not trustworthy, is essential for security and reliability. 4. **Basic Requirements:** - **Agreement:** Every correct node must agree on the same value. - **Validity:** If all nodes propose the same value, that value must be chosen. - **Termination:** All correct nodes decide on a value within a bounded period. - **Fault Tolerance:** System should function even when a portion (typically up to 1/3) of the nodes are Byzantine. 5. **Classical Byzantine Consensus Algorithms:** - **Practical Byzantine Fault Tolerance (PBFT):** - Synchronous system. - Uses a three-phase protocol: Pre-prepare, Prepare, Commit. - Needs 2f + 1 replicas to tolerate f faulty nodes. - Heavily reliant on message passing. - **HoneyBadgerBFT:** - Asynchronous system. - Focuses on censorship resistance and privacy. **Not [[dynamic participation]]**, but static participation 1. **Byzantine Consensus in Blockchains:** - Given the decentralized and open nature of many blockchain networks, they're vulnerable to Byzantine behaviors. - Examples: Bitcoin's Proof of Work (PoW) and Ethereum's transition to Proof of Stake (PoS). - Both are mechanisms to achieve consensus, but they differ from traditional BFT models. 7. **Advantages of Byzantine Consensus:** - **Security:** Able to handle malicious nodes. - **Consistency:** Ensures that all nodes agree on the state of the system. - **Availability:** System continues to function even with malicious nodes. 8. **Challenges:** - **Scalability:** Traditional BFT protocols may not scale well with a large number of nodes due to the heavy communication overhead. - **Network delays:** Asynchronous networks can face challenges in reaching consensus due to unpredictable delays. - **Adaptive adversaries:** Some models assume a static set of Byzantine nodes, but real-world adversaries can be adaptive. 9. **Modern Variations and Developments:** - **Hybrid models:** Combine BFT with other consensus mechanisms for better scalability. - **Sharding:** Divide the network into smaller groups (shards) to process transactions in parallel. - **Threshold cryptography:** Allows for data to be encrypted and split among nodes. A subset (threshold) is required to decrypt or execute operations, adding an extra layer of security. 10. **Applications:** - **Distributed ledgers and blockchains:** e.g., Bitcoin, Ethereum. - **Distributed databases:** Ensuring consistency across replicas. - **Mission-critical systems:** Where system failure could lead to significant losses or hazards. **Byzantine Failure Model:** Faulty nodes may behave arbitrarily, assuming independent node failures Each node should run different implementations of code Defined in [[Practical Byzantine Fault Tolerance]] When there are Byzantine leaders, there is an unavoidable performance cost in both communication complexity and latency. When faced with an unlucky cascade of 𝑡 failures, Ω(𝑛^2) communication cost is mandated by the Dolev-Reischuk lower bound, and Ω(𝑛Δ) latency is mandated by the Aguilera-Toueg bound. Before [[HotStuff]], the gold standard established by [[PBFT]] for handling steady-leader replacement incurred O(n^2) communication, so a cascade of failures would suffer O(n^3) communication cost HotStuff leverages linear leader-replacement regime can close gap to get worst-case O(n^2) communication and O(n$\Delta$) latency