1. **Definition:**
- Byzantine consensus (or Byzantine Fault Tolerance - BFT) is a consensus mechanism aiming to provide agreement within a distributed system in the presence of malicious nodes (Byzantine nodes) that may act arbitrarily.
- Derived from the "Byzantine Generals Problem."
2. **Background: Byzantine Generals Problem:**
- Thought experiment: Several divisions of the Byzantine army are camped outside a city. They must decide unanimously to either attack or retreat.
- Communication only via messengers.
- Some generals might be traitors and send conflicting messages.
- Goal: Loyal generals must agree on a decision, even with the presence of traitors.
3. **Significance:**
- In distributed systems, nodes might fail or be compromised.
- Ensuring that the system continues to function correctly, even when some nodes are not trustworthy, is essential for security and reliability.
4. **Basic Requirements:**
- **Agreement:** Every correct node must agree on the same value.
- **Validity:** If all nodes propose the same value, that value must be chosen.
- **Termination:** All correct nodes decide on a value within a bounded period.
- **Fault Tolerance:** System should function even when a portion (typically up to 1/3) of the nodes are Byzantine.
5. **Classical Byzantine Consensus Algorithms:**
- **Practical Byzantine Fault Tolerance (PBFT):**
- Synchronous system.
- Uses a three-phase protocol: Pre-prepare, Prepare, Commit.
- Needs 2f + 1 replicas to tolerate f faulty nodes.
- Heavily reliant on message passing.
- **HoneyBadgerBFT:**
- Asynchronous system.
- Focuses on censorship resistance and privacy.
**Not [[dynamic participation]]**, but static participation
1. **Byzantine Consensus in Blockchains:**
- Given the decentralized and open nature of many blockchain networks, they're vulnerable to Byzantine behaviors.
- Examples: Bitcoin's Proof of Work (PoW) and Ethereum's transition to Proof of Stake (PoS).
- Both are mechanisms to achieve consensus, but they differ from traditional BFT models.
7. **Advantages of Byzantine Consensus:**
- **Security:** Able to handle malicious nodes.
- **Consistency:** Ensures that all nodes agree on the state of the system.
- **Availability:** System continues to function even with malicious nodes.
8. **Challenges:**
- **Scalability:** Traditional BFT protocols may not scale well with a large number of nodes due to the heavy communication overhead.
- **Network delays:** Asynchronous networks can face challenges in reaching consensus due to unpredictable delays.
- **Adaptive adversaries:** Some models assume a static set of Byzantine nodes, but real-world adversaries can be adaptive.
9. **Modern Variations and Developments:**
- **Hybrid models:** Combine BFT with other consensus mechanisms for better scalability.
- **Sharding:** Divide the network into smaller groups (shards) to process transactions in parallel.
- **Threshold cryptography:** Allows for data to be encrypted and split among nodes. A subset (threshold) is required to decrypt or execute operations, adding an extra layer of security.
10. **Applications:**
- **Distributed ledgers and blockchains:** e.g., Bitcoin, Ethereum.
- **Distributed databases:** Ensuring consistency across replicas.
- **Mission-critical systems:** Where system failure could lead to significant losses or hazards.
**Byzantine Failure Model:** Faulty nodes may behave arbitrarily, assuming independent node failures
Each node should run different implementations of code
Defined in [[Practical Byzantine Fault Tolerance]]
When there are Byzantine leaders, there is an unavoidable performance cost in both communication complexity and latency. When faced with an unlucky cascade of 𝑡 failures, Ω(𝑛^2) communication cost is mandated by the Dolev-Reischuk lower bound, and Ω(𝑛Δ) latency is mandated by the Aguilera-Toueg bound.
Before [[HotStuff]], the gold standard established by [[PBFT]] for handling steady-leader replacement incurred O(n^2) communication, so a cascade of failures would suffer O(n^3) communication cost
HotStuff leverages linear leader-replacement regime can close gap to get worst-case O(n^2) communication and O(n$\Delta$) latency