
# Known Limitations

An honest assessment of what the Xerberus methodology cannot do, where it may produce incomplete results, and what users should be aware of.

---

## Methodological Limitations

### K-Only Scope

The dendrogram measures containment but not probability or impact separately. A mechanism with a perfect containment score could still fail through an unprecedented attack vector. The dendrogram measures how well you've prepared for known risks; it cannot guarantee preparation for unknown ones.

### Historical Bias

The incident-backed requirement means the dendrogram can only measure risks that have materialised in the past. Genuinely novel attack vectors (failure modes that have never occurred in any form) have no place in the system until they happen once.

This is a deliberate trade-off, not an oversight. The alternative, scoring hypothetical risks, introduces subjectivity that undermines reproducibility (different evaluators imagine different scenarios). The incident requirement ensures that every scored risk has a known exploitation vector, which makes the defence against it concretely specifiable.

The `failure_mode_type: "analogous"` mechanism partially addresses this by allowing structurally similar incidents from different contexts. But truly unprecedented risks cannot be captured.

### Static Taxonomy

Templates are fixed structures updated manually. Real mechanisms evolve: new governance patterns emerge, novel liquidation architectures are invented, regulatory regimes shift. The dendrogram must be updated to incorporate new mechanism categories. There is no automatic discovery of new risks.

### Binary Gate Simplification

Gate propositions are YES/NO. Reality is more nuanced: a mechanism might partially exist, be in the process of being implemented, or exist in a degraded state. The evaluator must decide: does this mechanism exist enough to score? There is no "partially" gate.

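
The binary gate constraint can be sketched in a few lines. This is a minimal illustration, not the Xerberus implementation: the `ObservedState` enum, the `gate_answer` function, and the two evaluator policies are all hypothetical names introduced here. It shows how one mechanism in a degraded state yields a different YES/NO answer depending on where the evaluator draws the line.

```python
from enum import Enum


class ObservedState(Enum):
    """Hypothetical states a mechanism might be found in during evaluation."""
    ABSENT = "absent"
    IN_PROGRESS = "in_progress"
    PARTIAL = "partial"
    DEGRADED = "degraded"
    PRESENT = "present"


def gate_answer(state, counts_as_existing):
    """Collapse an observed state into the YES/NO gate answer.

    `counts_as_existing` encodes the evaluator's judgment call; the
    gate itself has no intermediate value.
    """
    return state in counts_as_existing


# Two hypothetical evaluator policies for the same gate proposition.
STRICT = frozenset({ObservedState.PRESENT})
LENIENT = frozenset({ObservedState.PRESENT, ObservedState.DEGRADED})

observed = ObservedState.DEGRADED
print("strict: ", "YES" if gate_answer(observed, STRICT) else "NO")
print("lenient:", "YES" if gate_answer(observed, LENIENT) else "NO")
```

The nuance is lost at the gate: whichever policy is chosen, the scorecard records only YES or NO.
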
### Aggregation Limitations

Mean aggregation treats all activated subscores as equally important within a domain. A critical safeguard failure (e.g., no audit) and a minor safeguard gap (e.g., suboptimal quorum threshold) contribute equally to the domain score. Domain weights can adjust the importance of broad domains, but they do not weight individual subscores inside a domain. Users concerned about specific critical safeguards should inspect subscores directly.

### Evaluator Subjectivity

While the tree structure reduces ambiguity, scoring quantitative subscores involves judgment. Different evaluators may score the same subscore differently. The [[governance/Evaluation Process|evaluation process]] provides calibration guidelines, but some variance is inherent. See [[design/Design Principles#7. Reproducibility]] for how the methodology addresses this.

### No Temporal Dimension

The dendrogram captures a snapshot. It does not track how mechanisms change over time, for example whether governance is becoming more decentralised or whether audit coverage is improving. Temporal analysis requires comparing scorecards across evaluation dates.

---

## Coverage Limitations

### Template Completeness

The dendrogram's promise of exhaustive mechanism mapping is only as good as the current template coverage. Templates are complete for the documented mechanism set, but new mechanism categories may still emerge. See [[reference/Build Status]] for the current state.

### Object Coverage

Not all assets, protocols, pools, and organisations are evaluated. Coverage is expanding but currently limited to a whitelisted registry. See [[reference/Build Status]] for the current evaluation count.

### Organisation Templates

Organisation-level templates (leadership, legal) are less mature than technical templates. This reflects the inherent difficulty of measuring organisational safeguards compared to on-chain mechanisms.

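
As a worked illustration of the point made under Aggregation Limitations above, the sketch below applies an unweighted mean to two hypothetical domains. The subscore names, the 0-100 scale, and the helper functions are assumptions made for this example and are not taken from the Xerberus templates. Because every activated subscore carries the same weight, the same shortfall in the audit subscore or in the quorum subscore produces the same domain score.

```python
from statistics import mean

# Hypothetical subscores on an assumed 0-100 scale.
MISSING_AUDIT = {"audit": 0, "quorum_threshold": 90, "timelock": 90, "upgrade_delay": 90}
WEAK_QUORUM = {"audit": 90, "quorum_threshold": 0, "timelock": 90, "upgrade_delay": 90}


def domain_score(subscores):
    """Unweighted mean over the activated subscores of one domain."""
    return mean(subscores.values())


def weakest_subscore(subscores):
    """Return the (name, value) pair with the lowest value."""
    return min(subscores.items(), key=lambda item: item[1])


for label, subscores in (("missing audit", MISSING_AUDIT), ("weak quorum", WEAK_QUORUM)):
    print(label, domain_score(subscores), weakest_subscore(subscores))
```

Both domains score identically, so only a direct look at the subscores, here via `weakest_subscore`, reveals which safeguard is actually failing.
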

---

## Structural Limitations

### No Cross-Object Dependencies

The dendrogram evaluates one object at a time. stETH's containment score does not reflect Lido's governance quality. Cross-object risk flows through the systemic risk layer (dependency graph), which is separate from the dendrogram.

### OR/AND Edge Cases

Some mechanism groupings don't cleanly fit AND or OR. A protocol might use two liquidation architectures for different markets. The OR-split forces a single choice, which may not perfectly represent hybrid implementations.

### Domain-Weight Bootstrap

Domain weights are currently equal in bootstrap. This means the current object score treats all domains in an object class as equally important, which may not reflect the actual risk distribution for every subject.

---

## What These Limitations Mean for Users

- **Don't treat scores as complete risk assessments.** They measure containment quality for known mechanism risks. Other risk factors (market risk, regulatory change, systemic contagion) require separate analysis.
- **Check coverage before citing.** Verify that the subject has been evaluated and that the relevant templates are complete. See [[reference/Build Status]].
- **Understand the snapshot.** A score reflects the state at evaluation time. Mechanisms may have changed since.

---

## Related

- [[reference/Build Status]]: Current coverage and template completeness
- [[design/Design Principles]]: Why these constraints exist
- [[governance/Methodology Updates]]: How limitations are addressed over time