# Kingdom Stress Atlas: Publication Scripts Vault

This vault documents the logic, methods, scripts, and outputs behind every figure of the Kingdom Stress Atlas manuscript. It is organized as one Karpathy-style mini-vault per figure.

## What this vault contains

A cross-species stress transcriptomics resource covering RNA-seq experiments across 36 plant species (monocots, dicots, gymnosperms, lycophytes, bryophytes, charophytes, chlorophytes) and 9 stresses (Heat, Cold, Drought, Salt, High light, Pathogen, Flooding, Heavy metal, Herbivory).

For each figure of the paper, this vault holds the panel-by-panel logic, the methods, the scripts that built the panels, and the final assembled PDF.

## Figures

| Figure | Topic | Folder |
|--------|-------|--------|
| 1 | Dataset overview: phylogeny + experiment count heatmap, pipeline, marker validation | [[Figure 1/README]] |
| 2 | Gene family analysis: conservation, functional composition, phylostratigraphic age | [[Figure 2/README]] |
| 3 | Stress response conservation across organs, clades, phylogenetic distance, stress types | [[Figure 3/README]] |
| 4 | MapMan co-occurrence networks: direction bias, stress specificity, hormone pathways | [[Figure 4/README]] |
| 5 | Conserved stress co-expression modules and regulatory subfunctionalization via duplication | [[Figure 5/README]] |
| 6 | Cis-regulatory sequence prediction with CNN, PlantCAD2, PlantRNA-FM, and interpretable models | [[Figure 6/README]] |

Open any figure's README for an embedded PDF, panel-to-script mapping, navigation to wiki articles, and links to supplemental data.

## How each figure folder is organized

Every `Figure N/` folder follows the same Karpathy-style layout:

```
Figure N/
├── README.md                # Landing page: embedded PDF, panel-to-script table, navigation
├── _index/
│   ├── Table of Contents.md
│   ├── Timeline.md          # Chronological project log: milestones, conclusions, script paths
│   ├── Glossary.md
│   ├── Entity Index.md      # Species, stresses, datasets used in this figure
│   └── Open Questions.md
├── _templates/              # YAML frontmatter templates for new pages
├── raw/                     # Source material: clipped papers, web resources (not synthesized)
├── wiki/
│   ├── concepts/            # Core ideas (e.g., Overlap Coefficient, Orthogroup)
│   ├── methods/             # What we did (e.g., Cross-clade OC analysis, LUMI training setup)
│   ├── tools/               # Third-party tools used (e.g., OrthoFinder, PlantCAD2)
│   └── project/             # High-level project pages incl. Panel Overview
└── outputs/
    ├── figures/             # Figure N.pdf + PNG
    ├── scripts/             # .py / .R / .slurm + .md wrappers for Obsidian Publish
    ├── tables/              # Supplemental tables tied to this figure
    └── data_summaries/      # Intermediate CSVs used to draw the panels
```

## Reading conventions

- **raw/ vs wiki/**: `raw/` holds source material (papers, web clips, screenshots); `wiki/` holds synthesized, cross-linked articles. Never mix them.
- **YAML frontmatter**: every wiki article has `type` (concept | method | tool | project), `aliases`, `created`, `updated`, `status` (stub | draft | complete), `tags`, and `sources`.
- **Wikilinks**: pages cross-reference each other via `[[Page Name]]`. A `[[name]]` that does not yet exist is a deliberate stub marker, not an error.
- **Embedded PDFs**: figure PDFs are embedded in each figure's `README.md` via `![[Figure N.pdf]]` so they render directly in Obsidian Publish.
- **Scripts as wiki pages**: every `.py` / `.R` / `.slurm` in `outputs/scripts/` has a sibling `.md` wrapper containing the source in a fenced code block. Wikilinks like `[[plot_panelA]]` resolve to the `.md` wrapper.
- **Timeline as project log**: `_index/Timeline.md` records milestones (not every small change), each with a brief conclusion, the script that produced it, and the output location.

## Where to find what

| Looking for… | Go to |
|---|---|
| The final figure PDF | `Figure N/outputs/figures/Figure N.pdf` |
| Which script generated which panel | `Figure N/README.md` (panel table) |
| Method details | `Figure N/wiki/methods/` |
| Concept definitions | `Figure N/wiki/concepts/` (or `Glossary.md`) |
| Tools used | `Figure N/wiki/tools/` |
| Supplemental tables for that figure | `Figure N/outputs/tables/` |
| The full source code of a script | `Figure N/outputs/scripts/<name>.md` (rendered) or `.py` (raw) |
| When something was done | `Figure N/_index/Timeline.md` |
| Open issues / unresolved questions | `Figure N/_index/Open Questions.md` |

## Published artifacts (outside the vault)

| Artifact | Where |
|---|---|
| Manuscript | (in preparation) |
| Scripts (raw `.py` / `.slurm`) | [github.com/mutwil/KingdomStress](https://github.com/mutwil/KingdomStress) |
| Trained model weights (Figure 6) | Supplementary Dataset 5 (figshare) |
| Per-gene feature table (1.7 M genes, Figure 6) | Supplementary Dataset 6 (figshare) |
| Co-expression networks (Figure 5) | Supplementary Dataset 3 (figshare) |
| Module assignments (Figure 5) | Supplementary Dataset 4 (figshare) |

## Pipeline (paper-wide)

- **Quantification**: Kallisto (LSTRAP-Cloud for public RNA-seq data)
- **Differential expression**: DESeq2 (|log2FC| > 1, adjusted P < 0.05)
- **Orthogroups**: OrthoFinder across 36 species (275,222 orthogroups)
- **Functional annotation**: TAIR GO biological process terms; Mercator/MapMan bins
- **Co-expression**: TEA-GCN per species
- **Modules**: Louvain community detection
- **Sequence prediction**: CNN baseline + PlantCAD2 (DNA LLM) + PlantRNA-FM (RNA LLM), late fusion, GBM interpretable models

## Citation

Please cite the Kingdom Stress Atlas paper (in preparation).
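
## Example: generating script wrappers (sketch)

The "Scripts as wiki pages" convention under Reading conventions pairs each script in `outputs/scripts/` with a sibling `.md` wrapper. The sketch below shows one way such wrappers could be generated. It is not the vault's actual tooling: the function names `write_wrapper` and `wrap_all`, the extension-to-language mapping, and the wrapper layout (a title line followed by one fenced block) are all assumptions made for illustration.

```python
from pathlib import Path

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# Assumed mapping from script extension to fence language tag (hypothetical).
FENCE_LANG = {".py": "python", ".R": "r", ".slurm": "bash"}


def write_wrapper(script: Path) -> Path:
    """Write `<stem>.md` next to `script`, embedding its source in a fenced block."""
    lang = FENCE_LANG[script.suffix]
    source = script.read_text(encoding="utf-8").rstrip("\n")
    # Assumed wrapper layout: a title line, then the source in one fenced block.
    body = f"# {script.name}\n\n{FENCE}{lang}\n{source}\n{FENCE}\n"
    wrapper = script.with_suffix(".md")
    wrapper.write_text(body, encoding="utf-8")
    return wrapper


def wrap_all(scripts_dir: Path) -> list[Path]:
    """Create wrappers for every supported script directly inside `scripts_dir`."""
    return [
        write_wrapper(p)
        for p in sorted(scripts_dir.iterdir())
        if p.is_file() and p.suffix in FENCE_LANG
    ]
```

Calling `wrap_all(Path("Figure 1/outputs/scripts"))` would write one wrapper per script in that folder. Two caveats of this sketch: scripts that share a stem (for example `plot.py` and `plot.R`) would overwrite each other's wrapper, and a script whose source itself contains a triple-backtick line would need a longer fence.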