# Kingdom Stress Atlas — Publication Scripts Vault
This vault documents the logic, methods, scripts, and outputs behind every figure of the Kingdom Stress Atlas manuscript. It is organized as one Karpathy-style mini-vault per figure.
## What this vault contains
The Kingdom Stress Atlas is a cross-species stress transcriptomics resource covering RNA-seq experiments across 36 plant species (monocots, dicots, gymnosperms, lycophytes, bryophytes, charophytes, chlorophytes) and 9 stresses (Heat, Cold, Drought, Salt, High light, Pathogen, Flooding, Heavy metal, Herbivory). For each figure of the paper, this vault holds the panel-by-panel logic, the methods, the scripts that built the panels, and the final assembled PDF.
## Figures
| Figure | Topic | Folder |
|--------|-------|--------|
| 1 | Dataset overview: phylogeny + experiment count heatmap, pipeline, marker validation | [[Figure 1/README]] |
| 2 | Gene family analysis: conservation, functional composition, phylostratigraphic age | [[Figure 2/README]] |
| 3 | Stress response conservation across organs, clades, phylogenetic distance, stress types | [[Figure 3/README]] |
| 4 | MapMan co-occurrence networks: direction bias, stress specificity, hormone pathways | [[Figure 4/README]] |
| 5 | Conserved stress co-expression modules and regulatory subfunctionalization via duplication | [[Figure 5/README]] |
| 6 | Cis-regulatory sequence prediction with CNN, PlantCAD2, PlantRNA-FM, and interpretable models | [[Figure 6/README]] |
Open any figure's README for an embedded PDF, panel-to-script mapping, navigation to wiki articles, and links to supplemental data.
## How each figure folder is organized
Every `Figure N/` folder follows the same Karpathy-style layout:
```
Figure N/
├── README.md               # Landing page: embedded PDF, panel-to-script table, navigation
├── _index/
│   ├── Table of Contents.md
│   ├── Timeline.md         # Chronological project log: milestones, conclusions, script paths
│   ├── Glossary.md
│   ├── Entity Index.md     # Species, stresses, datasets used in this figure
│   └── Open Questions.md
├── _templates/             # YAML frontmatter templates for new pages
├── raw/                    # Source material: clipped papers, web resources (not synthesized)
├── wiki/
│   ├── concepts/           # Core ideas (e.g., Overlap Coefficient, Orthogroup)
│   ├── methods/            # What we did (e.g., Cross-clade OC analysis, LUMI training setup)
│   ├── tools/              # Third-party tools used (e.g., OrthoFinder, PlantCAD2)
│   └── project/            # High-level project pages incl. Panel Overview
└── outputs/
    ├── figures/            # Figure N.pdf + PNG
    ├── scripts/            # .py / .R / .slurm + .md wrappers for Obsidian Publish
    ├── tables/             # Supplemental tables tied to this figure
    └── data_summaries/     # Intermediate CSVs used to draw the panels
```
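Because every figure folder is meant to follow the same layout, a small check can flag folders that have drifted from it. The sketch below is illustrative and not part of the vault: the function name `missing_entries` is hypothetical, and the expected paths are simply transcribed from the tree above.

```python
from pathlib import Path

# Expected sub-paths, transcribed from the layout above
# (all relative to one "Figure N/" folder).
EXPECTED_DIRS = [
    "_index", "_templates", "raw",
    "wiki/concepts", "wiki/methods", "wiki/tools", "wiki/project",
    "outputs/figures", "outputs/scripts", "outputs/tables",
    "outputs/data_summaries",
]
EXPECTED_FILES = [
    "README.md",
    "_index/Table of Contents.md",
    "_index/Timeline.md",
    "_index/Glossary.md",
    "_index/Entity Index.md",
    "_index/Open Questions.md",
]


def missing_entries(figure_dir):
    """Return the expected relative paths absent from figure_dir, sorted."""
    root = Path(figure_dir)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return sorted(missing)
```

Calling `missing_entries("Figure 3")` from the vault root would return an empty list for a complete folder, or the relative paths that still need to be created.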
## Reading conventions
- **raw/ vs wiki/** — `raw/` holds source material (papers, web clips, screenshots); `wiki/` holds synthesized, cross-linked articles. Never mix them.
- **YAML frontmatter** — every wiki article has `type` (concept | method | tool | project), `aliases`, `created`, `updated`, `status` (stub | draft | complete), `tags`, and `sources`.
- **Wikilinks** — pages cross-reference each other via `[[Page Name]]`. A `[[name]]` that does not yet exist is a deliberate stub marker, not an error.
- **Embedded PDFs** — figure PDFs are embedded in each figure's `README.md` via `![[Figure N.pdf]]` so they render directly in Obsidian Publish.
- **Scripts as wiki pages** — every `.py` / `.R` / `.slurm` in `outputs/scripts/` has a sibling `.md` wrapper containing the source in a fenced code block. Wikilinks like `[[plot_panelA]]` resolve to the `.md` wrapper.
- **Timeline as project log** — `_index/Timeline.md` records milestones (not every small change), with a brief conclusion, the script that produced it, and the output location.
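The frontmatter convention above lends itself to a lightweight lint. The following is a minimal sketch, assuming frontmatter is delimited by `---` lines and that `type` and `status` are plain scalar values with no trailing comments. The function name `frontmatter_problems` is hypothetical, and the parser reads only top-level keys rather than full YAML.

```python
import re

# Keys and allowed values transcribed from the frontmatter convention above.
REQUIRED_KEYS = ("type", "aliases", "created", "updated", "status", "tags", "sources")
ALLOWED = {
    "type": {"concept", "method", "tool", "project"},
    "status": {"stub", "draft", "complete"},
}
# Matches an unindented "key: value" line; the value may be empty.
KEY_LINE = re.compile(r"^([A-Za-z_][\w-]*):\s*(.*)$")


def frontmatter_problems(text):
    """Return a list of human-readable problems with a page's frontmatter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["no frontmatter block"]
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        return ["frontmatter block is not closed"]

    # Collect top-level keys only; list items and nested lines do not match.
    fields = {}
    for line in lines[1:end]:
        match = KEY_LINE.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip().strip("'\"")

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in fields]
    for key, allowed in ALLOWED.items():
        if key in fields and fields[key] not in allowed:
            problems.append(f"invalid {key}: {fields[key]!r}")
    return problems
```

Running this over the text of each page under `wiki/` would list pages with missing keys or an unrecognized `type` or `status`. A full YAML parser would be the more robust choice if the frontmatter grows more complex.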
## Where to find what
| Looking for… | Go to |
|---|---|
| The final figure PDF | `Figure N/outputs/figures/Figure N.pdf` |
| Which script generated which panel | `Figure N/README.md` (panel table) |
| Method details | `Figure N/wiki/methods/` |
| Concept definitions | `Figure N/wiki/concepts/` (or `Glossary.md`) |
| Tools used | `Figure N/wiki/tools/` |
| Supplemental tables for that figure | `Figure N/outputs/tables/` |
| The full source code of a script | `Figure N/outputs/scripts/<name>.md` (rendered wrapper) or the sibling `.py` / `.R` / `.slurm` file (raw) |
| When something was done | `Figure N/_index/Timeline.md` |
| Open issues / unresolved questions | `Figure N/_index/Open Questions.md` |
## Published artifacts (outside the vault)
| Artifact | Where |
|---|---|
| Manuscript | (in preparation) |
| Scripts (raw `.py` / `.slurm`) | [github.com/mutwil/KingdomStress](https://github.com/mutwil/KingdomStress) |
| Trained model weights (Figure 6) | Supplementary Dataset 5 (figshare) |
| Per-gene feature table (1.7 M genes, Figure 6) | Supplementary Dataset 6 (figshare) |
| Co-expression networks (Figure 5) | Supplementary Dataset 3 (figshare) |
| Module assignments (Figure 5) | Supplementary Dataset 4 (figshare) |
## Pipeline (paper-wide)
- **Quantification**: Kallisto (LSTRAP-Cloud for public RNA-seq data)
- **Differential expression**: DESeq2 (|log2FC| > 1, adjusted P < 0.05)
- **Orthogroups**: OrthoFinder across 36 species (275,222 orthogroups)
- **Functional annotation**: TAIR GO biological process terms; Mercator/MapMan bins
- **Co-expression**: TEA-GCN per species
- **Modules**: Louvain community detection
- **Sequence prediction**: CNN baseline + PlantCAD2 (DNA LLM) + PlantRNA-FM (RNA LLM), late fusion, GBM interpretable models
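To make the differential expression thresholds listed above concrete, here is a minimal sketch of how one gene could be labelled from its log2 fold change and adjusted P value. It is not the pipeline's actual code: the function name `classify_gene`, the `up` / `down` / `ns` labels, and the example rows are all hypothetical, and treating a missing adjusted P value as not significant is an assumption.

```python
import math

LOG2FC_CUTOFF = 1.0   # |log2FC| > 1, as stated in the pipeline summary
PADJ_CUTOFF = 0.05    # adjusted P < 0.05


def classify_gene(log2fc, padj):
    """Label one gene as 'up', 'down', or 'ns' (not significant)."""
    # DESeq2 reports NA for some genes (e.g., after independent filtering);
    # treating None/NaN as not significant is an assumption of this sketch.
    for value in (log2fc, padj):
        if value is None or math.isnan(value):
            return "ns"
    if padj < PADJ_CUTOFF and abs(log2fc) > LOG2FC_CUTOFF:
        return "up" if log2fc > 0 else "down"
    return "ns"


# Hypothetical rows for illustration only: (gene id, log2FC, adjusted P).
example_rows = [
    ("geneA", 2.3, 0.001),
    ("geneB", -1.8, 0.020),
    ("geneC", 0.4, 0.001),         # significant, but below the fold-change cutoff
    ("geneD", 3.1, 0.200),         # large change, but not significant
    ("geneE", 1.5, float("nan")),  # adjusted P missing
]
labels = {gene: classify_gene(lfc, padj) for gene, lfc, padj in example_rows}
print(labels)
# {'geneA': 'up', 'geneB': 'down', 'geneC': 'ns', 'geneD': 'ns', 'geneE': 'ns'}
```

Both inequalities are strict, so a gene with a log2 fold change of exactly 1 is not called differentially expressed.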
## Citation
Please cite the Kingdom Stress Atlas paper (in preparation).