# RPG (Research Process Graphs): Publication Scripts Vault

This vault documents the logic, methods, scripts, and outputs behind every figure of the *Plant Cell Research Process Graphs* manuscript. It is organised as one Karpathy-style mini-vault per figure.

## What this vault contains

A 20-year, structured atlas of *The Plant Cell* in which every paper is converted by an LLM into a typed, directed Research Process Graph (RPG) of Question (Q), Method (M) and Finding (F) nodes connected by Q→M and M→F edges. The pipeline was applied to **2,633 *Plant Cell* research articles published 2005-2024**, recovering **110,235 Q/M/F nodes** and **126,805 directed Q→M→F chains** at >98% precision. A second LLM pass generalises every node into a paper-independent canonical form and assigns it to a hierarchical taxonomy of 10 top-level (L1) and ~90 sub-level (L2) categories per node type. The atlas is released as a public, browsable database with five complementary interfaces.

For each figure of the paper, this vault holds the panel-by-panel logic, the methods, the scripts that built the panels, and the final assembled PDF.

## Figures

| Figure | Topic | Folder |
|--------|-------|--------|
| 1 | Pipeline, benchmarks, corpus stats, example RPG | [[Figure 1/README]] |
| 2 | Generalisation pipeline + hierarchical Q/M/F taxonomy + L1 coupling | [[Figure 2/README]] |
| 3 | Paper recipes + chain networks + PI specialisation + impact correlations | [[Figure 3/README]] |
| 4 | 20 years of methodological turnover + technique co-occurrence network | [[Figure 4/README]] |
| 5 | Public RPG database with five complementary interfaces | [[Figure 5/README]] |

Open any figure's README for an embedded PDF, panel-to-script mapping, navigation to wiki articles, and links to data and code.
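
As a reading aid for the RPG description under "What this vault contains", the sketch below models a single paper's graph as typed nodes with only Q→M and M→F edges allowed, and enumerates its Q→M→F chains. It is a minimal illustration, not the pipeline's actual data model: the class names, field names, identifier scheme, and the toy nodes are all hypothetical, and the manuscript's own chain-counting rule may differ from the simple enumeration used here.

```python
from dataclasses import dataclass, field

# Edge types permitted in an RPG, per the description above: Q→M and M→F.
ALLOWED_EDGES = {("Q", "M"), ("M", "F")}


@dataclass(frozen=True)
class Node:
    node_id: str  # hypothetical identifier scheme
    kind: str     # "Q", "M" or "F"
    text: str     # node content as extracted from the paper


@dataclass
class ResearchProcessGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: set = field(default_factory=set)    # (source_id, target_id)

    def add_node(self, node_id, kind, text):
        if kind not in {"Q", "M", "F"}:
            raise ValueError(f"unknown node type: {kind}")
        self.nodes[node_id] = Node(node_id, kind, text)

    def add_edge(self, src, dst):
        # Reject any edge that is not Q→M or M→F.
        pair = (self.nodes[src].kind, self.nodes[dst].kind)
        if pair not in ALLOWED_EDGES:
            raise ValueError(f"edge {pair[0]}->{pair[1]} is not allowed")
        self.edges.add((src, dst))

    def chains(self):
        """Return every Q→M→F chain as a sorted list of (q, m, f) id triples."""
        out = []
        for q, m in self.edges:
            if self.nodes[q].kind != "Q":
                continue
            for m2, f in self.edges:
                if m2 == m and self.nodes[f].kind == "F":
                    out.append((q, m, f))
        return sorted(out)


# Invented toy example, not taken from the corpus.
rpg = ResearchProcessGraph()
rpg.add_node("q1", "Q", "Does gene X affect root growth?")
rpg.add_node("m1", "M", "Mutant phenotyping")
rpg.add_node("m2", "M", "RNA-seq")
rpg.add_node("f1", "F", "Mutant roots are shorter")
rpg.add_node("f2", "F", "Auxin genes are down-regulated")
for src, dst in [("q1", "m1"), ("q1", "m2"), ("m1", "f1"), ("m2", "f2")]:
    rpg.add_edge(src, dst)

print(rpg.chains())
```

In this toy graph one question fans out into two methods, each yielding one finding, so two chains are enumerated. Because a node can take part in several chains, a corpus can contain more chains than nodes, which is consistent with the counts quoted above.
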

## How each figure folder is organised

Every `Figure N/` folder follows the same Karpathy-style layout:

```
Figure N/
├── README.md              # Landing page: embedded PDF, panel-to-script table, navigation
├── _index/
│   ├── Table of Contents.md
│   ├── Timeline.md        # Chronological log: milestones, conclusions, script paths
│   ├── Glossary.md
│   ├── Entity Index.md    # Datasets, models, key references used in this figure
│   └── Open Questions.md
├── _templates/            # YAML frontmatter templates for new pages
├── raw/                   # Source material: clipped papers, web resources (not synthesised)
├── wiki/
│   ├── concepts/          # Core ideas (e.g., Research Process Graph, Cohort percentile)
│   ├── methods/           # What we did (e.g., LLM extraction, k-means recipe clustering)
│   ├── tools/             # Third-party tools used (e.g., GPT-5, OpenAlex, UMAP)
│   └── project/           # Panel descriptions and figure-level overviews
└── outputs/
    ├── data/              # Filed tables or summary data
    ├── figures/           # The final figure PDF
    └── scripts/           # Per-panel scripts with .md companion notes
```

## Source repository

All extraction, taxonomy, and figure scripts live at `/Users/vjx443/Library/CloudStorage/[email protected]/My Drive/Projects/2026_RPG_PlantCell/`.

## Authors

Jing Yang and Manoj Itharajula (co-first authors); Marek Mutwil (corresponding). University of Copenhagen, Department of Plant and Environmental Sciences (Jing Yang, Marek Mutwil) and Nanyang Technological University, Singapore (Manoj Itharajula).