*Part of [[Our research]] · Pillar 1 of 3 (Core) · See also [[Stress resilience]] · [[Evolution of regulatory circuits]]*

**Graph-grounded AI & knowledge graphs: the computational core of the lab.**

Understanding what plant genes do is essential for tackling global food security, sustainable agriculture, and environmental change. When we know a gene's function, we can use that knowledge to improve crops by increasing disease resistance, lowering the energy cost of growth, or reducing the need for fertilisers. Yet for the majority of plant genes, we still lack clear functional information. Closing this **gene knowledge gap** is the central mission of the lab.

![[three_pillars.png|700]]
*Three pillars: a computational core (gene function prediction) and two biological testbeds ([[Stress resilience]] and [[Evolution of regulatory circuits]]) that ground predictions in mechanism and evolution.*

Our long-term goal is to move from *lists of candidate genes* to *actionable biological understanding* by combining three types of evidence:

1. **Patterns in large-scale data**: gene expression across tissues, conditions, and species.
2. **Direct evidence from the literature**: millions of statements extracted from plant biology papers.
3. **Evolutionary comparisons**: conservation and innovation across hundreds of plant species.

We integrate these sources using modern AI into predictions that are **transparent, testable, and evidence-linked**, with each predicted function traceable to the data and publications that support it.

### Three intertwined directions

1. **Build plant knowledge graphs.** [PlantConnectome](http://plant.connectome.tools/) ([Plant Cell, 2025](https://pmc.ncbi.nlm.nih.gov/articles/PMC12290883/)) is our flagship resource: a large-scale plant biology knowledge graph extracted from >71,000 articles, connecting genes, metabolites, organs, treatments, phenotypes and conditions. It enables evidence-linked prediction, reasoning, and benchmarking across species.
2. **Infer mechanisms with graph & symbolic reasoning.** Co-expression, regulatory and knowledge graphs are combined with graph neural networks and symbolic reasoning to propose mechanistic hypotheses: not just "gene X is related to Y" but *how*. This includes multimodal integration of transcriptomics with protein domains and structure, subcellular localisation, and (where available) single-cell data.
3. **Train integrative, explainable predictors.** Large language models, GNNs and ensemble methods are trained across modalities: sequence, expression, literature, structure. Predictions come with provenance (which evidence and which experimental conditions support them), which helps prioritise targets for experimental validation with PLEN partners.

![[llms.png|600]]
*Large language models represent biological sequences and the literature as vectors, enabling new families of explainable gene function predictors.*

### A 17-year trajectory of tool development

A bioinformatics method is only as useful as its accessibility. We have built and openly released a coherent line of databases, algorithms and pipelines, moving from data → networks → interpretable prediction → evidence-linked knowledge:

- **[PlantConnectome](http://plant.connectome.tools/)**: plant biology knowledge graph (>71,000 articles; [Plant Cell, 2025](https://pmc.ncbi.nlm.nih.gov/articles/PMC12290883/)).
- **TEA-GCN**: Two-Tier Ensemble Aggregation Gene Co-expression Networks ([Nature Communications, 2026](https://www.nature.com/); [code](https://github.com/pengkenlim/TEA-GCN)). State-of-the-art tissue/condition-aware co-expression and gene-regulatory networks.
- **[PEO (Plant Expression Omnibus)](https://expression.plant.tools/)**: comparative transcriptomic database for 103 Archaeplastida ([Plant J, 2024](https://pubmed.ncbi.nlm.nih.gov/38050352/)).
- **[LSTrAP-denovo](https://github.com/pengkenlim/LSTrAP-denovo/)**: automated transcriptome atlases for species without genomes ([Physiol Plant, 2024](https://pubmed.ncbi.nlm.nih.gov/38973613/)).
- **[CoNekT](http://www.conekt.plant.tools/)**: open-source framework for comparative co-expression analyses across Archaeplastida ([NAR, 2018](https://pubmed.ncbi.nlm.nih.gov/29718322/)).
- **Kingdom-specific atlases**: Diurnal.plant.tools, [Fungi.guru](http://www.fungi.guru/), [Bacteria.guru](http://www.bacteria.guru/), [Protist.guru](http://www.protist.guru/), [Malaria.tools](https://malaria.sbs.ntu.edu.sg/).
- **LSTrAP family**: pipelines for processing public RNA-seq into curated atlases ([LSTrAP](https://pubmed.ncbi.nlm.nih.gov/29017446/), [LSTrAP-Kingdom](https://github.com/wirriamm/plants-pipeline), LSTrAP-Lite for ARM).
- **EnsembleNet, FINder, PhyloNet, FamNet, BrachyNet, PhytoNet, PlaNet, GeneCAT**: earlier generations of comparative and ensemble gene-function tools that seeded the field.

### Representative recent papers

1. *PlantConnectome: A knowledge graph database encompassing >71,000 plant articles.* [Plant Cell, 2025](https://pmc.ncbi.nlm.nih.gov/articles/PMC12290883/).
2. *Constructing gene co-functional and co-regulatory networks from public transcriptomes using condition-specific ensemble co-expression* (TEA-GCN). *Nature Communications*, 2026.
3. *The gene function prediction challenge: Large language models and knowledge graphs to the rescue.* [Curr Opin Plant Biol, 2024](https://www.sciencedirect.com/science/article/abs/pii/S1369526624001560).
4. *LSTrAP-denovo: Automated generation of transcriptome atlases for eukaryotic species without genomes.* [Physiol Plant, 2024](https://pubmed.ncbi.nlm.nih.gov/38973613/).
5. *PEO: Plant Expression Omnibus - a comparative transcriptomic database for 103 Archaeplastida.* [Plant J, 2024](https://pubmed.ncbi.nlm.nih.gov/38050352/).
6. *Ensemble gene function prediction database reveals genes important for complex I formation in Arabidopsis thaliana.* [New Phytol, 2018](https://pubmed.ncbi.nlm.nih.gov/29205376/).

A full list lives on [[Publications]].