We are primarily interested in studying gene function and the evolution of organs, cell types and biological pathways with large-scale comparative omics analyses. Our aim is to combine the exponentially growing omics data (publicly available or generated by us in the [[Evolution]] and [[Abiotic stress]] projects) with recent advances in machine learning and artificial intelligence. This will allow us to produce accurate gene function predictions that help wet-lab biologists characterize genes experimentally.

![[SCR-20240825-kovt.png|600]]
*Gene expression data is growing at an exponential pace and is now available for hundreds of species*

We are now moving further into AI, with a focus on large language models and graph neural networks. We will explore how these methods can be used to predict various aspects of gene function in plants.

![[llms.png|600]]
*Large language models can represent biological sequences as vectors, enabling the development of gene function predictions*

## Tool development

A bioinformatics method is only as useful as it is accessible. Therefore, to make our methods, data and algorithms accessible to the scientific community, we developed several online databases (e.g., CoNekT, PEO, PlaNet, GeneCAT), algorithms (HCCA, TEA-GCN) and pipelines (e.g., the LSTrAP family of methods for analyzing large-scale RNA-seq data). They are listed below in chronological order:

1. [GeneCAT--novel webtools that combine BLAST and co-expression analyses.](https://pubmed.ncbi.nlm.nih.gov/18480120/) Mutwil M, Obro J, Willats WG, Persson S. _Nucleic Acids Res_ (IF: 16.97;Q1). 2008 Jul 1;36(Web Server issue):W320-6. doi: 10.1093/nar/gkn292. GeneCAT was the first tool that allowed comparative co-expression analysis in plants. Retired in 2017 after 10 years of service.
2. [PlaNet: combined sequence and expression comparisons across plant networks derived from seven species.](https://pubmed.ncbi.nlm.nih.gov/21441431/) Mutwil M, Klie S, Tohge T, Giorgi FM, Wilkins O, Campbell MM, Fernie AR, Usadel B, Nikoloski Z, Persson S. _Plant Cell_ (IF: 11.28;Q1). 2011 Mar;23(3):895-910. Published in 2011 in Plant Cell, PlaNet allows you to browse and compare co-expression networks of multiple species. Retired in 2020.
3. [FamNet: A Framework to Identify Multiplied Modules Driving Pathway Expansion in Plants.](https://pubmed.ncbi.nlm.nih.gov/26754669/) Ruprecht C, Mendrinna A, Tohge T, Sampathkumar A, Klie S, Fernie AR, Nikoloski Z, Persson S, Mutwil M. _Plant Physiol_ (IF: 8.34;Q1). 2016 Mar;170(3):1878-94. FamNet (extension of PlaNet). Published in 2016 in Plant Physiology, FamNet allows you to identify conserved (across species) and duplicated (within species) gene modules.
4. [Phylogenomic analysis of gene co-expression networks reveals the evolution of functional modules.](https://pubmed.ncbi.nlm.nih.gov/28161902/) Ruprecht C, Proost S, Hernandez-Coronado M, Ortiz-Ramirez C, Lang D, Rensing SA, Becker JD, Vandepoele K, Mutwil M. _Plant J_ (IF: 6.42;Q1). 2017 May;90(3):447-465. doi: 10.1111/tpj.13502. PhyloNet (extension of PlaNet). Published in 2017 in the Plant Journal. The tool combines phylostratigraphic and phylogenetic information with comparative co-expression network analyses to elucidate when gene modules were created and duplicated.
5. [Expression atlas and comparative coexpression network analyses reveal important genes involved in the formation of lignified cell wall in Brachypodium distachyon.](https://pubmed.ncbi.nlm.nih.gov/28617955/) Sibout R, Proost S, Hansen BO, Vaid N, Giorgi FM, Ho-Yue-Kuang S, Legée F, Cézart L, Bouchabké-Coussa O, Soulhat C, Provart N, Pasha A, Le Bris P, Roujol D, Hofte H, Jamet E, Lapierre C, Persson S, Mutwil M. _New Phytol_ (IF: 10.15;Q1). 2017 Aug;215(3):1009-1025. BrachyNet introduces an expression atlas for Brachypodium distachyon, an important model for grasses. We have also introduced an [eFP browser for Brachypodium](http://bar.utoronto.ca/efp_brachypodium/cgi-bin/efpWeb.cgi).
6. [LSTrAP: efficiently combining RNA sequencing data into co-expression networks.](https://pubmed.ncbi.nlm.nih.gov/29017446/) Proost S, Krawczyk A, Mutwil M. _BMC Bioinformatics_ (IF: 3.17;Q3). 2017 Oct 10;18(1):444. doi: 10.1186/s12859-017-1861-z. The tool allows you to produce co-expression networks from RNA sequencing data using a computer cluster.
7. [PhytoNet: comparative co-expression network analyses across phytoplankton and land plants.](https://pubmed.ncbi.nlm.nih.gov/29718316/) Ferrari C, Proost S, Ruprecht C, Mutwil M. _Nucleic Acids Res_ (IF: 16.97;Q1). 2018 Jul 2;46(W1):W76-W83. This tool introduced comparative co-expression analyses for 10 species of phytoplankton.
8. [CoNekT: an open-source framework for comparative genomic and transcriptomic network analyses.](https://pubmed.ncbi.nlm.nih.gov/29718322/) Proost S, Mutwil M. _Nucleic Acids Res_ (IF: 16.97;Q1). 2018 Jul 2;46(W1):W133-W140. CoNekT (Co-expression Network Toolkit, [www.conekt.plant.tools](http://www.conekt.plant.tools/)). CoNekT is an open-source platform for comparative co-expression analyses and includes diverse members of the Archaeplastida kingdom. The tool allows you to set up your own PlaNet-like database.
9. [Ensemble gene function prediction database reveals genes important for complex I formation in Arabidopsis thaliana.](https://pubmed.ncbi.nlm.nih.gov/29205376/) Hansen BO, Meyer EH, Ferrari C, Vaid N, Movahedi S, Vandepoele K, Nikoloski Z, Mutwil M. _New Phytol_ (IF: 10.15;Q1). 2018 Mar;217(4):1521-1534. doi: 10.1111/nph.14921. EnsembleNet provides powerful ensemble gene function predictions for Arabidopsis thaliana.
10. [Malaria.tools-comparative genomic and transcriptomic database for Plasmodium species.](https://pubmed.ncbi.nlm.nih.gov/31372645/) Tan QW, Mutwil M. _Nucleic Acids Res_ (IF: 16.97;Q1). 2020 Jan 8;48(D1):D768-D775. Malaria.tools (https://malaria.sbs.ntu.edu.sg/). The database provides comparative genomic and transcriptomic analyses to malaria researchers.
11. [Inferring biosynthetic and gene regulatory networks from Artemisia annua RNA sequencing data on a credit card-sized ARM computer.](https://pubmed.ncbi.nlm.nih.gov/31634636/) Tan QW, Mutwil M. _Biochim Biophys Acta Gene Regul Mech_ (IF: 4.49;Q2). 2020 Jun;1863(6):194429. doi: 10.1016/j.bbagrm.2019.194429. LSTrAP-Lite ([https://github.com/mutwil/LSTrAP-Lite](https://github.com/mutwil/LSTrAP-Lite)). This collection of scripts is conceptually based on the Large-Scale Transcriptome Analysis Pipeline (LSTrAP), but is designed to run on a credit card-sized ARM computer.
12. [Diurnal.plant.tools: Comparative Transcriptomic and Co-expression Analyses of Diurnal Gene Expression of the Archaeplastida Kingdom.](https://pubmed.ncbi.nlm.nih.gov/31501868/) Ng JWX, Tan QW, Ferrari C, Mutwil M. _Plant Cell Physiol_ (IF: 4.93;Q2). 2020 Jan 1;61(1):212-220. doi: 10.1093/pcp/pcz176. Diurnal.plant.tools (https://diurnal.sbs.ntu.edu.sg/). Published in 2019 in Plant and Cell Physiology, the database provides tools to study diurnal gene expression in 17 members of Archaeplastida.
13. [Fungi.guru: Comparative genomic and transcriptomic resource for the fungi kingdom.](https://pubmed.ncbi.nlm.nih.gov/33304470/) Lim JJJ, Koh J, Moo JR, Villanueva EMF, Putri DA, Lim YS, Seetoh WS, Mulupuri S, Ng JWZ, Nguyen NLU, Reji R, Foo H, Zhao MX, Chan TL, Rodrigues EE, Kairon RS, Hee KM, Chee NC, Low AD, Chen ZHX, Lim SC, Lunardi V, Fong TC, Chua CX, Koh KTS, Julca I, Delli-Ponti R, Ng JWX, Mutwil M. _Comput Struct Biotechnol J_ (IF: 7.27;Q1). 2020 Nov 20;18:3788-3795. doi: 10.1016/j.csbj.2020.11.019. Fungi.guru (www.fungi.guru). Published in 2020 in Computational and Structural Biotechnology Journal by BS1009 (Introduction to Computational Thinking) students, the database gives access to gene expression and genomic tools for the fungi kingdom.
14. [LSTrAP-Kingdom: an automated pipeline to generate annotated gene expression atlases for kingdoms of life.](https://pubmed.ncbi.nlm.nih.gov/33704421/) Goh W, Mutwil M. _Bioinformatics_ (IF: 6.94;Q1). 2021 Sep 29;37(18):3053-3055. LSTrAP-Kingdom (https://github.com/wirriamm/plants-pipeline). Published in 2021 in Bioinformatics by William Goh, an intern in our group during 2020-2021. The pipeline allows the download, quality control and annotation of gene expression atlases for kingdoms of life.
15. [Bacteria.guru: Comparative Transcriptomics and Co-Expression Database for Bacterial Pathogens.](https://pubmed.ncbi.nlm.nih.gov/34838806/) Lim PK, Davey EE, Wee S, Seetoh WS, Goh JC, Zheng X, Phang SKA, Seah ESK, Ng JWZ, Wee XJH, Quek AJH, Lim JJ, Rodrigues EE, Lee H, Lim CY, Tan WZ, Dan YR, Lee B, Chee SEL, Lim ZZE, Guan JS, Tan IJL, Arong TJ, Mutwil M. _J Mol Biol_ (IF: 5.47;Q1). 2022 Jun 15;434(11):167380. doi: 10.1016/j.jmb.2021.167380. Bacteria.guru (www.bacteria.guru). We provide a curated gene expression and co-expression database for 17 bacterial pathogens.
16. [Protist.guru: A Comparative Transcriptomics Database for Protists.](https://pubmed.ncbi.nlm.nih.gov/35389344/) Villanueva EMF, Lim PK, Lim JJJ, Lim SC, Lau PY, Koh KTS, Tan E, Kairon RS, See WA, Liao JX, Hee KM, Vijay V, Maitra I, Boon CJ, Fo K, Wang YT, Jaya R, Hew LA, Lim YY, Lee WQ, Lee ZQ, Foo H, Dos Santos AL, Mutwil M. _J Mol Biol_ (IF: 5.47;Q1). 2022 Jun 15;434(11):167502. Protist.guru (www.protist.guru). Here, we constructed an online database for 17 protists. Published in JMB.
17. [Feature importance network reveals novel functional relationships between biological features in _Arabidopsis thaliana_.](https://pubmed.ncbi.nlm.nih.gov/36212273/) Ng JWX, Chua SK, Mutwil M. _Front Plant Sci_ (IF: 5.75;Q1). 2022 Sep 23;13:944992. The FINder database (https://sweekwang.github.io/golabel/) allows you to study which gene features (e.g., protein length, gene family age, maximum expression) are associated with each other.
18. [LSTrAP-denovo: Automated Generation of Transcriptome Atlases for Eukaryotic Species Without Genomes.](https://pubmed.ncbi.nlm.nih.gov/38973613/) Lim PK, Wang R, Mutwil M. _Physiol Plant_ (IF: 4.5;Q1). 2024 Jul-Aug;176(4):e14407. LSTrAP-denovo (https://github.com/pengkenlim/LSTrAP-denovo/) is an automated transcriptome assembly pipeline that makes it easy to obtain gene expression matrices for species without a genome.
19. [PEO: Plant Expression Omnibus - a comparative transcriptomic database for 103 Archaeplastida.](https://pubmed.ncbi.nlm.nih.gov/38050352/) Koh E, Goh W, Julca I, Villanueva E, Mutwil M. _Plant J_ (IF: 6.42;Q1). 2024 Mar;117(5):1592-1603. doi: 10.1111/tpj.16566. Plant Expression Omnibus (https://expression.plant.tools/) is a new gene expression database focused on providing comparative expression tools for hundreds of species.
20. [PlantConnectome: knowledge networks encompassing >100,000 plant article abstracts](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=1WY1xDwAAAAJ&cstart=20&pagesize=80&sortby=pubdate&citation_for_view=1WY1xDwAAAAJ:fQNAKQ3IYiAC) K Fo, YS Chuah, H Foo, EE Davey, M Fullwood, G Thibault, M Mutwil. bioRxiv, 2023.07.11.548541. PlantConnectome (http://plant.connectome.tools/) contains knowledge graphs representing relationships between genes, metabolites, organs, treatments and other entities. In revision.
21. [Constructing Ensemble Gene Functional Networks Capturing Tissue/condition-specific Co-expression from Unlabled Transcriptomic Data with TEA-GCN](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=1WY1xDwAAAAJ&sortby=pubdate&citation_for_view=1WY1xDwAAAAJ:eq2jaN3J8jMC) PK Lim, R Wang, JP Antony Velankanni, M Mutwil. bioRxiv, 2024.07.22.604713. TEA-GCN: Two-Tier Ensemble Aggregation Gene Co-expression Network (https://github.com/pengkenlim/TEA-GCN) is a new method that achieves state-of-the-art performance in generating co-expression and gene regulatory networks. In review.