# NCBI GEO: Gene Expression Omnibus

## Overview

The Gene Expression Omnibus (GEO) is a public repository for high-throughput functional genomics data, operated by NCBI/NIH since 2000. It archives and freely distributes microarray, next-generation sequencing (NGS), and other forms of high-throughput functional genomic data from submitted studies. Many journals mandate GEO (or equivalent) submission as a condition of publication.

GEO is a fully open-access repository: all deposited data is publicly downloadable without access restrictions. It is not suitable for individual-level human genomic or phenotypic data where privacy, consent, or ethical approval conditions restrict open release (such as GWAS, WGS, or WES from patient cohorts). Such data must instead be deposited in a controlled-access repository such as [[dbGaP]] (NIH/US) or [[EGA]] (EMBL-EBI/CRG, Europe). GEO may still be used to share derived or aggregate-level results if consent and ethics approval permit.

## Data Types Accepted

GEO accepts microarray data (gene expression, SNP genotyping, CGH arrays, ChIP-on-chip), RNA-seq (bulk, single-cell, spatial transcriptomics), ChIP-seq, ATAC-seq, bisulfite sequencing for DNA methylation, and other functional genomics types including CLIP-seq, Ribo-seq, and Hi-C.

## Structure

GEO organises data into three levels:

- A Platform (GPL) record describes the array or sequencer used.
- A Sample (GSM) record represents a single biological sample with its raw and processed data.
- A Series (GSE) record is a collection of samples forming a study and is the primary submission unit.

Raw sequencing data ([[FASTQ]] files) associated with GEO submissions are archived in SRA (Sequence Read Archive), NCBI's companion repository for raw reads, cross-linked from GEO series records.
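
The three record levels can be told apart by their accession prefix. The following is a minimal sketch, not an official GEO client: the function names `classify_accession` and `record_url` are hypothetical, and the `acc.cgi` query URL pattern is an assumption not stated in this note, taken from how GEO record pages are commonly addressed.

```python
import re

# Accession prefixes for the three record levels described above.
RECORD_TYPES = {"GPL": "Platform", "GSM": "Sample", "GSE": "Series"}

# An accession is a known prefix followed by one or more digits, e.g. "GSE12345".
_ACCESSION_RE = re.compile(r"(GPL|GSM|GSE)(\d+)")


def classify_accession(accession: str) -> str:
    """Return 'Platform', 'Sample', or 'Series' for a GEO accession string."""
    match = _ACCESSION_RE.fullmatch(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GPL/GSM/GSE accession: {accession!r}")
    return RECORD_TYPES[match.group(1)]


def record_url(accession: str) -> str:
    """Build a record page URL (assumed acc.cgi pattern) for a valid accession."""
    classify_accession(accession)  # raises ValueError on malformed input
    normalised = accession.strip().upper()
    return f"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={normalised}"
```

For example, `classify_accession("GSE12345")` returns `"Series"`, and `record_url("gpl570")` returns `"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL570"`. Validating the prefix before building a URL keeps malformed identifiers from reaching a download step.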

## International Partners (INSDC)

GEO itself is not a member database of the International Nucleotide Sequence Database Collaboration (INSDC), but it is connected to it through SRA. INSDC is a collaboration between NCBI (GenBank and SRA), [[DDBJ]] (DNA Data Bank of Japan), and [[ENA]] (European Nucleotide Archive, EMBL-EBI). The three partners exchange sequence data with one another, so raw reads deposited in SRA through a GEO submission are also accessible via DDBJ and ENA, ensuring global redundancy and access.

## Connections

- Operated by: NCBI / NIH
- Linked to: INSDC via SRA (partners: [[DDBJ]] and [[ENA]])
- Standards: [[VCF]], [[GO]]

## Resources

- https://www.ncbi.nlm.nih.gov/geo/
- https://www.ncbi.nlm.nih.gov/sra (SRA, raw reads companion)
- https://www.insdc.org (INSDC collaboration)