# SRA — NCBI Sequence Read Archive

## Overview

The Sequence Read Archive (SRA) is a publicly accessible repository of high-throughput sequencing data, operated by NCBI/NIH. Established in 2009, it is the US node of the International Nucleotide Sequence Database Collaboration (INSDC). It synchronises data daily with [[ENA]] (Europe) and [[DDBJ]] (Japan), so a submission to any one of the three archives becomes accessible through all three.

SRA accepts sequencing data from any organism or environment, including metagenomic and environmental surveys. It stores data both in the original submitted format and in a standardised SRA normalised format. The preferred submission format is BAM, which can hold both aligned and unaligned reads. As of 2022, SRA holds over 9 million records and more than 20 petabytes of data, available on NCBI servers, AWS, and Google Cloud Platform.

## Open and controlled access

SRA operates two access tiers. Data in the open-access tier can be downloaded by anyone. The controlled-access tier stores sensitive human genomic data: sequence files are held in a restricted SRA partition, while [[dbGaP]] manages the associated metadata, consent records, and data access committee approvals. Once a researcher receives approval through dbGaP, they use a dbGaP access token to retrieve the sequence files from SRA's controlled-access storage.

Open-access data is retrieved with the SRA Toolkit, a set of command-line tools that convert SRA files to FASTQ or SAM format.

## Connections

- Part of: INSDC (with [[ENA]] and [[DDBJ]])
- Controlled-access counterpart: [[dbGaP]] (human sensitive data)

## Resources

- https://www.ncbi.nlm.nih.gov/sra (SRA home)
- https://www.ncbi.nlm.nih.gov/sra/docs/ (documentation)
- https://submit.ncbi.nlm.nih.gov/about/sra/ (submission guide)
- https://www.insdc.org (INSDC collaboration)
- https://doi.org/10.1093/nar/gkab1028 (Katz et al. 2022, Nucleic Acids Research)
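
## Example: fetching an open-access run

The SRA Toolkit retrieval mentioned under "Open and controlled access" can be sketched as follows. This is a minimal illustration, not an official workflow: it assumes the toolkit's `prefetch` and `fasterq-dump` programs are installed and on `PATH`. The helper names `build_commands` and `fetch_fastq`, the output directory `fastq`, and the accession `SRR000001` are placeholders chosen for this example.

```python
import shutil
import subprocess


def build_commands(accession: str, outdir: str = "fastq") -> list[list[str]]:
    """Return the two toolkit invocations for one run accession.

    Hypothetical helper: prefetch downloads the SRA file, then
    fasterq-dump converts it to FASTQ in `outdir`.
    """
    return [
        ["prefetch", accession],
        ["fasterq-dump", accession, "--outdir", outdir],
    ]


def fetch_fastq(accession: str, outdir: str = "fastq") -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in build_commands(accession, outdir):
        # Fail early with a readable message if the toolkit is missing.
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found; is the SRA Toolkit installed?")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of executing them, so the sketch
    # runs without network access or an installed toolkit.
    for cmd in build_commands("SRR000001"):
        print(" ".join(cmd))
```

Calling `fetch_fastq("SRR000001")` would actually execute the downloads. Controlled-access data needs the additional dbGaP credentials described above, which this sketch does not handle.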