# FASTQ: Raw Sequencing Read Format

## Overview

FASTQ is the standard format for storing raw sequencing reads, the direct output of next-generation sequencing instruments. Described by Cock et al. (2010, *Nucleic Acids Research*), it stores base calls and their associated Phred-scaled quality scores in a simple four-line-per-read text format. FASTQ has no formal governance body, but it is the de facto starting point of most sequencing-based genomics pipelines. In practice, files are almost always gzip-compressed.

## Position in the Genomics Pipeline

FASTQ is the upstream input to alignment, which produces [[SAM-BAM-CRAM]] files. The aligned reads are then processed for variant calling (producing [[VCF]]) or expression quantification (producing count matrices, stored in [[AnnData]] for single-cell data).

## Connections

- Downstream format: [[SAM-BAM-CRAM]] (after alignment)
- Deposited in: [[ENA]] and NCBI SRA (open access), [[EGA]] and [[dbGaP]] (controlled access)

## Resources

- https://doi.org/10.1093/nar/gkp1137 (Cock et al. 2010, Nucleic Acids Research)
- https://www.ebi.ac.uk/ena (ENA)
- https://www.ncbi.nlm.nih.gov/sra (NCBI SRA)
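
## Example: Reading FASTQ Records

The four-line record layout and Phred-scaled quality scores described in the Overview can be illustrated with a minimal Python sketch. It makes two assumptions: records are unwrapped (exactly four lines each), and qualities use the Sanger-style ASCII offset of 33, which is one of several variants discussed by Cock et al. The names `FastqRecord`, `open_fastq`, `read_fastq`, `phred_scores`, and `error_probability` are illustrative and not part of any library.

```python
import gzip
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # line 1: header text after the leading '@'
    sequence: str  # line 2: base calls
    quality: str   # line 4: ASCII-encoded qualities, one character per base


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, transparently handling gzip compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-line-per-read FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        if not header.strip():
            continue  # tolerate blank lines between records or at the end
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()  # line 3: '+', optionally repeating the name
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into Phred scores (offset 33 is an assumption)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert the Phred definition Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


example = "@read1\nGATTACA\n+\nIIIIII#\n"
for record in read_fastq(io.StringIO(example)):
    print(record.name, phred_scores(record.quality))
# prints: read1 [40, 40, 40, 40, 40, 40, 2]
```

For a compressed file on disk, the same loop would run over `open_fastq("reads.fastq.gz")`, where the file name is hypothetical. Established parsers are the safer choice for production pipelines, since they also handle line-wrapped records and other quality encodings.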