# Standards, Terminologies and Ontologies
This index covers all data format standards, metadata frameworks, terminologies, and ontologies in the graph.
## Research data management
Metadata vocabularies, provenance standards, and persistent identifier schemes that enable FAIR data management across all research domains.
- [[DCAT]] (Data Catalog Vocabulary) is the W3C RDF vocabulary for describing datasets and data catalogues. It supports catalogue interoperability and discoverability across [[EOSC]] and [[Recherche Data Gouv]].
- [[Dublin Core]] provides 15 basic metadata elements widely used as a base metadata layer across repositories including [[Zenodo]] and [[HAL]].
- [[OBI]] (Ontology for Biomedical Investigations) provides a formal vocabulary for describing study protocols and experimental designs.
- [[PROV-O]] is the W3C Provenance Ontology and the formal foundation on which [[NIDM]] and DataLad provenance tracking are built.
- [[RRID]] (Research Resource Identifiers) are persistent identifiers for research resources, coordinated by [[NIF]]. They cover antibodies, cell lines, model organisms, software tools, and core facilities.
- [[ROR]] (Research Organization Registry) provides persistent identifiers for research institutions.
## Neuroimaging
Data format standards, metadata frameworks, and annotation vocabularies for brain imaging data.
- [[BIDS]] is the Brain Imaging Data Structure, the widely adopted community standard for organising neuroimaging datasets.
- [[CIFTI]] is a greyordinate format, developed by the [[Human Connectome Project]], that combines cortical surface vertices and subcortical voxels in a single file.
- [[Cognitive Atlas]] is an ontology of cognitive processes and tasks used by [[NeuroVault]] and [[BIDS]] for task annotation.
- [[DICOM]] is the standard format for clinical imaging. It is the usual scanner output and is converted to [[NIfTI]] for research analysis.
- [[NIfTI]] (.nii/.nii.gz) is the most widely used file format for processed neuroimaging volumes and the MRI image format required by [[BIDS]].
- [[NIDM]] is the Neuroimaging Data Model, a [[PROV-O]]-based standard for representing neuroimaging experiment provenance.
- [[Open Brain Consent]] provides GDPR-compatible model informed consent forms for open sharing of neuroimaging and electrophysiology participant data, endorsed by [[INCF]].
- [[openMINDS]] is the metadata framework required for data deposited on [[EBRAINS]].
- [[UBERON]] is a cross-species anatomy ontology used for brain region annotation in [[EBRAINS]], [[NWB]], and the [[Allen Institute for Brain Science]].
## Bioimaging
File formats and metadata standards for biological microscopy and bioimaging data.
- [[OME File Formats]] covers the two [[OME]] file formats: OME-TIFF for archival use and OME-Zarr for cloud-native large datasets.
- [[REMBI]] (Recommended Metadata for Biological Images) is the community metadata framework for bioimaging datasets.
- [[SWC]] is a widely adopted format for three-dimensional neuronal and glial morphology reconstructions, endorsed by [[INCF]] in 2024.
## Neurophysiology
File formats and annotation standards for electrophysiology, EEG, and computational neuroscience data.
- [[BrainVision]] is the Brain Products three-file EEG format (.vhdr/.vmrk/.eeg), one of the formats accepted by [[BIDS]].
- [[EDF]] (European Data Format) is a widely used format for clinical EEG, iEEG, and polysomnography.
- [[HED]] (Hierarchical Event Descriptors) provides structured event annotation integrated into [[BIDS]] and [[NWB]].
- [[Neo]] is an open Python object model and I/O library for electrophysiology data.
- [[NeuroML]] is a simulator-independent XML format for describing computational neuron and network models, endorsed by [[INCF]].
- [[NWB]] (Neurodata Without Borders) is a community standard for electrophysiology and calcium imaging data.
- [[SPARC SDS]] is the SPARC Dataset Structure, the dataset organisation standard of the NIH SPARC programme for peripheral nervous system data.
## Genomics and single-cell
Sequencing file formats, variant standards, and single-cell data formats covering the pipeline from raw reads through to annotated expression matrices.
- [[AnnData]] is the annotated data matrix structure at the core of the scverse ecosystem, including Scanpy. Its on-disk .h5ad format is a widely adopted standard for single-cell genomics data.
- [[Cell Ontology]] is the OBO Foundry ontology for cell types, required for single-cell data annotation in [[CELLxGENE]] and [[BICAN]].
- [[FASTQ]] is the standard text format for raw sequencing reads and their per-base quality scores. It is the usual input to NGS analysis pipelines.
- [[GO]] (Gene Ontology) covers biological process, molecular function, and cellular component. It is widely used for functional enrichment analysis in transcriptomics workflows.
- [[Phenopackets]] is the [[GA4GH]] standard (ISO 4454:2022) for exchanging clinical phenotype data. It links phenotypes coded with [[HPO]] to genomic data and supports both [[VCF]] and [[VRS]] as variant formats.
- [[SAM-BAM-CRAM]] are the standard aligned sequencing read formats that form the pipeline backbone between [[FASTQ]] and [[VCF]].
- [[Seurat]] is the R-ecosystem counterpart to [[AnnData]], providing the standard data object for single-cell RNA-seq analysis in R.
- [[VCF]] (Variant Call Format) is the standard format for genomic variant data, with open-access variants deposited in [[EVA]] (Europe) or [[dbSNP]] (US).
- [[VRS]] (Variant Representation Specification) is the [[GA4GH]] standard for computationally precise, globally unique variant identifiers that complement [[VCF]] notation across genome builds.
## Clinical data models and interoperability
Data models and exchange standards for structuring, querying, and sharing clinical and health data across systems and institutions.
- [[CDISC]] provides clinical trial data standards. CDASH covers data collection, while SDTM and ADaM define the tabulation and analysis datasets used in regulatory submissions.
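- [[HL7 FHIR]] (Fast Healthcare Interoperability Resources) is the HL7 standard for exchanging health data through web APIs. It is expected to underpin the European EHR exchange format defined under [[EHDS]].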
- [[OMOP CDM]] is the [[OHDSI]] Common Data Model for federated observational health research.
- [[openEHR]] is a semantic EHR specification built around reusable archetypes and templates.
## Clinical classification and coding
Terminologies and classification systems for diagnoses, procedures, observations, and research data coding in clinical and health settings.
- [[CCAM]] (Classification commune des actes médicaux) is the French national classification of medical procedures. It is used to code procedures in [[SNDS]] and [[AP-HP]] PMSI billing data.
- [[ICD-10]] is the WHO disease classification. The French version (CIM-10) is used throughout [[SNDS]] and [[AP-HP]] billing.
- [[ICD-11]] is the updated WHO classification in force since 2022. France is currently in transition from [[ICD-10]].
- [[ICD-O-3]] is the WHO/IARC dual-axis tumour classification for cancer registries, coding both anatomical site and histological type. It is required by [[OSIRIS]] and all French cancer registries.
- [[LOINC]] is the international standard for identifying lab tests, biomarkers, and clinical observations.
- [[MeSH]] is the NLM controlled vocabulary (~30,000 descriptors as of 2024) used for PubMed indexing and [[ClinicalTrials.gov]].
- [[OSIRIS]] is the French national minimum dataset for oncology clinical and genomic data sharing, aligned with [[HL7 FHIR]] and [[ICD-O-3]], funded by INCa.
- [[SNOMED CT]] is a comprehensive clinical terminology and the core vocabulary in [[OMOP CDM]] and [[HL7 FHIR]].
## Drug and chemical terminologies
Controlled vocabularies for drugs, chemicals, and adverse events used in pharmacological research and clinical trials.
- [[ATC]] is the WHO Anatomical Therapeutic Chemical classification, the international standard for drug utilisation and an [[OMOP CDM]] vocabulary.
- [[ChEBI]] (Chemical Entities of Biological Interest) is the EMBL-EBI ontology covering drugs, metabolites, and neurotransmitters.
- [[MedDRA]] is the international terminology for adverse event coding required in clinical trial regulatory submissions to the EMA and ANSM.
- [[NCIT]] (NCI Thesaurus) is the NCI cancer and clinical research terminology used as a controlled terminology source in [[CDISC]] SDTM submissions.
- [[RxNorm]] is the NLM standard for clinical drug names and identifiers and the primary drug vocabulary in [[OMOP CDM]].
## Disease, phenotype, and variant curation
Ontologies and reference resources for classifying diseases, annotating phenotypes, and curating the clinical significance of genomic variants.
- [[ADO]] (Alzheimer's Disease Ontology) covers biomarkers, staging, and genetics relevant to Alzheimer's cohort data annotation.
- [[ClinVar]] is the NCBI public archive of submitted clinical variant interpretations and pathogenicity classifications. Its highest-confidence assertions include those reviewed by [[ClinGen]] expert panels.
- [[ERN Vocabularies]] are the ERN-RND and ERN-EpiCARE patient registry terminologies, combining [[ORDO]], [[HPO]], and [[OMOP CDM]].
- [[HPO]] (Human Phenotype Ontology) provides over 18,000 phenotypic abnormality terms (as of 2024) and is the primary vocabulary for rare disease genomics.
- [[MONDO]] (Monarch Disease Ontology) integrates disease resources, including [[OMIM]] and [[ORDO]], into a single disease hierarchy with cross-references to [[ICD-10]].
- [[NBO]] (Neurobehavior Ontology) describes behavioural phenotypes in both humans and model organisms.
- [[OMIM]] (Online Mendelian Inheritance in Man) is a curated compendium of gene-disease relationships, identified by MIM numbers.
- [[ORDO]] (Orphanet Rare Disease Ontology) is the Orphanet classification of all rare diseases, including rare neurological diseases. It is the European reference for rare disease coding.