# Standards, Terminologies and Ontologies

This index covers all data format standards, metadata frameworks, terminologies, and ontologies in the graph.

## Research data management

Metadata vocabularies, provenance standards, and persistent identifier schemes that enable FAIR data management across all research domains.

- [[DCAT]] (Data Catalog Vocabulary) is the W3C standard that powers discoverability across [[EOSC]] and [[Recherche Data Gouv]].
- [[Dublin Core]] provides 15 basic metadata elements, widely used as a baseline metadata layer across repositories including [[Zenodo]] and [[HAL]].
- [[OBI]] (Ontology for Biomedical Investigations) provides a formal vocabulary for describing study protocols and experimental designs.
- [[PROV-O]] is the W3C Provenance Ontology and the formal foundation on which [[NIDM]] and DataLad provenance tracking are built.
- [[RRID]] (Research Resource Identifiers) are persistent identifiers for reagents, software, and core facilities, governed by [[NIF]].
- [[ROR]] (Research Organization Registry) provides persistent identifiers for research institutions.

## Neuroimaging

Data format standards, metadata frameworks, and annotation vocabularies for brain imaging data.

- [[BIDS]] is the Brain Imaging Data Structure, the widely adopted community standard for organising neuroimaging datasets.
- [[CIFTI]] is a combined surface-and-volume (greyordinate) format for cortical surface and subcortical data, developed by the [[Human Connectome Project]].
- [[Cognitive Atlas]] is an ontology of cognitive processes and tasks used by [[NeuroVault]] and [[BIDS]] for task annotation.
- [[DICOM]] is the standard clinical imaging format and the source format that is typically converted to [[NIfTI]] for research analysis.
- [[NIfTI]] (.nii/.nii.gz) is the widely adopted research file format for neuroimaging volumes, both raw and processed.
- [[NIDM]] is the Neuroimaging Data Model, a [[PROV-O]]-based standard for representing neuroimaging experiment provenance.
- [[Open Brain Consent]] provides GDPR-compatible model informed consent forms for open sharing of neuroimaging and electrophysiology participant data, endorsed by [[INCF]].
- [[openMINDS]] is the metadata framework required for data deposited on [[EBRAINS]].
- [[UBERON]] is a cross-species anatomy ontology used for brain region annotation in [[EBRAINS]], [[NWB]], and the [[Allen Institute for Brain Science]].

## Bioimaging

File formats and metadata standards for biological microscopy and bioimaging data.

- [[OME File Formats]] covers the two [[OME]] file formats: OME-TIFF for archival use and OME-Zarr for cloud-native large datasets.
- [[REMBI]] (Recommended Metadata for Biological Images) is the community metadata framework for bioimaging datasets.
- [[SWC]] is a widely adopted format for three-dimensional neuronal and glial morphology reconstructions, endorsed by [[INCF]] in 2024.

## Neurophysiology

File formats and annotation standards for electrophysiology, EEG, and computational neuroscience data.

- [[BrainVision]] is the Brain Products three-file EEG format (.vhdr/.vmrk/.eeg), one of the formats accepted by [[BIDS]].
- [[EDF]] (European Data Format) is a widely used format for clinical EEG, iEEG, and polysomnography.
- [[HED]] (Hierarchical Event Descriptors) provides structured event annotation integrated into [[BIDS]] and [[NWB]].
- [[Neo]] is an open Python object model and I/O library for electrophysiology data.
- [[NeuroML]] is a simulator-independent XML format for describing computational neuron and network models, endorsed by [[INCF]].
- [[NWB]] (Neurodata Without Borders) is a community standard for electrophysiology and calcium imaging data.
- [[SPARC SDS]] is the SPARC Data Structure, the NIH SPARC programme standard for peripheral nervous system data.

## Genomics and single-cell

Sequencing file formats, variant standards, and single-cell data formats covering the pipeline from raw reads through to annotated expression matrices.
- [[AnnData]] is the widely adopted data structure and file format (.h5ad) for single-cell genomics data in the Scanpy and scverse ecosystem.
- [[Cell Ontology]] is the OBO Foundry ontology for cell types, required for single-cell data annotation in [[CELLxGENE]] and [[BICAN]].
- [[FASTQ]] is the standard text format for raw sequencing reads with per-base quality scores, and the usual starting point of NGS analysis pipelines.
- [[GO]] (Gene Ontology) covers biological process, molecular function, and cellular component, and is widely used for functional enrichment analysis in transcriptomics workflows.
- [[Phenopackets]] is the [[GA4GH]] standard (ISO 4454:2022) linking clinical phenotypes via [[HPO]] to genomic data, supporting both [[VCF]] and [[VRS]] as variant formats.
- [[SAM-BAM-CRAM]] are the standard aligned sequencing read formats that form the pipeline backbone between [[FASTQ]] and [[VCF]].
- [[Seurat]] is the R-ecosystem counterpart to [[AnnData]], providing the standard data object for single-cell RNA-seq analysis in R.
- [[VCF]] (Variant Call Format) is the standard format for genomic variant data, with open-access variants deposited in [[EVA]] (Europe) or [[dbSNP]] (US).
- [[VRS]] (Variant Representation Specification) is the [[GA4GH]] standard for computationally precise, globally unique variant identifiers that complement [[VCF]] notation across genome builds.

## Clinical data models and interoperability

Data models and exchange standards for structuring, querying, and sharing clinical and health data across systems and institutions.

- [[CDISC]] provides clinical trial data standards (SDTM, ADaM, CDASH) for regulatory submissions.
- [[HL7 FHIR]] (Fast Healthcare Interoperability Resources) is mandated by [[EHDS]] for EHR exchange.
- [[OMOP CDM]] is the [[OHDSI]] Common Data Model for federated observational health research.
- [[openEHR]] is a semantic EHR specification built around reusable archetypes and templates.
## Clinical classification and coding

Terminologies and classification systems for diagnoses, procedures, observations, and research data coding in clinical and health settings.

- [[CCAM]] is the French national procedure classification present in [[SNDS]] and [[AP-HP]] PMSI billing data.
- [[ICD-10]] is the WHO disease classification. Its French version (CIM-10) is used throughout [[SNDS]] and [[AP-HP]] billing.
- [[ICD-11]] is the updated WHO classification, in effect since 1 January 2022. France is currently in transition from [[ICD-10]].
- [[ICD-O-3]] is the WHO/IARC dual-axis tumour classification for cancer registries, coding both anatomical site (topography) and histological type (morphology). It is required by [[OSIRIS]] and all French cancer registries.
- [[LOINC]] is the international standard for identifying lab tests, biomarkers, and clinical observations.
- [[MeSH]] is the NLM controlled vocabulary (~30,000 descriptors as of 2024) used for PubMed indexing and [[ClinicalTrials.gov]].
- [[OSIRIS]] is the INCa-funded French national minimum dataset for sharing oncology clinical and genomic data, aligned with [[HL7 FHIR]] and [[ICD-O-3]].
- [[SNOMED CT]] is a comprehensive clinical terminology and the core vocabulary in [[OMOP CDM]] and [[HL7 FHIR]].

## Drug and chemical terminologies

Controlled vocabularies for drugs, chemicals, and adverse events used in pharmacological research and clinical trials.

- [[ATC]] is the WHO Anatomical Therapeutic Chemical classification, the international standard for drug utilisation studies and an [[OMOP CDM]] vocabulary.
- [[ChEBI]] (Chemical Entities of Biological Interest) is the EMBL-EBI ontology covering drugs, metabolites, and neurotransmitters.
- [[MedDRA]] is the international terminology for adverse event coding required in clinical trial regulatory submissions to the EMA and ANSM.
- [[NCIT]] (NCI Thesaurus) is the NCI cancer and clinical research terminology used as a controlled terminology source in [[CDISC]] SDTM submissions.
- [[RxNorm]] is the NLM standard for clinical drug names and identifiers and the primary drug vocabulary in [[OMOP CDM]].

## Disease, phenotype, and variant curation

Ontologies and reference resources for classifying diseases, annotating phenotypes, and curating the clinical significance of genomic variants.

- [[ADO]] (Alzheimer's Disease Ontology) covers biomarkers, staging, and genetics relevant to Alzheimer's cohort data annotation.
- [[ClinVar]] is the NCBI public archive of submitted clinical variant interpretations and pathogenicity classifications, including expert-panel reviews contributed by [[ClinGen]].
- [[ERN Vocabularies]] are the ERN-RND and ERN-EpiCARE patient registry terminologies, combining [[ORDO]], [[HPO]], and [[OMOP CDM]].
- [[HPO]] (Human Phenotype Ontology) provides over 18,000 phenotypic abnormality terms (as of 2024) and is the primary vocabulary for rare disease genomics.
- [[MONDO]] (Monarch Disease Ontology) harmonises [[ICD-10]], [[OMIM]], and [[ORDO]] into a single disease hierarchy.
- [[NBO]] (Neurobehavior Ontology) describes behavioural phenotypes in both humans and model organisms.
- [[OMIM]] (Online Mendelian Inheritance in Man) is a curated compendium of gene-disease relationships, identified by MIM numbers.
- [[ORDO]] (Orphanet Rare Disease Ontology) is the European standard classification for rare diseases, including rare neurological diseases.