Precise terminology and nomenclature are important when describing genomes and phenotypes, because they convey your intended meaning clearly to your audience. Different biological systems follow different conventions for naming, capitalizing, and formatting genes and gene products; if you ignore them, you risk accidentally referring to the wrong thing in your writing. This page covers the nomenclature of **prokaryotes**, which differs from the conventions used for humans and for other model organisms.
## Species and genus
Across all of scientific literature, including non-prokaryotic systems, genus names are capitalized and species names are not. Both are always italicized. The name is spelled out in full the first time it is used in a paper; in every subsequent use, the genus is abbreviated to its capitalized first letter followed by a period.[^1] This abbreviation is implicit: it is understood without being introduced, so do not introduce it in parentheses.
> - **Correct:** *Streptococcus mutans* is a Gram-positive oral bacterium and primary causative agent of dental caries. *S. mutans* JH1140 is a dental isolate of this species that produces mutacin 1140, a bacteriocin.
> - **Incorrect (introduces abbreviation):** *Streptococcus mutans* ==(*S. mutans*)== is a Gram-positive oral bacterium and primary causative agent of dental caries. *S. mutans* JH1140 is a dental isolate of this species that produces mutacin 1140, a bacteriocin.
> - **Incorrect (not italicized):** ==Streptococcus mutans== is a Gram-positive oral bacterium and primary causative agent of dental caries. ==S. mutans== JH1140 is a dental isolate of this species that produces mutacin 1140, a bacteriocin.
> - **Incorrect (does not abbreviate):** *Streptococcus mutans* is a Gram-positive oral bacterium and primary causative agent of dental caries. ==*Streptococcus mutans*== JH1140 is a dental isolate of this species that produces mutacin 1140, a bacteriocin.
> - **Incorrect (abbreviates the first use):** ==*S. mutans*== is a Gram-positive oral bacterium and primary causative agent of dental caries. *S. mutans* JH1140 is a dental isolate of this species that produces mutacin 1140, a bacteriocin.
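
The first-use rule above can be sketched as a small helper. This is only an illustration: the function name `format_mentions` is hypothetical, and it assumes Markdown asterisks are used for italics, as they are on this page.

```python
def format_mentions(genus: str, species: str, count: int) -> list[str]:
    """Return Markdown-italicized names for `count` successive mentions.

    The first mention is spelled out in full; every later mention
    abbreviates the genus to its capitalized first letter.
    """
    full = f"*{genus.capitalize()} {species.lower()}*"
    short = f"*{genus[0].upper()}. {species.lower()}*"
    return [full if i == 0 else short for i in range(count)]


if __name__ == "__main__":
    for mention in format_mentions("Streptococcus", "mutans", 3):
        print(mention)
```

Remember that the count restarts between the abstract and the main text, so a helper like this would be applied to each separately.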
## Genotype
Genotype naming conventions supersede the ordinary rule of capitalizing the first word of a sentence. Even when a gene name begins a sentence, you must keep the casing prescribed by the rules below.
1. **Gene names:** Gene names are always italicized. They consist of three lowercase letters, usually directly related to a suspected or proven function. If multiple genes share a common function (e.g. the genes of an operon), they may share the same first three letters, each followed by a distinguishing fourth letter, which is capitalized. If a gene lacks this fourth letter, it is usually a solitary gene not thought to be part of an operon.[^2] **Examples:** *ldh*, *lacZ*, *mutA*, *mutR*, *mutD*, *comR*
2. **Proteins and gene products:** Proteins and gene products are never italicized, and the first letter is capitalized. Examples of non-protein gene products include regulatory RNA products. **Examples:** Ldh, LacZ, MutA, MutR, MutD, ComR
3. **Deletions:** Deletion events are identified by placing a Greek capital delta ($\Delta$) immediately before the deleted element. **Example:** $\Delta mutA$ indicates that the *mutA* gene has been deleted.
4. **Insertions:** Two colons (::) joining two distinct genetic elements indicate that the second element was inserted into the first element. **Example:** *alaS::IFDC2* indicates that the IFDC2 element was inserted into the *alaS* gene.[^3]
5. **Promoters:** Promoters are identified by a capital P followed by a subscript naming the gene they are most closely associated with. **Example:** $P_{ldh}$ is the promoter for *ldh*.
6. **Plasmids:** Plasmids are denoted by a lowercase p followed by a unique identifier that serves as the plasmid's name. They are not italicized. **Example:** pUC19 is an *Escherichia coli* plasmid developed by researchers at the University of California.
7. **Point mutations:** Point mutations are alleles of the wild-type gene. They are indicated by the gene name followed by the original amino acid, its position, and the altered amino acid. The point mutation information following the gene name is not italicized, and may or may not be superscripted. **Example:** *mutA*R13A (or $mutA^{\mathrm{R13A}}$) indicates that the residue at position 13 of the *mutA* gene product was changed from an arginine to an alanine.[^4][^5]
8. **Fusions:** A hyphen connecting two distinct genetic elements with no spaces indicates a fusion event, where the latter element has been attached to the former either transcriptionally or translationally. **Example:** $P_{comX}\text{-}lacZ$, where the promoter for *comX* now drives the transcription of *lacZ*, a common reporter gene for β-galactosidase assays.
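
Several of the rules above are regular enough to sketch in code. The snippet below is a minimal illustration, not a validator for real genomes: every function name is hypothetical, and the pattern encodes only the "three lowercase letters plus an optional capital fourth letter" convention described on this page. The helpers return the same inline LaTeX used in the examples above.

```python
import re

# Assumption: a gene name is exactly three lowercase letters, optionally
# followed by one capital letter. Real annotations have exceptions.
GENE_RE = re.compile(r"[a-z]{3}[A-Z]?")


def is_gene_name(name: str) -> bool:
    """True if `name` follows the gene naming convention described above."""
    return GENE_RE.fullmatch(name) is not None


def product_name(gene: str) -> str:
    """Gene product name: same letters, first letter capitalized."""
    if not is_gene_name(gene):
        raise ValueError(f"not a conventional gene name: {gene!r}")
    return gene[0].upper() + gene[1:]


def deletion(gene: str) -> str:
    """Inline LaTeX for a deletion, e.g. a capital delta before the gene."""
    return rf"$\Delta {gene}$"


def promoter(gene: str) -> str:
    """Inline LaTeX for a promoter: capital P with the gene as subscript."""
    return rf"$P_{{{gene}}}$"


def point_mutation(gene: str, original: str, position: int, altered: str) -> str:
    """Inline LaTeX for a point mutation in the superscripted style."""
    return rf"${gene}^{{\mathrm{{{original}{position}{altered}}}}}$"


if __name__ == "__main__":
    print(is_gene_name("lacZ"), is_gene_name("IFDC2"))
    print(product_name("mutA"))
    print(deletion("mutA"), promoter("ldh"))
    print(point_mutation("mutA", "R", 13, "A"))
```

Note that `is_gene_name("IFDC2")` is false under this pattern, which mirrors the reasoning in footnote 3: an element that does not fit the convention is probably not a prokaryotic gene name.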
## Phenotype
Phenotypes are never italicized.
1. **Prototrophy or auxotrophy:** The ability to metabolize or synthesize a particular compound (prototrophy) is indicated by a superscript positive sign (+). An inability to do so (auxotrophy) is indicated by a superscript negative sign (–). **Example:** $\mathrm{Lac}^+$ strains can metabolize lactose. $\mathrm{His}^-$ strains cannot synthesize histidine, which means media must be supplemented with it. Usually, prototrophy or auxotrophy is only noted if it a) differs from the wild-type state of a given strain or b) is given as a critical reminder in a particular context such as a protocol.
2. **Antimicrobial resistance and susceptibility:** Resistance to an antimicrobial is indicated with a superscript R and susceptibility with a superscript S. Occasionally, an asterisk is used in place of the S to denote susceptibility. **Example:** An $\mathrm{Erm^R}$ organism is resistant to erythromycin. An $\mathrm{Amp^S}$ (or $\mathrm{Amp^*}$) organism is susceptible to ampicillin.
Other specific traits might be specified by authors in papers using similar superscript nomenclature, such as the use of $\mathrm{Mot^+}$ to denote motility in a paper focused on motility phenotypes.
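
As a rough sketch, the superscript notation can be generated with a helper like the one below. The function name `phenotype` and the set of accepted markers are assumptions made for illustration; authors may define other markers for their own traits.

```python
# Assumption: only the markers described on this page are accepted.
ALLOWED_MARKERS = {"+", "-", "R", "S", "*"}


def phenotype(trait: str, marker: str) -> str:
    """Inline LaTeX for a phenotype: upright trait label with a superscript marker."""
    if marker not in ALLOWED_MARKERS:
        raise ValueError(f"unsupported marker: {marker!r}")
    # Phenotype labels are capitalized and never italicized, hence \mathrm.
    label = trait[0].upper() + trait[1:].lower()
    return rf"$\mathrm{{{label}^{marker}}}$"


if __name__ == "__main__":
    print(phenotype("lac", "+"))
    print(phenotype("erm", "R"))
```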
## Strain designations
1. **Wild type:** Wild type (WT) is a term used to establish a frame of reference; the primary strain being investigated is usually referred to as the WT strain, and any mutations made to it create mutants to be compared against that WT. *There is no universal WT strain for any prokaryotic species*; the term always refers to the strain of interest, and each investigation establishes its own WT. If an investigation works with many strains, the authors may choose to avoid the term altogether in favor of explicitly calling out each strain by its designation.
2. **Naming strains:** Strain designations are never italicized, and always follow the species name. Few other rules consistently govern their composition, as they are often named according to laboratory-specific cataloguing practices. Often the initials of the principal investigator or researcher are used, followed by a serial number. **Example:** *Streptococcus mutans* JH1140 is named for Dr. Jeffrey Hillman using a serial number related to the strain's discovery.
[^1]: If used in an abstract, this rule is "reset" when the main body of the text begins. For the purposes of introducing abbreviations, the abstract and the main text are considered separate works: the abstract should be readable without the main text, and understanding the main text should not depend on first reading the abstract.
[^2]: Note that genes are usually named upon their discovery, and their initial investigations do not always capture the full picture of their function. Thus, some genes are named according to an observed phenotype later found to be mostly indirectly related to their actual function. On this note, some genes have multiple names after a subsequent discovery caused them to be recategorized into a nearby operon.
[^3]: Additionally, we can intuit that IFDC2 is not a gene itself (or at least, not a prokaryotic gene) because it does not conform to the nomenclature rules for gene names.
[^4]: **Lab specific:** My preference is for alleles to be indicated by superscript annotation, as I find it clearer. See here how, without the superscript, the *mutA* and R13A information butt up against one another, making it visually unclear where the gene name ends and the point mutation information begins. This is easily resolved by superscripting the point mutation information, but again, this is not standard practice.
[^5]: Note that although the annotation is attached to the gene name, the information refers to the gene product. Additionally, in this example, position 13 is not counted from the start of the peptide as initially translated, but from the start of the *mature peptide*. Many peptides are post-translationally modified by cleavage, so review the context of what you are reading or writing about to determine which frame of reference the author is using as position 1.