A database management system (DBMS) is a software for managing a [[database]].
Popular DBMS include Microsoft Access, [[MySQL]], [[Postgres]], and noSQL databases like [[MongoDB]].
Contrast to a file system that represents a collection of individual files accessed by individual applications. The data are separate, isolated, often duplicated, inconsistent, and not joinable (imagine a collection of spreadsheets in a file system like Google Drive). A DBMS ensures consistency and referential integrity facilitating effective and efficient data storage. The primary downside to a DBMS is the complexity excludes most users from interacting directly with it without developing a custom UI. (Although in many cases this is a benefit as it helps preserve data integrity).
## database
A database organizes data through entities, attributes and relationships.
- **Entities** are the abstract representations of the objects of interest (analogous to a [[Class]] in programming).
- **Attributes** are the characteristics of interest for which data are stored.
- **Relationships** are associations between two or more entities. These are named with a verb by convention.
These three are often described using an [[entity-relationship diagram]].
Each individual record is referred to as an **instance**. Each instance will have an **identifier**, which is a special attribute used to identify a specific instance.
Databases include user data, metadata, application data, and indices.
- **User data** refers to the data users work with, often stored in tables.
- **Metadata** are data about data. These describe the data including table name, column name, data type, primary keys, etc. Metadata are stored in system tables accessible only to administrators.
- **Indices** store representations of data to facilitate storing, searching and sorting.
- **Application metadata** include forms, reports and queries.
The degree of a relationships is the number of entities that participate in the relationship. Relationships of degree one are called **unary** (sometimes recursive) relationships. Relationships of degree two are called **binary** relationships. Most relationships in databases are binary. Relationships of degree three or more cannot be represented directly in a DBMS and must be converted to degree two relationships.
Cardinality refers to the number of instances of the entity involved in the relationship. The three types include one to many (1:N), many to one (N:1) and many to many (N:M). This is also called maximum cardinality as it represents the most entities that will participate in a relationship.
Participation (also called optionality) can be mandatory or optional. It is also called minimum cardinality as optional participation implies cardinality can be 0.
Each relationship can be defined by the combination of cardinality and participation. Options include
- (1, 1) required single relationship
- (0, 1) optional single relationship
- (1, N) required multiple relationship
- (0, N) optional multiple relationship
Relational databases are databases based on relational models. In a relational database, a relation is table with columns and rows (not a relationship). The **degree** of a relation is the number of columns (or attributes). The **cardinality** of a relation is the number of rows (or instances). The **domain** is the set of allowable values for one or more attributes. A row is also called a **tuple**. Every tuple in a relation is distinct.
A superkey is an attribute or set of attributes that uniquely identifies an instance. A candidate key is a minimal superkey. The primary key is the chosen candidate key. When there are multiple candidate keys, alternate keys are the candidate keys not chosen.
A foreign key represents the relation in another table. When relation A has a 1:N relationship with relation B, the foreign key will be placed in relation B. When the two relations are 1:1, the foreign key can be in either table (unless you decide to merge the two relations into one relation). When the relations are N:M, a **many-to-many table** (also called **intersection**) must be set up dedicated to store the relationships. For unary 1:M, the foreign key can be stored in the same relation. For unary N:M relations, a many-to-many table must be set up.
The relational database schema defines the relations and their attributes (where each relation has a relation schema). The general format is
- Name (Primary Key, Attribute, Foreign Key(fk)...)
The attribute(s) of the primary key should be underlined. Any foreign key should be noted with (fk).
## normalization
Database normalization entails designing the schema with minimal attributes and minimal redundancy to support the requirements. The level of normalization is called Normal Form.
To proceed with normalization systematically, first identify functional dependencies in each relation. If A determines B, then B is functionally dependent on $A$ (written $A \to B)$. In many cases, a set of attributes is functionally dependent on another subset of attributes
$
A : (a_1, a_2, \dots, a_n) \to B: (b_1, b_2, \dots, b_n)
$
If there is no proper subset of $A$ that also determines $B$, then $A \to B$ is a full functional dependency (in other words, you need all attributes in $A$ to determine $B$). If there are redundant attributes in $A$, then $A \to B$ is a partial dependency. A transitive functional dependency occurs when $A \to B$ and $B \to C$.
## normal form
Normal form describes the level of normalization.
- 1NF: Each cell contains only one value.
- 2NF: Every attribute is fully functionally dependent on the primary key (eliminate partial functional dependencies).
- 3NF: No attribute is transitively dependent on the primary key.
While 3NF is typically sufficient, there are higher normal forms including Boyce-Codd 3NF, 4NF and 5NF.
[[database normalization]]