# Overview
There are two broad cryptographic approaches:
- Hashing (one-way functions)
- Encryption (two-way functions)
A cryptographic process can be applied to data at various levels of organization/storage:
- One or more pieces of data in a file
- A file (or set of files) on a storage drive
- An entire storage drive
# Hash Functions
## Ideal properties of hash functions
Irreversible:
- Cannot decode a hash value to produce the original data (i.e., it is a “one-way” process)
Uniqueness:
- Any two pieces of data that are not exactly alike will not have the same hash value (in cryptographic terms, there are no collisions)
- Any change to original (input) data will change the hash (output) value
Examples of MD5 hashes:
| Data<br>(input) | Hash<br>(output) |
| --------------- | :--------------------------------: |
| NDSU | `58D0D125356C2ABE8F0E697B2F48D600` |
| ndsu | `AD82DACAE44DE0AEAEFB18F9FE27F191` |
| N.D.S.U. | `0E2660070296E81F55503104B51AC277` |
| ND State | `960CE662AD227F63D6141EC3506A4DBC` |
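
Hashes like those in the table can be computed in most programming languages. The following is a minimal Python sketch using the standard `hashlib` module. It assumes the inputs are encoded as UTF-8 with no trailing newline (the table does not state an encoding), and the helper name `md5_hex` is made up for illustration. Running it also illustrates the uniqueness property: changing only the letter case yields a completely different hash.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 hash of text as 32 uppercase hexadecimal digits."""
    # Hash functions operate on bytes, so the text is encoded first.
    # UTF-8 is an assumption; other encodings give different hashes.
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


if __name__ == "__main__":
    for value in ["NDSU", "ndsu", "N.D.S.U.", "ND State"]:
        print(f"{value:10} {md5_hex(value)}")
```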
%%
Examples of SHA-1 hashes
Data (input) SHA-1 hash (output)
NDSU `3B5381CDEE06A6B204D308D8E5679F78B3962066`
ndsu `C70B0E2C91956C9F8D1A968C1EEE101A46CD96C9`
N.D.S.U. `13227128853F0EADF7D75CAF97BA1C7AD15F3E8A`
ND State `C53733393E420B7112949541807A61E804E2D318`
North Dakota State University `DE1D80FE7A2B856881A78A4F7C49AD6520A5C761`
%%
## Uses of hash functions
- Unique and anonymous data identifier—e.g., put in a social security number (SSN), get back a unique identifier that cannot be decoded back to the original SSN
- Data integrity (checksums, digital signatures) of messages (such as emails) or entire files; this lets a recipient detect whether a third party has modified a message or file
- Password storage and verification
> [!example]
> Say your account password for some website is "Bison123" (without the quotation marks). The website does not save the literal password; rather, when you first created the account, it ran the password you gave through a hash function and stored the result. So, it saved a seemingly random string of characters: `21FB2166E8ADC37AD58F2157C45FC267` (incidentally, this is an MD5 hash). Anytime you log in to the website and enter a password, it runs your input through the same hash function and compares that result with the hash already stored for your account. If they match, then you have entered the correct password.
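
The store-then-compare flow in the example can be sketched in Python as follows. Note the substitution: instead of a plain MD5 hash, this sketch uses salted PBKDF2-HMAC-SHA256 from the standard library, which is a common choice for passwords but is not what the example above describes. The function names and the iteration count are assumptions made for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest) for a password; a new random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Re-hash the entered password with the stored salt and compare the results."""
    _, candidate = hash_password(password, salt)
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(candidate, stored_digest)


if __name__ == "__main__":
    salt, stored = hash_password("Bison123")          # at account creation
    print(verify_password("Bison123", salt, stored))  # correct password
    print(verify_password("bison123", salt, stored))  # wrong letter case
```

Only the salt and the digest would be stored for the account; the literal password is never saved.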
## Implementation
Hash functions are standard built-in functions in most relational database systems.
There are packages in R that handle many hash functions. However, most other statistical software programs (such as Excel, Stata, and SPSS) do not have hash functions.
There are numerous online hash generators (e.g., https://hash.online-convert.com/, https://www.fileformat.info/tool/hash.htm).
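
General-purpose programming languages are another option, and they avoid sending data to a third-party site. As a minimal sketch, the Python function below computes a SHA-256 checksum of a file with the standard `hashlib` module. The function name `file_sha256` and the chunk size are assumptions made for illustration.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hash of a file as a lowercase hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the result against a checksum published by the file's author is one way to check that a download was not altered.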
## Recommended types of hash functions
==Update this list; outdated algorithms==
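MD5:
- Message Digest Algorithm 5
- Converts data to a 32-digit hexadecimal number (128 bits)
- No longer recommended for security purposes because practical collisions are known

SHA-1:
- Secure Hash Algorithm 1
- Converts data to a 40-digit hexadecimal number (160 bits)
- No longer recommended for security purposes because practical collisions are known

SHA-2:
- Secure Hash Algorithm 2, a family of functions (SHA-224, SHA-256, SHA-384, SHA-512)
- SHA-256, the most common variant, converts data to a 64-digit hexadecimal number (256 bits)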
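
The output lengths listed above can be checked directly. The sketch below uses Python's standard `hashlib` module and takes SHA-256 as the SHA-2 variant; the helper name `hex_digest_lengths` and the sample input are assumptions made for illustration.

```python
import hashlib


def hex_digest_lengths(data=b"NDSU"):
    """Map each algorithm name to the number of hex digits in its output."""
    names = ["md5", "sha1", "sha256"]
    return {name: len(hashlib.new(name, data).hexdigest()) for name in names}


if __name__ == "__main__":
    print(hex_digest_lengths())  # {'md5': 32, 'sha1': 40, 'sha256': 64}
```

The length depends only on the algorithm, not on the size of the input.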
# Encryption
## Ideal properties of encryption
- Can be encrypted and decrypted (a “two-way” process where the original data can be recovered)
- Practically unbreakable (i.e., the only feasible way to crack the encryption is to correctly guess the password or key)
## Advanced Encryption Standard (AES)
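AES (based on the Rijndael cipher) is currently the most widely used symmetric encryption standard, with key sizes of 128, 192, or 256 bits.
It was adopted by the US Government as a federal standard and is approved by the NSA for use with information classified as Top Secret (with 192- or 256-bit keys).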
## Encryption software using AES
File encryption:
- 7-Zip (http://www.7-zip.org)
- AxCrypt (http://www.axantum.com/axcrypt) ==Need to find a replacement for this.==
Full disk encryption:
- TrueCrypt (http://www.truecrypt.org); development was discontinued in 2014, and VeraCrypt is a maintained successor
- BitLocker (now comes with Windows)