UUID - Never Complete Only Abandoned

A UUID ("Universally Unique Identifier") is, as the name implies, intended to be universally unique, but in practice attaining this is much more difficult than its creators expected.

# History

UUIDs were originally developed at [[Apollo Computer]] for their [[Network Computer System]], which was used by their [[Domain-OS]] operating system.

# Versions

All UUIDs use [[Hexadecimal]] encoding for the user presentation.

## Version 1

![[uuid-v1-diagram.webp]]

| Field          | Size | Notes                                         |
| -------------- | ---- | --------------------------------------------- |
| Timestamp      |      | Least significant bits first, Gregorian Epoch |
| Timestamp      |      |                                               |
| Timestamp      |      |                                               |
| Clock Sequence |      | Incremented for each generated UUID           |
| Node ID        |      | 48-bit MAC address                            |

The timestamp was originally implied to be a signed integer, which would overflow around the year 3400, but some implementations use unsigned integers, which would overflow around 5236. The 60-bit field counts 100-nanosecond intervals, so 2^60 of them span roughly 3,653 years from 1582.

## Version 2

>[!DANGER] Avoid
> Replaces the "low time" field with a user ID, removing most of the entropy from the result and increasing collision likelihood.

Identical to Version 1, except that the "low time" field is replaced with the user ID of whoever generated it.

| Field          | Size | Notes                               |
| -------------- | ---- | ----------------------------------- |
| User ID        |      | Replaces the "low time" field       |
| Timestamp      |      | Gregorian Epoch                     |
| Timestamp      |      |                                     |
| Clock Sequence |      | Incremented for each generated UUID |
| Node ID        |      | 48-bit MAC address                  |

## Version 3

Deterministic generation using hashing. Uses [[MD5]].

Intended to represent names that are unique within a given context, it is generated by hashing a namespace UUID together with a name (string). This is similar to how salting a password works, but UUIDs are not intended or recommended for cryptographic use.

> The namespace identifier is itself a UUID. The specification provides UUIDs to represent the namespaces for URLs, fully qualified domain names, object identifiers, and X.500 distinguished names; but any desired UUID may be used as a namespace designator.
\- Wikipedia

## Version 4

Randomly generated. Most commonly used.

![[uuid-v4-diagram.webp]]

## Version 5

Deterministic generation using hashing. Identical to Version 3, except that it uses [[SHA1]]. The SHA-1 digest is truncated to 128 bits. Still not recommended for cryptographic purposes.

## Version 6 (proposed)

Identical to Version 1, except that the time fields are listed most-significant first to improve sortability.

![[uuid-v6-diagram.webp]]

| Field          | Size | Notes                                        |
| -------------- | ---- | -------------------------------------------- |
| Timestamp      |      | Most significant bits first, Gregorian Epoch |
| Timestamp      |      |                                              |
| Timestamp      |      |                                              |
| Clock Sequence |      | Incremented for each generated UUID          |
| Node ID        |      | 48-bit MAC address                           |

## Version 7 (proposed)

Identical to Version 6, except that it uses the [[Unix Epoch]] instead of the Gregorian calendar date. Additionally, field 5 ("node") from Version 1 is replaced with a randomly generated value.

| Field          | Size | Notes                                       |
| -------------- | ---- | ------------------------------------------- |
| Timestamp      |      | Most significant bits first, [[Unix Epoch]] |
| Timestamp      |      |                                             |
| Timestamp      |      |                                             |
| Clock Sequence |      | Incremented for each generated UUID         |
| Random         |      |                                             |

## Version 8 (proposed)

This only specifies the format and placement of the version indicator in the leading position of the 3rd field. How the fields are generated is not specified. This is intended to preserve interoperability even in vendor-specific contexts.

# Alternatives

A single specification cannot solve every problem well, because sortability and security are competing interests. Some situations call for one or the other, but they are inherently at odds.

## CUID

>[!DANGER] Deprecated
> CUID v1 suffers from the same security issues as any of the other UUID variants which use timestamp information. Sortability and monotonically increasing values are simply less secure when guessing them is a security risk.

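
To illustrate why timestamp-bearing identifiers are easier to guess, here is a minimal sketch of how much a Version 1 UUID reveals. It uses Python's standard `uuid` module rather than CUID itself. The helper name `inspect_v1` and the example node value are assumptions made for illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by Version 1 timestamps.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def inspect_v1(u: uuid.UUID) -> tuple[datetime, int]:
    """Return the creation time and node ID embedded in a Version 1 UUID."""
    if u.version != 1:
        raise ValueError("only Version 1 UUIDs carry this layout")
    # u.time is the 60-bit count of 100-nanosecond intervals since the epoch.
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    return created, u.node


# Hypothetical node value, so the example does not expose a real MAC address.
example = uuid.uuid1(node=0x001122334455)
created, node = inspect_v1(example)
print(f"{example} was created at {created.isoformat()} on node {node:012x}")
```

Anyone holding such an identifier can recover when and where it was generated, and can narrow down neighbouring values. A purely random identifier does not have this property.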
## CUID2

An extension of the UUID concept, with a huge amount of additional complexity to improve randomness.

## Snowflake ID

Originally developed by the engineers at [[Twitter]]; variants have been used by other social media platforms. In this context security isn't a concern, but sortability is very important.

### Twitter

In use since 2010.

| Field           | Size (bits) | Notes                                                                        |
| --------------- | ----------- | ---------------------------------------------------------------------------- |
| Timestamp       | 41          | Milliseconds since Epoch                                                     |
| Machine ID      | 10          |                                                                              |
| Sequence Number | 12          | Increments for each new ID generated that millisecond by the current machine |

### Tumblr

### Discord

### Mastodon

## ULID

> Universally Unique Lexicographically Sortable Identifier

UUID can be suboptimal for many use-cases because:

- It isn't the most character efficient way of encoding 128 bits of randomness
- UUID v1/v2 is impractical in many environments, as it requires access to a unique, stable MAC address
- UUID v3/v5 requires a unique seed and produces randomly distributed IDs, which can cause fragmentation in many data structures
- UUID v4 provides no other information than randomness which can cause fragmentation in many data structures

ULID uses [[Base 32]] encoding.

## NanoID

NanoID is more of a tool to generate IDs of various types. They can vary by length and by available character set.

# Problems

## Storage and Memory

A UUID stored as its 36-character string takes 36 bytes, more than twice the 16 bytes of the binary form and 4.5 times the 8 bytes of a 64-bit integer ID. Even in binary form, a UUID takes twice as much space as a 64-bit integer ID.

## As a Primary Key for MySQL

Storage engines which expect sequential primary keys (a concerning constraint), such as [[InnoDB]] as used by [[MySQL]], will over-provision space because they cannot handle sparse arrays cleanly. The primary key is also repeated directly in every secondary index, increasing storage and memory usage in all indexes.

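
As a rough illustration of the per-key sizes discussed above, the sketch below compares one identifier in each representation using Python's standard library. The function name `key_sizes` is hypothetical. The figures are raw payload sizes only; real database rows and index entries add their own overhead.

```python
import struct
import uuid


def key_sizes(u: uuid.UUID) -> dict[str, int]:
    """Return the raw size in bytes of one key in each representation."""
    return {
        "uuid_string": len(str(u).encode("ascii")),  # 32 hex digits + 4 hyphens
        "uuid_binary": len(u.bytes),                 # 128 bits
        "int64": struct.calcsize("<q"),              # 64-bit integer
    }


sizes = key_sizes(uuid.uuid4())
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / sizes['int64']:.2f}x an int64)")
```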
[[MySQL]] only supports Version 1 UUID generation internally, producing strings. These can be passed to `uuid_to_bin` to convert them to a binary representation and re-order the fields, resulting in what is essentially a Version 6 UUID.

## As a Primary Key for PostgreSQL

PostgreSQL has a built-in [`gen_random_uuid()`](https://www.postgresql.org/docs/current/functions-uuid.html) function to generate Version 4 UUIDs and supports UUID primary keys out of the box. It also provides a dedicated UUID column type, which stores values in binary format and displays them as strings, transparently and without manual conversion.

Indexes may be 40% larger, and performance is within 15% of 32-bit primary keys.

## Security

There are variants of UUID which store information about their origin, creator, and/or time of generation. This information may be very useful in some contexts, but these variants should not be used when security is paramount. This is why Version 4 is the most commonly used variant, as it is purely random (aside from the bits reserved for the version and variant indicators).

As explained by

# References

- https://en.wikipedia.org/w/index.php?title=Universally_unique_identifier

## Databases

- https://www.cybertec-postgresql.com/en/int4-vs-int8-vs-uuid-vs-numeric-performance-on-bigger-joins/
- https://planetscale.com/blog/the-problem-with-using-a-uuid-primary-key-in-mysql

## ULID

- https://github.com/ulid/spec
- https://blog.bitsrc.io/ulid-vs-uuid-sortable-random-id-generators-for-javascript-183400ef862c

## Snowflake ID

- https://en.wikipedia.org/wiki/Snowflake_ID

## CUID / CUID2

- https://github.com/paralleldrive/cuid2/issues/7

## NanoID

- https://github.com/ai/nanoid
- https://planetscale.com/blog/why-we-chose-nanoids-for-planetscales-api
- https://medium.com/@gaspm/nano-id-popular-secure-and-url-friendly-unique-identifiers-1fa86c9fdf7c
- https://dev.to/harshhhdev/uuidguid-cuid-nanoid-whats-the-difference-5dj1