**CityHash** is a family of non-cryptographic hash functions designed by [[Google]]. It is optimized for hashing strings and other variable-length byte sequences quickly while producing well-distributed hash values, which makes it suitable for hash tables and other hash-based data structures, data storage, and quick lookups.
### Key Features of CityHash
1. **Speed**:
- CityHash is designed to be fast on modern 64-bit CPUs, mainly by exploiting instruction-level parallelism and pipelining. The separate CityHashCrc variants additionally rely on the SSE4.2 CRC32 instruction.
- It is tuned for short strings, the common case for hash-table keys, and also performs well on long inputs.
2. **Good Distribution**:
- Provides excellent distribution of hash values, reducing the likelihood of collisions and ensuring uniform distribution across the hash space.
3. **Variety of Functions**:
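- CityHash includes multiple variants, such as CityHash32, CityHash64, and CityHash128, which produce 32-bit, 64-bit, and 128-bit hash values respectively. A 256-bit result is available only through CityHashCrc256, which requires SSE4.2 hardware support.
- Seeded versions (for example, CityHash64WithSeed) are also provided, and each variant targets different output sizes and platforms.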
4. **Non-Cryptographic**:
- Like MurmurHash, CityHash is not designed for cryptographic purposes and should not be used in security-sensitive applications.
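
The distribution claim above can be checked empirically for a given key set. The following is a minimal sketch, not a definitive test: `stand_in_hash64` and `bucket_counts` are hypothetical helper names, and the stand-in hash is built from the standard library's `hashlib.blake2b` only so that the example runs without third-party packages. It is not CityHash; with the `cityhash` package installed, `cityhash.CityHash64` could be passed as `hash_fn` instead.

```python
import hashlib
from collections import Counter


def stand_in_hash64(data: bytes) -> int:
    """Stand-in 64-bit hash built from hashlib.blake2b (NOT CityHash)."""
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def bucket_counts(keys, num_buckets, hash_fn=stand_in_hash64):
    """Count how many keys land in each of num_buckets buckets."""
    counts = Counter(hash_fn(key.encode("utf-8")) % num_buckets for key in keys)
    return [counts.get(bucket, 0) for bucket in range(num_buckets)]


keys = [f"user:{i}" for i in range(10_000)]
counts = bucket_counts(keys, num_buckets=16)

# A well-distributed hash puts roughly len(keys) / num_buckets keys in each
# bucket; the chi-squared statistic summarizes the deviation from that ideal.
expected = len(keys) / 16
chi_squared = sum((c - expected) ** 2 / expected for c in counts)
print(counts)
print(f"min={min(counts)} max={max(counts)} chi^2={chi_squared:.1f}")
```

A large spread between the smallest and largest bucket, or a chi-squared value far above the number of buckets, would suggest that the hash function handles that key pattern poorly.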
### Comparison of CityHash and MurmurHash
**1. Performance**:
- **CityHash**: Often reported to be faster than [[MurmurHash]], especially on longer inputs, because it processes large blocks of data in a way that keeps modern CPU pipelines busy. Actual results depend on the CPU, compiler, and input sizes.
- **MurmurHash**: Also very fast, particularly optimized for general-purpose hashing and small to medium-sized inputs. While it may not be as fast as CityHash for longer inputs, it is still highly efficient.
**2. Quality of Hash Values**:
- **CityHash**: Known for excellent distribution and low collision rates, particularly suited for applications requiring high-quality hash functions.
- **MurmurHash**: Provides good distribution and low collision rates as well, but may not be as optimized as CityHash for certain patterns and longer inputs.
**3. Variants and Flexibility**:
- **CityHash**: Offers multiple variants (CityHash32, CityHash64, CityHash128, plus the SSE4.2-only CityHashCrc128 and CityHashCrc256), allowing users to choose the appropriate hash size for their needs.
- **MurmurHash**: MurmurHash3 provides 32-bit and 128-bit versions, while 64-bit output comes from the older MurmurHash2 family (MurmurHash64A and MurmurHash64B), giving flexibility similar to CityHash.
**4. Use Cases**:
- **CityHash**: Often used in Google's internal systems for tasks requiring fast, high-quality hash functions, such as data storage, retrieval, and processing.
- **MurmurHash**: Widely used in general-purpose applications, including hash tables, data partitioning, and checksums, where speed and good distribution are essential.
**5. Implementation Complexity**:
- **CityHash**: Slightly more complex, with separate code paths for different input lengths and optional CRC32-based variants. This complexity can lead to better performance but might require more careful integration and makes it harder to port.
- **MurmurHash**: Simpler to implement and integrate, with straightforward arithmetic and bitwise operations making it accessible and easy to use in various environments.
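
To illustrate how little arithmetic MurmurHash needs, here is a pure-Python sketch of the 32-bit MurmurHash3 variant (MurmurHash3_x86_32 in the public-domain reference code). The names `murmur3_32` and `_rotl32` are chosen for this example; the sketch favors readability over speed and is not a replacement for a compiled library.

```python
def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Return the unsigned 32-bit MurmurHash3 (x86_32) of data."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    length = len(data)
    body_end = length - (length % 4)

    # Body: mix one 4-byte little-endian block at a time.
    for i in range(0, body_end, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Tail: mix the remaining 1 to 3 bytes, if any.
    tail = data[body_end:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= length
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


print(hex(murmur3_32(b"example data", seed=42)))
```

The whole function uses only multiplication, rotation, XOR, and shifts. Note that this sketch returns an unsigned value, whereas `mmh3.hash` returns a signed 32-bit integer by default, so results with the top bit set will differ in sign.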
### Example Usage
Here’s how you might use CityHash and MurmurHash in Python. Neither is in the standard library: the examples assume the third-party `cityhash` and `mmh3` packages are installed.
**CityHash Example**:
```python
import cityhash
data = "example data"
# Compute a 64-bit hash
hash_value_64 = cityhash.CityHash64(data)
print(f"CityHash64: {hash_value_64}")
# Compute a 128-bit hash
hash_value_128 = cityhash.CityHash128(data)
print(f"CityHash128: {hash_value_128}")
```
**MurmurHash Example**:
```python
import mmh3
data = "example data"
seed = 42
# Compute a 32-bit hash (returned as a signed integer by default)
hash_value_32 = mmh3.hash(data, seed)
print(f"MurmurHash3 (32-bit): {hash_value_32}")
# Compute a 128-bit hash
hash_value_128 = mmh3.hash128(data, seed)
print(f"MurmurHash3 (128-bit): {hash_value_128}")
```
### Conclusion
Both CityHash and MurmurHash are fast, non-cryptographic hash functions with somewhat different strengths. CityHash is tuned for speed on modern CPUs and works particularly well with long strings, making it suitable for applications that need fast, well-distributed hashes. MurmurHash, on the other hand, is a versatile and widely used hash function known for its good performance and distribution, particularly in general-purpose applications.
The choice between CityHash and MurmurHash depends on specific requirements such as input size, performance needs, and implementation complexity. Both are excellent options for ensuring efficient and effective hashing in various applications.
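
Because the trade-off depends on input size and hardware, measuring the candidates on representative data is more reliable than general claims. The harness below is a rough sketch: `benchmark` and `candidates` are hypothetical names, and `zlib.crc32` and `zlib.adler32` are standard-library stand-ins used only so the code runs without extra packages. With the third-party libraries installed, `cityhash.CityHash64` and `mmh3.hash` could be added to the dictionary in the same way.

```python
import timeit
import zlib


def benchmark(hash_fns, sizes=(16, 1024, 65536), repeats=200):
    """Return {name: {size: seconds_per_call}} for each candidate hash."""
    results = {}
    for name, fn in hash_fns.items():
        results[name] = {}
        for size in sizes:
            # Deterministic payload of the requested size.
            payload = bytes(i % 251 for i in range(size))
            total = timeit.timeit(lambda: fn(payload), number=repeats)
            results[name][size] = total / repeats
    return results


# Stand-in candidates from the standard library (NOT CityHash or MurmurHash).
candidates = {
    "zlib.crc32": zlib.crc32,
    "zlib.adler32": zlib.adler32,
}

results = benchmark(candidates)
for name, by_size in results.items():
    for size, seconds in by_size.items():
        print(f"{name:>12} {size:>6} B: {seconds * 1e6:8.2f} us/call")
```

Timings from Python wrappers include call overhead, so they mostly reflect relative differences on larger inputs rather than the raw speed of the underlying C or C++ implementations.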
### References
```dataview
Table title as Title, authors as Authors
where contains(subject, "CityHash")
sort title, authors, modified, desc
```