[[HNSW]] stands for [[Hierarchical Navigable Small World]]. It is a graph-based [[approximate nearest neighbor]] ([[ANN]]) algorithm designed for fast similarity search over large datasets of high-dimensional vectors (such as embeddings). See also [[Inverted File Indexing]] ([[IVF]]).
**hnsw:space in ChromaDB**
When you create a collection in ChromaDB, you can pass a metadata argument to configure it. The `hnsw:space` key in that metadata selects the distance function used to compare embeddings; the commonly documented values are `l2` (the default), `ip` (inner product), and `cosine`. The key affects the following:
1. **Distance Metric:** ChromaDB indexes vectors with HNSW whether or not the key is set; `hnsw:space` only chooses the metric that the index uses.
2. **Index Creation:** The HNSW index for the collection is built with that metric, so set it when the collection is created. Depending on the ChromaDB version, it may not be changeable afterwards.
3. **Search Behavior:** Similarity queries (`collection.query` in the ChromaDB client, or `similarity_search` in the LangChain wrapper) return approximate nearest neighbors ranked by the chosen distance.
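To make the choice of metric concrete, here is a minimal pure-Python sketch of the three distance functions usually associated with the `hnsw:space` values `l2`, `ip`, and `cosine`. The function names and the `DISTANCES` mapping are illustrative and not part of ChromaDB. The formulas assume the hnswlib conventions (squared L2, and one minus the inner product or cosine similarity), so check the documentation for your ChromaDB version before relying on exact values.

```python
import math


def l2_squared(a, b):
    """Squared Euclidean distance (assumed meaning of "l2")."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    """One minus the inner product (assumed meaning of "ip")."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine similarity (assumed meaning of "cosine")."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Illustrative lookup from an hnsw:space value to its distance function.
DISTANCES = {
    "l2": l2_squared,
    "ip": inner_product_distance,
    "cosine": cosine_distance,
}
```

With these definitions a smaller value always means "more similar". Cosine distance ignores vector length, which is one reason it is a common choice for text embeddings that are not normalized.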
**Why Use hnsw:space?**
- **Speed:** HNSW answers similarity queries much faster than an exhaustive scan, especially in the high-dimensional spaces where embeddings typically reside. This translates to faster retrieval in ChromaDB-powered applications.
- **Scalability:** HNSW scales well to large datasets, enabling you to handle growing collections of embeddings efficiently.
- **Approximate Results:** Keep in mind that HNSW is an _approximate_ nearest neighbor algorithm. It trades a small amount of accuracy (some true nearest neighbors may be missed) for speed, which is acceptable for many semantic search use cases.
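The accuracy side of this trade-off is usually measured as recall@k: the fraction of the true top-k neighbors that the approximate search returns. The sketch below is a minimal, standard-library illustration; `exact_top_k` and `recall_at_k` are hypothetical helper names, and the "approximate" result is a hand-made stand-in rather than output from a real HNSW index.

```python
import math
import random


def exact_top_k(query, vectors, k):
    """Brute-force search: indices of the k vectors closest to query (Euclidean)."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbors that also appear in the approximate result."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(200)]
query = [0.5] * 8

truth = exact_top_k(query, vectors, k=10)

# Stand-in for an ANN result: nine true neighbors plus one index outside the true top 10.
outsider = next(i for i in range(len(vectors)) if i not in truth)
approx = truth[:9] + [outsider]

recall = recall_at_k(approx, truth)  # 9 of the 10 true neighbors found: 0.9
```

In practice you would replace `approx` with the IDs returned by the vector store and compare them against a brute-force pass over a sample of queries.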
**Example**
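```python
from langchain_community.vectorstores import Chroma  # older releases: langchain.vectorstores

# embedding_model is assumed to be a LangChain Embeddings instance defined elsewhere
db = Chroma.from_texts(["example text"], embedding_model, collection_metadata={"hnsw:space": "cosine"})
```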
# References
```dataview
Table title as Title, authors as Authors
where contains(subject, "Hierarchical Navigable Small World") or contains(subject, "HNSW") or contains(subject, "hnsw")
sort modified desc, authors, title
```