Hierarchical Navigable Small World - PKC

[[HNSW]] stands for [[Hierarchical Navigable Small World]]. It is an [[approximate nearest neighbor]] ([[ANN]]) algorithm designed for fast similarity search over large datasets of high-dimensional vectors such as embeddings. Also see [[Inverted File Indexing]] ([[IVF]]).

**hnsw:space in ChromaDB**

When you create a ChromaDB collection, you can pass a metadata argument to configure the distance function used to compare embeddings. The `hnsw:space` key in that metadata does the following:

1. **Distance Metric:** It selects the metric used by the collection's HNSW index: `l2` (squared L2, the default), `ip` (inner product), or `cosine`.
2. **Index Creation:** ChromaDB builds the HNSW index for the collection behind the scenes using that metric.
3. **Search Behavior:** Similarity searches (for example `similarity_search` in LangChain) use the HNSW index to find approximate nearest neighbors under that metric.

**Why Use hnsw:space?**

ChromaDB's index is HNSW-based, and `hnsw:space` lets you match its distance metric to the way your embeddings are meant to be compared. The index itself offers:

- **Speed:** HNSW avoids comparing the query against every stored vector, so searches stay fast even in the high-dimensional spaces where embeddings typically reside. This translates to faster retrieval in ChromaDB-powered applications.
- **Scalability:** HNSW scales well to large datasets, so growing collections of embeddings remain efficient to search.
- **Approximate Results:** HNSW is an _approximate_ nearest neighbor algorithm. It can occasionally miss a true nearest neighbor in exchange for speed, which is acceptable for many semantic search use cases.

**Example**

```python
from langchain.vectorstores import Chroma

# LangChain forwards collection_metadata to the Chroma collection; embedding_model must already exist.
db = Chroma.from_texts(["example text"], embedding_model, collection_metadata={"hnsw:space": "cosine"})
```

# References

```dataview
Table title as Title, authors as Authors
where contains(subject, "Hierarchical Navigable Small World") or contains(subject, "HNSW") or contains(subject, "hnsw")
sort modified desc, authors, title
```
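
**Example: comparing the three spaces**

To make the `hnsw:space` choice concrete, here is a minimal sketch of the three distance functions the setting selects between. The formulas follow ChromaDB's documented definitions as I understand them (squared L2, one minus the inner product, one minus cosine similarity). The function names, the `SPACES` table, and the toy vectors are hypothetical, chosen only for illustration, and the brute-force search stands in for the exact answer that an HNSW index approximates.

```python
import math

def l2_distance(a, b):
    # Squared Euclidean distance (assumed definition of the "l2" space).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ip_distance(a, b):
    # One minus the inner product (assumed definition of the "ip" space).
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # One minus cosine similarity (assumed definition of the "cosine" space).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical lookup table keyed by the values hnsw:space accepts.
SPACES = {"l2": l2_distance, "ip": ip_distance, "cosine": cosine_distance}

def exact_nearest(query, vectors, space):
    """Brute-force search: return the name of the vector with the smallest distance."""
    dist = SPACES[space]
    return min(vectors, key=lambda name: dist(query, vectors[name]))

# Toy 2-D "embeddings": one long vector aligned with the query, one short vector nearby.
vectors = {
    "same_direction_long": [4.0, 0.0],
    "nearby_off_axis": [1.0, 1.0],
}
query = [1.0, 0.0]

for space in SPACES:
    print(space, "->", exact_nearest(query, vectors, space))
```

With these toy vectors, `l2` picks the nearby off-axis vector, while `ip` and `cosine` pick the longer vector that points the same way as the query. The same query can therefore return different neighbors depending on the configured space, so the metric should match how the embeddings are meant to be compared.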