# Jaccard Distance - $D(x,y) = 1- \frac{x\cup y}{x\cap y}$ - The Jaccard index (or Intersection over Union) is a metric used to calculate the similarity and diversity of sample sets. It is the size of the intersection divided by the size of the union of the sample sets. - In practice, it is the total number of similar entities between sets divided by the total number of entities. - To calculate the Jaccard distance we simply subtract the Jaccard index from 1 - highly influenced by the size of the dat - Large datasets can have a big impact on the index as it could significantly increase the union whilst keeping the intersection similar - The Jaccard index is often used in applications where binary or binarized data are used - deep learning model predicting segments of an image - text similarity analysis to measure how much word choice overlap there is between documents