Probabilistic language models (language models) assign probabilities to linguistic sequences. The elements in a sequence can be characters, tokens, words, sentences, or even concepts.

## Evaluation

The best way to evaluate a language model is extrinsic: embed it into an application and measure how well it supports that application. Intrinsic evaluation instead assesses the model against a held-out test set; the model that assigns the higher probability to the test set is the better model. However, raw probabilities are not practical for most evaluations: the probability of any particular sequence can be extremely small (leading to [[numerical underflow]]) and depends on the length of the sequence. In practice, the probability is therefore computed in log space and normalized by the number of elements in the sequence. [[Cross-entropy]] can be used as a training objective and evaluation measure, and [[perplexity]], the exponential of the per-token cross-entropy, is another measure of language model performance.
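
A minimal sketch of this idea, assuming we already have per-token probabilities assigned by some model to a test sequence (the values and names below are illustrative, not from any particular library):

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities.

    Summing log probabilities instead of multiplying raw probabilities
    avoids numerical underflow, and dividing by the token count removes
    the dependence on sequence length.
    """
    n = len(log_probs)
    avg_neg_log_prob = -sum(log_probs) / n  # per-token cross-entropy (in nats)
    return math.exp(avg_neg_log_prob)

# Hypothetical probabilities a model might assign to each token of a test sequence.
token_probs = [0.2, 0.1, 0.05, 0.3]
log_probs = [math.log(p) for p in token_probs]

print(perplexity(log_probs))  # lower perplexity on the test set -> better model
```

Working in log space is the standard trick here: the product of many small probabilities underflows quickly, while the sum of their logarithms stays well within floating-point range.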