# Beam Search

- ![[Pasted image 20220810155603.webp]]
    - ([from](https://publish.obsidian.md/fabian-groeger/Machine+Learning+%26+Deep+Learning/Deep+Learning/Architectures/RNN/Beam+Search))
- at each step, keep track of the $k$ most probable translation hypotheses ($k$ = beam size)
- examine the $k$ most probable words for each hypothesis, compute the total scores of the extended hypotheses, and keep the $k$ best ones
- it does not only take the single best word; it takes the best $B$ (user-specified) words, generating multiple hypotheses, which are then evaluated, and the best ones at each step are carried over to the next step (see the sketch after this list)
- not guaranteed to find the optimal solution and is therefore an approximate search, but far more efficient than exhaustive search
- problems:
    - when multiplying many probabilities of very unlikely words (each close to, but not exactly, 0), the product becomes too small for the system to represent, resulting in numerical underflow --> instead of multiplying probabilities, sum their logs (more numerically stable)
    - if the sentence is very long, the probability of the whole sequence gets very low, so the search prefers shorter translations --> normalize the score by the number of words in the translation (average of the log probability per word), as in the scoring function below
- how to choose the beam width $B$?
    - the smaller $B$ is, the fewer hypotheses are considered (worse results, faster)
    - the larger $B$ is, the more hypotheses are considered, but the more computationally expensive the search becomes (better results, slower)
    - try out different values and cross-check, e.g. with a sweep like the one below
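
A minimal sketch of the loop described above, assuming a hypothetical toy `log_prob_next` function as a stand-in for a real decoder step (in practice this would be an RNN/Transformer forward pass):

```python
import math
from typing import Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (words so far, summed log-probability)

def log_prob_next(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical toy stand-in for a decoder step: log P(next word | prefix)."""
    if len(prefix) < 3:
        probs = {"a": 0.5, "b": 0.4, "<eos>": 0.1}
    else:
        probs = {"a": 0.2, "b": 0.2, "<eos>": 0.6}
    return {w: math.log(p) for w, p in probs.items()}

def beam_search(beam_size: int, max_len: int = 10) -> List[Hypothesis]:
    # Summing log-probabilities instead of multiplying probabilities
    # avoids the numerical underflow described above.
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every candidate next word ...
        candidates = [
            (words + (w,), score + lp)
            for words, score in beams
            for w, lp in log_prob_next(words).items()
        ]
        # ... and keep only the beam_size most probable extensions.
        candidates.sort(key=lambda h: h[1], reverse=True)
        beams = []
        for words, score in candidates[:beam_size]:
            (finished if words[-1] == "<eos>" else beams).append((words, score))
        if not beams:  # every surviving hypothesis has ended
            break
    return finished or beams

print(beam_search(beam_size=2))
```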
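
The length normalization from the note, written out: instead of ranking by $\sum_{t=1}^{T} \log P(y_t \mid y_{<t}, x)$, rank by the per-word average $\frac{1}{T} \sum_{t=1}^{T} \log P(y_t \mid y_{<t}, x)$, where $T$ is the number of words. A sketch of that scoring function (the exponent `alpha` is a common softer variant, an assumption not from the note):

```python
def normalized_score(hyp: Hypothesis, alpha: float = 1.0) -> float:
    """Length-normalized score: average log-probability per word.

    alpha = 1.0 gives the plain average from the note; values like 0.7
    are a common softer normalization (an assumption, not from the note).
    """
    words, summed_log_prob = hyp
    return summed_log_prob / (len(words) ** alpha)
```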
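
A possible way to "try out different values and cross-check", reusing the toy functions above; in practice one would compare a dev-set metric such as BLEU rather than just the model score:

```python
# Hypothetical sweep over beam widths: rescore each beam's hypotheses with
# the length-normalized score and compare the winners across values of B.
for B in (1, 2, 5, 10):
    hypotheses = beam_search(beam_size=B)
    best = max(hypotheses, key=normalized_score)
    print(f"B={B:2d}  best={' '.join(best[0])!r}  norm. score={normalized_score(best):.3f}")
```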