# Teacher Forcing
- [from](https://publish.obsidian.md/fabian-groeger/Machine+Learning+%26+Deep+Learning/Deep+Learning/Architectures/RNN/Teacher+Forcing)
- Technique where the target word (ground-truth word) is passed as the next input to the decoder instead of the decoder's own previous prediction (see the training sketch below)
- common technique to train [Basic RNN Architectures](Basic%20RNN%20Architectures.md) or [Transformer](Transformer.md)
- used in [imageCaptioning](imageCaptioning.md), Machine Translation
- but also in Time Series forecasting
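
A minimal PyTorch sketch of a teacher-forced decoder step (module names and sizes are illustrative assumptions, not from the source): the decoder consumes the ground-truth tokens shifted right by one position, so every step is conditioned on the true previous word rather than the model's own output.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; any embedding + RNN decoder works the same way.
vocab_size, embed_dim, hidden_dim = 1000, 64, 128
embed = nn.Embedding(vocab_size, embed_dim)
rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
proj = nn.Linear(hidden_dim, vocab_size)

def teacher_forced_step(targets, h0):
    """targets: (batch, T) ground-truth token ids; h0: (1, batch, hidden_dim)."""
    # Shift right: at step t the decoder sees the true token y_{t-1},
    # never its own previous prediction.
    decoder_inputs = targets[:, :-1]
    outputs, _ = rnn(embed(decoder_inputs), h0)  # all timesteps in one call
    logits = proj(outputs)                       # (batch, T-1, vocab_size)
    return logits
```

Because the inputs are known in advance, the whole sequence can be processed in a single RNN call; without teacher forcing the loop would have to run one step at a time.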
- intuition
- math exam with dependent questions, e.g. b) depends on a), c) on b), and so on
- if a) is wrong, all subsequent answers are wrong as well
- teacher forcing: after we answer question a), the teacher compares it to the correct solution, grades it, and gives us the correct answer for a) to continue with
- sequence generation with an RNN is analogous
- each prediction depends on the previous one, so once one prediction is wrong, all subsequent ones will be wrong as well
- no memorization of the answer can happen, and the network cannot look into the future
- because the ground truth is only fed as the previous input $y_{t-1}$, never as the current target $y_t$
- the [loss](../Tag%20Pages/loss.md) does not need to be updated at each timestep; it is enough to collect the model's predictions over the whole sequence and compute the [loss](../Tag%20Pages/loss.md) from them in one go (see the sketch below)
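
A sketch of that loss computation, under the same assumptions as the training sketch above: the teacher-forced logits for all timesteps are compared against the ground-truth sequence in a single call instead of per-timestep updates.

```python
loss_fn = nn.CrossEntropyLoss()

def sequence_loss(logits, targets):
    # logits: (batch, T-1, vocab_size) from teacher_forced_step above;
    # the labels are the ground-truth tokens shifted left by one step.
    labels = targets[:, 1:]
    return loss_fn(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
```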
- pros
- training converges faster: early in training the model's own predictions are very bad, and teacher forcing stops these errors from propagating through the sequence
- cons
- no ground truth labels exist during inference, so teacher forcing cannot be applied (see the decoding sketch below)
- discrepancy between training and inference: at test time the model must condition on its own, possibly wrong, predictions
- can lead to poor model performance and instability
- known as _Exposure Bias_
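
For contrast, a free-running greedy decoding loop under the same assumptions as above: with no ground truth available, each step is conditioned on the model's own previous prediction, which is exactly the train/inference mismatch behind exposure bias. `bos_id` and `eos_id` are assumed start- and end-of-sequence token ids.

```python
def greedy_decode(h, bos_id, eos_id, max_len=50):
    token = torch.full((1, 1), bos_id, dtype=torch.long)
    generated = []
    for _ in range(max_len):
        out, h = rnn(embed(token), h)
        token = proj(out).argmax(dim=-1)  # feed back the model's own guess
        generated.append(token.item())
        if token.item() == eos_id:
            break
    return generated
```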