From [[Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware]].
---
**Action Chunking**
Mitigates the [[The Compounding Error Problem in Imitation Learning|compounding error problem]] in [[- Imitation Learning -|imitation learning]] by predicting the next $k$ actions as one chunk instead of a single action. The [[Policy|policy]] models $\pi_\theta(a_{t:t+k} \mid s_t)$ rather than $\pi_\theta(a_t \mid s_t)$.
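A minimal sketch of open-loop chunk execution may make the idea concrete. Everything here is illustrative: `toy_chunk_policy` is a hypothetical stand-in for $\pi_\theta(a_{t:t+k} \mid s_t)$, and the additive dynamics are an assumption, not the paper's setup.

```python
import numpy as np

def toy_chunk_policy(state, k):
    """Hypothetical stand-in for pi_theta(a_{t:t+k} | s_t): returns k actions."""
    # Each action nudges the state toward the origin (purely illustrative).
    return np.tile(-0.1 * state, (k, 1))

def rollout_chunked(policy, s0, horizon, k):
    """Query the policy once every k steps and execute the chunk open-loop."""
    state = np.asarray(s0, dtype=float)
    actions, num_queries = [], 0
    chunk = None
    for t in range(horizon):
        if t % k == 0:
            chunk = policy(state, k)   # one decision covers the next k steps
            num_queries += 1
        a = chunk[t % k]
        actions.append(a)
        state = state + a              # toy additive dynamics (assumption)
    return np.array(actions), num_queries

actions, num_queries = rollout_chunked(toy_chunk_policy, [1.0, -1.0], horizon=10, k=5)
```

With `k=5` the policy is queried only twice over ten steps; with `k=1` the loop reduces to ordinary single-step behaviour cloning.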
**Temporal ensembling**
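The policy is queried at every timestep, so each timestep receives several overlapping predictions from chunks started at different times. These are combined by a weighted average with exponentially decaying weights $w_i = \exp(-m \cdot i)$, where $w_0$ is the weight of the oldest prediction; a smaller $m$ incorporates new observations faster. Algorithm 2 keeps FIFO buffers $B[0:T]$, where $B[t]$ stores the actions predicted for timestep $t$.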
![[Pasted image 20251213100041.png]]
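The ensembling step can be sketched as follows. This is a sketch under stated assumptions, not the authors' code: index $i = 0$ is taken to be the oldest prediction, `toy_chunk_policy` and the additive dynamics are hypothetical, and the default `m=0.1` is an arbitrary illustrative value.

```python
import numpy as np

def toy_chunk_policy(state, k):
    """Hypothetical chunk policy: k copies of a small step toward the origin."""
    return np.tile(-0.1 * state, (k, 1))

def ensemble_weights(n, m=0.1):
    """Normalized w_i = exp(-m * i); index 0 is the oldest prediction."""
    w = np.exp(-m * np.arange(n))
    return w / w.sum()

def temporal_ensemble_rollout(policy, s0, horizon, k, m=0.1):
    """Query every step and average all predictions made for the current step."""
    state = np.asarray(s0, dtype=float)
    # buffers[t] collects the actions predicted for timestep t, oldest first.
    buffers = [[] for _ in range(horizon + k)]
    executed = []
    for t in range(horizon):
        chunk = policy(state, k)            # shape (k, action_dim)
        for j in range(k):
            buffers[t + j].append(chunk[j])
        preds = np.stack(buffers[t])        # up to k overlapping predictions
        a = ensemble_weights(len(preds), m) @ preds
        executed.append(a)
        state = state + a                   # toy additive dynamics (assumption)
    return np.array(executed)

executed = temporal_ensemble_rollout(toy_chunk_policy, [1.0, -1.0], horizon=6, k=3)
```

Setting `m=0` gives a plain average of the overlapping predictions; larger `m` shifts weight toward the oldest one.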
> [!brainwaves] Why Chunking can help
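> Predicting $k$ actions per query shortens the effective horizon roughly $k$-fold, so there are fewer decision points at which errors can compound into drift. The price is reactivity: inside a chunk the policy does not respond to new observations unless it is re-queried, as in temporal ensembling.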
**Style Variable at Training Time**
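- Full [[Transformer]] encoder-decoder policy: input is the observation (images, joint positions) plus a latent style variable $z$ that captures multimodality in the demonstrations; output is the action chunk
- Encoder processes the observation
- Decoder generates the action sequence
- At training time only, an additional encoder infers $z$ from the joint positions and the demonstrated action sequence (CVAE-style), which gives a better training signal; at test time it is dropped and $z$ is set to the prior mean (zero)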
![[Pasted image 20251213095949.png|center|697]]
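How the style variable differs between training and test time can be sketched with the standard CVAE reparameterization. The function names are hypothetical, and `mu` and `logvar` stand for the outputs of the training-time encoder; the standard normal prior is the usual CVAE choice and is assumed here.

```python
import numpy as np

def style_variable(mu, logvar, training, rng=None):
    """Sample z during training; use the prior mean (zero) at test time."""
    if not training:
        return np.zeros_like(mu)            # extra encoder is not used at test time
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps  # reparameterization trick

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), the usual CVAE regularizer."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))

mu, logvar = np.ones(2), np.zeros(2)
z_train = style_variable(mu, logvar, training=True, rng=np.random.default_rng(0))
z_test = style_variable(mu, logvar, training=False)
```

Because `z_test` is always zero, the policy is deterministic at test time given the observation.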