NVIDIA's framework for RL-based LLM post-training. Sits on top of [[MegatronLM]] for training and integrates with inference engines like [[vLLM]] for rollouts. The orchestration layer that coordinates the rollout-train-update loop.
## What it provides
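- **Post-training recipes**: implementations of [[GRPO]], [[PPO]], DPO, RLOO, and other post-training algorithms. (DPO is offline preference optimization rather than online RL, so it does not need the rollout loop.)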
- **Rollout coordination**: spawns inference engine workers, distributes prompts, collects rollouts, sends them back to the trainer.
- **Weight synchronization**: broadcasts updated policy weights from the learner to the rollout engine each step. This is non-trivial at frontier scale because the weights run from tens to hundreds of GB (a 70B-parameter model in bf16 is about 140 GB), so the transfer can become a bottleneck.
- **Verifier integration**: scores rollouts by string match against ground truth (math), sandbox execution (code), or a reward model.
- **Async execution**: supports async-RL setups where rollout generation and training run concurrently, so rollouts can come from a slightly stale policy (policy lag).
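
To put rough numbers on the weight-synchronization point, here is a back-of-envelope sketch. The 2 bytes per parameter (bf16) and the 25 GB/s effective link bandwidth are assumptions picked for illustration, not measured NeMo-RL figures, and the helper names are hypothetical.

```python
def checkpoint_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Size of the raw weights in GB (1e9 bytes), ignoring optimizer state."""
    return num_params * bytes_per_param / 1e9


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized time to move the weights once over a single link."""
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    for name, params in [("8B", 8e9), ("70B", 70e9)]:
        size = checkpoint_size_gb(params)      # bf16: 2 bytes per parameter (assumed)
        secs = transfer_seconds(size, 25.0)    # 25 GB/s effective bandwidth (assumed)
        print(f"{name}: {size:.0f} GB, ~{secs:.2f} s per sync")
```

Under these assumptions a 70B model costs about 5.6 s per naive sync, and that cost is paid every RL step, which is why the transfer path matters.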
## Naming history
- **NeMo-Aligner**: the original name, introduced in Shen et al. 2024.
- **NeMo-RL**: the current name, used for the refactored library designed to integrate cleanly with Megatron-Core.
These are the same lineage; older papers reference NeMo-Aligner, newer ones use NeMo-RL.
## Architecture
Three cooperating components (they overlap in time only in the async setup; the synchronous loop is sequential):
1. **Learner** ([[MegatronLM]]): holds the current policy, computes log-probs and gradients.
2. **Rollout engine** ([[vLLM]]): samples completions from the policy.
3. **Verifier**: scores completions for reward.
Per RL step:
1. Learner sends weights → rollout engine.
2. Rollout engine generates completions for a batch of prompts → verifier.
3. Verifier scores completions → returns rewards to learner.
4. Learner recomputes log-probs under its own weights (the values reported by vLLM do not always match numerically, e.g. because of different kernels or precision), applies the loss, takes a gradient step.
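
The four steps can be sketched as a toy simulation. This is not NeMo-RL code: every class and function name below is hypothetical, the "policy" is a single logit choosing between two candidate answers, and the update is plain REINFORCE with a mean-reward baseline standing in for GRPO/PPO. The rounding in `load_weights` is an invented stand-in for the numerical mismatch between engine and learner.

```python
import math
import random
from dataclasses import dataclass

CANDIDATES = ["4", "5"]            # toy action space: two possible completions
GROUND_TRUTH = {"2+2=": "4"}       # toy dataset with one math prompt


def log_probs(logit: float) -> tuple[float, float]:
    """Log-probabilities of picking CANDIDATES[0] and CANDIDATES[1]."""
    return -math.log1p(math.exp(-logit)), -math.log1p(math.exp(logit))


@dataclass
class Rollout:
    prompt: str
    action: int             # index into CANDIDATES
    completion: str
    engine_logprob: float   # log-prob as reported by the rollout engine


class ToyRolloutEngine:
    """Stands in for the inference engine; samples from its copy of the weights."""

    def __init__(self, seed: int = 0):
        self.logit = 0.0
        self.rng = random.Random(seed)

    def load_weights(self, weights: dict) -> None:
        # Rounding mimics a slightly different numerical path (toy assumption).
        self.logit = round(weights["logit"], 3)

    def generate(self, prompts: list[str]) -> list[Rollout]:
        lps = log_probs(self.logit)
        rollouts = []
        for prompt in prompts:
            action = 0 if self.rng.random() < math.exp(lps[0]) else 1
            rollouts.append(Rollout(prompt, action, CANDIDATES[action], lps[action]))
        return rollouts


def verify(rollout: Rollout) -> float:
    """String-match verifier: reward 1.0 for the ground-truth answer, else 0.0."""
    return 1.0 if rollout.completion.strip() == GROUND_TRUTH[rollout.prompt] else 0.0


class ToyLearner:
    """Stands in for the trainer; the whole policy is a single logit."""

    def __init__(self, lr: float = 1.0):
        self.weights = {"logit": 0.0}
        self.lr = lr

    def update(self, rollouts: list[Rollout], rewards: list[float]) -> float:
        lps = log_probs(self.weights["logit"])   # recomputed under learner weights
        p0 = math.exp(lps[0])
        baseline = sum(rewards) / len(rewards)
        grad, max_gap = 0.0, 0.0
        for r, reward in zip(rollouts, rewards):
            max_gap = max(max_gap, abs(lps[r.action] - r.engine_logprob))
            # d/dlogit of log pi(action): (1 - p0) for action 0, -p0 for action 1
            dlogp = (1.0 - p0) if r.action == 0 else -p0
            grad += (reward - baseline) * dlogp
        self.weights["logit"] += self.lr * grad / len(rollouts)  # gradient ascent
        return max_gap


def run(num_steps: int = 50, batch_size: int = 16, seed: int = 0) -> tuple[float, float]:
    learner, engine = ToyLearner(), ToyRolloutEngine(seed)
    worst_gap = 0.0
    for _ in range(num_steps):
        engine.load_weights(learner.weights)                # 1. weight sync
        rollouts = engine.generate(["2+2="] * batch_size)   # 2. rollouts
        rewards = [verify(r) for r in rollouts]             # 3. verifier rewards
        gap = learner.update(rollouts, rewards)             # 4. recompute + step
        worst_gap = max(worst_gap, gap)
    p_correct = math.exp(log_probs(learner.weights["logit"])[0])
    return p_correct, worst_gap
```

`run()` returns the learner's final probability of the correct answer and the largest gap seen between engine-reported and recomputed log-probs. The gap is small but nonzero, which is the reason step 4 recomputes rather than trusting the engine's values.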
## Where it shows up in my notes
The speculative-decoding RL paper builds on NeMo-RL: it inserts speculative decoding into the rollout engine and leaves the rest of the orchestration unchanged.
## References
- Shen et al., _NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment_ (2024).
- NeMo-RL documentation: https://github.com/NVIDIA/NeMo-RL