Parallel Learning Beats Sequential Fine-Tuning for Autonomous Driving

Zhexi Lian, Haoran Wang, Xuerun Yan, Weimeng Lin, Xianhong Zhang, Yongyu Chen, Jia Hu · Apr 14, 2026

Researchers propose PaIR-Drive, a parallel training framework that combines imitation learning (IL) and reinforcement learning (RL) for end-to-end autonomous driving without sequential fine-tuning. Rather than using RL to refine a pretrained IL policy, the method trains both branches in parallel with separate objectives, avoiding the policy drift and performance ceilings associated with sequential fine-tuning. The approach achieves competitive results on the NAVSIMv1 and v2 benchmarks, outperforms existing RL fine-tuning methods, and even corrects suboptimal human driving behaviors.

PaIR-Drive trains imitation learning and reinforcement learning in parallel rather than sequentially, avoiding policy drift and performance plateaus
The framework uses a tree-structured trajectory neural sampler with group relative policy optimization (GRPO) to improve exploration in the RL branch
Achieves 91.2 PDMS on NAVSIMv1 and 87.9 EPDMS on NAVSIMv2, outperforming sequential RL fine-tuning approaches
Eliminates the need to retrain the RL branch when applying a new IL policy, reducing computational overhead and enabling faster iteration
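
The RL branch is described as relying on a group-relative form of policy optimization. As a rough illustration only, the sketch below shows the generic group-relative advantage used in GRPO-style methods: each sampled candidate is scored against the mean and standard deviation of its own group rather than against a learned value function. This is not the paper's implementation; the function name, the reward values, and the idea of four candidate trajectories per scene are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    Generic GRPO-style advantage (an assumption for illustration,
    not PaIR-Drive's actual code).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for four candidate trajectories sampled for one scene.
scores = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(scores)

# Index of the candidate with the largest advantage within the group.
best = max(range(len(scores)), key=lambda i: advantages[i])
```

Candidates that score above their group's average receive a positive advantage and are reinforced; those below average are discouraged. A group with identical rewards yields all-zero advantages and therefore no learning signal.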

End-to-end autonomous driving has relied on imitation learning from human demonstrations, but this approach hits a ceiling when human data quality is limited. Sequential RL fine-tuning has been the standard workaround, but it introduces instability and depends heavily on the initial IL policy. This parallel framework addresses a fundamental architectural limitation in how learning signals are combined, potentially unlocking better performance from existing datasets.

Parallel training architectures may be more effective than sequential fine-tuning for combining learning paradigms in other domains beyond autonomous driving
The framework's ability to outperform human expert behaviors suggests RL can meaningfully improve upon imitation learning without catastrophic forgetting or policy drift
Reduced retraining requirements could accelerate development cycles for teams iterating on IL baselines, lowering barriers to experimentation
