FINAL YEAR PROJECT · HKU COMPUTER SCIENCE · 2024-25
Optimization of Reinforcement Learning-Based LLM Training
Integrating EAGLE3 speculative decoding into the VeRL training pipeline to speed up the generation phase of RLHF training.
KEY NUMBERS
21.8%
Training time reduction
DP=1, TP=2 configuration
2.537
JSD acceptance length
Best among all strategies
~0.85
Validation score
Lossless across all methods
TRAINING PIPELINE
Each training step follows this four-phase loop. This project accelerates the generation phase and adds the draft update phase.
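For readers unfamiliar with speculative decoding, the following is a minimal sketch of the greedy draft-and-verify step that the accelerated generation phase builds on. It is a toy illustration, not the EAGLE3, SGLang, or VeRL implementation: the function names `verify_greedy` and `mean_acceptance_length` are hypothetical, tokens are plain integers, and "acceptance length" is assumed here to mean the number of tokens emitted per verification step (accepted draft tokens plus one token from the target model).

```python
from typing import List, Sequence, Tuple


def verify_greedy(draft: Sequence[int], target: Sequence[int]) -> List[int]:
    """Return the tokens emitted by one greedy verification step.

    draft:  k tokens proposed by the draft model.
    target: k + 1 greedy choices of the target model, where target[i] is
            the target's choice given the prefix plus draft[:i].
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must hold exactly one more token than draft")

    emitted: List[int] = []
    for d, t in zip(draft, target):
        if d != t:
            # First mismatch: keep the target's token and discard the rest.
            emitted.append(t)
            return emitted
        emitted.append(d)

    # Every draft token matched, so the target's extra token comes for free.
    emitted.append(target[-1])
    return emitted


def mean_acceptance_length(steps: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average number of tokens emitted per verification step (assumed metric)."""
    lengths = [len(verify_greedy(d, t)) for d, t in steps]
    return sum(lengths) / len(lengths)


# Hypothetical token IDs, for illustration only.
example_steps = [
    ([1, 2, 3], [1, 2, 9, 4]),  # two draft tokens accepted, then a correction
    ([1, 2, 3], [1, 2, 3, 4]),  # all three accepted, plus one bonus token
    ([5], [6, 7]),              # draft rejected immediately
]
```

Every emitted token is one the target model would have chosen on its own, which is why this scheme leaves output quality unchanged. A draft model that tracks the target more closely yields a longer average acceptance length, so fewer target forward passes are needed per generated token.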
BUILT WITH
VeRL · SGLang · EAGLE3 · Qwen3-4B · 2×A100 GPUs
SUPERVISION
This research is supervised by Prof. Chuan Wu from the Department of Computer Science at HKU, with mentorship from Guangming Sheng, a PhD researcher in Prof. Wu’s group and a core contributor to the VeRL framework.