FINAL YEAR PROJECT · HKU COMPUTER SCIENCE · 2024-25

Optimization of Reinforcement Learning-Based LLM Training

Integrating EAGLE3 speculative decoding into the VeRL training pipeline to speed up the generation phase of RLHF training.

KEY NUMBERS

21.8%

Training time reduction

DP=1, TP=2 configuration

2.537

JSD acceptance length

Best among all strategies

~0.85

Validation score

Lossless across all methods

TRAINING PIPELINE

Each training step follows a four-phase loop. This project accelerates the generation phase and adds a draft update phase.
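To make the generation-phase speedup and the "acceptance length" metric concrete, here is a minimal sketch of greedy speculative decoding. It is an illustration under stated assumptions, not the project's code: EAGLE3 drafts from the target model's hidden features and SGLang performs the verification, while this toy uses plain chain drafting with exact-match verification. All names (`speculative_step`, `generate`, `toy_target`, `toy_draft`) are hypothetical, and acceptance length is assumed here to mean the average number of tokens emitted per draft-then-verify cycle.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[Token]:
    """Run one draft-then-verify cycle and return the tokens it emits."""
    # The cheap draft model proposes k tokens autoregressively.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft_next(prefix + proposal))

    # The target model checks each proposed token in order.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and stop this cycle.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every draft token matched, so the target contributes one bonus token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(prompt: List[Token], draft_next: NextToken, target_next: NextToken,
             k: int, max_new_tokens: int) -> Tuple[List[Token], float]:
    """Generate tokens and report the mean tokens emitted per cycle."""
    out = list(prompt)
    cycles = 0
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
        cycles += 1
    emitted = len(out) - len(prompt)  # counted before truncation
    new_tokens = out[len(prompt):len(prompt) + max_new_tokens]
    return new_tokens, emitted / cycles


def toy_target(ctx: List[Token]) -> Token:
    # Stand-in for the target model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Stand-in for an imperfect draft model: wrong whenever the last token is 2, 5, or 8.
    return 0 if ctx[-1] % 3 == 2 else (ctx[-1] + 1) % 10


def greedy(prompt: List[Token], target_next: NextToken, n: int) -> List[Token]:
    # Baseline: ordinary one-token-at-a-time decoding with the target model.
    out = list(prompt)
    for _ in range(n):
        out.append(target_next(out))
    return out[len(prompt):]
```

Because every emitted token is either confirmed or supplied by the target model, the output of `generate` matches `greedy` exactly, which is the sense in which speculative decoding is lossless. A draft model that agrees with the target more often yields a higher acceptance length and therefore fewer target-model cycles per generated token.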

BUILT WITH

VeRL · SGLang · EAGLE3 · Qwen3-4B · 2×A100 GPUs

SUPERVISION

This research is supervised by Prof. Chuan Wu from the Department of Computer Science at HKU, with mentorship from Guangming Sheng, a PhD researcher in Prof. Wu’s group and a core contributor to the VeRL framework.