Abstract

Keywords

Large Language Model, Optimizer, Training, Heterogeneous memory, Distributed training.