Fast Performance Prediction and Expansion of 3D Parallelism for Distributed DNN Training
- Title: Fast Performance Prediction and Expansion of 3D Parallelism for Distributed DNN Training
- Authors: 윤유경
- Date Issued: 2024
- Publisher: Pohang University of Science and Technology (POSTECH)
- Abstract: Training large-scale DNN models requires parallel distributed training on hyper-scale systems. To make the best use of the numerous accelerators, it is essential to intelligently combine different parallelization schemes. However, as DNN models grow, the number of possible scheme combinations becomes enormous, and finding the optimal parallel plan becomes prohibitively expensive and practically infeasible. In this paper, I introduce a novel cost model, the Markovian Performance Estimator (MPE). This model provides affordable estimates of the throughput of various parallel plans, enabling efficient and fast searches for the ideal parallel plan even when resources are limited. Notably, this work is the first to explain the expensive nature of searching for an optimal plan and to address it using intuitive performance estimations based on real-device evaluations. Experiments demonstrate the effectiveness of the MPE, showing that it accelerates the optimization process by up to 126x (36.4x on average) over the existing state-of-the-art baseline, Alpa. As future work, I also propose a new search space that combines 3D parallelism with offloading, supporting LLMs larger than 3D parallelism alone can accommodate while exploiting the low communication cost of offloading.
- URI: http://postech.dcollection.net/common/orgView/200000732943
- URI: https://oasis.postech.ac.kr/handle/2014.oak/123368
- Article Type: Thesis