Open Access System for Information Sharing


Thesis
Full metadata record
Files in This Item:
There are no files associated with this item.
DC Field: Value (Language)
dc.contributor.author: 조현욱
dc.date.accessioned: 2023-08-31T16:34:01Z
dc.date.available: 2023-08-31T16:34:01Z
dc.date.issued: 2023
dc.identifier.other: OAK-2015-10164
dc.identifier.uri: http://postech.dcollection.net/common/orgView/200000660623 (ko_KR)
dc.identifier.uri: https://oasis.postech.ac.kr/handle/2014.oak/118361
dc.description: Master
dc.description.abstract: In DNN training, memory- and communication-bound operations can account for a significant portion of runtime due to the limited off-chip bandwidth (BW) of GPUs. To address this challenge, I propose a novel memory access-triggered near-data processing (mtNDP) architecture. With mtNDP, normal memory accesses also serve as implicit NDP requests, enabling NDP without any changes to the core ISA/microarchitecture, core-side software, or memory protocol and thereby overcoming the practicality limitations of prior approaches. In addition, mtNDP enables on-the-fly NDP, in which data already supplied in normal memory access packets for compute-bound operations are simultaneously used for NDP; mtNDP can thus reduce memory traffic. Moreover, by overlapping NDP kernels with compute-bound kernels, memory BW underutilized by GPU cores can be exploited by mtNDP units to improve performance even without increasing total memory BW. The mtNDP units can be deployed in heterogeneous memory devices across a system. First, I deploy them near the GPU's memory controllers; with on-the-fly mtNDP, compute-bound kernels can be overlapped with memory-bound kernels, even when they have dependencies, to achieve significant speedups. Second, the NDP units can be deployed in memory expanders connected to multiple GPUs to create an NDP-enabled memory eXpander Network (NDPXNet), which entirely offloads gradient reduction and the optimizer in data-parallel training, achieving additional speedups while eliminating redundancy in memory usage and optimizer execution. To the best of my knowledge, this work is the first to 1) enable NDP without core HW/SW changes, 2) overlap the execution of dependent layers, and 3) offload both memory- and communication-bound operations from GPUs in DNN training. Through deep learning compiler support, NDP kernels can be generated automatically without any model code modification. mtNDP improves training throughput by up to 2.83× and reduces energy consumption by up to 41.4%.
dc.language: eng
dc.publisher: 포항공과대학교 (POSTECH)
dc.title: Memory Access-Triggered Near-Data Processing for Accelerating DNN Training on GPUs
dc.title.alternative: 그래픽처리장치에서의 심층 신경망 학습 가속을 위한 메모리 접근-동작 근접-메모리 처리
dc.type: Thesis
dc.contributor.college: 컴퓨터공학과 (Department of Computer Science and Engineering)
dc.date.degree: 2023-2
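The "on-the-fly NDP" idea in the abstract (reusing the data already streamed from memory for a compute-bound operation to also feed a memory-bound reduction, so the data need not be read twice) can be sketched in software terms. This is an illustrative analogy only, not the thesis's actual hardware design; the function names and the read counter below are hypothetical.

```python
def baseline(weights, activations):
    """Two passes over `weights`: one for a GEMV (compute-bound),
    one for a squared-sum reduction (memory-bound)."""
    reads = 0
    out = [0.0] * len(weights)
    for i, row in enumerate(weights):          # pass 1: GEMV
        for w, a in zip(row, activations):
            out[i] += w * a
            reads += 1                         # one memory read per weight
    sq = 0.0
    for row in weights:                        # pass 2: re-read for reduction
        for w in row:
            sq += w * w
            reads += 1
    return out, sq, reads

def on_the_fly_ndp(weights, activations):
    """One pass: the reduction piggybacks on the GEMV's memory traffic,
    as a near-memory unit would tap the same access packets."""
    reads = 0
    out = [0.0] * len(weights)
    sq = 0.0
    for i, row in enumerate(weights):
        for w, a in zip(row, activations):
            out[i] += w * a                    # normal (core-side) compute
            sq += w * w                        # implicit NDP on the same data
            reads += 1
    return out, sq, reads

W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
out_b, sq_b, reads_b = baseline(W, x)
out_n, sq_n, reads_n = on_the_fly_ndp(W, x)
assert (out_b, sq_b) == (out_n, sq_n)   # same results
assert reads_n == reads_b // 2          # half the memory traffic here
```

In the real architecture the second operation runs on NDP units near the memory controllers or in memory expanders, so the reduction consumes no extra core cycles or off-chip bandwidth; the software model above only illustrates the traffic savings.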


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
