Open Access System for Information Sharing


Thesis
Full metadata record
Files in This Item:
There are no files associated with this item.
DC Field: Value (Language)
dc.contributor.author: 조현욱
dc.date.accessioned: 2023-08-31T16:34:01Z
dc.date.available: 2023-08-31T16:34:01Z
dc.date.issued: 2023
dc.identifier.other: OAK-2015-10164
dc.identifier.uri: http://postech.dcollection.net/common/orgView/200000660623 (ko_KR)
dc.identifier.uri: https://oasis.postech.ac.kr/handle/2014.oak/118361
dc.description: Master
dc.description.abstract: In DNN training, memory- and communication-bound operations can account for a significant portion of runtime due to the limited off-chip bandwidth (BW) of GPUs. To address this challenge, I propose a novel memory access-triggered near-data processing (mtNDP) architecture. With mtNDP, normal memory accesses also serve as implicit NDP requests, enabling NDP without any changes to the core ISA/microarchitecture, core-side software, or memory protocol and thereby overcoming the practicality limitations of prior approaches. In addition, mtNDP enables on-the-fly NDP, in which data already supplied in normal memory access packets for compute-bound operations are simultaneously used for NDP; mtNDP can thus reduce memory traffic. Moreover, by overlapping NDP kernels with compute-bound kernels, memory BW underutilized by GPU cores can be exploited by mtNDP units to improve performance even without increasing total memory BW. The mtNDP units can be deployed in heterogeneous memory devices across a system. First, I deploy them near the GPU's memory controllers; with on-the-fly mtNDP, compute-bound kernels can be overlapped with memory-bound kernels, even when they have dependencies, to achieve significant speedups. Second, the NDP units can be deployed in memory expanders connected to multiple GPUs to create an NDP-enabled memory eXpander Network (NDPXNet), which entirely offloads gradient reduction and the optimizer in data-parallel training, achieving additional speedups while eliminating redundancy in memory usage and optimizer execution. To the best of my knowledge, this work is the first to 1) enable NDP without core HW/SW changes, 2) overlap the execution of dependent layers, and 3) offload both memory- and communication-bound operations from GPUs in DNN training. Through deep learning compiler support, NDP kernels can be generated automatically without any model code modification. mtNDP improves training throughput by up to 2.83× and reduces energy consumption by up to 41.4%.
dc.language: eng
dc.publisher: 포항공과대학교 (POSTECH)
dc.title: Memory Access-Triggered Near-Data Processing for Accelerating DNN Training on GPUs
dc.title.alternative: 그래픽처리장치에서의 심층 신경망 학습 가속을 위한 메모리 접근-동작 근접-메모리 처리
dc.type: Thesis
dc.contributor.college: 컴퓨터공학과 (Department of Computer Science and Engineering)
dc.date.degree: 2023-2
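The "on-the-fly NDP" idea in the abstract (reusing the data already streamed from memory for a compute-bound operation to also feed a memory-bound reduction, so the data need not be read twice) can be sketched in software terms. This is an illustrative analogy only, not the thesis's actual hardware design; the function names and the read counter below are hypothetical.

```python
def baseline(weights, activations):
    """Two passes over `weights`: one for a GEMV (compute-bound),
    one for a squared-sum reduction (memory-bound)."""
    reads = 0
    out = [0.0] * len(weights)
    for i, row in enumerate(weights):          # pass 1: GEMV
        for w, a in zip(row, activations):
            out[i] += w * a
            reads += 1                         # one memory read per weight
    sq = 0.0
    for row in weights:                        # pass 2: re-read for reduction
        for w in row:
            sq += w * w
            reads += 1
    return out, sq, reads

def on_the_fly_ndp(weights, activations):
    """One pass: the reduction piggybacks on the GEMV's memory traffic,
    as a near-memory unit would tap the same access packets."""
    reads = 0
    out = [0.0] * len(weights)
    sq = 0.0
    for i, row in enumerate(weights):
        for w, a in zip(row, activations):
            out[i] += w * a                    # normal (core-side) compute
            sq += w * w                        # implicit NDP on the same data
            reads += 1
    return out, sq, reads

W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
out_b, sq_b, reads_b = baseline(W, x)
out_n, sq_n, reads_n = on_the_fly_ndp(W, x)
assert (out_b, sq_b) == (out_n, sq_n)   # same results
assert reads_n == reads_b // 2          # half the memory traffic here
```

In the real architecture the second operation runs on NDP units near the memory controllers or in memory expanders, so the reduction consumes no extra core cycles or off-chip bandwidth; the software model above only illustrates the traffic savings.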


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
