Open Access System for Information Sharing



A Fast, Detailed Simulation Methodology for Designing AI Accelerators

Authors
양원혁 (Wonhyuk Yang)
Date Issued
2024
Publisher
Pohang University of Science and Technology (POSTECH)
Abstract
As DNNs are widely adopted across application domains with ever-growing compute and memory demands, designing efficient, high-performance NPUs (Neural Processing Units) is becoming increasingly important. However, existing architectural NPU simulators lack support for high-speed simulation, multi-core modeling, multi-tenant scenarios, detailed DRAM/NoC modeling, integrated AI compilers, and/or different deep learning frameworks. This work proposes two new simulation methodologies, ONNXim and PyTorchSim, to address these limitations.

ONNXim is a fast, cycle-level NPU simulator that supports multi-core NPUs, multi-tenancy, and detailed modeling of shared DRAM and NoC resources. We leverage ONNX (Open Neural Network Exchange) [1], an open standard format for describing deep learning models implemented in different frameworks (e.g., PyTorch and TensorFlow), because it is currently one of the most widely used formats for DNN model conversion. For example, ONNX is the recommended input format for TensorRT, which optimizes inference for NVIDIA GPUs [2]. By using ONNX graphs as its input format, our simulator can easily run DNNs implemented in various frameworks. ONNXim registers itself as an execution provider in the ONNX Runtime, similar to devices such as CPUs and GPUs, to exploit its graph optimization flow. It currently supports commonly used operator fusions and can be easily extended to study the impact of various optimization techniques. PyTorchSim is a new frontend for ONNXim that reuses the same NPU core model. It extends PyTorch's compiler and defines a new IR for NPUs, the tile operation graph IR, which allows the NPU simulator to execute the optimized IR without any code changes to the DNN model.
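The abstract does not describe ONNXim's internals, but the execution-provider idea it mentions can be illustrated abstractly: a backend claims the graph nodes it knows how to run and dispatches each node to a per-operator handler. The sketch below is a minimal, self-contained illustration of that dispatch pattern; the operator set, function names, and graph representation are hypothetical and are not ONNXim's actual API.

```python
# Minimal sketch of graph-node dispatch, the way an ONNX Runtime execution
# provider claims and executes nodes it supports. All names are illustrative.

# Handlers for the operators this toy "provider" supports.
SUPPORTED_OPS = {
    "MatMul": lambda a, b: [[sum(x * y for x, y in zip(row, col))
                             for col in zip(*b)] for row in a],
    "Relu":   lambda x: [[max(0, v) for v in row] for row in x],
}

def run_graph(nodes, tensors):
    """Execute nodes (op_type, input_names, output_name) in order,
    rejecting any operator without a registered handler."""
    for op, in_names, out_name in nodes:
        if op not in SUPPORTED_OPS:
            raise NotImplementedError(f"no handler for {op}")
        args = [tensors[n] for n in in_names]
        tensors[out_name] = SUPPORTED_OPS[op](*args)
    return tensors

# Example: y = Relu(MatMul(x, w))
graph = [("MatMul", ["x", "w"], "h"), ("Relu", ["h"], "y")]
result = run_graph(graph, {"x": [[1, -2]], "w": [[1, 0], [0, 1]]})
print(result["y"])  # [[1, 0]]
```

In a real runtime the provider would also participate in graph partitioning and optimization passes; here the point is only that an ONNX-style graph, once in a standard format, can be executed by any backend that registers handlers for its operators.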
PyTorchSim exploits the deterministic computation time of NPUs: its systolic arrays and vector units simply wait for a pre-calculated execution time rather than being modeled in detail. The pre-calculated execution time is obtained by simulating the generated RISC-V code once in advance on the gem5 simulator, after which it is reused as a fixed value in subsequent NPU simulations. This enables fast simulation speeds. To validate the generated code, it was executed on the RISC-V Spike simulator and the resulting values were checked for correctness. Consequently, ONNXim is significantly faster than existing simulators (e.g., up to 384× over Accel-Sim) and enables case studies, such as multi-tenant NPUs, that were previously impractical due to slow simulation speed and/or missing functionality. PyTorchSim is still under development and currently generates code for elementwise and reduce operations; for this generated code, it achieves simulation speeds up to 116× faster than Accel-Sim.
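The pre-calculated-latency scheme above amounts to a simple pay-once, replay-many-times cache. As a rough sketch, assuming hypothetical kernel names and a stand-in cost table in place of real gem5 runs:

```python
# Sketch of the pre-calculated-latency idea: each tile kernel is timed once
# on a detailed simulator (gem5 in the thesis), then the cached cycle count
# is replayed as a fixed delay in later runs. The cost table and kernel
# names below are hypothetical stand-ins, not PyTorchSim's real interface.

import functools

def detailed_simulation(kernel_id: str) -> int:
    """Stand-in for one expensive detailed-simulator run returning cycles."""
    COST_TABLE = {"tile_matmul_128x128": 16384, "tile_relu_128": 128}
    return COST_TABLE[kernel_id]

@functools.lru_cache(maxsize=None)
def kernel_cycles(kernel_id: str) -> int:
    # First call pays for the detailed simulation; later calls are free
    # lookups, which is what deterministic NPU timing makes safe.
    return detailed_simulation(kernel_id)

def simulate_trace(trace) -> int:
    """Sum fixed per-kernel latencies to get total execution cycles."""
    return sum(kernel_cycles(k) for k in trace)

total = simulate_trace(["tile_matmul_128x128", "tile_relu_128",
                        "tile_matmul_128x128"])
print(total)  # 16384 + 128 + 16384 = 32896
```

The key assumption, which the abstract states, is determinism: because an NPU kernel's execution time does not vary run to run, one detailed measurement can safely stand in for all subsequent simulations, turning a slow cycle-accurate model into a fast fixed-delay model.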
URI
http://postech.dcollection.net/common/orgView/200000806172
https://oasis.postech.ac.kr/handle/2014.oak/124033
Article Type
Thesis
Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
