Open Access System for Information Sharing



A Fast, Detailed Simulation Methodology for Designing AI Accelerators

Authors
양원혁 (Wonhyuk Yang)
Date Issued
2024
Publisher
Pohang University of Science and Technology (POSTECH)
Abstract
As DNNs are widely adopted across application domains with ever-growing compute and memory demands, designing efficient, high-performance NPUs (Neural Processing Units) is becoming increasingly important. However, existing architectural NPU simulators lack support for high-speed simulation, multi-core modeling, multi-tenant scenarios, detailed DRAM/NoC modeling, integrated AI compilers, and/or different deep learning frameworks. This work proposes two new simulation methodologies, ONNXim and PyTorchSim, to address these limitations.

ONNXim is a fast, cycle-level NPU simulator that supports multi-core NPUs, multi-tenancy, and detailed modeling of shared DRAM and NoC resources. We leverage ONNX (Open Neural Network Exchange) [1], an open standard format for describing deep learning models implemented in different frameworks (e.g., PyTorch and TensorFlow), because it is currently one of the most widely used formats for DNN model conversion. For example, ONNX is the recommended input format for TensorRT, which optimizes inference for NVIDIA GPUs [2]. By using ONNX graphs as its input format, our simulator can easily run DNNs implemented in various frameworks. ONNXim registers itself as an execution provider in the ONNX Runtime, similar to devices such as CPUs and GPUs, to exploit its graph optimization flow. It currently supports commonly used operator fusions and can be easily extended to study the impact of various optimization techniques. PyTorchSim is a new frontend for ONNXim that reuses the same NPU core model. It extends PyTorch's compiler and defines a new IR for NPUs, the tile operation graph IR, which allows the NPU simulator to execute the optimized IR without any code changes to the DNN model.
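The abstract does not describe ONNXim's internals, but the execution-provider idea it mentions can be illustrated abstractly: a backend claims the graph nodes it knows how to run and dispatches each node to a per-operator handler. The sketch below is a minimal, self-contained illustration of that dispatch pattern; the operator set, function names, and graph representation are hypothetical and are not ONNXim's actual API.

```python
# Minimal sketch of graph-node dispatch, the way an ONNX Runtime execution
# provider claims and executes nodes it supports. All names are illustrative.

# Handlers for the operators this toy "provider" supports.
SUPPORTED_OPS = {
    "MatMul": lambda a, b: [[sum(x * y for x, y in zip(row, col))
                             for col in zip(*b)] for row in a],
    "Relu":   lambda x: [[max(0, v) for v in row] for row in x],
}

def run_graph(nodes, tensors):
    """Execute nodes (op_type, input_names, output_name) in order,
    rejecting any operator without a registered handler."""
    for op, in_names, out_name in nodes:
        if op not in SUPPORTED_OPS:
            raise NotImplementedError(f"no handler for {op}")
        args = [tensors[n] for n in in_names]
        tensors[out_name] = SUPPORTED_OPS[op](*args)
    return tensors

# Example: y = Relu(MatMul(x, w))
graph = [("MatMul", ["x", "w"], "h"), ("Relu", ["h"], "y")]
result = run_graph(graph, {"x": [[1, -2]], "w": [[1, 0], [0, 1]]})
print(result["y"])  # [[1, 0]]
```

In a real runtime the provider would also participate in graph partitioning and optimization passes; here the point is only that an ONNX-style graph, once in a standard format, can be executed by any backend that registers handlers for its operators.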
PyTorchSim exploits the deterministic computation time of NPUs: its systolic arrays and vector units simply wait for a pre-calculated execution time rather than being modeled in detail. The pre-calculated execution time is obtained by simulating the generated RISC-V code once in advance on the gem5 simulator, after which it is reused as a fixed value in subsequent NPU simulations. This enables fast simulation speeds. To validate the generated code, it was executed on the RISC-V Spike simulator and the resulting values were checked for correctness. Consequently, ONNXim is significantly faster than existing simulators (e.g., up to 384× over Accel-Sim) and enables case studies, such as multi-tenant NPUs, that were previously impractical due to slow simulation speed and/or missing functionality. PyTorchSim is still under development and currently generates code for elementwise and reduce operations; for this generated code, it achieves simulation speeds up to 116× faster than Accel-Sim.
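The pre-calculated-latency scheme above amounts to a simple pay-once, replay-many-times cache. As a rough sketch, assuming hypothetical kernel names and a stand-in cost table in place of real gem5 runs:

```python
# Sketch of the pre-calculated-latency idea: each tile kernel is timed once
# on a detailed simulator (gem5 in the thesis), then the cached cycle count
# is replayed as a fixed delay in later runs. The cost table and kernel
# names below are hypothetical stand-ins, not PyTorchSim's real interface.

import functools

def detailed_simulation(kernel_id: str) -> int:
    """Stand-in for one expensive detailed-simulator run returning cycles."""
    COST_TABLE = {"tile_matmul_128x128": 16384, "tile_relu_128": 128}
    return COST_TABLE[kernel_id]

@functools.lru_cache(maxsize=None)
def kernel_cycles(kernel_id: str) -> int:
    # First call pays for the detailed simulation; later calls are free
    # lookups, which is what deterministic NPU timing makes safe.
    return detailed_simulation(kernel_id)

def simulate_trace(trace) -> int:
    """Sum fixed per-kernel latencies to get total execution cycles."""
    return sum(kernel_cycles(k) for k in trace)

total = simulate_trace(["tile_matmul_128x128", "tile_relu_128",
                        "tile_matmul_128x128"])
print(total)  # 16384 + 128 + 16384 = 32896
```

The key assumption, which the abstract states, is determinism: because an NPU kernel's execution time does not vary run to run, one detailed measurement can safely stand in for all subsequent simulations, turning a slow cycle-accurate model into a fast fixed-delay model.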
URI
http://postech.dcollection.net/common/orgView/200000806172
https://oasis.postech.ac.kr/handle/2014.oak/124033
Article Type
Thesis
Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
