Timing Analysis of CNN Inference on GPUs
- Title
- Timing Analysis of CNN Inference on GPUs
- Authors
- 현봉준
- Date Issued
- 2018
- Publisher
- 포항공과대학교 (Pohang University of Science and Technology, POSTECH)
- Abstract
- Recent real-time systems such as autonomous cars and robots run convolutional neural networks (CNNs) on GPUs for tasks such as image classification and pedestrian detection. Designing CNN models that meet real-time constraints on a GPU is important for these systems, because missing a deadline can cause a disaster such as a collision with a pedestrian. However, existing timing analyses for GPU applications require the PTX code of each application, which means time-consuming analysis of both software and hardware whenever a CNN application is modified. This work proposes a timing analysis that predicts the average inference time and the Worst-Case Execution Time (WCET) of CNN applications on various GPUs without PTX code, and advises programmers on CNN model features such as the number and size of layers and filters. It analyzes a GPU execution model that reflects processor contention and memory contention delays, then simplifies that model into a single equation by exploiting features common to CNN applications. The equation uses only the numbers of computation and memory instructions, which are inferred from a given CNN model, and architectural constants obtained by profiling an application written with a CNN framework on the target GPU. Applied to three CNN applications on two different GPUs, the equation achieves a 12.56% root mean square error for average inference time, while every WCET estimate remained a safe upper bound.
- URI
- http://postech.dcollection.net/common/orgView/200000115834
- https://oasis.postech.ac.kr/handle/2014.oak/93422
- Article Type
- Thesis
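The abstract does not state the thesis's equation. It says only that the prediction uses counts of computation and memory instructions plus profiled architectural constants, and that accuracy is reported as a root mean square error. The following is a minimal sketch under explicit assumptions. The linear form, the names `ArchConstants`, `predict_time`, `relative_rmse_percent` and `wcet_is_safe`, and all constant values are hypothetical illustrations, not the author's model. The error is computed as the RMS of per-sample relative errors, which is one common way to express RMSE as a percentage.

```python
import math
from dataclasses import dataclass


@dataclass
class ArchConstants:
    """Hypothetical per-GPU constants, assumed to be obtained by profiling."""
    sec_per_comp: float  # assumed cost of one computation instruction (s)
    sec_per_mem: float   # assumed cost of one memory instruction (s)
    overhead: float      # assumed fixed framework/launch overhead (s)


def predict_time(n_comp: int, n_mem: int, arch: ArchConstants) -> float:
    """Illustrative linear estimate; NOT the equation derived in the thesis."""
    return arch.sec_per_comp * n_comp + arch.sec_per_mem * n_mem + arch.overhead


def relative_rmse_percent(predicted: list[float], measured: list[float]) -> float:
    """Root mean square of per-sample relative errors, in percent."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need equal-length, non-empty inputs")
    squared = [((p - m) / m) ** 2 for p, m in zip(predicted, measured)]
    return 100.0 * math.sqrt(sum(squared) / len(squared))


def wcet_is_safe(wcet_bound: float, observed: list[float]) -> bool:
    """A WCET estimate is safe if no observed execution time exceeds it."""
    return all(t <= wcet_bound for t in observed)


if __name__ == "__main__":
    # Made-up constants and instruction counts, for illustration only.
    gpu = ArchConstants(sec_per_comp=1e-9, sec_per_mem=4e-9, overhead=1e-3)
    estimate = predict_time(n_comp=2_000_000, n_mem=500_000, arch=gpu)
    print(f"predicted inference time: {estimate:.4f} s")
    print(f"relative RMSE: {relative_rmse_percent([1.1, 0.9], [1.0, 1.0]):.2f}%")
    print("WCET safe:", wcet_is_safe(0.02, [0.010, 0.015, 0.019]))
```

Keeping computation and memory costs as separate constants mirrors the abstract's split between the two instruction types. The contention effects the thesis models would be folded into those constants or into extra terms that this sketch does not attempt to reproduce.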
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.