Open Access System for Information Sharing

Thesis

Timing Analysis of CNN Inference on GPUs

Abstract: Recent real-time systems such as autonomous cars and robots run convolutional neural networks (CNNs) on GPUs for tasks such as image classification and pedestrian detection. Designing CNN models that satisfy real-time constraints on a GPU is important for these systems, because missing a deadline can cause a serious accident such as a collision with a pedestrian. However, existing timing analyses for GPU applications require the PTX code of the application, so software and hardware must be re-analyzed, at considerable manual cost, whenever a CNN application is modified. This work proposes a timing analysis that predicts the average inference time and the worst-case execution time (WCET) of CNN applications on various GPUs without PTX code, and that advises programmers on features of CNN models such as the numbers and sizes of layers and filters. The work analyzes a GPU execution model that reflects processor contention and memory contention delays, and simplifies the model into a single equation by exploiting features common to CNN applications. The equation uses only the numbers of computation and memory instructions, which are inferred from a given CNN model, and architectural constants, which are obtained from profiling an application written with a CNN framework on a GPU. Applied to three CNN applications on two different GPUs, the proposed equation achieves a 12.56% root mean square error for average inference time while always satisfying the WCET results.
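
As a rough illustration of the idea in the abstract (instruction counts inferred from the CNN model, combined with profiled architectural constants), the following Python sketch estimates inference time with a simple linear model. It is not the equation proposed in the thesis, which this record does not state. The names `ConvLayer`, `conv_instruction_counts`, and `estimate_inference_time`, the naive counting rule, and the constants `t_compute`, `t_memory`, and `t_fixed` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Hypothetical description of one convolution layer (square kernel)."""
    in_channels: int
    out_channels: int
    kernel_size: int
    out_height: int
    out_width: int


def conv_instruction_counts(layer):
    """Return (compute_ops, memory_ops) for one layer under a naive count.

    Assumption: one multiply-accumulate per (output element, input channel,
    kernel position); each multiply-accumulate reads one input value and one
    weight, and each output element is written once. No caching is modeled.
    """
    outputs = layer.out_channels * layer.out_height * layer.out_width
    macs_per_output = layer.in_channels * layer.kernel_size ** 2
    compute_ops = outputs * macs_per_output
    memory_ops = 2 * compute_ops + outputs
    return compute_ops, memory_ops


def estimate_inference_time(layers, t_compute, t_memory, t_fixed=0.0):
    """Linear estimate: fixed overhead plus per-instruction costs.

    t_compute and t_memory stand in for architectural constants that would
    be fitted from profiling runs on a specific GPU (assumed, not measured).
    """
    total_compute = 0
    total_memory = 0
    for layer in layers:
        compute_ops, memory_ops = conv_instruction_counts(layer)
        total_compute += compute_ops
        total_memory += memory_ops
    return t_fixed + t_compute * total_compute + t_memory * total_memory


if __name__ == "__main__":
    model = [ConvLayer(3, 8, 3, 4, 4)]
    # Placeholder constants in arbitrary time units, chosen only for the demo.
    print(estimate_inference_time(model, t_compute=1.0, t_memory=2.0, t_fixed=10.0))
```

In such a sketch, changing the number or size of layers and filters changes only the inferred counts, while the constants would have to be re-profiled for each GPU.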

URI: http://postech.dcollection.net/common/orgView/200000115834
https://oasis.postech.ac.kr/handle/2014.oak/93422
