Open Access System for Information Sharing


Title
Processor Performance Modeling Using Representative Stall-Event Stacks
Authors
장한휘
Date Issued
2019
Publisher
Pohang University of Science and Technology (POSTECH)
Abstract
Processor microarchitectures have grown steadily more sophisticated to meet increasing compute demands. Processor designs have moved from simple, in-order cores to complex, out-of-order cores that exploit the hidden instruction-level parallelism of workloads. In addition, the advent of multi-core and heterogeneous processors, introduced to overcome the power wall and the end of Dennard scaling, has further increased the complexity of processor designs. Designing a high-performance processor is therefore an extremely challenging task. To find an optimal design, computer architects need to evaluate many candidate designs. Because each design's overall performance is determined by many performance-critical events that interact inside the processor, architects rely on accurate but slow cycle-level simulators to evaluate them. From a series of slow simulations, they identify the critical bottlenecks of the current design, choose a next design that resolves those bottlenecks, and repeat the process until they find the optimal one. However, the simulations are so slow that architects cannot evaluate every design point within the development schedule. To reduce the simulation overhead, they can use performance models that predict the performance of designs and so cut the number of simulations required. Nonetheless, existing methods are still either slow or inaccurate, so architects end up with a sub-optimal design. In this thesis, we propose a fast and accurate performance modeling methodology, Representative Stall-Event Stacks. The key idea is to consider all key performance bottlenecks of the current design, including the hidden secondary bottlenecks as well as the critical one, when predicting the performance of design changes. Based on this model, we introduce two design exploration methods, RpStacks and RpStacks-MT, for out-of-order superscalar processors and multi-core processors, respectively. When exploring 1,000 design points, RpStacks achieves a 26x speedup over a cycle-level simulator. RpStacks-MT is 88 times faster than the simulator on multi-core design space exploration for 10,000 design points.
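The key idea stated in the abstract, keeping hidden secondary bottlenecks alongside the critical one, can be illustrated with a small sketch. This is not the thesis's implementation: it assumes each representative stack is a table of stall-event counts for one execution path, and that a design point is a table of per-event latencies. The event names, counts, and latencies below are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation, for illustration only.
Stack = Dict[str, int]      # stall event -> occurrences on one execution path
Latencies = Dict[str, int]  # stall event -> penalty in cycles for one design point


def path_cycles(stack: Stack, latencies: Latencies) -> int:
    """Cycles for one path: sum of count * latency over its stall events."""
    return sum(count * latencies[event] for event, count in stack.items())


def predict_cycles(stacks: List[Stack], latencies: Latencies) -> int:
    """Predicted execution time: the longest path under the given latencies."""
    return max(path_cycles(stack, latencies) for stack in stacks)


stacks = [
    {"base": 100, "l2_miss": 10, "branch_mispredict": 2},  # critical path in the baseline
    {"base": 100, "l2_miss": 2, "branch_mispredict": 12},  # hidden secondary bottleneck
]
baseline = {"base": 1, "l2_miss": 20, "branch_mispredict": 10}
faster_l2 = {"base": 1, "l2_miss": 5, "branch_mispredict": 10}  # candidate design change

baseline_cycles = predict_cycles(stacks, baseline)   # first path dominates
faster_l2_cycles = predict_cycles(stacks, faster_l2)  # second path now dominates
critical_only = path_cycles(stacks[0], faster_l2)     # estimate that ignores the second path
```

In this toy setting, an estimate built only from the baseline's critical path would be too optimistic for the faster-L2 design, because the branch-heavy path becomes the new bottleneck. Evaluating a new design point only requires recomputing a few sums, with no new simulation.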
URI
http://postech.dcollection.net/common/orgView/200000175795
https://oasis.postech.ac.kr/handle/2014.oak/111840
Article Type
Thesis
Files in This Item:
There are no files associated with this item.
