Open Access System for Information Sharing

Department of Computer Science & Engineering (컴퓨터공학과) 4. Theses_Master

Thesis

Handling Outliers in Data Distributions to Minimize Quantization Error in Neural Networks

Title: Handling Outliers in Data Distributions to Minimize Quantization Error in Neural Networks (신경망의 양자화 오차를 최소화 하기 위한 데이터 분포의 극단치 처리)

Authors: 장영상

Date Issued: 2023

Publisher: Pohang University of Science and Technology (포항공과대학교)

Abstract: In this thesis, we investigate range clamping as a way to reduce quantization error in deep neural network models. We find that range clamping reduces quantization error, and on that basis we present two quantization algorithms that mitigate the accuracy degradation of quantized models by clamping value ranges. Both algorithms limit the impact of quantization on model performance by selecting and restricting the range of values that enters the quantization process. In Chapter 2, we show analytically, using a method inspired by ACIQ, that the quantization error of a clamped normal distribution is always smaller than that of the unclamped normal distribution. This establishes the effectiveness of range clamping in reducing quantization error. In Chapter 3, we propose an improved version of the existing mixed-precision-based outlier-aware quantization (OLQuant) method. We introduce an additional parameter for range clamping, which increases accuracy. We also reformulate OLQuant as a differentiable operation, enabling end-to-end training through backpropagation with a BitOPs-based performance loss. This allows the optimal outlier ratio for each layer, previously a hyperparameter in OLQuant, to be determined automatically. Our approach makes it possible to explore the Pareto-optimal curve between accuracy and BitOPs, which was not possible with other existing quantization algorithms. In Chapter 4, we propose Saturating Nonlinearity (SatNL), which increases the robustness of neural network models to quantization by clamping the range of full-precision weights. Through experiments, we show that SatNL reduces post-training quantization (PTQ) error and makes networks more robust to changes in quantization parameters (quantization scale, bit width).
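This study analyzes how clamping the value range can reduce quantization error and, based on that analysis, proposes two algorithms that reduce quantization error. Chapter 1 examines how range clamping reduces quantization error: using a method inspired by ACIQ, it proves that the quantization error of a clamped Gaussian distribution is always smaller than that of an ordinary (unclamped) Gaussian distribution. This shows that clamping the range of the values to be quantized reduces the error introduced by quantization. Building on this, Chapter 2 improves outlier-aware quantization (OLQuant), an existing mixed-precision-based quantization method. First, following the result proven earlier, an additional parameter is introduced to reduce quantization error through range clamping, and this is shown to further increase accuracy. In addition, OLQuant itself is reformulated as a differentiable operation, so that it can be trained by end-to-end backpropagation with a BitOPs-based performance loss. The optimal outlier ratio for each layer, which was a hyperparameter in the original OLQuant, can now be found automatically from the perspective of the whole network, and higher accuracy was achieved with fewer BitOPs than the original operation. This idea makes it possible to explore the Pareto-optimal curve between accuracy and BitOPs, which other existing quantization algorithms could not explore. Chapter 3 proposes Saturating Nonlinearity, a method that increases a neural network's own robustness to quantization by restricting the range of full-precision values. Experiments show that this not only reduces PTQ error but also lets the full-precision network internalize robustness to changes in quantization parameters (quantization scale, bit-width).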
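
The basic effect described in the abstract, that restricting the value range can lower quantization error, can be illustrated numerically. The sketch below is not the thesis's derivation or either of its algorithms: it is a minimal Monte Carlo check, assuming a symmetric uniform quantizer, a unit-variance Gaussian, a 4-bit grid, and an arbitrarily chosen clip threshold of 2.5. The names `quantize_uniform` and `quantization_mse` are hypothetical helpers introduced only for this illustration.

```python
import random


def quantize_uniform(x, clip, bits):
    """Clamp x to [-clip, clip], then round it to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2 * clip / (levels - 1)
    clamped = max(-clip, min(clip, x))
    index = round((clamped + clip) / step)
    return -clip + index * step


def quantization_mse(samples, clip, bits):
    """Mean squared error between the original samples and their quantized values.

    The error is measured against the unclamped samples, so it includes both
    the rounding error and the distortion caused by clamping itself.
    """
    return sum((x - quantize_uniform(x, clip, bits)) ** 2 for x in samples) / len(samples)


# Assumed setup: 20,000 draws from N(0, 1), fixed seed for repeatability.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Baseline: the quantization grid spans the full observed range, outliers included.
full_range = max(abs(x) for x in samples)
mse_full = quantization_mse(samples, full_range, bits=4)

# Clamped: the grid spans only [-2.5, 2.5]; the threshold is an arbitrary choice.
mse_clamped = quantization_mse(samples, 2.5, bits=4)

print(f"full-range MSE: {mse_full:.5f}")
print(f"clamped MSE:    {mse_clamped:.5f}")
```

With the same number of levels, the clamped grid has a finer step, and under these assumptions the extra distortion from clamping the rare tail values is smaller than the rounding error saved. Choosing the threshold well is the part the thesis addresses; a threshold that is too small would let the clamping distortion dominate.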

URI: http://postech.dcollection.net/common/orgView/200000662236
https://oasis.postech.ac.kr/handle/2014.oak/118345

Article Type: Thesis

Files in This Item: There are no files associated with this item.
