Open Access System for Information Sharing

Article
Cited 1 time in Web of Science; cited 0 times in Scopus.
Full metadata record
Files in This Item:
There are no files associated with this item.
DC Field | Value | Language
dc.contributor.author | Moon, Seunghyun | -
dc.contributor.author | Mun, Han-Gyeol | -
dc.contributor.author | Son, Hyunwoo | -
dc.contributor.author | Sim, Jae-Yoon | -
dc.date.accessioned | 2024-02-19T06:20:07Z | -
dc.date.available | 2024-02-19T06:20:07Z | -
dc.date.created | 2024-02-19 | -
dc.date.issued | 2024-01 | -
dc.identifier.issn | 0018-9200 | -
dc.identifier.uri | https://oasis.postech.ac.kr/handle/2014.oak/120276 | -
dc.description.abstract | Various pruning and quantization heuristics have been proposed to compress recent deep-learning models. However, the rapid development of new optimization techniques makes it difficult for domain-specific accelerators to efficiently process various models showing irregularly stored parameters or nonlinear quantization. This article presents a scalable-precision deep-learning accelerator that supports multiply-and-accumulate operations (MACs) with two arbitrarily quantized data sequences. The proposed accelerator includes three main features. To minimize logic overhead when processing arbitrarily quantized 8-bit precision data, a lookup table (LUT)-based runtime reconfiguration is proposed. The use of bit-serial execution without unnecessary computations enables the multiplication of data with non-equal precision while minimizing logic and latency waste. Furthermore, two distinct data formats, raw and run-length compressed, are supported by a zero-eliminator (ZE) and runtime-density detector (RDD) that are compatible with both formats, delivering enhanced storage and performance. For a precision range of 1-8 bit and fixed sparsity of 30%, the accelerator implemented in 28 nm low-power (LP) CMOS shows a peak performance of 0.87-5.55 TOPS and a power efficiency of 15.1-95.9 TOPS/W. The accelerator supports processing with arbitrary quantization (AQ) while achieving state-of-the-art (SOTA) power efficiency. | -
dc.language | English | -
dc.publisher | Institute of Electrical and Electronics Engineers | -
dc.relation.isPartOf | IEEE Journal of Solid-State Circuits | -
dc.title | Multipurpose Deep-Learning Accelerator for Arbitrary Quantization With Reduction of Storage, Logic, and Latency Waste | -
dc.type | Article | -
dc.identifier.doi | 10.1109/jssc.2023.3312615 | -
dc.type.rims | ART | -
dc.identifier.bibliographicCitation | IEEE Journal of Solid-State Circuits, v.59, no.1, pp.143 - 156 | -
dc.identifier.wosid | 001088286600001 | -
dc.citation.endPage | 156 | -
dc.citation.number | 1 | -
dc.citation.startPage | 143 | -
dc.citation.title | IEEE Journal of Solid-State Circuits | -
dc.citation.volume | 59 | -
dc.contributor.affiliatedAuthor | Sim, Jae-Yoon | -
dc.identifier.scopusid | 2-s2.0-85174825344 | -
dc.description.journalClass | 1 | -
dc.description.isOpenAccess | N | -
dc.type.docType | Article | -
dc.subject.keywordAuthor | Arbitrary quantization (AQ) | -
dc.subject.keywordAuthor | bit-serial processing | -
dc.subject.keywordAuthor | deep neural network (DNN) accelerator | -
dc.subject.keywordAuthor | lookup table (LUT) | -
dc.subject.keywordAuthor | precision scalability | -
dc.subject.keywordAuthor | run-length compression (RLC) | -
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Engineering | -
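
The abstract above outlines three mechanisms: lookup-table (LUT)-based runtime reconfiguration for arbitrary quantization, bit-serial execution that skips unnecessary computation, and run-length compression of sparse data via a zero-eliminator. The Python sketch below is only an illustrative, software-level model of how these ideas compose; it is not the paper's hardware or reference code, and every name in it (bit_serial_multiply, lut_bit_serial_dot, run_length_compress, the LUT contents) is hypothetical.

```python
# Hypothetical software model of the three ideas in the abstract; all names
# and LUT contents are invented for illustration, and values are assumed to
# be non-negative fixed-point integers for simplicity.

def bit_serial_multiply(a: int, b: int) -> int:
    """Multiply by streaming the bits of `a` LSB-first, adding a shifted
    copy of `b` only when the current bit is 1; zero bits cost nothing."""
    acc = 0
    shift = 0
    while a:
        if a & 1:
            acc += b << shift
        a >>= 1
        shift += 1
    return acc


def lut_bit_serial_dot(w_codes, a_codes, w_lut, a_lut):
    """Dot product of two arbitrarily quantized sequences: each code is first
    mapped to a value through its (runtime-loadable) LUT, so nonlinear
    quantization grids are handled, then multiplied bit-serially.
    Operands that dequantize to zero are eliminated entirely."""
    acc = 0
    for wc, ac in zip(w_codes, a_codes):
        w, a = w_lut[wc], a_lut[ac]
        if w == 0 or a == 0:
            continue
        acc += bit_serial_multiply(a, w)
    return acc


def run_length_compress(values):
    """Zero-eliminator view of a sparse stream: keep only nonzero values,
    each paired with the number of zeros that preceded it."""
    out, zeros = [], 0
    for v in values:
        if v == 0:
            zeros += 1
        else:
            out.append((zeros, v))
            zeros = 0
    return out


# Example with a hypothetical 2-bit weight grid and 3-bit activation grid.
w_lut = [0, 1, 3, 7]                    # nonlinear (non-power-of-two) levels
a_lut = [0, 1, 2, 4, 6, 9, 13, 20]
w_codes = [2, 0, 3, 1]
a_codes = [5, 7, 0, 2]
print(lut_bit_serial_dot(w_codes, a_codes, w_lut, a_lut))  # 3*9 + 0 + 0 + 1*2 = 29
print(run_length_compress([0, 0, 9, 0, 7]))                # [(2, 9), (1, 7)]
```

In the actual design described in the abstract, the LUT contents would be reloaded at runtime for each model's quantization scheme, and the zero-eliminator and runtime-density detector would choose between raw and run-length-compressed storage; the sketch only mirrors the arithmetic behavior, not the hardware.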

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Sim, Jae-Yoon (심재윤), Dept. of Electrical Engineering
