Open Access System for Information Sharing

Department of Computer Science & Engineering (컴퓨터공학과) 4. Theses_Master

Thesis

Implementation of Elliptic Curve Arithmetic on a GPU Using CUDA

Title: Implementation of Elliptic Curve Arithmetic on a GPU Using CUDA (CUDA를 이용한 GPU 상에서의 타원곡선 연산 구현)

Authors: 양연형

Date Issued: 2010

Publisher: 포항공과대학교 (Pohang University of Science and Technology)

Abstract: As GPU (Graphics Processing Unit) performance has steadily improved, modern GPUs now surpass modern CPUs in the number of instructions executed per second. CPUs have evolved to reduce latency for one or a few tasks, whereas GPUs have evolved to raise throughput across many concurrent tasks. Today's CPUs typically have one or a few processor cores (four in recently sold models), while GPUs have tens or hundreds. However, GPUs have comparatively weak cache mechanisms, so they perform poorly on single-threaded work. The real power of a GPU's many-core architecture appears only on highly parallel workloads. Running an arbitrary task efficiently on a GPU therefore requires a specially tailored algorithm, which is challenging to design.

In this thesis, I describe an implementation of elliptic curve arithmetic over a binary field on a CUDA-enabled GPU. Several related earlier results exist, but all of them use a prime field as the underlying field of the elliptic curve. I show that an elliptic curve over a binary field can outperform [1], the best-performing of those earlier results.

The implementation runs on an NVIDIA GeForce 9600 GT graphics card and performs about 5400 scalar multiplications per second with the following parameters: a 233-bit binary field, a 160-bit scalar, and 8 multiprocessors on the device. After the previous results are normalized for comparison, the implementation shows slightly higher throughput than the elliptic curve arithmetic proposed in [1].

Several earlier papers expressed the view that binary fields would not perform well on devices such as GPUs, whose strength lies in floating-point computation. One contribution of this thesis is to show, through a direct implementation, that this is not the case: the binary-field implementation is even faster than the prime-field implementation.
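The abstract's claim that a binary field can suit a GPU is easier to see from the arithmetic itself. The following is a minimal Python sketch, not the thesis's CUDA code: it treats an element of GF(2^233) as a bit vector, so addition is a single XOR and multiplication needs only shifts and XORs, with no carries and no floating-point operations. The reduction polynomial x^233 + x^74 + 1 is an assumption (it is the NIST choice for 233-bit binary curves; the abstract does not say which polynomial the thesis uses), and the names `gf2m_add` and `gf2m_mul` are hypothetical.

```python
# Sketch of arithmetic in GF(2^233), using Python integers as bit vectors.
# ASSUMPTION: reduction polynomial x^233 + x^74 + 1 (the NIST choice for
# 233-bit binary curves); the thesis abstract does not name its polynomial.

M = 233
REDUCTION_POLY = (1 << 233) | (1 << 74) | 1


def gf2m_add(a: int, b: int) -> int:
    """Add two field elements: coefficient-wise addition mod 2 is XOR."""
    return a ^ b


def gf2m_mul(a: int, b: int) -> int:
    """Multiply two reduced field elements by shift-and-XOR.

    Both inputs are assumed to be already reduced (below 2**233).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if a >> M:               # degree reached 233: XOR in the modulus
            a ^= REDUCTION_POLY
    return result
```

For instance, `gf2m_mul(2, 1 << 232)` multiplies x by x^232, and the reduction step folds x^233 back to x^74 + 1. A scalar multiplication on the curve is built from many such field operations. A practical GPU implementation would use word-level and windowed techniques rather than this bit-serial loop.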

URI: http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000000563491
https://oasis.postech.ac.kr/handle/2014.oak/656

Article Type: Thesis

Files in This Item: There are no files associated with this item.