Open Access System for Information Sharing

Department of Computer Science & Engineering: Ph.D. Theses

Thesis

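Title: Fast Reconstruction Techniques for Degraded Reads and the Recovery Process on Primary Array Storage (original Korean title: 주 어레이 저장장치 상 저하된 읽기 및 복구 프로세스를 위한 빠른 복원 기법)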

Authors: 성백재

Date Issued: 2017

Publisher: 포항공과대학교 (Pohang University of Science and Technology, POSTECH)

Abstract:
This thesis studies fast data reconstruction for degraded reads and the recovery process in RAID systems. RAID is widely deployed in disk array storage systems to provide performance and reliability simultaneously. When a disk fails, RAID performs two performance-critical operations: degraded reads/writes and the recovery process. As soon as a disk failure occurs, RAID automatically starts the recovery process, which runs in the background, reconstructs the data of the failed disk, and rebuilds it onto a replacement disk. Meanwhile, RAID continues to serve application reads and writes by reconstructing the missing data on the fly (i.e., degraded reads/writes). The performance of degraded reads/writes is critical for meeting customer service level agreements (SLAs), and the completion time of the recovery process considerably affects the reliability of a storage system. Both operations therefore require fast data reconstruction. Among the erasure codes designed for fast reconstruction, Local Reconstruction Codes (LRC) are known to offer the best (or optimal) trade-off among storage overhead, fault tolerance, and the number of disks involved in reconstruction. LRC was originally designed for fast reconstruction in distributed cloud storage systems, where network traffic is the major bottleneck during reconstruction; LRC therefore focuses on reducing the number of disks involved in reconstruction, which in turn reduces network traffic. However, we observe that when LRC is applied to primary array storage systems, the major bottleneck in reconstruction is uneven disk utilization: underutilized disks can no longer receive I/O requests because reconstruction is held back by the overloaded disks. This uneven utilization stems from the dedicated group partitioning policy that LRC adopts to achieve the Maximally Recoverable property. In this thesis, we present Distributed Reconstruction Codes (DRC), which support fast reconstruction in primary array storage systems.
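To make "the number of disks involved in reconstruction" concrete, here is a minimal Python sketch of an LRC-style degraded read. It assumes, hypothetically, that each local parity is the plain XOR of the data blocks in its group and that six data blocks are split into two fixed groups of three. The function names (`xor_blocks`, `encode_local_parities`, `degraded_read`) are illustrative and are not taken from the thesis. Global parities, which LRC uses to tolerate additional failures, are omitted.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_local_parities(data, groups):
    """One local parity per group (assumption: XOR of the group's data blocks)."""
    return [xor_blocks([data[i] for i in group]) for group in groups]


def degraded_read(data, groups, local_parities, failed):
    """Rebuild data[failed] from its local group only.

    data[failed] is treated as unavailable and is never read. Returns the
    rebuilt block and the disks that had to be read: surviving data-disk
    indices plus the group's local parity, labeled 'L<group>'.
    """
    gid = next(j for j, group in enumerate(groups) if failed in group)
    survivors = [i for i in groups[gid] if i != failed]
    rebuilt = xor_blocks([data[i] for i in survivors] + [local_parities[gid]])
    return rebuilt, survivors + [f"L{gid}"]


# Six 4-byte data blocks split into two dedicated local groups of three.
data = [bytes([i, 2 * i, 255 - i, 7]) for i in range(6)]
groups = [[0, 1, 2], [3, 4, 5]]
parities = encode_local_parities(data, groups)

block, disks_read = degraded_read(data, groups, parities, failed=1)
print(block == data[1], disks_read)  # True [0, 2, 'L0']
```

In this sketch, rebuilding block 1 touches only its two group partners and the group's local parity, which is three disks. A RAID-6 stripe over the same six data disks would instead read the five surviving data blocks plus one parity. Because the groups are fixed, however, every degraded read of block 1 lands on the same partner disks. This is the uneven utilization described above.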
DRC is designed with a group shuffling policy to address uneven disk utilization. Experiments on real-world workloads show that DRC with global parity rotation (DRC-G) improves degraded read/write performance by as much as 72% compared to RAID-6 and by as much as 35% compared to LRC at the same reliability. DRC-G also reduces the completion time of the recovery process by as much as 52% compared to LRC. In addition, we analyze the degraded-read performance and the reliability of storage systems built on each of the studied erasure codes (RAID-6, LRC, DRC, and DRC-G). This analysis makes it possible to calculate the degraded-read performance of an erasure code for a given reliability target of an array storage system. In degraded-read experiments with 4 KB random reads, the calculated performance is within 4% of the results measured on our testbed.
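The intuition behind group shuffling can be illustrated with a toy model. The Python sketch below is hypothetical and is not the DRC construction from the thesis. It models only data disks, ignores parity placement (and therefore global parity rotation), and regroups disks in each stripe using a seeded pseudo-random permutation. The names `dedicated_groups`, `shuffled_groups`, and `partner_reads` are illustrative.

For a single failed disk, the model counts how many reads each surviving disk serves when every stripe's lost block is rebuilt from its group. Under the simplifying assumption that all disks serve I/O at the same rate, the busiest disk bounds the reconstruction rate.

```python
import random
from collections import Counter


def dedicated_groups(k, g, stripe):
    """LRC-style fixed partition: disk i is in group i // g in every stripe."""
    return [list(range(j, j + g)) for j in range(0, k, g)]


def shuffled_groups(k, g, stripe, seed=0):
    """Hypothetical shuffle: regroup the k disks with a stripe-seeded permutation."""
    perm = list(range(k))
    random.Random(seed * 1_000_003 + stripe).shuffle(perm)
    return [perm[j:j + g] for j in range(0, k, g)]


def partner_reads(layout, k, g, failed, stripes):
    """Reads served by each surviving disk when the failed disk's block in
    every stripe is rebuilt from the other members of its group."""
    load = Counter({d: 0 for d in range(k) if d != failed})
    for s in range(stripes):
        group = next(grp for grp in layout(k, g, s) if failed in grp)
        for d in group:
            if d != failed:
                load[d] += 1
    return load


# 12 data disks in groups of 4; disk 0 fails; 1,200 stripes are rebuilt.
k, g, failed, stripes = 12, 4, 0, 1200
fixed = partner_reads(dedicated_groups, k, g, failed, stripes)
mixed = partner_reads(shuffled_groups, k, g, failed, stripes)
print("busiest disk, dedicated:", max(fixed.values()))  # disks 1-3 serve every stripe
print("busiest disk, shuffled: ", max(mixed.values()))
```

Both layouts issue the same 3,600 reads in total. With dedicated partitioning, the three fixed partners of disk 0 absorb all of them while the other eight disks serve none. Shuffling spreads the same total across the surviving disks, so the busiest disk carries far less load.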

URI: http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002326615
https://oasis.postech.ac.kr/handle/2014.oak/93532

Article Type: Thesis

Files in This Item: There are no files associated with this item.