Open Access System for Information Sharing

Department of Computer Science & Engineering (컴퓨터공학과) 3. Theses_Ph.D.

Thesis

Cited 0 time in webofscience

Cited 0 time in scopus

Metadata Downloads

File System Virtualization and Buffer Cache Management in Personal Storage Servers

Title: File System Virtualization and Buffer Cache Management in Personal Storage Servers

Authors: 이우중

Date Issued: 2011

Publisher: 포항공과대학교

Abstract: This thesis studies a file system virtualization technique for unified data management on personal devices. An efficient buffer cache management algorithm is also presented for performance enhancement of the proposed system. In Part I, we introduce a self-managed distributed file system, named PosCFS, which has been proposed for file system virtualization on Device ensembles defined as cooperative networks of computing devices in personal area network (PAN). As portable devices are becoming more available and more diverse, people carry number of portable devices such as smart phones, laptops, MP3 players and digital cameras at the same time. In this environment, accessing and managing personal digital contents scattered on each device is really terrible business to users. Every devices has own specialized network or restricted I/O interface. It is not trivially solvable because of not only the heterogeniety of the underlying networks and network file systems on operating systems of the devices, but also the complexity of configurations of them. The rapid growth of personal content accompanied by advances in storage technologies also makes the management process difficult. To solve these challenges, we have proposed the PosCFS file system. All the storages in the network could be integrated automatically into a virtual storage by the file system. The file system provides per-user global namespace (virtual directories) for managing and accessing data stored on a virtual storage constructed from physical storage spaces detached to device ensembles. The file system was implemented by a peer-to-peer model and using the UPnP protocol to automatically build up a virtual space over all personal devices in the network, and also by using WebDAV for file I/O.In Part II, we present a second-level buffer cache algorithm for performance enhancement of the proposed file system. Multi-level buffer cache hierarchy is commonly deployed on network file or storage systems. Several studies showed that the I/O access pattern on the second-level buffer cache of file servers or storage controllers, whose reuse distance distribution is hill-shaped, differs from that of the upper-level. This implies that two consecutive accesses to a certain data block have a relatively long temporal distance due to upstream caches while exhibiting weaker temporal locality. The reason is that only missed requests from the upper-level are delivered to the lower-level cache. The Adaptive Replacement Cache (ARC) proposed by IBM dynamically balances recency and frequency using two Least-Recently-Used (LRU) queues in response to changing access patterns. Due to its simplicity, it has low computational overhead compared to other algorithms while performing well across varied workloads. On a second-level cache, however, ARC cannot perform efficiently when the cache size is equal to or smaller than the first-level one. If the reuse distance of most I/O accesses on the second-level cache exceeds its cache size, the recency queue of ARC would not contribute to the cache hits any more. Furthermore, cache hits near the LRU position of the recency queue causes shrinking of the frequency queue as well. In order to solve the problem, we propose a reuse-distance aware block replacement algorithm based on ARC in which the reuse distance pattern is considered by designing the recency queue as a sliding window over a long history buffer for I/O requests. In the proposed algorithm, only blocks within the range of the window are maintained by the cache and the window position is dynamically determined by speculating the minimal reuse distance of a workload. Experimental results show that the proposed algorithm outperforms traditional replacement methods for second-level buffer caches, including Multi-Queue (MQ) as well as ARC with low and constant operating cost. The results also demonstrate that the proposed method efficiently provides a certain degree of exclusive caching like X-RAY without any semantic information of upper-level file systems.
본 논문에서는 데스크탑, 노트북, 스마트 폰 등의 개인용 단말 상 통합 데이터 관리를 지원하기 위한 파일 시스템 가상화 기법과, 제안된 파일 시스템의 성능 개선에 적용 가능한 2차 버퍼 캐시 관리 알고리즘을 소개한다.1부에서는 PosCFS 자가 관리 분산 파일 시스템을 소개한다. 제안된 파일 시스템은 디바이스 앙상블(Device ensemble)로 정의되는 개인용 컴퓨터 단말 장치 간 협력 네트워크에서 파일 시스템 가상화를 지원한다. 개인용 휴대 장치가 다양화 됨에 따라, 사용자는 스마트 폰, 노트북, mp3 플레이어, 디지털 카메라 등 다양한 장치를 동시에 휴대하며, 이러한 환경에서 개인 데이터는 다양한 장치에 분산 저장되므로, 데이터 사용 및 관리가 어렵게 된다. 특히 장치들의 서로 다른 운영체제 및 특화된 네트워크 환경(Bluetooth, WiFi, UWB, USB 등)은 사용자의 복잡한 설정 과정을 요구하므로, 장치 간의 데이터 교환 및 동기화를 어렵게 만든다. 스토리지 기술의 발전과 더불어 이루어지고 있는 개인 데이터의 폭발적인 증가 또한 데이터 관리를 어렵게 만드는 요인 중 하나이다. 본 논문에서 제안하는 PosCFS 분산 파일 시스템은 이러한 문제를 해결한다. PosCFS는 개인 네트워크(Personal Area Network
PAN) 상에서 동적으로 모든 장치들의 물리적 스토리지를 하나의 가상 스토리지로 통합하여 사용자 및 응용프로그램에게 제공하며, 파일의 메타데이터를 기준으로 생성되는 가상 디렉토리를 스토리지 내의 파일 접근 주소로 제공함으로써, 데이터 접근성(data accessibility)을 개선한다. 플랫폼 중립성을 보장하기 위하여, 네트워크 기반의 스토리지 입출력은 WebDAV 프로토콜을 이용하여 정의되었으며, 파일 시스템 자동 구성과 가상 디렉토리 관리를 위한 확장 오퍼레이션은 UPnP 프로토콜을 이용하여 피어-투-피어 방식으로 구현하였다.2부에서는 제안된 파일 시스템의 성능 개선에 활용 가능한 2차 버퍼 캐시 알고리즘을 설명한다.네트워크 기반의 파일 및 스토리지 시스템에서는 일반적으로 클라이언트와 서버에서 별도의 버퍼 캐시가 관리되며, 파일 및 스토리지 서버 상 2차 버퍼 캐시의 입출력 패턴은 클라이언트 상의 1차 캐시와는 다른 특성을 보인다. 데이터 블록에 대한 재사용 거리(Reuse distance)는 특정 블록에 대한 읽기 요청이 처리된 이후, 동일한 블록에 대한 재 접근이 일어나기 전까지 발생한 입출력 접근의 개수로 정의 되며, 1차 캐시 의 입출력 패턴과는 달리 긴 재사용 거리를 가지는 블록 접근의 개수가 짧은 거리를 가지는 블록 접근의 수보다 많은, 언덕 모양(hill-shaped)의 분포를 가진다 -- 즉 상대적으로 약한 시간적 지역성(Temporal locality)을 가진다. 이는 상위 캐시에서 읽기 요청이 실패한 경우에만 하위 캐시로 읽기 요청이 전달되기 때문에 발생하는 현상으로써, Y. Zhou 등의 연구에 의해서 밝혀진 특성이다. ARC(Adaptive Replacement Cache) 알고리즘은 IBM의 N. Meggido에 의해 제안된 알고리즘으로써, 입출력 패턴의 최근성(Recency)과 반복성(Frequency)의 변화에 대응할 수 있도록 고안되었다. ARC 알고리즘은 두 개의 LRU(Least-Recently-Used) 큐를 이용하여 -- 즉, 최근성 큐(Recency queue)와 반복성 큐(Frequency queue) -- 입출력 접근이 높은 최근성을 보일 경우, 최근성 큐의 크기를 크게 유지하고, 반대로 높은 반복성을 보일 경우, 최근성 큐의 크기를 줄이고 반복성 큐를 크게 유지할 수 있도록 조절한다. ARC는 알고리즘의 간결함으로 인하여 낮은 관리 오버헤드를 가질 뿐 아니라, 입출력 워크로드 패턴의 변화에 대응함으로써, 대부분의 온-오프라인 캐시 알고리즘 보다 높은 성능을 보인다. 그러나 2차 캐시에 적용 시, 앞에서 소개한 워크로드 특성을 고려하지 못하므로 상대적으로 효율성이 떨어지게 되며, 특히 2차 캐시의 크기가 1차 캐시의 크기와 같거나 혹은 작게 유지되는 경우 급격히 효율이 떨어지게 된다. 그 이유는 2차 캐시로 전달되는 대부분의 입출력 접근의 재사용 거리가 2차 버퍼 캐시 크기보다 커지게 되기 때문이다. 추가적으로, 최근성 큐의 LRU 위치에서 발생하게 되는 빈번한 캐시 히트(Cache hit)는 반복성 큐의 크기를 작게 만들면서 워크로드의 반복성을 효율적으로 잡아 낼 수 없도록 만든다.이러한 문제를 해결하기 위하여, 본 논문에서는 ARC 상 재사용 거리를 고려한 블록 관리 방법을 제안한다. 제안하는 알고리즘은 2차 캐시에서의 시간적 지역성 특성에 대응하기 위하여 입출력 로깅(logging)을 위한 충분히 큰 크기의 히스토리 버퍼(history buffer)를 구성하고, 최근성 큐를 히스토리 버퍼상의 슬라이딩 윈도우(sliding window)로 정의한다. 워크로드의 최소 재사용 거리(minimal reuse distance)는 알고리즘에 의해 지속적으로 추적되며, 추적된 값을 기준으로 윈도우의 위치가 결정된다. 마지막으로, 수학적 분석 및 시뮬레이션을 통한 성능 평가를 통하여 제안하는 알고리즘이 낮은 관리 오버헤드를 가지면서도, 기존의 2차 버퍼 캐시 알고리즘인 MQ(Multi-queue) 및 ARC 알고리즘 보다 우수한 성능을 보이며, X-RAY 등의 배타적 캐시 알고리즘 (Exclusive Caching) 수준의 성능을 보이고 있음을 입증한다.

URI: http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000000895999
https://oasis.postech.ac.kr/handle/2014.oak/980

Article Type: Thesis

Files in This Item:: There are no files associated with this item.

Show full item record

qr_code

트윗하기

Communities & Collection

Department of Computer Science & Engineering (컴퓨터공학과)

Open Access System for Information Sharing

Communities & Collection

Views & Downloads

Browse