Large-scale Matrix and Tensor Completion based on Out-of-core Approaches
- Title
- Large-scale Matrix and Tensor Completion based on Out-of-core Approaches
- Authors
- 이동하
- Date Issued
- 2020
- Publisher
- Pohang University of Science and Technology (POSTECH)
- Abstract
- Matrix or tensor completion, which aims to accurately predict unobserved matrix or tensor entries, has gained much attention because most matrices or tensors obtained from real-world applications are only partially observed or have a large portion of missing entries. These tasks are mainly solved by factorization methods, which factorize the sparse input data into multiple low-dimensional matrices while considering only the observed entries. However, existing factorization methods cannot easily be executed directly on large-scale matrices or tensors on a single machine, due to their high computational cost and memory bottleneck. Recent studies reported that most datasets can actually be stored on the disks of a single off-the-shelf workstation, and that using out-of-core (or disk-based) methods is much cheaper and even faster than using distributed methods. Out-of-core methods are also useful for machine learning with a large cluster; they can easily utilize multiple machines for model validation by independently building a model with different hyper-parameter values on each machine. This paper proposes novel out-of-core methods for large-scale matrix and tensor completion, specifically designed to fully utilize fast storage devices. Our approach optimizes the overall factorization process from the perspective of both the algorithm and the computer system: 1) We derive computationally cheap update rules and cache intermediate data in order to reduce the computational cost. 2) We design file structures for out-of-core execution and rearrange model updates, so that the amount of disk I/O is minimized and the locality of cache/memory access is improved. Extensive experiments demonstrate that our proposed methods are much faster and more scalable than competing in-memory and out-of-core methods on real-world big matrices and tensors that do not fit into memory.
In addition, our methods outperform distributed methods running on a large cluster, with the help of distributed validation.
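To make the core subproblem in the abstract concrete, below is a minimal sketch of matrix completion by factorization: the sparse input is factorized into two low-rank matrices using SGD over the observed entries only. This is a generic textbook formulation, not the thesis's out-of-core algorithm; all function names and hyper-parameter values are illustrative assumptions.

```python
import numpy as np

def factorize(observed, shape, rank=2, lr=0.01, reg=0.01, epochs=1000, seed=0):
    """Plain SGD matrix factorization over observed entries only.

    observed: list of (row, col, value) triples, i.e. the known entries.
    Returns factor matrices U (rows x rank) and V (cols x rank) such that
    U @ V.T approximates the input on the observed entries.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = shape
    U = rng.normal(scale=0.1, size=(n_rows, rank))
    V = rng.normal(scale=0.1, size=(n_cols, rank))
    for _ in range(epochs):
        for i, j, x in observed:
            err = x - U[i] @ V[j]               # residual on one observed entry
            ui = U[i].copy()                    # cache before updating U[i]
            U[i] += lr * (err * V[j] - reg * U[i])   # regularized SGD steps
            V[j] += lr * (err * ui - reg * V[j])
    return U, V

# Toy example: three observed entries of a 2x2 matrix; entry (1, 1) is missing.
observed = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0)]
U, V = factorize(observed, shape=(2, 2))
pred = U[1] @ V[1]   # completed value for the unobserved entry (1, 1)
```

Because updates touch one observed entry at a time, this formulation extends naturally to out-of-core execution: entries can be streamed from disk in blocks, with only the factor rows for the current block resident in memory.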
- URI
- http://postech.dcollection.net/common/orgView/200000287371
https://oasis.postech.ac.kr/handle/2014.oak/111633
- Article Type
- Thesis
- Files in This Item:
- There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.