Open Access System for Information Sharing


Large-scale Matrix and Tensor Completion based on Out-of-core Approaches

Authors
이동하
Date Issued
2020
Publisher
Pohang University of Science and Technology (POSTECH)
Abstract
Matrix and tensor completion, which aim to accurately predict unobserved matrix or tensor entries, have gained much attention because most matrices and tensors obtained from real-world applications are only partially observed or have a large portion of missing entries. These tasks are mainly solved by factorization methods, which factorize the sparse input data into multiple low-dimensional matrices using only the observed entries. However, existing factorization methods are difficult to execute directly on large-scale matrices or tensors on a single machine, due to their high computational cost and memory bottleneck. Recent studies have reported that most datasets can in fact be stored on the disks of a single off-the-shelf workstation, and that using out-of-core (or disk-based) methods is much cheaper, and often even faster, than using distributed methods. Out-of-core methods are also useful for machine learning on a large cluster: they can easily utilize multiple machines for model validation by independently building a model with different hyper-parameter values on each machine. This thesis proposes novel out-of-core methods for large-scale matrix and tensor completion, specifically designed to fully utilize fast storage devices. Our approach optimizes the overall factorization process from the perspective of both the algorithm and the computer system: 1) we derive computationally cheap update rules and cache intermediate data in order to reduce the computational cost; 2) we design file structures for out-of-core execution and rearrange model updates, so that the amount of disk I/O is minimized and the locality of cache/memory access is improved. Extensive experiments demonstrate that the proposed methods are much faster and more scalable than competing in-memory and out-of-core methods on real-world big matrices and tensors that do not fit in memory. In addition, our methods outperform distributed methods running on a large cluster, with the help of distributed validation.
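The factorization approach the abstract refers to can be illustrated with a minimal in-memory sketch: stochastic gradient descent over only the observed entries of a partially observed matrix. This is a generic baseline, not the thesis's method — the thesis targets out-of-core execution with specialized file structures and update schedules — and all function and parameter names below are illustrative assumptions.

```python
import numpy as np

def factorize(entries, shape, rank=2, lr=0.05, reg=0.001, epochs=500, seed=0):
    """Factorize a partially observed matrix into low-rank factors U (m x r)
    and V (n x r) by SGD over only the observed entries.

    entries: list of (row, col, value) triples for the observed cells.
    """
    rng = np.random.default_rng(seed)
    m, n = shape
    U = rng.normal(scale=0.1, size=(m, rank))
    V = rng.normal(scale=0.1, size=(n, rank))
    for _ in range(epochs):
        for i, j, x in entries:
            err = x - U[i] @ V[j]      # residual on one observed entry
            u_i = U[i].copy()          # keep the old row for the V update
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * u_i - reg * V[j])
    return U, V
```

An unobserved entry (i, j) is then predicted as `U[i] @ V[j]`. An out-of-core variant would additionally stream the observed entries and factor blocks from disk so that memory usage stays bounded, which is the regime the thesis addresses.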
URI
http://postech.dcollection.net/common/orgView/200000287371
https://oasis.postech.ac.kr/handle/2014.oak/111633
Article Type
Thesis
Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
