Large-scale Matrix and Tensor Completion based on Out-of-core Approaches
- Title
- Large-scale Matrix and Tensor Completion based on Out-of-core Approaches
- Authors
- 이동하
- Date Issued
- 2020
- Publisher
- Pohang University of Science and Technology (POSTECH)
- Abstract
- Matrix or tensor completion, which aims to accurately predict unobserved matrix or tensor entries, has gained much attention because most matrices or tensors obtained from real-world applications are only partially observed or have a large portion of missing entries. These tasks are mainly solved by factorization methods, which factorize the sparse input data into multiple low-dimensional matrices while considering only the observed entries. However, existing factorization methods cannot easily be executed directly on large-scale matrices or tensors on a single machine, due to their high computational cost and memory bottleneck. Recent studies reported that most datasets can actually be stored on the disks of a single off-the-shelf workstation, and that using out-of-core (or disk-based) methods is much cheaper and even faster than using distributed methods. Out-of-core methods are also useful for machine learning with a large cluster; they can easily utilize multiple machines for model validation by independently building a model with different hyper-parameter values on each machine. This paper proposes novel out-of-core methods for large-scale matrix and tensor completion, specifically designed to fully utilize fast storage devices. Our approach optimizes the overall factorization process from the perspective of both the algorithm and the computer system: 1) We derive computationally cheap update rules and cache intermediate data in order to reduce the computational cost. 2) We design file structures for out-of-core execution and rearrange model updates, so that the amount of disk I/O is minimized and the locality of cache/memory access is improved. Extensive experiments demonstrate that our proposed methods are much faster and more scalable than competing in-memory and out-of-core methods on real-world big matrices and tensors that do not fit into memory.
In addition, our methods outperform distributed methods running on a large cluster, with the help of distributed validation.
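To make the core subproblem in the abstract concrete, below is a minimal sketch of matrix completion by factorization: the sparse input is factorized into two low-rank matrices using SGD over the observed entries only. This is a generic textbook formulation, not the thesis's out-of-core algorithm; all function names and hyper-parameter values are illustrative assumptions.

```python
import numpy as np

def factorize(observed, shape, rank=2, lr=0.01, reg=0.01, epochs=1000, seed=0):
    """Plain SGD matrix factorization over observed entries only.

    observed: list of (row, col, value) triples, i.e. the known entries.
    Returns factor matrices U (rows x rank) and V (cols x rank) such that
    U @ V.T approximates the input on the observed entries.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = shape
    U = rng.normal(scale=0.1, size=(n_rows, rank))
    V = rng.normal(scale=0.1, size=(n_cols, rank))
    for _ in range(epochs):
        for i, j, x in observed:
            err = x - U[i] @ V[j]               # residual on one observed entry
            ui = U[i].copy()                    # cache before updating U[i]
            U[i] += lr * (err * V[j] - reg * U[i])   # regularized SGD steps
            V[j] += lr * (err * ui - reg * V[j])
    return U, V

# Toy example: three observed entries of a 2x2 matrix; entry (1, 1) is missing.
observed = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0)]
U, V = factorize(observed, shape=(2, 2))
pred = U[1] @ V[1]   # completed value for the unobserved entry (1, 1)
```

Because updates touch one observed entry at a time, this formulation extends naturally to out-of-core execution: entries can be streamed from disk in blocks, with only the factor rows for the current block resident in memory.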
- URI
- http://postech.dcollection.net/common/orgView/200000287371
https://oasis.postech.ac.kr/handle/2014.oak/111633
- Article Type
- Thesis
- Files in This Item:
- There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.