Open Access System for Information Sharing

Department of Computer Science & Engineering (컴퓨터공학과) 3. Theses_Ph.D.

Thesis

Cited 0 time in webofscience

Cited 0 time in scopus

Metadata Downloads

그래프에서 최대최소 경로를 활용하는 기계학습 알고리즘 연구

Title: 그래프에서 최대최소 경로를 활용하는 기계학습 알고리즘 연구

Authors: 김계현

Date Issued: 2015

Publisher: 포항공과대학교

Abstract: 본 학위 논문은 주어진 데이터에 내재된 비 유클리드 매니폴드 및 클러스터 구조를 효율적으로 활용하는 기계학습 방법에 관해 다룬다. 실세계 데이터의 분포는 일반적으로 비 유클리드 공간을 이루고 있는데, 이 때 기존의 유클리드 공간상의 직선 거리 대신, 비 유클리드 구조를 반영할 수 있는 그래프 기반의 거리 측정 방법을 이용할 경우, k-인접 이웃 검색(k-nearest neighbor search), 준 지도 학습(semi-supervised learning) 등 여러 기계학습 문제를 보다 효과적으로 해결할 수 있다. 그러나 기존의 그래프 기반 거리 측정 방법들의 경우, 높은 계산 복잡도로 인해 대규모 데이터에 대한 기계학습에는 적용하기 어렵다는 한계를 가진다. 본 연구의 주요 성과는 대규모 그래프를 다루는 기계학습 문제에 대해, 기존 방법들에 비해 계산 효율성을 크게 향상시키면서도, 학습 결과의 품질에 있어서는 기존에 연구된 다양한 근사 알고리즘보다 높은 수준을 달성한 것이다. 이를 위해 기존의 그래프 기반 기계학습 방법을 새로운 관점으로 바라보고 분석하여, 그래프상의 최대최소 경로를 통한 효율적인 거리 측정 방법을 고안하였다. 본 학위 논문에서는 최대최소 경로 기반의 거리 측정 방법 및, 이를 바탕으로 새롭게 개발한 k-인접 이웃 검색 및 준 지도 학습 알고리즘에 관해 소개하며, 다양한 대규모 실세계 기계학습 문제에 적용한 연구 성과를 담고 있다.
One of the major difficulties in machine learning is that real-world datasets commonly lie in non-Euclidean manifolds or contain non-convex-shaped clusters. Capturing and exploiting the underlying non-Euclidean structures of data have achieved considerable success in information retrieval, semi-supervised learning, distance metric learning, dimensionality reduction and many other machine learning problems. Graph-based methods have drawn considerable attention, since a graph connecting near data points (in their Euclidean feature space) well approximates the underlying manifolds or cluster structures of data. However, they do not scale well to large datasets. In this thesis, we reformulate the existing graph-based methods as a unified framework of aggregation over paths between nodes in a graph, and show that a major cause of their inefficiency is that they consider all possible paths to measure the graph-based (dis)similarity between two nodes. From this observation, we develop a novel efficient approach for capturing the non-Euclidean geometry. We propose to measure the graph-based distance along only a particular type of paths, called minimax paths, which greatly reduces the number of paths to be considered for computation. Then we show that considering only those minimax paths is sufficient to capture the underlying non-convex-shaped clusters, whereas other kinds of paths (e.g., shortest paths) are not. From the proposed approach, we develop two scalable machine learning algorithms: (1) minimax $k$-NN search for $k$-nearest neighbor search on non-convex-shaped clusters, and (2) minimax label propagation for semi-supervised learning. Our algorithms are computationally more efficient than existing methods, while ensuring comparable quality of results in terms of accuracy (for classification) and precision (for information retrieval). Experimental results on some computer vision applications, object recognition and image segmentation, show that our algorithms are especially useful for learning with large-scale data.

URI: http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000001910979
https://oasis.postech.ac.kr/handle/2014.oak/93477

Article Type: Thesis

Files in This Item:: There are no files associated with this item.

Show full item record

qr_code

트윗하기

Communities & Collection

Department of Computer Science & Engineering (컴퓨터공학과)

Open Access System for Information Sharing

Communities & Collection

Views & Downloads

Browse