Open Access System for Information Sharing

Department of Computer Science & Engineering (컴퓨터공학과) 4. Theses_Master

Thesis


Supervised Non-negative Tensor Factorization with Multinomial Logistic Regression

Title: Supervised Non-negative Tensor Factorization with Multinomial Logistic Regression (다항 로지스틱 회귀분석을 이용한 감독형 비음수 텐서 분해)

Authors: 최진균

Date Issued: 2017

Publisher: Pohang University of Science and Technology (포항공과대학교)

Abstract: The rapid growth of electronic health records (EHRs) has led to growing interest in putting them to use. Medical information systems (MISs) use heterogeneous EHRs to solve healthcare problems such as clinical decision-making and early disease detection. However, using raw EHRs for these problems poses challenges. One of the biggest is that raw EHRs do not map to the medical concepts that clinical researchers use directly, so researchers must spend considerable time and effort transforming raw EHRs into meaningful medical concepts. For this reason, several applications based on machine learning techniques have been proposed to extract medical concepts from raw EHRs automatically. Among these techniques, tensor factorization methods have attracted considerable attention because tensor representations can capture interactions among high-dimensional EHRs. However, existing tensor factorization methods for computational phenotyping are designed only to approximate the observed tensor as closely as possible and do not consider diagnostic accuracy. In this thesis, we propose Supervised Non-negative Tensor Factorization with Multinomial Logistic Regression (SNTFL) to improve diagnostic accuracy. We define a model-based constraint to incorporate expert knowledge such as the group information of patients. To learn discriminative latent representations of patients, we also design an algorithm that optimizes a multinomial logistic regression during the tensor factorization process. Through experiments, we show that the proposed method obtains better diagnostic accuracy than the baselines.
Abstract (translated from Korean): As medical equipment advances, the volume of electronic health records (EHRs) stored in digital form at medical facilities is growing rapidly. Many recent efforts attempt to solve health care problems using this vast amount of EHRs. However, EHRs consist of diverse patient information, which makes them difficult to use for these problems. In particular, EHRs do not map to medical concepts that clinicians can use directly. As a result, clinicians with sufficient expertise or domain experts must extract meaningful medical concepts or phenotypes from the EHRs, and this work demands a great deal of their time and effort. To address this problem, research is actively under way on automatically extracting meaningful medical concepts from EHRs with minimal time and effort from clinicians and experts, and many studies apply machine learning techniques. Among them, tensor factorization, which can analyze data with a multi-dimensional structure and uncover complex interactions, has drawn attention from many researchers. However, existing tensor-factorization-based methods for automatically extracting meaningful medical concepts from EHRs aim only to reconstruct the observed data well and do not consider how well a specific clinical outcome is predicted. To overcome this limitation, this study proposes a new non-negative tensor factorization method, Supervised Non-negative Tensor Factorization with Multinomial Logistic Regression. To use patients' label information, such as mortality, we define a model-based constraint. In addition, to learn discriminative latent representations of patients, we design the method so that a multinomial logistic regression is learned jointly while the tensor factorization is performed. We applied the proposed method to one publicly available EHR dataset to verify its effectiveness. As a result, we discovered phenotypes that better predict a specific clinical outcome and confirmed that the method achieves higher diagnostic accuracy than existing methods.
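
The joint idea described in the abstract (factorize a non-negative tensor while also fitting a multinomial logistic regression on the patient factors) can be illustrated with a small sketch. This record does not give the thesis's actual objective or update rules, so everything below is an assumption made for illustration: the three-way CP model with factors `A` (patients), `B`, and `C`, the squared-error reconstruction term, the weight `lam` on the cross-entropy term, the projected gradient step with backtracking, and the synthetic data. It is not the SNTFL algorithm itself.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def joint_loss(X, Y, A, B, C, W, b, lam):
    # Reconstruction error of the rank-R CP model plus weighted cross-entropy.
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
    recon = 0.5 * np.sum((X_hat - X) ** 2)
    P = softmax(A @ W + b)
    ce = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
    return recon + lam * ce

def joint_grads(X, Y, A, B, C, W, b, lam):
    E = np.einsum("ir,jr,kr->ijk", A, B, C) - X   # reconstruction residual
    G = (softmax(A @ W + b) - Y) / X.shape[0]     # cross-entropy gradient w.r.t. logits
    gA = np.einsum("ijk,jr,kr->ir", E, B, C) + lam * G @ W.T
    gB = np.einsum("ijk,ir,kr->jr", E, A, C)
    gC = np.einsum("ijk,ir,jr->kr", E, A, B)
    return gA, gB, gC, lam * A.T @ G, lam * G.sum(axis=0)

def fit(X, y, rank, n_classes, lam=1.0, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    Y = np.eye(n_classes)[y]                      # one-hot labels
    A, B, C = rng.random((I, rank)), rng.random((J, rank)), rng.random((K, rank))
    W, b = np.zeros((rank, n_classes)), np.zeros(n_classes)
    step = 1e-2
    history = [joint_loss(X, Y, A, B, C, W, b, lam)]
    for _ in range(n_iter):
        gA, gB, gC, gW, gb = joint_grads(X, Y, A, B, C, W, b, lam)
        while step > 1e-12:
            # Gradient step, then projection of the factors onto the non-negative orthant.
            cand = (np.maximum(A - step * gA, 0.0),
                    np.maximum(B - step * gB, 0.0),
                    np.maximum(C - step * gC, 0.0),
                    W - step * gW,
                    b - step * gb)
            new_loss = joint_loss(X, Y, *cand, lam)
            if new_loss <= history[-1]:           # accept only non-increasing steps
                A, B, C, W, b = cand
                history.append(new_loss)
                break
            step *= 0.5                           # otherwise shrink the step and retry
    return A, B, C, W, b, history

# Synthetic stand-in for a patient x diagnosis x medication tensor (hypothetical data).
rng = np.random.default_rng(42)
A_true, B_true, C_true = rng.random((30, 3)), rng.random((6, 3)), rng.random((5, 3))
X = np.einsum("ir,jr,kr->ijk", A_true, B_true, C_true)
y = A_true.argmax(axis=1)                         # toy label: dominant latent component

A, B, C, W, b, history = fit(X, y, rank=3, n_classes=3)
pred = (A @ W + b).argmax(axis=1)
print("initial loss:", history[0], "final loss:", history[-1])
print("training accuracy:", np.mean(pred == y))
```

In this sketch, raising `lam` pushes the patient factor `A` toward representations that separate the label groups, at the cost of a looser fit to the observed tensor; `lam = 0` reduces it to a plain unsupervised non-negative factorization.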

URI: http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002324463
https://oasis.postech.ac.kr/handle/2014.oak/93549

Article Type: Thesis

Files in This Item: There are no files associated with this item.
