Open Access System for Information Sharing

Department of Industrial & Management Engineering (산업경영공학과) 4. Theses_Master

Thesis

Title: Classification Analysis of Longitudinal Healthcare Data using Recurrent Neural Networks

Authors: 박진이

Date Issued: 2018

Publisher: 포항공과대학교 (Pohang University of Science and Technology)

Abstract: In many fields, data are stored and used through computer systems, and more recently they are collected automatically in real time by machines and sensors. In particular, multi-dimensional time series data, also known as longitudinal or panel data, are increasing. Several studies in healthcare have applied data analysis to longitudinal data, but previous studies could not combine medical treatment records or demographic information with health examination results. The main aim of this thesis is to predict the presence of disease using longitudinal healthcare data; in other words, we solve a classification problem using text and time series data. We propose a three-step procedure. The first step transforms text data into numerical vectors using Word2vec. The second step applies a recurrent neural network (RNN) to handle multi-dimensional time series data. The third step builds a prediction model that uses all datasets to predict the disease. The experimental results cover a comparison of dataset configurations and a comparison of RNN variants. First, we built prediction models on different dataset configurations and compared them; the model trained on all datasets generally outperformed the models trained on the other configurations. Next, we applied and compared several RNN variants to check whether predictive performance improved; the model using a long short-term memory (LSTM) network gave the highest results on all evaluation measures. This research contributes in terms of both the data used and the methodology applied: the data, a sample cohort, provide a comprehensive analysis of hypertension in Korea, and the RNN methodology can consider demographic information, health examination results, and medical treatment records simultaneously. Consequently, this thesis offers a classification analysis of longitudinal healthcare data using widely used deep learning algorithms.
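The three-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the code names, dimensions, exam values, and demographic encoding are hypothetical, the embedding table holds random placeholder vectors rather than trained Word2vec vectors, the weights are untrained, and the recurrent cell is a plain Elman-style RNN rather than the long short-term memory network mentioned in the abstract. Feeding the demographic vector at every time step is also an assumption made for illustration.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4    # size of each code embedding (hypothetical)
HIDDEN_DIM = 5   # size of the recurrent hidden state (hypothetical)

# Step 1 stand-in: a lookup table mapping treatment-record codes to vectors.
# The thesis obtains such vectors with Word2vec; these are random
# placeholders so the sketch runs without external libraries.
VOCAB = ["code_a", "code_b", "code_c"]  # hypothetical codes
EMBEDDINGS = {c: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for c in VOCAB}


def embed_visit(codes):
    """Average the embedding vectors of the codes recorded in one visit."""
    if not codes:
        return [0.0] * EMBED_DIM
    vecs = [EMBEDDINGS[c] for c in codes]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def make_matrix(rows, cols):
    """Create a rows x cols matrix of small random (untrained) weights."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def predict(visits, demographics, params):
    """Run a plain (Elman) RNN over the visits and return P(disease)."""
    w_in, w_rec, w_out = params
    h = [0.0] * HIDDEN_DIM
    for codes, exam in visits:
        # Step 2: one time step = text embedding + exam values + demographics.
        x = embed_visit(codes) + exam + demographics
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        h = [math.tanh(p) for p in pre]
    # Step 3: logistic output on the final hidden state.
    logit = sum(w * v for w, v in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-logit))


# Two visits for one hypothetical patient: (codes, [exam values]).
visits = [(["code_a", "code_b"], [0.2, 0.7]), (["code_c"], [0.3, 0.9])]
demographics = [1.0, 0.45]  # e.g. encoded sex and scaled age (assumed)
input_dim = EMBED_DIM + 2 + 2
params = (
    make_matrix(HIDDEN_DIM, input_dim),
    make_matrix(HIDDEN_DIM, HIDDEN_DIM),
    [random.uniform(-0.5, 0.5) for _ in range(HIDDEN_DIM)],
)
prob = predict(visits, demographics, params)
print(f"untrained P(disease) = {prob:.3f}")
```

Because the weights are random, the printed probability carries no meaning; the sketch only shows how text-derived vectors, examination results, and demographic information could be combined into one input sequence for a recurrent classifier.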

URI: http://postech.dcollection.net/common/orgView/200000009493
https://oasis.postech.ac.kr/handle/2014.oak/92838

Article Type: Thesis

Files in This Item: There are no files associated with this item.
