Open Access System for Information Sharing

Department of Computer Science & Engineering (컴퓨터공학과) 3. Theses_Ph.D.

Thesis

Mutation profile for top-k search exploiting gene function relationship and matrix factorization

Title: Mutation profile for top-k search exploiting gene function relationship and matrix factorization

Authors: 김성철

Date Issued: 2015

Publisher: 포항공과대학교 (Pohang University of Science and Technology, POSTECH)

Abstract: Given a large quantity of genome mutation data collected from clinics, how can we search for similar patients? Similarity search based on patient mutation profiles can support various translational bioinformatics tasks, including prognostics and treatment efficacy prediction, enabling better clinical decision making through sheer volume of data. However, this is a challenging problem because mutation data are heterogeneous, sparse, and high-dimensional. To tackle this problem, we suggest a compact representation and search strategy based on Gene Ontology (GO) and orthogonal non-negative matrix factorization (ONMF). For validation, we compute the statistical significance of the relationship between the identified cancer subtypes and their clinical features; the results show that our method can identify and characterize clinically meaningful tumor subtypes better than the recently introduced Network-Based Stratification (NBS) method while enabling real-time search. To the best of our knowledge, this is the first attempt to simultaneously characterize and represent somatic mutation data for efficient search purposes.

As a next step, to obtain a more accurate mutation profile for similarity search, we propose a new mutation profile called the Multi-Latent Semantic Analysis Mutation Profile (MLSA-MP). MLSA-MP is inspired by the fact that genes can have complex relationships within each gene set, where a gene set contains genes that are biologically related to each other. Accordingly, the same pair of patients can have different proximities depending on the gene set. To build MLSA-MP, given mutation data and a number of pre-defined gene sets, we first generate a collection of sub-profiles of the mutation data. For each sub-profile, a set of latent representations is constructed by repeatedly applying Latent Semantic Analysis (LSA). Finally, MLSA-MP is built by concatenating these latent representations. According to the experimental results, MLSA-MP retrieves clinically similar patients more accurately than both NBS and ONMF-MP. In terms of the predictive power of the identified cancer subtypes, the comparison shows that MLSA-MP also identifies and characterizes clinically meaningful tumor subtypes better than both ONMF-MP and NBS.
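
The MLSA-MP construction described in the abstract (split the mutation data into per-gene-set sub-profiles, embed each with LSA, concatenate, then search) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names (`lsa_embed`, `build_profile`, `top_k`), the toy matrix, the gene sets, the fixed rank, and the use of cosine similarity are all assumptions made for illustration. The abstract also does not say how LSA is applied "repeatedly"; the sketch simply runs one truncated SVD per sub-profile.

```python
import numpy as np

def lsa_embed(sub_profile, rank):
    """Plain LSA: project patients onto the top `rank` singular directions."""
    u, s, _ = np.linalg.svd(sub_profile, full_matrices=False)
    k = min(rank, len(s))
    return u[:, :k] * s[:k]

def build_profile(mutations, gene_sets, rank=2):
    """Concatenate one LSA embedding per gene set (hypothetical helper)."""
    blocks = [lsa_embed(mutations[:, idx], rank) for idx in gene_sets]
    return np.hstack(blocks)

def top_k(profile, query, k):
    """Indices of the k patients most similar to `query` (cosine similarity)."""
    norms = np.linalg.norm(profile, axis=1)
    norms[norms == 0] = 1.0          # avoid dividing by zero for empty rows
    unit = profile / norms[:, None]
    sims = unit @ unit[query]
    sims[query] = -np.inf            # exclude the query patient itself
    return np.argsort(-sims)[:k]

# Toy binary patient-by-gene mutation matrix (assumed data, 5 patients x 6 genes).
mutations = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
], dtype=float)

# Assumed gene sets, given as column indices; sets may overlap.
gene_sets = [[0, 1, 2], [2, 3, 4], [4, 5]]

profile = build_profile(mutations, gene_sets, rank=2)
nearest = top_k(profile, query=0, k=1)
print(profile.shape, nearest)
```

Because each gene set is embedded separately, two patients can be close in one block of the concatenated profile and far apart in another, which is the property the abstract attributes to MLSA-MP.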

URI: http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002062204
https://oasis.postech.ac.kr/handle/2014.oak/93495

Article Type: Thesis

Files in This Item: There are no files associated with this item.

