Mutation profile for top-k search exploiting gene function relationship and matrix factorization
- Title
- Mutation profile for top-k search exploiting gene function relationship and matrix factorization
- Authors
- 김성철
- Date Issued
- 2015
- Publisher
- Pohang University of Science and Technology (POSTECH)
- Abstract
- Given a large quantity of genome mutation data collected from clinics, how
can we search for similar patients? Similarity search based on patient mutation
profiles can support various translational bioinformatics tasks, including
prognostics and treatment efficacy prediction, enabling better clinical decision
making through sheer volume of data. However, this is a challenging problem
because mutation data are heterogeneous, sparse, and high-dimensional.
To tackle this problem, we suggest a compact representation and search strategy
based on Gene Ontology (GO) and orthogonal non-negative matrix factorization
(ONMF). The statistical significance of the relationships between the identified
cancer subtypes and their clinical features is computed for validation; the results
show that our method can identify and characterize clinically meaningful tumor
subtypes better than the recently introduced Network Based Stratification (NBS)
method while enabling real-time search. To the best of our knowledge, this is the
first attempt to simultaneously characterize and represent somatic mutational data
for efficient search purposes.
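The pipeline described above (gene-level mutations, a GO-level profile, a low-rank non-negative representation, then top-k search) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names `go_profile`, `onmf`, and `top_k` are hypothetical, the GO aggregation is a plain membership-matrix product, the factorization uses one common multiplicative-update scheme for orthogonality-constrained NMF (with the orthogonality pressure placed on the GO-term factor `H`), and cosine similarity is assumed as the search metric. The abstract does not specify any of these details.

```python
import numpy as np

def go_profile(mutations, gene_to_go):
    """Aggregate a (patients x genes) mutation matrix into (patients x GO terms).

    gene_to_go is an assumed binary (genes x GO terms) membership matrix.
    """
    return mutations @ gene_to_go

def onmf(X, rank, iters=200, eps=1e-9, seed=0):
    """Approximate X ~ W @ H with W, H >= 0 and rows of H pushed toward orthogonality.

    Multiplicative updates in the style of orthogonality-constrained NMF;
    an assumed variant, not necessarily the one used in the thesis.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iters):
        # Update the factor carrying the orthogonality constraint.
        H *= np.sqrt((W.T @ X) / (W.T @ X @ H.T @ H + eps))
        # Standard multiplicative update for the unconstrained factor.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def top_k(profiles, query_idx, k):
    """Return indices of the k patients most cosine-similar to the query patient."""
    norms = np.linalg.norm(profiles, axis=1, keepdims=True) + 1e-12
    unit = profiles / norms
    sims = unit @ unit[query_idx]
    sims[query_idx] = -np.inf  # exclude the query itself
    return np.argsort(-sims)[:k]
```

Here the rows of `W` would play the role of the compact patient profiles, so a search compares `rank`-dimensional vectors rather than the full gene-level data.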
As a next step, to obtain a more accurate mutation profile for similarity search,
we propose a new mutation profile called the Multi-Latent Semantic Analysis
Mutation Profile (MLSA-MP). MLSA-MP is motivated by the fact that genes can have
complex relationships within each gene set, where a gene set contains genes that
are biologically related to each other. Accordingly, the same pair of patients
can have different proximities depending on the gene set. To build MLSA-MP,
given mutation data and a number of pre-defined gene sets, we first generate a
collection of sub-profiles of the mutation data. For each sub-profile, a set of
latent representations is constructed by repeatedly applying Latent Semantic
Analysis (LSA). Finally, MLSA-MP is built by concatenating these sets of latent
representations. According to the experimental results, MLSA-MP retrieves
clinically similar patients more accurately than both NBS and ONMF-MP. In terms
of the predictive power of the identified cancer subtypes, the comparison shows
that MLSA-MP also identifies and characterizes clinically meaningful tumor
subtypes better than both ONMF-MP and NBS.
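The MLSA-MP construction steps (split by gene set, apply LSA per sub-profile, concatenate) might look roughly like the sketch below. It is a minimal illustration under assumptions: `lsa` and `mlsa_profile` are hypothetical names, LSA is realized as a single truncated SVD per gene set, gene sets are given as lists of column indices, and the `rank` parameter is invented for the example. How the thesis applies LSA "repeatedly" within each sub-profile is not stated in the abstract and is not modeled here.

```python
import numpy as np

def lsa(sub_profile, rank):
    """Truncated-SVD latent representation of a (patients x genes) sub-profile."""
    U, s, _ = np.linalg.svd(sub_profile, full_matrices=False)
    r = min(rank, len(s))
    # Scale the left singular vectors by their singular values.
    return U[:, :r] * s[:r]

def mlsa_profile(mutations, gene_sets, rank=2):
    """Concatenate per-gene-set latent representations into one patient profile.

    gene_sets: list of lists of gene column indices (assumed input format).
    """
    blocks = []
    for genes in gene_sets:
        sub = mutations[:, genes]      # sub-profile restricted to one gene set
        blocks.append(lsa(sub, rank))  # latent representation for that gene set
    return np.hstack(blocks)
```

Because each gene set contributes its own block of latent dimensions, two patients can be close in one block and far apart in another, which matches the idea that proximity depends on the gene set.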
- URI
- http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002062204
https://oasis.postech.ac.kr/handle/2014.oak/93495
- Article Type
- Thesis