Open Access System for Information Sharing

Department of Computer Science & Engineering (컴퓨터공학과) 3. Theses_Ph.D.

Thesis

Graph-based Dependency Parsing with Phrasalization

Title: Graph-based Dependency Parsing with Phrasalization

Authors: 김미훈

Date Issued: 2013

Publisher: Pohang University of Science and Technology (포항공과대학교)

Abstract: Constituency and dependency provide complementary syntactic information, and both are essential to syntactic analysis. Using dependency relations between words in constituency parsing is common, and previous work has demonstrated that dependency information improves constituency parsing performance. Constituency information, by contrast, is seldom used in dependency parsing. In this thesis, we show that constituency information also helps to improve dependency parsing. Motivated by work on lexicalized PCFG parsing, we propose phrasalized dependency parsing. Phrasalization is the process of associating each head word with a phrase category. Since a vanilla dependency treebank has no phrase nodes, we derive a phrase category for each word sequence that can be dominated by a head, and then associate the head with the derived phrase category. Existing graph-based dependency parsers are mainly based on spans. Because the head is located at the leftmost or rightmost word, a span corresponds to only a half-constituent. Compared with constituent-based algorithms, span-based algorithms parse efficiently, with complexity ranging from O(n^3) to O(n^4). In a span-based algorithm, a phrase is treated as two spans, and each span is processed independently of the other. Thus, previous span-based algorithms cannot model phrases. In this thesis, we propose a new span-based dependency chart parsing algorithm that can operate on phrases through ternary-span combination while maintaining the computational efficiency of the original span-based algorithms. Additionally, the proposed algorithm can model the relations between the left and right dependents of a head, which existing algorithms ignore. We also propose a new dependency parsing model involving phrases, which derives parse trees based on the scores of dependency arcs and phrases.
With the new parsing model, we achieve better performance. The improvements are especially significant on long sentences: for sentences longer than 40, the improvement is over 1% on the Chinese data of CoNLL 2009.
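
For context, the span-based parsers that the abstract contrasts with are commonly implemented along the lines of Eisner's first-order algorithm. The following is a minimal Python sketch of that standard O(n^3) baseline, not the ternary-span algorithm proposed in the thesis. The function name `eisner`, the score-matrix layout, and the artificial root at index 0 are assumptions made for illustration. Note how the left and right halves of a head's phrase are built as separate spans, which is the limitation the abstract describes.

```python
import math


def eisner(scores):
    """First-order projective parsing over spans (illustrative sketch).

    scores[h][m] is the score of the arc h -> m; index 0 is an artificial
    root (an assumption of this sketch). Returns (best_score, heads), where
    heads[m] is the head of word m and heads[0] is -1.
    """
    n = len(scores)
    NEG = -math.inf
    # C = complete spans, I = incomplete spans.
    # Direction 1: head is the left edge s. Direction 0: head is the right edge t.
    C = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    bC = [[[None, None] for _ in range(n)] for _ in range(n)]
    bI = [[None] * n for _ in range(n)]

    for k in range(1, n):              # span length
        for s in range(n - k):
            t = s + k
            # Incomplete span: two facing complete spans plus a new arc.
            best, arg = NEG, None
            for r in range(s, t):
                v = C[s][r][1] + C[r + 1][t][0]
                if v > best:
                    best, arg = v, r
            bI[s][t] = arg
            I[s][t][1] = best + scores[s][t]                    # arc s -> t
            I[s][t][0] = best + scores[t][s] if s > 0 else NEG  # root takes no head
            # Complete span headed at s: incomplete (s, r) plus complete (r, t).
            best, arg = NEG, None
            for r in range(s + 1, t + 1):
                v = I[s][r][1] + C[r][t][1]
                if v > best:
                    best, arg = v, r
            C[s][t][1], bC[s][t][1] = best, arg
            # Complete span headed at t: complete (s, r) plus incomplete (r, t).
            best, arg = NEG, None
            for r in range(s, t):
                v = C[s][r][0] + I[r][t][0]
                if v > best:
                    best, arg = v, r
            C[s][t][0], bC[s][t][0] = best, arg

    heads = [-1] * n

    def back(s, t, d, complete):
        """Follow back-pointers and record the head of each word."""
        if s == t:
            return
        if complete:
            r = bC[s][t][d]
            if d == 1:
                back(s, r, 1, False)
                back(r, t, 1, True)
            else:
                back(s, r, 0, True)
                back(r, t, 0, False)
        else:
            if d == 1:
                heads[t] = s
            else:
                heads[s] = t
            r = bI[s][t]
            back(s, r, 1, True)
            back(r + 1, t, 0, True)

    back(0, n - 1, 1, True)
    return C[0][n - 1][1], heads
```

Calling `eisner(scores)` on an (n+1) x (n+1) matrix of finite arc scores returns the score of the best projective tree and the head index of every word. This basic version allows the root to take several dependents and scores arcs only; it does not score phrases, which is the extension the thesis proposes.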

URI: http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000001558381
https://oasis.postech.ac.kr/handle/2014.oak/1827

Article Type: Thesis

Files in This Item: There are no files associated with this item.
