Improving Chinese Dependency Parsing on Raw Sentences using Heterogeneous Part-of-Speech Annotations
- Title
- Improving Chinese Dependency Parsing on Raw Sentences using Heterogeneous Part-of-Speech Annotations
- Authors
- Wu, Zhen
- Date Issued
- 2013
- Publisher
- Pohang University of Science and Technology (POSTECH)
- Abstract
- Part-of-speech (POS) tagging and parsing are essential steps toward representing the meaning of a sentence. Most parsers take sentences with POS tag information as input and derive the parse tree mainly from the words (word surface information) and the POS information provided. However, some POS tags are too general to capture a word's syntactic behavior, which leads to low parsing accuracy. Subdividing POS tags to a more fine-grained level can provide more information for parsing and increase parsing performance, but a subdivided tagset also makes the POS tagging task more difficult. In practical NLP tasks, inputs are raw sentences without POS tags, so POS tagging is an unavoidable step before parsing, and errors in POS tagging may propagate into parsing. It is therefore challenging to balance the granularity of the POS tagset against the performance of the POS tagger so as to improve the performance of the subsequent parser. In this thesis, we propose to utilize heterogeneous POS information for POS subdividing to improve dependency parsing performance on raw sentences. We first used a Tsinghua Chinese Treebank (TCT) POS tagger to tag the Chinese Dependency Treebank (CDT) training set and converted some CDT tags to TCT tags. In this way, we obtained a CDT corpus with some TCT POS tags. We then trained a parser on this new CDT corpus. For decoding, given a raw sentence as input, we proposed two methods to perform word segmentation and POS tagging, and used our newly trained parser to parse the tagged sentences. Experimental results showed that the parser based on the CDT corpus with some TCT POS tags performed better than one based on the original CDT corpus, with an improvement of 0.67% (absolute). Better results can be expected by exploring more heterogeneous POS tags, with optimized segmentation, POS tagging, and parsing models.
- URI
- http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000001622454
https://oasis.postech.ac.kr/handle/2014.oak/1949
- Article Type
- Thesis
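The tag-conversion step described in the abstract (tagging the CDT training set with a TCT tagger, then replacing some CDT tags with TCT tags) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `subdivide_tags`, the tag names `CDT_A`, `CDT_B`, `TCT_X`, `TCT_Y`, and the choice of which CDT tags to subdivide are all hypothetical placeholders, since the record does not state which tags were converted.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, POS tag)


def subdivide_tags(
    cdt_tokens: List[Token],
    tct_tags: List[str],
    tags_to_subdivide: Set[str],
) -> List[Token]:
    """Replace selected coarse CDT tags with the TCT tagger's prediction.

    cdt_tokens: one sentence from the CDT training set, as (word, CDT tag).
    tct_tags: TCT tags predicted for the same words, in the same order.
    tags_to_subdivide: CDT tags assumed too general (hypothetical set).
    """
    if len(cdt_tokens) != len(tct_tags):
        raise ValueError("CDT tokens and TCT tags must be aligned one-to-one")

    merged: List[Token] = []
    for (word, cdt_tag), tct_tag in zip(cdt_tokens, tct_tags):
        # Only tags chosen for subdivision are swapped; all others stay CDT.
        new_tag = tct_tag if cdt_tag in tags_to_subdivide else cdt_tag
        merged.append((word, new_tag))
    return merged


# Placeholder words and tags, not taken from the real CDT or TCT tagsets.
sentence = [("w1", "CDT_A"), ("w2", "CDT_B")]
predicted = ["TCT_X", "TCT_Y"]
mixed = subdivide_tags(sentence, predicted, {"CDT_A"})
```

Under these assumptions, the output is a CDT-style sentence in which only the selected tags carry TCT labels, which matches the "CDT corpus with some TCT POS tags" that the abstract says the parser was trained on.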