Open Access System for Information Sharing

Thesis
Full metadata record
Files in This Item:
There are no files associated with this item.
DC Field | Value | Language
dc.contributor.author | Wu, Zhen | en_US
dc.date.accessioned | 2014-12-01T11:48:50Z | -
dc.date.available | 2014-12-01T11:48:50Z | -
dc.date.issued | 2013 | en_US
dc.identifier.other | OAK-2014-01447 | en_US
dc.identifier.uri | http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000001622454 | en_US
dc.identifier.uri | https://oasis.postech.ac.kr/handle/2014.oak/1949 | -
dc.description | Master | en_US
dc.description.abstract | Part-of-Speech (POS) tagging and parsing are essential steps toward representing the meaning of a sentence. Most parsers take sentences with POS tag information as their input and derive a parse tree mainly from the words (word surface information) and the POS information provided. However, some POS tags are too general to capture a word's syntactic behavior, which leads to low parsing accuracy. Subdividing POS tags to a more fine-grained level can provide more information for parsing and increase parsing performance, but a subdivided tagset also makes the POS tagging task more difficult. In practical NLP tasks, inputs are raw sentences without POS tags. POS tagging is an unavoidable step before parsing, and errors in POS tagging may propagate into parsing. It is challenging to balance the granularity of the POS tagset against the performance of the POS tagger so as to improve the performance of the subsequent parser. In this thesis, we propose to utilize heterogeneous POS information for POS subdividing to improve dependency parsing performance on raw sentences. We first used a Tsinghua Chinese Treebank (TCT) POS tagger to tag the Chinese Dependency Treebank (CDT) training set, and converted some CDT tags to TCT tags. In this way, we obtained a CDT corpus with some TCT POS tags. We then trained a parser using this new CDT corpus. For decoding, given a raw sentence as input, we proposed two methods to perform word segmentation and POS tagging. We used our newly trained parser to parse the tagged sentences. Experimental results showed that the parser based on the CDT corpus with some TCT POS tags performed better than one based on the original CDT corpus, with an improvement of 0.67% (absolute). Better results can be expected by exploring more heterogeneous POS tags, with optimized segmentation, POS tagging and parsing models. | en_US
dc.language | eng | en_US
dc.publisher | 포항공과대학교 (Pohang University of Science and Technology) | en_US
dc.rights | BY_NC_ND | en_US
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/2.0/kr | en_US
dc.title | Improving Chinese Dependency Parsing on Raw Sentences using Heterogeneous Part-of-Speech Annotations | en_US
dc.type | Thesis | en_US
dc.contributor.college | Graduate School, Department of Computer Science and Engineering | en_US
dc.date.degree | 2013-8 | en_US
dc.type.docType | Thesis | -
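
The corpus-construction step described in the abstract (tagging the CDT training set with a TCT tagger, then converting some CDT tags to TCT tags) can be pictured with a minimal sketch. The record does not state which tags were converted or by what rule, so everything here is an assumption made for illustration: the function name `merge_tags`, the placeholder tag names (`X`, `Y`, `X1`, ...), the idea of selecting tags by membership in a fixed set, and the requirement that both annotations share the same word segmentation.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, POS tag)


def merge_tags(
    cdt_sentence: List[Token],
    tct_tags: List[str],
    tags_to_subdivide: Set[str],
) -> List[Token]:
    """Replace selected CDT tags with the TCT tagger's output.

    Hypothetical helper: assumes one TCT tag per CDT token, i.e. both
    annotations use the same word segmentation.
    """
    if len(cdt_sentence) != len(tct_tags):
        raise ValueError("CDT tokens and TCT tags must align one-to-one")

    merged: List[Token] = []
    for (word, cdt_tag), tct_tag in zip(cdt_sentence, tct_tags):
        # Only tags judged too coarse are swapped; all others keep the CDT tag.
        new_tag = tct_tag if cdt_tag in tags_to_subdivide else cdt_tag
        merged.append((word, new_tag))
    return merged


# Placeholder words and tags, not taken from either treebank.
cdt_sentence = [("w1", "X"), ("w2", "Y"), ("w3", "X")]
tct_tags = ["X1", "Y9", "X2"]
mixed = merge_tags(cdt_sentence, tct_tags, tags_to_subdivide={"X"})
print(mixed)
```

A parser trained on sentences like `mixed` would see the finer-grained tags only where the coarse CDT tag was selected for subdivision, which is the trade-off between tagset granularity and tagger accuracy that the abstract discusses.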

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
