Open Access System for Information Sharing

Full metadata record

Files in This Item: There are no files associated with this item.

DC Field	Value	Language
dc.contributor.author	Wu, Zhen	en_US
dc.date.accessioned	2014-12-01T11:48:50Z	-
dc.date.available	2014-12-01T11:48:50Z	-
dc.date.issued	2013	en_US
dc.identifier.other	OAK-2014-01447	en_US
dc.identifier.uri	http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000001622454	en_US
dc.identifier.uri	https://oasis.postech.ac.kr/handle/2014.oak/1949	-
dc.description	Master	en_US
dc.description.abstract	Part-of-Speech (POS) tagging and parsing are essential steps toward representing the meaning of a sentence. Most parsers take sentences with POS tag information as their input and derive parse trees mainly from the words (word surface information) and the POS information provided. However, some POS tags are too general to encapsulate a word’s syntactic behavior, which leads to low parsing accuracy. Subdividing POS tags to a more fine-grained level can provide more information for parsing and increase parsing performance. However, a subdivided tagset also makes the POS tagging task more difficult. In practical NLP tasks, inputs are raw sentences without POS tags, so POS tagging is an inevitable step before parsing, and errors in POS tagging may propagate into parsing. It is therefore challenging to balance the granularity of the POS tagset against the performance of the POS tagger so as to improve the performance of the downstream parser. In this thesis, we propose to utilize heterogeneous POS information for POS subdividing to improve dependency parsing performance on raw sentences. We first used a Tsinghua Chinese Treebank (TCT) POS tagger to tag the Chinese Dependency Treebank (CDT) training set and converted some CDT tags to TCT tags. In this way, we obtained a CDT corpus with some TCT POS tags. We then trained a parser on this new CDT corpus. For decoding, given a raw input sentence, we proposed two methods to perform word segmentation and POS tagging, and we used the newly trained parser to parse the tagged sentences. Experimental results showed that the parser based on the CDT corpus with some TCT POS tags performed better than one based on the original CDT corpus, with an absolute improvement of 0.67%. Better results can be expected by exploring more heterogeneous POS tags with optimized segmentation, POS tagging, and parsing models.	en_US
dc.language	eng	en_US
dc.publisher	Pohang University of Science and Technology (포항공과대학교)	en_US
dc.rights	BY_NC_ND	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/2.0/kr	en_US
dc.title	Improving Chinese Dependency Parsing on Raw Sentences using Heterogeneous Part-of-Speech Annotations	en_US
dc.type	Thesis	en_US
dc.contributor.college	Graduate School, Department of Computer Science and Engineering (일반대학원 컴퓨터공학과)	en_US
dc.date.degree	2013-08	en_US
dc.type.docType	Thesis	-
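
The corpus-conversion step described in the abstract (tagging the CDT training set with a TCT tagger, then replacing some CDT tags with TCT tags) might be sketched as below. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes both annotations share the same word segmentation so tokens align one-to-one, and the function name `merge_heterogeneous_tags`, the `TCT:` prefix, the chosen tag subset, and the example tags are all hypothetical placeholders rather than the actual CDT or TCT tagsets.

```python
from typing import List, Set, Tuple

# A token is a (word, tag) pair; a sentence is a list of tokens.
# Assumption: CDT and TCT annotations use identical word segmentation.
Token = Tuple[str, str]


def merge_heterogeneous_tags(
    cdt_sentence: List[Token],
    tct_sentence: List[Token],
    tags_to_replace: Set[str],
) -> List[Token]:
    """Return the CDT sentence with selected CDT tags replaced by TCT tags.

    Only tokens whose CDT tag is in `tags_to_replace` (a hypothetical
    choice of which coarse tags to subdivide) take the TCT tag; every
    other token keeps its original CDT tag.
    """
    if len(cdt_sentence) != len(tct_sentence):
        raise ValueError("annotations must share the same segmentation")
    merged: List[Token] = []
    for (cdt_word, cdt_tag), (tct_word, tct_tag) in zip(cdt_sentence, tct_sentence):
        if cdt_word != tct_word:
            raise ValueError(f"token mismatch: {cdt_word!r} vs {tct_word!r}")
        # Hypothetical "TCT:" prefix keeps TCT labels distinct from CDT labels.
        new_tag = f"TCT:{tct_tag}" if cdt_tag in tags_to_replace else cdt_tag
        merged.append((cdt_word, new_tag))
    return merged


if __name__ == "__main__":
    # Placeholder tags for illustration only.
    cdt = [("他", "r"), ("喜欢", "v"), ("看", "v"), ("书", "n")]
    tct = [("他", "T1"), ("喜欢", "T2"), ("看", "T3"), ("书", "T4")]
    print(merge_heterogeneous_tags(cdt, tct, {"v"}))
    # prints [('他', 'r'), ('喜欢', 'TCT:T2'), ('看', 'TCT:T3'), ('书', 'n')]
```

Prefixing the borrowed tags is one way to keep the two label inventories disjoint, so the parser treats each TCT tag as a finer-grained subdivision of the coarse CDT tag it replaces rather than confusing it with a CDT tag that happens to share the same string.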

Copyright © 2017 Pohang University of Science and Technology. All rights reserved.