Open Access System for Information Sharing

Two-phase Lexical Normalization on Social Media Language

Authors
Zeng, Yingying
Date Issued
2016
Publisher
Pohang University of Science and Technology (POSTECH)
Abstract
Natural language processing (NLP) on data from social network services (SNSs) has become more difficult because SNS users shorten words to send messages quickly, and some SNSs even limit the length of a single message. Lexical normalization has therefore become a necessary step before NLP systems process SNS data. This paper proposes a lexical normalization system that suggests normalization candidates for an input non-standard word (NSW). The system generates candidates in two phases: it first applies phonetic substitution by table lookup to produce intermediate candidates, and then applies letter insertion to those intermediate candidates to form the final candidate set. Without referring to any existing pairs of NSWs and standard words (SWs), the system recovered most test words and reached 84.82% Top-20 recall. This result shows that NSWs can be normalized without referencing existing NSWs.
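The two phases described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the substitution table, the lexicon, the function names, and the ranking by fewest inserted letters are all assumptions made for illustration. Letter insertion is expressed here as a subsequence test, since a lexicon word can be reached from a candidate by insertions alone exactly when the candidate is a subsequence of that word.

```python
from itertools import product

# Hypothetical table and lexicon, for illustration only (not from the thesis).
PHONETIC_TABLE = {"2": ["to", "too"], "4": ["for"], "8": ["ate", "eat"], "z": ["s"]}
LEXICON = {"tomorrow", "great", "later", "before", "please", "today"}


def phonetic_substitutions(word):
    """Phase 1: keep each character or replace it with any table entry."""
    options = [[ch] + PHONETIC_TABLE.get(ch, []) for ch in word]
    return {"".join(parts) for parts in product(*options)}


def is_subsequence(short, full):
    """True if `full` can be formed from `short` by inserting letters only."""
    remaining = iter(full)
    return all(ch in remaining for ch in short)


def normalize(word, max_insertions=4, top_k=20):
    """Phase 2: collect lexicon words reachable by letter insertion.

    Ranking by fewest inserted letters is an assumption of this sketch.
    """
    scored = {}
    for cand in phonetic_substitutions(word.lower()):
        for target in LEXICON:
            gap = len(target) - len(cand)  # number of letters to insert
            if 0 <= gap <= max_insertions and is_subsequence(cand, target):
                scored[target] = min(gap, scored.get(target, gap))
    return sorted(scored, key=lambda w: (scored[w], w))[:top_k]


if __name__ == "__main__":
    for nsw in ["gr8", "tmrw", "b4", "plz"]:
        print(nsw, normalize(nsw))
```

In this sketch, `gr8` is resolved in phase 1 alone, while `tmrw` has no table entries and is recovered entirely by letter insertion. No NSW-SW pairs are consulted, only the table and the lexicon.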
URI
http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002296467
https://oasis.postech.ac.kr/handle/2014.oak/93524
Article Type
Thesis