Open Access System for Information Sharing


Article
Cited 263 times in Web of Science; cited 331 times in Scopus
Full metadata record
Files in This Item:
There are no files associated with this item.
DC Field: Value
dc.contributor.author: Lee, CK
dc.contributor.author: Lee, GG
dc.date.accessioned: 2016-04-01T02:04:43Z
dc.date.available: 2016-04-01T02:04:43Z
dc.date.created: 2009-08-21
dc.date.issued: 2006-01
dc.identifier.issn: 0306-4573
dc.identifier.other: 2005-OAK-0000005443
dc.identifier.uri: https://oasis.postech.ac.kr/handle/2014.oak/24365
dc.description.abstract: Most previous work on feature selection emphasized only the reduction of the high dimensionality of the feature space. But in cases where many features are highly redundant with each other, we must resort to other means, for example, more complex dependence models such as Bayesian network classifiers. In this paper, we introduce a new information gain and divergence-based feature selection method for statistical machine learning-based text categorization that does not rely on more complex dependence models. Our feature selection method strives to reduce redundancy between features while maintaining information gain in selecting appropriate features for text categorization. Empirical results are given on a number of datasets, showing that our feature selection method is more effective than Koller and Sahami's method [Koller, D., & Sahami, M. (1996). Toward optimal feature selection. In Proceedings of ICML-96, 13th international conference on machine learning], which is one of the greedy feature selection methods, and than conventional information gain, which is commonly used in feature selection for text categorization. Moreover, our feature selection method sometimes enables conventional machine learning algorithms to improve over support vector machines, which are known to give the best classification accuracy. (c) 2004 Elsevier Ltd. All rights reserved.
dc.description.statementofresponsibility: X
dc.language: English
dc.publisher: PERGAMON-ELSEVIER SCIENCE LTD
dc.relation.isPartOf: INFORMATION PROCESSING & MANAGEMENT (postech rank 1)
dc.subject: text categorization
dc.subject: feature selection
dc.subject: information gain and divergence-based feature selection
dc.title: Information gain and divergence-based feature selection for machine learning-based text categorization
dc.type: Article
dc.contributor.college: Department of Computer Science and Engineering
dc.identifier.doi: 10.1016/j.ipm.2004.08.006
dc.author.google: Lee, CK
dc.author.google: Lee, GG
dc.relation.volume: 42
dc.relation.issue: 1
dc.relation.startpage: 155
dc.relation.lastpage: 165
dc.contributor.id: 10103841
dc.relation.journal: INFORMATION PROCESSING & MANAGEMENT (postech rank 1)
dc.relation.index: SCI-level, SCOPUS-indexed paper
dc.relation.sci: SCIE
dc.collections.name: Journal Papers
dc.type.rims: ART
dc.identifier.bibliographicCitation: INFORMATION PROCESSING & MANAGEMENT (postech rank 1), v.42, no.1, pp.155-165
dc.identifier.wosid: 000232355300010
dc.date.tcdate: 2019-01-01
dc.citation.endPage: 165
dc.citation.number: 1
dc.citation.startPage: 155
dc.citation.title: INFORMATION PROCESSING & MANAGEMENT (postech rank 1)
dc.citation.volume: 42
dc.contributor.affiliatedAuthor: Lee, GG
dc.identifier.scopusid: 2-s2.0-23744432473
dc.description.journalClass: 1
dc.description.wostc: 125
dc.type.docType: Article
dc.subject.keywordAuthor: text categorization
dc.subject.keywordAuthor: feature selection
dc.subject.keywordAuthor: information gain and divergence-based feature selection
dc.relation.journalWebOfScienceCategory: Computer Science, Information Systems
dc.relation.journalWebOfScienceCategory: Information Science & Library Science
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Computer Science
dc.relation.journalResearchArea: Information Science & Library Science
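The abstract contrasts the paper's divergence-based method with conventional information gain, the common baseline for feature selection in text categorization. As an illustration of that baseline only (not the authors' proposed method), here is a minimal sketch that scores a term t by IG(t) = H(C) - [P(t)H(C|t) + P(not t)H(C|not t)] on a hypothetical toy corpus; all names and data below are invented for the example:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(C) of a class-label list, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG(term) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)],
    where each doc is represented as a set of terms."""
    with_t = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_t = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = sum((len(part) / n) * entropy(part)
                      for part in (with_t, without_t) if part)
    return entropy(labels) - conditional

# Toy corpus (invented): documents as term sets, with category labels.
docs = [{"ball", "goal"}, {"ball", "team"}, {"vote", "law"}, {"vote", "court"}]
labels = ["sports", "sports", "politics", "politics"]

# Rank candidate features by information gain.
for term in ["ball", "vote", "team"]:
    print(term, round(information_gain(docs, labels, term), 3))
```

Note that plain information gain scores each term independently, so two perfectly redundant terms (here "ball" and "vote" both score 1.0) are both kept; reducing that redundancy is exactly the gap the paper's divergence-based method targets.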

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
