Open Access System for Information Sharing

Department of Computer Science & Engineering (컴퓨터공학과) 3. Theses_Ph.D.

Thesis

News Story Ranking Using Blogosphere

Title: News Story Ranking Using Blogosphere

Authors: 이예하

Date Issued: 2012

Publisher: Pohang University of Science and Technology (포항공과대학교)

Abstract: Since its advent, the Internet has become one of the most important channels for communicating information among users, including individuals and news organizations. Many news organizations have started to distribute news stories on the Internet, and a large number of news stories are published by various news channels on a daily basis. This makes it difficult to keep track of important news stories. As a result, users' need to identify top news stories has increased, and news story search has played an increasingly important role in users' Internet activity. The objective of this dissertation is to identify important news stories for a given date using the blogosphere. Blogs consist of blog posts, which are user-generated content, and reflect users' diverse opinions about news stories. Therefore, a news story that attracts much attention in the blogosphere is likely to be important. In this dissertation, we define the popularity of a news story as the amount of attention it receives from users within the blogosphere. We first evaluate the popularity of a news story in terms of the content similarity between the story and blog posts published on a given date. For this purpose, we propose several approaches to estimating language models for the story and for the blog posts. We also generate a temporal profile of a news story by analyzing the distribution of the number of blog posts relevant to the story over several days, and evaluate the popularity of the story based on this temporal profile. Experimental results on the TREC 2009 and 2010 Blog Track show that our approach is effective in identifying important news stories; in particular, the proposed approach achieved state-of-the-art performance. Furthermore, we propose a simple but effective approach to dealing with noisy information in blog posts. In general, blog posts include several types of noisy information, such as blog templates, advertisements, and navigation panels. This noisy information is not user-generated content, and it adversely affects our system for identifying important news stories. The motivation for our approach is that most noisy content does not change across several consecutive posts within the same blog. To eliminate the noisy information, we compare two consecutive posts belonging to the same blog, treat the parts common to both posts as noisy content, and remove them. Experimental results on the TREC Blog Track show that a retrieval system using the proposed method improves MAP (Mean Average Precision) by about 10% over the baseline system.
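The noise-removal idea in the abstract (compare two consecutive posts from the same blog and drop what they share) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not say how "common parts" are detected, so the sketch assumes exact line-level matching, and the names `strip_shared_lines` and `clean_blog` are hypothetical.

```python
from typing import List


def strip_shared_lines(post: str, neighbor: str) -> str:
    """Drop lines of `post` that also occur in `neighbor` (hypothetical helper).

    Assumption: "common parts" are approximated by exact matches between
    whitespace-stripped lines; the dissertation may use a different unit.
    """
    shared = {line.strip() for line in neighbor.splitlines() if line.strip()}
    # Keep only non-empty lines that do not appear in the neighboring post.
    kept = [
        line
        for line in post.splitlines()
        if line.strip() and line.strip() not in shared
    ]
    return "\n".join(kept)


def clean_blog(posts: List[str]) -> List[str]:
    """Clean each post of one blog against a consecutive post of that blog.

    `posts` is assumed to be in publication order. Each post is compared
    with its predecessor; the first post is compared with its successor.
    A blog with a single post is returned unchanged.
    """
    if len(posts) < 2:
        return list(posts)
    cleaned = []
    for i, post in enumerate(posts):
        neighbor = posts[i - 1] if i > 0 else posts[i + 1]
        cleaned.append(strip_shared_lines(post, neighbor))
    return cleaned
```

For example, given two posts that share a navigation line and a copyright footer, only the lines unique to each post remain. Exact matching misses template text that varies slightly between posts (such as dates in a sidebar), so a fuzzier comparison might be needed in practice.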

URI: http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000001218603
https://oasis.postech.ac.kr/handle/2014.oak/1491

Article Type: Thesis

Files in This Item: There are no files associated with this item.
