Open Access System for Information Sharing

Department of Computer Science & Engineering (컴퓨터공학과) 4. Theses_Master

Thesis

Improving Pre-trained Model-Based Document Summarization with Novel Noising Schemes

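Title: Improving Pre-trained Model-Based Document Summarization with Novel Noising Schemes (original Korean title: 새로운 노이징 방식으로 사전 학습 모델 기반의 문서 요약 개선)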

Authors: 박주홍

Date Issued: 2021

Publisher: Pohang University of Science and Technology (포항공과대학교)

Abstract: Document summarization is the task of identifying the core content of an input document and expressing it in a few short, concise sentences. Several recent methods apply pre-trained language models to document summarization, but these models share a limitation: their input noising schemes were designed without regard to the characteristics of summarization. In this thesis, we apply BART, a pre-trained language model, to abstractive summarization of Korean documents and propose two additional noising schemes, random sentence insertion and sentence deletion, intended to improve the language understanding ability of the abstractive summarization model. In experiments on a Korean summarization task, the BART-based model produced summaries of generally higher quality than other summarization models. The proposed noising schemes improved the quality of generated summaries for small-scale models but yielded no improvement for large-scale models, so they need further refinement.
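
The two noising schemes named in the abstract can be illustrated with a minimal sketch. This record does not describe how the thesis applies them, so everything below is an assumption made for illustration: the function names, the per-sentence deletion probability `p`, the distractor `pool` from which inserted sentences are drawn, and the rule of always keeping at least one sentence. During pre-training, a model such as BART would presumably be asked to reconstruct the original document from the noised input.

```python
import random


def delete_sentences(sentences, p, rng):
    """Sentence deletion (hypothetical variant): drop each sentence
    independently with probability p, but keep at least one."""
    if not sentences:
        return []
    kept = [s for s in sentences if rng.random() >= p]
    # Assumption: never return an empty document; fall back to one sentence.
    return kept if kept else [rng.choice(sentences)]


def insert_random_sentences(sentences, pool, n_insert, rng):
    """Random sentence insertion (hypothetical variant): insert n_insert
    sentences drawn from an unrelated pool at random positions."""
    noised = list(sentences)
    for _ in range(n_insert):
        position = rng.randint(0, len(noised))  # inclusive on both ends
        noised.insert(position, rng.choice(pool))
    return noised


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    document = ["Sentence one.", "Sentence two.", "Sentence three."]
    distractors = ["Unrelated sentence A.", "Unrelated sentence B."]
    print(insert_random_sentences(document, distractors, 1, rng))
    print(delete_sentences(document, 0.3, rng))
```

In this sketch the original sentences keep their relative order under both operations, so the reconstruction target stays well defined; whether the thesis imposes the same constraint is not stated here.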

URI: http://postech.dcollection.net/common/orgView/200000367875
https://oasis.postech.ac.kr/handle/2014.oak/111381

Article Type: Thesis

Files in This Item: There are no files associated with this item.
