Open Access System for Information Sharing

Department of Computer Science & Engineering (컴퓨터공학과) 4. Theses_Master

Thesis

An Exploration of a Dual-Encoder Architecture for Neural Quality Estimation of Machine Translation

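Title: An Exploration of a Dual-Encoder Architecture for Neural Quality Estimation of Machine Translation (기계 번역문의 신경망 기반 품질 예측을 위한 이중 인코더 구조에 대한 탐구)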

Authors: 허담

Date Issued: 2022

Publisher: 포항공과대학교 (Pohang University of Science and Technology, POSTECH)

Abstract: This thesis proposes a Dual-Encoder architecture for Quality Estimation (QE) of machine translation. Two encoders each apply self-attention to one language separately to learn a monolingual representation, and cross-attention networks then learn the cross-lingual representation used for QE. The aim is to improve the stability of recently proposed QE models, which typically use a single Transformer encoder whose self-attention processes both inputs simultaneously: the source sequence (src) and its machine translation (mt). In experiments on the English-German data of Workshop on Machine Translation 2020 (WMT20) Task 2, QE models based on the Dual-Encoder outperform those based on a Single-Encoder on all evaluation metrics, which indicates a structural advantage for the Dual-Encoder in QE. In further experiments on the English-German data of WMT21 Task 2, applying transfer learning to the two monolingual encoders improves both the performance and the stability of the Dual-Encoder model. ELECTRA serves as the pre-trained model; in a comparison experiment, the ELECTRA-based model outperforms the BERT-based one. According to the official WMT21 leaderboard, the proposed systems outperform the baseline system in terms of the Matthews correlation coefficient (MCC) for word-level QE on machine translations and in terms of Pearson's correlation coefficient (PCC) for sentence-level QE. The Dual-Encoder concept is simple and empirically effective. It also allows QE models to use a variety of strong monolingual pre-trained language models (LMs), which gives them room for further improvement.
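The two-stage attention described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it uses a single attention head with no learned projections, feed-forward layers, or pre-trained weights, and the function names (`attend`, `dual_encode`) and the toy vectors are hypothetical. It also assumes that the machine translation tokens act as queries over the source tokens in the cross-attention step, which the abstract does not specify.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: one output vector per query."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


def dual_encode(src, mt):
    """Hypothetical dual-encoder pass over token vectors."""
    # Stage 1: each encoder applies self-attention to its own language only.
    src_repr = attend(src, src, src)
    mt_repr = attend(mt, mt, mt)
    # Stage 2 (assumed direction): MT tokens query the source representation.
    return attend(mt_repr, src_repr, src_repr)


# Toy token embeddings: 3 source tokens and 2 MT tokens, dimension 2.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mt = [[0.5, 0.5], [1.0, 0.0]]
cross = dual_encode(src, mt)
```

In a single-encoder model, by contrast, `src` and `mt` would be concatenated and passed through one self-attention stack. Here `cross` holds one cross-lingual vector per MT token, which a word-level QE head could classify and a sentence-level head could pool into a score.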

URI: http://postech.dcollection.net/common/orgView/200000600537
https://oasis.postech.ac.kr/handle/2014.oak/117322

Article Type: Thesis

Files in This Item: There are no files associated with this item.
