Open Access System for Information Sharing

Department of Creative IT Engineering (창의IT융합공학과) 3. Theses_Ph.D.

Thesis

An Associative Reasoning Model for Multimodal Machine Translation

Title: An Associative Reasoning Model for Multimodal Machine Translation

Authors: 권순모

Date Issued: 2021

Publisher: 포항공과대학교 (Pohang University of Science and Technology, POSTECH)

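Abstract: Advances in deep learning have spurred active research on many AI-based applications. Machine translation in particular, within natural language processing, has regained attention and shown unprecedented performance gains thanks to this technology. However, language can contain polysemous words or unclear meanings and is therefore prone to abstraction, which makes machine translation difficult in some cases, and current machine translation cannot solve this problem. For this reason, many researchers have tried to use visual or auditory information as auxiliary information, and this has become a new research area called multimodal machine translation. In theory, auxiliary information was expected to help a machine understand language more easily, but such systems have not shown effects as strong as expected. In general, a photograph often contains more information than the text describing it. That is, using visual information as auxiliary information may also supply unnecessary information, which can negatively affect the machine translation system. The author was inspired by the idea that, when people use visual information, they first understand the caption and then single out only the information related to it. This study therefore designs a translation model that infers the association between the caption and the visual information to determine which visual information is relevant to the caption, and uses that information for translation. Reflecting the fact that existing models did not actively use the caption information, the proposed model aims to select only the relevant visual information by using the word information given in the caption. Ultimately, the goal is to improve translation quality by using only the selected information. Using an openly available multimodal dataset, the model's performance was demonstrated by evaluating translation performance on in-domain and out-of-domain data. In addition, the feature information of images was visualized to confirm that the proposed model works as intended. For deeper study, the proposed model was tested from multiple angles to check that it works as intended, and overall it showed positive results in multimodal translation.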
With the advent of deep learning, many artificial intelligence applications are being widely investigated. In particular, machine translation, a natural language processing task, is being revived by novel techniques. However, language can be abstract because of polysemy and word sense ambiguity, so the problem is hard to solve without external information. For this reason, many researchers are trying to utilize auxiliary context such as vision and sound, an approach called multimodal machine translation. Theoretically, the supporting information should be useful in machine translation systems, but such systems have failed to show the dramatic improvement that was expected. Images generally contain more information than their descriptions alone. This means that the visual context is likely to include information irrelevant to the paired caption, and that context may affect translation negatively. I am motivated by how humans utilize the caption-related spatial context of a given image. Therefore, I design a novel model that reasons about which image maps are associated with the tokens of their captions and translates using that information. The proposed model learns to choose relevant visual facts using textual information, because previous work has rarely utilized the textual information. I verified my model by conducting experiments with the Multi30k dataset and evaluating translation quality with translation evaluation metrics on both in-domain and out-of-domain test sets. Qualitatively, I investigated the distributions of image maps. For further analysis, I conducted four additional ablation studies to examine how the model works. In general, my model showed positive effects in multimodal translation.
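
The idea of using caption tokens to pick out relevant image regions can be illustrated with a minimal sketch. The abstract does not specify the thesis's architecture, so this is not the author's model: scaled dot-product attention is used here as a stand-in, and the function name `token_guided_visual_context`, the toy vectors, and the assumption that tokens and regions share one embedding space are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def token_guided_visual_context(token_vecs, region_vecs):
    """Weight image regions by their relevance to each caption token.

    token_vecs:  list of d-dimensional token embeddings (hypothetical inputs)
    region_vecs: list of d-dimensional image-region features (hypothetical inputs)
    Returns (weights, contexts): for each token, the attention weights over
    regions and the weighted sum of region features.
    """
    d = len(token_vecs[0])
    scale = math.sqrt(d)  # standard scaling for dot-product attention
    weights, contexts = [], []
    for t in token_vecs:
        w = softmax([dot(t, r) / scale for r in region_vecs])
        ctx = [sum(w_i * r[k] for w_i, r in zip(w, region_vecs)) for k in range(d)]
        weights.append(w)
        contexts.append(ctx)
    return weights, contexts


# Toy data: two tokens, three regions; the third region matches neither token.
tokens = [[1.0, 0.0], [0.0, 1.0]]
regions = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
weights, contexts = token_guided_visual_context(tokens, regions)
```

In this toy setup each token places most of its weight on the region aligned with it, so the region that matches no token contributes little to the visual context passed on to translation.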

URI: http://postech.dcollection.net/common/orgView/200000366427
https://oasis.postech.ac.kr/handle/2014.oak/111038

Article Type: Thesis

Files in This Item: There are no files associated with this item.
