DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kwon S. | - |
dc.contributor.author | Go B.-H. | - |
dc.contributor.author | Lee J.-H. | - |
dc.date.accessioned | 2021-12-03T09:21:30Z | - |
dc.date.available | 2021-12-03T09:21:30Z | - |
dc.date.created | 2020-07-23 | - |
dc.date.issued | 2020-08 | - |
dc.identifier.issn | 0167-8655 | - |
dc.identifier.uri | https://oasis.postech.ac.kr/handle/2014.oak/107860 | - |
dc.description.abstract | We introduce a novel multimodal machine translation model that integrates image features modulated by the image's caption. Generally, images contain vastly more information than their descriptions alone. Furthermore, in multimodal machine translation tasks, feature maps are commonly extracted from a network pre-trained for object recognition. Therefore, it is not appropriate to use these feature maps directly. To extract the visual features associated with the text, we design a modulation network based on the textual information from the encoder and the visual information from the pre-trained CNN. However, because multimodal translation data is scarce, using overly complicated models could result in poor performance. For simplicity, we apply a feature-wise multiplicative transformation. Our model is therefore a modular, trainable network that can be embedded in the architecture of existing multimodal translation models. We verified our model by conducting experiments on the Transformer model with the Multi30k dataset and evaluating translation quality using the BLEU and METEOR metrics. In general, our model improved on a text-based model and other existing models. (C) 2020 Elsevier B.V. All rights reserved. | - |
dc.language | English | - |
dc.publisher | ELSEVIER | - |
dc.relation.isPartOf | PATTERN RECOGNITION LETTERS | - |
dc.title | A text-based visual context modulation neural model for multimodal machine translation | - |
dc.type | Article | - |
dc.identifier.doi | 10.1016/j.patrec.2020.06.010 | - |
dc.type.rims | ART | - |
dc.identifier.bibliographicCitation | PATTERN RECOGNITION LETTERS, v.136, pp.212 - 218 | - |
dc.identifier.wosid | 000553824800003 | - |
dc.citation.endPage | 218 | - |
dc.citation.startPage | 212 | - |
dc.citation.title | PATTERN RECOGNITION LETTERS | - |
dc.citation.volume | 136 | - |
dc.contributor.affiliatedAuthor | Kwon S. | - |
dc.contributor.affiliatedAuthor | Go B.-H. | - |
dc.contributor.affiliatedAuthor | Lee J.-H. | - |
dc.identifier.scopusid | 2-s2.0-85086641687 | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.type.docType | Article | - |
dc.subject.keywordAuthor | Deep learning | - |
dc.subject.keywordAuthor | Machine translation | - |
dc.subject.keywordAuthor | Multimodality | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
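
The abstract describes modulating CNN feature maps with a feature-wise multiplicative transformation driven by the text encoder. The record does not give the exact formulation, so the following is only a minimal sketch under stated assumptions: mean pooling of the encoder states, a single linear projection, and a sigmoid gate are all illustrative choices, and the names `text_gate`, `modulate`, `weights`, and `bias` are hypothetical rather than taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def text_gate(text_states: Matrix, weights: Matrix, bias: List[float]) -> List[float]:
    """Compute one multiplicative gate per visual channel from the text.

    Assumptions (not from the paper): encoder states are mean-pooled over
    tokens, projected linearly, and squashed into (0, 1) with a sigmoid.
    text_states: tokens x d_text, weights: d_text x channels, bias: channels.
    """
    n_tokens = len(text_states)
    d_text = len(text_states[0])
    # Mean-pool the encoder states into a single sentence vector.
    pooled = [sum(tok[i] for tok in text_states) / n_tokens for i in range(d_text)]
    gate = []
    for c in range(len(bias)):
        z = sum(pooled[i] * weights[i][c] for i in range(d_text)) + bias[c]
        gate.append(1.0 / (1.0 + math.exp(-z)))
    return gate


def modulate(visual: Matrix, gate: List[float]) -> Matrix:
    """Feature-wise multiplicative transformation.

    Scales channel c of every image region by gate[c].
    visual: regions x channels.
    """
    return [[v * g for v, g in zip(region, gate)] for region in visual]


# Toy example: 2 tokens with 2-dim states, 2 image regions with 3 channels.
text_states = [[1.0, 0.0], [0.0, 1.0]]
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # zero projection, so every gate is 0.5
bias = [0.0, 0.0, 0.0]
visual = [[2.0, 4.0, 6.0], [1.0, 1.0, 1.0]]

modulated = modulate(visual, text_gate(text_states, weights, bias))
print(modulated)
```

In a trained model the projection parameters would be learned jointly with the translation model, so channels relevant to the caption are kept and the rest are attenuated. A purely multiplicative gate adds few parameters, which is consistent with the abstract's concern about overly complicated models on scarce multimodal data.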