Monolingual Machine Translation for Paraphrase Generation

  • Chris Quirk,
  • Chris Brockett,

Published by Association for Computational Linguistics

This version corrects an editing error in the text.

We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is measured to gauge the quality of the resulting corpus. A monotone phrasal decoder generates contextual replacements. Human evaluation shows that this system outperforms baseline paraphrase generation techniques and, in a departure from previous work, offers better coverage and scalability than the current best-of-breed paraphrasing approaches.
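Alignment Error Rate compares a hypothesized word alignment against human-annotated "sure" and "possible" link sets. A minimal sketch of the standard computation (Och & Ney's formulation; the link sets below are invented toy examples):

```python
def alignment_error_rate(sure, possible, hypothesis):
    """AER = 1 - (|H & S| + |H & P|) / (|H| + |S|).

    sure:       gold links all annotators agreed on (subset of possible)
    possible:   gold links including ambiguous ones
    hypothesis: links proposed by the automatic aligner
    """
    if not hypothesis and not sure:
        return 0.0
    matched = len(hypothesis & sure) + len(hypothesis & possible)
    return 1.0 - matched / (len(hypothesis) + len(sure))

# Toy example: source word 1 may acceptably align to position 1 or 2.
sure = {(0, 0), (1, 1)}
possible = sure | {(1, 2)}
print(alignment_error_rate(sure, possible, {(0, 0), (1, 1)}))  # 0.0
print(alignment_error_rate(sure, possible, {(0, 0), (1, 2)}))  # 0.25
```

Lower is better; choosing a possible-but-not-sure link is penalized only in the recall-like term, which is why the second hypothesis scores 0.25 rather than 0.5.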

Sentence-Level Paraphrasing for Machine Translation System Combination

  • Conference paper
  • First Online: 31 July 2016

  • Junguo Zhu,
  • Muyun Yang,
  • Sheng Li &
  • Tiejun Zhao

Part of the book series: Communications in Computer and Information Science (CCIS, volume 623)

Included in the following conference series:

  • International Conference of Pioneering Computer Scientists, Engineers and Educators

In this paper, we propose to enhance machine translation system combination (MTSC) with a sentence-level paraphrasing model trained by a neural network. This work extends the candidate pool in MTSC by paraphrasing the whole original MT output sentences. We first train an encoder-decoder neural paraphrasing model and use it to paraphrase the MT system outputs, generating synonymous candidates in the semantic space. We then merge all candidates into a single improved translation using a state-of-the-art system combination approach (MEMT) augmented with new paraphrasing features. Our experimental results show a significant improvement of 0.28 BLEU points on the WMT2011 test data, and 0.41 BLEU points when out-of-vocabulary (OOV) words are excluded from the sentence-level paraphrasing model.

J. Zhu—This paper is supported by the Natural Science Foundation of China (Grant Nos. 61272384 and 61370170).

An open-source toolkit is available at https://github.com/lisa-groundhog/GroundHog .

LDC2002E18, LDC2002L27, LDC2002T01, LDC2003E07, LDC2003E14, LDC2004T07, LDC2005E83, LDC2005T06, LDC2005T10, LDC2005T34, LDC2006E24, LDC2006E26, LDC2006E34, LDC2006E86, LDC2006E92, LDC2006E93, LDC2004T08 (HK News, HK Hansards).

Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate (2014). arXiv:1409.0473

Banerjee, S., Lavie, A.: Meteor: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72 (2005)

Bangalore, S., Bordel, G., Riccardi, G.: Computing consensus translation from multiple machine translation systems. In: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001, pp. 351–354. IEEE (2001)

Du, J., Way, A.: Using TERp to augment the system combination for SMT. In: Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas (2010)

Feng, Y., Liu, Y., Mi, H., Liu, Q., Lü, Y.: Lattice-based system combination for statistical machine translation. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 3, pp. 1105–1113. Association for Computational Linguistics (2009)

Freitag, M., Peter, J.T., Peitz, S., Feng, M., Ney, H.: Local system voting feature for machine translation system combination. In: Proceedings of the Tenth Workshop on Statistical Machine Translation, pp. 467–476 (2015)

Graves, A.: Sequence transduction with recurrent neural networks (2012). arXiv:1211.3711

He, X., Yang, M., Gao, J., Nguyen, P., Moore, R.: Indirect-HMM-based hypothesis alignment for combining outputs from machine translation systems. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 98–107. Association for Computational Linguistics (2008)

Heafield, K., Lavie, A.: Combining machine translation output with open source: the Carnegie Mellon multi-engine machine translation scheme. Prague Bull. Math. Linguist. 93, 27–36 (2010)

Huang, F., Papineni, K.: Hierarchical system combination for machine translation. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 277–286. Association for Computational Linguistics, Prague, Czech Republic, June 2007

Karakos, D., Eisner, J., Khudanpur, S., Dreyer, M.: Machine translation system combination using ITG-based alignments. In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pp. 81–84. Association for Computational Linguistics (2008)

Koehn, P.: Statistical significance tests for machine translation evaluation. In: EMNLP, pp. 388–395. Citeseer (2004)

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al.: Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pp. 177–180. Association for Computational Linguistics (2007)

Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol. 1, pp. 48–54. Association for Computational Linguistics (2003)

Ma, W.Y., McKeown, K.: Phrase-level system combination for machine translation based on target-to-target decoding. In: Proceedings of the 10th Biennial Conference of the Association for Machine Translation in the Americas (2012)

Ma, W.Y., McKeown, K.: System combination for machine translation through paraphrasing. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1053–1058 (2015)

Matusov, E., Ueffing, N., Ney, H.: Computing consensus translation for multiple machine translation systems using enhanced hypothesis alignment. In: EACL, pp. 33–40 (2006)

Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics (2002)

Rosti, A.V.I., Ayan, N.F., Xiang, B., Matsoukas, S., Schwartz, R., Dorr, B.J.: Combining outputs from multiple machine translation systems. In: Proceedings of NAACL-HLT 2007, pp. 228–235 (2007)

Rosti, A.V.I., Matsoukas, S., Schwartz, R.: Improved word-level system combination for machine translation. In: Annual Meeting-Association for Computational Linguistics, pp. 312–319 (2007)

Sim, K.C., Byrne, W.J., Gales, M.J.F., Sahbi, H., Woodland, P.C.: Consensus network decoding for statistical machine translation system combination. In: IEEE International Conference on Acoustics Speech Signal Processing, ICASSP 2007, vol. 4, pp. 2–5 (2007)

Snover, M.G., Madnani, N., Dorr, B., Schwartz, R.: TER-Plus: paraphrase, semantic, and alignment enhancements to translation edit rate. Mach. Transl. 23 (2–3), 117–127 (2009)

Author information

Authors and Affiliations

Computer Science and Technology, Harbin Institute of Technology, 92 West Dazhi Street, Harbin, 150001, China

Junguo Zhu, Muyun Yang, Sheng Li & Tiejun Zhao

Corresponding author

Correspondence to Muyun Yang .

Editor information

Editors and Affiliations

Harbin Institute of Technology, Harbin, China

Wanxiang Che

Harbin Engineering University, Harbin, China

Hongzhi Wang

Northeast Forestry University, Harbin, China

Weipeng Jing

National University of Defense Technology, Changsha, China

Shaoliang Peng

Harbin Univ. of Science and Technology, Harbin, China

Guanglu Sun

Xianhua Song

Hongtao Song

Harbin Sea of Clouds & Computer Tech., Harbin, China

Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper.

Zhu, J., Yang, M., Li, S., Zhao, T. (2016). Sentence-Level Paraphrasing for Machine Translation System Combination. In: Che, W., et al. Social Computing. ICYCSEE 2016. Communications in Computer and Information Science, vol 623. Springer, Singapore. https://doi.org/10.1007/978-981-10-2053-7_54

DOI: https://doi.org/10.1007/978-981-10-2053-7_54

Published: 31 July 2016

Publisher Name: Springer, Singapore

Print ISBN: 978-981-10-2052-0

Online ISBN: 978-981-10-2053-7

eBook Packages: Computer Science, Computer Science (R0)

The ParaBank project consists of a series of efforts exploring the potential for guided backtranslation for the purpose of paraphrasing with constraints. This work is spiritually connected to prior efforts at JHU in paraphrasing, in particular projects surrounding the ParaPhrase DataBase (PPDB) .

The following are brief descriptions of projects under ParaBank, along with associated artifacts.

ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation

Abstract: We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of ParaNMT, we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of English reference sentences. By adding lexical constraints to the NMT decoding procedure, however, we are able to produce multiple high-quality sentential paraphrases per source sentence, yielding an English paraphrase resource with more than 4 billion generated tokens and exhibiting greater lexical diversity. Using human judgments, we also demonstrate that ParaBank's paraphrases improve over ParaNMT on both semantic similarity and fluency. Finally, we use ParaBank to train a monolingual NMT model with the same support for lexically-constrained decoding for sentence rewriting tasks.
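The lexical constraints described here restrict which finished hypotheses a decoder may return: positive constraints must appear in the output, negative ones must not. The check itself is simple; below is a sketch divorced from any particular NMT decoder, with invented example phrases:

```python
def satisfies_constraints(tokens, positive=(), negative=()):
    """True iff every positive phrase appears contiguously in `tokens`
    and no negative phrase does. In constrained decoding, hypotheses
    failing this check are pruned or penalized before being returned."""
    def contains(phrase):
        n = len(phrase)
        return any(tuple(tokens[i:i + n]) == tuple(phrase)
                   for i in range(len(tokens) - n + 1))
    return (all(contains(p) for p in positive)
            and not any(contains(p) for p in negative))

out = "the movie was very entertaining".split()
assert satisfies_constraints(out, positive=[("very", "entertaining")])
assert not satisfies_constraints(out, negative=[("movie",)])
```

Production decoders enforce such constraints incrementally during beam search rather than filtering finished outputs after the fact, which is what makes paraphrase generation under many constraint sets computationally interesting.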

arXiv: https://arxiv.org/abs/1901.03644

ParaBank v1.0 Full (~9 GB)

ParaBank v1.0 Large, 50m pairs (~3 GB)

ParaBank v1.0 Small Diverse, 5m pairs

ParaBank v1.0 Large Diverse, 50m pairs

Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting

Abstract: Lexically-constrained sequence decoding allows for explicit positive or negative phrase-based constraints to be placed on target output strings in generation tasks such as machine translation or monolingual text rewriting. We describe vectorized dynamic beam allocation, which extends work in lexically-constrained decoding to work with batching, leading to a five-fold improvement in throughput when working with positive constraints. Faster decoding enables faster exploration of constraint strategies: we illustrate this via data augmentation experiments with a monolingual rewriter applied to the tasks of natural language inference, question answering and machine translation, showing improvements in all three.

https://www.aclweb.org/anthology/N19-1090

pMNLI: Paraphrase Augmentation of MNLI

Large-scale, Diverse, Paraphrastic Bitexts via Sampling and Clustering

Abstract: Producing diverse paraphrases of a sentence is a challenging task. Natural paraphrase corpora are scarce and limited, while existing large-scale resources are automatically generated via back-translation and rely on beam search, which tends to lack diversity. We describe ParaBank 2, a new resource that contains multiple diverse sentential paraphrases, produced from a bilingual corpus using negative constraints, inference sampling, and clustering. We show that ParaBank 2 significantly surpasses prior work in both lexical and syntactic diversity while being meaning-preserving, as measured by human judgments and standardized metrics. Further, we illustrate how such paraphrastic resources may be used to refine contextualized encoders, leading to improvements in downstream tasks.

https://www.aclweb.org/anthology/K19-1005

ParaBank v2.0 (~2.3 GB)

Iterative Paraphrastic Augmentation with Discriminative Span Alignment

Abstract: We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment. Our approach allows for the large-scale expansion of existing resources, or the rapid creation of new resources from a small, manually-produced seed corpus. We illustrate our framework on the Berkeley FrameNet Project, a large-scale language understanding effort spanning more than two decades of human labor. Based on roughly four days of collecting training data for the alignment model and approximately one day of parallel compute, we automatically generate 495,300 unique (Frame, Trigger) combinations annotated in context, a roughly 50x expansion atop FrameNet v1.7.

TACL: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00380/100783/Iterative-Paraphrastic-Augmentation-with

Augmented FrameNet

Name: framenet-expanded-vers2.0.jsonlines.gz

This file contains an expanded 1,983,680-sentence version of FrameNet generated by applying 10 rounds of iterative paraphrastic augmentation to (almost all) of the roughly 200,000 sentences in the original resource. Each line is a JSON object with the following attributes:

  • frame_name : The frame to which this sentence belongs.
  • lexunit_compound_name : The lexical unit in the form lemma.POS, e.g., increase.n .
  • original_string : The raw FrameNet sentence.
  • original_trigger_offset : The character-level offset into the raw FrameNet sentence representing the trigger.
  • original_trigger : The string value of the trigger.
  • frame_id : The associated frame ID from FrameNet data release v1.7.
  • lexunit_id : The associated lexical unit ID from FrameNet data release v1.7.
  • exemplar_id : The associated exemplar ID from FrameNet data release v1.7.
  • annoset_id : The associated annotation set ID from FrameNet data release v1.7.
  • outputs : A list containing 10 items, each representing an automatically paraphrased and aligned sentence corresponding to the original FrameNet source sentence.

Each such item is of the form:

  • output_string : The tokenized automatically-generated paraphrase.
  • output_trigger_offset : The offset into the paraphrase representing the automatically aligned trigger.
  • output_trigger : The string value of the automatically aligned trigger in the paraphrase.
  • pbr_score : The negative log-likelihood of this paraphrase under the paraphrase model.
  • aligner_score : The probability of this alignment under the alignment model.
  • iteration : The iteration in which this output was generated (ranges between 1 and 10).
  • pclassifier_score : Probability of this output under a classifier trained to optimize for high precision of acceptable outputs.
  • rclassifier_score : Probability of this output under a classifier trained to optimize for high recall of acceptable outputs.

The pclassifier_score may be used to select a smaller, higher quality subset of the full dataset whereas the rclassifier_score may be used to obtain a larger but slightly lower quality subset.
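Since each line of the release is an independent JSON object, subset selection can be streamed rather than loaded into memory. A sketch of score-based filtering (the 0.9 threshold is an illustrative value, not one taken from the release notes):

```python
import gzip
import json

def filter_outputs(record, threshold=0.9, key="pclassifier_score"):
    """Keep only the paraphrase outputs of one record whose classifier
    score clears `threshold`. Pass key="rclassifier_score" to trade
    precision for a larger, higher-recall subset."""
    return [o for o in record["outputs"] if o[key] >= threshold]

def stream_subset(path, **kwargs):
    """Stream framenet-expanded-vers2.0.jsonlines.gz one record at a time,
    yielding (record, surviving_outputs) pairs."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            kept = filter_outputs(record, **kwargs)
            if kept:
                yield record, kept

# Synthetic record using the documented attribute names:
rec = {"frame_name": "Cause_change", "outputs": [
    {"output_string": "a strong paraphrase", "pclassifier_score": 0.95,
     "rclassifier_score": 0.99},
    {"output_string": "a weaker paraphrase", "pclassifier_score": 0.40,
     "rclassifier_score": 0.80},
]}
assert len(filter_outputs(rec)) == 1
assert len(filter_outputs(rec, threshold=0.5, key="rclassifier_score")) == 2
```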

Alignment Dataset

Name: alignment-release.jsonlines.gz

This file contains a 36,417-instance manually annotated dataset for monolingual span alignment. Each data point consists of a natural-language sentence (the source), a span in that sentence, an automatically generated paraphrase (the reference), and a span in the reference with the same meaning as the source-side span. All source sentences are taken from FrameNet v1.7.

Each line is a JSON object with the following attributes:

  • source_bert_toks : The tokenized source sentence.
  • source_bert_span : Offset into the source sentence representing a span.
  • reference_spacy_tokens : The tokenized reference sentence.
  • reference_span : Offset into the reference sentence representing a span.
  • has_corres : Boolean value representing whether the reference sentence contains a span that corresponds in meaning to the source-side span.
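Recovering the surface string of an annotated span is a one-liner once the index convention is known. A sketch under the assumption that each span is a [start, end] pair of token indices with an exclusive end (the description above does not spell the convention out, so verify against a few real lines first; the example record is invented):

```python
def span_text(tokens, span):
    """Recover the surface string of an annotated span.

    Assumes `span` is a [start, end] pair of token indices with an
    exclusive end -- an assumption, not a documented guarantee.
    """
    start, end = span
    return " ".join(tokens[start:end])

# Invented example shaped like one line of alignment-release.jsonlines.gz:
example = {
    "source_bert_toks": ["the", "cat", "sat", "down"],
    "source_bert_span": [1, 3],
    "reference_span": [0, 2],
    "has_corres": True,
}
if example["has_corres"]:
    print(span_text(example["source_bert_toks"], example["source_bert_span"]))
```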

Make it simple with paraphrases: automated paraphrasing for authoring aids and machine translation

Anabela Barreiro

This book presents a novel scientific approach to improving machine translation by paraphrasing support verb constructions with semantically equivalent verbs (e.g., make a presentation of/present). The author demonstrates that this strategy has a positive impact on machine translation. The study is reproducible, extendable to distinct linguistic phenomena, and successfully applied to different-purpose natural language processing applications. The author exemplifies how paraphrases can be efficiently employed by authoring aids to help simplify and clarify texts, presenting obvious benefits for linguistic quality assurance in text processing. While addressing and providing a solution for a specific linguistic problem, this book presents a comprehensive theoretical background and exposition of conceptual problems that will interest natural language processing professionals, linguists, translators, and students. Written in simple language, this book will be easily understood by non-specialists in the field who have an interest in natural language.

Related Papers

Anabela Barreiro

Abstract. In this paper we present ParaMT, a bilingual/multilingual paraphraser to be applied in machine translation. We select paraphrases of support verb constructions and use the NooJ linguistic environment to formalize and generate translation equivalences through the use of dictionary and local grammars with syntactic and semantic content. Our research shows that linguistic paraphrasal knowledge constitutes a key element in conversion of source language into controlled language text that presents more successful translation results.

Language Resources and Evaluation

This paper presents a new linguistic resource for the generation of paraphrases in Portuguese, based on the lexicon-grammar framework. The resource components include: (i) a lexicon-grammar based dictionary of 2100 predicate nouns co-occurring with the support verb ser de ‘be of’, such as in ser de uma ajuda inestimável ‘be of invaluable help’; (ii) a lexicon-grammar based dictionary of 6000 predicate nouns co-occurring with the support verb fazer ‘do’ or ‘make’, such as in fazer uma comparação ‘make a comparison’; and (iii) a lexicon-grammar based dictionary of about 5000 human intransitive adjectives co-occurring with the copula verbs ser and/or estar ‘be’, such as in ser simpático ‘be kind’ or estar entusiasmado ‘be enthusiastic’. A set of local grammars explore the properties described in linguistic resources, enabling a variety of text transformation tasks for paraphrasing applications. The paper highlights the different complementary and synergistic components and integration ...

Computational Linguistics and Intelligent Text …

This paper presents SPIDER, a system for paraphrasing in document editing and revision with applicability in machine translation pre-editing. SPIDER applies its linguistic knowledge (dictionaries and grammars) to create paraphrases of distinct linguistic phenomena. The first version of this tool was initially developed for Portuguese (ReEscreve v01), but it is extensible to different languages and can also operate across languages. SPIDER has a totally new interface, new resources which contemplate a wider coverage of linguistic phenomena, and applicability to legal terminology, which is described here.

In this paper we present version 0.1 of Port4NooJ, the open source NooJ Portuguese linguistic module, which integrates a bilingual extension for Portuguese-English machine translation, (MT4NooJ), a work in progress. We first explain the motivation behind this work and then describe the main components of the module, particularly, the electronic dictionaries, the rules which formalize and document Portuguese inflectional and derivational descriptions, and the different types of grammar: morphological, disambiguation, syntactic-semantic, multiword expressions and translation grammars. We explain how the different components interact and show the application of these linguistic resources, dictionaries and grammars to text. We present methodology and results driven by the new characteristics of this module.

on Iberian Cross-Language Natural …

Hugo Oliveira

Communications in Computer and Information Science

This paper presents a methodology to extract a paraphrase database for the European and Brazilian varieties of Portuguese, and discusses a set of paraphrastic categories of multiwords and phrasal units, such as the compounds “toda a gente” versus “todo o mundo” ‘everybody’ or the gerundive constructions [estar a + V-Inf] versus [ficar + V-Ger] (e.g., “estive a observar” | “fiquei observando” ‘I was observing’), which are extremely relevant to high quality paraphrasing. The variants were manually aligned in the e-PACT corpus, using the CLUE-Aligner tool. The methodology, inspired in the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit and constitutes a subset of the Gold-CLUE-Paraphrases. The construction of a larger dataset of paraphrastic contrasts among the distinct varieties of the Portuguese language is indispensable for variety adaptation, i.e., for dealing with the cultural, linguistic and stylistic differences between them, making it possible t...

This paper details the integration into Port4NooJ of 15 lexicon-grammar tables describing the distributional properties of 4,248 human intransitive adjectives. The properties described in these tables enable the recognition and generation of adjectival constructions where the adjective has a predicative function. These properties also establish semantic relationships between adjective, noun and verb predicates, allowing new paraphrasing capabilities that were described in NooJ grammars. The new dictionary of human intransitive adjectives created by merging the information on those tables with the Port4NooJ homo-graph adjectives is comprised of 5,177 entries. The enhanced Port4NooJ is being used in eSPERTo, a NooJ-based paraphrase generation platform.

Proceedings of the Workshop on Discontinuous Structures in Natural Language Processing

2020 Duolingo Shared Task

STAPLE: Simultaneous Translation and Paraphrase for Language Education

This challenge is in conjunction with the WNGT workshop at ACL 2020 .

Introduction

Machine translation systems typically produce a single output, but in certain cases it is desirable to have many possible translations of a given input text. This situation is common at Duolingo (the world's largest language-learning platform), where some learning happens via translation-based exercises, and grading is done by comparing learners' responses against a large set of human-curated acceptable translations. We believe the processes of grading and/or manual curation could be vastly improved with richer, multi-output translation and paraphrase systems.

In this shared task, participants start with English prompts and generate high-coverage sets of plausible translations in five other languages. For evaluation, we provide sentences with handcrafted, field-tested sets of possible translations, weighted and ranked according to actual learner response frequency. We also provide high-quality automatic translations of each input sentence that may (optionally) be used as a reference/anchor point and also serve as a strong baseline. In this way, we expect the task to be of interest to researchers in machine translation, MT evaluation, multilingual paraphrase, and language education technology.

Novel and interesting research opportunities in this task:

  • A large set of sentences with comprehensive (though not exhaustive) translations
  • Translations weighted by real language-learner data
  • Datasets in 5 language pairs

The outcomes of this shared task will be:

  • New translation datasets provided to the community
  • New benchmarks for MT and paraphrasing

Take note of the task timeline below. Quick links:

  • Mailing list for updates & announcements
  • Data sets at Dataverse
  • Baseline starter code at GitHub
  • Official submissions & results at CodaLab

Official Results

(Updated May 26, 2020) The table below shows the overall results on the TEST set for each language, in terms of weighted F1. We have omitted results that did not outperform the baselines. More detailed results can be found in this spreadsheet.

These results differ slightly from the CodaLab leaderboard. CodaLab chooses to display results from a team's single submission (according to some comparison function), whereas we have selected each team's highest-performing score for each language track; these are not necessarily all from the same submission.

Read the 2020 STAPLE Task Overview Paper »

Important Dates

Mailing List & Organizers

We have created a Google Group to foster discussion and answer questions related to this task:

Join the STAPLE Shared Task group »

The task organizers (all from Duolingo) are:

  • Klinton Bicknell
  • Chris Brust
  • Stephen Mayhew
  • Bill McDowell
  • Will Monroe
  • Burr Settles

Task Definition & Data

Duolingo is a free, award-winning, online language learning platform. Since launching in 2012, more than 300 million students from all over the world have enrolled in one of Duolingo's 90+ game-like language courses, via the website or mobile apps. For comparison, that is more than the total number of students in the entire U.S. school system.

A portion of learning on Duolingo happens through translation-based exercises. In this task, we focus on exercises where users are given a prompt in the language they are learning (English) and type a response in their native language. (The original page illustrates this with screenshots from English lessons for Portuguese speakers.)

Prediction Task

Participants are given an English sentence and are required to produce a high-coverage set of translations in the target language. In order to level the playing field, we also provide a high-quality automatic reference translation (via Amazon Translate), which may be considered a strong baseline for the machine translation task.

The prompt sentences come from Duolingo courses, and are often relatively simple (and a little quirky). For example, below is a sentence taken from the course that teaches English to Portuguese speakers:

Examining these data, it’s clear that not all accepted translations are equally likely, and therefore, they should be scored accordingly. As stewards of the world's largest and most comprehensive corpus of language learning data, we are able to use lesson response data to estimate which translations are more likely. This is used in the metric, described below.

The data for this task comes from five Duolingo courses. All use English prompts with multiple accepted translations, weighted by response frequency from speakers of each of the following languages:

  • en_pt — Portuguese
  • en_hu — Hungarian
  • en_ja — Japanese
  • en_ko — Korean
  • en_vi — Vietnamese

The TRAIN data will include comprehensive accepted translations along with weights for participating teams to use in tuning their systems.

Statistics for the released training data are below. All dev and test sets have the same number of prompts (500). The training sets are sampled from each course.

Download the data »

Data Format

The provided data takes the following format. The weights on each translation correspond to user response rates. These weights are used primarily for scoring. You are not required to output weights.

Submission & Evaluation

Starter Code

You can find starter code here: https://github.com/duolingo/duolingo-sharedtask-2020/ . This contains code to train standard seq2seq models, as well as the official scoring function, and some data readers.

Get the starter code »

The main scoring metric will be weighted macro \(F_1\) with respect to the accepted translations. In short, systems are scored based on how well they can return all human-curated acceptable translations, weighted by the likelihood that an English learner would respond with each translation.

In weighted macro \(F_1\), we calculate weighted \(F_1\) for each prompt \(s\), and take the average over all prompts in the corpus. We chose to calculate precision in an unweighted fashion, and weight only recall. Specifically, for weighted true positives (WTP) and weighted false negatives (WFN), weighted recall is \(\mathrm{WTP} / (\mathrm{WTP} + \mathrm{WFN})\), while precision uses plain counts.

The weighted \(F_1\)'s are then averaged over all prompts in the corpus.
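A sketch of the per-prompt computation consistent with the description above (the official scorer in the starter code is authoritative; the Portuguese strings and weights below are invented):

```python
def weighted_f1(predictions, accepted):
    """Weighted F1 for a single prompt.

    predictions: set of system output strings (already normalized)
    accepted:    dict mapping each accepted translation to its learner weight

    Precision is unweighted; recall weights each matched translation by its
    share of the total accepted weight, as described above.
    """
    hits = predictions & accepted.keys()
    if not hits:
        return 0.0
    precision = len(hits) / len(predictions)
    recall = sum(accepted[t] for t in hits) / sum(accepted.values())
    return 2 * precision * recall / (precision + recall)

def weighted_macro_f1(per_prompt):
    """Average the per-prompt scores over (predictions, accepted) pairs."""
    return sum(weighted_f1(p, a) for p, a in per_prompt) / len(per_prompt)

# One hit out of two predictions, covering 0.6 of the accepted weight:
accepted = {"eu como uma maçã": 0.6, "como uma maçã": 0.3, "eu como maçã": 0.1}
print(weighted_f1({"eu como uma maçã", "a maçã come"}, accepted))
```

Returning every accepted translation and nothing else yields a score of 1.0; missing a high-weight translation hurts far more than missing a rare one.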

Evaluation: CodaLab

All system submissions and evaluation will be done via CodaLab. There will be a DEV phase, where you can submit predictions online after the phase 2 data release, and a TEST phase for final evaluation after the phase 3 data release. Check back for more details as the submission deadline approaches.

Submit results on CodaLab »

Prediction Format

The submission file format is similar to the Amazon Translate prediction file.

Submissions should consist of blocks of text separated by one empty line, where the first line of each block is an ID and prompt, and each following line is a unique predicted paraphrase (order does not matter). During evaluation, punctuation is stripped and all text is lowercased.
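The block structure can be sketched as follows. The `|` separator between ID and prompt is an assumption on our part, and all IDs, prompts, and translations below are invented for illustration (the original inline example did not survive extraction):

```python
def parse_predictions(text):
    """Parse a prediction file: blank-line-separated blocks, where the
    first line of each block is 'id|prompt' and the remaining lines are
    unique predicted paraphrases (order does not matter)."""
    blocks = {}
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        prompt_id, _, prompt = lines[0].partition("|")
        blocks[prompt_id] = {"prompt": prompt, "predictions": lines[1:]}
    return blocks

# Hypothetical en_pt (English -> Portuguese) submission fragment.
example = """\
prompt_001|I drink water.
Eu bebo água.
Bebo água.

prompt_002|The cat sleeps.
O gato dorme.
"""

parsed = parse_predictions(example)
```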

Here is an example prediction file, with prompts corresponding to the example above:

System Papers & Citation Details

All teams are expected to submit a system paper describing their approach and results, to be published in the workshop proceedings and available through the ACL Anthology website. Please do so even if you are unable to travel to the ACL conference in July 2020.

Note that we are interested not only in top-performing systems (i.e., metrics), but also meaningful findings (i.e., insights for language and/or learning). Teams are encouraged to focus on both in their write-ups!

Papers should follow the ACL 2020 submission guidelines. Teams are invited to submit a full paper (4–8 pages of content, with unlimited pages for references). We recommend using the official style templates:

  • LaTeX + MS Word

All submissions must be in PDF format and should not be anonymized. Supplementary files (hyperparameter settings, external features or ablation results too extensive to fit in the main paper, etc.) are also welcome, so long as they follow the ACL 2020 guidelines. Final camera-ready versions of accepted papers will be given up to one additional page of content (9 pages plus references) to address reviewer comments. Papers must include the following citation:

Stephen Mayhew, Klinton Bicknell, Chris Brust, Bill McDowell, Will Monroe, and Burr Settles. 2020. Simultaneous Translation And Paraphrase for Language Education. In Proceedings of the ACL Workshop on Neural Generation and Translation (WNGT) , ACL.

Submit your paper through START »

Tips, Resources, & Related Work

The following resources may prove useful. We may update this section as the challenge progresses.

Translation

  • fairseq is a PyTorch-based framework for sequence modeling, such as machine translation or text generation.
  • KyTea may be useful for segmentation in Japanese.
  • Multilingual contextual models, many of which are available through HuggingFace transformers.
  • Multilingual BERT has proven to be remarkably useful for cross-lingual applications.
  • XLM (Lample & Conneau, 2019), and XLM-R (Conneau et al., 2019), by virtue of parallel text in training, may outperform Multilingual BERT.
  • It may be important to maintain a diverse beam when decoding (Ippolito et al., 2019).
  • Given the nature of the data, phrase-based systems built with tools such as Giza++ and Moses may in fact be competitive with more modern, neural methods (see statmt.org for many resources).

MT Evaluation

  • In the HyTer metric (Dreyer & Marcu, 2012), a translation prediction is scored against a comprehensive list of manually-gathered translations. Our evaluation is similar in the sense that we have high-coverage translation options at test time, but the goals are slightly different. Where HyTer is concerned with accurate measurement of machine translation, this task pursues high-coverage output. We also provide real-world weights with each translation option.
  • Where HyTer employed humans in writing all possible translations of a sentence, Automated HyTer (Apidianaki et al., 2018) uses the Paraphrase Database (PPDB) .

Paraphrasing

  • The Multilingual Paraphrase Database (PPDB) may prove useful (Ganitkevitch & Callison-Burch, 2014)
  • Towards Universal Paraphrastic Sentence Embeddings (Wieting et al., 2015)
  • Simple and Effective Paraphrastic Similarity from Parallel Translations (Wieting et al, 2019)
  • Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment (Barzilay & Lee, 2003)
  • OpenSubtitles
  • News Commentary
  • The OPUS Collection



Computer Science > Computation and Language

Title: Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing

Abstract: We frame the task of machine translation evaluation as one of scoring machine translation output with a sequence-to-sequence paraphraser, conditioned on a human reference. We propose training the paraphraser as a multilingual NMT system, treating paraphrasing as a zero-shot translation task (e.g., Czech to Czech). This results in the paraphraser's output mode being centered around a copy of the input sequence, which represents the best case scenario where the MT system output matches a human reference. Our method is simple and intuitive, and does not require human judgements for training. Our single model (trained in 39 languages) outperforms or statistically ties with all prior metrics on the WMT 2019 segment-level shared metrics task in all languages (excluding Gujarati where the model had no training data). We also explore using our model for the task of quality estimation as a metric--conditioning on the source instead of the reference--and find that it significantly outperforms every submission to the WMT 2019 shared task on quality estimation in every language pair.

MT Evaluation in Many Languages via Zero-Shot Paraphrasing

thompsonb/prism


Prism is an automatic MT metric which uses a sequence-to-sequence paraphraser to score MT system outputs conditioned on their respective human references. Prism uses a multilingual NMT model as a zero-shot paraphraser, which negates the need for synthetic paraphrase data and results in a single model which works in many languages.

Prism outperforms or statistically ties with all metrics submitted to the WMT 2019 metrics shared task as segment-level human correlation.

We provide a large, pre-trained multilingual NMT model which we use as a multilingual paraphraser, but the model may also be of use to the research community beyond MT metrics. We provide examples of using the model for both multilingual translation and paraphrase generation .

Prism scores raw, untokenized text; all preprocessing is applied internally. This document describes how to install and use Prism.

Installation

Prism requires a version of Fairseq compatible with the provided pretrained model. We recommend starting with a clean environment:

For reasonable speeds, we recommend running on a machine with a GPU and the CUDA version compatible with the version of fairseq/torch installed above. Prism will run on a GPU if available; to run on CPU instead, set CUDA_VISIBLE_DEVICES to an empty string.
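For example, a minimal config fragment for forcing CPU execution (the variable name is standard CUDA behavior, as described above):

```shell
# Hide all GPUs so Prism falls back to CPU
export CUDA_VISIBLE_DEVICES=""
```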

Download the Prism code and install requirements, including Fairseq:

Download Model

Metric usage: command line.

Create test candidate/reference files:

To obtain system-level metric scores, run:

Here, "ref.en" is the (untokenized) human reference, and "cand.en" is the (untokenized) system output. This command will print some logging information to STDERR, including a model/version identifier, and print the system-level score (negative, higher is better) to STDOUT:

Prism identifier: {'version': '0.1', 'model': 'm39v1', 'seg_scores': 'avg_log_prob', 'sys_scores': 'avg_log_prob', 'log_base': 2} -1.0184667

Candidates can also be piped into prism.py:

To score output using the source instead of the reference (i.e., quality estimation as a metric), use the --src flag. Note that --lang still specifies the target/reference language:

Prism also has access to all WMT test sets via the sacreBLEU API. These can be specified as arguments to --src and --ref, for a hypothetical system output $cand, as follows:

which will cause it to use the English reference from the WMT19 German--English test set. (Since the language is known, no --lang is needed).

To see all options, including segment-level scoring, run:

Metric Usage: Python Module

All functionality is also available in Python, for example:

Which should produce:

Prism identifier: {'version': '0.1', 'model': 'm39v1', 'seg_scores': 'avg_log_prob', 'sys_scores': 'avg_log_prob', 'log_base': 2} System-level metric: -1.0184666 Segment-level metric: [-1.4878583 -0.5490748] System-level QE-as-metric: -1.8306842 Segment-level QE-as-metric: [-2.462842 -1.1985264]

Multilingual Translation

The Prism model is simply a multilingual NMT model, and can be used for translation -- see the multilingual translation README .

Paraphrase Generation

Attempting to generate paraphrases from the Prism model via naive beam search (e.g. "translate" from French to French) results in trivial copies most of the time. However, we provide a simple algorithm to discourage copying and enable paraphrase generation in many languages -- see the paraphrase generation README .
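Prism's actual algorithm is described in the linked README. Purely as a generic illustration of the idea, not their method, copying can be discouraged by re-scoring candidates with a penalty proportional to their lexical overlap with the source (all names here are ours):

```python
def rescore(candidates, source_tokens, alpha=2.0):
    """Toy re-scoring: subtract a penalty proportional to the fraction
    of candidate tokens that also appear in the source sentence.
    A real decoder would apply such a penalty per step inside beam
    search rather than after the fact."""
    source = set(source_tokens)
    rescored = []
    for tokens, log_prob in candidates:
        overlap = sum(t in source for t in tokens) / len(tokens)
        rescored.append((tokens, log_prob - alpha * overlap))
    # Best (highest penalized score) first.
    return sorted(rescored, key=lambda c: c[1], reverse=True)
```

With a large enough penalty weight, a fluent near-copy is out-ranked by a slightly less probable but genuinely rephrased candidate.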

Supported Languages

Albanian (sq), Arabic (ar), Bengali (bn), Bulgarian (bg), Catalan; Valencian (ca), Chinese (zh), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Esperanto (eo), Estonian (et), Finnish (fi), French (fr), German (de), Greek, Modern (el), Hebrew (modern) (he), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Kazakh (kk), Latvian (lv), Lithuanian (lt), Macedonian (mk), Norwegian (no), Polish (pl), Portuguese (pt), Romanian, Moldavan (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovene (sl), Spanish; Castilian (es), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi)

Data Filtering

The data filtering scripts used to train the Prism model can be found here .

Publications

If you use the Prism metric and/or the provided multilingual NMT model, please cite our EMNLP paper :

If you use the paraphrase generation algorithm, please also cite our WMT paper :


ParaZh-22M: A Large-Scale Chinese Parabank via Machine Translation

Wenjie Hao , Hongfei Xu , Deyi Xiong , Hongying Zan , Lingling Mu

Wenjie Hao, Hongfei Xu, Deyi Xiong, Hongying Zan, and Lingling Mu. 2022. ParaZh-22M: A Large-Scale Chinese Parabank via Machine Translation. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3885–3897, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
