NIST 2006 Open Machine Translation (OpenMT) Evaluation
|Item Name:||NIST 2006 Open Machine Translation (OpenMT) Evaluation|
|Author(s):||NIST Multimodal Information Group|
|LDC Catalog No.:||LDC2010T17|
|Release Date:||October 15, 2010|
|Data Source(s):||newswire, broadcast news, broadcast conversation, web collection|
|Language(s):||Mandarin Chinese, Arabic, Chinese|
|Language ID(s):||cmn, ara, zho|
|License(s):||LDC User Agreement for Non-Members|
|Online Documentation:||LDC2010T17 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||NIST Multimodal Information Group. NIST 2006 Open Machine Translation (OpenMT) Evaluation LDC2010T17. Web Download. Philadelphia: Linguistic Data Consortium, 2010.|
NIST 2006 Open Machine Translation (OpenMT) Evaluation, Linguistic Data Consortium (LDC) catalog number LDC2010T17 and ISBN 1-58563-562-6, is a package containing source data, reference translations and scoring software used in the NIST 2006 OpenMT evaluation. It is designed to help evaluate the effectiveness of machine translation systems. NIST researchers compiled the package and developed the scoring software, using broadcast, newswire and web newsgroup source data and reference translations collected and developed by LDC.
The objective of the NIST Open Machine Translation (OpenMT) evaluation series is to support research in, and help advance the state of the art of, machine translation (MT) technologies, that is, technologies that translate text between human languages. Input may include all forms of text. The goal is for the output to be an adequate and fluent translation of the original.
The MT evaluation series started in 2001 as part of the DARPA TIDES (Translingual Information Detection, Extraction and Summarization) program. Beginning with the 2006 evaluation, the evaluations have been driven and coordinated by NIST as NIST OpenMT. These evaluations help set the direction of research efforts and calibrate the technical capabilities of MT. The OpenMT evaluations are intended to be of interest to all researchers working on the general problem of automatic translation between human languages. To this end, they are designed to be simple, to focus on core technology issues and to be fully supported. The 2006 task was to evaluate translation from Arabic to English and from Chinese to English.
Additional information about these evaluations may be found at the NIST Open Machine Translation (OpenMT) Evaluation web site.
This evaluation kit includes a single Perl script (mteval-v11b.pl) that may be used to produce a translation quality score for one or more MT systems. The script compares the system output translation with a set of expert reference translations of the same source text. The comparison is based on finding sequences of words in the reference translations that match word sequences in the system output translation. More information on the evaluation algorithm may be obtained from the paper describing it: BLEU: a Method for Automatic Evaluation of Machine Translation (Papineni et al., 2002).
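The word-sequence matching described above can be illustrated with a minimal sketch of corpus-level BLEU as defined by Papineni et al. (2002): clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. This is not a port of mteval-v11b.pl. The function names are hypothetical, input is assumed to be already tokenized, and the effective reference length (here, the reference closest in length to each hypothesis) is an assumption that may not match the NIST script's behavior; use the included script when scores must be comparable with the original evaluation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-word sequence in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Sketch of corpus BLEU.

    hypotheses: list of token lists (one per segment).
    references: list of lists of token lists (one or more references per segment).
    """
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Assumption: use the reference closest in length (ties go to the shorter one).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Max count of each n-gram in any single reference (Counter | takes the max).
            max_ref_counts = Counter()
            for r in refs:
                max_ref_counts |= ngrams(r, n)
            # Clip each hypothesis n-gram count by its max reference count.
            matches[n - 1] += sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # geometric mean is zero if any precision is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

For one segment scored against the four reference translations in this release, a call would look like `corpus_bleu([hyp_tokens], [[ref1, ref2, ref3, ref4]])`. Counts are pooled over the whole corpus before dividing, which is why BLEU is usually reported per test set rather than averaged per segment.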
The included scoring script was released with the original evaluation, intended for use with SGML-formatted data files, and is provided to ensure compatibility of user scoring results with results from the original evaluation. An updated scoring software package (mteval-v13a-20091001.tar.gz), with XML support, additional options and bug fixes, documentation, and example translations, may be downloaded from the NIST Multimodal Information Group Tools website.
This release contains 357 documents, each with a corresponding set of four separate human expert reference translations. The source data comprises Arabic and Chinese newswire documents, human transcriptions of broadcast news and broadcast conversation programs, and web newsgroup documents collected by LDC in 2006. The newswire and broadcast material are from Agence France-Presse (Arabic, Chinese), Xinhua News Agency (Arabic, Chinese), Lebanese Broadcasting Corp. (Arabic), Dubai TV (Arabic), China Central TV (Chinese) and New Tang Dynasty Television (Chinese). The web text was collected from Google and Yahoo newsgroups.
For each language, the test set consists of two files: a source file and a reference file. The file name reflects the evaluation year, source language, test set (evalset by default), data version, and whether the file is a source or a reference file (reference files are marked with -ref). A reference file contains four independent reference translations unless noted otherwise in the accompanying README.txt.
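Because reference files differ from their source files only by the -ref marker, the two can be paired mechanically. The sketch below assumes that "-ref" sits immediately before the file extension; the helper name and the example file names are hypothetical, so check the actual names in the package before relying on it.

```python
from pathlib import PurePath

REF_MARKER = "-ref"  # reference files are marked with -ref (assumed to precede the extension)


def pair_source_and_reference(filenames):
    """Hypothetical helper: map each source file name to its reference file name, or None."""
    sources, references = {}, {}
    for name in filenames:
        path = PurePath(name)
        if path.stem.endswith(REF_MARKER):
            # Strip the marker so the key matches the corresponding source stem.
            references[(path.stem[: -len(REF_MARKER)], path.suffix)] = name
        else:
            sources[(path.stem, path.suffix)] = name
    # Keying on the suffix as well keeps SGML and XML copies paired separately.
    return {src: references.get(key) for key, src in sources.items()}
```

A source file without a matching reference maps to None, which makes missing files easy to spot.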
DARPA TIDES and NIST OpenMT evaluations used SGML-formatted test data until 2008 and XML-formatted test data thereafter. The files in this package are provided in both formats.
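For the XML copies, the segments of a reference file can be gathered with Python's standard xml.etree.ElementTree module. This is a minimal sketch that assumes the element names of the NIST mteval format (doc elements carrying a docid attribute and containing seg elements with an id attribute); the function name and the sample markup in the test are illustrative assumptions, so verify them against the files and README.txt. The SGML copies are not guaranteed to be well-formed XML and may need a different reader.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_segments(xml_text):
    """Return {(docid, seg_id): [text, ...]} with one entry per occurrence.

    A segment that appears in four reference sets yields four translations,
    in file order.
    """
    root = ET.fromstring(xml_text)
    segments = defaultdict(list)
    for doc in root.iter():
        # Compare tag names case-insensitively (e.g. DOC vs doc); skip non-element nodes.
        if not (isinstance(doc.tag, str) and doc.tag.lower() == "doc"):
            continue
        doc_id = doc.get("docid")
        for seg in doc.iter():
            if isinstance(seg.tag, str) and seg.tag.lower() == "seg":
                # itertext() also collects text inside any nested markup.
                text = "".join(seg.itertext()).strip()
                segments[(doc_id, seg.get("id"))].append(text)
    return dict(segments)
```

Grouping by document and segment ID, rather than by reference set, yields a list of reference strings per segment, the shape most multi-reference scorers expect.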
No updates have been issued as of this time.