Balk EM, Chung M, Chen ML, et al. Assessing the Accuracy of Google Translate to Allow Data Extraction From Trials Published in Non-English Languages [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Jan.

Assessing the Accuracy of Google Translate to Allow Data Extraction From Trials Published in Non-English Languages [Internet].

Introduction

Systematic reviews conducted by the Agency for Healthcare Research and Quality (AHRQ) Evidence-based Practice Centers (EPCs) most commonly restrict literature searches to English-language publications. In a sample of 10 recent Evidence Reports (numbers 189-198), 8 were restricted to English-language publications. One report included studies in languages for which the EPC had “available fluency,” and only one reported not restricting by language. Among 28 other recent Comparative Effectiveness Reviews (CERs) with final or draft documents downloadable from the AHRQ Web site, 20 were restricted to English-language publications. Four explicitly did not impose any language restriction. Two did not report a language restriction in their methods chapter and included one study each in Dutch and German. One placed no language restriction on comparative studies but included only English-language cohort studies. One included German- and French-language studies for nonoperative interventions (which were sparse), but only English-language publications for operative treatments “due to lack of translation resources.” Three of the CERs stated that the language restriction was due to lack of resources or prohibitive translation costs, despite the recognition in one CER “that requiring studies to be published in English could lead to bias.”

Thus, in most instances, EPC reports may be at risk of selection bias based on language (if there is reason to suspect differential publication of studies in English-language and non-English-language journals)1 and may not be following Standard 3.2.6 of the Institute of Medicine's (IOM) recent “Finding What Works in Health Care: Standards for Systematic Reviews”:2 “Search for studies reported in languages other than English if appropriate.” The IOM report notes that there is some evidence of language bias (e.g., investigators in Germany may be more likely to publish their negative results in German-language publications and their positive results in English-language publications).1,3 However, numerous other studies have found that excluding non-English publications may not result in substantial bias (that is, in changed estimates of treatment effects).4-10 Nevertheless, excluding studies solely on the basis of language runs counter to the central principle of systematic review, which is to include all known evidence, particularly as investigators are being encouraged to include non-peer-reviewed and other studies from the grey literature.

Using a literature search module for randomized controlled trials,11 a search in Medline from 1996 to May 25, 2012, found 2,982,047 citations, of which 92 percent were published in English. Table 1 shows the number and frequency of publications in the other languages that each account for more than 0.5 percent of citations.

Table 1. Percentage of studies from Medline search for randomized controlled trials in various languages.

EPCs have varying capacities to extract non–English-language articles, based on the language knowledge of their staff. Formally translating all non–English-language articles is costly and resource-intensive, particularly if performed at the stage of full-text article screening. Therefore, a reliable, free, easily available service to translate articles may allow EPCs to broaden the scope of their systematic reviews and avoid the possible bias introduced by language restrictions. Google Translate® is a free, Web-based program with an excellent reputation for accurate, natural translation (http://translate.google.com). It is one of several such tools, including Yahoo!® Babel Fish (www.babelfish.com/), SDL FreeTranslation® (www.freetranslation.com), and Bing® Translator (www.bing.com/translator). In an analysis of four translation tools for a limited set of language pairs, Google Translate was found to perform best based on human judgment of translation accuracy.12 A subsequent study used an automated technique to compare translations across 2,550 language pairs (51 languages) in Google Translate. It found a range of translation accuracy and reported that “translations between European languages are usually good, while those involving Asian languages are often relatively poor. Further, the vast majority of language combinations probably provide sufficient accuracy for reading comprehension in college.”13 Also of note, a pilot study presented as a poster at the 2009 Singapore Cochrane Collaboration meeting used Google Translate on 11 German articles from one Cochrane review and found that interrater agreement was 73 percent (κ=0.38) for whether the article should be included in the review.14
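To make the gap between 73 percent raw agreement and κ=0.38 concrete, the sketch below computes Cohen's kappa for two raters making include/exclude decisions. The 2×2 counts are hypothetical: the source does not report the poster's actual cross-tabulation, and these values were chosen only because they sum to 11 articles and happen to reproduce the reported figures. The function name `cohen_kappa` is likewise an illustrative assumption.

```python
def cohen_kappa(both_include, only_rater1, only_rater2, both_exclude):
    """Return (observed agreement, Cohen's kappa) for a 2x2 include/exclude table."""
    n = both_include + only_rater1 + only_rater2 + both_exclude
    # Proportion of articles on which the two raters made the same decision.
    observed = (both_include + both_exclude) / n
    # Each rater's marginal probability of saying "include".
    p1_include = (both_include + only_rater1) / n
    p2_include = (both_include + only_rater2) / n
    # Agreement expected by chance if the raters decided independently.
    expected = p1_include * p2_include + (1 - p1_include) * (1 - p2_include)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts for 11 articles (NOT taken from the cited poster).
observed, kappa = cohen_kappa(both_include=6, only_rater1=2,
                              only_rater2=1, both_exclude=2)
print(f"agreement = {observed:.0%}, kappa = {kappa:.2f}")
```

With these illustrative counts, the raters agree on 8 of 11 articles, but because both raters include most articles, chance alone would produce agreement on roughly 56 percent of them, which is why kappa is much lower than the raw percentage.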

Tufts EPC recently conducted a pilot study evaluating Google Translate for data extraction from 88 articles published in 9 languages (Chinese, French, German, Hebrew, Italian, Japanese, Korean, Portuguese, and Spanish).15 Briefly, the study found that the time required to translate an article ranged from seconds (51 articles, 58 percent) to about 1 hour. Those who extracted the 88 translated articles reported that “a little” extra time was required for 40 articles (45 percent) and “a lot” for 42 (48 percent). When evaluating all extraction items together, Portuguese and German articles had the best agreement between original and translated extractions, with high agreement between extractors among about 60 percent of the items, compared with 80 percent in English articles. Spanish, Hebrew, and Chinese had the lowest agreement (30, 24, and 8 percent, respectively). The absolute agreement and the proportion of items with high agreement were statistically significantly worse for all languages, compared with English. Eight of 10 English-language articles had high agreement for all items, compared with 7 of 10 Portuguese articles; 6 of 10 German articles; 4 of 10 each of French, Italian, and Korean articles; 3 of 8 Hebrew articles; 3 of 10 each of Japanese and Spanish articles; and no Chinese articles. However, the pilot study had several important limitations. Only single extractions were performed on the native-language articles, so these extractions could not be confirmed. In addition, the analyses could not fully distinguish disagreements caused by poor translation from those caused by extractors interpreting articles differently or making extraction errors.

Aims

The current study was designed to form a collaboration of EPCs to better analyze the accuracy of Google Translate, a freely available online translation tool, for data extraction from articles in selected non-English languages. The collaboration allowed for double data extraction and a better consensus determination of the important extraction items to assess; we also implemented an improved analytic technique.

The research had the following aims:

  1. Compare data extraction of trials done on original-language articles by native speakers with data extraction done on articles translated to English by Google Translate.
  2. Track and enumerate the time and resources used for article translation and the extra time and resources required for data extraction related to use of translated articles.
