Model paves way for faster, more efficient translations of more languages

New system may open up the world’s roughly 7,000 spoken languages to computer-based translation.

Rob Matheson | MIT News Office

October 30, 2018

Press Inquiries

Press Contact:

Abby Abazorius

Email: abbya@mit.edu

Phone: 617-253-2709

MIT News Office

Media Download

The new model measures distances between words with similar meanings in “word embeddings,” and then aligns the words in both embeddings that are most closely correlated by relative distances, meaning they’re most likely to be direct translations of one another.

↓ Download Image

Caption The new model measures distances between words with similar meanings in “word embeddings,” and then aligns the words in both embeddings that are most closely correlated by relative distances, meaning they’re most likely to be direct translations of one another.

Credits Courtesy of the researchers

*Terms of Use:

Images for download on the MIT News office website are made available to non-commercial entities, press and the general public under a Creative Commons Attribution Non-Commercial No Derivatives license. You may not alter the images provided, other than to crop them to size. A credit line must be used when reproducing images; if one is not provided below, credit the images to "MIT."

The new model measures distances between words with similar meanings in “word embeddings,” and then aligns the words in both embeddings that are most closely correlated by relative distances, meaning they’re most likely to be direct translations of one another.

Courtesy of the researchers

MIT researchers have developed a novel “unsupervised” language translation model — meaning it runs without the need for human annotations and guidance — that could lead to faster, more efficient computer-based translations of far more languages.

Translation systems from Google, Facebook, and Amazon require training models to look for patterns in millions of documents — such as legal and political documents, or news articles — that have been translated into various languages by humans. Given new words in one language, they can then find the matching words and phrases in the other language.

But this translational data is time consuming and difficult to gather, and simply may not exist for many of the 7,000 languages spoken worldwide. Recently, researchers have been developing “monolingual” models that make translations between texts in two languages, but without direct translational information between the two.

In a paper being presented this week at the Conference on Empirical Methods in Natural Language Processing, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) describe a model that runs faster and more efficiently than these monolingual models.

The model leverages a metric in statistics, called Gromov-Wasserstein distance, that essentially measures distances between points in one computational space and matches them to similarly distanced points in another space. They apply that technique to “word embeddings” of two languages, which are words represented as vectors — basically, arrays of numbers — with words of similar meanings clustered closer together. In doing so, the model quickly aligns the words, or vectors, in both embeddings that are most closely correlated by relative distances, meaning they’re likely to be direct translations.

In experiments, the researchers’ model performed as accurately as state-of-the-art monolingual models — and sometimes more accurately — but much more quickly and using only a fraction of the computation power.

“The model sees the words in the two languages as sets of vectors, and maps [those vectors] from one set to the other by essentially preserving relationships,” says the paper’s co-author Tommi Jaakkola, a CSAIL researcher and the Thomas Siebel Professor in the Department of Electrical Engineering and Computer Science and the Institute for Data, Systems, and Society. “The approach could help translate low-resource languages or dialects, so long as they come with enough monolingual content.”

The model represents a step toward one of the major goals of machine translation, which is fully unsupervised word alignment, says first author David Alvarez-Melis, a CSAIL PhD student: “If you don’t have any data that matches two languages … you can map two languages and, using these distance measurements, align them.”

Relationships matter most

Aligning word embeddings for unsupervised machine translation isn’t a new concept. Recent work trains neural networks to match vectors directly in word embeddings, or matrices, from two languages together. But these methods require a lot of tweaking during training to get the alignments exactly right, which is inefficient and time consuming.

Measuring and matching vectors based on relational distances, on the other hand, is a far more efficient method that doesn’t require much fine-tuning. No matter where word vectors fall in a given matrix, the relationship between the words, meaning their distances, will remain the same. For instance, the vector for “father” may fall in completely different areas in two matrices. But vectors for “father” and “mother” will most likely always be close together.

“Those distances are invariant,” Alvarez-Melis says. “By looking at distance, and not the absolute positions of vectors, then you can skip the alignment and go directly to matching the correspondences between vectors.”

That’s where Gromov-Wasserstein comes in handy. The technique has been used in computer science for, say, helping align image pixels in graphic design. But the metric seemed “tailor made” for word alignment, Alvarez-Melis says: “If there are points, or words, that are close together in one space, Gromov-Wasserstein is automatically going to try to find the corresponding cluster of points in the other space.”

For training and testing, the researchers used a dataset of publicly available word embeddings, called FASTTEXT, with 110 language pairs. In these embeddings, and others, words that appear more and more frequently in similar contexts have closely matching vectors. “Mother” and “father” will usually be close together but both farther away from, say, “house.”

Providing a “soft translation”

The model notes vectors that are closely related yet different from the others, and assigns a probability that similarly distanced vectors in the other embedding will correspond. It’s kind of like a “soft translation,” Alvarez-Melis says, “because instead of just returning a single word translation, it tells you ‘this vector, or word, has a strong correspondence with this word, or words, in the other language.’”

An example would be in the months of the year, which appear closely together in many languages. The model will see a cluster of 12 vectors that are clustered in one embedding and a remarkably similar cluster in the other embedding. “The model doesn’t know these are months,” Alvarez-Melis says. “It just knows there is a cluster of 12 points that aligns with a cluster of 12 points in the other language, but they’re different to the rest of the words, so they probably go together well. By finding these correspondences for each word, it then aligns the whole space simultaneously.”

The researchers hope the work serves as a “feasibility check,” Jaakkola says, to apply Gromov-Wasserstein method to machine-translation systems to run faster, more efficiently, and gain access to many more languages.

Additionally, a possible perk of the model is that it automatically produces a value that can be interpreted as quantifying, on a numerical scale, the similarity between languages. This may be useful for linguistics studies, the researchers say. The model calculates how distant all vectors are from one another in two embeddings, which depends on sentence structure and other factors. If vectors are all really close, they’ll score closer to 0, and the farther apart they are, the higher the score. Similar Romance languages such as French and Italian, for instance, score close to 1, while classic Chinese scores between 6 and 9 with other major languages.

“This gives you a nice, simple number for how similar languages are … and can be used to draw insights about the relationships between languages,” Alvarez-Melis says.

Press Mentions

Xinhuanet

MIT researchers have developed a language translation model that operates without human annotations and guidance, reports Liangyu for Xinhua news agency. The system, which may enable computer-based translations of the thousands of languages spoken worldwide, is “a step toward one of the major goals of machine translation, which is fully unsupervised word alignment,” Liangyu explains.

Full story via Xinhuanet →

MIT News | Massachusetts Institute of Technology

Browse By

Topics

Departments

Centers, Labs, & Programs

Schools

Model paves way for faster, more efficient translations of more languages

Press Contact:

Media Download

*Terms of Use:

Press Mentions

Xinhuanet

Related Topics

Related Articles

More MIT News

MIT engineers help multirobot systems stay in the safety zone

From bench to bedside, and beyond

MIT spinout Gradiant reduces companies’ water use and waste by billions of gallons each day

Rare and mysterious cosmic explosion: Gamma-ray burst or jetted tidal disruption event?

Evelina Fedorenko receives Troland Award from National Academy of Sciences

3 Questions: Modeling adversarial intelligence to exploit AI’s security vulnerabilities

Browse By

Topics

Departments

Centers, Labs, & Programs

Schools

Breadcrumb

Model paves way for faster, more efficient translations of more languages

Press Contact:

Media Download

*Terms of Use:

Share this news article on:

Xinhuanet

Related Links

Related Topics

Related Articles

More MIT News