Language is our lifeline to the rest of the world. But because high-quality translation tools do not exist for hundreds of languages, billions of people today cannot access digital content or participate fully in online conversations and communities in their preferred or native languages. This is especially true for the hundreds of millions of people who speak the many languages of Africa and Asia.
To help people connect better today and be part of the metaverse of tomorrow, our AI researchers created No Language Left Behind (NLLB), an effort to develop high-quality machine translation capabilities for most of the world’s languages.
Today, we’re announcing an important NLLB breakthrough: we’ve built a single AI model, called NLLB-200, that translates across 200 different languages with far greater accuracy than previous technology.
Compared with earlier AI research, NLLB-200 scored an average of 44% higher in translation quality. For some African and Indian languages, NLLB-200’s translations were more than 70% more accurate.
To evaluate and improve NLLB-200, we created FLORES-200, a dataset that lets researchers assess performance in 40,000 different language directions. FLORES-200 allows us to measure NLLB-200’s performance in each language and confirm that the translations are of high quality.
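As a rough check on the 40,000 figure: if each ordered pair of distinct languages (source, target) counts as one translation direction, then about 200 languages yield 200 × 199 directions. The snippet below is a minimal sketch under that assumption. The function name is hypothetical and is not part of any NLLB or FLORES tooling, and the exact number of languages covered by FLORES-200 may differ slightly from 200.

```python
def count_directions(num_languages: int) -> int:
    """Count ordered (source, target) pairs of distinct languages.

    Assumption: translating A -> B and B -> A are two separate directions,
    and a language is never paired with itself.
    """
    return num_languages * (num_languages - 1)


# With 200 languages this gives 39,800 directions, i.e. roughly 40,000.
print(count_directions(200))
```

Because the count grows roughly with the square of the number of languages, adding even a few languages adds hundreds of new directions to evaluate.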
To help other researchers improve their translation tools and build on our work, we are releasing the NLLB-200 models and the FLORES-200 dataset to developers, along with our model training code and the code for re-creating the training dataset.
We are also awarding grants of up to $200,000 to researchers and nonprofit organizations whose initiatives focus on sustainability, food security, gender-based violence, education, or other areas in support of the UN Sustainable Development Goals. Nonprofits and researchers working in linguistics, machine translation, and language technology who want to use NLLB-200 to translate two or more African languages are encouraged to apply.
These research advances will support the more than 25 billion translations served every day on Facebook’s Feed, Instagram, and our other technologies. You can explore a demo of NLLB-200 and learn more about how we built the model.