Massively Multilingual NLU 2022

A Workshop Colocated with EMNLP 2022 in Abu Dhabi and Online, Dec 7, 2022

Let’s scale natural language understanding technology to every language on Earth!

By 2023 there will be over 8 billion virtual assistants worldwide, the majority of which will be on smartphones. Additionally, over 100 million smart speakers have been sold, most of which exclusively use a voice interface and require Natural Language Understanding (NLU) during every user interaction in order to function. However, even as we approach the point at which there will be more virtual assistants than people in the world, major virtual assistants still support only a small fraction of the world’s languages. This limitation is driven by the lack of labeled data, the expense associated with human-based quality assurance, model maintenance and update costs, and more. Innovation is how we will jump these hurdles. The vision of this workshop is to help propel natural language understanding technology into the 50-language, 100-language, and even the 1,000-language regime, both for production systems and for research endeavors.

News

26 Oct: We are pleased to declare Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, and Walter Daelemans of the bolleke team as the winners of the Organizers’ Choice Award! Please come to our workshop to hear more about their model and their associated paper, Machine Translation for Multilingual Intent Detection and Slots Filling.
12 Aug: We welcome submissions until Sep 2nd for the MMNLU-22 Organizers’ Choice Award, as well as direct paper submissions until Sep 7th. The Organizers’ Choice Award is based primarily on our assessment of the promise of an approach, not only on the evaluation scores. To be eligible, please (a) make a submission on eval.ai to either MMNLU-22 task and (b) send a brief (<1 page) writeup of your approach to mmnlu-22@amazon.com describing the following:
- Your architecture,
- Any changes to training data, use of non-public data, or use of public data,
- How dev data was used and what hyperparameter tuning was performed,
- Model input and output formats,
- What tools and libraries you used, and
- Any additional training techniques you used, such as knowledge distillation.
12 Aug: We are pleased to declare the HIT-SCIR team as the winner of the MMNLU-22 Competition Full Dataset Task. Congratulations to Bo Zheng, Zhuoyang Li, Fuxuan Wei, Qiguang Chen, Libo Qin, and Wanxiang Che from the Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology. The team has been invited to speak at the MMNLU-22 workshop on Dec 7th, where you can learn more about their approach.
12 Aug: We are pleased to declare the FabT5 team as the winner of the MMNLU-22 Competition Zero-Shot Task. Congratulations to Massimo Nicosia and Francesco Piccinno from Google. They have been invited to speak at the MMNLU-22 workshop on Dec 7th, where you can learn more about their approach.
30 Jul: Based on compelling feedback, we have updated our rules as follows: Contestants for the top-scoring model awards must submit their predictions on the evaluation set by the original deadline of Aug 8th. Contestants for the “organizers’ choice award” can submit their predictions until Sep 2nd. The organizers’ choice award will be based primarily on the promise of the approach, but we will also consider evaluation scores.
29 Jul: (Outdated – see above) We have extended the deadline for MMNLU-22 evaluation to Sep 2nd. Additionally, besides the winners of the “full dataset” and “zero-shot” categories, we plan to select one team (“organizers’ choice award”) to present their findings at the workshop. This choice will be made based on the promise of the approach, not just on model evaluation scores.
25 Jul: The unlabeled evaluation data for our shared task is now live. See instructions in the alexa/massive repo.
7 Jul: A Slack workspace is now available.
30 Jun: Paper submissions are now being accepted.
20 Apr: The MASSIVE dataset and the associated paper were released publicly. Anyone can now start modeling on the data in preparation for the release of the MMNLU-22 evaluation set on July 25th.

Important Dates

Note: We accept both (a) direct submissions through OpenReview and (b) ARR commitments.

Apr 20th: Release of the MASSIVE dataset (training, validation, test splits) and paper
~~Aug 15th~~ Jul 15th: ACL Rolling Review (ARR) submission deadline
Jul 25th: Release of the MMNLU-22 Competition evaluation set
Aug 8th: Competition deadline for the top-scoring model awards
Sep 2nd: Competition deadline for the organizers’ choice award and end of MMNLU-22 Competition
Sep 7th: OpenReview submission deadline
Oct 2nd: ARR commitment deadline
~~Oct 9th~~ (TBA): Acceptance notifications
~~Oct 16th~~ Oct 26th: Camera ready deadline
Dec 7th: Massively Multilingual NLU 2022 Workshop