100 million words translated with the EU Council Presidency Translator!
Interview with Prof. Dr. Josef van Genabith, Head of the Multilingual Language Technologies Research Unit at DFKI in Saarbrücken, about Machine Translation and the EU Council Presidency Translator, which has been used in the context of the German EU Council Presidency since July 2020.
– Prof. van Genabith, you are Scientific Director at DFKI and have been heading the Multilingual Language Technologies (MLT) Research Unit in Saarbrücken since 2014. What was your scientific career like before you moved to Saarbrücken?
The success of the EU Council Presidency Translator is a great occasion for our MLT team and our partners at DeepL, Tilde and eTranslation! I am very proud of the teams and the work they have done in close cooperation with the German Foreign Office! I have been interested in language and technology for a very long time. I studied electrical engineering and English language and literature at RWTH Aachen University and was very lucky afterwards: thanks to a scholarship from the British Council and later the Foreign & Commonwealth Office, I was able to do first an MA at the University of Essex and then my doctorate under Louisa Sadler. At the beginning of the 1990s I was a postdoc with Hans Kamp at the Institute for Natural Language Processing (IMS) in Stuttgart. A great time! Afterwards I spent 17 years in Ireland at the School of Computing, Dublin City University (DCU), where I progressed through the ranks of Lecturer, Senior Lecturer and Associate Professor. In Dublin I had a lot of freedom and great colleagues at DCU, at the other universities in Dublin and at the many Irish-based high-tech companies (IBM, Microsoft, Symantec), and we were able to take advantage of this freedom: I rebuilt the National Center for Language Technology (NCLT) and was the founding director of the CNGL (Center for Next Generation Localisation, now ADAPT and run by Vinny Wade). Through this work, and especially the CNGL, we became more and more involved in international projects, e.g. EU projects, in which the previous director of our lab in Saarbrücken, Hans Uszkoreit, was very active in the second half of the 2000s. Through Hans Uszkoreit, who in the meantime had built up the sister lab in Berlin (now SLT, managed by Sebastian Möller), I came to Saarbrücken and DFKI in 2014.
– In addition to your work at DFKI, you also have a chair at Saarland University. How do the academic and application-oriented work complement each other?
The most important factor in our work is our staff: they make our work a success! My university and DFKI staff work together in a varied mix of teams. In our weekly meetings it makes no difference whether someone is at DFKI or at the university. We are part of the SFB1102 (Information Density and Linguistic Encoding) at the university, and we have a DFG project at the university on multimodal post-editing, where we are working very successfully with Prof. Antonio Krüger's DFKI team. I am head of the European Master's program in Language and Communication Technology (LCT, Erasmus+), which is excellently managed by one of my management staff members at the MLT-Lab (DFKI) via a part-time university position. All of my DFKI management staff from the four MLT groups (Machine Translation, Question Answering and Information Extraction, Talking Robots, and Data and Resources) give seminars and educate PhD, MSc and BSc students. Many MLT employees are also active at the university. Of course, formally and financially everything is clearly separated by projects. But the connection to the university is very strong. The "Language Science and Technology" department at Saarland University is one of the best in Europe. We at the MLT Lab at DFKI are very strong in research: in 2020, for example, we published more than 10 papers at the most important international conferences in our field (ACL, ICML, EMNLP, COLING, IJCAI) in the areas of language technology, AI and machine learning. This is a great success and shows the quality of the teams. On the other hand, DFKI's application-oriented research is an attraction for students, both inside and outside the university: where else is one's own work used in such a way that 100 million words are translated within 4.5 months (to date), as in the EU Council Presidency Translator, which is publicly visible to everyone? That is really great!
– The EU Council Presidency Translator has further promoted the visibility of machine translation services in Germany. It is a joint effort by several players, but you have led this project. When did you start work? How did you put together the consortium? And how many scientists were involved?
The EU Council Presidency Translator is a very European solution that shows that Europe together is more than competitive at the highest international level in the field of language technology and AI: it is based on a combination of outstanding high-tech and AI expertise in Germany (DeepL, DFKI), Latvia (Tilde) and the public sector (EC, eTranslation). It is a partnership between industry (DeepL, Tilde), the public sector (EC, eTranslation) and a research institute (DFKI). DFKI is the project leader; funding is provided by the German Federal Foreign Office, which is the lead agency for the German EU Council Presidency. The competencies of the consortium members complement each other ideally: Tilde has developed the basic framework of the Presidency Translator over many years with European funding, into which the translation engines of many providers are integrated, and is developing and contributing its own translation engines. DeepL offers translation engines of outstanding quality for 8 languages. eTranslation (by the EC) provides a basic machine translation service for all 24 official EU languages. In close cooperation with the translation teams of the ministries, DFKI has developed machine translation systems for German, French and Spanish that are specially adapted to the data and needs of the ministries. Tilde does this for English, Italian and Polish. At DFKI, Stephan Busemann is responsible for the administrative side of the Presidency Translator. I manage the scientific and technical aspects. Cristina España Bonet, the head of the MT Team in the MLT-Lab, and her colleague Jingyi Zhang develop the systems. They are supported by two students, Damyana Gateva and Anastasija Amman, from the university's MSc program "Language Science and Technology". DFKI also manages the outreach and media work of the Presidency Translator. This is supervised by Eileen Schnur and her colleague Marlies Thönnissen in the MLT team and is actively supported by DFKI's Corporate Communications department.
– You use artificial neural networks for translation. Can you please sketch how your translation engine works?
In recent years, neural models have enabled quantum leaps in the quality of many language technologies and other AI applications. Our systems use deep neural networks based on transformer models. These models use different types of attention and are highly parallelizable in many parts.
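Editorial illustration: the attention mechanism mentioned in this answer can be sketched in a few lines. The following is a minimal, pure-Python sketch of scaled dot-product attention, the core operation of transformer models. It is not the project's code; the function names `softmax` and `attention` and the toy vectors are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Each query is compared with every key; the resulting weights are
    used to form a weighted average of the value vectors.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to each key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors, one dimension at a time.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy self-attention: three 2-dimensional "token" vectors attend to each other.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextualised = attention(tokens, tokens, tokens)
```

Each query is handled independently of the others inside the loop, which is why this computation can be carried out for all positions at once on parallel hardware.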
– Artificial neural networks are trained and tested with very large amounts of language data. Where does this training and test data come from and, as a rough estimate, how many words does it contain?
For many language pairs, our training data contains tens of millions of sentence pairs, each pair consisting of a source sentence in one language and its translation into another language. From this, the machines learn to translate by themselves. This data is based on translations already made by humans. The machine therefore learns from humans. The data comes from data collections of the EU, ELRC (the European Language Resource Coordination, which we also manage at the MLT at DFKI) and other sources. In addition, we work very closely with the translation teams of the ministries, using data from the ministries to create specialised machines that are geared specifically to the ministries' needs. These are constantly evaluated by the ministries' translators so that they can be continuously improved during the course of the project.
– The Presidency Translator has been used intensively by users over the last 150 days. Over 100 million words have been translated. What were the most popular language pairs? And were there perhaps also sentences that occurred particularly frequently?
Unlike other services, the Presidency Translator is secure and safe: all servers are located in the EU, transmissions are encrypted, and all data is deleted immediately after a translation is completed. Therefore, we only have high-level information about usage. The figures show that the one-click translation of the German-language Presidency website is very well received: it accounts for approx. 47% of the 100 million words translated so far. The preferred target languages for machine translation on the Presidency website are Spanish, Italian and Portuguese (the French and English versions were produced manually). The remaining, slightly larger share comes from text (22%), document (30%) and website (2%) translations on the Translator page, where translation between German and English is most in demand.
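Editorial illustration: the usage shares quoted in this answer can be turned into approximate word counts. The sketch below assumes a round total of exactly 100 million words (the interview says "over 100 million") and uses the percentages as stated. Because the shares are rounded, they add up to 101% rather than 100%. The variable names are illustrative only.

```python
# Assumed round total; the interview reports "over 100 million" words.
TOTAL_WORDS = 100_000_000

# Usage shares in percent, as quoted in the interview (rounded figures).
shares_percent = {
    "one-click Presidency website": 47,
    "text translation": 22,
    "document translation": 30,
    "website translation (Translator page)": 2,
}

# Approximate number of words per usage type.
words_by_type = {
    name: TOTAL_WORDS * percent // 100
    for name, percent in shares_percent.items()
}

# Share of everything done on the Translator page itself.
translator_page_percent = sum(
    percent
    for name, percent in shares_percent.items()
    if name != "one-click Presidency website"
)

for name, words in words_by_type.items():
    print(f"{name}: about {words:,} words")
print(f"Translator page in total: {translator_page_percent}%")
print(f"Sum of all quoted shares: {sum(shares_percent.values())}%")
```

The Translator page accounts for 54% under these figures, which matches the description of it as the slightly larger portion.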
– What do the translators say about the new quality of machine translation? Do translators see machines as competitors or as tools that support their work? And how is the job profile of translators changing?
In the "EU Council Presidency Translator" project, we work very closely with our colleagues in the ministries' translation departments: they manage data collection and provision within the ministries in order to adapt the specialised machines to the ministries' needs. In addition, they test and evaluate the specialised machines, and their results make a central contribution to improving the systems. In the translation workflow, the machines are then an aid: with good translation quality, the machine can help to increase the productivity of a human translator. The translator's job profile is shifting towards quality control, quality assurance through the post-editing (correction) of automatically produced translations, and the certification of translations and their quality. Modern translator training takes these changes into account: the translation course "Translation Science and Technology" at Saarland University has a strong technology component, in which prospective translators are familiarised with language technologies developed by their fellow students in the Computational Linguistics (Language Science and Technology) and Computer Science courses.
– The German EU Council Presidency ends on 31 December 2020. How will the Presidency Translator be used afterwards? And independently of that, what are your further plans?
The Presidency Translator has been extremely well received and has surpassed all records set by previous Presidency Translators. I am very proud of what the MLT team at DFKI has achieved together with the colleagues at DeepL, Tilde and eTranslation! There is great interest in using the Presidency Translator for other presidencies. Talks on this are currently ongoing. There is also great interest on the part of industry in German and European language technology: language technology and AI "made in Europe". Machine translation is only one of the competences in our MLT-Lab: others are those of the Question Answering and Information Extraction Group (especially in the biomedical field), the Talking Robots Group (which focuses on dialogue systems and rescue robotics) and the Data and Resources Group (which has been leading large EU projects like ELRC for many years). In addition, there is our sister lab SLT (Speech and Language Technology) in Berlin. The two labs (MLT in Saarbrücken and SLT in Berlin) work closely together and complement each other in their expertise.
Prof. Dr. Josef van Genabith
Head of Research Department Multilinguality and Language Technology
Deutsches Forschungszentrum für Künstliche Intelligenz, DFKI, Saarbrücken