Discourse-Oriented Statistical Machine Translation

Period: 2012-01-01 to 2015-12-31

Project leader: Jörg Tiedemann

Funder: Vetenskapsrådet (Swedish Research Council)

Grant type: Project grant

Budget: 4 374 000 SEK

In recent years, significant progress has been made in natural language processing, and in machine translation in particular. The launch of Google Translate is one example that demonstrates the growing interest in language technology but also shows its shortcomings. Current models for machine translation are still limited to local phenomena and ignore knowledge from the wider context and the connectedness of sentences in a text. Documents are translated sentence by sentence, assuming strict independence between them. This leads to unsatisfactory results in terms of text coherence and translation adequacy.

In this project, we propose to develop novel models for discourse-oriented statistical machine translation that lead to more natural translations of coherent documents. Using information from across sentence boundaries, such models will be able to adjust to various topics and domains much as human translators do. To achieve this goal, we will emphasize theoretical research on computational models of discourse-oriented translation. Furthermore, we will develop efficient algorithms and practical implementations that make it possible to apply these novel models in real-world settings. The project ideas will be tested on a wide range of translation tasks for a number of language pairs and textual domains. The performance of the new translation approach will be evaluated systematically. Software and data will be made available to the public.