MUSA IST Project
The audiovisual linguistic landscape is changing rapidly, driven simultaneously by the growth of mass communications, the evolution of technology and the deregulation of markets. New technological developments in mass media and communication, such as digital TV and DVD, are bound to overcome the limited physical borders of the European countries, leading to the creation of a pan-European media audience. In such a unified framework of European mass communication, subtitling, as a means of overcoming linguistic barriers between nations, plays a critical role. In many European countries, subtitling is the most commonly used method for conveying the content of foreign-language dialogue to the audience, and a broadcaster's audience may now include several major linguistic groups (notably in the case of a satellite broadcaster). Subtitling is also increasingly provided by broadcasters to meet the needs of the significant numbers of deaf and hard-of-hearing viewers. However, video subtitling is far from trivial and is considered one of the most expensive and time-consuming tasks a company must perform, since it is mainly carried out manually by experts. Typically, a one-hour program requires around 7-15 hours of human effort.
Moreover, in view of the expansion of digital television and the increasing capacity to manipulate audiovisual content, assistive tools that produce a first cut of subtitles in a multilingual setting are becoming indispensable to the subtitling industry.
The main objective of the MUSA project is the development of a system that combines advanced text analysis, speech recognition, machine translation and other techniques to help in the preparation of subtitles: a system that converts audio streams into text transcriptions, produces draft translations in at least two language pairs and finally rephrases the content to meet the specific spatio-temporal requirements of the subtitling process. Three European languages will be supported by MUSA, namely English, French and Greek. However, special care will be taken during the system design phase to ensure that the system architecture remains open, so that new languages can easily be added.
MUSA will combine technologies falling into three categories, corresponding to the research fields from which they have sprung: Automatic Speech Recognition (ASR), Machine Translation (MT) and Natural Language Processing (NLP).
The architecture of the MUSA multimedia production line will include the following functional blocks:
- A collection of AudioVisual (AV) material in the domains of broadcast news and documentaries, in three languages, namely English, French and Greek.
- A state-of-the-art ASR subsystem for the transcription of audio streams into text (for English).
- A subtitle condensation subsystem that produces subtitles from audio transcriptions, aiming to maximise comprehension while complying with spatio-temporal constraints and linguistic parameters.
- A multilingual Translation subsystem integrating Machine Translation, Translation Memories and Term Substitution.
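The spatio-temporal constraints that the condensation subsystem must respect can be made concrete with a small sketch. The numeric limits below are illustrative assumptions (roughly in line with common European subtitling conventions), not MUSA's actual parameters:

```python
# Hypothetical sketch of spatio-temporal subtitle constraints.
# All three limits are assumed values for illustration only.
from dataclasses import dataclass

MAX_CHARS_PER_LINE = 37   # assumed width of one subtitle line
MAX_LINES = 2             # assumed maximum number of lines on screen
READING_SPEED_CPS = 15.0  # assumed viewer reading rate, characters per second

@dataclass
class Subtitle:
    text: str    # may contain "\n" for a two-line subtitle
    start: float # display start time, seconds
    end: float   # display end time, seconds

def fits_constraints(sub: Subtitle) -> bool:
    """Check a candidate subtitle against spatial and temporal limits."""
    lines = sub.text.split("\n")
    if len(lines) > MAX_LINES:
        return False
    if any(len(line) > MAX_CHARS_PER_LINE for line in lines):
        return False
    duration = sub.end - sub.start
    chars = len(sub.text.replace("\n", ""))
    # The viewer must be able to read all characters within the display time.
    return chars <= duration * READING_SPEED_CPS

# Two seconds on screen allows at most 30 characters at 15 cps.
ok = fits_constraints(Subtitle("He arrived late last night.", 10.0, 12.0))
too_long = fits_constraints(
    Subtitle("He arrived very late last night after the storm.", 10.0, 12.0))
```

A transcription segment that fails such a check is a candidate for condensation: the subsystem must shorten or omit material until the subtitle fits.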
To achieve the main objective, the project has set the following measurable technical & scientific objectives:
- To adapt state-of-the-art technology to the transcription of television broadcast news (BN) and documentary films.
- To enhance the speech recognition engine operating in English, including audio/speech discrimination, speaker clustering and adaptation.
- To develop an open, intralingual (within one language) subtitling subsystem that operates on transcribed texts. The subsystem will take into account various constraints imposed by the textual and visual context of the domain. A key aspect of the problem is the degree of condensation required by space and time restrictions: the subsystem will analyse the source transcription to decide what should be transferred to the target text and what can be omitted.
- To integrate and demonstrate multilingual tools for BN computer-aided subtitle translation. The following three translation or quasi-translation technologies will be integrated: machine translation, which is adequate for surface-understanding purposes; term substitution, which is limited to translating terms that exist in bilingual lexical or terminological databases; and translation memory, which can both handle repetition in texts and provide the framework for the creation of multilingual resources.
- To validate core signal processing and language processing technologies in a real-life application that addresses the European AudioVisual audience as an entity. Multimodal and multilingual information access is an emerging and viable market and the proposed technologies aim at serving the needs of the European people and improving their quality of life.
- To combine speech recognition, subtitle condensation and text translation in a unified framework that will produce good-quality intralingual and interlingual subtitles. The interface to the system will assist the operator in rejecting or correcting any errors produced by the system. We do not envisage the creation of a fully automatic high-quality system, since this is not technically achievable given speech recognition and machine translation errors. However, our target is a system whose output suffices for a user to grasp the gist of the subject and contents of the video. For some viewers this might be improved by offering the subtitles in conjunction with playback of the original soundtrack. Also, while some users may find assistive technologies useful, post-editing by experts will remain indispensable if the output is to be of professional quality.
- To experiment with live subtitling, an important task that arises naturally with Digital Video Broadcasting (DVB) and Digital TV. Live subtitling is crucial for countries like Greece, where the audience is accustomed to reading subtitles. To this end we consider combining speech recognition with machine translation prior to the subtitling process.
- To explore possible new technical solutions for people with hearing difficulties. People who have been deaf from birth face more severe communication problems than those who lost their hearing at some point in their life. Automatic recognition of speech, and its translation into PC or TV screen messages, could be of valuable help in reducing the gap between the deaf and the hearing world.
- To investigate the effects of automatic subtitling on Second Language Acquisition (SLA), following the rationale that input from the subtitle bar should yield psycholinguistic effects similar to those of oral face-to-face linguistic exchanges. The multimedia language learning environment may prove ideal for providing language input, accompanied by a range of support features that can assist comprehension.
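The three translation technologies listed among the objectives can be cascaded, and the following sketch shows one plausible ordering: exact translation-memory lookup first, then a machine-translation draft with term substitution applied on top. All data and the `machine_translate` stub are hypothetical stand-ins, not MUSA's components:

```python
# Illustrative cascade of translation memory, machine translation and
# term substitution. All entries and the MT stub are invented examples.

translation_memory = {  # previously translated segments (EN -> FR)
    "Good evening.": "Bonsoir.",
}

term_base = {  # bilingual terminological database (EN -> FR)
    "European Union": "Union européenne",
}

def machine_translate(segment: str) -> str:
    """Placeholder for a real MT engine producing a rough draft."""
    return f"[MT draft of: {segment}]"

def translate(segment: str) -> str:
    # 1. Translation memory: reuse an exact match when the segment repeats.
    if segment in translation_memory:
        return translation_memory[segment]
    # 2. Otherwise fall back to machine translation for surface understanding.
    draft = machine_translate(segment)
    # 3. Term substitution: overwrite known terms with validated equivalents.
    for term, equivalent in term_base.items():
        draft = draft.replace(term, equivalent)
    return draft

print(translate("Good evening."))                # translation-memory hit
print(translate("The European Union expands."))  # MT draft + term substitution
```

The ordering reflects the relative reliability suggested above: memory matches are exact reuses of human work, while MT output is only a draft for post-editing.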
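The unified framework objective, with its human-in-the-loop interface, can be summarised as a chain of pluggable stages where every automatic result passes through an operator review step. The stage implementations below are hypothetical stand-ins, not MUSA's actual modules:

```python
# Minimal sketch of the transcription -> condensation -> translation chain.
# Each stage function is an invented placeholder for the real subsystem.
from typing import Callable

def transcribe(audio_segment: str) -> str:
    return f"transcript of {audio_segment}"  # stand-in for the ASR subsystem

def condense(transcript: str) -> str:
    return transcript                        # stand-in for subtitle condensation

def translate_subtitle(subtitle: str, target_lang: str) -> str:
    return f"[{target_lang}] {subtitle}"     # stand-in for the MT subsystem

def subtitle_pipeline(audio_segment: str, target_lang: str,
                      review: Callable[[str], str] = lambda draft: draft) -> str:
    """Chain the stages; `review` models the operator's correction step,
    defaulting to accepting each draft unchanged."""
    transcript = review(transcribe(audio_segment))
    intralingual = review(condense(transcript))   # intralingual subtitle
    return review(translate_subtitle(intralingual, target_lang))

result = subtitle_pipeline("clip-1", "FR")
```

Keeping the operator hook at every stage matches the stated goal: the system assists with drafts, while human post-editing remains available wherever professional quality is required.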