Past Conferences

MUSA at the Languages and the Media Conference
MUSA participated in the "E-Tools and Translation" session of the Languages and the Media Conference, November 3-5, 2004, Berlin. MUSA-related presentations are available for download.

MUSA at IST Event 2004
MUSA participated in the IST Event 2004, November 15-17, 2004, The Hague.

Deliverables

WP2-D2: User requirements for content production and visual presentation
WP2-D2a: Data Collection
WP3-D3: Descriptions of the software component modules and technical specifications for their integration
WP4-D4.1: Continuous speech recognition module
WP4-D4.2: Revised ASR prototype
WP4-D4.3: Assessment and Evaluation of the Speech Recognition Component-v1.0
WP4-D4.3-2: Assessment and Evaluation of the Speech Recognition Component-v2.0
WP5-D5.1: Initial prototype of the Subtitling core engine
WP5-D5.2-3: Subtitling Subsystem Prototype with built-in the Subtitling core engine
WP5-D5.4: Revised prototype of the Subtitling Subsystem
WP5-D5.5 v 1.0: Assessment and Evaluation of the Subtitling Component-v1.0
WP5-D5.5 v 2.0: Assessment and Evaluation of the Subtitling Component-v2.0
WP6-D6.1: Initial prototype of the machine translation engine, the translation memory shell and the text processing tool set
WP6-D6.2: Revised prototype of the Translation Subsystem
WP6-D6.3 v 1.0: Assessment and Evaluation of the Translation Component
WP6-D6.3 v 2.0: Assessment and Evaluation of the Translation Component
WP7-D.7.1-2: MUSA Functional Application Prototype & Graphical User Interface
WP7-D.7.3-4: MUSA Upgraded Final System Software and Documentation
WP8-D8.1: Assessment and Evaluation Plan
WP8-D8.2 versions 1.0 and 2.0: Assessment and Evaluation of the Integrated MUSA Prototype
WP9-D9.2 versions 1.0 and 2.0: Market Analysis and Tech Watch
WP9-D9.3: Dissemination and Use Plan v. 1.0
WP9-D9.4: Exploitation Plan v. 1.0

WP2-D2: User requirements for content production and visual presentation

WP2-D2 reports on the activities of the MUSA consortium concerning a) the current status of subtitling methods and tools, b) the subtitling standards and requirements set by the subtitling and media industry, and c) the requirements, in terms of training, development and evaluation data, for building a platform for multilingual subtitling of multimedia content.

WP2-D2a: Data Collection

WP2-D2a reports on the activities of the MUSA consortium concerning the project’s requirements, in terms of training, development and evaluation data, for building a platform for multilingual subtitling of multimedia content. The primary data have been made available by BBC and LCC. They consist of BBC ‘newsy’ programmes dealing with current affairs, subtitled in Greek by LCC. For each programme, the following types of data have been made available: video, script or transcript, English subtitles, Greek subtitles, and newspaper material related to the programme’s topic. To compensate for the unavailability of French subtitles, a portion of the BBC data has been selected, translated by the SYS translation engine and then edited by LCC specialists. This material will constitute the training, development and evaluation data required for the implementation and customisation of the MUSA subsystems: automatic speech recognition, subtitle generation and machine translation.

WP3-D3: Descriptions of the software component modules and technical specifications for their integration

WP3-D3 reports on the activities of the MUSA consortium concerning the development of the first demonstration prototype. In particular, it reports on:

The software components for Speech Recognition, Subtitling, Translation and their current status within the project
The technical specifications for each component, its submodules and the expected performance
The detailed system architecture
The XML system infrastructure, the I/O procedure that connects the components and the samples of the XML language

WP4-D4.1: Continuous speech recognition module

WP4-D4.1 reports on the speech recognition subsystem of the MUSA project. Speech recognition is used in the front end to transcribe the speech signal present in a television programme. The report describes the existing large-vocabulary continuous speech recognition system developed over the past years by K.U.Leuven/ESAT, and the adaptations required to make it work properly in the MUSA context. These adaptations include the adoption of the right vocabulary, the development of specific acoustic and language models, the prediction of punctuation, and speaker turn detection.

WP4-D4.2: Revised ASR prototype

WP4-D4.2 reports on activities concerning the further refinement and adaptation of the existing K.U.Leuven/ESAT speech recognition system to make it work properly in the MUSA context. The revised prototype contains new context-dependent acoustic models and speaker adaptation in the form of Vocal Tract Length Normalization.

WP4-D4.3: Assessment and Evaluation of the Speech Recognition Component-v1.0

WP4-D4.3 v1.0 reports on the activities of the MUSA consortium concerning the evaluation of the performance of the automatic speech recognition subsystem and the automatic alignment subsystem (which is also based on the recognition system).

WP4-D4.3-2: Assessment and Evaluation of the Speech Recognition Component-v2.0

The present deliverable, D4.3 ASR Subsystem evaluation v2.0, reports on the activities of the MUSA consortium concerning the evaluation of the performance of the automatic speech recognition subsystem and the automatic alignment subsystem (which is also based on the recognition system).

WP5-D5.1: Initial prototype of the Subtitling core engine

WP5-D5.1 reports on the activities of the MUSA consortium concerning the design and implementation of the first version of the Subtitling Component. It describes a) the constraint formulation and calculation module, b) the segment compression module and c) the subtitle production module, which produces subtitles by automatically editing the compressed segment. Automatically generated subtitles will subsequently be fed into the MUSA machine translation subsystem to render them in Greek and French. The initial prototype operates on English source segments of BBC transcripts and produces corresponding subtitles. Transcripts used as input in this version of the subtitling component are error-free, as they are the product of validated transcription provided by BBC. This report accompanies the web-based demo available at http://cnts.uia.ac.be/cgi-bin/musa.

WP5-D5.2-3: Subtitling Subsystem Prototype with built-in the Subtitling core engine

WP5-D5.2-3 reports on the activities of the MUSA consortium concerning the design and implementation of the Subtitling Component, with emphasis on the developments and extensions of the initial prototype described in Deliverable D5.1. It describes a) the constraint formulation and calculation module, b) the segment compression module and c) the subtitle production module, which produces subtitles by automatically editing the compressed segment. Automatically generated subtitles are subsequently fed into the MUSA machine translation subsystem to render them in Greek and French. Translated subtitles are linguistically annotated and edited according to specifications by the subtitling editor. The initial prototype operated on English source segments of BBC transcripts and produced corresponding subtitles. Transcripts used as input in that version of the subtitling component were error-free, as they were the product of validated transcription provided by BBC. The current version of the integrated subtitling component operates on transcripts generated by the MUSA speech recognition component in both of its modes, script/transcript-based and audio-based. This report is accompanied by a web-based demo available at http://cnts.uia.ac.be/cgi-bin/musa.

WP6-D6.1: Initial prototype of the machine translation engine, the translation memory shell and the text processing tool set

WP6-D6.1 reports on the activities of the MUSA consortium concerning the design and implementation of the integrated Translation Component, which incorporates two technologically different translation approaches: Translation Memory (ILSP Tr-AID Translation Platform) and Machine Translation (SYSTRAN). The integrated system is implemented as a .NET Web Service and integrates the TM Engine as a COM object. The TM Engine is used as a shell for HTTP calls to the API of the MT Engine. TM translates all full matches (identical subtitles that exist in its parallel textual database) and MT is employed for all other, unmatched segments. The report also specifies the communication framework (XML file interchange) between these two components, the overall system and the other MUSA modules. The tools and methods used to compile new language resources are also briefly described. The MUSA parallel subtitle corpus (English-French, English-Greek) was aligned using the Tr-AID Align Tool, and the aligned data formed the Translation Memory database. ILSP’s Term Extractors and Bilingual Concordancing Tools and Systran’s expert linguists were employed to produce bilingual lexica (more than 5000 entries in each language pair) from the aligned corpus. These resources enriched the MT Term Dictionaries, Noun Phrases and Do Not Translate entries, improving overall system performance.

In addition, the report presents the fine-tuning applied to the English-French and English-Greek MT engines to suit the project settings and the content idiosyncrasies of the two language pairs, as well as the tools adapted for the MT customization process.
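
The routing between Translation Memory and Machine Translation described in this deliverable can be illustrated with a minimal sketch. It is not the MUSA implementation (which is a .NET Web Service with a COM-based TM Engine); the function name translate_subtitles, the dictionary-based memory and the stand-in MT function are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def translate_subtitles(
    subtitles: List[str],
    translation_memory: Dict[str, str],
    machine_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Translate each subtitle, preferring full TM matches over MT.

    Returns (translation, origin) pairs, where origin is "TM" or "MT".
    """
    results = []
    for subtitle in subtitles:
        if subtitle in translation_memory:
            # Full match: an identical subtitle exists in the memory.
            results.append((translation_memory[subtitle], "TM"))
        else:
            # Unmatched segment: hand it to the MT engine.
            results.append((machine_translate(subtitle), "MT"))
    return results


# Hypothetical data: a one-entry memory and a placeholder MT function.
memory = {"Good evening.": "Bonsoir."}


def fake_mt(text: str) -> str:
    return "[MT] " + text


output = translate_subtitles(["Good evening.", "Here is the news."], memory, fake_mt)
print(output)
```

Recording the origin of each translation alongside the text is a design choice of this sketch; it makes it easy to see how much of a programme was covered by the memory.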

WP6-D6.2: Revised prototype of the Translation Subsystem

WP6-D6.2 reports on the revised prototype of the machine translation engine, the translation memory shell and the text processing tool set. It also describes the activities of the MUSA consortium concerning the implementation of the integrated Translation Component, incorporating Translation Memory and Machine Translation modules. The integrated system is implemented as a .NET Web Service.

WP6-D6.3 v 1.0: Assessment and Evaluation of the Translation Component

WP6-D6.3 v 1.0 describes the procedure followed for the evaluation of the translation component. Evaluation tasks have been performed by human judges (BBC, LCC). For each evaluation task, the parameters are specified.

WP6-D6.3 v 2.0: Assessment and Evaluation of the Translation Component

WP6-D6.3 v 2.0 describes the procedure followed for the evaluation of the machine translation component during the second evaluation round.

WP7-D.7.1-2: MUSA Functional Application Prototype & Graphical User Interface

D.7.1-2 provides a detailed and thorough description of the latest developments concerning the first functional MUSA prototype. In particular, it reports on:

The implemented system architecture of the initial prototype
The provided functionality (server APIs) of the MUSA major subsystems: Speech Recognition, Subtitling and the Translation subsystem, as well as the MUSA Functional Prototype API responsible for calling each subsystem in a processing pipeline
The Graphical User Interface that allows subtitlers to reliably inspect and edit the automatically produced subtitles.
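
The idea of a prototype API that calls each subsystem in a processing pipeline can be pictured with the following sketch. All function names and return values here are hypothetical placeholders, not the actual MUSA server APIs, which this page does not document.

```python
from typing import List


def recognise_speech(audio_path: str) -> str:
    """Placeholder for the Speech Recognition subsystem (hypothetical)."""
    return "good evening here is the news"


def generate_subtitles(transcript: str, max_words: int = 3) -> List[str]:
    """Placeholder for the Subtitling subsystem: chunk the transcript."""
    words = transcript.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def translate(subtitles: List[str], target_language: str) -> List[str]:
    """Placeholder for the Translation subsystem: tag each subtitle."""
    return ["[" + target_language + "] " + line for line in subtitles]


def run_pipeline(audio_path: str, target_language: str) -> List[str]:
    """Call the three subsystems in order, passing each output onward."""
    transcript = recognise_speech(audio_path)
    subtitles = generate_subtitles(transcript)
    return translate(subtitles, target_language)


pipeline_output = run_pipeline("programme.wav", "el")
print(pipeline_output)
```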

WP7-D.7.3-4: MUSA Upgraded Final System Software and Documentation

The present deliverable, D.7.3-4 MUSA Upgraded Final System Software and Documentation, provides a detailed and thorough description of the latest developments concerning the final MUSA prototype. In particular, it reports on:

Revisions and improvements in the MUSA functional prototype
The implemented system architecture of the initial prototype
The provided functionality (server APIs) of the MUSA major subsystems: Speech Recognition, Subtitling and the Translation subsystem, as well as the MUSA Functional Prototype API responsible for calling each subsystem in a processing pipeline
The Graphical User Interface that allows subtitlers to reliably inspect and edit the automatically produced subtitles.

WP8-D8.1: Assessment and Evaluation Plan

WP8-D8.1 describes the evaluation scenarios according to which the MUSA results will be evaluated. Evaluation will be performed at two levels: component and system. Evaluation tasks will be performed either automatically or by using human judges.

The MUSA speech recognition component will be evaluated by calculating the word error rate (WER) and the accuracy of the audio-transcript alignment. The subtitling component will be evaluated in two ways. First, human judges will evaluate the grammaticality and semantic acceptability of the English subtitles produced. Then, inspired by the BLEU method, the automatically produced subtitles will be automatically compared with a set of gold subtitles based on n-gram counting. The translation component will be evaluated by human judges, who will score each translated unit according to a set of scores. In addition to being evaluated with high-quality input data, each component will also be evaluated with data produced by the preceding component in the processing chain.
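
As an illustration of comparison based on n-gram counting, the sketch below computes a clipped n-gram precision, the building block of BLEU-style metrics. It is a minimal example under assumptions: whitespace tokenisation, a single gold subtitle, and invented example sentences. The exact measure used in the MUSA evaluation is not specified on this page.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: str, reference: str, n: int) -> float:
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference (the clipping used by BLEU-style metrics).
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Invented example: an automatic subtitle and a gold subtitle.
automatic = "the minister said he would resign"
gold = "the minister said he will resign today"

unigram_precision = clipped_precision(automatic, gold, 1)  # 5 of 6 unigrams match
bigram_precision = clipped_precision(automatic, gold, 2)   # 3 of 5 bigrams match
print(unigram_precision, bigram_precision)
```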

System evaluation will be performed by involving professional subtitlers who will be asked to correct the automatically generated and translated subtitles and record the time required to elevate system results to accepted industrial level output. The automatically generated subtitles and the corrected versions of them will be automatically compared and the edit distance will be calculated.
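
The edit distance between an automatically generated subtitle and its corrected version can be computed as sketched below. This is a standard word-level Levenshtein distance; the choice of words rather than characters as the unit, and the example sentences, are assumptions made for illustration.

```python
from typing import Sequence


def edit_distance(source: Sequence[str], target: Sequence[str]) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(target) + 1))
    for i, src_item in enumerate(source, start=1):
        current = [i]
        for j, tgt_item in enumerate(target, start=1):
            cost = 0 if src_item == tgt_item else 1
            current.append(min(
                previous[j] + 1,         # delete src_item
                current[j - 1] + 1,      # insert tgt_item
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# Invented example: system output and a subtitler's corrected version.
automatic_words = "the minister said he would resign".split()
corrected_words = "the minister says he will resign today".split()

distance = edit_distance(automatic_words, corrected_words)
print(distance)  # 3: two substitutions and one insertion
```

The same function, applied to a recogniser's output and a reference transcript and divided by the number of reference words, gives the word error rate.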

WP8-D8.2 versions 1.0 and 2.0: Assessment and Evaluation of the Integrated MUSA Prototype

WP8-D8.2 describes the results of the evaluation according to the scenarios presented in D8.1. System evaluation was performed by involving professional subtitlers who were asked to correct the automatically generated and translated subtitles and record the time required to elevate system results to accepted industrial level output. Required time was then compared to time reference data, held by LCC.

WP9-D9.2 versions 1.0 and 2.0: Market Analysis and Tech Watch

WP9-D9.2 focuses on subtitling market analysis and assessment and reports on available tools in the field. The aim is to provide a likely scenario for the near future of the subtitling market as regards the technologies to be used and the situation of the potential market targeted by the project.

WP9-D9.3: Dissemination and Use Plan v. 1.0

WP9-D9.3 reports on the activities of the MUSA Consortium concerning a) the dissemination plan and b) the use plan for the project results. More specifically, it addresses the following issues:

dissemination methodology of the Consortium,
dissemination activities and material (past and planned for the future),
identification of the system features,
preliminary market analysis and assessment,
technology watch depicting the current status in the subtitling field,
description and use plans of the consortium partners.

The goal set by the Consortium for WP9 involves a two-way dynamic and interactive process:

to promote the project results at the scientific/academic and technical level, as well as at the business level,
to gather information from the subtitling field relevant to the project, as regards the needs and requirements of potential end-users, on the one hand, and the technological state-of-the-art (common practices and advances), on the other.

This view of the dissemination actions serves the final objective of the MUSA project, that of delivering exploitable results which could offer a realistic solution to the problems of the field.

To this end, the present deliverable includes a market analysis and assessment report as well as a report of available tools in the field. The aim is to forecast a likely scenario for the near future of the subtitling market as regards the technologies to be used and the situation of the potential market targeted by the project.

Based on the current market analysis, it is estimated that there is a new market of USD 180 million waiting to be explored. Given the shortage of qualified professionals, the need for MUSA-like applications that increase productivity and reduce the cost of the subtitling procedure is forecast to be pressing. The relative merits of MUSA-based applications in comparison to existing tools are evident in so far as none of the existing tools provides automation solutions for the full subtitle production chain.

Last, the present report takes a look at the system architecture and system components since, to ensure best exploitation, the use plans defined by the Consortium take into consideration not only the system as a whole, but also its individual modules as distinct exploitable project results.

WP9-D9.4: Exploitation Plan v. 1.0

D9.4 reports on the activities of the MUSA partners, their specific technical development roles and achievements, as well as their intended use and exploitation plans. The partners plan to use, verify and validate the MUSA concept and architecture in their internal production processes and to launch spin-off activities to market the integrated project results on either a product or a service basis. Technology and know-how licensing schemes are possible, and a commercial exploitation agreement has been planned.