1. Workshop Introduction

Thursday 3 April 2014

A wide range of research areas that historically arose from artificial intelligence deal with language, interaction, vision and multimedia. What they have in common is their extensive use of large corpora and automatic processing within a statistical modeling paradigm. The humanities, and in particular applied linguistics and the language sciences, share similar objects of study. Like the former, they increasingly base their research on large corpora and rely in particular on automatic instruments. One example is the communities working on speech, where links are growing between researchers in automatic speech and language processing and linguists, e.g. those dealing with spontaneous speech (disfluencies, phonetic reduction, etc.), corpus linguistics (where scientific facts are derived from large corpora), or innovative instrumental linguistics (where scientific facts are produced by automatic systems on the same corpus).

Much progress has been made in improving methods and models of automatic processing, guided by objective measures of system errors. However, beyond a certain level of performance, the marginal cost of addressing residual errors increases exponentially. Examining these errors not only through a global measure but as an object of study in their own right may allow us to overcome this drawback.

For the duration of an interdisciplinary workshop, we propose to gather these different communities around the problem of error analysis in 3M (multimedia, multimodal, multilingual) data processing. Errors may arise from the automatic processes covered in this workshop, namely automatic speech transcription, multimedia person recognition and translation, as well as from human annotation and annotation inconsistencies.

Errors may be investigated along different dimensions: localization and segmentation, annotation (guidelines and production), measurement (absolute or relative to an application), analysis (relative to intrinsic properties or relative to human performance) and diagnosis. The purpose of this workshop is to share interdisciplinary expertise on a heterogeneous phenomenon referred to as "errors". Researchers are invited to share their thoughts and observations through case studies conducted in the context of various initiatives.

Sharing our experience with errors is expected to produce beneficial insights for the different communities:

- For automatic speech and language processing, residual errors indicate regions that escape current modeling capabilities. In-depth analyses in collaboration with specialists in the human sciences may contribute to a better understanding of these phenomena and to innovative strategies for handling them.
- For the humanities, accurate automated systems can be used as exploration instruments in corpus linguistics and, more importantly, automated systems can be viewed as tentative models of human perception. Errors, by revealing weaknesses of the implemented models, call these into question as models of perception. Furthermore, the accuracy of the instruments is essential to discovering undescribed phenomena via the investigation of error regions.

The workshop will last two days. It is organized by the IMMI-CNRS and co-organized by various initiatives such as the PEPS CNRS Humain Errare, the ELRA association, the Quaero project, the French ANR projects Vera and Qcompere, a consortium of the REPERE challenge, the European CHISTERA Camomile project, the DIGITEO Chair DISLOG, and the French CGI/ANR Labex EFL project.

