Natural Language Processing

Team leader

Béatrice DAILLE
Phone +33(0)2 51 12 58 54

Second leader

Phone +33(0)2 51 12 57 74

Team Presentation

(In May 2011, from left to right: Annie TARTIER, Yukie NAKAO, Christine JACQUIN, Denis BECHET, Chantal ENGUEHARD, Alexandre DIKOVSKY, Sebastian PENA SALDARRIAGA, Annie LARDENOIS, Laura MONCEAUX-CACHARD, Emmanuel MORIN, Béatrice DAILLE, Jungyeul PARK, Amir HAZEM, Fabien POULARD, Jérôme ROCHETEAU, Prajol SHRESTHA, Matthieu VERNIER, Ramadan ALFARED and Nicolas HERNANDEZ. Absent: Emmanuel DESMONTILS and Colin DE LA HIGUERA)

The TALN team works on natural language processing. Its work combines fundamental research with research applied to natural languages, taking into account their specific features and their complexity. The language data processed is real data: the minimal unit is a text, which can be collected from the web and grouped into a reasoned collection, a text corpus.

The team's distinctive feature is the variety of language data it processes. Its expertise covers language production in specialised fields, new forms of written communication (SMS), multiple communication media (blogs, social networks) and a range of languages (European, African, Russian, Japanese), as well as links with other language modalities such as transcribed speech and transcribed handwriting.

The team’s research areas are:

  • Analysis and discovery
    • formal models of language syntax and semantics
    • dependency-based syntactic analysis
    • semantics of texts, opinion modelling
    • terminological discovery
    • production of linguistic resources: corpora, lexicons, grammars
  • Alignment and comparison
    • empirical translation models: compositionality and contextuality
    • comparability measures for texts and corpora
    • multimodal similarity

These lines of research come together in three major application fields: multilingual processing, multimodality and information retrieval.

The team's software is developed on the UIMA software platform and released under the Apache 2 licence. Linguistic resources are released under the LGPLLR licence.

Last update: Saturday 8 March 2014