TALN Natural Language Processing
(In May 2011, from left to right: Annie TARTIER, Yukie NAKAO, Christine JACQUIN, Denis BECHET, Chantal ENGUEHARD, Alexandre DIKOVSKY, Sebastian PENA SALDARRIAGA, Annie LARDENOIS, Laura MONCEAUX-CACHARD, Emmanuel MORIN, Béatrice DAILLE, Jungyeul PARK, Amir HAZEM, Fabien POULARD, Jérôme ROCHETEAU, Prajol SHRESTHA, Matthieu VERNIER, Ramadan ALFARED and Nicolas HERNANDEZ. Absent: Emmanuel DESMONTILS and Colin DE LA HIGUERA)
The work of the TALN team lies in the area of natural language processing. It covers both fundamental research and research applied to natural languages, taking into account their specific features and their complexity. The language data processed is real data: the minimum unit is a text, which can be collected from the web and grouped into a principled collection, that is, a text corpus.
What sets the team apart is the variety of language data it processes. It has expertise in language production in specialised fields, in new forms of written communication (SMS), in multiple communication media (blogs, social networks) and in a range of languages (European, African, Russian, Japanese), and it maintains links with other language modalities such as transcribed speech and transcribed handwriting.
The team’s research areas are:
These lines of research come together in three major application fields: multilingual processing, multimodal processing and information retrieval.
The team’s software is developed on the UIMA software platform and released under the Apache 2 licence. Linguistic resources are released under the LGPLLR licence.