Agence Nationale de la Recherche


Work Package 2: MWE Lexicon

  • Partners in charge: LI (Agata Savary) and ATILF (Mathieu Constant)
  • Partners involved: LI, LIF, ATILF, LIGM
  • Objectives: Build a unified and enriched MWE lexicon, including morphological, distributional, syntactic and semantic information. Multiword named entities (NEs) will get special treatment: they will be associated with pragmatic information, i.e. links to the Linked Open Data (LOD). The encoded features will be of varying nature, either symbolic or numeric.
  • Final products:
    • FP.2.1: A new lexical resource, distributed under an open license, in a standard format
    • FP.2.2: A tool to project an MWE lexicon on treebanks
  • Subtasks:
    • WP 2.1: Compilation and analysis of existing lexicons
    • WP 2.2: Construction of a unified framework
    • WP 2.3: Enrichment of the lexicon
    • WP 2.4: Interlinking of MWEs with the Linked Open Data
    • WP 2.5: Converting the lexicon to a standard export format
    • WP 2.6: Projection on treebanks


Before the actual construction of the unified MWE lexicon, some preliminary studies have been performed:

  • a state-of-the-art survey of the different MWE lexicon formats, by Agata Savary, in the framework of the PARSEME COST Action;
  • experiments on extracting linguistic information from various existing MWE lexicons (internship at LIGM in 2016 by Manolo Iborra, supervised by Mathieu Constant);
  • an inventory and documentation of the properties in the lexicon-grammar tables of frozen expressions, as well as a selection of lexical entries based on WP1 criteria (internship at LIGM by Fabrice Beltran, supervised by Eric Laporte).

Preparatory work for the next WP2 tasks has also been undertaken:

  • Carlos Ramisch and colleagues developed methods based on word embeddings to perform discovery and semantic processing of MWEs (Cordeiro et al. ACL 2016, Ramisch et al. ACL 2016, Ramisch et al. LREC 2016, Ramisch et al. MWE 2017, Vargas et al. MWE 2017).
  • Waszczuk and Savary (BSNLP 2017) designed an algorithm to project heterogeneous MWE lexicons on a constituent treebank. Cordeiro et al. (SemEval 2016) developed a symbolic method to identify MWEs in a text from a lexicon.
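
To make the idea of identifying MWEs in a text from a lexicon concrete, here is a minimal sketch in Python. It is not the method of any of the papers cited above: it assumes a hypothetical toy lexicon keyed by lemma sequences, an already lemmatized input, and a simple greedy longest-match strategy that only finds contiguous occurrences. All names (`LEXICON`, `project_lexicon`) are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (an assumption for illustration only):
# each entry maps a sequence of lemmas to an MWE category label.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("kick", "the", "bucket"): "verbal idiom",
    ("hot", "dog"): "nominal compound",
    ("by", "and", "large"): "adverbial",
}


def project_lexicon(
    lemmas: List[str], lexicon: Dict[Tuple[str, ...], str]
) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) spans of contiguous lexicon matches.

    Greedy longest-match, left to right; `end` is exclusive.
    Discontinuous or reordered occurrences are not handled.
    """
    max_len = max((len(entry) for entry in lexicon), default=0)
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(lemmas):
        matched = False
        # Try the longest candidate first; an MWE has at least two tokens.
        for n in range(min(max_len, len(lemmas) - i), 1, -1):
            candidate = tuple(lemmas[i:i + n])
            if candidate in lexicon:
                matches.append((i, i + n, lexicon[candidate]))
                i += n  # skip past the matched span
                matched = True
                break
        if not matched:
            i += 1
    return matches


if __name__ == "__main__":
    sentence = ["he", "kick", "the", "bucket", "by", "and", "large"]
    for start, end, label in project_lexicon(sentence, LEXICON):
        print(start, end, label, sentence[start:end])
```

A realistic projection tool would also have to deal with inflection, discontinuity and syntactic variation, which this contiguous lookup deliberately ignores.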

Work in progress

A group of researchers (Mathieu Constant, Agata Savary, Jean-Yves Antoine, Caroline Pasquer, Takuya Nakamura, Carlos Ramisch) is currently working on an internal lexicon format for encoding fine-grained properties of MWEs.

In the meantime, Agata Savary and colleagues are exploring the XMG platform in order to obtain an object-oriented encoding of MWEs (cf. Lichte et al., to appear).

wp2.txt · Last modified: 2017/09/18 17:50 by matthieu.constant