Agence Nationale de la Recherche


Project outcomes

The goal of our project is to develop linguistic resources (lexicons, corpora, annotation guidelines) and software (parsers, MWE identifiers and linkers). These are still under development and will be published here once they are ready.


MWE identification software

  • Transition
  • VarIDE
  • Veyn
  • LGTagger (?)
  • mwetoolkit (?)

Other software

  • PARSEME shared task tools (?)
  • Lexicon tools (?)
  • Demonstrator of MWE identifiers and a corpus-lexicon browser

Language resources and datasets

Verbal MWE-annotated corpora of the PARSEME shared tasks

The datasets of the PARSEME shared tasks cover 18 to 20 languages, including French, and can be downloaded from:

Full-MWE annotated Sequoia treebank

Coming soon…

Project-internal resources

