The goal of our project is to develop linguistic resources (lexicons, corpora, annotation guidelines) and software (parsers, multiword expression (MWE) identifiers and linkers). These resources are currently under development and will be published here once they are ready.

MWE-annotated corpus

The first release of our MWE-annotated corpus corresponds to the French dataset of the PARSEME Shared Task on automatic identification of verbal multiword expressions (edition 1.0). It is freely available for download from the ORTOLANG platform.

The full Shared Task dataset, covering 18 languages, can be downloaded from LINDAT/CLARIN.

