Syntactic Parsing and Multiword Expressions in French

Integrating Multiword Expressions at the heart of statistical syntactic and semantic analysis

Applications accepted until the position is filled
Field: natural language processing/computational linguistics
Location: ATILF, University of Lorraine (Nancy, France)
Supervisor: Matthieu Constant
Co-supervisor: Marie Candito (Univ. Paris Diderot and INRIA)
Duration: 3 years, October 2016 to September 2019
Remuneration: around 1700 €/month
Funding: CNRS funding, ANR PARSEME-FR project
Keywords: multiword expressions (MWEs), syntactic parsing, semantic parsing, deep learning

Context

The proposed PhD thesis falls within the field of natural language processing, at the crossroads of computer science and linguistics. In particular, it will focus on the processing of multiword expressions, namely sequences of several words that display some degree of idiomaticity, such as hot dog, piece of cake, cut the mustard, take a step or by and large. These expressions are both very frequent and very diverse. Identifying multiword expressions in context is an essential step for syntactic parsing, for semantic analysis and, more generally, for natural language processing applications such as machine translation. This PhD proposal is part of the ANR-funded PARSEME-FR project, which aims at integrating such expressions widely into syntactic and semantic parsers.

Profile

Master's degree in computer science or computational linguistics
Good command of French and English; another language would be a plus
Interest in linguistics and familiarity with language technology
Ability to work independently and as part of a team

Application

Candidates should send the following documents in PDF format, in French or in English, to Matthieu Constant (FirstName.LastName@u-pem.fr) and Marie Candito (FirstName.LastName@linguist.univ-paris-diderot.fr):

CV
Cover letter
Transcript of MSc and BSc grades (translated if not in French or English)
Reference letters would be a plus

Hosting Institutions

Main affiliation

Laboratory: ATILF
University: University of Lorraine

Secondary affiliation

Laboratory: ALPAGE-INRIA
Institutions: Université Paris Diderot and INRIA Paris

Scientific description

This PhD thesis aims to revisit statistical syntactic and semantic analysis in the light of multiword expressions. More precisely, it falls within the framework of linear-time dependency parsing.
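Linear-time dependency parsers are typically transition-based. The posting does not name a specific transition system, so the sketch below uses the well-known arc-standard system purely as an illustration. It replays a gold projective tree with a static oracle and shows that each word is shifted once and attached once, so a sentence of n words takes exactly 2n transitions. The function name `arc_standard_oracle` and the `heads` encoding (index 0 for ROOT) are assumptions made for this example.

```python
from typing import List, Set, Tuple


def arc_standard_oracle(heads: List[int]) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Replay a projective gold tree with the arc-standard transition system.

    heads[i] is the gold head of token i (tokens 1..n, 0 = ROOT);
    heads[0] is an unused placeholder. Returns the transitions and
    the (head, dependent) arcs they build.
    """
    n = len(heads) - 1
    stack: List[int] = [0]                      # ROOT starts on the stack
    buffer: List[int] = list(range(1, n + 1))
    arcs: List[Tuple[int, int]] = []
    attached: Set[int] = set()
    transitions: List[str] = []

    def has_pending_dependents(h: int) -> bool:
        # A word may be reduced only after all its gold dependents are attached.
        return any(heads[d] == h and d not in attached for d in range(1, n + 1))

    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            top, second = stack[-1], stack[-2]
            if second != 0 and heads[second] == top:
                arcs.append((top, second))      # LEFT-ARC: top -> second
                attached.add(second)
                del stack[-2]
                transitions.append("LEFT-ARC")
                continue
            if heads[top] == second and not has_pending_dependents(top):
                arcs.append((second, top))      # RIGHT-ARC: second -> top
                attached.add(top)
                stack.pop()
                transitions.append("RIGHT-ARC")
                continue
        if not buffer:
            raise ValueError("gold tree is not projective")
        stack.append(buffer.pop(0))             # SHIFT
        transitions.append("SHIFT")
    return transitions, arcs


# "She eats apples": She <- eats -> apples, with eats attached to ROOT
transitions, arcs = arc_standard_oracle([-1, 2, 0, 2])
print(transitions)  # ['SHIFT', 'SHIFT', 'LEFT-ARC', 'SHIFT', 'RIGHT-ARC', 'RIGHT-ARC']
print(len(transitions) == 2 * 3)  # True
```

In an actual parser, a trained classifier would choose each transition instead of the oracle; with a constant-cost decision per step, parsing time then grows linearly with sentence length.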

Taking multiword expressions into account is a challenge for automatic text analysis, mainly because of their non-compositionality, i.e. the partial or total irregularity in the way their elements combine at the lexical, morphosyntactic and/or semantic levels. Furthermore, there is a continuum between entirely fixed expressions (piece of cake) and almost free ones (traffic light). The vast majority of these expressions are in fact partially compositional (white wine, take a nap) and therefore require a non-atomic representation. The first task will be to design a new lexical, syntactic and semantic representation that handles such expressions satisfactorily. Building on this representation, the next step will be to develop new parsing algorithms that integrate MWEs. Priority will be given to a system that jointly performs MWE identification and syntactic parsing, so that each task can inform the other. Since multiword expressions generally correspond to semantic units, a natural extension of this joint system is a system that automatically builds a shallow semantic graph for an input sentence.
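The idea of jointly identifying MWEs and parsing can be illustrated with a small transition-based sketch. It assumes an arc-standard system extended with a hypothetical MERGE transition that fuses the two topmost stack units into one multiword unit while keeping their token indices, so that an expression such as hot dog retains a non-atomic internal structure. The transition sequence is supplied by hand here; in a real parser it would be predicted by a model. `run_joint`, `MERGE` and the example sentence are illustrative assumptions, not the project's design.

```python
from typing import List, Tuple

Unit = Tuple[int, ...]  # token indices of one lexical unit; (0,) is ROOT


def run_joint(tokens: List[str], actions: List[str]):
    """Apply joint transitions and return readable arcs and MWEs.

    Transitions: SHIFT, LEFT-ARC, RIGHT-ARC (arc-standard) plus a
    hypothetical MERGE that fuses the two topmost stack units into one
    multiword unit while keeping its component tokens (non-atomic).
    """
    stack: List[Unit] = [(0,)]
    buffer: List[Unit] = [(i,) for i in range(1, len(tokens) + 1)]
    arcs: List[Tuple[Unit, Unit]] = []  # (head unit, dependent unit)

    for action in actions:
        if action == "SHIFT":
            if not buffer:
                raise ValueError("SHIFT on an empty buffer")
            stack.append(buffer.pop(0))
        elif action in ("MERGE", "LEFT-ARC", "RIGHT-ARC"):
            if len(stack) < 2:
                raise ValueError(f"{action} needs two units on the stack")
            right, left = stack.pop(), stack.pop()
            if action == "MERGE":
                if left == (0,):
                    raise ValueError("ROOT cannot be part of an MWE")
                stack.append(left + right)      # keep the component tokens
            elif action == "LEFT-ARC":
                if left == (0,):
                    raise ValueError("ROOT cannot be a dependent")
                arcs.append((right, left))
                stack.append(right)
            else:  # RIGHT-ARC
                arcs.append((left, right))
                stack.append(left)
        else:
            raise ValueError(f"unknown transition {action!r}")

    if buffer or stack != [(0,)]:
        raise ValueError("transition sequence does not end in a terminal state")

    def label(unit: Unit) -> str:
        return "ROOT" if unit == (0,) else "_".join(tokens[i - 1] for i in unit)

    readable_arcs = [(label(h), label(d)) for h, d in arcs]
    # Every non-ROOT unit is a dependent exactly once, so this lists all MWEs.
    mwes = [label(d) for _, d in arcs if len(d) > 1]
    return readable_arcs, mwes


tokens = ["He", "ate", "a", "hot", "dog"]
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "SHIFT", "SHIFT",
           "MERGE", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC"]
arcs, mwes = run_joint(tokens, actions)
print(arcs)  # [('ate', 'He'), ('hot_dog', 'a'), ('ate', 'hot_dog'), ('ROOT', 'ate')]
print(mwes)  # ['hot_dog']
```

With n words and m merges, a complete sequence contains n shifts, m merges and n - m arcs, i.e. 2n transitions, so adding MERGE keeps the number of transitions linear in sentence length.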

The developed parsers should combine two properties: speed and accuracy. To reach high accuracy, joint prediction can let the system benefit from richer linguistic information at analysis time; the use of deep learning techniques and large-scale MWE resources can also be investigated. This sophistication, however, comes at the cost of increased complexity and ambiguity. One possible solution is to add constraints that reduce the search space. Finally, we wish the proposed solutions to have (quasi-)linear time complexity, so that parsing large volumes of text remains feasible.
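As an illustration of constraints that reduce the search space, the sketch below filters the transitions allowed in a parser configuration. It assumes an arc-standard system extended with a hypothetical MERGE transition (fusing the two topmost stack units into one multiword unit) and applies two hard constraints chosen for this example: MERGE is permitted only when the merged words begin an entry of an MWE lexicon, and ROOT may receive its single dependent only once the buffer is empty. The lexicon contents and the names `mwe_prefixes` and `legal_actions` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Unit = Tuple[int, ...]  # token indices of one lexical unit; (0,) is ROOT


def mwe_prefixes(lexicon: Iterable[Tuple[str, ...]]) -> Set[Tuple[str, ...]]:
    """Every prefix of length >= 2 of each lexicon entry (lowercased words)."""
    return {entry[:k] for entry in lexicon for k in range(2, len(entry) + 1)}


def legal_actions(stack: List[Unit], buffer_size: int,
                  words: List[str], prefixes: Set[Tuple[str, ...]]) -> List[str]:
    """Transitions allowed in a configuration under two hard constraints."""
    actions = []
    if buffer_size > 0:
        actions.append("SHIFT")
    if len(stack) >= 2:
        left, right = stack[-2], stack[-1]
        if left != (0,):
            actions.append("LEFT-ARC")
            # Constraint 1: MERGE only if the merged words begin a lexicon entry.
            candidate = tuple(words[i - 1].lower() for i in left + right)
            if candidate in prefixes:
                actions.append("MERGE")
        # Constraint 2: ROOT takes its single dependent only once the buffer is empty.
        if left != (0,) or buffer_size == 0:
            actions.append("RIGHT-ARC")
    return actions


words = ["He", "ate", "a", "hot", "dog"]
prefixes = mwe_prefixes({("hot", "dog"), ("piece", "of", "cake")})
print(legal_actions([(0,), (2,), (3,), (4,), (5,)], 0, words, prefixes))
# ['LEFT-ARC', 'MERGE', 'RIGHT-ARC']
print(legal_actions([(0,), (1,)], 4, words, prefixes))
# ['SHIFT']
```

A classifier then only scores the returned transitions, which is one way to keep the extra MERGE decision from inflating the search space; a lexicon-based filter trades some recall on unlisted expressions for this reduction.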

This thesis will be carried out in collaboration with Joakim Nivre (Uppsala University, Sweden), within the framework of the European COST Action PARSEME.
