Agence Nationale de la Recherche

job-2016-ligm-alpage-phd

ATILF offers a PhD position in computational linguistics.

Integrating Multiword Expressions at the heart of statistical syntactic and semantic analysis

  • Applications accepted until the position is filled
  • Field: natural language processing/computational linguistics
  • Location: ATILF, University of Lorraine (Nancy, France)
  • Supervisor: Matthieu Constant
  • Co-supervisor: Marie Candito (Univ. Paris Diderot and INRIA)
  • Duration: 3 years, October 2016 to September 2019
  • Remuneration: around 1700 €/month
  • Funding: CNRS funding, ANR PARSEME-FR project
  • Keywords: multiword expressions (MWEs), syntactic parsing, semantic parsing, deep learning

Context

The proposed PhD thesis belongs to the field of natural language processing, at the crossroads of computer science and linguistics. It will focus on the processing of multiword expressions, namely sequences of several words that show some degree of idiomaticity. These expressions are very frequent and diverse: hot dog, piece of cake, cut the mustard, take a step, by and large. Identifying multiword expressions in context is an essential step for syntactic parsing, semantic analysis and, more generally, natural language processing applications such as machine translation. This PhD proposal is part of the ANR-funded PARSEME-FR project, which aims to integrate such expressions widely into syntactic and semantic parsers.


Profile

  • Master in computer science or computational linguistics
  • Good knowledge of French and English; another language would be a plus
  • Interest in linguistics and familiarity with language technology
  • Capacity to work independently and as part of a team

Application

Candidates should send the following documents in PDF format, in French or in English, to Matthieu Constant (FirstName.LastName@u-pem.fr) and Marie Candito (FirstName.LastName@linguist.univ-paris-diderot.fr):

  • CV
  • Cover letter
  • Transcript of MSc and BSc grades (translated if not in French or English)
  • Reference letters would be a plus

Hosting Institutions

Main affiliation

Secondary affiliation

Scientific description

This PhD thesis aims to revisit statistical syntactic and semantic analysis in the light of multiword expressions. More precisely, it falls within the framework of linear-time dependency parsing.

Taking multiword expressions (MWEs) into account is a challenge for automatic text analysis, mainly because of their non-compositionality, i.e. the partial or total irregularity in the way their elements combine at the lexical, morpho-syntactic and/or semantic levels. Furthermore, there is a continuum between entirely fixed expressions (piece of cake) and almost free expressions (traffic light). The vast majority of these expressions are actually partially compositional (white wine, take a nap) and thus require a non-atomic representation. The first task will be to design a new lexical, syntactic and semantic representation that handles such expressions satisfactorily. Given this new representation, the next step will be to develop new parsing algorithms that integrate MWEs. Priority will be given to a system that performs MWE identification and syntactic parsing jointly, so that the two tasks can inform each other. Since multiword expressions generally represent semantic units, a natural extension of this joint system is one that automatically constructs a shallow semantic graph for an input sentence.

The parsers developed should combine two qualities: speed and accuracy. To reach high accuracy, joint prediction can give the system access to richer linguistic information at analysis time. In addition, the use of deep learning techniques and large-scale MWE resources can be investigated. Yet this sophistication comes at the cost of increased complexity and ambiguity. A possible solution is to add constraints that reduce the search space. Finally, we want the proposed solutions to have (quasi-)linear time complexity, so that parsing large volumes of text becomes feasible.
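To make the notions of joint prediction and linear-time parsing concrete, here is a minimal sketch of an arc-standard transition system extended with a MERGE transition that groups two adjacent words into a single MWE unit. It is only an illustration under stated assumptions: the MERGE transition, the function names and the example sentence are hypothetical and are not the representation or algorithm the thesis will develop, and the transition sequence is written by hand, whereas a real parser would predict it with a statistical model.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class State:
    buffer: deque                               # words not yet read, left to right
    stack: list = field(default_factory=list)   # partially processed units
    arcs: list = field(default_factory=list)    # (head, dependent) pairs


def apply(state, transition):
    """Apply one transition to the state, in place."""
    if transition == "SHIFT":
        # Move the next word from the buffer onto the stack.
        state.stack.append(state.buffer.popleft())
    elif transition == "MERGE":
        # Hypothetical transition: fuse the top two stack items into one MWE unit.
        right = state.stack.pop()
        left = state.stack.pop()
        state.stack.append(left + "_" + right)
    elif transition == "LEFT_ARC":
        # The top of the stack governs the item below it, which is removed.
        head = state.stack.pop()
        dependent = state.stack.pop()
        state.arcs.append((head, dependent))
        state.stack.append(head)
    elif transition == "RIGHT_ARC":
        # The item below the top governs the top, which is removed.
        dependent = state.stack.pop()
        state.arcs.append((state.stack[-1], dependent))
    else:
        raise ValueError("unknown transition: " + transition)


def parse(tokens, transitions):
    """Run a given transition sequence over the tokens and return the final state."""
    state = State(buffer=deque(tokens))
    for transition in transitions:
        apply(state, transition)
    return state


tokens = ["He", "ate", "a", "hot", "dog"]
transitions = [
    "SHIFT", "SHIFT", "LEFT_ARC",   # ate -> He
    "SHIFT", "SHIFT", "SHIFT",
    "MERGE",                        # hot + dog -> hot_dog
    "LEFT_ARC",                     # hot_dog -> a
    "RIGHT_ARC",                    # ate -> hot_dog
]
final = parse(tokens, transitions)
```

In this sketch every word is shifted exactly once, and every other transition removes one item from the stack, so a sentence of n words is analysed in 2n - 1 transitions (9 for the 5-word example). MWE identification and attachment decisions are interleaved in a single sequence, which is the sense in which the two tasks are performed jointly.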

The thesis will be carried out in collaboration with Joakim Nivre (Univ. Uppsala, Sweden), in the framework of the European COST Action PARSEME.



job-2016-ligm-alpage-phd.txt · Last modified: 2016/06/14 21:41 by matthieu.constant