Annotation guidelines
PARSEME shared task on automatic identification of verbal MWEs - edition 1.0 (2017)

Lexicalized components and open slots

Just like a regular verb, the head verb of a VMWE may have a varying number of compulsory arguments, that is, arguments that must be present in each occurrence of this VMWE. For instance, the direct object and the prepositional complement are compulsory in the VMWE to take someone by surprise.

Some components of such compulsory arguments may be lexicalized, that is, always realized by the same lexemes. Here, by surprise is lexicalized while someone is not. This definition of a lexicalized component naturally extends to any syntactic type of MWE: the head of a (nominal, adjectival, prepositional, etc.) MWE is lexicalized (always realized by the same lexeme), together with at least one component of at least one of its modifiers. The head verb of a VMWE is always considered lexicalized. When it can be replaced by another verb, as in to make/take a decision, we consider these to be two different VMWEs, even if they are synonymous.
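The last point can be illustrated with a minimal Python sketch, assuming a hypothetical representation in which a VMWE type is identified by the lemmas of its lexicalized components, with the head verb always included. Open slots (such as someone in to take someone by surprise) are not part of the key, since they are not realized by fixed lexemes. The class `VMWEType` and its fields are illustrative names, not part of any PARSEME tool or file format, and treating only the verb and the noun as lexicalized in to make a decision is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VMWEType:
    """Hypothetical VMWE type, identified by the lemmas of its lexicalized components.

    Open slots are deliberately absent: they are not realized by fixed lexemes.
    """
    head_verb: str                                        # the head verb is always lexicalized
    lexicalized: frozenset = field(default_factory=frozenset)

    def key(self) -> tuple:
        # Sort the non-verbal lemmas so the key does not depend on insertion order.
        return (self.head_verb, tuple(sorted(self.lexicalized)))


take_by_surprise = VMWEType("take", frozenset({"by", "surprise"}))
make_decision = VMWEType("make", frozenset({"decision"}))   # assumption: "a" is not lexicalized
take_decision = VMWEType("take", frozenset({"decision"}))

# Replacing the head verb yields a different VMWE, even if the two are synonymous.
print(make_decision.key() == take_decision.key())  # False
print(take_by_surprise.key())                      # ('take', ('by', 'surprise'))
```

Keying on the head verb lemma is what makes to make a decision and to take a decision count as two entries rather than one.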

Conversely, a component of a compulsory argument that can be realized by a free lexeme taken from a relatively large semantic class is called an open slot. In the following VMWE examples (cited after Gross 1994), all having the same syntactic structure NP V NP Prep NP, the lexicalized components of each VMWE are listed in parentheses:

  • Max took the bull by the horns. (took, the bull, by the horns)
  • The news took John by surprise. (took, by surprise)
  • Bob took part in the inquiry. (took, part)
  • Money burns a hole in Bob’s pocket. (burns, a hole, in, pocket)

Special cases

Prepositions have a special status with respect to the notion of lexicalization. In the first, second and fourth examples above, the prepositions by and in are lexicalized since they introduce lexicalized complements (the horns, surprise and pocket). However, in the third example, the preposition in introduces an open slot whose meaning combines compositionally with the meaning of the VMWE took part. In this case, we say that the preposition is selected by the VMWE but not lexicalized, and it should not be annotated. Prepositions selected by a governing verb, noun, adjective or adverb are fixed in the sense that they cannot vary freely. However, this kind of fixedness belongs to the phenomenon of valency, which is considered a regular property of the grammatical system and is thus outside our annotation scope.
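The preposition rule can be summarized as a small decision procedure. The following Python sketch is a simplification under stated assumptions: each token carries a hypothetical `lexicalized` flag, and each preposition records the index of the head of the complement it introduces (in practice this would come from a syntactic analysis). The names `Token` and `annotated_forms` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """One token of a sentence, with hypothetical annotation flags."""
    form: str
    lexicalized: bool = False         # fixed, non-prepositional VMWE component
    is_preposition: bool = False
    complement: Optional[int] = None  # index of the head of the complement the preposition introduces


def annotated_forms(tokens: list[Token]) -> list[str]:
    """Return the forms to annotate as part of the VMWE.

    A non-prepositional token is annotated if it is lexicalized. A preposition
    is annotated only if the complement it introduces is lexicalized; if it
    introduces an open slot, it is merely selected by the VMWE (valency).
    """
    forms = []
    for tok in tokens:
        if tok.is_preposition:
            if tok.complement is not None and tokens[tok.complement].lexicalized:
                forms.append(tok.form)
        elif tok.lexicalized:
            forms.append(tok.form)
    return forms


# "The news took John by surprise": "by" introduces the lexicalized "surprise".
surprise = [
    Token("The"), Token("news"), Token("took", lexicalized=True), Token("John"),
    Token("by", is_preposition=True, complement=5), Token("surprise", lexicalized=True),
]

# "Bob took part in the inquiry": "in" introduces the open slot "the inquiry".
took_part = [
    Token("Bob"), Token("took", lexicalized=True), Token("part", lexicalized=True),
    Token("in", is_preposition=True, complement=5), Token("the"), Token("inquiry"),
]

print(annotated_forms(surprise))   # ['took', 'by', 'surprise']
print(annotated_forms(took_part))  # ['took', 'part']
```

The preposition never gets its own lexicalization flag here: its status is derived entirely from its complement, which mirrors the rule stated above.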

Reflexive clitics in inherently reflexive verbs also have a special lexicalization status. In some languages, the same reflexive clitic is used regardless of the person and number, inflecting for case only:

  • Bulgarian: смея се (laugh se.REFL) 'to laugh'
    намирам се (find se.REFL) 'to be (somewhere)'
  • Polish: znajduję się (find.1.SG.PRES self) 'I find myself'
    znajdujesz się (find.2.SG.PRES self) 'you find yourself'
    znajdują się (find.3.PL.PRES self) 'they find themselves'
  • Slovene: smejim se (laugh.1.SG self) 'I laugh'
    smejiš se (laugh.2.SG self) 'you laugh'
    smejijo se (laugh.3.PL self) 'they laugh'

In other languages, reflexive clitics agree in person and number with the subject and the verb:

  • Bulgarian: no examples found.
  • German: sie wundert sich (she wonders self.3.SG) 'she wonders'
    ihr wundert euch (you.PL wonder.2.PL self.2.PL) 'you wonder'
  • Spanish: yo me quejo (I self.1.SG complain) 'I complain'
    tú te quejas (you self.2.SG complain) 'you complain'
  • French: je me trouve (I self.1.SG find) 'I find myself'
    tu te trouves (you self.2.SG find) 'you find yourself'
  • Italian: io mi meraviglio (I self.1.SG wonder) 'I wonder'
    tu ti meravigli (you self.2.SG wonder) 'you wonder'
  • Portuguese: eu me queixo (I self.1.SG complain) 'I complain'
    tu te queixas (you self.2.SG complain) 'you complain'
  • Romanian: eu mă gândesc (I Refl.Cl.1sg.Acc think) 'I am thinking'
    tu te gândești (you Refl.Cl.2sg.Acc think) 'you are thinking'

In this case, the clitic is realized by different lexemes depending on the person and number, so, strictly speaking, it is not lexicalized. However, we assume that, regardless of the language, the reflexive clitic is a unique lexeme (with lemma się, se, sich, etc.) inflecting for person and number. It is thus lexicalized in inherently reflexive verbs.
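One way to picture this convention is as lemma normalization: every agreeing form of the reflexive clitic is mapped to a single lemma, so that all occurrences of an inherently reflexive verb share the same lexicalized components. The Python sketch below assumes hypothetical, deliberately incomplete lookup tables, assumes the token has already been identified as a reflexive clitic (forms such as French me or Spanish te can also be ordinary object pronouns), and assumes the verb lemma is supplied by an external lemmatizer. The names `REFLEXIVE_LEMMA` and `irv_key` are illustrative only.

```python
# Hypothetical, incomplete tables mapping reflexive clitic forms to one lemma
# per language. They assume the token is already known to be a reflexive clitic.
REFLEXIVE_LEMMA = {
    "fr": {"me": "se", "m'": "se", "te": "se", "t'": "se",
           "se": "se", "s'": "se", "nous": "se", "vous": "se"},
    "es": {"me": "se", "te": "se", "se": "se", "nos": "se", "os": "se"},
    "de": {"mich": "sich", "mir": "sich", "dich": "sich", "dir": "sich",
           "sich": "sich", "uns": "sich", "euch": "sich"},
    "pl": {"się": "się"},  # invariant clitic: form and lemma coincide
}


def irv_key(lang: str, verb_lemma: str, clitic_form: str) -> tuple[str, str]:
    """Return the lexicalized components of an inherently reflexive verb
    as (clitic lemma, verb lemma). Raises KeyError for unknown forms."""
    clitic_lemma = REFLEXIVE_LEMMA[lang][clitic_form.lower()]
    return (clitic_lemma, verb_lemma)


# "je me trouve" and "tu te trouves" share the same lexicalized components.
print(irv_key("fr", "trouver", "me") == irv_key("fr", "trouver", "te"))  # True
print(irv_key("de", "wundern", "euch"))  # ('sich', 'wundern')
```

For languages with an invariant clitic, the table reduces to an identity mapping, which is why both groups of languages can be handled by the same convention.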