Annotation guidelines
PARSEME shared task on automatic identification of verbal MWEs - edition 1.0 (2017)


New edition of the shared task: these are the guidelines for edition 1.0 (2017). For the most up-to-date version, please check the guidelines for edition 1.1 (2018).

Welcome to the official annotation guidelines of the PARSEME shared task on verbal MWE identification!

Here, you'll find detailed definitions, examples and linguistic tests to guide your decision as to whether a given combination in your language is a verbal multiword expression. Use the table of contents on the left to navigate between sections and the header buttons to show/hide examples.

In addition to these general guidelines, language teams may also provide extra documentation, like lists of borderline cases and decisions taken concerning them. They should all be compatible with these general guidelines.

If you spot errors or if something remains unclear after reading the guidelines, please contact us and we'll do our best to correct the problems.

Authors and contributors (alphabetical order)

Marie Candito, Fabienne Cap, Silvio Cordeiro, Vassiliki Foufi, Polona Gantar, Voula Giouli, Carlos Herrero, Mihaela Ionescu, Verginica Mititelu, Johanna Monti, Joakim Nivre, Mihaela Onofrei, Carla Parra Escartín, Manfred Sailer, Carlos Ramisch, Monica-Mihaela Rizea, Agata Savary, Ivelina Stoyanova, Sara Stymne, Veronika Vincze.


Section 1

Definitions and scope

In this shared task, we aim at identifying verbal Multiword Expressions (VMWEs) in running texts in about 20 languages from several language families. VMWEs are of particular interest to the PARSEME COST action since they frequently introduce discontinuity and long-distance dependency issues, which are central to deep parsing and to other Natural Language Processing tasks.

This document defines the annotation scope and puts forward a classification of VMWEs together with linguistic tests for their identification and categorization.


Section 1.1

Notation

The notational convention used throughout the document is the following:

  • Italic is used to display example sentences and expressions.
  • Bold is used to highlight the lexicalized components of a candidate VMWE inside an example (positive or negative).
  • Underline is used to focus the reader's attention on the important part of an example.
  • An asterisk (*) precedes ungrammatical examples.
  • A hash (#) precedes examples where a standard modification yields unexpected meaning shifts with respect to the original expression.
  • Different colors are used to display examples:
    • Red is used for counter-examples, that is, expressions which look like VMWEs but are not, whatever the language.
    • According to the language, different colors are used for other examples, that is, positive examples of the phenomenon being discussed:
      • Shades of green are used for positive examples in Germanic languages.
      • Shades of blue are used for positive examples in Romance languages.
      • Shades of orange are used for positive examples in Slavic languages.
      • Shades of pink are used for positive examples in other language families.
  • Examples are preceded by the 2-letter language code in parentheses.
  • Examples can be shown and hidden using the toggle buttons in the header.

Section 1.2

Words and tokens

While the definition of an MWE inherently relies on the notion of a word, manual annotation and automatic identification of VMWEs in our task are performed on texts which are automatically tokenized. It is therefore important to understand the distinction between words and tokens in the context of VMWEs.

A word is a linguistically (notably semantically) motivated unit. The detection of words is, thus, language-dependent and annotation experts should have a clear idea of how to define it for their own language (even if this definition proves hard in general).

A token is a technical and pragmatic notion, defined according to more or less linguistically motivated clues and depending on the particular tokenization tool at hand.

Tokens should ideally be as close as possible to words. However, in practice, due to the difficulty of the (automatic) tokenization task, the relation between tokens and words is not always 1-to-1. The following cases occur:

  • A token coincides with a word:
    • вземам, решение, наяве, бял, на, се, д-р
    • mít, hlad, se, úžas
    • einen, Spaziergang, machen, Überraschung
    • κάνω, άνω, κάτω, ποδήλατο, καλός
    • take, a, walk, astonishment
    • dar, un, paseo, sorpresa
    • ،من کتاب، دوست
    • faire, une, promenade, étonnement
    • napraviti/činiti, jedan, šetnja, začuđenost
    • tesz, egy, séta, meglepetés
    • fare, una, passeggiata, sorpresa
    • ferħ, libes, sabiħ
    • robić to do, na on, dokładność precision
    • dar, uma, caminhada, surpresa
    • face, o, plimbare
    • gå, på, promenad, förvåning
    • iti, na, en, sprehod, začudenost
  • Several tokens build up one word, as in abbreviations, possessive markers, words with "accidental" separators, inflected or derived forms of foreign names, etc. In this case we speak of a multitoken word (MTW). The pipe symbol '|' indicates token separation in these examples:
    • т|.|н|. etc.
      год|. year
    • z|.|B|. for instance
      Wie geht|'|s How goes it How are you
    • κ. κύριος Mister
      υπΔρ υποψήφιος διδάκτορας PhD candidate
    • M|. Mister
      pp|. pages
      Pandora|'|s
    • a|. |C|. antes de Cristo before Christ
      p|. |ej|. por ejemplo for instance
      Rte|. remitente sender
    • می|-|روم، آیت|-|الله، کتاب|-|ها
    • aujourd|'|hui today
    • danas today
    • időjárás|-|jelentés weather forecast
    • vice|-|presidente vice-president
    • libs|et she wore
    • Chomsky|'|ego of Chomsky
      SMS|-|ować to write an SMS
    • vice|-|presidente vice-president
    • prim|-|ministru prime minister
      d|-|voastră polite "you"
    • EU|:|s EU's
    • g|. Mister
      str|. pages
      le|-|to
  • One token can contain several words, as in contractions and compounds. In this case we speak of a multiword token (MWT); see also the representation of MWTs in Universal Dependencies. The precise word forms cannot always be straightforwardly deduced from the MWT containing them and vice versa, as in don't, della, du, etc.
    • вагон-ресторант train carriage+restaurant train buffet
    • Schulaufgabe = Schule+Aufgabe school+exercise homework
      Apfelbaum = Apfel+Baum apple+tree apple tree
    • στου = σε+του at+the.GEN
      στον = σε+τον at+the.ACC
    • don't = do+not
    • del = de+el from the
      pelirrojo = pelo + rojo hair+red red-haired
    • کتابش=کتاب+ش
    • du = de+le from the
    • della = de+la of the
    • Białymstoku=Białym+stoku white+slope Białystok.INST (a city name)
      robiłem=robi+łem do.3.SG.PRES+be.1.SG.PAST.AGL I did
      żeśmy = że+śmy that+be.1.PL.AGL that-we
    • neles = em+eles on them
    • într-o = într-+o in a
    • arvsmassa = arv+massa genetic stock
    • nanj = na+njega on him
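
The three word-token relations listed above can be captured in a small data structure. The following is a minimal sketch, not part of any shared-task tooling: the `Alignment` class and the `relation` function are hypothetical names, and the word and token segmentations are supplied by hand rather than produced by a real tokenizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alignment:
    """One aligned unit: the tokens of a tokenizer and the words they realize."""
    tokens: List[str]
    words: List[str]


def relation(unit: Alignment) -> str:
    """Classify the word-token relation of an aligned unit."""
    n_tokens, n_words = len(unit.tokens), len(unit.words)
    if n_tokens == 1 and n_words == 1:
        return "1-to-1"
    if n_tokens > 1 and n_words == 1:
        return "MTW"  # multitoken word
    if n_tokens == 1 and n_words > 1:
        return "MWT"  # multiword token
    return "other"


# Hand-segmented examples taken from the lists above.
examples = [
    Alignment(tokens=["walk"], words=["walk"]),
    Alignment(tokens=["Pandora", "'", "s"], words=["Pandora's"]),
    Alignment(tokens=["don't"], words=["do", "not"]),
]
labels = [relation(u) for u in examples]
print(labels)  # ['1-to-1', 'MTW', 'MWT']
```

The classification looks only at the two counts, which is why the segmentation itself must come from a language expert or a language-specific resource.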

While a VMWE always contains at least two words, the relation between VMWEs and tokens can be twofold:

  • A VMWE contains several tokens, whether each of them coincides with a word or not:
    • вземам решение make a decision (2 words, 2 tokens)
      прочитам от корица до корица to read from cover to cover (5 words, 5 tokens)
    • eine Rede halten (2 words, 2 tokens) a speech hold to give a speech
      wie geht's (2 words, 4 tokens) how goes it how are you
    • δίνω τον λόγο μου (3 words, 3 tokens) give the speech to promise
      παίζω στα δάχτυλα (3 words, possibly 4 tokens) play in-the fingers know very well
    • to take a walk (2 words, 2 tokens)
      to open Pandora's box (3 words, possibly 5 tokens)
    • dar un paseo (2 words, 2 tokens) to give a walk to take a walk
      dar por sentado (3 words, 3 tokens) to give for seated to take for granted
      irse de rositas (3 words, 4 tokens) to go_self of little_roses to get off scot free
    • دستور داد (2 words, 2 tokens)
    • napraviti šetnju (2 words, 2 tokens)
      otvoriti Pandorinu kutiju (3 words, 3 tokens)
    • sétát tesz to take a walk (2 words, 2 tokens)
    • tenere un discorso (2 words, 2 tokens) hold a speech to give a speech
      cavalcare l'onda (3 words, 4 tokens) ride the wave ride the wave
    • kien idur fuq il-fatt turns on the fact
    • robi z igły widły make.3.SG a pitchfork out of a needle he makes a mountain out of a molehill (4 words, 4 tokens)
      robiłem z igły widły made.3.SG.M1+be.1.SG.AGL a pitchfork out of a needle I made a mountain out of a molehill (4 words, 5 tokens)
    • dar uma caminhada to give a walk (2 words, 2 tokens)
      cair de pára-quedas to fall with parachute to arrive unprepared in the middle of a situation (3 words, possibly 5 tokens) According to new orthography rules, this word would be written 'paraquedas'. Old spelling may still be found in annotated texts, though.
      queixar-se-ia complain-self-would would complain (2 words, possibly 5 tokens)
    • a da ortul popii to die (3 words, 3 tokens)
    • hålla ett tal (2 words, 2 tokens) hold a speech to give a speech
    • klicati jelene to call deer to vomit (2 words, 2 tokens)
      vreči puško v koruzo throw a rifle in the corn to give up (4 words, 4 tokens)
  • A VMWE contains one (multiword) token:
    • no example found for Bulgarian
    • vorbereiten to pre-arrange to prepare
      anfangen at-catch to begin
    • έδωσα-πήρα gave-1SG took-1SG to manage
    • to pretty-print
    • suicidarse suicide_self to commit suicide
    • court-circuiter to short circuit
    • pripremiti unaprijed napraviti/ urediti to prepare
    • kinyír out.cut to kill
    • corto-circuitare to short circuit
      suicidarsi suicide_self to commit suicide
    • no example found for Polish
    • queixar-se-ia complain-SELF-would would complain
    • a se-ndura RCLI.ACC-have.the.heart to have the heart
    • klargöra clear-make clarify
      påpeka on-point point out
    • pripraviti to pre-arrange to prepare

Note finally that multitoken words are not considered verbal MWEs since they contain one (multitoken) word only:

  • no example found for Bulgarian
  • ??
  • n.a.
  • maldecir bad say curse
    bienvivir well live to live in comfort
  • ricaricare to recharge
  • SMS-ować to write an SMS
  • pós-datar to post-date
  • a re-mpărți PREFIX-split to split again (with the aphaeresis of the sound 'î' in rapid speech; this is one word, multitoken)
  • SMS-jati to write an SMS
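
Because the two-word requirement is stated over words rather than tokens, the token count of a candidate does not decide whether it can be a VMWE. The sketch below illustrates this necessary (not sufficient) condition. The `can_be_vmwe` helper is a hypothetical name, and the counts are the manual judgments given in the examples above, not the output of any tool.

```python
def can_be_vmwe(n_words: int) -> bool:
    """Necessary condition only: a VMWE contains at least two words."""
    return n_words >= 2


# (expression, number of words, number of tokens), as counted in the examples above
candidates = [
    ("wie geht's", 2, 4),      # several tokens, two words
    ("queixar-se-ia", 2, 5),   # two words, possibly five tokens
    ("SMS-ować", 1, 3),        # multitoken word: one word only
]

# The token count is carried along but deliberately ignored by the check.
eligible = [expr for expr, n_words, _n_tokens in candidates if can_be_vmwe(n_words)]
print(eligible)  # ['wie geht's', 'queixar-se-ia']
```

Passing this check only keeps a candidate in scope; the linguistic tests still decide whether it is actually a VMWE.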

Whenever the distinction between a word and a token is judged by a particular language team as hard to tackle, a possible option is to consider these two notions equivalent for the needs of this shared task.


Section 1.3

Verbal Multiword expressions

Multiword expressions (MWEs) are (continuous or discontinuous) sequences of words with the following compulsory properties:

  • They show some degree of orthographic, morphological, syntactic or semantic idiosyncrasy (see tests 1 to 5) with respect to what is considered general grammar rules of a language. Collocations, i.e. word co-occurrences whose idiosyncrasy is of statistical nature only (e.g. the graphic shows, drastically drop) are not annotated.
  • Their component words include a head word and at least one other syntactically related word. Most often the relation they maintain is a syntactic (direct or indirect) dependence but it can also be e.g. a coordination. Depending on the category of the head word, the whole MWE can be nominal, adjectival, prepositional, verbal, sentential, etc.
  • At least two components of such a word sequence have to be lexicalized. In this task we only annotate the lexicalized components and ignore open slots.

Probably the most salient property of MWEs is semantic non-compositionality. In other words, it is often impossible to deduce the meaning of the whole unit from the meanings of its parts and from its syntactic structure. For instance, while it is easy to interpret phrases like to kick the ball or to spill some water from the words that compose them, it is almost impossible to guess, without knowing it beforehand, that to kick the bucket means 'to die' and to spill the beans actually means 'to reveal a secret'.

However, as non-compositionality is a subjective notion, we use inflexibility as a proxy in the tests. Our underlying hypothesis is that (verbal) MWEs have some degree of semantic non-compositionality that implies limited flexibility.

Verbal MWEs (VMWEs) are simply multiword expressions whose syntactic head in the prototypical form is a verb.


Section 1.4

Syntactic variants of VMWEs

VMWEs in this task include the following syntactic structures:

  1. Prototypical verb phrases: in most cases, the prototypical form of a VMWE is a verb in finite form – or a participle, infinitive or gerund with finite auxiliaries – in active voice, and whose other components depend directly or by transitivity on the verb. The VMWE can also contain coordinated verbs. These phrases can be:
    • Partly saturated, where only some of their arguments are lexicalized:
      • пиля нечии нерви scrape someone's nerves to annoy someone
        вземам трудно решение make a difficult decision to make a difficult decision
      • traf eine Entscheidung made a decision
        nahm sich das zu Herzen took this to heart
      • παίρνω μία δύσκολη απόφαση take-1SG a-FE.SG.AC difficult-FE.SG.AC decision-FE.SG.AC to take a difficult decision
        παίρνω τα μέτρα μου take-1.SG the-NE.PL.AC measures-NE.PL.AC my-1.SG.GE.POSS to take precautions
        γράφω κάποιον στα παλιά μου τα παπούτσια write-1.SG someone to-the-NE.PL.AC old-NE.PL.AC my-1.SG.GE.POSS shoes-NE.PL.AC to ignore someone
      • made a decision
        break her heart
        took this to heart
        could take this to heart
        would have been making a decision
        could have made a different decision
      • tomó una decisión took.he/she a decision he/she made a decision
        le hubiera roto el corazón him/her would_have broken.he/she the heart he/she would have broken his/her heart
      • a eu du courage has had courage had courage
      • sétát tesz to take a walk
      • prendere una decisione make a decision take a decision
        spezzare il cuore break the heart break the heart
        prendere a cuore take to heart take to heart
      • podjął niejedną trudną decyzję took.3.SG not-one hard decision he took several hard decisions
      • eles deram uma caminhada they gave a walk they took a walk
      • a trece asta cu vederea to pass this with sight.the.ACC to overlook
        a trece ceva sub tăcere to keep something under silence.ACC to keep quiet about something
      • fattade ett beslut made a decision
      • sprejeti odločitev to make a decision
        zlomiti komu srce to break someone's heart to upset someone by letting them know that you do not love them
        vzeti si k srcu take something to heart to think about something seriously
    • Partly saturated, where the lexicalized arguments include the subject:
      • излиза ми име appears for-me.DAT name a name sticks for/to me
        чашата на търпението ми прелива glass.DET of patience my.POS overflows my patience runs out
      • ein Vöglein hat mir gezwitschert a little bird has to me twittered a little bird told me
      • μου έφυγε ο τάκος me.GEN left the chock I was very tired
      • a little bird told someone, the problem lies in something
      • me lo ha dicho un pajarito to_me it has said a little_bird a little bird has told me
      • me lo ha detto un uccellino to me it told a little bird a little bird told me
      • mina komuś zrzedła the face someone.DAT thinned one lost one's confidence
      • a sua hora chegou your time has arrived your time has come
      • a mustra cugetul (pe cineva) to chide consciousness-the (PE_Acc somebody) the consciousness chides (PE somebody)
      • srce pade v hlače komu (someone's) heart drops into the pants one hasn't enough courage to do something
        sekira pade v med komu
    • Partly saturated, where lexicalized head verbs are coordinated:
      • цъфна и вържа to blossom and give fruit (usually sarcastically) to prosper
      • leben und leben lassen to live and let live to live and let live
      • απορώ και εξίσταμαι wonder-1.SG and be-amazed-1.SG to wonder
      • drink and drive
      • coser y cantar to_sew and to_sing easy as pie, a piece of cake
      • vivi e lascia vivere to live and let live to live and let live
      • pluł i łapał he spat and caught (he) was lazy, (he) did nothing useful
      • pintar e bordar paint and embroider to abuse
      • a tunat și i-a adunat it.has thundered and CL.ACC-it.has gathered birds of a feather flock together
        seamănă, dar nu răsare sow.3SG (homonym of resemble), but not sprout.3SG not to resemble
      • živi in pusti živeti to live and let live to live and let live
    • Fully saturated:
      • пиле не може да прехвръкне bird cannot fly in it is very strictly guarded
      • der frühe Vogel fängt den Wurm the early bird catches the worm the early bird catches the worm
      • το έξυπνο πουλί από τη μύτη πιάνεται the clever bird is-caught-3SG by the nose people who consider themselves clever fail
      • the early bird catches the worm
      • los ojos son el reflejo del alma the eyes are the reflection from_the soul the eyes are the window to the soul
      • à quelque chose malheur est bon for something bad luck is good bad experiences may bring unexpected positive effects
      • nie od razu Kraków zbudowano not at once Cracow was built Rome was not built in a day
        kości zostały rzucone the dice have been thrown alea iacta est
      • quem vê cara não vê coração who sees face doesn't see heart a person can lie/omit his/her feelings
      • se revarsă zorile Refl.Cl.3sg.Acc. flow_out dawns it is getting morning
      • kdor prej pride, prej melje who came first mills first
  2. Meaning-preserving variants belonging to the following syntactic categories.
    • infinitives:
      • eine Entscheidung treffen a decision meet to make a decision
      • έχω πάρει την απόφασή μου
      • to make a decision, to break one's heart
      • tomar una decisión to_take a decision to_make a decision
        hacer ilusión to_make excitement to look forward to, to be excited about
      • avoir du courage to have courage
      • prendere una decisione take a decision to make a decision
      • podjąć niejedną trudną decyzję to take several hard decisions
      • tomar essa decisão to take this decision
      • lua o decizie make a decision
      • odločitev je treba sprejeti a decision has to be made
    • Nominal groups (headed by nominal complements from the prototypical VMWEs) with relative clauses:
      • решението, което взех decision.DET which I.PRODROP made the decision which I made
      • Entscheidungen die wir trafen decisions which we made
        Herzen die wir gebrochen haben hearts which we broke have hearts which we have broken
      • η απόφαση που πήραμε
        η πλάκα που κάναμε
      • decisions which we made
        heart which we broke
      • la decisión que tomamos the decision which made.we the decision we took
        la ilusión que le hizo the excitement which to_him/to_her made.it he/she was very excited by it
      • les décisions que nous avons prises hier sont bonnes the decisions that we have taken yesterday are good the decisions that were made yesterday are good
      • la decisione che prendemmo the decision which we took the decision which we made
        i cuori che abbiamo spezzato the hearts which we have broken hearts which we have broken
      • decyzje, które podjął decisions which he took
      • a apresentação que Maria fez the presentation that Mary made
      • decizia pe care am luat-o the decision that we have made
      • odločitev, ki jo je sprejel decisions which he took
    • Gerunds:
      • вземайки това решение (while) making this decision
      • n.a. (?)
      • παίρνοντας αποφάσεις
        κάνοντας πλάκα
      • decision making
        heart breaking
      • n.a.
      • En prenant cette décision by making this decision
        les personnes subissant plusieurs opérations sont fragiles the people undergoing several surgical operations are fragile
      • Prendendo decisioni importanti, imparerai by making important decisions, you will learn
      • podejmowanie decyzji decision making
      • Estou tomando uma decisão I am taking a decision
        A pessoa tomando banho sou eu The person taking shower am I
      • luând decizia making the decision
      • sprejemanje odločitev making the decision
    • Nominal and adjectival groups with participles:
      • взетите вече решения the decisions already made
        броящ звезди who counts stars
        неразбитите все още от него сърца the hearts not yet broken by him
      • früher getroffene Entscheidungen earlier made decisions
      • n.a.
      • decisions previously made
        all hearts broken by him
        heart-breaking
        breaking her heart
      • las decisiones tomadas ayer the decisions taken yesterday the decisions made yesterday
        el trato hecho previamente the agreement made previously the previously made agreement
      • les décisions prises hier sont bonnes the decisions taken yesterday are good the decisions that were made yesterday are good
        les personnes subissant plusieurs opérations sont fragiles the people undergoing several surgical operations are fragile
      • decisioni prese made decisions
      • podejmujący trudne decyzje making hard decisions
      • a decisão tomada ontem the decision made yesterday
        a mulher tomando um banho a woman taking a bath
      • decizia recent luată the decision recently made
      • včeraj sprejete odločitve the decision made yesterday
    • Diathesis alternation (passive, middle); some VMWEs (especially LVCs) do allow diathesis alternation:
      • important decisions were made (passive alternation)
      • les décisions importantes se prennent en groupe the decisions important SELF take in group important decisions are taken collectively (middle alternation)

If a candidate VMWE is a meaning-preserving variant of a prototypical verbal phrase, all linguistic tests to identify and categorize VMWEs must be applied to this prototypical phrase, rather than to the variant itself. The prototypical forms of a meaning-preserving variant are also known as its canonical forms.

Note that, for some VMWEs, the only possible form is not prototypical. For instance, some VMWEs appear in passive voice but never in active voice. In that case, the linguistic tests should be applied to the passive form with finite passive auxiliary.

Lexicalized verb complements can occur with all sorts of non-lexicalized simple and complex determiners, quantifiers and modifiers (the, some, half a dozen, an impressive number of, …). These optional elements should not be included in the annotated VMWE. Moreover, depending on the syntactic theory used, the lexicalized elements may not depend directly on the verb, but only indirectly, by transitivity. In this case, the canonical form is the one that includes no complex determiners, quantifiers and modifiers, thus creating direct dependency links between the lexicalized components.

  • they had taken a significant number of steps; apply tests to they take steps
  • dostać połowę spadku to receive half of the heritage
    nie mieć cienia wątpliwości not to have a shadow of a doubt to have no doubt
  • ele fez o restante do trabalho to make the remainder of the work; apply tests to ele fez o trabalho he did the work

Expressions of the syntactic categories mentioned above are considered VMWEs only if they function as verb phrases (case 1) or nominal/participial phrases (case 2). Other kinds of variants are not considered VMWEs. This concerns nominalizations morphologically derived from verbs and describing a process, result, state, agent, etc.

  • вземане на решение making a decision
    удар в гърба a stab in the back
    високо вдигната летва highly raised bar high bar
    играч на карти card player
  • Wortbruch word-break a promise which has not been kept
  • η λήψη αποφάσεων the-FE.SG.NOM taking-FE.SG.NOM decisions-GE.PL.GEN deciding
  • a take-off
  • toma de decisiones taking of decisions decision making
    puesta a punto lay to point set-up
  • la prise en compte the fact of taking into account
    une mise à disposition the fact of making available
  • la messa a disposizione the made to availability the fact of making available
  • zabawa czyimś kosztem a play at someone else's expense derived from bawić się czyimś kosztem to enjoy oneself at someone else's expense
  • a tomada de decisão the making of decisions
    o tomador de decisão the decision-maker
  • luarea unei decizii take-noun.suffix a-genit decision making a decision
  • sprejeta odločitev the decision

We also do not annotate MWEs containing verbs but functioning as adverbials or nominals (other than in case 2):

  • може би (it) may be maybe
    разбира се (it) is understood of course
  • Vergiss-mein-nicht forget-me-not forget-me-not
  • τα πάρε-δώσε the.NE.PL.ACC give-2SG.IMP take-2SG.IMP relationship of some type
  • forget-me-not
  • sacacorchos take_out corks corkscrew
    portalápices carry pencils pencil case
    nomeolvides not to_me forget_you forget-me-not
  • peut-être may-be maybe
    porte-feuille carry-sheets wallet
  • non-ti-scordar-di-me not you forget-me forget-me-not
  • zrobić coś za Bóg-zapłać do something for a God-pay to do something for free
  • um saca-rolhas a pull-corks a corkscrew
    um faz-de-conta a make-as-story a make-believe
  • treacă-meargă pass-go let it be

Particular language teams may decide to extend the annotation scope to these variants. It is recommended in this case to introduce a new category for them (e.g. NVPC: nominal verb-particle constructions) so as to keep the (quasi-)universal categories intact.

Like other VMWE occurrences, syntactic variants are also annotated if they contain one multiword token only, e.g. particle verbs like (DE) aus|machen.


Section 1.5

Lexicalized components and open slots

Just like a regular verb, the head verb of a VMWE may have a varying number of compulsory arguments, that is, arguments that must be present in each occurrence of this VMWE. For instance, the direct object and the prepositional complement are compulsory in the VMWE to take someone by surprise.

Some components of such compulsory arguments may be lexicalized, that is, always realized by the same lexemes. Here, by surprise is lexicalized while someone is not. This definition of a lexicalized component naturally extends to any syntactic type of MWE. Namely, the head of a (nominal, adjectival, prepositional etc.) MWE is lexicalized (always realized by the same lexeme) together with at least one component of at least one of its modifiers. The head verb of a VMWE is always considered lexicalized. When it can be replaced by another verb, like in to make/take a decision, we consider that these are two different VMWEs, although possibly synonymous.

Conversely, a component of a compulsory argument which can be realized by a free lexeme taken from a relatively large semantic class is called an open slot. In the following VMWE examples (cited after Gross 1994), all having the same syntactic structure NP V NP Prep NP, the lexicalized arguments are highlighted in bold:

  • Max took the bull by the horns.
  • The news took John by surprise.
  • Bob took part in the inquiry
  • Money burns a hole in Bob’s pocket.

Special cases

Prepositions have a special status with respect to the notion of lexicalization. In the first, second and fourth example above, the prepositions by and in are lexicalized since they introduce lexicalized complements (the horns, surprise and pocket). However, in the third case the preposition in introduces an open slot whose meaning compositionally combines with the meaning of the VMWE took part. We say in this case that the preposition is selected by the VMWE but it is not lexicalized and should not be annotated. Prepositions selected by the governing verb, noun, adjective or adverb are fixed in the sense that they cannot vary freely. However, this kind of fixedness belongs to the phenomenon of valency and is considered a regular property of the grammatical system, thus outside of our annotation scope.
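
The rule of annotating only lexicalized components can be pictured as selecting a subset of token positions in a sentence. The sketch below is an illustration under stated assumptions: the `annotated_components` helper is a hypothetical name, the index sets are hand-made annotations that follow the discussion above, and this is not the official annotation file format.

```python
from typing import List, Set


def annotated_components(tokens: List[str], lexicalized: Set[int]) -> List[str]:
    """Return only the lexicalized tokens, in sentence order; open slots are skipped."""
    return [tok for i, tok in enumerate(tokens) if i in lexicalized]


# "The news took John by surprise": the head verb and "by surprise" are
# lexicalized, while the direct object "John" fills an open slot.
sent1 = ["The", "news", "took", "John", "by", "surprise"]
vmwe1 = annotated_components(sent1, {2, 4, 5})

# "Bob took part in the inquiry": "in" is selected by the VMWE but not
# lexicalized, so it is left out of the annotation.
sent2 = ["Bob", "took", "part", "in", "the", "inquiry"]
vmwe2 = annotated_components(sent2, {1, 2})

print(vmwe1)  # ['took', 'by', 'surprise']
print(vmwe2)  # ['took', 'part']
```

Storing positions rather than strings keeps discontinuous VMWEs such as took ... by surprise intact without copying the open slot between them.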

Reflexive clitics in inherently reflexive verbs also have a special lexicalization status. In some languages, the same reflexive clitic is used regardless of the person and number, inflecting for case only:

  • смея се laugh se.REFL to laugh
    намирам се find se.REFL to be (somewhere)
  • ??
  • n.a.
  • n.a.
  • znajduję się find.1.SG.PRES self I find myself
    znajdujesz się find.2.SG.PRES self you find yourself
    znajdują się find.3.PL.PRES self they find themselves
  • n.a.
  • n.a.
  • smejim se laugh.1.SG self I laugh
    smejiš se laugh.2.SG self You laugh
    smejijo se laugh.3.PL self they laugh

In other languages, reflexive clitics agree in person and number with the subject and the verb:

  • No examples found for Bulgarian.
  • sie wundert sich she wonders self.3.SG she wonders
    ihr wundert euch you.PL wonder.2.PL self.2.PL you wonder
  • n.a.
  • yo me quejo I self.1.SG complain I complain
    tu te quejas you self.2.SG complain You complain
  • je me trouve I self.1.SG find I find myself
    tu te trouves you self.2.SG find you find yourself
  • io mi meraviglio I self.1.SG wonder I wonder
    tu ti meravigli you self.2.SG wonder you wonder
  • eu me queixo I self.1.SG complain I complain
    tu te queixas you self.2.SG complain You complain
  • eu mă gândesc I Refl.Cl.1sg.Acc. think I am thinking
    tu te gândești you Refl.Cl.2sg.Acc. think you are thinking

In this case, the clitic is realized by different lexemes, depending on the person and number. Strictly speaking, it is not lexicalized. However, we admit that, regardless of the language, the reflexive clitic is a unique lexeme (with lemma się, se, sich, etc.) inflecting for person and number. It is thus lexicalized in inherently reflexive verbs.
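
Treating the reflexive clitic as one lexeme amounts to comparing occurrences by lemma rather than by surface form. The following minimal sketch shows this for the French examples above. The `LEMMAS` mini-lexicon and the `lemma_key` helper are hypothetical and cover only these forms; a real setting would rely on a lemmatizer or a lemmatized corpus.

```python
# Hypothetical mini-lexicon mapping French surface forms to lemmas (illustration only).
LEMMAS = {
    "me": "se", "te": "se", "se": "se",
    "trouve": "trouver", "trouves": "trouver",
}


def lemma_key(forms):
    """Map the lexicalized forms of an occurrence to a tuple of lemmas."""
    return tuple(LEMMAS[f] for f in forms)


# "je me trouve" and "tu te trouves": the subject pronoun is not lexicalized,
# so only the clitic and the verb enter the comparison.
first_person = lemma_key(["me", "trouve"])
second_person = lemma_key(["te", "trouves"])

print(first_person)                   # ('se', 'trouver')
print(first_person == second_person)  # True
```

Under this view both occurrences are instances of the same inherently reflexive verb, even though their clitics differ on the surface.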


Section 1.6

Verbal multiword expressions versus collocations

Collocations are not considered VMWEs in this task and should not be annotated. However, the boundary between both categories is not always easy to define and should be handled with care.

We understand collocations as combinations of words whose idiosyncrasy is purely statistical. In other words, words in collocations tend to co-occur with each other more often than expected by chance, but they show no substantial orthographic, morphological, syntactic and (most notably) semantic idiosyncrasy.
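
One common way to quantify "more often than expected by chance" is pointwise mutual information (PMI). This measure is not prescribed by these guidelines; it is brought in here only as an illustration of purely statistical idiosyncrasy. The `pmi` function name and all counts below are invented for the example.

```python
import math


def pmi(pair_count: int, x_count: int, y_count: int, total: int) -> float:
    """Pointwise mutual information (base 2) of a word pair from raw counts."""
    p_xy = pair_count / total  # observed joint probability
    p_x = x_count / total      # marginal probability of the first word
    p_y = y_count / total      # marginal probability of the second word
    return math.log2(p_xy / (p_x * p_y))


# Invented counts for a corpus of 1,000,000 word pairs (not real data).
score = pmi(pair_count=50, x_count=1000, y_count=2000, total=1_000_000)
print(round(score, 2))  # 4.64
```

A positive score means that the pair co-occurs more often than independence would predict. Such a score signals statistical association only: a high-scoring pair remains a collocation, and is not annotated, unless it also shows orthographic, morphological, syntactic or semantic idiosyncrasy.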

Some combinations happen to be very frequent and are perceived as "frozen":

  • качвам цената raise the price
  • eine Frage beantworten to answer a question, die Graphik zeigt the graphic shows, einen Bus nehmen to take a bus
  • κάνω βόλτα take-1SG a walk
  • drastically drop
    the graphic shows
    to take a bus
  • responder a una pregunta to answer a question
    el gráfico muestra the graphic shows
    coger el autobús to take the bus
  • rispondere a una domanda to answer a question
    il grafico mostra the graphic shows
    prendere un bus to take a bus
  • zalać rynek to flood the market to dominate the market
  • bater um recorde to break a record (bater to beat has a regular sense of to overcome in addition to the literal sense)
    entrar em cartaz enter into poster arrive in theaters (for a movie) (the MWE is em cartaz in poster in theaters, the verb just usually collocates with this MWE)
  • lua un autobuz take a bus
  • drastičen upad drastically drop, graf prikazuje graphic shows, vzeti taksi to take a taxi

However, applying regular lexical alternations to them does not markedly impact their meaning.

  • вдигам цената raise the price, увеличавам цената raise the price, качвам залога raise the bet, качвам температурата raise the temperature
  • eine Anfrage beantworten to answer a request, das Diagramm zeigt the diagram shows, mit einem Bus fahren to take a bus
  • πάω βόλτα go for a walk
  • significantly drop, drastically decrease, the diagram shows, the graphic illustrates, to take a coach
  • responder a una petición to answer a request
    el diagrama muestra the diagram shows
    coger un tren take a train
  • rispondere a una richiesta to answer a request
    il diagramma mostra the diagram shows
  • zdominować/zarzucić/zapełnić/nasycić rynek to dominate/overwhelm/fill/saturate the market
  • quebrar/bater/ultrapassar/estabelecer um recorde to break/beat/overcome/establish a record
    o recorde foi quebrado the record was broken
    entrar/estar/permanecer/ficar/continuar/ter em cartaz enter/be/remain/stay/continue/have in poster
  • lua o mașină
  • občuten upad significantly drop, drastično zmanjšanje drastically decrease, diagram prikazuje diagram shows, slika prikazuje picture shows

The difficulty of distinguishing collocations from VMWEs lies in the fact that lexical variability is relevant to some VMWEs:

  • нямам пукната пара/пукнат грош to not have a single penny, be very poor
    имам твърда/дебела глава to have a thick head, to be stubborn and not listen to advice
  • einen Willen/Menschen brechen to break a will/person
  • to come in handy/useful, to stand firm/fast, to break someone's spirit/will, to take the cake/biscuit
  • dar un paseo/ una vuelta give a walk / a turn to go for a walk
    darse/tomar una ducha give.self/take a shower take a shower
  • cogliere/prendere di sorpresa, dare/fornire un contributo
  • zapisać się złotymi literami/zgłoskami to record itself with golden letters/syllables to be remembered and commemorated for a merit
    zamarznąć na kość/lód/sopel to freeze to bone/ice/icicle to freeze strongly
  • levar em conta/consideração take into account/consideration
    chutar o balde/pau da barraca to kick the bucket/the tent's stick to act irresponsibly
  • lua o decizie/hotărâre make a decision
  • sprejeti odločitev/sklep make a decision

However, the extent of the vocabulary concerned by this variability is different for collocations and VMWEs. Namely, a head verb in a collocation usually selects a whole semantic class for each of its required arguments. For instance, the verb to take to use a vehicle to travel selects a whole semantic class of means of transport. Similarly, the verb to drop can select a large set of adverbs describing the degree: drastically/significantly/remarkably/slightly/reasonably drop. Conversely, lexical variability in a VMWE is limited to a closed list of lexemes, sometimes only loosely semantically related. For instance, the VMWEs to take the cake/biscuit and to stand firm/fast do not keep their idiomatic readings with semantically close complements: #to take a cookie/wafer, *to stand hard/rigid/solid etc. See also test 2.


Section 1.7

Verbal multiword expressions versus metaphor

Another phenomenon closely related to VMWEs is metaphor. According to Shutova (2010), "a metaphor occurs when one concept is viewed in terms of the properties of the other. In other words it is based on similarity (presence of common characteristics) between two concepts".

Many VMWEs, especially idioms, are based on metaphors. For instance, to take the bull by the horns means to address a problem (the bull) starting with its most challenging aspect (the horns). To set the world on fire is to do something extraordinary and get the admiration (set on fire) of other people (the world), to put all one's eggs in one basket means to rely on one particular course of action (a basket) for success rather than giving oneself several possibilities.

However, verbal metaphors are not always VMWEs. Consider the newspaper title "simple steps to lift your dark cloud of stress", and the extract of a poem by Wordsworth, cited by Shutova: "and then my heart with pleasure fills, and dances with the daffodils". The metaphorical expressions to lift dark cloud of stress to relax and my heart ... dances with the daffodils I am happy are not semantically compositional. These expressions, however, were probably constructed for the needs of one article/poem only and are not sufficiently established in the common vocabulary to be considered VMWEs.

The distinction between MWEs and metaphors is a relatively unstudied and open question. There are few precise tests, other than statistical, which would allow human annotators to resolve it reliably. Gross (1982) gives some clues on the reproducibility and predictability of metaphors. It remains to be seen how heavily this problem will impact the annotation of texts selected for our shared task. We suggest that the annotators take notes of such cases and discuss them within their communities, both local and international.


Section 2

Textual annotation scope

In this annotation task, all occurrences of all syntactic types of VMWEs are to be annotated in the text.

We annotate, as integral parts of VMWEs, all lexicalized elements that can form a separate word. For instance, lexicalized particles are annotated but case suffixes are not. Thus, in to put something up, the verb and the particle are integral parts of the VMWE (see VPC tests), while in (HU) döntést hoz valamiről decision-ACC bring something-DEL make a decision, only döntést hoz is annotated, even if the delative case suffix is also lexically determined.

Both continuous and discontinuous sequences of lexicalized components of VMWEs are annotated.

Reflexive pronouns, particles and prepositions need to be handled with special care, given their particular lexicalization status. Verb+pronoun and verb+particle combinations are annotated essentially if they are inherently reflexive verbs or verb-particle combinations. In this version of the guidelines, verb+preposition combinations like to rely on somebody and to come across something are no longer considered VMWEs.

The annotation considers only flat, tokenized sentences whose tokens will be tagged by annotators as part of a VMWE or not. We do not annotate their internal syntactic structure. We do annotate, however, VMWEs embedded in other VMWEs. For instance, the VMWE to let the cat out of the bag contains the embedded VMWE let out and both are to be annotated as different VMWEs.

Once identified in a text, VMWEs are also to be assigned to exactly one of the categories described in the following sections. In this version of the guidelines, we no longer admit hesitation between two different categories. Hesitation can, however, be expressed in a comment and a particular value of the annotator's confidence assigned to a particular VMWE occurrence.
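
The annotation scope described in this section can be illustrated with a minimal Python sketch that stores a tokenized sentence and its VMWE annotations as sets of token positions. The names used here (VMWE, tags_per_token), the tag format and the numeric confidence scale are assumptions made for this example and are not the shared task file format. Language-specific categories are left out, and the VPC label given to let out is only illustrative.

```python
from dataclasses import dataclass

# Categories named in these guidelines (language-specific ones are omitted here).
CATEGORIES = {"LVC", "ID", "IReflV", "VPC", "OTH"}


@dataclass(frozen=True)
class VMWE:
    """One annotated VMWE occurrence (hypothetical representation)."""
    token_ids: frozenset      # positions of the lexicalized tokens, possibly discontinuous
    category: str             # exactly one category per occurrence
    confidence: float = 1.0   # assumed scale: 1.0 = certain
    comment: str = ""         # free text, e.g. to record hesitation

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if len(self.token_ids) < 2:
            raise ValueError("a VMWE needs at least two lexicalized tokens")


def tags_per_token(tokens, vmwes):
    """Return, for each token, the list of VMWEs it belongs to.

    The sentence stays flat: no internal syntactic structure is stored.
    A token may carry several tags when VMWEs are embedded in one another.
    """
    tags = [[] for _ in tokens]
    for number, vmwe in enumerate(vmwes, start=1):
        for position in sorted(vmwe.token_ids):
            tags[position].append(f"{number}:{vmwe.category}")
    return tags


tokens = "She let the cat out of the bag".split()
idiom = VMWE(frozenset({1, 2, 3, 4, 5, 6, 7}), "ID")
embedded = VMWE(frozenset({1, 4}), "VPC", confidence=0.5,
                comment="embedded 'let out'; category is illustrative")
tags = tags_per_token(tokens, [idiom, embedded])
```

In this sketch the tokens let and out carry two tags each, which is how the embedded VMWE is recorded without any nesting structure.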


Section 3

Categories of verbal MWEs

In this task we distinguish the following categories of verbal MWEs:

  • Two universal categories, i.e. valid for all languages participating in the task:
    • light verb constructions (LVC):
      • държа под контрол to keep under control
      • eine Rede halten a speech hold to give a speech
      • κάνω μία βόλτα make-1SG a walk to walk
      • to give a lecture
      • hacer una foto to_make a picture to take a picture
      • avoir du courage to have courage
      • fare un discorso give a speech to give a speech
      • ħa deċizjoni took a decision
      • podjąć decyzję to take a decision
      • fazer uma promessa to make a promise
      • a lua o decizie to take a decision to make a decision
      • imeti predavanje, sprejeti odločitev to take a decision
    • idioms (ID):
      • правя се на дръж ми шапката to behave myself as 'hold my hat' pretend to be naive and innocent
      • schwarz fahren to drive black take a ride without a ticket, in Kraft treten into force step to come into effect, in die Waagschale werfen in the weighing pan throw to bring to bear
      • χάνω τα αυγά και τα καλάθια lose-1SG the eggs and the baskets to be at a complete and utter loss
      • to go bananas, fortune favors the bold
      • hacer de tripas corazón make of intestines heart to pluck up the courage
        entrar en vigor enter in vigor to come into force/effect
      • défendre son bifteck defend one's beefsteak to defend one's interests
      • entrare in vigore to enter into force to come into effect, gettare le perle ai porci to throw the pearls to the pigs to waste something good on someone who doesn't care about it
      • għasfur żgħir qalli a bird small told me to hear something from the grapevine
      • rzucać grochem o ścianę throw peas against a wall to try to convince somebody in vain
      • fazer das tripas coração transform the tripes into heart to try everything possible
      • a trage pe sfoară to pull on rope to fool
      • ubiti dve muhi na en mah to achieve two aims at once, spati kot ubit sleep like dead sleep soundly
  • Two quasi-universal categories, valid for some language groups or languages but not all:
    • inherently reflexive verbs (IReflV):
      • усмихвам се to smile
      • sich bemühen to endeavour, sich enthalten himself contain to abstain
      • n.a.
      • suicidarse to suicide
      • se suicider to suicide
        quejarse to complain
      • suicidarsi to suicide
      • bać się to fear SELF to be afraid
      • se queixar to complain
      • a se gândi to think
      • bati se to be afraid
    • verb-particle combinations (VPC):
      • not applicable to Bulgarian
      • er gibt auf he gives up, er wirft ihr das vor he throws her that against he reproaches her for that
      • μπαίνω μέσα get in to go bankrupt
      • to do in
      • n.a.
      • buttare giù throw down to swallow
      • not applicable to Polish
      • jogar fora This seems to be the only VPC in Portuguese. We annotate it as ID and do not use the VPC category.
      • n.a.
      • dati skozi give through to go through, gre za it goes about it is about
  • language-specific categories, defined for a particular language in a separate documentation.
  • other verbal MWEs (OTH), which gather the types not belonging to any of the categories above:
    • цъфна и вържа to blossom and give fruit (usually sarcastically) to prosper
      река и отсека to say and cut to say firmly, decisively
    • einen drauf setzen going one better
    • απορώ και εξίσταμαι wonder1SG.PST and be-amazed1SG.PST to wonder
    • to drink and drive
      to voice act
      to pretty-print
      to short-circuit
      to tumble dry
    • coser y cantar to_sew and to_sing easy as pie, a piece of cake
    • court-circuiter to short-circuit
    • andare e venire to come and go back and forth
      corto-circuitare
      to short-circuit
    • iqum u joqgħod jump and stay to fidget
    • pluć i łapać to spit and catch to be lazy, to do nothing useful
    • pintar e bordar paint and knit to abuse
    • a tunat și i-a adunat it.has thundered and CL.ACC-it.has gathered birds of a feather flock together

In practice, to identify and categorize verbal MWEs during manual annotation, one must use the rigorous generic and category-specific tests provided.


Section 4

Annotation process and decision tree

We propose the following methodology for VMWE annotation:

  • Step 1 - identify a candidate, that is, a combination of a verb with at least one other word which could form a VMWE. If the candidate is a meaning-preserving variant of a prototypical verbal phrase, the following steps apply to this prototypical phrase, called the canonical form. This step is largely based on the annotators' linguistic knowledge and intuition after reading this guide.
  • Step 2 - determine which components of the candidate (or of its canonical form) are lexicalized, that is, if they are omitted, the VMWE does not occur any more. Corpus and web searches may be required to confirm intuitions about acceptable variants.
  • Step 3 - formally check if the candidate (or its canonical form) forms a VMWE and categorize it into one of the available categories, using the decision trees and detailed tests in the following sections.

We provide two decision trees that indicate the order in which tests should be applied in step 3. They determine the priority of different categories when several tests match. The decision trees are a useful summary to consult during annotation, but contain very short descriptions of the tests. Each test is detailed and explained with examples in the following sections.

Decision tree 1: Identification

In this tree, a single YES answer to any of the tests is sufficient to identify a VMWE.
  • Apply test 1 - [CRAN: Candidate contains cranberry word?]
    • YES ⇒ Annotate as a VMWE and go to test 6 - [HEAD]
    • NO ⇒ Apply test 2 - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
      • YES ⇒ Annotate as a VMWE and go to test 6 - [HEAD]
      • NO ⇒ Apply test 3 - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
        • YES ⇒ Annotate as a VMWE and go to test 6 - [HEAD]
        • NO ⇒ Apply test 4 - [MORPHSYNT: Regular morphosyntactic change ⇒ unexpected meaning shift?]
          • YES ⇒ Annotate as a VMWE and go to test 6 - [HEAD]
          • NO ⇒ Apply test 5 - [SYNT: Regular syntactic change ⇒ unexpected meaning shift?]
            • YES ⇒ Annotate as a VMWE and go to test 6 - [HEAD]
            • NO ⇒ Apply the LVC hypothesis - [Candidate has operator verb + activity or state noun?]
              • YES ⇒ Assume a VMWE and go to test 6 - [HEAD]
              • NO ⇒ It is not a VMWE, exit
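
The order of decision tree 1 can be sketched in Python as follows. This is only an illustration: the function name identify, the answers dictionary and the returned labels are hypothetical, and the yes/no answers themselves still have to come from the annotator applying the tests.

```python
# Generic tests 1 to 5, in the order in which decision tree 1 applies them.
GENERIC_TESTS = ["CRAN", "LEX", "MORPH", "MORPHSYNT", "SYNT"]


def identify(answers):
    """Walk decision tree 1.

    `answers` maps a test name to the annotator's yes/no judgement (bool).
    Missing answers are treated as NO, which is a simplification of this sketch.
    Returns a (status, reason) pair.
    """
    for number, name in enumerate(GENERIC_TESTS, start=1):
        if answers.get(name, False):
            # One YES is sufficient; the remaining tests are not needed.
            return ("VMWE", f"test {number} [{name}]")
    if answers.get("LVC_HYPOTHESIS", False):
        # Only an assumption at this point: LVC-specific tests must confirm it.
        return ("ASSUMED_VMWE", "LVC hypothesis")
    return ("NOT_VMWE", None)


# to go astray: contains a cranberry word, so test 1 already answers YES.
result = identify({"CRAN": True})
```

Any result other than NOT_VMWE sends the candidate on to test 6 - [HEAD] in decision tree 2.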

Decision tree 2: Categorization

  • Apply test 6 - [HEAD: Unique verb as syntactic head of the whole?]
    • NO ⇒ Annotate as a VMWE of category OTH
    • YES ⇒ Apply test 7 - [1DEP: Verb v has exactly one dependent d?]
      • NO ⇒ Annotate as a VMWE of category ID
      • YES ⇒ Apply test 8 - [CATEG: What is the morphosyntactic category of d?]
        • Reflexive clitic ⇒ Apply IReflV-specific tests. IReflV tests positive?
          • YES ⇒ Annotate as a VMWE of category IReflV
          • NO ⇒ It is not a VMWE, exit
        • Particle ⇒ Apply VPC-specific tests. VPC tests positive?
          • YES ⇒ Annotate as a VMWE of category VPC
          • NO ⇒ It is not a VMWE, exit
        • NP or PP ⇒ Apply LVC-specific decision tree. Answer positive?
          • YES ⇒ Annotate as a VMWE of category LVC
          • NO ⇒ Annotate as a VMWE of category ID
        • Other category ⇒ Annotate as a VMWE of category ID
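
Decision tree 2 can be sketched in the same way. The function name categorize and its parameters are hypothetical. The sketch reads the first branch under tests 6 and 7 as the NO outcome (no unique verbal head gives OTH, more than one dependent gives ID), which is an interpretation of the tree rather than something it states explicitly.

```python
def categorize(unique_verb_head, one_dependent=False,
               dep_category=None, specific_tests_positive=False):
    """Walk decision tree 2 and return a category, or None for "not a VMWE".

    unique_verb_head        -- answer to test 6 [HEAD]
    one_dependent           -- answer to test 7 [1DEP]
    dep_category            -- answer to test 8 [CATEG], e.g. "particle"
    specific_tests_positive -- outcome of the category-specific tests
    """
    if not unique_verb_head:
        return "OTH"
    if not one_dependent:
        return "ID"
    if dep_category == "reflexive clitic":
        return "IReflV" if specific_tests_positive else None
    if dep_category == "particle":
        return "VPC" if specific_tests_positive else None
    if dep_category in ("NP", "PP"):
        # A failed LVC test does not reject the candidate: it falls back to ID.
        return "LVC" if specific_tests_positive else "ID"
    return "ID"


# to give up: one verb, one dependent, a particle, VPC tests assumed positive.
category = categorize(True, True, "particle", True)
```

Note the asymmetry that the tree encodes: a reflexive clitic or particle that fails its specific tests leads to an exit, whereas an NP or PP that fails the LVC tests is still annotated, as an idiom.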

Section 5

Generic tests for identifying VMWEs

In order to decide if a candidate is a VMWE, we apply the following generic idiosyncrasy tests. If a candidate expression passes at least one test from 1 to 5, we consider it to be a VMWE, and it can further be categorized by decision tree 2 based on category-specific tests. If tests 1 to 5 fail, the LVC hypothesis may apply but LVC-specific tests are needed to confirm the candidate's VMWE status (at the same time as its LVC category).

Test 1 - [CRAN] - Cranberry word

Does the candidate expression contain a cranberry word?

  • it is a VMWE
    • хващам натясно catch in a tight place to coerce, to pressure натясно is only used in MWEs
      правя на бъзе и коприва to turn into elder and nettle to scold, to tell off бъзе is an old word, very rarely used independently
      вземам предвид, имам предвид to предвид (as adverb) is only used in MWEs
      стоя диван чапраз to stay upright as in Osman council to stay ready to serve чапраз is an old word, very rarely used independently
    • sich um etw. scharen to gather around something scharen is not a stand-alone word
    • μάλλιασε η γλώσσα μου is-full-of-hair-3SG the-SG.NOM tongue-SG.NOM my-SG.GEN.POSS to repeat the same thing again and again μάλλιασε is not a stand-alone word
    • to go astray astray is not a stand-alone word
    • sin decir ni chus ni mus chus is not a stand-alone word without to_say neither chus nor mus without saying a word
      no decir ni chus ni mus chus is not a stand-alone word not to_say neither chus nor mus not to say a word, to remain silent
      a troche y moche troche is not a stand-alone word to troche and dulled in a nonsensical way, willy-nilly, haphazardly
    • prendre la poudre d'escampette to escape escampette is not a stand-alone word
    • mangiare a ufo to eat without paying a ufo is not a stand-alone word
      fare lo gnorri to play dumb gnorri is not a stand-alone word
      scendere in lizza to enter the lists lizza is not a stand-alone word
    • odsądzić kogoś od czci i wiary to refuse honor and faith to someone to drag sb's name through the mire/mud, to damage someone's reputation by saying insulting things about them
    • ir para as cucuias to go wrong cucuias is not a stand-alone word
    • a nu avea habar to have no idea habar is not a stand-alone word
    • att komma ihåg to remember ihåg is not a stand-alone word
    • biti si kvit to pay up a debt, owe nothing to somebody kvit is not a stand-alone word
  • further tests are required
    • правя на сос правя and сос are stand-alone words
    • sich um etw. herum stellen to stand around something → all words are stand-alone words
    • to go away go and away are stand-alone words
    • ir a la universidad to go to university ir, a, la and universidad are stand-alone words
    • andare giù to go down andare and giù are stand-alone words
    • wyznać tajemnicę to reveal a secret wyznać and tajemnica are standalone words
    • ir para a escola to go to school ir, para, a and escola are stand-alone words
    • a nu avea idee to have no idea → all words are stand-alone words
    • att komma på to figure out komma and på are stand-alone words

Test 2 - [LEX] - Lexical inflexibility

Does a regular replacement of one of the components by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?

  • it is a VMWE
    • бълвам змии и гущери to spew snakes and lizards#бълвам влечуги (to spew reptiles)
      всяка жаба да си знае гьола every frog to know its own puddle#всяка жаба да си знае локвата
    • die Katze aus dem Sack lassen to let the cat out of the bag#den Hund aus dem Karton lassen #to let the dog out of the box
      eine Entscheidung treffen to meet a decision to make a decision#eine Entscheidung machen/herstellen a decision make/produce #to make/produce a decision
    • to let the cat out of the bag#to allow the feline out of the container
      to make a decision*to produce/build/create a decision
      to go on*to go upon
      to stand firm/fast*to stand hard/rigid/solid
    • meterse en la boca del lobo to_get_into.self in the mouth from_the wolf venture into the lion's den#meterse en el ojo del gato
      tomar una decisión to_take a decision to make a decision#hacer/coger/producir una decisión to_make/grab/produce a decision #to make/grab/produce a decision
    • non dire gatto se non ce l'hai nel sacco don't say cat if it is not in the sack don't count on something before it happens#non dire cane se non ce l'hai nel sacco#don't say dog if it is not in the sack
      sputare il rospo spit the toad spit it out#sputare la rana#spit the frog
    • wiedzieć, co w trawie piszczy to know what in grass squeals to be well informed#wiedzieć, co w trawniku popiskuje
      wziąć udział to take participation to take part#to podjąć/pobrać/zabrać członkostwo/uczestniczenie
    • quebrar um galho break a branch to help#danificar um ramo to damage a stem
    • a da cu bâta în baltă to give with bat-the in pond to say sth embarrassing*a da cu bățul în baltă to give with stick-the in pond, *a da cu bâta în lac to give with bat-the in lake
    • att Plocka russinen ur kakan to pick the raisins out of the cake to choose only the best things#att välja ut nötterna från kakan
    • imeti mačka to have a cat to have a hangover#imeti psa to have a dog
      iti rakom žvižgat to go whistling to cancers to fail, to die#iti jastogom pet to go singing to the lobsters
  • further tests are required
    • изнасям доклад present a report → изнасям урок/лекция/презентация и т.н.
    • den Bus nehmen to take the bus → den Zug/das Flugzeug, etc nehmen to take the train/plane/etc
    • to take a plane → to take a bus/car/boat, etc.
    • coger el autobús to_take the bus to take the bus → coger el avión/tren, etc. to take the plane/train/etc.
    • prendere il treno to take the train → prendere il bus/aereo/etc to take the bus/plane/etc
    • jqum u joqgħod always moving about
    • sprawić kłopot to make a trouble → sprawić przykrość/trudność/niedogodność/problem/zawikłanie/nieprzyjemność to make a(n) nuisance/difficulty/inconvenience/problem/complication
    • quebrar um braço to break an arm → quebrar uma perna/costela/falange to break a leg/rib/phalanx
    • a lua o decizie to take a decision to make a decision → a lua o hotărâre to take a decree to make a decision
    • att ta bussen to take the bus → att ta tåget/flyget, etc to take the train/plane/etc
    • delati težave to make a trouble → delati preglavice/probleme to make a(n) nuisance/problem

Usual modifications for [LEX] include replacing content words in the candidate by synonyms, hypernyms, hyponyms, antonyms, troponyms, meronyms, and related words in general.
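
The substitution step of [LEX] can be sketched as a small generator that proposes variants of a candidate, each differing from it in exactly one component. The function name lex_variants and the hand-written table of related words are assumptions of this example; in practice the related words would come from the annotator's knowledge or from a lexical resource, and judging whether each variant keeps the original meaning remains a human decision.

```python
def lex_variants(components, related):
    """Yield variants of a candidate with exactly one component replaced.

    components -- the candidate as a list of words
    related    -- maps a word to related words (synonyms, hypernyms, ...)
    """
    for position, word in enumerate(components):
        for alternative in related.get(word, []):
            yield components[:position] + [alternative] + components[position + 1:]


# Hand-written, purely illustrative table of related words.
related_words = {"bus": ["train", "plane"], "cat": ["feline"], "bag": ["container"]}

collocation_variants = list(lex_variants(["take", "a", "bus"], related_words))
idiom_variants = list(lex_variants(["let", "the", "cat", "out", "of", "the", "bag"],
                                   related_words))
```

For to take a bus, the generated variants keep the expected meaning, so further tests are required. For to let the cat out of the bag, the variants lose the idiomatic reading, so the candidate passes the test.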

Test 3 - [MORPH] - Morphological inflexibility

Does a regular morphological change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

  • it is a VMWE
    • хвърлям око throw an eye to throw a glance#хвърлям очи.PLURAL
      хващам бика.DEF за рогата take the bull by the horns#хващам бик.INDEF за рогата
      не мога да си намеря място cannot find a place for myself to be extremely nervous → only exists in negative form
    • ins Gras beißen to bite into the grass to die#in ein Gras beißento bite into a grass #in die Gräser beißen to bite into the grasses, in Kraft treten into force step to come into effect#in Kräfte treten into forces step
    • to kick the bucket#to kick the buckets
      to pretty-print*to prettier-print
      to take turns#to take a turn
    • coger el toro por los cuernos to_take the bull by the horns to take the bull by the horns#coger el toro por el cuerno to_take the bull by the horn #to take the bulls by the horns to_take the bulls by the horns #to take the bulls by the horns
      entrar en vigor to_enter in vigor to come into effect#entrar en vigores to_enter in vigors #to come into effects
    • prendre le taureau par les cornes to_take the bull by the horns#prendre le taureau par une corne to_take the bull by a horn
    • andare a letto con le galline to go to bed with the hens to go to bed early#andare a letto con la gallina to_go to bed with the hen
      cercare il pelo nell'uovo to look for the hair in the egg to be pedantic #cercare i peli nell'uovo
    • budować zamki na lodzie to build castles on ice to rely on unstable foundations#budować zamek na lodzie to build a castle on ice
      mucha kogoś ugryzła a fly bit someone someone is in a bad temper#mucha kogoś ugryzie a fly will bite someone
      wyciągnąć nogi to stretch.PERF legs to die#wyciągać nogi to stretch.IMPERF legs (imperfective aspectual variant prohibited)
    • bater perna hit leg to walk around bater a/uma/essas perna/pernas/perninha/pernona to hit the/one/these leg/legs/leg.SMALL/leg.BIG
    • a da colțul to give corner.the to die*a da colţurile to give corners.the
    • träda i kraft step in force to come into effect#träda i krafter step into forces
    • klicati jelene to call stags to vomit#klicati jelena to call a stag
  • further tests are required
    • хвърлям топка to throw a ball → хвърлям топка/топката/топки/топките
    • einen Kuchen backen to bake a cake → viele/keine/den Kuchen backen/machen many/no/the cake bake/make
    • to make a cake → to make a/many/those/no cake/cakes
    • mover el brazo to_move the arm to move the arm → mover/agitar/levantar/estirar el brazo/la pierna/las manos/las piernas to_move/shake/raise/stretch the arm/the leg/the hands/the legs to move/shake/raise/stretch the arm/the leg/the hands/the legs
    • fare un dolce → fare un/molti/dei/quei/nessun dolce/dolci
    • kształtować opinię to form an opinion → kształtować opinie to form opinions
    • bater o braço to hit the arm → bater o/os/um/esse braço/braços/bracinho hit the/the.PL/a/this arm/arms/arm.SMALL
    • a face o prăjitură to make a cake → a face multe/aceste prăjituri to make many/these cakes
    • att baka en kaka to bake a cake → att baka flera/den där/några/ingen kaka/kakor to bake several/that/some/no cake(s)
    • vzeti taksi to take a cab → ne vzeti nobenega taksija/en taksi/dva taksija to take no/one/two/… cab(s)

Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, tense, mood, aspect, etc. - depending on the target language's morphology.

Test 4 - [MORPHSYNT] - Morpho-syntactic inflexibility

Does a regular morpho-syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

  • it is a VMWE
    • аз ти давам думата си I give you my word#аз ти давам неговата дума (I give you HIS word)
      аз си продавам душата I sell my soul#аз продавам неговата душа (I sell his soul)
    • Ich werde mein Bestes tun I will my best do I will do my best*Ich werde dein Bestes tun I will do your best, Ich gebe dir mein Wort I give you my word*Ich gebe dir ihr Wort I give you her word
    • I will do my best*I will do your best
      I give you my word for that → #I give you his word for that
      he was pulling my leg#I was pulling my leg
    • te doy mi palabra to_you give_I my word I give you my word#te doy su palabra to_you give_I his/her word I give you his/her word
    • il vide son sac he empties his bag he reveals his secret thoughts#il vide mon sac he empties my bag
    • Io farò del mio meglio*Io farò del tuo meglio
      Io ti do la mia parola#Io ti do la sua parola
    • Polish VMWEs do not seem to exhibit this kind of inflexibility
    • ele se suicidou he self.3P.SG suicided*ele me suicidou
      eu perdi meu tempo I wasted my time eu perdi teu/seu/nosso tempo English allows this, Portuguese doesn't. We say I made you waste your time instead.
    • Îți dau cuvântul meu CL.DAT give.1SG word.the my I give you my word#Îți dau cuvântul lui CL.DAT give.1SG word.the his I give you his word
    • Jag gör mitt bästa I do my best I do my best*Jag gör ditt bästa I do your best
    • Vlečeš me za nos you are pulling my nose you're pulling my leg *Vlečeš se za nos you're pulling your nose
      Pojdi se solit! to go salt oneself Get lost!*Pojdi ga solit go salt him
  • further tests are required
    • копая си гроба to dig my grave → копая ти/му/й/им гроба (to dig your/his/her/their grave)
    • er traf seine Entscheidung he made his decision → er traf meine/ihre/unsere/eure Entscheidung he made my/her/our/your decision
    • he did his job → he did my/her/our/your job
    • Ha hecho su trabajo Has_he/she done his/her work He/She has done his/her work → Ha hecho mi/tu/nuestro trabajo Has_he/she done my/your/our work He/She has done my/your/our work
    • ha fatto il suo lavoro → ha fatto il mio/tuo/nostro/vostro/loro lavoro
    • Polish VMWEs do not seem to exhibit this kind of inflexibility
    • Eu fiz meu trabalho I did my job → Tu/ele/nós fizeste/fez/fizemos meu trabalho You/he/we made my job
    • el își face tema he his does homework.the he does his homework → el îmi/ne/le face tema he my/our/their does homework.the he does my/our/their homework
    • han gör sitt jobb he does his job → han gör mitt/hennes/vårt jobb he does my/her/our job
    • opravil je svojo nalogo he did his job → opravil je mojo/njeno/našo/tvojo nalogo he did my/her/our/your job

Usual modifications for [MORPHSYNT] involve agreement or loss of agreement between some components in the candidate.

Test 5 - [SYNT] - Syntactic inflexibility

Does a regular syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

  • it is a VMWE
    • на стар краставичар краставици продавам to an old cucumber seller cucumbers to sell to try to cheat a more experienced person#продавам краставици на стар краставичар, #краставиците са продадени
      бълвам змии и гущери#бълвам гущери и змии
    • speak of the devil the person one is talking about shows up#he was speaking of the devil
      to go bananas to get crazy#bananas are gone
      to drink and drive#drive and drink
      to kick the bucket#the bucket was kicked
    • coser y cantar to_sew and to_sing easy as pie, a piece of cake#cantar y coser to sing and to sew
      perder la cabeza To_lose the head to go bananas#Perder las cabezas To_lose the heads
    • alzare la cresta to lift the crest become cocky#la cresta è stata alzata the crest has been lifted
      andare in malora go to ruin go to ruin #nella malora è andata in ruin was gone
      vivi e lascia vivere live and let live#lascia vivere e vivi let live and live
    • kogoś krew zalewa blood floods someone someone gets furious#ktoś jest zalewany przez krew someone is flooded by blood (passive blocked)
      robić bokami to do with-sides to have serious financial problems→#robić swoją robotę bokami to do one's job with sides (regular modification blocked)
      dobrze komuś z oczu patrzy well someone.DAT from eyes looks someone looks like a good person#uprzejmość dobrze komuś z oczu patrzy kindness well someone.DAT from eyes looks (subject prohibited)
      nie zagrzać miejsca w pracy not to warm a place at work not to stay long at one job #zagrzać miejsce w pracy to warm a place at work (negation is compulsory)
      zdechł pies! died the dog! it is a lost cause#pies zdechł the dog died (a regular word order variability is blocked)
      wziąć w łeb to take into head to fail #wziąć porażkę w łeb to take failure into head (direct object prohibited for the normally transitive verb wziąć to take)
    • pisar na bola step on the ball make a mistake#a bola na qual ele pisou the ball on which he stepped
    • a da colțul to give corner.the to die*colțul a fost dat corner.the has been given
    • det knallar och går it trots and walks it is OK/as usual#det går och knallar
    • delati se Francoza to pretend to be French to pretend to be indifferent*delan Francoz made French
  • further tests are required
    • продавам неговата кола I sell his car → колата му беше продадена (his car was sold), неговата кола, която тя продаде (his car which she sold), т.н.
    • jemandes Auto waschen to wash one's car → ihr Auto wurde gewaschen her car was washed, das Auto, welches sie wusch the car that she washed, Autowaschen car-washing, etc
    • to wash one's car → her car was washed, the car that she washed, car washing, etc.
    • pisar la arena to step on the sand → la arena que pisaste The sand on which you stepped
    • lavare la macchina →la sua macchina è stata lavata, la macchina che ha lavato, il lavaggio della macchina, etc.
    • kształtować opinię to form an opinion → opinia jest kształtowana the opinion is formed
    • pisar na areia to step on the sand → a areia na qual você pisou the sand on which you stepped
      jogar futebol to play football → ?futebol é jogado football is played One may argue that this is a VMWE because passive sounds strange. However, we assume that this sense of jogar does not accept passive. Since this construction is very productive, we do not annotate it as VMWE.
    • a spăla maşina to wash the car → maşina a fost spălată, maşina pe care a spălat-o, spălarea maşinii etc. the car was washed, the car that he/she washed, car washing
    • att tvätta bilen to wash one's car → min bil tvättades my car was washed, bilen som hon tvättade the car that she washed, biltvätt car-wash etc.
    • narediti film to make a movie → Film, narejen po knjigi a movie based on a book

LVC Hypothesis

Does the candidate consist of a verb and a nominal complement, where the verb has a purely syntactic operator function (performing an activity or being in a state) and the noun expresses this activity or state?

  • assume that it is a VMWE
    • вземам решение to make a decision → the semantics is expressed by the noun решение, while the verb вземам expresses tense, aspect and mood, and assigns semantic role 'agent' to the subject
    • eine Entscheidung treffen a decision meet to make a decision treffen only expresses that an activity (Entscheidung) happened
      Mut haben to have courage haben only expresses that the subject has a property (Mut)
    • to make a decision make only expresses that an activity (decision) happened
      to commit suicide commit only expresses that an activity (suicide) happened
      to have courage have only expresses that the subject has a property (courage)
    • tomar una decisión to make a decision tomar only expresses that an activity (decisión) happened
      cometer un crimen to commit a crime cometer only expresses that an activity (crimen) happened
      tener coraje to have courage tener only expresses that the subject has a property (coraje)
    • dare un suggerimento give a suggestion give a suggestion dare only expresses that an activity (suggerimento) happened
      avere coraggio to have courage avere only expresses that the subject has a property (coraggio)
    • wziąć udział to take a participation to take part wziąć only expresses an activity (participation)
    • tomar uma decisão to make a decision tomar only expresses that an activity (decisão) happened
      cometer um crime to commit a crime cometer only expresses that an activity (crime) happened
      ter coragem to have courage ter only expresses that the subject has a property (coragem)
    • a lua o decizie to take a decision to make a decision lua only expresses that an activity (decizie) happened
    • ta ett beslut take a decision to make a decision ta only expresses that an activity (beslut) happened
    • sprejeti odločitev sprejeti only expresses that an activity (odločitev) happened
      narediti samomor narediti only expresses that an activity (samomor) happened
      imeti pogum imeti only expresses that the subject has a property (pogum)
  • the candidate is NOT a VMWE
    • вземам пари to take money вземам has a concrete meaning and пари (money) is not an activity or state
    • einen Kuchen machen to make a cake machen has a concrete meaning and the thing being made (Kuchen) is not an activity or state.
      Nachbarn haben to have neighbours haben might be an operator verb but Nachbarn are not activities or properties
      Hoffnung geben to give hope Hoffnung is a state/property, but geben adds inchoative (i.e. change-of-state) semantics to it
    • to make a cake make has a concrete meaning and the thing being made (cake) is not an activity or state
      to have neighbors have might be an operator verb but neighbors are not activities or properties
      to give hope hope is a state/property, but give adds inchoative (i.e. change-of-state) semantics to it
    • tomar un zumo to_take a juice to have a juice tomar has a concrete meaning and the thing being taken (zumo) is not an activity or state
      tener vecinos to_have neighbors tener might be an operator verb but vecinos are not activities or properties
      dar esperanza to give hope esperanza is a state/property, but dar adds inchoative (i.e. change-of-state) semantics to it
    • fare un quadro to make a painting to make a painting fare has a concrete meaning and the thing being made (quadro) is not an activity or state
      avere fratelli avere might be an operator verb but fratelli are not activities or properties
      dare speranza speranza is a state/property, but dare adds inchoative (i.e. change-of-state) semantics to it
    • odmówić udziału to refuse the participation to refuse to participate udział is a state/activity but odmówić adds semantics to it (refusal to perform the activity)
    • tomar um suco take a juice have a juice tomar has a concrete meaning and the thing being taken (suco) is not an activity or state
      ter vizinhos to have neighbors ter might be an operator verb but vizinhos are not activities or properties
      dar esperança to give hope esperança is a state/property, but dar adds inchoative (i.e. change-of-state) semantics to it
    • a face o prăjitură to make a cake face has a concrete meaning and the thing being made (prăjitură) is not an activity or state
    • narediti palačinke to make pancakes narediti make has a concrete meaning and the thing being made (palačinke) is not an activity or state.

The LVC hypothesis is not a real test: its application is largely based on intuition, and it may be hard to judge whether a verb is only performing the role of operator. This hypothesis accounts for LVCs that otherwise have no salient inflexibility but still correspond to multiword predicates we want to annotate. If you are unsure, we advise you to assume that the combination is a VMWE and go to the LVC tests. If the expression fails the LVC tests, then you must change your mind and consider that the answer to the LVC hypothesis was actually NO.


Section 6

Specific tests for categorizing verbal MWEs

Once a candidate verbal MWE has been pre-identified according to one of the identification tests, the confirmation of its status as an MWE, as well as its categorization, can be based on category-specific tests.


Section 6.1

Structural tests

Structural tests are quite simple preliminary tests that help determine the syntactic structure of the VMWE. This is required in order to pursue categorization by pointing to the right category-specific tests in the last step. In practice, annotators will rarely need them since they will already have an intuition about the VMWE's category when they identify it.

Test 6 - [HEAD] - Syntactic head

Does the candidate contain a unique verb functioning as the syntactic head of the whole?

  • continue to the next test
    • гушна букета to hug the bunch of flowers to die гушна is the head and the NP depends on it
      правя на салата to make into salad to scold правя is the head and the PP depends on it
    • eine Fratze ziehen a grimace pull to make a face ziehen is the head and the NP depends on it
      er gibt auf he gives up gibt is the head and auf is the particle depending on it
    • κάνω γκριμάτσα to make grimace to make a face κάνω is the head and the NP depends on it
    • to make a face make is the head and the NP depends on it
      to give up give is the head and up is a particle depending on it
    • fare le linguacce to make the grimaces fare is the head and the NP depends on it
      far fuori to make out to kill far is the head and fuori is a particle depending on it
    • zbijać bąki to smash farts to fool around, to do nothing useful zbijać is the head and the NP bąki depends on it
      dać komuś popalić to let someone smoke to make someone's life hard dać is the head and the infinitive popalić depends on it
    • bater as botas bater is the head and the NP depends on it
      criar vergonha na cara criar is the head and the two NPs depend on it
    • a face baie to make bath to bath face is the head and the NP depends on it
      a ieși înainte to go forth to greet ieși is the head and înainte is a particle depending on it
    • att ge upp to give up ge is the head and upp is the particle depending on it
    • imeti krompir to have potatoes to be lucky imeti is the head and the NP depends on it
  • annotate as OTH
    • цъфна и вържа → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
    • leben und leben lassen live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
    • to pretty-print → there is an unusual case of an adjective modifying a verb
      to drink and drive → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
    • vivi e lascia vivere live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
    • pluć i łapać to spit and catch to be lazy, to do nothing useful → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
    • pintar e bordar paint and knit to abuse
    • det knallar och går it trots and walks it is OK/as usual → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination

The aim of this test is to distinguish VMWEs of category OTH from those that require further tests. For the special case of nominal, participle and gerund variants of VMWEs, the test should be applied to the canonical verbal form instead. This is required because there may be no verb or the verb may not be the syntactic head.

  • вземам решение passes the test → variants like решението, което беше взето pass the test as well
  • eine Entscheidung treffen make a decision passes the test → variants like die Entscheidung wurde getroffen the decision was made, die Entscheidung, welche getroffen wurde the decision which was made, das Treffen der Entscheidung the making of the decision pass the test as well
  • to make a decision passes the test → variants like the decision which was made, decision-making, the making of the decision pass the test as well
  • prendere una decisione to take a decision make a decision passes the test → variants like la decisione è stata presa the decision was made, la decisione, che è stata presa the decision which was made, la presa di decisione the making of the decision pass the test as well
  • zbijać bąki to smash farts to fool around, to do nothing useful passes the test → variants like zbijanie bąków farts smashing fooling around, doing nothing useful, zbijający bąki smashing farts pass the test as well
  • tomar uma decisão make a decision passes the test → variants like a decisão que foi tomada the decision which was made, decisão tomada decision made pass the test as well
  • a lua o decizie make a decision passes the test → variants like decizia care a fost luată the decision which was made, luarea deciziei decision-making pass the test as well
  • sprejeti odločitev make a decision passes the test → variants like odločitev, sprejeta dne ... the decision was made pass the test as well

Test 7 - [1DEP] - Single dependent

Does the VMWE contain exactly one lexicalized syntactic dependent d of the head verb v?

  • continue to next test
    • ритам камбаната kick the bell to die → the single dependent is a noun phrase, камбаната
      ставам на кайма turn into mince to be destroyed → the single dependent is a prepositional phrase, на кайма
      одирам жив skin alive to make someone suffer → the single dependent is a small clause (adjective), жив
    • eine Fratze ziehen a grimace pull to make a face → the single dependent is a noun phrase, Fratze
      in Betracht ziehen to take into consideration → the single dependent is a prepositional phrase, in Betracht
      er gibt auf he gives up → the single dependent is a particle auf
    • to make a face → the single dependent is a noun phrase, face
      to take into account → the single dependent is a prepositional phrase, into account
      to take turns → the single dependent is a noun, turns
      to give up → the single dependent is a particle, up
    • fare le linguacce to make the grimaces to make a face → the single dependent is a noun phrase, linguacce
      prendere in considerazione to take into consideration → the single dependent is a prepositional phrase, in considerazione
      egli lo fa fuori he kills him → the single dependent is a particle fuori
    • bić na alarm to strike on alarm to raise the alarm → the single dependent is a prepositional phrase, na alarm on alarm
      cholera wie cholera knows I have no idea → the single dependent is the nominal subject cholera
    • cometer um crime to commit a crime → one dependent
    • a face față to make face to deal with → the single dependent is a noun phrase, față
      a ieși înainte → the single dependent is an adverb, înainte
    • att ge upp to give up → the single dependent is the particle upp
    • gre za it is about → the single dependent is a particle, za
      smejati se to laugh → the single dependent is a reflexive clitic, se
      imeti mačka to have a hangover → the single dependent is a noun, maček
  • annotate as ID
    • на стар краставичар краставици продавам to an old cucumber seller cucumbers to sell to try to cheat a more experienced person → two dependents, на стар краставичар (PP) and краставици (NP)
      прочитам от корица до корица to read from cover to cover → two dependents, от корица (PP) and до корица (PP)
      правя (нечий) живот черен make someone's life black to ruin someone's life → two dependents, (нечий) живот (NP) and черен (small clause)
    • die Katze aus dem Sack lassen to let the cat out of the bag → two dependents die Katze and aus dem Sack
    • to make ends meet → two dependents, ends and meet
      to let the cat out of the bag → two dependents, the cat and out of the bag
    • non dire gatto se non ce l'hai nel sacco don't say cat if you don't have it in the bag don't count your chickens before they're hatched → two dependents gatto and nel sacco
    • chować głowę w piasek to hide head in sand to pretend not to see a problem → two dependents, głowę head and w piasek in sand
      bać się własnego cienia to fear SELF one's own shadow to be very timid → two dependents, się SELF and własnego cienia own shadow
    • tapar o sol com a peneira to hide the sun with a sieve to sugar-coat → two dependents
    • a da bir cu fugiții to give tribute with fugitives the to disappear → two dependents, bir and cu fugiții
      a-i ieși ochii din cap to his come out eyes the from head to stare → three dependents, i, which is a non-RCLI, ochii, and din cap
    • att sätta sig upp mot någon to sit oneself up against someone to defy someone → two dependents, sig and upp
    • skrivati glavo v pesek to hide head in sand to pretend not to see a problem → two dependents, glava head and v pesek in sand
      vlečeš me za nos you are pulling my nose you're pulling my leg → two dependents, me me and za nos my nose

The test covers only lexicalized dependents. There may be other, non-lexicalized dependents, which the test ignores. We explicitly call the non-verbal elements dependents instead of arguments or complements because argument-adjunct distinction is irrelevant. The outcome of the test is positive if the verb has a single lexicalized dependent, which can be the subject, the direct or indirect object, but also an adverbial complement, adverb, particle, relative clause, etc.
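As a minimal sketch of Test 7, assuming the annotator (or a parser) has already listed the dependents of the head verb and marked which ones are lexicalized, the decision reduces to counting. The `Dependent` class, its fields and the function name are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dependent:
    """A syntactic dependent of the head verb (hypothetical structure)."""
    text: str          # surface form of the dependent, e.g. "the cat"
    lexicalized: bool  # True if the dependent is a fixed part of the expression


def single_dependent_test(dependents):
    """Apply Test 7 [1DEP]: count only the lexicalized dependents of the verb."""
    lexicalized = [d for d in dependents if d.lexicalized]
    if not lexicalized:
        # Assumption: a candidate without any lexicalized dependent is not
        # a multiword candidate, so the test does not apply to it.
        raise ValueError("no lexicalized dependent: not a VMWE candidate")
    if len(lexicalized) == 1:
        return "continue to next test"
    return "annotate as ID"


# "she made a face": the subject is a dependent but is not lexicalized
make_a_face = [Dependent("she", False), Dependent("a face", True)]
# "to let the cat out of the bag": two lexicalized dependents
cat_out_of_bag = [Dependent("the cat", True), Dependent("out of the bag", True)]

print(single_dependent_test(make_a_face))
print(single_dependent_test(cat_out_of_bag))
```

The category of a dependent (subject, object, particle, adverb, etc.) plays no role here; only the number of lexicalized dependents matters.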

Test 8 - [CATEG] - Category of the dependent

What is the morphosyntactic category of the dependent d that co-occurs with the head verb v?

  • Reflexive clitic - apply IReflV tests. If the outcome is negative, discard the VMWE candidate.
    • страхувам се fear myself.REFL to be afraid
      радвам се feel joy myself.REFL to feel joy
    • sich wundern to wonder, sich schämen to be ashamed
    • English does not have IReflV expressions
    • se suicider to suicide, s'évanouir to faint
    • suicidarsi to suicide, vergognarsi to be ashamed
    • bać się fear SELF to be afraid
    • suicidar-se to suicide, queixar-se to complain
    • a se sinucide to commit suicide with obligatory ACC reflexive clitic
      a se holba to stare with obligatory ACC reflexive clitic
    • čuditi se to wonder, smejati se to laugh, onesvestiti se to faint
  • Particle - apply VPC tests. If the outcome is negative, discard VMWE candidate.
    • Bulgarian does not have VPC expressions
    • anfangen to begin, er fängt an he begins, er hat angefangen he has begun → in German, VPCs may occur separated or within one word; we annotate all occurrences!
      ich schlage vor I propose
    • παίρνω μπρος, βάζω μπροςνα, κάνω πίσω
    • to give up, to look forward to
    • far fuori to make out to kill, lo fa fuori he kills him, lo ha fatto fuori he killed him
    • Polish does not have VPC expressions
    • Portuguese does not have VPC expressions
    • Romanian does not have VPC expressions
    • gre za it is about
      dati skozi to go through
      biti za to agree
  • Noun phrase (NP) or prepositional phrase (PP) headed by a preposition governing a noun - apply LVC tests. If the outcome is negative, categorize as ID.
    • ритам камбаната kick the bell to die камбаната is a noun phrase composed of a single noun
      давам зелена светлина give green light to allow зелена светлина is a noun phrase composed of an adjective and a noun
      ставам на кайма turn into mince to be destroyed на кайма is a prepositional phrase composed of a preposition governing a noun
    • die Nase rümpfen the nose wrinkle turn up one's nose at sth. die Nase is a noun phrase composed of a determiner and a noun
      in Kraft treten into
    • to make a wish a wish is a noun phrase composed of a determiner and a noun
      to take turns turns is a noun phrase composed of a single plural noun
      mettere radici radici is a noun phrase composed of a single plural noun
    • prendere in considerazione take into account in considerazione is a prepositional phrase composed of a preposition and a noun
      rompere il silenzio to break the silence il silenzio is a noun phrase composed of an article and singular noun
    • podjąć decyzję to take a decision decyzję decision is a nominal phrase composed of a single noun
      chodzić prostą drogą to go (on) a straight road.INST to avoid complications prostą drogą (on) a straight road is a noun phrase composed of an adjective and a noun in the instrumental case
      bujać w obłokach to swing in the clouds to fantasize w obłokach in the clouds is a prepositional phrase composed of a preposition and a noun
    • tomar banho to take a shower banho is a noun phrase composed of a single noun
    • a rupe tăcerea to break silence the to start talking tăcerea is a noun phrase composed of a single noun
      a face baie to do bath to take a shower baie is a noun phrase composed of a single noun
    • biti v dvomih to doubt v dvomih in doubt is a prepositional phrase composed of a preposition governing a noun
  • Other - categorize as ID.
    • Adjective:
      • излизам сух от водата to come out dry from the water to avoid taking responsibility
        одирам жив skin alive to make someone suffer
        гоня дивото chase the wild.ADJ to take risks дивото is a substantive
      • rot sehen to see red
      • τα βάφω μαύρα them-NE.PL.ACC paint-1.SG black-NE.PL.ACC be very sad
      • to stand firm, to see red
      • voir rouge to see red to be very angry
      • vedere nero to see black
      • zrobić swoje to do one's own to do what one is supposed to do
      • pensar grande to think big
      • a vedea roșu to see red
        a o face lată to CL.ACC make wide to party
      • narediti svoje to do one's own to do what one is supposed to do
    • Verb:
      • не искам и да чуя don't want to even hear to oppose strongly и да чуя is a VP
        правя сам да си говори make someone talk to himself to drive someone crazy сам да си говори is a clause
      • ??
      • to make do
      • laisser tomber let fall to let down
        vouloir dire want say to mean
      • lasciar andare let go to unhand
        voler dire want say to mean
      • dać komuś popalić to let someone smoke to make someone's life hard
      • querer dizer want say to mean
      • zagosti jo komu to play music to someone play a joke on someone
    • Adverb:
      • изваждам наяве take out in the open to uncover
        хващам натясно catch in a tight place to coerce, to pressure
      • φέρω βαρέως
      • to get well
      • buttare giù to throw down to demoralize
      • chcieć dobrze to want well to have good intentions
        robić komuś dobrze to do someone.DAT well to please someone
        źle/marnie skończyć badly finish to come to a bad end
      • cair bem fall well to be appropriate
      • a se face bine to himself make well to get well
        a face bine to make well to help
      • obrniti se na bolje to turn for better to be better
    • Pronoun:
      • мързи ме (it feels) lazy me.ACC to be lazy
      • τα καταφέρνω to make it
        την πατάω to fail
      • to make it
      • le faire it make to be enough/successful
      • farcela to make it to manage
      • No example found in Polish
      • dá-lhe João! give to him/her, João! show them what you got, João!
      • a o lua la măsea CL.ACC.F.3SG take PREP tooth.ACC to drink heavily, with the non-anaphoric feminine clitic 'o' functioning as an expletive
    • Etc.
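The routing performed by Test 8 can be summarized as a lookup table. The sketch below is only an illustration: the category labels used as keys and the function name are hypothetical, not official tag names.

```python
# Maps the category of the single dependent to
# (tests to apply, outcome if those tests are negative).
# The category labels are illustrative, not official tag names.
ROUTES = {
    "reflexive clitic": ("IReflV tests", "discard the candidate"),
    "particle": ("VPC tests", "discard the candidate"),
    "NP": ("LVC tests", "categorize as ID"),
    "PP": ("LVC tests", "categorize as ID"),  # PP whose preposition governs a noun
}


def route_by_category(category):
    """Apply Test 8 [CATEG].

    Returns (tests_to_apply, outcome). When tests_to_apply is None, no
    further test is needed and the outcome applies directly; otherwise
    the outcome applies only if the listed tests are negative.
    """
    # Any other category (adjective, verb, adverb, pronoun, ...) is an ID directly.
    return ROUTES.get(category, (None, "categorize as ID"))


print(route_by_category("particle"))  # e.g. "to give up"
print(route_by_category("adverb"))    # e.g. "to get well"
```

Note the asymmetry this makes visible: a reflexive clitic or particle candidate that fails its tests is discarded, whereas an NP or PP candidate that fails the LVC tests is still annotated, as an ID.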

Section 6.2

Light verb constructions (LVC)

Light verb constructions (LVC) constitute a universal category. We retain the following key characteristics:

  1. They are formed by a verb v and a noun n, which either directly depends on v or is introduced by a preposition.
    • вземам решение to make a decision
      държа под контрол to keep under control
    • zum Einsatz kommen to the use come to be called into action
      eine Rede halten a speech hold to give a speech
    • παίρνω μία απόφαση make a decision
      δίνω στα νεύρα give to the nerves
    • to give a lecture
      to come into bloom
    • faire une présentation make a presentation
    • chiamare in causa to call in cause to single out
      fare una passeggiata to make a walk to have a walk
    • odnieść sukces carry-away success to be successful
    • fazer um aborto to make an abortion
      estar com fome be with hunger to be hungry
    • a duce dorul to carry yearning.the to miss somebody
      a da divorț to give divorce to divorce
      a da în clocot to give in boil to come to the boil
      a da în fiert to give in boil to come to the boil
    • biti v dvomih to doubt
      imeti predavanje to give a lecture
  2. The noun n is predicative, often referring to an event (e.g. decision, visit) or a state (e.g. fear, courage).
    • вземам решение to make a decision → noun refers to an act or event
      давам съгласие to give permission → noun refers to an act or event
      имам притеснения to have concerns → noun refers to a feeling or state
      имам готовност to be ready → noun refers to a feeling or state
    • eine Entscheidung treffen to make a decision → noun refers to an event
      Angst haben to have fear → noun refers to a state
    • παίρνω μία απόφαση, κάνω βόλτα → noun refers to an event
      έχω αγωνία, κάνω κουράγιο → noun refers to a state
    • to make a decision, to pay a visit → noun refers to an event
      to have fear, to have courage → noun refers to a state
    • donner un conseil give advice → noun refers to an event
      avoir du courage to have courage → noun refers to a state
    • fare una domanda → noun refers to an event
      avere paura, avere coraggio → noun refers to a state
    • prowadzić rozmowy to lead conversations to lead negotiations → the noun refers to an event
      mieć rację to have right to be right → the noun refers to a state
    • fazer uma prece to make a prayer → noun refers to an event
      ter sintomas to have symptoms → noun refers to a state
    • a lua o decizie to make a decision, a face o vizită to pay a visit → noun refers to an event
      a avea curaj → noun refers to a state
    • imeti pogum to have courage → noun refers to a state
      sprejeti odločitev to make a decision → noun refers to an event
  3. The verb v is "light", i.e. it contributes to the meaning of the whole only by bearing tense and mood. It may be "light" either per se, or when used in the specific context of the noun. This implies that v's syntactic subject is n's semantic argument.

Many authors distinguish support verbs from light verbs; still others differentiate between true light verbs and vague action verbs. On the one hand, we take a narrower scope by ignoring aspectual or causative support verbs, since they do contribute an additional (change of state) meaning to the expression. For instance, for the predicative noun walk, we will consider the light verb to have, but not the aspectual verbs to start, to pursue, to stop a walk. For the noun bloom, which is in itself inchoative, we do consider come into bloom as an LVC (both the verb and the noun are inchoative, so the verb does not add any semantics to the noun). In the same vein, we do not consider constructions with causative support verbs as LVCs (as in give a headache compared to have a headache).

On the other hand, we do include cases in which the verb per se has light semantics (it only bears the tense and mood in any case), and which hence cannot be described as "bleached", as is usually said of support verbs. For instance, whereas to pay does not have its usual meaning in to pay a visit, it cannot really be said that commit does not have one of its meanings in commit a crime (note that commit can be used with any negatively charged achievement noun, e.g. suicide, crime, fraud, felony...). These are borderline cases in that they do not fulfill tests 1 to 5, but we take them as LVCs.

The noun n functions as a regular syntactic dependent, so LVCs exhibit regular syntactic variants.

  • взема решение → решението, което президентът взе the decision that the president made
  • eine Entscheidung treffen → die Entscheidung die der Direktor zu treffen hatte.
  • παίρνω μία απόφαση → η απόφαση που πρέπει κάποιος να πάρει.
  • make a decision → the decision that the director has to make.
  • prendre une décision → la décision prise par la directrice.
  • prendere una decisione → la decisione che il direttore ha dovuto prendere.
  • wziąć udział to take participation.ACC to take part → wzięcie udziału taking.GER participation.GEN taking part, biorący udział taking.PART participation.ACC taking part
  • tomar banho take shower → o banho que eu tomei estava bom the shower which I took was good
  • a lua o decizie to make a decision → decizia pe care directorul trebuie să o ia the decision that the director has to make.
  • sprejeti odločitev to make a decision → odločitev, ki jo je moral sprejeti direktor the decision that the director has to make

In many cases of LVCs, it can be said that there is some degree of selection of the verb by the noun.

  • вземам решение to make a decision vs *вземам отговорност to take responsibility
    имам право to be right vs *притежавам право
  • eine Entscheidung treffen a decision meet make a decision vs. *eine Entscheidung machen a decision make vs. *einen Beschluss treffen a resolution meet
  • κάνω διάλειμμα vs. *παίρνω διάλειμμα
  • have a walk vs *have a race
    run a race vs *run a walk
  • faire une marche make a walk take a walk vs *procéder à une promenade perform a walk but faire/procéder à une enquête make/perform an inquiry
  • prendere una decisione take a decision make a decision vs. *fare una decisione make a decision vs. *prendere una conclusione take a conclusion
  • wziąć udział to take participation vs. *wziąć uczestnictwo
    mieć rację to have right to be right vs. *posiadać rację to possess right
  • fazer uma prece to make a prayer vs. *dar uma prece to give a prayer but fazer/dar uma caminhada to make/give a walk
  • a da divorț to give divorce to divorce vs. *a oferi divorț
  • postaviti vprašanje to put a question to pose a question vs *postaviti odgovor

Yet some regularities exist. For example, large classes of nouns function with have (e.g. +property) or commit (+negative achievement). Therefore, we chose not to retain the selection of the verb as a criterion for LVC categorization. Instead, the following decision tree should be applied to decide whether a candidate should be annotated as an LVC.

LVC-specific decision tree:

In this tree, a single NO to one of the tests is sufficient to decide that a candidate is not an LVC.
  • Apply test 9 - [N-EVENT: The noun describes an event/state?]
    • NO: It is not an LVC, exit
    • YES: Apply test 10 - [N-SEM: The noun keeps its usual sense?]
      • NO: It is not an LVC, exit
      • YES: Apply test 11 - [V-LIGHT: The verb adds zero semantics?]
        • NO: It is not an LVC, exit
        • YES: Apply test 12 - [V-REDUC: Subj+v+n transformable to subj's n?]
          • NO: It is not an LVC, exit
          • YES: Apply test 13 - [N-PROHIB-ARG: Noun prohibits a regular argument?]
            • NO: It is not an LVC, exit
            • YES: It is an LVC, exit
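The decision tree above is a sequence of five tests in which the first NO ends the procedure. A minimal sketch follows, assuming the annotator supplies the YES/NO answers as booleans. The function name and the dictionary format are hypothetical, and the answers in the usage example are illustrative rather than official annotations.

```python
# Tests in the order of the decision tree: (number, name, question).
LVC_TESTS = [
    (9, "N-EVENT", "The noun describes an event/state?"),
    (10, "N-SEM", "The noun keeps its usual sense?"),
    (11, "V-LIGHT", "The verb adds zero semantics?"),
    (12, "V-REDUC", "Subj+v+n transformable to subj's n?"),
    (13, "N-PROHIB-ARG", "Noun prohibits a regular argument?"),
]


def lvc_decision(answers):
    """Walk the tree; `answers` maps a test name to the annotator's True/False.

    Returns (is_lvc, failed_test), where failed_test is the name of the
    first test answered NO, or None if every test was answered YES.
    Answers for tests after the first NO are never looked up.
    """
    for _number, name, _question in LVC_TESTS:
        if not answers[name]:
            return False, name  # a single NO is enough: not an LVC, exit
    return True, None


# Illustrative answers: all five tests answered YES.
all_yes = {name: True for _number, name, _question in LVC_TESTS}
# "to make a cake": the noun is a physical entity, so test 9 is answered NO.
make_a_cake = {"N-EVENT": False}

print(lvc_decision(all_yes))
print(lvc_decision(make_a_cake))
```

Because the walk stops at the first NO, an annotator only needs to answer the tests up to and including the one that fails.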

Test 9 - [N-EVENT] Noun denoting an event/state

Does the noun n refer to an event or state (including permanent or non permanent properties, relations) with at least one semantic argument?

  • continue to next test
    • поставям акцент to emphasize → event, with two arguments: the agent and the object being emphasized
      имам право → property, with one semantic argument: the possessor of the property
    • einen Besuch abstatten to pay a visit → event, with two arguments: the visitor and the visitee
      Angst haben to have fear → property with one semantic argument: the entity having fear
      einen Blick auf etwas werfen a glance at sth. throw to take a glance at sth → an event with two arguments: the entity glancing and the entity glanced at
    • κάνω μία επίσκεψη to-make a visit pay a visit, visit → event, with two arguments: the visitor and the visitee
      έχω τη δυνατότητα to-have the ability to be able → property, with one core semantic argument: the entity having the ability
      έχω μίσος → state, with two arguments: the entity being in state hate and the entity hated
    • pay a visit → event, with two arguments: the visitor and the visitee
      have strength → property, with one semantic argument: the entity having strength
      take a glance at something → event, with two arguments: the entity glancing and the entity glanced at
    • avoir du courage to have courage → state (property), with one argument: the entity having courage
    • fare una visita → event, with two arguments: the visitor and the visitee
      avere forza → property, with one semantic argument: the entity having strength
      dare uno sguardo a qualcosa → event, with two arguments: the entity glancing and the entity glanced at
    • złożyć wizytę to submit a visit to pay a visit → event, with two arguments: the visitor and the visitee
      złożyć skargę to submit a complaint to make a complaint → event, with two arguments: the complaining person and the one he/she complains about
    • ter fome to have hunger to be hungry → property, with one argument: the entity that is hungry
      ter idade para fazer algo to have age (to do something) to be old enough (to do something) → state, with one argument: the entity that is old enough
      We include as states and events (predicative nouns) the following classes: diseases (gripe, trombose, infarto), physical sensations (fome, sede, sono), emotions (medo, paixão, nojo), cognitive entities (ideia, opinião, preocupação), characteristics (coragem, teimosia, fraqueza), relations (contato, conflito, amizade) and communication (conversa, discussão, briga).
    • a face o vizită to make a visit to pay a visit → event, with one argument: the entity that visits
      a avea curaj to have courage → property, with one semantic argument: the entity having courage
    • imeti predavanje to give a lecture → event, with two arguments: a lecturer and the people who are attending the lecture
  • it is not an LVC
    • Иван хвърли боклука Ivan threw out the garbage → physical entity (not event/state)
    • Joe macht einen Kuchen → physical entity (not event/state), even though Joe could be considered a semantic argument
    • Joe makes a cake → physical entity (not event/state), even though Joe could be considered its semantic argument
      Joe experienced a tornado → event, but has no semantic argument
    • Anna a un vélo → noun not an event, nor a state
    • Joe fa un dolce → physical entity (not event/state), even though Joe could be considered its semantic argument
      Joe ha vissuto un tornado → event, but has no semantic argument
    • złożyć kartkę to fold a sheet → physical entity (not event/state)
      bić pianę to beat foam to exaggerate about a problem → physical entity (not event/state)
    • quebrar a cabeça to break one's head to rack one's brain → physical entity (not event/state)
      We exclude from the test abstract nouns representing informational content that does not require agents (informações, notícias) and natural phenomena (chuva, neve, tornado).
    • Joe a făcut o prăjitură Joe made a cake → physical entity (not event/state), even though Joe could be considered its semantic argument
    • Janez ima avto → the person that has a car could be considered as a semantic argument, but the car is not an event or a state

Test 10 - [N-SEM] Noun keeping its sense

Is the noun n used in one of its original senses?

  • continue to next test
    • поемам отговорност to take responsibility → the noun is literally understood
    • einen Besuch abstatten → noun is literally understood
    • pay a visit → noun is literally understood
    • rendre visite → noun is literally understood
    • fare una visita → noun is literally understood
    • podjąć decyzję to take a decision → the noun decyzja is literally understood
      pobić rekord to beat a record to break a record → the noun rekord is literally understood
    • tomar banho → noun is literally understood
    • a face o vizită to make a visit to pay a visit → noun is literally understood
    • imeti pogum to have courage → noun is literally understood
  • it is not an LVC
    • яхвам метлата to get on the broom to get very angry → noun is not literally understood (it's an ID)
    • Herzklopfen haben heartbeating have to be in love → noun is not used in one of its normal senses (it's an ID)
    • have kittens to be worried or angry → noun is not used in one of its normal senses (it's an ID)
    • jeter l'éponge to give up → noun is not used in one of its normal senses (it's an ID)
    • avere il batticuore have the heartbeating to be in love → noun is not used in one of its normal senses (it's an ID)
    • nadstawiać karku to expose neck.GEN to take personal risks → the noun kark neck is not literally understood
    • quebrar um galho to break a branch to do a favor → noun is not used in one of its normal senses (it's an ID)
    • a face față to make face to succeed → noun is not used in one of its normal senses (it's an ID)
    • imeti krompir to have potatoes to be lucky → noun is not used in one of its normal senses (it's an ID)

Test 11 - [V-LIGHT] Verb with light/void semantics

Does v only bear tense and mood, and add no semantics that is not already present in n, other than indicating which semantic role v's subject plays with respect to n's predicate?

  • continue to next test
    • вземам решение make a decision вземам adds no meaning to решение decision besides that of performing an act
      държа реч to make a speech държа adds no meaning to реч besides that of performing an act
      поемам отговорност to take responsibility поемам adds no meaning to отговорност besides that of having a property
    • eine Entscheidung treffen a decision meet to make a decision treffen adds no meaning to Entscheidung besides that of performing an activity
      Angst haben to have fear haben adds no meaning to Angst besides that of having a property.
    • take a walk take adds no meaning to walk besides that of performing an activity
      make a decision make adds no meaning to decision besides that of performing an activity
      have fear have adds no meaning to fear besides that of having a property
      perform a check perform is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
      commit a crime commit is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
      pay a visit → the verb in its usual sense means 'to spend some money on a visit', but here it is not used in this sense and does not add any semantics to the "visiting" event
      deliver a speech → the verb in its usual sense means 'to move from one place to another', but here it is not used in this sense and does not add any semantics to the "speech" event
      undergo a surgery undergo adds no meaning to surgery besides indicating that the subject is the patient of the surgery
    • ils ont du courage they have some courage have adds no meaning to courage besides that of having a property
      ils reçoivent l’ordre de partir they receive the order of leaving they are ordered to leave receive adds no meaning to order besides indicating that the subject is the recipient of the order
      il a subi une intervention chirurgicale he has undergone an intervention surgery he underwent surgery undergo adds no meaning to surgery besides indicating that the subject is the patient of the surgery
    • fare una passeggiata fare adds no meaning to passeggiata besides that of performing an activity
      prendere una decisione prendere adds no meaning to decisione besides that of performing an activity
      avere paura avere adds no meaning to paura besides that of having a property
      eseguire un controllo eseguire is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
      commettere un crimine commettere is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
      fare una visita → the verb in its usual sense means 'to make', but here it is not used in this sense and does not add any semantics to the "visiting" event
      fare un discorso → the verb in its usual sense means 'to make', but here it is not used in this sense and does not add any semantics to the "speech" event
    • oddać hołd to give-back tribute to pay tribute oddać give-back adds no meaning to hołd tribute besides that of performing an activity
      wystąpić z wnioskiem to stand out with a proposal to put forward a motion wystąpić z stand out with adds no meaning to wniosek motion besides that of performing an activity
    • mover uma ação judicial to move a lawsuit to sue to move adds no meaning to lawsuit besides that of performing an activity
      apresentar uma lesão present a lesion to have a lesion to present adds no meaning to lesion besides that of having a property
      estar com medo be with fear to be afraid to be with adds no meaning to fear besides that of being in a state
    • a avea curaj to have courage avea adds no meaning to curaj besides that of having a property
      a lua o decizie to make a decision lua adds no meaning to decizie besides that of performing an activity
    • imeti pogum to have courage imeti have adds no meaning to pogum courage besides that of having a property
      sprejeti odločitev to make a decision sprejeti in its usual sense means 'to receive', but here it is not used in this sense and does not add any semantics to the event
  • it is not an LVC
    • започвам играта start the game, start playing започвам start adds an aspectual meaning to the noun
    • eine Rede beginnen to begin a speech beginnen adds an aspectual meaning to the noun Rede
    • to start a walk start adds an aspectual meaning to the noun
    • donner du courage to give courage
      donner son avis to give one's opinion donner adds the information that the opinion is communicated
    • cominciare un ballo to start a dance cominciare adds an aspectual meaning to the noun ballo
    • wymierzyć sprawiedliwość to measure justice to do justice wymierzyć measure adds an aspectual meaning to sprawiedliwość justice
      przejść na emeryturę to cross to retirement to take retirement przejść adds an inchoative (change-of-state) meaning to the noun
    • entrar com uma ação judicial to enter with a lawsuit to file a lawsuit to enter adds an aspectual meaning to the noun
      enfrentar uma ação judicial to face a lawsuit enfrentar introduces the patient of a lawsuit, not the agent
      dar uma opinião to give an opinion to give adds the meaning of communication which is not present in the noun itself (one can ter uma opinião to have an opinion without communicating it).
    • a începe munca to start work the to start working începe adds an aspectual meaning to the noun

Note that this light semantics of the verb is either usual for that verb (i.e. the verb is a pure syntactic operator, like commit, perform), or arises in the context of the particular noun (e.g. for pay in to pay a visit).

Test 12 - [V-REDUC] - Verb reduction

Can an NP in which v's subject becomes n's dependent evoke the same event or state as the candidate construction does?

  • continue to next test
    • Иван пое отговорност Ivan took responsibility отговорността на Иван — both refer to the same property/event
      Иван взе решение Ivan made a decision решението на Иван — both refer to the same property/event
    • Paul hat eine Rede gehalten Paul has given a speech Paul's speech both refer to the same speech event
      Ich habe ihm einen Besuch abgestattet I have paid him a visit mein Besuch my visit both refer to the same visiting event
    • Paul had a walk Paul's walk — both refer to the same walking event
      I paid him a visit my visit — both refer to the same visiting event
    • Paul a fait une enquête Paul made an inquiry L'enquête de Paul Paul's inquiry
      Paul procède à une perquisition Paul makes a search La perquisition de/par Paul the search of/by Paul
      Le général donne l'ordre de partir The general gives the order to leave The general orders to leave l'ordre du général de partir The general's order to leave
      Les soldats reçoivent l'ordre de partir The soldiers receive the order to leave The soldiers are ordered to leave l'ordre aux soldats de partir The order to the soldiers to leave
      Jean souffre de troubles psychiques John suffers from psychic troubles Les troubles psychiques de Jean John's psychic troubles
      Jean présente une hypersensibilité John presents a hypersensibility John has a hypersensibility l'hypersensibilité de Jean John's hypersensibility
      Paul reçoit des menaces de (la part de) Pierre Paul receives threats from (the part of) Peter Paul is threatened by Peter les menaces de Pierre à Paul Peter's threats to Paul
      Ce médicament présente un risque This medicine presents a risk This medicine poses a risk le risque de ce médicament this medicine's risk
    • Paolo ha fatto una conquista Paul made a conquest la conquista di Paolo
      Il generale dà l'ordine di partire. The general gives the order to leave The general orders to leave L'ordine di/da parte del generale di partire
      Paolo riceve delle minacce da (parte di) Piero le minacce di Piero a Paolo
    • Obecni oddali hołd poległym The present gave-back tribute to the fallen The audience paid tribute to the fallen hołd obecnych the tribute of the audience
      Jan miał na myśli Marię Jan had on thought Maria Jan meant Maria myśl Jana Jan's thought
      Jan otrzymał wymówienie Jan received a dismissal wymówienie dla Jana dismissal for Jan
      Inwestycja przynosi zyski the investment brings profit zyski z inwestycji profit from the investment
    • João cometeu um deslize o deslize do João — both refer to the same event
      O jogador cobrou um pênalti the player charged a penalty kick the player took a penalty kick o pênalti do jogador the player's penalty kick — both refer to the same event
      João tem consciência do perigo John has conscience of the danger John is aware of the danger a consciência do João sobre o perigo John's awareness of the danger — both refer to the same state
      João recebeu a remuneração John received the remuneration a remuneração do João John's remuneration — both refer to the same event
      O paciente recebeu a visita dos familiares The patient received the visit of the relatives a visita dos familiares ao paciente the visit of the relatives to the patient — both refer to the same event
      João apresenta lesões John presents lesions as lesões do João John's lesions — both refer to the same state
    • Paul a făcut o plimbare Paul had a walk plimbarea lui Paul Paul's walk — both refer to the same walking event
      i-am făcut o vizită I paid him a visit vizita mea — both refer to the same visiting event
    • Zdravnik je postavil diagnozo The doctor made a diagnosis njegova diagnoza His diagnosis both refer to the same event
  • it is not an LVC
    • Иван хвърли поглед на вестника Ivan threw a glance at the newspaper #погледът на Иван върху вестника — different semantics; and requires a different preposition
    • Paul hat einen guten Eindruck gemacht Paul has made a good impression #Paul's Eindruck auf seine Freunde Paul's impression on his friends has a different semantics
    • Paul made a good impression #Paul's impression on his wife — different semantics
    • Son comportement porte une atteinte grave à l'honneur des soldats His behaviour seriously jeopardises the soldiers' honour #l'atteinte de son comportement the jeopardy of his behaviour
      Ce fait attire l'attention de la justice This fact attracts the attention of the justice ?l'attention de la justice pour/sur ce fait the attention of the justice on/about this fact
    • Michael Phelps pobił rekord sprzed 2 tysięcy lat Michael Phelps broke the record from 2 thousand years ago → #Michael Phelps' record
      Ulica nosi imię sławnego poety The street carries the forename of a famous poet The street carries the name of a famous poet. imię ulicy the forename of the street
    • O jogador cobrou uma falta the player charged a foul the player took a free kick a falta do jogador the player's foul — the focus changes from taking a free kick to being one of the parts involved in a foul (it's an ID)
      O jogador provocou uma lesão the player provoked a lesion a lesão do jogador the player's lesion — In the reduced NP, the focus changes from hurting somebody else to getting hurt
      O músico apresenta suas composições the musician presents his compositions as composições do músico the musician's compositions — the reduced NP does not keep the sense of presenting; it does not refer to the same event as the verbal construction
    • Paul a făcut o impresie bună Paul made a good impression #Impresia lui Paul despre soția sa Paul's impression on his wife — different semantics

Test 13 - [N-PROHIB-ARG] Noun's prohibited argument

Does the noun n, in the presence of v, prohibit at least one syntactic argument a which it normally licenses in the absence of v (except when a is in the whole–part relation with v's subject)? An alternative formulation of this test is the following: let s be the subject of v, and let r be the semantic role that s plays with respect to the noun n. Is it prohibited for r to be realized both by s and by a syntactic argument a of n, except when a is in the whole–part relation with s? The rationale for this test is that a semantic argument of n cannot be realized as n's syntactic dependent, since it is already realized as v's syntactic dependent instead (usually as v's subject). For instance, the noun visit takes two semantic arguments, the visitor and the visited entity, as in "the visit of the Queen to the Prime Minister". When used in to pay a visit, the visitor semantic argument is realized as the subject of to pay (The Queen paid a visit to the Prime Minister), and cannot be realized at the same time within the NP headed by visit (*The Queen paid a visit of the Lady to the Prime Minister).

  • it is an LVC
    • Петър Стоянов взе решението да подпише договора Petar Stoyanov made the decision to sign the contract + решението на президента да подпише договора *Петър Стоянов взе решението на президента да подпише договора — the noun cannot be modified by the person performing the act/event (which is the subject)
    • Die Königin hat dem Premierminister einen Besuch abgestattet the Queen has paid a visit to the Prime Minister + ein Besuch der Dame beim Premierminister a visit of the Lady to the Prime Minister *Die Königin hat einen Besuch der Dame beim Premierminister abgestattet *The Queen paid a visit of the Lady to the Prime Minister — the visitor cannot be a modifier of visit
      Paul hat eine Entscheidung über das Budget getroffen Paul made a decision on the budget + die Entscheidung des Rates über das Budget the council's decision on the budget *Paul traf die Entscheidung des Rates über das Budget *Paul made the council's decision on the budget — the decision maker cannot modify decision
    • The Queen paid a visit to the Prime Minister + a visit of the Lady to the Prime Minister *The Queen paid a visit of the Lady to the Prime Minister — the visitor cannot be a modifier of visit
      Paul made a decision on the budget + the committee's decision on the budget *Paul made the committee's decision on the budget — the decision maker cannot modify decision
      Paul leads the discussion + Peter's discussion *Paul leads Peter's discussion but Paul leads the discussion of the committee — the discussing entity can only modify discussion when the subject Paul is part of the committee
      Bjarnson scored a goal + Arnason's goal *Bjarnson scored Arnason's goal but Bjarnson scored the goal of Iceland — the scoring entity can only modify goal in the last case, when they are part of the Iceland team
    • La ministre a rendu une visite aux victimes + la visite de la ministre aux victimes *La ministre a rendu une visite du président aux victimes — the visitor cannot be a modifier of visite
      Bjarnson a marqué un but + le but d'Arnason *Bjarnson a marqué le but d'Arnason but Bjarnson a marqué le but de l'Islande — the scoring entity can only modify but (goal) in the last case, when they are part of the Iceland team
    • Il primo ministro ha preso la decisione di dimettersi the Prime Minister decided to resign + le dimissioni del governo the resignation of the government *Il primo ministro ha preso la decisione del governo di dimettersi — the resigner cannot be a modifier of resignation
    • Paweł złożył rezygnację ze stanowiska dyrektora Paweł submitted a resignation from the position of the director Paweł tendered his resignation from the director position + rezygnacja Piotra *Paweł złożył rezygnację Piotra ze stanowiska dyrektora Paweł tendered Piotr's resignation from the director position - the resignation cannot be modified by the resigning person
      Paweł prowadzi rozmowy *Paweł prowadzi rozmowy Piotra Paweł leads Piotr's talks, Paweł prowadzi rozmowy komisji Paweł leads the talks of the commission - the discussing entity komisja commission can only modify rozmowy talks if Paweł belongs to the commission.
      Jan otrzymał wymówienie Jan received a dismissal + wymówienie dla Pawła dismissal for Paweł *Jan otrzymał wymówienie dla Piotra
    • João está tomando banho John is taking shower + o banho do Pedro Pedro's shower *João está tomando o banho do Pedro — the bath cannot be modified by a bath taker
      Pedro sofreu prejuízo com a compra Pedro suffered financial loss with the purchase + o prejuízo do José José's financial loss *Pedro sofreu o prejuízo do José com a compra — the financial loss cannot be modified by the affected entity
      A Maria fez um aborto Maria made an abortion + o aborto da Joana Joana's abortion #A Maria fez o aborto da Joana — the noun cannot be modified by another patient
      O médico realizou o parto com sucesso The doctor performed the childbirth with success + o parto do Dr. Pedro Dr. Pedro's childbirth *O médico realizou o parto do Dr. Pedro com sucesso — the childbirth could be modified by the mother (patient) but not by another doctor (agent).
    • Paul a dat sfaturi surorii sale Paul gave advice to his sister + sfatul lui Petre Peter's advice *Paul a dat sfatul lui Petre surorii sale Paul gave Peter's advice to his sister - sfatul the advice cannot be modified by its author
    • Učiteljica je sprejela odločitev v zvezi z nalogo The teacher made a decision regarding the homework + dijakova odločitev v zvezi z nalogo pupil's decision regarding the homework *učiteljica je sprejela dijakovo odločitev v zvezi z nalogo — the decision maker cannot modify decision
  • it is not an LVC
    • Иван предаде решението на сестра си Ivan transmitted the decision to his sister + решението на комисията Иван предаде решението на комисията на сестра си Ivan transmitted the decision of the commission to his sister — the noun can be modified by the person performing the act/event (which can be different from the subject)
    • Paul hat seiner Schwester die Entscheidung überbracht Paul has transmitted the decision to his sister + Peter's Entscheidung Peter's decision Paul hat seiner Schwester Peter's Entscheidung überbracht Paul transmitted Peter's decision to his sister — the decision can be modified by its author
    • Paul transmitted the advice to his sister + Peter's advice Paul transmitted Peter's advice to his sister — the advice can be complemented by its author
    • Paul a transmis l'ordre aux soldats Paul transmitted the order to the soldiers + l'ordre de Paul aux soldats Paul's order to the soldiers Paul a transmis l'ordre du général aux soldats Paul transmitted the general's orders to the soldiers — l'ordre can have as a complement the person who gave it
    • Paweł podważył niedawną decyzję Paweł questioned the recent decision + decyzja Piotra Piotr's decision → Paweł podważył niedawną decyzję Piotra Paweł questioned Piotr's recent decision — the decision can be modified by the decision maker
    • Paulo tem notícias sobre o conflito Paul has news about the conflict + as notícias de Pedro Peter's news Paulo tem as notícias de Pedro sobre o conflito Paul has Peter's news about the conflict — the news can be complemented by its source
    • Paul a transmis sfatul surorii sale Paul transmitted the advice to his sister + sfatul lui Peter Peter's advice Paul a transmis sfatul lui Peter surorii sale Paul transmitted Peter's advice to his sister - sfatul the advice can be modified by its author
    • Janez mi je povedal mnenje o filmu Janez told me his opinion on the movie + Ninino mnenje o filmu Nina's opinion on the movie Janez mi je povedal Ninino mnenje o filmu Janez told me Nina's opinion on the movie – the opinion can be modified by the person who has this opinion
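
Tests 10 to 13 above form a chain: the first three either continue to the next test or rule the candidate out, and Test 13 gives the final answer. The Python sketch below is one possible way to record an annotator's answers and read off the outcome. The names LvcAnswers and lvc_outcome are hypothetical and not part of the guidelines, and the sketch assumes the candidate has already passed the earlier LVC tests.

```python
from dataclasses import dataclass


@dataclass
class LvcAnswers:
    """Hypothetical record of an annotator's yes/no answers for one candidate."""
    noun_keeps_sense: bool    # Test 10 [N-SEM]
    verb_is_light: bool       # Test 11 [V-LIGHT]
    verb_reducible: bool      # Test 12 [V-REDUC]
    noun_prohibits_arg: bool  # Test 13 [N-PROHIB-ARG]


def lvc_outcome(answers: LvcAnswers) -> str:
    """Apply Tests 10-13 in order; the first 'no' rules the candidate out."""
    checks = [
        ("N-SEM", answers.noun_keeps_sense),
        ("V-LIGHT", answers.verb_is_light),
        ("V-REDUC", answers.verb_reducible),
        ("N-PROHIB-ARG", answers.noun_prohibits_arg),
    ]
    for name, passed in checks:
        if not passed:
            return f"not an LVC (failed {name})"
    return "LVC"


# "pay a visit": every test is answered with yes.
pay_a_visit = lvc_outcome(LvcAnswers(True, True, True, True))
# "have kittens": the noun is not used in one of its original senses.
have_kittens = lvc_outcome(LvcAnswers(False, True, True, True))
```

A candidate that fails one of these tests is only excluded from the LVC category; it may still belong to another VMWE category such as ID.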

Section 6.3

Idioms (ID)

Idioms constitute a universal category. An idiom (ID) has at least two lexicalized components including a head verb and at least one of its arguments. The argument can be of different types. Here are some examples:

  • Subject
    • броят му се ребрата be counted someone's (possessive pronoun) ribs (someone) to be very thin and skinny
    • ein kleines Vöglein hat mir gezwitschert a little bird told me
    • a little bird told someone
    • tu hora ha llegado your time has arrived your time has come
    • licho wie devil knows I have no idea
    • a sua hora chegou your time has arrived your time has come
    • a șoptit o păsărică whispered a bird little a little bird told someone
    • srce mu je padlo v hlače His heart fell into his pants someone has lost courage
  • Direct object
    • гушна букета hug the bunch of flowers to die
    • er hat den Schuss nicht gehört he did the shoot not hear it takes him a long(er) time to understand sth
    • to kick the bucket
    • estirar la pata to stretch the leg kick the bucket
    • udać Greka to pretend to be a Greek to pretend not to understand
    • bater as botas to hit the boots to die, abrir mão de algo to open hand (of something) to give up (on something)
    • a arunca vina to throw guilt-the to blame
    • ustreliti kozla to shoot the goat to say or do something stupid
  • Circumstantial or adverbial complement
    • удрям в гръб hit in the back to stab in the back
      правя сам да си говори make (someone) to talk to himself to drive (someone) crazy
    • etwas wie warme Semmeln verkaufen sth. like warm bread rolls to sell sth. fast and easy
    • to take something with a pinch of salt, to sell like hotcakes, to strike while the iron is hot, to come off with flying colors
    • coger algo con pinzas to hang something with pegs take something with a pinch of salt
    • wiercić komuś dziurę w brzuchu to drill a hole in one's belly to intrusively solicit someone, to insist too much
    • levar em conta to bring in account to take into account
      ir ao ar go to the air to go on air
    • a lua în considerare to bring in account to take into account
    • spati kot ubit to sleep like dead to sleep soundly

It is often challenging to distinguish IDs from other VMWE categories if only one argument of the head verb is lexicalized. The VMWE categorisation depends on the category of this argument:

  • Noun or preposition governing a noun: fine-grained tests need to be applied in order to discriminate between an LVC and an ID. See the section on Structural tests.
  • Particle or reflexive pronoun: the VMWE is either a VPC (particle) or an IReflV (reflexive pronoun), never an ID.

With an argument of any other category, the VMWE is always an ID, including the following:

  • Preposition governing a complex noun phrase
    • удрям някого в гръб hit someone in the back to stab someone in the back
    • jmd springt im Dreieck s.o. jumps in the triangle s.o. can soon no more control his anger
    • to take something with a pinch of salt
    • coger algo con pinzas to hang something with pegs take something with a pinch of salt
    • dopiąć coś na ostatni guzik to button something up to the last button to complete something
    • bater na mesma tecla to hit the same key to insist on something
    • a da cu piciorul to give with leg-the to give up the chance
    • skrivati glavo v pesek to hide head in the sand to pretend not to see a problem
  • Adjectival phrase
    • schwarz fahren to drive black to take a ride without a ticket
    • to come clean, to stand firm
    • jugar sucio to play dirty to play dirty
    • zrobić swoje to do one's own to do what one is supposed to do
      tykać cudze to touch someone else's to take something that does not belong to you
      dopiąć swego to button up one's own to fulfill one's plans
    • jogar sujo to play dirty
    • a juca murdar to play dirty
    • biti zelen od zavisti to be green with envy
  • Verbal phrase
    • will sagen want to say that is to say
    • to make do
    • n.a.
    • laisser tomber to let fall to give up
    • dać komuś żyć to let someone live not to bother someone
      można wytrzymać one can stand the situation is reasonably good
    • querer dizer to want to say to mean
    • n.a.
    • n.a.
  • Relative clause
    • wissen wo es langgeht to know where things are heading to know on which side one's bread is buttered
    • to know on which side the bread is buttered
    • wiedzieć, skąd wieje wiatr to know where wind blows from to know on which side your bread is buttered, to know how to take advantage of the situation
    • saber onde pisar know where to-step to know the way to succeed in something
      mostrar com quantos paus se faz uma canoa show with how many sticks one makes a canoe to punish or take revenge
    • a ști cu ce se mănâncă to know with what CL.Refl. eats to know what it is about
    • vedeti koliko je ura to know what time it is to realize the truth
  • Non-reflexive pronoun
    • es gibt it gives there is
    • τα καταφέρνω, την πατάω
    • to make it
    • l'emporter to take it away to win
    • prender le to take it to be beaten
    • Polish does not seem to have this type of VMWEs
    • dá-lhe João! give to him/her, João! show them what you got, João!
    • a o șterge to her delete to fly the coop
      a o întinde to her extend to fly the coop synonymous expressions with the non-anaphoric feminine ACC personal clitic 'o' functioning as an expletive
    • ucvreti jo to escape her to escape something/someone by running

Sentential expressions with no open slots, such as proverbs and conventionalized sentences, are included in the scope of IDs.

  • Rom wurde nicht an einem Tag erbaut Rome was not built in a day
    wer A sagt muss auch B sagen who says A must also say B you must finish what you start
  • Rome was not built in a day
    Fortune favors the bold
    The pleasure is mine
    I beg your pardon!
  • trafiła kosa na kamień met the scythe a stone someone rude/dishonest came across someone else who used similar methods against him/her
  • quem vê cara não vê coração who sees face doesn't see heart a person can lie/omit his/her feelings
  • Urciorul nu merge de multe ori la apă Pitcher-the not goes of many times at water The pitcher goes so often to the well that it is broken at last
  • Počasi se daleč pride more haste less speed
    Po toči zvoniti je prepozno there is no use ringing the bells after hail it is too late

If more than one argument of the head verb is lexicalized, then the candidate VMWE is always classified as an ID.

  • die Katze aus dem Sack lassen to let the cat out of the bag
  • to let the cat out of the bag, to cut a long story short, to call it a day
  • se faire des idées to make SELF ideas to imagine something false, s'en aller to go SELF from there to leave, il y a it has there there is
  • chować głowę w piasek to hide head in sand to pretend not to see a problem
  • tapar o sol com a peneira to hide the sun with a sieve to sugar-coat
  • a da bir cu fugiții to give tribute with fugitives.the to back away
  • att sätta sig upp mot någon to sit oneself up against someone to defy someone
    att dra sitt strå till stacken to draw one's straw to stack.the to contribute (in a small way)
  • beseda mi je ostala v grlu word got stuck in my throat I am speechless
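
The categorisation rules stated in this section (the category of a single lexicalized argument, and the case of several lexicalized arguments) can be summarised as a small decision function. The Python sketch below is only an illustration: the function name categorize and the category labels such as "noun" or "prep+noun" are hypothetical, and the candidate is assumed to have already been identified as a VMWE.

```python
from typing import List

# Hypothetical labels for a lexicalized argument that is a noun or a
# preposition governing a noun.
NOUN_LIKE = {"noun", "prep+noun"}


def categorize(arg_categories: List[str]) -> str:
    """Map the categories of the head verb's lexicalized arguments to a VMWE category."""
    if not arg_categories:
        raise ValueError("a VMWE needs at least one lexicalized argument")
    if len(arg_categories) > 1:
        # More than one lexicalized argument: always an ID.
        return "ID"
    category = arg_categories[0]
    if category in NOUN_LIKE:
        # Fine-grained structural tests must decide between LVC and ID.
        return "LVC or ID (apply structural tests)"
    if category == "particle":
        return "VPC"
    if category == "reflexive pronoun":
        return "IReflV"
    # Any other category (adjectival phrase, relative clause, ...): ID.
    return "ID"


come_clean = categorize(["adjectival phrase"])       # to come clean
let_the_cat_out = categorize(["noun", "prep+noun"])  # to let the cat out of the bag
```

Note that the sketch only returns a category label; for a single noun-like argument the decision between LVC and ID is left to the structural tests.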

In case of several lexicalized arguments, special care must be taken to identify and also annotate embedded VMWEs.

  • einen Plan aufstellen to set up a plan to draw up a plan → contains the VPC aufstellen to set up
  • to let the cat out of the bag → contains the VPC to let out
  • se faire des idées to make SELF ideas to imagine something false → contains the non-VMWEs se faire and faire des idées
  • bać się własnego cienia to fear SELF one's own shadowto be very timid → contains the IReflV bać się to fear SELFto be afraid
  • virar-se nos trinta turn-RCLI in-the thirty to get by → contains the synonymous IReflV virar-se to get by ≠ virar to turn/become
  • a da cărțile pe față to give cards.the on face to reveal one's true intentions → contains the ID a da pe față to reveal
    a-și da arama pe față to give his/her copper.the on face to reveal his/her true (evil) nature → this is even more complicated since, besides the ID a da pe față, the IReflV has to be annotated as well - a three-level embedding
  • delati se norca iz koga to make RCLI fool of someone to make fun of someone → contains the IReflV delati se to make oneself to pretend
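
One way to picture embedded VMWEs is to store each annotation as a category plus the set of token positions it covers; an embedded VMWE is then an annotation whose positions are a strict subset of another's. The Python sketch below illustrates this for to let the cat out of the bag. The Annotation class, the embedded_pairs helper and the exact token positions are assumptions made for illustration, not the shared task's annotation format.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical VMWE annotation: a category and the token positions it covers."""
    category: str
    tokens: FrozenSet[int]


def embedded_pairs(annotations: List[Annotation]) -> List[Tuple[Annotation, Annotation]]:
    """Return (outer, inner) pairs where inner covers a strict subset of outer's tokens."""
    return [
        (outer, inner)
        for outer in annotations
        for inner in annotations
        if inner.tokens < outer.tokens  # strict subset test on frozensets
    ]


sentence = ["to", "let", "the", "cat", "out", "of", "the", "bag"]
# Illustrative positions: the ID covers "let the cat out of the bag",
# the embedded VPC covers "let" and "out".
idiom = Annotation("ID", frozenset({1, 2, 3, 4, 5, 6, 7}))
vpc = Annotation("VPC", frozenset({1, 4}))
pairs = embedded_pairs([idiom, vpc])
```

Because the inner annotation may be discontinuous (here positions 1 and 4), token sets are used rather than contiguous spans.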

Idioms whose head verb is the copula (to be) can pose special challenges because their complements may be (nominal, adjectival, etc.) MWEs themselves. In this task, we consider constructions with a copula to be VMWEs only if the complement does not retain the idiomatic meaning when used without the verb.

  • sei kein Frosch be no frog be no chicken → idiom because #kein Frosch no frog loses the meaning
  • to be no chicken → idiom because #no chicken loses the meaning
    to be somebody → idiom because #somebody loses the meaning
    it is double Dutch to me → non-VMWE because the copula can be omitted, as in he seems to speak double Dutch
  • Ser un pelota to be a ball to suck/butter up
  • być jedną nogą na tamtym świecie to be with one leg in the other world to be close to death → idiom because #jedna noga na tamtym świecie one leg in the other world loses the meaning
    być do rzeczy to be to the thing to be relevant → non-VMWE because the copula can be omitted, as in dał parę argumentów całkiem do rzeczy he gave a couple of quite relevant arguments
  • ser alguém na vida to be somebody in life to be somebody → idiom because #alguém na vida loses the meaning
    não ser flor que se cheire to not be a flower that one may smell to be an untrustworthy person → idiom because #flor que se cheire loses the meaning
    isso é grego pra mim that's greek to me → non-VMWE because the copula can be omitted, as in você está falando grego
  • a fi ușă de biserică to be door of church to be honest → idiom because #ușă de biserică loses the meaning
    a fi un papă-lapte to be a eat-milk to be a piker → non-VMWE because un papă-lapte preserves the meaning
  • biti trn v peti komu to be a thorn in somebody's heel to be a big problem, obstacle → idiom because #trn v peti loses the meaning

Note that special care must be taken in languages in which the copula omission is a regular or even a compulsory phenomenon (e.g. in Russian). In those cases, language-specific tests are required to distinguish a copula-based idiom from a non-verbal MWE.
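
The copula criterion above can be sketched as a one-line decision over an annotator's judgment. This is only an illustrative sketch: the class CopulaCandidate, its field names and the function is_copula_vmwe are hypothetical names introduced here, and the meaning judgment itself is assumed to come from a human annotator, not from the code.

```python
from dataclasses import dataclass


@dataclass
class CopulaCandidate:
    """Hypothetical record of one copula construction under review."""
    expression: str                 # the full construction, copula included
    complement: str                 # what remains once the copula is removed
    complement_keeps_meaning: bool  # annotator's judgment, not computed


def is_copula_vmwe(candidate: CopulaCandidate) -> bool:
    """Annotate as VMWE only if the complement alone loses the idiomatic meaning."""
    return not candidate.complement_keeps_meaning


# Judgments copied from the English examples in the list above.
examples = [
    CopulaCandidate("to be no chicken", "no chicken", False),
    CopulaCandidate("it is double Dutch to me", "double Dutch", True),
]

for ex in examples:
    label = "idiom (VMWE)" if is_copula_vmwe(ex) else "non-VMWE"
    print(f"{ex.expression}: {label}")
```

For languages with regular copula omission, this simple rule would presumably have to be replaced by the language-specific tests mentioned above.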

Idioms typically have both a literal and an idiomatic reading. Thus, they are closely connected to the phenomenon of metaphor (see also the section on VMWEs versus metaphors). This often makes them semantically totally non-compositional, i.e. none of their lexicalized components retains any of its original meanings. Some authors argue, though, that partial semantic compositionality can be obtained via decomposability, e.g. to spill the beans is compositional provided that to spill is paraphrased as to reveal and the beans as a secret.


Section 6.4

Inherently reflexive verbs (IReflV)

Reflexive clitics (RCLI) are clitic pronouns that refer to the subject of the verb, like oneself in English. They are very common in many languages and play several semantic roles depending on the context, as detailed below.

Reflexive verbs (REFLV), sometimes also called pronominal verbs, are formed by a full verb combined with a RCLI, although the clitic does not always have a reflexive meaning. REFLV can be categorized into different classes, some of which should be annotated as verbal MWEs.

Namely, we will only annotate a REFLV as an inherently reflexive verb (IReflV) when (a) it never occurs without the clitic, or (b) the REFLV and non-reflexive versions have clearly different senses or subcategorization frames. Inherently reflexive verbs constitute a quasi-universal category.

We start by listing the various categories of REFLV before providing tests to decide whether to annotate a given occurrence as IReflV.

  • Inherently reflexive ⇒ ANNOTATE as IReflV
    • The verb without the RCLI does not exist
      • stydět se to be ashamed, divit se to wonder
      • sich schämen to be ashamed, sich wundern to wonder
      • suicidarse to suicide, abstenerse to abstain
      • s'évanouir to faint, se suicider to suicide
      • suicidarsi to suicide, arrabbiarsi to get angry
      • dowiedzieć się to find out, bać się to be afraid
      • queixar-se to complain, abster-se to abstain
      • a se teme to be afraid with obligatory ACC reflexive clitic
        a își însuși to appropriate with obligatory DAT reflexive clitic
      • att försova sig to sleep in
        att gifta sig to get married
      • sramovati se to be ashamed, bati se to be afraid
    • The verb without the RCLI does exist, but has a very different meaning
      • sich enthalten ≠ enthalten to abstain ≠ to contain, sich (um etw.) handeln ≠ handeln to be ≠ to handle
      • recoger ≠ recogerse to gather ≠ to go home, empeñar ≠ empeñarse to pawn ≠ to insist
      • s'apercevoir ≠ apercevoir to realize ≠ to see, s'agir ≠ agir to be ≠ to act
      • riferire ≠ riferirsi to report, tell ≠ to refer
      • znajdować ≠ znajdować się to find ≠ to be, radzić ≠ radzić sobie to advise ≠ to manage
      • encontrar-se ≠ encontrar to be ≠ to meet, referir-se ≠ referir to concern ≠ to refer
      • a se îndura ≠ a îndura to have the heart ≠ to suffer
        a se face ≠ a face to become ≠ to make → even if it is inchoative (Dindelegan 2013: 79), a se face (= to become) is an IReflV (it passes test 15)
      • att känna sig ledsen/arg to feel sad/angry ≠ to touch
      • tikati se česa ≠ tikati to concern ≠ to refer, pobrati se ≠ pobrati to stand up ≠ to pick up
  • Reciprocal ⇒ NOT ANNOTATED
    • The RCLI has a sense of mutually:
      • líbat se to kiss each other, potkávat se to meet each other
      • sich küssen to kiss each other, sich treffen to meet each other
      • besarse to kiss each other, verse to see each other
      • s'embrasser to kiss each other, se rencontrer to meet each other
      • baciarsi to kiss each other
      • całować się to kiss each other, spotykać się to meet each other
      • cumprimentar-se to greet each other, ver-se to see each other
      • a se saluta to greet each other
      • poljubljati se to kiss each other, srečati se to meet each other
  • Reflexive ⇒ NOT ANNOTATED
    • The RCLI marks the reflexive or reciprocal construction, that is, the clitic plays the role of self in English
      • mýt se to wash oneself, drbat se to scratch oneself
      • sich waschen to wash oneself, sich kratzen to scratch oneself
      • mirarse to look at oneself, vestirse to dress oneself
      • se laver to wash oneself, se parler to talk to oneself
      • lavarsi to wash oneself, vestirsi to dress oneself
      • myć się to wash oneself, drapać się po głowie to scratch oneself on the head
      • apressar-se to hurry oneself, vestir-se to dress oneself
      • a se spăla to wash oneself
      • att tvätta sig to wash oneself
      • umivati se to wash oneself, praskati se to scratch oneself
  • Body part, also called possessive reflexive ⇒ NOT ANNOTATED
    • Specific type of reflexive use in which the direct object is a body part or, more generally, an inalienable part of the subject
      • mýt si nohy wash RCLI.DAT the feet wash one's feet
      • sich das Bein brechen RCLI the leg break break one's leg
      • rascarse el brazo scratch.RCLI the arm scratch one's arm
      • se gratter la tête RCLI scratch the head scratch one's head
      • grattarsi la testa RCLI scratch the head scratch one's head
      • myć sobie nogi wash RCLI.DAT the feet wash one's feet
      • impossible, uses possessive instead
      • a-şi rupe mâna RCLI.DAT break arm break one's arm
      • umivati noge wash RCLI.DAT the feet wash one's feet, zlomiti roko RCLI.DAT break arm break one's arm
  • Middle with preverbal subject, also called synthetic passive ⇒ NOT ANNOTATED
    • The clitic marks a regular syntactic alternation for transitive verbs. Just like in regular passive alternation, the direct object of the transitive version appears as the subject of the REFLV version, and thus the verb agrees with the subject.
    • Differently from inchoative (see below), the subject of the transitive version is absent in the REFLV version but it exists necessarily, though it is underspecified
      • die Häuser verkaufen sich gut the houses sell RCLI well the houses sell well
      • las casas se venden bien the houses RCLI sell well the houses sell well
      • les pots se vendent bien the pots RCLI sell well the pots sell well
      • le case si affittano the houses RCLI rent the houses are rented
      • domy dobrze się sprzedają houses sell.PL RCLI well houses sell well
      • as casas se vendem bem the houses RCLI sell well the houses sell well
      • casele se vând bine houses-the RCLI sell well houses sell well
      • hiše se dobro prodajajo the houses sell RCLI well the houses sell well
  • Middle with postverbal subject, also called synthetic passive ⇒ NOT ANNOTATED
    • In some languages, middle alternation with preverbal subject sounds unnatural and middle alternation with postverbal subject is preferred. Depending on the languages, it is viewed as a postverbal subject (ES, PL, PT, RO) or as an object which agrees with the unaccusative verb form (IT). Middle alternation with postverbal subject is impossible in FR and DE.
      • se alquilan casas RCLI rent houses people rent houses
      • si affittano case RCLI rent houses people rent houses
      • dobrze sprzedają się te domy well sell RCLI these houses these houses sell well → Polish is a relatively free word-order language and a postverbal subject is a regular (even if stylistically marked) alternation.
      • alugam-se casas rent-RCLI houses people rent houses
      • se vând bine apartamentele din blocurile noi RCLI sell well apartments-the from blocks-the new Apartments from new blocks sell well
        se construiesc locuințe noi RCLI built houses new new houses are built
      • nove hiše se gradijo new houses RCLI built new houses are built
  • Impersonal ⇒ NOT ANNOTATED
    • The RCLI marks an impersonal verb alternation possible for various transitivity classes, depending on the language: only transitive verbs (FR), only intransitive verbs with manner adjuncts (DE), preferably intransitive but tolerated for transitive verbs (PT), either transitive or intransitive verbs (IT, ES, RO, PL)
    • There is no noun phrase before the verb (empty subject slot), the presence of the RCLI indicates a verb interpreted with a generic and underspecified subject
    • The verb is in third person singular, even when the object is plural
      • hier tanzt es sich gut here dances it RCLI well people dance well here
      • se busca a actores RCLI searches to actors people look for actors
        se trabaja mejor aquí RCLI works better here people work better here
      • il se dit des bêtises it RCLI says silly things people say silly things
      • si lavora troppo RCLI works too much people work too much
        si affitta molte case RCLI rents many houses people rent many houses
      • za dużo się pracuje too much RCLI works people work too much
        bzdury się opowiada nonsense RCLI tells people tell nonsense
      • dorme-se muito sleeps-RCLI much people sleep a lot
        conta-se histórias tells-RCLI stories people tell stories → transitive impersonal is considered wrong by traditional grammar but it is found in corpora.
      • se lucrează până târziu RCLI works until late people work until late → transitive verbs can be impersonal in RO only when they are null-object verbs (se lucrează până târziu - *este lucrat până târziu) or when their subject is realized by a clause headed by a complementizer (Dindelegan 2013: 174)
        se suferă din cauza sărăciei RCLI suffer because of poverty one suffers because of poverty → RO impersonal reflexive verbs are mostly intransitive (Dindelegan 2013: 173)
        se aleargă dimineața RCLI run in the morning people run in the morning
      • govori se/govorijo se neumnosti it says/they say RCLI silly things people say silly things
  • Inchoative ⇒ NOT ANNOTATED
    • Similar to middle, but the RCLI marks a less productive syntactic alternation:
      • the direct object of the transitive version appears as subject of the REFLV
      • the subject of the transitive version is not only absent, it is also semantically unclear or nonexistent
        • dveře se otvírají the door opens
        • die Tür öffnet sich the door opens
        • la puerta se abrió the door opened
        • la porte s'est subitement ouverte the door suddenly opened
        • la porta si apre the door opens
        • drzwi się otwierają the door opens
        • o vaso se quebrou the vase broke
        • mașina s-a stricat the car broke down
          ușa s-a deschis the door opened
        • dörren öppnar sig the door opens
        • vrata se odpirajo the door opens
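
The category list above boils down to a simple lookup: only the inherently reflexive class is annotated. A minimal sketch follows, assuming the category of an occurrence has already been decided by the annotator; the names ReflvCategory and annotation_for are hypothetical and not part of any official tool.

```python
from enum import Enum


class ReflvCategory(Enum):
    """The REFLV classes listed above (the member names are our own)."""
    INHERENT = "inherently reflexive"
    RECIPROCAL = "reciprocal"
    REFLEXIVE = "reflexive"
    BODY_PART = "body part (possessive reflexive)"
    MIDDLE_PREVERBAL = "middle with preverbal subject"
    MIDDLE_POSTVERBAL = "middle with postverbal subject"
    IMPERSONAL = "impersonal"
    INCHOATIVE = "inchoative"


def annotation_for(category: ReflvCategory) -> str:
    """Return the annotation decision attached to each category above."""
    if category is ReflvCategory.INHERENT:
        return "IReflV"
    return "NOT ANNOTATED"


for category in ReflvCategory:
    print(f"{category.value}: {annotation_for(category)}")
```

In practice the hard part is choosing the category for a given occurrence, which is what the tests below are for.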

IReflV-specific decision tree

  • Apply test 14 - [INHERENT]
    • YES ⇒ Annotate as IReflV
    • NO ⇒ Apply test 15 - [DIFF-SENSE]
      • YES ⇒ Annotate as IReflV
      • NO ⇒ Apply test 16 - [DIFF-SUBCAT]
        • YES ⇒ Annotate as IReflV
        • NO, and the verb has no subject ⇒ Apply test 17 - [IMPERS]
          • YES ⇒ It is not a VMWE, exit
          • NO ⇒ Annotate as IReflV
        • NO, and the verb has a subject ⇒ Apply test 18 - [MIDDLE-INCHO]
          • YES ⇒ It is not a VMWE, exit
          • NO ⇒ Apply test 19 - [REFL]
            • YES ⇒ It is not a VMWE, exit
            • NO, and the subject is SINGULAR ⇒ Apply test 20 - [REFL-MUTUAL]
              • YES ⇒ It is not a VMWE, exit
              • NO ⇒ Annotate as IReflV
            • NO, and the subject is PLURAL ⇒ Apply test 21 - [RECIPRO]
              • YES ⇒ It is not a VMWE, exit
              • NO ⇒ Annotate as IReflV
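
The decision tree can be sketched as a short function. This sketch assumes that, at each test, the first outcome listed corresponds to a YES answer and the second to a NO answer, which matches the order of outcomes in the individual tests below. The names classify_reflv and from_answers are hypothetical, and every answer is an annotator's judgment supplied from outside.

```python
from typing import Callable, Dict

IREFLV = "IReflV"
NOT_VMWE = "not a VMWE"


def classify_reflv(ask: Callable[[int], bool],
                   has_subject: bool,
                   subject_is_plural: bool = False) -> str:
    """Walk the tree; ask(n) returns True when test n is answered YES."""
    # Tests 14-16: a single YES is enough to annotate as IReflV.
    for test in (14, 15, 16):
        if ask(test):
            return IREFLV
    # No subject: only the impersonal test (17) remains.
    if not has_subject:
        return NOT_VMWE if ask(17) else IREFLV
    # With a subject: a YES on test 18 or 19 rules the VMWE out.
    for test in (18, 19):
        if ask(test):
            return NOT_VMWE
    # The last test depends on the number of the subject.
    last_test = 21 if subject_is_plural else 20
    return NOT_VMWE if ask(last_test) else IREFLV


def from_answers(answers: Dict[int, bool]) -> Callable[[int], bool]:
    """Turn a dict of recorded answers into ask(); unlisted tests count as NO."""
    return lambda test: answers.get(test, False)


# sich schämen: test 14 is answered YES
print(classify_reflv(from_answers({14: True}), has_subject=True))
# die Tür öffnet sich: test 18 is answered YES
print(classify_reflv(from_answers({18: True}), has_subject=True))
```

Because ask is only called when a test is actually reached, an annotator using such a helper would never be asked about tests that the tree skips.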

Test 14 - [INHERENT] Inherent clitic

Does the verb only exist with the RCLI and never occur without it?

  • annotate as IReflV
    • sich schämen ⇒ *schämen to be ashamed
      sich wundern ⇒ *wundern to wonder
    • suicidarse ⇒ *suicidar to suicide
      abstenerse ⇒ *abstener to abstain
    • s'évanouir ⇒ *évanouir to faint
      se suicider ⇒ *suicider to suicide
    • suicidarsi ⇒ *suicidare to suicide
    • dowiedzieć się ⇒ *dowiedzieć to find out
      bać się ⇒ *bać to be afraid
    • queixar-se ⇒ *queixar to complain
      abster-se ⇒ *abster to abstain
    • a se teme ⇒ *a teme to be afraid
      a își însuși ⇒ *a însuși to appropriate
    • sramovati se ⇒ *sramovati to be ashamed
      čuditi se ⇒ *čuditi to wonder
  • next test

Test 15 - [DIFF-SENSE] - Different sense

Given the same verb without the RCLI, are all of its meanings clearly different from the REFLV form?

  • annotate as IReflV
    • sich verstehen ≠ verstehen to get along well ≠ to understand
    • recogerse ≠ recoger to go home ≠ to pick up, to gather
    • s'apercevoir ≠ apercevoir to realize ≠ to see
      s'agir ≠ agir to be ≠ to act
    • riferirsi ≠ riferire to refer ≠ to report, to tell
    • znajdować się ≠ znajdować to be ≠ to find
    • encontrar-se ≠ encontrar to be ≠ to meet
      referir-se ≠ referir to concern ≠ to refer
    • a se îndura ≠ a îndura to have the heart to ≠ to suffer
    • razumeti se ≠ razumeti to get along well ≠ to understand
  • next test

Test 16 - [DIFF-SUBCAT] - Different subcategorization frame

Is the subcategorization frame of the simple verb without the RCLI different from the subcategorization frame of the REFLV, except for the addition of a direct or indirect object corresponding to the same argument as the RCLI in the REFLV version?

  • annotate as IReflV
    • X verliert sich in Y ⇔ X verliert Y X loses RCLI in Y ⇔ X loses Y
    • X se olvidó de Y ⇔ X olvidó Y X RCLI forgot of Y ⇔ X forgot Y
    • X se confesse de Y ⇔ X confesse Y (but *X confesse de Y) X RCLI confesses of Y ⇔ X confesses Y (but not *X confesses of Y)
      X se plaint de Z ⇒ *Y plaint (à) X de Z X RCLI complains of Z ⇒ *Y complains (to) X of Z → the verb without RCLI, plus direct or indirect object, does not subcategorize for the PP with preposition de
      X se refuse à Vinf ⇒ *Y refuse (à) X à Vinf X RCLI refuses to Vinf ⇒ *Y refuses (to) X to Vinf
    • X si è dimenticato di Y ⇔ X ha dimenticato Y X RCLI forgot of Y ⇔ X forgot Y
    • X tłumaczy się z Y ⇔ X tłumaczy Y X explains SELF of Y ⇔ X explains Y
    • X se esqueceu de Y ⇔ X esqueceu Y X RCLI forgot of Y ⇔ X forgot Y
    • X se gândeşte la Y ⇔ X gândeşte că Y X RCLI thinks of Y ⇔ X thinks that Y
  • next test

Test 17 - [IMPERS] - Impersonal

When you replace the RCLI by an underspecified subject such as one or people, does the sentence keep its meaning?

  • do not annotate as verbal MWE
    • hier tanzt es sich gut ⇔ hier tanzen die Leute gut people dance well here
    • se duerme mucho ⇔ las personas duermen mucho people sleep a lot
      se busca a actores ⇔ la gente busca a actores people look for actors
    • il se dit des bêtises ⇔ les personnes disent des bêtises people say silly things
    • si dorme molto ⇔ le persone dormono molto people sleep a lot
      si affitta molte case ⇔ le persone affittano molte case people rent many houses
    • pracuje się za dużo ⇔ ludzie pracują za dużo people work too much
      opowiada się bzdury ⇔ ludzie opowiadają bzdury people tell nonsense
    • dorme-se muito ⇔ as pessoas dormem muito people sleep a lot
      conta-se histórias ⇔ as pessoas contam histórias people tell stories
    • se lucrează până târziu ⇔ lumea lucrează până târziu people work until late
      se aleargă dimineața ⇔ lumea aleargă dimineața people run in the morning
    • govorijo se neumnosti ⇔ ljudje govorijo neumnosti people tell nonsense
  • annotate as IReflV

Test 18 - [MIDDLE-INCHO] - Middle or Inchoative

When you move the subject to the object position, remove the RCLI and add a generic subject (people, somebody), thus building a transitive version, does it imply the REFLV version? In other words, people/somebody V [to] X ⇒ X REFLV?

  • do not annotate as verbal MWE
    • man kann die Häuser gut verkaufen ⇒ die Häuser verkaufen sich gut people can sell the houses well ⇒ the houses sell well
      jemand öffnet die Tür ⇒ die Tür öffnet sich somebody opens the door ⇒ the door opens
    • la gente cuenta historias ⇒ se cuentan historias people tell stories ⇒ stories are told
      alguien abrió la puerta ⇒ la puerta se abrió somebody opened the door ⇒ the door opened
    • on vend bien ce produit ⇒ ce produit se vend bien people sell this product well ⇒ this product sells well
      quelqu'un ouvre la porte ⇒ la porte s'ouvre somebody opens the door ⇒ the door opens
    • qualcuno vende bene questo prodotto ⇒ questo prodotto si vende bene someone sells this product well ⇒ this product sells well
      qualcuno apre la porta ⇒ la porta si apre somebody opens the door ⇒ the door opens
    • ktoś sprzedaje te domy ⇒ te domy się sprzedają somebody sells these houses ⇒ these houses sell
      ktoś otwiera drzwi ⇒ drzwi się otwierają somebody opens the door ⇒ the door opens
    • alguém conta histórias ⇒ contam-se histórias somebody tells stories ⇒ stories are told
      alguém acalmou o menino ⇒ o menino se acalmou somebody calmed the boy down ⇒ the boy calmed down
    • cineva spune glume ⇒ se spun glume somebody tells jokes ⇒ jokes are told
      cineva a deschis ușa ⇒ ușa s-a deschis somebody opened the door ⇒ the door opened
    • nekdo pripoveduje šale ⇒ šale se pripovedujejo somebody tells jokes ⇒ jokes are told
      nekdo je odprl vrata ⇒ vrata so se odprla somebody opened the door ⇒ the door opened
  • next test

Test 19 - [REFL] - Reflexive

When you replace the RCLI by oneself only or to oneself only, does it imply the REFLV version? In other words, X V [to] himself only ⇒ X REFLV?

  • do not annotate as verbal MWE
    • Paul kratzt nur sich selbst ⇒ Paul kratzt sich Paul scratches himself
    • Pablo se lava a sí mismo ⇒ Pablo se lava Paul washes himself
    • Paul ne soigne que lui-même ⇒ Paul se soigne Paul heals himself
      Paul ne parle qu'à lui-même ⇒ Paul se parle Paul talks to himself
    • Paolo cura solo se stesso ⇒ Paolo si cura Paul heals himself
      Paolo parla solo a se stesso ⇒ Paolo si parla Paul talks to himself
    • Paweł leczy tylko siebie ⇒ Paweł leczy się Paul heals himself
    • Paulo só lava a si mesmo ⇒ Paulo se lava Paul washes himself
    • Paul se spală doar pe sine ⇒ Paul se spală Paul washes himself
    • Pavel praska samo sebe ⇒ Pavel se praska Paul scratches himself
  • next test

Test 20 - [REFL-MUTUAL] - Reflexive-mutual

Is a reciprocal version possible? Namely: Is it acceptable to replace the singular subject by a plural and add each other to the REFLV form without changing the REFLV's meaning?

  • do not annotate as verbal MWE. Note that this test applies only if test 15 has failed. For example, for French X se marie 'X gets married', it is odd though possible to say 'X and Y marry each other', but this does not mean 'X gets married', because it is only possible if X and Y are marriage officiants.
    • Paul wäscht sich ⇔ Sie waschen sich gegenseitig / einander they wash each other
    • Pablo se lava ⇔ ellos se lavan mutuamente / los unos a los otros they wash each other
    • Paul se lave ⇔ ils se lavent mutuellement / les uns les autres they wash each other
    • Paolo si lava ⇔ essi si lavano reciprocamente / l'un l'altro they wash each other
    • Paweł się myje ⇔ oni myją się nawzajem they wash each other
    • Paulo se lava ⇔ eles se lavam mutuamente / uns aos outros they wash each other
    • el se spală ⇔ ei se spală unul pe altul they wash each other
    • Pavel se umiva ⇔ umivajo drug drugega they wash each other
  • annotate as IReflV

Test 21 - [RECIPRO] - Reciprocal

Is it possible to remove the RCLI and replace the coordinated subject (A and B) or plural subject (A.PL) by a singular subject (A or A.PL) and a singular object, often introduced by to/with (B or A.PL), without changing the REFLV's meaning? That is:

  • Coordinated subject: A and B PronV ⇔ A V [to/with] B and B V [to/with] A?
  • Plural subject: A.PL PronV ⇔ A.PL V [to/with] A.PL?
  • do not annotate as verbal MWE
    • Paul und Anna umarmen sich ⇔ Paul umarmt Anna and Anna umarmt Paul Paul and Anna hug each other
      die Affen kratzen sich ⇔ die Affen kratzen die Affen the monkeys scratch each other
    • Pablo y Ana se abrazan ⇔ Pablo abraza a Ana and Ana abraza a Pablo Paul and Ann hug each other
      los niños se abrazan ⇔ los niños abrazan a los niños the children hug each other
    • Paul et Anne s'embrassent ⇔ Paul embrasse Anne and Anne embrasse Paul Paul and Ann kiss
      les jours se suivent ⇔ les jours suivent les jours the days follow each other
    • Giovanni e Anna si baciano ⇔ Giovanni bacia Anna and Anna bacia Giovanni John and Ann kiss
      i giorni si seguono ⇔ i giorni seguono i giorni / i giorni seguono l'un l'altro the days follow each other
    • Paweł i Elena się całują ⇔ Paweł całuje Elenę and Elenę całuje Paweł Paweł and Elena kiss
    • João e Ana se beijam ⇔ João beija Ana and Ana beija João John and Ann kiss
      os presos se agridem ⇔ os presos agridem os presos the prisoners aggress each other
    • Ion şi George se salută ⇔ Ion îl salută pe George and George îl salută pe Ion Ion and George greet each other
      participanții se salută ⇔ participanții îi salută pe participanți the participants greet each other
    • Pavel in Ana se objemata ⇔ Pavel objema Ano in Ana objema Pavla Paul and Anna hug each other
  • annotate as IReflV
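
The coordinated-subject paraphrase of test 21 (A V [to/with] B and B V [to/with] A) can be built mechanically once the annotator supplies the inflected forms. The helper below is only a sketch: reciprocal_paraphrase and its parameters are hypothetical names, and no morphological analysis is attempted, so object forms that differ from subject forms have to be passed in explicitly.

```python
from typing import Optional


def reciprocal_paraphrase(a: str, b: str, verb: str, link: str = "",
                          a_obj: Optional[str] = None,
                          b_obj: Optional[str] = None,
                          conj: str = "and") -> str:
    """Build 'A V [link] B and B V [link] A' for the coordinated-subject case.

    a_obj / b_obj are the object forms of A and B when they differ from the
    subject forms (e.g. in case-marking languages); link is an optional
    preposition or marker placed before the object.
    """
    a_object = a_obj if a_obj is not None else a
    b_object = b_obj if b_obj is not None else b
    marker = f"{link} " if link else ""
    return f"{a} {verb} {marker}{b_object} {conj} {b} {verb} {marker}{a_object}"


print(reciprocal_paraphrase("Paul", "Anna", "umarmt"))
print(reciprocal_paraphrase("Pablo", "Ana", "abraza", link="a"))
print(reciprocal_paraphrase("Pavel", "Ana", "objema",
                            a_obj="Pavla", b_obj="Ano", conj="in"))
```

The annotator then judges whether the generated paraphrase preserves the meaning of the REFLV; the code only produces the string to be judged.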

Problematic cases and remarks

Polysemy

Keep in mind that both simple and reflexive verbs can have several senses. In test 15, we ask that ALL senses you can think of are different from the REFLV form in the given context. For example, the French verb trouver can mean to find something, to have an opinion about something, to discover something, etc. But the REFLV se trouver has a totally different and unrelated meaning of to be (located at) in the sentence L'église se trouve à Paris the church is located in Paris. It should thus be annotated as an MWE. As the REFLV is polysemous itself, it should NOT be annotated as IReflV in sentences like Elle se trouve grosse she finds herself fat, where it means to have an opinion about (herself), equivalent to the non-reflexive version.

Clitics position and concatenation

In some languages the clitics are joined to the verb, sometimes using a hyphen but not always. When there is no hyphen, the REFLV will probably be tokenized as a single token in the corpus.

  • In French, orthography and pronunciation rules require the clitic to be concatenated with a verb that starts with a vowel sound, and the clitic's last vowel to be replaced by an apostrophe (elision):
    • s'abstenir to abstain
  • In Spanish and Italian, the clitic can appear concatenated after the verb in some verbal forms (e.g. infinitives, gerunds):
    • enamorarse to fall in love
    • alzarsi to get up
  • In Portuguese, there are always hyphens for postponed clitics (enclisis), but in the conditional tense the clitic is in the middle of the verb (mesoclisis), separating the root from the suffix:
    • queixar-se-ia would complain
  • In Romanian the clitic and the verb are either separate or have a hyphen between them:
    • se aude un clopot RCLI hears a bell a bell is heard
      s-aude un clopot RCLI-hears a bell a bell is heard

The current annotation format allows annotating a single token as an MWE if it is a multiword token. Therefore, a REFLV tokenized as a single token should still be annotated as an MWE.
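
To illustrate which of the cases above can be recognised from orthography alone, here is a rough sketch that splits a token at hyphens and after apostrophes. The function split_on_clitic_marks is a hypothetical helper, not part of the annotation tooling, and it deliberately does nothing for concatenated forms such as enamorarse, which would need a lexicon or morphological analyser.

```python
from typing import List


def split_on_clitic_marks(token: str) -> List[str]:
    """Split a token at hyphens (dropped) and after apostrophes (kept)."""
    pieces: List[str] = []
    current = ""
    for char in token:
        if char == "-":
            # A hyphen separates two pieces and is not kept.
            if current:
                pieces.append(current)
            current = ""
        elif char in "'’":
            # An apostrophe closes the elided clitic and stays attached to it.
            pieces.append(current + char)
            current = ""
        else:
            current += char
    if current:
        pieces.append(current)
    return pieces


for token in ["s'abstenir", "queixar-se-ia", "s-aude", "enamorarse"]:
    print(token, "->", split_on_clitic_marks(token))
```

A token that comes back as a single piece may still be a multiword token; in that case the whole token is annotated as the MWE.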

Overlap ID - IReflV

Some idiomatic constructions include reflexive clitics. Two cases are possible:

  • If a syntactically comparable literal construction is impossible or the REFLV would not be annotated in syntactically comparable literal constructions, annotate only the ID:
    • sich über etwas im Klaren sein dass S RCLI about s.th. in.the clear be to be aware of s.th./that S ⇒ *sich in N sein, dass for any noun N
    • darse cuenta de to realize ⇒ *darse N de for any noun N
      meterse en líos to get in trouble → REFLV not annotated in literal equivalents like meterse en una tienda to get in a store
    • se rendre compte de to realize ⇒ *se rendre N de for any noun N
      s'arracher les cheveux RCLI tear the hair to worry → REFLV not annotated in literal equivalents like s'arracher un ongle to tear one's own nail
    • rendersi conto di to realize ⇒ *si rende N di for any noun N
      si strappa i capelli RCLI tear the hair to worry → REFLV not annotated in literal equivalents like strapparsi un unghia to tear one's own nail
    • zdawać sobie sprawę z to realize ⇒ *zdawać sobie N z for any noun N
    • dar-se mal to fail → dar-se ADV intransitive is acceptable only for the antonym bem well
      meter-se numa fria to get-RCLI in a cold to get in trouble → REFLV not annotated in literal equivalents like meter-se numa cabine to get into a cabin
    • puliti si lase tear RCLI the hair to worry → REFLV not annotated in literal equivalents like puliti si obrvi to pluck one's eyebrows
  • If the REFLV would be annotated as IReflV in syntactically comparable literal constructions, annotate both the IReflV and the ID as embedded MWEs (rare):
    • rozlatywać się w proch scatter itself into dust disappear
    • virar-se nos trinta turn-RCLI in-the thirty to get by → contains virar-se to get by ≠ virar to turn/become
    • a i se face rău to CL.DAT RCLI.ACC make ill to feel sick → this is a case where both a non-reflexive dative clitic and a RCLI.ACC appear in the structure; the REFLV is annotated as IReflV; both the IReflV and the ID are annotated as embedded MWEs; note that the non-reflexive clitic is also considered part of the ID (6.4_R)
      a se duce pe apa sâmbetei RCLI go on water-the Saturday-of to get lost → the REFLV is annotated in the literal equivalent a se duce pe apa Bistriței he goes on the river Bistriţa; there is a notable difference in meaning between the non-REFLV a duce to take and the REFLV a se duce to go
    • režati se kot pečen maček to laugh RCLI like a baked tomcat to laugh loudly režati se is IReflV
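
The two overlap cases above can be summarised as a small decision function. This is a sketch under the assumption that the annotator has already judged (i) whether a syntactically comparable literal construction exists and (ii) whether the REFLV would be annotated as IReflV there; overlap_annotations is a hypothetical name.

```python
from typing import List


def overlap_annotations(literal_construction_possible: bool,
                        reflv_is_ireflv_in_literal: bool) -> List[str]:
    """Return the categories to annotate for an idiom containing a RCLI."""
    if literal_construction_possible and reflv_is_ireflv_in_literal:
        # Rare case: the IReflV is embedded in the ID, annotate both.
        return ["ID", "IReflV"]
    # No literal counterpart, or the REFLV is not an IReflV there.
    return ["ID"]


# darse cuenta de: *darse N de is impossible for any noun N
print(overlap_annotations(False, False))
# meterse en líos: literal meterse en una tienda exists, REFLV not annotated
print(overlap_annotations(True, False))
# virar-se nos trinta: virar-se is an IReflV on its own
print(overlap_annotations(True, True))
```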

Overlap LVC - IReflV

It is rare, although possible, to find light verb constructions in which a reflexive clitic changes the original meaning significantly, thus characterizing an IReflV:

  • Fragen stellen to ask questions vs. sich Fragen stellen to doubt/hesitate
  • poser des questions to ask questions vs. se poser des questions to doubt/hesitate
  • no examples found for RO

In this case, the whole construction, including the verb, the noun and the reflexive clitic, must be annotated as ID, since there are two syntactic arguments:

  • sich Fragen stellen to doubt/hesitate
  • se poser des questions to doubt/hesitate
  • no examples found for RO

Notice that annotating only the verb and the RCLI as IReflV would be wrong, since it will have a completely different meaning without the noun, sometimes even coinciding with another IReflV:

  • sich stellen to surrender
  • se poser to sit/lay down

Dative clitics and double clitics

In some languages, e.g. Polish, clitics inflect for case. Most IReflVs seem to be restricted to accusative clitics:

  • bát se to be afraid
  • bać się to be afraid
  • a se sinchisi to RCLI.ACC care to care
    a se sfii to RCLI.ACC be.shy to be shy
    a se căi to RCLI.ACC repent to repent
  • bati se to be afraid

However, other cases can appear in IReflV:

  • poradit si to advise oneself.DAT to manage
  • radzić sobie to advise oneself.DAT to manage
  • a-și însuși to-RCLI.DAT appropriate to appropriate - with a dative clitic
    a-și apropria to-RCLI.DAT appropriate to appropriate - with a dative clitic
  • drzniti si to dare oneself.DAT to dare

Some expressions can have double clitics. Only the first two words belong to the IReflV:

  • przyglądać się sobie to observe RCLI.ACC RCLI.DAT to observe each other
    radzić sobie z sobą to advise RCLI.DAT with RCLI.INST to manage with oneself
  • n.a.
  • nasmehniti se sebi to smile at oneself

Non-reflexive clitics

This category does not cover other types of pronouns and clitics. They are covered by regular ID tests and should be annotated as such. Examples of constructions that should be annotated as ID rather than IReflV include:

  • es gibt it gives there is
  • l'emporter to take it away to win
    s'en aller to self from-it go to leave
    en avoir marre to have from-it enough to be fed up
    il y avoir it at-it have to exist
  • prender-ci to take to-it to make the right choice
    prender-le to take it to be beaten
  • dá-lhe João! give to-him/her, João! show them what you got, João!
  • a-i arde to CL.DAT burn to have a desire
    a o lua pe jos to take CL.ACC on foot to walk → according to the current guidelines, such examples pass the ID tests (see also 6.3_B5); both have literal correspondents that are not characterized by an obligatory non-reflexive clitic: a arde to burn and a lua to take
    a-i repugna to CL.DAT loathe to loathe
    a-i prii to CL.DAT to be favourable to sb.
  • ucvreti jo to escape her to escape something/someone by running

Section 6.5

Verb-particle constructions (VPC)

Verb-particle constructions (VPCs), sometimes called phrasal verbs or phrasal-prepositional verbs, like

  • um|fahren over|drive to run over, mit|kommen with|come to join, vor|bereiten before|prepare to prepare
  • to put off, to blow up, to do in
  • n.a.
  • n.a.
  • biti za to be for to agree, šteti za to count for to consider as

constitute another quasi-universal category. They have the following general characteristics:

  1. They are formed by a lexicalized head verb v and a lexicalized particle p dependent on v.
  2. The meaning of the VPC is non-compositional. Notably, the change in the meaning of v goes significantly beyond adding the meaning of p:
    • die Fische sind eingegangen the fish went in the fish died
    • to do in to kill
    • n.a.
    • n.a.
    • n.a.
    • gre za it goes for it is about, biti ob to be at to lose

VPCs are pervasive in English, German, Swedish, Hungarian and possibly some other languages but irrelevant to or very rare in Romance and Slavic languages or in Farsi and Greek for instance.

In some Germanic languages and also in Hungarian, verb-particle constructions can be spelled either as one (multiword) token or separated. Both types of occurrences are to be annotated:

  • Die Kinder sollen in der Schule aufpassen The children must pay attention at school
    Herr Müller, passen Sie auf! Mr. Müller, be careful
  • n.a.
  • n.a.
  • n.a.
  • n.a.

The first challenge in identifying a VPC is to properly distinguish the particle from a possibly homographic preposition, e.g.:

  • to get up a petition vs to get up a hill
  • n.a.
  • n.a.
  • n.a.
  • biti za njeno idejo He agrees with her idea vs biti za zaveso He is behind the curtain

or a verbal prefix:

  • um- in um|fahren vs umfahren
  • n.a.
  • n.a.
  • n.a.

Namely, a particle, contrary to a preposition, cannot introduce a complement

  • to do sb in, *to do in sb
  • n.a.
  • n.a.
  • n.a.

and prefixes can never be spelled separately from the verb, nor can the past participle of prefixed verbs be formed with the infix -ge-

  • *er fuhr den See um
    *er hat den See umgefahren, instead: er hat den See umfahren he drove around the lake, but: er hat das Schild umgefahren he ran over the sign
  • n.a.
  • n.a.
  • n.a.

See the language-specific tests for more details on distinguishing particles from prepositions and verbal prefixes.
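
For German, the -ge- criterion above can be approximated by comparing an observed past participle with the participle of the simple verb. The sketch below is a heuristic only: behaves_like_particle is a hypothetical helper, both participles are assumed to be supplied by the annotator, and simple verbs whose own stem begins with ge- would need to be handled by hand.

```python
from typing import Optional


def behaves_like_particle(element: str,
                          simple_participle: str,
                          observed_participle: str) -> Optional[bool]:
    """Check whether -ge- appears between the element and the verb stem.

    Returns True if the observed participle is element + simple participle
    (e.g. um + gefahren), False if it is not, and None when the simple
    participle has no ge- at all, in which case the criterion says nothing.
    """
    if not simple_participle.startswith("ge"):
        return None
    return observed_participle == element + simple_participle


# er hat das Schild umgefahren (ran over the sign): particle
print(behaves_like_particle("um", "gefahren", "umgefahren"))
# er hat den See umfahren (drove around the lake): prefix
print(behaves_like_particle("um", "gefahren", "umfahren"))
```

A True result only shows that the element is separable; whether the combination is a VPC still depends on the compositionality criteria discussed in this section.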

Note that in this shared task we do not account for compositional verb-particle combinations, i.e. those whose meaning can be deduced from the meaning of the particle and of the verb:

  • er legt das Buch ab he puts down the book, er kommt ins Haus rein he comes into the house he enters the house
  • to lie down, to come in
  • n.a.
  • n.a.
  • n.a.
  • prišel je do hriba He came to the hill

Some combinations may have both compositional and non-compositional meanings depending on the context and only the latter should be annotated:

  • ein Schild aufstellen to put up a sign vs. einen Plan aufstellen to draw up a plan
  • to put up a flag vs. to put up a friend for the night
  • n.a.
  • n.a.
  • n.a.
  • gre za vodičem he follows the guide vs. gre za naše temeljno načelo it is about our fundamental principle

The essential compositionality test is to see if a sentence without the particle can refer to the same event/state as the sentence with the particle.

Test 22 - [V+PART-DIFF-SENSE] - Sense shift due to the particle

Does the particle provoke an unexpected change in meaning of the verb? I.e., does a sentence without the particle fail to refer to the same event/state as the sentence with the particle (special care must be taken when the same construction might or might not be a valid VPC depending on its context)?

  • it is a VPC
    • Der Lehrling fängt ein Praktikum an the apprentice catches an internship on the apprentice begins an internship does not imply #Der Lehrling fängt ein Praktikum the apprentice catches an internship
      Die Bäuerin hat sich wieder eingefangen the farmer’s wife has herself again caught the farmer’s wife has calmed down again does not imply #Die Bäuerin hat sich wieder gefangen the farmer’s wife has caught herself again
      Der Schüler legt die Prüfung ab the pupil lays the exam off the pupil takes the exam does not imply #der Schüler legt die Prüfung the pupil lays the exam
      Das Schiff legt vom Hafen ab the boat lays from the harbor off the ship leaves the harbor does not imply #das Schiff legt vom Hafen the boat lays from the harbor
    • to do somebody in to kill sb does not imply #to do somebody
      to check in upon arrival does not imply #to check upon arrival
    • n.a.
    • A meccs után csak az edző nem rúgott be Only the coach did not get drunk after the match A meccs után az edző berúgott The coach got drunk after the match does not imply #Az edző rúgott the coach kicked
      Nem jött be ez a koktél nekem I didn’t like this cocktail Bejött ez a koktél nekem I liked this cocktail does not imply #Jött ez a koktél nekem this cocktail bumped into me
    • n.a.
    • n.a.
    • Ona je za enakopravnost she is for equality does not imply #Ona je enakopravnost she is equality
      postavili so ga za sodnika they set him for a judge they appointed him a judge does not imply *Postavili so ga sodnika they set him judge – no sense
  • it is not a VPC
    • Der Bauer fängt die Hühner ein the farmer catches the chickens in the farmer catches the chickens implies der Bauer fängt die Hühner the farmer catches the chickens
      Der Lehrer legt das Buch auf dem Tisch ab the teacher lays the book on the table apart the teacher puts the book away on the table implies Der Lehrer legt das Buch auf den Tisch the teacher puts the book on the table
      Der Lehrer legt den Mantel ab the teacher lays the coat off the teacher takes off his coat implies Der Lehrer legt den Mantel the teacher puts the coat
    • to look up into the sky implies to look into the sky
      to eat up the cookies implies to eat the cookies
    • n.a.
    • A csatár nem rúgta be a helyzetét The forward missed its chance to score a goal A csatár berúgta a helyzetét implies A csatár rúgott The forward kicked
      Nem jött be a szobába He did not come into the room Bejött a szobába he entered the room implies Jött a szobába he came into the room
    • n.a.
    • n.a.

Section 6.6

Language-specific categories

Language-specific categories can be proposed for annotation in this task provided that they are carefully defined and accompanied by linguistic tests that distinguish them from other categories. We recommend not redefining the universal and quasi-universal categories described here, but introducing new names and abbreviations in order to answer such needs.


Section 6.7

Other verbal MWEs (OTH)

This category is meant to contain VMWEs which do not fit into the preceding categories, that is, whose lexicalized components do not include a single head verb and at least one of its arguments. VMWEs in this category fail the structural test 6 [HEAD]. They include:

  • coordinations of verbs
    • leben und leben lassen live and let live
    • to drink and drive
    • coser y cantar to_sew and to_sing easy as pie, a piece of cake
    • pluć i łapać spit and catch to be lazy, to do nothing useful
      coś kogoś ani ziębi, ani grzeje something neither cools nor warms someone someone is indifferent to something
      bądź tak dobry i zrób coś be so good and do something be so good as to do something
    • pintar e bordar paint and knit to abuse
    • a tunat și i-a adunat it.has thundered and CL.ACC-it.has gathered birds of a feather flock together
      seamănă, dar nu răsare sow.3SG (homonym of resemble), but not sprout.3SG not to resemble
    • živi in pusti živeti live and let live
  • compound verbs, resulting usually from conversion of nominal compounds:
    • to voice act
      to pretty-print
      to short-circuit
      to tumble dry
    • court-circuiter to short-circuit
    • n.a. there are no cases of compound hyphenated verbs in RO
    • n.a. there are no cases of compound hyphenated verbs in SL

No specific tests apply to this category. In other words an expression should be annotated as OTH if:


Section 7

Language-specific tests

Language-specific tests may be necessary in one of three cases. Firstly, a VMWE category may be universal or quasi-universal but it may require different tests in different languages. Secondly, any category specific to a language must be associated with appropriate tests in the same language. Thirdly, universal categorisation tests can build upon more elementary language-specific tests (e.g. to distinguish a particle from a preposition).


Section 7.1

Particles versus prepositions and prefixes

The following tests make it possible to properly identify prepositional verb particles in cases where they might be homographic with prepositions in prepositional phrases (PPs) or with verbal prefixes. The word to be discriminated is referred to as a candidate word. The tests are language-specific and concern English and German.

English-specific tests for distinguishing particles from prepositions

The following tests concern English words which can be either a preposition or a particle depending on the context, e.g. up, on, through, etc. If a candidate word passes any of the three following tests it can be categorized as a particle.

Test 7.1.EN - [PART+PREP] - Particle followed by a preposition

Is the candidate word followed by a preposition?

  • it is a particle
    • ich halte es nicht länger aus mit ihm I can no longer put up with him
    • I can no longer put up with him
    • n.a.
    • n.a.
  • other tests are needed
    • Ich klettere den Berg hinauf I climb up the hill
    • I got up the hill
    • n.a.
    • n.a.

Test 7.2.EN - [FIN-PART] - Sentence-final particle

Does the candidate word w occur at the end of a clause which is: (i) affirmative or imperative, (ii) headed by the verb governing w, and (iii) not a relative clause? Or can the sentence be reformulated so as to put the candidate word at the end of such a clause?

  • the candidate word is a particle
    • n.a.
    • They got up a petition on Monday. They got it up.
    • n.a.
    • n.a.
  • other tests are needed
    • n.a.
    • I got up the hill. *I got it up.
    • n.a.
    • n.a.

Test 7.3.EN - [AD-INS] - Adjunct insertion

Is an insertion of a circumstantial adjunct prohibited between the governing verb and the candidate word?

  • the candidate word is a particle
    • n.a.
    • I took off my clothes at once. *I took at once off my clothes.
      She always tries to take in her clients. *She tries to take always in her clients.
    • n.a.
    • n.a.
  • other tests are needed
    • n.a.
    • He has been off alcohol recently. He has been recently off alcohol.
    • n.a.
    • n.a.
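
The decision rule stated above for English (a candidate word is a particle as soon as it passes any one of the three tests) can be sketched in a few lines. This is only an illustration: the record type and function names are hypothetical, and the yes/no answers still have to come from a human annotator applying tests 7.1.EN to 7.3.EN.

```python
from dataclasses import dataclass


@dataclass
class EnglishTestAnswers:
    """An annotator's yes/no answers for one candidate word (hypothetical record)."""
    followed_by_preposition: bool       # Test 7.1.EN [PART+PREP]
    can_end_clause: bool                # Test 7.2.EN [FIN-PART]
    adjunct_insertion_prohibited: bool  # Test 7.3.EN [AD-INS]


def classify_candidate(answers: EnglishTestAnswers) -> str:
    """Return "particle" if any test is passed, else "other tests needed"."""
    passed_any = (
        answers.followed_by_preposition
        or answers.can_end_clause
        or answers.adjunct_insertion_prohibited
    )
    return "particle" if passed_any else "other tests needed"


# "up" in "They got up a petition": passes 7.2.EN ("They got it up").
# The other two answers are placeholders, not claims from the guidelines.
petition = EnglishTestAnswers(False, True, False)

# "off" in "He has been off alcohol recently": fails 7.3.EN; the other two
# answers are assumed negative for the sake of the example.
alcohol = EnglishTestAnswers(False, False, False)

print(classify_candidate(petition))
print(classify_candidate(alcohol))
```

A negative outcome does not mean that the word is a preposition, only that these three tests do not settle the question.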

German-specific tests for distinguishing particles from prepositions and verbal prefixes

The following tests concern German words which can be both a particle and either a preposition or a verbal prefix, depending on the context, e.g. mit, um, vor, etc. If a candidate word passes any of the three following tests it can be categorized as a particle.

Test 7.1.DE - [FIN-PART] - Sentence-final particle

Does the candidate word occur at the end of the sentence or can the sentence be reformulated so as to put the candidate word at the end?

  • it is a particle
    • Kommst Du mit? come you with? are you coming?
      Ich schlage vor allen zu verzeihen. I propose to forgive everyone Ich schlage es vor I propose it
      Der Mülleimer wurde umgefahren. The trash bin was knocked down Er fuhr den Mülleimer um. He knocked down the trash bin
    • n.a.
    • n.a.
  • other tests are needed
    • Kommst Du mit jemandem? Are you coming with someone? *Kommst Du jemandem mit?
      Er umfuhr den ganzen See mit dem Fahrrad. He drove around the whole lake with a bike *Er fuhr ihn um.
    • n.a.
    • n.a.

Test 7.2.DE - [SEP-PART] - Separable particle

Can the verb and the candidate word be spelled both separately and together?

  • it is a particle
    • Passen Sie auf die Autos auf! Be careful with the cars! Sie müssen auf die Autos aufpassen! You must be careful with the cars!
      Er fuhr das Schild um. He drove over the sign Er sollte das Schild nicht umfahren He should not drive over the sign
    • n.a.
    • n.a.
  • other tests are needed
    • Er umfuhr den ganzen See mit dem Fahrrad. He rode around the whole lake with a bike *Er fuhr den ganzen See mit dem Fahrrad um.
      Sprechen Sie mit ihm! Speak with him! *Sie sollen ihm mitsprechen.
    • n.a.
    • n.a.

Section 8

Annotation management

This section groups the documentation on practical aspects of the annotation campaign management. Some of these aspects are specific to this shared task, such as the editing of examples by language leaders and the use of the annotation platform FLAT. Others are more generic and concern the guidelines in general, such as the FAQ section.


Section 8.1

Frequently Asked Questions (FAQ)

Annotators often face questions and challenging examples. When several annotators ask the same question, we will update the list of frequently asked questions.

However, we suggest that language teams set up another communication platform to deal with questions that are specific to a language. This can take the form of a shared online document, a wiki, a dedicated bug tracking system or mailing list. We also suggest keeping track of decisions taken concerning borderline examples (with a list of expressions to which the decision applies). These should be kept in a centralized document or page that all annotators can access.

Whenever you think that a question can also be interesting to other languages, please notify the organizers and we will try to update this page.

  1. How to define an unexpected change in meaning?
  2. How to annotate lexicalized words which belong to contractions and compounds?
  3. How to annotate coordinated VMWEs sharing some components?
  4. How to annotate elliptical occurrences of VMWEs?
  5. How to annotate VMWEs that seem to belong to more than one category?
  6. How to annotate embedded VMWEs?
  7. Are existential expressions with there is/are considered VMWEs?
  8. How to categorize VMWEs which seem LVCs but do not pass all LVC tests?
  9. Why are verb+noun constructions with pure operator verbs (to commit, to make, to have etc.) considered LVCs?
  10. Does the IReflV category include verbs with non-reflexive clitics?
  11. Should nominalizations of VMWEs be annotated?
  12. How to express hesitation between different VMWE categories?
  13. In test 9, how can one decide whether an abstract noun is an event or a state?
  14. How does one decide if a more or less frozen determiner is a lexicalized VMWE component?
  15. Should I annotate compound and serial verbs as VMWEs? Of which category?
  16. If an LVC contains a complex (fixed) NP as a dependent, should I include the whole NP or just the head?

1. How to define an unexpected change in meaning?

Check the glossary entry that defines unexpected change in meaning.

2. How to annotate lexicalized words which belong to contractions and compounds?

In some languages prepositions, clitics and determiners are subject to contractions (i.e. they yield multiword tokens, MWTs). Tokenizers might not handle contraction splitting properly. In this case, a lexicalized component of a VMWE can be merged with an external word:

  • n.a.
  • haberse suicidado have+REFL suicided committed suicide
  • n.a.
  • n.a.

A similar problem occurs in languages with productive compounding, where a lexicalized component of a VMWE and a free modifier can build up a multitoken word (since compound splitting might not be a standard feature of a tokenizer):

  • unter Drogeneinfluss stehen to be under the influence of drugs
    Heisshunger haben to have hot hunger to be ravenously hungry
  • n.a.

Since the current annotation format is token-based, we prohibit correcting tokenization errors and compound splitting by the annotators for the sake of coherence. Therefore the annotation of such contractions and compounds finds no fully satisfactory solution in our schema. We propose to annotate a whole MWT each time it contains a word which is part of a VMWE. Annotators should add a textual comment about the mixed status of this MWT:

  • Drogeneinfluss → MWT containing a lexicalized VMWE component Einfluss and an external word Drogen
    Heisshunger → MWT containing a lexicalized VMWE component Hunger and an additional modifier heiss
  • haberse → MWT containing a lexicalized VMWE component se and an external word haber
  • n.a.
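
As an illustration of the rule above, the following sketch marks a whole multiword token as part of a VMWE and records its mixed status in a comment. The `Token` record and the `annotate_whole_mwt` helper are hypothetical and do not reflect the actual annotation format or the FLAT platform; only the Spanish example and the comment wording are taken from this FAQ entry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    """A token with the VMWE ids it belongs to and a free-text comment."""
    form: str
    vmwe_ids: List[int] = field(default_factory=list)
    comment: str = ""


def annotate_whole_mwt(token: Token, vmwe_id: int, lexicalized: str, external: str) -> None:
    """Annotate the whole MWT and document which part is really lexicalized."""
    token.vmwe_ids.append(vmwe_id)
    token.comment = (
        f"MWT containing a lexicalized VMWE component {lexicalized} "
        f"and an external word {external}"
    )


# "haberse suicidado": the tokenizer did not split "haberse" into haber + se.
sentence = [Token("haberse"), Token("suicidado")]
annotate_whole_mwt(sentence[0], vmwe_id=1, lexicalized="se", external="haber")
sentence[1].vmwe_ids.append(1)  # ordinary token, no comment needed

for tok in sentence:
    print(tok.form, tok.vmwe_ids, tok.comment)
```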

3. How to annotate coordinated VMWEs sharing some components?

A component shared by two or more coordinated VMWEs should be annotated as belonging to both of them.

  • Regeln und Richtlinien aufstellen to set up rules and guidelines to draw up rules and guidelines aufstellen must be annotated both as part of Regeln aufstellen to lay down rules and of Richtlinien aufstellen to draw up guidelines
  • to have a walk or a ride have must be annotated both as part of to have a walk and of to have a ride
  • odprawić mszę i pokutę celebrate a mass and a penance odprawić should be annotated both as part of odprawić mszę to celebrate a mass and of odprawić pokutę to celebrate a penance
  • imeti dober želodec in dobre živce to have a good stomach to bear something well and good nerves to be mentally strong imeti have must be annotated both as part of imeti dober želodec and of imeti dobre živce
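
A minimal sketch of what "belonging to both" means for a token-based annotation is given below, using the English example to have a walk or a ride. The representation (a mapping from VMWE ids to token indices) is an assumption made for illustration, as is the choice to leave the determiners out of both expressions.

```python
from collections import defaultdict
from typing import Dict, List


def memberships(vmwes: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Invert a VMWE-to-tokens mapping into a token-to-VMWEs mapping."""
    result = defaultdict(list)
    for vmwe_id in sorted(vmwes):
        for token_index in vmwes[vmwe_id]:
            result[token_index].append(vmwe_id)
    return dict(result)


tokens = ["to", "have", "a", "walk", "or", "a", "ride"]
# VMWE 1 = have ... walk, VMWE 2 = have ... ride (indices into `tokens`).
vmwes = {1: [1, 3], 2: [1, 6]}

by_token = memberships(vmwes)
shared = [tokens[i] for i, ids in sorted(by_token.items()) if len(ids) > 1]

print(by_token)
print(shared)
```

Here the verb have is the only token carrying two VMWE ids, one for each coordinated expression.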

4. How to annotate elliptical occurrences of VMWEs?

Instances of a VMWE in which all but one lexicalized component were omitted or pronominalized should not be annotated. This concerns in particular the cases where a nominal component is affected by anaphora. For instance, in this decision was hard but he took it, we should not annotate take and decision or it as an instance of a VMWE. We annotate only the transformations in which the syntactic dependency link between the head verb and the lexicalized complement is preserved, e.g. the decision which he took.

5. How to annotate VMWEs that seem to belong to more than one category?

Such hesitation issues should normally be solved by the decision trees 1 and 2. For instance, consider the German expression sich eine Frage stellen SELF a question put to doubt. It may seem to belong to both IReflV, since sich is required only if stellen co-occurs with Frage, and LVC, since Frage keeps its original meaning and stellen brings no additional meaning. However, test 7 [1DEP] indicates that an expression like this should be annotated as ID, since the verb has more than one lexicalized syntactic dependent.

Similarly, the French expression avoir peur have fear to be afraid seems to have features of an ID. Unlike most LVCs, it does not allow a determiner *avoir une peur have a fear, except when the noun is modified avoir une grande peur have a great fear. However, test 8 [CATEG] in decision tree 2, and the LVC-specific decision tree indicate that it belongs to the LVC category.

6. How to annotate embedded VMWEs?

Candidate VMWEs embedded in other VMWEs should be annotated only if they have a VMWE status also outside the particular context. For instance, the VMWE to let the cat out of the bag should be annotated as ID, and its embedded VMWE to let out as a VPC.

On the other hand, in the French expression se faire des idées SELF make DET.PL ideas to imagine things which are not true, se faire should not be annotated as IReflV, since it is not inherently reflexive as a standalone verb+clitic combination.

7. Are existential expressions with there is/are considered VMWEs?

Hesitations about a possible LVC status can arise with respect to existential constructions with nouns introducing events or properties (see test 9 [N-EVENT]) as in:

  • es gibt Beschwerden there are complaints
  • there are complaints
  • il y a des plaintes it there has complaints there are complaints
  • n.a.
  • há queixas has complaints there are complaints
  • imeti pripombe have complaints there are complaints

Namely, the noun keeps its original sense and the existential verb to be or to have brings no additional meaning. However, a candidate LVC must also pass test 12 [V-REDUC]. This requires the modification of the noun by the verb's subject, which is impossible with impersonal and empty subjects like there. Therefore, such candidates cannot be LVCs.

Note, however, that existential expressions themselves can be VMWEs of type ID. For instance, in the French example il y a des plaintes it there has complaints there are complaints, two dependents of the verb a has are lexicalized: il it and y there, therefore it is an ID (see test 7 [1DEP]).

8. How to categorize VMWEs which seem LVCs but do not pass all LVC tests?

If at least one of the five LVC tests (9 to 13) is not passed, the candidate is not considered an LVC. For the sake of a deterministic VMWE categorization and higher inter-annotator agreement, we admit a definition of an LVC which might seem more restrictive than some linguistic studies usually assume. Thus, we exclude from the LVC scope:

  • expressions in which the verb's syntactic subject is not necessarily the noun's semantic subject, like to give courage or to make an impression. These candidates do not pass test 12 [V-REDUC].
  • expressions where the lexicalized nominal dependent of the verb is its subject, as in the problem lies in something; these candidates do not pass test 12 [V-REDUC].
  • expressions with aspectual verbs, as in to start, to pursue, to stop a walk. These do not pass test 11 [V-LIGHT] since they add (aspectual) semantics to the noun. The only exception is when the noun itself is already aspectual, as in to come into bloom.
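
The conjunctive nature of the LVC definition (all five tests must be passed, and a single failure is enough to reject the LVC hypothesis) can be summarized as follows. The function name and the dictionary representation are hypothetical; the test outcomes marked as assumed are placeholders, since only the failing test is documented above for each example.

```python
from typing import Dict

LVC_TESTS = (9, 10, 11, 12, 13)  # the five LVC tests referred to above


def is_lvc(outcomes: Dict[int, bool]) -> bool:
    """A candidate is an LVC only if every LVC test is passed.

    A test missing from `outcomes` counts as not passed.
    """
    return all(outcomes.get(test, False) for test in LVC_TESTS)


# "to give courage": fails test 12 [V-REDUC]; other outcomes are assumed.
give_courage = {9: True, 10: True, 11: True, 12: False, 13: True}

# "to start a walk": fails test 11 [V-LIGHT]; other outcomes are assumed.
start_a_walk = {9: True, 10: True, 11: False, 12: True, 13: True}

# A candidate that passes all five tests (hypothetical outcomes).
all_passed = {test: True for test in LVC_TESTS}

print(is_lvc(give_courage), is_lvc(start_a_walk), is_lvc(all_passed))
```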

9. Why are verb+noun constructions with pure operator verbs (to commit, to make, to have etc.) considered LVCs?

Pure operator verbs, i.e. verbs which never have any semantics per se but only carry grammatical information (tense, mood etc.), seem to contradict the intuition behind a VMWE. Namely, they usually select a whole semantic class of nouns. For instance to commit selects any negative act (a crime, a suicide, a theft) and to perform selects any activity (a task, an experiment, a miracle). In this sense, their complements resemble open slots and the whole combinations resemble collocations. However, for the sake of a deterministic VMWE categorization and higher inter-annotator agreement, we do include verb+noun combinations with pure operator verbs, such as to commit a crime and to perform a task, into the LVC category. This is because such combinations pass all 5 LVC-specific tests (9 through 13).

We could have organized decision tree 1 differently and excluded such cases from the VMWE scope by eliminating the LVC hypothesis. Then, to commit a crime and to perform a task would pass none of the tests from 1 to 5 and would be eliminated. However, we would also have to eliminate prototypical LVCs like to make a decision (it passes none of the tests from 1 to 5 either), which we do wish to take in as an LVC.

10. Does the IReflV category include verbs with non-reflexive clitics?

No, the IReflV category only includes (some) combinations of a head verb with a reflexive clitic. As indicated in the borderline cases page of the IReflV category, other pronouns, whenever lexicalized, trigger the ID category. Recall that whenever more than one dependent of the verb is lexicalized (including or not a reflexive clitic), the VMWE is always categorized as an ID:

  • sich Fragen stellen SELF questions put to doubt
  • s'en aller SELF of-there go to leave
  • n.a.
  • ucvreti jo to escape her to escape something/someone by running

11. Should nominalizations of VMWEs be annotated?

The only nominal VMWE variants within our annotation scope are those:

  • headed by the gerund stemming from the head verb of the VMWE - taking of the decision, and
  • in which a noun stemming from a VMWE is modified by a participle or a relative clause headed by the verb stemming from the same VMWE - the decisions taken yesterday, the decision which he took.

Other nominalizations are excluded:

  • Wortbruch word-break a promise which has not been kept
  • a break-down, a forget-me-not
  • la prise en compte the taking into account the fact of taking something into account, peut-être may-be maybe, porte-feuilles carry-sheets wallet
  • zabawa czyimś kosztem a play at someone else's expense derived from bawić się czyimś kosztem to enjoy oneself at someone else's expense
  • šala na tuj račun a joke at someone else's expense derived from šaliti se na tuj račun to play a joke on someone

For practical reasons (e.g. compatibility with an existing annotation, or usefulness for a particular application) they can be considered language-specific VMWEs but then a new category should be defined for them, so as to keep the universal and the quasi-universal categories intact.

12. How to express hesitation between different VMWE categories?

Once identified in a text, each VMWE is to be assigned to exactly one category. Note that in this version of the guidelines we no longer admit "hesitation labels" (e.g. LVC/ID) used in the pilot annotation. Hesitation can, however, be expressed in a comment and a particular value of the annotator's confidence assigned to a particular VMWE occurrence.
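
The constraint can be pictured with a small record type that accepts exactly one category label and keeps hesitation in separate fields. This is a sketch only: the class name, the 0 to 1 confidence scale and the closed set of labels are assumptions (language-specific categories, where defined, would have to be added to the set), and the actual annotation platform may store these values differently.

```python
from dataclasses import dataclass

# Category labels used in these guidelines; an assumed closed set for the sketch.
CATEGORIES = {"ID", "LVC", "IReflV", "VPC", "OTH"}


@dataclass
class VmweAnnotation:
    """One annotated VMWE occurrence with a single category label."""
    category: str
    confidence: float = 1.0  # hypothetical scale: 0.0 (unsure) to 1.0 (sure)
    comment: str = ""

    def __post_init__(self) -> None:
        # Hesitation labels such as "LVC/ID" are rejected here.
        if self.category not in CATEGORIES:
            raise ValueError(f"not a single known category: {self.category!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie between 0.0 and 1.0")


# Hesitation is expressed through the confidence value and the comment.
annotation = VmweAnnotation("LVC", confidence=0.5, comment="hesitating between LVC and ID")
print(annotation)
```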

13. In test 9, how can one decide whether an abstract noun is an event or a state?

The goal of test 9 is to identify whether a noun is predicative, that is, whether it requires some semantic arguments. We talk about events and states to circumvent the question of whether a noun is predicative. Here, they are understood very broadly as roughly corresponding to binary and unary predicates. For instance, we consider that an event is something that happens, and can be related to an action, activity, process or phenomenon. A state is understood as a property that may or may not change over time, including feelings, sensations, permanent and temporary properties and relations between entities. These are very generic definitions that go far beyond the scope of what is commonly understood as an event or state.

While it is hard to define required tests to identify a predicative noun, there are some useful clues that can be used for abstract nouns (sufficient criteria).

Verb paraphrase: Is the abstract noun derivationally related to a verb with the same semantics?

  • John makes a decision = John decides
    John has a walk = John walks

Adjective paraphrase: Is the abstract noun derivationally related to an adjective with the same semantics?

  • John has courage = John is courageous → and, more generally, characteristics and attributes
    John has hunger/thirst = John is hungry/thirsty → and, more generally, physical sensations
    John has passion/fear/anger = John is passionate/afraid/angry → and, more generally, feelings and emotions
    John has problems/difficulties = Something is problematic/difficult for John → and, more generally, states

Synonym verb or adjective paraphrase: Does the abstract noun have a synonym/hypernym derivationally related to a verb or adjective with the same semantics?

  • John and Mary reach a consensus = John and Mary agree → consensus has no corresponding verb or adjective, but agreement is a synonym
    John has a chance to do something = John is likely to do something → chance has no corresponding verb or adjective, but likelihood is a synonym

For many classes of abstract nouns, it can be tricky to apply the tests above. We advise listing in a separate document those classes of nouns that pass test 9 in your language. We suggest considering that the following categories pass test 9:

  • Illnesses, symptoms and health conditions:
    John has a flu = John is ill (illness is a hypernym of flu)
    Relations:
    John has contact with somebody = John contacts somebody
    John has an affair with somebody = John is involved with somebody (involvement is a synonym of affair)
    Mental content (internal to a cognizer):
    John has a worry = John worries
    John has an idea = John thinks (thought is a synonym of idea)
    John has an opinion = John believes (belief is a synonym of opinion)

Please notice that events and states that have no semantic arguments do not pass test 9, even if they have verbal/adjectival paraphrases:

  • Natural phenomena: rain, snow, tornado, flood, earthquake
    Informational content (external to a cognizer): information, news

Finally, notice that not every verb + predicative noun combination forms an LVC. Additionally, the verb needs to be "light", not adding semantics to the noun. The remaining LVC tests (tests 10 to 13) guarantee this.

14. How does one decide if a more or less frozen determiner is a lexicalized VMWE component?

Most of the time, it is easy to test whether a determiner is lexicalized by searching alternatives in corpora (or on the web). For instance, the is lexicalized in to kick the bucket because searches for other determiners (this, a, some, three, many, etc.) either do not return any result or return only literal uses of this verb phrase.

However, borderline cases do exist, in which alternatives are rare but possible, especially for LVCs and decomposable IDs. For instance, while the standard form of the idiom spill the beans forbids some determiners (#spill three/twenty beans), it is possible to find some variation (spill these/many/all/my/his/more/no beans).

We argue that the selection of some determiners (but not all) by a VMWE is comparable to selected prepositions for verbs. Thus, it can be seen as a regular grammatical phenomenon, suggesting that when the determiner varies, then it should not be included. In some VMWEs, though, determiner variation may be considered as marginal and/or incorrect, which means that it should be included in the scope of the annotated VMWE.

In short, determiners can exhibit limited variability. As a consequence, each language team should document its decisions as to whether to include them or not for particular VMWE classes, to ensure consistency.

  • avoir la pêche have the peach to have much energy
    avoir de la chance have some luck to be lucky
    avoir l'occasion to have the opportunity

After annotation, we suggest that language leaders (LLs) use the provided analysis scripts to detect inconsistencies in the annotation of the same VMWE (e.g., including or not a determiner). They can then take an arbitrary decision and homogenise all annotated occurrences.
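
To give an idea of what such a consistency check might look for, here is a minimal sketch. It is not the analysis script provided by the organizers: the function name and the input representation (one tuple of lemmas per annotated occurrence) are assumptions. It reports pairs of annotated lemma sets where one strictly contains the other, which is the typical footprint of a determiner that was included in some occurrences and left out in others.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

LemmaSet = FrozenSet[str]


def find_scope_conflicts(occurrences: Iterable[Tuple[str, ...]]) -> List[Tuple[LemmaSet, LemmaSet]]:
    """Return (smaller, larger) pairs of distinct annotated lemma sets
    in which the smaller set is strictly contained in the larger one."""
    # Deduplicate, then sort by size so that the smaller set comes first.
    variants = sorted(
        {frozenset(occ) for occ in occurrences},
        key=lambda s: (len(s), sorted(s)),
    )
    return [(small, large) for small, large in combinations(variants, 2) if small < large]


annotated = [
    ("spill", "bean"),          # determiner left out
    ("spill", "the", "bean"),   # determiner included
    ("kick", "the", "bucket"),
    ("spill", "bean"),
]

for small, large in find_scope_conflicts(annotated):
    print(sorted(small), "vs", sorted(large))
```

The reported pairs are only candidates for manual review: a legitimately embedded VMWE, such as to let out inside to let the cat out of the bag, would be flagged in the same way.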

15. Should I annotate compound and serial verbs as VMWEs? Of which category?

It depends. Most of the languages covered by the shared task for the moment do not have this kind of verb. The guidelines were written having these languages in mind, so they are not clear about compound verbs.

In many Indo-European languages (including Germanic, Romance and Balto-Slavic families), verbal chains using auxiliary and modal verbs are used to express tense, modality and aspect. This is a regular linguistic phenomenon that can be applied to any verb and should not be annotated.

On the other hand, some languages like Maltese have many compound verbs that do not necessarily express tense, mood and modality. We suggest that, when a verb regularly combines with any other verb, adding a given meaning, such combinations should not be annotated. Future versions of these guidelines should study the need for a new category for compound verbs, in order to cover this phenomenon.

In short, verbal chains should only be annotated as ID when they are idiomatic:

  • laisser tomber let fall to give up
    vouloir dire want say to mean
    faire tomber make fall to drop
    vouloir changer want change to want to change
  • dak x'mar jgħid ilbieraħ that (person) what'he-went he-says yesterday what the hell did he say yesterday
  • querer dizer want say to mean
    querer falar want speak to want to speak

16. If an LVC contains a complex (fixed) NP as a dependent, should I include the whole NP or just the head?

The guidelines determine that only lexicalized components should be annotated. Therefore, we suggest that, in such cases, if the NP is compositional, only the head of the NP is included in the scope of the LVC. This may lead to the annotation of odd LVCs that actually never occur by themselves without a modifier. This is not a problem and is already the case for other VMWEs, e.g. the ones that only occur with a determiner, but the determiner is not lexicalized. The only case where the NP should be included as a whole is when the complement is a non-compositional MWE, so that it would not make any sense to annotate only the head.

  • παίζω το χαρτί του ευρωσκεπτικισμού to-play the paper the.SG.GEN euroscepticism.SG.GEN to use the asset of euroscepticism, to use euroscepticism as an asset
    κάνω στάση εργασίας to-make stop work.SG.GEN to go on strike, to strike → the expression στάση εργασίας is non-compositional (term)
  • présenter un Syndrome Coronairien Aigu to present an acute coronary syndrome
    mener une vie de débauche to have a life of pleasures
    faire un faux pas make a false step to commit a faux pas → the expression faux pas is non-compositional
  • mieć wyrzuty sumienia to have reproaches of the conscience to feel guilty
  • fazer uma sessão de fotos/autógrafos to make a photo/autograph session
    fazer roleta russa to make russian roulette to play russian roulette → the expression roleta russa is non-compositional
    ter uma situação financeira/profissional/estável to have a financial/professional/stable situation

Notice that these suggestions also apply to LVCs whose nominal complements are introduced by prepositions (i.e. verb+PP LVCs). As usual, the preposition should be included if it is lexicalized and then the NP introduced by the preposition is analyzed exactly as described above.

If the complex dependent is an acronym, you may want to add the textual comment "PART" to indicate that only part of the full version is lexicalized (generally, the head), just like for contractions and compounds.


Section 8.2

Adding new examples in your language

It is often useful to have examples of a phenomenon shown in your own language. We collect these examples for each language using an online shared spreadsheet, and we present these examples as in the template below:

  • MWEs with their lexicalized components in Bulgarian are indicated like this.
  • MWEs with their lexicalized components in Czech are indicated like this.
  • MWEs with their lexicalized components in German are indicated like this.
  • MWEs with their lexicalized components in Greek are indicated like this.
  • MWEs with their lexicalized components in English are indicated like this.
  • MWEs with their lexicalized components in Spanish are indicated like this.
  • MWEs with their lexicalized components in Farsi are indicated like this.
  • MWEs with their lexicalized components in French are indicated like this.
  • MWEs with their lexicalized components in Hebrew are indicated like this.
  • MWEs with their lexicalized components in Croatian are indicated like this.
  • MWEs with their lexicalized components in Hungarian are indicated like this.
  • MWEs with their lexicalized components in Italian are indicated like this.
  • MWEs with their lexicalized components in Lithuanian are indicated like this.
  • MWEs with their lexicalized components in Maltese are indicated like this.
  • MWEs with their lexicalized components in Polish are indicated like this.
  • MWEs with their lexicalized components in Portuguese are indicated like this.
  • MWEs with their lexicalized components in Romanian are indicated like this.
  • MWEs with their lexicalized components in Swedish are indicated like this.
  • MWEs with their lexicalized components in Slovene are indicated like this.
  • MWEs with their lexicalized components in Turkish are indicated like this.
  • MWEs with their lexicalized components in Yiddish are indicated like this.

Examples are preceded by the 2-letter language code in parentheses (e.g. EN for English). You can control what languages are shown and hidden by toggling the header buttons. See the section on notation for more information.

In order to see the ID of all examples, make sure the ID button is toggled on the header of the current page. Now look at the template above. You should see this ID: 8.2_A_template-mwe. The 8.2 represents the current section number (in bold in the TOC on the left). The letter A (or B, C, D...) indicates the position of the example inside this page. The name template-mwe is a more human-readable identifier for this example.
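As an illustration, the three parts of an example ID can be separated programmatically. The sketch below is not part of the guidelines' tooling: the function name parse_example_id and the pattern (a dotted section number, one or more capital letters, then a free-form name) are assumptions inferred from the single ID shown above.

```python
import re
from typing import NamedTuple


class ExampleId(NamedTuple):
    section: str  # e.g. "8.2", the section number
    order: str    # e.g. "A", the position of the example in the page
    name: str     # e.g. "template-mwe", the human-readable identifier


# Assumed shape: <section>_<order>_<name>, inferred from "8.2_A_template-mwe".
_ID_PATTERN = re.compile(
    r"^(?P<section>\d+(?:\.\d+)*)_(?P<order>[A-Z]+)_(?P<name>.+)$"
)


def parse_example_id(example_id: str) -> ExampleId:
    """Split an example ID into its section, order and name parts."""
    match = _ID_PATTERN.match(example_id)
    if match is None:
        raise ValueError(f"not a valid example ID: {example_id!r}")
    return ExampleId(match["section"], match["order"], match["name"])


print(parse_example_id("8.2_A_template-mwe"))
```

If real IDs turn out to follow a different shape, only the pattern needs to be adapted.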

The spreadsheet

The spreadsheet can be accessed through this link to Google Docs. From time to time, the guidelines will be updated based on the contents of the spreadsheet.

The spreadsheet is divided into the following columns: ID-section, ID-order, ID-name, lang, HTML-example and Status. In order to edit an example, you need to look at its ID, and then find the appropriate place in the spreadsheet. For example, for the ID 8.2_A_template-mwe, you should look for the lines with ID-section 8.2 in the first column (towards the bottom of the spreadsheet). Then look for ID-order A in the second column. Check that the third column contains the ID-name template-mwe.
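The same lookup can be done automatically on a local copy of the spreadsheet. The sketch below assumes a CSV export whose header row uses exactly the six column names listed above; the sample rows and the function name find_examples are hypothetical and only serve as an illustration.

```python
import csv
import io

# Hypothetical excerpt of a CSV export; real data may differ.
SAMPLE_CSV = """ID-section,ID-order,ID-name,lang,HTML-example,Status
8.2,A,template-mwe,EN,MWEs with <lex>their lexicalized components</lex> in English are indicated like this.,
8.2,A,template-mwe,FR,MWEs with <lex>their lexicalized components</lex> in French are indicated like this.,
"""


def find_examples(csv_text, section, order, name):
    """Return a {language code: HTML-example} mapping for one example ID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["lang"]: row["HTML-example"]
        for row in reader
        if row["ID-section"] == section
        and row["ID-order"] == order
        and row["ID-name"] == name
    }


examples = find_examples(SAMPLE_CSV, "8.2", "A", "template-mwe")
for lang, html_example in sorted(examples.items()):
    print(lang, html_example)
```

Matching on all three ID columns mirrors the manual procedure: section first, then order, then a final check on the name.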

You will then see a sequence of examples, one for each language. The examples in the template above were collected from this spreadsheet. The rest of this page will teach you how to add your own examples to this spreadsheet.

When adding examples for your own language, we advise you to always start by copying an example that has already been filled in for another language, and then adapting it to your language. Remember that you should not translate an example, but rather find an example of the target phenomenon in your language, regardless of whether it is a direct translation or not. Therefore, before entering an example in the spreadsheet, you should always check its context using its ID. A quick way to do this is to search (Ctrl+F) for the ID of an example in the full-text version of the guidelines (where the ID button is on).

If we notice something wrong or suspicious with your example, we may correct it (e.g. you forgot a closing <lex> tag). If we cannot correct the example, we will ask you to check it by using the last column of the spreadsheet, Status.
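To catch such problems before entering an example, you can check locally that its tags are balanced. The sketch below is only an assumption about how such a check could be done, not the procedure we actually use: it wraps the cell content in an artificial root element and lets Python's standard XML parser report unclosed or mismatched tags. The function name check_example is hypothetical. Note that this check is strict: a bare & or an HTML entity such as &nbsp; would also be reported as an error.

```python
import xml.etree.ElementTree as ET


def check_example(html_example):
    """Return None if the tags are well-formed, else the parser's message."""
    try:
        # The wrapper element gives the fragment a single root, as XML requires.
        ET.fromstring(f"<example>{html_example}</example>")
    except ET.ParseError as error:
        return str(error)
    return None


good = "He will <lex>take</lex> a <lex>shower</lex>"
bad = "He will <lex>take a shower"  # the closing </lex> is missing

for candidate in (good, bad):
    problem = check_example(candidate)
    print("OK" if problem is None else f"Problem: {problem}")
```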

If you think that a phenomenon is not relevant for your language or that examples are not needed for a given phenomenon, just leave the corresponding cell empty.

Examples with tags

If you have not done it yet, open the spreadsheet and look for the entry 8.2_A_template-mwe. Let us analyse the English example (look for EN in the fourth column). The fifth column should read as follows:

MWEs with <lex>their lexicalized components</lex> in English are indicated like this.

As you can see, this is exactly the same text that was shown in the template above, except that the lexicalized components are surrounded by the tags <lex> and </lex>. When writing an example, you will often have to use XML tags. We describe below the most important ones.

Bold: you should surround lexicalized components with the tags <lex> and </lex>. For example, consider the code He will <lex>take</lex> a <lex>shower</lex>. This code is presented as follows:

  • He will take a shower

Red: By default, all examples are typeset using the language's color. Sometimes, examples contain counter-examples, that is, something that looks like a VMWE but that should not be annotated. The <nmwe> and </nmwe> tags can be used to represent these non-MWEs, which will be shown in red. For example, the code <nmwe>This is not an MWE</nmwe> yields the following:

  • This is not an MWE

Underlining: Some examples use underlining to focus on some of the words. This can be done with the tags <u> and </u>. For example, the code <nmwe>This is <u>not</u> an MWE</nmwe> yields the following:

  • This is not an MWE

Gloss icon: You should also provide English glosses and translations for your examples. Glosses and translations should always be provided in English, and never in another language. Glosses must be surrounded by the tags <gl> and </gl>. Translations must be surrounded by <trans> and </trans>. English examples can also use the tag <trans> to indicate the meaning of an idiomatic expression. For example, the code <lex>défendre</lex> son <lex>bifteck</lex>  <gl>defend one's beefsteak</gl>  <trans>to defend one's interests</trans> generates the example below. Notice that the gloss and translation are only shown when the user hovers over the gloss icon. For consistency, you should always follow this order: original text  <gl>the gloss</gl>  <trans>the translation</trans>.

  • défendre son bifteck defend one's beefsteak to defend one's interests

Normal: Some examples are presented followed by an explanation, in normal font (black color). This is done by using the tags <n> and </n>. For example, the code some words <n>→ further details</n> generates this:

  • some words → further details

Newline: Sometimes, one may want to add several examples for a single phenomenon in the same language. If they are rather long, they should be presented on separate lines using the tag <br/>. This tag is special as it does not come in pairs: you only write one tag with the slash at the end (technically, it is an empty XML element). For example, the code example 1 <br/> example 2 <br/> example 3 will be rendered as follows:

  • example 1
    example 2
    example 3

Inside normal text, you may also use tags such as <i> (italics), <strong> (bold), as well as other HTML tags. If another language is using a given tag for an example, you can use it too. Otherwise, try to stick to the established conventions.
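To see how the tags described above fit together, the following sketch extracts the lexicalized components, the gloss and the translation from one tagged example. It is an illustration under stated assumptions, not the code that generates these pages: the function name summarize_example is hypothetical, and the example is assumed to be well-formed XML once wrapped in a root element.

```python
import xml.etree.ElementTree as ET


def summarize_example(html_example):
    """Collect the <lex> components, the <gl> gloss and the <trans> translation."""
    root = ET.fromstring(f"<example>{html_example}</example>")
    return {
        # Text of every <lex> element, in document order.
        "lexicalized": ["".join(el.itertext()) for el in root.iter("lex")],
        # First gloss and translation, or None when the tag is absent.
        "gloss": root.findtext(".//gl"),
        "translation": root.findtext(".//trans"),
    }


code = (
    "<lex>défendre</lex> son <lex>bifteck</lex>  "
    "<gl>defend one's beefsteak</gl>  "
    "<trans>to defend one's interests</trans>"
)
summary = summarize_example(code)
print(summary["lexicalized"])
print(summary["gloss"])
print(summary["translation"])
```

A summary of this kind makes it easy to verify that the intended words, and only those, are marked as lexicalized, and that no gloss or translation has been forgotten.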


Section 8.3

Annotation platform FLAT

The annotation will be performed using the online annotation platform FLAT.


Section 9

Glossary

Cranberry word

A cranberry word is a token that does not have the status of a stand-alone word, has no proper distribution, and no stand-alone meaning, but it may have a syntactic category and an inflection paradigm. It only occurs in a particular expression (or a closed list of expressions) and can never be found in different contexts, as the underlined words below:

  • jemandem Angst einjagen to-someone chase-in fear to frighten someone
    jemandem einen Besuch abstatten
  • to go astray
  • se mettre martel en tête SELF put a hammer in head to worry a lot
  • odsądzić kogoś od czci i wiary to refuse honor and faith to someone to drag sb's name through the mire/mud, to damage someone's reputation by saying insulting things about them
  • biti si kvit owe nothing to somebody; each party got what it deserved/asked for

Candidate VMWE

A candidate VMWE is a group of tokens that seems to have some idiosyncrasy of the type listed in the MWE definition. However, further tests are required to decide whether it is to be annotated as a true VMWE or whether it is, instead, a false alarm. The lexicalized elements of candidate VMWEs are highlighted in bold.

Syntactic operator

A syntactic operator is a verb that only bears the grammatical features (person, number, tense and mood) but adds no semantics to the complement. This definition is more restricted than the traditional notion of a light verb. Notably, aspectual light verbs (which add aspectual semantics to the complement), as in to start a walk, to give courage, are not considered operators. Operators are typical head verbs of light-verb constructions:

  • eine Entscheidung treffen to make a decision
    Angst haben to have fear
    ein Verbrechen begehen to commit a crime
  • to make a decision
    to have fear
    to commit a crime
  • oddać hołd to give-back tribute to pay tribute
  • priti v poštev to come into consideration to consider

Collocation

A collocation is a word co-occurrence whose idiosyncrasy is of statistical nature only. Collocations are not considered VMWEs in this task:

  • eine Anfrage beantworten to answer a request, das Diagramm zeigt the diagram shows, mit einem Bus fahren to take a bus
  • the graphic shows
    drastically drop
  • zalać rynek to flood the market to dominate the market
  • občutno zmanjšati significantly reduce
    drastično zmanjšati drastically reduce

Canonical form

The canonical form of a candidate VMWE is a prototypical verbal phrase preserving the same meaning.

  • the canonical form of das Herz welches er bricht the heart which he breaks is er bricht ihr das Herz he breaks her heart
    the canonical form of Wortbruch word-break a promise which has not been kept is Wort brechen to break the word not to keep a promise
  • the canonical form of the heart which he broke is he broke (her) heart
    the canonical form of making an impression on him is (she) makes an impression on him
  • the canonical form of decyzje, które podjął decisions which he took is podjął decyzję he took a decision
  • the canonical form of decisão nunca antes tomada decision never before taken is tomar uma decisão take a decision
  • the canonical form of odločitev, ki jo je sprejel decisions which he took is sprejeti odločitev he took a decision

Reflexive clitics

Reflexive clitics are a special type of object pronoun that refers to the subject of the verb. See the guidelines of IReflV category for more details. In English, the reflexive is expressed as a suffix -self appended to object pronouns. However, many languages have special reflexive pronouns, which are a relatively small closed class of words:

  • mich, dich, sich, uns, euch
  • me, te, se, nous, vous
  • mi, ti, si, ci, vi
  • się, sobie
  • me, te, se, nos, vos
  • se, si

Particles

Particles are hard to distinguish from homographic prepositions:

  • ich schlage vor allen zu verzeihen I propose to forgive everyone
    ich schlage vor allen Dingen die Sahne I mix prior to anything the cream
  • to get up a petition
    to get up a hill
  • n.a.
  • n.a.
  • sem za njen predlog I support her proposal
    sem za hišo I'm standing behind the house

The fundamental property to capture is that a preposition governs a prepositional group, while a particle functions as an adverbial. In some languages particles can also be homographic with verbal prefixes:

  • das Schild um|fahren to drive over the sign
    den See umfahren to drive around the lake

Most tests discriminating particles from prepositions and prefixes are language-specific and should be proposed by the individual language team. See the guidelines on particles for more details.

Unexpected change in meaning

An unexpected change in meaning, signaled by the # (hash) sign, is a phenomenon referred to in generic and category-specific tests, based on the notion of inflexibility. Inflexibility is verified by attempting a regular modification which yields an unexpected acceptability or meaning shift, that is, beyond what would be expected by the initial modification. In order to judge whether a shift in acceptability or meaning is unexpected, one can try to apply the same modification to a similar compositional construction, using analogy. For example, book and word have synonyms including notebook/novel/volume/publication and term/expression/headword, respectively. However, while the slight shift in the meaning of book is compositionally reflected in:

  • Ich gebe dir mein Buch I give you my book Ich gebe Dir mein(e) Publikation/Doktorarbeit/Kapitel/Novelle/Ausgabe I give you my publication/thesis/chapter/novel/edition
  • I give you my book I give you my notebook/novel/volume/publication
  • daję ci książkę I give you a book daję Ci zeszyt/powieść/tom/publikację I give you a notebook/novel/volume/publication
  • dam ti knjigo I give you a book dam ti zvezek/roman/publikacijo I give you a notebook/novel/publication

the same does not hold for:

  • Ich gebe Dir mein Wort I give you my word, i.e. I promise #Ich gebe Dir mein(e) Publikation/Doktorarbeit/Kapitel/Novelle/Ausgabe I give you my publication/thesis/chapter/novel/edition
  • I give you my word #I give you my notebook/novel/volume/publication
  • daję ci słowo I give you a word I give you my word #daję Ci wyraz/sylabę/czasownik I give you a word/syllable/verb
  • dam ti besedo I give you a word I give you my word #dam ti izraz/zlog/glagol I give you a word/syllable/verb

I.e. the latter replacement produces an unexpected change of meaning that goes beyond the semantic difference between the original and the replaced word. Thus, Test 2 [LEX] applies and:

  • jmd. sein Wort geben to give one's word to s.o.
  • to give one's word to someone
  • dać komuś słowo to give someone a word to give one's word to someone
  • n.a.

is a VMWE.

Similarly, Test 22 [V+PART-DIFF-SENSE] refers to an unexpected change in meaning of the verb stemming from the addition of the particle. We do so by checking if the situation described by the verb with the particle implies the one described without the particle:

  • Ich fange das Buch an I begin to read the book does not imply Ich fange das Buch I catch the book
    Ich lege das Buch auf dem Tisch ab I put down the book on the table implies Ich lege das Buch auf den Tisch I put the book on the table
  • to check in upon arrival does not imply to check upon arrival (it is a VPC)
    to look up into the sky implies to look into the sky (it is not a VPC)
  • n.a.
  • n.a.

Ungrammaticality

Ungrammaticality of an utterance is its non-conformity to the syntactic or semantic rules of the language. We assume that judging ungrammaticality is a basic competence of a native speaker of a language. Ungrammatical examples are signaled with * (star).

Section 10

Contact

These guidelines were written by many authors. If you have questions, comments or suggestions, you can send them to the shared task mailing list or contact one of the organisers. They will then forward your message to the technical team, guidelines editors, language group leaders or language leaders.

Email addresses are available on the who-is-who document.