Annotation guidelines
PARSEME shared task on automatic identification of verbal MWEs - edition 1.1 (2018)


New edition of the shared task: these are the guidelines for edition 1.1 (2018). For the most up-to-date version, please check the guidelines for edition 1.2 (2020).

Welcome to the official annotation guidelines of the PARSEME shared task 1.1 on verbal MWE identification!

For previous versions, you can check the index of versions. See also what is new in the guidelines version 1.1 as compared to version 1.0.

Here, you'll find detailed definitions, examples and linguistic tests to guide your decision as to whether a given combination in your language is a verbal multiword expression. Use the table of contents on the left to navigate between sections and the header buttons to show/hide examples.

In addition to these general guidelines, language teams may also provide extra documentation, like lists of borderline cases and decisions taken concerning them. They should all be compatible with these general guidelines.

If you spot errors or if something remains unclear after reading the guidelines, please contact us and we'll do our best to correct the problems.

Authors and contributors (alphabetical order)

Archna Bhatia, Claire Bonial, Marie Candito, Fabienne Cap, Silvio Cordeiro, Vassiliki Foufi, Polona Gantar, Voula Giouli, Carlos Herrero, Uxoa Iñurrieta, Mihaela Ionescu, Alfredo Maldonado, Verginica Mititelu, Johanna Monti, Joakim Nivre, Mihaela Onofrei, Viola Ow, Carla Parra Escartín, Carlos Ramisch, Renata Ramisch, Monica-Mihaela Rizea, Manfred Sailer, Agata Savary, Nathan Schneider, Ivelina Stoyanova, Sara Stymne, Ashwini Vaidya, Veronika Vincze, Abigail Walsh.


Section 1

Definitions and scope

In this shared task, we aim at identifying verbal Multiword Expressions (VMWEs) in running texts in about 20 languages from several language families. VMWEs are of particular interest to the PARSEME COST action since they frequently introduce discontinuity and long-distance dependency issues, which are central to deep parsing and to other Natural Language Processing tasks.

This document defines the annotation scope and puts forward a classification of VMWEs together with linguistic tests for their identification and categorization.


Section 1.1

Notation

The notational convention used throughout the document is the following:

  • Italic is used to display example sentences and expressions.
  • Bold is used to highlight the lexicalized components of a candidate VMWE inside an example (positive or negative).
  • Underline is used to focus the reader's attention on the important part of an example.
  • An asterisk (*) precedes ungrammatical examples.
  • A hash (#) precedes examples where a standard modification yields unexpected meaning shifts with respect to the original expression.
  • Different colors are used to display examples:
    • Red is used for counter-examples, that is, expressions which look like VMWEs but are not VMWEs, whatever the language.
    • According to the language, different colors are used for other examples, that is, positive examples of the phenomenon being discussed:
      • Shades of green are used for positive examples in Germanic languages.
      • Shades of blue are used for positive examples in Romance languages.
      • Shades of orange are used for positive examples in Slavic languages.
      • Shades of pink are used for positive examples in other language families.
  • Examples are preceded by the 2-letter language code in parentheses.
  • Examples can be shown and hidden using the toggle buttons in the header.

Section 1.2

Words and tokens

While the definition of an MWE inherently relies on the notion of a word, manual annotation and automatic identification of VMWEs in our task is performed on texts which are automatically tokenized. It is therefore important to understand the distinction between words and tokens in the context of VMWEs.

A word is a linguistically (notably semantically) motivated unit. The detection of words is, thus, language-dependent and annotation experts should have a clear idea of how to define it for their own language (even if this definition proves hard in general).

A token is a technical and pragmatic notion, defined according to more or less linguistically motivated clues and depending on the particular tokenization tool at hand. Note that the notion of a token is ambiguous in NLP: it can also mean an individual occurrence of a linguistic unit, as opposed to a type, i.e. the set of all surface realisations of that unit. In these guidelines, we do not use the term in this second sense.

Tokens should ideally be as close as possible to words. In practice, however, since (automatic) tokenization is a hard task, the relation between tokens and words is not always one-to-one. The following cases occur:

  • A token coincides with a word:
    • вземам, решение, наяве, бял, на, се, д-р
    • mít, hlad, se, úžas
    • einen, Spaziergang, machen, Überraschung
    • κάνω, μία, βόλτα, έκπληξη
    • take, a, walk, astonishment
    • dar, un, paseo, sorpresa, maldecir, bienvivir
    • ibilaldi, bat, egin, ezuste
    • من، کتاب، دوست
    • faire, une, promenade, étonnement
    • tóg, siúl, ionadh
    • napraviti, jedan, šetnja, začuđenost
    • tesz, egy, séta, meglepetés
    • mengambil, sebuah, berjalan, heran
    • fare, una, passeggiata, sorpresa
    • 取る, その, 歩く, 驚き
    • ferħ, libes, sabiħ
    • robić to do, na on, dokładność precision
    • dar, uma, caminhada, surpresa
    • face, o, plimbare
    • iti, na, en, sprehod, začudenost
    • gå, på, promenad, förvåning
    • 采取, 一个, 步行, 惊愕
  • Several tokens build up one word, as in abbreviations, possessive markers, words with "accidental" separators, inflected or derived forms of foreign names, etc. In this case we speak of a multitoken word (MTW). In the following examples, the pipe symbol '|' indicates token separation:
    • т|.|н|. etc.
      год|. year
    • z|.|B|. for instance
      Wie geht|'|s How goes it How are you
    • κ. κύριος Mister
      υπΔρ υποψήφιος διδάκτορας PhD candidate
    • M|. Mister
      pp|. pages
      Pandora|'|s
    • A|/|A|. a la atención de for the attention of
      a|/|f|. a favor in favor
      Rte|. remitente sender
    • etab|. eta abar and so on
    • می|-|روم، آیت|-|الله، کتاب|-|ها
    • aujourd|'|hui today
    • danas today
    • időjárás|-|jelentés weather forecast
    • vice|-|presidente vice-president
    • libs|et she wore
    • Chomsky|'|ego of Chomsky
      SMS|-|ować to write an SMS
    • vice|-|presidente vice-president
    • prim|-|ministru prime minister
      d|-|voastră polite "you"
    • g|. Mister
      str|. pages
      le|-|to
    • EU|:|s EU's
  • One token can contain several words, as in contractions and compounds. In this case we speak of a multiword token (MWT). The precise word forms cannot always be straightforwardly deduced from the MWT containing them, and vice versa, as in don't, della, du, etc. (see also the representation of MWTs in Universal Dependencies):
    • вагон-ресторант train carriage+restaurant train buffet
    • Schulaufgabe = Schule+Aufgabe school+exercise homework
      Apfelbaum = Apfel+Baum apple tree apple tree
    • στου = σε+του at+the.GEN
      στον = σε+τον at+the.ACC
    • don't = do+not
    • del = de+el of the from/of the
      al = a+el to+the to the
      compárese = compare+se compare SE_PARTICLE be it compared
      suicidarse = suicidar+se suicide SELF to commit suicide
    • sudurluze = sudur+luze nose+long long-nosed
      jarleku = jar(ri)+leku sit+place seat
    • کتابش=کتاب+ش
    • du = de+le from the
    • sa = i+an in the
      b'fhearr = ba+fhearr be.COND better prefer
    • uzbrdo = uz+brdo uphill
    • della = di+la of the
    • Białymstoku=Białym+stoku white+slope Białystok.INST (a city name)
      robiłem = robił+em do.PAST.3.SG.M+be.1.SG.AGL I did
      żeśmy = że+śmy that+be.1.PL.AGL that-we
    • neles = em+eles on them
    • într-o = într-+o in a
    • nanj = na+njega on him
    • arvsmassa = arv+massa genetic stock
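The difference between a multitoken word and a multiword token can be made concrete with a small data sketch. The Python below is illustrative only: it assumes the pipe notation used in the examples above for MTWs, and a simplified version of the Universal Dependencies CoNLL-U convention for MWTs, in which the multiword token gets a range ID (e.g. `1-2`) followed by one line per word. Only the ID and FORM columns are shown, and the function names are hypothetical.

```python
def mtw_tokens(annotated: str) -> list[str]:
    """Split a multitoken word written with '|' separators (e.g. "z|.|B|.") into its tokens."""
    return annotated.split("|")


def mwt_lines(surface: str, words: list[str], first_id: int) -> list[str]:
    """Simplified CoNLL-U-style lines (ID and FORM only) for one multiword token.

    The token line gets a range ID such as "1-2"; each word it contains gets its own ID.
    """
    last_id = first_id + len(words) - 1
    lines = [f"{first_id}-{last_id}\t{surface}"]
    for offset, word in enumerate(words):
        lines.append(f"{first_id + offset}\t{word}")
    return lines


# One word, several tokens (MTW): 1 word, 4 tokens
print(mtw_tokens("z|.|B|."))  # ['z', '.', 'B', '.']

# One token, several words (MWT): Spanish 'del' = de + el
for line in mwt_lines("del", ["de", "el"], first_id=1):
    print(line)
```

The range line keeps the surface token intact while the following lines expose the syntactic words, which is what allows annotation to refer to words even when the tokenizer does not split them.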

While a VMWE always contains at least two words, the relation between VMWEs and tokens can be twofold:

  • A VMWE contains several tokens, whether each of them coincides with a word or not:
    • вземам решение make a decision (2 words, 2 tokens)
      прочитам от корица до корица to read from cover to cover (5 words, 5 tokens)
    • eine Rede halten (2 words, 2 tokens) a speech hold to give a speech
      wie geht's (2 words, 4 tokens) how goes it how are you
    • δίνω τον λόγο μου (3 words, 3 tokens) give the speech to promise
      παίζω στα δάχτυλα (3 words, possibly 4 tokens) play in-the fingers know very well
    • to take a walk (2 words, 2 tokens)
      to open Pandora's box (3 words, possibly 5 tokens)
    • dar un paseo (2 words, 2 tokens) to give a walk to take a walk
      dar por sentado (3 words, 3 tokens) to give for seated to take for granted
      irse de rositas (3 words, 4 tokens) to go_self of little_roses to get off scot free
    • ibilaldia egin (2 words, 2 tokens)
    • دستور داد (2 words, 2 tokens)
    • b'fhearr liom (2 words, 4 tokens) I would prefer
    • dignuti ruke to raise hands to give up (2 words, 2 tokens), otvoriti Pandorinu kutiju open Pandora's box to face with problems (3 words, 3 tokens)
    • sétát tesz to take a walk (2 words, 2 tokens)
    • tenere un discorso (2 words, 2 tokens) hold a speech to give a speech
      cavalcare l'onda (3 words, 4 tokens) ride the wave ride the wave
    • kien idur fuq il-fatt turns on the fact
    • robi z igły widły make.3.SG a pitchfork out of a needle he makes a mountain out of a molehill (4 words, 4 tokens)
      robił|em z igły widły made.3.SG.M1+be.1.SG.AGL a pitchfork out of a needle I made a mountain out of a molehill (4 words, 5 tokens)
    • dar uma caminhada to give a walk (2 words, 2 tokens)
      cair de pára-quedas to fall with parachute to arrive unprepared in the middle of a situation (3 words, possibly 5 tokens; under the new orthography rules this word is written 'paraquedas', but the old spelling may still be found in annotated texts)
      queixar-se-ia complain-self-would would complain (2 words, possibly 5 tokens)
    • a da ortul popii to die (3 words, 3 tokens)
    • klicati jelene to call deer to vomit (2 words, 2 tokens)
      vreči puško v koruzo throw a rifle in the corn to give up (4 words, 4 tokens)
    • hålla ett tal (2 words, 2 tokens) hold a speech to give a speech
  • A VMWE contains one (multiword) token:
    • no example found for Bulgarian
    • vorbereiten to pre-arrange to prepare
      anfangen at-catch to begin
    • έδωσα-πήρα gave-1SG took-1SG I tried hard
    • to pretty-print
    • suicidarse suicide_self to commit suicide
    • n.a.
    • court-circuiter to short circuit
    • pripremiti to pre-arrange to prepare
    • kinyír out.cut to kill
    • corto-circuitare to short circuit, suicidarsi suicide_self to commit suicide
    • no example found for Polish
    • queixar-se-ia complain-SELF-would would complain
    • a se-ndura RCLI.ACC-have.the.heart to have the heart
    • pripraviti to pre-arrange to prepare
    • klargöra clear-make clarify, påpeka on-point point out

Note finally that multitoken words are not considered verbal MWEs since they contain one (multitoken) word only:

  • no example found for Bulgarian
  • ??
  • αερολογώ air+talk to talk aimlessly
  • n.a.
  • odolustu blood+empty to bleed
  • SMS-ati to write an SMS
  • anteporre to put + in front of
  • SMS-ować to write an SMS
  • pós-datar to post-date
  • a binedispune well-dispose to cheer up
  • SMS-jati to write an SMS
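Word and token counts such as "(4 words, 5 tokens)" can be derived mechanically once each word of a candidate is aligned with the tokens it spans. The following sketch assumes a hypothetical alignment format, a list of (word, token indices) pairs, in which a multitoken word spans several indices and the words of a multiword token share one index. It also encodes the requirement that a VMWE contain at least two words, which is why a multitoken word such as SMS-ować is excluded.

```python
# Hypothetical alignment: each word is paired with the indices of the tokens it spans.
Word = tuple[str, tuple[int, ...]]


def count_words_and_tokens(candidate: list[Word]) -> tuple[int, int]:
    """Return (number of words, number of distinct tokens) for a candidate VMWE."""
    token_ids: set[int] = set()
    for _word, spans in candidate:
        token_ids.update(spans)
    return len(candidate), len(token_ids)


def has_enough_words(candidate: list[Word]) -> bool:
    """A VMWE always contains at least two words, whatever its number of tokens."""
    words, _tokens = count_words_and_tokens(candidate)
    return words >= 2


# robił|em z igły widły: one word ('robiłem') split into two tokens
robilem = [("robiłem", (0, 1)), ("z", (2,)), ("igły", (3,)), ("widły", (4,))]
# vorbereiten: two words ('vor', 'bereiten') inside one multiword token
vorbereiten = [("vor", (0,)), ("bereiten", (0,))]
# SMS|-|ować: one multitoken word, hence not a VMWE candidate
sms_owac = [("SMS-ować", (0, 1, 2))]

print(count_words_and_tokens(robilem))      # (4, 5)
print(count_words_and_tokens(vorbereiten))  # (2, 1)
print(has_enough_words(sms_owac))           # False
```

Counting distinct token indices rather than summing span lengths is what lets the same function handle both directions of the word/token mismatch.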

Whenever the distinction between a word and a token is judged by a particular language team as hard to tackle, a possible option is to consider these two notions equivalent for the needs of this shared task.


Section 1.3

Verbal Multiword expressions

Multiword expressions (MWEs) are (continuous or discontinuous) sequences of words with the following compulsory properties:

  • They show some degree of orthographic, morphological, syntactic or semantic idiosyncrasy with respect to what are considered the general grammar rules of a language. Collocations, i.e. word co-occurrences whose idiosyncrasy is of a statistical nature only (e.g. the graphic shows, drastically drop), are not annotated.
  • Their component words include a head word and at least one other syntactically related word. Most often this relation is a (direct or indirect) syntactic dependency, but it can also be, e.g., coordination. Depending on the category of the head word, the whole MWE can be nominal, adjectival, prepositional, verbal, sentential, etc.
  • At least two components of such a word sequence have to be lexicalized. In this task we only annotate the lexicalized components and ignore open slots.
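Because only lexicalized components are annotated, an open slot that happens to sit inside a VMWE stays unmarked, even when this makes the annotation discontinuous. The sketch below illustrates this for took this to heart, where this fills an open slot. It loosely mimics a per-token MWE column in which the first lexicalized token carries an identifier and a category, later ones carry the identifier alone, and all other tokens get `*`; treat the exact tag strings and the category label as assumptions rather than as the official file format.

```python
def mwe_column(tokens: list[str], lexicalized: set[int], mwe_id: int, category: str) -> list[str]:
    """Tag only the lexicalized tokens of one VMWE; all other tokens (including open slots) get '*'."""
    first = min(lexicalized)
    column = []
    for i in range(len(tokens)):
        if i not in lexicalized:
            column.append("*")  # not part of the VMWE, or an open slot
        elif i == first:
            column.append(f"{mwe_id}:{category}")  # first lexicalized token: ID and category
        else:
            column.append(str(mwe_id))  # further lexicalized tokens: ID only
    return column


tokens = ["She", "took", "this", "to", "heart"]
# 'this' is an open slot (one can also take other things to heart), so it is not annotated.
print(mwe_column(tokens, {1, 3, 4}, mwe_id=1, category="VID"))
# ['*', '1:VID', '*', '1', '1']
```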

Probably the most salient property of MWEs is semantic non-compositionality. In other words, it is often impossible to deduce the meaning of the whole unit from the meanings of its parts and from its syntactic structure. For instance, while it is easy to interpret phrases like to kick the ball or to spill some water from the words that compose them, it is almost impossible to guess, without knowing it beforehand, that to kick the bucket means 'to die' and to spill the beans actually means 'to reveal a secret'.

However, as non-compositionality is a subjective notion, we use inflexibility as a proxy in the tests. Our underlying hypothesis is that (verbal) MWEs have some degree of semantic non-compositionality that implies limited flexibility.

Verbal MWEs (VMWEs) are simply multiword expressions whose syntactic head in the prototypical form is a verb.


Section 1.4

Syntactic variants of VMWEs

VMWEs occurring in a corpus can have various syntactic structures. Since the linguistic tests are structure-driven (cf. e.g. structural tests), this variation must be neutralized before the tests are applied. This section introduces the definitions needed for that purpose.

Prototypical forms

A (candidate) VMWE in its prototypical form (if it exists) is a verbal phrase in active voice whose head verb is in a finite form and whose other lexicalized components depend either on the verb or on another lexicalized component. The VMWE can also contain coordinated verbs. These phrases can be:
  • Partly saturated, where only some of their arguments are lexicalized:
    • пиля нечии нерви scrape someone's nerves to annoy someone
      вземам трудно решение make a difficult decision to make a difficult decision
    • traf eine Entscheidung made a decision
      nahm sich das zu Herzen took this to heart
    • παίρνω μία δύσκολη απόφαση take-1SG a-FE.SG.AC difficult-FE.SG.AC decision-FE.SG.AC to make a difficult decision
      παίρνω τα μέτρα μου take-1.SG the-NE.PL.AC measures-NE.PL.AC my-1.SG.GE.POSS to take precautions
      γράφω κάποιον στα παλιά μου τα παπούτσια write-1.SG someone to-the-NE.PL.AC old-NE.PL.AC my-1.SG.GE.POSS shoes-NE.PL.AC to ignore someone
    • made a decision
      break her heart
      took this to heart
      could take this to heart
      would have been making a decision
      could have made a different decision
    • tomó una decisión took.he/she a decision he/she made a decision
      le hubiera roto el corazón him/her would_have broken.he/she the heart he/she would have broken his/her heart
      se lo tomaría muy a pecho him/her it would_take very to breast he/she would take it deeply to heart
    • erabakia hartu decision take make the decision
      erabaki bat hartu decision one take make a decision
      erabaki garrantzitsuak hartzen ari ziren decision importants taking they-were they were making important decisions
    • a eu du courage has had some courage had courage
    • déan comhairle make direction make a decision
      déan dearmad ar rud do forgetfulness on something forget something
    • podnijeti ostavku to submit resign to resign, uzeti što k srcu to take something to heart to think about something seriously
    • sétát tesz to take a walk
    • prendere una decisione take a decision make a decision
      spezzare il cuore break the heart break the heart
      prendere a cuore take to heart take to heart
    • podjął niejedną trudną decyzję took.3.SG not-one hard decision he took several hard decisions
    • eles deram uma caminhada they gave a walk they took a walk
    • a trece ceva cu vederea to pass sth with sight.the.ACC to overlook
      a trece ceva sub tăcere to keep something under silence.ACC to keep quiet about something
    • zlomiti komu srce to break someone's heart to hurt someone's feelings badly
      vzeti si k srcu take something to heart to think about something seriously
      bi si lahko to vzel k srcu could take this to heart could think about this seriously
      bo v pomoč will be in help will be helpful
    • fattade ett beslut made a decision
  • Partly saturated, where the lexicalized arguments include the subject:
    • излиза ми име appears for-me.DAT name a name sticks for/to me
      чашата на търпението ми прелива glass.DET of patience my.POS overflows my patience runs out
    • ein Vöglein hat mir gezwitschert a little bird has to me twittered a little bird told me
    • μου έφυγε ο τάκος me.GEN left the chock I was very tired
    • a little bird told someone, the problem lies in something
    • me lo ha dicho un pajarito to_me it has said a little_bird a little bird has told me
    • txoritxo batek kontatu dit little-bird one told me a little bird told me
    • problem leži u čemu the problem lies in something
    • me lo ha detto un uccellino to me it told a little bird a little bird told me
    • mina komuś zrzedła the face someone.DAT thinned one lost one's confidence
    • a sua hora chegou your time has arrived your time has come
    • a mustra cugetul (pe cineva) to chide consciousness-the (PE_Acc somebody) the consciousness chides (PE somebody)
    • srce pade v hlače komu (someone's) heart drops into the pants one is lacking courage to do something, sekira pade v med komu (someone's) hatchet falls in honey one gets lucky
    • en liten fågel viskade i mitt öra a little bird whispered in my ear a little bird told me
  • Partly saturated, where lexicalized head verbs are coordinated:
    • цъфна и вържа to blossom and give fruit (usually sarcastically) to prosper
    • leben und leben lassen to live and let live to live and let live
    • απορώ και εξίσταμαι wonder.1SG and be-amazed.1SG to wonder
    • drink and drive
    • coser y cantar to_sew and to_sing easy as pie, a piece of cake
    • ikusi eta ikasi see and learn
    • žariti i paliti to stoke and to burn to be powerful, vedriti i oblačiti to brighten and to cloud to be powerful
    • vivi e lascia vivere to live and let live to live and let live
    • pluł i łapał he spat and caught (he) was lazy, (he) did nothing useful
    • pintar e bordar paint and embroider to abuse
    • a tunat și i-a adunat it.has thundered and CL.ACC-it.has gathered birds of a feather flock together
      seamănă, dar nu răsare sow.3SG (homonym of resemble), but not sprout.3SG not to resemble
    • živi in pusti živeti to live and let live to live and let live
    • det knallar och går it trots and walks it is OK / things are reasonably good
  • Fully saturated:
    • пиле не може да прехвръкне bird cannot fly in it is very strictly guarded
    • der frühe Vogel fängt den Wurm the early bird catches the worm the early bird catches the worm
    • το έξυπνο πουλί από τη μύτη πιάνεται the clever bird is-caught-3SG by the nose people who consider themselves clever fail
    • the early bird catches the worm
    • los ojos son el reflejo del alma the eyes are the reflection from_the soul the eyes are the window to the soul
    • aho hertsitik ez da eulirik sartzen mouth shut-from no is fly coming-in a shut mouth catches no flies
    • à quelque chose malheur est bon for something bad luck is good bad experiences may bring unexpected positive effects
    • tá dhá thaobh ar an mbád there are two sides on the boat there are two sides to every story
    • tko rano rani, dvije sreće grabi who early comes, two lucks grabs the early bird catches the worm
    • il mattino ha l'oro in bocca the morning has the gold in mouth the early bird catches the worm
    • nie od razu Kraków zbudowano not at once Cracow was built Rome was not built in a day
      kości zostały rzucone the dice have been thrown alea iacta est
    • quem vê cara não vê coração who sees face doesn't see heart a person can lie/omit his/her feelings
    • se revarsă zorile Refl.Cl.3sg.Acc. flow_out dawns it is getting morning
    • kdor prej pride, prej melje who comes first, grinds first it's on a first-come, first-served basis
      še pes ima rad pri jedi mir even the dog does not want to be disturbed during its meals do not bother people during their meals
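The definition of a prototypical form can be read as a check over a dependency tree. The following sketch assumes a toy UD-like representation (each node has a head index and, for verbs, hypothetical `verb_form` and `voice` attributes) and tests the conditions stated above: the head verb is finite, it is in active voice, and every other lexicalized component depends on the verb or on another lexicalized component. It is a sketch of the definition, not a tool used in the shared task.

```python
from dataclasses import dataclass


@dataclass
class Node:
    form: str
    head: int            # index of the governor; -1 marks the root
    verb_form: str = ""  # e.g. "Fin", "Inf", "Part" (UD-style values, assumed)
    voice: str = "Act"   # "Act" or "Pass" (assumed annotation)


def is_prototypical(sentence: list[Node], lexicalized: set[int], verb: int) -> bool:
    """Check whether the lexicalized components form a prototypical VMWE headed by `verb`."""
    head_verb = sentence[verb]
    if verb not in lexicalized or head_verb.verb_form != "Fin" or head_verb.voice != "Act":
        return False
    # Every other lexicalized component must depend on the verb or on another lexicalized component.
    return all(sentence[i].head in lexicalized for i in lexicalized if i != verb)


active = [Node("she", 1), Node("made", -1, "Fin"), Node("a", 4),
          Node("difficult", 4), Node("decision", 1)]
passive = [Node("important", 1), Node("decisions", 3), Node("were", 3),
           Node("made", -1, "Part", "Pass")]
relative = [Node("the", 1), Node("decision", -1), Node("which", 4),
            Node("we", 4), Node("made", 1, "Fin")]

print(is_prototypical(active, {1, 4}, verb=1))    # True
print(is_prototypical(passive, {1, 3}, verb=3))   # False: participle in passive voice
print(is_prototypical(relative, {1, 4}, verb=4))  # False: 'decision' heads the phrase
```

The passive and relative-clause cases fail the check, which is why such occurrences are handled as meaning-preserving variants of the active, finite clause.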

Meaning-preserving variants

Meaning-preserving variants of a (candidate) VMWE include notably:
  • Verbal expressions with analytical tenses and modals:
    • щях да съм взел важно решение I would have made an important decision
    • έχουν πάρει την απόφαση they have taken the decision they have made a decision
    • they have made a decision
      we are making a decision
    • han tomado una decisión have taken_they a decision they have made a decision
    • ils ont pris une décision they have made a decision
    • mi smo donijeli odluku we have made the decision
    • hanno preso una decisione they have made a decision
    • będzie podejmował niejedną trudną decyzję will take several hard decisions
    • Estou tomando uma decisão I-am taking a decision I am making a decision
      eles haviam tomado uma decisão they had made a decision
    • noi am luat decizia we have made the decision
    • je zlomil komu srce has broken someone's heart has badly hurt someone's feelings
  • Nominal groups (headed by nominal complements from the prototypical VMWEs) with relative clauses:
    • решението, което взех decision.DET which I.PRODROP made the decision which I made
    • Entscheidungen die wir trafen decisions which we made
      Herzen die wir gebrochen haben hearts which we broke have hearts which we have broken
    • η απόφαση που πήραμε the decision which made.1pl the decision we made
      η πλάκα που κάναμε the fun which had.1pl the fun we had
    • decisions which we made
      heart which we broke
    • la decisión que tomamos es arriesgada the decision which took_we is bold the decision we made is bold
      no te imaginas la ilusión que le hizo not you imagine_you the excitement which to_him/to_her made_it you cannot imagine how excited he/she was by it
    • hartutako erabakia taken decision the decision (which was) made
    • les décisions que nous avons prises hier sont bonnes the decisions that we have taken yesterday are good the decisions that we have made yesterday are good
    • odluka koju je donio the decision which he took
    • la decisione che prendemmo the decision which we took the decision which we made
      i cuori che abbiamo spezzato the hearts which we have broken hearts which we have broken
    • decyzje, które podjął decisions which he took
    • a apresentação que Maria fez the presentation that Mary made
    • decizia pe care am luat-o the decision that we have made
    • srca, ki jih je zlomil hearts which he has broken (people's) feelings which he badly hurt
  • Non-finite verbal clauses (with infinitives, participles, gerunds, etc.)
    • взетите вече решения the decisions already made
      броящ звезди who counts stars
      неразбитите все още от него сърца the hearts not yet broken by him
      вземайки това решение (while) making this decision
    • aus|machen out|make open
      eine Entscheidung treffen a decision meet to make a decision
      früher getroffene Entscheidungen earlier made decisions
    • έχω πάρει την απόφασή μου I have made my decision
      παίρνοντας αποφάσεις making decisions
      κάνοντας πλάκα having fun
    • we want to make a decision
      to break one's heart is easy
      we avoid making decisions
      heart breaking the act of breaking hearts
      decisions previously made
      all hearts broken by him
      breaking her heart
      they passed a piece of watered-down legislation
    • tomar una decisión será difícil to_take a decision will be hard it will be hard to make a decision
      solo te tiene que hacer ilusión only to_you has that to_make excitement you only need to be excited about it
      las decisiones tomadas ayer son decisivas the decisions taken yesterday are decisive the decisions made yesterday are final
      el trato hecho previamente será respetado the agreement made previously will_be observed the previously made agreement will be observed
    • erabakia hartu decision take decide, make a decision
      erabaki asko hartu decision many take make many decisions
      emandako pausoak given steps the steps (which were) taken
      pauso-ematea step-give the step taking, the action of taking steps
    • il faut avoir du courage one must to-have some courage to have courage is necessary
      les décisions prises hier sont bonnes the decisions taken yesterday are good the decisions that were made yesterday are good
      les personnes subissant plusieurs opérations sont fragiles the people undergoing several surgical operations are fragile
    • donijeti odluku to bring a decision to make a decision
      unaprijed donesena odluka the decision made in advance
    • prendere una decisione take a decision to make a decision
      decisioni prese precedentemente earlier made decisions
      prendendo questa decisione by making this decision
    • podjąć niejedną trudną decyzję to make several hard decisions
      podejmowanie decyzji the making of a decision decision making
      podejmujący trudne decyzje making hard decisions
    • tomar essa decisão será difícil to-make this decision will-be difficult to make this decision will be difficult
      eu evito tomar decisões precipitadas I avoid to-take decisions precipitated I avoid taking precipitated decisions
      a decisão tomada ontem the decision made yesterday
      a mulher tomando um banho the woman taking a shower
    • lua o decizie to make a decision
      decizia recent luată the decision recently made
      luând decizia making the decision
    • treba ji je bilo zlomiti srce it was necessary to break her heart her feelings had to be badly hurt
      lomljenje src breaking (people's) hearts hurting (people's) feelings
      nedavno zlomljeno srce recently broken heart
  • Diathesis alternation (passive, impersonal, middle, etc.), which some VMWEs (especially LVCs) allow:
    • беше взето важно решение (passive alternation) a decision has been made
    • ελήφθησαν σημαντικές αποφάσεις (passive alternation) were-taken important decisions important decisions were made
      έγιναν σημαντικές αλλαγές (middle alternation) were-made important changes important changes were made
    • important decisions were made (passive alternation)
    • decisiones importantes fueron tomadas durante la reunión (passive alternation) decisions important were taken during the meeting important decisions were made during the meeting
      las decisiones importantes se toman con calma (middle alternation) the decisions important SE_PARTICLE take with calm important decisions are made quietly
    • erabaki garrantzitsuak hartu ziren decision importants taken were important decisions were made (middle, impersonal use)
    • les décisions importantes se prennent en groupe the decisions important SELF take in group important decisions are made collectively (middle alternation)
    • odluka je donesena the decision was taken (passive alternation)
    • le decisioni importanti sono state prese the decisions important are been taken important decisions are made (passive alternation)
      le decisioni importanti si prendono con calma (middle alternation) the decisions important SE_PARTICLE take with calm important decisions are made quietly
    • decyzja została podjęta the decision was taken (passive alternation)
      decyzje nie podejmują się same decisions do not take SELF alone decisions are not taken on their own (middle alternation)
    • decisões importantes foram tomadas ontem (passive alternation) decisions important were taken yesterday important decisions were made yesterday
      tomam-se decisões importantes aqui (middle alternation) take-SELF decisions important here important decisions are made here
    • deciziile au fost luate de consiliu decisions-the have been taken by council the decisions have been taken by the council (passive alternation)
    • njegov piskrček je bil pristavljen k nečemu his little pot was added to something to join something in order to profit from it (passive alternation)
  • Expressions with interposed modifiers (e.g. complex determiners and quantifiers, such as half a dozen, an impressive number of, …):
    • те нанесоха многобройни щети; apply tests to те нанесоха щети
    • η Μαρία πήρε μερικές πολύ σημαντικές αποφάσεις Maria made some very important decisions; apply tests to η Μαρία πήρε αποφάσεις
    • they had taken a significant number of steps; apply tests to they take steps
    • habían tomado un gran número de decisiones had_they taken a great number of decisions they had made a vast number of decisions; apply tests to habían tomado decisiones
    • n.a.
    • donijeli su mnoštvo odluka they have made a lot of decisions
    • hanno preso un significativo numero di provvedimenti; apply tests to preso provvedimenti
    • dostać połowę spadku to receive half of the inheritance; apply tests to dostać spadek receive inheritance
      nie mieć cienia wątpliwości not to have a shadow of a doubt to have no doubt; apply tests to mieć wątpliwość have a doubt
    • ele fez o restante do trabalho he did the rest of the work; apply tests to ele fez o trabalho he did the work
    • ei au luat o serie de importante decizii they have taken a series of important decisions they have made a series of important decisions

Canonical form

For a given (candidate) VMWE occurrence, if its prototypical forms exist and keep the same meaning, these forms are called canonical.

  • a canonical form for решението, което взеха is взема решение
    a canonical form for нанасяйки тежки щети is нанасям щети
  • a canonical form for das Herz welches er bricht the heart which he breaks is er bricht ihr das Herz he breaks her heart
    a canonical form for Wortbruch word-break a promise which has not been kept is Wort brechen to break the word not to keep a promise
  • a canonical form for αποφάσεις που δεν λαμβάνονται is λαμβάνουν απόφαση
    a canonical form for κάνοντάς του μεγάλη εντύπωση is (του) κάνει μεγάλη εντύπωση
  • a canonical form for the heart which he broke is he broke her heart
    a canonical form for making an impression on him is she makes an impression on him
  • a canonical form for una decisión nunca antes tomada a decision never before taken a decision never made before is ellos tomaron una decisión they took a decision they made a decision
  • a canonical form for une décision toujours prise a decision always taken is prend une décision makes a decision
  • a canonical form for odluka koju je donio the decision which he made is donijeti odluku to make a decision
  • a canonical form for una decisione sempre presa a decision always taken is prende una decisione makes a decision
  • a canonical form for decyzje, które podjął decisions which he made is podjął decyzję he made a decision
  • a canonical form for decisão nunca antes tomada decision never before taken decision never made before is eu tomei uma decisão I made a decision
  • a canonical form for decizia pe care el a luat-o is el a luat decizia
  • a canonical form for srca, ki jih je zlomil hearts which he has broken (people's) feelings which he badly hurt is zlomiti komu srce to break someone's heart to hurt someone's feelings badly
    a canonical form for njegov piskrček je bil pristavljen k nečemu his little pot was added to something to join something in order to profit from it is pristaviti svoj piskrček k nečemu to add (one's) pot (to something) to join something in order to profit from it

    For some VMWEs, the only possible forms are not prototypical. For instance, some VMWEs appear in passive voice but never in active voice. If no prototypical form exists or does not preserve the meaning, the given occurrence is considered canonical itself.

    • броят ми се ребрата be counted my ribs to be very skinny but not #броя си ребрата I can count my ribs
    • μου ανέβηκε το αίμα στο κεφάλι me was raised the blood to-the head I was very angry but not #μου ανέβασε το αίμα στο κεφάλι he raised me the blood to-the head
    • está todo el pescado vendido is_it all the fish sold it's all over bar the shouting but not #he vendido todo el pescado have_I sold all the fish I have sold all the fish
    • les carottes sont cuites the carrots are cooked it's too late but not #j'ai cuit les carottes I have cooked the carrots
    • kocka je bačena the dice have been thrown alea iacta est
    • il dado è tratto the dice has been cast the dice is cast
    • kości zostały rzucone the dice have been thrown alea iacta est
    • vivendo e aprendendo living and learning live and learn but not #vivi e aprendi I-lived and I-learned I lived and I learned
    • zarurile au fost aruncate dice-the have been cast the die is cast but not #cineva a aruncat zarurile someone has cast the dice someone has cast the die

    The linguistic tests for identification and categorization of VMWEs are always to be applied to a canonical form of the candidate VMWE. Note that, for brevity, many of the VMWE examples in these guidelines are given in their infinitive variants. Still, it is most often a canonical form that is implicitly meant.

    Non-verbal variants (not annotated)

    Expressions of the syntactic categories mentioned above are considered VMWEs only if they function as verb phrases (in prototypical forms) or nominal phrases (under meaning-preserving variants). Other kinds of variants are not considered VMWEs. This concerns nominalizations morphologically derived from verbs and describing a process, result, state, agent, etc.

    • вземане на решение making a decision
      удар в гърба a stab in the back
      високо вдигната летва highly raised bar high bar
      играч на карти card player
    • Wortbruch word-break a promise which has not been kept
    • η λήψη αποφάσεων the-FE.SG.NOM taking-FE.SG.NOM decisions-GE.PL.GEN deciding
    • a take-off
    • toma de decisiones taking of decisions decision making
      puesta a punto setting to point set-up
    • kontu-hartzea issue taking control (n)
    • la prise en compte the fact of taking into account
      une mise à disposition the fact of making available
    • donošenje odluke the making of decision
    • la messa a disposizione the made to availability the fact of making available
    • zabawa czyimś kosztem a play at someone else's expense derived from bawić się czyimś kosztem to enjoy oneself at someone else's expense
    • a tomada de decisão the making of decisions
      o tomador de decisão the decision-maker
    • luarea unei decizii take-noun.suffix a-genit decision making a decision
    • zlomljeno srce broken heart

    We also do not annotate MWEs containing verbs but functioning as adverbials, adjectives or nominals that are not meaning-preserving variants:

    • може би (it) may be maybe
      разбира се (it) is understood of course
    • Vergiss-mein-nicht forget-me-not forget-me-not
    • τα πάρε-δώσε the.NE.PL.ACC take-2SG.IMP give-2SG.IMP relationship of some type
    • forget-me-not
      a run-down apartment
    • n.a.
    • izen-ematea name giving registration
    • peut-être may-be maybe
      porte-feuille carry-sheets wallet
      couru d'avance run in advance forgone conclusion
    • Bog daj God give hello, hi
    • non-ti-scordar-di-me not-you-forget-of-me forget-me-not
    • zrobić coś za Bóg-zapłać do something for a God-pay to do something for free
    • um saca-rolhas a pull-corks a corkscrew
      um faz-de-conta a make-as-story a make-believe
    • treacă-meargă pass-go let it be

    Particular language teams may decide to extend the annotation scope to these variants. It is recommended in this case to introduce a new category for them (e.g. NVPC: nominal verb-particle constructions) so as to keep the (quasi-)universal categories intact.


    Section 1.5

    Lexicalized components and open slots

    Just like a regular verb, the head verb of a VMWE may have a varying number of compulsory arguments, that is, arguments that must be present in each occurrence of this VMWE. For instance, the direct object and the prepositional complement are compulsory in the VMWE to take someone by surprise.

    Some components of such compulsory arguments may be lexicalized, that is, always realized by the same lexemes. Here, by surprise is lexicalized while someone is not. This definition of a lexicalized component naturally extends to any syntactic type of MWE. Namely, the head of a (nominal, adjectival, prepositional etc.) MWE is lexicalized (always realized by the same lexeme) together with at least one component of at least one of its modifiers. The head verb of a VMWE is always considered lexicalized. When it can be replaced by another verb, as in to make/take a decision, we consider that these are two different VMWEs, although possibly synonymous.

    Conversely, a component of a compulsory argument which can be realized by a free lexeme taken from a relatively large semantic class is called an open slot. In the following VMWE examples (cited after Gross 1994), all having the same syntactic structure NP V NP Prep NP, the lexicalized arguments are given in parentheses after each sentence:

    • Max took the bull by the horns. (the bull, by the horns)
    • The news took John by surprise. (by surprise)
    • Bob took part in the inquiry. (part)
    • Money burns a hole in Bob’s pocket. (money, a hole, in ... pocket)

    Note on terminology: our definition of lexicalization applies to the component words of a VMWE, and not to the whole VMWE. This might be counter-intuitive, given the traditional definition of lexicalization as a diachronic process by which a lexeme (word or phrase) acquires the status of an autonomous lexical unit, that is, "a form which it could not have if it had arisen by the application of productive rules" (Bauer 1983, p. 50, apud Lipka et al. 2004, p. 6). In other words, traditional linguistic studies would use the term "lexicalized" to refer to the whole VMWE, as it has idiosyncratic behavior and thus must be listed in the language's lexicon. Our definition, however, stems from computational linguistics and in particular from the parsing literature, in which lexicalized rules refer to rules containing terminal lexemes attached to non-terminal symbols, and a lexicalized grammar is a grammar in which the rules are lexicalized (Manning and Schütze 1999, p. 417; Jurafsky and Martin 2009, p. 507). In this sense, we regard VMWEs as syntactic subtrees in which some of the nodes are annotated with the corresponding terminal symbols that are always realized by the same lexeme (i.e. the lexicalized components) and others are non-terminal nodes that can be realized by any lexeme taken from a larger class (i.e. the open slots).
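The parsing-oriented notion of lexicalization can be illustrated with a small tree-matching sketch. It assumes a toy dependency representation with Universal Dependencies-style relation names (obj, obl, case); the classes Node and Slot, the function matches and the example trees are hypothetical illustrations, not part of any PARSEME tool or data format. A pattern node with a fixed lemma plays the role of a lexicalized component, and a node without a lemma is an open slot.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A token of a toy dependency tree: lemma, relation to its head, dependents."""
    lemma: str
    rel: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class Slot:
    """A node of a VMWE pattern.

    lemma given -> lexicalized component (always realized by this lexeme);
    lemma None  -> open slot (any lexeme may fill it).
    """
    lemma: Optional[str]
    rel: Optional[str] = None
    children: List["Slot"] = field(default_factory=list)


def matches(slot: Slot, node: Node) -> bool:
    """True if the subtree rooted at `node` instantiates the pattern `slot`."""
    if slot.lemma is not None and slot.lemma != node.lemma:
        return False
    if slot.rel is not None and slot.rel != node.rel:
        return False
    return _match_all(slot.children, node.children)


def _match_all(slots: List[Slot], nodes: List[Node]) -> bool:
    """Match each pattern child to a distinct tree child (extra dependents allowed)."""
    if not slots:
        return True
    first, rest = slots[0], slots[1:]
    for i, node in enumerate(nodes):
        if matches(first, node) and _match_all(rest, nodes[:i] + nodes[i + 1:]):
            return True
    return False


# "to take someone by surprise": take, by, surprise lexicalized; the object is an open slot.
TAKE_BY_SURPRISE = Slot("take", children=[
    Slot(None, "obj"),
    Slot("surprise", "obl", [Slot("by", "case")]),
])

# "The news took John by surprise."
news_took_john = Node("take", "root", [
    Node("news", "nsubj", [Node("the", "det")]),
    Node("John", "obj"),
    Node("surprise", "obl", [Node("by", "case")]),
])

# "Max took John by the hand." The obl dependent is not the lexicalized 'surprise'.
max_took_john = Node("take", "root", [
    Node("Max", "nsubj"),
    Node("John", "obj"),
    Node("hand", "obl", [Node("by", "case"), Node("the", "det")]),
])

print(matches(TAKE_BY_SURPRISE, news_took_john))  # True
print(matches(TAKE_BY_SURPRISE, max_took_john))   # False
```

Dependents that the pattern does not mention (such as the subject) are simply ignored, since the pattern only constrains the lexicalized components and the compulsory open slot.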

    Special cases

    Prepositions have a special status with respect to the notion of lexicalization. In the first, second and fourth examples above, the prepositions by and in are lexicalized since they introduce lexicalized complements (the horns, surprise and pocket). However, in the third example the preposition in introduces an open slot whose meaning compositionally combines with the meaning of the VMWE took part. We say in this case that the preposition is selected by the VMWE, i.e. it belongs to the valency properties of the verb. Selected prepositions were discarded in edition 1.0 of the guidelines and are now re-introduced, experimentally and optionally, via the category of inherently adpositional verbs (IAVs). If the language team decides to take them into account, they are to be considered in the post-annotation step (step 4), i.e. when all other categories have previously been identified and categorized in the given sentence.

    Reflexive clitics in inherently reflexive verbs and possessive pronouns in verbal idioms also have a special lexicalization status (see also the note on more or less frozen determiners). In some languages, the same reflexive clitic or possessive pronoun is used regardless of the person and number, inflecting for case only:

    • смея се laugh se.REFL to laugh
      намирам се find se.REFL to be (somewhere)
    • ??
    • n.a.
    • n.a.
    • n.a.
    • smijem se laugh.1.SG self I laugh
      smiješ se laugh.2.SG self you laugh
      smiju se laugh.3.PL self they laugh
    • n.a.
    • znajduję się find.1.SG.PRES self I find myself
      znajdujesz się find.2.SG.PRES self you find yourself
      znajdują się find.3.PL.PRES self they find themselves
      pójdą na swoje they will go on one's own they will establish their own household
      pójdziemy na swoje we will go on one's own we will establish our own household
    • n.a.
    • n.a.
    • smejim se laugh.1.SG self I laugh
      smejiš se laugh.2.SG self you laugh
      smejijo se laugh.3.PL self they laugh

    In other languages, reflexive clitics and possessive pronouns agree with the subject and the verb:

    • No examples found for Bulgarian.
    • sie wundert sich she wonders self.3.SG she wonders
      ihr wundert euch you.PL wonder.2.PL self.2.PL you wonder
    • Ο Γιάννης έκανε την πλάκα του The John made the fun his John had fun
      Τα παιδιά έκαναν την πλάκα τους The kids made the fun their The kids had fun
    • I will do my best, they will do their best
    • yo me quejo I self.1.SG complain I complain
      te quejas you self.2.SG complain you complain
    • n.a.
    • je me trouve I self.1.SG find I find myself
      tu te trouves you self.2.SG find you find yourself
      je vide mon sac I empty my bag I express my secret feelings
      elle vide son sac she empties her bag she expresses her secret feelings
    • io mi meraviglio I self.1.SG wonder I wonder
      tu ti meravigli you self.2.SG wonder you wonder
    • eu me queixo I self.1.SG complain I complain
      tu te queixas you self.2.SG complain you complain
    • eu mă gândesc I Refl.Cl.1sg.Acc. think I am thinking
      tu te gândești you Refl.Cl.2sg.Acc. think you are thinking

    In this case, the clitic or the pronoun is realized by different lexemes, depending on person and number (and, for some possessive pronouns, gender). Strictly speaking, it is not lexicalized. However, we admit that, regardless of the language, the reflexive clitic and the possessive pronoun are each a unique lexeme (with lemma się, se, sich, etc. or swój, son, one's, etc.) inflecting for person and number. They are thus considered lexicalized in inherently reflexive verbs and verbal idioms.
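Treating the reflexive clitic as one lexeme amounts, in practice, to mapping all its inflected forms to a single lemma before comparing occurrences. Below is a minimal sketch, assuming hand-written form tables for French and German (the tables, reflexive_lemma and irv_key are hypothetical) and assuming the token has already been identified as reflexive, since forms such as nous, vous, uns or euch are also ordinary personal pronouns.

```python
# Hypothetical form tables: every inflected form of the reflexive maps to one lemma.
REFLEXIVE_LEMMAS = {
    "fr": {"me": "se", "m'": "se", "te": "se", "t'": "se",
           "se": "se", "s'": "se", "nous": "se", "vous": "se"},
    "de": {"mich": "sich", "mir": "sich", "dich": "sich", "dir": "sich",
           "sich": "sich", "uns": "sich", "euch": "sich"},
}


def reflexive_lemma(lang: str, form: str) -> str:
    """Return the single lemma shared by all forms of the reflexive.

    Assumes the token is already known to be reflexive. Raises KeyError
    for languages or forms missing from the table.
    """
    return REFLEXIVE_LEMMAS[lang][form.lower()]


def irv_key(lang: str, reflexive_form: str, verb_lemma: str) -> tuple:
    """Lexicon key of an IRV occurrence, independent of person and number."""
    return (reflexive_lemma(lang, reflexive_form), verb_lemma)


# je me trouve / tu te trouves are occurrences of the same IRV, se trouver
print(irv_key("fr", "me", "trouver"), irv_key("fr", "te", "trouver"))
```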


    Section 1.6

    Verbal multiword expressions versus collocations

    Collocations are not considered VMWEs in this task and should not be annotated. However, the boundary between both categories is not always easy to define and should be handled with care.

    We understand collocations as combinations of words whose idiosyncrasy is purely statistical. In other words, tokens in collocations tend to co-occur with each other more often than expected by chance, but they show no substantial orthographic, morphological, syntactic or (most notably) semantic idiosyncrasy. This is what distinguishes collocations from MWEs in this task.

    Note that other authors understand collocations slightly differently. E.g. for Sag et al. (2002), collocations are any statistically significant co-occurrences, i.e. they include all forms of MWE. For Baldwin and Kim (2010), collocations form a proper subset of MWEs. According to Mel'čuk (2010), collocations are binary semantically compositional combinations of words subject to lexical selection constraints, i.e. they intersect with what is here understood as MWEs.
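The purely statistical idiosyncrasy of collocations is usually quantified with an association measure; these guidelines do not prescribe any. As one common example, here is a minimal sketch of pointwise mutual information (PMI) computed from raw counts. The counts below are invented for illustration, and a high score alone never qualifies a combination as a VMWE: the tests in these guidelines still decide.

```python
import math


def pmi(c_xy: int, c_x: int, c_y: int, n: int) -> float:
    """Pointwise mutual information, log2(P(x,y) / (P(x) * P(y))).

    Probabilities are maximum-likelihood estimates from counts: c_xy is the
    number of co-occurrences of x and y, c_x and c_y their marginal counts,
    and n the total number of observations (a simplification that uses the
    same n for the joint and marginal estimates).
    """
    if min(c_xy, c_x, c_y, n) <= 0:
        raise ValueError("all counts must be positive")
    # (c_xy / n) / ((c_x / n) * (c_y / n)) simplifies to c_xy * n / (c_x * c_y)
    return math.log2(c_xy * n / (c_x * c_y))


# Invented counts: x and y co-occur 25 times more often than chance predicts.
print(pmi(5, 10, 20, 1000))   # log2(25), about 4.64
# Invented counts: co-occurrence exactly at chance level.
print(pmi(2, 100, 20, 1000))  # 0.0
```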

    Some combinations happen to be very frequent and are perceived as "frozen":

    • качвам цената raise the price
    • eine Frage beantworten to answer a question, die Graphik zeigt the graphic shows, einen Bus nehmen to take a bus
    • κάνω βόλτα take-1SG a walk take a walk
    • drastically drop
      the graphic shows
      to take a bus
    • responder a una pregunta to answer a question
      el gráfico muestra the graphic shows
      coger el autobús to take the bus
    • interesa agertu interest show to show interest
      galdera bati erantzun question one-to answer answer a question
      autobusa hartu bus take to take the bus
    • riješiti dvojbu to solve a dilemma, pripremati jelo to prepare a meal
    • rispondere a una domanda to answer a question
      il grafico mostra the graphic shows
      prendere un bus to take a bus
    • zalać rynek to flood the market to dominate the market
    • bater um recorde to break a record (bater to beat has a regular sense of to overcome in addition to the literal sense)
      entrar em cartaz enter into poster arrive in theaters (for a movie) (the MWE is em cartaz in poster in theaters, the verb just usually collocates with this MWE)
    • lua un autobuz take a bus
    • drastičen upad drastic drop, graf prikazuje graphic shows, vzeti taksi to take a taxi

    However, applying regular lexical alternations to them does not markedly impact their meaning.

    • вдигам цената raise the price, увеличавам цената raise the price, качвам залога raise the bet, качвам температурата raise the temperature
    • eine Anfrage beantworten to answer a request, das Diagramm zeigt the diagram shows, mit einem Bus fahren to go by bus
    • πάω βόλτα go walk go for a walk
    • significantly drop, drastically decrease, the diagram shows, the graphic illustrates, to take a coach
    • responder a una petición to answer a request
      el diagrama muestra the diagram shows
      coger el tren to take the train
    • interesa erakutsi interest show to show interest →'erakutsi' and 'agertu' are synonyms in this context in Basque
      zalantza bati erantzun doubt one-to answer answer a doubt
      trena hartu train take to take the train
    • riješiti dilemu to solve a dilemma, pripremati obrok to prepare a meal
    • rispondere a una richiesta to answer a request
      il diagramma mostra the diagram shows
    • zdominować/zarzucić/zapełnić/nasycić rynek to dominate/overwhelm/fill/saturate the market
    • quebrar/bater/ultrapassar/estabelecer um recorde to break/beat/overcome/establish a record
      o recorde foi quebrado the record was broken
      entrar/estar/permanecer/ficar/continuar/ter em cartaz enter/be/remain/stay/continue/have in poster
    • lua o mașină take a car
    • občuten upad significant drop, drastično zmanjšanje drastic decrease, diagram prikazuje diagram shows, slika prikazuje picture shows

    The difficulty of distinguishing collocations from VMWEs lies in the fact that lexical variability is relevant to some VMWEs:

    • нямам пукната пара/пукнат грош to not have a single penny, to be very poor
      имам твърда/дебела глава to have a thick head, to be stubborn and not listen to advice
    • einen Willen/Menschen brechen to break a will/person
    • to come in handy/useful, to stand firm/fast, to break someone's spirit/will, to take the cake/biscuit
    • dar un paseo/una vuelta give a walk / a turn to go for a walk
      darse/tomar una ducha give.self/take a shower take a shower
    • min eman/egin pain give/do to hurt (somebody)
      eskola/klasea eman class give to give a class →'eskola' and 'klasea' are synonyms in Basque
    • slomiti čiju/čiji volju/duh to break someone's will/spirit
    • cogliere/prendere di sorpresa, dare/fornire un contributo
    • zapisać się złotymi literami/zgłoskami to record itself with golden letters/syllables to be remembered and commemorated for a merit
      zamarznąć na kość/lód/sopel to freeze to bone/ice/icicle to freeze strongly
    • levar em conta/consideração take into account/consideration
      chutar o balde/pau da barraca to kick the bucket/the tent's stick to act irresponsibly
    • lua o decizie/hotărâre make a decision
    • imeti nekaj na voljo/razpolago to have something available/at disposal, odpreti nekomu pot/vrata to open a way/a door (for someone) to give someone an opportunity to do something

    However, the extent of the vocabulary concerned by this variability is different for collocations and VMWEs. Namely, a head verb in a collocation usually selects a whole semantic class for each of its required arguments. For instance, the verb to take to use a vehicle to travel selects a whole semantic class of means of transport. Similarly, the verb to drop can select a large set of adverbs describing the degree: drastically/significantly/remarkably/slightly/reasonably drop. Conversely, lexical variability in a VMWE is limited to a closed list of lexemes, sometimes only loosely semantically related. For instance, the VMWEs to take the cake/biscuit and to stand firm/fast do not keep their idiomatic readings with semantically close complements: #to take the cookie/wafer, *to stand hard/rigid/solid, etc. See also Test VID.2.
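The contrast between a closed list of variants and a whole semantic class can be pictured as two kinds of lexicon entries. This is a minimal sketch assuming a tiny hand-made semantic lexicon; VMWEEntry, CollocationPattern, SEMANTIC_CLASSES and the two matching functions are hypothetical, and the sets only encode the examples given above (whether a variant keeps the idiomatic reading remains the annotator's judgment).

```python
from dataclasses import dataclass

# Toy semantic lexicon (hypothetical); a real project might use a WordNet-like resource.
SEMANTIC_CLASSES = {
    "cake": "food", "biscuit": "food", "cookie": "food", "wafer": "food",
    "bus": "vehicle", "coach": "vehicle", "train": "vehicle", "taxi": "vehicle",
}


@dataclass(frozen=True)
class VMWEEntry:
    """A VMWE whose variable slot admits only a closed list of lexemes."""
    verb: str
    closed_list: frozenset


@dataclass(frozen=True)
class CollocationPattern:
    """A collocation whose verb selects a whole semantic class."""
    verb: str
    semantic_class: str


def vmwe_accepts(entry: VMWEEntry, verb: str, noun: str) -> bool:
    # Only the listed lexemes keep the idiomatic reading, however close others are.
    return verb == entry.verb and noun in entry.closed_list


def collocation_accepts(pattern: CollocationPattern, verb: str, noun: str) -> bool:
    # Any member of the selected semantic class is acceptable.
    return verb == pattern.verb and SEMANTIC_CLASSES.get(noun) == pattern.semantic_class


TAKE_THE_CAKE = VMWEEntry("take", frozenset({"cake", "biscuit"}))  # to take the cake/biscuit
TAKE_A_VEHICLE = CollocationPattern("take", "vehicle")             # to take a bus/coach/...

print(vmwe_accepts(TAKE_THE_CAKE, "take", "cookie"))            # False: not in the list
print(collocation_accepts(TAKE_A_VEHICLE, "take", "coach"))     # True: in the class
```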

    Some light-verb constructions (LVCs) and multi-verb constructions (MVCs) belong to the gray zone between MWEs and collocations in the sense that some operator (light) verbs seem to select large classes of nouns, as in to make a speech/declaration/remark/etc. However, some studies (e.g. Bonial 2014) show that there is no such thing as truly productive light verbs (e.g. to give a look vs. to give a stare). Therefore, we do include LVCs and MVCs in our annotation scope.


    Section 1.7

    Verbal multiword expressions versus metaphor

    Another phenomenon closely related to VMWEs is metaphor. According to Shutova (2010), "a metaphor occurs when one concept is viewed in terms of the properties of the other. In other words it is based on similarity (presence of common characteristics) between two concepts".

    Many VMWEs, especially idioms, are based on metaphors. For instance, to take the bull by the horns means to address a problem (the bull) starting with its most challenging aspect (the horns). To set the world on fire is to do something extraordinary and win the admiration (set on fire) of other people (the world). To put all one's eggs in one basket means to rely on one particular course of action (a basket) for success rather than giving oneself several possibilities.

    However, verbal metaphors are not always VMWEs. Consider the newspaper title "simple steps to lift your dark cloud of stress" and the extract of a poem by Wordsworth, cited by Shutova: "and then my heart with pleasure fills, and dances with the daffodils". The metaphorical expressions to lift (one's) dark cloud of stress to relax and my heart ... dances with the daffodils I am happy are not semantically compositional. Still, these expressions were probably constructed for the needs of one article or poem only and are not sufficiently established in the common vocabulary to be considered VMWEs.

    The distinction between MWEs and metaphors is a relatively unstudied and open question. There are few precise tests, other than statistical ones, which would allow human annotators to resolve it reliably. Gross (1982) gives some clues on the reproducibility and predictability of metaphors. It remains to be seen how heavily this problem will impact the annotation of texts selected for our shared task. We suggest that annotators take notes of such cases and discuss them within their communities, both local and international.


    Section 2

    Textual annotation scope

    In this annotation task, all occurrences of all syntactic types of VMWEs are to be annotated in the text.

    We annotate, as integral parts of VMWEs, all lexicalized elements that can form a separate word. For instance, lexicalized particles are annotated but case suffixes are only annotated if the noun they modify is also lexicalized. Thus, in to put something up, the verb and the particle are integral parts of the VMWE (see VPC tests), while in (HU) döntést hoz valamiről decision-ACC bring something-DEL make a decision, only döntést hoz is annotated, even if the delative case suffix is also lexically determined.

    Similarly, auxiliaries and modals accompanying the main verb of a VMWE are only annotated if they are themselves lexicalized, not when they simply mark syntactic variants of the VMWE. For instance, will is lexicalized, and to be annotated as such, in even a worm will turn even a meek person will resist if pushed too far, but not in they will spill the beans.

    Both continuous and discontinuous sequences of lexicalized components of VMWEs are annotated.

    Reflexive pronouns, particles and prepositions need to be handled with special care, given their particular lexicalization status. Verb+pronoun and verb+particle combinations are annotated essentially if they are inherently reflexive verbs or verb-particle constructions. In this version of the guidelines, verb+preposition combinations like to rely on somebody, to come across something or to put up with somebody are re-introduced, optionally and experimentally, via the inherently adpositional verb (IAV) category.

    The annotation considers only flat, tokenized sentences whose tokens will be tagged by annotators as part of a VMWE or not. We do not annotate their internal syntactic structure. We do annotate, however, VMWEs embedded in other VMWEs. For instance, the VMWE to let the cat out of the bag contains the embedded VMWE let out and both are to be annotated as different VMWEs. Embeddings are discussed on each category's page, in the "Problematic cases and remarks" sections (e.g. IRVs overlapping with VIDs).

    Once identified in a text, VMWEs are also to be assigned to exactly one of the categories described in the following sections. We do not admit assigning two different categories to a single VMWE in order to express hesitation. A comment and a particular value of the annotator's confidence should be used instead.
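The annotation model described in this section (flat token sequences, possibly discontinuous VMWEs, embedded VMWEs, exactly one category per VMWE) can be summarized by a small data structure. This is a hypothetical sketch, not the official shared-task file format; the names VMWEAnnotation, CATEGORIES and embedded_in are illustrative, and the category given to the embedded let out is assumed here only to demonstrate embedding.

```python
from dataclasses import dataclass

# Category labels of edition 1.1 (IAV is optional and experimental).
CATEGORIES = frozenset({"LVC.full", "LVC.cause", "VID", "IRV",
                        "VPC.full", "VPC.semi", "MVC", "IAV"})


@dataclass(frozen=True)
class VMWEAnnotation:
    """One annotated VMWE: a single category over a set of token positions."""
    category: str          # exactly one category; no multi-label hesitation
    token_ids: frozenset   # 1-based positions of lexicalized components
    comment: str = ""      # free-text note, e.g. to record a hesitation

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if not self.token_ids:
            raise ValueError("a VMWE must cover at least one token")

    def is_discontinuous(self) -> bool:
        """True if the tokens do not form a single contiguous span."""
        ids = sorted(self.token_ids)
        return ids[-1] - ids[0] + 1 != len(ids)


def embedded_in(inner: VMWEAnnotation, outer: VMWEAnnotation) -> bool:
    """True if all tokens of `inner` belong to the strictly larger `outer`."""
    return inner.token_ids < outer.token_ids


# "She let the cat out of the bag ." -> tokens 1..9
vid = VMWEAnnotation("VID", frozenset(range(2, 9)))      # let the cat out of the bag
let_out = VMWEAnnotation("VPC.full", frozenset({2, 5}),  # let ... out
                         comment="category assumed for illustration")

# "He took a difficult decision ." -> took ... decision (determiner left out here)
lvc = VMWEAnnotation("LVC.full", frozenset({2, 5}))

print(embedded_in(let_out, vid), let_out.is_discontinuous(), vid.is_discontinuous())
```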


    Section 3

    Categories of verbal MWEs

    In edition 1.1 of this task we distinguish the following categories of verbal MWEs:

    • Two universal categories, i.e. valid for all languages participating in the task:
      • Light verb constructions (LVCs) with two subcategories:
        • LVCs in which the verb is semantically totally bleached (LVC.full)
          • държа под контрол to keep under control
          • eine Rede halten a speech hold to give a speech
          • κάνω μία βόλτα make-1SG a walk to walk
            δίνω μια εξήγηση give an explanation
          • to give a lecture
          • hacer una promesa to_make a promise to make a promise
          • min hartu pain take to hurt oneself
            lo egin sleep do to sleep
          • avoir du courage to have courage
          • bain triail as extract trial from try
          • držati govor hold a speech to give a speech
          • fare un discorso to_make a speech to give a speech
            fare una promessa to_make a promise to make a promise
          • ħa deċizjoni took a decision
          • podjąć decyzję to take a decision
          • fazer uma promessa to make a promise
          • a lua o decizie to take a decision to make a decision
          • imeti predavanje to have a lecture to give a lecture, biti mnenja to be of opinion to have an opinion
          • hålla ett tal hold a speech to give a speech
        • LVCs in which the verb adds a causative meaning to the noun (LVC.cause)
          • давам възможност give an opportunity
          • δίνω προτεραιότητα give priority
          • to grant rights
            to give a headache
            to provoke the destruction of the building
          • dar dolor de cabeza to_give pain of head to give a headache
            hacer ilusión to_make excitement to make excited/to look forward to
          • cuir lúcháir ar put joy on give delight to
          • zadati glavobolju komu to give a headache to someone, izazvati nezadovoljstvo to cause dissatisfaction
          • dare il mal di testa to_give pain of head to give a headache
            dare noia to_give trouble to annoy
          • nakłada obowiązek na użytkowników put a duty on the users
            dać prawo to give the right to grant the right
            narazić na straty expose to losses
            stawiać komuś cel to put an aim to someone to set a goal for someone
          • da cuiva bătăi de cap give sb. a hard time
          • dati ime nekomu to give (somebody) a name to name (somebody), narediti konec nečemu to make an end (to something) to end (something)
      • verbal idioms (VIDs):
        • правя се на дръж ми шапката to behave myself as 'hold my hat' pretend to be naive and innocent
          цъфна и вържа to blossom and give fruit (usually sarcastically) to prosper
          река и отсека to say and cut to say firmly, decisively
        • schwarz fahren to drive black take a ride without a ticket, in Kraft treten into force step to come into effect, in die Waagschale werfen in the weighing pan throw to bring to bear
          einen drauf setzen going one better
        • χάνω τα αυγά και τα καλάθια lose-1SG the eggs and the baskets to be at a complete and utter loss
          απορώ και εξίσταμαι wonder1SG.PST and be-amazed1SG.PST to wonder
        • to go bananas
          fortune favors the bold
          to drink and drive
          to voice act
          to pretty-print
          to short-circuit
          to tumble dry
        • hacer de tripas corazón make of intestines heart to pluck up the courage
          dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more
          dar gato por liebre to_give cat for hare to rip off, to take for a ride
        • adarra jo horn play to pull (somebody's) leg, to be kidding
          burua hautsi head break to rack one's brains, to think very hard
          ikusi eta ikasi see and learn
          hortxe dago koska just-there is the-crux that's the crux of the matter
        • défendre son bifteck defend one's beefsteak to defend one's interests
          court-circuiter to short-circuit
        • ag cur is ag cúiteamh arguing and debating arguing back and forth
        • mlatiti praznu slamu to beat empty straw to talk aimlessly, mazati komu oči to blur eyes to someone to cheat someone
        • gettare le perle ai porci to_throw the pearls to the pigs to waste something good on someone who doesn't care about it
          andare e venire to_go and come back and forth
          corto-circuitare to short-circuit
        • għasfur żgħir qalli a bird small told me to hear something from the grapevine
          iqum u joqgħod jump and stay to fidget
        • rzucać grochem o ścianę throw peas against a wall to try to convince somebody in vain
          pluć i łapać to spit and catch to be lazy, to do nothing useful
        • fazer das tripas coração transform the tripes into heart to try everything possible
          pintar e bordar paint and knit to abuse
        • a trage pe sfoară to pull on rope to fool
          a tunat și i-a adunat it.has thundered and CL.ACC-it.has gathered birds of a feather flock together
        • ubiti dve muhi na en mah to kill two flies with one strike to achieve two aims at once, spati kot ubit to sleep like dead to sleep soundly
    • Three quasi-universal categories, valid for some language groups or languages but non-existent or very exceptional in others:
      • inherently reflexive verbs (IRV):
        • усмихвам се to smile
        • sich bemühen to endeavour, sich enthalten himself contain to abstain
        • n.a.
        • to find oneself in a difficult situation
          to help oneself to the cookies
        • suicidarse to suicide
          quejarse to complain
        • n.a.
        • se suicider to suicide
          se soucier to worry
        • n.a.
        • smijati se to laugh
        • suicidarsi to suicide
          lamentarsi to moan
        • bać się to fear SELF to be afraid
        • se queixar to complain
        • a se gândi to think
        • bati se to be afraid, smejati se to laugh, drzniti si to dare to do something
      • verb-particle constructions (VPC) with two subcategories:
        • fully non-compositional VPCs (VPC.full), in which the particle totally changes the meaning of the verb
          • not applicable to Bulgarian
          • er gibt auf he gives up, er wirft ihr das vor he throws her that against he reproches that to her
          • μπαίνω μέσα get in to go bankrupt
          • to do in
          • n.a.
          • n.a.
          • cas chuig turn towards happen to have
          • postaviti za to set for to appoint
          • buttare giù to_throw down to swallow
          • not applicable to Polish
          • jogar fora throw out to throw away. This seems to be the only VPC in Portuguese. We annotate it as VID and do not use the VPC category.
          • n.a.
          • n.a.
        • semi non-compositional VPCs (VPC.semi), in which the particle adds a partly predictable but non-spatial meaning to the verb
          • not applicable to Bulgarian
          • n.a.
          • to eat up
          • n.a.
          • tabhair suas give up
          • andare avanti to_go forward to move on
          • n.a.
          • n.a.
      • multi-verb constructions (MVC):
        • will sagen want to say that is to say
        • έχω να κάνω με have to do with to cope
        • to let go
          to make do
        • querer decir to_want to_say to mean
        • ?
        • laisser tomber let fall to give up
          vouloir dire want say to mean
        • ?
        • može biti can be it is possible
        • lasciar andare to_let go to unhand
          voler dire to_want say to mean
        • dać komuś żyć to let someone live not to bother someone
          można wytrzymać one can stand the situation is reasonably good
        • querer dizer want say to mean
          ouvir falar hear speak to know/remember vaguely
        • n.a.
        • n.a.
    • language-specific categories, defined for a particular language in a separate documentation.

    We also introduce an optional experimental category which (if admitted by the given language) is to be considered in the post-annotation step:

    • inherently adpositional verbs (IAVs)
      • излизам пред някого/нещо come in front of someone/something to surpass, to outdo
        излизам със становище come out with a statement
      • n.a.
      • to come across
        to rely on
      • confiar en to_trust in to trust in, entender de to_understand of to know about
      • n.a.
      • caith anuas ar throw down on belittle
      • suočiti s to face with
      • confidare su to_trust in to trust in, intendersi di to_understand of to know about
      • godzić się na każde warunki to agree on any condition
        mieć do czynienia z czymś to have to do with sth
        odwieść kogoś od czegoś to dissuade someone from doing sth
      • conta pe count on
      • dati skozi give through to go through, gre za it goes about it is about

    In practice, to identify and categorize verbal MWEs during manual annotation, one must use the rigorous generic decision tree and the structural and category-specific cross-lingual tests provided.

    For a summary of changes with respect to edition 1.0 of the guidelines, see the what's new file.


    Section 4

    Annotation process and decision tree

    We propose the following methodology for VMWE annotation:

    • Step 1 - identify a candidate, that is, a combination of a verb with at least one other word which could form a VMWE. If the candidate has the structure of a meaning-preserving variant, the following steps apply to its canonical form. This step is largely based on the annotators' linguistic knowledge and intuition after reading this guide.
    • Step 2 - determine which components of the candidate (or of its canonical form) are lexicalized, that is, which components cannot be omitted without the VMWE ceasing to occur. Corpus and web searches may be required to confirm intuitions about acceptable variants.
    • Step 3 - depending on the syntactic structure of the candidate's canonical form, formally check if it is a VMWE using the generic and category-specific decision trees and tests below. Notice that your intuitions used in Step 1 to identify a given candidate are not sufficient to annotate it: you must confirm them by applying the tests in the guidelines.
    • Step 4 (experimental and optional) - if your language team chose to experimentally annotate the IAV category, follow the dedicated inherently adpositional verb (IAV) tests. These tests should always be applied after the three previous steps are complete, i.e. the IAV annotation is overlaid on the universal annotation.

    The decision tree below indicates the order in which tests should be applied in step 3. The decision trees are a useful summary to consult during annotation, but contain very short descriptions of the tests. Each test is detailed and explained with examples in the following sections.

    Generic decision tree

    If you are annotating Italian or Hindi, go to the Italian-specific decision tree or Hindi-specific decision tree. For all other languages follow the tree below.

    • Apply test S.1 (prev. 6) - [1HEAD: Unique verb as functional syntactic head of the whole?]
      • NO ⇒ Apply the VID-specific tests: VID tests positive?
        • YES ⇒ Annotate as a VMWE of category VID
        • NO ⇒ It is not a VMWE, exit
      • YES ⇒ Apply test S.2 (prev. 7) - [1DEP: Verb v has exactly one lexicalized dependent d?]
        • NO ⇒ Apply the VID-specific tests: VID tests positive?
          • YES ⇒ Annotate as a VMWE of category VID
          • NO ⇒ It is not a VMWE, exit
        • YES ⇒ Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
          • YES ⇒ Apply the VID-specific tests: VID tests positive?
            • YES ⇒ Annotate as a VMWE of category VID
            • NO ⇒ It is not a VMWE, exit
          • NO ⇒ Apply test S.4 (prev. 8) - [CATEG: What is the morphosyntactic category of d?]
            • Reflexive clitic ⇒ Apply IRV-specific tests: IRV tests positive?
              • YES ⇒ Annotate as a VMWE of category IRV
              • NO ⇒ It is not a VMWE, exit
            • Particle ⇒ Apply VPC-specific tests: VPC tests positive?
              • YES ⇒ Annotate as a VMWE of category VPC.full or VPC.semi
              • NO ⇒ It is not a VMWE, exit
            • Verb with no lexicalized dependent ⇒ Apply MVC-specific tests: MVC tests positive?
              • YES ⇒ Annotate as a VMWE of category MVC
              • NO ⇒ Apply the VID-specific tests: VID tests positive?
                • YES ⇒ Annotate as a VMWE of category VID
                • NO ⇒ It is not a VMWE, exit
            • Extended NP ⇒ Apply LVC-specific decision tree: LVC tests positive?
              • YES ⇒ Annotate as a VMWE of category LVC
              • NO ⇒ Apply the VID-specific tests: VID tests positive?
                • YES ⇒ Annotate as a VMWE of category VID
                • NO ⇒ It is not a VMWE, exit
            • Another category ⇒ Apply the VID-specific tests: VID tests positive?
              • YES ⇒ Annotate as a VMWE of category VID
              • NO ⇒ It is not a VMWE, exit
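
    The generic decision tree can be summarized as a small function. This is a minimal sketch, not part of any official PARSEME tooling: it assumes the annotator has already answered each test by hand, and the `Answers` record, its field names and the category strings are hypothetical. It does not cover the Italian-specific or Hindi-specific trees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Answers:
    """Hypothetical record of an annotator's test outcomes for one candidate."""
    unique_head: bool          # S.1 [HEAD]: unique verb as functional syntactic head?
    single_dependent: bool     # S.2 [1DEP]: exactly one lexicalized dependent?
    lexicalized_subject: bool  # S.3 [LEX-SUBJ]: is that dependent the subject?
    dependent_category: str    # S.4 [CATEG]: "reflexive_clitic", "particle", "verb", "extended_np" or other
    vid: bool = False          # outcome of the VID-specific tests
    irv: bool = False          # outcome of the IRV-specific tests
    vpc: Optional[str] = None  # "full" or "semi" if the VPC tests are positive, else None
    mvc: bool = False          # outcome of the MVC-specific tests
    lvc: bool = False          # outcome of the LVC-specific decision tree


def vid_or_nothing(a: Answers) -> Optional[str]:
    """Fallback branch used throughout the tree: VID if the VID tests pass, else not a VMWE."""
    return "VID" if a.vid else None


def categorize(a: Answers) -> Optional[str]:
    """Walk the generic decision tree; return a category label, or None for 'not a VMWE'."""
    if not a.unique_head:            # S.1 negative
        return vid_or_nothing(a)
    if not a.single_dependent:       # S.2 negative
        return vid_or_nothing(a)
    if a.lexicalized_subject:        # S.3 positive
        return vid_or_nothing(a)
    category = a.dependent_category  # S.4
    if category == "reflexive_clitic":
        return "IRV" if a.irv else None
    if category == "particle":
        return f"VPC.{a.vpc}" if a.vpc else None
    if category == "verb":           # verb with no lexicalized dependent
        return "MVC" if a.mvc else vid_or_nothing(a)
    if category == "extended_np":
        return "LVC" if a.lvc else vid_or_nothing(a)
    return vid_or_nothing(a)         # any other category
```

    For instance, `categorize(Answers(True, True, False, "particle", vpc="full"))` returns `"VPC.full"`, while a coordination such as live and let live with positive VID tests, `categorize(Answers(False, False, False, "other", vid=True))`, returns `"VID"`. The only design point is that every "VID tests positive?" branch shares the same fallback, so it is factored out into `vid_or_nothing`.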

    Section 5

    Specific tests for categorizing verbal MWEs

    Once a candidate VMWE has been pre-identified in steps 1 and 2 of the annotation process, its status as a VMWE is confirmed, and its category determined, according to the decision tree, which refers to the cross-lingual tests described in the following subsections.

    Additionally, language-specific categories (LS) can be defined, together with tests for annotating them in a given language or language group only.


    Section 5.1

    Structural tests (S)

    Structural tests are simple preliminary tests that help determine the syntactic structure of the VMWE candidate. This structure is needed to select the right category-specific identification tests. In practice, annotators will rarely need them, since they will usually already have an intuition about the candidate's category when they identify it.

    Test S.1 (prev. 6) - [HEAD] - Syntactic head

    Does the candidate contain a unique verb functioning as the functional syntactic head of the whole?

    • Apply the VID-specific tests
      • цъфна и вържа → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
      • leben und leben lassen live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
      • έδωσε πήρε gave.3SG.PA took.3SG.PA he struggled τα κατάφερε → none of the verbs is clearly the head
      • to pretty-print → this is an unusual case of an adjective modifying a verb
        to drink and drive → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
      • coser y cantar to_sew and to_sing easy as pie, a piece of cake
      • ikusi eta ikasi see and learn → none of the verbs is clearly the head
      • ag cur is ag cúiteamh arguing and debating arguing back and forth → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
      • žariti i paliti to stoke and to burn to be powerful, vedriti i oblačiti to brighten and to cloud to be powerful → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
      • vivi e lascia vivere live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
      • pluć i łapać to spit and catch to be lazy, to do nothing useful → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
      • pintar e bordar paint and knit to abuse
      • živi in pusti živeti to live and let live to live and let live → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
      • det knallar och går it trots and walks it is OK/as usual → none of the verbs is clearly the head, as there is no universally accepted syntactic representation of coordination
    • continue to the next test
      • гушна букета to hug the bunch of flowers to die → гушна is the head and the NP depends on it
        правя на салата to make into salad to scold → правя is the head and the PP depends on it
      • eine Fratze ziehen a grimace pull to make a face → ziehen is the head and the NP depends on it
        er gibt auf he gives up → gibt is the head and auf is the particle depending on it
      • κάνω γκριμάτσα to make grimace to make a face → κάνω is the head and the NP depends on it
      • to make a face → make is the head and the NP depends on it
        to give up → give is the head and up is a particle depending on it
      • dar la cara to_give the face face the consequences → dar is the head and the NP depends on it
        hacer muecas to_make grimaces to make a face → hacer is the head and the NP depends on it
      • lan egin work do to work → the verb egin is the head and the NP depends on it
      • éirigh as rise out of quit → the verb éirigh is the head and the particle as depends on it
      • složiti facu make a face to show reaction → složiti is the head and the NP depends on it
      • fare le linguacce to_make the grimaces → fare is the head and the NP depends on it
        far fuori to_make out to kill → fare is the head and fuori is a particle depending on it
      • zbijać bąki to smash farts to fool around, to do nothing useful → zbijać is the head and the NP bąki depends on it
        dać komuś popalić to let someone smoke to make someone's life hard → dać is the head and the infinitive popalić depends on it
      • bater as botas → bater is the head and the NP depends on it
        criar vergonha na cara → criar is the head and the two NPs depend on it
      • a face baie to make bath to bathe → face is the head and the NP depends on it
        a ieși înainte to go forth to greet → ieși is the head and înainte is a particle depending on it
      • imeti krompir to have potatoes to be lucky → imeti is the head and the NP depends on it
      • att ge upp to give up → ge is the head and upp is the particle depending on it

    The aim of this test is to categorize (as VID or no VMWE) those candidates which have no single clearly identified head verb. This is necessary because all other tests refer to the single head verb v and its dependents. Note that for VMWE candidates having the structure of a meaning-preserving variant, the test should be applied to their canonical form instead. This is required because there may be no verb or the verb may not be the syntactic head in such variants.

    • вземам решение passes the test → variants like решението, което беше взето pass the test as well
    • eine Entscheidung treffen make a decision passes the test → variants like die Entscheidung wurde getroffen the decision was made, die Entscheidung, welche getroffen wurde the decision which was made, das Treffen der Entscheidung the making of the decision pass the test as well
    • παίρνω μία απόφαση make a decision passes the test → variants like η απόφαση που πήραμε, πάρθηκε απόφαση, παίρνοντας απόφαση pass the test as well
    • to make a decision passes the test → variants like the decision which was made, decision-making, the making of the decision pass the test as well
    • tomar una decisión passes the test → variants like la decisión fue tomada, tomando esa decisión, la decisión que tomaron pass the test as well
    • erabakia hartu decision take to make a decision passes the test → variants like hartutako erabakia the decision (which was) made, erabaki hura hartzea (the fact of) making that decision, erabakiak hartutakoan when the decisions were made pass the test as well
    • déan comhairle make counsel make a decision passes the test → variants like comhairle a dhéanamh counsel to make to make a decision, ag déanamh comhairle at making counsel making a decision pass the test as well
    • donijeti odluku make a decision passes the test → variants like odluka donesena tada decision made then pass the test as well
    • prendere una decisione to_take a decision make a decision passes the test → variants like la decisione è stata presa the decision was made, la decisione, che è stata presa the decision which was made, prendendo la decisione taking the decision pass the test as well
    • zbijać bąki to smash farts to fool around, to do nothing useful passes the test → variants like zbijanie bąków farts smashing fooling around, doing nothing useful, zbijający bąki smashing farts pass the test as well
    • tomar uma decisão make a decision passes the test → variants like a decisão que foi tomada the decision which was made, decisão tomada decision made pass the test as well
    • a lua o decizie make a decision passes the test → variants like decizia care a fost luată the decision which was made, luarea deciziei decision-making pass the test as well
    • zlomiti komu srce to break someone's heart to hurt someone's feelings passes the test → variants like srca, ki jih je zlomil hearts which he has broken (people's) feelings which he hurt, lomljenje src breaking (people's) hearts hurting (people's) feelings and nedavno zlomljeno srce recently broken heart pass the test as well
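
    When candidates are pre-annotated on a dependency-parsed corpus, test S.1 can be roughly approximated automatically. The sketch below is a hypothetical heuristic, not the official procedure: it assumes Universal Dependencies-style token fields (`upos`, `head`, `deprel`) plus a per-token `lexicalized` flag. It answers "yes" when exactly one lexicalized token has no lexicalized ancestor and that token is a verb, and it treats coordinated lexicalized verbs (`conj`) as headless, in line with the coordination examples above. It cannot capture every negative case (e.g. to pretty-print) and, as stated above, it should be run on the canonical form of the candidate.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    id: int                    # 1-based position in the sentence
    form: str
    upos: str                  # Universal POS tag, e.g. "VERB"
    head: int                  # id of the syntactic head, 0 for the sentence root
    deprel: str                # dependency relation, e.g. "obj", "conj"
    lexicalized: bool = False  # hypothetical flag: part of the VMWE candidate?


def has_lexicalized_ancestor(tok: Token, by_id: Dict[int, Token]) -> bool:
    """Climb the tree from tok and report whether any ancestor is lexicalized."""
    h = tok.head
    while h != 0:
        parent = by_id[h]
        if parent.lexicalized:
            return True
        h = parent.head
    return False


def s1_unique_head_verb(sentence: List[Token]) -> bool:
    """Heuristic approximation of test S.1 on one parsed sentence."""
    by_id = {t.id: t for t in sentence}
    lexicalized = [t for t in sentence if t.lexicalized]
    tops = [t for t in lexicalized if not has_lexicalized_ancestor(t, by_id)]
    if len(tops) != 1 or tops[0].upos != "VERB":
        return False
    # Coordinated lexicalized verbs (e.g. "live and let live") count as headless.
    return not any(t.deprel == "conj" and t.upos == "VERB" for t in lexicalized)
```

    With this representation, "He gives up" (gives as root, up attached to it as `compound:prt`, both lexicalized) yields `True`, whereas "live and let live", where let is attached to the first live by `conj`, yields `False`.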

    Test S.2 (prev. 7) - [1DEP] - Single dependent

    Does the VMWE contain exactly one lexicalized (functional) syntactic dependent d of the head verb v?

    • Apply the VID-specific tests
      • на стар краставичар краставици продавам to an old cucumber seller cucumbers to sell to try to cheat a more experienced person → two dependents, на стар краставичар (PP) and краставици (NP)
        прочитам от корица до корица to read from cover to cover → two dependents, от корица (PP) and до корица (PP)
        правя (нечий) живот черен make someone's life black to ruin someone's life → two dependents, (нечий) живот (NP) and черен (small clause)
      • die Katze aus dem Sack lassen to let the cat out of the bag → two dependents die Katze and aus dem Sack
      • κάνω την καρδιά μου πέτρα make my heart rock try to remain calm → two dependents, την καρδιά and πέτρα
      • to make ends meet → two dependents, ends and meet
        to let the cat out of the bag → two dependents, the cat and out of the bag
      • dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more → two dependents, con la miel and en los labios
        dar gato por liebre to_give cat for hare to rip off, to take for a ride → two dependents, gato and por liebre
      • odolkiak ordainetan eman black-puddings in-exchange give to do something as a response to something somebody has done to oneself (similar to 'what goes around comes around')
      • ići glavom kroz zid to go with head through the wall to be stubborn → two dependents glavom and kroz zid
      • mettere il carro davanti ai buoi to_put the cart in front of the oxen put the cart in front of the horse → two dependents carro and davanti ai buoi
      • chować głowę w piasek to hide head in sandto pretend not to see a problem → two dependents, głowę head and w piasek in sand
        bać się własnego cienia to fear SELF one's own shadowto be very timid → two dependents, się SELF and własnego cienia own shadow
      • tapar o sol com a peneira to hide the sun with a sieve to sugar-coat → two dependents
      • a da bir cu fugiții to give tribute with fugitives the to disappear → two dependents, bir and cu fugiții
        a-i ieși ochii din cap to his come out eyes the from head to stare → three dependents, i, which is a non-RCLI, ochii, and din cap
      • skrivati glavo v pesek to hide head in sand to pretend not to see a problem → two dependents, glava head and v pesek in sand
        vlečeš me za nos you are pulling my nose you're pulling my leg → two dependents, me me and za nos my nose
      • att sätta sig upp mot någon to sit oneself up against someone to defy someone → two dependents, sig and upp
    • Continue to the next test
      • ритам камбаната kick the bell to die → the single dependent is a noun phrase, камбаната
        ставам на кайма turn into mince to be destroyed → the single dependent is a prepositional phrase, на кайма
        одирам жив skin alive to make someone suffer → the single dependent is a small clause (adjective), жив
      • eine Fratze ziehen a grimace pull to make a face → the single dependent is a noun phrase, Fratze
        in Betracht ziehen to take into consideration → the single dependent is a prepositional phrase, in Betracht
        er gibt auf he gives up → the single dependent is a particle auf
      • παίρνω σκληρά μέτρα take strict measures → the single dependent is a noun phrase, σκληρά μέτρα
        παίρνω στο κυνήγι take to-the chase to chase → the single dependent is a prepositional phrase, στο κυνήγι to-the chase
        κάνω επίσκεψη make visit to visit → the single dependent is a noun, επίσκεψη visit
        μπαίνω μέσα get in become bankrupt → the single dependent is an adverb functioning as particle, μέσα in
      • to make a face → the single dependent is a noun phrase, face
        to take into account → the single dependent is a prepositional phrase, into account
        to take turns → the single dependent is a noun, turns
        to give up → the single dependent is a particle, up
      • hacer muecas to_make grimaces to make faces → the single dependent is a noun phrase, muecas
        tener en cuenta to_have in account to take into account → the single dependent is a prepositional phrase, en cuenta
      • min eman pain give to hurt (somebody) → the single dependent is a noun phrase, min
        kontuan hartu into-account take to take into account → the single dependent is a noun phrase with a postpositional suffix, kontuan
      • bain triail get trial to try → the single dependent is a noun, triail; éirigh as rise out of quit → the single dependent is a particle, as
      • imati osjećaj to have a feeling → the single dependent is a noun, osjećaj
      • fare le linguacce to_make the grimaces to make a face → the single dependent is a noun phrase linguacce
        prendere in considerazione to take into consideration → the single dependent is a prepositional phrase, in considerazione
        egli lo fa fuori he kills him → the single dependent is a particle fuori
      • bić na alarm to strike on alarmto raise the alarm → the single dependent is a prepositional phrase, na alarm on alarm
        cholera wie cholera knows I have no idea → the single dependent is the nominal subject cholera
      • cometer um crime to commit a crime → one dependent
      • a face față to make face to to deal with → the single dependent is a noun phrase, față
        a ieși înainte → the single dependent is an adverb, înainte
      • gre za it is about → the single dependent is a particle, za
        smejati se to laugh → the single dependent is a reflexive clitic, se
        imeti mačka to have a hangover → the single dependent is a noun, maček
      • att ge upp to give up → the single dependent is the particle upp

    The test covers only lexicalized dependents; any other, non-lexicalized dependents are ignored. We deliberately call the non-verbal elements dependents rather than arguments or complements, because the argument-adjunct distinction is irrelevant here. The outcome of the test is positive if the verb has a single lexicalized dependent, which can be the subject, the direct or indirect object, but also an adverbial complement, adverb, particle, relative clause, etc.
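
    On a hypothetical dependency representation, test S.2 can be sketched by grouping the lexicalized tokens under the head verb into dependents: each lexicalized token whose nearest lexicalized ancestor is the verb starts one dependent, and lexicalized words inside it (articles, prepositions, case markers) are counted together with it. Non-lexicalized dependents are skipped, as the test requires. The `Token` fields and the Universal Dependencies-style attachment of prepositions to their nouns are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    id: int                    # 1-based position in the sentence
    form: str
    head: int                  # id of the syntactic head, 0 for the sentence root
    lexicalized: bool = False  # hypothetical flag: part of the VMWE candidate?


def nearest_lexicalized_ancestor(tok: Token, by_id: Dict[int, Token]) -> Optional[int]:
    """Return the id of the closest lexicalized ancestor of tok, or None."""
    h = tok.head
    while h != 0:
        parent = by_id[h]
        if parent.lexicalized:
            return parent.id
        h = parent.head
    return None


def lexicalized_dependents(sentence: List[Token], verb_id: int) -> List[Token]:
    """Tokens that each start one lexicalized dependent of the head verb."""
    by_id = {t.id: t for t in sentence}
    return [t for t in sentence
            if t.lexicalized and t.id != verb_id
            and nearest_lexicalized_ancestor(t, by_id) == verb_id]


def s2_single_dependent(sentence: List[Token], verb_id: int) -> bool:
    return len(lexicalized_dependents(sentence, verb_id)) == 1
```

    In "We take it into account", only account (with into attached to it) is a lexicalized dependent of take, so the test is positive even though We and it are also dependents. In "let the cat out of the bag", cat and bag are two separate lexicalized dependents of let, so the test is negative.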

    Test S.3 (previously nonexistent) - [LEX-SUBJ] - Lexicalized subject

    Is the single lexicalized (functional) syntactic dependent d of the head verb v its subject?

    • Apply the VID-specific tests
      • чашата преля the glass overflowed this is the last straw → чашата is the subject of преля
      • ein kleines Vöglein hat mir gezwitschert a little bird told me
      • μου είπε ένα πουλάκι a little bird told me → ένα πουλάκι a little bird is the subject of είπε told
      • a little bird told someone → a little bird is the subject of told
      • ha llegado tu hora has arrived your time your time has come → tu hora is the subject of ha llegado
        me lo ha dicho un pajarito it to_me has told a little_bird a little bird has told me → un pajarito is the subject of ha dicho
      • txoritxo batek esan → txoritxo batek is the subject of esan
      • ptičica mi je šapnula a little bird whispered to me → ptičica is the subject of šapnula
      • me lo ha detto l'uccellino a little bird told me → l'uccellino is the subject of ha detto
      • licho wie devil knows I have no idea
      • a sua hora chegou your time has arrived your time has come
        um passarinho me contou que ... a little-bird me.DAT told that ... a little bird told me that...
      • a șoptit o păsărică whispered a bird little a little bird told someone
      • srce pade v hlače komu (someone's) heart drops into the pants one is lacking courage to do something → srce heart is the subject of pade falls, sekira pade v med komu (someone's) hatchet falls in honey one gets lucky → sekira hatchet is the subject of pade falls
    • Continue to the next test
      • обичам чашката love the glass to be an alcoholic
        вземам назаем take in loan to borrow
        намирам се find SELF to be situated
      • κάνω μια ευχή make a wish → μία ευχή a wish is the object of κάνω make
      • to make a wish a wish is the object of make
      • pedir un deseo to_ask a wish to make a wish un deseo is the object of pedir
      • hitz eman hitz is the object of eman
      • napraviti prekršaj to make an offense prekršaj is the object of napraviti
      • dare spettacolo to_make a scene spettacolo is the object of dare
      • bać się fear SELF to be afraid
        chodzić prostą drogą to go (on) a straight road.INST to avoid complications
        zacznać od zera to start from zero to start from scratch
      • plouă cu găleata rains with bucket-the it rains heavily → cu găleata is the adverbial of plouă
      • imeti glavo na ramenih to have head on shoulders to be sensible glava head is the object of imeti have

    This test captures the fact that VMWEs with lexicalized subjects always belong to the VID category. Note that for the VMWE candidates having the structure of a meaning-preserving variant, the test should be applied to their canonical form instead. This is required because there may be no verb or the verb may not be the syntactic head in such variants.
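
    If the single lexicalized dependent comes from a dependency parse, test S.3 reduces to checking its relation to the verb. The sketch below assumes Universal Dependencies relation names (`nsubj`, `csubj`, with subtypes such as `nsubj:pass`); the `Dependent` type is hypothetical. It also shows why the canonical form matters: in a passive variant such as the decision was made, the object surfaces as `nsubj:pass`, so running the check on the variant would wrongly report a lexicalized subject.

```python
from dataclasses import dataclass

# Subject relations in Universal Dependencies (assumed annotation scheme).
SUBJECT_RELATIONS = {"nsubj", "csubj"}


@dataclass
class Dependent:
    form: str
    deprel: str  # relation of the single lexicalized dependent to the head verb


def s3_lexicalized_subject(dep: Dependent) -> bool:
    """Positive if the dependent is a subject; subtypes like "nsubj:pass" are reduced to their base."""
    return dep.deprel.split(":")[0] in SUBJECT_RELATIONS
```

    For a little bird told me, `s3_lexicalized_subject(Dependent("bird", "nsubj"))` is `True` (send the candidate to the VID tests); for to make a wish, `s3_lexicalized_subject(Dependent("wish", "obj"))` is `False` (continue to test S.4).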

    Test S.4 (prev. 8) - [CATEG] - Category of the dependent

    What is the morphosyntactic category of the (functional) dependent d that co-occurs with the head verb v?

    • Reflexive clitic - apply IRV tests. If the outcome is negative, discard the VMWE candidate.
      • страхувам се fear myself.REFL to be afraid
        радвам се feel joy myself.REFL to feel joy
      • sich wundern to wonder, sich schämen to be ashamed
      • n.a.
      • English does not have IRV expressions
      • suicidarse to suicide, quejarse to complain
      • n.a.
      • se suicider to suicide, s'évanouir to faint
      • čuditi se to wonder, penjati se to climb
      • suicidarsi to suicide, vergognarsi to be ashamed
      • bać się fear SELF to be afraid
      • suicidar-se to suicide, queixar-se to complain
      • a se sinucide to commit suicide with obligatory ACC reflexive clitic
        a se holba to stare with obligatory ACC reflexive clitic
      • čuditi se to wonder, smejati se to laugh, onesvestiti se to faint
    • Particle (as opposed to an adposition) - apply VPC tests. If the outcome is negative, discard VMWE candidate.
      • Bulgarian does not have VPC expressions
      • anfangen to begin, er fängt an he begins, er hat angefangen he has begun → in German, VPCs may occur separated or within one word; we annotate all occurrences!
        ich schlage vor I propose
      • παίρνω μπρος, βάζω μπρος να, κάνω πίσω
      • to give up, to look forward to
      • n.a.
      • n.a.
      • biti na to be onto to look like
      • far fuori to_make out to kill, lo fa fuori he kills him, lo ha fatto fuori he killed him
      • Polish does not have VPC expressions
      • jogar fora to-throw outside to discard, throw away
      • Romanian does not have VPC expressions
      • n.a.
    • Verb with no lexicalized dependent - apply MVC tests. If the outcome is negative, apply the VID-specific tests. On negative outcome discard the VMWE candidate.
      • не искам и да чуя don't want to even hear to oppose strongly и да чуя is a VP
      • will sagen want to say that is to say
      • έχω να κάνω have to do to concern
      • to let go
        to make do
      • querer decir to_want to_say to mean
      • n.a.
      • laisser tomber let fall to give up
        vouloir dire want say to mean
      • pustiti koga živjeti to let someone live not to bother someone, znati raditi to know to work to be capable
      • lasciar andare to_let go to unhand
        voler dire want say to mean
      • dać komuś żyć to let someone live not to bother someone
        można wytrzymać one can stand the situation is reasonably good
      • querer dizer want say to mean
        ouvir falar hear speak to know/remember vaguely
      • n.a.
      • n.a.
    • Adposition (preposition or postposition, as opposed to a particle) - in step 3 of the annotation process adpositions are not annotated unless they introduce a lexicalized dependent. Adpositions are covered optionally and experimentally in the post-annotation step (step 4), following the inherently adpositional verb (IAV) guidelines.
      • разчитам на to rely on
        излизам със to come out with
      • n.a.
      • to come across
        to rely on
      • confiar en to_trust in to trust in, entender de to_understand of to know about
      • n.a.
      • izlaziti s kim to go out with someone
      • confidare su to_trust in to trust in, intendersi di to_understand of to know about
      • conta pe count on
    • Extended nominal phrase (possibly including modifiers, prepositions, postpositions or case markers) - apply LVC tests. If the outcome is negative, apply the VID-specific tests. On negative outcome discard the VMWE candidate.
      • ритам камбаната kick the bell to die камбаната is a noun phrase composed of a single noun
        давам зелена светлина give green light to allow зелена светлина is a noun phrase composed of an adjective and a noun
        ставам на кайма turn into mince to be destroyed на кайма is a prepositional phrase composed of a preposition governing a noun
      • die Nase rümpfen the nose wrinkle turn up one's nose at sth. die Nase is a noun phrase composed of a determiner and a noun
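        in Kraft treten into force step to come into force in Kraft is a prepositional phrase composed of a preposition and a noun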
      • to make a wish a wish is a noun phrase composed of a determiner and a noun
        to take turns turns is a noun phrase composed of a single plural noun
      • pedir un deseo un deseo is a noun phrase composed of a determiner and a noun
        entrar en vigor en vigor is a prepositional phrase composed of a preposition and a noun
      • kontuan hartu into-account take to take into account the NP, kontuan, is composed of a noun (kontu), a determiner (a) and a postposition (-n)
        urratsak egin steps do to take stepsthe NP, urratsak, is composed of a single plural noun (urrats+ak)
      • doći do zaključka to come to conclusion to conclude do zaključka to conclusion is a prepositional phrase composed of a preposition governing a noun
      • prendere in considerazione take into account in considerazione is a prepositional phrase composed of a preposition and a noun
        rompere il silenzio to break the silence il silenzio is a noun phrase composed of an article and singular noun
        mettere radici to_put roots to put down roots radici is a noun phrase composed of a single plural noun
      • podjąć decyzję to take a decision decyzję decision is a nominal phrase composed of a single noun
        chodzić prostą drogą to go (on) a straight road.INST to avoid complications prostą drogą (on) a straight road is a noun phrase composed of an adjective and a noun (in the instrumental)
        bujać w obłokach to swing in the clouds to fantasize w obłokach in the clouds is a prepositional phrase composed of a preposition and a noun
      • tomar banho to take a shower banho is a noun phrase composed of a single noun
      • a rupe tăcerea to break silence the to start talking tăcerea is a noun phrase composed of a single noun
        a face baie to do bath to take a shower baie is a noun phrase composed of a single noun
      • biti v dvomih to be in doubts to doubt v dvomih in doubts is a prepositional phrase composed of a preposition governing a noun, klicati jelene to call deer to vomit jelene deer is a noun phrase composed of a single plural noun
    • (Hindi-specific) Adjective which is morphologically identical to an eventive noun: Apply the LVC tests. If the outcome is negative, apply the VID-specific tests. On negative outcome discard the VMWE candidate.
    • Adjective: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
      • излизам сух от водата to come out dry from the water to avoid taking responsibility
        одирам жив skin alive to make someone suffer
        гоня дивото chase the wild.ADJ to take risks дивото is a substantive
      • rot sehen to see red
      • τα βάφω μαύρα them-NE.PL.ACC paint-1.SG black-NE.PL.ACC be very sad
      • to stand firm, to see red
      • me las vi negras me the saw black I saw myself in trouble
        ponerse negro put.self black to get/become irritated
        poner verde put green to criticise (someone)
      • zuriak eta beltzak aditu white and black hear to hear all sorts of things
      • voir rouge to see red to be very angry
      • ostati svoj to stay one's own to be consistent
      • vedere nero to see black
      • zrobić swoje to do one's own to do what one is supposed to do
      • pensar grande to think big
      • a vedea roșu to see red
        a o face lată to CL.ACC make wide to party
      • narediti svoje to do one's own to do what one is supposed to do
    • Adverb: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
      • изваждам наяве take out in the open to uncover
        хващам натясно catch in a tight place to coerce, to pressure
      • φέρω βαρέως
      • to get well
      • caer bien fall well to be liked by
      • alferrik galdu uselessly get-lost to ruin, to spoil
      • dobro proći to go well to be successful
      • fare passi avanti to_make steps forward to make progress
      • chcieć dobrze to want well to have good intentions
        robić komuś dobrze to do someone.DAT well to please someone
        źle/marnie skończyć badly finish to come to a bad end
      • cair bem fall well to be appropriate
      • a se face bine to himself make well to get well
        a face bine to make well to help
      • obrniti se na bolje to turn for better to be better, iti predaleč to go too far to demand too much or to do something inappropriate
    • Pronoun: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
      • мързи ме (it feels) lazy me.ACC to be lazy
      • τα καταφέρνω to make it
        την πατάω to fail
      • to make it
      • jugársela play.self.it to risk it
      • elkar hartu each-other take to get on with somebody, to agree
      • suarekin jolasean ibili with-fire playing be to play with fire
      • le faire it make to be enough/successful
      • farcela to make it to manage
      • No example found in Polish
      • dá-lhe João! give to him/her, João! show them what you got, João!
      • a o coti CL.ACC.F.3SG turn to turn with the non-anaphoric feminine clitic 'o' functioning as an expletive
      • imeti ga pod kapo to have him under one's hat to be drunk, mahniti jo to hit her to start going (somewhere)
    • Verb with lexicalized dependents including fully lexicalized clauses: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.
      • не мога да кажа две думи на кръст cannot say two words on a cross to not be able to speak or express oneself
        правя сам да си говори make someone talk to himself to drive someone crazy
      • to make ends meet, to know on which side the bread is buttered
      • hacer de tripas corazón make of intestines heart to pluck up the courage
        dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more
        dar gato por liebre to_give cat for hare to rip off, to take for a ride
      • n.a.
      • okretati se kako vjetar puše to turn how the wind blows to be inconsistent
      • sbarcare il lunario to_land the living to make ends meet
        non avere peli sulla lingua do not have hair on the tongue to be outspoken
      • wiedzieć, co w trawie piszczy to know what in the grass squeaks to know what is going on, to be well informed
      • vedeti, koliko je ura to know what time it is to realize the truth
    • Other: Apply the VID-specific tests. On negative outcome discard the VMWE candidate.

    The aim of this test is to determine which category-specific identification tests should be applied. Note that for the VMWE candidates having the structure of a meaning-preserving variant, the test should be applied to their canonical form instead. This is required because there may be no verb or the verb may not be the syntactic head in such variants.
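
    The outcome of test S.4 can be read as a lookup table from the category of d to the ordered list of test families to try, followed by "take the first family whose tests are positive, otherwise discard". The sketch below transcribes the cases listed above; the category keys and function name are hypothetical labels, and the test outcomes themselves are assumed to come from the annotator.

```python
from typing import Dict, List, Optional

# Order in which test families are tried for each category of the dependent d,
# transcribed from test S.4; the category keys are hypothetical labels.
S4_TEST_ORDER: Dict[str, List[str]] = {
    "reflexive_clitic": ["IRV"],
    "particle": ["VPC"],
    "verb_without_lexicalized_dependent": ["MVC", "VID"],
    "adposition": [],                   # not annotated in step 3; see the optional IAV step 4
    "extended_np": ["LVC", "VID"],
    "hindi_adjective": ["LVC", "VID"],  # Hindi only: adjective identical to an eventive noun
    "adjective": ["VID"],
    "adverb": ["VID"],
    "pronoun": ["VID"],
    "verb_with_lexicalized_dependents": ["VID"],
    "other": ["VID"],
}


def first_positive_family(category: str, outcomes: Dict[str, bool]) -> Optional[str]:
    """Return the first test family with a positive outcome, or None (discard the candidate)."""
    for family in S4_TEST_ORDER.get(category, S4_TEST_ORDER["other"]):
        if outcomes.get(family, False):
            return family
    return None
```

    For example, a verb with no lexicalized dependent whose MVC tests fail but whose VID tests pass yields `"VID"`, whereas a reflexive clitic whose IRV tests fail yields `None` even if the VID tests would pass, because the IRV branch has no VID fallback.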


    Section 5.2

    Light verb constructions (LVC)

    Light verb constructions (LVC) constitute a universal category. We retain the following key characteristics:

    1. They are formed by a verb v and a (single or compound) noun n, which either directly depends on v (and possibly contains a case marker or a postposition), or is introduced by a preposition.
      In the case of Hindi, the noun can be replaced by an adjective which is morphologically identical to an eventive noun. If you annotate Hindi, wherever this page refers to the noun, read it as "the noun or the adjective".
      • вземам решение to make a decision
        държа под контрол to keep under control
      • zum Einsatz kommen to the use come to be called into action
        eine Rede halten a speech hold to give a speech
      • παίρνω μία απόφαση make a decision
        δίνω στα νεύρα give to the nerves
        δίνω τη χαριστική βολή
      • to give a lecture → verb + direct-object noun
        to come into bloom → verb + prepositional-object noun
        to make a high five → verb + compound noun
      • hacer una promesa make a promise to make a promise
        poner en peligro put in danger to endanger, to jeopardise → verb + prepositional-object noun
        tener dolor de cabeza have pain of head to have a headache → verb + compound noun
      • lan egin work do to work, aurrera egin front-to do to go ahead
      • faire une présentation make a presentation → verb + direct-object noun
        procéder à une analyse proceed to an analysis to make an analysis → verb + prepositional-object noun
        faire un faux pas make a faux-pas → verb + compound noun
      • stupiti na snagu step into force come into force
        držati predavanje to hold a speech to give a speech
      • chiamare in causa to_call in cause to single out
        fare una passeggiata to_make a walk to have a walk
      • odnieść sukces carry-away success to be successful
        mieć wyrzuty sumienia to have reproaches of conscience to blame oneself
        wykonać rzut karny to perform a penalty kick
      • fazer um aborto to make an abortion → verb + direct-object noun
        estar com fome be with hunger to be hungry → verb + prepositional-object noun
        fazer uma mesa redonda make a table round to have a round table (discussion) → verb + compound noun
      • a duce dorul to carry yearning.the to miss somebody
        a da divorț to give divorce to divorce
        a da în clocot to give in boil to come to the boil
        a da în fiert to give in boil to come to the boil
      • biti v dvomih to be in doubts to doubt → verb + prepositional-object noun
        imeti predavanje to give a lecture → verb + direct-object noun
    2. The (single or compound) noun n is predicative and refers to an event (e.g. decision, visit) or a state (e.g. fear, courage). Predicative nouns are nouns that have semantic arguments, that is, they express predicates whose meaning is only fully specified by their semantic arguments:
      • вземам решение to make a decision → noun refers to an act or event
        давам съгласие to give permission → noun refers to an act or event
        имам притеснения to have concerns → noun refers to a feeling or state
        имам готовност to be ready → noun refers to a feeling or state
      • eine Entscheidung treffen to make a decision → noun refers to an event
        Angst haben to have fear → noun refers to a state
      • παίρνω μία απόφαση, κάνω βόλτα → noun refers to an event
        έχω αγωνία, κάνω κουράγιο → noun refers to a state
      • to make a decision → noun refers to an event, there are 2 arguments: a decider and a choice
        to pay a visit → noun refers to an event, there are 2 arguments: a visitor and a visited place/person
        to have fear → noun refers to a state, there are 2 arguments: somebody who is afraid and something frightening
        to have courage → noun refers to a state, there is 1 argument: the courageous person
      • dar un consejo give an advice to give advice → noun refers to an event, there are 3 arguments: an adviser, an advised person, and a theme
        tener valor to have courage → noun refers to a state, there is 1 argument: the courageous person
      • negar egin cry do to cry → noun refers to an act or event
        lo egin sleep do to sleep → noun refers to a state
      • donner un conseil give advice → noun refers to an event, there are 3 arguments: an adviser, an advised person, and a theme
        avoir du courage to have courage → noun refers to a state, there is 1 argument: the courageous person
      • donijeti odluku to bring a decision to make a decision → noun refers to an event
        imati osjećaj to have feeling → noun refers to a state
      • fare una domanda → noun refers to an event
        avere paura, avere coraggio → noun refers to a state
      • prowadzić rozmowy to lead conversations to lead negotiations → the noun refers to an event
        mieć rację to have right to be right → the noun refers to a state
      • fazer uma prece to make a prayer → noun refers to an event, there are 2 arguments: the person praying and the thing she/he prays for
        ter sintomas to have symptoms → noun refers to a state, there are two arguments: the person having symptoms and the disease causing these symptoms
      • a lua o decizie to make a decision, a face o vizită to pay a visit → noun refers to an event
        a avea curaj → noun refers to a state
      • biti v dvomih to be in doubts to have doubts → noun refers to a state
        imeti predavanje to give a lecture → noun refers to an event
    3. We retain two sub-categories of verbs, which define two sub-categories of LVCs:
      • The verb v is "light" in that it contributes to the meaning of the whole only by bearing morphological features: person, number, tense, mood, as well as morphological aspect. This implies that v's syntactic subject is n's semantic argument. In this case, we annotate the construction as LVC.full.
        • давам изявление give a statement to make a statement
          нанасям щети spread damages to cause damages
        • παίρνω μία απόφαση, κάνω βόλτα, έχω αγωνία, έχω πονοκέφαλο
        • to make a presentation
          to pay a visit
          to have rights
          to have a headache
          to carry out a destruction
        • dar un paseo give a walk to go for a walk
          tener valor to have courage
          tener dolor de cabeza have pain of head to have a headache
        • faire une présentation to make a presentation
          faire une visite to make a visit
          avoir le droit to have the right
          avoir un mal de tête to have a headache
        • napraviti pogrešku to make a mistake
        • fare una presentazione to make a presentation
          fare una visita to make a visit
          avere il diritto to have the right
          avere un mal di testa to have a headache
        • odnieść sukces carry-away success to be successful
          mieć rację to have right to be right
          cierpieć na anemię to suffer from anemia
        • realizar uma apresentação to make a presentation
          fazer uma visita to make a visit
          ter um direito to have a right
          ter dor de cabeça have pain of head to have a headache
        • a face o prezentare to make a presentation
          a face o vizită to pay a visit
        • imeti predavanje to have a lecture to give a lecture, biti mnenja to be of opinion to have an opinion, biti v pomoč to be in help to be helpful, delati razlike to make differences to differentiate
      • The verb v is "causative" in that it indicates that the subject of v is the cause or source of the event or state expressed by n. In other words, the noun has semantic arguments expressed as non-subject elements in the sentence, and the subject of the verb contributes additional information, indicating the cause or source of the event/state. In this case, we annotate the construction as LVC.cause. These constructions are expected to be less idiomatic than other VMWEs and can be understood as complex predicates with a causal support verb.
        • давам възможност to give an opportunity
          нося късмет to bring luck
        • κάνω αδικία make injustice
          δίνω ικανοποίηση give satisfaction
          προκαλώ καταστροφή cause destruction
        • to grant rights
          to give a headache
          to provoke a reaction
        • dar derecho to grant the right
          dar vértigo give vertigo to make dizzy
          causar un accidente to provoke an accident
        • donner le droit to grant the right
          donner le vertige give the vertigo to make dizzy
          provoquer un accident to provoke an accident
        • dati mogućnost to give an opportunity
        • dare il diritto to grant the right
          dare le vertigini to_give the vertigo to make dizzy
          causare un incidente to provoke an accident
        • to sprawia nam kłopot this causes us trouble
          nakłada obowiązek na użytkowników put a duty on the users
          dać prawo to give the right to grant the right
          narazić na straty expose to losses
          stawiać komuś cel to put an aim to someone to set a goal for someone
        • dar o direito to grant the right
          dar tontura give vertigo to make dizzy
          provocar um acidente to provoke an accident
        • a da dureri de cap to give pains of head to give a headache
        • dati ime nekomu to give (somebody) a name to name (somebody), narediti konec nečemu to make an end (to something) to end (something)

    The following decision tree should be applied to decide whether a candidate should be annotated as LVC.full, LVC.cause or neither.

    LVC-specific decision tree:

    • Apply test LVC.0 - [N-ABS: Is the noun abstract?]
      • NO → It is not an LVC, exit
      • YES → Apply test LVC.1 - [N-PRED: Is the noun predicative?]
        • NO → It is not an LVC, exit
        • YES → Apply test LVC.2 - [V-SUBJ-N-ARG: Is the subject of the verb a semantic argument of the noun?]
          • YES → Apply test LVC.3 - [V-LIGHT: Does the verb only add meaning expressed as morphological features?]
            • NO → It is not an LVC, exit
            • YES → Apply test LVC.4 - [V-REDUC: Can a verbless NP-reduction refer to the same event/state?]
              • NO → It is not an LVC, exit
              • YES → It is an LVC.full
          • NO → Apply test LVC.5 - [V-SUBJ-N-CAUSE: Is the subject of the verb the cause of the noun?]
            • NO → It is not an LVC, exit
            • YES → It is an LVC.cause

    Note: test 10 [N-SEM] from the previous version of the guidelines (1.0) was considered unnecessary and has been abandoned in the current version of the guidelines.

    Note: LVC tests are often hard to apply. If you hesitate at an intermediate test, continue to the next one, since the last test of each branch (LVC.4 for LVC.full, LVC.5 for LVC.cause) will help you reach your final decision.
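    Read procedurally, the decision tree and the hesitation note above amount to a short sequence of checks. The following Python sketch is only an illustration, not part of any official PARSEME tooling: the names Answers, classify_lvc and Label are hypothetical, every YES/NO answer still comes from the annotator's linguistic judgement, and the handling of hesitation is our own reading of the note (an unanswered intermediate test, i.e. LVC.0, LVC.1 or LVC.3, counts as a pass, whereas LVC.2, LVC.4 and LVC.5 must be answered).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Label(Enum):
    LVC_FULL = "LVC.full"
    LVC_CAUSE = "LVC.cause"
    NOT_LVC = "not an LVC"


@dataclass
class Answers:
    """Annotator's answers to tests LVC.0-LVC.5; None means 'unsure' or 'not asked'."""
    n_abs: Optional[bool] = None           # LVC.0 [N-ABS]
    n_pred: Optional[bool] = None          # LVC.1 [N-PRED]
    v_subj_n_arg: Optional[bool] = None    # LVC.2 [V-SUBJ-N-ARG]
    v_light: Optional[bool] = None         # LVC.3 [V-LIGHT]
    v_reduc: Optional[bool] = None         # LVC.4 [V-REDUC]
    v_subj_n_cause: Optional[bool] = None  # LVC.5 [V-SUBJ-N-CAUSE]


def _continues(answer: Optional[bool]) -> bool:
    """Intermediate test (assumption): hesitation (None) counts as 'continue'."""
    return answer is not False


def _decides(answer: Optional[bool], test: str) -> bool:
    """Branching or final test: a definite YES/NO answer is required."""
    if answer is None:
        raise ValueError(f"{test} needs a YES/NO answer")
    return answer


def classify_lvc(a: Answers) -> Label:
    """Walk the LVC-specific decision tree (hypothetical helper)."""
    if not _continues(a.n_abs):            # LVC.0 NO: not an LVC
        return Label.NOT_LVC
    if not _continues(a.n_pred):           # LVC.1 NO: not an LVC
        return Label.NOT_LVC
    if _decides(a.v_subj_n_arg, "LVC.2"):  # LVC.2 YES: LVC.full branch
        if not _continues(a.v_light):      # LVC.3 NO: not an LVC
            return Label.NOT_LVC
        if _decides(a.v_reduc, "LVC.4"):   # LVC.4 YES: LVC.full
            return Label.LVC_FULL
        return Label.NOT_LVC
    # LVC.2 NO: LVC.5 decides between LVC.cause and not an LVC
    if _decides(a.v_subj_n_cause, "LVC.5"):
        return Label.LVC_CAUSE
    return Label.NOT_LVC


# Hypothetical answers for English examples discussed in this section
examples = {
    "to make a decision": Answers(n_abs=True, n_pred=True, v_subj_n_arg=True,
                                  v_light=True, v_reduc=True),
    "to give a headache": Answers(n_abs=True, n_pred=True, v_subj_n_arg=False,
                                  v_subj_n_cause=True),
    "to make a cake": Answers(n_abs=False),
    "to start a walk": Answers(n_abs=True, n_pred=True, v_subj_n_arg=True,
                               v_light=False),
}
for phrase, answers in examples.items():
    print(f"{phrase}: {classify_lvc(answers).value}")
```

    Run as is, the loop prints LVC.full, LVC.cause, not an LVC and not an LVC for the four phrases. Raising an error on an unanswered LVC.2, LVC.4 or LVC.5 is a design choice of the sketch: these tests choose the branch or the final label, so skipping them would silently produce a decision the annotator never made.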

    Test LVC.0 - [N-ABS] Noun is abstract

    Is the noun n abstract?

    • YES → continue to next test
      • проблем problem, възможност opportunity, изявление statement, план plan
      • προτεραιότητα, θυμός, αγάπη, δυσκολία, λόγος, παρουσίαση, γέννηση
      • priority, anger, love, opinion, difficulty, speech, presentation, birth
      • paseo walk, derecho right, ilusión excitement, fe faith, duelo grief
      • pas step, édition edition, discours speech, explication explanation, lutte fight
      • problem problem, mogućnost opportunity, ideja idea
      • priorità priority, rabbia anger, amore love, opinione opinion, difficultà difficulty, discorso discourse, presentazione presentation
      • kłopot problem, wysokość height, praca work, prawo right, zysk profit
      • prioridade priority, festa party, fé faith, nascimento birth, distinção distinction, problema problem, gol goal (soccer)
      • răspuns answer, prezentare presentation
      • dvom doubt, mnenje opinion, ime name, vloga role, odločitev decision
    • NO → it is not an LVC
      • правя торта to make a cake → a cake is a physical entity (not abstract)
        давам пари to give money → money is a physical entity (not abstract)
        подавам ръка to give out hand to help in a difficult situation → hand is a physical entity (not abstract)
      • καρέκλα chair , τραπέζι table , χέρι hand , άνθρωπος human
      • chair, keyboard, hand, person
      • mesa table, silla chair, mano hand, foto picture
      • aulki chair, teklatu keyboard, esku hand, pertsona person
      • chaise chair, clavier keyboard, main hand, personne person
      • stol table, ruka hand, kruna crown
      • sedia chair, tastiera keyboard, mano hand, persona person
      • złożyć kartkę to fold a sheet → a sheet is a physical entity (not abstract)
        złożyć broń to lay down arms → arms are physical entities (not abstract)
        bić pianę to beat foam to exaggerate about a problem → foam is a physical entity (not abstract)
        wystawić fakturę to issue a bill → a bill is a physical entity (not abstract)
        mieć brata to have a brother → a brother is a physical entity (not abstract)
      • cadeira chair, teclado keyboard, mão hand, pessoa person, pedra rock
      • scaun chair, pian piano
      • oseba person, mačka cat, kapa hat, avtomobil car, roka hand

    Some concrete nouns may be predicative (test LVC.1). For instance, a relational noun such as daughter is semantically incomplete without its argument: daughter of X, so daughter is predicative. However, concrete predicative nouns should not pass test LVC.0.

    Some nouns may have both concrete and abstract interpretations. For instance, money is concrete when it refers to banknotes (paper money, bills): I didn't have money so I paid by credit card. However, money is abstract when referring to a conventional value used in transactions between people: He spent a lot of money in the mall. If one cannot be sure that the noun is used in its concrete interpretation, test LVC.0 passes.

    Test LVC.1 (prev. 9) - [N-PRED] Noun is predicative

    Does the noun n have at least one semantic argument, implying that it is a predicative noun?

    • YES → continue to next test
      • поставям акцент to emphasize → event, with two arguments: the agent and the object being emphasized
        имам право to have the right → property, with one semantic argument: the possessor of the property
      • einen Besuch abstatten to pay a visit → event, with two arguments: the visitor and the visitee
        Angst haben to have fear → property with one semantic argument: the entity having fear
        einen Blick auf etwas werfen a glance at sth. throw to take a glance at sth → an event with two arguments: the entity glancing and the entity glanced at
      • κάνω μία επίσκεψη to-make a visit pay a visit, visit → event, with two arguments: the visitor and the visitee
        έχω τη δυνατότητα to-have the ability to be able → property, with one core semantic argument: the entity having the ability
        έχω μίσος have hatred → state, with two arguments: the entity feeling hatred and the entity hated
        βγάζω λόγο → event, with one obligatory argument: the entity making the speech
      • pay a visit → event, with two arguments: the visitor and the visitee
        have strength → property, with one semantic argument: the entity having strength
        take a glance at something → event, with two arguments: the entity glancing and the entity glanced at
        make a contribution → event, with two arguments: the contributor and the beneficiary (notice that contribution could refer to both the event and the thing being contributed, but we always prefer the former reading when possible)
      • hacer una visita make a visit to pay a visit → event, with two arguments: the visitor and the visitee
        tener valor to have courage → property, with one semantic argument: the entity having courage
        echar un vistazo a algo give a glance to something to take a quick look at something → event, with two arguments: the entity glancing and the entity glanced at
      • bisita egin visit do to pay a visit → event with two arguments: the visitor and the visitee
        itxaropena ukan hope have to hope, to have hope → event with one single argument: the person who hopes
      • avoir du courage to have courage → state (property), with one argument: the entity having courage
      • imati osjećaj to have a feeling → property with one semantic argument: the entity having feeling
        otići u posjet to go to a visit to someone to pay a visit → event, with two arguments: the visitor and the visitee
      • fare una visita → event, with two arguments: the visitor and the visitee
        avere forza → property, with one semantic argument: the entity having strength
        dare uno sguardo a qualcosa → event, with two arguments: the entity glancing and the entity glanced at
      • złożyć wizytę to submit a visit to pay a visit → event, with two arguments: the visitor and the visitee
        złożyć skargę to submit a complaint to make a complaint → event, with two arguments: the complaining person and the one he/she complains about
        mieć prawo to have the right → state, with two arguments: the person having the right and the thing (s)he has the right to
        budzić zastrzeżenia to wake-up reservations to provoke reservations → state, with two arguments: the person having reservations and the object of the reservations
      • ter fome to have hunger to be hungry → property, with one argument: the entity that is hungry
        ter idade para fazer algo to have age (to do something) to be old enough (to do something) → state, with one argument: the entity that is old enough
        In PT, we consider that the following classes of predicative nouns pass the test: diseases (gripe, trombose, infarto), physical sensations (fome, sede, sono), emotions (medo, paixão, nojo), cognitive entities internal to the cognizer (ideia, opinião, preocupação), characteristics (coragem, teimosia, fraqueza), relations (contato, conflito, amizade) and nouns expressing communication or speech acts (conversa, discussão, briga, conselho).
      • a face o vizită to make a visit to pay a visit → event, with one argument: the entity that visits
        a avea curaj to have courage → property, with one semantic argument: the entity having courage
      • imeti predavanje to give a lecture → event, with two arguments: a lecturer and the people who are attending the lecture
    • NO → it is not an LVC
      • Иван хвърли боклука Ivan threw out the garbage → physical entity (not event/state)
      • Joe macht einen Kuchen Joe makes a cake → physical entity (not event/state), even though Joe could be considered a semantic argument
      • ο Γιάννης βγάζει τα ρούχα του John takes off his clothes → physical entity (not event/state)
      • Joe makes a cake → the noun is a physical entity that does not pass test LVC.0, even though Joe could be considered its semantic argument
        Joe experienced a tornado → the noun is an event, but has no semantic arguments
        Joe has a lot of money → the noun is abstract and Joe could be considered its semantic argument, but we consider that money (as well as other goods such as car and bananas) can exist independently of a possessor, so the possessor (owner) should not be considered as semantic argument of money
      • Ana tiene una bicicleta Ana has a bicycle → noun is not abstract, so it does not pass test LVC.0
        Ana hace una foto Ana takes a picture → noun is not abstract, so it does not pass test LVC.0
      • pastela egin cake make to make a cake → physical entity (not event/state)
      • Anna a un vélo Anna has a bicycle → noun is not abstract, so it does not pass test LVC.0
        Anna affronte la tempête Anna faces the storm → noun is abstract but has no arguments
      • Ivan ima olovku Ivan has a pencil → noun is not abstract, so it does not pass test LVC.0
      • Joe fa un dolce → physical entity (not event/state), even though Joe could be considered its semantic argument
        Joe ha vissuto un tornado → event, but has no semantic argument
      • przetrwać burzę to survive a storm → burza storm has no semantic arguments although it is abstract
      • quebrar a cabeça to break one's head to rack one's brain → physical entity, does not pass test LVC.0
        In PT, we consider that the following classes of abstract nouns do not pass this test: informational content that does not require agents (informações, notícias), natural phenomena (chuva, neve, tornado).
      • Joe a făcut o prăjitură Joe made a cake → physical entity (not event/state), even though Joe could be considered its semantic argument
      • Janez ima avto Janez has a car → the person that has a car could be considered as a semantic argument, but the car is not an event or a state

    We only retain nouns n that have at least one semantic argument, which we define as a semantically mandatory and specific participant of the event or state expressed by the predicative noun.

    Sometimes, it might be useful to consider verbs and adjectives derivationally related to the noun to reason about its semantic arguments.

    Test LVC.2 (prev. 13) - [V-SUBJ-N-ARG] Verb's subject is noun's semantic argument

    Is the subject of the verb a semantic argument of the noun? In other words, is the verb linking the predicative noun to one of its semantic arguments that occurs as the subject of the verb?

    • YES → continue to next test
      • Иван изнесе доклад Ivan presented a report → Иван is the subject of the verb and a semantic argument (agent) of the activity
        Президентът получи покана за посещение в Германия The president received an invitation to visit Germany → Президентът president is the subject of the verb and a semantic argument (the receiver) of the invitation
        Президентът получи награда The president received an award → Президентът president is the subject of the verb and a semantic argument (the receiver) of награда award
      • ο Γιάννης έκανε μία παρουσίαση στο αφεντικό του John made a presentation to his boss → ο Γιάννης John is the subject of the verb and a semantic argument (the presenter) of the noun
      • John made a presentation to his boss → John is the subject of the verb and a semantic argument (the presenter) of the noun
      • María dio un paseo María went for a walk → María is the subject of the verb and a semantic argument (the walker) of the noun
      • Max fait une promenade Max takes a walk → Max is the subject of the verb and a semantic argument (the walker) of the noun
      • Helena je otišla u posjet prijateljici Helena paid a visit to a friend → Helena is the subject of the verb and a semantic argument (the visitor) of the visit
        Susjed je dobio dozvolu za gradnju The neighbour received permission for construction → the neighbour is the subject of the verb and a semantic argument (the receiver) of the permission
      • Jan złożył wizytę Marii Jan paid a visit to Maria → Jan is the subject of the verb and a semantic argument (the visitor) of the visit
        Piotr dostał pozwolenie na budowę Piotr received permission for construction → Piotr is the subject of the verb and a semantic argument (the receiver) of the permission
        Beata ma marzenia o spokoju Beata has dreams about peace → Beata is the subject of the verb and a semantic argument (the possessor) of the dreams
        wyborcy ponoszą za to winę the electorate bears the responsibility for this → wyborcy electorate is the subject of the verb and a semantic argument (the agent) of the guilt
        ustawa budzi zastrzeżenia the law wakes-up reservations the law raises reservations → ustawa law is the subject of the verb and a semantic argument (the theme) of zastrzeżenia reservations
      • Felipe tomou dois banhos Felipe took two showers → Felipe is the subject of the verb and a semantic argument (the person taking a shower) of the noun
      • Ion i-a făcut o prezentare șefului său Ion made a presentation to his boss → Ion is the subject of the verb and a semantic argument (the presenter) of the noun
      • In Janezovo predavanje o slovenski kulturi za študente prevajalstva, the 3 syntactic arguments are expressed as a modifier with a possessive marker (Janezovo Janez's) and prepositional phrases (o slovenski kulturi on Slovene culture and za študente prevajalstva for students of translating )
    • NO → go to test LVC.5
      • Приятелят на Мария прекъсна нейния доклад Maria's friend interrupted her report→ Maria's friend, that is, the subject of the verb, is not a semantic argument of the report, since a report does not necessarily have an interrupter
      • το αφεντικό του Γιάννη διέκοψε την παρουσίασή του John's boss interrupted his presentation → το αφεντικό του Γιάννη John's boss, that is, the subject of the verb διέκοψε, is not a semantic argument of the noun predicate παρουσίαση presentation, since a presentation does not necessarily have an interrupter
      • John's boss interrupted his presentation → John's boss, that is, the subject of the verb, is not a semantic argument of the presentation, since a presentation does not necessarily have an interrupter
        The report provides information about the economy → information only has one argument, which is its topic. The provider or source of information is not one of its semantic arguments.
      • El periodista interrumpió el discurso The journalist interrupted the speech → The journalist, that is, the subject of the verb, is not a semantic argument of the speech, since a speech does not necessarily have an interrupter
        El informe facilita información clave the report provides key information → information only has one argument, which is its topic. The provider or source of information is not one of its semantic arguments.
      • Le journaliste a interrompu le discours The journalist has interrupted the speech → The journalist, that is, the subject of the verb, is not a semantic argument of the speech, since a speech does not necessarily have an interrupter
        Le rapport fournit des informations cruciales the report provides crucial information → information only has one argument, which is its topic. The provider or source of information is not one of its semantic arguments.
      • Učenici su prekinuli predavanje Students have interrupted the lecture → the students, that is, the subject of the verb, are not a semantic argument of the lecture, since a lecture does not necessarily have an interrupter
      • Marek dał mi prawo wyboru Marek gave me the right to choose → Marek is the subject of the verb but not a semantic argument of the right (a right usually does not need to be granted)
        Incydent ten podważył zaufanie wyborców do kandydata This incident undermined the electorate's confidence in the candidate → Incydent incident is the subject of the verb but not a semantic argument of the confidence
        komisja przeprowadziła wybory the committee carried out the vote→ komisja committee is the subject of the verb but not a semantic argument of wybory vote, which only requires the voters and the matter of the vote
      • O jornalista interrompeu a inauguração The journalist has interrupted the inauguration → The journalist, that is, the subject of the verb, is not a semantic argument of an inauguration, since an inauguration does not necessarily have an interrupter
        O relatório traz informações polêmicas the report provides controversial information → information only has one argument, which is its topic. The provider or source of information is not one of its semantic arguments.
      • To define a predavanje lecture one needs to mention three participants: the presenter, the audience and the topic of the presentation. In other words, the existence of a lecture implies the existence of its arguments.

    It is not always easy to determine if the verb's subject is an argument of the noun. You can use the former syntactic version of this test to verify your intuitions.

    Test LVC.3 (prev. 11) - [V-LIGHT] Verb with light semantics

    Is v semantically light, that is, is the semantics that v adds to n restricted to: (i) what stems from its morphological features (e.g. future, plural, perfective aspect, etc.), (ii) pointing at the semantic role of n played by v's subject?

    • continue to next test
      • вземам решение make a decision вземам adds no meaning to решение decision besides that of performing an act
        държа реч to make a speech държа adds no meaning to реч besides that of performing an act
        поемам отговорност to take responsibility поемам adds no meaning to отговорност besides that of having a property
      • eine Entscheidung treffen a decision meet to make a decision treffen adds no meaning to Entscheidung besides that of performing an activity
        Angst haben to have fear haben adds no meaning to Angst besides that of having a property.
      • κάνω μία βόλτα take a walkκάνωmake adds no meaning to βόλτα walkbesides that of performing an activity
        παίρνω μία απόφαση παίρνω take adds no meaning to απόφαση decision besides that of performing an activity
        έχω άγχος have anxiety έχω have adds no meaning to άγχος anxiety besides that of having a property
        διενεργώ έλεγχο perform a check διενεργώ perform is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
        διαπράττω ένα έγκλημα διαπράττω commit is a pure syntactic operator: in any context,it only bears tense and mood and never adds any sense to the noun
      • take a walk take adds no meaning to walk besides that of performing an activity
        make a decision make adds no meaning to decision besides that of performing an activity
        have fear have adds no meaning to fear besides that of having a property
        perform a check perform is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
        commit a crime commit is a pure syntactic operator: in any context,it only bears tense and mood and never adds any sense to the noun
        pay a visit → the verb in its usual sense means 'to spend some money on a visit', but here it is not used in this sense and does not add any semantics to the "visiting" event
        deliver a speech → the verb in its usual sense means 'to move from one place to another', but here it is not used in this sense and does not add any semantics to the "speech" event
        undergo a surgery undergo adds no meaning to surgery besides indicating that the subject is the patient of the surgery
      • dar un paseo to take a walk dar adds no meaning to paseo besides that of performing an activity
        tomar una decisión to make a decisiontomar adds no meaning to decisión besides that of performing an activity
        tener miedo to have fear tener adds no meaning to miedo besides that of having a property
      • usain egin smell do to smell, to sniffthe verb egin adds no meaning to the noun usain besides that of performing an activity
        lo egin sleep do to sleepthe verb egin adds no meaning to the noun lo besides that of performing an activity
      • ils ont du courage they have some courage have adds no meaning to courage besides that of having a property
        ils reçoivent l’ordre de partir they receive the order of leavingthey are ordered to leave receive adds no meaning to order besides indicating that the subject is the recepient of the order
        il a subi une intervention chirurgicale he has undergone an intervention surgery he underwent surgery undergo adds no meaning to surgery besides indicating that the subject is the patient of the surgery
      • imati hrabrost to have courage imati have adds no meaning to hrabrost courage besides that of having a property
        donijeti odluku to make a decision donijeti in its usual sense means 'to bring', but here it is not used in this sense and does not add any semantics to event
      • fare una passeggiata to take a walk fare adds no meaning to passeggiata besides that of performing an activity
        prendere una decisione to make a decision prendere adds no meaning to decisione besides that of performing an activity
        avere paura to have fear to be afraid avere adds no meaning to paura besides that of having a property
        eseguire un controllo to carry out a check eseguire is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
        commettere un crimine to commit a crime commettere is a pure syntactic operator: in any context, it only bears tense and mood and never adds any sense to the noun
        fare una visita → the verb in its usual sense means 'make', but here it is not used in this sense and does not add any semantics to the "visiting" event
        fare un discorso → the verb in its usual sense means 'to make', but here it is not used in this sense and does not add any semantics to the "speech" event
      • oddać hołd to give-back tribute to pay tribute oddać give-back adds no meaning to hołd tribute besides that of performing an activity
        wystąpić z wnioskiem to stand out with a proposal to put forward a motion wystąpić z stand out with adds no meaning to wniosek motion besides that of performing an activity
      • mover uma ação judicial to move a lawsuit to sue to move adds no meaning to lawsuit besides that of performing an activity
        apresentar uma lesão present a lesion to have a lesion to present adds no meaning to lesion besides that of having a property
        estar com medo be with fear to be afraid to be with adds no meaning to fear besides that of being in a state
      • a avea curaj to have courage avea adds no meaning to curaj besides that of having a property
        a lua o decizie to make a decision lua adds no meaning to decizie besides that of performing an activity
      • Janez ima predavanje Janez lectures → Janez is the subject of the verb and a semantic argument of the noun (the lecturer)
    • it is not an LVC
      • започвам играта start the game, start playing започвам start adds an aspectual meaning to the noun
      • eine Rede beginnen to begin a speech beginnen adds an aspectual meaning to the noun Rede
      • ξεκινάω μία προσπάθεια start an attempt ξεκινάω start adds an aspectual meaning to the noun
      • to start a walk start adds an aspectual meaning to the noun
      • comenzar un discurso to begin a speech comenzar adds an aspectual meaning to the noun discurso
      • oinez hasi foot-by start to start walking the verb hasi adds an aspectual meaning to the noun
      • donner du courage to give courage donner indicates the source of the courage (this would not pass test LVC.2)
        donner son avis to give one's opinion donner adds the information that the opinion is communicated
        Ce fait attire l'attention de la justice This fact attracts the attention of the justice attirer indicates that the attention begins (an inchoative meaning)
      • početi igru start the game početi start adds an aspectual meaning to the noun
      • cominciare un ballo to start a dance cominciare adds an aspectual meaning to the noun ballo
      • wymierzyć sprawiedliwość to measure justice to do justice wymierzyć measure adds an aspectual meaning to sprawiedliwość justice
        przejść na emeryturę to cross to retirement to retire przejść adds an inchoative (change-of-state) meaning to the noun
        propozycja budzi zastrzeżenia the proposal wakes-up reservations the proposal raises reservations budzi wakes-up adds an inchoative meaning to zastrzeżenia reservations
        dopełnić obowiązku to fulfill one's duty dopełnić fulfill adds a fulfillment meaning to obowiązek duty
      • entrar com uma ação judicial to enter with a lawsuit to file a lawsuit to enter adds an aspectual meaning to the noun
        dar uma opinião to give an opinion dar to give adds the meaning of communication, which is not present in the noun itself (one can ter uma opinião to have an opinion without communicating it).
      • a începe munca to start work.the to start working începe adds an aspectual meaning to the noun
      • Študent je prekinil njegovo predavanje The student has interrupted his lecture → The student, that is, the subject of the verb, is not a semantic argument of the lecture, since a lecture does not necessarily have an interrupter

    Note that this light semantics of the verb is either usual for that verb (i.e. the verb is a pure syntactic operator, like commit, perform), or occurs in the context of the particular noun (e.g. for pay in to pay a visit). Both types of verbs pass the test.

    In our view of LVCs, we do not require a light verb to be "bleached", as it is sometimes described in the literature. We simply do not take into account the relation between the verb's use as a light verb and its other uses. While the specific meanings added by light verbs to predicative nouns have been extensively studied and described (e.g. by Miriam Butt and Tafseer Ahmed), we do not adopt any fine-grained classification here. If in doubt about a verb's "lightness", proceed to the next test: if you can evoke the same event/state without using the verb, then the verb is considered light.

    Test LVC.4 (prev. 12) - [V-REDUC] - Verb reduction

    Try to build an NP without the verb, in which v's subject s becomes n's dependent. You might need to test several prepositions (of, by, for, from), possessives (my, her, somebody's), postpositions, case markers, as long as you use no verb. Can this verbless NP refer to the same event or state as the candidate v+n construction does?

    • annotate as LVC.full
      • Иван пое отговорност Ivan took responsibility отговорността на Иван — both refer to the same property/event
        Иван взе решение Ivan made a decision решението на Иван — both refer to the same property/event
      • Paul hat eine Rede gehalten Paul has given a speech Paul's speech both refer to the same speech event
        Ich habe ihm einen Besuch abgestattet I have paid him a visit mein Besuch my visit both refer to the same visiting event
      • ο Γιάννης έκανε μία παρουσίαση John made a presentation η παρουσίαση του Γιάννη John's presentation — both refer to the same presenting event
      • Paul had a walk Paul's walk — both refer to the same walking event
        I paid him a visit my visit to him — both refer to the same visiting event
        Hester gave birth to Pearl Pearl's birth to Hester — both refer to the same birthing event (note that the key criterion is that Hester, the subject of the verb, is a (prepositional) dependent of birth in the paraphrase)
        The party gave priority to senior members → the priority of senior members for the party — both refer to the same prioritization event
      • Pedro dio un paseo Pedro gave a walk Pedro took a walk el paseo de Pedro Pedro's walk — both refer to the same walking event
        El capitán da la orden de partir The captain gives the order to leave The captain orders to leave la orden del capitán de partir The captain's order to leave
      • Pellok bisita egin zidan Pello paid me a visit → Pelloren bisita Pello's visit: both refer to the same visiting event
      • Paul a fait une enquête Paul made an inquiry L'enquête de Paul Paul's inquiry
        Paul procède à une perquisition Paul makes a search La perquisition de/par Paul the search of/by Paul
        Le général donne l'ordre de partir The general gives the order to leave The general orders to leave l'ordre du général de partir The general's order to leave
        Les soldats reçoivent l'ordre de partir The soldiers receive the order to leave The soldiers are ordered to leave l'ordre aux soldats de partir The order to the soldiers to leave
        Jean souffre de troubles psychiques John suffers from psychic troubles Les troubles psychiques de Jean John's psychic troubles
        Jean présente une hypersensibilité John presents a hypersensitivity John has a hypersensitivity l'hypersensibilité de Jean John's hypersensitivity
        Paul reçoit des menaces de (la part de) Pierre Paul receives threats from (the part of) Peter Paul is threatened by Peter les menaces de Pierre à Paul Peter's threats to Paul
        Ce médicament présente un risque This medicine presents a risk This medicine poses a risk le risque de ce médicament this medicine's risk
        Ce fait attire l'attention de la justice This fact attracts the attention of the justice l'attention de la justice pour/sur ce fait the attention of the justice on/about this fact
      • Istraživač je donio zaključak The researcher made a conclusion njegov zaključak his conclusion both refer to the same event
      • Paolo ha fatto una conquista Paolo made a conquest la conquista di Paolo Paolo's conquest
        Il generale dà l'ordine di partire The general gives the order to leave The general orders to leave L'ordine di/da parte del generale di partire The general's order to leave
        Paolo riceve delle minacce da (parte di) Piero Paolo receives threats from (the part of) Piero le minacce di Piero a Paolo Piero's threats to Paolo
      • Obecni oddali hołd poległym The present gave-back tribute to the fallen The audience paid tribute to the fallen hołd obecnych the tribute of the audience
        Jan miał na myśli Marię Jan had on thought Maria Jan meant Maria myśl Jana Jan's thought
        Jan otrzymał wymówienie Jan received a dismissal wymówienie dla Jana dismissal for Jan
        Inwestycja przynosi zyski the investment brings profit zyski z inwestycji profit from the investment
      • João cometeu um deslize João committed a slip o deslize do João João's slip — both refer to the same event
        O jogador cobrou um pênalti the player charged a penalty kick the player took a penalty kick o pênalti do jogador the player's penalty kick — both refer to the same event
        João tem consciência do perigo John has conscience of the danger John is aware of the danger a consciência do João sobre o perigo John's awareness of the danger — both refer to the same state
        João recebeu a remuneração John received the remuneration a remuneração do João John's remuneration — both refer to the same event
        O paciente recebeu a visita dos familiares The patient received the visit of the relatives a visita dos familiares ao paciente the visit of the relatives to the patient — both refer to the same event
        João apresenta lesões John presents lesions as lesões do João John's lesions — both refer to the same state
      • Paul a făcut o plimbare Paul had a walk plimbarea lui Paul Paul's walk — both refer to the same walking event
        i-am făcut o vizită I paid him a visit vizita mea my visit — both refer to the same visiting event
      • imeti dvome to have doubts to doubt imeti have adds no meaning to dvomi doubts besides that of having a property
        delati razlike to make differences to differentiate delati in its usual sense means 'to make', but here it is not used in this sense and does not add any semantics to the event
    • it is not an LVC
      • Иван хвърли поглед на вестника Ivan threw a glance at the newspaper #погледът на Иван върху вестника — different semantics; and requires a different preposition
      • Paul hat einen guten Eindruck gemacht Paul has made a good impression #Pauls Eindruck auf seine Freunde Paul's impression on his friends has a different semantics
      • ο Παύλος πήρε νέα από τον αδερφό του Paul got news from his brother #Τα νέα του Παύλου από τον αδερφό του #Paul's news from his brother — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (Τα νέα του Παύλου) fails to refer to the original event (Paul got news)
      • Paul got news from his brother #Paul's news from his brother — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (Paul's news) fails to refer to the original event (Paul got news)
      • Juan recibió la noticia de su hermano Juan got the news from his brother #La noticia de Juan — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (la noticia de Juan) fails to refer to the original event (Juan recibió la noticia)
      • Hizlariak interesa piztu zuen Speaker interest switched-on The speaker awakened interest #Hizlariaren interesa the speaker's interest has a different semantics
      • Son comportement porte une atteinte grave à l'honneur des soldats His behaviour seriously jeopardises the soldiers' honour #l'atteinte de son comportement the jeopardy of his behaviour
      • Petar je dobio poruku od direktora Petar received a message from the director #Petar's message from the director — one cannot remove the verb because the sense of communication is missed, so the verb is not light. As a consequence, the verbless NP (Petar's message) fails to refer to the original event (Petar received a message)
      • Michael Phelps pobił rekord sprzed 2 tysięcy lat Michael Phelps broke the record from 2 thousand years ago → #Michael Phelps' record
        Ulica nosi imię sławnego poety The street carries the forename of a famous poet The street is named after a famous poet #imię ulicy the forename of the street
        Adam jest tego samego zdania Adam is of the same opinion Adam has the same opinion #zdanie Adama Adam's opinion refers to the contents of his opinion, not to the fact of having an opinion
      • O jogador cobrou uma falta the player charged a foul the player took a free kick a falta do jogador the player's foul — the focus changes from taking a free kick to being one of the parts involved in a foul (it's a VID)
        O jogador provocou uma lesão the player provoked a lesion a lesão do jogador the player's lesion — In the reduced NP, the focus changes from hurting somebody else to getting hurt
        O músico apresenta suas composições the musician presents his compositions as composições do músico the musician's compositions — the reduced NP does not keep the sense of presenting, so it does not refer to the same event as the verbal construction
      • Paul a făcut o impresie bună Paul made a good impression #Impresia lui Paul despre soția sa Paul's impression of his wife — different semantics
      • začeti predavanje to begin a lecture začeti to begin adds an aspectual meaning to the noun
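    For English candidates, the paraphrases that test LVC.4 asks annotators to try can be listed mechanically. The sketch below is a hypothetical helper, not part of any PARSEME tool (the names verbless_np_candidates, PREPOSITIONS and POSSESSIVE_PRONOUNS are our own). It assumes the subject is a single name or personal pronoun, only proposes NPs built with a possessive or with one of the prepositions mentioned in the test, and ignores the postpositions and case markers other languages need. The annotator still has to judge whether each candidate is acceptable and refers to the same event or state.

```python
from typing import List

# Prepositions suggested by test LVC.4; other languages need postpositions or case markers.
PREPOSITIONS = ("of", "by", "for", "from")

# Subject pronouns mapped to English possessive determiners (I -> my, ...).
POSSESSIVE_PRONOUNS = {
    "I": "my", "you": "your", "he": "his", "she": "her",
    "it": "its", "we": "our", "they": "their",
}


def verbless_np_candidates(subject: str, noun: str) -> List[str]:
    """Propose NPs in which the verb's subject becomes a dependent of the noun."""
    if subject in POSSESSIVE_PRONOUNS:
        # Pronoun subjects only yield a possessive determiner (I paid him a visit -> my visit).
        return [f"{POSSESSIVE_PRONOUNS[subject]} {noun}"]
    candidates = [f"{subject}'s {noun}"]  # possessive: Paul's walk
    candidates += [f"the {noun} {prep} {subject}" for prep in PREPOSITIONS]
    return candidates


if __name__ == "__main__":
    # Paul had a walk: the annotator judges which candidates denote the same walking event.
    for candidate in verbless_np_candidates("Paul", "walk"):
        print(candidate)
```

    For Paul had a walk, the first candidate is Paul's walk, which refers to the same walking event; for I paid him a visit, the helper returns only my visit, and adding further dependents (my visit to him) is left to the annotator.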

    This test has a simple formulation but its application has some important subtleties which are central to our definition of the LVC.full category. The goal of this test is to keep only constructions in which the predicative noun is an event or state, excluding "gray-zone" predicates.

    First, if it is not possible to build an acceptable NP where the verb v's subject s becomes a dependent of the noun n, e.g. using any preposition, postposition and/or case marker, this means that the verb is not light, and the construction cannot be annotated as LVC.full. This may remove constructions in which there is control, that is, in which the noun and the verb share the same subject. Control alone is not sufficient to characterize an LVC.full: in the following examples, LVC.4 fails, so the verb is not completely light and the construction cannot be annotated as LVC.full, even if it intuitively resembles an LVC.full because of control:

    • Paul a l'air de dormir Paul has the air of to-sleep Paul seems to be sleeping *l'air de dormir de Paul is unacceptable
      Paul a eu l'occasion de dormir Paul has had the opportunity to sleep Paul had the opportunity to sleep *l'occasion de Paul de dormir is unacceptable
    • Zdravnik je postavil diagnozo The doctor made a diagnosis njegova diagnoza his diagnosis both refer to the same event
      Politik je dal napoved The politician made a forecast njegova napoved his forecast both refer to the same event

    Second, the fact that the NP is acceptable does not suffice to characterise an LVC.full: the verbless NP must also evoke the same event or state as the LVC. Here are some tricky examples and some recommendations about how to interpret them:

    • Имам по-голям брат I have an elder brother моят брат my brother refers to one member of the relation, and not to the state of brotherhood between both actants
      отправих покана към приятелите си I sent an invitation to my friends покана invitation can be interpreted both as the act of inviting and as its contents; because of the first reading, we count this candidate as LVC.full
    • Mary has a brother Mary's brother is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
      Mary sent a letter Mary's letter refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
      Mary has an opinion and more generally, cases of have + a noun referring to the state of having a mental content (opinion, belief) → Mary's opinion is ambiguous between the fact that she has an opinion and the content of her opinion. We recommend that these cases should be annotated as LVC.full
      Mary made a speech and more generally, cases of make + a noun referring to a speech act → Mary's speech refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
      Mary made a decision decision can refer to the deciding event (a quick decision) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
    • María tiene un hermano María has a brother el hermano de María María's brother is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
      María envió una carta María sent a letter La carta de María María's letter refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
      María dio un discurso María made a speech and more generally, cases of dar + a noun referring to a speech act → el discurso de María refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full
      María tomó una decisión María made a decision decisión decision can refer to the deciding event (a quick decision) and/or to what is decided. We recommend that these cases should be annotated as LVC.full
    • la compagnie a pris des mesures d'économie the company took some measures of saving the company took cost-saving measures → the NP les mesures d'économie de la compagnie is acceptable; the semantic equivalence is difficult to judge, but the "measures" seem to refer to cost-saving actions, so the construction can be annotated as LVC.full
    • mam starszego brata I have an elder brother mój brat my brother refers to one member of the relation, and not to the state of brotherhood between both actants
      Maria wysłała wiadomość Maria sent a message wiadomość Marii Maria's message refers to the contents of the message sent by Maria, rather than to the sending event itself
      Maria jest zdania, że Mary has the opinion that... zdanie Marii Mary's opinion refers to the content of the opinion, and not to the state of having an opinion
      miał na celu awans He had promotion on the aim His aim was a promotion jego cel his aim refers to the aim itself, and not to the state of having an aim
      ta partia w wyborach miała większość this party had a majority in the elections #większość tej partii the majority of the party provokes a considerable shift in meaning
      złożył zeznania na policji he gave testimony on the police office jego zeznania his testimony can be interpreted both as the act of testifying and as its contents; because of the first reading, we count this candidate as LVC.full
    • Mojca je dala Tini priložnost Mojca gave Tina an opportunity #Mojčina priložnost Mojca's opportunity has a different meaning; if the verb is removed, the original meaning is lost, so the verb is not light.

    Finally, some nouns, especially nominalisations, are ambiguous between events and their participants. For instance, a construction may be an event (the construction of the bridge took 2 years) or its result (this bridge is a spectacular construction). In that case, if the verbless NP can refer to the event, you should prefer this reading over the "participant" interpretation. For example, in John made a construction, you may ask whether John's construction refers to the construction event or to its result. Since it can refer to the event, the construction should be annotated as LVC.full.
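    The three points above (acceptability, reference to the same event or state, preference for the event reading of ambiguous nouns) can be summarised as a small decision function. This is a sketch under our own assumptions, not part of the official guidelines or tooling: VerblessNP, passes_lvc4 and construction_passes_lvc4 are hypothetical names, and the readings are labels assigned by hand by the annotator, where "event" and "state" mean the same event or state as the candidate v+n construction, while labels such as "participant", "content" or "result" cover the other interpretations discussed above.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set

# Readings under which the verbless NP denotes the same eventuality as the v+n construction.
SAME_EVENTUALITY = {"event", "state"}


@dataclass
class VerblessNP:
    text: str
    acceptable: bool  # annotator's acceptability judgement (False for starred NPs)
    readings: Set[str] = field(default_factory=set)  # e.g. {"state", "content"}


def passes_lvc4(candidate: VerblessNP) -> bool:
    """Apply the LVC.4 criteria to one annotator-judged verbless NP."""
    if not candidate.acceptable:
        # No acceptable NP: the verb is not light (*l'air de dormir de Paul).
        return False
    # The NP must be able to refer to the same event or state; for ambiguous
    # nouns, an available event/state reading wins over participant/content readings.
    return bool(candidate.readings & SAME_EVENTUALITY)


def construction_passes_lvc4(candidates: Iterable[VerblessNP]) -> bool:
    """The test passes if at least one of the paraphrases tried passes."""
    return any(passes_lvc4(c) for c in candidates)


if __name__ == "__main__":
    examples = [
        VerblessNP("Mary's brother", True, {"participant"}),
        VerblessNP("Mary's opinion", True, {"state", "content"}),
        VerblessNP("John's construction", True, {"event", "result"}),
        VerblessNP("zdanie Marii", True, {"content"}),
        VerblessNP("l'air de dormir de Paul", False),
    ]
    for example in examples:
        print(example.text, passes_lvc4(example))
```

    Encoding the ambiguity rule as a set intersection mirrors the recommendations above: Mary's opinion and John's construction pass because an event or state reading is available, whereas Mary's brother and the Polish zdanie Marii, which only have participant or content readings, do not.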

    Test LVC.5 - [V-SUBJ-N-CAUSE] Verb's subject is noun's cause

    Is the subject of the verb expressing the cause of the predicate expressed by the noun? In other words, does the verb bring an additional participant to the scene, representing the source or cause of the event or state referred to by the noun?

    • annotate as LVC.cause
      • Иван даде възможност на Мария да представи картините си Ivan gave Maria the opportunity to present her paintings → Ivan is not a semantic argument of възможност opportunity but he is the cause of the opportunity
      • to grant rights → X has the right to Y, the granter is not a semantic argument of rights, but it causes somebody to have the right to do something
        to give a headache → X has a headache, the cause of the headache, indicated as the subject of give is not a semantic argument
        the new law provoked the destruction of the building → the destruction of X by Y, the reason for the destruction is indicated by the verb provoke, which is a prototypical causative verb. Here, the subject is not the agent of destruction, but its cause. Notice that if the sentence was the explosion provoked the destruction of the building, then the construction would be an LVC.full
        residents seek to build consensus on the development of the territory → the semantic argument of consensus is the topic on which everybody agrees, the subject of build consensus expresses an external participant responsible for the consensus to exist.
      • otorgar derechos to grant rights → X has the right to Y, the granter is not a semantic argument of rights, but it causes somebody to have the right to do something
        dar dolor de cabeza to give a headache → X has a headache, the cause of the headache, indicated as the subject of dar, is not a semantic argument
        la nueva ley provocó la destrucción del edificio the new law provoked the destruction of the building → the destruction of X by Y, the reason for the destruction is indicated by the verb provocar to provoke, which is a prototypical causative verb. Here, the subject is not the agent of destrucción destruction, but its cause. Notice that if the sentence was la explosión provocó la destrucción del edificio the explosion provoked the destruction of the building, then the construction would be an LVC.full
      • zadati glavobolju to give a headache → X has a headache, the cause of the headache, indicated as the subject of give is not a semantic argument
      • Marek dał mi prawo wyboru Marek gave me the right to choose → Marek is not a semantic argument of prawo right but he is the cause of the right
        dać podstawy prawne to give legal foundation
        nakładać na kogoś powinność to put a duty on sb.
        narazić kogoś na straty to expose someone to losses
        stawiać komuś cel to set an aim to someone
        ślady krwi wzbudziły podejrzenia policji the traces of blood raised suspicion to the police
      • Bombardamentul a provocat moartea multor civili. The bombing provoked the death of many civilians. Many civilians (mulți civili) died and their death (moarte) was provoked by the bombing (bombardamentul)
    • it is not an LVC
      • Този инцидент подрони авторитета на кандидата This incident undermined the authority of the candidate → Инцидентът incident is neither a semantic argument of the authority nor its cause
      • to relieve a headache → the subject of relieve is not what is causing a headache
        to give birth → tricky case, since the subject of give actually is a semantic argument of birth, so it cannot be its cause. This construction must be annotated as VID (it does not pass test VPC.4 either).
        excessive heat provokes fire → even though provoke prototypically expresses a cause, in this case fire is not predicative and should not pass test LVC.1, so the construction cannot be annotated as LVC.cause
      • calmar un dolor de cabeza to relieve a headache → the subject of calmar to relieve is not what is causing a headache
        dar a luz to give birth → tricky case, since the subject of dar to give actually is a semantic argument of a luz, so it cannot be its cause. This construction must be annotated as VID (it does not pass test VPC.4 either).
        un calor excesivo provoca incendios excessive heat provokes fires → even though provocar prototypically expresses a cause, in this case incendios is not predicative and should not pass test LVC.1, so the construction cannot be annotated as LVC.cause
      • Incydent ten podważył zaufanie wyborców do kandydata This incident undermined the electorate's confidence in the candidate → Incydent incident is neither a semantic argument of the confidence nor its cause (it is the opposite of the cause)
        komisja przeprowadziła wybory the committee carried out the vote → komisja committee is neither a semantic argument of wybory vote nor its cause
        mocny zapach uśpił czujność psów the strong scent lulled the vigilance of the dogs → the scent is the opposite of the cause of vigilance
      • căldura excesivă provoacă incendii excessive heat provokes fires → even though provoca provoke prototypically expresses a cause, in this case incendiu fire is not a predicate and should not pass test LVC.1, so the construction cannot be annotated as LVC.cause
      • Marija ima brata Marija has a brother Marijin brat Marija's brother is a concrete NP referring to one member of the relation (does not pass LVC.0), and not to the state of brotherhood between both actants
        Marija je poslala pismo Marija sent a letter Marijino pismo Marija's letter refers to a concrete object participating in the event (does not pass LVC.0), but not to the sending event itself
        Marija ima mnenje Marija has an opinion and more generally, cases of imeti to have + a noun referring to the state of having a mental content (mnenje, predstava, dvom opinion, idea, doubt) → Marijino mnenje Marija's opinion is ambiguous between the fact that she has an opinion and the content of her opinion. We recommend that these cases should be annotated as LVC.full
        Marija je postavila vprašanje/trditev Marija posed a question/statement and more generally, cases of postaviti to put + a noun referring to a speech act → Marijino vprašanje Marija's question refers to the informational content produced or communicated during the speech act, but can also refer to the act itself. We recommend that these cases should be annotated as LVC.full

    Constructions annotated as LVC.cause involve:

    1. verbs that are typically used to express the cause of predicative nouns in general (e.g. cause, provoke), or
    2. verbs that are only used to express the cause of particular predicative nouns (e.g. grant in to grant a right).

    When the construction involves a typically causative verb (e.g. cause, provoke), it might seem counter-intuitive to annotate it as a VMWE, because it looks perfectly regular and does not present any VMWE idiosyncrasy. However, it turned out to be difficult to distinguish idiosyncratic from regular LVC.cause, so both should be annotated, as for LVC.full. In other words, LVC.cause constructions are treated as complex predicates with a causal support verb and are annotated regardless of their compositionality, even though some of them are fully compositional.

    Typically causative verbs (e.g. cause, provoke) can sometimes be light. In this case, according to the LVC decision tree, LVC.full has priority over LVC.cause. For instance, the announcement provoked an unexpected reaction should be annotated as LVC.full and not LVC.cause, although provoke is a typically causative verb. Indeed, reaction has two arguments (reaction of X to Y), one of which is the subject of the verb (test LVC.2 passes). In other words, typically causative verbs may be used in either LVC.full or LVC.cause, depending upon whether the cause subject of the verb is a normal, canonical argument to the predicative noun (LVC.full) or an "external" non-canonical cause (LVC.cause).

    Some verbs could be considered causative, but their interpretation goes beyond purely indicating the cause of the event/state. Therefore, you should NOT annotate as LVC.cause constructions involving:

    • verbs which encode a manner of causation:
      • to call a meeting entails communication to schedule the meeting
        to hold a meeting entails leadership
        to organize classes entails preparation
    • verbs which encode modality:
      • to allow dialogue entails permission
        to foster dialogue entails assistance
        to require dialogue entails necessity
    • aspectual verbs whose subject is a semantic argument of the noun:
      • we started the meeting
        we ended the meeting
        we continued the meeting
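    The priority between the two LVC categories, together with the exclusions just listed, can be sketched as follows. This is a simplified, hypothetical encoding rather than the official decision tree: it assumes that tests LVC.0 and LVC.1 have already passed, that LVC.full requires LVC.2, LVC.3 and LVC.4 to pass, and that LVC.5 is only consulted when the LVC.full tests do not all pass. The names classify_lvc, Label and EXCLUDED_VERB_TYPES, and the string labels for verb types, are our own.

```python
from enum import Enum
from typing import Optional


class Label(Enum):
    LVC_FULL = "LVC.full"
    LVC_CAUSE = "LVC.cause"
    NOT_LVC = "not an LVC"


# Hypothetical labels for the verb types that must not be annotated as LVC.cause.
EXCLUDED_VERB_TYPES = {
    "manner_of_causation",  # to call / hold / organize a meeting
    "modality",             # to allow / foster / require dialogue
    "aspectual",            # we started / ended / continued the meeting
}


def classify_lvc(lvc2: bool, lvc3: bool, lvc4: bool, lvc5: bool,
                 verb_type: Optional[str] = None) -> Label:
    """Combine test outcomes, giving LVC.full priority over LVC.cause."""
    if lvc2 and lvc3 and lvc4:
        # Subject is a semantic argument of the noun and the verb is light:
        # LVC.full, even for a typically causative verb (provoke a reaction).
        return Label.LVC_FULL
    if lvc5 and verb_type not in EXCLUDED_VERB_TYPES:
        # The subject is an external cause (the new law provoked the destruction ...).
        return Label.LVC_CAUSE
    return Label.NOT_LVC
```

    For the announcement provoked an unexpected reaction, LVC.2, LVC.3 and LVC.4 all pass, so classify_lvc(True, True, True, False) returns Label.LVC_FULL despite the causative verb; for the new law provoked the destruction of the building, the LVC.full tests fail and LVC.5 passes, giving Label.LVC_CAUSE; and passing verb_type="modality" for to allow dialogue yields Label.NOT_LVC even if LVC.5 is judged to pass.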

    Problematic cases and remarks

    Syntactic variants

    The (single or compound) noun n functions as a regular syntactic dependent, so LVCs exhibit regular syntactic variants.

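    • взема решение to make a decision → решението, което президентът взе the decision that the president made
    • eine Entscheidung treffen a decision meet to make a decision → die Entscheidung, die der Direktor zu treffen hatte the decision that the director had to make.
    • παίρνω μία απόφαση take a decision to make a decision → η απόφαση που πρέπει κάποιος να πάρει the decision that someone has to make.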
    • make a decision → the decision that the director has to make.
    • tomar una decisión to make a decision → la decisión tomada por la directora the decision made by the director.
    • erabaki bat hartu decision one take to make a decision→ zuzendariak hartutako erabakia director taken decision the decision (which was) made by the director
    • prendre une décision to make a decision → la décision prise par la directrice the decision made by the director.
    • donijeti odluku to make a decision → odluka koju je morao donijeti direktor the decision that the director had to make
    • prendere una decisione to make a decision → la decisione che il direttore ha dovuto prendere the decision that the director had to make.
    • wziąć udział to take participation.ACC to take part → wzięcie udziału taking.GER participation.GEN taking part, biorący udział taking.PART participation.ACC taking part
    • tomar banho take shower → o banho que eu tomei estava bom the shower which I took was good
    • a lua o decizie to make a decision → decizia pe care directorul trebuie să o ia the decision that the director has to make.
    • dati ime nekomu to give (somebody) a name to name (somebody) → the object receives a name, and as a result he/she is named. Therefore, the person who gives the name causes somebody to be named; the subject of the verb is not a semantic argument of the noun.
      narediti konec nečemu to make an end (to something) to end (something) → the result of this action is that something is finished, which is caused by the subject of narediti to make

    As explained in the section on syntactic variants of VMWEs, all LVC tests should be applied to the canonical form, that is, one in which the verb is in active voice and in finite form. If there is no canonical form, this is an indication that the target construction might not be an LVC, but a verbal idiom instead.

    Selection of the verb

    In many cases of LVCs, it can be said that there is some degree of selection of the verb by the noun.

    • вземам решение to make a decision vs *вземам отговорност to take responsibility
      имам право to be right vs *притежавам право to possess right
    • eine Entscheidung treffen a decision meet make a decision vs. *eine Entscheidung machen a decision make vs. *einen Beschluss treffen a resolution meet
    • κάνω διάλειμμα make a break take a break vs. #παίρνω διάλειμμα take a break
    • have a walk vs *have a race
      run a race vs *run a walk
    • tomar una decisión take a decision make a decision vs *dar una decisión give a decision but darse/tomar una ducha give.self/take a shower
    • pauso eman step give to take a step vs. ?pauso egin step do
      bisita egin visit do to pay a visit vs. bisita eman visit give
    • faire une marche make a walk take a walk vs *procéder à une promenade perform a walk but faire/procéder à une enquête make/perform an inquiry
    • postaviti pitanje to put a question to pose a question vs *postaviti odgovor
    • prendere una decisione take a decision make a decision vs. *fare una decisione make a decision vs. *prendere una conclusione take a conclusion
    • wziąć udział to take participation vs. *pobrać udział
      mieć rację to have right to be right vs. *posiadać rację to possess right
    • fazer uma prece to make a prayer vs. *dar uma prece to give a prayer but fazer/dar uma caminhada to make/give a walk
    • a da divorț to give divorce to divorce vs. *a oferi divorț
    • dati nasvet to give advice → the subject of dati give cannot cause advice

    Yet some regularities exist. For example, large classes of nouns combine with have (e.g. nouns denoting properties) or commit (nouns denoting negative achievements). Therefore, we chose not to retain the selection of the verb as a criterion for LVC categorization. Instead, the decision tree should be applied to decide whether a candidate should be annotated as LVC.

    Scope of annotation vs. literature on LVCs

    Many authors distinguish support verbs from light verbs, while others differentiate between true light verbs and vague action verbs.

    On the one hand, we take a narrower scope than what is usually considered in the literature by ignoring aspectual support verbs (except when aspect is morphological). We believe that aspectual verbs do contribute an additional (change of state) meaning to the expression, and most of the time they are fully productive and do not form interesting VMWEs. For instance, for the predicative noun walk, we will consider the light verb to have, but not the aspectual verbs to start, to pursue, to stop a walk. Thus, to have a walk is an LVC.full. Note that for some nouns such as bloom, which are themselves inchoative, we do consider to come into bloom an LVC.full, as both the verb and the noun are inchoative, so the verb does not add any semantics to the noun.

    On the other hand, we take a broader scope than what is usually considered in the literature by including cases in which the verb has light semantics per se (in any case, it only contributes morphological features such as tense and mood) and hence cannot be described as "bleached", as is usually said of support verbs. For instance, whereas to pay does not have its usual meaning in to pay a visit, it cannot really be said that commit does not have one of its meanings in commit a crime (note that commit can be used with any negatively charged achievement noun, e.g. suicide, crime, fraud, felony...). Nonetheless, we annotate to commit a crime as LVC.full since it passes all tests.

    Verb and adjective paraphrase

    One test often used in the literature is the existence of a morphologically related verb or adjective that means the same as the LVC. For instance, to make a visit is equivalent to to visit, and to have an illness is equivalent to to be ill. Note, however, that this criterion is neither sufficient nor necessary:

    • some LVCs have no derivationally-related equivalents, such as to have a flu, to have faith and to commit a crime;
    • some constructions that are not LVCs do have a derivationally-related equivalent such as to write an email and to email;
    • some LVCs have derivationally-related equivalents that do not mean the same as the LVC, such as to make a face and to face, or that have different argumental structure from the LVC, such as to have a problem and to be problematic.

    Nonetheless, it might be useful to reason about derivationally-related equivalents when deciding whether a noun is predicative in test LVC.1. Here are some questions that might help in deciding about the predicative nature of the noun in an LVC candidate:

    Verb paraphrase: Is the abstract noun derivationally related to a verb with the same semantics? Then, there is probably a semantic argument, which coincides with the subject of the verb, so test LVC.1 passes:

    • вземам решение to make a decision = решавам to decide
      правя грешка to make a mistake = греша/сгрешавам to make a mistake
    • ο Γιάννης παίρνει μία απόφαση John makes a decision = ο Γιάννης αποφασίζει John decides
      ο Γιάννης κάνει ένα ταξίδι John makes a trip = ο Γιάννης ταξιδεύει John travels
      ο Γιάννης έχει θάρρος John has courage = ο Γιάννης είναι θαρραλέος John is courageous → and, more generally, characteristics and attributes
      ο Γιάννης έχει πείνα/δίψα John has hunger/thirst = ο Γιάννης πεινάει/διψάει John is hungry/thirsty → and, more generally, physical sensations
      ο Γιάννης έχει πάθος/φόβο/θυμό John has passion/fear/anger = ο Γιάννης παθιάζεται/φοβάται/θυμώνει John is passionate/afraid/angry → and, more generally, feelings, emotions, states
    • John makes a decision = John decides
      John has a walk = John walks
    • Juan toma una decisión Juan makes a decision = Juan decide Juan decides
      Juan da un paseo Juan takes a walk = Juan pasea Juan walks
    • Jonek erabakia hartu du = Jonek erabaki du John decision-the taken has = John decided has John has made a decision = John has decided
    • Ivan donosi odluku Ivan takes decision = Ivan odlučuje Ivan decides
      Janica je odnijela pobjedu Janica carried away a win = Janica je pobijedila Janica won
    • Jan podejmuje decyzję John takes decision = Jan decyduje John decides
      Ewa odniosła zwycięstwo Eva carried away a victory = Ewa zwyciężyła Eva won
    • Ion ia o decizie John makes a decision = Ion decide John decides
    • postaviti vprašanje to pose a question → vprašanje, ki ga je moral postaviti the question that he had to pose

    Adjective paraphrase: Is the abstract noun derivationally related to an adjective with the same semantics? Then, there is probably a semantic argument, which coincides with the noun that is modified by the adjective, so test LVC.1 passes.

    • имам смелост to have courage = съм смел to be courageous
      нямам търпение to not have patience = съм нетърпелив to be impatient
      нося отговорност to carry responsibility = съм отговорен to be responsible
    • ο Γιάννης έχει θάρρος John has courage = ο Γιάννης είναι θαρραλέος John is courageous → and, more generally, characteristics and attributes
    • John has courage = John is courageous → and, more generally, characteristics and attributes
      John has hunger/thirst = John is hungry/thirsty → and, more generally, physical sensations
      John has passion/fear/anger = John is passionate/afraid/angry → and, more generally, feelings and emotions
      John has problems/difficulties = Something is problematic/difficult for John → and, more generally, states
    • Juan tiene miedo Juan has fear = Juan es miedoso Juan is easily scared → and, more generally, characteristics and attributes
      Juan tiene hambre Juan has hunger = Juan está hambriento Juan is hungry → and, more generally, physical sensations
    • Anek itxaropena du = Ane itxaropentsu dago Ane hope has = Ane hopeful is Ane has hope = Ane is hopeful → and, more generally, characteristics and attributes
      Anek = Ane gosetuta Ane hunger has = Ane hungry is Ane has hunger = Ane is hungry → and, more generally, physical sensations
    • imati strpljenja to have patience = biti strpljiv to be patient
      nositi odgovornost to carry responsibility = biti odgovoran to be responsible
    • mieć odwagę to have courage = być odważnym to be courageous
      mieć straty to have losses = być stratnym to have lost sth
      mieć sens to have a sense to make sense = być sensownym to be reasonable
    • avea curaj to have courage = fi curajos to be courageous

    Synonym verb/adjective paraphrase: Does the abstract noun have a synonym/hypernym derivationally related to a verb or adjective with the same semantics? Then, the questions above can be applied to that synonym's verb/adjective.

    • Иван и Мария постигнаха консенсус Ivan and Maria reached a consensus = Ivan and Maria agreed → consensus has no corresponding verb or adjective, but agreement is a synonym
    • έχω τη γνώμη I have the opinion I think = πιστεύω I believe → γνώμη opinion has no corresponding verb or adjective, but πίστη belief and άποψη view are synonyms
    • John and Mary reach a consensus = John and Mary agree → consensus has no corresponding verb or adjective, but agreement is a synonym
      John has a chance to do something = John is likely to do something → chance has no corresponding verb or adjective, but likelihood is a synonym
    • Anek min eman dio Joni = Anek Jon mindu du Ane pain given has to-Jon = Ane Jon hurt has Ane has hurt Jon
    • Radnici i uprava postigli su konsenzus workers and management reached consensus = Radnici i uprava su se dogovorili workers and management agreed → konsenzus consensus has no corresponding verb or adjective, but dogovor agreement is a synonym
    • mieć 190 cm wzrostu to have 190 cm of height to be 190 cm tall = mierzyć 190 cm to measure 190 cm to be 190 cm tall
      dokonać inwazji to perform an invasion = wtargnąć to invade
    • a da voie to give permission = a permite to allow

    The existence of a related verb or adjective is not a definitive test but a hint that the noun is probably predicative. Since determining whether a noun is predicative is tricky, we advise language teams to provide additional documentation and examples for borderline cases.

    Checking if the subject is an argument with syntactic tests

    The previous version of the guidelines had a syntactic test, which you can still use to verify whether the verb's subject is an argument of the noun. However, this test was considered hard to apply, and it is no longer mandatory.

    The syntactic test consists of trying to add the semantic argument as a complement of the noun in the presence of the verb. In other words, does the noun n, in the presence of v, prohibit at least one syntactic argument a which it normally licenses in the absence of v?

    An alternative formulation for this test is the following: Let s be the subject of v, and let r be the semantic role that s plays with respect to the noun n. Is it prohibited for r to be realized both by s and by a syntactic argument a of n, except when a is in the whole–part relation with s?
    • Петър Стоянов взе решението да подпише договора Petar Stoyanov made the decision to sign the contract + решението на президента да подпише договора the president's decision to sign the contract *Петър Стоянов взе решението на президента да подпише договора *Petar Stoyanov made the president's decision to sign the contract — the noun cannot be modified by the person performing the act/event (which is the subject)
    • Die Königin hat dem Premierminister einen Besuch abgestattet the Queen has paid a visit to the Prime Minister + ein Besuch der Dame beim Premierminister a visit of the Lady to the Prime Minister *Die Königin hat einen Besuch der Dame beim Premierminister abgestattet *The Queen paid a visit of the Lady to the Prime Minister — the visitor cannot be a modifier of visit
      Paul hat eine Entscheidung über das Budget getroffen Paul made a decision on the budget + die Entscheidung des Rates über das Budget the council's decision on the budget *Paul traf die Entscheidung des Rates über das Budget *Paul made the council's decision on the budget — the decision maker cannot modify decision
    • ο πρωθυπουργός έκανε επίσημη επίσκεψη στον Αμερικανό πρόεδρο The Prime Minister paid an official visit to the US President + η επίσκεψη του πρωθυπουργού στον Αμερικανό πρόεδρο a visit of the Prime Minister to the US President *ο πρωθυπουργός έκανε επίσημη επίσκεψη του υπουργού στον Αμερικανό πρόεδρο *The Prime Minister paid an official visit of the Minister to the US President — the visitor cannot be a modifier of επίσκεψη visit
    • The Queen paid a visit to the Prime Minister + a visit of the Lady to the Prime Minister *The Queen paid a visit of the Lady to the Prime Minister — the visitor cannot be a modifier of visit
      Paul made a decision on the budget + the committee's decision on the budget *Paul made the committee's decision on the budget — the decision maker cannot modify decision
      Paul had a discussion with Mary + Peter's discussion *Paul had Peter's discussion with Mary
      Bjarnson scored a goal + Arnason's goal *Bjarnson scored Arnason's goal, but Bjarnson scored the goal of Iceland — the scoring entity can modify goal only in the last case, since Bjarnson is part of the Iceland team (whole–part relation)
    • La reina hizo una visita al primer ministro The Queen paid a visit to the prime minister + una visita de la primera dama al primer ministro a visit of the first lady to the prime minister *La reina hizo una visita de la primera dama al primer ministro *The Queen paid a visit of the first lady to the prime minister — the visitor cannot be a modifier of visita
      Pablo tomó una decisión con respecto al presupuesto Pablo made a decision on the budget + la decisión del comité con respecto al presupuesto the committee's decision on the budget *Pablo tomó la decisión del comité con respecto al presupuesto *Pablo made the committee's decision on the budget — the decision maker cannot modify decisión
    • Ikasleek arreta jarri zioten irakasleari + lagunen arreta The-students attention put to-the-teacher + friends' attention The students paid attention to the teacher + their friends' attention *Ikasleek lagunen arreta jarri zioten irakasleari *The students paid their friends' attention to the teacher → the person paying attention cannot be a modifier of arreta
    • La ministre a rendu une visite aux victimes The minister paid a visit to the victims + la visite de la ministre aux victimes the minister's visit to the victims *La ministre a rendu une visite du président aux victimes *The minister paid a visit of the president to the victims — the visitor cannot be a modifier of visite
      Bjarnson a marqué un but Bjarnson scored a goal + le but d'Arnason Arnason's goal *Bjarnson a marqué le but d'Arnason *Bjarnson scored Arnason's goal, whereas Bjarnson a marqué le but de l'Islande Bjarnson scored Iceland's goal is acceptable — the scoring entity can modify but (goal) only in the last case, since Bjarnson is part of the Iceland team
    • Učiteljica je donijela odluku u vezi s izletom The teacher made a decision regarding the excursion + učenikova odluka u vezi s izletom the pupil's decision regarding the excursion *Učiteljica je donijela učenikovu odluku u vezi s izletom *The teacher made the pupil's decision regarding the excursion — the decision maker cannot modify decision
    • Il primo ministro ha preso la decisione di dimettersi the Prime Minister decided to resign + la decisione del governo di dimettersi the government's decision to resign *Il primo ministro ha preso la decisione del governo di dimettersi *The Prime Minister took the government's decision to resign — the decision maker cannot be a modifier of decisione (decision)
    • Paweł złożył rezygnację ze stanowiska dyrektora Paweł submitted a resignation from the position of the director Paweł tendered his resignation from the director position + rezygnacja Piotra Piotr's resignation *Paweł złożył rezygnację Piotra ze stanowiska dyrektora *Paweł tendered Piotr's resignation from the director position - the resignation cannot be modified by the resigning person
      Paweł prowadzi rozmowy Paweł leads talks + rozmowy Piotra Piotr's talks *Paweł prowadzi rozmowy Piotra *Paweł leads Piotr's talks, but Paweł prowadzi rozmowy komisji Paweł leads the talks of the commission - the discussing entity komisja commission can modify rozmowy talks only if Paweł belongs to the commission
      Jan otrzymał wymówienie Jan received a dismissal + wymówienie dla Pawła dismissal for Paweł *Jan otrzymał wymówienie dla Piotra *Jan received a dismissal for Piotr
    • João está tomando banho John is taking a shower + o banho do Pedro Pedro's shower *João está tomando o banho do Pedro *John is taking Pedro's shower — the shower cannot be modified by its taker
      Pedro sofreu prejuízo com a compra Pedro suffered financial loss with the purchase + o prejuízo do José José's financial loss *Pedro sofreu o prejuízo do José com a compra — the financial loss cannot be modified by the affected entity
      A Maria fez um aborto Maria made an abortion + o aborto da Joana Joana's abortion #A Maria fez o aborto da Joana — the noun cannot be modified by another patient
      O médico realizou o parto com sucesso The doctor performed the childbirth with success + o parto do Dr. Pedro Dr. Pedro's childbirth *O médico realizou o parto do Dr. Pedro com sucesso — the childbirth could be modified by the mother (patient) but not by another doctor (agent).
    • Paul a dat sfaturi surorii sale Paul gave advice to his sister + sfatul lui Petre Peter's advice *Paul a dat sfatul lui Petre surorii sale *Paul gave Peter's advice to his sister → sfatul the advice cannot be modified by its author
    • Aleš si dela skrbi Aleš makes worries Aleš has worries = Aleš je zaskrbljen Aleš is worried → and, more generally, feelings and emotions

    The rationale for this test is that a semantic argument of the noun n cannot be realized as a syntactic dependent of n, since it is already realized as a syntactic dependent of v instead (usually as v's subject). For instance, the noun visit takes two semantic arguments, the visitor and the visited entity, as in the visit of the Queen to the Prime Minister. When used in to pay a visit, the visitor semantic argument is realized as the subject of to pay (The Queen paid a visit to the Prime Minister), and cannot be realized at the same time within the NP headed by visit (*The Queen paid a visit of the Lady to the Prime Minister).

    Note that the syntactic formulation may be tricky to apply. It is sometimes possible to add the semantic argument as a complement of the noun in the presence of the verb, if we change the interpretation of the argument (and thus its thematic role). For instance, even though the construction John took Luke's decision may be acceptable, the interpretation would be comparative (John took a decision that Luke should have taken). Therefore, the test passes since the verb is still connecting a predicate (decision) to its argument (John, the decider).


    Section 5.3

    Verbal idioms (VID)

    Verbal idioms constitute a universal category. A verbal idiom (VID) has at least two lexicalized components including a head verb and at least one of its dependents. The dependent can be of different types. Here are some examples:

    • Subject
      • броят му се ребрата be counted someone's (possessive pronoun) ribs (someone) to be very thin and skinny
      • ein kleines Vöglein hat mir gezwitschert a little bird told me
      • μου είπε ένα πουλάκι me told a little-bird a little bird told me
      • a little bird told someone
      • tu hora ha llegado your time has arrived your time has come
      • licho wie devil knows I have no idea
      • a sua hora chegou your time has arrived your time has come
      • a șoptit o păsărică whispered a bird little a little bird told someone
      • srce pade v hlače komu (someone's) heart drops into the pants one is lacking courage to do something, sekira pade v med komu (someone's) hatchet falls in honey one gets lucky
    • Direct object
      • гушна букета hug the bunch of flowers to die
      • er hat den Schuss nicht gehört he has the shot not heard it takes him a long(er) time to understand sth
      • to kick the bucket
      • estirar la pata to stretch the leg kick the bucket
      • udać Greka to pretend to be a Greek to pretend not to understand
      • bater as botas to hit the boots to die, abrir mão de algo to open hand (of something) to give up (on something)
      • a arunca vina to throw guilt-the to blame
      • ustreliti kozla to shoot the goat to say or do something stupid
    • Circumstantial or adverbial complement
      • удрям в гръб hit in the back to stab in the back
        правя сам да си говори make (someone) to talk to himself to drive (someone) crazy
      • etwas wie warme Semmeln verkaufen sth. like warm bread rolls to sell sth. fast and easy
      • φέρω βαρέως bring heavily to take something hard
      • to take something with a pinch of salt, to sell like hotcakes, to strike while the iron is hot, to come off with flying colors
      • coger algo con pinzas to hang something with pegs take something with a pinch of salt
      • wiercić komuś dziurę w brzuchu to drill a hole in one's belly to intrusively solicit someone, to insist too much
      • levar em conta to bring in account to take into account
        ir ao ar go to the air to go on air
      • a lua în considerare to take in consideration to take into account
      • spati kot ubit to sleep like dead to sleep soundly

    It is often challenging to distinguish VIDs from other VMWE categories if only one dependent of the head verb is lexicalized. The VMWE categorization depends on the category of this dependent:

    • Reflexive clitic or particle: the VMWE is either an IRV (reflexive pronoun) or a VPC (particle), never a VID.
    • Verb with no lexicalized dependent: fine-grained tests need to be applied in order to discriminate between an MVC and a VID. See the section on Structural tests.
    • Extended nominal phrase: fine-grained tests need to be applied in order to discriminate between an LVC and a VID. See the section on Structural tests.

    With a dependent of any other category, the VMWE is always a VID, including the following:

    • Adjectival phrase
      • постигам своето to achieve one's own to have it one's way
      • schwarz fahren to drive black to take a ride without a ticket
      • to come clean, to stand firm
      • jugar sucio to play dirty to play dirty
      • zrobić swoje to do one's own to do what one is supposed to do
        tykać cudze to touch someone else's to take something that does not belong to you
        dopiąć swego to button up one's own to fulfill one's plans
      • jogar sujo to play dirty
      • a juca murdar to play dirty
      • biti zelen od zavisti to be green with envy
    • Verb with lexicalized dependents
      • не мога две думи на кръст да кажа I cannot say two words crossing each other to be unable to speak or express oneself → две думи на кръст да кажа is a clause
        правя сам да си говори make someone talk to himself to drive someone crazy → сам да си говори is a clause
      • to make ends meet
    • Relative clause
      • ще видиш откъде изгрява слънцето you will see where the sun rises from (angrily) you will get what you deserve, you will be punished
      • wissen wo es langgeht to know where things are heading to know on which side one's bread is buttered
      • to know on which side the bread is buttered
      • saber de qué pie cojea to know of which foot (he/she) limps to know someone inside out
      • wiedzieć, skąd wieje wiatr to know where wind blows from to know on which side your bread is buttered, to know how to take advantage of the situation
      • saber onde pisar know where to-step to know the way to succeed in something
        mostrar com quantos paus se faz uma canoa show with how many sticks one makes a canoe to punish or take revenge
      • a ști cu ce se mănâncă to know with what CL.Refl. eats to know what it is about
      • vedeti koliko je ura to know what time it is to realize the truth
    • Non-reflexive pronoun
      • втасахме я we proved it.FEM (as in bread: rise in volume due to yeast) to fall into a difficult situation
      • es gibt it gives there is
      • τα καταφέρνω, την πατάω
      • to make it
      • l'emporter to take it away to win
      • prender le to take it to be beaten
      • Polish does not seem to have this type of VMWEs
      • dá-lhe João! give to him/her, João! show them what you got, João!
      • a o șterge to her delete to fly the coop
        a o întinde to her extend to fly the coop synonymous expressions with the non-anaphoric feminine ACC personal clitic 'o' functioning as an expletive
      • ucvreti jo to escape her to escape something/someone by running

    Sentential expressions with no open slots, such as proverbs and conventionalized sentences, are included in the scope of VIDs.

    • краставите магарета се надушват отдалече the itchy donkeys smell each other from afar → similar people are attracted to each other
    • Rom wurde nicht an einem Tag erbaut Rome was not built in a day, wer A sagt muss auch B sagen who says A must also say B you must finish what you start
    • στο σπίτι του κρεμασμένου δεν μιλάνε για σχοινί in-the house the.GEN hanged-man.GEN not speak.3PL about rope
    • Rome was not built in a day
      Fortune favors the bold
      The pleasure is mine
      I beg your pardon!
    • Roma no se construyó en un día Rome was not built in a day, donde dije digo, digo Diego where said.I said, say.I Diego to do or give something and then take it back, to retract oneself
    • trafiła kosa na kamień met the scythe a stone someone rude/dishonest came across someone else who used similar methods against him/her
    • quem vê cara não vê coração who sees face doesn't see heart a person can lie about or hide his/her feelings
    • Urciorul nu merge de multe ori la apă Pitcher-the not goes of many times at water The pitcher goes so often to the well that it is broken at last
    • Počasi se daleč pride more haste less speed
      Po toči zvoniti je prepozno there is no use ringing the bells after hail it is too late

    If more than one dependent of the head verb is lexicalized, then the candidate VMWE is always classified as a VID.

    • заравям глава в пясъка to hide head in sand to pretend not to see a problem
    • die Katze aus dem Sack lassen to let the cat out of the bag
    • βάζω λάδι στη φωτιά put oil to-the fire to add fuel to the fire
    • to let the cat out of the bag, to cut a long story short, to call it a day
    • hacer de tripas corazón make of intestines heart to pluck up the courage
      dejar con la miel en los labios to_leave with the honey in the lips leave (sb) wanting more
      dar gato por liebre to_give cat for hare to rip off, to take for a ride
    • se faire des idées to make SELF ideas to imagine something false, s'en aller to go SELF from there to leave, il y a it has there there is
    • chować głowę w piasek to hide head in sand to pretend not to see a problem
    • tapar o sol com a peneira to hide the sun with a sieve to sugar-coat
    • a da bir cu fugiții to give tribute with fugitives.the to back away
    • beseda mi je ostala v grlu word got stuck in my throat I am speechless
    • att sätta sig upp mot någon to sit oneself up against someone to defy someone
      att dra sitt strå till stacken to draw one's straw to stack.the to contribute (in a small way)
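    The categorization rules stated above (a single lexicalized dependent: decide by its category; more than one lexicalized dependent: always a VID) can be summarized as a small lookup. The Python sketch below is only an illustration under stated assumptions: the category labels (REFLEXIVE_CLITIC, PARTICLE, BARE_VERB, EXTENDED_NP) and the function name vid_candidate_category are hypothetical, the category of each lexicalized dependent is assumed to be already known, and the returned strings merely point to the relevant part of these guidelines. It does not replace the structural tests or the annotator's judgment, and it does not cover candidates without a single clearly identifiable head verb.

```python
from typing import Sequence

# Hypothetical labels for the category of a lexicalized dependent of the head verb.
REFLEXIVE_CLITIC = "reflexive clitic"
PARTICLE = "particle"
BARE_VERB = "verb with no lexicalized dependent"
EXTENDED_NP = "extended nominal phrase"


def vid_candidate_category(lexicalized_dependents: Sequence[str]) -> str:
    """Hypothetical helper: map the categories of the head verb's lexicalized
    dependents to the VMWE category (or the tests) suggested by the guidelines."""
    if not lexicalized_dependents:
        # Not covered by this sketch (e.g. coordinated verbs with no single head verb).
        raise ValueError("expected at least one lexicalized dependent of the head verb")
    if len(lexicalized_dependents) > 1:
        return "VID"  # more than one lexicalized dependent: always a VID
    dependent = lexicalized_dependents[0]
    if dependent == REFLEXIVE_CLITIC:
        return "IRV"  # never a VID
    if dependent == PARTICLE:
        return "VPC"  # never a VID
    if dependent == BARE_VERB:
        return "MVC or VID: apply the structural tests"
    if dependent == EXTENDED_NP:
        return "LVC or VID: apply the structural tests"
    # Any other category (adjectival phrase, clause, non-reflexive pronoun, ...).
    return "VID"


# Several lexicalized dependents, e.g. to let the cat out of the bag.
print(vid_candidate_category([EXTENDED_NP, PARTICLE, "prepositional phrase"]))  # VID
print(vid_candidate_category(["adjectival phrase"]))  # VID (e.g. to play dirty)
print(vid_candidate_category([EXTENDED_NP]))  # LVC or VID: apply the structural tests
```

Note that a VID classification does not exempt the annotator from marking embedded VMWEs, such as the VPC to let out inside to let the cat out of the bag.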

    Cases when there is no single clearly identifiable head verb, because of coordinated verbs or of an irregular syntactic structure, are also covered by the VID category.

    • цъфна и вържа to blossom and give fruit (usually sarcastically) to prosper
    • leben und leben lassen live and let live
    • έδωσε πήρε
    • to drink and drive
    • coser y cantar to_sew and to_sing easy as pie, a piece of cake
    • pluć i łapać spit and catch to be lazy, to do nothing useful
      coś kogoś ani ziębi, ani grzeje something neither cools nor warms someone someone is indifferent to something
      bądź tak dobry i zrób coś be so good and do something be so good as to do something
    • pintar e bordar paint and embroider to abuse
    • a tunat și i-a adunat it.has thundered and CL.ACC-it.has gathered birds of a feather flock together
      seamănă, dar nu răsare sow.3SG (homonym of resemble), but not sprout.3SG not to resemble
    • živi in pusti živeti live and let live
    • n.a.
    • to voice act
      to pretty-print
      to short-circuit
      to tumble dry
    • n.a.
    • court-circuiter to short-circuit
    • n.a. there are no cases of compound hyphenated verbs in RO
    • n.a. there are no cases of compound hyphenated verbs in SL

    In case of several lexicalized dependents, special care must be taken to identify and also annotate embedded VMWEs.

    • страхувам се от собствената си сянка to fear SELF from one's own shadow to get easily scared → contains the IRV страхувам се to fear SELF to be afraid
    • einen Plan aufstellen to set up a plan to draw up a plan → contains the VPC aufstellen to set up
    • to let the cat out of the bag → contains the VPC to let out
    • hacerse ilusiones make.self hopes to get your hopes up → contains the IRV hacerse
    • se faire des idées to make SELF ideas to imagine something false → contains the non-VMWEs se faire and faire des idées
    • bać się własnego cienia to fear SELF one's own shadow to be very timid → contains the IRV bać się to fear SELF to be afraid
    • virar-se nos trinta turn-RCLI in-the thirty to get by → contains the synonymous IRV virar-se to get by ≠ virar to turn/become
    • a da cărțile pe față to give cards.the on face to reveal one's true intentions → contains the VID a da pe față to reveal
      a-și da arama pe față to give his/her copper.the on face to reveal his/her true (evil) nature → this is even more complicated since, besides the VID a da pe față, the IRV has to be annotated as well - a three-level embedding
    • delati se norca iz koga to make RCLI fool of someone to make fun of someone → contains the IRV delati se to make oneself to pretend

    Idioms whose head verb is the copula (to be) can pose special challenges because their complements may be (nominal, adjectival, etc.) MWEs themselves. In this task, we consider constructions with a copula to be VMWEs only if the complement does not retain the idiomatic meaning when used without the verb.

    • съм с единия крак в гроба be with one leg in the grave to be close to death → idiom because #с единия крак в гроба with one leg in the grave loses the meaning
      съм на червено be on red to be in debt → non-VMWE because the copula can be omitted, as in в края на месеца винаги оставам на червено at the end of the month I always get into debt
    • sei kein Frosch be no frog be no chicken → idiom because #kein Frosch no frog loses the meaning
    • to be dying for → idiom because #dying loses the meaning of wanting something
      to be somebody → idiom because #somebody loses the meaning of being important or successful
      it is double Dutch to me → non-VMWE because the copula can be omitted, as in he seems to speak double Dutch
    • ser un pelota to be a ball to suck/butter up → idiom because un pelota a ball loses its original meaning
    • być jedną nogą na tamtym świecie to be with one leg in the other world to be close to death → idiom because #jedna noga na tamtym świecie one leg in the other world loses the meaning
      być do rzeczy to be to the thing to be relevant → non-VMWE because the copula can be omitted, as in dał parę argumentów całkiem do rzeczy he gave a couple of quite relevant arguments
    • ser alguém na vida to be somebody in life to be somebody → idiom because #alguém na vida loses the meaning
      não ser flor que se cheire to not be a flower that one may smell to be an untrustworthy person → idiom because #flor que se cheire loses the meaning
      isso é grego pra mim that's greek to me → non-VMWE because the copula can be omitted, as in você está falando grego
    • a fi ușă de biserică to be door of church to be honest → idiom because #ușă de biserică loses the meaning
      a fi un papă-lapte to be an eat-milk to be a piker → non-VMWE because un papă-lapte preserves the meaning
    • biti trn v peti komu to be a thorn in somebody's heel to be a big problem, obstacle → idiom because #trn v peti loses the meaning

    Note that special care must be taken in languages in which the copula omission is a regular or even a compulsory phenomenon (e.g. in Russian). In those cases, language-specific tests are required to distinguish a copula-based idiom from a non-verbal MWE.

    Idioms typically have both a literal and an idiomatic reading. Thus, they are closely connected to the phenomenon of metaphor (see also the section on VMWEs versus metaphors). This often makes them semantically fully non-compositional, i.e. none of their lexicalized components retains any of its original meanings. Some authors argue, though, that partial semantic compositionality can be obtained via decomposability, e.g. to spill the beans is compositional provided that to spill is paraphrased as to reveal and the beans as a secret.

    VID-specific decision tree:

    In this tree, a single YES to one of the tests is sufficient to decide that a candidate is a VID. Note, however, that this tree should be applied only when the generic decision tree containing the structural tests refers to it.
    • Apply test VID.1 (prev. 1) - [CRAN: Candidate contains cranberry word?]
      • YES ⇒ It is a VID, exit.
      • NO ⇒ Apply test VID.2 (prev. 2) - [LEX: Regular replacement of a component ⇒ unexpected meaning shift?]
        • YES ⇒ It is a VID, exit.
        • NO ⇒ Apply test VID.3 (prev. 3) - [MORPH: Regular morphological change ⇒ unexpected meaning shift?]
          • YES ⇒ It is a VID, exit.
          • NO ⇒ Apply test VID.4 (prev. 4) - [MORPHSYNT: Regular morphosyntactic change ⇒ unexpected meaning shift?]
            • YES ⇒ It is a VID, exit.
            • NO ⇒ Apply test VID.5 (prev. 5) - [SYNT: Regular syntactic change ⇒ unexpected meaning shift?]
              • YES ⇒ It is a VID, exit.
              • NO ⇒ It is not a VID, exit.
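    The control flow of the VID tree can be sketched in Python. This is a minimal, hypothetical helper: the function name and the label strings are ours, not part of the guidelines, and each test remains a human judgment passed in as a boolean. The code only shows that the tests are tried in a fixed order and that the first YES ends the procedure.

```python
from typing import Mapping, Optional, Tuple

# Bracketed test labels, in the order in which the VID-specific tree applies them.
VID_TESTS = ("CRAN", "LEX", "MORPH", "MORPHSYNT", "SYNT")


def classify_vid(answers: Mapping[str, bool]) -> Tuple[bool, Optional[str]]:
    """Walk the VID tree over an annotator's YES/NO answers.

    Returns (is_vid, deciding_test): deciding_test is the label of the first
    test answered YES, or None if no test was answered YES. Tests after the
    first YES are never consulted ("It is a VID, exit"). A test missing from
    `answers` is treated as NO, which is a simplification of this sketch.
    """
    for test in VID_TESTS:
        if answers.get(test, False):
            return True, test
    return False, None


# "to let the cat out of the bag": no cranberry word, but the LEX variant
# "#to allow the feline out of the container" loses the idiomatic meaning.
print(classify_vid({"CRAN": False, "LEX": True}))  # (True, 'LEX')
```

    Returning the deciding test alongside the verdict makes it easy to record which test justified each annotation.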

    Test VID.1 (prev. 1) - [CRAN] - Cranberry word

    Does the candidate expression contain a cranberry word?

    • it is a VID
      • хващам натясно catch in a tight place to coerce, to pressure натясно is only used in MWEs
        правя на бъзе и коприва to turn into elder and nettle to scold, to tell off бъзе is an old word, very rarely used independently
        вземам предвид, имам предвид to take into account, to mean предвид (as adverb) is only used in MWEs
        стоя диван чапраз to stay upright as in Osman council to stay ready to serve чапраз is an old word, very rarely used independently
      • sich um etw. scharen to gather around something scharen is not a stand-alone word
      • μάλλιασε η γλώσσα μου is-full-of-hair-3SG the-SG.NOM tongue-SG.NOM my-SG.GEN.POSS to repeat the same thing again and again μάλλιασε is not a stand-alone word
      • to go astray astray is not a stand-alone word
      • sin decir ni chus ni mus without to_say neither chus nor mus without saying a word chus is not a stand-alone word
        no decir ni chus ni mus not to_say neither chus nor mus not to say a word, to remain silent chus is not a stand-alone word
        hacer algo a troche y moche to do something at troche and dulled to do something in a nonsensical way, willy-nilly, haphazardly troche is not a stand-alone word
      • txintik ere ez esan 'txint' neither no say not even say a word → the word 'txint' is not used outside this expression
      • prendre la poudre d'escampette to escape escampette is not a stand-alone word
      • mangiare a ufo to eat without paying a ufo is not a stand-alone word
        fare lo gnorri to play dumb gnorri is not a stand-alone word
        scendere in lizza to enter the lists lizza is not a stand-alone word
      • odsądzić kogoś od czci i wiary to refuse honor and faith to someone to drag sb's name through the mire/mud, to damage someone's reputation by saying insulting things about them
        wyjść na jaw to come-out to light to transpire, to become known
      • ir para as cucuias to go wrong cucuias is not a stand-alone word
      • a nu avea habar to have no idea habar is not a stand-alone word
      • biti si kvit to pay up a debt, owe nothing to somebody kvit is not a stand-alone word
      • att komma ihåg to remember ihåg is not a stand-alone word
    • further tests are required
      • правя на сос правя and сос are stand-alone words
      • sich um etw. herum stellen to stand around something → all words are stand-alone words
      • to go away go and away are stand-alone words
      • ir a la universidad to go to university ir, a, la and universidad are stand-alone words
      • unibertsitatera joan university-to go to go to university → both words are stand-alone
      • andare giù to go down andare and giù are stand-alone words
      • wyznać tajemnicę to reveal a secret wyznać and tajemnica are stand-alone words
      • ir para a escola to go to school ir, para, a and escola are stand-alone words
      • a nu avea idee to have no idea → all words are stand-alone words
      • biti si v sorodu to be related to each other biti si and sorod are stand-alone words
      • att komma på to figure out komma and på are stand-alone words
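    The [CRAN] decision is made by the annotator, but a rough corpus check can help flag candidates. The sketch below is our own assumption, not a procedure from the guidelines: it counts how often a word occurs in a tokenized corpus and how often it occurs in a sentence that also contains a given verb lemma. A ratio close to 1 suggests, but does not prove, a cranberry word. It only checks sentence co-occurrence, not syntactic dependence, and the toy corpus is invented.

```python
from typing import Iterable, List


def cooccurrence_ratio(sentences: Iterable[List[str]], word: str, verb: str) -> float:
    """Fraction of the sentences containing `word` that also contain `verb`.

    1.0 means `word` was never observed without `verb` in this corpus, which
    is only a hint that it may be a cranberry word. Returns 0.0 when `word`
    does not occur at all.
    """
    with_word = 0
    with_both = 0
    for tokens in sentences:
        if word in tokens:
            with_word += 1
            if verb in tokens:
                with_both += 1
    return with_both / with_word if with_word else 0.0


# Toy, invented corpus of lemmatized, lowercased sentences.
corpus = [
    ["the", "plan", "go", "astray"],
    ["the", "letter", "go", "astray"],
    ["they", "go", "away"],
    ["the", "dog", "run", "away"],
]
print(cooccurrence_ratio(corpus, "astray", "go"))  # 1.0
print(cooccurrence_ratio(corpus, "away", "go"))    # 0.5
```

    On a real corpus, small counts make the ratio unreliable, so it is best treated as a way to surface candidates for manual review.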

    Test VID.2 (prev. 2) - [LEX] - Lexical inflexibility

    Does a regular replacement of one of the components by related words taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?

    • it is a VID
      • бълвам змии и гущери to spew snakes and lizards #бълвам влечуги (to spew reptiles)
        всяка жаба да си знае гьола every frog to know its own puddle #всяка жаба да си знае локвата
      • die Katze aus dem Sack lassen to let the cat out of the bag #den Hund aus dem Karton lassen #to let the dog out of the box
        eine Entscheidung treffen to meet a decision to make a decision #eine Entscheidung machen/herstellen a decision make/produce #to make/produce a decision
      • to let the cat out of the bag #to allow the feline out of the container
        to go on *to go upon
        to stand firm/fast *to stand hard/rigid/solid
      • meterse en la boca del lobo to_get_into.self in the mouth of_the wolf to venture into the lion's den #meterse en el ojo del gato
        tomar una decisión to_take a decision to make a decision #hacer/coger/producir una decisión to_make/grab/produce a decision #to make/grab/produce a decision
      • erabakia hartu decision take to make a decision →erabakia #sortu/jaso/egin create/receive/do
      • non dire gatto se non ce l'hai nel sacco don't say cat if it is not in the sack don't count on something before it happens #non dire cane se non ce l'hai nel sacco #don't say dog if it is not in the sack
        sputare il rospo spit the toad spit it out #sputare la rana #spit the frog
      • wiedzieć, co w trawie piszczy to know what in grass squeals to be well informed #wiedzieć, co w trawniku popiskuje
        nie wchodzić w rachubę not to come into count to be out of the question #wchodzić w liczenie/rachunek
        wodzić kogoś za nos to lead someone by the nose to deceive someone #wodzić za nozdrza/ucho/wargi
      • quebrar um galho break a branch to help #danificar um ramo to damage a stem
      • a da cu bâta în baltă to give with bat-the in pond to say sth embarrassing *a da cu bățul în baltă to give with stick-the in pond, *a da cu bâta în lac to give with bat-the in lake
      • imeti mačka to have a cat to have a hangover #imeti psa to have a dog
        iti rakom žvižgat to go whistling to crabs to fail, to die #iti jastogom pet to go singing to the lobsters
      • att plocka russinen ur kakan to pick the raisins out of the cake to choose only the best things #att välja ut nötterna från kakan
    • further tests are required
      • изнасям доклад present a report → изнасям урок/лекция/презентация и т.н.
      • den Bus nehmen to take the bus → den Zug/das Flugzeug, etc. nehmen to take the train/plane/etc.
      • to take a plane → to take a bus/car/boat, etc.
      • coger el autobús to_take the bus to take the bus → coger el avión/tren, etc. to take the plane/train/etc.
      • autobusa hartu bus take to take the bus → trena/taxia/hegazkina hartu to take a train/taxi/plane
      • prendere il treno to take the train → prendere il bus/aereo/etc. to take the bus/plane/etc.
      • jqum u joqgħod always moving about
      • sprawić kłopot to cause trouble → sprawić przykrość/trudność/niedogodność/problem/zawikłanie/nieprzyjemność to cause a(n) nuisance/difficulty/inconvenience/problem/complication
      • quebrar um braço to break an arm → quebrar uma perna/costela/falange to break a leg/rib/phalanx
      • a lua o decizie to take a decision to make a decision → a lua o hotărâre to take a decree to make a decision
      • delati težave to cause trouble → delati preglavice/probleme to cause a(n) nuisance/problem
      • att ta bussen to take the bus → att ta tåget/flyget, etc. to take the train/plane/etc.

    Usual modifications for [LEX] include replacing content words in the candidate by synonyms, hypernyms, hyponyms, antonyms, troponyms, meronyms, and related words in general.
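    When preparing [LEX] variants systematically, it can help to generate every single-word replacement first and then judge each one by hand. The helper below is a hypothetical sketch. The substitution lists are supplied by the annotator (for example synonyms or hypernyms taken from a dictionary), and the code only enumerates candidate variants. It does not decide grammaticality or meaning shift.

```python
from typing import Dict, List


def lexical_variants(tokens: List[str], substitutes: Dict[int, List[str]]) -> List[str]:
    """Return one variant per (position, replacement) pair.

    `substitutes` maps a token index to related words (synonyms, hypernyms,
    hyponyms, ...). Exactly one component is replaced per variant, matching
    the test's "replacement of one of the components".
    """
    variants = []
    for index, words in sorted(substitutes.items()):
        for word in words:
            changed = list(tokens)  # copy so the original stays intact
            changed[index] = word
            variants.append(" ".join(changed))
    return variants


tokens = ["let", "the", "cat", "out", "of", "the", "bag"]
for variant in lexical_variants(tokens, {2: ["feline", "dog"], 6: ["container"]}):
    print(variant)
# let the feline out of the bag
# let the dog out of the bag
# let the cat out of the container
```

    Each printed variant is then checked for ungrammaticality or an unexpected change in meaning, exactly as in the examples above.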

    Test VID.3 (prev. 3) - [MORPH] - Morphological inflexibility

    Does a regular morphological change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

    • it is a VID
      • хвърлям око throw an eye to throw a glance #хвърлям очи.PLURAL
        хващам бика.DEF за рогата take the bull by the horns #хващам бик.INDEF за рогата
        не мога да си намеря място cannot find a place for myself to be extremely nervous → only exists in negative form
      • ins Gras beißen to bite into the grass to die #in ein Gras beißen to bite into a grass #in die Gräser beißen to bite into the grasses, in Kraft treten into force step to come into effect #in Kräfte treten into forces step
      • to kick the bucket #to kick the buckets
        to pretty-print *to prettier-print
        to take turns #to take a turn
      • coger el toro por los cuernos to_take the bull by the horns to take the bull by the horns #coger el toro por el cuerno to_take the bull by the horn #to take the bull by the horn, #coger los toros por los cuernos to_take the bulls by the horns #to take the bulls by the horns
        entrar en vigor to_enter in vigor to come into effect #entrar en vigores to_enter in vigors #to come into effects
      • prendre le taureau par les cornes to_take the bull by the horns #prendre le taureau par une corne to_take the bull by a horn
      • andare a letto con le galline to go to bed with the hens to go to bed early #andare a letto con la gallina to_go to bed with the hen
        cercare il pelo nell'uovo to look for the hair in the egg to be pedantic #cercare i peli nell'uovo
      • budować zamki na lodzie to build castles on ice to rely on unstable foundations #budować zamek na lodzie to build a castle on ice
        mucha kogoś ugryzła a fly bit someone someone is in a bad temper #mucha kogoś ugryzie a fly will bite someone
        wyciągnąć nogi to stretch.PERF legs to die #wyciągać nogi to stretch.IMPERF legs (imperfective aspectual variant prohibited)
      • bater perna hit leg to walk around #bater a/uma/essas perna/pernas/perninha/pernona to hit the/one/these leg/legs/leg.SMALL/leg.BIG
      • a da colțul to give corner.the to die *a da colțurile to give corners.the
      • klicati jelene to call stags to vomit #klicati jelena to call a stag
      • träda i kraft step in force to come into effect #träda i krafter step into forces
    • further tests are required
      • хвърлям топка to throw a ball → хвърлям топка/топката/топки/топките
      • einen Kuchen backen to bake a cake → viele/keine/den Kuchen backen/machen many/no/the cake bake/make
      • to make a cake → to make a/many/those/no cake/cakes
      • mover el brazo to_move the arm to move the arm → mover/agitar/levantar/estirar el brazo/la pierna/las manos/las piernas to_move/shake/raise/stretch the arm/the leg/the hands/the legs to move/shake/raise/stretch the arm/the leg/the hands/the legs
      • fare un dolce → fare un/molti/dei/quei/nessun dolce/dolci
      • kształtować opinię to form an opinion → kształtować opinie to form opinions
      • bater o braço to hit the arm → bater o/os/um/esse braço/braços/bracinho hit the/the.PL/a/this arm/arms/arm.SMALL
      • a face o prăjitură to make a cake → a face multe/aceste prăjituri to make many/these cakes
      • vzeti taksi to take a cab → ne vzeti nobenega taksija/en taksi/dva taksija to take no/one/two/… cab(s)
      • att baka en kaka to bake a cake → att baka flera/den där/några/ingen kaka/kakor to bake several/that/some/no cake(s)

    Usual modifications for [MORPH] include inflecting content words in the candidate for gender, number, case, tense, mood, aspect, etc. - depending on the target language's morphology.

    Test VID.4 (prev. 4) - [MORPHSYNT] - Morpho-syntactic inflexibility

    Does a regular morpho-syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

    • it is a VID
      • аз ти давам думата си I give you my word#аз ти давам неговата дума (I give you HIS word)
        аз си продавам душата I sell my soul#аз продавам неговата душа (I sell his soul)
      • Ich werde mein Bestes tun I will my best do I will do my best *Ich werde dein Bestes tun I will do your best, Ich gebe dir mein Wort I give you my word *Ich gebe dir ihr Wort I give you her word
      • I will do my best *I will do your best
        I give you my word for that → #I give you his word for that
        he was pulling my leg #I was pulling my leg
      • te doy mi palabra to_you give_I my word I give you my word #te doy su palabra to_you give_I his/her word I give you his/her word
      • il vide son sac he empties his bag he reveals his secret thoughts #il vide mon sac he empties my bag
      • Io farò del mio meglio I will do my best *Io farò del tuo meglio I will do your best
        Io ti do la mia parola I give you my word #Io ti do la sua parola I give you his/her word
      • Polish VMWEs do not seem to exhibit this kind of inflexibility
      • ele se suicidou he self.3SG suicided *ele me suicidou
        eu perdi meu tempo I wasted my time, but not eu perdi teu/seu/nosso tempo I wasted your/his/our time: English allows this, Portuguese doesn't. One says I made you waste your time instead.
      • Îți dau cuvântul meu CL.DAT give.1SG word.the my I give you my word #Îți dau cuvântul lui CL.DAT give.1SG word.the his I give you his word
      • Vlečeš me za nos you are pulling my nose you're pulling my leg *Vlečeš se za nos you're pulling your nose
        Pojdi se solit! to go salt oneself Get lost! *Pojdi ga solit go salt him
      • Jag gör mitt bästa I do my best I do my best *Jag gör ditt bästa I do your best
    • further tests are required
      • копая си гроба to dig my grave → копая ти/му/й/им гроба (to dig your/his/her/their grave)
      • er traf seine Entscheidung he made his decision → er traf meine/ihre/unsere/eure Entscheidung he made my/her/our/your decision
      • he did his job → he did my/her/our/your job
      • Ha hecho su trabajo Has_he/she done his/her work He/She has done his/her work → Ha hecho mi/tu/nuestro trabajo Has_he/she done my/your/our work He/She has done my/your/our work
      • ha fatto il suo lavoro → ha fatto il mio/tuo/nostro/vostro/loro lavoro
      • Polish VMWEs do not seem to exhibit this kind of inflexibility
      • Eu fiz meu trabalho I did my job → Tu/ele/nós fizeste/fez/fizemos meu trabalho You/he/we did my job
      • el își face tema he his does homework.the he does his homework → el îmi/ne/le face tema he my/our/their does homework.the he does my/our/their homework
      • opravil je svojo nalogo he did his job → opravil je mojo/njeno/našo/tvojo nalogo he did my/her/our/your job
      • han gör sitt jobb he does his job → han gör mitt/hennes/vårt jobb he does my/her/our job

    Usual modifications for [MORPHSYNT] involve agreement or loss of agreement between some components in the candidate.

    Test VID.5 (prev. 5) - [SYNT] - Syntactic inflexibility

    Does a regular syntactic change that would normally be allowed by general grammar rules lead to ungrammaticality or to an unexpected change in meaning?

    • it is a VID
      • на стар краставичар краставици продавам to an old cucumber seller cucumbers to sell to try to cheat a more experienced person#продавам краставици на стар краставичар, #краставиците са продадени
        бълвам змии и гущери#бълвам гущери и змии
      • Noun phrase (NP) or prepositional phrase (PP)
      • speak of the devil the person one is talking about shows up #he was speaking of the devil
        to go bananas to go crazy #bananas are gone
        to drink and drive #drive and drink
        to kick the bucket #the bucket was kicked
      • coser y cantar to_sew and to_sing easy as pie, a piece of cake #cantar y coser to sing and to sew
        perder la cabeza to_lose the head to go bananas #perder las cabezas to_lose the heads
      • alzare la cresta to lift the crest become cocky #la cresta è stata alzata the crest has been lifted
        andare in malora go to ruin go to ruin #nella malora è andata in ruin was gone
        vivi e lascia vivere live and let live #lascia vivere e vivi let live and live
      • kogoś krew zalewa blood floods someone someone gets furious #ktoś jest zalewany przez krew someone is flooded by blood (passive blocked)
        robić bokami to do with-sides to have serious financial problems → #robić swoją robotę bokami to do one's job with sides (regular modification blocked)
        dobrze komuś z oczu patrzy well someone.DAT from eyes looks someone looks like a good person #uprzejmość dobrze komuś z oczu patrzy kindness well someone.DAT from eyes looks (subject prohibited)
        nie zagrzać miejsca w pracy not to warm a place at work not to stay long in one job #zagrzać miejsce w pracy to warm a place at work (negation is compulsory)
        zdechł pies! died the dog! it is a lost cause #pies zdechł the dog died (regular word-order variation is blocked)
        wziąć w łeb to take into head to fail #wziąć porażkę w łeb to take failure into head (direct object prohibited for the normally transitive verb wziąć to take)
      • pisar na bola step on the ball make a mistake #a bola na qual ele pisou the ball on which he stepped
      • a da colțul to give corner.the to die *colțul a fost dat corner.the has been given
      • delati se Francoza to pretend to be French to pretend to be indifferent *delan Francoz made French
      • det knallar och går it trots and walks it is OK/as usual #det går och knallar
    • further tests are required
      • продавам неговата кола I sell his car → колата му беше продадена (his car was sold), неговата кола, която тя продаде (his car which she sold), т.н.
      • jemandes Auto waschen to wash one's car → ihr Auto wurde gewaschen her car was washed, das Auto, welches sie wusch the car that she washed, Autowaschen car-washing, etc
      • to wash one's car → her car was washed, the car that she washed, car washing, etc.
      • pisar la arena to step on the sand → la arena que pisaste The sand on which you stepped
      • lavare la macchina →la sua macchina è stata lavata, la macchina che ha lavato, il lavaggio della macchina, etc.
      • kształtować opinię to form an opinion → opinia jest kształtowana the opinion is formed
      • pisar na areia to step on the sand → a areia na qual você pisou the sand on which you stepped
        jogar futebol to play football → ?futebol é jogado football is played One may argue that this is a VMWE because passive sounds strange. However, we assume that this sense of jogar does not accept passive. Since this construction is very productive, we do not annotate it as VMWE.
      • a spăla mașina to wash the car → mașina a fost spălată, mașina pe care a spălat-o, spălarea mașinii etc. the car was washed, the car that he/she washed, car washing
      • narediti film to make a movie → Film, narejen po knjigi a movie based on a book
      • att tvätta bilen to wash one's car → min bil tvättades my car was washed, bilen som hon tvättade the car that she washed, biltvätt car-wash etc.

    Section 5.4

    Inherently reflexive verbs (IRV)

    Reflexive clitics (RCLI) are clitic pronouns that refer to the subject of the verb, like oneself in English. They are very common in many languages and play several semantic roles depending on the context, as detailed below.

    Reflexive verbs (REFLV), sometimes also called pronominal verbs, are formed by a full verb combined with a RCLI, although the clitic does not always have a reflexive meaning. REFLVs can be categorized into different classes, some of which should be annotated as verbal MWEs.

    Specifically, we only annotate a REFLV as an inherently reflexive verb (IRV) when (a) it never occurs without the clitic, or (b) the REFLV and non-reflexive versions have clearly different senses or subcategorization frames. Inherently reflexive verbs constitute a quasi-universal category.

    IRVs are a difficult category to annotate due to various problematic cases. Note in particular that in some languages, e.g. Slavic ones, the reflexive clitics inflect and should be considered in all their cases, not only in the most frequent one, i.e. the accusative.

    We start by listing the various categories of REFLV before providing tests to decide whether to annotate a given occurrence as IRV.

    • Inherently reflexive ⇒ ANNOTATE as IRV
      • The verb without the RCLI does not exist
        • усмихвам се to smile, страхувам се to be afraid
        • stydět se to be ashamed, divit se to wonder
        • sich schämen to be ashamed, sich wundern to wonder
        • suicidarse to suicide, abstenerse to abstain
        • n.a.
        • s'évanouir to faint, se suicider to suicide
        • suicidarsi to suicide, arrabbiarsi to get angry
        • dowiedzieć się to find out, bać się to be afraid
        • queixar-se to complain, abster-se to abstain
        • a se teme to be afraid with obligatory ACC reflexive clitic
          a își însuși to appropriate with obligatory DAT reflexive clitic
        • sramovati se to be ashamed, bati se to be afraid
        • att försova sig to sleep in
          att gifta sig to get married
      • The verb without the RCLI does exist, but has a very different meaning
        • смея ≠ смея се to dare ≠ to laugh, намирам ≠ намирам се to find ≠ to be situated
        • sich enthalten ≠ enthalten to abstain ≠ to contain, sich (um etw.) handeln ≠ handeln to be ≠ to handle
        • to find oneself in a difficult situation
          to help oneself to the cookies
        • recoger ≠ recogerse to gather ≠ to go home, empeñar ≠ empeñarse to pawn ≠ to insist
        • n.a.
        • s'apercevoir ≠ apercevoir to realize ≠ to see, s'agir ≠ agir to be ≠ to act
        • riferire ≠ riferirsi to report, tell ≠ to refer
        • znajdować ≠ znajdować się to find ≠ to be, radzić ≠ radzić sobie to advise ≠ to manage
        • encontrar-se ≠ encontrar to be ≠ to meet, referir-se ≠ referir to concern ≠ to refer
        • a se îndura ≠ a îndura to have the heart ≠ to suffer
          a se face ≠ a face to become ≠ to make; even though it is inchoative (Dindelegan 2013: 79), a se face (to become) is an IRV because it passes test IRV.2 (prev. 15)
        • dati se it is possible (to do something) ≠ dati to give, dobiti se to meet ≠ dobiti to get
        • att känna sig ledsen/arg ≠ att känna to feel sad/angry ≠ to touch
    • Reciprocal ⇒ NOT ANNOTATED
      • The RCLI has a reciprocal meaning (each other):
        • целувам се to kiss each other, срещам се to meet each other
        • líbat se to kiss each other, potkávat se to meet each other
        • sich küssen to kiss each other, sich treffen to meet each other
        • besarse to kiss each other, verse to see each other
        • n.a.
        • s'embrasser to kiss each other, se rencontrer to meet each other
        • baciarsi to kiss each other
        • całować się to kiss each other, spotykać się to meet each other
        • cumprimentar-se to greet each other, ver-se to see each other
        • a se saluta to greet each other
        • poljubljati se to kiss each other, srečati se to meet each other
    • Reflexive ⇒ NOT ANNOTATED
      • The RCLI marks the reflexive or reciprocal construction, that is, the clitic plays the role of oneself in English
        • мия се to wash oneself, реша се to comb oneself
        • mýt se to wash oneself, drbat se to scratch oneself
        • sich waschen to wash oneself, sich kratzen to scratch oneself
        • mirarse to look at oneself, vestirse to dress oneself
        • n.a.
        • se laver to wash oneself, se parler to talk to oneself
        • lavarsi to wash oneself, vestirsi to dress oneself
        • myć się to wash oneself, drapać się po głowie to scratch oneself on the head
        • apressar-se to hurry oneself, vestir-se to dress oneself
        • a se spăla to wash oneself
        • umivati se to wash oneself, praskati se to scratch oneself
        • att tvätta sig to wash oneself
    • Body part, also called possessive reflexive ⇒ NOT ANNOTATED
      • Specific type of reflexive use in which the direct object is a body part or, more generally, an inalienable part of the subject
        • мия си ръцете wash REFL.POSSESSIVE hands wash one's hands
        • mýt si nohy wash RCLI.DAT the feet wash one's feet
        • sich das Bein brechen RCLI the leg break break one's leg
        • rascarse el brazo scratch.RCLI the arm scratch one's arm
        • n.a.
        • se gratter la tête RCLI scratch the head scratch one's head
        • grattarsi la testa RCLI scratch the head scratch one's head
        • myć sobie nogi wash RCLI.DAT the feet wash one's feet
        • impossible, uses possessive instead
        • a-şi rupe mâna RCLI.DAT break arm break one's arm
        • umivati si noge wash RCLI.DAT the feet wash one's feet, zlomiti si roko break RCLI.DAT arm break one's arm
    • Middle with preverbal subject, also called synthetic passive ⇒ NOT ANNOTATED
      • The clitic marks a regular syntactic alternation for transitive verbs. Just like in regular passive alternation, the direct object of the transitive version appears as the subject of the REFLV version, and thus the verb agrees with the subject.
      • Unlike the inchoative (see below), the subject of the transitive version is absent in the REFLV version, but it necessarily exists, though it is underspecified
        • книги се пишат трудно books write.PL RCLI difficult it is difficult to write books
        • die Häuser verkaufen sich gut the houses sell RCLI well the houses sell well
        • las casas se venden bien the houses RCLI sell well the houses sell well
        • n.a.
        • les pots se vendent bien the pots RCLI sell well the pots sell well
        • le case si affittano the houses RCLI rent the houses are rented
        • domy dobrze się sprzedają houses sell.PL RCLI well houses sell well
        • as casas se vendem bem the houses RCLI sell well the houses sell well
        • casele se vând bine houses-the RCLI sell well houses sell well
        • hiše se dobro prodajajo the houses sell RCLI well the houses sell well
    • Middle with postverbal subject, also called synthetic passive ⇒ NOT ANNOTATED
      • In some languages, middle alternation with a preverbal subject sounds unnatural and middle alternation with a postverbal subject is preferred. Depending on the language, the postverbal noun phrase is viewed either as a subject (ES, PL, PT, RO) or as an object which agrees with the unaccusative verb form (IT). Middle alternation with a postverbal subject is impossible in FR and DE.
        • трудно се пишат книги difficult RCLI write.PL books it is difficult to write books
        • se alquilan casas RCLI rent houses people rent houses
        • n.a.
        • si affittano case RCLI rent houses people rent houses
        • dobrze sprzedają się te domy well sell RCLI these houses these houses sell well Polish is a relatively free word-order language and a postverbal subject is a regular (even if stylistically marked) alternation.
        • alugam-se casas rent-RCLI houses people rent houses
        • se vând bine apartamentele din blocurile noi RCLI sell well apartments-the from blocks-the new Apartments from new blocks sell well
          se construiesc locuințe noi RCLI built houses new new houses are built
        • nove hiše se gradijo new houses RCLI built new houses are built
    • Impersonal ⇒ NOT ANNOTATED
      • The RCLI marks an impersonal verb alternation possible for various transitivity classes, depending on the language: only transitive verbs (FR), only intransitive verbs with manner adjuncts (DE), preferably intransitive but tolerated for transitive verbs (PT), either transitive or intransitive verbs (IT, ES, RO, PL)
      • There is no noun phrase before the verb (empty subject slot), the presence of the RCLI indicates a verb interpreted with a generic and underspecified subject
      • The verb is in third person singular, even when the object is plural
        • не се вечеря късно not RCLI have dinner late it is not good to have dinner late
        • hier tanzt es sich gut here dances it RCLI well people dance well here
        • se busca a actores RCLI searches to actors people look for actors
          se trabaja mejor aquí RCLI works better here people work better here
        • n.a.
        • il se dit des bêtises it RCLI says silly things people say silly things
        • si lavora troppo RCLI works too much people work too much
          si affitta molte case RCLI rents many houses people rent many houses
        • za dużo się pracuje too much RCLI works people work too much
          bzdury się opowiada nonsense RCLI tells people tell nonsense
        • dorme-se muito sleeps-RCLI much people sleep a lot
          conta-se histórias tells-RCLI stories people tell stories Transitive impersonal is considered wrong by traditional grammar but it is found in corpora.
        • se lucrează până târziu RCLI works until late people work until late; transitive verbs can be impersonal in RO only when they are null-object verbs (se lucrează până târziu vs. *este lucrat până târziu) or when their subject is realized by a clause headed by a complementizer (Dindelegan 2013: 174)
          se suferă din cauza sărăciei RCLI suffer because of poverty one suffers because of poverty; RO impersonal reflexive verbs are mostly intransitive (Dindelegan 2013: 173)
          se aleargă dimineața RCLI run in the morning people run in the morning
        • govori se/govorijo se neumnosti it says/they say RCLI silly things people say silly things
    • Inchoative ⇒ NOT ANNOTATED
      • Similar to middle, but the RCLI marks a less productive syntactic alternation:
        • the direct object of the transitive version appears as subject of the REFLV
        • the subject of the transitive version is not only absent, it is also semantically unclear or nonexistent
          • вратата се отваря the door opens
          • dveře se otvírají the door opens
          • die Tür öffnet sich the door opens
          • la puerta se abrió the door opened
          • n.a.
          • la porte s'est subitement ouverte the door suddenly opened
          • la porta si apre the door opens
          • drzwi się otwierają the door opens
          • o vaso se quebrou the vase broke
          • mașina s-a stricat the car broke down
            ușa s-a deschis the door opened
          • vrata se odpirajo the door opens
          • dörren öppnar sig the door opens

    IRV-specific decision tree

    • Apply test IRV.1 - [INHERENT]
      • YES ⇒ Annotate as IRV
      • NO ⇒ Apply test IRV.2 - [DIFF-SENSE]
        • YES ⇒ Annotate as IRV
        • NO ⇒ Apply test IRV.3 - [DIFF-SUBCAT]
          • YES ⇒ Annotate as IRV
          • NO ⇒ check whether the verb has a subject
            • verb has no subject ⇒ Apply test IRV.4 - [IMPERS]
              • YES ⇒ It is not a VMWE, exit
              • NO ⇒ Annotate as IRV
            • verb has a subject ⇒ Apply test IRV.5 - [MIDDLE-INCHO]
              • YES ⇒ It is not a VMWE, exit
              • NO ⇒ Apply test IRV.6 - [REFL]
                • YES ⇒ It is not a VMWE, exit
                • NO ⇒ check the number of the subject
                  • subject is SINGULAR ⇒ Apply test IRV.7 - [REFL-MUTUAL]
                    • YES ⇒ It is not a VMWE, exit
                    • NO ⇒ Annotate as IRV
                  • subject is PLURAL ⇒ Apply test IRV.8 - [RECIPRO]
                    • YES ⇒ It is not a VMWE, exit
                    • NO ⇒ Annotate as IRV
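    The branching of the IRV tree, including the splits on subject presence and subject number, can be mirrored in code. This is a hypothetical sketch. The function and parameter names are ours, and the label strings are the bracketed test names from the tree. Each answer is still the annotator's judgment. It assumes the reading of the tree in which the first branch listed under each test is its YES outcome and the second is its NO outcome.

```python
from typing import Mapping


def classify_irv(answers: Mapping[str, bool], has_subject: bool,
                 plural_subject: bool = False) -> bool:
    """Return True if a REFLV occurrence should be annotated as IRV.

    `answers` maps the bracketed test labels to the annotator's YES/NO
    judgments. Only the tests reached along the path are looked up, so a
    KeyError signals that a required judgment is missing.
    """
    # IRV.1-IRV.3: a YES to any of them means the clitic is lexicalized.
    for test in ("INHERENT", "DIFF-SENSE", "DIFF-SUBCAT"):
        if answers[test]:
            return True
    if not has_subject:
        # IRV.4: impersonal use is a regular alternation, not a VMWE.
        return not answers["IMPERS"]
    # IRV.5 and IRV.6: middle/inchoative or truly reflexive use, not a VMWE.
    for test in ("MIDDLE-INCHO", "REFL"):
        if answers[test]:
            return False
    # IRV.7 for a singular subject, IRV.8 for a plural one.
    last_test = "RECIPRO" if plural_subject else "REFL-MUTUAL"
    return not answers[last_test]


# sich waschen "to wash oneself": IRV.1-3 answered NO, IRV.5 NO, IRV.6 YES.
print(classify_irv({"INHERENT": False, "DIFF-SENSE": False,
                    "DIFF-SUBCAT": False, "MIDDLE-INCHO": False,
                    "REFL": True}, has_subject=True))  # False
```

    Raising an error on a missing judgment, rather than defaulting it, keeps the annotator from silently skipping a test that the tree actually reaches.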

    Test IRV.1 (prev. 14) - [INHERENT] - Inherent clitic

    Does the verb exist only with the RCLI, never occurring without it?

    • annotate as IRV
      • страхувам се ⇒ *страхувам to be afraid
        усмихвам се ⇒ *усмихвам to smile
      • sich schämen ⇒ *schämen to be ashamed
        sich wundern ⇒ *wundern to wonder
      • suicidarse ⇒ *suicidar to suicide
        abstenerse ⇒ *abstener to abstain
      • n.a.
      • s'évanouir ⇒ *évanouir to faint
        se suicider ⇒ *suicider to suicide
      • suicidarsi ⇒ *suicidare to suicide
      • dowiedzieć się ⇒ *dowiedzieć to find out
        bać się ⇒ *bać to be afraid
        wydarzyć się ⇒ *wydarzyć to happen
      • queixar-se ⇒ *queixar to complain
        abster-se ⇒ *abster to abstain
      • a se teme ⇒ *a teme to be afraid
        a își însuși ⇒ *a însuși to appropriate
      • sramovati se ⇒ *sramovati to be ashamed
        čuditi se ⇒ *čuditi to wonder
    • next test

    Test IRV.2 (prev. 15) - [DIFF-SENSE] - Different sense

    Given the same verb without the RCLI, are all of its meanings clearly different from the REFLV form?

    • annotate as IRV
      • намирам се ≠ намирам to be situated ≠ to find
        радвам се ≠ радвам to feel happy ≠ to make happy
      • sich verstehen ≠ verstehen to get along well ≠ to understand
      • to find oneself in a difficult situation
        to help oneself to the cookies
      • recogerse ≠ recoger to go home ≠ to pick up, to gather
      • n.a.
      • s'apercevoir ≠ apercevoir to realize ≠ to see
        s'agir ≠ agir to be ≠ to act
      • riferirsi ≠ riferire to refer ≠ to report, to tell
      • znajdować się ≠ znajdować to be (located) ≠ to find
        sprawdzić się ≠ sprawdzić to prove appropriate ≠ to check
        wybrać się ≠ wybrać to go ≠ to choose
      • encontrar-se ≠ encontrar to be ≠ to meet
        referir-se ≠ referir to concern ≠ to refer
      • a se îndura ≠ a îndura to have the heart to ≠ to suffer
      • razumeti se ≠ razumeti to get along well ≠ to understand
    • next test

    Test IRV.3 (prev. 16) - [DIFF-SUBCAT] - Different subcategorization frame

    Does the subcategorization frame of the verb without the RCLI differ from that of the REFLV in ways other than the addition of a direct or indirect object corresponding to the same syntactic argument as the RCLI in the REFLV?

    • annotate as IRV
      • X verliert sich in Y ⇔ X verliert Y X loses RCLI in Y ⇔ X loses Y
      • X se olvidó de Y ⇔ X olvidó Y X RCLI forgot of Y ⇔ X forgot Y
      • n.a.
      • X se confesse de Y ⇔ X confesse Y (but *X confesse de Y) X RCLI confesses of Y ⇔ X confesses Y (but not *X confesses of Y)
        X se plaint de Z ⇒ *Y plaint (à) X de Z X RCLI complains of Z ⇒ *Y complains (to) X of Z → the verb without the RCLI, plus a direct or indirect object, does not subcategorize for the PP with the preposition de
        X se refuse à Vinf ⇒ *Y refuse (à) X à Vinf X RCLI refuses to Vinf ⇒ *Y refuses (to) X to Vinf
      • X si è dimenticato di Y ⇔ X ha dimenticato Y X RCLI forgot of Y ⇔ X forgot Y
      • X tłumaczy się z Y ⇔ X tłumaczy Y X explains SELF of Y ⇔ X explains Y
        X dziwi się Y.dat ⇔ Y dziwi X ⇔ Z dziwi X Y.inst X surprises SELF Y.dat ⇔ Y surprises X ⇔ Z surprises X Y.inst
      • X se esqueceu de Y ⇔ X esqueceu Y X RCLI forgot of Y ⇔ X forgot Y
      • X se gândeşte la Y ⇔ X gândeşte că Y X RCLI thinks of Y ⇔ X thinks that Y
    • next test

    Test IRV.4 (prev. 17) - [IMPERS] - Impersonal

    When you replace the RCLI by an underspecified subject such as one or people, does the sentence keep its meaning?

    • do NOT annotate as verbal MWE
      • не се вечеря късно ⇔ хората не вечерят късно not RCLI have dinner late it is not good to have dinner late
      • hier tanzt es sich gut ⇔ hier tanzen die Leute gut people dance well here
      • se duerme mucho ⇔ las personas duermen mucho people sleep a lot
        se busca a actores ⇔ la gente busca a actores people look for actors
      • n.a.
      • il se dit des bêtises ⇔ les personnes disent des bêtises people say silly things
      • si dorme molto ⇔ le persone dormono molto people sleep a lot
        si affitta molte case ⇔ le persone affittano molte case people rent many houses
      • pracuje się za dużo ⇔ ludzie pracują za dużo people work too much
        opowiada się bzdury ⇔ ludzie opowiadają bzdury people tell nonsense
      • dorme-se muito ⇔ as pessoas dormem muito people sleep a lot
        conta-se histórias ⇔ as pessoas contam histórias people tell stories
      • se lucrează până târziu ⇔ lumea lucrează până târziu people work until late
        se aleargă dimineața ⇔ lumea aleargă dimineața people run in the morning
      • govorijo se neumnosti ⇔ ljudje govorijo neumnosti people tell nonsense
    • annotate as IRV

    Test IRV.5 (prev. 18) - [MIDDLE-INCHO] - Middle or Inchoative

    When you move the subject to the object position, remove the RCLI and add a generic subject (people, somebody), thus building a transitive version, does it imply the REFLV version? In other words, people/somebody V [to] X ⇒ X REFLV?

    • do NOT annotate as verbal MWE
      • някой отваря вратата ⇒ вратата се отваря somebody opens the door ⇒ the door opens
      • man kann die Häuser gut verkaufen ⇒ die Häuser verkaufen sich gut people can sell the houses well ⇒ the houses sell well
        jemand öffnet die Tür ⇒ die Tür öffnet sich somebody opens the door ⇒ the door opens
      • la gente cuenta historias ⇒ se cuentan historias people tell stories ⇒ stories are told
        alguien abrió la puerta ⇒ la puerta se abrió somebody opened the door ⇒ the door opened
      • n.a.
      • on vend bien ce produit ⇒ ce produit se vend bien people sell this product well ⇒ this product sells well
        quelqu'un ouvre la porte ⇒ la porte s'ouvre somebody opens the door ⇒ the door opens
      • qualcuno vende bene questo prodotto ⇒ questo prodotto si vende bene somebody sells this product well ⇒ this product sells well
        qualcuno apre la porta ⇒ la porta si apre somebody opens the door ⇒ the door opens
      • ktoś sprzedaje te domy ⇒ te domy się sprzedają somebody sells these houses ⇒ these houses sell
        ktoś otwiera drzwi ⇒ drzwi się otwierają somebody opens the door ⇒ the door opens
        ktoś nasila skargi ⇒ skargi nasilają się somebody increases complaints ⇒ complaints increase
        ktoś rozgrywa mecz ⇒ mecz rozgrywa się somebody plays a game ⇒ the game is played
      • alguém conta histórias ⇒ contam-se histórias somebody tells stories ⇒ tell.PL-RCLI stories somebody tells stories ⇒ stories are told
        alguém acalmou o menino ⇒ o menino se acalmou somebody calmed the boy ⇒ the boy RCLI calmed somebody calmed the boy down ⇒ the boy calmed down
        o juiz casou João com Maria ⇒ João se casou com Maria the judge married João with Maria ⇒ João RCLI married with Maria the judge married João with Maria ⇒ João got married to Maria
        o juiz casou Maria e João ⇒ Maria e João se casaram the judge married Maria and João ⇒ Maria and João RCLI married the judge married Maria and João ⇒ Maria and João got married
        alguém lembrou João do meu aniversário ⇒ João se lembrou do meu aniversário somebody reminded João of my birthday ⇒ João RCLI reminded of my birthday somebody reminded João of my birthday ⇒ João remembered my birthday
      • cineva spune glume ⇒ se spun glume somebody tells jokes ⇒ jokes are told
        cineva a deschis ușa ⇒ ușa s-a deschis somebody opened the door ⇒ the door opened
      • nekdo pripoveduje šale ⇒ šale se pripovedujejo somebody tells jokes ⇒ jokes are told
        nekdo je odprl vrata ⇒ vrata so se odprla somebody opened the door ⇒ the door opened
    • next test

    Test IRV.6 (prev. 19) - [REFL] - Reflexive

    When you replace the RCLI by oneself only or to oneself only, does it imply the REFLV version? In other words, X V [to] himself only ⇒ X REFLV?

    • do NOT annotate as verbal MWE
      • Павел лекува себе си ⇒ Павел се лекува Pavel heals himself
      • Paul kratzt nur sich selbst ⇒ Paul kratzt sich Paul scratches himself
      • Paul washes only himself ⇒ Paul washes himself
      • Pablo se lava a sí mismo ⇒ Pablo se lava Paul washes himself
      • n.a.
      • Paul ne soigne que lui-même ⇒ Paul se soigne Paul heals himself
        Paul ne parle qu'à lui-même ⇒ Paul se parle Paul talks to himself
      • Paolo cura solo se stesso ⇒ Paolo si cura Paul heals himself
        Paolo parla solo a se stesso ⇒ Paolo si parla Paul talks to himself
      • Paweł leczy tylko siebie ⇒ Paweł leczy się Paul heals himself
        Paweł bogaci tylko siebie ⇒ Paweł bogaci się Paul enriches himself Paul gets rich
      • Paulo só lava a si mesmo ⇒ Paulo se lava Paul washes himself
      • Paul se spală doar pe sine ⇒ Paul se spală. Paul washes himself
      • Pavel praska sam sebe ⇒ Pavel se praska Paul scratches himself
    • next test

    Test IRV.7 (prev. 20) - [REFL-MUTUAL] - Reflexive-mutual

    Is a reciprocal version possible? Namely, is it acceptable to replace the singular subject by a plural one and add each other to the REFLV form without changing the REFLV's meaning?

    • do NOT annotate as verbal MWE. This test applies only if test IRV.2 (prev. 15) has failed. For example, for French "X se marie" 'X gets married', it is odd though possible to say 'X and Y marry each other', but this does not mean 'X gets married', since it is only possible if X and Y are marriage officiants.
      • Павел се мие ⇔ те се мият един друг they wash each other
      • Paul wäscht sich ⇔ Sie waschen sich gegenseitig / einander they wash each other
      • Pablo se lava ⇔ ellos se lavan mutuamente / los unos a los otros they wash each other
      • n.a.
      • Paul se lave ⇔ ils se lavent mutuellement / les uns les autres they wash each other
      • Paolo si lava ⇔ essi si lavano reciprocamente / l'un l'altro they wash each other
      • Paweł się myje ⇔ oni myją się nawzajem they wash each other
      • Paulo se lava ⇔ eles se lavam mutuamente / uns aos outros they wash each other
      • el se spală ⇔ ei se spală unul pe altul they wash each other
      • Pavel se umiva ⇔ umivajo drug drugega they wash each other
    • annotate as IRV

    Test IRV.8 (prev. 21) - [RECIPRO] - Reciprocal

    Is it possible to remove the RCLI and replace the coordinated subject (A and B) or plural subject (A.PL) by a simple subject (A, or A.PL) and an object, often introduced by to/with (B, or A.PL), without changing the REFLV's meaning? That is:

    • Coordinated subject: A and B REFLV ⇔ A V [to/with] B and B V [to/with] A?
    • Plural subject: A.PL REFLV ⇔ A.PL V [to/with] A.PL?
    • do NOT annotate as verbal MWE
      • Павел и Елена се целуват ⇔ Павел целува Елена и Елена целува Павел Pavel and Elena kiss
      • Paul und Anna umarmen sich ⇔ Paul umarmt Anna und Anna umarmt Paul Paul and Anna hug each other
        die Affen kratzen sich ⇔ die Affen kratzen die Affen the monkeys scratch each other
      • Pablo y Ana se abrazan ⇔ Pablo abraza a Ana y Ana abraza a Pablo Paul and Ann hug each other
        los niños se abrazan ⇔ los niños abrazan a los niños the children hug each other
      • n.a.
      • Paul et Anne s'embrassent ⇔ Paul embrasse Anne et Anne embrasse Paul Paul and Ann kiss
        les jours se suivent ⇔ les jours suivent les jours the days follow each other
      • Giovanni e Anna si baciano ⇔ Giovanni bacia Anna e Anna bacia Giovanni John and Ann kiss
        i giorni si seguono ⇔ i giorni seguono i giorni the days follow each other
      • Paweł i Elena się całują ⇔ Paweł całuje Elenę i Elena całuje Pawła Paweł and Elena kiss
      • João e Ana se beijam ⇔ João beija Ana e Ana beija João John and Ann kiss
        os presos se agridem ⇔ os presos agridem os presos the prisoners attack each other
      • Ion şi George se salută ⇔ Ion îl salută pe George şi George îl salută pe Ion Ion and George greet each other
        participanții se salută ⇔ participanții îi salută pe participanți the participants greet each other
      • Pavel in Ana se objemata ⇔ Pavel objema Ano in Ana objema Pavla Paul and Anna hug each other
    • annotate as IRV

    Problematic cases and remarks

    Polysemy

    Keep in mind that both simple and reflexive verbs can have several senses. In test IRV.2 (prev. 15), we require that ALL senses you can think of be different from the REFLV form in the given context. For example, the French verb trouver can mean to find something, to have an opinion about something, to discover something, etc. But the REFLV se trouver has a totally different and unrelated meaning, to be (located at), in the sentence L'église se trouve à Paris the church is located in Paris. It should thus be annotated as an MWE. As the REFLV is itself polysemous, it should NOT be annotated as IRV in sentences like Elle se trouve grosse she finds herself fat, where it means to have an opinion about (herself), equivalent to the non-reflexive version.

    Clitics position and concatenation

    In some languages the clitics are joined to the verb, sometimes with a hyphen but not always. When there is no hyphen, the REFLV will probably be tokenized as a single token in the corpus.

    • In French, when the verb starts with a vowel, orthography and pronunciation rules require the clitic to be concatenated with the verb and its last vowel to be replaced by an apostrophe (elision):
      • s'abstenir to abstain
    • In Spanish and Italian, the clitic can appear concatenated after the verb in some verbal forms (e.g. infinitives, gerunds):
      • enamorarse to fall in love
      • alzarsi to get up
    • In Portuguese, postposed clitics (enclisis) are always attached with a hyphen, but in the conditional tense the clitic is placed in the middle of the verb (mesoclisis), separating the root from the suffix:
      • queixar-se-ia would complain
    • In Romanian the clitic and the verb are either separate or have a hyphen between them:
      • se aude un clopot RCLI hears a bell a bell is heard
        s-aude un clopot RCLI-hears a bell a bell is heard

    The current annotation format allows a single token to be annotated as an MWE if it is a multiword token. Therefore, REFLVs spelled as a single token should also be annotated as MWEs when they pass the IRV tests.
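The multiword-token case can be illustrated with a sketch that reads the PARSEME:MWE column of a .cupt file (CoNLL-U with an extra 11th column). It assumes, as in CoNLL-U, that a concatenated surface token such as enamorarse appears as a range line (3-4) followed by its syntactic words. It also assumes that the MWE codes sit on the word lines: * for no MWE, 1:IRV on the first word of MWE 1, a bare 1 on the following words, and several codes joined by ;. The sample sentence, its annotation and the function read_mwes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical .cupt-style fragment: 11 tab-separated columns, of which only
# ID, FORM and the last one (PARSEME:MWE) are filled in for this illustration.
SAMPLE = """\
1\tAna\t_\t_\t_\t_\t_\t_\t_\t_\t*
2\tquiere\t_\t_\t_\t_\t_\t_\t_\t_\t*
3-4\tenamorarse\t_\t_\t_\t_\t_\t_\t_\t_\t*
3\tenamorar\t_\t_\t_\t_\t_\t_\t_\t_\t1:IRV
4\tse\t_\t_\t_\t_\t_\t_\t_\t_\t1
"""


def read_mwes(block: str) -> Dict[str, Tuple[str, List[str]]]:
    """Group syntactic words by MWE id: {mwe_id: (category, [forms])}."""
    forms: Dict[str, List[str]] = defaultdict(list)
    categories: Dict[str, str] = {}
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if "-" in cols[0]:
            continue  # range line of a multiword token: its words follow
        if cols[10] in ("*", "_"):
            continue  # word is not part of any MWE (or not annotated)
        for code in cols[10].split(";"):
            mwe_id, _, category = code.partition(":")
            forms[mwe_id].append(cols[1])
            if category:  # only the first word of an MWE carries the category
                categories[mwe_id] = category
    return {k: (categories.get(k, "?"), v) for k, v in forms.items()}
```

On the hypothetical sample, read_mwes(SAMPLE) groups the two syntactic words of the single surface token enamorarse into one IRV. This illustrates how a one-token REFLV can still be annotated as an MWE.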

    Overlap VID - IRV

    Some idiomatic constructions include reflexive clitics. Two cases are possible:

    • If a syntactically comparable literal construction is impossible or the REFLV would not be annotated in syntactically comparable literal constructions, annotate only the VID:
      • пилците се броят наесен chickens REFL are counted in the autumn the true results can be seen only at the end; cf. кокошките се броят the hens REFL counted
      • sich über etwas im Klaren sein dass S RCLI about s.th. in.the clear be to be aware of s.th./that S ⇒ *sich in N sein, dass for any noun N
      • darse cuenta de to realize ⇒ *darse N de for any noun N
        meterse en líos to get in trouble → REFLV not annotated in literal equivalents like meterse en una tienda to get in a store
      • n.a.
      • se rendre compte de to realize ⇒ *se rendre N de for any noun N
        s'arracher les cheveux RCLI tear the hair to worry → REFLV not annotated in literal equivalents like s'arracher un ongle to tear one's nail
      • rendersi conto di to realize ⇒ *rendersi N di for any noun N
        si strappa i capelli RCLI tears the hair to worry → REFLV not annotated in literal equivalents like strapparsi un'unghia to tear one's nail
      • zdawać sobie sprawę z to realize ⇒ *zdawać sobie N z for any noun N
      • dar-se mal to fail → dar-se ADV (intransitive) is acceptable only with the antonym bem well
        meter-se numa fria to get-RCLI in a cold to get in trouble → REFLV not annotated in literal equivalents like meter-se numa cabine to get into a cabin
      • a-și smulge părul din cap
      • puliti si lase tear RCLI the hair to worry → REFLV not annotated in literal equivalents like puliti si obrvi to pluck one's eyebrows
    • If the REFLV would be annotated as IRV in syntactically comparable literal constructions, annotate both the IRV and the VID as embedded MWEs (rare):
      • смея се през сълзи laugh REFL through tears to laugh bitterly
      • n.a.
      • rozlatywać się w proch scatter itself into dust to disappear
      • virar-se nos trinta turn-RCLI in-the thirty → contains virar-se to get by ≠ virar to turn/become
      • a i se face rău to CL.DAT RCLI.ACC make ill to feel sick → this is a case where both a non-reflexive dative clitic and an RCLI.ACC appear in the structure; the REFLV is annotated as IRV; both the IRV and the VID are annotated as embedded MWEs; note that the non-reflexive clitic is also considered part of the VID (6.4_R)
        a se duce pe apa sâmbetei to RCLI go on water-the Saturday-of to get lost → the REFLV is annotated in the literal equivalent a se duce pe apa Bistriței to go on the river Bistriţa; there is a notable difference in meaning between the non-REFLV a duce to take and the REFLV a se duce to go
      • režati se kot pečen maček to laugh RCLI like a baked tomcat to laugh loudly → režati se is an IRV

    Overlap LVC - IRV

    It is rare, although possible, to find light verb constructions in which a reflexive clitic changes the original meaning significantly, thus characterizing an IRV:

    • Fragen stellen to ask questions vs. sich Fragen stellen to doubt/hesitate
    • hacer preguntas to ask questions vs. hacerse preguntas to doubt/hesitate
    • n.a.
    • poser des questions to ask questions vs. se poser des questions to doubt/hesitate
    • no examples found for RO

    In this case, the whole construction, including the verb, the noun and the reflexive clitic, must be annotated as VID, since two of its syntactic arguments (the reflexive clitic and the noun) are lexicalized:

    • sich Fragen stellen to doubt/hesitate
    • hacerse preguntas to doubt/hesitate
    • n.a.
    • se poser des questions to doubt/hesitate
    • no examples found for RO

    Notice that annotating only the verb and the RCLI as IRV would be wrong, since without the noun they have a completely different meaning, sometimes even coinciding with another IRV:

    • sich stellen to surrender
    • hacerse to get used to
    • n.a.
    • se poser to sit/lay down

    Dative clitics and double clitics

    In some languages, e.g. Polish, clitics inflect for case. Most cases of IRV seem to be restricted to the accusative case:

    • страхувам се to be afraid
    • bát se to be afraid
    • n.a.
    • bać się to be afraid
    • a se sinchisi to RCLI.ACC care to care
      a se sfii to RCLI.ACC be.shy to be shy
      a se căi to RCLI.ACC repent to repent
    • bati se to be afraid

    However, other cases can appear in IRV:

    • отивам си to go oneself.DAT to go away
    • poradit si to advise oneself.DAT to manage
    • n.a.
    • radzić sobie to advise oneself.DAT to manage
    • a-și însuși to-RCLI.DAT appropriate to appropriate - with a Dative clitic
      a-și apropria to-RCLI.DAT appropriate to appropriate - with a Dative clitic
    • drzniti si to dare oneself.DAT to dare

    Some expressions can have double clitics. Only the first two words belong to the IRV:

    • надсмивам се над себе си to laugh RCLI.ACC at RCLI.DAT to laugh at oneself
    • n.a.
    • przyglądać się sobie to observe RCLI.ACC RCLI.DAT to observe each other
      radzić sobie z sobą to advise RCLI.DAT with RCLI.INST to manage with oneself
    • n.a.
    • nasmehniti se sebi to smile at oneself

    Non-reflexive clitics

    This category does not cover other types of pronouns and clitics. They are covered by the regular VID tests and should be annotated as VIDs when they pass them. Examples of constructions that should be annotated as VID rather than IRV include:

    • es gibt it gives there is
    • n.a.
    • l'emporter to take it away to win
      s'en aller to self from-it go to leave
      en avoir marre to have from-it enough to be fed up
      il y avoir it at-it have to exist
    • prender-ci to take to-it to make the right choice
      prender-le to take it to be beaten
    • dá-lhe João! give to-him/her, João! show them what you got, João!
    • a-i arde to CL.DAT burn to have a desire
      a o lua pe jos to take CL.ACC on foot to walk → according to the current guidelines, such examples pass the VID tests (see also 6.3_B5); both have literal correspondents that are not characterized by an obligatory non-reflexive clitic: a arde to burn and a lua to take
      a-i repugna to CL.DAT loathe to loathe
      a-i prii to CL.DAT to be favourable to sb.
    • ucvreti jo to escape her to escape something/someone by running

    Section 5.5

    Verb-particle constructions (VPC)

    Verb-particle constructions (VPCs), sometimes called phrasal verbs or phrasal-prepositional verbs, like

    • n.a.
    • um|fahren over|drive to run over, mit|kommen with|come to join, vor|bereiten before|prepare to prepare
    • to put off, to blow up, to do in
    • n.a.
    • n.a.
    • buttare giù throw down to swallow
    • n.a.
    • n.a.
    • n.a.

    constitute another quasi-universal category. They have the following general characteristics:

    1. They are formed by a lexicalized head verb v and a lexicalized particle p dependent on v.
    2. The meaning of the VPC is fully or partly non-compositional.
      • In fully non-compositional VPCs (VPC.full), the change in the meaning of v goes significantly beyond adding the meaning of p:
        • n.a.
        • die Fische sind eingegangen the fish went in the fish died
        • to do in to kill, destroy, cheat or harm severely
        • n.a.
        • n.a.
        • n.a.
      • In semi-non-compositional VPCs (VPC.semi), p adds a partly predictable but non-spatial meaning to v:
        • n.a.
        • to eat up to eat completely
        • n.a.
        • n.a.
        • n.a.

    VPCs are pervasive in English, German, Swedish, Hungarian and possibly some other languages but irrelevant to or infrequent in Romance and Slavic languages or in Farsi and Greek for instance.

    In some Germanic languages and also in Hungarian, verb-particle constructions can be spelled either as one (multiword) token or separated. Both types of occurrences are to be annotated:

    • n.a.
    • Die Kinder sollen in der Schule aufpassen The children must pay attention at school
      Herr Müller, passen Sie auf! Mr. Müller, be careful
    • n.a.
    • n.a.
    • n.a.
    • n.a.
    • n.a.

    The first challenge in identifying a VPC is to properly distinguish the particle from a possibly homographic preposition, e.g.:

    • n.a.
    • to look up the number vs to look up the chimney
    • n.a.
    • n.a.
    • n.a.
    • n.a.
    • n.a.

    or a verbal prefix:

    • n.a.
    • um- in um|fahren vs umfahren
    • n.a.
    • n.a.
    • n.a.
    • n.a.
    • n.a.

    Namely, a particle, contrary to a preposition, cannot govern a complement. This can be tested depending on the verb's subcategorization frame:

    • For intransitive verbs, the particle can occur without an NP. The fact that there is no NP that could be governed by the particle to form a PP shows that it is a particle rather than a preposition.
    • For transitive verbs, the particle can occur either before or after the direct object. The fact that it is mobile and can go before or after the NP shows that it is a particle rather than a preposition.
    • n.a.
    • intransitive: The airplane took off
      transitive: The fire did in the whole block or The fire did it in
    • n.a.
    • n.a.
    • n.a.
    • n.a.
    • n.a.

    Prefixes, contrary to particles, can never be spelled separately from the verb, nor can the past participle of prefixed verbs be formed with the infix -ge-:

    • n.a.
    • *er fuhr den See um
      *er hat den See umgefahren, instead: er hat den See umfahren he drove around the lake but: er hat das Schild umgefahren he ran over the sign
    • n.a.
    • n.a.
    • n.a.
    • n.a.
    • n.a.

    See the language-specific tests for more details on distinguishing particles from prepositions and verbal prefixes.

    Note that in this shared task we do not account for compositional verb-particle combinations, i.e. those whose meaning can be deduced from the meaning of the particle and of the verb:

    • n.a.
    • er legt das Buch ab he puts down the book, er kommt ins Haus rein he comes into the house he enters the house
    • to lie down, You may go in now
    • n.a.
    • n.a.
    • n.a.
    • n.a.
    • n.a.

    Some combinations may have both compositional and non-compositional meanings depending on the context and only the latter should be annotated:

    • n.a.
    • ein Schild aufstellen to put up a sign vs. einen Plan aufstellen to draw up a plan
    • to put up a flag vs. to put up a friend for the night
    • n.a.
    • n.a.
    • n.a.
    • n.a.
    • n.a.

    The following decision tree should be applied to decide whether a candidate should be annotated as a VPC or not.

    VPC-specific decision tree:

    • Apply test VPC.1 - [PART-REDUC: Can the verb without the particle refer to the same event?]
      • It is a VPC.full.
      • Apply test VPC.2 - [PART-SPATIAL: Is the particle spatial?]
        • It is not a VPC, exit
        • Apply test VPC.3 - [PART-SPATIAL-LIT: Is the particle spatial in a literal reading?]
          • It is a VPC.semi
          • It is not a VPC, exit
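As a sketch only, the VPC tree above can be written as a small Python function. It assumes the annotator's answers are supplied as booleans keyed by test id. It reads the tree as: NO to VPC.1 ⇒ VPC.full, YES to VPC.2 ⇒ not a VPC, YES to VPC.3 ⇒ VPC.semi, following the order of outcomes in the tree itself. The names vpc_decision and answers are hypothetical.

```python
from typing import Mapping

VPC_FULL = "VPC.full"
VPC_SEMI = "VPC.semi"
NOT_VPC = "not a VPC"


def vpc_decision(answers: Mapping[str, bool]) -> str:
    """Walk the VPC decision tree (illustrative sketch).

    `answers` maps "VPC.1", "VPC.2", "VPC.3" to YES (True) / NO (False);
    only the tests on the path taken are looked up.
    """
    # VPC.1 [PART-REDUC]: can the verb without the particle refer to the same event?
    if not answers["VPC.1"]:
        return VPC_FULL
    # VPC.2 [PART-SPATIAL]: a spatial particle gives a compositional combination.
    if answers["VPC.2"]:
        return NOT_VPC
    # VPC.3 [PART-SPATIAL-LIT]: is the particle spatial in a literal reading?
    return VPC_SEMI if answers["VPC.3"] else NOT_VPC
```

With the examples given in the tests, to do somebody in stops at VPC.1 (VPC.full) and to stand up exits at VPC.2. to eat the cookies up passes VPC.1, fails VPC.2 and reaches VPC.semi through VPC.3.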

    Test VPC.1 (prev. 22) - [PART-REDUC] - Verb without the particle refers to the same event/state

    Can a sentence without the particle refer to the same event/state as the sentence with the particle? Special care must be taken when the same construction might or might not be a valid VPC depending on its context.

    • It is a VPC.full.
      • n.a.
      • Der Lehrling fängt ein Praktikum an the apprentice catches an internship on the apprentice begins an internship does not imply #Der Lehrling fängt ein Praktikum the apprentice catches an internship
        Die Bäuerin hat sich wieder eingefangen the farmer’s wife has herself again caught the farmer’s wife has calmed down again does not imply #Die Bäuerin hat sich wieder gefangen the farmer’s wife has caught herself again
        Der Schüler legt die Prüfung ab the pupil lays the exam off the pupil takes the exam does not imply #der Schüler legt die Prüfung the pupil lays the exam
        Das Schiff legt vom Hafen ab the boat lays from the harbor off the ship leaves the harbor does not imply #das Schiff legt vom Hafen the boat lays from the harbor
      • to do somebody in to kill sb does not imply #to do somebody
        to check in upon arrival does not imply #to check upon arrival
      • n.a.
      • n.a.
      • A meccs után csak az edző nem rúgott be Only the coach did not get drunk after the match A meccs után az edző berúgott The coach got drunk after the match does not imply #Az edző rúgott the coach kicked
        Nem jött be ez a koktél nekem I didn’t like this cocktail Bejött ez a koktél nekem I liked this cocktail does not imply #Jött ez a koktél nekem this cocktail bumped into me
      • n.a.
      • n.a.
      • n.a.
    • Go to the next test.
      • n.a.
      • Der Bauer fängt die Hühner ein the farmer catches the chickens in the farmer catches the chickens implies der Bauer fängt die Hühner the farmer catches the chickens
        Der Lehrer legt das Buch auf dem Tisch ab the teacher lays the book on the table apart the teacher puts the book away on the table implies Der Lehrer legt das Buch auf den Tisch the teacher puts the book on the table
        Der Lehrer legt den Mantel ab the teacher lays the coat off the teacher takes off his coat implies Der Lehrer legt den Mantel the teacher puts the coat
      • to look up into the sky implies to look into the sky
        to eat up the cookies implies to eat the cookies
      • n.a.
      • n.a.
      • A csatár nem rúgta be a helyzetét The forward missed his chance to score a goal A csatár berúgta a helyzetét implies A csatár rúgott The forward kicked
        Nem jött be a szobába He did not come into the room Bejött a szobába he entered the room implies Jött a szobába he came into the room
      • n.a.
      • n.a.
      • n.a.

    Test VPC.2 - [PART-SPATIAL] - Spatial particle

    Is the particle spatial in the context of the verb, i.e. does it express direction or position?

    • It is not a VPC, exit.
      • n.a.
      • to stand up
        to give something back
        to stay up tonight
        You may go in now
        to mix ingredients together
      • n.a.
      • n.a.
      • n.a.
    • Go to the next test
      • n.a.
      • to eat the cookies up
        to mix ideas together
      • n.a.
      • n.a.
      • n.a.

    Test VPC.3 - [PART-SPATIAL-LIT] - Spatial particle in a literal reading

    Does the VPC candidate have a literal counterpart in which the particle is spatial, i.e. expresses direction or position?

    • It is not a VPC, exit.
      • n.a.
      • to mix ideas together
      • n.a.
      • n.a.
      • n.a.
    • It is a VPC.semi.
      • n.a.
      • to eat the cookies up
      • n.a.
      • n.a.
      • n.a.

    Section 5.6

    Multi-verb constructions (MVC)

    Multi-verb constructions (MVC) constitute a quasi-universal category. They are VMWEs composed of a sequence of two adjacent verbs (in a language-dependent order), a governing verb V-gov (also called a vector verb) and a dependent verb V-dep (also called a pole/polar verb), which have the following characteristics:

    1. They usually have the same subject.
    2. They usually denote actions that are closely connected and may be seen as part of the same event.
    3. They function together as a single predicate.
    4. They are unaccompanied by any explicit coordination, subordination, or dependency marker.
    5. They have only a single tense, aspect and polarity value.
    6. They may be idiomatic or indicate successions of events.
    7. The V-dep (polar) verb contains the core meaning of the whole, while the V-gov (vector) verb is semantically delexicalized.

    The behavior of MVCs is very heterogeneous across languages. Therefore, most tests for the detection of MVCs are language specific. The current tests were designed for Indonesian, Hindi, Japanese and Chinese. The generalization of these tests cross-lingually is planned as future work.

    MVC-specific decision tree for Hindi

    • Apply Test MVC.1.BASE - [MVC-STRUCT-BASE: V-dep is non-finite and V-gov bears inflection?]
      • It is not a VMWE, exit
      • Apply Test MVC.4 - [INS-REDIRECT-KAR: kar or ke appears just after V-dep?]
        • Apply Test MVC.6 - [MANNER: V-dep indicates the manner/means/direction of V-gov?]
          • It is a manner serial verb, not a VMWE, exit
          • Apply Test MVC.7 - [REASON: V-dep indicates the reason for V-gov?]
            • It is a reason serial verb, not a VMWE, exit
            • Apply Test MVC.8 - [SEQ: V-gov and V-dep bound by temporal sequence?]
              • It is a temporal sequence serial verb, not a VMWE, exit
              • Apply Test MVC.9 - [SIMULT: V-gov+V-dep express rapid and simultaneous actions?]
                • It is a serial verb expressing simultaneous actions, not a VMWE, exit
                • Continue to the next test
        • Apply Test MVC.10 - [LIGHT: V-gov in the closed list of light verbs?]
          • Annotate as MVC
          • Apply Test MVC.13 - [V-LEX: V-dep refers to the same event/state as V-gov+V-dep?]
            • It is not a VMWE, exit
            • Annotate as an MVC
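The Hindi tree above can be sketched in Python as well, with several assumptions that the tree leaves implicit. It reads the first outcome of MVC.1.BASE as a NO answer (exit) and the first branch of MVC.4 as YES (kar/ke present). It also assumes that "Continue to the next test" after MVC.9 leads to MVC.10. The names hindi_mvc_decision, answers and SERIAL_TESTS are hypothetical. Answers are booleans keyed by test id.

```python
from typing import Mapping

MVC = "MVC"
NOT_VMWE = "not a VMWE"

# Serial-verb tests tried when kar/ke follows V-dep (MVC.4 = YES):
# MVC.6 manner, MVC.7 reason, MVC.8 temporal sequence, MVC.9 simultaneity.
SERIAL_TESTS = ("MVC.6", "MVC.7", "MVC.8", "MVC.9")


def hindi_mvc_decision(answers: Mapping[str, bool]) -> str:
    """Walk the Hindi MVC decision tree (illustrative sketch).

    Only the tests on the path taken are looked up; `any` stops at the
    first serial-verb test answered YES.
    """
    # MVC.1.BASE: V-dep non-finite and V-gov inflected? NO => not a VMWE.
    if not answers["MVC.1.BASE"]:
        return NOT_VMWE
    # MVC.4: kar/ke just after V-dep => rule out the serial-verb readings.
    if answers["MVC.4"]:
        if any(answers[t] for t in SERIAL_TESTS):
            return NOT_VMWE  # a serial verb, not a VMWE
        # Assumption: "Continue to the next test" means MVC.10.
    # MVC.10: V-gov in the closed list of light verbs? YES => MVC.
    if answers["MVC.10"]:
        return MVC
    # MVC.13: does V-dep alone refer to the same event/state? YES => not a VMWE.
    return NOT_VMWE if answers["MVC.13"] else MVC
```

Encoding the serial-verb tests as a tuple keeps their order (MVC.6 to MVC.9) explicit. It also makes it easy to adapt the sketch if a language-specific tree later adds or removes tests.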

    MVC-specific decision tree for Chinese

    In progress

    MVC-specific decision tree for Indonesian and Japanese

      TODO (in the meantime, follow the tests one by one)

    MVC-specific decision tree for any other language

    • Directly apply Test MVC.13 - [COMP: V-dep refers to the same event/state as V-gov+V-dep?]
      • It is not a VMWE, exit
      • Annotate as an MVC

    Test MVC.1 - [MVC-STRUCT] MVC-like structure

    Does the candidate respect the necessary structural (language-dependent) requirements for an MVC?

    Hindi

    Test MVC.1.BASE [MVC-STRUCT-BASE]: Is V-dep non-finite and does V-gov carry the tense, aspect and agreement inflections?

    • continue to the next test
      • n.a.
      • n.a.
      • n.a.
    • it is not an MVC
      • n.a.
      • n.a.
      • n.a.

    Japanese

    Test MVC.1.IMORPH: Does the first verb (V-dep) contain the i-morph suffix?

    • [change proposed by AS:] continue to the next test
      • n.a.
      • n.a.
      • 焼きつい yakitsui sear into one's mind → the first verb 焼き yaki burn is inflected with the i-morph ending
      • n.a.
      • n.a.
    • it is not an MVC
      • n.a.
      • n.a.
      • n.a.

    Any other language

    Go to the next test

    Test MVC.2 - [INS-DISCARD] Insertion which discards

    Does the candidate sequence appear, or could it appear, with an affix, particle or another external (non-lexicalized) material (depending on the language) which indicates that this candidate is a regular combination and should be discarded?

    Chinese

    Test MVC.2.POSS: Can the possessive marker come after the candidate [AS: after V-gov or V-dep?]?

    • it is NOT an MVC
      • n.a.
      • n.a.
      • n.a.
      • n.a.
      • 挺漂亮 tǐngpiàoliang (quite pretty) → 挺漂亮的 tǐngpiàoliang de (quite pretty + possessive marker) → The addition of the possessive marker is grammatically sound
    • continue to next test
      • 想想看 xiǎngxiǎngkàn (consider carefully) → *想想看的 xiǎngxiǎngkàn de (consider carefully + possessive marker) → The addition of the possessive marker leads to ungrammaticality in the phrase

    Indonesian

    Test MVC.2.PRON: Can a pronoun like dia (he/she) be inserted between the first and second verb [AS: between V-gov and V-dep or the opposite?]?

    • it is NOT an MVC
    • continue to next test

    Test MVC.2.CLAUSE: Can a that-clause introduced by bahwa (that) or a whether-clause introduced by apakah (whether) be inserted between the first and second verb [AS: between V-gov and V-dep or the opposite?], where the first verb [AS: V-gov?] is a saying verb such as mengatakan (say) or an asking verb such as menanyakan (ask)?

    • it is NOT an MVC
    • continue to next test

    Test MVC.2.PURPOSE: Can untuk (for/to) be inserted between the first and second verb [AS: between V-gov and V-dep or the opposite?]?

    • [change proposed by AS:] it is a purpose serial verb, not an MVC
      • Saya (I) bersiap pergi (get ready to go) = Saya (I) bersiap untuk pergi (get ready for the purpose of going) → The insertion of untuk (for/to) is grammatically sound and does not change the meaning of the sentence. Although it is possible to insert untuk (for/to) between the first and second verb, it is usually unnecessary and omitted.
    • continue to next test

    Japanese

    Test MVC.2.HONOR: Is the first verb [AS: V-gov or V-dep?] preceded by the honorific particle お o, and is the second verb する/できる suru/dekiru?

    • it is NOT an MVC, but an honorific construction.
      • お-話し-する o-hanasi-suru (I humbly talk)
    • continue to next test

    Any other language

    Go to the next test

    Test MVC.3 - [PROH-INS-CONF] Prohibited insertion which confirms

    Is the insertion of a particular affix, particle or another external (non-lexicalized) material (depending on the language) prohibited, which helps confirm that this candidate is an MVC?

    Chinese

    Test MVC.3.ASPECT - [PROH-INS-CONF-ASP]: Is it impossible to insert the aspect marker le (perfective) between the adjacent verbs [AS: between V-gov and V-dep or the opposite?] in the candidate?

    • it is an MVC
      • 我听说 wǒ tīngshuō (I heard) → *我听了说 wǒ tīng le shuō (I hear + le aspect marker + say) → The insertion of the aspect marker le leads to ungrammaticality in the phrase
    • continue to next test
      • 我看出来 wǒ kànchūlái (I figure out) → 我看了出来 wǒ kàn le chūlái (I see + le aspect marker + exit) → The insertion of the aspect marker le is grammatically sound

    Any other language

    Go to the next test

    Test MVC.4 - [INS-REDIRECT] Insertion which redirects

    Does the candidate sequence appear with an affix, particle or another external (non-lexicalized) material (depending on the language) which indicates that a particular test should be applied next?

    Hindi

    Test MVC.4.KAR - [INS-REDIRECT-KAR]: Does the conjunctive participle kar or ke appear attached to, or immediately after, V-dep?

    Any other language

    Go to the next test

    Test MVC.5 - [MODAL] Modal or auxiliary verb

    Chinese

    Is V-gov a modal or an auxiliary verb?

    • it is NOT an MVC
      • 可以 kěyǐ (can), 可能 kěnéng (might), 会 huì (will), 必须 bìxū (must), 需要 xūyào (need to), 要 yào (want to), 能 néng (able to), 应该 yīnggāi (should)
    • continue to next test

    Any other language

    Go to the next test

    Test MVC.6 - [MANNER] Manner verb

    Chinese, Hindi, Indonesian, Japanese

    Does V-dep indicate the manner or means (and possibly a direction) of the action expressed by V-gov?

    • [change proposed by AS:] it is a manner serial verb, not an MVC
      • us-ne ciikh-kar mujh-e bulaa-yaa (gl.: he-erg yell-ConjPpl I-dative call-perf) (transl. he called me by screaming)
      • pulang melalui (gl.: return-home pass-through) (transl. go home by passing through (a place))
      • 投げ込み nage komi (gl.: throw go-in) (transl. throw into)
        なぐり殺し naguri korosi (gl.: punch kill) (transl. kill by punching)
      • 走进来 zǒu jìnlái (gl.: walk enter) (transl. walk into (a place))
    • continue to next test

    Any other language

    Go to the next test

    Test MVC.7 - [REASON] Reason verb

    Hindi

    Does V-dep indicate the reason for the action expressed by V-gov?

    • [change proposed by AS:] it is a reason serial verb, not an MVC
      • vo melaa jaa-kar khush hu-aa (gl.: he fair go-ConjPpl happy become-perf) (transl. he got happy having gone to the fair)
    • continue to next test

    Any other language

    Go to the next test

    Test MVC.8 - [SEQ] Temporal sequence

    Hindi, Indonesian, Japanese

    Are the verbs bound by a temporal sequence?

    • [change proposed by AS:] it is a sequential serial verb, not an MVC
      • us-ne gilaas banaa-kar bec-aa (gl.: he-erg glass make-ConjPpl sell-perf) (transl. having made the glass, he sold it)
      • bersiap pergi (gl.: prepare go) (transl. prepare in order to go (somewhere)) → the action of the first verb must happen before that of the second verb, otherwise the sentence does not make sense.
      • 夫人が最初に fujin ga saisho ni (the wife first) 叩き起こさ tataki okosa (hit to awaken) re (verb suffix) ≠ #夫人が最初に fujin ga saisho ni (the wife first) 起き叩さ tataki okosa (hit to awaken) re (verb suffix) → The two verbs 叩き tataki (hit) and 起こさ okosa (awaken) are bound by temporal sequence, such that if the order is switched, the sentence does not make sense.
    • continue to next test

    Any other language

    Go to the next test

    Test MVC.9 - [SIMULT] Simultaneous actions

    Do the verbs indicate rapid and simultaneous actions (without resorting to a coordination conjunction)?

    • [change proposed by AS:] it is a serial verb expressing simultaneous actions, not an MVC
      • berlari menuju (gl.: run head-towards) (transl. run and go towards)
    • continue to next test

    Test MVC.10 - [LIGHT] Light verb

    Hindi

    Does V-gov belong to the following closed list of light verbs: aa (come), baiTh (sit), chal (go), chuk (finish), choR (leave), Daal (throw), de (give), ja (go), jataa (declare), khaa (eat), lagaa (put), le (take), maar (hit), paa (get/obtain), paRh (fall), rakh (keep), uTh (rise)?

    • it is a (light) MVC
    • continue to next test

    Any other language

    Go to the next test

    Test MVC.11 - [PREP-LIKE] Preposition-like verb

    Chinese

    Is the second verb in the candidate [AS: V-gov or V-dep?] a preposition-like verb such as 成 chéng (become)?

    • it is a preposition-like MVC
      • 排列成 páiliè chéng (gl.: arrange become) (transl. arrange into (something))
    • continue to next test

    Any other language

    Go to the next test

    Test MVC.12 - [NOUN-LIKE] Noun-like verb

    Japanese

    Are any of the components [AS: V-gov or V-dep?] in the candidate noun-like arguments?

    • it is a deverbalized V1/V2 MVC
      • 響き渡る hibiki wataru (gl.: echo spread-widely) (transl. reverberate) → The first verb is a noun-like argument of the second verb [deverbalized V2]
        聞き違え kiki chigae (gl.: listen be-different) (transl. mishear/misunderstand) → The second verb is a noun-like argument of the first verb [deverbalized V1]
    • continue to next test

    Any other language

    Go to the next test

    Test MVC.13 - [V-LEX] Lexical inflexibility

    Does a regular replacement of V-dep by a related verb taken from a relatively large semantic class lead to ungrammaticality or to an unexpected change in meaning?

    • it is an MVC
      • it will make do → #it will make build/solve/construct
      • "hello" quiere decir "hola" en inglés (gl.: "hello" want say "hola" in English) (transl. "hello" means "hola" in English) → #"hello" quiere comunicar/gritar "hola" en inglés (gl.: "hello" want communicate/shout "hola" in English)
      • j'ai laissé tomber la présentation (gl.: I have let fall the presentation) (transl. I gave up on the presentation) → #j'ai laissé commencer/lancer/interrompre la présentation (gl.: I have let start/launch/interrupt the presentation)
        ce mot veut dire autre chose (gl.: this word wants say other thing) (transl. this word means something else) → #ce mot veut chuchoter/communiquer/crier autre chose (gl.: this word wants whisper/communicate/scream other thing)
      • dał jej popalić (gl.: he let her smoke) (transl. he made things hard for her) → #dał jej popić/podymić/pofajczyć (gl.: he let her drink/smoke)
    • it is not an MVC
      • it will make me think → it will make me build/solve/construct
      • quiero leer tu tesis (gl.: want.I read your thesis) (transl. I want to read your thesis) → quiero adquirir/descargar/imprimir tu tesis (gl.: want.I acquire/download/print your thesis) (transl. I want to get/download/print your thesis)
      • je l'ai laissé finir la présentation (gl.: I him have let finish the presentation) (transl. I let him finish the presentation) → je l'ai laissé commencer/lancer/interrompre la présentation (gl.: I him have let start/launch/interrupt the presentation)
        ce garçon veut dire autre chose (gl.: this boy wants say other thing) (transl. this boy wants to say something else) → ce garçon veut chuchoter/communiquer/crier autre chose (gl.: this boy wants whisper/communicate/scream other thing)
      • dał jej pospać (gl.: he let her sleep) → dał jej odpocząć/poleżeć (gl.: he let her rest/lie down)
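    Since the Chinese MVC decision tree is still in progress, annotators apply the individual tests one by one. The sketch below shows one possible encoding of that walk, assuming each test is reduced to a yes/no judgement made by the annotator. The names MvcTest, CHINESE_MVC_TESTS and run_mvc_tests are hypothetical. The list contains the tests whose header names Chinese, plus MVC.13, which these guidelines apply to every language; MVC.9 is left out because its header does not say which languages it covers.

```python
from dataclasses import dataclass
from typing import Mapping

MVC, NOT_MVC, CONTINUE = "MVC", "not MVC", "continue"


@dataclass(frozen=True)
class MvcTest:
    name: str    # test identifier as used in these guidelines
    if_yes: str  # outcome when the annotator answers "yes"
    if_no: str   # outcome when the annotator answers "no"


# Hypothetical ordering of the Chinese tests, in numeric order.
CHINESE_MVC_TESTS = [
    MvcTest("MVC.2.POSS", if_yes=NOT_MVC, if_no=CONTINUE),  # possessive marker possible
    MvcTest("MVC.3.ASPECT", if_yes=MVC, if_no=CONTINUE),    # le cannot be inserted
    MvcTest("MVC.5", if_yes=NOT_MVC, if_no=CONTINUE),       # V-gov is modal/auxiliary
    MvcTest("MVC.6", if_yes=NOT_MVC, if_no=CONTINUE),       # manner serial verb
    MvcTest("MVC.11", if_yes=MVC, if_no=CONTINUE),          # preposition-like verb
    MvcTest("MVC.13", if_yes=MVC, if_no=NOT_MVC),           # lexical inflexibility
]


def run_mvc_tests(tests, answers: Mapping[str, bool]) -> tuple:
    """Apply the tests in order and return (verdict, deciding test).

    `answers` maps a test name to the annotator's yes/no judgement; only the
    tests reached before a verdict need to be answered.
    """
    for test in tests:
        outcome = test.if_yes if answers[test.name] else test.if_no
        if outcome != CONTINUE:
            return outcome, test.name
    raise ValueError("the last test in the list must always yield a verdict")


# 听说 tīngshuō, assuming "no" to MVC.2.POSS; le cannot be inserted (MVC.3.ASPECT).
print(run_mvc_tests(CHINESE_MVC_TESTS, {"MVC.2.POSS": False, "MVC.3.ASPECT": True}))
# ('MVC', 'MVC.3.ASPECT')
```

    Returning the deciding test alongside the verdict makes it easy to record why a candidate was kept or discarded, which may help when comparing decisions across annotators.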

    Section 5.7

    Inherently adpositional verbs (IAVs)

    The inherently adpositional verb (IAV) is a special, optional and experimental category. It corresponds to the IPrepV category used in the first pilot annotations, and to what is sometimes called prepositional verbs in English. An IAV consists of a verb or VMWE and an idiomatically selected preposition or postposition that is either always required or, if absent, significantly changes the meaning of the verb or VMWE. Language teams who decide to annotate IAVs should do so after annotating the other categories (step 4 of the annotation process), since IAVs frequently overlap with other categories, as detailed below. Language teams are not required to use this category.

    Our definition of inherently adpositional verbs is a generalization (applying to many languages) of the annotation guidelines of the English STREUSLE corpus, which define guidelines for annotating prepositional verbs.

    IAVs are verb+adposition combinations in which:

    • the dependents of the adposition are not lexicalized
      • разчитам на някого/нещо (to rely on somebody/something) is annotated as IAV because the object is not lexicalized,
        but in the ID
        вземам на мушка някого/нещо (gl.: take on target somebody/something) (transl. to criticise somebody/something heavily), вземам на (take on) cannot be annotated as IAV because мушка is also lexicalized in the ID
      • to stand for something is annotated as IAV because the object is not lexicalized,
        but in the ID
        to take something for granted, the sequence to take for cannot be annotated as IAV because granted is also lexicalized in the ID
      • entender de algo (gl.: understand of something) (transl. to know about something) is annotated as IAV because the object is not lexicalized, whereas entender algo would not be any type of VMWE.
      • pristati na kaj (gl.: to land on something) (transl. to agree with something) is annotated as IAV because the object is not lexicalized,
        but in the ID
        ostati na trdnih tleh (gl.: to remain on solid ground) (transl. to remain realistic), ostati na (to remain on) cannot be annotated as IAV because trdnih tleh (solid ground) is also lexicalized in the ID
    • the adposition is integral, that is, "it cannot be omitted without markedly altering the meaning of the verb"
      • считам за to take for *считам can never occur without the preposition за
        разчитам на to rely on разчитам can occur without the preposition, but it will never have a sense of to depend/rely on
      • to rely on *to rely can never occur without the preposition on
        to count on to count can occur without the preposition, but it will never have a sense of to depend/rely on
      • entender de (gl.: understand of) (transl. to know about something): entender (to understand) can occur without the preposition, but it will never have the sense of being an expert on something
        contar con (gl.: count with) (transl. to rely on): contar (to count) can occur without the preposition, but it will never have the sense of to rely on.
      • temeljiti na to be based on *temeljiti can never occur without the preposition na
        biti za to be for to agree with or support (something or someone) biti to be can occur without the preposition, but it will never have a sense of to agree with or to support

    Note that idiomatic adpositional valency, in which the adposition opens a slot for a complement, should not be mistaken for verb-particle constructions. Tests distinguishing particles from prepositions can be used to disambiguate these categories.

    • to wake up somebody cannot be annotated as IAV because up is a particle, and not a preposition.
      Particles can occur after the object (to wake somebody up), but prepositions cannot (*to come a new restaurant across).

    Not only single verbs but also VMWEs may be inherently adpositional. This is why IAV annotation needs to be the last step, after all other VMWEs in a sentence have been identified and categorized. In case of overlap between another category and IAV, the whole VMWE annotation needs to be repeated with the addition of the lexicalized adposition, and the whole is annotated as an IAV.

    • to put up with bears 2 annotations:
      1. to put up is annotated as VPC
      2. the whole sequence to put up with is annotated as IAV
    • atenerse a (gl.: abide.self to) (transl. to abide by) bears 2 annotations:
      1. atenerse is annotated as IRV
      2. the whole sequence atenerse a is annotated as IAV
    • ubadati se z (gl.: to deal RCLI with) (transl. to deal with) bears 2 annotations:
      1. ubadati se (gl.: to deal RCLI) is annotated as IRV, since the verb without the RCLI does not exist
      2. the whole sequence ubadati se z is annotated as IAV, since the verb also does not exist without the preposition
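    The two-layer annotation of an overlapping IAV can be pictured as follows. This is a minimal sketch, assuming a VMWE is stored as a category plus the set of positions of its lexicalized tokens; Vmwe and add_iav_layer are hypothetical names, not part of any PARSEME tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vmwe:
    category: str         # e.g. "VPC", "IRV", "IAV"
    token_ids: frozenset  # positions of the lexicalized tokens in the sentence


def add_iav_layer(inner: Vmwe, adposition_id: int) -> list:
    """Keep the inner VMWE and add an IAV covering it plus the adposition."""
    if adposition_id in inner.token_ids:
        raise ValueError("the adposition is already part of the inner VMWE")
    return [inner, Vmwe("IAV", inner.token_ids | {adposition_id})]


# "I will put up with it": put=3, up=4, with=5 (1-based token positions).
put_up = Vmwe("VPC", frozenset({3, 4}))
for vmwe in add_iav_layer(put_up, 5):
    print(vmwe.category, sorted(vmwe.token_ids))
# VPC [3, 4]
# IAV [3, 4, 5]
```

    Both annotations are kept: the IAV does not replace the inner VMWE, it repeats its components and adds the lexicalized adposition.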

    Test IAV.1 - [CIRCUM-QUEST] Circumstantial question with no adposition

    This is an adaptation of STREUSLE's guideline on prepositional verbs by Nathan Schneider and Meredith Green.

    In response to a declarative sentence with the verb+adposition combination, is there a natural way to query the circumstances of the verbal event using the verb, but not the adposition?

    • it is not an IAV
      • - I care about the environment.
        - Why do you care?

        to care about is not annotated as IAV
      • - Me preocupo por su salud. (gl.: me worry.I for his/her health) (transl. I'm worried about his/her health)
        - ¿Por qué te preocupas? (gl.: why you worry.you?) (transl. Why are you worried?)

        preocuparse por is not annotated as IAV
      • - Lahko se zanesem na pomoč svojih prijateljev. (I can rely on my friends' help)
        - Se lahko zaneseš, da ti bo kdo pomagal? (gl.: Can you rely that someone will help you?) (transl. Can you rely on someone to help you?)

        zanesti se (to rely on) is not annotated as IAV
    • annotate as an IAV
      • - I came across a nice restaurant downtown.
        - #When did you come?
        to come across is annotated as IAV
      • - Ana entiende de música clásica. (gl.: Ana understands of music classic) (transl. Ana knows about classical music)
        - #¿Desde cuándo entiende? (gl.: since when understands.she?) (transl. Since when does she know?)
        entender de is annotated as IAV
      • - Gre za enakovrednost. (gl.: It goes about equality.) (transl. It is about equality.)
        - #Kaj gre? (#What goes?)
        gre za is annotated as IAV

    Section 6

    Language-specific tests

    Language-specific tests may be necessary in one of three cases:

    • a VMWE category may be universal or quasi-universal but it may require different tests in different languages,
    • any category specific to a language must be associated with appropriate tests in the same language,
    • universal tests can build upon more elementary language-specific tests (e.g. to distinguish a particle from a preposition).

    Section 6.1

    Language-specific categories (LS)

    Language-specific categories can be proposed for annotation in this task, provided that they are carefully defined and accompanied by linguistic tests that distinguish them from other categories. We recommend not redefining the universal and quasi-universal categories described here, but introducing new names and abbreviations to meet such needs.

    When a new language(-group)-specific category is introduced, we encourage the use of the LS category with a dotted extension, e.g. LS.SIM or LS.PROV (for "language-specific simile" or "language-specific proverb").


    Section 6.2

    Particles versus prepositions and prefixes

    The following tests help identify verb particles in cases where they might be homographic with prepositions in prepositional phrases (PPs) or with verbal prefixes. The word to be discriminated is referred to as the candidate word. The tests are language-specific and concern English and German.

    English-specific tests for distinguishing particles from prepositions

    The following tests concern English words which can be either a preposition or a particle depending on the context, e.g. up, on, through, etc. If a candidate word passes either of the two tests, it can be categorized as a particle.

    Test PREP.EN.1 (prev. 7.2.EN) - [FIN-PART] - Sentence-final particle

    Can the sentence be reformulated so that the candidate word w occurs at the end of a clause which is: (i) affirmative or imperative, (ii) headed by the verb governing w, and (iii) not a relative clause?

    • the candidate word is a particle
      • They got up a petition on Monday. They got it up.
        I took off my clothes. I took my clothes off.
        She tries to take in her clients. She tries to take her clients in.
    • go to the next test
      • I got up the hill. *I got it up.
        He has been off alcohol. *He has been alcohol off.

    Test PREP.EN.2 (prev. 7.3.EN) - [AD-INS] - Adjunct insertion

    Is an insertion of a circumstantial adjunct prohibited between the governing verb and the candidate word?

    • the candidate word is a particle
      • They finally got up a petition. *They got finally up a petition.
        I took off my clothes at once. *I took at once off my clothes.
        She always tries to take in her clients. *She tries to take always in her clients.
    • it is not a VPC
      • I got up the hill finally. I got finally up the hill.
        He has been off alcohol recently. He has been recently off alcohol.

    This test might be redundant with respect to test PREP.EN.1. If this turns out to be the case (after a large-scale annotation), it may be deleted.

    German-specific tests for distinguishing particles from prepositions and verbal prefixes

    The following tests concern German words which can be both a particle and either a preposition or a verbal prefix, depending on the context, e.g. mit, um, vor, etc. If a candidate word passes any of the following tests, it can be categorized as a particle.

    Test PREP.DE.1 (prev. 7.1.DE) - [FIN-PART] - Sentence-final particle

    Does the candidate word occur at the end of the sentence or can the sentence be reformulated so as to put the candidate word at the end?

    • it is a particle
      • Kommst Du mit? (gl.: come you with?) (transl. are you coming?)
        Ich schlage vor allen zu verzeihen. (I propose to forgive everyone) → Ich schlage es vor. (I propose it)
        Der Mülleimer wurde umgefahren. (The trash bin was knocked down) → Er fuhr den Mülleimer um. (He knocked down the trash bin)
    • other tests are needed
      • Kommst Du mit jemandem? (Are you coming with someone?) → *Kommst Du jemandem mit?
        Er umfuhr den ganzen See mit dem Fahrrad. (He rode around the whole lake by bike) → *Er fuhr ihn um.

    Test PREP.DE.2 (prev. 7.2.DE) - [SEP-PART] - Separable particle

    Can the verb and the candidate word be spelled both separately and together?

    • it is a particle
      • Passen Sie auf die Autos auf! (Be careful with the cars!) → Sie müssen auf die Autos aufpassen! (You must be careful with the cars!)
        Er fuhr das Schild um. (He drove over the sign) → Er sollte das Schild nicht umfahren. (He should not drive over the sign)
    • other tests are needed
      • Er umfuhr den ganzen See mit dem Fahrrad. (He rode around the whole lake by bike) → *Er fuhr den ganzen See mit dem Fahrrad um.
        Sprechen Sie mit ihm! (Speak with him!) → *Sie sollen ihm mitsprechen.
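    The way the English and German tests combine can be summarized in a small lookup, assuming each test is answered yes/no by the annotator; PARTICLE_TESTS, NO_TEST_PASSED and is_particle are hypothetical names. Passing any test makes the candidate a particle. When every test fails, the English tests conclude that the combination is not a VPC, whereas the German tests leave the decision open ("other tests are needed").

```python
from typing import Mapping, Optional

# Tests of this section, per language, in the order they are applied.
PARTICLE_TESTS = {
    "EN": ("PREP.EN.1", "PREP.EN.2"),  # [FIN-PART], [AD-INS]
    "DE": ("PREP.DE.1", "PREP.DE.2"),  # [FIN-PART], [SEP-PART]
}

# Outcome when no test passes: False = not a particle (English: "it is not a
# VPC"), None = undecided (German: "other tests are needed").
NO_TEST_PASSED = {"EN": False, "DE": None}


def is_particle(lang: str, answers: Mapping[str, bool]) -> Optional[bool]:
    """True if the candidate passes any test, else the language's fallback."""
    if any(answers.get(name, False) for name in PARTICLE_TESTS[lang]):
        return True
    return NO_TEST_PASSED[lang]


print(is_particle("EN", {"PREP.EN.1": True}))                      # got up a petition
print(is_particle("DE", {"PREP.DE.1": False, "PREP.DE.2": False}))  # umfuhr den See
# True
# None
```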

    Section 6.3

    Language-specific inherently clitic verbs (LS.ICV)

    Inherently clitic verbs (LS.ICV), together with inherently reflexive verbs (IRV), are pronominal verbs. An LS.ICV is formed by a full verb combined with one or more non-reflexive clitics (CLI) that represent the pronominalization of one or more complements. LS.ICV is annotated when (a) the verb never occurs without a non-reflexive clitic, e.g. entrarci (to be relevant to something, colloquial), or (b) the LS.ICV and the non-clitic versions have clearly different senses or subcategorization frames.

    LS.ICVs represent a specific category for some Romance languages, and they are particularly frequent in the Italian language. It is often challenging to distinguish LS.ICV from IRV, particularly because some clitics may be ambiguous, like se/si which is a polyfunctional clitic pronoun and grammatical marker (and has many functions such as reflexive, reciprocal, impersonal, passivizing, aspectual, middle).

    If the CLI has a clear reflexive meaning the VMWE might be an IRV.

    We start by listing the various categories of LS.ICVs before providing tests to decide whether to annotate a given occurrence as an LS.ICV.

    • Inherently clitic verbs ⇒ ANNOTATE as LS.ICV
      1. The verb without the CLI does not exist
        • infischiarsene (not worry about) vs *infischiare
      2. The verb without the CLI does exist, but has a very different meaning
        • darla (gl.: give it) (transl. fuck around) ≠ dare (give)
          prenderle (gl.: take them) (transl. be beaten) ≠ prendere (take)
          prenderci (gl.: take it) (transl. grasp the truth) ≠ prendere (take)
          starci (gl.: stay there) (transl. agree) ≠ stare (stay)
      3. The verb has more than one CLI of which the second one is an invariable object complement.
        • fregarsene (gl.: matter self of-it) (transl. don't care about)
          infischiarsene (transl. not worry about)
          curarsene (gl.: take care self of-it) (transl. care about)
          prendersela (gl.: take self it.FEM) (transl. be angry/upset)
          sentirsela (gl.: feel self it.FEM) (transl. be in the mood for)
          sentirselo (gl.: feel self it.MASC) (transl. feel)
          vedersela (gl.: see self it.FEM) (transl. manage something)
      4. The verb has two non-reflexive invariable CLIs:
        • farcela (gl.: make there it.FEM) (transl. succeed)
      5. The verb has a different meaning with respect to an intensive use of the same two non-reflexive invariable CLIs:
        • andarsene (gl.: go away self from-there) (transl. die) ≠ andarsene (go away)
          bersela (gl.: drink self it.FEM) (transl. believe) ≠ bersela (drink)

    LS.ICV-specific decision tree

    Test LS.ICV.1 - [CL-INHERENT] Inherent clitic

    Does the verb only exist with the CLI and never occurs without it?

    • annotate as LS.ICV
      • infischiarsi ⇒ *infischiare
        infischiarsene ⇒ *infischiare
    • next test

    Test LS.ICV.2 - [CL-DIFF-SENSE] - Different sense

    Given the same verb without the CLI/CLIs, are all of its meanings clearly different from the inherently clitic form?

    • annotate as LS.ICV
      • smetterla (gl.: quit it) (transl. knock it off) ≠ smettere (quit)
        prenderle (gl.: take them) (transl. get beaten up) ≠ prendere (take)
        prenderci (gl.: take it) (transl. grasp the truth) ≠ prendere (take)
        starci (gl.: stay there) (transl. be up for it) ≠ stare (stay)
        curarsene (gl.: take care self of-it) (transl. care about) ≠ curare (take care)
        prendersela (gl.: take self it.FEM) (transl. be angry/upset) ≠ prendere (take)
        sentirsela (gl.: feel self it.FEM) (transl. be in the mood for) ≠ sentire (feel)
        darla (gl.: give it.FEM) (transl. fuck around) ≠ dare (give)
    • next test

    Test ICV.3 - [CL-DIFF-SUBCAT] - Different subcategorization frame

    Is the subcategorization frame of the simple verb without the CLI different from the subcategorization frame of the LS.ICV?

    • annotate as LS.ICV
      • X se la prende con Y ⇔ X prende Y
    • Exit
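    The LS.ICV tests are applied in order, and the first positive answer is enough to annotate the candidate as LS.ICV. A minimal sketch, assuming the three answers come from the annotator's judgement (the function name ls_icv_decision is hypothetical):

```python
from typing import Optional


def ls_icv_decision(only_with_clitic: bool,
                    all_senses_differ: bool,
                    subcat_differs: bool) -> Optional[str]:
    """Return the first positive test (annotate as LS.ICV), or None (exit)."""
    checks = (
        ("LS.ICV.1", only_with_clitic),   # [CL-INHERENT] verb never occurs without the CLI
        ("LS.ICV.2", all_senses_differ),  # [CL-DIFF-SENSE] all CLI-less senses clearly differ
        ("ICV.3", subcat_differs),        # [CL-DIFF-SUBCAT] different subcategorization frame
    )
    for name, positive in checks:
        if positive:
            return name
    return None


# prendersela: prendere exists without clitics, but with clearly different senses.
print(ls_icv_decision(False, True, True))  # LS.ICV.2
```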

    Section 6.4

    Italian-specific decision tree

    For Italian, a language-specific category called inherently clitic verbs (LS.ICV) has been defined. This implies a modified version of the annotation decision tree.

    Steps 1-4 are still valid in Italian. But Step 3 should be realized with the decision tree below instead of the generic decision tree.

    • Apply test S.1 (prev. 6) - [1HEAD: Unique verb as functional syntactic head of the whole?]
      • NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
        • YES ⇒ Annotate as a VMWE of category VID
        • NO ⇒ It is not a VMWE, exit
      • YES ⇒ Apply test S.2 (prev. 7) - [1DEP: Verb v has exactly one lexicalized dependent d?]
        • NO ⇒ Apply test IT.S.1 - [CLITICS-ONLY: Are all lexicalized dependents of the verb clitics?]
          • YES ⇒ Apply the LS.ICV-specific tests ⇒ LS.ICV tests positive?
            • YES ⇒ Annotate as a VMWE of category LS.ICV
            • NO ⇒ It is not a VMWE, exit
          • NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
            • YES ⇒ Annotate as a VMWE of category VID
            • NO ⇒ It is not a VMWE, exit
        • YES ⇒ Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
          • YES ⇒ Apply the VID-specific tests ⇒ VID tests positive?
            • YES ⇒ Annotate as a VMWE of category VID
            • NO ⇒ It is not a VMWE, exit
          • NO ⇒ Apply test S.4 (prev. 8) - [CATEG: What is the morphosyntactic category of d?]
            • Reflexive clitic ⇒ Apply the IRV-specific tests ⇒ IRV tests positive?
              • YES ⇒ Annotate as a VMWE of category IRV
              • NO ⇒ It is not a VMWE, exit
            • Non-reflexive clitic ⇒ Apply the LS.ICV-specific tests ⇒ LS.ICV tests positive?
              • YES ⇒ Annotate as a VMWE of category LS.ICV
              • NO ⇒ It is not a VMWE, exit
            • Particle ⇒ Apply the VPC-specific tests ⇒ VPC tests positive?
              • YES ⇒ Annotate as a VMWE of category VPC.full or VPC.semi
              • NO ⇒ It is not a VMWE, exit
            • Verb with no lexicalized dependent ⇒ Apply the MVC-specific tests ⇒ MVC tests positive?
              • YES ⇒ Annotate as a VMWE of category MVC
              • NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
                • YES ⇒ Annotate as a VMWE of category VID
                • NO ⇒ It is not a VMWE, exit
            • Extended NP ⇒ Apply the LVC-specific decision tree ⇒ LVC tests positive?
              • YES ⇒ Annotate as a VMWE of category LVC
              • NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
                • YES ⇒ Annotate as a VMWE of category VID
                • NO ⇒ It is not a VMWE, exit
            • Another category ⇒ Apply the VID-specific tests ⇒ VID tests positive?
              • YES ⇒ Annotate as a VMWE of category VID
              • NO ⇒ It is not a VMWE, exit
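    The Italian tree can be read as a single function. This is an illustrative sketch only: it assumes the annotator has already answered each test, and the container ItalianAnswers, the mapping S4_FAMILIES and the category-specific results in positive are hypothetical. It also assumes, as in the generic decision tree, that a negative answer to S.1, a negative answer to S.2 and a positive answer to S.3 are the branches that leave the main path towards S.4.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# S.4: morphosyntactic category of d -> family of category-specific tests.
S4_FAMILIES = {
    "reflexive clitic": "IRV",
    "non-reflexive clitic": "LS.ICV",
    "particle": "VPC",
    "verb with no lexicalized dependent": "MVC",
    "extended NP": "LVC",
}
# Families whose negative outcome falls back to the VID-specific tests.
VID_FALLBACK = {"MVC", "LVC"}


@dataclass
class ItalianAnswers:
    """Hypothetical container for the annotator's answers to each test."""
    unique_head: bool                  # S.1 [1HEAD]
    one_lexicalized_dep: bool          # S.2 [1DEP]
    clitics_only: bool = False         # IT.S.1 [CLITICS-ONLY]
    lexicalized_subject: bool = False  # S.3 [LEX-SUBJ]
    dep_category: str = ""             # S.4 [CATEG]
    positive: Dict[str, bool] = field(default_factory=dict)  # e.g. {"VID": True}


def italian_tree(a: ItalianAnswers) -> Optional[str]:
    """Return the VMWE category, or None when it is not a VMWE."""
    def vid() -> Optional[str]:
        return "VID" if a.positive.get("VID", False) else None

    if not a.unique_head:                     # S.1 negative
        return vid()
    if not a.one_lexicalized_dep:             # S.2 negative -> IT.S.1
        if a.clitics_only:
            return "LS.ICV" if a.positive.get("LS.ICV", False) else None
        return vid()
    if a.lexicalized_subject:                 # S.3 positive
        return vid()
    family = S4_FAMILIES.get(a.dep_category)  # S.4
    if family is None:                        # "another category"
        return vid()
    if a.positive.get(family, False):
        return family  # for VPC, the annotator still picks VPC.full or VPC.semi
    return vid() if family in VID_FALLBACK else None


# farcela: two lexicalized dependents, both of them clitics.
print(italian_tree(ItalianAnswers(True, False, clitics_only=True,
                                  positive={"LS.ICV": True})))  # LS.ICV
```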

    Test IT.S.1 - [CLITICS-ONLY] Clitics only

    Are all lexicalized dependents of the verb clitics?

    • apply LS.ICV tests
    • next test

    Section 6.5

    Hindi-specific decision tree

    For Hindi, LVCs can be formed by a verb and a noun, or by a verb and an adjective which is morphologically identical to an eventive noun. This implies a modified version of the annotation decision tree.

    Steps 1-4 are still valid in Hindi. But Step 3 should be realized with the decision tree below instead of the generic decision tree.

    • Apply test S.1 (prev. 6) - [1HEAD: Unique verb as functional syntactic head of the whole?]
      • NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
        • YES ⇒ Annotate as a VMWE of category VID
        • NO ⇒ It is not a VMWE, exit
      • YES ⇒ Apply test S.2 (prev. 7) - [1DEP: Verb v has exactly one lexicalized dependent d?]
        • NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
          • YES ⇒ Annotate as a VMWE of category VID
          • NO ⇒ It is not a VMWE, exit
        • YES ⇒ Apply test S.3 - [LEX-SUBJ: Lexicalized subject?]
          • YES ⇒ Apply the VID-specific tests ⇒ VID tests positive?
            • YES ⇒ Annotate as a VMWE of category VID
            • NO ⇒ It is not a VMWE, exit
          • NO ⇒ Apply test S.4 (prev. 8) - [CATEG: What is the morphosyntactic category of d?]
            • Reflexive clitic ⇒ Apply the IRV-specific tests ⇒ IRV tests positive?
              • YES ⇒ Annotate as a VMWE of category IRV
              • NO ⇒ It is not a VMWE, exit
            • Particle ⇒ Apply the VPC-specific tests ⇒ VPC tests positive?
              • YES ⇒ Annotate as a VMWE of category VPC.full or VPC.semi
              • NO ⇒ It is not a VMWE, exit
            • Verb with no lexicalized dependent ⇒ Apply the MVC-specific tests ⇒ MVC tests positive?
              • YES ⇒ Annotate as a VMWE of category MVC
              • NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
                • YES ⇒ Annotate as a VMWE of category VID
                • NO ⇒ It is not a VMWE, exit
            • Extended NP or an adjective which is morphologically identical to an eventive noun ⇒ Apply the LVC-specific decision tree ⇒ LVC tests positive?
              • YES ⇒ Annotate as a VMWE of category LVC
              • NO ⇒ Apply the VID-specific tests ⇒ VID tests positive?
                • YES ⇒ Annotate as a VMWE of category VID
                • NO ⇒ It is not a VMWE, exit
            • Another category ⇒ Apply the VID-specific tests ⇒ VID tests positive?
              • YES ⇒ Annotate as a VMWE of category VID
              • NO ⇒ It is not a VMWE, exit
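    The Hindi-specific part of this tree lies in the S.4 branch. A sketch of that branch alone, assuming the category of d and the results of the category-specific tests are supplied by the annotator (HINDI_S4 and hindi_s4 are hypothetical names):

```python
from typing import Dict, Optional

# S.4 in the Hindi tree: category of d -> family of category-specific tests.
# Compared with the Italian tree, there is no non-reflexive clitic (LS.ICV)
# branch, and the LVC branch also accepts an adjective that is
# morphologically identical to an eventive noun.
HINDI_S4 = {
    "reflexive clitic": "IRV",
    "particle": "VPC",
    "verb with no lexicalized dependent": "MVC",
    "extended NP": "LVC",
    "adjective identical to eventive noun": "LVC",
}


def hindi_s4(dep_category: str, positive: Dict[str, bool]) -> Optional[str]:
    """Resolve the S.4 branch; `positive` holds the annotator's test verdicts."""
    family = HINDI_S4.get(dep_category, "VID")  # "another category" -> VID tests
    if positive.get(family, False):
        return family
    if family in {"MVC", "LVC"}:                # these branches fall back to VID
        return "VID" if positive.get("VID", False) else None
    return None                                 # not a VMWE


# An LVC whose dependent is an adjective identical to an eventive noun.
print(hindi_s4("adjective identical to eventive noun", {"LVC": True}))  # LVC
```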

    Section 7

    Annotation management

    This section groups the documentation on practical aspects of managing the annotation campaign. Some of these aspects are specific to this shared task, such as the editing of examples by language leaders and the use of the annotation platform FLAT. Others are more generic and concern the guidelines in general, such as the FAQ section.


    Section 7.1

    Frequently Asked Questions (FAQ)

    Annotators often face questions and challenging examples. When several annotators ask the same question, we will update the list of frequently asked questions.

    However, we suggest that language teams set up another communication platform to deal with language-specific questions. This can take the form of a shared online document, a wiki, a dedicated bug-tracking system or a mailing list. We also suggest keeping track of decisions taken concerning borderline examples (with a list of expressions to which each decision applies). These should be kept in a centralized document or page that all annotators can access.

    Whenever you think that a question can also be interesting to other languages, please notify the organizers and we will try to update this page.

    1. How to define an unexpected change in meaning?
    2. How to annotate lexicalized words which belong to contractions, compounds, and acronyms?
    3. How to annotate coordinated VMWEs sharing some components?
    4. How to annotate elliptical occurrences of VMWEs?
    5. How to annotate VMWEs that seem to belong to more than one category?
    6. How to annotate embedded VMWEs?
    7. Are existential expressions with there is/are considered VMWEs?
    8. How to categorize VMWEs which seem to be LVCs but do not pass all LVC tests?
    9. Why are verb+noun constructions with pure operator verbs (to commit, to make, to have etc.) considered LVCs?
    10. Does the IRV category include verbs with non-reflexive clitics?
    11. Should nominalizations of VMWEs be annotated?
    12. How to express hesitation between different VMWE categories?
    13. How to decide what the semantic arguments of a noun are in borderline cases?
    14. How does one decide if a more or less frozen determiner is a lexicalized VMWE component?
    15. Should I annotate compound and serial verbs as VMWEs? Of which category?
    16. If an LVC contains a complex (fixed) NP as a dependent, should I include the whole NP or just the head?
    17. In an LVC candidate, if the verb adds aspect to the predicative noun, does it imply failing Test LVC.3?
    18. In the LVC decision tree, should I test that the noun keeps its original meaning?

    1. How to define an unexpected change in meaning?

    See the glossary entry that defines unexpected change in meaning.

2. How to annotate lexicalized words which belong to contractions, compounds and acronyms?

In some languages, adpositions (pre- or postpositions), clitics and determiners are subject to contraction, i.e. they yield multiword tokens (MWTs). If the tokenizer splits contractions properly, only the lexicalized parts of each contraction should be annotated. In FLAT, split contractions are displayed twice, in folded and in unfolded form. Only the unfolded form should be annotated, e.g. Jean bénéficie du de le traitement Jean benefits from the treatment, Jean donne du de le grain à moudre à son fils Jean gives grain to grind to his son Jean gives an occasion to act to his son.

Sometimes, however, tokenizers might not handle contraction splitting properly. In this case, a lexicalized component of a VMWE can be merged with an external word:

  • haberse suicidado have+REFL suicided to have committed suicide
  • aller au (à+le) secours go to+the rescue to rescue

A similar problem occurs in languages with productive compounding, where a lexicalized component of a VMWE and a free modifier can form a single multiword token (since compound splitting might not be a standard feature of a tokenizer):

  • unter Drogeneinfluss stehen to be under the influence of drugs
    Heisshunger haben to have hot hunger to be ravenously hungry

Yet another related phenomenon concerns acronyms whose spelled-out versions may contain predicative nouns which in the abbreviated versions boil down to single letters:

  • the patient has AIDS (acquired immunodeficiency syndrome)
    the book underwent OCR (optical character recognition)
    the program carries out a PCA (principal component analysis)
  • el paciente presenta un SCA (síndrome coronario agudo)
  • le patient présente un SCA (Syndrome coronarien aigu)
    le patient fait un AVC (accident vasculaire cérébral)

Since the current annotation format is token-based, annotators must not correct tokenization errors or split compounds, for the sake of coherence. The annotation of such contractions, compounds and acronyms therefore has no fully satisfactory solution in our schema. We propose to annotate the whole MWT whenever it contains a word which is part of a VMWE. Annotators should add a textual comment about the mixed status of this MWT:

  • Drogeneinfluss → MWT containing a lexicalized VMWE component Einfluss and an external word Drogen
    Heisshunger → MWT containing a lexicalized VMWE component Hunger and an additional modifier heiss
  • haberse → MWT containing a lexicalized VMWE component se and an external word haber
3. How to annotate coordinated VMWEs sharing some components?

A component shared by two or more coordinated VMWEs should be annotated as belonging to both of them.

  • Regeln und Richtlinien aufstellen to set up rules and guidelines to draw up rules and guidelines aufstellen must be annotated both as part of Regeln aufstellen to lay down rules and of Richtlinien aufstellen to draw up guidelines
  • κάναμε βόλτες και ένα σωρό ψώνια στο εμπορικό κέντρο κάναμε we made must be annotated both as part of κάναμε βόλτες we made walks and of κάναμε ψώνια we were buying
  • to have a walk or a ride have must be annotated both as part of to have a walk and of to have a ride
  • darse un baño o una ducha give a bath or a shower to have a bath or a shower darse must be annotated both as part of darse un baño and of darse una ducha
  • hitz eta lan egin word and work do to speak and work egin must be annotated as part of both hitz egin and lan egin.
  • odprawić mszę i pokutę celebrate a mass and a penance odprawić should be annotated both as part of odprawić mszę to celebrate a mass and of odprawić pokutę to celebrate a penance
  • a cere cuiva explicații sau socoteală to ask someone.to explanations or account cere should be annotated both as part of cere explicații and of cere socoteală
  • imeti dober želodec in dobre živce to have a good stomach to bear something well and good nerves to be mentally strong imeti have must be annotated both as part of imeti dober želodec and of imeti dobre živce
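A shared component has to be recorded as belonging to two VMWEs at once. A token-set representation shows this directly. The sketch below is only an illustration: the `VMWE` class and its fields are hypothetical and do not describe the FLAT annotation model or any PARSEME file format. It encodes the English example to have a walk or a ride.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class VMWE:
    """One annotated VMWE occurrence (hypothetical representation)."""
    category: str
    tokens: FrozenSet[int]  # 0-based positions of the lexicalized components


# "have" (position 1) is shared, so it appears in both annotations.
sentence = ["to", "have", "a", "walk", "or", "a", "ride"]
annotations: List[VMWE] = [
    VMWE("LVC", frozenset({1, 3})),  # have ... walk
    VMWE("LVC", frozenset({1, 6})),  # have ... ride
]


def words(vmwe: VMWE) -> List[str]:
    """Return the lexicalized components in sentence order."""
    return [sentence[i] for i in sorted(vmwe.tokens)]


def shared_tokens(vmwes: List[VMWE]) -> FrozenSet[int]:
    """Return the token positions that belong to every VMWE in the list."""
    if not vmwes:
        return frozenset()
    return frozenset.intersection(*(v.tokens for v in vmwes))


for v in annotations:
    print(v.category, words(v))
# LVC ['have', 'walk']
# LVC ['have', 'ride']
```

Each VMWE is stored as a set of positions rather than a span, so a token can be in several VMWEs and a VMWE can be discontinuous.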
4. How to annotate elliptical occurrences of VMWEs?
Instances of a VMWE in which all lexicalized components but one were omitted or pronominalized should not be annotated. This concerns in particular cases where a nominal component is replaced by an anaphoric expression. For instance, in this decision was hard but he took it, we should not annotate take and decision, or take and it, as an instance of a VMWE. We annotate only the transformations in which the syntactic dependency link between the head verb and the lexicalized complement is preserved, e.g. the decision which he took.
5. How to annotate VMWEs that seem to belong to more than one category?

Such hesitation issues should normally be solved by the structural tests. For instance, consider the German expression sich eine Frage stellen SELF a question put to doubt. It may seem to belong to both IRV, since sich is required only if stellen co-occurs with Frage, and LVC, since Frage keeps its original meaning and stellen brings no additional meaning. However, test S.2 [1DEP] indicates that an expression like this should be annotated as a VID, since the verb has more than one lexicalized syntactic dependent.

Similarly, the French expression avoir peur have fear to be afraid seems to have features of a VID. Unlike most LVCs, it does not allow a determiner (*avoir une peur have a fear), except when the noun is modified (avoir une grande peur have a great fear). However, test S.4 [CATEG] in the generic decision tree 2 and the LVC-specific decision tree indicate that it belongs to the LVC category.

6. How to annotate embedded VMWEs?

Candidate VMWEs embedded in other VMWEs should be annotated only if they have a VMWE status also outside the particular context. For instance, the VMWE to let the cat out of the bag should be annotated as a VID, and its embedded VMWE to let out as a VPC.

On the other hand, in the French expression se faire des idées SELF make DET.PL ideas to imagine things which are not true, se faire should not be annotated as an IRV, since it is not inherently reflexive as a standalone verb+clitic combination.

7. Are existential expressions with there is/are considered VMWEs?

Hesitations about a possible LVC status can arise with respect to existential constructions with nouns introducing events or properties (see test LVC.1 [N-PRED]), as in:

  • es gibt Beschwerden there are complaints
  • υπάρχουν κατηγορίες there-are accusations there are accusations
  • there are complaints
  • hay quejas there are complaints
  • arazoak daude problems there-are there are problems
  • il existe des plaintes it exists DET.PL complaints there are complaints
  • há queixas has complaints there are complaints

In these constructions, the noun keeps its original sense and the existential verb to be or to have brings no additional meaning. However, a candidate LVC must also pass test LVC.4 [V-REDUC]. This test requires that, once the verb is omitted, the verb's subject can be realized as a modifier of the noun. That is impossible with impersonal and empty subjects like there. Therefore, such candidates cannot be LVCs.

Note, however, that existential expressions themselves can be VMWEs of the VID type. For instance, in the French example il y a des plaintes it there has complaints there are complaints, two dependents of the verb a has are lexicalized, il it and y there. Therefore, it is a VID (see test S.2 [1DEP]).

8. How to categorize VMWEs which seem to be LVCs but do not pass all LVC tests?

If at least one of the five LVC tests (LVC.0 to LVC.4) is not passed, the candidate is not considered an LVC. For the sake of deterministic VMWE categorization and higher inter-annotator agreement, we adopt a definition of LVCs which might seem more restrictive than the one usually assumed in linguistic studies. Thus, we exclude from the LVC scope:

  • expressions in which the verb's syntactic subject is not necessarily the noun's semantic subject, like to give courage or to make an impression. These candidates do not pass test LVC.4 [V-REDUC].
  • expressions where the lexicalized nominal dependent of the verb is its subject, as in the problem lies in something. These candidates do not pass test LVC.4 [V-REDUC].
  • expressions with aspectual verbs, as in to start, to pursue, to stop a walk. These do not pass test LVC.3 [V-LIGHT] since they add (aspectual) semantics to the noun. The only exception is when the noun itself is already aspectual, as in to come into bloom.
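The rule that a single failed test excludes a candidate from the LVC category is a plain conjunction. The sketch below illustrates it under one assumption: the tests are checked in numerical order from LVC.0 to LVC.4. The function names are hypothetical. The outcome of each test is still the annotator's judgment, and a test left unrecorded counts as not passed.

```python
from typing import Mapping, Optional

# Test identifiers as named in these guidelines; the order is assumed.
LVC_TESTS = ("LVC.0", "LVC.1", "LVC.2", "LVC.3", "LVC.4")


def first_failed_lvc_test(outcomes: Mapping[str, bool]) -> Optional[str]:
    """Return the first LVC test that is not passed, or None if all pass."""
    for test in LVC_TESTS:
        if not outcomes.get(test, False):  # unrecorded = not passed
            return test
    return None


def is_lvc(outcomes: Mapping[str, bool]) -> bool:
    """A candidate is an LVC only if every LVC test is passed."""
    return first_failed_lvc_test(outcomes) is None


# "to give courage": the verb's subject need not be the noun's semantic
# subject, so LVC.4 fails (see the first item above).
give_courage = {"LVC.0": True, "LVC.1": True, "LVC.2": True,
                "LVC.3": True, "LVC.4": False}
print(first_failed_lvc_test(give_courage))  # LVC.4
```

Returning the failed test, not just a boolean, is useful when writing a comment that explains why a borderline candidate was not annotated as an LVC.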
9. Why are verb+noun constructions with pure operator verbs (to commit, to make, to have etc.) considered LVCs?

Pure operator verbs, i.e. verbs which never have any semantics per se but only carry grammatical information (tense, mood etc.), seem to contradict the intuition behind a VMWE. Namely, they usually select a whole semantic class of nouns. For instance, to commit selects any negative act (a crime, a suicide, a theft) and to perform selects any activity (a task, an experiment, a miracle). In this sense, their complements resemble open slots and the whole combinations resemble collocations. However, for the sake of deterministic VMWE categorization and higher inter-annotator agreement, we do include verb+noun combinations with pure operator verbs, such as to commit a crime and to perform a task, in the LVC category. This is because such combinations pass all tests (LVC.0 through LVC.4). We found no other reliable test which would distinguish such productive cases from less productive ones like to make a decision. In particular, some studies (e.g. Bonial 2014) show that there are no truly productive light verbs. Therefore, all examples cited here are to be classified as LVCs.

10. Does the IRV category include verbs with non-reflexive clitics?

No, the IRV category only includes (some) combinations of a head verb with a reflexive clitic. As indicated in the borderline cases page of the IRV category, other pronouns, whenever lexicalized, trigger the VID category. Recall that whenever more than one dependent of the verb is lexicalized (whether or not one of them is a reflexive clitic), the VMWE is always categorized as a VID:

  • sich Fragen stellen SELF questions put to doubt
  • s'en aller SELF of-there go to leave
  • ucvreti jo to escape her to escape something/someone by running
11. Should nominalizations of VMWEs be annotated?

The only nominal VMWE variants within our annotation scope are those:

  • headed by the gerund stemming from the head verb of the VMWE - taking of the decision, and
  • in which a noun stemming from a VMWE is modified by a participle or a relative clause headed by the verb stemming from the same VMWE - the decisions taken yesterday, the decision which he took.

Other nominalizations are excluded:

  • Wortbruch word-break a promise which has not been kept
  • a break-down, a forget-me-not
  • toma de decisiones taking of decisions decision making
    puesta a punto setting to point set-up
  • izen-emate, esker-egite name-giving, thanks-doing inscription, thanks-giving
  • la prise en compte the taking into account the fact of taking something into account, peut-être may-be maybe, porte-feuilles carry-sheets wallet
  • zabawa czyimś kosztem a play at someone else's expense derived from bawić się czyimś kosztem to enjoy oneself at someone else's expense
  • un pierde-vară a loses-summer a lazy person
  • šala na tuj račun a joke at someone else's expense derived from šaliti se na tuj račun to play a joke on someone

For practical reasons (e.g. compatibility with an existing annotation, or usefulness for a particular application), they can be considered language-specific VMWEs. In that case, a new category should be defined for them, so as to keep the universal and quasi-universal categories intact.

12. How to express hesitation between different VMWE categories?

Once identified in a text, each VMWE is to be assigned to exactly one category. Note that in this version of the guidelines we no longer admit "hesitation labels" (e.g. LVC/VID) used in the pilot annotation. Hesitation can, however, be expressed in a comment and a particular value of the annotator's confidence assigned to a particular VMWE occurrence.

13. How to decide what the semantic arguments of a noun are in borderline cases?

The goal of test LVC.1 is to identify whether a noun is predicative, that is, whether it requires at least one semantic argument. For many classes of abstract nouns, however, the test can be tricky to apply. We advise listing in a separate document the classes of nouns that pass test LVC.1 in your language. Language teams can also provide links to the documentation of semantic annotation projects such as NomBank for English, which usually include tests and descriptions that help identify semantic arguments.

We suggest considering that the following categories pass test LVC.1:

  • Illnesses, symptoms and health conditions:
    Ο Γιάννης έχει συνάχι = ο Γιάννης είναι άρρωστος (αρρώστεια is a hypernym of συνάχι)
    Relations:
    Ο Γιάννης έχει σχέση με κάποιον = Ο Γιάννης σχετίζεται με κάποιον
    Ο Γιάννης έχει επαφές με κάποιον = Ο Γιάννης επικοινωνεί με κάποιον (επικοινωνία is a synonym of επαφή)
    Mental content (internal to a cognizer):
    Ο Γιάννης έχει ανησυχία = Ο Γιάννης ανησυχεί
    Ο Γιάννης έχει μια ιδέα = Ο Γιάννης σκέφτεται (σκέψη is a synonym of ιδέα)
    Ο Γιάννης έχει την άποψη = Ο Γιάννης κρίνει (κρίση is a synonym of άποψη)
  • Illnesses, symptoms and health conditions:
    John has a flu = John is ill (illness is a hypernym of flu)
    Relations:
    John has contact with somebody = John contacts somebody
    John has an affair with somebody = John is involved with somebody (involvement is a synonym of affair)
    Mental content (internal to a cognizer):
    John has a worry = John worries
    John has an idea = John thinks (thought is a synonym of idea)
    John has an opinion = John believes (belief is a synonym of opinion)
  • Mental content (internal to a cognizer):
    Miha je v dvomih Miha is in doubts = Miha dvomi Miha doubts
    Miha je mnenja Miha is of opinion = Miha meni Miha believes
    Miha ima predstavo/pojma Miha has an idea = Miha meni Miha thinks (predstava, pojem are synonyms of idea in this context)

Please notice that events and states that have no semantic arguments do not pass test LVC.1, even if they have verbal/adjectival paraphrases:

  • Natural phenomena: rain, snow, tornado, flood, earthquake
    Informational content (external to a cognizer): information, news
  • Natural phenomena: dež, sneg, tornado, poplava, potres rain, snow, tornado, flood, earthquake
    Informational content (external to a cognizer): informacije, novice information, news

Finally, notice that not every combination of a verb and a predicative noun forms an LVC. The verb also needs to be "light", i.e. it must not add semantics to the noun. The remaining LVC tests check this.

14. How does one decide if a more or less frozen determiner is a lexicalized VMWE component?

Most of the time, it is easy to test whether a determiner is lexicalized by searching alternatives in corpora (or on the web). For instance, the is lexicalized in to kick the bucket because searches for other determiners (this, a, some, three, many, etc.) either do not return any result or return only literal uses of this verb phrase.

However, borderline cases do exist, in which alternatives are rare but possible, especially for LVCs and decomposable VIDs. For instance, while the standard form of the idiom spill the beans forbids some determiners (#spill three/twenty beans), it is possible to find some variation (spill these/many/all/my/his/more/no beans).
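A first pass of the corpus search described above can be automated: collect the single word that occurs between a form of the verb and the noun. The sketch below is only a rough aid based on an assumed regex approach. The helper `determiner_variants` is hypothetical, and it only sees the pattern VERB WORD NOUN, so it misses longer gaps like spilled all the beans. It also cannot tell idiomatic uses from literal ones, so every hit still has to be checked by hand.

```python
import re
from collections import Counter
from typing import Iterable


def determiner_variants(text: str, verb_forms: Iterable[str], noun: str) -> Counter:
    """Count the single word found between a verb form and a noun."""
    verbs = "|".join(map(re.escape, verb_forms))
    pattern = re.compile(rf"\b(?:{verbs})\s+(\w+)\s+{re.escape(noun)}\b",
                         re.IGNORECASE)
    return Counter(m.group(1).lower() for m in pattern.finditer(text))


# A made-up mini corpus, for illustration only.
corpus = ("He spilled the beans. She spills the beans again. "
          "They finally spilled all the beans. Nobody spilled his beans.")
print(determiner_variants(corpus, ["spill", "spills", "spilled"], "beans"))
# Counter({'the': 2, 'his': 1})
```

If one determiner accounts for almost all idiomatic hits, it is a candidate lexicalized component. A spread of variants points towards leaving the determiner out of the annotation, as argued below.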

We argue that the selection of some determiners (but not all) by a VMWE is comparable to the selection of prepositions by verbs. Thus, it can be seen as a regular grammatical phenomenon, suggesting that when the determiner varies, it should not be included in the annotation scope. Possessive pronouns (my, her, their, etc.) and reflexive clitics (myself, herself, themselves, etc.) are exceptions to this rule (see also Section 1.4). Namely, when they are constrained to agree in number and person with the subject (I do my best, *I do your best), they are realized by different lexemes, i.e., strictly speaking, they are not lexicalized. We consider, however, that, with respect to lexicalization, they constitute single lexemes inflecting for number and gender.

Particular language teams may of course adopt their own criteria for annotating partly frozen determiners. These decisions should then be documented in language-specific guidelines.

15. Should I annotate compound and serial verbs as VMWEs? Of which category?

It depends. In many Indo-European languages (including Germanic, Romance and Balto-Slavic families), verbal chains using auxiliary and modal verbs are used to express tense, mood, modality and aspect. This is a regular linguistic phenomenon, fully productive, that can be applied to any verb and should not be annotated at all.

On the other hand, some languages have idiomatic compound and serial verbs, that is, VMWEs whose lexicalized components are two verbs, neither of which expresses tense, mood, modality and/or aspect with respect to the other. Therefore, we have created a new category in edition 1.1 to annotate these constructions, called multi-verb construction (MVC), covering examples such as:

  • will sagen want to say that is to say
  • to let go
    to make do
  • querer decir want say to mean
  • laisser tomber let fall to give up
    vouloir dire want say to mean
  • lasciar andare let go to unhand
    voler dire want say to mean
  • dać komuś żyć to let someone live not to bother someone
    można wytrzymać one can stand the situation is reasonably good
  • querer dizer want say to mean
    ouvir falar hear speak to know/remember vaguely
16. If an LVC contains a complex (fixed) NP as a dependent, should I include the whole NP or just the head?

The guidelines specify that only lexicalized components should be annotated. Therefore, we suggest that, in such cases, if the NP is compositional, only the head of the NP be included in the scope of the LVC. This may lead to the annotation of odd LVCs that never actually occur by themselves without a modifier. This is not a problem, and it is already the case for other VMWEs, e.g. those that only occur with a determiner that is not lexicalized. The NP should be included as a whole only if the complement is a non-compositional MWE, so that annotating only its head would make no sense.

  • παίζω το χαρτί του ευρωσκεπτικισμού to-play the paper the.SG.GEN euroscepticism.SG.GEN to use the asset of euroscepticism, to use euroscepticism as an asset
    κάνω στάση εργασίας to-make stop work.SG.GEN to go on strike, to strike → the expression στάση εργασίας is non-compositional (term)
  • darse una larga ducha caliente give.self a long shower hot to have a long and hot shower
  • présenter un Syndrome Coronarien Aigu to present an acute coronary syndrome
    mener une vie de débauche to have a life of pleasures
    faire un faux pas make a false step to commit a faux pas → the expression faux pas is non-compositional
  • mieć wyrzuty sumienia to have reproaches of the conscience to feel guilty
  • fazer uma sessão de fotos/autógrafos to make a photo/autograph session
    fazer roleta russa to make russian roulette to play russian roulette → the expression roleta russa is non-compositional
    ter uma situação financeira/profissional/estável to have a financial/professional/stable situation

Notice that these suggestions also apply to LVCs whose nominal complements are introduced by prepositions (i.e. verb+PP LVCs). As usual, the preposition should be included if it is lexicalized and then the NP introduced by the preposition is analyzed exactly as described above.

If the complex dependent is an acronym, you may want to add the textual comment "PART" to indicate that only part of the full version is lexicalized (generally, the head), just like for contractions and compounds.

17. In an LVC candidate, if the verb adds aspect to the predicative noun, does it imply failing Test LVC.3?

Depending on the language, aspect can be realised by various lexical, morphological and syntactic means.

  1. We consider aspect a morphological feature in the following cases:
    • Perfective or continuous aspect introduced by inflection and/or analytical tenses:
      • John was making a presentation
        he called her while having a walk
    • Perfective or imperfective aspect inherent to the verb (independently of its inflected form), recognisable either by a prefix or by an ending:
      • pełnić rolę fulfil.IMPERF a role to play a role
        wypełnić rolę fulfil.PERF a role to play a role
        wypełni rolę fulfil.PERF a role to play a role
      • Taja je postavljala vprašanja Taja was asking questions
        ves čas je dajal napačne napovedi he was always giving wrong forecasts
  2. We consider aspect a semantic feature in the following cases:
    • Starting, continuation or completion is expressed by specific verbs which usually modify other verbs:
      • η Μαρία άρχισε τη συζήτηση Maria started the conversation
        ο Γιάννης διέκοψε την κουβέντα John interrupted the discussion
      • Anthony started his presentation in advance
        the weather interrupted the transmission twice
        we kept our show regardless of the reactions
      • Tomaž je začel svoje predavanje Tomaž started his lecture
        Politik je nadaljeval svojo napoved reform the politician continued his forecast about reforms
        naredili bomo konec onesnaževanju we will make end to pollution we will put an end to pollution

In Test LVC.3, we verify whether the verb is light, i.e. whether it adds at most "light" semantics to the predicative noun. When aspect is expressed as a morphological feature, as in the first item above, we consider that the verb is light and test LVC.3 is passed. However, when aspect is a semantic feature rather than a morphological one, test LVC.3 fails and the candidate is not an LVC.

18. In the LVC decision tree, should I test that the noun keeps its original meaning?

The previous version (1.0) of the annotation guidelines contained Test 10 [N-SEM], which checked if the noun in an LVC candidate preserves one of its original senses. If it did not, the candidate was not an LVC.

In the current version of the guidelines we have abandoned this test because:

  • it proved hard to establish the list of original senses of a noun,
  • this test was superfluous with respect to Test LVC.4 [V-REDUC],
  • in some verbal idioms (VIDs) the noun also keeps its original sense, so the test can be misleading for the LVC vs. VID distinction.

Section 7.2

Adding new examples in your language

It is often useful to have examples of a phenomenon shown in your own language. We collect these examples for each language using an online shared spreadsheet, and we present these examples as in the template below:

  • MWEs with their lexicalized components in Arabic are indicated like this.
  • MWEs with their lexicalized components in Bulgarian are indicated like this.
  • MWEs with their lexicalized components in Czech are indicated like this.
  • MWEs with their lexicalized components in German are indicated like this.
  • MWEs with their lexicalized components in Greek are indicated like this.
  • MWEs with their lexicalized components in English are indicated like this.
  • MWEs with their lexicalized components in Spanish are indicated like this.
  • MWEs with their lexicalized components in Basque are indicated like this.
  • MWEs with their lexicalized components in Farsi are indicated like this.
  • MWEs with their lexicalized components in French are indicated like this.
  • MWEs with their lexicalized components in Irish are indicated like this.
  • MWEs with their lexicalized components in Hebrew are indicated like this.
  • MWEs with their lexicalized components in Hindi are indicated like this.
  • MWEs with their lexicalized components in Croatian are indicated like this.
  • MWEs with their lexicalized components in Hungarian are indicated like this.
  • MWEs with their lexicalized components in Indonesian are indicated like this.
  • MWEs with their lexicalized components in Italian are indicated like this.
  • MWEs with their lexicalized components in Japanese are indicated like this.
  • MWEs with their lexicalized components in Lithuanian are indicated like this.
  • MWEs with their lexicalized components in Maltese are indicated like this.
  • MWEs with their lexicalized components in Polish are indicated like this.
  • MWEs with their lexicalized components in Portuguese are indicated like this.
  • MWEs with their lexicalized components in Romanian are indicated like this.
  • MWEs with their lexicalized components in Slovene are indicated like this.
  • MWEs with their lexicalized components in Swedish are indicated like this.
  • MWEs with their lexicalized components in Turkish are indicated like this.
  • MWEs with their lexicalized components in Chinese are indicated like this.

Examples are preceded by the 2-letter language code in parentheses (e.g. EN for English). You can control what languages are shown and hidden by toggling the header buttons. See the section on notation for more information.

In order to see the ID of all examples, make sure the ID button is toggled on the header of the current page. Now look at the template above. You should see this ID: 7.2_A_template-mwe. The 7.2 represents the current section number (in bold in the TOC on the left). The letter A (or B, C, D...) indicates the position of the example inside this page. The name template-mwe is a more human-readable identifier for this example.
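The three parts of an example ID described above can be split mechanically, e.g. when matching IDs against spreadsheet rows. This is a minimal sketch with assumptions about the format. The section is taken to be a dotted number and the position one or more capital letters, and the name is the remainder after the second underscore. `ExampleId` and `parse_example_id` are hypothetical names.

```python
import re
from typing import NamedTuple


class ExampleId(NamedTuple):
    section: str  # e.g. "7.2"
    order: str    # e.g. "A"
    name: str     # e.g. "template-mwe"


# Assumed format: <section>_<position letter(s)>_<human-readable name>
ID_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)_([A-Z]+)_(.+)$")


def parse_example_id(example_id: str) -> ExampleId:
    """Split an example ID into section, position and name."""
    match = ID_PATTERN.match(example_id)
    if match is None:
        raise ValueError(f"not an example ID: {example_id!r}")
    return ExampleId(*match.groups())


print(parse_example_id("7.2_A_template-mwe"))
# ExampleId(section='7.2', order='A', name='template-mwe')
```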

The spreadsheet

The spreadsheet can be accessed through this link to Google Docs. From time to time, the guidelines will be updated based on the contents of the spreadsheet.

The spreadsheet is divided into the following columns: ID-section, ID-order, ID-name, lang, HTML-example and Status. In order to edit an example, look up its ID and then find the corresponding rows in the spreadsheet. For example, for the ID 7.2_A_template-mwe, look for the rows with ID-section 7.2 (towards the bottom of the spreadsheet). Then look for ID-order A in the second column. Check that the third column contains the ID-name template-mwe.
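The lookup described above can also be done on an exported copy of the sheet. The sketch below assumes a CSV export whose header row contains the six column names listed above. The export string and the `XX` language code are made up for illustration, and `find_examples` is a hypothetical helper. Empty cells are skipped because they mean that no example is provided.

```python
import csv
import io
from typing import Dict, Iterable, Mapping


def find_examples(rows: Iterable[Mapping[str, str]],
                  section: str, order: str, name: str) -> Dict[str, str]:
    """Map each language code to its non-empty HTML-example for one ID."""
    wanted = (section, order, name)
    return {
        row["lang"]: row["HTML-example"]
        for row in rows
        if (row["ID-section"], row["ID-order"], row["ID-name"]) == wanted
        and row["HTML-example"].strip()  # empty cell = no example
    }


# Illustrative export: the English template row, plus a hypothetical
# language "XX" whose cell was left empty.
EXPORT = (
    "ID-section,ID-order,ID-name,lang,HTML-example,Status\n"
    "7.2,A,template-mwe,EN,MWEs with <lex>their lexicalized components</lex>"
    " in English are indicated like this.,\n"
    "7.2,A,template-mwe,XX,,\n"
)

rows = list(csv.DictReader(io.StringIO(EXPORT)))
print(find_examples(rows, "7.2", "A", "template-mwe"))
```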

You will then see a sequence of examples, one for each language. The examples in the template above were collected from this spreadsheet. The rest of this page explains how to add your own examples to this spreadsheet.

When adding examples for your own language, we advise you to always start by copying an example that has already been filled in for another language, and then adapting it to your language. Remember that you should not translate an example, but rather find an example of the target phenomenon in your language, regardless of whether it is a direct translation. Therefore, before entering an example in the spreadsheet, you should always check its context using its ID. A quick way to do this is to search (Ctrl+F) for the ID of the example in the full-text version of the guidelines (with the ID button on).

If we notice something wrong or suspicious with your example, we may correct it (e.g. you forgot a closing <lex> tag). If we cannot correct the example, we will ask you to check it by using the last column of the spreadsheet, Status.

If you think that a phenomenon is not relevant for your language or that examples are not needed for a given phenomenon, just leave the corresponding cell empty.

Examples with tags

If you have not done it yet, open the spreadsheet and look for the entry 7.2_A_template-mwe. Let us analyse the English example (look for EN in the fourth column). The fifth column should read as follows:

MWEs with <lex>their lexicalized components</lex> in English are indicated like this.

As you can see, this is exactly the same text that was shown in the template above, except that the lexicalized components are surrounded by the tags <lex> and </lex>. When writing an example, you will often have to use XML tags. We describe below the most important ones.

Bold: you should surround lexicalized components with the tags <lex> and </lex>. For example, consider the code He will <lex>take</lex> a <lex>shower</lex>. This code is presented as follows:

  • He will take a shower
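Examples are written with XML-style tags, so a standard XML parser can recover the lexicalized components. It can also detect a forgotten closing tag. This minimal sketch wraps the snippet in a dummy root element `<ex>`, which is an assumption made here and not part of the spreadsheet format. It uses Python's `xml.etree.ElementTree`. Examples that contain HTML entities such as `&nbsp;` are not valid XML and would be rejected.

```python
import xml.etree.ElementTree as ET
from typing import List


def lexicalized_components(example: str) -> List[str]:
    """Return the text of every <lex> element, in document order."""
    # Wrap the snippet so that mixed text and tags form one XML element.
    root = ET.fromstring(f"<ex>{example}</ex>")
    return ["".join(el.itertext()) for el in root.iter("lex")]


def is_well_formed(example: str) -> bool:
    """False if, for instance, a closing </lex> tag is missing."""
    try:
        ET.fromstring(f"<ex>{example}</ex>")
    except ET.ParseError:
        return False
    return True


print(lexicalized_components("He will <lex>take</lex> a <lex>shower</lex>."))
# ['take', 'shower']
```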

Red: By default, all examples are typeset using the language's color. Sometimes, examples contain counter-examples, that is, something that looks like a VMWE but that should not be annotated. The <nmwe> and </nmwe> tags can be used to represent these non-MWEs, which will be shown in red. For example, the code <nmwe>This is not an MWE</nmwe> yields the following:

  • This is not an MWE

Underlining: Some examples use underlining to focus on some of the words. This can be done with the tags <u> and </u>. For example, the code <nmwe>This is <u>not</u> an MWE</nmwe> yields the following:

  • This is not an MWE

Latin-script transcription: You can optionally provide a Latin-script transcription if your language does not use the Latin alphabet. Latin-script transcriptions must be surrounded by the tags <latin> and </latin>. For example, the code الدرس <latin>ad-dars</latin> generates the example below. The Latin transcription should always appear after the example in the original script, and before glosses and translations.

  • الدرس ad-dars

Gloss icon: You should also provide English glosses and translations for your examples. Glosses and translations should always be provided in English, and never in another language. Glosses must be surrounded by the tags <gl> and </gl>. Translations must be surrounded by <trans> and </trans>. English examples can also use the tag <trans> to indicate the meaning of an idiomatic expression. For example, the code <lex>défendre</lex> son <lex>bifteck</lex> <gl>defend one's beefsteak</gl> <trans>to defend one's interests</trans> generates the example below. Notice that the gloss and translation are only shown when the user hovers over the gloss icon. For consistency, you should always follow this order: original text <latin>transcription (optional)</latin> <gl>the gloss</gl> <trans>the translation</trans>.

  • défendre son bifteck defend one's beefsteak to defend one's interests

Normal: Some examples are followed by an explanation in normal font (black). This is done by using the tags <n> and </n>. For example, the code some words <n>→ further details</n> generates this:

  • some words → further details

Newline: Sometimes, one may want to add several examples for a single phenomenon in the same language. If they are rather long, they should be presented on separate lines using the tag <br/>. This tag is special as it does not come in pairs: you only write one tag with the slash at the end (technically, it is an empty XML element). For example, the code example 1 <br/> example 2 <br/> example 3 will be rendered as follows:

  • example 1
    example 2
    example 3

Inside normal text, you may also use tags such as <i> (italics), <strong> (bold), as well as other HTML tags. If another language is using a given tag for an example, you can use it too. Otherwise, try to stick to the established conventions.
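The tag conventions above can be checked mechanically before examples are submitted. The sketch below is illustrative only and is not part of the PARSEME tooling: it assumes that an example's code is a well-formed XML fragment, wraps it in a hypothetical <ex> root element, extracts the <lex> components and reports a violation of the prescribed order <latin>, <gl>, <trans>. Only the relative order of these three tags is checked.

```python
import xml.etree.ElementTree as ET

# Tags that must follow the example text in this relative order
# (<latin> is optional).
ORDERED_TAGS = ("latin", "gl", "trans")


def parse_example(code):
    """Parse the markup of one example; raise ValueError on a violation."""
    # Wrap the code in a hypothetical <ex> root so that text mixed with
    # tags forms a single well-formed XML element.
    try:
        root = ET.fromstring(f"<ex>{code}</ex>")
    except ET.ParseError as err:
        raise ValueError(f"malformed markup: {err}") from err

    # Rank of each ordered tag among the direct children, in document order.
    ranks = [ORDERED_TAGS.index(child.tag) for child in root
             if child.tag in ORDERED_TAGS]
    if ranks != sorted(ranks):
        raise ValueError("expected order: <latin>, <gl>, <trans>")

    return {
        "lexicalized": [lex.text for lex in root.iter("lex")],
        "gloss": root.findtext("gl"),
        "translation": root.findtext("trans"),
    }
```

For instance, parse_example("He will <lex>take</lex> a <lex>shower</lex>") returns ["take", "shower"] as lexicalized components, with no gloss or translation.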


Section 7.3

Annotation platform FLAT

The annotation will be performed using the online annotation platform FLAT. The documentation of the annotation platform is provided in a separate document.


Section 7.4

Best practices

Annotating VMWEs in text is a hard task. Many tests are semantic and require not only strong knowledge of the language, but also knowledge of advanced notions in linguistics. As a consequence, ensuring annotation quality and, above all, intra- and inter-annotator consistency is a challenge. Here we provide a set of hints that can help you optimize the annotation effort and ensure the quality of the resulting corpus.

Resources and people

This website only covers the annotation guidelines. Do not forget that many other resources are available on the PARSEME shared task 1.1 website. That website is not for system authors, but for language leaders, annotators and organizers. It contains much useful information, notably the names and contacts of people who can help you, and user manuals for FLAT, for language leaders, etc. You can also use the mailing lists if you need to ask questions that could be relevant for other teams as well. In short, don't be shy to ask if you would like to do something but you're not exactly sure where to start :-)

NotVMWE label

The new FLAT configurations for edition 1.1 allow you to use an optional annotation label called NotVMWE. This is not a new VMWE category, but an auxiliary label which simply means "this is not a VMWE". You can use it to indicate that something should not be annotated, especially if it is a borderline case. Adding this annotation allows you to add a textual comment explaining why you decided not to annotate this construction (e.g. after discussing it with fellow annotators and recording the decision in the list of solved cases).

While you don't need to use this label, we recommend that you use it for challenging/hard cases which, in the end, you decide not to annotate as a VMWE. This kind of annotation will be useful when performing consistency checks. Of course, NotVMWE labels will all be removed in the final released corpora, since this kind of information is irrelevant for shared task participants.
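If annotations are exported from FLAT for review, NotVMWE decisions can be kept for consistency checks and dropped for release. The following sketch assumes a hypothetical Annotation record (lemmas, label, comment); it does not reflect FLAT's actual export format, and the function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one annotated span (not FLAT's format)."""
    lemmas: tuple   # lemmas of the annotated tokens, e.g. ("take", "shower")
    label: str      # e.g. "VID", "LVC.full" or the auxiliary "NotVMWE"
    comment: str = ""


def for_release(annotations):
    """Drop the auxiliary NotVMWE label, which is removed before release."""
    return [a for a in annotations if a.label != "NotVMWE"]


def negative_decisions(annotations):
    """Map each candidate marked NotVMWE to the comment justifying it."""
    return {a.lemmas: a.comment for a in annotations if a.label == "NotVMWE"}
```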

List of solved cases

In edition 1.0, some language teams ensured consistency by keeping a separate shared document (e.g. a Google spreadsheet) where hard/challenging cases were documented. We advise language leaders to implement such a list of solved cases. It allows all annotators to contribute to the discussion of hard cases and to reach a common decision that can later be applied systematically to all occurrences of the expression and to similar expressions. In our experience, this greatly enhances annotator satisfaction and saves valuable time during the consistency checks. Even in languages with a single annotator, she/he can keep a personal list of difficult cases and the decisions taken, to ensure intra-annotator consistency.

Consistency checks

Once all files have been annotated, language leaders will perform the final consistency checks using semi-automatic tools. During these consistency checks, all occurrences of the same expression, across all annotators, are shown together. Language leaders may then change annotations performed by individual annotators if they are inconsistent with the other annotations. Therefore, do not worry too much if you are unsure about an annotation. Try to be as consistent as possible, but if you do not remember a particular annotation performed earlier, it is not necessary to search through the corpus on FLAT (this is quite time-consuming). Minor inconsistencies will probably be corrected later by the language leader. But do note your decision in the list of solved cases so that the next time you come across the same expression (or a similar one), you do not spend so much time thinking about it.
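The grouping performed during consistency checks can be approximated as follows. This is a minimal sketch, not the semi-automatic tools actually used by language leaders: it assumes annotations are available as hypothetical (annotator, lemmas, label) triples and reports every expression that received more than one label.

```python
from collections import defaultdict


def inconsistent_expressions(annotations):
    """Return expressions that received more than one label.

    annotations: iterable of (annotator, lemmas, label) triples, a
    hypothetical export format in which lemmas is a tuple of strings.
    Result: {sorted lemmas: {label: [annotators who used it]}}.
    """
    by_expression = defaultdict(lambda: defaultdict(list))
    for annotator, lemmas, label in annotations:
        # Sorting the lemmas groups word-order variants of one expression.
        by_expression[tuple(sorted(lemmas))][label].append(annotator)
    return {
        expr: dict(labels)
        for expr, labels in by_expression.items()
        if len(labels) > 1
    }
```

Unannotated occurrences of an expression are not visible to this sketch; it only compares the labels that were actually assigned.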

Intuition and tradition vs. guidelines

You may sometimes (often) find that the guidelines do not reflect your intuition about a given construction, or that they contradict the linguistic tradition and literature in your language. We understand that this is frustrating, but please, remember that our main objective is achieving universal modelling of MWEs while preserving diversity. Therefore, please refrain from using undocumented criteria (a.k.a. intuition), or tests that are only known/documented in your language.

The guidelines were designed taking feedback from many language teams into account. They are also meant to evolve continuously, and we count on you to play an active role in this process. Therefore, if you disagree with their current version, please choose one of the following two options:

  • Follow the guidelines anyway to ensure corpus-to-guidelines consistency, but express your criticism (documented with glossed and translated examples in your language), preferably via GitLab issues. You may also add comments to those annotations which you would like to modify once the guidelines have been enhanced.
  • Create a language-specific section for the guidelines, describing your own tests and decision trees. We will be happy to publish it online.

Inter-annotator agreement

Usually, data annotation campaigns require measuring inter-annotator agreement (e.g. kappa) to verify that the guidelines are clear and that the annotators are well trained. We encourage language teams to measure inter-annotator agreement. However, in the PARSEME shared task, the organizers do not set any hard threshold on the kappa value required to accept your annotations as part of the shared task. This is a collaborative effort, so we do not feel comfortable imposing such requirements on language teams.

Furthermore, VMWE annotation is a very hard task, so inter-annotator agreement is expected to be low. We recommend that language teams use complementary tools and resources to compensate for low agreement, such as the list of solved cases and the consistency checks mentioned on this page. After the annotation is completed, we may ask you to double-annotate a sample of your data so that we can calculate inter-annotator agreement, for instance, to report it in a corpus description article. But you should not worry too much about this: do your best to understand the guidelines, do not hesitate to suggest improvements, and try to train annotators as much as possible, for instance, with pilot annotations and discussions. This way, you will help ensure that the data released in the shared task for your language is of high quality. And remember that you will have the opportunity to improve it incrementally for the next shared task.
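For teams that decide to measure agreement, Cohen's kappa for two annotators can be computed as below. This sketch assumes both annotators labelled the same, fixed list of candidates (e.g. "VMWE" vs. "no"); it sidesteps the harder problem of matching freely annotated spans, and it is not necessarily the measure the organizers would use.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)
```

With a = ["VMWE", "VMWE", "no", "no"] and b = ["VMWE", "no", "no", "no"], observed agreement is 0.75 and chance agreement is 0.5, so kappa is 0.5.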

TODO label

We have introduced a new label on FLAT called "{change-me} TODO". This label is a temporary mark-up used to indicate that a given VMWE must be dealt with by a human annotator. It will be used when a corpus is automatically converted and some annotations must be manually checked. For instance, the OTH category from shared task 1.0 disappeared in edition 1.1. Therefore, all VMWEs annotated as OTH in the 1.0 corpora will be automatically converted to the TODO label. This means that all TODO labels must be changed into a valid new category (e.g. VID). In the final annotated corpora, any remaining TODO label will be removed, since this is not actually a VMWE category but just an auxiliary label.
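The automatic conversion described above can be pictured as a relabeling step. This is a minimal sketch, assuming annotations are hypothetical (lemmas, label) pairs; only the OTH-to-TODO rule stated in this section is implemented, and the function names are illustrative.

```python
def convert_oth(annotations):
    """Relabel edition-1.0 OTH annotations as TODO; keep other labels."""
    return [(lemmas, "TODO" if label == "OTH" else label)
            for lemmas, label in annotations]


def pending(annotations):
    """List the expressions still waiting for a valid 1.1 category."""
    return [lemmas for lemmas, label in annotations if label == "TODO"]


def drop_todos(annotations):
    """Remove any remaining TODO marks, as done for the final corpora."""
    return [(lemmas, label) for lemmas, label in annotations if label != "TODO"]
```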

Existence questions and corpus queries

Some tests ask whether it is possible or impossible to find some attested variant of a candidate. While in many cases this is straightforward (the variant can be easily found), some borderline cases will inevitably occur in which it is hard to tell whether a given variant is impossible or just very rare.

Decisions for hard cases like this should not be made based solely on introspection and intuition. In case of doubt, we recommend that annotators:

  1. check existing lexicons for their languages
  2. perform corpus queries using any available large raw monolingual corpus
  3. run web queries, e.g. using Sketch Engine, Linguee or plain Google
  4. discuss the case with other annotators, reach a decision and mark it in the list of solved cases

In all cases, the list of lexicons, monolingual corpora and/or web platforms to consult should be agreed upon in advance by all annotators.
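Step 2 above can be as simple as counting matches of a canonical form and of a variant in a raw corpus. The sketch assumes the corpus is available as an iterable of text lines and that the patterns are hand-written regular expressions; the function names are hypothetical. A count of zero only shows that the variant is unattested in that corpus, not that it is impossible.

```python
import re


def count_attested(corpus_lines, pattern):
    """Count the corpus lines that contain a match of a regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    return sum(1 for line in corpus_lines if regex.search(line))


def compare_variant(corpus_lines, canonical, variant):
    """Return (canonical count, variant count) for two hand-written patterns."""
    lines = list(corpus_lines)  # allow a one-shot iterator such as a file
    return count_attested(lines, canonical), count_attested(lines, variant)
```

Such counts only inform the decision; the final call should still be discussed and recorded in the list of solved cases.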


Section 8

Glossary

Candidate VMWE

A candidate VMWE is a group of tokens that seems to have some idiosyncrasy of the type listed in the MWE definition. However, further tests are required to decide whether it should be annotated as a true VMWE or whether it is a false alarm. The lexicalized elements of candidate VMWEs are highlighted in bold.

Collocation

A collocation is a word co-occurrence whose idiosyncrasy is of statistical nature only. Collocations are not considered VMWEs in this task:

  • цените се покачват prices rise
    играя футбол to play football
  • eine Anfrage beantworten to answer a request, das Diagramm zeigt the diagram shows, mit einem Bus fahren to take a bus
  • the graphic shows
    drastically drop
  • responder a una petición to answer a request
    el diagrama muestra the diagram shows
    coger el tren to take the train
  • zalać rynek to flood the market to dominate the market
    przyznać rację to admit right to admit that someone is right
    uprawiać sport to practice sports
    wzruszać ramionami to shrug one's shoulders
  • občutno zmanjšati significantly reduce
    drastično zmanjšati drastically reduce
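Statistical idiosyncrasy is often quantified with association measures such as pointwise mutual information (PMI). The sketch below is our own illustration and not a PARSEME criterion: it computes PMI for an adjacent word pair in a tokenized text. A high score signals a statistical preference only, which by itself does not make a combination a VMWE.

```python
import math
from collections import Counter


def bigram_pmi(tokens, w1, w2):
    """PMI of the adjacent pair (w1, w2): log2(P(w1 w2) / (P(w1) P(w2)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    joint = bigrams[(w1, w2)]
    if joint == 0:
        return float("-inf")  # the pair never occurs adjacently
    n = len(tokens)
    p_xy = joint / (n - 1)  # there are n - 1 adjacent pairs
    p_x, p_y = unigrams[w1] / n, unigrams[w2] / n
    return math.log2(p_xy / (p_x * p_y))
```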

Cranberry word

A cranberry word is a token that does not have the status of a stand-alone word, has no proper distribution, and no stand-alone meaning, but it may have a syntactic category and an inflection paradigm. It only occurs in a particular expression (or a closed list of expressions) and can never be found in different contexts, as the underlined words below:

  • вземам на мушка някого/нещо take on target to criticise somebody/something heavily
  • jemandem Angst einjagen to-someone chase-in fear to frighten someone
    jemandem einen Besuch abstatten to pay someone a visit
  • to go astray
  • sin decir ni chus ni mus chus is not a stand-alone word without to_say neither chus nor mus without saying a word
    no decir ni chus ni mus chus is not a stand-alone word not to_say neither chus nor mus not to say a word, to remain silent
    hacer algo a troche y moche troche is not a stand-alone word to do something at troche and dulled to do something in a nonsensical way, willy-nilly, haphazardly
  • se mettre martel en tête SELF put a hammer in head to worry a lot
  • odsądzić kogoś od czci i wiary to refuse honor and faith to someone to drag sb's name through the mire/mud, to damage someone's reputation by saying insulting things about them
    sprawiedliwości stało się zadość justice has been done
  • pune pe roate - roate is a form found only in expressions in the literary language
  • biti si kvit owe nothing to somebody; each party got what it deserved/asked for

Extended nominal phrase

An extended nominal phrase (ENP) is a notion covering, in a universal way, various types of phrases which convey similar lexical relations in morpho-syntactically different ways (prepositions, post-positions, case markers, etc.), depending on the language. Extended NPs include:

  • noun phrases, i.e. phrases headed by a noun, with its possible syntactic modifiers/complements
    • въпрос question, зелена светлина green light
    • explanation, the dog, many old documents
    • explicación, el perro, muchos documentos antiguos
    • explication, le chien, quelques documents anciens
    • ludzie people, najbliżsi współpracownicy closest collaborators
    • razlaga, pes, številni stari dokumenti explanation, the dog, many old documents
  • prepositional phrases, in which a preposition directly governs a noun, or the opposite, depending on a particular linguistic theory
    • за здраве for (good) health
      преди всичко before everything
    • on the bed, after the lesson, in front of the window
    • en la cama, después de la clase, enfrente de la ventana
    • sur le lit, après le cours, devant la fenêtre
    • ze stanowiska from a position
      dla wszystkich for everyone
      z prawdziwego zdarzenia from a true event genuine
    • na postelji, po pouku, pred hišo, za steno on the bed, after the lesson, in front of the house, behind the wall
  • noun phrases with case markers
    • предавам богу дух give to god.GEN soul to die
    • n.a.
    • ludzi people.GEN, najbliższymi współpracownikami closest.INST collaborators.INST
    • n.a.
    • mačka cat (nominative), mačke cat (genitive), mački cat (dative), mačko cat (accusative), o mački cat (prepositional), z mačko cat (instrumental)
  • noun phrases with postpositions
    • n.a.
    • n.a.
    • n.a.
    • n.a.
    • n. a.

ENP is close to the UD understanding of the nominal phrase.

Particles

Particles are hard to distinguish from homographic prepositions:

  • ich schlage vor allen zu verzeihen I propose to forgive everyone
    ich schlage vor allen Dingen die Sahne I mix prior to anything the cream
  • to get up a petition
    to get up a hill
  • n.a.
  • jestem za I am for I am in favor
    jestem za ustawą I am for the law I am in favor of the law
  • n.a.
  • n. a.

The fundamental property to capture is that a preposition heads a prepositional group, while a particle functions as an adverbial. In some languages, particles can also be homographic with verbal prefixes:

  • das Schild um|fahren to drive over the sign
    den See umfahren to drive around the lake
  • n.a.
  • n. a.

Most tests discriminating particles from prepositions and prefixes are language-specific and should be proposed by the individual language team. See the guidelines on particles for more details.

Reflexive clitics

Reflexive clitics are a special type of object pronoun that refers to the subject of the verb. See the guidelines of IRV category for more details. In English, the reflexive is expressed as a suffix -self appended to object pronouns. However, many languages have special reflexive pronouns, which are a relatively small closed class of words:

  • се, си
  • mich, dich, sich, uns, euch
  • me, te, se, nos, os
  • me, te, se, nous, vous
  • mi, ti, si, ci, vi
  • się, sobie
  • me, te, se, nos, vos
  • mă/m-, te, se/s-, ne, vă/v-, se/s- (for accusative); îmi/mi-/-mi, îți/ți-/-ți, își/și-/-și, ne, vă/-vă/v-, își/și-/-și (for dative)
  • se, si

Semantic argument

A semantic argument of a predicative lexical unit (verb, noun, etc.) is a participant of the situation described by the predicative lexical unit that (a) can be realized as a syntactic dependent of the predicative lexical unit, (b) is semantically mandatory, and (c) is specific to that predicative lexical unit.

  • Semantically mandatory participants: a participant is semantically mandatory when it must be mentioned to specify the meaning of the predicative lexical unit. In other words, the realization of the predicative lexical unit implies the existence of its semantically mandatory participants. For instance, a visit cannot take place if there is no visitor or no visitee, courage is a property of a being, and a presentation implies the existence of a presenter, an audience and a presented topic. Some participants are not semantically mandatory: for instance, the addressee is not semantically mandatory for a whisper, because one can whisper without an addressee. We restrict semantic arguments to semantically mandatory participants because we believe that this restriction helps delimit the semantic arguments without resorting to the difficult syntactic argument/adjunct distinction, while not being detrimental to LVC tests. Notice that semantically mandatory participants do not necessarily occur in a sentence containing the predicative lexical unit, and can sometimes be omitted (e.g. due to coreference or ellipsis).
    • To define a заем loan one needs to mention two participants: the beneficiary and the source of the benefit. In other words, the existence of a loan implies the existence of its arguments.
    • To define a presentation one needs to mention three participants: the presenter, the audience and the topic of the presentation. In other words, the existence of a presentation implies the existence of its arguments.
    • To define an opinión opinion one needs to mention two participants: the person who has the opinion and the topic. In other words, the existence of an opinión implies the existence of its arguments.
    • To define a conseil advice one needs to mention two participants: the adviser and the advised person. In other words, the existence of a conseil implies the existence of its arguments.
    • To define a dochód profit one needs to mention two participants: the patient who benefits and the source of the benefit. In other words, the existence of a dochód implies the existence of its arguments.
    • To define an opinião opinion one needs to mention two participants: the person who has the opinion and the topic. In other words, the existence of an opinião implies the existence of its arguments.
    • To define a prezentare presentation one needs to mention three participants: the one who presents, the topic of the presentation and the person to whom the topic is presented. In other words, the existence of a prezentare implies the existence of its arguments.
    • priti v poštev to come into consideration to be considered
      imeti mnenje to have an opinion to believe
  • Specific participants: some semantically mandatory participants are generic, and we do not consider them to be semantic arguments. For instance, the existence of a presentation implies that it occurred at a given time and place, so these are semantically mandatory participants. However, time and place are implicit to any event, and are not specific to the predicative noun presentation. Participants that denote non-specific characteristics of the predicative lexical unit, and can thus be interpreted independently of it (for a large class of predicative lexical units), such as time, place and manner for most predicates, are not considered semantic arguments.

Semantic arguments are generally mentioned in the dictionary definition of a predicative lexical unit. Useful sources for determining the semantic arguments of a given lexical unit are semantic lexicons such as FrameNet and PropBank. Our definition of semantic argument is closely related to FrameNet's core frame elements. Language teams are encouraged to use available resources and/or to provide language-specific documentation to help identify semantic arguments.

Subcategorization frame

A subcategorization frame of a verb describes how syntactic arguments are realized as the verb's dependents, for a given sense of the verb. A subcategorization frame indicates morphological and syntactic features of a verb's dependents, namely the required prepositions, postpositions and case markers of the subject, direct and oblique objects. For instance, one subcategorization frame for to return meaning to give back would be:

  • return: [NP]subject + [NP]direct object + [to NP]oblique
    • Example: [my sister]subject returned [the book]direct-object [to the library]oblique

Notice that the semantic characteristics of the dependents (a.k.a. selectional restrictions or preferences) are not considered part of the subcategorization frame. For instance, whether the subject is animate (somebody) or inanimate (something) is irrelevant for subcategorization frames. Verbs can have many senses and each sense can have many subcategorization frames. For instance, the verb to return in the same sense can also be used with the subcategorization frames NPsubject + NPdirect-object ([my sister]subject returned [the book]direct-object) and NPsubject + NPoblique + NPdirect-object ([my sister]subject returned [me]oblique [the book]direct-object).
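One way to store subcategorization frames is sketched below. The Slot and Frame classes and the describe function are hypothetical; they record only the syntactic function and the required marker of each dependent, so selectional restrictions such as animacy are deliberately left out, in line with the definition above. The three frames encode the examples given for to return in the sense to give back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One dependent in a frame: syntactic function plus required marker."""
    function: str     # "subject", "direct object", "oblique", ...
    marker: str = ""  # required preposition/postposition/case, if any


@dataclass(frozen=True)
class Frame:
    """Hypothetical subcategorization frame for one sense of a verb."""
    verb: str
    sense: str
    slots: tuple


def describe(frame):
    """Render a frame in the bracket notation used in these guidelines."""
    parts = [f"[{slot.marker + ' ' if slot.marker else ''}NP]{slot.function}"
             for slot in frame.slots]
    return f"{frame.verb}: " + " + ".join(parts)


SUBJ, DOBJ = Slot("subject"), Slot("direct object")
RETURN_FRAMES = (
    Frame("return", "give back", (SUBJ, DOBJ, Slot("oblique", "to"))),
    Frame("return", "give back", (SUBJ, DOBJ)),
    Frame("return", "give back", (SUBJ, Slot("oblique"), DOBJ)),
)
```

Here describe(RETURN_FRAMES[0]) yields "return: [NP]subject + [NP]direct object + [to NP]oblique", and all three frames share a single sense.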

Syntactic argument

Typically, verbal lexical units have dependents that can be syntactic arguments or adjuncts, depending on their status (mandatory/specific or not). For instance, in John walked in the forest yesterday, all three dependents (the entity walking, the time and the place) add semantics to the predicate, but time and place can be interpreted independently of the semantics of the verb, and could be omitted. Thus, John is a syntactic argument while the other dependents are syntactic adjuncts. Time and place are typically treated as syntactic adjuncts, not as syntactic arguments.

Beyond verbs, nouns, adjectives and adverbs can also have arguments. For example, the noun cause cannot normally appear by itself; rather, one must always talk about the cause of X, with X as the syntactic argument of the noun cause. Similarly, the noun contact has two arguments: the contact of X with Y.

Distinguishing between arguments and adjuncts can be tricky, and we will not go into the details of the controversial argument/adjunct distinction. In addition to the usual tests for the argument/adjunct distinction described in the linguistic literature, we advise language teams to use language-specific resources (e.g. valency dictionaries), which sometimes encode the syntactic argument structure of lexical units.

Most of the time, syntactic and semantic arguments coincide, but not always. For instance, in I translated a book, there is no syntactic argument expressing the source and target languages, which are semantic arguments of translate. Therefore, we distinguish both notions in our guidelines: syntactic arguments describe the linguistic structure of lexical items, whereas semantic arguments are related to the conceptual structure of predicates.

Syntactic operator

A syntactic operator is a verb that only bears grammatical features (person, number, tense and mood) but adds no semantics to the complement. This definition is more restricted than the traditional notion of a light verb. Notably, light verbs that add some semantics to the complement, as in to start a walk (aspectual) or to give courage (causative), are not considered operators. Operators are typical head verbs of light-verb constructions:

  • отдавам почит to give tribute to pay tribute
  • eine Entscheidung treffen to make a decision
    Angst haben to have fear
    ein Verbrechen begehen to commit a crime
  • to make a decision
    to have fear
    to commit a crime
  • tomar una decisión
    tener miedo
    hacer ilusión
  • oddać hołd to give-back tribute to pay tribute
  • priti v poštev to come into consideration to consider

Unexpected change in meaning

An unexpected change in meaning, signaled by the # (hash) sign, is a phenomenon referred to in generic and category-specific tests, based on the notion of inflexibility. Inflexibility is verified by attempting a regular modification which yields an unexpected acceptability or meaning shift, that is, beyond what would be expected from the initial modification. In order to judge whether a shift in acceptability or meaning is unexpected, one can try to apply the same modification to a similar compositional construction, using analogy. For example, book and word have synonyms including notebook/novel/volume/publication and term/expression/headword, respectively. However, while the slight shift in the meaning of book is compositionally reflected in:

  • давам ти книга I give you a book давам ти тетрадка/роман/том/учебник I give you a notebook/novel/volume/textbook
  • Ich gebe dir mein Buch I give you my book Ich gebe Dir mein(e) Publikation/Doktorarbeit/Kapitel/Novelle/Ausgabe I give you my publication/thesis/chapter/novel/edition
  • Te doy mi libro I give you my book Te doy mi(s) publicación/tesis doctoral/capítulo/novela/edición I give you my publication/thesis/chapter/novel/edition
  • I give you my book I give you my notebook/novel/volume/publication
  • daję ci książkę I give you a book daję Ci zeszyt/powieść/tom/publikację I give you a notebook/novel/volume/publication
  • îți dau cartea I give you the book îți dau caietul/romanul/volumul/publicația I give you the notebook/novel/volume/publication
  • dam ti besedo I give you a word I promise #dam ti izraz/zlog/glagol I give you a word/syllable/verb

the same does not hold for:

  • давам ти дума I give you a word I give you my word #давам ти слово/израз/текст I give you a word/expression/text
  • Ich gebe Dir mein Wort I give you my word, i.e. I promise #Ich gebe Dir mein(e) Publikation/Doktorarbeit/Kapitel/Novelle/Ausgabe I give you my publication/thesis/chapter/novel/edition
  • Te doy mi palabra I give you my word, i.e. I promise #Te doy mi(s) publicación/tesis doctoral/capítulo/novela/edición I give you my publication/thesis/chapter/novel/edition
  • I give you my word #I give you my notebook/novel/volume/publication
  • daję ci słowo I give you a word I give you my word #daję Ci wyraz/sylabę/czasownik I give you a word/syllable/verb
  • Îți dau cuvântul I give you my word #Îți dau caietul/romanul/volumul/publicația I give you my notebook/novel/volume/publication
  • dati komu besedo to give (someone) a word to promise someone

That is, the latter replacement produces an unexpected change of meaning that goes beyond the semantic difference between the original and the replaced word. Thus, Test VID.2 [LEX] applies and:

  • давам своята дума to give one's word to someone
  • jmd. sein Wort geben to give one's word to s.o.
  • to give one's word to someone
  • dar a alguien tu palabra to give one's word to s.o.
  • dać komuś słowo to give someone a word to give one's word to someone
  • a-ți da cuvântul cuiva to give your word to someone
  • n.a.

is a VMWE.

Similarly, Test VPC.1 [V+PART-DIFF-SENSE] refers to an unexpected change in the meaning of the verb stemming from the addition of the particle. This is checked by verifying whether the situation described by the verb with the particle implies the one described without the particle:

  • n.a.
  • Ich fange das Buch an I begin to read the book does not imply Ich fange das Buch I catch the book
    Ich lege das Buch auf dem Tisch ab I put down the book on the table implies Ich lege das Buch auf den Tisch I put the book on the table
  • to check in upon arrival does not imply to check upon arrival (it is VPC)
    to look up into the sky implies to look into the sky (it is not a VPC)
  • n.a.
  • n.a.
  • n.a.
  • n.a.

Ungrammaticality

Ungrammaticality of an utterance is its non-conformity to the syntactic or semantic rules of the language. We assume that judging ungrammaticality is a basic competence of a native speaker of a language. Ungrammatical examples are signaled with * (star).


Section 9

Contact

These guidelines were written by many authors. If you have questions, comments or suggestions, you can send them to the shared task mailing list or contact one of the organisers, who will forward your message to the technical team, guidelines editors, language group leaders or language leaders.

Email addresses are available on the who-is-who document.