Annotation guidelines
PARSEME shared task on automatic identification of verbal MWEs - edition 1.0 (2017)

Idioms (ID)

Idioms constitute a universal category. An idiom (ID) has at least two lexicalized components including a head verb and at least one of its arguments. The argument can be of different types. Here are some examples:

  • Subject
    • броят му се ребрата be counted someone's (possessive pronoun) ribs (someone) to be very thin and skinny
    • ein kleines Vöglein hat mir gezwitschert a little bird told me
    • a little bird told someone
    • tu hora ha llegado your time has arrived your time has come
    • licho wie devil knows I have no idea
    • a sua hora chegou your time has arrived your time has come
    • a șoptit o păsărică whispered a bird little a little bird told someone
    • srce mu je padlo v hlače His heart fell into his pants someone has lost courage
  • Direct object
    • гушна букета hug the bunch of flowers to die
    • er hat den Schuss nicht gehört he did the shot not hear it takes him a long(er) time to understand sth
    • to kick the bucket
    • estirar la pata to stretch the leg kick the bucket
    • udać Greka to pretend to be a Greek to pretend not to understand
    • bater as botas to hit the boots to die
      abrir mão de algo to open hand (of something) to give up (on something)
    • a arunca vina to throw guilt-the to blame
    • ustreliti kozla to shoot the goat to say or do something stupid
  • Circumstantial or adverbial complement
    • удрям в гръб hit in the back to stab in the back
      правя сам да си говори make (someone) to talk to himself to drive (someone) crazy
    • etwas wie warme Semmeln verkaufen sth. like warm bread rolls to sell sth. fast and easy
    • to take something with a pinch of salt, to sell like hotcakes, to strike while the iron is hot, to come off with flying colors
    • coger algo con pinzas to hang something with pegs take something with a pinch of salt
    • wiercić komuś dziurę w brzuchu to drill a hole in one's belly to intrusively solicit someone, to insist too much
    • levar em conta to bring in account to take into account
      ir ao ar go to the air to go on air
    • a lua în considerare to bring in account to take into account
    • spati kot ubit to sleep like dead to sleep soundly

It is often challenging to distinguish IDs from other VMWE categories if only one argument of the head verb is lexicalized. The VMWE categorisation depends on the category of this argument:

  • Noun or preposition governing a noun: fine-grained tests need to be applied in order to discriminate between an LVC and an ID. See the section on Structural tests.
  • Particle or reflexive pronoun: the VMWE is either a VPC (particle) or an IReflV (reflexive pronoun), never an ID.

With an argument of any other category, the VMWE is always an ID, including the following:

  • Preposition governing a complex noun phrase
    • удрям някого в гръб hit someone in the back to stab someone in the back
    • jmd springt im Dreieck s.o. jumps in the triangle s.o. can hardly control his anger any more
    • to take something with a pinch of salt
    • coger algo con pinzas to hang something with pegs take something with a pinch of salt
    • dopiąć coś na ostatni guzik to button something up to the last button to complete something
    • bater na mesma tecla to hit the same key to insist on something
    • a da cu piciorul to give with leg-the to give up the chance
    • skrivati glavo v pesek to hide head in the sand to pretend not to see a problem
  • Adjectival phrase
    • schwarz fahren to drive black to take a ride without a ticket
    • to come clean, to stand firm
    • jugar sucio to play dirty to play dirty
    • zrobić swoje to do one's own to do what one is supposed to do
      tykać cudze to touch someone else's to take something that does not belong to you
      dopiąć swego to button up one's own to fulfill one's plans
    • jogar sujo to play dirty
    • a juca murdar to play dirty
    • biti zelen od zavisti to be green with envy
  • Verbal phrase
    • will sagen want to say that is to say
    • to make do
    • n.a.
    • laisser tomber to let fall to give up
    • dać komuś żyć to let someone live not to bother someone
      można wytrzymać one can stand the situation is reasonably good
    • querer dizer to want to say to mean
    • n.a.
    • n.a.
  • Relative clause
    • wissen wo es langgeht to know where things are heading to know on which side one's bread is buttered
    • to know on which side the bread is buttered
    • wiedzieć, skąd wieje wiatr to know where wind blows from to know on which side your bread is buttered, to know how to take advantage of the situation
    • saber onde pisar know where to-step to know the way to succeed in something
      mostrar com quantos paus se faz uma canoa show with how many sticks one makes a canoe to punish or take revenge
    • a ști cu ce se mănâncă to know with what CL.Refl. eats to know what it is about
    • vedeti koliko je ura to know what time it is to realize the truth
  • Non-reflexive pronoun
    • es gibt it gives there is
    • τα καταφέρνω, την πατάω
    • to make it
    • l'emporter to take it away to win
    • prender le to take it to be beaten
    • Polish does not seem to have this type of VMWEs
    • dá-lhe João! give to him/her, João! show them what you got, João!
    • a o șterge to her delete to fly the coop
      a o întinde to her extend to fly the coop synonymous expressions with the non-anaphoric feminine ACC personal clitic 'o' functioning as an expletive
    • ucvreti jo to escape her to escape something/someone by running
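
The rule for a head verb with exactly one lexicalized argument can be summarised as a small decision function. The following is only a minimal sketch: the names ArgType and categorize_single_argument are hypothetical and not part of the guidelines, and deciding which argument type applies to a given candidate remains the annotator's job.

```python
from enum import Enum


class ArgType(Enum):
    """Hypothetical labels for the category of the single lexicalized argument."""
    NOUN = "noun"
    PREP_NOUN = "preposition governing a noun"
    PARTICLE = "particle"
    REFLEXIVE_PRONOUN = "reflexive pronoun"
    # Any other category: preposition governing a complex noun phrase,
    # adjectival phrase, verbal phrase, relative clause, non-reflexive pronoun.
    OTHER = "other"


def categorize_single_argument(arg_type: ArgType) -> str:
    """Return the VMWE category suggested by the type of the only lexicalized argument."""
    if arg_type in (ArgType.NOUN, ArgType.PREP_NOUN):
        # The guidelines defer to the structural tests here.
        return "LVC or ID (apply structural tests)"
    if arg_type is ArgType.PARTICLE:
        return "VPC"
    if arg_type is ArgType.REFLEXIVE_PRONOUN:
        return "IReflV"
    return "ID"


# "to come clean": the single lexicalized argument is an adjectival phrase.
print(categorize_single_argument(ArgType.OTHER))
```

The function covers only candidates already accepted as VMWEs; it does not decide whether a verb-argument combination is a VMWE in the first place.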

Sentential expressions with no open slots, such as proverbs and conventionalized sentences, are included in the scope of IDs.

  • Rom wurde nicht an einem Tag erbaut Rome was not built in a day
    wer A sagt muss auch B sagen who says A must also say B you must finish what you start
  • Rome was not built in a day
    Fortune favors the bold
    The pleasure is mine
    I beg your pardon!
  • trafiła kosa na kamień met the scythe a stone someone rude/dishonest came across someone else who used similar methods against him/her
  • quem vê cara não vê coração who sees face doesn't see heart a person can lie/omit his/her feelings
  • Urciorul nu merge de multe ori la apă Pitcher-the not goes of many times at water The pitcher goes so often to the well that it is broken at last
  • Počasi se daleč pride more haste less speed
    Po toči zvoniti je prepozno there is no use ringing the bells after hail it is too late

If more than one argument of the head verb is lexicalized, then the candidate VMWE is always classified as an ID.

  • die Katze aus dem Sack lassen to let the cat out of the bag
  • to let the cat out of the bag, to cut a long story short, to call it a day
  • se faire des idées to make SELF ideas to imagine something false
    s'en aller to go SELF from there to leave
    il y a it has there there is
  • chować głowę w piasek to hide head in sand to pretend not to see a problem
  • tapar o sol com a peneira to hide the sun with a sieve to sugar-coat
  • a da bir cu fugiții to give tribute with fugitives.the to back away
  • att sätta sig upp mot någon to sit oneself up against someone to defy someone
    att dra sitt strå till stacken to draw one's straw to stack.the to contribute (in a small way)
  • beseda mi je ostala v grlu word got stuck in my throat I am speechless

In case of several lexicalized arguments, special care must be taken to identify and also annotate embedded VMWEs.

  • einen Plan aufstellen to set up a plan to draw up a plan → contains the VPC aufstellen to set up
  • to let the cat out of the bag → contains the VPC to let out
  • se faire des idées to make SELF ideas to imagine something false → contains the non-VMWEs se faire and faire des idées
  • bać się własnego cienia to fear SELF one's own shadow to be very timid → contains the IReflV bać się to fear SELF to be afraid
  • virar-se nos trinta turn-RCLI in-the thirty to get by → contains the synonymous IReflV virar-se to get by ≠ virar to turn/become
  • a da cărțile pe față to give cards.the on face to reveal one's true intentions → contains the ID a da pe față to reveal
    a-și da arama pe față to give his/her copper.the on face to reveal his/her true (evil) nature → this is even more complicated since, besides the ID a da pe față, the IReflV has to be annotated as well - a three-level embedding
  • delati se norca iz koga to make RCLI fool of someone to make fun of someone → contains the IReflV delati se to make oneself to pretend
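
One way to picture an embedded VMWE is as an annotation whose lexicalized tokens form a proper subset of the tokens of a larger annotation. The sketch below illustrates this with the English example above. The Annotation class, the embedded_in helper and the token positions are assumptions made for illustration only and do not describe the shared task's file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record: a VMWE category plus the positions of its lexicalized tokens."""
    category: str
    tokens: frozenset


def embedded_in(inner: Annotation, outer: Annotation) -> bool:
    """True if every token of `inner` also belongs to `outer`, and `outer` has more tokens."""
    return inner.tokens < outer.tokens  # proper-subset comparison on frozensets


# Assumed token positions: let=1 the=2 cat=3 out=4 of=5 the=6 bag=7
idiom = Annotation("ID", frozenset({1, 2, 3, 4, 5, 6, 7}))
vpc = Annotation("VPC", frozenset({1, 4}))  # "let ... out"

# Both annotations are kept: the ID and the VPC embedded in it.
print(embedded_in(vpc, idiom))
```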

Idioms whose head verb is the copula (to be) can pose special challenges because their complements may be (nominal, adjectival, etc.) MWEs themselves. In this task, we consider constructions with a copula to be VMWEs only if the complement does not retain the idiomatic meaning when used without the verb.

  • sei kein Frosch be no frog be no chicken → idiom because #kein Frosch no frog loses the meaning
  • to be no chicken → idiom because #no chicken loses the meaning
    to be somebody → idiom because #somebody loses the meaning
    it is double Dutch to me → non-VMWE because the copula can be omitted, as in he seems to speak double Dutch
  • ser un pelota to be a ball to suck/butter up
  • być jedną nogą na tamtym świecie to be with one leg in the other world to be close to death → idiom because #jedna noga na tamtym świecie one leg in the other world loses the meaning
    być do rzeczy to be to the thing to be relevant → non-VMWE because the copula can be omitted, as in dał parę argumentów całkiem do rzeczy he gave a couple of quite relevant arguments
  • ser alguém na vida to be somebody in life to be somebody → idiom because #alguém na vida loses the meaning
    não ser flor que se cheire to not be a flower that one may smell to be an untrustworthy person → idiom because #flor que se cheire loses the meaning
    isso é grego pra mim that's Greek to me → non-VMWE because the copula can be omitted, as in você está falando grego
  • a fi ușă de biserică to be door of church to be honest → idiom because #ușă de biserică loses the meaning
    a fi un papă-lapte to be a eat-milk to be a piker → non-VMWE because un papă-lapte preserves the meaning
  • biti trn v peti komu to be a thorn in somebody's heel to be a big problem, obstacle → idiom because #trn v peti loses the meaning

Note that special care must be taken in languages in which copula omission is a regular or even compulsory phenomenon (e.g. in Russian). In those cases, language-specific tests are required to distinguish a copula-based idiom from a non-verbal MWE.
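
The copula criterion reduces to a single yes/no question about the complement. The sketch below encodes it under that reading. The function name and the judgments dictionary are hypothetical, and the truth values simply restate the annotators' judgments given for the English examples above.

```python
def copula_construction_is_vmwe(complement_keeps_meaning_without_verb: bool) -> bool:
    """A copula construction counts as a VMWE only if its complement loses
    the idiomatic meaning when used without the verb."""
    return not complement_keeps_meaning_without_verb


# Does the complement keep its idiomatic meaning without the copula?
judgments = {
    "to be no chicken": False,         # "no chicken" alone loses the meaning
    "to be somebody": False,           # "somebody" alone loses the meaning
    "it is double Dutch to me": True,  # "he seems to speak double Dutch"
}

for expression, keeps_meaning in judgments.items():
    label = "idiom" if copula_construction_is_vmwe(keeps_meaning) else "non-VMWE"
    print(f"{expression}: {label}")
```

The sketch does not cover languages with regular copula omission, for which the language-specific tests mentioned above are needed.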

Idioms typically have both a literal and an idiomatic reading. Thus, they are closely connected to the phenomenon of metaphor (see also the section on VMWEs versus metaphors). This often makes them semantically totally non-compositional, i.e. none of their lexicalized components retains any of its original meaning. Some authors argue, though, that partial semantic compositionality can be obtained via decomposability, e.g. to spill the beans is compositional provided that to spill is paraphrased as to reveal and the beans as a secret.