Background


Introduction

The kicktionary is a multilingual (German – English – French) electronic dictionary of the language of football (soccer). It was developed between September 2005 and July 2006 during my stay as a visiting researcher at the FrameNet project at the International Computer Science Institute (ICSI) in Berkeley.

The main aim of the project was (and is) to explore how linguistic theories about lexical semantics, methods from corpus linguistics, technologies for hypertext and hypermedia and techniques from computer language processing can help to make lexical resources that are better than (or: good in a manner different from) traditional paper dictionaries.

Theoretical Background

The theoretical starting points in the development of the kicktionary were

The treatment of synonymy, translation equivalence and other lexical relations was loosely modelled after the WordNet approach (Fellbaum 1998). In constructing the kicktionary and its website, I have used as a guideline the seven theses concerning the use of hypertext in lexicography formulated by Storrer (2001).

Overview

The kicktionary currently contains close to 1,900 lexical units (nouns, verbs, adjectives and idiomatic expressions) in German, English and French. For each lexical unit, there are between one and ten annotated example sentences from a corpus of football match reports. The annotations identify the lexical unit itself as well as its arguments and, as the case may be, a support verb or support preposition.

                          German  English  French     All
Lexical Units                792      599     535    1926
Nouns                        451      318     290    1059
Verbs                        305      248     201     754
Other                         36       33      44     113
Examples                    3551     2374    2239    8164
Spoken language examples     126        0       0     126
Examples per LU             4.48     3.96    4.19    4.24
Annotated FEs               5731     3882    3647   13260
Annotated Supports           554      293     340    1187
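
The examples-per-LU row is simply the number of examples divided by the number of lexical units. The following minimal Python sketch recomputes it from the counts in the table; the variable and function names are illustrative and not part of the kicktionary itself.

```python
# Counts copied from the table above: (lexical units, examples) per language.
counts = {
    "German": (792, 3551),
    "English": (599, 2374),
    "French": (535, 2239),
}


def examples_per_lu(lexical_units, examples):
    """Average number of annotated examples per lexical unit, to 2 decimals."""
    return round(examples / lexical_units, 2)


ratios = {lang: examples_per_lu(lus, ex) for lang, (lus, ex) in counts.items()}

# The "All" column is computed from the summed counts, not from the ratios.
total_lus = sum(lus for lus, _ in counts.values())
total_examples = sum(ex for _, ex in counts.values())
ratios["All"] = examples_per_lu(total_lus, total_examples)
```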

Based on an analysis of their semantics and argument structure, lexical units are grouped into roughly a hundred frames such that lexical units in the same frame share important semantic and syntactic characteristics. The frames, in turn, are each assigned to one of 16 scenes, where each scene corresponds to a prototypical event (e.g. a goal or a one-on-one situation) of a football match.

In addition to the scenes-and-frames-hierarchy, lexical units are also organised into synsets, i.e. into groups of words with identical or largely similar meanings. Synsets, in turn, are the building blocks of a number of concept hierarchies, each of which organises a set of synsets into a tree via lexical relations such as hypernymy/hyponymy (X is-a-kind-of Y), holonymy/meronymy (X is-a-part-of Y) and troponymy (to X is to Y in some way).

The following figure gives a schematic overview of the data structure.
[Figure: diagram of the data structure]
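
As a rough illustration, the entities described in this section could be modelled as follows. This is a minimal sketch under my own assumptions, not the kicktionary's actual storage format: the class and field names are hypothetical, the scene name "One_On_One" is assumed, and a synset is given only a single parent although it may take part in several concept hierarchies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexicalUnit:
    lemma: str       # e.g. "sidestep"
    pos: str         # part of speech: "n", "v", ...
    language: str    # "de", "en" or "fr"
    examples: List[str] = field(default_factory=list)  # annotated sentences


@dataclass
class Frame:
    name: str
    lexical_units: List[LexicalUnit] = field(default_factory=list)


@dataclass
class Scene:
    name: str
    frames: List[Frame] = field(default_factory=list)


@dataclass
class Synset:
    members: List[LexicalUnit] = field(default_factory=list)
    parent: Optional["Synset"] = None  # link used by a concept hierarchy


# Scenes-and-frames hierarchy: scene -> frames -> lexical units.
sidestep = LexicalUnit("sidestep", "v", "en")
tunneln = LexicalUnit("tunneln", "v", "de")
beat = Frame("Beat", [sidestep, tunneln])
one_on_one = Scene("One_On_One", [beat])  # scene name is an assumption

# Concept hierarchy: a defender is a kind of player.
player = Synset([LexicalUnit("player", "n", "en")])
defender = Synset([LexicalUnit("defender", "n", "en")], parent=player)
```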

The following sections will explain the highlighted words of this section in more detail.

Corpus

The core corpus used in this project is a collection of German, English and French football match reports from the UEFA website (www.uefa.com). For each language, there are roughly 500 such texts, amounting to around 200,000 words. This core corpus is partly parallel, i.e. many texts (about half of them) are direct translations of one another. For German, the corpus contains additional match reports from the website of the football journal kicker (www.kicker.de; about 1,200 texts, or roughly 750,000 words) and about an hour (ca. 10,000 words) of transcribed spoken soccer commentary from German radio.

Corpus                            Medium   Source         Language    Competition(s)                  #texts  #words
Core Corpus                       written  UEFA.com       German      Champions League, UEFA Cup,        486  ca. 200,000
                                                                      World Cup Qualification
                                                          English                                        535  ca. 230,000
                                                          French                                         482  ca. 240,000
Other Languages Satellite Corpus                          Portuguese                                     469  ca. 250,000
                                                          Italian                                        478  ca. 220,000
                                                          Spanish                                        479  ca. 250,000
German Satellite Corpus                    kicker.de      German      Champions League, Bundesliga     1,242  ca. 700,000
                                  spoken   NDR/SWR Radio                                                  65  ca. 15,000

Lexical Units, Example Sentences and Annotations

A lexical unit (LU) is a pairing of a word form with a meaning, corresponding to a “word sense” in a traditional dictionary. In the kicktionary, each LU is illustrated by a number of example uses taken from the corpus described above. Besides the LU itself, the annotation in an example sentence identifies and assigns a label to the arguments of the LU. For instance, one example sentence of the verbal LU “to sidestep” is annotated in the following way:

(1) [Yattara]PLAYER_WITH_BALL sidestepped [his marker]OPPONENT_PLAYER and shot in from an acute angle.

Different LUs have different sets of such argument labels, and not every example need make use of all of them. For instance, the following three sentences, all belonging to the LU “to volley”, share some argument labels and differ in others:

(2a) [Kekic]SHOOTER volleyed [wide]TARGET at the other end.
(2b) [Kuijt]SHOOTER volleyed [in]TARGET [a Goor cross]MOVING_BALL [from close range]SOURCE to remove any doubt.
(2c) [He]SHOOTER volleyed [the ball]BALL [low]PATH [beyond José Moreira]TARGET.

For each LU, a table of the following kind gives an overview of which arguments are used in the examples and how they are realised:

LU            BALL      MOVING_BALL         PATH  SHOOTER                       SOURCE            TARGET
volleyed      -         -                   -     Kazakhstan captain Samat ...  -                 over the bar
volleyed      -         -                   -     Kekic                         -                 wide
volleyed      -         his return cross    -     The midfield player           -                 into the net
volleyed      -         a Goor cross        -     Kuijt                         from close range  in
volleyed      the ball  -                   low   he                            -                 beyond José Moreira
was volleyed  -         Gudjohnsen's cross  -     by Smertin                    -                 in
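
To illustrate how such an overview can be derived from the annotated sentences, here is a minimal Python sketch. It assumes the plain-text notation used in examples (2a) to (2c), i.e. `[text]LABEL`; the regular expression and the function name are my own and do not reflect how the kicktionary stores or processes its annotations. Because the result is a dictionary, a label that occurs twice in one sentence would keep only its last realisation.

```python
import re

# One bracketed argument followed by its upper-case label, e.g. "[Kekic]SHOOTER".
ANNOTATION = re.compile(r"\[([^\]]+)\]([A-Z_]+)")


def parse_arguments(sentence):
    """Return {label: text} for every bracketed argument in the sentence."""
    return {label: text for text, label in ANNOTATION.findall(sentence)}


examples = [
    "[Kekic]SHOOTER volleyed [wide]TARGET at the other end.",
    "[Kuijt]SHOOTER volleyed [in]TARGET [a Goor cross]MOVING_BALL "
    "[from close range]SOURCE to remove any doubt.",
    "[He]SHOOTER volleyed [the ball]BALL [low]PATH [beyond José Moreira]TARGET",
]

rows = [parse_arguments(sentence) for sentence in examples]

# Column headers: every label used in at least one example, in alphabetical order.
labels = sorted({label for row in rows for label in row})

# One table row per example; an empty string marks an argument that is not realised.
table = [[row.get(label, "") for label in labels] for row in rows]
```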

As with verbal LUs, arguments are also annotated with nominal and adjectival LUs. The following is an annotated example for the noun “volley”:

(3) [Huggel's]SHOOTER [right-foot]PART_OF_BODY volley [from the penalty spot]SOURCE flashed narrowly wide.

Often, a nominal LU is integrated syntactically into the sentence with the help of a support construction. Thus, in the following example, the nominal LU “foul” is treated as the semantic predicate of the sentence, and the verb “to commit” is annotated as a support verb: it serves to integrate this predicate syntactically into the sentence and to provide it with tense and aspect information, but otherwise contributes little to its meaning.

(4) [The Parma backline]OFFENDER committed countless fouls in the first half.

For each annotated example sentence, a link is provided into the corpus text from which it was taken. A few (about 100) examples come from the spoken language part of the corpus. For these examples, the corresponding part of the original audio recording can be played.

Scenes and Frames

When a number of lexical units share a basic meaning, when the perspective they take on a given event is the same, when they allow for comparable sets of arguments to be used with them, and when these arguments exhibit similar semantic relations to one another, the lexical units can be grouped into a structural entity called a frame. Thus, as the following examples illustrate, the LUs “to sidestep”, “to beat” and “to nutmeg” all share the basic meaning of “to overcome an attacking player in a one-on-one situation”, and they all have argument slots for constituents describing the player in possession of the ball, the opponent player and the area where the action takes place. They are therefore grouped into the frame “Beat” (and the argument labels on this level are called frame elements).

(5a) [Yattara]PLAYER_WITH_BALL sidestepped [his marker]OPPONENT_PLAYER and shot in from an acute angle.
(5b) [Kryzstalowicz]PLAYER_WITH_BALL had beaten [Jean-Alain Boumsong]OPPONENT_PLAYER [in the penalty area]AREA.
(5c) [Hector Font]PLAYER_WITH_BALL tried to nutmeg [Ioannis Skopelitis]OPPONENT_PLAYER.

Note that this analysis can be applied across different languages and across different parts of speech. Thus, the German verbal LU “tunneln” (lit. 'to make a tunnel' – en. 'to nutmeg') and the French nominal LU “petit pont” (lit. 'little bridge' – en. 'to nutmeg') also belong in the “Beat” frame:

(5d) [Diogo Rincón]PLAYER_WITH_BALL tunnelte [Paul Freier]OPPONENT_PLAYER [im Strafraum]AREA.
(5e) [Hector Font]PLAYER_WITH_BALL tentait le petit pont [sur Ioannis Skopelitis]OPPONENT_PLAYER.

The kicktionary provides a schematic overview of the following kind to represent frames and give an overview of the LUs in them and their argument labels:

Frame elements:
  1. OPPONENT_PLAYER
  2. PLAYER_WITH_BALL
  3. AREA
  4. ACTION
  5. CHALLENGE

Lexical units:
  German: ausdribbeln.v, tunneln.v, vernaschen.v
  English: beat.v, nutmeg.v, sidestep.v
  French: mystifier.v, petit_pont.n, se_jouer.v

Other LUs may refer to the same event, but take a different perspective on it or focus on a different stage of it. Thus, just like the LUs in the “Beat” frame, other LUs may also refer to a one-on-one situation, but describe the event itself rather than its outcome. The LU “to challenge” does this from the perspective of the opponent player, whereas the LU “to take on” takes the perspective of the player in possession of the ball. These two LUs therefore go into the frames “Challenge” and “Take_On”, respectively.

(6a) [Boumsong]OPPONENT_PLAYER challenges [Kryzstalowicz]PLAYER_WITH_BALL.
(6b) [Kryzstalowicz]PLAYER_WITH_BALL takes on [Boumsong]OPPONENT_PLAYER.

Similarly, the LU “to dispossess” describes another outcome of a one-on-one situation, namely that in which the opponent player manages to take the ball from the player in possession. This LU therefore goes into the frame “Deny”:

(7) [Boumsong]OPPONENT_PLAYER dispossesses [Kryzstalowicz]PLAYER_WITH_BALL.

To summarize: LUs in the frames “Beat”, “Deny”, “Challenge” and “Take_On” have in common that they refer to a one-on-one situation and consequently share some or all of their argument labels, but they differ in the specific perspective they take on this event. To capture the former fact, these frames (along with some others) are related to one another via a further structural entity – a scene.

A scene is meant to correspond to what knowledge a speaker can activate about a certain prototypical situation. In the present case, this will mean that he or she knows what actors and objects typically participate in a one-on-one situation (two players, the ball), where this event usually takes place (on some location on the pitch) and what substages and outcomes (the opponent player is beaten or he dispossesses the player with the ball) it can have.

Another example of a scene can be illustrated with the following annotated examples of the LUs “to trip”, “to award”, “to win”, “to concede” and “to caution”:

(8a) [Costinha]OFFENDER tripped [Ignashevich]OFFENDED_PLAYER [just inside the area]AREA.
(8b) [The referee]REFEREE awarded [a penalty]COMPENSATION to [CSKA Moscow]OFFENDED_TEAM.
(8c) [Ignashevich]OFFENDED_PLAYER won [a penalty]COMPENSATION for [CSKA Moscow]OFFENDED_TEAM.
(8d) [Costinha]OFFENDER conceded [a penalty]COMPENSATION [by tripping Ignashevich]OFFENSE.
(8e) [The referee]REFEREE cautioned [Costinha]OFFENDER [for his foul on Ignashevich]OFFENSE.

These all describe some aspect of a situation centred around the event of one player committing a foul on an opponent and the referee reacting to this with an appropriate compensation and/or sanction. Because they take different perspectives on that event, the LUs belong to different frames (“Foul”, “Referee_Decision”, “Win_Compensation”, “Concede_Compensation” and “Sanction”, respectively), and the fact that they refer to the same prototypical situation is captured by relating them to one another via the scene “Foul”.
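
The two-level grouping described so far can be sketched as a pair of lookup tables. The following Python fragment is only an illustration built from the examples in this section: the dictionary and function names are hypothetical, the LU identifiers follow the lemma.pos pattern used elsewhere on this page, and the scene name "One_On_One" is an assumption, since only the “Foul” scene is named explicitly above.

```python
# LU -> frame, taken from examples (5) to (8) above.
FRAME_OF_LU = {
    "sidestep.v": "Beat",
    "beat.v": "Beat",
    "nutmeg.v": "Beat",
    "challenge.v": "Challenge",
    "take_on.v": "Take_On",
    "dispossess.v": "Deny",
    "trip.v": "Foul",
    "award.v": "Referee_Decision",
    "win.v": "Win_Compensation",
    "concede.v": "Concede_Compensation",
    "caution.v": "Sanction",
}

# Frame -> scene. "One_On_One" is an assumed name for the one-on-one scene.
SCENE_OF_FRAME = {
    "Beat": "One_On_One",
    "Challenge": "One_On_One",
    "Take_On": "One_On_One",
    "Deny": "One_On_One",
    "Foul": "Foul",
    "Referee_Decision": "Foul",
    "Win_Compensation": "Foul",
    "Concede_Compensation": "Foul",
    "Sanction": "Foul",
}


def relation(lu_a, lu_b):
    """Describe how two LUs are related in the scenes-and-frames hierarchy."""
    frame_a, frame_b = FRAME_OF_LU[lu_a], FRAME_OF_LU[lu_b]
    if frame_a == frame_b:
        return "same frame"
    if SCENE_OF_FRAME[frame_a] == SCENE_OF_FRAME[frame_b]:
        return "same scene, different perspective"
    return "different scenes"
```

With these tables, `relation("trip.v", "caution.v")` reports that the two LUs share a scene but not a frame, which mirrors the analysis of examples (8a) and (8e).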

In contrast to a frame, which is defined via the properties of the linguistic entities it contains, a scene corresponds to knowledge that can, but need not, have an explicit verbalization. From the point of view of a dictionary, this means that a textual description, a short film or a schematic diagram may all be equally adequate representations of a scene. In the kicktionary in its present form, most scenes take the form of a short text, supplemented by a schematic diagram, a drawing or a photo wherever appropriate. For instance, the following diagram is a part of the description of the “Pass” scene:

[Figure: diagram for the “Pass” scene]

Synsets

The term “synset” is taken from WordNet terminology where it is defined as “[a] synonym set; a set of words that are interchangeable in some context.” I have used the notion of a synset not only to group synonymous lexical units within a language (e.g. penalty and spot-kick), but also to pair a given lexical unit in one language with a potential translation equivalent in another language (e.g. en. spot-kick, de. Strafstoß and fr. coup de pied de réparation). I am aware that this is a non-trivial extension of the original concept, but have found it to be usable and useful in most cases. The following are two examples of such multilingual synsets:

(9a) {Abwehrspieler.n; Verteidiger.n; defender.n; arrière.n; défenseur.n} (“a defensive player”)
(9b) {bringen.v; einwechseln.v; bring_on.v; introduce.v; throw_on.v; lancer.v} (“to make a new player come into the game”)

Whereas these examples have at least one lexical unit for every language, there are also synsets that are incomplete in the sense that they contain LUs from only one or two languages:

(9c) {köpfen.v; head.v} (“to play the ball with the head” - no French equivalent)
(9d) {Abspielfehler.n; Fehlpass.n} (“a bad pass” - no English and no French equivalent)

If a user looks up a specific lexical unit in the kicktionary, he or she will be given links to the other members of the synset to which this LU belongs.
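
A multilingual synset can be pictured as a set of members per language, where an empty list marks a missing equivalent. The following Python sketch uses examples (9a), (9c) and (9d); the data layout and the function names are assumptions made for illustration and not the kicktionary's actual implementation.

```python
# Each synset maps a language code to its member LUs (examples 9a, 9c, 9d).
SYNSETS = [
    {
        "de": ["Abwehrspieler.n", "Verteidiger.n"],
        "en": ["defender.n"],
        "fr": ["arrière.n", "défenseur.n"],
    },
    {"de": ["köpfen.v"], "en": ["head.v"], "fr": []},
    {"de": ["Abspielfehler.n", "Fehlpass.n"], "en": [], "fr": []},
]


def equivalents(lu, target_language):
    """Return the members in target_language of the synset that contains lu."""
    for synset in SYNSETS:
        if any(lu in members for members in synset.values()):
            return synset[target_language]
    return []


def missing_languages(synset):
    """List the languages for which a synset has no member."""
    return [language for language, members in synset.items() if not members]
```

For example, `equivalents("defender.n", "fr")` yields the two French members of synset (9a), while `missing_languages(SYNSETS[2])` shows that (9d) has neither an English nor a French member.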

Concept Hierarchies

Concept hierarchies organise synsets into trees using different semantic relations. Like the synsets themselves, the concept hierarchies are modelled after the WordNet approach. Currently, three such semantic relations are taken into account.

Hypernymy/hyponymy holds between two nominal synsets X and Y if one is a more specific subclass of the other, i.e. if X is-a-kind-of Y. Thus, in the following example, since a defender is a kind of player, the corresponding synsets are placed in a hypernymy hierarchy where the second synset is a parent of the first.

(10a) {Verteidiger.n; defender.n; défenseur.n} is-a-kind-of {Spieler.n; player.n; joueur.n}

Holonymy/meronymy holds between two nominal synsets X and Y if one is a constituent part or a member of the other, i.e. if X is-part-of Y. Thus, in the following example, since a defender is a member of the defence, the corresponding synsets are placed in a holonymy hierarchy where the second synset is a parent of the first.

(10b) {Verteidiger.n; defender.n; défenseur.n} is-part-of {Abwehr.n; defence.n; défense.n}

Troponymy holds between two verbal synsets X and Y if verbs in one of them express a specific manner elaboration of verbs in the other, i.e. if to X is to Y in some way. Thus, in the following example, since to centre is to pass in some way, the corresponding synsets are placed in a troponymy hierarchy where the second synset is a parent of the first.

(10c) to {flanken.v; centre.v; centrer.v} is to {passen.v; pass.v; passer.v} in some way
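
Since each hierarchy is a tree, it can be represented by child-to-parent links, one table per relation. The Python sketch below encodes examples (10a) to (10c) in this way; for brevity each synset is identified by its English member, and the names `PARENT` and `ancestors` are hypothetical.

```python
# Child -> parent links for each semantic relation (examples 10a to 10c).
# Each synset is represented here by its English member only.
PARENT = {
    "hypernymy": {"defender.n": "player.n"},
    "holonymy": {"defender.n": "defence.n"},
    "troponymy": {"centre.v": "pass.v"},
}


def ancestors(synset, relation):
    """Walk up one concept hierarchy and return all superordinate synsets."""
    parents = PARENT[relation]
    chain = []
    while synset in parents:
        synset = parents[synset]
        chain.append(synset)
    return chain
```

The same synset can thus have different parents in different hierarchies: `ancestors("defender.n", "hypernymy")` leads to the player synset, whereas `ancestors("defender.n", "holonymy")` leads to the defence synset.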

The concept hierarchies are less complete and have been less thoroughly revised than the scene-frame hierarchy. This is especially true of the troponymy hierarchies. I have not yet tackled the question of semantic relations between adjectival synsets.

Looking up an LU in the kicktionary will offer the user a link to its superordinate synsets in the different concept hierarchies, if such superordinate synsets exist. I have decided not to include the subordinate concepts in the LU entries, because these may be very numerous and would clutter up the screen too much. Instead, it is possible to get from a given LU to a tree representation of all the concept hierarchies to which the LU belongs.

Disclaimer and Call for Feedback

Although I think that the kicktionary in its present form is more than the notorious “prototype” or “lexicon fragment”, I am aware that it is incomplete and imperfect in several ways.

It is incomplete, firstly, because I have neglected some subdomains of soccer language altogether. For instance, words to do with a football team's position in a league table (e.g. “relegation”) have not been included in the data, mainly because the texts from my corpus are almost exclusively about matches taking place in cup competitions (i.e. not in a league).

Secondly, as the number of LUs for each language shows, German has been treated in somewhat more depth than English and French. This is mainly due to the fact that I am a native speaker of German and that the German part of the corpus is roughly five times the size of the English and the French part, both of which made finding German LUs easiest. Consequently, some potential translation equivalents may be missing from the kicktionary even though they exist.

Thirdly, I think that – all other means of characterising and relating the meaning of lexical units notwithstanding – a good dictionary should give a free text definition for each item. However, the kicktionary currently contains no such definitions, because I lacked the time for this labour-intensive task.

Moreover, the kicktionary will almost certainly contain a number of genuine errors. The annotations have not been verified by anybody other than myself, so it is likely that I have in some places missed or mislabelled arguments of an LU. I have also had little opportunity to check my analysis of English and French LUs with a native speaker of these languages who is also competent in football vocabulary. Consequently, some annotations in these languages and, especially, some frame/scene assignments and synonymy relations I have postulated for LUs in these languages may be wrong.

Last but not least, I have found that the process of developing frames and scenes is itself something where a final decision on whether a given analysis is right or wrong can become very difficult. In cases of doubt, I have often judged a given analysis by categories like “useful vs. not useful” or “manageable vs. not manageable” rather than “right vs. wrong”.

For all of these reasons, I think that the kicktionary would profit much from some outside feedback. If you have thought of a lexical unit that should be included, if you find an error, if you disagree with an analysis I have made or if you have any other suggestions for improvement of the kicktionary, I would be very grateful if you let me know. My email address is thomas dot schmidt at uni-hamburg dot de.

Acknowledgments

The initial idea for this project and much of what I know about the lexicography of football language come from Dieter Seelbach from the University of Mainz. I am grateful to the Berkeley FrameNet team (Charles Fillmore, Collin Baker, Michael Ellsworth, Josef Ruppenhofer) and its visitors (Kyoko Ohara, Jan Scheffczyk, Carlos Subirats) for their help. This project was financed with a post-doc grant from the German Academic Exchange Service (DAAD).