Limerick Corpus of Irish English (LCIE 2004)

The Limerick Corpus of Irish English (LCIE 2004) was developed by the University of Limerick in conjunction with Mary Immaculate College, Limerick. This spoken corpus of Irish English discourse includes conversations recorded in a wide variety of mostly informal settings throughout Ireland. The corpus is a collection of 351 transcripts (totalling over 900,000 tokens) of naturally occurring spoken data from everyday Irish contexts.

While the corpus consists mainly of casual conversation, it also includes over 150,000 words of professional, transactional and pedagogic Irish English. Like the casual conversation data, this material was carefully collected with reference to a range of different speech genres. All LCIE 2004 data comes from counties in the Republic of Ireland; Northern Ireland is not included. Because the corpus is not designed to be geographically representative, it does not include data from every county. The project was funded by the HEA Targeted Initiatives Scheme, Mary Immaculate College and the University of Limerick.

Acknowledgements

Corpus Managers: Prof. Fiona Farr, University of Limerick, and Prof. Anne O’Keeffe, Mary Immaculate College, University of Limerick

Research Assistants: Dr James Binchy, Dr Brian Clancy, Niamh Flynn, Dr Brona Murphy and Dr Elaine Riordan

Part-of-speech tagset and lemmatization

The Limerick Corpus of Irish English is part-of-speech tagged using the English Penn Treebank tagset (with Sketch Engine modifications), which indicates the part of speech and grammatical category of each token. The corpus is also lemmatized: each word form in the corpus is assigned its base form (lemma).
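
Sketch Engine corpora are commonly built from a "vertical" file, with one token per line and its attributes in tab-separated columns. The sketch below is only an illustration of what tagging and lemmatization add to each token. It assumes a word, tag, lemma column order, and the sample tokens, tags and lemmas are invented for illustration, not taken from LCIE; the actual attribute layout of this corpus may differ.

```python
from collections import Counter

# Hypothetical sample in a vertical layout: one token per line, with
# tab-separated columns ASSUMED to be word, POS tag, lemma.
# Structural markup such as <s> ... </s> sits on its own lines.
SAMPLE = """\
<s>
we\tPP\twe
were\tVBD\tbe
going\tVVG\tgo
and\tCC\tand
she\tPP\tshe
goes\tVVZ\tgo
</s>"""


def parse_vertical(text):
    """Return a list of (word, tag, lemma) tuples, skipping blank and markup lines."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("<"):
            continue  # structural markup such as <s> carries no token
        word, tag, lemma = line.split("\t")
        tokens.append((word, tag, lemma))
    return tokens


tokens = parse_vertical(SAMPLE)
# Counting by lemma merges inflected forms ("going", "goes") under one base form.
lemma_counts = Counter(lemma for _, _, lemma in tokens)
print(lemma_counts["go"])  # 2
```

Grouping by the lemma column rather than the word column is what lets a search for one base form retrieve all of its inflected forms.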

Limerick Corpus sizes

Frequency
Tokens 951,195
Words 830,210
Sentences 66,623
Documents 351
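
In Sketch Engine's counts, "tokens" include punctuation and other non-word items, whereas "words" are only the tokens that begin with a letter, which is why the token total exceeds the word total. As a small worked example, the published totals can be combined into average figures; the ratios below are simple arithmetic on the table above, not statistics reported by the corpus compilers.

```python
# Published LCIE 2004 totals, as listed in the size table.
SIZES = {"tokens": 951_195, "words": 830_210, "sentences": 66_623, "documents": 351}

tokens_per_sentence = SIZES["tokens"] / SIZES["sentences"]
words_per_sentence = SIZES["words"] / SIZES["sentences"]
tokens_per_document = SIZES["tokens"] / SIZES["documents"]

# Tokens not counted as words (punctuation, numbers and similar items).
non_word_tokens = SIZES["tokens"] - SIZES["words"]
non_word_share = non_word_tokens / SIZES["tokens"]

print(f"tokens per sentence: {tokens_per_sentence:.2f}")  # 14.28
print(f"words per sentence: {words_per_sentence:.2f}")    # 12.46
print(f"tokens per document: {tokens_per_document:.2f}")  # 2709.96
print(f"non-word tokens: {non_word_tokens:,} ({non_word_share:.1%})")  # 120,985 (12.7%)
```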

Search the Limerick Corpus

Sketch Engine offers a range of tools to work with this Irish English corpus.

Design Matrix

The design matrix for the Limerick Corpus of Irish English centres on a range of speaker relationships (from intimate to professional) across a range of interactional contexts and speech genres, described below. Like the Cambridge and Nottingham Corpus of Discourse in English (CANCODE), as described by McCarthy (1998), LCIE 2004 developed a careful sociolinguistic classification scheme to enable inter-corpus comparisons, especially with regard to linguistic choices and the relationships that hold between speakers.

Context Types

This axis of categorisation reflects the relationship that holds between the participants in the dyadic and multi-party conversations in the corpus. These relationships fall into five broad categories, which were identified at the outset: intimate, socio-cultural, professional, transactional and pedagogic.

  • Intimate: In this category, the distance between the speakers is at a minimum and is often related to co-habitation. Only conversations between partners or close family qualify for this category, in which participants are linguistically most ‘off-guard’. All participants in a conversation have to fall into this category for the conversation to be classified as ‘intimate’.
  • Socio-cultural: This category implies voluntary interaction between speakers who seek each other’s company for the sake of the interaction itself. The relationship between the speakers is usually marked by friendship and is thus not as close as that between speakers in the ‘intimate’ category.
  • Professional: This category refers to the relationship between people who are interacting as part of their regular daily work, so it applies only to interactions in which all speakers are part of the professional context. Talk between colleagues in the workplace that is not work related has still been classified as ‘professional’, based on the observation that participants retain their professional relationship even when the topic of conversation is not work related.
  • Transactional: This category covers interactions in which the speakers usually do not know one another beforehand. Transactional conversations are usually prompted by a need on the part of the hearer or the speaker, so they aim to satisfy a particular transactional goal.
  • Pedagogic: This final category was set up in order to include any conversation in which the relationship between the speakers was defined by the pedagogic context. The emphasis has been on the speaker relationship rather than on the setting.

Interaction Types

Apart from the context-type categories, distinctions were also made within the corpus between texts that were predominantly collaborative and those that were non-collaborative, i.e. texts in which speakers give explanations and information or relate events and tell stories. Collaborative texts were further divided into ‘collaborative idea’ and ‘collaborative task’.

  • Collaborative Task: Social activities which cannot be performed with language alone.
  • Collaborative Idea: Conversations in which the main goal is to exchange ideas.
  • Information Provision: Conversations marked by the uni-linear transfer of information from one speaker to the other interactants.
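
Because every transcript is classified along both axes, the design matrix can be pictured as a cross-tabulation of context type against interaction type (five rows by three columns). The sketch below assumes hypothetical per-transcript metadata; the transcript IDs and their labels are invented for illustration and do not reflect the actual distribution of LCIE 2004 texts.

```python
from collections import Counter
from enum import Enum


class Context(Enum):
    INTIMATE = "intimate"
    SOCIO_CULTURAL = "socio-cultural"
    PROFESSIONAL = "professional"
    TRANSACTIONAL = "transactional"
    PEDAGOGIC = "pedagogic"


class Interaction(Enum):
    COLLABORATIVE_TASK = "collaborative task"
    COLLABORATIVE_IDEA = "collaborative idea"
    INFORMATION_PROVISION = "information provision"


# Hypothetical transcript metadata: (ID, context type, interaction type).
transcripts = [
    ("t001", Context.INTIMATE, Interaction.COLLABORATIVE_IDEA),
    ("t002", Context.PROFESSIONAL, Interaction.INFORMATION_PROVISION),
    ("t003", Context.INTIMATE, Interaction.COLLABORATIVE_TASK),
    ("t004", Context.INTIMATE, Interaction.COLLABORATIVE_IDEA),
]

# Each (context, interaction) pair is one cell of the design matrix.
matrix = Counter((ctx, inter) for _, ctx, inter in transcripts)

# Print the full grid, including empty cells (Counter returns 0 for them).
for ctx in Context:
    row = [matrix[(ctx, inter)] for inter in Interaction]
    print(f"{ctx.value:<15}", row)
```

Printing the empty cells as well as the filled ones makes gaps in the collection visible, which is the point of planning a corpus around a matrix like this.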

Tools to work with the Limerick Corpus of Irish English

A complete set of Sketch Engine tools is available to work with this corpus of Irish English to generate:

  • word sketch – English collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • keywords – terminology extraction of one-word and multi-word units
  • word lists – lists of English nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • text type analysis – statistics of metadata in the corpus
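
To make the outputs of the word list, n-gram and concordance tools more concrete, here is a toy sketch over one short invented utterance. It only illustrates what such lists contain. It is not how Sketch Engine computes them, and the helper functions `ngrams` and `concordance` are hypothetical.

```python
from collections import Counter

# Invented example utterance, lowercased and split on whitespace for simplicity.
tokens = "i was going to say you know i was going to ring you".split()


def ngrams(seq, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def concordance(seq, node, width=3):
    """Return KWIC lines with up to `width` tokens of context on each side of `node`."""
    lines = []
    for i, tok in enumerate(seq):
        if tok == node:
            left = " ".join(seq[max(0, i - width):i])
            right = " ".join(seq[i + 1:i + 1 + width])
            lines.append(f"{left:>20} [{tok}] {right}")
    return lines


word_list = Counter(tokens)                # frequency list of single words
trigram_list = Counter(ngrams(tokens, 3))  # frequency list of 3-grams

print(word_list.most_common(3))
print(trigram_list.most_common(1))
for line in concordance(tokens, "going"):
    print(line)
```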

Limerick Corpus of Irish English (LCIE 2004)

version lcie (June 2025)

Farr, F., Murphy, B. and O’Keeffe, A. (2004) ‘The Limerick Corpus of Irish English: design, description and application’. Teanga (Yearbook of the Irish Association for Applied Linguistics) 21: 5–29.
