NCI: New Corpus for Ireland

The New Corpus for Ireland (NCI) is a language corpus developed as part of the set-up phase of a project for a new English-to-Irish Dictionary (NEID). The project is under the direction of Foras na Gaeilge, a public body responsible for the promotion of the Irish language.

The corpus was collected in three main ways:

  • incorporating existing corpora
  • contacting publishers, authors, newspaper companies etc. to request permission to use their texts
  • collecting data from the web.

In Sketch Engine, the project is composed of two separate corpora:

  • 30-million corpus of Irish
  • 200-million corpus of English including Hiberno-English (the variety of English that is spoken in Ireland)

The project page is available at

Part-of-speech tagset

The Irish part of the NCI was processed by the morphological analyzer/generator for Irish (Uí Dhonnchadha) with the following POS tagset. The English part of the NCI was tagged by TreeTagger using the Penn Treebank tagset.

Tools to work with the New Corpus for Ireland

A complete set of Sketch Engine tools is available for working with the NCI corpus. These tools generate:

  • word sketch – English and Irish collocations categorized by grammatical relations
  • thesaurus – synonyms and similar words for every word
  • word lists – lists of English and Irish nouns, verbs, adjectives etc. organized by frequency
  • n-grams – frequency list of multi-word units
  • concordance – examples in context
  • keywords – terminology extraction of one-word units
  • text type analysis – statistics of metadata in the corpus
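
To make the terms in the list above concrete, here is a minimal Python sketch of what a word list, an n-gram frequency list and a concordance compute. It is only an illustration on a made-up sample sentence: the function names (`tokenize`, `ngram_counts`, `concordance`) are hypothetical, the tokenization is deliberately naive, and none of this is Sketch Engine's own implementation or API.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation (naive)."""
    stripped = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [t for t in stripped if t]


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Made-up sample text, not taken from the NCI.
sample = "The corpus is large. The corpus is tagged. A corpus helps lexicographers."
tokens = tokenize(sample)

word_list = Counter(tokens).most_common(3)    # word list ordered by frequency
bigrams = ngram_counts(tokens, 2)             # n-grams with n = 2
hits = concordance(tokens, "corpus", width=2) # keyword in context

for left, kw, right in hits:
    print(f"{left:>12} [{kw}] {right}")
```

A real corpus tool works on lemmatized, POS-tagged text and precomputed indexes, so the sketch shows only the underlying idea of each output.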

Kilgarriff, Adam, Michael Rundell, and Elaine Uí Dhonnchadha. Efficient corpus development for lexicography: building the New Corpus for Ireland. Language Resources and Evaluation 40.2 (2006): 127-152.

