[ezcol_1half]

The New Corpus for Ireland – user’s guide

Welcome to the New Corpus for Ireland, a corpus created as part of the New English-Irish Dictionary project in Foras na Gaeilge.

The New Corpus for Ireland is a large collection of texts in Irish with approximately 30 million words. It contains a wide range of texts including works of fiction, factual texts, news reports, official documents and much more. The corpus is designed to be used for linguistic research – for example, to find examples of words being used in context or to find out about word frequency.

This website enables you to consult the corpus in several different ways. The website is based on a corpus query system called Sketch Engine created by Lexical Computing Limited. This document will give you a basic introduction to the website and how to use it.

The home page

On the home page, you can type a word or a multi-word term in Irish to search for it in the corpus. You will get three kinds of results:

Concordance: You will see a list of examples in which the word or term you searched for is used in a sentence. These examples were selected from the corpus automatically. The home page lists approximately ten examples and you can get more by clicking the “more” link.

Collocations: The box on the right-hand side gives the ten most frequent words that co-occur with the word you searched for. For example, if you search for doras ‘door’, you will get words such as oscail ‘open’, dúnta ‘closed’, plab ‘slam’ and others. Once again, you can see more of these words by clicking the “more” link. This list of collocates has been extracted from the corpus automatically.

Statistics: At the bottom of the page, you will see some statistical data about how the word you searched for is used, broken down by criteria such as genre and dialect. For example, if you search for fata – one of the words for ‘potato’ – you will see that this word is used almost exclusively in the Connacht dialect. Once again, these statistics have been extracted from the corpus automatically. You can see more statistics by clicking the “more” link.
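To make the first two kinds of results more concrete, here is a minimal Python sketch of a keyword-in-context concordance and a raw co-occurrence count. It is an illustration only, not how the website itself works: the sentences are invented toy data rather than text from the corpus, and the function names and the two-word window are assumptions made for this example.

```python
from collections import Counter

# Toy sentences invented for illustration; they are NOT taken from the corpus.
SENTENCES = [
    "d'oscail sí an doras go mall",
    "bhí an doras dúnta aréir",
    "dhún sé an doras de phlab",
    "bhí an fhuinneog dúnta freisin",
]


def concordance(sentences, keyword, width=2):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


def collocates(sentences, keyword, width=2):
    """Count the words found within `width` tokens of the keyword."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == keyword:
                window = tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width]
                counts.update(window)
    return counts


# Print each hit with the keyword aligned in the middle, concordance-style.
for left, kw, right in concordance(SENTENCES, "doras"):
    print(f"{left:>15} [{kw}] {right}")

# Show the three most frequent neighbours of the keyword.
print(collocates(SENTENCES, "doras").most_common(3))
```

This sketch matches exact word forms only and counts every neighbouring word, including very common ones such as an ‘the’.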

Advanced searches

If you want to perform more complicated searches on the corpus, you can use the options in the menu on the left-hand side. Here is a summary of the options available.

Concordance: This is where you can search for and list sentences from the corpus based on the words that occur in them. This search is more powerful than the one on the home page; for example, if you select “lemma” in the drop-down box, you can search for all forms of a word: type fuinneog ‘window’ and you will get sentences where any inflected or mutated form of the word occurs: fuinneoige, bhfuinneog and so on. You can also sort and filter the results in several ways.

Word List: This is where you can extract various word lists from the corpus, such as a list of the most frequently occurring words in Irish.

Word Sketch: This section gives you an opportunity to see which words are most frequently used along with the word you are looking for. The results are presented in several lists according to the grammatical relation that exists between the two words. For example, if you search for a verb, you will get one list of its direct objects, another list of its subjects, and so on. Remember that this information was extracted from the corpus automatically and so it may not always be accurate.

Thesaurus: This section allows you to type a word and get a list of other words that are similar to it with respect to their patterns of usage. For example, if you search for the adjective folláin ‘healthy, wholesome’, you will get a list that includes sláintiúil ‘healthy’, sábháilte ‘safe’ and others. In effect, this is a list of words that are likely to be synonyms because they are used in similar ways. Remember, though, that this information was extracted from the corpus automatically, so the words you receive may not necessarily be synonyms.

Sketch-Diff: This is a tool for investigating the difference between two words, based on other words that occur with them. If you type two words that are close to each other in meaning, for example leanbh ‘baby’ and páiste ‘child’, you will get information that may help you understand the difference between them: you will see that the words used mainly with leanbh ‘baby’ include saolaigh ‘give birth’ and baist ‘baptize’ while the words used mainly with páiste ‘child’ include múin ‘teach’ and foghlaim ‘learn’.
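The idea behind the “lemma” search and the Word List options can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the website's implementation: the form-to-lemma table is hand-written and hypothetical, the sentences are invented toy data, and the function names are made up for this example.

```python
from collections import Counter

# Hypothetical form-to-lemma table written by hand for this example.
# It only stands in for the lemma information a lemma search relies on.
LEMMAS = {
    "fuinneog": "fuinneog",
    "fuinneoige": "fuinneog",
    "bhfuinneog": "fuinneog",
}

# Toy sentences invented for illustration; they are NOT taken from the corpus.
SENTENCES = [
    "bhris sé fuinneog inné",
    "ghlan sé gloine na fuinneoige",
    "sheas sí ag an bhfuinneog",
    "dhún sé an doras",
]


def lemma_of(token):
    """Look up the lemma of a word form; unknown forms map to themselves."""
    return LEMMAS.get(token, token)


def search_by_lemma(sentences, lemma):
    """Return the sentences containing any form of the given lemma."""
    return [
        sentence
        for sentence in sentences
        if any(lemma_of(token) == lemma for token in sentence.split())
    ]


def lemma_frequencies(sentences):
    """Count how often each lemma occurs, as in a simple word list."""
    return Counter(
        lemma_of(token) for sentence in sentences for token in sentence.split()
    )


# All three forms of fuinneog are found with a single query.
for sentence in search_by_lemma(SENTENCES, "fuinneog"):
    print(sentence)

# The most frequent lemmas in the toy data.
print(lemma_frequencies(SENTENCES).most_common(2))
```

Searching for the lemma fuinneog finds the sentences containing fuinneog, fuinneoige and bhfuinneog, whereas an exact-form search for fuinneog would find only the first.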

[/ezcol_1half] [ezcol_1half_end]

Nua-Chorpas na hÉireann – treoir don úsáideoir

Fáilte chuig Nua-Chorpas na hÉireann, corpas a cruthaíodh mar chuid de thionscadal an Fhoclóra Nua Béarla-Gaeilge i bhForas na Gaeilge.

Is bailiúchán mór téacsanna Gaeilge é Nua-Chorpas na hÉireann a bhfuil timpeall is 30 milliún focal ann. Tá réimse leathan téacsanna ar fáil sa chorpas: saothair ficsin, téacsanna fíriciúla, tuairiscí nuachta, cáipéisí oifigiúla agus eile. Tá an corpas cóirithe sa chaoi gur féidir taighde teangeolaíoch a dhéanamh air – mar shampla, samplaí a aimsiú d’fhocail faoi leith agus iad á n-úsáid i gcomhthéacs nó minicíocht focal a fhiosrú.

Tugann an suíomh seo deis duit an corpas a chuardach i mbealaí éagsúla. Tá an suíomh bunaithe ar Sketch Engine, bogearra cuardaigh corpais de chuid Lexical Computing Limited. Tabharfaidh an cháipéis seo treoir bhunúsach duit maidir leis an suíomh a úsáid.

An leathanach baile

Ar an leathanach baile, is féidir leat focal nó téarma ilfhoclach a chlóscríobh sa bhosca chun é a chuardach sa chorpas. Nuair a dhéanfaidh tú sin, gheobhaidh tú leathanach agus trí chineál sonraí ann:

Comhchordacht: Feicfidh tú liosta samplaí den fhocal nó den téarma agus é á úsáid in abairtí a roghnaíodh go huathoibríoch as an gcorpas. Beidh timpeall is deich sampla ar taispeáint ar an leathanach seo agus is féidir teacht ar a thuilleadh díobh ach cliceáil ar an nasc “tuilleadh”.

Comhlogaíochtaí: Sa bhosca ar dheis, feicfidh tú liosta de na deich bhfocal is minice a úsáidtear in éineacht leis an bhfocal a chuardaigh tú. Mar shampla, má chuardaíonn tú doras, gheobhaidh tú focail ar nós oscail, dúnta, plab agus eile. Arís, is féidir tuilleadh díobh a fháil ach cliceáil ar an nasc “tuilleadh”. Tá liosta na gcomhlogaíochtaí seo bunaithe ar eolas a baineadh as an gcorpas go huathoibríoch.

Staitisticí: Ag bun an leathanaigh, gheobhaidh tú eolas staitistiúil i dtaobh úsáid an fhocail de réir critéar áirithe, mar shampla de réir an tseánra nó de réir canúna. Mar shampla, má chuardaíonn tú fata, feicfidh tú gur mó i bhfad a úsáid i gcanúint Chonnacht. Arís, tá na staitisticí seo bunaithe ar eolas a baineadh as an gcorpas go huathoibríoch. Is féidir teacht ar a thuilleadh staitisticí ach cliceáil ar an nasc “tuilleadh”.

Cuardaigh níos casta

Más maith leat cuardaigh níos casta a dhéanamh sa chorpas ná mar is féidir ar an leathanach baile, is féidir leas a bhaint as na roghanna sa roghchlár ar chlé. Seo achoimre ar na roghanna atá ar fáil ann.

Comhchordacht: Anseo, is féidir leat abairtí a aimsiú agus a liostú as an gcorpas bunaithe ar na focail a thagann chun cinn iontu. Tá an cuardach seo níos cumhachtaí ná an ceann atá ar fáil ar an leathanach baile; mar shampla, má roghnaíonn tú “leama” sa bhosca aníos, beidh tú ábalta cuardach a dhéanamh beag beann ar fhoirm an fhocail: clóscríobh fuinneog agus gheobhaidh tú abairtí ina bhfuil foirm infhillte nó chlaochlaithe ar bith den fhocal: fuinneoige, bhfuinneog agus mar sin de. Sa bhreis air sin, is féidir an liosta abairtí a shórtáil agus a scagadh de réir critéar éagsúil.

Liosta Focal: Anseo, is féidir liostaí éagsúla focal a bhaint as an gcorpas, mar shampla liosta na bhfocal is coitianta a thagann chun cinn sa Ghaeilge, agus mar sin de.

Achoimre Focal: Rannóg é seo a thugann deis duit féachaint ar na focail eile a úsáidtear go minic in éineacht leis an bhfocal atá uait. Cuirtear na torthaí i láthair i liostaí de réir an choibhnis ghramadaí atá idir an dá fhocal. Mar shampla, má chuardaíonn tú briathar, gheobhaidh tú liosta amháin de na cuspóirí is minice a bhíonn leis an mbriathar, liosta eile de na hainmnithe is minice a bhíonn leis, agus mar sin de. Cuimhnigh go mbaintear an t-eolas seo as an gcorpas go huathoibríoch agus, dá bhrí sin, ní gá go mbeidh sé go hiomlán cruinn i gcónaí.

Teasáras: An rud is féidir a dhéanamh anseo ná focal a chlóscríobh agus liosta a fháil de na focail eile atá cosúil leis ó thaobh a bpatrún úsáide. Mar shampla, má chuardaíonn tú an aidiacht folláin, gheobhaidh tú roinnt aidiachtaí eile ar nós sláintiúil agus sábháilte. Go bunúsach, is liosta é seo d’fhocail ar dócha gur comhchiallaigh iad de bharr go n-úsáidtear ar bhealach cosúil iad. Ach arís, cuimhnigh gur baineadh an t-eolas as an gcorpas go huathoibríoch agus, dá bhrí sin, ní gá go mbeidh na focail seo ina gcomhchiallaigh i gcónaí.

Difríocht: Is áis í seo chun an difríocht idir dhá fhocal a fhiosrú, bunaithe ar na focail a bhíonn in aice leo. Má scríobhann tú dhá fhocal atá gar dá chéile ó thaobh brí de, abair leanbh agus páiste, gheobhaidh tú eolas a chuideoidh leat an difríocht eatarthu a thuiscint: feicfidh tú gur focail ar nós saolaigh agus baist a úsáidtear de ghnáth le leanbh, agus gur focail ar nós múin agus foghlaim a úsáidtear de ghnáth le páiste.

[/ezcol_1half_end]