• CAT tool

    A CAT tool (computer-assisted translation tool) is software that helps translators keep terminology consistent across their translation jobs. It also aids the translation process by suggesting, or inserting automatically, translations of passages (segments) which the translator has already translated in the past. Data exported from CAT tools (translation memories) can be used to build a parallel corpus in Sketch Engine or uploaded for bilingual term extraction. Extracted terms can be exported as a term base (TBX) and uploaded back to the CAT tool. Parallel user corpora in Sketch Engine can be downloaded as a translation memory (TMX) and uploaded to the CAT tool.
  • cluster

    a process of creating groups of words in the thesaurus or word sketch. Words are grouped according to their shared collocational behavior. See more in the Clustering Neighbours documentation.
  • collocate

    a part of a collocation that is not the node. A collocate is dependent on the node. For example, the collocate strong and the node wind make up the collocation strong wind.
    collocation
    collocate    node
    strong       wind
    icy          wind
    cold         wind
    The most typical collocates for every word in the language can be generated with the word sketch tool.
  • collocation

    A collocation is a sequence or combination of words that occur together more often than would be expected by chance (from Wikipedia: Collocation). A collocation, e.g. fatal error, typically consists of a node (error) and a collocate (fatal). Collocations can have different strengths. For example, nice house is a weak collocation because both nice and house can combine with lots of other words. On the other hand, the Opera House is a strong collocation because it is very typical for opera to occur next to house and, at the same time, opera does not combine with many other words. In Sketch Engine, the tool to use for collocations is the word sketch. The strength of a collocation is expressed by the logDice score.
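    As a rough illustration of how a logDice score behaves, the sketch below uses the commonly published definition, logDice = 14 + log2(2·f(xy) / (f(x) + f(y))), where f(xy) is the frequency of the collocation and f(x), f(y) are the frequencies of the node and the collocate. The function name and the frequencies are invented for the example, and the exact counts used by the word sketch may differ.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Return 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: how often node and collocate occur together
    f_x:  frequency of the node
    f_y:  frequency of the collocate
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Invented frequencies, for illustration only.
# "opera" rarely occurs without "house": the score is close to the maximum of 14.
strong = log_dice(f_xy=900, f_x=1_000, f_y=1_000)
# "nice" and "house" both combine with many other words: the score is much lower.
weak = log_dice(f_xy=900, f_x=500_000, f_y=300_000)

print(f"strong collocation: {strong:.2f}")
print(f"weak collocation:   {weak:.2f}")
```

    Because the score depends only on the three frequencies and not on the corpus size, scores from corpora of different sizes can be compared with each other.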
  • comparable corpus [ corpus-types ]

    A comparable corpus is a corpus consisting of texts from the same domain in two or more languages. In contrast to a parallel corpus, the texts are not translations of each other, but they belong to the same domain and share the same metadata. An example of a comparable corpus is a corpus made from Wikipedia.
  • compile

    Corpus compilation refers to processing the corpus data (text) with the tools available for the language and converting the text into a corpus. Only a compiled corpus can be searched. See corpus compilation.
  • concordance [ feature ]

    a list of all examples of the search word or phrase found in a corpus, usually in the format of a KWIC concordance, with the search word highlighted in the centre of the screen and some context to the right and to the left. See also KWIC.
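    The idea of a KWIC concordance can be sketched in a few lines. This is a toy illustration on an invented sentence, not how Sketch Engine searches a corpus; the function names kwic and format_kwic are hypothetical.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


def format_kwic(hits):
    """Right-align the left context so that the keywords line up in one column."""
    pad = max((len(left) for left, _, _ in hits), default=0)
    return [f"{left:>{pad}} [{kw}] {right}" for left, kw, right in hits]


# Invented sample text, split on whitespace for simplicity.
text = "the cold wind blew and the icy wind howled".split()
for line in format_kwic(kwic(text, "wind", width=2)):
    print(line)
```

    Each hit is printed on its own line with the search word in brackets, preceded and followed by up to two words of context.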
  • concordancer [ feature ]

    A concordancer is a tool (a piece of software) which searches a text corpus and displays a concordance. The concordancer is one of the features in Sketch Engine. It allows for simple corpus searches as well as queries with complex criteria that search for grammatical or lexical structures. See also concordance.
  • CoNLL format

    CoNLL format is a specific format of vertical that represents a syntactic parse tree. In comparison with plain vertical, there are extra columns describing the syntactic structure of words within the sentence, i.e. id, head, deprel. The number and position of these extra columns may vary depending on the specific CoNLL format. In the example below:
    • id is the position of the current word in the sentence (the 1st column)
    • head is the id of the parent node of the current word (the 5th column); the root of the sentence has head 0
    • deprel is the relation by which the current node and its parent node are connected (the 6th column)
    <s>    
    1    Dropping    drop-v    VBG    14    advcl
    2    down    down-x    RP    1    prt
    3    abaft    abaft-i    IN    1    prep
    4    the    the-x    DT    5    det
    5    bridge    bridge-n    NN    3    pcomp
    6    ,    ,-x    ,    14    punct
    7    the    the-x    DT    9    det
    8    first    first-j    JJ    9    amod
    9    thing    thing-n    NN    14    subj
    10    to    to-x    TO    11    infmark
    11    come    come-v    VB    9    infmod
    12    into    into-i    IN    11    prep
    13    view    view-n    NN    12    pcomp
    14    was    be-v    VBD    0    ROOT
    15    the    the-x    DT    16    det
    16    funnel    funnel-n    NN    14    arg1
    17    .    .-x    .    14    punct
    </s>
    See also vertical and building word sketches from parsed corpora.
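    The example above can be read back into a tree with a short script. This is a minimal sketch that assumes the six-column layout shown in the example (id, word, lemma, tag, head, deprel); other CoNLL variants place the columns differently. The names Token, parse_sentence, root and dependents are made up for the illustration.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int      # position of the word in the sentence (1st column)
    word: str
    lemma: str
    tag: str
    head: int    # id of the parent node, 0 for the root (5th column)
    deprel: str  # relation to the parent node (6th column)


SAMPLE = """<s>
1 Dropping drop-v VBG 14 advcl
2 down down-x RP 1 prt
3 abaft abaft-i IN 1 prep
4 the the-x DT 5 det
5 bridge bridge-n NN 3 pcomp
6 , ,-x , 14 punct
7 the the-x DT 9 det
8 first first-j JJ 9 amod
9 thing thing-n NN 14 subj
10 to to-x TO 11 infmark
11 come come-v VB 9 infmod
12 into into-i IN 11 prep
13 view view-n NN 12 pcomp
14 was be-v VBD 0 ROOT
15 the the-x DT 16 det
16 funnel funnel-n NN 14 arg1
17 . .-x . 14 punct
</s>"""


def parse_sentence(text):
    """Parse token lines; structure lines such as <s> and </s> are skipped."""
    tokens = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        tokens.append(Token(int(fields[0]), fields[1], fields[2],
                            fields[3], int(fields[4]), fields[5]))
    return tokens


def root(tokens):
    """Return the token whose head is 0."""
    return next(t for t in tokens if t.head == 0)


def dependents(tokens, head_id):
    """Return the tokens directly attached to the token with the given id."""
    return [t for t in tokens if t.head == head_id]


tokens = parse_sentence(SAMPLE)
top = root(tokens)
print(top.word, [(t.word, t.deprel) for t in dependents(tokens, top.id)])
```

    Following the head column from any token eventually leads to the root (was), which is what makes the flat list of lines a tree.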
  • cooccurrence [ text-analysis ]

    Cooccurrence, or co-occurrence, is a term which expresses how often two terms from a corpus occur alongside each other in a certain order. It usually indicates words which together create a new meaning. Such combinations are called phrasemes or multi-word expressions, e.g. black sheep or get on. Sketch Engine helps to find such words using the word sketch tool or the collocation search. Read more about further tools for text analysis.
  • corpus

    A corpus is a large collection of authentic texts used for studying language or generating linguistic data. Modern corpora contain texts whose total length is billions or tens of billions of words. A corpus is usually annotated (= words are labelled with information about the part of speech and grammatical category). The terms corpus, text corpus and language corpus are interchangeable. Using a corpus for any type of linguistic or language-oriented work ensures that the outcomes reflect the real use of the language. More on corpora.
  • corpus architect

    an intuitive tool inside Sketch Engine for creating corpora from documents or the Web which does not require any expert knowledge. See the create your own corpus page.
  • corpus manager

    a program used to manage text corpora, i.e. to build, edit, annotate and search corpora. Sketch Engine is the user interface to the corpus manager Manatee.
  • CQL

    The Corpus Query Language is a code used to set criteria for complex searches which cannot be carried out using the standard user interface controls. The criteria may include words or lemmas but also tags and other attributes, text types or structures. Conditions can be set for optional tokens or token repetition.
  • CSV

    a type of plain text document used for saving tabular data. It is seamlessly accepted by a large variety of applications and is therefore ideal for exporting Sketch Engine results to be used in other software. CSV can be opened directly in Microsoft Excel, Open Office, Google Documents and many others.
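    As an illustration of reusing exported results in other software, the sketch below reads CSV data with Python's standard csv module. The column names and values are invented for the example; a real export may use different columns and may start with extra header lines.

```python
import csv
import io

# Invented sample data; not the layout of an actual Sketch Engine export.
SAMPLE_CSV = """collocate,frequency,score
strong,120,9.8
icy,45,8.1
"cold, biting",12,6.4
"""


def load_rows(text):
    """Read CSV text into a list of dicts with typed values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "collocate": row["collocate"],
            "frequency": int(row["frequency"]),
            "score": float(row["score"]),
        }
        for row in reader
    ]


rows = load_rows(SAMPLE_CSV)
for row in sorted(rows, key=lambda r: r["score"], reverse=True):
    print(row["collocate"], row["frequency"], row["score"])
```

    Using a CSV parser rather than splitting lines on commas matters because a quoted value, such as the third row here, can itself contain a comma.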