Practical examples of advanced CQL corpus queries

These examples are designed for English corpora in Sketch Engine. They can be adapted for other languages by changing the POS tags and adding or replacing prepositions, articles and other features relevant to the language. The list of POS tags used in a corpus is on its corpus info page.

To see the results, go to Sketch Engine, select an English Web corpus and paste the CQL into Concordance – ADVANCED – CQL.

Noun phrases

Unless the corpus is annotated for noun phrases (most Sketch Engine corpora are not), it is not possible to find all noun phrases. The queries below cover some of the most frequent noun phrase structures.

Examples
  1. A sequence of up to 3 adjectives followed by a sequence of between 2 and 4 nouns.
    [tag="J.*"]{0,3} [tag="N.*"]{2,4}
  2. The same pattern restricted to words spelt in lowercase, to avoid proper nouns.
    [tag="J.*" & word="[[:lower:]]+"]{0,3} [tag="N.*" & word="[[:lower:]]+"]{2,4}
  3. This CQL finds the pattern adjective+noun of adjective+noun with an optional determiner after of.
    [tag="J.*" & word="[[:lower:]]+"]{0,10} [tag="N.*" & word="[[:lower:]]+"]{1,10} [word="of"] [tag="DT"]? [tag="J.*" & word="[[:lower:]]+"]{0,10} [tag="N.*" & word="[[:lower:]]+"]{1,10}
  4. An adaptation of the above query in which the of+adjective+noun part is optional. The final containing [][][] requires each match to be at least 3 tokens long. Use Filter – ADVANCED – HIDE SUB-HITS to only display the longest match of each hit.
    [tag="J.*" & word="[[:lower:]]+"]{0,10} [tag="N.*" & word="[[:lower:]]+"]{1,10} ([word="of"] [tag="DT"]? [tag="J.*" & word="[[:lower:]]+"]{0,10} [tag="N.*" & word="[[:lower:]]+"]{1,10})? containing [][][]
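
The four queries above reuse the same two token specifications with different repetition ranges. As a rough sketch of how such query strings could be assembled programmatically before pasting them into Sketch Engine, the Python below builds examples 1 to 3. The helper names (token, repeat, noun_phrase) are hypothetical and not part of any Sketch Engine API; the code only produces text.

```python
def token(tag, lowercase_only=False):
    """Return one CQL token spec for a POS-tag regex (hypothetical helper)."""
    conditions = [f'tag="{tag}"']
    if lowercase_only:
        # Restrict to lowercase words, as example 2 does to avoid proper nouns.
        conditions.append('word="[[:lower:]]+"')
    return "[" + " & ".join(conditions) + "]"


def repeat(spec, low, high):
    """Append a {low,high} repetition range to a token spec."""
    return f"{spec}{{{low},{high}}}"


def noun_phrase(max_adj, min_noun, max_noun, lowercase_only=False):
    """Optional adjectives followed by nouns, as in examples 1 and 2."""
    adjectives = repeat(token("J.*", lowercase_only), 0, max_adj)
    nouns = repeat(token("N.*", lowercase_only), min_noun, max_noun)
    return f"{adjectives} {nouns}"


example_1 = noun_phrase(3, 2, 4)
example_2 = noun_phrase(3, 2, 4, lowercase_only=True)

# Example 3: noun phrase + "of" + optional determiner + noun phrase.
part = noun_phrase(10, 1, 10, lowercase_only=True)
example_3 = f'{part} [word="of"] [tag="DT"]? {part}'

for query in (example_1, example_2, example_3):
    print(query)
```

Keeping the tag regexes as arguments makes it easier to adapt the queries to a corpus with a different tagset: only the values passed to token need to change.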

Punctuation

These characters have a special meaning in regular expressions and must be escaped with a backslash to be matched literally.

. ^ $ * + ? ( ) [ ] { } | \

Examples
  1. A question mark or an exclamation mark. (Question marks must be escaped, exclamation marks need not be.)
    [word="\?|!"]
  2. Ellipsis (three dots), tokenized either as 3 tokens or as 1 token.
    [word="\."]{3} | [word="\.\.\."]
  3. A token surrounded by quotes.
    [word="\""] [] [word="\""]
  4. Between 3 and 5 tokens in round brackets.
    [word="\("] []{3,5} [word="\)"]
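
When queries are generated from a word list, escaping by hand is error-prone. The following is a minimal Python sketch of an escaping helper, assuming the special characters are exactly those listed above plus the double quote, which example 3 also escapes because it delimits the attribute value. The function names escape_cql and word_token are made up for illustration.

```python
# Characters listed above as needing a backslash, plus the double quote
# (an assumption based on example 3, where the quote is escaped too).
SPECIAL = set('.^$*+?()[]{}|\\"')


def escape_cql(text):
    """Prefix every special character in text with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def word_token(text):
    """Build a [word="..."] token that matches text literally."""
    return f'[word="{escape_cql(text)}"]'


for literal in ("?", "!", "...", '"', "("):
    print(word_token(literal))
```

The exclamation mark passes through unchanged, while the question mark, the dots, the quote and the bracket each gain a backslash, in line with the examples above.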

Sentences

Most corpora in Sketch Engine use <s> as the sentence structure. Check the corpus info page to see what your corpus uses.
These examples can be used with any structure, e.g. <doc> or others.

Examples
  1. Sentences containing between 10 and 15 tokens. The final within <s/> ensures that the match does not cross a sentence boundary.
    <s> []{10,15} </s> within <s/>
  2. Lemma go in sentences of 10 to 15 tokens. The final within <s/> ensures that the match lies inside a single sentence rather than spanning two shorter sentences. You can replace [lemma="go"] with, for example, one of the noun phrases above.
    [lemma="go"] within <s> []{10,15} </s> within <s/>
  3. Questions (a question mark as the last token of a sentence).
    [word="\?"]</s>
  4. Lemma go inside questions.
    [lemma="go"] within (<s/> containing [word="\?"]</s>)
  5. [lemma="go"] and [lemma="have"] inside a question, in any order. To be understood as: a sentence containing go and also containing have and also containing a question mark as the last token.
    <s/> containing 1:[lemma="go"] containing 2:[lemma="have"] containing [lemma="\?"] </s>
  6. Same as above, but it highlights the search words, not the sentence.
    [lemma="go"] within (<s/> containing 1:[lemma="have"] containing [lemma="\?"] </s>)
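
To make the logic of the chained containing conditions more concrete, here is a small Python sketch that applies the same three tests to a toy list of sentences. It is only an analogy for reading the query, not how Sketch Engine evaluates CQL; the sample sentences and the function names are invented for illustration.

```python
# Toy data: each sentence is a list of (word, lemma) pairs.
sentences = [
    [("Have", "have"), ("you", "you"), ("gone", "go"), ("home", "home"), ("?", "?")],
    [("She", "she"), ("went", "go"), ("home", "home"), (".", ".")],
    [("Do", "do"), ("you", "you"), ("have", "have"), ("time", "time"), ("?", "?")],
]


def is_question(sentence):
    """Mirror [word="\\?"]</s>: a question mark as the last token."""
    return bool(sentence) and sentence[-1][0] == "?"


def contains_all(sentence, *wanted):
    """Mirror chained 'containing': every lemma must occur, in any order."""
    present = {lemma for _, lemma in sentence}
    return all(lemma in present for lemma in wanted)


# A sentence containing go, also containing have, and ending in "?".
hits = [s for s in sentences if is_question(s) and contains_all(s, "go", "have")]

for sentence in hits:
    print(" ".join(word for word, _ in sentence))
```

Only the first toy sentence passes all three tests: the second is not a question and the third has no form of go. Each containing clause acts as an independent filter on the sentence, which is why the order of go and have does not matter.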