CQL – Practical examples

Practical examples of advanced CQL corpus queries

These examples are designed for English corpora in Sketch Engine. They can be adapted for other languages by changing the POS tags and by adding or replacing prepositions, articles and other features relevant to the language. The POS tags used in a corpus are listed on its corpus info page.

To see the results, go to Sketch Engine, select an English Web corpus and paste the CQL into Concordance – ADVANCED – CQL.

Noun phrases

Unless the corpus is annotated for noun phrases (most Sketch Engine corpora are not), it is not possible to find all noun phrases. Here are queries for some of the most frequent noun phrase structures.

Examples

A sequence of up to 3 adjectives followed by a sequence of between 2 and 4 nouns.
[tag="J.*"]{0,3} [tag="N.*"]{2,4}
This example only includes words spelt in lowercase to avoid proper nouns.
[tag="J.*" & word="[[:lower:]]+"]{0,3} [tag="N.*" & word="[[:lower:]]+"]{2,4}
This CQL finds the pattern adjective+noun of adjective+noun with an optional determiner after of.
[tag="J.*" & word="[[:lower:]]+"]{0,10} [tag="N.*" & word="[[:lower:]]+"]{1,10} [word="of"] [tag="DT"]? [tag="J.*" & word="[[:lower:]]+"]{0,10} [tag="N.*" & word="[[:lower:]]+"]{1,10}
An adaptation of the above query. The of+adjective+noun part is optional. The length must be at least 3 tokens. Use Filter – ADVANCED – HIDE SUB-HITS to only display the longest match of each hit.
[tag="J.*" & word="[[:lower:]]+"]{0,10} [tag="N.*" & word="[[:lower:]]+"]{1,10} ([word="of"] [tag="DT"]? [tag="J.*" & word="[[:lower:]]+"]{0,10} [tag="N.*" & word="[[:lower:]]+"]{1,10})? containing [][][]
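Long queries like the ones above repeat the same token conditions several times, which makes them easy to mistype. The following is a minimal sketch, not part of Sketch Engine, that assembles the last two queries from reusable parts. The helper names (token, simple_np, np_of_np) are hypothetical and only build the query string, which you then paste into the CQL box.

```python
# Hypothetical helpers for assembling the noun phrase queries above from parts.
LOWER = 'word="[[:lower:]]+"'  # lowercase-only restriction used in the examples


def token(tag, lowercase_only=True):
    """Return one CQL token such as [tag="J.*" & word="[[:lower:]]+"]."""
    condition = f'tag="{tag}"'
    if lowercase_only:
        condition += f" & {LOWER}"
    return f"[{condition}]"


def simple_np(max_adj=10, max_noun=10):
    """Up to max_adj adjectives followed by 1 to max_noun nouns."""
    return f'{token("J.*")}{{0,{max_adj}}} {token("N.*")}{{1,{max_noun}}}'


def np_of_np(of_part_optional=False):
    """Noun phrase, then 'of', an optional determiner and a second noun phrase."""
    of_part = f'[word="of"] [tag="DT"]? {simple_np()}'
    if of_part_optional:
        # Optional of-part; 'containing [][][]' keeps hits of at least 3 tokens.
        return f"{simple_np()} ({of_part})? containing [][][]"
    return f"{simple_np()} {of_part}"


print(np_of_np())
print(np_of_np(of_part_optional=True))
```

Changing the repetition limits or the tag patterns in one place then updates every part of the query consistently.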

Punctuation

These characters have a special meaning in regular expressions and must be escaped with a backslash to be matched literally.

. ^ $ * + ? ( ) [ ] { } | \

Examples

A question mark or an exclamation mark. (Question marks must be escaped, exclamation marks not.)
[word="\?|!"]
Ellipsis (three dots) tokenized as 3 tokens and also as 1 token.
[word="\."]{3} | [word="\.\.\."]
A token surrounded by quotes.
[word="\""] [] [word="\""]
Between 3 and 5 tokens in round brackets.
[word="\("] []{3,5} [word="\)"]
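When queries are generated from a word list rather than typed by hand, the escaping can be automated. The following is a minimal sketch under that assumption; cql_escape and word_token are hypothetical names, and the set of characters is the one listed above plus the double quote, which delimits the value.

```python
# Characters that the list above says need a backslash inside a CQL value.
SPECIAL = set('.^$*+?()[]{}|\\')


def cql_escape(text):
    """Backslash-escape special characters so that text is matched literally.

    The double quote is escaped as well because it delimits the CQL value.
    """
    return "".join("\\" + ch if ch in SPECIAL or ch == '"' else ch for ch in text)


def word_token(text):
    """Build a [word="..."] token that matches text literally."""
    return f'[word="{cql_escape(text)}"]'


print(word_token("?"))    # [word="\?"]
print(word_token("..."))  # [word="\.\.\."]
print(f'{word_token("(")} []{{3,5}} {word_token(")")}')
```

Characters outside the list, such as the exclamation mark, are left unchanged.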

Sentences

Most corpora in Sketch Engine use <s> as the sentence structure. Check the corpus info page to see what your corpus uses.
These examples can be used with any structure, e.g. <doc> or others.

Examples

Sentences containing between 10 and 15 tokens. The final within <s/> ensures that the match does not cross a sentence boundary.
<s> []{10,15} </s> within <s/>
Lemma go in sentences of 10 to 15 tokens. The final within <s/> ensures that the result is found inside the same sentence and not in two short sentences crossing their boundaries. You can replace [lemma="go"] with, for example, one of the noun phrases above.
[lemma="go"] within <s> []{10,15} </s> within <s/>
Questions (a question mark as the last token of a sentence).
[word="\?"] </s>
Lemma go inside questions.
[lemma="go"] within (<s/> containing [word="\?"] </s>)
[lemma="go"] and [lemma="have"] inside a question, in any order. To be understood as: a sentence containing go and also containing have and also containing a question mark as the last token.
<s/> containing 1:[lemma="go"] containing 2:[lemma="have"] containing [lemma="\?"] </s>
Same as above, but it highlights the search words, not the sentence.
[lemma="go"] within (<s/> containing 1:[lemma="have"] containing [lemma="\?"] </s>)

within <s/>

The within <s/> operator should be used at the end of the query to ensure that the result is found within (= inside) a single sentence. This is typically necessary when the query searches for two things at a certain distance from each other. The distance is usually expressed by one or more non-specific tokens, for example [] or [][][] or []{10} or []{2,5}. Not using within <s/> may produce results which start at the end of one sentence and cross the sentence boundary (= the end of the sentence), so that the remaining part of the query is found at the beginning of the next sentence.

Queries with non-specific tokens can cross sentence boundaries because [] or [][][] or []{10} or []{2,5} also match end-of-sentence punctuation such as a full stop (dot), question mark or exclamation mark.
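The effect can be reproduced outside Sketch Engine with a toy matcher. The sketch below is only an illustration of the reasoning, not how Sketch Engine evaluates CQL: the two sentences, their tags and the function noun_gap_verb are invented for the example. It imitates a noun, up to two non-specific tokens and a verb, with and without the sentence restriction.

```python
# Toy corpus: (word, tag, sentence number). The sentences and tags are invented;
# tags only follow the N.* / V.* naming used in the queries on this page.
TOKENS = [
    ("We", "PP", 0), ("saw", "VVD", 0), ("the", "DT", 0), ("dog", "NN", 0), (".", "SENT", 0),
    ("Run", "VV", 1), ("home", "RB", 1), ("now", "RB", 1), (".", "SENT", 1),
]


def noun_gap_verb(tokens, max_gap=2, within_sentence=False):
    """Imitate [tag="N.*"] []{0,max_gap} [tag="V.*"], optionally within one sentence."""
    hits = []
    for i, (_, tag, sent) in enumerate(tokens):
        if not tag.startswith("N"):
            continue
        for gap in range(max_gap + 1):
            j = i + gap + 1  # position of the candidate verb
            if j >= len(tokens):
                break
            _, end_tag, end_sent = tokens[j]
            if not end_tag.startswith("V"):
                continue
            # The hit is contiguous, so comparing first and last token is enough.
            if within_sentence and end_sent != sent:
                continue
            hits.append(" ".join(word for word, _, _ in tokens[i:j + 1]))
    return hits


print(noun_gap_verb(TOKENS))                        # ['dog . Run']
print(noun_gap_verb(TOKENS, within_sentence=True))  # []
```

Without the restriction, the full stop is consumed by the non-specific token and the hit spans two sentences. With the restriction, that hit is discarded.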

Examples

Subject + predicate (verb)
A rather simplistic query for subject and predicate might search for a noun followed by a verb:
[tag="N.*"] [tag="V.*"]
However, it will return only cases where the noun is followed immediately by the verb. Normally, subjects are at a certain distance from the verb, so this might give a more desirable result:
[tag="N.*"] []{0,2} [tag="V.*"]
You will, however, see many cases where the noun is at the end of one sentence and the verb at the beginning of another. To avoid this, put within <s/> at the end:
[tag="N.*"] []{0,2} [tag="V.*"] within <s/>
To exclude commas, brackets and the words and, or, to between the noun and the verb, and to exclude verbs ending in -ing, fine-tune the query like this:
[tag="N.*"] [tag!=",|\(|\)" & word!="and|or|to"]{0,2} [tag="V.*" & word!=".*ing"] within <s/>
Verb + object
This is analogous to the above. The simple query with a verb followed by a noun will return lots of noise:
[tag="V.*"] [tag="N.*"]
The object usually appears at a certain distance from the verb, with articles or adjectives in between:
[tag="V.*"] []{0,2} [tag="N.*"]
To make sure the result is found inside the same sentence and not in two consecutive sentences, add within <s/> to the end of the query.
[tag="V.*"] []{0,2} [tag="N.*"] within <s/>
The object should not have any prepositions in front of it. This is how to prevent prepositions between the verb and the noun:
[tag="V.*"] [tag!="IN"]{0,2} [tag="N.*"] within <s/>
Complex subjects and objects
To search for complex subjects and objects, use some of the queries above and replace the token [tag="N.*"] with one of the noun phrase queries from the start of this page. It is a good idea to enclose the noun phrase query in round brackets (): replace [tag="N.*"] with () and then paste the noun phrase query inside the brackets.
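The replacement can also be done mechanically. This is a minimal sketch using the first noun phrase query on this page and the subject + predicate query. The helper name with_noun_phrase is hypothetical, and the result is just a string to paste into the CQL box.

```python
# Hypothetical helper: swap the single-noun token for a bracketed noun phrase query.
NOUN = '[tag="N.*"]'
NOUN_PHRASE = '[tag="J.*"]{0,3} [tag="N.*"]{2,4}'  # first noun phrase query on this page


def with_noun_phrase(query, noun_phrase=NOUN_PHRASE):
    """Replace each [tag="N.*"] token in query with (noun_phrase)."""
    # str.replace does not rescan inserted text, so the nouns inside the
    # noun phrase itself are not replaced again.
    return query.replace(NOUN, f"({noun_phrase})")


subject_query = '[tag="N.*"] []{0,2} [tag="V.*"] within <s/>'
print(with_noun_phrase(subject_query))
```

The round brackets keep the noun phrase together as one unit, so the rest of the query applies to the whole phrase.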
