CQL for geeks | Sketch Engine

WARNING!

This CQL functionality is primarily meant for development, testing, and very advanced users.
Use it carefully. It is recommended to start with small corpora.

! – complement operator

Since manatee 2.122. An exclamation mark ( ! ) before a position (square brackets) acts as a logical NOT on a corpus range: it is a complement operator that yields the corpus range complementary to its argument.

The following two examples will return:

the whole corpus except for nouns, which are left out as gaps
the parts of the corpus that are not inside any sentence structure. Since all corpus text should normally be within sentence structures, this can help identify incorrect data in the corpus.

![tag="N.*"]
!<s/>

within! X and containing! X are complements with the semantics “within the complement of X” and “containing the complement of X”.

Word sketch seeks

Since manatee 2.84. If you know a particular seek offset in the word sketch data files, the related concordance can be retrieved using:

[ws(level,seek)]

The level can be 0, 1 or 2 for the level of headwords, grammatical relations or collocations, respectively. The seek depends on how the particular corpus was compiled, hence this kind of query is mainly suitable for technical manipulation and combination of word sketch concordances.
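
When such queries are generated by a script, a small helper can reject an invalid level before the query is sent. The sketch below only builds the query text; the function name `ws_seek_query` and the check that the seek is non-negative are assumptions made for illustration, and the seek value itself has to come from the word sketch data of the corpus in question.

```python
# Hypothetical helper: it only formats the query string, it does not contact Sketch Engine.
WS_LEVELS = {0: "headwords", 1: "grammatical relations", 2: "collocations"}


def ws_seek_query(level: int, seek: int) -> str:
    """Return a [ws(level,seek)] query for a known seek offset."""
    if level not in WS_LEVELS:
        raise ValueError(f"level must be one of {sorted(WS_LEVELS)}, got {level}")
    if seek < 0:
        # Assumption: a seek is a file offset and therefore cannot be negative.
        raise ValueError("seek must be non-negative")
    return f"[ws({level},{seek})]"


if __name__ == "__main__":
    print(ws_seek_query(2, 6543))  # [ws(2,6543)]
```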

Word Sketches: swap & ccoll

Since manatee 2.84. If word sketches are available in the corpus, the following operators can be used in CQL.

swap

Use swap to exchange the KWIC with the selected collocation. The syntax is:

swap (<COLLNUM>, <ONEPOSITION>) 
[swap (1, ws ("car", "modifier", "new"))]

ccoll

Use ccoll to re-label the given collocation. The syntax is:

ccoll (<OLDCOLLNUM>, <NEWCOLLNUM>, <ONEPOSITION>)
[ccoll (1, 2, ws ("car", "modifier", "new"))]

The following query relabels the first collocation as the second:

[ccoll (1, 2, ws ("car", "modifier", "new"))]

The following query relabels collocation 1 as 3 and then back to 1, i.e. it is a no-op:

[ccoll (3, 1, ccoll (1, 3, ws(2, 6543)))]
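
The effect of swap and ccoll can be pictured with a toy model in Python. This is only a sketch of the behaviour described above, not manatee code: the `Line` class is invented here, a concordance line is reduced to a KWIC plus numbered collocations, and it is assumed that the word sketch query yields the headword as KWIC and the collocate as collocation 1.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    """Toy concordance line: the KWIC plus collocations keyed by their number."""
    kwic: str
    colls: dict = field(default_factory=dict)


def swap(collnum: int, line: Line) -> Line:
    """Model of swap: exchange the KWIC with collocation number collnum."""
    colls = dict(line.colls)
    old_coll = colls[collnum]
    colls[collnum] = line.kwic
    return Line(kwic=old_coll, colls=colls)


def ccoll(old: int, new: int, line: Line) -> Line:
    """Model of ccoll: give collocation number old the new number new."""
    colls = dict(line.colls)
    colls[new] = colls.pop(old)
    return Line(kwic=line.kwic, colls=colls)


if __name__ == "__main__":
    # Assumed result of ws("car", "modifier", "new") for one hit.
    hit = Line(kwic="car", colls={1: "new"})
    print(swap(1, hit))                       # KWIC and collocation 1 change places
    print(ccoll(1, 2, hit))                   # collocation 1 is now collocation 2
    print(ccoll(3, 1, ccoll(1, 3, hit)) == hit)  # True: relabelling there and back
```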

Searching for position numbers

Since manatee 2.84. Use [#POSITION] to find a specific token in the corpus.

[#100]	finds the 100th token; the concordance consists of 1 line
[#100 | #210]	finds the 100th and the 210th token; the concordance contains 2 lines
[#100-210]	finds the tokens at positions 100-210; the concordance contains 111 lines
[!#100-210]	finds all tokens except the 111 tokens at positions 100-210; the concordance has as many lines as there are tokens in the corpus minus 111
![#100-210]	is the complement of the tokens at positions 100-210; the concordance consists of 2 lines: the first covers the tokens at positions 1–99, the second the tokens from position 211 to the end of the corpus
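
The difference between the last two rows (negation inside the brackets versus the complement operator in front of them) can be illustrated with a short Python model. It is a sketch under stated assumptions: positions are numbered from 1 as in the table, the corpus size of 1000 tokens is invented, and each concordance line is represented as a (first, last) pair of positions.

```python
def token_complement(corpus_size: int, first: int, last: int) -> list:
    """Model of [!#first-last]: every token outside the range, one line per token."""
    return [(pos, pos) for pos in range(1, corpus_size + 1)
            if not first <= pos <= last]


def range_complement(corpus_size: int, first: int, last: int) -> list:
    """Model of ![#first-last]: the stretches not covered by the range, one line each."""
    lines = []
    if first > 1:
        lines.append((1, first - 1))
    if last < corpus_size:
        lines.append((last + 1, corpus_size))
    return lines


if __name__ == "__main__":
    size = 1000  # hypothetical corpus size
    print(len(token_complement(size, 100, 210)))  # 889, i.e. 1000 - 111
    print(range_complement(size, 100, 210))       # [(1, 99), (211, 1000)]
```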

The n-th structure

Since manatee 2.38. The following examples refer to:

the 5th document in the corpus
every document in the corpus except the 5th
a range of documents, in this case the 5th to the 10th document in the corpus

<doc #5>
<doc !#5>
<doc #5-10>

+ * with tokens

WARNING!

Only use this with small corpora. The computation can be very time consuming.

The regular expression operators + and * can be applied to tokens, but this is not recommended: the computation can be extremely time-consuming, especially with large corpora.

Instead, use curly brackets { } for repetition and [ ] { } for distance between tokens.

avoid	recommended
`[tag="N.*"]*`	`[tag="N.*"]{0,10}`
`[tag="N.*"]+`	`[tag="N.*"]{1,10}`

Limit

When [ ]+ or [ ]* is used, a limit of 100 repetitions is applied, so the query behaves as [ ]{1,100} or [ ]{0,100}, respectively. The limit applies regardless of the criteria inside the square brackets.
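
To make the limit concrete, the following Python sketch rewrites + and * after a position into the bounded form described above. It is a plain text transformation written for illustration, not the way manatee applies the limit internally; the function name `apply_repetition_limit` is made up, and the scanner assumes attribute values are enclosed in double quotes so that a + or * inside a regular expression is left alone.

```python
def apply_repetition_limit(query: str, limit: int = 100) -> str:
    """Rewrite ]+ and ]* outside quoted values as ]{1,limit} and ]{0,limit}."""
    out = []
    in_string = False
    i = 0
    while i < len(query):
        ch = query[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(query):
                # Copy an escaped character verbatim so \" does not end the string.
                out.append(query[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "]" and i + 1 < len(query) and query[i + 1] in "+*":
            low = 1 if query[i + 1] == "+" else 0
            out.append("]{%d,%d}" % (low, limit))
            i += 1  # skip the + or * that was just rewritten
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(apply_repetition_limit('[tag="N.*"]+'))   # [tag="N.*"]{1,100}
    print(apply_repetition_limit('[]*'))            # []{0,100}
```

Passing a smaller limit, for example `apply_repetition_limit(query, 10)`, produces the kind of explicit bound recommended in the table above.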

Terms

Since manatee 2.133. This operator searches for terms identified by the rules in the term grammar. General syntax: [term(regular_expression)]

Examples:

[term("award title")] finds occurrences of the term “award title”.
[term(".* title")] finds terms such as paper title, book title, page title…
[term(".* WIFI")] finds nothing.
[term(".* wifi")] finds free wifi, on-board WIFI, internal WiFi…

In some (mostly old) corpora, individual words within the term are connected with ‘_’ and the term has a suffix ‘-x’. This means the example above would look like [term("award_title-x")].
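
If queries have to work against both kinds of corpora, the two spellings can be produced by one helper. The Python sketch below only follows the convention just described (words joined with ‘_’ and a ‘-x’ suffix); the function name `term_query` and the `legacy` flag are invented for this example, and whether a given corpus needs the older form has to be checked in the corpus itself.

```python
def term_query(term: str, legacy: bool = False) -> str:
    """Build a [term("...")] query; legacy=True uses the older word_word-x form."""
    if legacy:
        # Older corpora: words joined with '_' and the whole term suffixed with '-x'.
        term = "_".join(term.split()) + "-x"
    return f'[term("{term}")]'


if __name__ == "__main__":
    print(term_query("award title"))               # [term("award title")]
    print(term_query("award title", legacy=True))  # [term("award_title-x")]
```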

within/containing NUMBER

Since manatee 2.28. General notation: within/containing NUMBER
Both the within and the containing queries support the shortcut within/containing NUMBER, which expands to within/containing []{NUMBER}.

The following example searches for strings of 4-10 words.

[tag="J.*"]{1,2}[tag="N.*"]{1,5}[tag="V.*"][tag="R.*"]{1,2}

Adding within 5 at the end of the query restricts the results to strings of 4-5 words:

[tag="J.*"]{1,2}[tag="N.*"]{1,5}[tag="V.*"][tag="R.*"]{1,2} within 5
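
The word counts quoted above follow from adding up the repetition bounds of the four parts of the query. The Python sketch below checks that arithmetic and also shows the textual expansion of the shortcut; both function names are invented for illustration, the expansion handles only a shortcut at the very end of the query, and none of this reflects how manatee evaluates the query internally.

```python
import re


def expand_within_number(query: str) -> str:
    """Expand a trailing 'within N' or 'containing N' into the []{N} form."""
    return re.sub(r"\b(within|containing)\s+(\d+)\s*$", r"\1 []{\2}", query)


def length_bounds(quantifiers, within=None):
    """Return (shortest, longest) match length, or None if nothing can match.

    quantifiers is a list of (min, max) repetition bounds, one per position.
    """
    low = sum(lo for lo, _ in quantifiers)
    high = sum(hi for _, hi in quantifiers)
    if within is not None:
        high = min(high, within)  # a match cannot be longer than its container
    return (low, high) if low <= high else None


if __name__ == "__main__":
    # Adjectives {1,2}, nouns {1,5}, one verb, adverbs {1,2}
    parts = [(1, 2), (1, 5), (1, 1), (1, 2)]
    print(length_bounds(parts))            # (4, 10)
    print(length_bounds(parts, within=5))  # (4, 5)
    print(expand_within_number('[tag="V.*"] within 5'))  # [tag="V.*"] within []{5}
```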