XLIFF support in Sketch Engine

Sketch Engine users can now create their own corpora from texts in the XLIFF file format, a format used in localization and a standard for CAT tools.

Uploading an XLIFF file creates a multilingual corpus. The corpus can then be used for bilingual term extraction, which generates glossaries with translations, or searched with parallel search to look up translations.

Supported XLIFF versions: 2.0 and later.
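To show what such a file contains, here is a minimal Python sketch (not Sketch Engine's own importer) that reads an XLIFF 2.0 document with the standard library and pairs each source segment with its target, the kind of segment-level alignment a parallel corpus relies on. The sample document and the helper name `extract_segment_pairs` are hypothetical; only the element names (`file`, `unit`, `segment`, `source`, `target`), the `srcLang`/`trgLang` attributes and the namespace come from the OASIS XLIFF 2.0 specification.

```python
import xml.etree.ElementTree as ET

# Core namespace defined by the OASIS XLIFF 2.0 specification
XLIFF2_NS = {"x": "urn:oasis:names:tc:xliff:document:2.0"}

# Hypothetical sample file: two translation units, English -> French
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en" trgLang="fr">
  <file id="f1">
    <unit id="u1">
      <segment>
        <source>The corpus is ready.</source>
        <target>Le corpus est prêt.</target>
      </segment>
    </unit>
    <unit id="u2">
      <segment>
        <source>Upload the file.</source>
        <target>Téléversez le fichier.</target>
      </segment>
    </unit>
  </file>
</xliff>"""


def extract_segment_pairs(xliff_text):
    """Return (source language, target language, list of (source, target) pairs)."""
    root = ET.fromstring(xliff_text)
    src_lang = root.get("srcLang")
    trg_lang = root.get("trgLang")
    pairs = []
    for segment in root.iterfind(".//x:segment", XLIFF2_NS):
        source = segment.find("x:source", XLIFF2_NS)
        target = segment.find("x:target", XLIFF2_NS)
        if source is None or target is None:
            continue  # skip untranslated segments: there is nothing to align
        # itertext() also collects text nested inside inline markup elements
        pairs.append(("".join(source.itertext()).strip(),
                      "".join(target.itertext()).strip()))
    return src_lang, trg_lang, pairs


if __name__ == "__main__":
    src, trg, pairs = extract_segment_pairs(SAMPLE)
    for s, t in pairs:
        print(f"{src}: {s}\t{trg}: {t}")
```

Skipping segments without a `<target>` is a design choice for this sketch: an untranslated segment has no counterpart to align, so it would add nothing to a bilingual glossary or a parallel search.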

Screenshot of word sketch from frTenTen French corpus
Amharic corpus
news: parallel corpora

N'ko corpus

Sketch Engine CQL calendar
Audio recordings for the British National Corpus (BNC)

BNC audio

improved functionality for Bulgarian text
improved Thai support
Logo of SDL – Sketch Engine SDL Trados Studio plugin
Logo of Sketch Engine – a tool for discovering how language works

Prices for Academic Individual Users

map_period = {"year" : 12, "quarter"…

example 3 python

This is a Python example for basic HTTP authentication on local…

Dutch Web Corpus

This corpus was created within the Corpus Factory project as…

CLAWS tagset - mapping file

C8 to C7 mapping file. NS 2011-5-14. APPGE -> APPGE: possessive…

Feed Corpus Project

The FCP corpus aims to be a million-word-per-day collection of POS-tagged…

My jobs

The My jobs (job runner) feature shows your long-running tasks and…

The New Corpus for Ireland | Nua-Chorpas na hÉireann

The New Corpus for Ireland – user’s guide. Welcome…

Icelandic sample corpus

This is a small corpus of Icelandic texts prepared for the Sketch…

Renaming Sketch Grammar relations

Change to the directory that contains the compiled corpus files: cd…

Adding sentence boundaries to a compiled corpus

This document explains how structures, such as documents, paragraphs,…

Compatibility Matrix

This page provides a compatibility matrix of Sketch Engine components…

Sketch Engine API for IntelliWebSearch

Sketch Engine is a corpus manager tool offering many corpus linguistics…

Preloaded Configuration Templates

When you create a corpus from the Sketch Engine interface (see…

Building sketches from parsed corpora

Introduction: Sketch Engine usually generates word sketches using…

Word Sketches definition files

The following files can be used for building word sketches in…

Word Sketch Index Format

This page is a brief overview of the development of the word…

Highlight Only Part of a Complex Query

I want to align a concordance according to a part of the query.…

Search Punctuation

To search for punctuation as well as words, insert the punctuation…

Compare corpora using word lists

To compare two preloaded corpora, open the focus corpus and…

Distinguish Between Lemmas

To look at different lemmas with the same spelling but different…

How do I…?

This page lists possible tasks that a Sketch Engine user might…

Sketch Engine Localisation

The Sketch Engine interface can be translated into any other…

JSON API - creating query

Sketch Engine uses an HTTP REST API. All API methods (unless stated…

Full Administration

This feature is available only for local installations (see the…

Text Types, Headers and Subcorpora

Overview: When studying a word, phrase, or grammatical construction,…

Preparing Corpus Text

The input format is "vertical" or "word-per-line (WPL)" text,…
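As a rough illustration of the vertical (word-per-line) layout, the Python sketch below writes one token per line with tab-separated attributes and wraps sentences and documents in XML-like structure tags. The helper `to_vertical`, the sample tags, and the word/tag/lemma column order are assumptions for illustration; in a real corpus the column order is whatever the corpus configuration (registry) file declares.

```python
import html


def to_vertical(doc_attrs, sentences):
    """Render one document in a vertical (word-per-line) layout.

    doc_attrs: dict of <doc> attributes, e.g. {"id": "d1"}.
    sentences: list of sentences, each a list of (word, tag, lemma) tuples.
    The (word, tag, lemma) column order is an assumption of this sketch.
    """
    # Attribute values are XML-escaped (an assumption about the input rules)
    attrs = " ".join(f'{k}="{html.escape(str(v), quote=True)}"'
                     for k, v in doc_attrs.items())
    lines = [f"<doc {attrs}>" if attrs else "<doc>"]
    for sentence in sentences:
        lines.append("<s>")
        for word, tag, lemma in sentence:
            # One token per line; positional attributes separated by tabs
            lines.append(f"{word}\t{tag}\t{lemma}")
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative Penn-style tags; any tagset works the same way
    print(to_vertical({"id": "d1"},
                      [[("Cats", "NNS", "cat"), ("sleep", "VVP", "sleep")]]), end="")
```

Keeping structures on their own lines, separate from token lines, is what lets document and sentence boundaries be added or queried independently of the token attributes.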

CZES corpus

CZES is a Czech corpus consisting of newspaper articles and magazine…

TED_en corpus

A corpus of transcripts of TED talks. Prepared by Akshay Min…

Scottish Gaelic Wiki corpus

Scottish Gaelic Wikipedia corpus. Downloaded in February 2015.…

Polish Web Corpus (PolishWaC)

Polish web as corpus has 103 million words and the encoding is…

Parallel Corpora Registry Info

General Attribute Set ATTRIBUTE word STRUCTURE s{ ATTRIBUTE…

Internet-ZH corpus

Internet-ZH is a Chinese web corpus collected by Serge Sharoff.…

Project Gutenberg Corpus

downloaded with wget: getting Gutenberg cleaned with…

Fryske Akademy Parallel Corpus

Aligned Frisian and Dutch sentences, not POS-tagged. Dutch…

NepaliWaC corpus

Nepali web corpus downloaded by LCL on Dec 10, 2014. ~1200…

SamoanWaC corpus

Web corpus of Samoan. Created by Bharat Ram Ambati using corpus…

SetswanaWaC corpus

(version 2) The corpus was prepared using the Corpus Factory method.…

SpanishWaC corpus

This corpus was gathered using a list of URLs provided by Serge…

SwedishWaC corpus

The corpus was prepared using the Corpus Factory method. Full details…

SDeWaC corpus

SDeWaC is a subset of DeWaC. The creation of sDeWaC is described…

WelshWaC corpus

The corpus was prepared using the Corpus Factory method by Anil in October…

ThaiWaC corpus

The corpus was prepared using the Corpus Factory method. Full details…

TurkishWaC corpus

The TurkishWaC corpus is a 32 million word collection of samples…

UKWaCsst corpus

UKWaC tagged with SuperSenseTagger (sst-light) described in…

GujarathiWaC corpus

GujarathiWaC, a web-as-corpus collection, is a corpus of the Gujarati language (Indo-Aryan…

Patakis corpus

Patakis is a 100 million word collection of POS-tagged texts…

GeorgianWaC corpus

Original file owner: bharat.

FinnishWaC corpus

Finnish web as corpus.

DanishWaC corpus

The corpus was prepared using the Corpus Factory method. It has 288 million…

Domain Specific Corpora

These corpora are prepared from specific domains, e.g. science,…

ScienceBlog corpus

The ScienceBlogs corpus is a selection of posts and comments…

e-flux corpus

The e-flux corpus is a web corpus of English art news digests.…

Environment corpus

English environment-related web corpus, crawled by SpiderLing…

Filipino web corpus (FilipinoWaC)

The corpus was created by Anil in October 2013. It has almost…

Nineteenthcentury corpus

The 19th-century corpus is only available to Osnabrück…

Penn Historical Corpora

Penn Historical Corpora is a collection of historical English…

Clustering

Clustering can be performed in Sketch Engine on the similar…

Manual for GDEX

To quickly start using Good Dictionary EXamples, see the GDEX…

Syntax of GDEX configuration files

GDEX configuration files are written in YAML (Wikipedia.org).…

Dynamic Attributes

To use dynamic attributes, you have to set them up in …

Corpus Factory Method

A method for developing large general language corpora which…

New Model Corpus

The New Model Corpus is a ~100-million-word domain corpus built…

LEXMCI

The 1.7 billion word LEXMCI corpus of English was created by…

Corpus configuration example

If your vertical text contains only words and no annotation,…

Preparing a Text Corpus for Sketch Engine: Overview

This page describes how to prepare a text corpus for indexing…

Sketch Engine Video Tutorials

All videos are also available on our YouTube channel. Please…

Compiling corpus

You need to prepare a vertical file and a registry file before compiling…

Common corpus structures

It is generally practical to divide a corpus into smaller parts…

Scripts for adding header fields

Adding attributes is based on mapping existing structure attributes…

Variation in hit counts

It may seem that you get a different hit count for the…

Adam Kilgarriff: Structured bibliography

(note: written by Adam Kilgarriff on 27th April 2015; see also…
SkE research

Research Agenda

Lexical Computing's research interests lie at the intersection…

Word Sketch highlights

If a noun is usually in the plural, or a verb is usually in the…