The Sketch Engine interface can be translated into any other language. To do this we need to translate all the interface strings into the particular language.
Sketch Engine uses the simple and popular gettext translation system. The strings are stored in a file called ske.po (this is only an example file; for the up-to-date version, request it from email@example.com), which is generated automatically from the program code to keep it up to date. It is a plain text file in which the translator fills in the translations: the original English term is labelled ‘msgid’, and the translation belongs between the quotes after ‘msgstr’. The ske.po file is encoded in UTF-8. When finished, the translator should send the resulting file to the developers (firstname.lastname@example.org) so that it can be added to the system.
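To make the msgid/msgstr layout concrete, here is a minimal Python sketch that reads .po-formatted text and lists the entries that still lack a translation. The sample text, the `#:` reference comments and the Czech translation are illustrative assumptions and are not taken from the real ske.po. The parser is deliberately simplified (no plural forms, no msgctxt, no escape handling).

```python
# Hypothetical sample in the shape of a gettext .po file.
SAMPLE_PO = '''\
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: view.py
msgid "Subcorpus"
msgstr "Subkorpus"

#: view.py
msgid "Shuffle"
msgstr ""
'''


def parse_po(text):
    """Return a list of (msgid, msgstr) pairs found in .po-formatted text."""
    entries = []
    msgid, msgstr, field = None, None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            # A new msgid closes the previous entry, if there was one.
            if msgid is not None:
                entries.append((msgid, msgstr or ""))
            msgid, msgstr, field = line[6:].strip()[1:-1], None, "id"
        elif line.startswith("msgstr "):
            msgstr, field = line[7:].strip()[1:-1], "str"
        elif line.startswith('"') and field == "id":
            msgid += line[1:-1]    # continuation line of a long msgid
        elif line.startswith('"') and field == "str":
            msgstr += line[1:-1]   # continuation line of a long msgstr
    if msgid is not None:
        entries.append((msgid, msgstr or ""))
    return entries


def untranslated(entries):
    """Return the msgids whose msgstr is still empty."""
    # The entry with an empty msgid is the file header, not a UI string.
    return [msgid for msgid, msgstr in entries if msgid and not msgstr]


if __name__ == "__main__":
    print(untranslated(parse_po(SAMPLE_PO)))
```

For real work a dedicated .po editor or library is the safer choice; the sketch only shows where the translation goes.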
Localization of local installation
Once the file is translated (keep the UTF-8 encoding and the plain text format), use msgfmt to convert the .po file into a .mo file and copy it into the proper directory (something like /usr/share/locale/…). You also need to add a link to the template document.tmpl (look for “switch_language”) and then recompile the template with the Cheetah template system:
cheetah compile -R --idir=templates/ --odir=cmpltmpl document
(run this in the directory containing the two directories)
and it should work (you may also need to restart Apache).
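The conversion step can also be scripted. The following Python sketch wraps the msgfmt call; the function names are hypothetical, and the file names ske.po and ske.mo are assumptions based on the text above. Copying the result into the locale directory is left out because the target path depends on the installation.

```python
import shutil
import subprocess
from pathlib import Path


def msgfmt_command(po_path, mo_path):
    """Build the msgfmt call that compiles a .po file into a .mo file."""
    # --check asks msgfmt to validate the catalog before writing the output.
    return ["msgfmt", "--check", "-o", str(mo_path), str(po_path)]


def compile_catalog(po_path, mo_path):
    """Run msgfmt if it is installed; return True on success."""
    if shutil.which("msgfmt") is None:
        print("msgfmt not found; install GNU gettext first")
        return False
    result = subprocess.run(msgfmt_command(po_path, mo_path))
    return result.returncode == 0


if __name__ == "__main__":
    po = Path("ske.po")  # assumed file name, see the text above
    if po.exists():
        compile_catalog(po, po.with_suffix(".mo"))
```

Running msgfmt with --check before deploying catches malformed entries, such as an unbalanced quote, before they reach the interface.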
There are a number of freely available GUI editors for gettext .po files, e.g. Lokalize or Gtranslator, which are available for both Linux and Windows platforms.
How to find a particular string in the interface
Each translation string in ske.po is accompanied by the name of the file in which the string occurs in the system (e.g. “wsdiff_form.py”), which serves as a guide to where the string can be found in the interface. The filename is usually an abbreviation of the function name as it appears in the interface (wsdiff stands for the Word Sketch Difference feature).
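As a rough illustration, the Python sketch below groups msgids by the file named next to them. It assumes the file names appear in standard gettext `#:` reference comments, which may not match the real ske.po exactly. The sample entries, including the line number, are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample; the real ske.po may format its references differently.
SAMPLE_PO = '''\
#: wsdiff_form.py
msgid "Reference corpus"
msgstr ""

#: view.py:42
msgid "of"
msgstr ""

#: view.py
msgid "Shuffle"
msgstr ""
'''


def msgids_by_file(text):
    """Map each referenced source file to the msgids that mention it."""
    refs = []                  # files named in the "#:" comments above an entry
    index = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#:"):
            # A reference may look like "view.py" or "view.py:120".
            refs.extend(ref.split(":")[0] for ref in line[2:].split())
        elif line.startswith("msgid "):
            msgid = line[6:].strip()[1:-1]   # single-line msgids only
            for name in refs:
                index[name].append(msgid)
            refs = []
    return dict(index)


if __name__ == "__main__":
    for filename, msgids in msgids_by_file(SAMPLE_PO).items():
        print(filename, msgids)
```

Listing all strings that share a file can help a translator work through one part of the interface at a time.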
Some strings are not worth looking for in the interface: they occur rarely or not at all, because they are either rare program exceptions or come from specialised applications of the Sketch Engine that regular users do not have access to. In such cases, we suggest skipping the string (the English term will then remain untranslated in the interface).
Some other strings may be difficult to find. Usually, they are in titles of some links (popup help) or in applications that the translator is not familiar with. Here we provide a list of strings that were problematic for translators with a short explanation of where the particular string can be found (the list may be updated in the future):
> msgid "Collocation form"
> msgid "Collocation candidates"
> msgid "In the range from"
> msgid "Show functions"
on page “Collocation candidates” – after creating a concordance and clicking “Collocations”
> msgid "Showing page"
> msgid "<< First"
> msgid "< Previous"
> msgid "Next >"
in the case of longer wordlists, there is simple paging at the bottom of the page
> msgid "Subcorpus"
everywhere you can select a subcorpus, e.g. in the Concordance form -> text types
> msgid "View current concordance"
> msgid "View concordance"
these two appear in the new design (beta.sketchengine.co.uk) instead of the old “View” in the various concordance functions (e.g. freqs)
> msgid "Show frequencies of Node forms"
popup help for a link in the concordance submenu
> msgid "ConcDesc"
abbreviation for “Concordance description” – gives the formal description of the concordance
> msgid "info"
subcorpus info (everywhere where you can see subcorpora)
> msgid "unspecified"
the default value of PoS in the concordance form
> msgid "Phrase"
> msgid "Word Form"
> msgid "Concordance - First query form"
in the concordance form (the third is the title)
> msgid "Simple"
> msgid "Expert options"
in the concordance form (new version)
> msgid "Set limit"
in the frequency lists (e.g. concordance -> frequency -> node tags)
> msgid "Header Fields"
this lists the structure attributes (e.g. doc.author) of the corpus
> msgid "Numbers stand for"
switching between document/token counts in listing structure attributes
> msgid "#"
means “number of”, e.g. tokens, docs (in listing structure attributes)
> msgid "freq"
abbreviation of “frequency” – in wsketch, thesaurus results
> msgid "Sample form"
concordance -> sample (page title)
> msgid "Save Collocations"
> msgid "Save Collocation Candidates"
collocations list -> save
> msgid "Concordance lines"
> msgid ">> More"
> msgid "other"
new version: result of using the “simple search” in the upper right box
> msgid "of" (in view.py)
this shows up as “(page) X of XYZ”
> msgid "Search"
> msgid "in"
> msgid "Help"
new version — topbar
> msgid "View options"
> msgid "Shuffle"

> msgid "Attributes"
> msgid "Structures"
> msgid "References"
> msgid "Display attributes"
> msgid "Number of lines to be sorted"
after clicking “View options”
> msgid "Concordance view settings"
> msgid "Swap concordance view mode"
> msgid "Sort according to the 'References' field"
> msgid "Randomize ordering of lines"
popup help for the concordance submenu in the new version
> msgid "Compute a list of all words"
wordlist submenu popup help
> msgid "Reference corpus"
> msgid "Reference subcorpus"
> "Subcorpora operations need more preprocessing. It will take a while, please "
> "wait."
> "Once the list has been computed, it will be stored, so will be immediately "
> "available next time."
> "same subcorpus will show you progress of the computation."
These messages appear when you create a subcorpus and then request a wordlist or keywords for that subcorpus for the first time
> msgid "ARF"
> msgid "ARF/mill"
Appears in keywords result — “mill” is a million (freq per million words)
> msgid "Switch between clustered and unclustered word sketches"
mouseover in wsketch submenu
> msgid "word frequency"
> msgid "cluster frequency"
mouseovers in wsketch tables