document
A document (called a file in old corpora) is a generic name used in Sketch Engine to refer to any file, document or webpage the corpus is made up of.
If a user uploads files (such as .doc, .pdf or .txt), each file becomes a corpus document. If the user downloads content from the web, each web page becomes a corpus document.
The beginning and end of each document are automatically marked with a structure, most typically <doc></doc>, but certain corpora may use a different convention, such as the British National Corpus, which uses <bncdoc></bncdoc>. The structure used by a particular corpus can be checked on the corpus info page.
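To illustrate how these structures delimit documents, here is a minimal Python sketch that splits a piece of marked-up source text into its documents. It is not part of Sketch Engine; the function name split_documents and the sample text are assumptions made for illustration, and the tag name is a parameter because corpora may use a structure other than <doc>.

```python
import re


def split_documents(text, tag="doc"):
    """Return the text enclosed by each <tag ...>...</tag> pair.

    Hypothetical helper for illustration only. The opening tag may
    carry attributes, e.g. <doc id="1">.
    """
    pattern = re.compile(
        r"<{0}(?:\s[^>]*)?>(.*?)</{0}>".format(re.escape(tag)),
        re.DOTALL,
    )
    # findall returns the captured body of every matching structure
    return [body.strip() for body in pattern.findall(text)]


sample = "<doc>\nFirst page.\n</doc>\n<doc>\nSecond page.\n</doc>\n"
documents = split_documents(sample)
print(len(documents), "documents found")
```

For a corpus that follows the British National Corpus convention, the same function could be called with tag="bncdoc".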
A corpus can also be divided into documents by manually inserting document structures into the source text.
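As a rough sketch of what manual insertion could look like, the Python snippet below wraps each source text in a <doc> structure before the combined text is uploaded. The function wrap_documents and the name attribute are hypothetical, and escaping markup characters in the body is an assumption about how stray angle brackets should be handled, not documented Sketch Engine behaviour.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_documents(texts, tag="doc"):
    """Wrap each (name, body) pair in <tag name="...">...</tag>.

    Hypothetical helper: the 'name' attribute is illustrative only.
    """
    parts = []
    for name, body in texts:
        # quoteattr adds the surrounding quotes and escapes the value
        parts.append("<{} name={}>".format(tag, quoteattr(name)))
        # Assumption: escape &, < and > so the body cannot be
        # mistaken for a structure
        parts.append(escape(body.strip()))
        parts.append("</{}>".format(tag))
    return "\n".join(parts) + "\n"


source = wrap_documents([
    ("intro.txt", "Text of the first document."),
    ("notes.txt", "Text of the second document."),
])
print(source)
```

Each pair in the input list ends up as one corpus document in the resulting text, with its boundaries marked explicitly rather than inferred from file boundaries.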