tokenization

Before a corpus can be searched, its text must first be divided into individual tokens. Tokenization is the automatic process of dividing text into tokens; it is performed by tools called tokenizers.
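As a minimal illustration (this is not Sketch Engine's own tokenizer, which uses much richer, language-specific rules), a simple rule-based tokenizer can be sketched in Python with a regular expression that treats runs of word characters and individual punctuation marks as separate tokens:

```python
import re

def tokenize(text):
    # Naive rule-based tokenization: each run of word characters
    # and each non-space punctuation mark becomes its own token.
    # Real corpus tokenizers also handle abbreviations, clitics,
    # URLs, and language-specific conventions.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Don't stop; it's 9 a.m."))
# → ['Don', "'", 't', 'stop', ';', 'it', "'", 's', '9', 'a', '.', 'm', '.']
```

Note that even this tiny example already raises the design questions real tokenizers must answer, such as whether "Don't" should be one token or three.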
