mtWaC: Maltese corpus from the web
The Maltese Web Corpus (mtWaC) is a Maltese corpus made up of texts collected from the Internet. The corpus was prepared according to the standards described in the paper "A Corpus Factory for Many Languages" (Kilgarriff et al., LREC 2010).
The data were downloaded in November 2012, and the corpus has a total size of 110 million words. The texts were cleaned and deduplicated.