idTenTen – Indonesian corpus from the web
idTenTen: Corpus of the Indonesian Web. The Indonesian Web Corpus (idTenTen) is an Indonesian corpus made up of texts collected…
…Tagset Indonesian tagset is available in Indonesian corpora annotated by the tool TreeTagger (with the Indonesian parameter file) developed by…
idWaC: Indonesian web corpus. The Indonesian web corpus (idWaC) is an Indonesian corpus made up of texts collected from the…
…the CQL concordance search box: [tag="" & morph=""] searches for cardinal numerals. Indonesian and Malaysian_Previous morphology – Apertium. Source: http://wiki.apertium.org/wiki/Indonesian_and_Malaysian/Previous_morphology…
…words; Indonesian Web (IndonesianWaC), trial, 90,120,046; Indonesian Web 2020 (idTenTen20), main, 3,687,192,045; Indonesian Web 2024 (idTenTen24), trial, 7,108,841,939; OpenSubtitles 2018…
…vietnamese, turkish, chinese-traditional, hindi, telugu, czech, finnish, croatian, italian, swedish, danish, indonesian, chinese-simplified, malayalam, bengali, spanish, estonian, german, arabic, hebrew,…
…from the Internet. This Malay corpus includes varieties of the Malay language used in Brunei, Malaysia, and Singapore. The Indonesian…
…Galician, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Norwegian, Persian…
…deWaC (sdeWaC)), Greek (gkWaC), Gujarati (guWaC) H Hausa (haWaC), Hebrew (hebWaC), Hindi (hindiWaC) I Igbo (igWaC), Indonesian (idWaC), Italian…
…Chinese, Indonesian, Japanese, Korean, etc. The current texts of the OpenSubtitles corpora date back to 2018. UNPC The United Nations…
…English tagsets Estonian tagsets Finnish tagsets French tagsets German tagsets Greek tagsets Hebrew tagsets Hindi tagset Hungarian tagsets Indonesian tagset…