Original: Em út theo anh cả vào miền Nam. ("The youngest sibling followed the eldest brother to the South.")
coccoc-tokenizer: Em_út theo anh_cả vào miền_Nam.

The tokenizer's output format is similar to that of other existing tools for Vietnamese tokenization: it preserves the original text exactly and marks each multi-syllable token by replacing the spaces between its syllables with underscores.
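To make the convention concrete, here is a minimal Python sketch of the output format only. It is not the tool's own API. It assumes the multi-syllable token boundaries are already known as character spans; `mark_multisyllable` and `restore_original` are hypothetical helper names used for illustration. Because only the spaces inside a token change, punctuation stays attached as in the original text (note `miền_Nam.` above).

```python
def mark_multisyllable(text: str, spans: list[tuple[int, int]]) -> str:
    """Replace spaces inside each (start, end) span with underscores.

    `spans` are hypothetical character offsets of multi-syllable tokens,
    as a tokenizer might report them; all other characters are untouched.
    """
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            if chars[i] == " ":
                chars[i] = "_"
    return "".join(chars)


def restore_original(marked: str) -> str:
    """Undo the marking, assuming the input itself contained no underscores."""
    return marked.replace("_", " ")


if __name__ == "__main__":
    text = "Em út theo anh cả vào miền Nam."
    # Illustration only: locate each known multi-syllable word by its first
    # occurrence; a real tokenizer decides these boundaries itself.
    spans = []
    for word in ["Em út", "anh cả", "miền Nam"]:
        start = text.find(word)
        spans.append((start, start + len(word)))
    marked = mark_multisyllable(text, spans)
    print(marked)  # Em_út theo anh_cả vào miền_Nam.
    assert restore_original(marked) == text
```

The round trip through `restore_original` shows why this format is convenient: as long as the source text has no underscores of its own, the original sentence can be recovered losslessly from the tokenized output.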