This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Description
If we assume that the input text is UTF-8 encoded, I think the following line needs to be changed from
morfessor -l $lmDir/zeroth_morfessor.seg -T - -o - \
to
morfessor -e 'utf-8' -l $lmDir/zeroth_morfessor.seg -T - -o - \
This is because morfessor appears to assume the text is ASCII when no encoding is given.
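Since the change relies on the input really being UTF-8, a quick check before segmenting may be useful. The following is only a sketch: the helper name `is_utf8` and the file name `corpus.txt` are made up for illustration and are not part of the repository's scripts, and it assumes `iconv` is available on the system.

```shell
# Hypothetical helper (not from the repository): succeed only if the file
# decodes cleanly as UTF-8. iconv exits non-zero on an invalid byte sequence.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Usage sketch (not run here; corpus.txt and $lmDir are placeholders):
#   is_utf8 corpus.txt && \
#     morfessor -e 'utf-8' -l "$lmDir/zeroth_morfessor.seg" -T - -o - < corpus.txt
```

If the check fails, the file probably needs converting to UTF-8 before it is passed to morfessor with `-e 'utf-8'`.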