To apply annotations to MT output, we need to find the translations of substrings. Currently we use the uppercasing trick in ll.Translator#translate (inside getPlexGroup): uppercase the substring in the source, translate, and see which part of the output changes case.
That only works if:
- both source and target languages distinguish uppercase / lowercase; and
- the MT engine reliably maps the uppercasing to the right place; and
- case changes fairly reliably have no other effect on the translation.
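A minimal sketch of the uppercasing trick, to make the three conditions concrete. This is not the code in ll.Translator#translate; `findTargetRangeByUppercasing`, `toyTranslate` and `toyDictionary` are hypothetical names, and the toy word-by-word translator only stands in for a real MT engine.

```javascript
// Hypothetical helper: locate the translation of source.slice(start, end)
// by uppercasing it and checking which part of the MT output changes case.
// `translate` is any function (string) => string standing in for an MT engine.
function findTargetRangeByUppercasing(source, start, end, translate) {
    const plain = translate(source);
    const variant = translate(
        source.slice(0, start) +
        source.slice(start, end).toUpperCase() +
        source.slice(end)
    );
    // Give up if uppercasing changed anything other than letter case
    // (the third condition in the list above).
    if (plain.length !== variant.length ||
        plain.toUpperCase() !== variant.toUpperCase()) {
        return null;
    }
    let first = -1;
    let last = -1;
    for (let i = 0; i < plain.length; i++) {
        if (plain[i] !== variant[i]) {
            if (first === -1) {
                first = i;
            }
            last = i;
        }
    }
    // No difference at all: the engine did not carry the uppercasing over.
    return first === -1 ? null : { start: first, end: last + 1 };
}

// Toy engine (assumption, for illustration only): word-by-word lookup that
// carries an all-uppercase source word over to the target word.
const toyDictionary = { el: 'the', gato: 'cat', duerme: 'sleeps' };
function toyTranslate(text) {
    return text.split(' ').map((word) => {
        const translated = toyDictionary[word.toLowerCase()] || word;
        const isUpper = word !== word.toLowerCase() && word === word.toUpperCase();
        return isUpper ? translated.toUpperCase() : translated;
    }).join(' ');
}

// Where does "gato" (offsets 3..7) end up in "the cat sleeps"?
const range = findTargetRangeByUppercasing('el gato duerme', 3, 7, toyTranslate);
// range is { start: 4, end: 7 }, i.e. "cat"
```

Returning `null` instead of a guess keeps the failure cases visible, so a caller can fall back to another technique.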
The uppercasing trick is mostly fine for Apertium language pairs, but other engines need different techniques. Some engines can directly provide subsequence correspondence mappings, or have an HTML mode that maps tags almost infallibly. Others will need probabilistic tricks, such as translating the subsequence on its own and fuzzily matching the result in the target sentence, in some cases in a language-specific way.
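One possible shape for the fuzzy-matching fallback, as a sketch only. It assumes the subsequence has already been translated on its own, that words are whitespace-separated (which is itself language-specific), and that normalised edit distance is an acceptable similarity measure. The names `editDistance` and `fuzzyFindRange` and the 0.6 threshold are hypothetical.

```javascript
// Levenshtein edit distance between two strings.
function editDistance(a, b) {
    let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const curr = [i];
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
        }
        prev = curr;
    }
    return prev[b.length];
}

// Find the run of whole words in targetSentence that is most similar to
// candidate (the subsequence translated in isolation). Returns null if no
// run reaches minSimilarity.
function fuzzyFindRange(targetSentence, candidate, minSimilarity = 0.6) {
    const words = [];
    const wordPattern = /\S+/g;
    let match;
    while ((match = wordPattern.exec(targetSentence)) !== null) {
        words.push({ start: match.index, end: match.index + match[0].length });
    }
    const needle = candidate.toLowerCase();
    let best = null;
    for (let i = 0; i < words.length; i++) {
        for (let j = i; j < words.length; j++) {
            const start = words[i].start;
            const end = words[j].end;
            const span = targetSentence.slice(start, end).toLowerCase();
            const similarity = 1 -
                editDistance(span, needle) / Math.max(span.length, needle.length, 1);
            if (similarity >= minSimilarity &&
                (best === null || similarity > best.similarity)) {
                best = { start, end, similarity };
            }
        }
    }
    return best;
}

// "black cat" translated alone, matched against the full target sentence:
const found = fuzzyFindRange('The black cats are sleeping', 'black cat');
// found.start is 4 and found.end is 14, covering "black cats"
```

Trying every run of words is quadratic in sentence length, which is tolerable for single sentences. A language-specific variant could swap in a different tokeniser or compare stems instead of surface forms.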