Build / tailor subsequence detection for each machine translation engine #19

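To apply annotations to MT output, we need to find where each annotated source substring ended up in the translation. Currently we do this with an uppercasing trick in ll.Translator#translate (inside getPlexGroup): uppercase the substring in the source, translate, and look for the uppercased words in the output.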

That only works if:

- both source and target languages distinguish uppercase / lowercase; and
- the MT engine reliably maps the uppercasing to the right place; and
- case changes fairly reliably have no other effect on the translation.
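A minimal sketch of the uppercasing approach, with conditions 2 and 3 above turned into explicit checks. `locate_by_uppercasing`, `LEXICON` and `toy_translate` are hypothetical illustrations, not the code in ll.Translator#translate; the toy word-by-word "engine" only stands in for a real MT call that preserves casing.

```python
import re
from typing import Callable, Optional, Tuple

def locate_by_uppercasing(
    source: str, start: int, end: int, translate: Callable[[str], str]
) -> Optional[Tuple[int, int]]:
    """Hypothetical sketch: return the span of the plain translation that
    corresponds to source[start:end], using the uppercasing trick."""
    plain = translate(source)
    marked = translate(source[:start] + source[start:end].upper() + source[end:])
    # Condition 3: the case change must not alter the translation itself.
    if plain.lower() != marked.lower():
        return None
    # Pair up the tokens of both outputs; a token that is uppercase only in
    # the marked output is assumed to come from the uppercased substring.
    hits = [
        p for p, m in zip(re.finditer(r"\w+", plain), re.finditer(r"\w+", marked))
        if m.group() != p.group() and m.group().isupper()
    ]
    if not hits:
        return None  # the engine dropped the case marking (condition 2 failed)
    return hits[0].start(), hits[-1].end()

# Toy word-by-word "engine" (hypothetical) that carries over each word's casing.
LEXICON = {"the": "el", "cat": "gato", "sleeps": "duerme"}

def toy_translate(text: str) -> str:
    def swap(match: re.Match) -> str:
        word = match.group()
        out = LEXICON.get(word.lower(), word)
        if word.isupper() and len(word) > 1:
            return out.upper()
        return out.capitalize() if word[0].isupper() else out
    return re.sub(r"\w+", swap, text)

print(locate_by_uppercasing("The cat sleeps", 4, 7, toy_translate))  # (3, 7)
```

Here "cat" (source characters 4 to 7) is found as "gato" at characters 3 to 7 of "El gato duerme". An engine that lowercases everything returns no marked tokens, and one whose output changes with case fails the `lower()` comparison; both yield `None`, which is where the techniques below would take over.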

This works well enough for Apertium language pairs, but other engines need different techniques. Some engines can directly provide subsequence correspondence mappings, or have an HTML mode that maps tags almost infallibly. Others will need probabilistic tricks, such as translating the subsequence on its own and fuzzily matching the result in the target sentence, in some cases in a language-specific way.
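A rough sketch of the fuzzy-matching fallback, assuming the subsequence has already been translated on its own (the MT call is left out). It swaps in Python's standard `difflib.SequenceMatcher` as the similarity measure and scans token windows of the target sentence around the fragment's length. `fuzzy_locate` and the `min_ratio` default of 0.6 are hypothetical choices, not tuned values; a language-specific version might compare stems instead of lowercased characters.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Tuple

def fuzzy_locate(
    target: str, translated_fragment: str, min_ratio: float = 0.6
) -> Optional[Tuple[int, int, float]]:
    """Hypothetical sketch: return (start, end, score) of the token window in
    `target` most similar to `translated_fragment`, or None if no window
    scores at least `min_ratio`."""
    tokens = list(re.finditer(r"\w+", target))
    fragment = translated_fragment.lower()
    n = len(re.findall(r"\w+", translated_fragment))
    best = None
    # In context the engine may add or drop a word, so also try windows one
    # token shorter and one token longer than the fragment.
    for size in range(max(1, n - 1), n + 2):
        for i in range(len(tokens) - size + 1):
            start, end = tokens[i].start(), tokens[i + size - 1].end()
            score = SequenceMatcher(None, fragment, target[start:end].lower()).ratio()
            if best is None or score > best[2]:
                best = (start, end, score)
    if best is None or best[2] < min_ratio:
        return None
    return best

target = "El gato duerme sobre una alfombra roja"
start, end, score = fuzzy_locate(target, "la alfombra roja")
print(target[start:end])  # una alfombra roja
```

Translated in isolation, the fragment gets a different article ("la" rather than "una"), but character-level similarity still picks out the right window. Scanning every window is quadratic in sentence length, which should be acceptable for single sentences.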
