Build / tailor subsequence detection for each machine translation engine #19

@divec

To apply annotations to MT output, we need to find the translations of substrings. Currently we rely on the uppercasing trick in ll.Translator#translate (inside getPlexGroup).

That only works if:

  • both source and target languages distinguish uppercase / lowercase; and
  • the MT engine reliably maps the uppercasing to the right place; and
  • case changes fairly reliably have no other effect on the translation.
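
As a rough illustration of the trick and of how those conditions can be checked, here is a minimal Python sketch. It is not the code in getPlexGroup: `find_by_uppercasing` and `toy_translate` are hypothetical names, and the toy translator only stands in for a real MT engine. The idea is to translate the sentence twice, once unchanged and once with the substring uppercased, and to accept the result only when the two outputs differ in case alone.

```python
from typing import Callable, Optional, Tuple


def find_by_uppercasing(
    source: str, start: int, end: int, translate: Callable[[str], str]
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) character range in the translation that
    corresponds to source[start:end], or None if the trick fails."""
    plain = translate(source)
    marked = translate(source[:start] + source[start:end].upper() + source[end:])
    # The trick is only trustworthy if uppercasing changed nothing but case.
    if len(plain) != len(marked) or plain.lower() != marked.lower():
        return None
    diffs = [i for i, (a, b) in enumerate(zip(plain, marked)) if a != b]
    if not diffs:
        return None  # the engine dropped the uppercasing
    # Limitation: characters that were already uppercase at the edges of
    # the span are not detected, so the range can come out too short.
    return diffs[0], diffs[-1] + 1


def toy_translate(text: str) -> str:
    """Hypothetical stand-in for an MT engine: word-by-word lookup that
    carries an all-uppercase word through to the output."""
    table = {"a": "una", "big": "grande", "house": "casa"}
    out = []
    for word in text.split():
        translated = table.get(word.lower(), word.lower())
        out.append(translated.upper() if word.isupper() else translated)
    return " ".join(out)
```

With the toy engine, `find_by_uppercasing("a big house", 2, 5, toy_translate)` gives a range covering "grande" in "una grande casa". An engine that ignores case, or that rewrites the sentence when case changes, makes the function return None instead of a wrong range.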

These conditions mostly hold for Apertium language pairs, but other engines need different techniques. Some engines can directly provide subsequence correspondence mappings, or have an HTML mode that maps tags very reliably. Others will need probabilistic tricks, such as translating the subsequence on its own and fuzzily matching the result against the target sentence, in some cases in a language-specific way.
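
One possible shape for the fuzzy-matching fallback, sketched in Python under some assumptions: the subsequence has already been translated in isolation, words are split on whitespace (which will not suit every language), and similarity is measured with the standard library's `difflib.SequenceMatcher`. The function name `fuzzy_find` and the 0.6 threshold are illustrative choices, not part of the existing code.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple


def fuzzy_find(
    target: str, fragment: str, min_ratio: float = 0.6
) -> Optional[Tuple[int, int, float]]:
    """Find the window of words in `target` most similar to `fragment`.

    Returns (first_word_index, end_word_index, ratio), or None when no
    window reaches `min_ratio`.
    """
    words = target.split()
    n = len(fragment.split())
    best: Optional[Tuple[int, int, float]] = None
    best_ratio = 0.0
    # Allow the window to be one word shorter or longer than the fragment,
    # since the in-context translation may add or drop a word.
    for size in range(max(1, n - 1), n + 2):
        for i in range(len(words) - size + 1):
            candidate = " ".join(words[i:i + size])
            ratio = SequenceMatcher(
                None, candidate.lower(), fragment.lower()
            ).ratio()
            if ratio > best_ratio:
                best_ratio = ratio
                best = (i, i + size, ratio)
    if best is None or best_ratio < min_ratio:
        return None
    return best
```

For example, if the fragment translated alone comes back as "die kleinen Katze" while the full sentence contains "die kleine Katze", the best window is still the three words of the in-sentence phrase. The threshold and the tokenisation are exactly the parts that would likely need tuning per engine and per language.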
