document, validate encoding names #28

@simonmichael

It's hard to find out which encoding names are accepted, and their parsing seems a bit too loose.

The list is at
https://hackage.haskell.org/package/encoding-0.9/docs/src/Data.Encoding.html#encodingFromStringExplicit .

I think each commented group lists alternate names for a single encoding, with the first entry being the preferred/canonical form.

Case is ignored.
Wherever an underscore appears in a listed name, any sequence of non-alphanumeric characters is accepted and ignored. (Perhaps only a single space or hyphen would be enough?)
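
To make the two rules above concrete, here is a rough sketch of how I understand the matching. It is not the library's code: `normalizeName` and `sameEncodingName` are names I made up, and the sketch drops punctuation everywhere rather than only where the listed name has an underscore, so it is slightly looser than the real parser.

```haskell
import Data.Char (isAlphaNum, toLower)

-- | Simplified model of the loose matching (hypothetical helper, not
-- part of the encoding package): lowercase the name and drop every
-- non-alphanumeric character.
-- Assumption: the real parser only skips punctuation where the listed
-- name has an underscore; this sketch skips it everywhere.
normalizeName :: String -> String
normalizeName = map toLower . filter isAlphaNum

-- | Two spellings count as the same name when they normalize equally.
sameEncodingName :: String -> String -> Bool
sameEncodingName a b = normalizeName a == normalizeName b
```

Under this model `sameEncodingName "ISO 8859-1" "iso--8859__1"` holds, which is the kind of looseness I mean.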

There are some inconsistencies, e.g.:

  • utf8 and utf16 are accepted but not utf32.
  • Mac OS Roman seems to be the common name for that encoding but only macintosh is accepted.

Here's a list of lowercase "canonical" spellings I came up with; I believe all of these are accepted:

ascii
utf-8
utf-16
utf-32
iso-8859-1
iso-8859-2
iso-8859-3
iso-8859-4
iso-8859-5
iso-8859-6
iso-8859-7
iso-8859-8
iso-8859-9
iso-8859-10
iso-8859-11
iso-8859-13
iso-8859-14
iso-8859-15
iso-8859-16
cp1250
cp1251
cp1252
cp1253
cp1254
cp1255
cp1256
cp1257
cp1258
koi8-r
koi8-u
gb18030
macintosh
jis-x-0201
jis-x-0208
iso-2022-jp
shift-jis
cp437
cp737
cp775
cp850
cp852
cp855
cp857
cp860
cp861
cp862
cp863
cp864
cp865
cp866
cp869
cp874
cp932

I'm not sure it's worthwhile supporting other punctuation, unless it's systematic/comprehensive/documented.
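
For comparison, a stricter scheme could look roughly like the sketch below. Everything in it is an assumption for illustration: `canonicalNames` is only an excerpt of the list above, and `canonicalize` and `validateName` are hypothetical helpers, not functions from the encoding package. It ignores case, treats a space or underscore as a hyphen, and rejects anything else.

```haskell
import Data.Char (toLower)

-- | Excerpt of the canonical spellings listed above (not the full list).
canonicalNames :: [String]
canonicalNames =
  ["ascii", "utf-8", "utf-16", "iso-8859-1", "cp1252", "koi8-r", "shift-jis"]

-- | Lowercase the input and treat a space or underscore as a hyphen.
-- Any other punctuation is left in place, so it fails the lookup below.
canonicalize :: String -> String
canonicalize = map (sep . toLower)
  where
    sep c
      | c == ' ' || c == '_' = '-'
      | otherwise = c

-- | Hypothetical strict validator: 'Right' with the canonical spelling,
-- or 'Left' with an error message naming the rejected input.
validateName :: String -> Either String String
validateName raw
  | name `elem` canonicalNames = Right name
  | otherwise = Left ("unknown encoding name: " ++ raw)
  where
    name = canonicalize raw
```

With this, `validateName "UTF 8"` gives `Right "utf-8"`, while `utf--8` and `utf8` are both rejected, so the accepted spellings are easy to document.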
