It's hard to find out which encoding names will be accepted, and the parsing of those names seems a bit too loose.
The list is at
https://hackage.haskell.org/package/encoding-0.9/docs/src/Data.Encoding.html#encodingFromStringExplicit .
I think each commented group is alternate names for a single encoding, with the first entry being a preferred/canonical form.
Case is ignored.
Wherever an underscore appears, any sequence of non-alphanumeric characters is accepted and ignored. (Perhaps accepting only a single space or hyphen would be enough?)
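To make the two rules above concrete, here is a minimal sketch of how I read the matching, plus the stricter alternative suggested in the parenthesis. This is not the library's code: `normalizeLoose` and `normalizeStrict` are hypothetical names, and I'm assuming the result is compared against underscore-separated, lowercase table entries such as `utf_8`.

```haskell
import Data.Char (isAlphaNum, toLower)

-- Hypothetical sketch of the loose matching described above (not the
-- library's actual code): letters and digits are lowercased, and every
-- run of non-alphanumeric characters collapses to a single underscore.
normalizeLoose :: String -> String
normalizeLoose [] = []
normalizeLoose s@(c:_)
  | isAlphaNum c =
      let (word, rest) = span isAlphaNum s
      in map toLower word ++ normalizeLoose rest
  | otherwise = '_' : normalizeLoose (dropWhile (not . isAlphaNum) s)

-- Hypothetical stricter alternative: only a space, hyphen or underscore
-- may stand in for a separator; any other punctuation rejects the name.
-- Each separator character maps to its own underscore, so a doubled
-- separator ("utf--8") yields "utf__8" and would fail the table lookup.
normalizeStrict :: String -> Maybe String
normalizeStrict = traverse step
  where
    step c
      | isAlphaNum c   = Just (toLower c)
      | c `elem` " -_" = Just '_'
      | otherwise      = Nothing
```

Under the loose rule, `UTF-8`, `utf 8` and `utf.8` all become `utf_8`. Under the strict rule, `koi8.r` is rejected outright while `Shift JIS` still maps to `shift_jis`.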
There are some inconsistencies. Eg,
- `utf8` and `utf16` are accepted but not `utf32`.
- Mac OS Roman seems to be the common name for that encoding but only `macintosh` is accepted.
Here's a list of lowercase "canonical" spellings I came up with; I believe all of these are accepted:
ascii
utf-8
utf-16
utf-32
iso-8859-1
iso-8859-2
iso-8859-3
iso-8859-4
iso-8859-5
iso-8859-6
iso-8859-7
iso-8859-8
iso-8859-9
iso-8859-10
iso-8859-11
iso-8859-13
iso-8859-14
iso-8859-15
iso-8859-16
cp1250
cp1251
cp1252
cp1253
cp1254
cp1255
cp1256
cp1257
cp1258
koi8-r
koi8-u
gb18030
macintosh
jis-x-0201
jis-x-0208
iso-2022-jp
shift-jis
cp437
cp737
cp775
cp850
cp852
cp855
cp857
cp860
cp861
cp862
cp863
cp864
cp865
cp866
cp869
cp874
cp932
I'm not sure it's worthwhile supporting other punctuation, unless the support is systematic, comprehensive and documented.