This document describes the character encodings supported by the Minimum Profit text editor and the autodetection tests it performs.
The following steps are performed on input:
utf-8bom, utf-16le, utf-16be, utf-32le or utf-32be;
utf-8;
8bit;
On output, the document is saved using the locale conversion functions.
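The input steps above can be pictured with a minimal Python sketch. It is not the editor's actual code: it assumes the three steps are a BOM check, then a utf-8 validity check, then a fallback to 8bit, and the name `guess_encoding` is hypothetical.

```python
import codecs

# BOM table (assumed order). The 4-byte utf-32 marks come before the 2-byte
# utf-16 ones because the utf-32le BOM starts with the utf-16le BOM bytes.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF8, "utf-8bom"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: return an encoding name for raw document bytes."""
    # Step 1: look for a byte order mark.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Step 2: accept the document as utf-8 if it decodes cleanly.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Step 3: otherwise treat it as plain 8 bit data.
        return "8bit"
```

Checking the utf-32 marks first is a design choice of this sketch; without it, a utf-32le document would be mistaken for utf-16le.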
The following steps are performed on input:
utf-8bom;
? character.
On output, it saves the document using the utf-8 encoding without a BOM prefix.
On input, if no utf-8 BOM is found, the encoding is still assumed to be utf-8,
but not changed to it.
On output, it saves the document using the utf-8 encoding with a BOM prefix.
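The difference between the two utf-8 variants on output is only the three-byte BOM prefix. The following Python sketch illustrates that; the function names `encode_utf8` and `decode_utf8` are hypothetical and do not come from the editor.

```python
import codecs


def encode_utf8(text: str, with_bom: bool) -> bytes:
    """Encode text as utf-8, optionally prefixed with the utf-8 BOM."""
    payload = text.encode("utf-8")
    return codecs.BOM_UTF8 + payload if with_bom else payload


def decode_utf8(data: bytes) -> str:
    """Decode utf-8 input; a leading BOM is dropped, its absence is not an error."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")
```

For example, `encode_utf8("a", True)` yields the bytes EF BB BF followed by 61, while `encode_utf8("a", False)` yields only 61.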
No character conversion is done on input or output.
Characters are treated as being encoded using the iso8859-1 character set,
that is, no real conversion is done. This mode is really identical to 8bit.
Aliases: latin1.
On input, it tries to determine the endianness of the document by reading
the BOM; if a valid one is found, encoding is set to utf-16le or utf-16be;
if none is found, it assumes utf-16le.
On output, it behaves like utf-16le.
Aliases: ucs-2.
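The endianness test described above could look roughly like the following Python sketch. The function name `read_utf16` is hypothetical; the fallback to little endian when no BOM is present follows the text.

```python
import codecs


def read_utf16(data: bytes):
    """Return (encoding name, decoded text) for utf-16 input."""
    if data.startswith(codecs.BOM_UTF16_LE):
        # BOM bytes FF FE: little endian; skip the 2-byte mark.
        return "utf-16le", data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        # BOM bytes FE FF: big endian; skip the 2-byte mark.
        return "utf-16be", data[2:].decode("utf-16-be")
    # No BOM found: assume little endian.
    return "utf-16le", data.decode("utf-16-le")
```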
On input, it assumes utf-16 little endian characters.
On output, it saves the document using the utf-16 little endian encoding with a BOM prefix.
Aliases: ucs-2le.
On input, it assumes utf-16 big endian characters.
On output, it saves the document using the utf-16 big endian encoding with a BOM prefix.
Aliases: ucs-2be.
On input, it tries to determine the endianness of the document by reading
the BOM; if a valid one is found, encoding is set to utf-32le or utf-32be;
if none is found, it assumes utf-32le.
On output, it behaves like utf-32le.
Aliases: ucs-4.
On input, it assumes utf-32 little endian characters.
On output, it saves the document using the utf-32 little endian encoding with a BOM prefix.
Aliases: ucs-4le.
On input, it assumes utf-32 big endian characters.
On output, it saves the document using the utf-32 big endian encoding with a BOM prefix.
Aliases: ucs-4be.
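Saving with a BOM prefix, as described for the utf-32 encodings above, amounts to writing the 4-byte mark before the encoded characters. A small Python sketch, with the hypothetical name `write_utf32`:

```python
import codecs


def write_utf32(text: str, big_endian: bool = False) -> bytes:
    """Encode text as utf-32 with a BOM prefix in the chosen byte order."""
    if big_endian:
        # BOM bytes 00 00 FE FF, then 4 bytes per character, high byte first.
        return codecs.BOM_UTF32_BE + text.encode("utf-32-be")
    # BOM bytes FF FE 00 00, then 4 bytes per character, low byte first.
    return codecs.BOM_UTF32_LE + text.encode("utf-32-le")
```

Since the result always carries a BOM, a reader that inspects the mark can recover the byte order without being told which variant was written.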
If Minimum Profit is compiled with support for the iconv library, many
more encodings will be available. There is no easy way of knowing their
names; the underlying system may provide the iconv --list command to
print a list.
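Where the iconv command is available, its list can also be collected from a script. The Python sketch below is only an illustration: it assumes the output is a whitespace-separated list of names, with alias groups optionally terminated by `//` as some iconv implementations print them, and both function names are hypothetical.

```python
import shutil
import subprocess


def parse_iconv_list(output: str) -> list:
    """Split iconv --list output into bare encoding names (assumed format)."""
    names = []
    for token in output.split():
        # Drop the trailing "//" or "," separators some implementations emit.
        name = token.rstrip("/,")
        if name:
            names.append(name)
    return names


def list_iconv_encodings() -> list:
    """Run iconv --list if the command exists; return [] otherwise."""
    if shutil.which("iconv") is None:
        return []
    result = subprocess.run(
        ["iconv", "--list"], capture_output=True, text=True, check=False
    )
    return parse_iconv_list(result.stdout)
```

The names returned depend entirely on the local iconv installation, so no particular encoding should be expected to appear.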
Though not directly related to character encodings, the Minimum Profit text
editor remembers the end of line marker found inside each document, and uses
it when saving the document afterwards. This helps in maintaining document
compatibility and portability. This behaviour can be disabled by setting
the mp.config.keep_eol configuration directive to 0.
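The end of line handling can be sketched in Python as follows. This is an illustration under assumptions, not the editor's code: it takes the first marker found in the text as the document's marker, uses "\n" when none is found or when keeping is disabled, and the names `detect_eol` and `save_lines` are hypothetical. The `keep_eol` parameter stands in for the mp.config.keep_eol directive.

```python
def detect_eol(text: str, default: str = "\n") -> str:
    """Return the first end of line marker found in text, or default."""
    pos_lf = text.find("\n")
    pos_cr = text.find("\r")
    if pos_cr != -1 and (pos_lf == -1 or pos_cr < pos_lf):
        # A CR comes first: it is either a CR LF pair or a lone CR.
        return "\r\n" if text[pos_cr:pos_cr + 2] == "\r\n" else "\r"
    if pos_lf != -1:
        return "\n"
    return default


def save_lines(lines, eol: str, keep_eol: bool = True, default: str = "\n") -> str:
    """Join lines using the remembered marker, or default if keeping is off."""
    marker = eol if keep_eol else default
    return "".join(line + marker for line in lines)
```

For instance, a document read as "a\r\nb" would be written back with "\r\n" markers unless `keep_eol` is false.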