Showing posts with label XML Encoding. Show all posts
Showing posts with label XML Encoding. Show all posts

Friday, August 24, 2012

XML Encoding


XML documents can contain non ASCII characters, like Norwegian æ ø å , or French ê è é.
To avoid errors, specify the XML encoding, or save XML files as Unicode.

XML Encoding Errors

If you load an XML document, you can get two different errors indicating encoding problems:

An invalid character was found in text content.

You get this error if your XML contains non ASCII characters, and the file was saved as single-byte ANSI (or ASCII) with no encoding specified.


Switch from current encoding to specified encoding not supported.

You get this error if your XML file was saved as double-byte Unicode (or UTF-16) with a single-byte encoding (Windows-1252, ISO-8859-1, UTF-8) specified.

You also get this error if your XML file was saved with single-byte ANSI (or ASCII), with double-byte encoding (UTF-16) specified.


Windows Notepad

Windows Notepad save files as single-byte ANSI (ASCII) by default.
If you select "Save as...", you can specify double-byte Unicode (UTF-16).

Save the XML file below as Unicode (note that the document does not contain any encoding attribute):

<?xml version="1.0"?>
<note>
<from>Jani</from>
<to>Tove</to>
<message>Norwegian: æøå. French: êèé</message>
</note>

The file above, note_encode_none_u.xml will NOT generate an error. But if you specify a single-byte encoding it will.

The following encoding (open it), will give an error message:

<?xml version="1.0" encoding="windows-1252"?>

The following encoding (open it), will give an error message:

<?xml version="1.0" encoding="ISO-8859-1"?>

The following encoding (open it), will give an error message:

<?xml version="1.0" encoding="UTF-8"?>

The following encoding (open it), will NOT give an error:

<?xml version="1.0" encoding="UTF-16"?>


Conclusion

  • Always use the encoding attribute
  • Use an editor that supports encoding
  • Make sure you know what encoding the editor uses
  • Use the same encoding in your encoding attribute