Module pyxb.utils.activestate

Functions
 
detectXMLEncoding(fp)
Attempts to detect the character encoding of the XML file given by a file object fp.
Variables
  __package__ = None
Function Details

detectXMLEncoding(fp)

Attempts to detect the character encoding of the XML file given by a file object fp. fp must not be a codec-wrapped file object!

The return value can be:

  • If BOM detection succeeds, the codec name of the corresponding Unicode encoding is returned.
  • If BOM detection fails, the XML declaration is searched for the encoding attribute, and its value is returned. For this to work, the "<" character must be the very first character in the file, as the XML standard requires.
  • If both BOM detection and the XML declaration search fail, None is returned. According to XML 1.0 the encoding should then be utf_8, but that was not detected by the means offered here. At least one can be fairly sure that a character encoding covering most of ASCII is in use.
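
The three outcomes above can be illustrated with a minimal sketch. This is not the module's actual source: the function name sketch_detect_xml_encoding, the BOM table, the returned codec names, the 200-byte read window, and the restoring of the file position are all assumptions made for illustration. It expects a seekable file object opened in binary mode (that is, not codec-wrapped) and, unlike a full implementation, only finds declarations written in an ASCII-compatible encoding.

```python
import codecs
import io
import re

# Hypothetical BOM table (an assumption, not taken from the module).
# UTF-32 entries come first because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf_32_be"),
    (codecs.BOM_UTF32_LE, "utf_32_le"),
    (codecs.BOM_UTF8, "utf_8"),
    (codecs.BOM_UTF16_BE, "utf_16_be"),
    (codecs.BOM_UTF16_LE, "utf_16_le"),
]

# Matches an XML declaration only when "<" is the very first byte of the data.
_XML_DECL = re.compile(
    rb"^<\?xml\s[^>]*?encoding\s*=\s*"
    rb"(?:\"([A-Za-z][A-Za-z0-9._-]*)\"|'([A-Za-z][A-Za-z0-9._-]*)')"
)


def sketch_detect_xml_encoding(fp):
    """Return an encoding name for the binary file object fp, or None."""
    old_pos = fp.tell()
    try:
        fp.seek(0)
        head = fp.read(200)  # assumed window; enough for a BOM or a short declaration
    finally:
        fp.seek(old_pos)  # leave the caller's file position untouched

    # Step 1: byte order mark.
    for bom, codec_name in _BOMS:
        if head.startswith(bom):
            return codec_name

    # Step 2: encoding attribute of the XML declaration.
    match = _XML_DECL.match(head)
    if match:
        return (match.group(1) or match.group(2)).decode("ascii")

    # Step 3: nothing detected; the caller may fall back to UTF-8 per XML 1.0.
    return None


if __name__ == "__main__":
    sample = io.BytesIO(b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>')
    print(sketch_detect_xml_encoding(sample))
```

A caller would typically treat a None result as "assume UTF-8", which matches the XML 1.0 default mentioned in the last bullet.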