===================================================
encutils - encoding detection collection for Python
===================================================
:Version: 0.9
:Author: Christof Hoeke, see http://cthedot.de/encutils/
:Contributor: Robert Siemer
:Copyright: 2005-2009: Christof Hoeke
:License: encutils is dual-licensed; choose whichever license you prefer:

    * encutils is published under the
      `LGPL 3 or later <http://cthedot.de/encutils/license/>`__
    * encutils is published under the
      `Creative Commons License <http://creativecommons.org/licenses/by/3.0/>`__
      
    encutils is free software: you can redistribute it and/or modify
    it under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    encutils is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public License
    along with encutils.  If not, see <http://www.gnu.org/licenses/>.
 

A collection of helper functions that detect the encoding of text resources (HTML, XHTML, XML, CSS, etc.) retrieved via HTTP, read from a file, or given as a string.

:func:`getEncodingInfo` is the main function of interest: it calls the other
supplied functions, gathers their results, and returns an
:class:`EncodingInfo` object.

example::

    >>> import encutils
    >>> info = encutils.getEncodingInfo(url='http://cthedot.de/encutils/')
    
    >>> print(info) # = str(info)
    utf-8
    
    >>> print(repr(info)) # doctest:+ELLIPSIS
    <encutils.EncodingInfo object encoding='utf-8' mismatch=False at...>
    
    >>> print(info.logtext)
    HTTP media_type: text/html
    HTTP encoding: utf-8
    Encoding (probably): utf-8 (Mismatch: False)
    <BLANKLINE>
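
The ``HTTP media_type`` and ``HTTP encoding`` lines in the log above are derived from the response's ``Content-Type`` header. As an illustration only (this is not encutils' own implementation, and ``split_content_type`` is a hypothetical name), such a header value can be split with the standard library's ``email.message.Message``:

```python
from email.message import Message


def split_content_type(header_value):
    """Return (media_type, charset) parsed from a Content-Type header value.

    charset is None when the header carries no charset parameter.
    (Hypothetical helper for illustration; not part of encutils.)
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() lower-cases the media type;
    # get_content_charset() lower-cases the charset or returns None.
    return msg.get_content_type(), msg.get_content_charset()
```

For instance, ``split_content_type("text/html; charset=UTF-8")`` yields the media type ``text/html`` and the charset ``utf-8``, while a bare ``text/css`` yields no charset at all.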

references
    XML
        RFC 3023 (http://www.ietf.org/rfc/rfc3023.txt)
        
        explained more accessibly in
            - http://feedparser.org/docs/advanced.html
            - http://www.xml.com/pub/a/2004/07/21/dive.html
            
    HTML
        http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
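
The RFC 3023 rules referenced above amount to a precedence order. The sketch below is a simplified illustration, not encutils' actual code: the function names are hypothetical, byte order marks and UTF-16/UTF-32 input are ignored, and only ``text/xml`` gets the ``us-ascii`` default while every other media type is treated like ``application/xml``.

```python
import re

# Matches an encoding pseudo-attribute in an XML declaration at the very
# start of an ASCII-compatible byte string.
_XML_DECL = re.compile(
    br'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def xml_declared_encoding(data):
    """Return the lower-cased encoding named in the XML declaration, or None."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def rfc3023_encoding(media_type, http_charset, data):
    """Pick an encoding for XML served over HTTP (simplified RFC 3023 rules).

    Assumptions of this sketch: no BOM handling, and any media type other
    than text/xml is handled like application/xml.
    """
    # 1. An explicit HTTP charset parameter is authoritative.
    if http_charset:
        return http_charset.lower()
    # 2. text/xml without a charset parameter defaults to us-ascii,
    #    regardless of what the XML declaration says.
    if media_type == "text/xml":
        return "us-ascii"
    # 3. Otherwise use the XML declaration, falling back to utf-8.
    return xml_declared_encoding(data) or "utf-8"
```

Note the second rule: a ``text/xml`` document served without a charset parameter is treated as ``us-ascii`` even if its XML declaration names another encoding, which is one way an HTTP header and document content can end up disagreeing.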

TODO
    - parse @charset of HTML elements?
    - check for more text types if only text is given