Metadata-Version: 1.1
Name: htmldammit
Version: 0.1.1
Summary: Make every effort to properly decode HTML, because HTML is unicode, dammit!
Home-page: https://github.com/taleinat/htmldammit
Author: Tal Einat
Author-email: taleinat@gmail.com
License: MIT
Description-Content-Type: UNKNOWN
Description: ======================
        htmldammit
        ======================
        
        .. image:: https://img.shields.io/pypi/v/htmldammit.svg?style=flat
            :target: https://pypi.python.org/pypi/htmldammit
            :alt: Latest Version
        
        .. image:: https://img.shields.io/travis/taleinat/htmldammit/master.svg?style=flat
            :target: https://travis-ci.org/taleinat/htmldammit
            :alt: Build & Tests Status
        
        .. image:: https://img.shields.io/coveralls/taleinat/htmldammit/master.svg?style=flat
            :target: https://coveralls.io/r/taleinat/htmldammit
            :alt: Test Coverage
        
        .. image:: https://img.shields.io/pypi/l/htmldammit.svg?style=flat
            :target: https://github.com/taleinat/htmldammit/blob/master/LICENSE
            :alt: License: MIT
        
        Make every effort to properly decode HTML, because HTML is unicode, dammit!
        
        Features
        --------
        
        * Easy to use, with ready-made integrations for ``requests`` and ``urlopen()``.
        * Uses information from HTTP headers, inline encoding declarations,
          and Unicode BOMs (byte order marks), falling back to a best guess
          based on the raw data.
        * Improves upon BeautifulSoup's great ``UnicodeDammit`` utility.
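        
        As a rough illustration of how these sources can be combined, here is a
        minimal sketch that uses only the standard library. It is *not*
        htmldammit's actual implementation: the function name
        ``sketch_detect_encoding``, the priority order (BOM, then HTTP header,
        then inline declaration) and the 1024-byte scan window are all
        assumptions made for this example.
        
        .. code:: python
        
            import codecs
            import re
        
            # Illustrative only; not htmldammit's real code.
            # UTF-32 must be checked before UTF-16: their little-endian BOMs share a prefix.
            _BOMS = [
                (codecs.BOM_UTF8, "utf-8"),
                (codecs.BOM_UTF32_LE, "utf-32-le"),
                (codecs.BOM_UTF32_BE, "utf-32-be"),
                (codecs.BOM_UTF16_LE, "utf-16-le"),
                (codecs.BOM_UTF16_BE, "utf-16-be"),
            ]
            _CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.I)
        
            def sketch_detect_encoding(raw, content_type=None):
                """Return (encoding, source) for raw bytes (hypothetical helper)."""
                # 1. A byte order mark at the very start of the data.
                for bom, name in _BOMS:
                    if raw.startswith(bom):
                        return name, "bom"
                # 2. A charset parameter in the HTTP Content-Type header value.
                if content_type:
                    match = _CHARSET_RE.search(content_type)
                    if match:
                        return match.group(1).lower(), "http-header"
                # 3. An inline declaration such as <meta charset="..."> near the top.
                head = raw[:1024].decode("ascii", errors="replace")
                match = _CHARSET_RE.search(head)
                if match:
                    return match.group(1).lower(), "inline"
                # 4. Nothing declared: a real library would guess from the data here.
                return "utf-8", "default"
        
        For example, ``sketch_detect_encoding(b"<html>", "text/html; charset=ISO-8859-1")``
        returns ``("iso-8859-1", "http-header")``.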
        
        Installation
        ------------
        
        .. code::
        
            pip install htmldammit
        
        Installing the ``cchardet`` and/or ``chardet`` libraries as well is
        *highly* recommended: they enable the fallback that guesses the
        encoding from the raw data.
        
        .. code::
        
            pip install cchardet chardet
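        
        To show what such a data-based guess looks like, the sketch below calls
        the ``detect()`` function that both libraries provide. The helper name
        ``guess_encoding`` and the preference for ``cchardet`` over ``chardet``
        are assumptions made for this example; htmldammit's own fallback logic
        may differ.
        
        .. code:: python
        
            def guess_encoding(raw):
                """Return a guessed encoding name for raw bytes, or None (hypothetical helper)."""
                try:
                    import cchardet as detector  # C-based detector
                except ImportError:
                    try:
                        import chardet as detector  # pure-Python detector
                    except ImportError:
                        return None  # neither library is installed, so no guess
                # Both libraries return a dict with an "encoding" key, which may be None.
                return detector.detect(raw).get("encoding")
        
        When neither library is installed the helper returns ``None``, which is
        why installing at least one of them matters for undeclared encodings.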
        
        Basic usage
        -----------
        
        To decode binary HTML content into Unicode text (passing the HTTP headers is optional):
        
        .. code:: python
        
            from htmldammit import decode_html
        
            # raw_html holds the undecoded bytes; http_headers may be omitted
            html = decode_html(raw_html, http_headers)
        
        To get the HTML of a ``requests`` response as Unicode text:
        
        .. code:: python
        
            import requests
            from htmldammit.integrations.requests import get_response_html
        
            response = requests.get('http://www.example.org/')
            html = get_response_html(response)
        
        To get the HTML of a ``urlopen()`` response as Unicode text:
        
        .. code:: python
        
            from urllib.request import urlopen  # Python 2: from urllib2 import urlopen
            from htmldammit.integrations.urllib import get_response_html
        
            response = urlopen('http://www.example.org/')
            html = get_response_html(response)
        
Keywords: htmldammit HTML unicode
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Software Development :: Libraries :: Python Modules
