HTML5 Print

Module Contents

This tool pretty prints your HTML, CSS and JavaScript files. The package comes in two parts:

  • a command line tool, html5-print
  • a python module, html5print

Introduction

This module reformats web page code to make it more readable. It is targeted at developers, hence it is not optimized for speed. I started out looking for a tool and ended up creating this module. Hope it helps you!

Key features:

  • Pretty prints HTML as well as the CSS and JavaScript embedded within it
  • Pretty prints pure CSS and JavaScript
  • Tries to fix fragmented HTML5
  • Tries to fix HTML with broken Unicode encoding
  • Tries to guess the encoding of the document, and in some cases manages to convert 8-bit byte code back into correct UTF-8 format
  • Supports both Python 2 and 3
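
The encoding repair mentioned in the list above is easiest to see with a small example. The sketch below is not html5print's implementation; it only illustrates, using the Python 3 standard library, one common case of that kind of repair: text whose UTF-8 bytes were mistakenly decoded as Latin-1. The function name repair_mojibake is hypothetical.

```python
def repair_mojibake(text):
    """Undo a UTF-8 -> Latin-1 mis-decoding, if that is what happened.

    Hypothetical helper for illustration; not part of html5print.
    """
    try:
        # Re-encode to recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not mis-decoded this way; leave it untouched.
        return text


# Simulate the damage: UTF-8 bytes read as if they were Latin-1.
garbled = "hall\u00f3".encode("utf-8").decode("latin-1")
assert repair_mojibake(garbled) == "hall\u00f3"

# Text that cannot be encoded as Latin-1 is returned unchanged.
assert repair_mojibake("\u60a8\u597d") == "\u60a8\u597d"
```

Returning the input unchanged on failure keeps the helper safe to apply to text that was never damaged.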

Installation

$ [sudo] pip install html5print

Uninstallation

$ [sudo] pip uninstall html5print
$ [sudo] pip uninstall bs4 html5lib slimit tinycss2 requests chardet

Command Line Tool

Synopsis

$ html5-print --help
usage: html5-print [-h] [-o OUTFILE] [-s INDENT_WIDTH] [-e ENCODING]
                    [-t {html,js,css}] [-v]
                    infile

Beautify HTML5, CSS, JavaScript - Version 0.1.2 (By Bernard Yue)
This tool reformat the input and return a beautified version,
in unicode.

positional arguments:
  infile                filename | url | -, a dash, which represents stdin

optional arguments:
  -h, --help            show this help message and exit
  -o OUTFILE, --output OUTFILE
                        filename for formatted HTML, stdout if omitted
  -s INDENT_WIDTH, --indent-width INDENT_WIDTH
                        number of space for indentation, default 2
  -e ENCODING, --encoding ENCODING
                        encoding of input, default UTF-8
  -t {html,js,css}, --filetype {html,js,css}
                        type of file to parse, default "html"
  -v, --version         show program's version number and exit
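
To drive the command line tool from a script, the options above map directly onto an argument list. The sketch below only builds that list from the flags shown in the help text. The helper names build_command and run are hypothetical, the code targets Python 3.7 or later, and actually running the command assumes html5-print is installed and on PATH.

```python
import subprocess


def build_command(infile, outfile=None, indent=2, encoding=None, filetype="html"):
    """Build an argv list for html5-print (hypothetical helper)."""
    if filetype not in ("html", "js", "css"):
        raise ValueError("filetype must be 'html', 'js' or 'css'")
    cmd = ["html5-print", "-s", str(indent), "-t", filetype]
    if outfile is not None:
        cmd += ["-o", outfile]      # omit to write to stdout
    if encoding is not None:
        cmd += ["-e", encoding]     # omit to use the tool's default
    cmd.append(infile)              # filename, URL, or "-" for stdin
    return cmd


def run(cmd):
    """Run the command and return its standard output as text."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


print(build_command("style.css", indent=4, filetype="css"))
```

Keeping the list construction separate from the subprocess call makes the mapping easy to inspect without the tool being present.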

Example

Pretty print HTML:

$ html5-print -s4 -
Press Ctrl-D when finished
<html><head><title>Small HTML page</title>
<style>p { margin: 10px 20px; color: black; }</style>
<script>function myFunction() {
document.getElementById("demo").innerHTML = "Paragraph changed.";
}</script>
</head><body>
<p>Some text for testing</body></html>
^D
<html>
    <head>
        <title>
            Small HTML page
        </title>
        <style>
            p {
                margin              : 10px 20px;
                color               : black;
            }
        </style>
        <script>
            function myFunction() {
                document.getElementById("demo").innerHTML = "Paragraph changed.";
            }
        </script>
    </head>
    <body>
        <p>
            Some text for testing
        </p>
    </body>
</html>
$

Create valid HTML5 document from HTML fragment:

$ html5-print -s4 -
Press Ctrl-D when finished
<title>Hello in different language</title>
<p>Here is "hello" in different languages</p>
<ul>
<li>Hello
<li>您好
<li>こんにちは
<li>Dobrý den,
<li>สวัสดี
^D
<html>
    <head>
        <title>
            Hello in different language
        </title>
    </head>
    <body>
        <p>
            Here is "hello" in different languages
        </p>
        <ul>
            <li>
                Hello
            </li>
            <li>
                您好
            </li>
            <li>
                こんにちは
            </li>
            <li>
                Dobrý den,
            </li>
            <li>
                สวัสดี
            </li>
        </ul>
    </body>
</html>
$

Testing

The module uses pytest. Use pip to install pytest.

$ [sudo] pip install pytest

Then run the tests as usual.

$ tar zxf html5print-0.1.2.tar.gz
$ cd html5print-0.1.2
$ python setup.py test

License

This module is distributed under the Apache License, Version 2.0.

Python API

class html5print.CSSBeautifier

Bases: html5print.utils.BeautifierBase

A CSS beautifier that pretty prints CSS. It loosely supports CSS3.

classmethod beautify(css, indent=2, encoding=None)

Prettify css by reindenting it to a width of indent per level. css is expected to be a valid Cascading Style Sheet.

Parameters:
  • css – valid CSS as a multiline string
  • indent – width of indentation per level
  • encoding – expected encoding of css. If None, it will be guessed
Returns:

reindented css

>>> # a single css rule
>>> from html5print import CSSBeautifier
>>> css = ".para { margin: 10px 20px; }"
>>> print(CSSBeautifier.beautify(css))
.para {
  margin              : 10px 20px;
}
>>> # multiple css rules
>>> import os
>>> from html5print import CSSBeautifier
>>> css = ".para { margin: 10px 20px; }"
>>> css += os.linesep + "p { border: 5px solid red; }"
>>> print(CSSBeautifier.beautify(css))
.para {
  margin              : 10px 20px;
}
p {
  border              : 5px solid red;
}
>>> # pseudo-element css rule
>>> from html5print import CSSBeautifier
>>> css = ' /* beginning of css*/\n ::after { margin: 10px 20px; }'
>>> print(CSSBeautifier.beautify(css))
/* beginning of css*/
::after {
  margin              : 10px 20px;
}
>>> # pseudo-element css rule with different indent
>>> from html5print import CSSBeautifier
>>> css = ' /* beginning of css*/\n ::after { margin: 10px 20px; }'
>>> print(CSSBeautifier.beautify(css, 4))
/* beginning of css*/
::after {
    margin              : 10px 20px;
}
>>> # pseudo-element css rules with comments in between
>>> from html5print import CSSBeautifier
>>> css = ' /* beginning of css*/\n ::after { margin: 10px 20px; }'
>>> css += os.linesep + ' /* another comment */p {'
>>> css += 'h1 : color: #36CFFF; font-weight: normal;}'
>>> print(CSSBeautifier.beautify(css, 4))
/* beginning of css*/
::after {
    margin              : 10px 20px;
}
/* another comment */
p {
    h1                  : color: #36CFFF;
    font-weight         : normal;
}
>>> # media query
>>> from html5print import CSSBeautifier
>>> css = '''@media (-webkit-min-device-pixel-ratio:0) {
... h2.collapse { margin: -22px 0 22px 18px;
... }
... ::i-block-chrome, h2.collapse { margin: 0 0 22px 0; } }
... '''
>>> print(CSSBeautifier.beautify(css, 4))
@media (-webkit-min-device-pixel-ratio:0) {
    h2.collapse {
        margin              : -22px 0 22px 18px;
    }
    ::i-block-chrome, h2.collapse {
        margin              : 0 0 22px 0;
    }
}
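
In the output above, every property name is padded so that the colons line up. The samples suggest a fixed column of 20 characters, although that width is inferred from the examples rather than documented. A minimal sketch of that layout follows; format_declaration is a hypothetical helper and not part of html5print.

```python
def format_declaration(prop, value, indent=2, level=1, name_width=20):
    """Format one CSS declaration with the colon in a fixed column.

    name_width=20 is an assumption inferred from the sample output.
    """
    leading = " " * (indent * level)
    return "{}{}: {};".format(leading, prop.ljust(name_width), value)


print(format_declaration("margin", "10px 20px"))
print(format_declaration("font-weight", "normal", indent=4))
```

Padding with ljust keeps the values aligned no matter how long each property name is, as long as it fits in the column.
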

classmethod beautifyTextInHTML(html, indent=2, encoding=None)

Beautify CSS within the <style></style> tag. Any HTML comments (i.e. <!-- ... -->) within the style tag will be moved to the end of the tag block.

Note: The function assumes that the <style> tag is the first element on its line (apart from whitespace). The style block is indented by the indent of the <style> tag plus one level of indentation.
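
The note above can be made concrete with a small check. The sketch below is an assumption about how such a rule might be tested, not html5print's code: it looks for a <style> tag preceded only by whitespace on its line and derives the indentation of the CSS as the tag's indent plus one level. The names STYLE_AT_LINE_START and style_block_indent are hypothetical.

```python
import re

# <style ...> preceded only by spaces or tabs on its line.
STYLE_AT_LINE_START = re.compile(
    r"^([ \t]*)<style\b[^>]*>", re.MULTILINE | re.IGNORECASE
)


def style_block_indent(html, indent=2):
    """Return the indent width for the CSS inside the first <style> tag
    that starts a line, or None if no <style> tag starts a line."""
    match = STYLE_AT_LINE_START.search(html)
    if match is None:
        return None
    return len(match.group(1)) + indent


# <style> starts its own line at indent 2, so the CSS goes to indent 4.
print(style_block_indent("<html><body>\n  <style>\np { color: red; }\n  </style>"))

# <style> follows other tags on the same line, so nothing qualifies.
print(style_block_indent("<html><body><style>\np { color: red; }\n</style>"))
```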

Parameters:
  • html – html as string
  • indent – width of indentation for embedded CSS in HTML
Returns:

html with CSS beautified (i.e. text within <style>...</style>)

>>> # pretty print css
>>> from html5print import CSSBeautifier
>>> html = '''<html><body>
...   <style>
...     .para { margin: 10px 20px; }
... <!-- This is what the function is dealing with-->
... p { color: red; font-style: normal; }
...   </style>
... </body></html>'''
>>> print(CSSBeautifier.beautifyTextInHTML(html))
<html><body>
  <style>
    .para {
      margin              : 10px 20px;
    }
    p {
      color               : red;
      font-style          : normal;
    }
    <!-- This is what the function is dealing with-->
  </style>
</body></html>
>>> # <style> not the first element, no pretty print
>>> from html5print import CSSBeautifier
>>> html = '''<html><body><style>
...     .para { margin: 10px 20px; }
... <!-- This is what the function is dealing with-->
... p { color: red; font-style: normal; }
...   </style>
... </body></html>'''
>>> print(CSSBeautifier.beautifyTextInHTML(html))
<html><body><style>
    .para { margin: 10px 20px; }
<!-- This is what the function is dealing with-->
p { color: red; font-style: normal; }
  </style>
</body></html>

class html5print.JSBeautifier

Bases: html5print.utils.BeautifierBase

A JavaScript beautifier that pretty prints JavaScript.

classmethod beautify(js, indent=2, encoding=None)

Prettify js by reindenting it to a width of indent per level. js is expected to be valid JavaScript.

Parameters:
  • js – valid JavaScript as a multiline string
  • indent – width of indentation per level
  • encoding – expected encoding of js. If None, it will be guessed
Returns:

reindented javascript

>>> from html5print import JSBeautifier
>>> js = '''function myFunction() {
... document.getElementById("demo").innerHTML = "Paragraph changed.";
... }'''
>>> # test default indent of 2 spaces
>>> print(JSBeautifier.beautify(js))
function myFunction() {
  document.getElementById("demo").innerHTML = "Paragraph changed.";
}
>>> # test indent of 4 spaces
>>> print(JSBeautifier.beautify(js, 4))
function myFunction() {
    document.getElementById("demo").innerHTML = "Paragraph changed.";
}

classmethod beautifyTextInHTML(html, indent=2, encoding=None)

Beautify JavaScript within the <script></script> tag. Any HTML comments (i.e. <!-- ... -->) within the script tag will be moved to the end of the tag block.

Parameters:
  • html – html as string
  • indent – width of indentation for embedded javascript in HTML
Returns:

html with javascript beautified (i.e. text within <script>...</script>)

>>> from html5print import JSBeautifier
>>> js = '''<html><body>
...   <script>function myFunction() {
... document.getElementById("demo").innerHTML = "Paragraph changed.";
... }
...   </script>
... </body></html>
... '''
>>> print(JSBeautifier.beautifyTextInHTML(js))
<html><body>
  <script>
    function myFunction() {
      document.getElementById("demo").innerHTML = "Paragraph changed.";
    }
  </script>
</body></html>
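
The comment handling described for both beautifyTextInHTML methods can be sketched with a regular expression. This illustrates the behaviour under the assumption that comments are simply collected and appended in order; it is not the library's implementation, and move_comments_to_end is a hypothetical name.

```python
import re

# Non-greedy match so that each comment is captured separately.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def move_comments_to_end(block):
    """Return the block with HTML comments removed from their original
    position and appended, in order, after the remaining lines."""
    comments = HTML_COMMENT.findall(block)
    remaining = HTML_COMMENT.sub("", block)
    # Drop lines left empty by the removal.
    lines = [line for line in remaining.splitlines() if line.strip()]
    return "\n".join(lines + comments)


script = "var a = 1;\n<!-- legacy marker -->\nvar b = 2;"
print(move_comments_to_end(script))
```
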

class html5print.HTMLBeautifier

Bases: html5print.utils.BeautifierBase

HTML Beautifier. Powered by BeautifulSoup 4

classmethod beautify(html, indent=2, encoding=None, formatter=u'html5')

Pretty print html with indentation of indent per level

Parameters:
  • html – html as string
  • indent – width of indentation
  • encoding – encoding of html
  • formatter – formatter used by bs4. Use lxml if you want HTML4 output
Returns:

beautified html

>>> # pretty print HTML
>>> from html5print import HTMLBeautifier
>>> html = '<title>Testing</title><body><p>Some Text</p>'
>>> print(HTMLBeautifier.beautify(html))
<html>
  <head>
    <title>
      Testing
    </title>
  </head>
  <body>
    <p>
      Some Text
    </p>
  </body>
</html>
>>> # pretty print HTML with embedded CSS and Javascript
>>> from html5print import HTMLBeautifier
>>> html = '''<html><head><title>Testing</title>
... <style>p { color:red; font-weight:nornal}
... h1{color:green;}
... </style>
... </head>
... <body><p>Some Text</p>
... <script>function myFunction()
... {document.getElementById("demo").innerHTML="changed.";
... }</script>
... </body></html>
... '''
>>> print(HTMLBeautifier.beautify(html))
<html>
  <head>
    <title>
      Testing
    </title>
    <style>
      p {
        color               : red;
        font-weight         : nornal
      }
      h1 {
        color               : green;
      }
    </style>
  </head>
  <body>
    <p>
      Some Text
    </p>
    <script>
      function myFunction() {
        document.getElementById("demo").innerHTML = "changed.";
      }
    </script>
  </body>
</html>

html5print.decodeText(text, encoding=None)

Decode text using encoding. If encoding is None, the encoding will be guessed.

Note: the encoding provided will be disregarded if it causes a decoding error.
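
The fallback described in the note can be sketched as follows. This is a simplified stand-in rather than the library's code: the real function guesses the encoding, whereas this Python 3 sketch just tries a fixed list of candidates. The name decode_with_fallback and the candidate list are assumptions made for illustration.

```python
def decode_with_fallback(data, encoding=None, candidates=("utf-8", "latin-1")):
    """Decode bytes, disregarding `encoding` if it causes a decoding error."""
    if isinstance(data, str):
        return data                      # already text, nothing to decode
    tried = ([encoding] if encoding else []) + list(candidates)
    for name in tried:
        try:
            return data.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue                     # wrong or unknown encoding: try the next
    # Last resort: never fail, replace undecodable bytes instead.
    return data.decode("utf-8", errors="replace")


raw = "hall\u00f3".encode("utf-8")
# "ascii" cannot decode these bytes, so it is disregarded and UTF-8 is used.
assert decode_with_fallback(raw, encoding="ascii") == "hall\u00f3"
```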

Parameters:
  • text – string to be decoded
  • encoding – encoding scheme of text. If None, it will be guessed
Returns:

new decoded text as unicode

>>> import sys
>>> from html5print import decodeText
>>> s = 'Hello! 您好! こんにちは! halló!'
>>> output = decodeText(s)
>>> print(output)
Hello! 您好! こんにちは! halló!
>>> if sys.version_info[0] >= 3:
...    unicode = str
>>> isinstance(output, unicode)
True

html5print.isUnicode(text)

Return True if text is unicode, False otherwise. Note that because the function has to work on both Python 2 and Python 3, u'' cannot be used in the doctest.

Parameters:
  • text – string to check if it is unicode
Returns:

True if text is unicode, False otherwise

>>> import sys
>>> from html5print import isUnicode
>>> if sys.version_info[0] >= 3:
...     isUnicode(bytes('hello', 'ascii'))
... else:
...     isUnicode(bytes('hello'))
False
>>> import sys
>>> if sys.version_info[0] >= 3:
...     unicode = str
>>> isUnicode(unicode('hello'))
True
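
A version-agnostic check of this kind can be written without u'' literals. The sketch below shows one common approach; it is an assumption about how such a function might look, not a copy of html5print's source, and is_unicode and text_type are hypothetical names.

```python
import sys

# On Python 3 the text type is str; on Python 2 it is the builtin unicode.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (only defined on Python 2)


def is_unicode(text):
    """Return True if text is a unicode string, False for bytes and others."""
    return isinstance(text, text_type)


print(is_unicode("hello".encode("ascii")))   # False
print(is_unicode(b"hello".decode("ascii")))  # True
```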
