Python: module AdvancedHTMLParser.Formatter

AdvancedHTMLParser.Formatter

Modules

codecs
sys

Classes



html.parser.HTMLParser(_markupbase.ParserBase)

AdvancedHTMLFormatter

class AdvancedHTMLFormatter(html.parser.HTMLParser)

    A formatter for HTML. Note that it does not understand CSS, so text that is preformatted only through CSS rules will still be reformatted. It does, however, understand the "pre", "code" and "script" tags and will not try to format their contents.
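The tag-based rule described above can be illustrated with the standard library parser alone. This is only a sketch of the idea, not the formatter's actual implementation: the class name `PreserveAwareCollector`, the `PRESERVE_TAGS` tuple and the whitespace-collapsing behaviour are assumptions made for the example.

```python
from html.parser import HTMLParser

# Tags whose contents the class docstring says are left unformatted.
PRESERVE_TAGS = ('pre', 'code', 'script')


class PreserveAwareCollector(HTMLParser):
    """Hypothetical illustration: collapse whitespace in text,
    except inside one of PRESERVE_TAGS."""

    def __init__(self):
        super().__init__()
        self.preserve_depth = 0   # greater than 0 while inside a preserved tag
        self.chunks = []          # text pieces in document order

    def handle_starttag(self, tag, attrs):
        if tag in PRESERVE_TAGS:
            self.preserve_depth += 1

    def handle_endtag(self, tag):
        if tag in PRESERVE_TAGS and self.preserve_depth > 0:
            self.preserve_depth -= 1

    def handle_data(self, data):
        if self.preserve_depth:
            # Inside <pre>, <code> or <script>: keep the text verbatim.
            self.chunks.append(data)
        else:
            # Elsewhere: collapse runs of whitespace, drop empty pieces.
            collapsed = ' '.join(data.split())
            if collapsed:
                self.chunks.append(collapsed)


collector = PreserveAwareCollector()
collector.feed('<p>  spaced    out  </p><pre>  keep   this  </pre>')
collector.close()
chunks = collector.chunks
```

Because the decision is made purely on tag names, an element styled as preformatted through a stylesheet would take the "collapse" branch, which is the limitation the docstring warns about.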

Method resolution order:

AdvancedHTMLFormatter

html.parser.HTMLParser

_markupbase.ParserBase

builtins.object

Methods defined here:

__init__(self, indent=' ', encoding='utf-8')
Create a formatter.
    @param indent - Either a space/tab/newline that represents one level of indent, or an integer to use that number of spaces
    @param encoding - Use this encoding for the document.
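The two accepted forms of `indent` can be read as follows. The helper `normalize_indent` is hypothetical and only mirrors the rule stated in the docstring; it is not taken from the library's source.

```python
def normalize_indent(indent):
    """Return the string used for one level of indentation.

    Hypothetical helper: an integer means "that many spaces";
    a string (space, tab, newline) is used as given.
    """
    if isinstance(indent, int):
        return ' ' * indent
    return indent


four_spaces = normalize_indent(4)
one_tab = normalize_indent('\t')

# A tag nested three levels deep would be prefixed with three copies.
nested_prefix = normalize_indent(2) * 3
```

Under this reading, `AdvancedHTMLFormatter(indent=4)` and `AdvancedHTMLFormatter(indent='    ')` would be expected to produce the same layout.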

feed(self, contents)
feed - Load contents
    @param contents - HTML contents

getHTML(self)
getHTML - Get the full HTML as contained within this tree, converted to valid XHTML
    @returns - String
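A typical round trip is to parse a string and then read the formatted result back. The sketch below uses only the methods listed on this page (`parseStr`, `getHTML`) and assumes the third-party AdvancedHTMLParser package is installed; the import guard and the `pretty_print` helper are additions for illustration, so the snippet also runs (returning None) when the package is absent.

```python
try:
    from AdvancedHTMLParser.Formatter import AdvancedHTMLFormatter
except ImportError:
    # Third-party package not available; the helper below returns None.
    AdvancedHTMLFormatter = None


def pretty_print(html, indent='  '):
    """Return `html` re-indented by AdvancedHTMLFormatter (hypothetical helper).

    Returns None when the AdvancedHTMLParser package is not installed.
    """
    if AdvancedHTMLFormatter is None:
        return None
    formatter = AdvancedHTMLFormatter(indent=indent)
    formatter.parseStr(html)       # build the tree
    return formatter.getHTML()     # serialize it back as a string


formatted = pretty_print('<html><body><div>Hello</div></body></html>')
if formatted is not None:
    print(formatted)
```

The exact line breaks and indentation of the output depend on the installed version of the library, so none are shown here.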

getRoot(self)
getRoot - Returns the root tag
    @return - AdvancedTag at root. If you provided multiple root nodes, this will be a "holder" tag whose tagName is constants.INVISIBLE_ROOT_TAG

getRootNodes(self)
getRootNodes - Gets all objects at the "root" (first level; no parent). Use this if you may have multiple roots (not children of <html>).
    Use this method to get objects, for example, in an AJAX request where <html> may not be your root.
    Note: If there are multiple root nodes (i.e. no <html> at the top), getRoot will return a special tag. This function automatically handles that, and returns all root nodes.
    @return list<AdvancedTag> - A list of AdvancedTags which are at the root level of the tree.
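For a fragment without a single enclosing element, `getRootNodes` is the method to reach for. The following is a hedged sketch, assuming the AdvancedHTMLParser package is installed and that each returned AdvancedTag exposes `tagName` as the `getRoot` description suggests; `root_tag_names` and the import guard are illustrative additions.

```python
try:
    from AdvancedHTMLParser.Formatter import AdvancedHTMLFormatter
except ImportError:
    # Third-party package not available; the helper below returns None.
    AdvancedHTMLFormatter = None


def root_tag_names(fragment):
    """List the tag names found at the root level of `fragment`.

    Hypothetical helper; returns None when the package is not installed.
    """
    if AdvancedHTMLFormatter is None:
        return None
    formatter = AdvancedHTMLFormatter()
    formatter.parseStr(fragment)
    # getRootNodes hides the "holder" tag that getRoot would return here.
    return [tag.tagName for tag in formatter.getRootNodes()]


names = root_tag_names('<div>first</div><span>second</span>')
```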

handle_charref(self, charRef)
Internal for parsing

handle_comment(self, comment)
Internal for parsing

handle_data(self, data)
handle_data - Internal for parsing

handle_decl(self, decl)
Internal for parsing

handle_endtag(self, tagName)
handle_endtag - Internal for parsing

handle_entityref(self, entity)
Internal for parsing

handle_startendtag(self, tagName, attributeList)
handle_startendtag - Internal for parsing

handle_starttag(self, tagName, attributeList, isSelfClosing=False)
handle_starttag - Internal for parsing

parseFile(self, filename)
parseFile - Parses a file and creates the DOM tree and indexes
    @param filename <str/file> - A filename string or a file object. A file object is not closed by this method; the caller must close it.
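Since a file object passed to `parseFile` stays open, wrapping the call in a `with` block keeps the caller responsible for closing it. This is a sketch under the assumption that the AdvancedHTMLParser package is installed; the helper name `format_file_object`, the temporary file and the import guard exist only for the example.

```python
import os
import tempfile

try:
    from AdvancedHTMLParser.Formatter import AdvancedHTMLFormatter
except ImportError:
    # Third-party package not available; the helper below returns None.
    AdvancedHTMLFormatter = None


def format_file_object(path, encoding='utf-8'):
    """Format the HTML file at `path` by handing parseFile an open file object.

    Hypothetical helper; returns None when the package is not installed.
    """
    if AdvancedHTMLFormatter is None:
        return None
    formatter = AdvancedHTMLFormatter(encoding=encoding)
    # The `with` block closes the handle; parseFile itself does not.
    with open(path, 'r', encoding=encoding) as handle:
        formatter.parseFile(handle)
    return formatter.getHTML()


# Write a small document to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('<html><body><p>From a file</p></body></html>')

try:
    formatted_file = format_file_object(tmp.name)
finally:
    os.remove(tmp.name)
```

Passing the path string instead (`formatter.parseFile(path)`) leaves opening and closing to the method.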

parseStr(self, html)
parseStr - Parses a string and creates the DOM tree and indexes.
    @param html <str> - valid HTML

setRoot(self, root)
setRoot - Sets the root node, and reprocesses the indexes
    @param root - AdvancedTag to be new root

unknown_decl(self, decl)
Internal for parsing

Methods inherited from html.parser.HTMLParser:

check_for_whole_start_tag(self, i)
# Internal -- check to see if we have a complete starttag; return end or -1 if incomplete.

clear_cdata_mode(self)

close(self)
Handle any buffered data.

get_starttag_text(self)
Return full source of start tag: '<...>'.

goahead(self, end)
# Internal -- handle data as far as reasonable. May leave state and data to be processed by a subsequent call. If 'end' is true, force handling all data as if followed by EOF marker.

handle_pi(self, data)
# Overridable -- handle processing instruction

parse_bogus_comment(self, i, report=1)
# Internal -- parse bogus comment, return length or -1 if not terminated. See http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state

parse_endtag(self, i)
# Internal -- parse endtag, return end or -1 if incomplete

parse_html_declaration(self, i)
# Internal -- parse html declarations, return length or -1 if not terminated. See w3.org/TR/html5/tokenization.html#markup-declaration-open-state. See also parse_declaration in _markupbase.

parse_pi(self, i)
# Internal -- parse processing instr, return end or -1 if not terminated

parse_starttag(self, i)
# Internal -- handle starttag, return end or -1 if not terminated

reset(self)
Reset this instance.  Loses all unprocessed data.

set_cdata_mode(self, elem)

unescape(self, s)
# Internal -- helper to remove special character quoting

Data and other attributes inherited from html.parser.HTMLParser:

CDATA_CONTENT_ELEMENTS = ('script', 'style')
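`CDATA_CONTENT_ELEMENTS` names the elements whose contents the base parser treats as raw text rather than markup. The sketch below demonstrates this with the standard library parser only; `TagLogger` is a hypothetical class written for the example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: record start tags and text as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


tag_logger = TagLogger()
# The "<b>" inside the script is reported as data, not as a start tag.
tag_logger.feed('<script>var s = "<b>bold";</script><p>hi</p>')
tag_logger.close()

start_tags = tag_logger.start_tags
all_text = ''.join(tag_logger.text)
```

Only `script` and `p` are reported as start tags; the markup-like text inside the script arrives through `handle_data`.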

Methods inherited from _markupbase.ParserBase:

error(self, message)

getpos(self)
Return current line number and offset.
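`getpos` is handy for reporting where in the input a construct was seen. A minimal sketch with the standard library parser follows; `PositionLogger` is a hypothetical class used only for illustration. Line numbers start at 1 and offsets at 0.

```python
from html.parser import HTMLParser


class PositionLogger(HTMLParser):
    """Hypothetical helper: remember where each start tag begins."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns a (line number, offset) tuple.
        self.positions.append((tag, self.getpos()))


position_logger = PositionLogger()
position_logger.feed('<html>\n  <body>')
position_logger.close()
positions = position_logger.positions
# positions == [('html', (1, 0)), ('body', (2, 2))]
```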

parse_comment(self, i, report=1)
# Internal -- parse comment, return length or -1 if not terminated

parse_declaration(self, i)
# Internal -- parse declaration (for use by subclasses).

parse_marked_section(self, i, report=1)
# Internal -- parse a marked section. Override this to handle MS-word extension syntax <![if word]>content<![endif]>

updatepos(self, i, j)
# Internal -- update line number and offset. This should be called for each piece of data exactly once, in order -- in other words the concatenation of all the input strings to this function should be exactly the entire input.

Data descriptors inherited from _markupbase.ParserBase:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

Data

__all__ = ('AdvancedHTMLFormatter',)