- html.parser.HTMLParser(_markupbase.ParserBase)
  - AdvancedHTMLFormatter
class AdvancedHTMLFormatter(html.parser.HTMLParser)
A formatter for HTML. Note that it does not understand CSS, so text that is made preformatted through CSS rules (for example, white-space: pre) will still be reformatted.
It does, however, understand "pre", "code" and "script" tags and will not try to format their contents.
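The behaviour described above can be sketched with nothing but the standard-library html.parser.HTMLParser. This is a minimal illustration under stated assumptions, not the AdvancedHTMLFormatter source: the class name MiniFormatter, its format() method and the PRESERVE and VOID tag sets are hypothetical, entities are decoded and re-escaped rather than kept byte-for-byte, and, like the real class, it ignores CSS. The indent argument follows the documented convention that an integer means that many spaces.

```python
import html
from html.parser import HTMLParser

# Hypothetical sketch, not the AdvancedHTMLFormatter implementation.
PRESERVE = {"pre", "code", "script"}  # contents are copied through untouched
VOID = {"br", "hr", "img", "input", "link", "meta"}  # never take an end tag


class MiniFormatter(HTMLParser):
    def __init__(self, indent=" "):
        super().__init__()  # convert_charrefs=True: text arrives with entities decoded
        # Same convention as the documented "indent" parameter: an int means N spaces.
        self.indent = " " * indent if isinstance(indent, int) else indent
        self.depth = 0          # nesting level used for indentation
        self.keep = 0           # number of currently open PRESERVE elements
        self.in_script = False  # script bodies are delivered raw (CDATA mode)
        self.out = []

    def _emit_line(self, text):
        self.out.append(self.indent * self.depth + text + "\n")

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()  # the tag exactly as written in the source
        if self.keep:
            self.out.append(raw)        # inside a preserved element: copy verbatim
        elif tag in PRESERVE:
            self.out.append(self.indent * self.depth + raw)  # no newline after it
        else:
            self._emit_line(raw)
            if tag not in VOID:
                self.depth += 1
        if tag in PRESERVE:
            self.keep += 1
            self.in_script = tag == "script"

    def handle_startendtag(self, tag, attrs):
        raw = self.get_starttag_text()
        if self.keep:
            self.out.append(raw)
        else:
            self._emit_line(raw)

    def handle_endtag(self, tag):
        if self.keep:
            if tag in PRESERVE:
                self.keep -= 1
                self.in_script = False
            # closing the outermost preserved element ends the line
            self.out.append("</%s>" % tag + ("" if self.keep else "\n"))
        elif tag not in VOID:
            self.depth = max(self.depth - 1, 0)
            self._emit_line("</%s>" % tag)

    def handle_data(self, data):
        if self.keep:
            # re-escape decoded text, but leave raw script code alone
            self.out.append(data if self.in_script else html.escape(data, quote=False))
        elif data.strip():
            self._emit_line(html.escape(data.strip(), quote=False))

    def format(self, markup):
        self.feed(markup)
        self.close()
        return "".join(self.out)


print(MiniFormatter(indent=2).format(
    "<div><p>hi</p><pre>  a\n   b</pre><script>if (a < b) {}</script></div>"))
```

With indent=2 the <div> and <p> are re-indented one tag per line, while the whitespace inside <pre> and the "<" inside <script> come through unchanged.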
- Method resolution order:
- AdvancedHTMLFormatter
- html.parser.HTMLParser
- _markupbase.ParserBase
- builtins.object
Methods defined here:
- __init__(self, indent=' ', encoding='utf-8')
- Create a formatter.
@param indent - Either a space/tab/newline that represents one level of indent, or an integer to use that number of spaces
@param encoding - Use this encoding for the document.
- feed(self, contents)
- feed - Load contents
@param contents - HTML contents
- getHTML(self)
- getHTML - Get the full HTML as contained within this tree, converted to valid XHTML
@returns - String
- getRoot(self)
- getRoot - returns the root Tag
@return - AdvancedTag at root. If you provided multiple root nodes, this will be a "holder" tag whose tagName is constants.INVISIBLE_ROOT_TAG.
- getRootNodes(self)
- getRootNodes - Gets all objects at the "root" (first level; no parent). Use this if you may have multiple roots (elements that are not children of <html>).
Use this method to get objects, for example, from HTML returned by an AJAX request, where <html> may not be your root.
Note: If there are multiple root nodes (i.e. no <html> at the top), getRoot will return a special holder tag. This function handles that automatically and returns all root nodes.
@return list<AdvancedTag> - A list of AdvancedTags which are at the root level of the tree.
- handle_charref(self, charRef)
- Internal for parsing
- handle_comment(self, comment)
- Internal for parsing
- handle_data(self, data)
- handle_data - Internal for parsing
- handle_decl(self, decl)
- Internal for parsing
- handle_endtag(self, tagName)
- handle_endtag - Internal for parsing
- handle_entityref(self, entity)
- Internal for parsing
- handle_startendtag(self, tagName, attributeList)
- handle_startendtag - Internal for parsing
- handle_starttag(self, tagName, attributeList, isSelfClosing=False)
- handle_starttag - Internal for parsing
- parseFile(self, filename)
- parseFile - Parses a file and creates the DOM tree and indexes
@param filename <str/file> - A filename string or a file object. A file object will not be closed; the caller must close it.
- parseStr(self, html)
- parseStr - Parses a string and creates the DOM tree and indexes.
@param html <str> - valid HTML
- setRoot(self, root)
- setRoot - Sets the root node and reprocesses the indexes
@param root - AdvancedTag to be new root
- unknown_decl(self, decl)
- Internal for parsing
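A minimal sketch of two behaviours documented above, the "holder" root returned by getRoot when the input has several top-level nodes (unwrapped again by getRootNodes), and parseFile accepting either a filename or a file object without closing the latter. It uses only the standard library and is not the library's implementation: TreeBuilder, Node, the snake_case method names and the INVISIBLE_ROOT_TAG value are hypothetical stand-ins (the real constant is constants.INVISIBLE_ROOT_TAG and its value is not assumed here), and void elements such as <br> are not special-cased.

```python
import io
from html.parser import HTMLParser

INVISIBLE_ROOT_TAG = "__holder__"  # hypothetical stand-in for constants.INVISIBLE_ROOT_TAG


class Node:
    def __init__(self, tag_name, parent=None):
        self.tag_name = tag_name
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Hypothetical tree builder; every parsed top-level node hangs off a holder."""

    def __init__(self):
        super().__init__()
        self.holder = Node(INVISIBLE_ROOT_TAG)
        self.current = self.holder

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # walk up to the matching open element (tolerates unclosed children)
        node = self.current
        while node is not self.holder and node.tag_name != tag:
            node = node.parent
        if node is not self.holder:
            self.current = node.parent

    def parse_str(self, markup):
        self.feed(markup)
        self.close()

    def parse_file(self, source):
        # A filename is opened and closed here; a file object is only read,
        # so closing it remains the caller's job.
        if isinstance(source, str):
            with open(source, encoding="utf-8") as fh:
                self.parse_str(fh.read())
        else:
            self.parse_str(source.read())

    def get_root(self):
        # a single top-level element is the root; otherwise return the holder
        roots = self.holder.children
        return roots[0] if len(roots) == 1 else self.holder

    def get_root_nodes(self):
        root = self.get_root()
        return root.children if root.tag_name == INVISIBLE_ROOT_TAG else [root]


# Usage: two top-level elements read from an already-open file object
src = io.StringIO("<div></div><span></span>")
builder = TreeBuilder()
builder.parse_file(src)
print([n.tag_name for n in builder.get_root_nodes()])  # ['div', 'span']
print(src.closed)  # False: the caller still owns the file object
```

Returning the lone element directly and a holder otherwise keeps the single-root case simple, while get_root_nodes gives callers one uniform list in both cases.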
Methods inherited from html.parser.HTMLParser:
- check_for_whole_start_tag(self, i)
- # Internal -- check to see if we have a complete starttag; return end
# or -1 if incomplete.
- clear_cdata_mode(self)
- close(self)
- Handle any buffered data.
- get_starttag_text(self)
- Return full source of start tag: '<...>'.
- goahead(self, end)
- # Internal -- handle data as far as reasonable. May leave state
# and data to be processed by a subsequent call. If 'end' is
# true, force handling all data as if followed by EOF marker.
- handle_pi(self, data)
- # Overridable -- handle processing instruction
- parse_bogus_comment(self, i, report=1)
- # Internal -- parse bogus comment, return length or -1 if not terminated
# see http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
- parse_endtag(self, i)
- # Internal -- parse endtag, return end or -1 if incomplete
- parse_html_declaration(self, i)
- # Internal -- parse html declarations, return length or -1 if not terminated
# See w3.org/TR/html5/tokenization.html#markup-declaration-open-state
# See also parse_declaration in _markupbase
- parse_pi(self, i)
- # Internal -- parse processing instr, return end or -1 if not terminated
- parse_starttag(self, i)
- # Internal -- handle starttag, return end or -1 if not terminated
- reset(self)
- Reset this instance. Loses all unprocessed data.
- set_cdata_mode(self, elem)
- unescape(self, s)
- # Internal -- helper to remove special character quoting
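The inherited feed, close and get_starttag_text methods can be exercised directly. The sketch below uses a hypothetical StartTagRecorder subclass to show that input may arrive in arbitrary chunks, that close() flushes whatever is still buffered, and that get_starttag_text() returns the tag exactly as written, while handle_starttag receives a lower-cased tag name, lower-cased attribute names and unquoted values.

```python
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Hypothetical subclass: records every start tag and all text seen."""

    def __init__(self):
        super().__init__()
        self.raw_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # tag/attrs are normalised; get_starttag_text() is the original source
        self.raw_tags.append((tag, attrs, self.get_starttag_text()))

    def handle_data(self, data):
        self.text.append(data)


rec = StartTagRecorder()
rec.feed("<A HREF='/x' class=big>hel")  # input may be split anywhere
rec.feed("lo</A>")
rec.close()                              # process anything still buffered
print(rec.raw_tags)  # [('a', [('href', '/x'), ('class', 'big')], "<A HREF='/x' class=big>")]
print("".join(rec.text))  # hello
```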
Data and other attributes inherited from html.parser.HTMLParser:
- CDATA_CONTENT_ELEMENTS = ('script', 'style')
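Because "script" is listed in CDATA_CONTENT_ELEMENTS, HTMLParser passes its body to handle_data without tokenizing it, which fits the formatter leaving script contents alone. "pre" and "code" are not in this tuple, so their contents are still parsed as markup and a formatter has to track those elements itself. A small check with a hypothetical EventLog subclass:

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Hypothetical subclass that logs start tags, end tags and text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


log = EventLog()
log.feed('<script>if (a < b) { x = "<p>"; }</script><p>a &lt; b</p>')
log.close()
# The "<p>" inside the script arrives as plain text, not as a start tag;
# the entity outside the script is decoded (convert_charrefs=True).
print(log.events)
```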
Methods inherited from _markupbase.ParserBase:
- error(self, message)
- getpos(self)
- Return current line number and offset.
- parse_comment(self, i, report=1)
- # Internal -- parse comment, return length or -1 if not terminated
- parse_declaration(self, i)
- # Internal -- parse declaration (for use by subclasses).
- parse_marked_section(self, i, report=1)
- # Internal -- parse a marked section
# Override this to handle MS-word extension syntax <![if word]>content<![endif]>
- updatepos(self, i, j)
- # Internal -- update line number and offset. This should be
# called for each piece of data exactly once, in order -- in other
# words the concatenation of all the input strings to this
# function should be exactly the entire input.
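getpos() reports the parser's current position as (line, offset), with lines counted from 1 and offsets from 0; updatepos() is what keeps it current as data is consumed. Called from inside a handler, it gives the location where the reported construct starts. A sketch with a hypothetical PositionRecorder subclass:

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Hypothetical subclass that records where each start tag begins."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        self.positions.append((tag,) + self.getpos())


doc = "<html>\n  <body>\n    <p>hi</p>\n  </body>\n</html>"
pos_rec = PositionRecorder()
pos_rec.feed(doc)
pos_rec.close()
print(pos_rec.positions)  # [('html', 1, 0), ('body', 2, 2), ('p', 3, 4)]
```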
Data descriptors inherited from _markupbase.ParserBase:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)