
Class Stream

object --+
         |
        Stream

Represents a stream of markup events.

This class is essentially an iterable wrapper around the events.

Stream events are tuples of the form:

(kind, data, position)

where kind is the event kind (such as START, END, or TEXT), data depends on the kind of event, and position is a (filename, line, offset) tuple giving the location of the original element or text in the input. If the original location is unknown, position is (None, -1, -1).
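
To make the tuple layout concrete, here is a minimal sketch that builds a few events by hand and unpacks them. It uses only the standard library: the kinds are plain strings standing in for Stream.START, Stream.END and Stream.TEXT, and the shape of data for each kind (a bare tag name here) is a simplifying assumption for illustration, not Genshi's actual representation. The filename doc.html and the helper describe() are hypothetical.

```python
# Plain strings stand in for the event kinds defined on the Stream class.
START, END, TEXT = 'START', 'END', 'TEXT'

# Position used when the original location is unknown.
UNKNOWN_POS = (None, -1, -1)

# Hand-built events; the payload in `data` is simplified to a tag name or text.
events = [
    (START, 'p', ('doc.html', 1, 0)),
    (TEXT, 'Hello, world!', ('doc.html', 1, 3)),
    (END, 'p', UNKNOWN_POS),
]


def describe(event):
    """Return a short description of one (kind, data, position) event."""
    kind, data, (filename, line, offset) = event
    if filename is None:
        where = 'unknown location'
    else:
        where = '%s:%d:%d' % (filename, line, offset)
    return '%s at %s' % (kind, where)


descriptions = [describe(event) for event in events]
```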

Also provided are ways to serialize the stream to text. The serialize() method will return an iterator over generated strings, while render() returns the complete generated text at once. Both accept various parameters that impact the way the stream is serialized.

Instance Methods
 
__init__(self, events, serializer=None)
Initialize the stream with a sequence of markup events.

__iter__(self)

__or__(self, function)
Override the "bitwise or" operator to apply filters or serializers to the stream, providing a syntax similar to pipes on Unix shells.
Returns: Stream

filter(self, *filters)
Apply filters to the stream.
Returns: Stream

render(self, method=None, encoding='utf-8', out=None, **kwargs)
Return a string representation of the stream.
Returns: basestring

select(self, path, namespaces=None, variables=None)
Return a new stream that contains the events matching the given XPath expression.
Returns: Stream

serialize(self, method='xml', **kwargs)
Generate strings corresponding to a specific serialization of the stream.
Returns: iterator

__str__(self)
str(x)

__unicode__(self)

__html__(self)

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __subclasshook__

Class Variables
  START = 'START'
a start tag
  END = 'END'
an end tag
  TEXT = 'TEXT'
literal text
  XML_DECL = 'XML_DECL'
XML declaration
  DOCTYPE = 'DOCTYPE'
doctype declaration
  START_NS = 'START_NS'
start namespace mapping
  END_NS = 'END_NS'
end namespace mapping
  START_CDATA = 'START_CDATA'
start CDATA section
  END_CDATA = 'END_CDATA'
end CDATA section
  PI = 'PI'
processing instruction
  COMMENT = 'COMMENT'
comment
Instance Variables
  events
The underlying iterable producing the events
  serializer
The default serialization method
Properties

Inherited from object: __class__

Method Details

__init__(self, events, serializer=None)
(Constructor)

 
Initialize the stream with a sequence of markup events.
Parameters:
  • events - a sequence or iterable providing the events
  • serializer - the default serialization method to use for this stream
Overrides: object.__init__

Note: Changed in 0.5: added the serializer argument

__or__(self, function)
(Or operator)

 

Override the "bitwise or" operator to apply filters or serializers to the stream, providing a syntax similar to pipes on Unix shells.

Assume the following stream produced by the HTML function:

>>> from genshi.input import HTML
>>> html = HTML('''<p onclick="alert('Whoa')">Hello, world!</p>''')
>>> print(html)
<p onclick="alert('Whoa')">Hello, world!</p>

A filter such as the HTML sanitizer can be applied to that stream using the pipe notation as follows:

>>> from genshi.filters import HTMLSanitizer
>>> sanitizer = HTMLSanitizer()
>>> print(html | sanitizer)
<p>Hello, world!</p>

Filters can be any function that accepts and produces a stream (where a stream is anything that iterates over events):

>>> from genshi.core import TEXT
>>> def uppercase(stream):
...     for kind, data, pos in stream:
...         if kind is TEXT:
...             data = data.upper()
...         yield kind, data, pos
>>> print(html | sanitizer | uppercase)
<p>HELLO, WORLD!</p>

Serializers can also be used with this notation:

>>> from genshi.output import TextSerializer
>>> output = TextSerializer()
>>> print(html | sanitizer | uppercase | output)
HELLO, WORLD!

Serializers should generally be placed at the end of the "pipeline"; using one somewhere in the middle may produce unexpected results.

Parameters:
  • function - the callable object that should be applied as a filter
Returns: Stream
the filtered stream
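
The mechanism behind the pipe syntax can be sketched in a few lines. The MiniStream class below is a hypothetical stand-in, not Genshi's implementation: it relies only on what is stated above, namely that a filter is a callable that accepts a stream and produces something that iterates over events. Event kinds are plain strings here, and drop_comments() is an illustrative filter.

```python
class MiniStream:
    """Toy stand-in for a stream: wraps an iterable of (kind, data, pos) events."""

    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return iter(self.events)

    def __or__(self, function):
        # `stream | f` calls f(stream) and wraps the result again,
        # so that further `|` operations can be chained.
        return MiniStream(function(self))


def uppercase(stream):
    """Filter: upper-case the data of every TEXT event."""
    for kind, data, pos in stream:
        if kind == 'TEXT':
            data = data.upper()
        yield kind, data, pos


def drop_comments(stream):
    """Filter: remove COMMENT events from the stream."""
    return (event for event in stream if event[0] != 'COMMENT')


pos = (None, -1, -1)
source = MiniStream([
    ('TEXT', 'Hello, ', pos),
    ('COMMENT', 'ignored', pos),
    ('TEXT', 'world!', pos),
])

# Filters run left to right, exactly as they are written.
result = list(source | drop_comments | uppercase)
```

In this sketch both filters are lazy, so no event is processed until the final stream is iterated.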

filter(self, *filters)

 

Apply filters to the stream.

This method returns a new stream with the given filters applied. The filters must be callables that accept the stream object as parameter, and return the filtered stream.

The call:

stream.filter(filter1, filter2)

is equivalent to:

stream | filter1 | filter2
Parameters:
  • filters - one or more callable objects that should be applied as filters
Returns: Stream
the filtered stream
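
The equivalence above amounts to a left fold over the filters. A minimal sketch, using plain iterables of events and a hypothetical apply_filters() helper rather than Genshi's Stream class:

```python
from functools import reduce


def apply_filters(stream, *filters):
    """Apply each filter in order, feeding the output of one into the next."""
    return reduce(lambda current, f: f(current), filters, stream)


def strip_text(stream):
    """Filter: strip surrounding whitespace from TEXT events."""
    for kind, data, pos in stream:
        if kind == 'TEXT':
            data = data.strip()
        yield kind, data, pos


def skip_empty_text(stream):
    """Filter: drop TEXT events whose data is empty."""
    for kind, data, pos in stream:
        if kind == 'TEXT' and not data:
            continue
        yield kind, data, pos


pos = (None, -1, -1)
events = [('TEXT', '  foo ', pos), ('TEXT', '   ', pos), ('TEXT', 'bar', pos)]

# Folding over the filters gives the same result as nesting the calls by hand.
folded = list(apply_filters(events, strip_text, skip_empty_text))
nested = list(skip_empty_text(strip_text(events)))
```

The order of the filters matters: here the whitespace-only event is removed only because strip_text runs before skip_empty_text.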

render(self, method=None, encoding='utf-8', out=None, **kwargs)

 

Return a string representation of the stream.

Any additional keyword arguments are passed to the serializer, and thus depend on the method parameter value.

Parameters:
  • method - determines how the stream is serialized; one of "xml", "xhtml", "html", "text", or a custom serializer class; if None, the default serialization method of the stream is used
  • encoding - how the output string should be encoded; if set to None, this method returns a unicode object
  • out - a file-like object that the output should be written to instead of being returned as one big string; note that if this is a file or socket (or similar), the encoding must not be None (that is, the output must be encoded)
Returns: basestring
a str or unicode object (depending on the encoding parameter), or None if the out parameter is provided

See Also: XMLSerializer, XHTMLSerializer, HTMLSerializer, TextSerializer

Note: Changed in 0.5: added the out parameter

select(self, path, namespaces=None, variables=None)

 

Return a new stream that contains the events matching the given XPath expression.

>>> from genshi import HTML
>>> stream = HTML('<doc><elem>foo</elem><elem>bar</elem></doc>')
>>> print(stream.select('elem'))
<elem>foo</elem><elem>bar</elem>
>>> print(stream.select('elem/text()'))
foobar

Note that the outermost element of the stream becomes the context node for the XPath test. That means that the expression "doc" would not match anything in the example above, because it only tests against child elements of the outermost element:

>>> print(stream.select('doc'))
<BLANKLINE>

You can use the "." expression to match the context node itself (although that usually makes little sense):

>>> print(stream.select('.'))
<doc><elem>foo</elem><elem>bar</elem></doc>
Parameters:
  • path - a string containing the XPath expression
  • namespaces - mapping of namespace prefixes used in the path
  • variables - mapping of variable names to values
Returns: Stream
the selected substream
Raises:
  • PathSyntaxError - if the given path expression is invalid or not supported
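
The context-node behaviour is not specific to Genshi. As an analogy only, the standard library's xml.etree.ElementTree resolves its (more limited) path expressions relative to the element they are called on in the same way. This sketch does not use select(), and it returns elements rather than streams.

```python
import xml.etree.ElementTree as ET

# `doc` is the outermost element and therefore the context node.
doc = ET.fromstring('<doc><elem>foo</elem><elem>bar</elem></doc>')

# 'elem' is tested against the children of the context node.
children = [elem.text for elem in doc.findall('elem')]

# 'doc' matches nothing, because the context node has no child named doc.
no_match = doc.findall('doc')

# '.' selects the context node itself.
context = doc.findall('.')
```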

serialize(self, method='xml', **kwargs)

 

Generate strings corresponding to a specific serialization of the stream.

Unlike the render() method, this method is a generator: it yields the serialized output incrementally rather than returning a single string.

Any additional keyword arguments are passed to the serializer, and thus depend on the method parameter value.

Parameters:
  • method - determines how the stream is serialized; one of "xml", "xhtml", "html", "text", or a custom serializer class; if None, the default serialization method of the stream is used
Returns: iterator
an iterator over the serialization results (Markup or unicode objects, depending on the serialization method)

See Also: XMLSerializer, XHTMLSerializer, HTMLSerializer, TextSerializer
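
The relationship between incremental serialization and render() can be sketched as follows. This is a simplified model under stated assumptions, not Genshi's code: serialize_events() and render_events() are hypothetical names, events carry a bare tag name as data, no escaping is performed, and on Python 3 the encoded result is a bytes object.

```python
import io


def serialize_events(events):
    """Yield one string per event (toy XML-like output, no escaping)."""
    for kind, data, pos in events:
        if kind == 'START':
            yield '<%s>' % data
        elif kind == 'END':
            yield '</%s>' % data
        elif kind == 'TEXT':
            yield data


def render_events(events, encoding='utf-8', out=None):
    """Join the serialized chunks, or write them to `out` if it is given."""
    chunks = serialize_events(events)
    if encoding is not None:
        chunks = (chunk.encode(encoding) for chunk in chunks)
    if out is not None:
        # Write incrementally and return nothing, as described for `out`.
        for chunk in chunks:
            out.write(chunk)
        return None
    joiner = b'' if encoding is not None else ''
    return joiner.join(chunks)


pos = (None, -1, -1)
events = [('START', 'p', pos), ('TEXT', 'Hello', pos), ('END', 'p', pos)]

pieces = list(serialize_events(events))       # incremental output
text = render_events(events, encoding=None)   # one unencoded string
data = render_events(events)                  # encoded with the default utf-8

buffer = io.BytesIO()
returned = render_events(events, out=buffer)  # written to a file-like object
```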

__str__(self)
(Informal representation operator)

 
str(x)
Overrides: object.__str__
(inherited documentation)