Package genshi :: Package filters :: Module transform :: Class Transformer

Class Transformer

object --+
         |
        Transformer

Stream filter that can apply a variety of different transformations to a stream.

This is achieved by selecting the events to be transformed using XPath, then applying the transformations to the events matched by the path expression. Each marked event is in the form (mark, (kind, data, pos)), where mark can be any of ENTER, INSIDE, EXIT, OUTSIDE, or None.

The first three marks match START and END events, and any events contained INSIDE any selected XML/HTML element. A non-element match outside a START/END container (e.g. text()) will yield an OUTSIDE mark.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')

Transformations act on selected stream events matching an XPath expression. Here's an example of removing some markup (the title, in this case) selected by an expression:

>>> print(html | Transformer('head/title').remove())
<html><head/><body>Some <em>body</em> text.</body></html>

Inserted content can be passed in the form of a string, or a markup event stream, which includes streams generated programmatically via the builder module:

>>> from genshi.builder import tag
>>> print(html | Transformer('body').prepend(tag.h1('Document Title')))
<html><head><title>Some Title</title></head><body><h1>Document
Title</h1>Some <em>body</em> text.</body></html>

Each XPath expression determines the set of tags that will be acted upon by subsequent transformations. In this example we select the <title> text, copy it into a buffer, then select the <body> element and paste the copied text into the body as <h1> enclosed text:

>>> buffer = StreamBuffer()
>>> print(html | Transformer('head/title/text()').copy(buffer)
...     .end().select('body').prepend(tag.h1(buffer)))
<html><head><title>Some Title</title></head><body><h1>Some Title</h1>Some
<em>body</em> text.</body></html>

Transformations can also be assigned and reused, although care must be taken when using buffers, to ensure that buffers are cleared between transforms:

>>> emphasis = Transformer('body//em').attr('class', 'emphasis')
>>> print(html | emphasis)
<html><head><title>Some Title</title></head><body>Some <em
class="emphasis">body</em> text.</body></html>

Instance Methods

__init__(self, path='.')
Construct a new transformation filter.

Stream

__call__(self, stream, keep_marks=False)
Apply the transform filter to the marked stream.

apply(self, function)
Apply a transformation to the stream.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Selection operations

Transformer

select(self, path)
Mark events matching the given XPath expression, within the current selection.

Transformer

invert(self)
Invert selection so that marked events become unmarked, and vice versa.

Transformer

end(self)
End current selection, allowing all events to be selected.

Deletion operations

Transformer

empty(self)
Empty selected elements of all content.

Transformer

remove(self)
Remove selection from the stream.

Direct element operations

Transformer

unwrap(self)
Remove outermost enclosing elements from selection.

Transformer

wrap(self, element)
Wrap selection in an element.

Content insertion operations

Transformer

replace(self, content)
Replace selection with content.

Transformer

before(self, content)
Insert content before selection.

Transformer

after(self, content)
Insert content after selection.

Transformer

prepend(self, content)
Insert content after the ENTER event of the selection.

Transformer

append(self, content)
Insert content before the END event of the selection.

Attribute manipulation

Transformer

attr(self, name, value)
Add, replace or delete an attribute on selected elements.

Buffer operations

Transformer

copy(self, buffer, accumulate=False)
Copy selection into buffer.

Transformer

cut(self, buffer, accumulate=False)
Copy selection into buffer and remove the selection from the stream.

buffer(self)
Buffer the entire stream (can consume a considerable amount of memory).

Miscellaneous operations

Transformer

filter(self, filter)
Apply a normal stream filter to the selection.

Transformer

map(self, function, kind)
Applies a function to the data element of events of kind in the selection.

Transformer

substitute(self, pattern, replace, count=1)
Replace text matching a regular expression.

rename(self, name)
Rename matching elements.

Transformer

trace(self, prefix='', fileobj=None)
Print events as they pass through the transform.

Properties
	transforms
Inherited from `object`: `__class__`

Method Details

init(self, path=`'.'`)
(Constructor)

Construct a new transformation filter.

Parameters:

path - an XPath expression (as string) or a Path instance

Overrides: object.__init__

call(self, stream, keep_marks=False)
(Call operator)

Apply the transform filter to the marked stream.

Parameters:

stream - the marked event stream to filter
keep_marks - Do not strip transformer selection marks from the stream. Useful for testing.

Returns: Stream

the transformed stream

apply(self, function)

Apply a transformation to the stream.

Transformations can be chained, similar to stream filters. Any callable accepting a marked stream can be used as a transform.

As an example, here is a simple TEXT event upper-casing transform:

>>> def upper(stream):
...     for mark, (kind, data, pos) in stream:
...         if mark and kind is TEXT:
...             yield mark, (kind, data.upper(), pos)
...         else:
...             yield mark, (kind, data, pos)
>>> short_stream = HTML('<body>Some <em>test</em> text</body>')
>>> print(short_stream | Transformer('.//em/text()').apply(upper))
<body>Some <em>TEST</em> text</body>

select(self, path)

Mark events matching the given XPath expression, within the current selection.

>>> html = HTML('<body>Some <em>test</em> text</body>')
>>> print(html | Transformer().select('.//em').trace())
(None, ('START', (QName('body'), Attrs()), (None, 1, 0)))
(None, ('TEXT', u'Some ', (None, 1, 6)))
('ENTER', ('START', (QName('em'), Attrs()), (None, 1, 11)))
('INSIDE', ('TEXT', u'test', (None, 1, 15)))
('EXIT', ('END', QName('em'), (None, 1, 19)))
(None, ('TEXT', u' text', (None, 1, 24)))
(None, ('END', QName('body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>

Parameters:

path - an XPath expression (as string) or a Path instance

Returns: Transformer

the stream augmented by transformation marks

invert(self)

Invert selection so that marked events become unmarked, and vice versa.

Specificaly, all marks are converted to null marks, and all null marks are converted to OUTSIDE marks.

>>> html = HTML('<body>Some <em>test</em> text</body>')
>>> print(html | Transformer('//em').invert().trace())
('OUTSIDE', ('START', (QName('body'), Attrs()), (None, 1, 0)))
('OUTSIDE', ('TEXT', u'Some ', (None, 1, 6)))
(None, ('START', (QName('em'), Attrs()), (None, 1, 11)))
(None, ('TEXT', u'test', (None, 1, 15)))
(None, ('END', QName('em'), (None, 1, 19)))
('OUTSIDE', ('TEXT', u' text', (None, 1, 24)))
('OUTSIDE', ('END', QName('body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>

Returns: Transformer

end(self)

End current selection, allowing all events to be selected.

Example:

>>> html = HTML('<body>Some <em>test</em> text</body>')
>>> print(html | Transformer('//em').end().trace())
('OUTSIDE', ('START', (QName('body'), Attrs()), (None, 1, 0)))
('OUTSIDE', ('TEXT', u'Some ', (None, 1, 6)))
('OUTSIDE', ('START', (QName('em'), Attrs()), (None, 1, 11)))
('OUTSIDE', ('TEXT', u'test', (None, 1, 15)))
('OUTSIDE', ('END', QName('em'), (None, 1, 19)))
('OUTSIDE', ('TEXT', u' text', (None, 1, 24)))
('OUTSIDE', ('END', QName('body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>

Returns: Transformer: the stream augmented by transformation marks

empty(self)

Empty selected elements of all content.

Example:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print(html | Transformer('.//em').empty())
<html><head><title>Some Title</title></head><body>Some <em/>
text.</body></html>

Returns: Transformer

remove(self)

Remove selection from the stream.

Example:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print(html | Transformer('.//em').remove())
<html><head><title>Some Title</title></head><body>Some
text.</body></html>

Returns: Transformer

unwrap(self)

Remove outermost enclosing elements from selection.

Example:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print(html | Transformer('.//em').unwrap())
<html><head><title>Some Title</title></head><body>Some body
text.</body></html>

Returns: Transformer

wrap(self, element)

Wrap selection in an element.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print(html | Transformer('.//em').wrap('strong'))
<html><head><title>Some Title</title></head><body>Some
<strong><em>body</em></strong> text.</body></html>

Parameters:

element - either a tag name (as string) or an Element object

Returns: Transformer

replace(self, content)

Replace selection with content.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print(html | Transformer('.//title/text()').replace('New Title'))
<html><head><title>New Title</title></head><body>Some <em>body</em>
text.</body></html>

Parameters:

content - Either a callable, an iterable of events, or a string to insert.

Returns: Transformer

before(self, content)

Insert content before selection.

In this example we insert the word 'emphasised' before the <em> opening tag:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print(html | Transformer('.//em').before('emphasised '))
<html><head><title>Some Title</title></head><body>Some emphasised
<em>body</em> text.</body></html>

Parameters:

content - Either a callable, an iterable of events, or a string to insert.

Returns: Transformer

after(self, content)

Insert content after selection.

Here, we insert some text after the </em> closing tag:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print(html | Transformer('.//em').after(' rock'))
<html><head><title>Some Title</title></head><body>Some <em>body</em>
rock text.</body></html>

Parameters:

content - Either a callable, an iterable of events, or a string to insert.

Returns: Transformer

prepend(self, content)

Insert content after the ENTER event of the selection.

Inserting some new text at the start of the <body>:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print(html | Transformer('.//body').prepend('Some new body text. '))
<html><head><title>Some Title</title></head><body>Some new body text.
Some <em>body</em> text.</body></html>

Parameters:

content - Either a callable, an iterable of events, or a string to insert.

Returns: Transformer

append(self, content)

Insert content before the END event of the selection.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print(html | Transformer('.//body').append(' Some new body text.'))
<html><head><title>Some Title</title></head><body>Some <em>body</em>
text. Some new body text.</body></html>

Parameters:

content - Either a callable, an iterable of events, or a string to insert.

Returns: Transformer

attr(self, name, value)

Add, replace or delete an attribute on selected elements.

If value evaulates to None the attribute will be deleted from the element:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em class="before">body</em> <em>text</em>.</body>'
...             '</html>')
>>> print(html | Transformer('body/em').attr('class', None))
<html><head><title>Some Title</title></head><body>Some <em>body</em>
<em>text</em>.</body></html>

Otherwise the attribute will be set to value:

>>> print(html | Transformer('body/em').attr('class', 'emphasis'))
<html><head><title>Some Title</title></head><body>Some <em
class="emphasis">body</em> <em class="emphasis">text</em>.</body></html>

If value is a callable it will be called with the attribute name and the START event for the matching element. Its return value will then be used to set the attribute:

>>> def print_attr(name, event):
...     attrs = event[1][1]
...     print(attrs)
...     return attrs.get(name)
>>> print(html | Transformer('body/em').attr('class', print_attr))
Attrs([(QName('class'), u'before')])
Attrs()
<html><head><title>Some Title</title></head><body>Some <em
class="before">body</em> <em>text</em>.</body></html>

Parameters:

name - the name of the attribute
value - the value that should be set for the attribute.

Returns: Transformer

copy(self, buffer, accumulate=False)

Copy selection into buffer.

The buffer is replaced by each contiguous selection before being passed to the next transformation. If accumulate=True, further selections will be appended to the buffer rather than replacing it.

>>> from genshi.builder import tag
>>> buffer = StreamBuffer()
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print(html | Transformer('head/title/text()').copy(buffer)
...     .end().select('body').prepend(tag.h1(buffer)))
<html><head><title>Some Title</title></head><body><h1>Some
Title</h1>Some <em>body</em> text.</body></html>

This example illustrates that only a single contiguous selection will be buffered:

>>> print(html | Transformer('head/title/text()').copy(buffer)
...     .end().select('body/em').copy(buffer).end().select('body')
...     .prepend(tag.h1(buffer)))
<html><head><title>Some Title</title></head><body><h1>Some
Title</h1>Some <em>body</em> text.</body></html>
>>> print(buffer)
<em>body</em>

Element attributes can also be copied for later use:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body><em>Some</em> <em class="before">body</em>'
...             '<em>text</em>.</body></html>')
>>> buffer = StreamBuffer()
>>> def apply_attr(name, entry):
...     return list(buffer)[0][1][1].get('class')
>>> print(html | Transformer('body/em[@class]/@class').copy(buffer)
...     .end().buffer().select('body/em[not(@class)]')
...     .attr('class', apply_attr))
<html><head><title>Some Title</title></head><body><em
class="before">Some</em> <em class="before">body</em><em
class="before">text</em>.</body></html>

Parameters:

buffer - the StreamBuffer in which the selection should be stored

Returns: Transformer

Note: Copy (and cut) copy each individual selected object into the buffer before passing to the next transform. For example, the XPath *|text() will select all elements and text, each instance of which will be copied to the buffer individually before passing to the next transform. This has implications for how StreamBuffer objects can be used, so some experimentation may be required.

cut(self, buffer, accumulate=False)

Copy selection into buffer and remove the selection from the stream.

>>> from genshi.builder import tag
>>> buffer = StreamBuffer()
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print(html | Transformer('.//em/text()').cut(buffer)
...     .end().select('.//em').after(tag.h1(buffer)))
<html><head><title>Some Title</title></head><body>Some
<em/><h1>body</h1> text.</body></html>

Specifying accumulate=True, appends all selected intervals onto the buffer. Combining this with the .buffer() operation allows us operate on all copied events rather than per-segment. See the documentation on buffer() for more information.

Parameters:

buffer - the StreamBuffer in which the selection should be stored

Returns: Transformer

Note: this transformation will buffer the entire input stream

buffer(self)

Buffer the entire stream (can consume a considerable amount of memory).

Useful in conjunction with copy(accumulate=True) and cut(accumulate=True) to ensure that all marked events in the entire stream are copied to the buffer before further transformations are applied.

For example, to move all <note> elements inside a <notes> tag at the top of the document:

>>> doc = HTML('<doc><notes></notes><body>Some <note>one</note> '
...            'text <note>two</note>.</body></doc>')
>>> buffer = StreamBuffer()
>>> print(doc | Transformer('body/note').cut(buffer, accumulate=True)
...     .end().buffer().select('notes').prepend(buffer))
<doc><notes><note>one</note><note>two</note></notes><body>Some  text
.</body></doc>

filter(self, filter)

Apply a normal stream filter to the selection. The filter is called once for each contiguous block of marked events.

>>> from genshi.filters.html import HTMLSanitizer
>>> html = HTML('<html><body>Some text<script>alert(document.cookie)'
...             '</script> and some more text</body></html>')
>>> print(html | Transformer('body/*').filter(HTMLSanitizer()))
<html><body>Some text and some more text</body></html>

Parameters:

filter - The stream filter to apply.

Returns: Transformer

map(self, function, kind)

Applies a function to the data element of events of kind in the selection.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...               '<body>Some <em>body</em> text.</body></html>')
>>> print(html | Transformer('head/title').map(unicode.upper, TEXT))
<html><head><title>SOME TITLE</title></head><body>Some <em>body</em>
text.</body></html>

Parameters:

function - the function to apply
kind - the kind of event the function should be applied to

Returns: Transformer

substitute(self, pattern, replace, count=1)

Replace text matching a regular expression.

Refer to the documentation for re.sub() for details.

>>> html = HTML('<html><body>Some text, some more text and '
...             '<b>some bold text</b>\n'
...             '<i>some italicised text</i></body></html>')
>>> print(html | Transformer('body/b').substitute('(?i)some', 'SOME'))
<html><body>Some text, some more text and <b>SOME bold text</b>
<i>some italicised text</i></body></html>
>>> tags = tag.html(tag.body('Some text, some more text and\n',
...      Markup('<b>some bold text</b>')))
>>> print(tags.generate() | Transformer('body').substitute(
...     '(?i)some', 'SOME'))
<html><body>SOME text, some more text and
<b>SOME bold text</b></body></html>

Parameters:

pattern - A regular expression object or string.
replace - Replacement pattern.
count - Number of replacements to make in each text fragment.

Returns: Transformer

trace(self, prefix=`''`, fileobj=None)

Print events as they pass through the transform.

>>> html = HTML('<body>Some <em>test</em> text</body>')
>>> print(html | Transformer('em').trace())
(None, ('START', (QName('body'), Attrs()), (None, 1, 0)))
(None, ('TEXT', u'Some ', (None, 1, 6)))
('ENTER', ('START', (QName('em'), Attrs()), (None, 1, 11)))
('INSIDE', ('TEXT', u'test', (None, 1, 15)))
('EXIT', ('END', QName('em'), (None, 1, 19)))
(None, ('TEXT', u' text', (None, 1, 24)))
(None, ('END', QName('body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>

Parameters:

prefix - a string to prefix each event with in the output
fileobj - the writable file-like object to write to; defaults to the standard output stream

Returns: Transformer

Class Transformer

__init__(self, path='.') (Constructor)

__call__(self, stream, keep_marks=False) (Call operator)

apply(self, function)

select(self, path)

invert(self)

end(self)

empty(self)

remove(self)

unwrap(self)

wrap(self, element)

replace(self, content)

before(self, content)

after(self, content)

prepend(self, content)

append(self, content)

attr(self, name, value)

copy(self, buffer, accumulate=False)

cut(self, buffer, accumulate=False)

buffer(self)

filter(self, filter)

map(self, function, kind)

substitute(self, pattern, replace, count=1)

trace(self, prefix='', fileobj=None)

init(self, path=`'.'`)
(Constructor)

call(self, stream, keep_marks=False)
(Call operator)

trace(self, prefix=`''`, fileobj=None)