Package spynner :: Module browser :: Class Browser

Class Browser

Stateful programmatic web browser class based upon QtWebKit.

>>> from spynner.browser import Browser
>>> browser = Browser()
>>> browser.load("http://www.wordreference.com")
>>> browser.runjs("console.log('I can run Javascript!')")
>>> browser.runjs("_jQuery('div').css('border', 'solid red')") # and jQuery!
>>> browser.select("#esen")
>>> browser.fill("input[name=enit]", "hola")
>>> browser.click("input[name=b]", wait_load=True)
>>> print browser.url, len(browser.html)
>>> browser.close()
Instance Methods
 
__init__(self, qappargs=None, debug_level=None)
Init a Browser instance.
    Basic interaction with browser
 
load(self, url)
Load a web page and return status (a boolean).
 
click(self, selector, wait_load=False, wait_requests=None, timeout=None)
Click any clickable element in page.
 
click_link(self, selector, timeout=None)
Click a link and wait for the page to load.
 
click_ajax(self, selector, wait_requests=1, timeout=None)
Click an AJAX link and wait for the request to finish.
 
wait_load(self, timeout=None)
Wait until the page is loaded.
 
wait(self, waittime)
Wait some time.
 
close(self)
Close Browser instance and release resources.
    Webview
 
create_webview(self, show=False)
Create a QWebView object and insert current QWebPage.
 
destroy_webview(self)
Destroy current QWebView.
 
show(self)
Show webview browser.
 
hide(self)
Hide webview browser.
 
browse(self)
Let the user browse the current page (infinite loop).
    Form manipulation
 
fill(self, selector, value)
Fill an input text with a string value using a jQuery selector.
 
check(self, selector)
Check an input checkbox using a jQuery selector.
 
uncheck(self, selector)
Uncheck an input checkbox using a jQuery selector.
 
choose(self, selector)
Choose a radio input using a jQuery selector.
 
select(self, selector)
Choose an option in a select using a jQuery selector.
 
submit(self, selector, timeout=None)
Submit a form and wait for the page to load.
    Javascript
 
runjs(self, jscode, debug=True)
Inject Javascript code into the current context of page.
 
set_javascript_confirm_callback(self, callback)
Set function callback for Javascript confirm pop-ups.
 
set_javascript_prompt_callback(self, callback)
Set function callback for Javascript prompt.
    Cookies
 
get_cookies(self)
Return string containing the current cookies in Mozilla format.
 
set_cookies(self, string_cookies)
Set cookies from a string with Mozilla-format cookies.
    Download files
 
download(self, url, outfd=None)
Download a given URL using current cookies.
    HTML and tag soup parsing
 
set_html_parser(self, parser)
Set HTML parser used to generate the HTML soup.
 
html_contains(self, regexp)
Return True if current HTML contains a given regular expression.
    HTTP Authentication
 
set_http_authentication_callback(self, callback)
Set HTTP authentication request callback.
    Miscellaneous
 
snapshot(self, box=None, format=5)
Take an image snapshot of the current frame.
 
get_url_from_path(self, path)
Return the URL for a given path using the current URL as base.
 
set_url_filter(self, url_filter)
Set function callback to filter URL.
Instance Variables
  ignore_ssl_errors = True
If True, ignore SSL certificate errors.
  user_agent = None
User agent for requests (see QWebPage::userAgentForUrl for details)
  jslib = '_jQuery'
Library name for jQuery library injected by default to pages.
  download_directory = '.'
Directory where downloaded files will be stored.
  debug_stream = sys.stderr
File-like stream where debug output will be written.
  debug_level = 0
Debug verbose level (ERROR, WARNING, INFO or DEBUG).
  event_looptime = 0.01
Event loop dispatcher loop delay (seconds).
  application
PyQt4.QtGui.QApplication object.
  webpage
PyQt4.QtWebKit.QWebPage object.
  webframe
PyQt4.QtWebKit.QWebFrame main webframe object.
  webview
PyQt4.QtWebKit.QWebView object.
  manager
PyQt4.QtNetwork.QNetworkAccessManager object.
  cookiesjar
PyQt4.QtNetwork.QNetworkCookieJar object.
Properties
  url
Current URL.
  html
Rendered HTML in current page.
  soup
HTML soup (see set_html_parser).
Method Details

__init__(self, qappargs=None, debug_level=None)
(Constructor)

 

Init a Browser instance.

Parameters:
  • qappargs - Arguments for QApplication constructor.
  • debug_level - Debug level logging (ERROR by default)

click(self, selector, wait_load=False, wait_requests=None, timeout=None)

 

Click any clickable element in page.

Parameters:
  • selector - jQuery selector.
  • wait_load - If True, wait until a new page is loaded.
  • wait_requests - How many requests to wait for before returning. Useful for AJAX requests.
  • timeout - Seconds to wait for the page to load before raising an exception.

By default this method will not wait for a page to load. If you are clicking a link or submit button, you must call this method with wait_load=True or, alternatively, call wait_load afterwards. However, the recommended way is to use click_link.

When a non-HTML file is clicked this method will download it. The file is automatically saved keeping the original structure (as wget --recursive does). For example, a file with URL http://server.org/dir1/dir2/file.ext will be saved to download_directory/server.org/dir1/dir2/file.ext.
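
The URL-to-path mapping described above can be sketched as follows. This is only an illustration of the layout, not Spynner's own code; local_path_for is a hypothetical helper name.

```python
import os

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def local_path_for(url, download_directory="."):
    """Hypothetical helper: place a URL under download_directory/host/path."""
    parts = urlparse(url)
    # Drop empty segments so a leading "/" cannot reset os.path.join.
    segments = [segment for segment in parts.path.split("/") if segment]
    return os.path.join(download_directory, parts.netloc, *segments)
```

Calling local_path_for("http://server.org/dir1/dir2/file.ext", ".") yields a path ending in server.org/dir1/dir2/file.ext (with the platform's path separator).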

wait_load(self, timeout=None)

 

Wait until the page is loaded.

Parameters:
  • timeout - Time to wait (seconds) for the page load to complete.
Returns:
Boolean state
Raises:

wait(self, waittime)

 

Wait some time.

Parameters:
  • waittime - Time to wait (seconds).

    This is an active wait: the event loop keeps running while waiting, so it may be useful for waiting on synchronous Javascript events that change the DOM.
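
The idea of an active wait can be sketched generically as below. This is an assumption-laden illustration, not Spynner's implementation: active_wait and process_events are hypothetical names, and the 0.01 second default simply mirrors the documented event_looptime value.

```python
import time


def active_wait(waittime, process_events, looptime=0.01):
    """Keep dispatching events until waittime seconds have elapsed.

    process_events is a hypothetical stand-in for whatever pumps the
    event loop. Returns the number of dispatch rounds performed.
    """
    deadline = time.time() + waittime
    rounds = 0
    while time.time() < deadline:
        process_events()      # pending events still get handled
        time.sleep(looptime)  # short pause between dispatch rounds
        rounds += 1
    return rounds
```

Unlike a plain time.sleep(waittime), pending events are still processed during the wait.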

runjs(self, jscode, debug=True)

 

Inject Javascript code into the current context of page.

Parameters:
  • jscode - Javascript code to inject.
  • debug - Set to False to disable debug output for this injection.

    You can call jQuery even if the original page does not include it, as Spynner injects the library into every loaded page. You must use _jQuery(...) instead of jQuery(...) or the common $(...) shortcut.

Note: You can change the _jQuery alias (see jslib).
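
A small sketch of building the Javascript string with the injected alias. jquery_call is a hypothetical helper, not part of Spynner; it only formats text and uses json.dumps to quote string arguments safely.

```python
import json


def jquery_call(selector, method, args=(), jslib="_jQuery"):
    """Hypothetical helper: format a jQuery call using the given alias."""
    quoted_args = ", ".join(json.dumps(arg) for arg in args)
    return "%s(%s).%s(%s)" % (jslib, json.dumps(selector), method, quoted_args)


code = jquery_call("div", "css", ["border", "solid red"])
# code could then be passed to runjs, e.g. browser.runjs(code)
```

If the alias has been changed through jslib, the same value would be passed as the jslib argument here.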

set_javascript_confirm_callback(self, callback)

 

Set function callback for Javascript confirm pop-ups.

By default Javascript confirmations are not answered. If the webpage you are working with pops up Javascript confirmations, be sure to set a callback for them.

Callback signature: javascript_confirm_callback(url, message)

  • url: URL where the popup was launched.
  • message: String message.

The callback should return a boolean (True meaning 'yes', False meaning 'no').
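
A minimal sketch of such a callback, following the signature given above. SAFE_PHRASES is a hypothetical allow-list invented for this example, and url and message are assumed to arrive as plain strings.

```python
# Hypothetical allow-list: confirmations we are willing to accept.
SAFE_PHRASES = ("continue", "leave this page")


def javascript_confirm_callback(url, message):
    """Answer 'yes' only when the message contains a known safe phrase."""
    text = message.lower()
    return any(phrase in text for phrase in SAFE_PHRASES)


# Registration would look like:
# browser.set_javascript_confirm_callback(javascript_confirm_callback)
```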

set_javascript_prompt_callback(self, callback)

 

Set function callback for Javascript prompt.

By default Javascript prompts are not answered. If the webpage you are working with pops up Javascript prompts, be sure to set a callback for them.

Callback signature: javascript_prompt_callback(url, message, defaultvalue)

  • url: Url where the popup prompt was launched.
  • message: String message.
  • defaultvalue: Default value for prompt answer.

The callback should return a string with the answer or None to cancel the prompt.
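
A minimal sketch of a prompt callback with the signature given above. ANSWERS is a hypothetical lookup table invented for this example, and the arguments are assumed to arrive as plain strings.

```python
# Hypothetical canned answers, keyed by the prompt message.
ANSWERS = {"Enter your name": "spynner-user"}


def javascript_prompt_callback(url, message, defaultvalue):
    """Return a canned answer, else the page default, else cancel."""
    if message in ANSWERS:
        return ANSWERS[message]
    # Returning None cancels the prompt.
    return defaultvalue if defaultvalue else None


# Registration would look like:
# browser.set_javascript_prompt_callback(javascript_prompt_callback)
```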

download(self, url, outfd=None)

 

Download a given URL using current cookies.

Parameters:
  • url - URL or path to download
  • outfd - Output file-like stream. If None, return data string.
Returns:
Bytes downloaded (None if something went wrong)

Note: If url is a path, the current base URL will be prepended.
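
How a path might be resolved against a base URL can be sketched with the standard library's urljoin. This is an assumption about the general behaviour, not a statement of how Spynner resolves paths internally; resolve is a hypothetical helper name.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2


def resolve(base_url, url_or_path):
    """Hypothetical helper: turn a path into a full URL using base_url."""
    # Full URLs pass through unchanged; paths are resolved against the base.
    return urljoin(base_url, url_or_path)
```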

set_html_parser(self, parser)

 

Set HTML parser used to generate the HTML soup.

Parameters:
  • parser - Callback called to generate the soup.

    When an HTML parser is set for a Browser, the property soup returns the parsed HTML.
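
A sketch of a parser callback built on the standard library. It assumes the callback receives the page HTML as a string and that whatever it returns becomes the value of soup; both points are assumptions, and LinkCollector and parse_links are names invented for this example.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def parse_links(html):
    """Hypothetical parser callback: return the list of link targets."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Registration would look like: browser.set_html_parser(parse_links)
```

A third-party parser could be passed in the same way, provided it is callable with the HTML text.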

set_http_authentication_callback(self, callback)

 

Set HTTP authentication request callback.

The callback must have this signature:

http_authentication_callback(url, realm):

  • url: URL where the request was made.
  • realm: Realm requiring authentication.

The callback should return a pair of strings containing (user, password) or None if you don't want to answer.
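
A minimal sketch of an authentication callback with the signature given above. CREDENTIALS, the realm name, and the user/password values are placeholders invented for this example; realm is assumed to arrive as a plain string.

```python
# Hypothetical credentials, keyed by authentication realm.
CREDENTIALS = {"Staff Area": ("alice", "s3cret")}


def http_authentication_callback(url, realm):
    """Return (user, password) for known realms, None otherwise."""
    return CREDENTIALS.get(realm)


# Registration would look like:
# browser.set_http_authentication_callback(http_authentication_callback)
```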

snapshot(self, box=None, format=5)

 

Take an image snapshot of the current frame.

Parameters:
  • box - 4-element tuple containing box to capture (x1, y1, x2, y2). If None, capture the whole page.
  • format - QImage format (see QImage::Format_*).
Returns:
A QImage image.

Typical usage:

>>> browser.load(url)
>>> browser.snapshot().save("webpage.png")

set_url_filter(self, url_filter)

 

Set function callback to filter URL.

By default all requested elements of a page are loaded. That includes stylesheets, images and many other elements that you may not need at all. Use this method to define the callback that will be called every time a new request is made. The callback must have this signature:

my_url_filter(operation, url):

  • operation: string with HTTP operation: get, head, post or put.
  • url: requested item URL.

It should return True (proceed) or False (reject).
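
A sketch of a filter that skips stylesheets and common image types. It assumes url arrives as a plain string and operation as one of the lowercase names listed above; SKIPPED_EXTENSIONS is a hypothetical list chosen for this example.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

# Hypothetical list of resource types we do not need.
SKIPPED_EXTENSIONS = (".css", ".png", ".jpg", ".jpeg", ".gif")


def my_url_filter(operation, url):
    """Reject GET requests for stylesheets and images, allow the rest."""
    # Compare only the path so query strings do not hide the extension.
    path = urlparse(url).path.lower()
    if operation == "get" and path.endswith(SKIPPED_EXTENSIONS):
        return False
    return True


# Registration would look like: browser.set_url_filter(my_url_filter)
```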


Property Details

url

Current URL.

Get Method:
_get_url(self)

html

Rendered HTML in current page.

Get Method:
_get_html(self)

soup

HTML soup (see set_html_parser).

Get Method:
_get_soup(self)