haralyzer package

Submodules

haralyzer.assets module

Provides all of the main functional classes for analyzing HAR files

class haralyzer.assets.HarPage(page_id, har_parser=None, har_data=None)[source]

Bases: object

An object representing one page of a HAR resource

actual_page[source]

Returns the first entry object that does not have a redirect status, indicating that it is the actual page we care about (after redirects).
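As a rough illustration of the idea, the sketch below walks a list of HAR-style entry dicts and returns the first one whose response status is not a redirect. It is not haralyzer's code: the function name `first_non_redirect` and the exact set of status codes treated as redirects are assumptions made for this example.

```python
# Assumption for this sketch: these status codes count as redirects.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def first_non_redirect(entries):
    """Return the first entry whose response status is not a redirect."""
    for entry in entries:
        if entry["response"]["status"] not in REDIRECT_CODES:
            return entry
    return None  # every entry was a redirect (or the list was empty)


entries = [
    {"request": {"url": "http://example.com/"}, "response": {"status": 301}},
    {"request": {"url": "https://example.com/"}, "response": {"status": 200}},
]
page = first_non_redirect(entries)
```

Here `page` is the second entry, since the first one only redirects to it.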

audio_files[source]

Returns a list of all audio files, each of which is an entry object.

audio_load_time[source]

Returns the browser load time for all audio files.

content_load_time[source]

Returns the full load time (in milliseconds) of the page itself

css_files[source]

Returns a list of css files, each of which is an ‘entry’ data object.

css_load_time[source]

Returns the browser load time for all CSS files.

entries[source]
filter_entries(request_type=None, content_type=None, status_code=None, regex=True)[source]

Returns a list of entry objects based on the filter criteria.

Parameters:
  • request_type – str of request type (e.g. GET or POST)
  • content_type – str of regex to use for finding content type
  • status_code – int of the desired status code
  • regex – bool indicating whether to use regex or exact match.
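
To show how such filter criteria can combine, here is a standalone sketch that works on plain HAR-style dicts. It is an approximation under stated assumptions, not the library's implementation: `filter_entries_sketch` and `_matches` are hypothetical names, the content type is read from `response.content.mimeType`, and a criterion left as `None` is simply skipped.

```python
import re


def _matches(pattern, value, regex):
    """Regex search (case-insensitive) or exact string comparison."""
    if regex:
        return re.search(pattern, value, flags=re.IGNORECASE) is not None
    return pattern == value


def filter_entries_sketch(entries, request_type=None, content_type=None,
                          status_code=None, regex=True):
    """Keep the entries that satisfy every criterion that is not None."""
    results = []
    for entry in entries:
        method = entry["request"]["method"]
        mime = entry["response"]["content"]["mimeType"]
        status = str(entry["response"]["status"])  # compared as a string
        if request_type is not None and not _matches(request_type, method, regex):
            continue
        if content_type is not None and not _matches(content_type, mime, regex):
            continue
        if status_code is not None and not _matches(str(status_code), status, regex):
            continue
        results.append(entry)
    return results


def _entry(method, mime, status):
    # Helper that builds a minimal HAR-style entry for the example.
    return {"request": {"method": method},
            "response": {"status": status, "content": {"mimeType": mime}}}


entries = [
    _entry("GET", "image/png", 200),
    _entry("POST", "application/json", 201),
    _entry("GET", "text/css", 404),
]
get_images = filter_entries_sketch(entries, request_type="GET", content_type="image")
successes = filter_entries_sketch(entries, status_code="2.*")
```

With `regex=False` the comparison is exact, so `request_type="get"` would not match an entry whose method is `GET`.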
get_load_time(request_type=None, content_type=None, status_code=None, async=True)[source]

This method can return either the TOTAL load time for the assets or the ACTUAL load time. The actual load time takes asynchronous transactions into account, so assets downloaded in parallel are not counted twice. If you want the total load time, set async=False.

EXAMPLE:

I want to know the load time for images on a page that has two images, each of which took 2 seconds to download, but the browser downloaded them at the same time.

self.get_load_time(content_type='image') (returns 2)
self.get_load_time(content_type='image', async=False) (returns 4)
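
The difference between the two numbers can be sketched with plain time intervals. The code below is only an illustration under assumptions: each asset is reduced to a hypothetical `(start_ms, end_ms)` tuple, the total is the sum of the individual durations, and the actual time is the length of the union of the intervals. The library may compute its values differently.

```python
def total_load_time(intervals):
    """Sum of every asset's own duration, ignoring overlap."""
    return sum(end - start for start, end in intervals)


def actual_load_time(intervals):
    """Wall-clock time covered by at least one asset (union of intervals)."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the current block: extend it if needed.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered


# Two images, 2 seconds each, downloaded at the same time.
images = [(0, 2000), (0, 2000)]
total_ms = total_load_time(images)    # 4000
actual_ms = actual_load_time(images)  # 2000
```

This mirrors the example above: two parallel 2-second downloads add up to 4 seconds in total but only occupy 2 seconds of wall-clock time.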

get_requests[source]

Returns a list of GET requests, each of which is an ‘entry’ data object

get_total_size(entries)[source]

Returns the total size of a collection of entries.

Parameters: entries – list of entries to calculate the total size of.
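
A minimal sketch of summing sizes over HAR-style entries follows. It assumes the size is taken from `response.bodySize`, which the HAR format sets to -1 when the size is unknown, so non-positive values are skipped. The function name `total_size` is hypothetical and the library's own calculation may differ.

```python
def total_size(entries):
    """Add up response body sizes, skipping unknown (-1) or empty bodies."""
    return sum(
        entry["response"]["bodySize"]
        for entry in entries
        if entry["response"]["bodySize"] > 0
    )


entries = [
    {"response": {"bodySize": 1200}},
    {"response": {"bodySize": -1}},   # size not available
    {"response": {"bodySize": 300}},
]
size_in_bytes = total_size(entries)
```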
html_load_time[source]

Returns the browser load time for all html files.

image_files[source]

Returns a list of images, each of which is an ‘entry’ data object.

image_load_time[source]

Returns the browser load time for all images.

initial_load_time[source]

Load time for the first non-redirect page

js_files[source]

Returns a list of javascript files, each of which is an ‘entry’ data object.

js_load_time[source]

Returns the browser load time for all javascript files.

misc_files[source]

Returns a list of miscellaneous files, i.e. entries that do not fit any of the other file categories.

page_size[source]

Returns the size (in bytes) of the first non-redirect page

post_requests[source]

Returns a list of POST requests, each of which is an ‘entry’ data object

text_files[source]

Returns a list of all text elements, each of which is an entry object.

total_audio_load_time[source]

Returns the total load time for all audio files.

total_css_load_time[source]

Returns the total load time for all CSS files.

total_css_size[source]

Total size of all css files as transferred via HTTP

total_html_load_time[source]

Returns the total load time for all html files.

total_image_load_time[source]

Returns the total load time for all images.

total_image_size[source]

Total size of all image files as transferred via HTTP

total_js_load_time[source]

Returns the total load time for all javascript files.

total_js_size[source]

Total size of all javascript files as transferred via HTTP

total_load_time[source]

Returns the full load time (in ms) of all assets on the page

total_page_size[source]

Returns the total page size (in bytes) including all assets

total_text_size[source]

Total size of all text files as transferred via HTTP

total_video_load_time[source]

Returns the total load time for all video files.

video_files[source]

Returns a list of all video files, each of which is an entry object.

video_load_time[source]

Returns the browser load time for all video files.

class haralyzer.assets.HarParser(har_data=None)[source]

Bases: object

A basic HAR parser that also adds helper methods for analyzing the performance of a web page.

browser[source]
create_asset_timeline(asset_list)[source]

Returns a dict of the timeline for the requested assets. The key is a datetime object (down to the millisecond) of ANY time where at least one of the requested assets was loaded. The value is a list of ALL assets that were loading at that time.

Parameters: asset_list – list of the assets to create a timeline for.
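
The shape of such a timeline can be sketched as follows. This is an illustration only: the assets are simplified, hypothetical dicts with a `start` datetime and a `duration_ms` field rather than real HAR entries, and the function name `asset_timeline` is not part of the library.

```python
from datetime import datetime, timedelta


def asset_timeline(assets):
    """Map each millisecond to the list of assets loading at that moment."""
    timeline = {}
    for asset in assets:
        for offset in range(asset["duration_ms"]):
            moment = asset["start"] + timedelta(milliseconds=offset)
            timeline.setdefault(moment, []).append(asset["url"])
    return timeline


t0 = datetime(2015, 1, 1, 12, 0, 0)
assets = [
    {"url": "a.js", "start": t0, "duration_ms": 3},
    {"url": "b.js", "start": t0 + timedelta(milliseconds=2), "duration_ms": 2},
]
timeline = asset_timeline(assets)
```

In this example the two assets overlap for one millisecond, so the key `t0 + timedelta(milliseconds=2)` maps to both URLs, while every other key maps to a single one.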
creator[source]
match_headers(entry, header_type, header, value, regex=True)[source]

Function to match headers.

Header names may differ in case between HAR files, for example ‘content-type’ vs ‘Content-Type’, so this function matches them case-insensitively.

Parameters:
  • entry – entry object
  • header_type – str of header type. Valid values:

    • ‘request’
    • ‘response’
  • header – str of the header to search for
  • value – str of value to search for
  • regex – bool indicating whether to use regex or exact match
Returns:

a bool indicating whether a match was found
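
Case-insensitive header matching of this kind can be sketched on a plain HAR-style entry, where headers are a list of `name`/`value` dicts. The function below is a hypothetical stand-in named `match_header_sketch`, written for illustration. The choice of `re.search` with `re.IGNORECASE` for the value is an assumption of this sketch.

```python
import re


def match_header_sketch(entry, header_type, header, value, regex=True):
    """Return True if a header with this name has a matching value."""
    for item in entry[header_type]["headers"]:
        # Header names are compared without regard to case.
        if item["name"].lower() != header.lower():
            continue
        if regex:
            if re.search(value, item["value"], flags=re.IGNORECASE):
                return True
        elif item["value"] == value:
            return True
    return False


entry = {
    "request": {"headers": []},
    "response": {
        "headers": [{"name": "content-type", "value": "text/html; charset=utf-8"}]
    },
}
found = match_header_sketch(entry, "response", "Content-Type", "text/html")
```

With `regex=False` the same call would return False, because the full header value also contains the charset.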

match_request_type(entry, request_type, regex=True)[source]

Helper function that returns entries with a request type matching the given request_type argument.

Parameters:
  • entry – entry object to analyze
  • request_type – str of request type to match
  • regex – bool indicating whether to use a regex or string match
match_status_code(entry, status_code, regex=True)[source]

Helper function that returns entries with a status code matching the given status_code argument.

NOTE: This is doing a STRING comparison NOT NUMERICAL

Parameters:
  • entry – entry object to analyze
  • status_code – str of status code to search for
  • regex – bool indicating whether to use a regex or string match
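
What a string comparison means in practice can be shown with a small standalone sketch. The function name `status_matches` is hypothetical, and using an unanchored `re.search` is an assumption of this example rather than a statement about the library.

```python
import re


def status_matches(entry, status_code, regex=True):
    """Compare the status code as a string, by regex or exact match."""
    actual = str(entry["response"]["status"])
    if regex:
        return re.search(status_code, actual) is not None
    return actual == status_code


not_found = {"response": {"status": 404}}
redirect = {"response": {"status": 302}}

is_client_error = status_matches(not_found, "4.*")
is_exact = status_matches(not_found, "404", regex=False)
```

Because the pattern is searched rather than anchored in this sketch, `"2.*"` would also match 302. A pattern such as `"^2"` restricts the match to codes that start with 2.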
pages[source]

This is a list of HarPage objects, each of which represents a page from the HAR file.

version[source]

Module contents

Module for analyzing web pages using HAR files
