Aggregation Browsing Backends

Backends for browsing aggregates of various data sources

SQL

SQL backend uses SQLAlchemy for generating queries. It supports all databases that the SQLAlchemy supports such as:

  • Drizzle
  • Firebird
  • Informix
  • Microsoft SQL Server
  • MySQL
  • Oracle
  • PostgreSQL
  • SQLite
  • Sybase

Browser

class cubes.backends.sql.browser.SnowflakeBrowser(cube, store, locale=None, debug=False, **options)

SnowflakeBrowser is a SQL-based AggregationBrowser implementation that can aggregate star and snowflake schemas without need of having explicit view or physical denormalized table.

Attributes:

  • cube - browsed cube
  • locale - locale used for browsing
  • metadata - SQLAlchemy MetaData object
  • debug - output SQL to the logger at INFO level
  • options - passed to the mapper and context (see their respective documentation)

Tuning:

  • include_summary - it True then summary is included in aggregation result. Turned on by default.
  • include_cell_count – if True then total cell count is included in aggregation result. Turned on by default. performance reasons

Limitations:

  • only one locale can be used for browsing at a time
  • locale is implemented as denormalized: one column for each language
builtin_function(name, aggregate)

Returns a built-in function for aggregate

execute_statement(statement, label=None)

Execute the statement, optionally log it. Returns the result cursor.

fact(key_value, fields=None)

Get a single fact with key key_value from cube.

Number of SQL queries: 1.

facts(cell=None, fields=None, order=None, page=None, page_size=None)

Return all facts from cell, might be ordered and paginated.

Number of SQL queries: 1.

features()

Return SQL features. Currently they are all the same for every cube, however in the future they might depend on the SQL engine or other factors.

path_details(dimension, path, hierarchy=None)

Returns details for path in dimension. Can be used for multi-dimensional “breadcrumbs” in a used interface.

Number of SQL queries: 1.

provide_aggregate(cell, aggregates, drilldown, split, order, page, page_size, **options)

Return aggregated result.

Arguments:

  • cell: cell to be aggregated
  • measures: aggregates of these measures will be considered
  • aggregates: aggregates to be considered
  • drilldown: list of dimensions or list of tuples: (dimension, hierarchy, level)
  • split: an optional cell that becomes an extra drilldown segmenting the data into those within split cell and those not within
  • attributes: list of attributes from drilled-down dimensions to be returned in the result

Query tuning:

  • include_cell_count: if True (True is default) then result.total_cell_count is computed as well, otherwise it will be None.
  • include_summary: if True (default) then summary is computed, otherwise it will be None

Result is paginated by page_size and ordered by order.

Number of database queries:

  • without drill-down: 1 – summary
  • with drill-down (default): 3 – summary, drilldown, total drill-down record count

Notes:

  • measures can be only in the fact table
provide_members(cell, dimension, depth=None, hierarchy=None, levels=None, attributes=None, page=None, page_size=None, order=None)

Return values for dimension with level depth depth. If depth is None, all levels are returned.

Number of database queries: 1.

set_locale(locale)

Change the browser’s locale

test(aggregate=False, **options)

Tests whether the statement can be constructed.

validate()

Validate physical representation of model. Returns a list of dictionaries with keys: type, issue, object.

Types might be: join or attribute.

The join issues are:

  • no_table - there is no table for join
  • duplicity - either table or alias is specified more than once

The attribute issues are:

  • no_table - there is no table for attribute
  • no_column - there is no column for attribute
  • duplicity - attribute is found more than once
class cubes.backends.sql.query.QueryBuilder(browser)

Creates a SQL query statement builder object – a controller-like object that incrementally constructs the statement.

Result attributes:

  • statement – SQL query statement
  • labels – logical labels for the statement selection
aggregate_expression(aggregate, coalesce_measure=False)

Returns an expression that performs the aggregation of measure aggregate. The result’s label is the aggregate’s name. aggregate has to be MeasureAggregate instance.

If aggregate function is post-aggregation calculation, then None is returned.

Aggregation function names are case in-sensitive.

If coalesce_measure is True then selected measure column is wrapped in COALESCE(column, 0).

aggregation_statement(cell, drilldown=None, aggregates=None, split=None, attributes=None, summary_only=False)

Builds a statement to aggregate the cell.

  • cellCell to aggregate
  • drilldown – a Drilldown object
  • aggregates – list of aggregates to consider
  • split – split cell for split condition
  • summary_only – do not perform GROUP BY for the drilldown. The
  • drilldown is used only for choosing tables to join and affects outer detail joins in the result

Algorithm description:

All the tables have one of the two relationship to the fact: master/match or detail. Every table connected to a table that has “detail” relationship is considered also in the “detail” relationship towards the fact. Therefore we have two join zones: all master or detail tables from the core, directly connected to the fact table and rest of the table connected to the core through outer detail relationship.

Depending on the query it is decided whether we are fine with just joining everything together into single join or we need to separate the fact master core from the outer details:

            +------+           +-----+
            | fact |--(match)--| dim +
            +------+           +-----+
Master Fact    |
===============|========================
Outer Details  |               +-----+
               +------(detail)-| dim |
                               +-----+

The outer details part is RIGHT OUTER JOINed to the fact. Since there are no tables any more, the original table keys for joins to the outer details were exposed and specially labeled as __masterkeyXX where XX is a sequence number of the key. The join_expression JOIN constructing method receives the map of the keys and replaces the original tables with connections to the columns already selected in the master fact.

Note

Limitation: we can not have a Cut (condition) where keys (path elements) are from both join zones. Whole cut should be within one zone: either the master fact or outer details.

append_condition(condition)

Appends condition to the generated statement.

attributes_for_cell(cell)

Returns a set of attributes included in the cell.

attributes_for_cell_cuts(cell)

Returns a list of tuples (cut, attributes) where attributes is list of attributes involved in the cut.

builtin_aggregate_expressions(aggregates, coalesce_measures=False)

Returns list of expressions for aggregates from aggregates that are computed using the SQL statement.

column(attribute, locale=None)

Returns either a physical column for the attribute or a reference to a column from the master fact if it exists.

condition_for_cell(cell)

Returns a SQL condition for the cell.

condition_for_point(dim, path, hierarchy=None, invert=False)

Returns a Condition tuple (attributes, conditions, group_by) dimension dim point at path. It is a compound condition - one equality condition for each path element in form: level[i].key = path[i]

conditions_for_cuts(cuts)

Constructs conditions for all cuts in the cell. Returns a list of SQL conditional expressions.

denormalized_statement(cell=None, attributes=None, expand_locales=False, include_fact_key=True)

Builds a statement for denormalized view. whereclause is same as SQLAlchemy whereclause for sqlalchemy.sql.expression.select(). attributes is list of logical references to attributes to be selected. If it is None then all attributes are used. condition_attributes contains list of attributes that are not going to be selected, but are required for WHERE condition.

Set expand_locales to True to expand all localized attributes.

fact(id_)

Selects only fact with given id

fact_key_column()

Returns a column that represents the fact key.

measures_for_aggregates(aggregates)

Returns a list of measures for aggregates. This method is used in constructing the master fact.

members_statement(cell, attributes=None)

Prepares dimension members statement.

order(order)

Returns a SQL statement which is ordered according to the order. If the statement contains attributes that have natural order specified, then the natural order is used, if not overriden in the order.

order sohuld be prepared using AggregationBrowser.prepare_order().

dimension_levels is list of considered dimension levels in form of tuples (dimension, hierarchy, levels). For each level it’s sort key is used.

paginate(page, page_size)

Returns paginated statement if page is provided, otherwise returns the same statement.

range_condition(dim, hierarchy, from_path, to_path, invert=False)

Return a condition for a hierarchical range (from_path, to_path). Return value is a Condition tuple.

semiadditive_attributes(aggregates, drilldown)

Returns an attribute from a semi-additive dimension, if defined for the cube. Cubes allows one semi-additive dimension.

class cubes.backends.sql.mapper.SnowflakeMapper(cube, mappings=None, locale=None, schema=None, fact_name=None, dimension_prefix=None, dimension_suffix=None, joins=None, dimension_schema=None, **options)

A snowflake schema mapper for a cube. The mapper creates required joins, resolves table names and maps logical references to tables and respective columns.

Attributes:

  • cube - mapped cube
  • mappings – dictionary containing mappings
  • simplify_dimension_references – references for flat dimensions (with one level and no details) will be just dimension names, no attribute name. Might be useful when using single-table schema, for example, with couple of one-column dimensions.
  • dimension_prefix – default prefix of dimension tables, if default table name is used in physical reference construction
  • dimension_suffix – default suffix of dimension tables, if default table name is used in physical reference construction
  • fact_name – fact name, if not specified then cube.name is used
  • schema – default database schema
  • dimension_schema – schema whre dimension tables are stored (if different than fact table schema)

mappings is a dictionary where keys are logical attribute references and values are table column references. The keys are mostly in the form:

  • attribute for measures and fact details
  • attribute.locale for localized fact details
  • dimension.attribute for dimension attributes
  • dimension.attribute.locale for localized dimension attributes

The values might be specified as strings in the form table.column (covering most of the cases) or as a dictionary with keys schema, table and column for more customized references.

physical(attribute, locale=None)

Returns physical reference as tuple for attribute, which should be an instance of cubes.model.Attribute. If there is no dimension specified in attribute, then fact table is assumed. The returned tuple has structure: (schema, table, column).

The algorithm to find physicl reference is as follows:

IF localization is requested:
    IF is attribute is localizable:
        IF requested locale is one of attribute locales
            USE requested locale
        ELSE
            USE default attribute locale
    ELSE
        do not localize

IF mappings exist:
    GET string for logical reference
    IF locale:
        append '.' and locale to the logical reference

    IF mapping value exists for localized logical reference
        USE value as reference

IF no mappings OR no mapping was found:
    column name is attribute name

    IF locale:
        append '_' and locale to the column name

    IF dimension specified:
        # Example: 'date.year' -> 'date.year'
        table name is dimension name

        IF there is dimension table prefix
            use the prefix for table name

    ELSE (if no dimension is specified):
        # Example: 'date' -> 'fact.date'
        table name is fact table name
physical_references(attributes, expand_locales=False)

Convert attributes to physical attributes. If expand_locales is True then physical reference for every attribute locale is returned.

relevant_joins(attributes, expand_locales=False)

Get relevant joins to the attributes - list of joins that are required to be able to acces specified attributes. attributes is a list of three element tuples: (schema, table, attribute).

table_map()

Return list of references to all tables. Keys are aliased tables: (schema, aliased_table_name) and values are real tables: (schema, table_name). Included is the fact table and all tables mentioned in joins.

To get list of all physical tables where aliased tablesare included only once:

finder = JoinFinder(cube, joins, fact_name)
tables = set(finder.table_map().keys())
tables_for_attributes(attributes, expand_locales=False)

Returns a list of tables – tuples (schema, table) that contain attributes.

Slicer

class cubes.backends.slicer.SlicerBrowser(cube, store, locale=None, **options)

Browser for another Slicer server.

Mixpanel

class cubes.backends.mixpanel.browser.MixpanelBrowser(cube, store, locale=None, **options)

Creates a Mixpanel aggregation browser.

Requirements and limitations:

  • time dimension should always be present in the drilldown
  • only one other dimension is allowd for drilldown
  • range cuts assume numeric dimensions
  • unable to drill-down on year level, will default to month
features()

Return SQL features. Currently they are all the same for every cube, however in the future they might depend on the SQL engine or other factors.

Mongo DB

Table Of Contents

Previous topic

Formatters Reference

Next topic

HTTP WSGI OLAP Server Reference

This Page