database

This module defines the data schema.

There are 8 major table used in this app. Let’s go through them one by one.

database schema settings:

(
    engine,
    t_station, 
    t_weather_raw, 
    t_weather,
    t_object, 
    t_object_data_raw,
    t_object_data,
    t_feature_raw, 
    t_feature,
)

Work flow:

  1. station metadata
  2. object metadata
  3. weather raw => weather data
  4. object raw => object data
  5. weather data + object data => feature raw
  6. feature raw => feature
  7. feature => machine learning model ...
wechinelearn.database.t_station = Table('station', MetaData(bind=None), Column('id', String(), table=<station>, primary_key=True, nullable=False), Column('lat', Float(), table=<station>), Column('lng', Float(), table=<station>), schema=None)

A weather station must have id, lat, lng; Any observed weather data is associated with a station.

wechinelearn.database.t_weather_raw = Table('weather_raw', MetaData(bind=None), Column('station_id', String(), ForeignKey('station.id'), table=<weather_raw>, primary_key=True, nullable=False), Column('time', DateTime(), table=<weather_raw>, primary_key=True, nullable=False), Column('label', Integer(), table=<weather_raw>, primary_key=True, nullable=False), Column('value', Float(), table=<weather_raw>), schema=None)

Weather raw data table, data could be non-interpolated, sparse and arbitrary many data points. For example, no matter how many data points we have, outdoorTemp, solarPower, windSpeed, humidity, ..., etc, we put them here.

wechinelearn.database.t_weather = Table('weather', MetaData(bind=None), Column('station_id', String(), ForeignKey('station.id'), table=<weather>, primary_key=True, nullable=False), Column('time', DateTime(), table=<weather>, primary_key=True, nullable=False), Column('outdoorTemp', Float(), table=<weather>), Column('outdoorTempReliab', Boolean(), table=<weather>), schema=None)

This is processed weather data. All kinds of observation at same time will be put here. We put interpolated, processed here. Time axis has to be continues. For those moment doesn’t have valid data, we fill in with Null value.

wechinelearn.database.t_object = Table('object', MetaData(bind=None), Column('id', String(), table=<object>, primary_key=True, nullable=False), Column('lat', Float(), table=<object>), Column('lng', Float(), table=<object>), schema=None)

Your analysis target. Must have lat, lng info. wechinelearn use this to local couple of nearby stations and automatically find the best weather data.

wechinelearn.database.and_(*clauses)

Produce a conjunction of expressions joined by AND.

E.g.:

from sqlalchemy import and_

stmt = select([users_table]).where(
                and_(
                    users_table.c.name == 'wendy',
                    users_table.c.enrolled == True
                )
            )

The and_() conjunction is also available using the Python & operator (though note that compound expressions need to be parenthesized in order to function with Python operator precedence behavior):

stmt = select([users_table]).where(
                (users_table.c.name == 'wendy') &
                (users_table.c.enrolled == True)
            )

The and_() operation is also implicit in some cases; the Select.where() method for example can be invoked multiple times against a statement, which will have the effect of each clause being combined using and_():

stmt = select([users_table]).\
            where(users_table.c.name == 'wendy').\
            where(users_table.c.enrolled == True)

See also

or_()

wechinelearn.database.distinct(expr)

Produce an column-expression-level unary DISTINCT clause.

This applies the DISTINCT keyword to an individual column expression, and is typically contained within an aggregate function, as in:

from sqlalchemy import distinct, func
stmt = select([func.count(distinct(users_table.c.name))])

The above would produce an expression resembling:

SELECT COUNT(DISTINCT name) FROM user

The distinct() function is also available as a column-level method, e.g. ColumnElement.distinct(), as in:

stmt = select([func.count(users_table.c.name.distinct())])

The distinct() operator is different from the Select.distinct() method of Select, which produces a SELECT statement with DISTINCT applied to the result set as a whole, e.g. a SELECT DISTINCT expression. See that method for further information.

See also

ColumnElement.distinct()

Select.distinct()

func

wechinelearn.database.select(columns=None, whereclause=None, from_obj=None, distinct=False, having=None, correlate=True, prefixes=None, **kwargs)

Construct a new Select.

Similar functionality is also available via the FromClause.select() method on any FromClause.

All arguments which accept ClauseElement arguments also accept string arguments, which will be converted as appropriate into either text() or literal_column() constructs.

See also

coretutorial_selecting - Core Tutorial description of select().

Parameters:
  • columns

    A list of ClauseElement objects, typically ColumnElement objects or subclasses, which will form the columns clause of the resulting statement. For all members which are instances of Selectable, the individual ColumnElement members of the Selectable will be added individually to the columns clause. For example, specifying a Table instance will result in all the contained Column objects within to be added to the columns clause.

    This argument is not present on the form of select() available on Table.

  • whereclause – A ClauseElement expression which will be used to form the WHERE clause.
  • from_obj – A list of ClauseElement objects which will be added to the FROM clause of the resulting statement. Note that “from” objects are automatically located within the columns and whereclause ClauseElements. Use this parameter to explicitly specify “from” objects which are not automatically locatable. This could include Table objects that aren’t otherwise present, or Join objects whose presence will supercede that of the Table objects already located in the other clauses.
  • autocommit – Deprecated. Use .execution_options(autocommit=<True|False>) to set the autocommit option.
  • bind=None – an Engine or Connection instance to which the resulting Select object will be bound. The Select object will otherwise automatically bind to whatever Connectable instances can be located within its contained ClauseElement members.
  • correlate=True – indicates that this Select object should have its contained FromClause elements “correlated” to an enclosing Select object. This means that any ClauseElement instance within the “froms” collection of this Select which is also present in the “froms” collection of an enclosing select will not be rendered in the FROM clause of this select statement.
  • distinct=False

    when True, applies a DISTINCT qualifier to the columns clause of the resulting statement.

    The boolean argument may also be a column expression or list of column expressions - this is a special calling form which is understood by the Postgresql dialect to render the DISTINCT ON (<columns>) syntax.

    distinct is also available via the distinct() generative method.

  • for_update=False
    when True, applies FOR UPDATE to the end of the resulting statement.

    Deprecated since version 0.9.0: - use GenerativeSelect.with_for_update() to specify the structure of the FOR UPDATE clause.

    for_update accepts various string values interpreted by specific backends, including:

    • "read" - on MySQL, translates to LOCK IN SHARE MODE; on Postgresql, translates to FOR SHARE.
    • "nowait" - on Postgresql and Oracle, translates to FOR UPDATE NOWAIT.
    • "read_nowait" - on Postgresql, translates to FOR SHARE NOWAIT.

    See also

    GenerativeSelect.with_for_update() - improved API for specifying the FOR UPDATE clause.

  • group_by – a list of ClauseElement objects which will comprise the GROUP BY clause of the resulting select.
  • having – a ClauseElement that will comprise the HAVING clause of the resulting select when GROUP BY is used.
  • limit=None – a numerical value which usually compiles to a LIMIT expression in the resulting select. Databases that don’t support LIMIT will attempt to provide similar functionality.
  • offset=None – a numeric value which usually compiles to an OFFSET expression in the resulting select. Databases that don’t support OFFSET will attempt to provide similar functionality.
  • order_by – a scalar or list of ClauseElement objects which will comprise the ORDER BY clause of the resulting select.
  • use_labels=False

    when True, the statement will be generated using labels for each column in the columns clause, which qualify each column with its parent table’s (or aliases) name so that name conflicts between columns in different tables don’t occur. The format of the label is <tablename>_<column>. The “c” collection of the resulting Select object will use these names as well for targeting column members.

    use_labels is also available via the apply_labels() generative method.

wechinelearn.database.t_object_data_raw = Table('object_data_raw', MetaData(bind=None), Column('object_id', String(), ForeignKey('object.id'), table=<object_data_raw>, primary_key=True, nullable=False), Column('time', DateTime(), table=<object_data_raw>, primary_key=True, nullable=False), Column('label', Integer(), table=<object_data_raw>, primary_key=True, nullable=False), Column('value', Float(), table=<object_data_raw>), schema=None)

Similar to weather raw data, but it’s about your target.

wechinelearn.database.t_object_data = Table('object_data', MetaData(bind=None), Column('object_id', String(), ForeignKey('object.id'), table=<object_data>, primary_key=True, nullable=False), Column('time', DateTime(), table=<object_data>, primary_key=True, nullable=False), Column('load', Float(), table=<object_data>), Column('loadReliab', Boolean(), table=<object_data>), schema=None)

Similar to weather data, it’s interpolated, processed data.

wechinelearn.database.t_feature_raw = Table('feature_raw', MetaData(bind=None), Column('object_id', String(), ForeignKey('object.id'), table=<feature_raw>, primary_key=True, nullable=False), Column('time', DateTime(), table=<feature_raw>, primary_key=True, nullable=False), Column('dayseconds', Float(), table=<feature_raw>), Column('is_weekend', Boolean(), table=<feature_raw>), Column('outdoorTemp', Float(), table=<feature_raw>), Column('load', Float(), table=<feature_raw>), schema=None)

This table is a result of merging weather and object data on the time axis. This table only have data points you observed.

wechinelearn.database.t_feature = Table('feature', MetaData(bind=None), Column('object_id', String(), ForeignKey('object.id'), table=<feature>, primary_key=True, nullable=False), Column('time', DateTime(), table=<feature>, primary_key=True, nullable=False), Column('dayseconds', Float(), table=<feature>), Column('is_weekend', Boolean(), table=<feature>), Column('outdoorTemp', Float(), table=<feature>), Column('outdoorTemp_1hourDelta', Float(), table=<feature>), Column('outdoorTemp_2hourDelta', Float(), table=<feature>), Column('outdoorTemp_3hourDelta', Float(), table=<feature>), Column('outdoorTemp_4hourDelta', Float(), table=<feature>), Column('outdoorTemp_1DayDelta', Float(), table=<feature>), Column('load', Float(), table=<feature>), Column('load_1hourDelta', Float(), table=<feature>), Column('load_2hourDelta', Float(), table=<feature>), Column('load_3hourDelta', Float(), table=<feature>), Column('load_4hourDelta', Float(), table=<feature>), Column('load_1DayDelta', Float(), table=<feature>), schema=None)

Sometimes you need to derive more features for your model. Then you need to take data out from feature_raw, and even more from others, then put everything here, so finally you have a nicely organized tabulate dataset. You can easily choose any subset and plug in any machine learning model.