database¶
This module defines the data schema.
There are 8 major table used in this app. Let’s go through them one by one.
database schema settings:
(
engine,
t_station,
t_weather_raw,
t_weather,
t_object,
t_object_data_raw,
t_object_data,
t_feature_raw,
t_feature,
)
Work flow:
- station metadata
- object metadata
- weather raw => weather data
- object raw => object data
- weather data + object data => feature raw
- feature raw => feature
- feature => machine learning model ...
-
wechinelearn.database.
t_station
= Table('station', MetaData(bind=None), Column('id', String(), table=<station>, primary_key=True, nullable=False), Column('lat', Float(), table=<station>), Column('lng', Float(), table=<station>), schema=None)¶ A weather station must have id, lat, lng; Any observed weather data is associated with a station.
-
wechinelearn.database.
t_weather_raw
= Table('weather_raw', MetaData(bind=None), Column('station_id', String(), ForeignKey('station.id'), table=<weather_raw>, primary_key=True, nullable=False), Column('time', DateTime(), table=<weather_raw>, primary_key=True, nullable=False), Column('label', Integer(), table=<weather_raw>, primary_key=True, nullable=False), Column('value', Float(), table=<weather_raw>), schema=None)¶ Weather raw data table, data could be non-interpolated, sparse and arbitrary many data points. For example, no matter how many data points we have, outdoorTemp, solarPower, windSpeed, humidity, ..., etc, we put them here.
-
wechinelearn.database.
t_weather
= Table('weather', MetaData(bind=None), Column('station_id', String(), ForeignKey('station.id'), table=<weather>, primary_key=True, nullable=False), Column('time', DateTime(), table=<weather>, primary_key=True, nullable=False), Column('outdoorTemp', Float(), table=<weather>), Column('outdoorTempReliab', Boolean(), table=<weather>), schema=None)¶ This is processed weather data. All kinds of observation at same time will be put here. We put interpolated, processed here. Time axis has to be continues. For those moment doesn’t have valid data, we fill in with Null value.
-
wechinelearn.database.
t_object
= Table('object', MetaData(bind=None), Column('id', String(), table=<object>, primary_key=True, nullable=False), Column('lat', Float(), table=<object>), Column('lng', Float(), table=<object>), schema=None)¶ Your analysis target. Must have lat, lng info. wechinelearn use this to local couple of nearby stations and automatically find the best weather data.
-
wechinelearn.database.
and_
(*clauses)¶ Produce a conjunction of expressions joined by
AND
.E.g.:
from sqlalchemy import and_ stmt = select([users_table]).where( and_( users_table.c.name == 'wendy', users_table.c.enrolled == True ) )
The
and_()
conjunction is also available using the Python&
operator (though note that compound expressions need to be parenthesized in order to function with Python operator precedence behavior):stmt = select([users_table]).where( (users_table.c.name == 'wendy') & (users_table.c.enrolled == True) )
The
and_()
operation is also implicit in some cases; theSelect.where()
method for example can be invoked multiple times against a statement, which will have the effect of each clause being combined usingand_()
:stmt = select([users_table]).\ where(users_table.c.name == 'wendy').\ where(users_table.c.enrolled == True)
See also
or_()
-
wechinelearn.database.
distinct
(expr)¶ Produce an column-expression-level unary
DISTINCT
clause.This applies the
DISTINCT
keyword to an individual column expression, and is typically contained within an aggregate function, as in:from sqlalchemy import distinct, func stmt = select([func.count(distinct(users_table.c.name))])
The above would produce an expression resembling:
SELECT COUNT(DISTINCT name) FROM user
The
distinct()
function is also available as a column-level method, e.g.ColumnElement.distinct()
, as in:stmt = select([func.count(users_table.c.name.distinct())])
The
distinct()
operator is different from theSelect.distinct()
method ofSelect
, which produces aSELECT
statement withDISTINCT
applied to the result set as a whole, e.g. aSELECT DISTINCT
expression. See that method for further information.See also
ColumnElement.distinct()
Select.distinct()
func
-
wechinelearn.database.
select
(columns=None, whereclause=None, from_obj=None, distinct=False, having=None, correlate=True, prefixes=None, **kwargs)¶ Construct a new
Select
.Similar functionality is also available via the
FromClause.select()
method on anyFromClause
.All arguments which accept
ClauseElement
arguments also accept string arguments, which will be converted as appropriate into eithertext()
orliteral_column()
constructs.See also
coretutorial_selecting - Core Tutorial description of
select()
.Parameters: - columns –
A list of
ClauseElement
objects, typicallyColumnElement
objects or subclasses, which will form the columns clause of the resulting statement. For all members which are instances ofSelectable
, the individualColumnElement
members of theSelectable
will be added individually to the columns clause. For example, specifying aTable
instance will result in all the containedColumn
objects within to be added to the columns clause.This argument is not present on the form of
select()
available onTable
. - whereclause – A
ClauseElement
expression which will be used to form theWHERE
clause. - from_obj – A list of
ClauseElement
objects which will be added to theFROM
clause of the resulting statement. Note that “from” objects are automatically located within the columns and whereclause ClauseElements. Use this parameter to explicitly specify “from” objects which are not automatically locatable. This could includeTable
objects that aren’t otherwise present, orJoin
objects whose presence will supercede that of theTable
objects already located in the other clauses. - autocommit – Deprecated. Use .execution_options(autocommit=<True|False>) to set the autocommit option.
- bind=None – an
Engine
orConnection
instance to which the resultingSelect
object will be bound. TheSelect
object will otherwise automatically bind to whateverConnectable
instances can be located within its containedClauseElement
members. - correlate=True – indicates that this
Select
object should have its containedFromClause
elements “correlated” to an enclosingSelect
object. This means that anyClauseElement
instance within the “froms” collection of thisSelect
which is also present in the “froms” collection of an enclosing select will not be rendered in theFROM
clause of this select statement. - distinct=False –
when
True
, applies aDISTINCT
qualifier to the columns clause of the resulting statement.The boolean argument may also be a column expression or list of column expressions - this is a special calling form which is understood by the Postgresql dialect to render the
DISTINCT ON (<columns>)
syntax.distinct
is also available via thedistinct()
generative method. - for_update=False – when
True
, appliesFOR UPDATE
to the end of the resulting statement.Deprecated since version 0.9.0: - use
GenerativeSelect.with_for_update()
to specify the structure of theFOR UPDATE
clause.for_update
accepts various string values interpreted by specific backends, including:"read"
- on MySQL, translates toLOCK IN SHARE MODE
; on Postgresql, translates toFOR SHARE
."nowait"
- on Postgresql and Oracle, translates toFOR UPDATE NOWAIT
."read_nowait"
- on Postgresql, translates toFOR SHARE NOWAIT
.
See also
GenerativeSelect.with_for_update()
- improved API for specifying theFOR UPDATE
clause. - group_by – a list of
ClauseElement
objects which will comprise theGROUP BY
clause of the resulting select. - having – a
ClauseElement
that will comprise theHAVING
clause of the resulting select whenGROUP BY
is used. - limit=None – a numerical value which usually compiles to a
LIMIT
expression in the resulting select. Databases that don’t supportLIMIT
will attempt to provide similar functionality. - offset=None – a numeric value which usually compiles to an
OFFSET
expression in the resulting select. Databases that don’t supportOFFSET
will attempt to provide similar functionality. - order_by – a scalar or list of
ClauseElement
objects which will comprise theORDER BY
clause of the resulting select. - use_labels=False –
when
True
, the statement will be generated using labels for each column in the columns clause, which qualify each column with its parent table’s (or aliases) name so that name conflicts between columns in different tables don’t occur. The format of the label is <tablename>_<column>. The “c” collection of the resultingSelect
object will use these names as well for targeting column members.use_labels is also available via the
apply_labels()
generative method.
- columns –
-
wechinelearn.database.
t_object_data_raw
= Table('object_data_raw', MetaData(bind=None), Column('object_id', String(), ForeignKey('object.id'), table=<object_data_raw>, primary_key=True, nullable=False), Column('time', DateTime(), table=<object_data_raw>, primary_key=True, nullable=False), Column('label', Integer(), table=<object_data_raw>, primary_key=True, nullable=False), Column('value', Float(), table=<object_data_raw>), schema=None)¶ Similar to weather raw data, but it’s about your target.
-
wechinelearn.database.
t_object_data
= Table('object_data', MetaData(bind=None), Column('object_id', String(), ForeignKey('object.id'), table=<object_data>, primary_key=True, nullable=False), Column('time', DateTime(), table=<object_data>, primary_key=True, nullable=False), Column('load', Float(), table=<object_data>), Column('loadReliab', Boolean(), table=<object_data>), schema=None)¶ Similar to weather data, it’s interpolated, processed data.
-
wechinelearn.database.
t_feature_raw
= Table('feature_raw', MetaData(bind=None), Column('object_id', String(), ForeignKey('object.id'), table=<feature_raw>, primary_key=True, nullable=False), Column('time', DateTime(), table=<feature_raw>, primary_key=True, nullable=False), Column('dayseconds', Float(), table=<feature_raw>), Column('is_weekend', Boolean(), table=<feature_raw>), Column('outdoorTemp', Float(), table=<feature_raw>), Column('load', Float(), table=<feature_raw>), schema=None)¶ This table is a result of merging weather and object data on the time axis. This table only have data points you observed.
-
wechinelearn.database.
t_feature
= Table('feature', MetaData(bind=None), Column('object_id', String(), ForeignKey('object.id'), table=<feature>, primary_key=True, nullable=False), Column('time', DateTime(), table=<feature>, primary_key=True, nullable=False), Column('dayseconds', Float(), table=<feature>), Column('is_weekend', Boolean(), table=<feature>), Column('outdoorTemp', Float(), table=<feature>), Column('outdoorTemp_1hourDelta', Float(), table=<feature>), Column('outdoorTemp_2hourDelta', Float(), table=<feature>), Column('outdoorTemp_3hourDelta', Float(), table=<feature>), Column('outdoorTemp_4hourDelta', Float(), table=<feature>), Column('outdoorTemp_1DayDelta', Float(), table=<feature>), Column('load', Float(), table=<feature>), Column('load_1hourDelta', Float(), table=<feature>), Column('load_2hourDelta', Float(), table=<feature>), Column('load_3hourDelta', Float(), table=<feature>), Column('load_4hourDelta', Float(), table=<feature>), Column('load_1DayDelta', Float(), table=<feature>), schema=None)¶ Sometimes you need to derive more features for your model. Then you need to take data out from
feature_raw
, and even more from others, then put everything here, so finally you have a nicely organized tabulate dataset. You can easily choose any subset and plug in any machine learning model.