WebStore Client: tables on the web

WebStore Client is a small Python wrapper around WebStore, a web-based table store used for online data storage, processing and visualization. WebStore supports various ways to access the data stored in it, but this Python client library makes using it about as simple as using a generic csv.DictWriter.

Example

To use WebStore, you need an instance of the WebStore server running either locally or on the web. If you also want write access, you’ll need valid access credentials. As WebStore doesn’t handle authentication internally, this usually means either signing up with an associated instance of CKAN or adding a user to an Apache .htaccess file.

Once you have both a server and credentials, you can start using the WebStore client by creating a Database object:

>>> from webstore.client import Database
>>> database = Database('webstore.myserver.org', 'owner', 'mydatabase')

Note that each database is owned by a user, and that owner must be specified when connecting to the database. If you were to sign into your own database, the code would look like this:

>>> database = Database('webstore.myserver.org', 'me', 'testdb',
...                     http_user='me', http_password='secret')

There is no special command to create a database, so just connecting to an arbitrary name within your own namespace will create one. Once you have connected to a database, you can list tables or check for a specific name:

>>> database.tables()
[u'testdb', u'postal_codes', u'movies']
>>> 'triples' in database
False

To actually begin using a table, you can select a table and see whether it has already been created:

>>> table = database['weather']
>>> table.exists()
False

... but what good is an empty table? So let’s fill this thing with some rows:

>>> table.writerow({'place': 'Berlin', 'temp': 23})
>>> table.writerows([{'place': 'London', 'temp': 5},
...                  {'place': 'Moscow', 'temp': -2}])

As you run this, both the table and the required columns are created automatically. This means you don’t need to worry about schema creation at all. You cannot, however, store complex objects like dict, list, tuple or custom classes to WebStore.
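If a row does contain nested values, one possible workaround is to encode them as JSON strings before writing. The sketch below is only a suggestion and not a feature of the client: flatten_row is a hypothetical helper, and it uses nothing but the Python 3 standard library.

```python
import json


def flatten_row(row):
    """Hypothetical helper: JSON-encode dict, list and tuple values."""
    flat = {}
    for key, value in row.items():
        if isinstance(value, (dict, list, tuple)):
            # Nested value: store it as a JSON string instead.
            flat[key] = json.dumps(value, sort_keys=True)
        else:
            # Simple values are passed through unchanged.
            flat[key] = value
    return flat


row = flatten_row({'place': 'Berlin', 'temp': 23,
                   'tags': ['capital', 'city']})
```

The resulting dictionary is flat and could be passed to table.writerow(); a reader of the table would have to call json.loads() on the encoded column to get the list back.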

While it’s simple to add new data, updating existing rows relies on a little trick: unique_columns. This set of column names is used to find existing rows and update them:

>>> table.writerow({'place': 'Berlin', 'temp': 18},
...                unique_columns=['place'])

This will update the temp values of all rows mentioning Berlin, but leave any other columns intact.

Now that we have added some data to the table, we can try and traverse it:

>>> for row in table:
...     print row['place']
Berlin
London
Moscow

Using traverse() instead will give you the option to apply limits, offsets and very simple column filters:

>>> for row in table.traverse(place='Berlin', _limit=4, _offset=0):
...     print row['temp']
18

For more information on how you can use the WebStore client, have a look at the API documentation for Table.

API

Access to the WebStore client happens via two simple classes: Database and Table.

webstore.client.DSN(name, config_file=None)

Create a database from a data source name.

This lets you connect to pre-configured databases via a config file, located either in the current working directory (webstore.cfg) or in the user’s home directory (.webstore.cfg). The configuration is expected to have the following format:

[DEFAULT]
# global options
server = webstore.server.org
http_user = username
http_password = password

[source1]
user = username
database = db1

If the given name does not exist as a section in the configuration file, the DEFAULT section will be used and the name will be assumed to be the target database name.
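The fallback rule can be illustrated with the standard library's configparser. This is a minimal sketch of the documented behaviour, not the client's actual code: resolve_dsn is a hypothetical helper, the config is read from a string rather than from webstore.cfg, and the sketch assumes Python 3.

```python
import configparser

CONFIG = """
[DEFAULT]
server = webstore.server.org
http_user = username
http_password = password

[source1]
user = username
database = db1
"""


def resolve_dsn(name, config_text=CONFIG):
    """Hypothetical helper: return the connection settings for `name`."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if parser.has_section(name):
        # Named section: its options are merged with those from DEFAULT.
        return dict(parser.items(name))
    # No such section: use DEFAULT and treat the name as the database name.
    settings = dict(parser.defaults())
    settings['database'] = name
    return settings


known = resolve_dsn('source1')
fallback = resolve_dsn('scratch')
```

Here known carries the database db1 together with the global server and credentials, while fallback uses only the DEFAULT options with scratch as the database name.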

webstore.client.URL(url, default_table=None)

Create a webstore database handle from a URL. The URL is assumed to have the following form:

http://user:password@server/db_user/db_database[/table]

If no user and password are given, anonymous access is used. The table part of the URL is optional: if it is present, a tuple of (Database, Table) objects will be returned. If no table is specified, the second element of the tuple will be None or, if the optional argument default_table is given, the table with that name.
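To make the URL layout concrete, the sketch below splits such a URL into its parts with urllib.parse. It only illustrates the documented format and is not how the client itself is implemented; split_webstore_url is a hypothetical helper and the sketch assumes Python 3.

```python
from urllib.parse import urlsplit


def split_webstore_url(url):
    """Hypothetical helper: break a WebStore URL into its documented parts."""
    parts = urlsplit(url)
    # Path segments: db_user, db_database and an optional table name.
    segments = [s for s in parts.path.split('/') if s]
    if len(segments) < 2:
        raise ValueError('expected /db_user/db_database in the URL path')
    return {
        'server': parts.hostname,
        'http_user': parts.username,        # None means anonymous access
        'http_password': parts.password,
        'database_user': segments[0],
        'database_name': segments[1],
        'table': segments[2] if len(segments) > 2 else None,
    }


full = split_webstore_url(
    'http://me:secret@webstore.myserver.org/me/testdb/weather')
anonymous = split_webstore_url(
    'http://webstore.myserver.org/owner/mydatabase')
```

In the second call there are no credentials and no table segment, so http_user, http_password and table are all None.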

class webstore.client.Database(server, database_user, database_name, http_user=None, http_password=None)

A web-based database with many Tables. Databases are owned by one particular user and can usually only be written to by this user.

tables()

Get a list of the tables defined in this database.

__getitem__(table_name)

Get a table from the database by name.

Parameters :
  • table_name: name of the table to return.

__contains__(table_name)

Check if table_name is an existing table on the database.

Parameters :
  • table_name: the table name to check for.

class webstore.client.Table(server, base_path, table_name, http_user, http_password)

A table in the database on which you can perform read and (if authorized) write operations.

delete()

Delete the table entirely, dropping its structure as well as any contained data.

distinct(column_name)

Get all distinct values for a column.

exists()

Check if the given table actually exists or if it needs to be created by writing content.

Note that the value is cached, so the method fails to recognize write activity from other places.

find_one(**kwargs)

Get a single item matching the given criteria. The criteria can be the value of any column. If no item is found, None is returned.

traverse(_step=50, _sort=[], _limit=None, _offset=0, **kwargs)

Iterate over the table, fetching _step items at a time.

This will return a generator to traverse the table and yield each row as a dictionary of column values.

Parameters :
  • _step: determines how many records will be retrieved with each request. This is mostly a tuning aspect.
  • _limit: the maximum number of elements to retrieve.
  • _offset: offset to start traversal at.
  • _sort: a list of sorting parameters given as tuples of (column, direction). The direction can either be ‘asc’ or ‘desc’.
  • other keyword arguments: will be passed to the server and treated as column filters.
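
How _step, _limit and _offset interact can be sketched with a small generator. This is an illustration under stated assumptions rather than the client's implementation: fetch_page is a hypothetical stand-in for a single request to the server, and an in-memory list plays the role of the remote table.

```python
def traverse(fetch_page, _step=50, _limit=None, _offset=0):
    """Yield rows one by one, requesting at most `_step` rows per call.

    `fetch_page(limit, offset)` is a hypothetical stand-in for one request
    to the server; it returns a list of row dictionaries.
    """
    yielded = 0
    while _limit is None or yielded < _limit:
        # Never ask for more rows than the caller still wants.
        want = _step if _limit is None else min(_step, _limit - yielded)
        page = fetch_page(want, _offset + yielded)
        if not page:
            break
        for row in page:
            yield row
        yielded += len(page)
        if len(page) < want:
            break  # short page: the table is exhausted


# In-memory stand-in for the server, used only for this example.
ROWS = [{'n': i} for i in range(7)]
calls = []


def fake_fetch(limit, offset):
    calls.append((limit, offset))
    return ROWS[offset:offset + limit]


result = [row['n'] for row in
          traverse(fake_fetch, _step=3, _limit=5, _offset=1)]
```

With these arguments the generator issues two requests, one for three rows and one for the remaining two, which is why _step mainly affects the number of round trips and not the rows returned.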

writerow(row, unique_columns=None)

Write a single row. The row is expected to be a flat dictionary (i.e. no lists, tuples or dicts as values).

For more documentation, see writerows.

writerows(rows, unique_columns=None)

Write a set of rows to the table. Each row is expected to be a flat dictionary (i.e. no lists, tuples or dicts as values).

When unique_columns is set, webstore will first attempt to update existing rows that share the values of each column in the set. If no update can be performed, a new row will instead be inserted.

Parameters :
  • rows: a list of rows to be written to the table.
  • unique_columns: a set of columns that can be used to uniquely identify this row when attempting to update.
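
The update-or-insert behaviour of unique_columns can be modelled locally. The sketch below works on a plain list of dictionaries and only mirrors the semantics described here; upsert is a hypothetical helper, and the real matching is done by the server.

```python
def upsert(table, row, unique_columns=None):
    """Update rows matching `row` on `unique_columns`, else append `row`.

    `table` is a plain list of dictionaries standing in for a remote table.
    Returns True if at least one existing row was updated.
    """
    updated = False
    if unique_columns:
        for existing in table:
            if all(existing.get(col) == row.get(col)
                   for col in unique_columns):
                existing.update(row)  # columns missing from `row` stay intact
                updated = True
    if not updated:
        table.append(dict(row))
    return updated


weather = [{'place': 'Berlin', 'temp': 23, 'wind': 'calm'},
           {'place': 'London', 'temp': 5}]
upsert(weather, {'place': 'Berlin', 'temp': 18}, unique_columns=['place'])
upsert(weather, {'place': 'Moscow', 'temp': -2}, unique_columns=['place'])
```

After these two calls the Berlin row has its new temperature and keeps its wind column, while Moscow is appended as a new row because no existing row shares its place value.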