hstlc Package Modules

database.database_interface module

This module serves as the interface and connection module to the hstlc database. The load_connection() function within allows the user to connect to the database via the session, base, and engine objects (described below). The classes within serve as the object-relational mappings (ORMs) that define the individual tables of the database, and are used to build the tables via the base object.

The engine object serves as the low-level database API and, perhaps most importantly, contains the dialects that allow the sqlalchemy module to communicate with the database.

The base object serves as a base class for class definitions. It produces Table objects and constructs ORMs.

The session object manages operations on ORM-mapped objects, as constructed by the base. These operations include querying, for example.

Authors:

Matthew Bourque

Use:

This module is intended to be imported from various hstlc modules and scripts. The objects that are importable from this module are as follows:
from lightcurve_pipeline.database.database_interface import engine
from lightcurve_pipeline.database.database_interface import base
from lightcurve_pipeline.database.database_interface import session
from lightcurve_pipeline.database.database_interface import Metadata
from lightcurve_pipeline.database.database_interface import Outputs
from lightcurve_pipeline.database.database_interface import BadData
from lightcurve_pipeline.database.database_interface import Stats

Dependencies:

  1. Users must have access to the hstlc database
  2. Users must also have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
    • db_connection_string - The hstlc database connection string
Other external library dependencies include:
  • pymysql
  • sqlalchemy
  • lightcurve_pipeline
lightcurve_pipeline.database.database_interface.load_connection(connection_string, echo=False)

Create and return a connection to the database given in the connection string.

Parameters:

connection_string : str

A string that defines the database connection. The connection string is in the following form: dialect+driver://username:password@host:port/database

echo : bool

Show all SQL produced

Returns:

session : session object

Provides a holding zone for all objects loaded or associated with the database.

base : base object

Provides a base class for declarative class definitions.

engine : engine object

Provides a source of database connectivity and behavior.

lightcurve_pipeline.database.database_interface.get_session()

Return the session object of the database connection

In many cases, all that is needed is the session object to interact with the database. This function can be used just to establish a connection and retrieve the session object.

Returns:

session : sqlalchemy.orm.session.Session

Provides a holding zone for all objects loaded or associated with the database.

database.update_database module

This module serves as an interface for updating the various tables of the hstlc database, either by inserting new records, or updating existing ones

Authors:

Matthew Bourque

Use:

This module is intended to be imported from the various hstlc scripts, as such:
from lightcurve_pipeline.database.update_database import update_bad_data_table
from lightcurve_pipeline.database.update_database import update_metadata_table
from lightcurve_pipeline.database.update_database import update_stats_table
from lightcurve_pipeline.database.update_database import update_outputs_table

Dependencies:

  1. Users must have access to the hstlc database
  2. Users must also have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
    • db_connection_string - The hstlc database connection string
Other external library dependencies include:
  • pymysql
  • sqlalchemy
  • lightcurve_pipeline
lightcurve_pipeline.database.update_database.update_bad_data_table(filename, reason)

Insert or update a record pertaining to the filename in the bad_data table

Parameters:

filename : string

The name of the file

reason : string

The reason that the data is bad. Can either be No events, Bad EXPFLAG, Non-linear time, Singular event, Bad Proposal, or Short Exposure.

lightcurve_pipeline.database.update_database.update_metadata_table(metadata_dict)

Insert or update a record in the metadata table containing the metadata_dict information

Parameters:

metadata_dict : dict

A dictionary containing metadata of the file. Each key of the metadata_dict corresponds to a column in the metadata table of the database.

lightcurve_pipeline.database.update_database.update_outputs_table(metadata_dict, outputs_dict)

Insert or update a record in the outputs table containing output product information

Parameters:

metadata_dict : dict

A dictionary containing metadata of the file. Each key of the metadata_dict corresponds to a column in the metadata table of the database.

outputs_dict : dict

A dictionary containing output product information. Each key of the outputs_dict corresponds to a column in the outputs table of the database.

lightcurve_pipeline.database.update_database.update_stats_table(stats_dict, dataset)

Insert or update a record in the stats table for the given dataset containing the lightcurve product statistics given in the stats_dict

Parameters:

stats_dict : dict

A dictionary containing the lightcurve statistics. Each key of stats_dict corresponds to a column in the stats table of the database.

dataset : string

The path to the lightcurve product

ingest.make_lightcurves module

ingest.resolve_target module

This module contains functions that attempt to resolve target names (i.e. TARGNAME) to a more common option, if possible. The method for doing this is as follows:

  1. Look up the targname from the hard-coded targname_dict dictionary. If it exists, then use that targname
  2. If no dictionary entry exists, look up the targname in the CDS web service[1]
  3. If the CDS web service returns resolved target names, and one of those target names already exists in the metadata table, then use that targname
  4. If the targname cannot be resolved through any of these steps, then use the original targname

The hard-coded targname_dict dictionary resides in the utils.targname_dict module.
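The four-step cascade above can be sketched as follows. This is an illustrative stand-in, not the actual implementation: the targname_dict entry is hypothetical, query_cds stands in for the CDS web-service call, and known_targets stands in for the set of target names already in the metadata table.

```python
targname_dict = {'AZV-148': 'AZV148'}  # hypothetical entry in the lookup table

def get_targname(targname, query_cds, known_targets):
    """Resolve targname via the four steps described above."""
    # Step 1: hard-coded lookup table
    if targname in targname_dict:
        return targname_dict[targname]
    # Steps 2 and 3: ask the CDS web service; accept a resolved name
    # only if it already appears in the metadata table
    for other_name in query_cds(targname):
        if other_name in known_targets:
            return other_name
    # Step 4: fall back to the original name
    return targname
```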

Authors:

Justin Ely, Matthew Bourque

Use:

This module is intended to be imported from and used by the ingest_hstlc script as such:
from lightcurve_pipeline.ingest.resolve_target import get_targname
get_targname(targname)

Dependencies:

  1. Users must have access to the CDS web service
  2. Users must have access to the hstlc database
  3. Users must have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
    • db_connection_string - The hstlc database connection string
Other external library dependencies include:
  • pymysql
  • sqlalchemy
  • lightcurve
  • lightcurve_pipeline

References:

[1] Centre de Donnees astronomiques de Strasbourg (http://cdsweb.u-strasbg.fr/)
lightcurve_pipeline.ingest.resolve_target.get_targname(targname)

Resolve the targname to a better option, if available. If the targname cannot be resolved, the original targname is returned.

Parameters:

targname : str

The name of the target

Returns:

new_targname : str

The resolved target name

lightcurve_pipeline.ingest.resolve_target.resolve(targname)

Resolve target name via the CDS web service

Parameters:

targname : str

The name of the target

Returns:

other_names : set

A set of resolved alternate names for the target

quality.data_checks module

Perform data quality checks for the given dataset. The dataset is checked for a number of issues, which include:

  1. A non-normal EXPFLAG - indicating that something went wrong during the observation
  2. A non-linear time column in which time does not progress linearly through the TIME column in the dataset
  3. A dataset not having any events
  4. A dataset in which all events occur at a single time
  5. A dataset that is part of a problematic proposal
  6. A dataset with an exposure time that is too short

Datasets that do not pass these checks are moved to the bad_data_dir, as determined by the config file (see below)

Authors:

Justin Ely, Matthew Bourque

Use:

This module is intended to be imported and used by the ingest_hstlc script as such:
from lightcurve_pipeline.quality.data_checks import dataset_ok
dataset_ok(dataset)

Dependencies:

  1. Users must have access to the hstlc database
  2. Users must also have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
    • db_connection_string - The hstlc database connection string
    • bad_data_dir - The directory in which bad data files are stored
Other external library dependencies include:
  • astropy
  • lightcurve_pipeline
  • pymysql
  • sqlalchemy
lightcurve_pipeline.quality.data_checks.check_bad_proposal(hdu)

Check that the proposal ID is not in a list of known ‘bad’ programs. Programs can be bad for a number of reasons, typically because of specialized calibration purposes like focus sweeps or high-voltage tests.

Parameters:

hdu : astropy.io.fits.hdu.hdulist.HDUList

The hdulist of the dataset

Returns:

success : boolean

True if events are not from a known bad proposal, False otherwise

reason : string

An empty string if success is True, Bad Proposal otherwise

lightcurve_pipeline.quality.data_checks.check_expflag(hdu)

Check that the EXPFLAG keyword is NORMAL

Parameters:

hdu : astropy.io.fits.hdu.hdulist.HDUList

The hdulist of the dataset

Returns:

success : boolean

True if the EXPFLAG is NORMAL, False otherwise

reason : string

An empty string if success is True, Bad EXPFLAG otherwise

lightcurve_pipeline.quality.data_checks.check_exptime(hdu)

Check that the dataset exptime is not too short. The threshold is initially set to 1 second to filter out a small subset of very short exposures.

Parameters:

hdu : astropy.io.fits.hdu.hdulist.HDUList

The hdulist of the dataset

Returns:

success : boolean

True if the exposure time is greater than the threshold, False otherwise

reason : string

An empty string if success is True, Short Exposure otherwise

lightcurve_pipeline.quality.data_checks.check_linear(hdu)

Check that the time column linearly progresses

Parameters:

hdu : astropy.io.fits.hdu.hdulist.HDUList

The hdulist of the dataset

Returns:

success : boolean

True if time progresses linearly, False otherwise

reason : string

An empty string if success is True, Non-linear time otherwise
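One plausible reading of this check, sketched here on a plain Python list rather than a FITS TIME column (the real implementation is not shown in this documentation), is a monotonicity test:

```python
def time_is_linear(times):
    # Time should progress monotonically through the TIME column;
    # any step backwards flags the dataset as having non-linear time.
    success = all(t2 >= t1 for t1, t2 in zip(times, times[1:]))
    reason = '' if success else 'Non-linear time'
    return success, reason
```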

lightcurve_pipeline.quality.data_checks.check_no_events(hdu)

Check that the dataset has events

Parameters:

hdu : astropy.io.fits.hdu.hdulist.HDUList

The hdulist of the dataset

Returns:

success : boolean

True if the dataset has events, False otherwise

reason : string

An empty string if success is True, No events otherwise

lightcurve_pipeline.quality.data_checks.check_not_singular(hdu)

Check that the events in the dataset are not from a single time

Parameters:

hdu : astropy.io.fits.hdu.hdulist.HDUList

The hdulist of the dataset

Returns:

success : boolean

True if events are not from a single time, False otherwise

reason : string

An empty string if success is True, Singular event otherwise

lightcurve_pipeline.quality.data_checks.dataset_ok(filename, move=True)

Perform quality check on the given dataset, and update the bad_data table and move the dataset to the bad_data directory if it doesn’t pass

Parameters:

filename : string

The full path to the dataset

move : bool, optional

Whether or not to update the bad_data table and move the file

Returns:

True if the dataset passes all of the quality checks, False if it doesn't.
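The behavior described above can be sketched as a short-circuiting loop over the individual check functions, each of which returns a (success, reason) pair as documented; the checks list and record_bad_data callback here are illustrative stand-ins for the real check functions and the bad_data bookkeeping:

```python
def dataset_ok(hdu, checks, record_bad_data=None):
    # Run each quality check in turn; the first failure wins
    for check in checks:
        success, reason = check(hdu)
        if not success:
            if record_bad_data is not None:
                # e.g. update the bad_data table and move the file
                record_bad_data(reason)
            return False
    return True
```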

lightcurve_pipeline.quality.data_checks.move_file(filename)

Move the given dataset to the bad_data directory

Parameters:

filename : string

The full path to the dataset

utils.periodogram_stats module

Generate Lomb-Scargle periodogram statistics. The periodogram statistics are used in the stats table in the hstlc database as well as the periodogram plots generated by the make_hstlc_plots script.

Authors:

Matthew Bourque

Use:

This module is intended to be imported and used by the build_stats_table script as such:
from lightcurve_pipeline.utils.periodogram_stats import get_periodogram_stats
periods, power, mean, three_sigma, significant_periods, significant_powers = get_periodogram_stats(dataset, freq_space)

Dependencies:

External library dependencies include:
  • astropy
  • numpy
  • scipy
lightcurve_pipeline.utils.periodogram_stats.get_periodogram_stats(dataset, freq_space)

Find significant periods from the given dataset and frequency space using a Lomb-Scargle periodogram.

Parameters:

dataset : string

The path to the lightcurve product.

freq_space : string

Can either be short, med, or long. This defines the frequency space to look for significant periods. short is defined as the range (STEPSIZE, 10 minutes), med is (10 minutes, 1 hour), and long is (1 hour, 10 hours).

Returns:

periods : numpy array

An array of the periods to check

power : numpy array

An array of the Lomb-Scargle powers corresponding to each period

mean : float

The mean of the Lomb-Scargle powers

std : float

The standard deviation of the Lomb-Scargle powers

three_sigma : float

Three standard deviations above the mean of the Lomb-Scargle powers

significant_periods : list

A list of the periods whose powers lie more than 3 sigma above the mean

significant_powers : list

A list of the Lomb-Scargle powers that lie more than 3 sigma above the mean
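The returned quantities can be reproduced on a synthetic light curve with scipy's Lomb-Scargle implementation. The period grid and 3-sigma threshold below mirror the description but are not the pipeline's exact choices:

```python
import numpy as np
from scipy.signal import lombscargle

# Synthetic light curve with a known 10-second period
times = np.linspace(0, 200, 2000)
counts = np.sin(2 * np.pi * times / 10.0)

periods = np.linspace(2, 50, 500)       # periods to check
ang_freqs = 2 * np.pi / periods         # lombscargle expects angular frequencies
power = lombscargle(times, counts, ang_freqs, normalize=True)

mean = power.mean()
std = power.std()
three_sigma = mean + 3 * std            # significance threshold
significant_periods = periods[power > three_sigma]
significant_powers = power[power > three_sigma]
```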

utils.targname_dict module

Define a targname_dict that is used as a lookup table for resolving target names. The targname_dict is comprised of two major sets of key/value pairs:

  1. Target names that need resolving
  2. Target names that do not need resolving

The target names that need resolving are typically those that use a common name, but have variations dealing with hyphens (e.g. AZV-148 instead of AZV148) or indexing (e.g. SATURN1 instead of SATURN). The target names that don’t need resolving are ones that already have established common names (e.g. CALLISTO), or incredibly unique names that will never be resolved to a common name (e.g. 1507476-162738).
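Illustrative entries showing both kinds of key/value pairs (the target names are the ones mentioned above, but the exact mappings here are hypothetical):

```python
targname_dict = {
    # Names that need resolving: hyphen and index variations
    'AZV-148': 'AZV148',
    'SATURN1': 'SATURN',
    # Names that do not need resolving: they map to themselves
    'CALLISTO': 'CALLISTO',
    '1507476-162738': '1507476-162738',
}
```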

Authors:

Matthew Bourque

Use:

This module is intended to be imported and used by the resolve_target module as such:
from lightcurve_pipeline.utils.targname_dict import targname_dict

utils.utils module

This module houses utility functions used by many modules and scripts within the hstlc package. Please see the individual function documentation for more information.

Authors:

Matthew Bourque

Use:

The functions within this module are intended to be imported by the various hstlc scripts and modules, as such:
from lightcurve_pipeline.utils.utils import SETTINGS
from lightcurve_pipeline.utils.utils import insert_or_update
from lightcurve_pipeline.utils.utils import set_permissions
from lightcurve_pipeline.utils.utils import setup_logging
from lightcurve_pipeline.utils.utils import make_directory

Dependencies:

External library dependencies include:
  • astropy
  • lightcurve_pipeline
  • numpy
  • pymysql
  • sqlalchemy
lightcurve_pipeline.utils.utils.get_settings()

Return the settings stored in the configuration file located in the lightcurve_pipeline/utils/ directory

Returns:

data : dict

A dictionary containing the settings present in the config.yaml configuration file. The keys of this dictionary are expected to be:

  1. db_connection_string
  2. ingest_dir
  3. filesystem_dir
  4. outputs_dir
  5. composite_dir
  6. log_dir
  7. download_dir
  8. plot_dir
  9. bad_data_dir
  10. home_dir

The values of the keys are the user-supplied configurations
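A minimal sketch of what get_settings() is described to do, assuming config.yaml is plain YAML read with PyYAML; the config_path argument is illustrative, since the real function presumably locates the file itself:

```python
import yaml

def get_settings(config_path):
    # Read the YAML configuration file and return its key/value pairs
    with open(config_path) as f:
        return yaml.safe_load(f)
```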

lightcurve_pipeline.utils.utils.insert_or_update(table, data, id_num)

Insert or update the given database table with the given data. This function performs the logic of inserting or updating an entry into the hstlc database; if an entry with the given id_num already exists, then the entry is updated, otherwise a new entry is inserted.

Parameters:

table : sqlalchemy.ext.declarative.api.DeclarativeMeta

The table of the database to update

data : dict

A dictionary of the information to update. Each key of the dictionary must be a column in the given table

id_num : string

The row ID to update. If id_num is blank, then a new row is inserted instead.
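The insert-or-update logic can be sketched against a plain dict keyed by row ID, standing in for the real sqlalchemy table and session:

```python
def insert_or_update(table, data, id_num):
    if id_num == '':
        # No existing entry: insert a new row
        new_id = str(len(table) + 1)
        table[new_id] = dict(data)
        return new_id
    # Entry exists: update its columns in place
    table[id_num].update(data)
    return id_num
```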

lightcurve_pipeline.utils.utils.make_directory(directory)

Create a directory if it doesn’t already exist and set the hstlc permissions

Parameters:

directory : string

The path to the directory
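A sketch under the assumption that the function simply wraps os.makedirs and then applies the hstlc permissions (the set_permissions call is shown as a comment, since it requires the hstlc group):

```python
import os

def make_directory(directory):
    # Create the directory only if it doesn't already exist
    if not os.path.exists(directory):
        os.makedirs(directory)
        # set_permissions(directory) would follow here
```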

lightcurve_pipeline.utils.utils.set_permissions(path)

Set the permissions of the file path to hstlc permissions settings. The hstlc permissions settings are groupID = hstlc and permissions of rwxrwx---.

Parameters:

path : string

The path to the file
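The documented mode rwxrwx--- corresponds to octal 0o770. A sketch, with the group change commented out because it only succeeds on systems that actually have an hstlc group:

```python
import os

def set_permissions(path):
    os.chmod(path, 0o770)  # rwxrwx---
    # import grp
    # os.chown(path, -1, grp.getgrnam('hstlc').gr_gid)  # groupID = hstlc
```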

lightcurve_pipeline.utils.utils.setup_logging(module)

Configure logging for the execution of the given module. Logs are written out to the log_dir directory (as determined by the config.yaml file) with the filename <module>_<timestamp>.log.

Parameters:

module : string

The name of the module to log