hstlc Package Modules¶
database.database_interface module¶
This module serves as the interface and connection module to the hstlc
database. The load_connection() function within allows the user
to connect to the database via the session, base, and engine
objects (described below). The classes within serve as the
object-relational mappings (ORMs) that define the individual tables of
the database, and are used to build the tables via the base object.
The engine object serves as the low-level database API and, perhaps
most importantly, contains the dialects that allow the sqlalchemy module
to communicate with the database.
The base object serves as a base class for declarative class definitions. It
produces Table objects and constructs ORMs.
The session object manages operations on ORM-mapped objects, as
constructed by the base. These operations include querying, for
example.
Authors:
Matthew Bourque
Use:
This module is intended to be imported from various hstlc modules and scripts. The objects that are importable from this module are as follows:
from lightcurve_pipeline.database.database_interface import engine
from lightcurve_pipeline.database.database_interface import base
from lightcurve_pipeline.database.database_interface import session
from lightcurve_pipeline.database.database_interface import Metadata
from lightcurve_pipeline.database.database_interface import Outputs
from lightcurve_pipeline.database.database_interface import BadData
from lightcurve_pipeline.database.database_interface import Stats
Dependencies:
- Users must have access to the hstlc database
- Users must also have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
db_connection_string
- The hstlc database connection string
- Other external library dependencies include:
pymysql
sqlalchemy
lightcurve_pipeline
- lightcurve_pipeline.database.database_interface.load_connection(connection_string, echo=False)¶
Create and return a connection to the database given in the connection string.
Parameters: connection_string : str
A string that points to the database connection. The connection string is in the following form:
dialect+driver://username:password@host:port/database
echo : bool
Show all SQL produced
Returns: session : session object
Provides a holding zone for all objects loaded or associated with the database.
base : base object
Provides a base class for declarative class definitions.
engine : engine object
Provides a source of database connectivity and behavior.
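The load_connection() pattern above can be sketched with standard sqlalchemy calls. This is a minimal sketch, not the hstlc implementation itself, and the in-memory sqlite connection string is a placeholder standing in for the real db_connection_string:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import declarative_base, sessionmaker

def load_connection(connection_string, echo=False):
    """Sketch of the load_connection pattern: build the engine, base,
    and session objects from a dialect+driver://... connection string."""
    engine = create_engine(connection_string, echo=echo)  # low-level database API
    base = declarative_base()                             # base class for ORM definitions
    session = sessionmaker(bind=engine)()                 # manages ORM-mapped operations
    return session, base, engine

# Placeholder connection string; the real one comes from config.yaml
session, base, engine = load_connection('sqlite:///:memory:')
```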
- lightcurve_pipeline.database.database_interface.get_session()¶
Return the session object of the database connection. In many cases, all that is needed is the session object to interact with the database. This function can be used just to establish a connection and retrieve the session object.
Returns: session : sqlalchemy.orm.session.Session
Provides a holding zone for all objects loaded or associated with the database.
database.update_database module¶
This module serves as an interface for updating the various tables of the hstlc database, either by inserting new records or updating existing ones.
Authors:
Matthew Bourque
Use:
This module is intended to be imported from the various hstlc scripts, as such:
from lightcurve_pipeline.database.update_database import update_bad_data_table
from lightcurve_pipeline.database.update_database import update_metadata_table
from lightcurve_pipeline.database.update_database import update_stats_table
from lightcurve_pipeline.database.update_database import update_outputs_table
Dependencies:
- Users must have access to the hstlc database
- Users must also have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
db_connection_string
- The hstlc database connection string
- Other external library dependencies include:
pymysql
sqlalchemy
lightcurve_pipeline
- lightcurve_pipeline.database.update_database.update_bad_data_table(filename, reason)¶
Insert or update a record pertaining to the filename in the bad_data table.
Parameters: filename : string
The filename of the file
reason : string
The reason that the data is bad. Can either be No events, Bad EXPFLAG, Non-linear time, Singular event, Bad Proposal, or Short Exposure.
- lightcurve_pipeline.database.update_database.update_metadata_table(metadata_dict)¶
Insert or update a record in the metadata table containing the metadata_dict information.
Parameters: metadata_dict : dict
A dictionary containing metadata of the file. Each key of the metadata_dict corresponds to a column in the metadata table of the database.
- lightcurve_pipeline.database.update_database.update_outputs_table(metadata_dict, outputs_dict)¶
Insert or update a record in the outputs table containing output product information.
Parameters: metadata_dict : dict
A dictionary containing metadata of the file. Each key of the metadata_dict corresponds to a column in the metadata table of the database.
outputs_dict : dict
A dictionary containing output product information. Each key of the outputs_dict corresponds to a column in the outputs table of the database.
- lightcurve_pipeline.database.update_database.update_stats_table(stats_dict, dataset)¶
Insert or update a record in the stats table for the given dataset containing the lightcurve product statistics given in the stats_dict.
Parameters: stats_dict : dict
A dictionary containing the lightcurve statistics. Each key of stats_dict corresponds to a column in the stats table of the database.
dataset : string
The path to the lightcurve product
ingest.make_lightcurves module¶
ingest.resolve_target module¶
This module contains functions that attempt to resolve target names
(i.e. TARGNAME) to a more common option, if possible. The method
for doing this is as follows:
- Look up the targname in the hard-coded targname_dict dictionary. If an entry exists, then use that targname
- If no dictionary entry exists, look up the targname in the CDS web service [1]
- If the CDS web service returns resolved target names, and one of those target names already exists in the metadata table, then use that targname
- If the targname cannot be resolved through any of these steps, then use the original targname
The hard-coded targname_dict dictionary resides in the
utils.targname_dict module.
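The resolution cascade above can be sketched as follows. This is an illustrative sketch only: targname_dict, known_metadata_targets, and cds_resolve() are hypothetical stand-ins for the hard-coded dictionary, the contents of the metadata table, and the CDS web-service query:

```python
# Hypothetical stand-ins for illustration only
targname_dict = {'AZV-148': 'AZV148', 'SATURN1': 'SATURN'}  # hard-coded lookup table
known_metadata_targets = {'NGC6905'}                        # names already in the metadata table

def cds_resolve(targname):
    """Placeholder for the CDS web-service lookup; returns a set of aliases."""
    return {'NGC 6905', 'NGC6905'} if targname == 'NGC-6905' else set()

def get_targname(targname):
    # 1. Use the hard-coded dictionary entry if one exists
    if targname in targname_dict:
        return targname_dict[targname]
    # 2./3. Otherwise query the CDS service and prefer an alias that is
    # already present in the metadata table
    for alias in cds_resolve(targname):
        if alias in known_metadata_targets:
            return alias
    # 4. Fall back to the original targname
    return targname
```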
Authors:
Justin Ely, Matthew Bourque
Use:
This module is intended to be imported from and used by the ingest_hstlc script as such:
from lightcurve_pipeline.ingest.resolve_target import get_targname
get_targname(targname)
Dependencies:
- Users must have access to the CDS web service
- Users must have access to the hstlc database
- Users must have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
db_connection_string
- The hstlc database connection string
- Other external library dependencies include:
pymysql
sqlalchemy
lightcurve
lightcurve_pipeline
References:
[1] Centre de Donnees astronomiques de Strasbourg (http://cdsweb.u-strasbg.fr/)
- lightcurve_pipeline.ingest.resolve_target.get_targname(targname)¶
Resolve the targname to a better option, if available. If the targname cannot be resolved, the original targname is returned.
Parameters: targname : str
The name of the target
Returns: new_targname : str
The resolved target name
- lightcurve_pipeline.ingest.resolve_target.resolve(targname)¶
Resolve the target name via the CDS web service
Parameters: targname : str
The name of the target
Returns: other_names : set
A set of resolved other names
quality.data_checks module¶
Perform data quality checks for the given dataset. The dataset is checked for a number of issues, which include:
- A non-normal EXPFLAG, indicating that something went wrong during the observation
- A non-linear time column, in which time does not progress linearly through the TIME column of the dataset
- A dataset not having any events
- A dataset in which all events occur at a single time
- A dataset that is part of a problematic proposal
- A dataset with an exposure time that is too short
Datasets that do not pass these checks are moved to the bad_data_dir, as determined by the config file (see below).
Authors:
Justin Ely, Matthew Bourque
Use:
This module is intended to be imported and used by the ingest_hstlc script as such:
from lightcurve_pipeline.quality.data_checks import dataset_ok
dataset_ok(dataset)
Dependencies:
- Users must have access to the hstlc database
- Users must also have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
db_connection_string
- The hstlc database connection string
bad_data_dir
- The directory in which bad data files are stored
- Other external library dependencies include:
astropy
lightcurve_pipeline
pymysql
sqlalchemy
- lightcurve_pipeline.quality.data_checks.check_bad_proposal(hdu)¶
Check that the proposal ID is not in a list of known ‘bad’ programs. Programs can be bad for a number of reasons, typically because of specialized calibration purposes like focus sweeps or high-voltage tests.
Parameters: hdu : astropy.io.fits.hdu.hdulist.HDUList
The hdulist of the dataset
Returns: success : boolean
True if events are not from a known bad proposal, False otherwise
reason : string
An empty string if success is True, Bad Proposal otherwise
- lightcurve_pipeline.quality.data_checks.check_expflag(hdu)¶
Check that the EXPFLAG keyword is NORMAL
Parameters: hdu : astropy.io.fits.hdu.hdulist.HDUList
The hdulist of the dataset
Returns: success : boolean
True if the EXPFLAG is NORMAL, False otherwise
reason : string
An empty string if success is True, Bad EXPFLAG otherwise
- lightcurve_pipeline.quality.data_checks.check_exptime(hdu)¶
Check that the dataset exptime is not too short. The threshold is initially set to 1 second to filter out a small subset of very short exposures.
Parameters: hdu : astropy.io.fits.hdu.hdulist.HDUList
The hdulist of the dataset
Returns: success : boolean
True if the exptime is greater than the threshold, False otherwise
reason : string
An empty string if success is True, Short Exposure otherwise
- lightcurve_pipeline.quality.data_checks.check_linear(hdu)¶
Check that the time column progresses linearly
Parameters: hdu : astropy.io.fits.hdu.hdulist.HDUList
The hdulist of the dataset
Returns: success : boolean
True if time progresses linearly, False otherwise
reason : string
An empty string if success is True, Non-linear time otherwise
- lightcurve_pipeline.quality.data_checks.check_no_events(hdu)¶
Check that the dataset has events
Parameters: hdu : astropy.io.fits.hdu.hdulist.HDUList
The hdulist of the dataset
Returns: success : boolean
True if the dataset has events, False otherwise
reason : string
An empty string if success is True, No events otherwise
- lightcurve_pipeline.quality.data_checks.check_not_singular(hdu)¶
Check that the events in the dataset are not from a single time
Parameters: hdu : astropy.io.fits.hdu.hdulist.HDUList
The hdulist of the dataset
Returns: success : boolean
True if events are not from a single time, False otherwise
reason : string
An empty string if success is True, Singular event otherwise
- lightcurve_pipeline.quality.data_checks.dataset_ok(filename, move=True)¶
Perform quality checks on the given dataset, and update the bad_data table and move the dataset to the bad_data directory if it doesn’t pass.
Parameters: filename : string
The full path to the dataset
move : bool, optional
Whether or not to update the bad_data table and move the file
Returns: True if the dataset passes all of the quality checks, False if it doesn’t.
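Each check_* function above shares the same (success, reason) contract, and dataset_ok runs the checks in sequence. A minimal sketch of that pattern follows, using plain dicts in place of real astropy HDULists; unlike the real dataset_ok, this sketch also surfaces the failure reason for illustration:

```python
def check_expflag(hdu):
    """Sketch of one check: (True, '') on success, (False, reason) on failure."""
    if hdu['EXPFLAG'] == 'NORMAL':
        return True, ''
    return False, 'Bad EXPFLAG'

def check_no_events(hdu):
    """Another check following the same (success, reason) contract."""
    if hdu['n_events'] > 0:
        return True, ''
    return False, 'No events'

def dataset_ok(hdu, checks):
    """Run each check in turn; stop at the first failure.

    The real pipeline would update the bad_data table and move the file
    to bad_data_dir here; this sketch just reports the reason."""
    for check in checks:
        success, reason = check(hdu)
        if not success:
            return False, reason
    return True, ''

good = {'EXPFLAG': 'NORMAL', 'n_events': 120}
bad = {'EXPFLAG': 'INTERRUPTED', 'n_events': 120}
```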
- lightcurve_pipeline.quality.data_checks.move_file(filename)¶
Move the given dataset to the bad_data directory
Parameters: filename : string
The full path to the dataset
utils.periodogram_stats module¶
Generate Lomb-Scargle periodogram statistics. The periodogram
statistics are used in the stats table in the hstlc database as
well as in the periodogram plots generated by the make_hstlc_plots
script.
Authors:
Matthew Bourque
Use:
This module is intended to be imported and used by the build_stats_table script as such:
from lightcurve_pipeline.utils.periodogram_stats import get_periodogram_stats
periods, power, mean, three_sigma, significant_periods, significant_powers = get_periodogram_stats(dataset, freq_space)
Dependencies:
- External library dependencies include:
astropy
numpy
scipy
- lightcurve_pipeline.utils.periodogram_stats.get_periodogram_stats(dataset, freq_space)¶
Find significant periods from the given dataset and frequency space using a Lomb-Scargle periodogram.
Parameters: dataset : string
The path to the lightcurve product.
freq_space : string
Can either be short, med, or long. This defines the frequency space in which to look for significant periods. short is defined as the range (STEPSIZE, 10 minutes), med is (10 minutes, 1 hour), and long is (1 hour, 10 hours).
Returns: periods : numpy array
An array of the periods to check
power : numpy array
An array of the Lomb-Scargle powers corresponding to each period
mean : float
The mean of the Lomb-Scargle powers
std : float
The standard deviation of the Lomb-Scargle powers
three_sigma : float
Three standard deviations above the mean of the Lomb-Scargle powers
significant_periods : list
A list of the periods that have powers greater than 3-sigma above the mean
significant_powers : list
A list of the Lomb-Scargle powers that are greater than 3-sigma above the mean
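The statistics above can be reproduced in miniature with scipy. This sketch uses a synthetic sinusoidal lightcurve with a known 50-second period in place of a real hstlc product, and a hand-picked period grid rather than the short/med/long frequency spaces:

```python
import numpy as np
from scipy.signal import lombscargle

# Synthetic lightcurve: a 50 s sinusoid plus noise (stand-in for real data)
rng = np.random.default_rng(0)
t = np.linspace(0, 1000, 2000)
flux = np.sin(2 * np.pi * t / 50.0) + 0.1 * rng.normal(size=t.size)

# Period grid to search (analogous to the short/med/long frequency spaces)
periods = np.linspace(10, 200, 500)
power = lombscargle(t, flux - flux.mean(), 2 * np.pi / periods)

# Significance threshold: three standard deviations above the mean power
mean, std = power.mean(), power.std()
three_sigma = mean + 3 * std
significant_periods = periods[power > three_sigma]
```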
utils.targname_dict module¶
Define a targname_dict that is used as a lookup table for
resolving target names. The targname_dict is comprised of two
major sets of key/value pairs:
- Target names that need resolving
- Target names that do not need resolving
The target names that need resolving are typically those that use a common name, but have variations dealing with hyphens (e.g. AZV-148 instead of AZV148) or indexing (e.g. SATURN1 instead of SATURN). The target names that don’t need resolving are ones that already have established common names (e.g. CALLISTO), or incredibly unique names that will never be resolved to a common name (e.g. 1507476-162738).
Authors:
Matthew Bourque
Use:
This module is intended to be imported and used by the resolve_target module as such:
from lightcurve_pipeline.utils.targname_dict import targname_dict
utils.utils module¶
This module houses several utility functions that are used by many of the modules and scripts within the hstlc package. Please see the individual function documentation for more information.
Authors:
Matthew Bourque
Use:
The functions within this module are intended to be imported by the various hstlc scripts and modules, as such:
from lightcurve_pipeline.utils.utils import SETTINGS
from lightcurve_pipeline.utils.utils import insert_or_update
from lightcurve_pipeline.utils.utils import set_permissions
from lightcurve_pipeline.utils.utils import setup_logging
from lightcurve_pipeline.utils.utils import make_directory
Dependencies:
- External library dependencies include:
astropy
lightcurve_pipeline
numpy
pymysql
sqlalchemy
- lightcurve_pipeline.utils.utils.get_settings()¶
Return the settings stored in the configuration file located in the lightcurve_pipeline/utils/ directory.
Returns: data : dict
A dictionary containing the settings present in the config.yaml configuration file. The expected keys of this dictionary are:
db_connection_string
ingest_dir
filesystem_dir
outputs_dir
composite_dir
log_dir
download_dir
plot_dir
bad_data_dir
home_dir
The values of the keys are the user-supplied configurations
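A config.yaml with these keys might look like the following; all values here are placeholders, not real hstlc settings:

```yaml
db_connection_string: 'mysql+pymysql://user:password@host:3306/hstlc'
ingest_dir: '/path/to/ingest'
filesystem_dir: '/path/to/filesystem'
outputs_dir: '/path/to/outputs'
composite_dir: '/path/to/composites'
log_dir: '/path/to/logs'
download_dir: '/path/to/downloads'
plot_dir: '/path/to/plots'
bad_data_dir: '/path/to/bad_data'
home_dir: '/path/to/home'
```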
- lightcurve_pipeline.utils.utils.insert_or_update(table, data, id_num)¶
Insert or update the given database table with the given data. This function performs the logic of inserting or updating an entry in the hstlc database; if an entry with the given id_num already exists, then the entry is updated, otherwise a new entry is inserted.
Parameters: table : sqlalchemy.ext.declarative.api.DeclarativeMeta
The table of the database to update
data : dict
A dictionary of the information to update. Each key of the dictionary must be a column in the given table
id_num : string
The row ID to update. If id_num is blank, then a new row is inserted instead.
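The insert-or-update logic can be sketched with the standard-library sqlite3 module standing in for sqlalchemy; the metadata table and targname column here are illustrative only:

```python
import sqlite3

def insert_or_update(conn, table, data, id_num):
    """If id_num is blank, insert a new row; otherwise update the row with
    that ID -- a sqlite3 sketch of the sqlalchemy logic described above."""
    if id_num == '':
        cols = ', '.join(data)
        marks = ', '.join('?' for _ in data)
        conn.execute(f'INSERT INTO {table} ({cols}) VALUES ({marks})',
                     tuple(data.values()))
    else:
        sets = ', '.join(f'{col} = ?' for col in data)
        conn.execute(f'UPDATE {table} SET {sets} WHERE id = ?',
                     tuple(data.values()) + (id_num,))
    conn.commit()

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE metadata (id INTEGER PRIMARY KEY, targname TEXT)')
insert_or_update(conn, 'metadata', {'targname': 'SATURN'}, '')   # insert
insert_or_update(conn, 'metadata', {'targname': 'CALLISTO'}, 1)  # update row 1
```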
- lightcurve_pipeline.utils.utils.make_directory(directory)¶
Create a directory if it doesn’t already exist and set the hstlc permissions
Parameters: directory : string
The path to the directory
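A minimal sketch of make_directory using the standard library; the 0o770 mode mirrors the rwxrwx--- permissions described under set_permissions, while the hstlc group-ownership step is omitted:

```python
import os
import tempfile

def make_directory(directory):
    """Create the directory if missing and apply rwxrwx--- permissions."""
    if not os.path.exists(directory):
        os.makedirs(directory)
        os.chmod(directory, 0o770)  # rwxrwx---; group ownership change omitted

# Demonstrate inside a throwaway temporary directory
target = os.path.join(tempfile.mkdtemp(), 'outputs')
make_directory(target)
```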
- lightcurve_pipeline.utils.utils.set_permissions(path)¶
Set the permissions of the file path to the hstlc permissions settings. The hstlc permissions settings are group ID = hstlc and permissions of rwxrwx---.
Parameters: path : string
The path to the file
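A sketch of the permission-setting step; os.chmod covers the rwxrwx--- part, while the group-ownership change (os.chown with the hstlc group’s gid) is skipped because it requires that group to exist on the system:

```python
import os
import stat
import tempfile

def set_permissions(path):
    """Set rwxrwx--- (0o770) on the given path; the real function also
    changes the group to hstlc, which is omitted here."""
    os.chmod(path, stat.S_IRWXU | stat.S_IRWXG)  # 0o770 == rwxrwx---

# Demonstrate on a throwaway file
handle, path = tempfile.mkstemp()
os.close(handle)
set_permissions(path)
```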
- lightcurve_pipeline.utils.utils.setup_logging(module)¶
Configure the logging for the execution of the given module. Logs are written to the log_dir directory (as determined by the config.yaml file) with the filename <module>_<timestamp>.log.
Parameters: module : string
The name of the module to log
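The logging setup can be sketched as follows; log_dir is passed in directly here instead of being read from config.yaml, and the timestamp format is an assumption, not necessarily the one hstlc uses:

```python
import logging
import os
import tempfile
import time

def setup_logging(module, log_dir):
    """Sketch: write logs to <log_dir>/<module>_<timestamp>.log."""
    timestamp = time.strftime('%Y-%m-%d-%H-%M')  # assumed timestamp format
    logfile = os.path.join(log_dir, f'{module}_{timestamp}.log')
    logging.basicConfig(filename=logfile, level=logging.INFO,
                        format='%(asctime)s %(levelname)s: %(message)s',
                        force=True)  # force=True resets any prior handlers
    return logfile

logfile = setup_logging('ingest_hstlc', tempfile.mkdtemp())
logging.info('Processing dataset')
```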