hstlc Scripts

make_hstlc_plots script

Create various plots that deal with the hstlc filesystem, database, and output products. This script uses multiprocessing. Users can set the number of cores used via the num_cores setting in the config file (see below).
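
A minimal sketch (not the script itself, and using a stand-in plotting function) of the multiprocessing pattern described above, where num_cores mirrors the config setting:

    import multiprocessing as mp

    def plot_one(filename):
        """Stand-in for a per-dataset plotting function such as
        plot_dataset_static(), documented below."""
        print('plotting', filename)

    def make_all_plots(filenames, num_cores):
        """Plot each dataset, distributing the work over num_cores processes."""
        with mp.Pool(processes=num_cores) as pool:
            pool.map(plot_one, filenames)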

Authors:

Justin Ely, Matthew Bourque

Use:

This script is intended to be executed as part of the hstlc_pipeline shell script. However, users can also execute this script via the command line as follows:

>>> make_hstlc_plots

Outputs:

  1. hlsp_hstlc_*.png static lightcurve plots for each composite lightcurve, placed in the composite_dir directory, as determined by the config file (see below)
  2. hlsp_hstlc_*.html bokeh plots showing a ‘dashboard’ of various plots for each composite lightcurve, placed in the composite_dir directory, as determined by the config file (see below)
  3. interesting_hstlc.html, boring_hstlc.html, and null_hstlc.html ‘exploratory’ tables, which are sortable tables that display statistics and plots for each dataset, placed in the plot_dir directory, as determined by the config file (see below)
  4. exptime_histogram.html - A histogram showing the cumulative exposure time by target in the form of a bokeh plot, placed in the plot_dir directory, as determined by the config file (see below)
  5. pie_config_cos_fuv.html, pie_config_cos_nuv.html, pie_config_stis_fuv.html, and pie_config_stis_nuv.html ‘configuration’ pie charts that show the breakdown of datasets by grating/cenwave for each instrument/detector combination, placed in the plot_dir directory, as determined by the config file (see below)
  6. opt_elem.html - A histogram showing the number of datasets for each filter, placed in the plot_dir directory, as determined by the config file (see below)
  7. <dataset name>_periodogram.png - Lomb-Scargle periodograms for each dataset (both individual and composite), placed in the plot_dir directory, as determined by the config file (see below). Additionally, periodograms that are deemed interesting are saved in a separate periodogram_subset directory under the plot_dir directory.
  8. a log file in the log_dir directory as determined by the config file (see below)

Dependencies:

  1. Users must have access to the hstlc database
  2. Users must also have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
    • db_connection_string - The hstlc database connection string
    • plot_dir - The path to where hstlc output plots are stored
    • composite_dir - The path to where hstlc composite output products are stored
    • log_dir - The path to where the log file will be stored
    • num_cores - The number of cores to use during multiprocessing
Other external library dependencies include:
  • astropy
  • bokeh
  • lightcurve_pipeline
  • matplotlib
  • numpy
  • pymysql
  • sqlalchemy
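
The following sketch shows a hypothetical config.yaml with the keys listed above (all values and paths are placeholders, not real credentials) and loads it with PyYAML for illustration:

    import yaml

    example_config = """
    db_connection_string: mysql+pymysql://user:password@host:3306/hstlc
    plot_dir: /path/to/plots
    composite_dir: /path/to/composites
    log_dir: /path/to/logs
    num_cores: 8
    """

    settings = yaml.safe_load(example_config)
    assert settings['num_cores'] >= 1  # number of processes used by this script
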
lightcurve_pipeline.scripts.make_hstlc_plots.bar_opt_elem()

Create a bar chart showing the number of composite lightcurves for each COS & STIS optical element

lightcurve_pipeline.scripts.make_hstlc_plots.configuration_piechart()

Create a pie chart showing the distribution of configurations for each instrument/detector combination

lightcurve_pipeline.scripts.make_hstlc_plots.dataset_dashboard(filename, plot_file='')

Create an interactive bokeh ‘dashboard’ plot for the given filename

Parameters:

filename : str

The path to the lightcurve

plot_file : str

The path to the PNG plot. The user can supply this argument if they wish to update the plot or save to a specific location.

lightcurve_pipeline.scripts.make_hstlc_plots.exploratory_tables()

Create html tables containing data from the stats table as well as plots, broken down into interesting, boring, and null results

lightcurve_pipeline.scripts.make_hstlc_plots.histogram_exptime()

Create a histogram showing the distribution of exposure times for the composite lightcurves

lightcurve_pipeline.scripts.make_hstlc_plots.main()

The main function of the make_hstlc_plots script

lightcurve_pipeline.scripts.make_hstlc_plots.make_exploratory_table(dataset_list, table_name)

Create html tables containing data from the stats table as well as plots

Parameters:

dataset_list : list

A list of the paths to the datasets to process

table_name : str

The path to the output file

lightcurve_pipeline.scripts.make_hstlc_plots.periodogram(dataset)

Create a Lomb-Scargle periodogram for the given dataset

Parameters:

dataset : string

The path to the dataset
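
The following is an illustrative sketch, not the pipeline's implementation, of a Lomb-Scargle periodogram built with astropy and matplotlib (both listed dependencies); the FITS column names 'mjd' and 'counts' are assumptions:

    import matplotlib.pyplot as plt
    from astropy.io import fits
    from astropy.timeseries import LombScargle

    def sketch_periodogram(dataset, outfile='periodogram.png'):
        """Compute and plot a Lomb-Scargle periodogram for a lightcurve."""
        with fits.open(dataset) as hdulist:
            times = hdulist[1].data['mjd']      # assumed column name
            counts = hdulist[1].data['counts']  # assumed column name
        frequency, power = LombScargle(times, counts).autopower()
        plt.plot(frequency, power)
        plt.xlabel('Frequency')
        plt.ylabel('Lomb-Scargle Power')
        plt.savefig(outfile)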

lightcurve_pipeline.scripts.make_hstlc_plots.plot_dataset(filename, plot_file='')

Create an interactive bokeh lightcurve plot for the given filename

Parameters:

filename : str

The path to the lightcurve

plot_file : str

The path to the PNG plot. The user can supply this argument if they wish to update the plot or save to a specific location
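
A minimal, hypothetical sketch of an interactive bokeh lightcurve plot written to an HTML file (the FITS column names are assumptions; this is not the pipeline's code):

    from astropy.io import fits
    from bokeh.plotting import figure, output_file, save

    def sketch_plot_dataset(filename, plot_file='lightcurve.html'):
        """Write an interactive HTML lightcurve plot for the given file."""
        with fits.open(filename) as hdulist:
            times = hdulist[1].data['mjd']      # assumed column name
            counts = hdulist[1].data['counts']  # assumed column name
        output_file(plot_file)
        fig = figure(title=filename, x_axis_label='Time (MJD)',
                     y_axis_label='Counts')
        fig.circle(times, counts)
        save(fig)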

lightcurve_pipeline.scripts.make_hstlc_plots.plot_dataset_static(filename, plot_file='')

Create a static PNG lightcurve plot for the given filename

Parameters:

filename : str

The path to the lightcurve

plot_file : str

The path to the PNG plot. The user can supply this argument if they wish to update the plot or save to a specific location

reset_hstlc_filesystem script

Reset the hstlc filesystem by moving files back into the ingestion directory. Files are moved from the filesystem_dir directory to the ingest_dir directory, as determined by the config file (see below). Additionally, output products located in the outputs_dir directory, as determined by the config file (see below), are removed.

Authors:

Matthew Bourque

Use:

This script is intended to be executed via the command line as follows:

>>> reset_hstlc_filesystem

Dependencies:

Users must have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:

  • ingest_dir - The path to where files to be ingested are stored
  • filesystem_dir - The path to the hstlc filesystem
  • outputs_dir - The path to where hstlc output products are stored
Other external library dependencies include:
  • lightcurve_pipeline
lightcurve_pipeline.scripts.reset_hstlc_filesystem.main()

The main function of the reset_hstlc_filesystem script

lightcurve_pipeline.scripts.reset_hstlc_filesystem.move_files_to_ingest()

Move files from the filesystem back to the ingestion directory. If a file already exists in the ingest directory, it is removed rather than moved.
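
A short sketch of the move-or-remove rule described above (the directory paths come from the config file in the real script and are placeholders here):

    import os
    import shutil

    def move_or_remove(filepath, ingest_dir):
        """Move a file back to ingest_dir, or remove it if already there."""
        destination = os.path.join(ingest_dir, os.path.basename(filepath))
        if os.path.exists(destination):
            os.remove(filepath)  # a copy already exists in ingest_dir
        else:
            shutil.move(filepath, destination)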

lightcurve_pipeline.scripts.reset_hstlc_filesystem.remove_filesystem_directories()

Remove parent directories from the filesystem if they are empty

lightcurve_pipeline.scripts.reset_hstlc_filesystem.remove_output_directories()

Remove all output products and output directories

reset_hstlc_database script

Reset all or specific tables in the hstlc database

Authors:

Matthew Bourque

Use:

This script is intended to be executed via the command line as follows:

>>> reset_hstlc_database [table]

table (optional) - Reset the given table. Can be any valid table that exists in the hstlc database, all (in which case all tables will be reset), or production (in which case only the metadata, outputs, and stats tables will be reset). If no argument is provided, the default value of production is used.

Dependencies:

  1. Users must have access to the hstlc database
  2. Users must also have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
    • db_connection_string - The hstlc database connection string
    • home_dir - The home hstlc directory, where the bad_data table will be stored in a text file
Other external library dependencies include:
  • lightcurve_pipeline
  • pymysql
  • sqlalchemy
lightcurve_pipeline.scripts.reset_hstlc_database.get_valid_tables()

Return a list of table names in the hstlc database

Returns:

tables : list

A list of hstlc table names
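
A hedged sketch of one way such a list can be obtained with sqlalchemy (a listed dependency); the connection string would come from config.yaml:

    from sqlalchemy import create_engine, inspect

    def sketch_get_valid_tables(db_connection_string):
        """Return the names of all tables in the hstlc database."""
        engine = create_engine(db_connection_string)
        return inspect(engine).get_table_names()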

lightcurve_pipeline.scripts.reset_hstlc_database.main()

The main function of the reset_hstlc_database script

lightcurve_pipeline.scripts.reset_hstlc_database.parse_args()

Parse command line arguments

Returns:

args : argparse object

An argparse object containing the arguments
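
A hypothetical sketch of argument parsing consistent with the usage above; get_valid_tables() is documented on this page, and a stub with the table names mentioned in this documentation stands in here so the sketch runs on its own:

    import argparse

    def get_valid_tables():
        """Stub standing in for the real database query."""
        return ['metadata', 'outputs', 'stats', 'bad_data']

    def parse_args():
        """Parse the optional positional 'table' argument."""
        choices = get_valid_tables() + ['all', 'production']
        parser = argparse.ArgumentParser(
            description='Reset all or specific tables in the hstlc database')
        parser.add_argument('table', nargs='?', default='production',
                            choices=choices,
                            help='table to reset (default: production)')
        return parser.parse_args()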

lightcurve_pipeline.scripts.reset_hstlc_database.rebuild_production_tables()

Rebuild the production tables of the hstlc database, which consist of the metadata, outputs, and stats tables. The bad_data table is treated separately: since bad data are not necessarily re-ingested, the bad_data table cannot easily be reconstructed, so its contents are written out to a text file and re-ingested after the database is reset. This effectively resets the production tables while leaving the bad_data table untouched.
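
A hedged sketch of the save step described above: dump the bad_data table to a text file before the reset so it can be re-ingested afterwards. The sketch uses sqlalchemy (a listed dependency) generically; the actual table layout and output format are assumptions:

    import csv
    from sqlalchemy import create_engine, text

    def dump_bad_data(db_connection_string, outfile):
        """Write every row of the bad_data table to a text file."""
        engine = create_engine(db_connection_string)
        with engine.connect() as connection, open(outfile, 'w', newline='') as f:
            result = connection.execute(text('SELECT * FROM bad_data'))
            writer = csv.writer(f)
            writer.writerow(result.keys())  # column names as a header row
            writer.writerows(result)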

download_hstlc script

This script retrieves COS & STIS TIMETAG data from the MAST archive by submitting XML requests. The datasets to download are determined by comparing the contents of the hstlc database to the contents of the MAST database; any COS/STIS TIMETAG data that exists in MAST but not in the hstlc database is retrieved. Data is downloaded to the ingest_dir directory, as determined by the config file (see below).
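
The selection logic reduces to a set difference, as in the following sketch (the two rootname lists are produced by get_filesystem_rootnames() and get_mast_rootnames(), documented below):

    def select_datasets_to_download(filesystem_rootnames, mast_rootnames):
        """Return MAST rootnames not yet present in the hstlc database."""
        return sorted(set(mast_rootnames) - set(filesystem_rootnames))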

Authors:

Matthew Bourque

Use:

This script is intended to be executed via the command line as follows:

>>> download_hstlc

Outputs:

The following filetypes are retrieved (if available) and placed in the ingest_dir directory:

  • *_x1d.fits - one-dimensional extracted spectra
  • *_tag.fits - STIS TIMETAG data
  • *_corrtag.fits - COS NUV TIMETAG data
  • *_corrtag_<a or b>.fits - COS FUV TIMETAG data

Submission results are also saved to an XML file and stored in the download_dir directory, as determined by the config file (see below). The submission results indicate whether the XML request was successful or whether there were errors.

Executing this script creates a log file in the log_dir directory as determined by the config file (see below)

Dependencies:

  1. As of early 2016, submission of XML requests to the MAST archive requires a special Python 2.6 environment with specific XML libraries installed. More information can be found here:

    https://confluence.stsci.edu/display/STScISSOPublic/ArchiveXMLsubmitPKImaterial

    Additionally, tsql must be installed and the tsql executable must be placed in the directory ~/freetds/bin/tsql. tsql is distributed with FreeTDS (http://www.freetds.org/).

  2. Users must have access to the hstlc database

  3. Users must also have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:

    • db_connection_string - The hstlc database connection string
    • ingest_dir - The path to where the files will be stored after retrieval
    • log_dir - The path to where the log file will be stored
    • download_dir - The path to where XML submission results will be stored
    • mast_server - The MAST server hostname
    • mast_database - The name of the MAST database
    • mast_account - The MAST account username
    • mast_password - The MAST account password
    • archive_user - The requester username
    • email - The requester email address
    • host - The hostname of the machine used for ftp
    • ftp_user - The username of the account of the machine used for ftp
    • dads_host - The hostname of the machine on which the MAST database resides
    • archive - The HTTPs connection hostname
Other external library dependencies include:
  • lightcurve_pipeline
lightcurve_pipeline.scripts.download_hstlc.build_xml_request(datasets)

Build the XML request for the given datasets

Parameters:

datasets : list

A list of rootnames to download from MAST.

Returns:

xml_request : string

The XML request string.

lightcurve_pipeline.scripts.download_hstlc.everything_retrieved(tracking_id)

Check every 15 minutes to see if all submitted datasets have been retrieved. Based on code from J. Ely.

Parameters:

tracking_id : string

A submission ID string.

Returns:

done : bool

Whether the data have been retrieved.

killed : bool

Whether the request was killed.
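
A hedged sketch of the polling pattern described above; check_status is a hypothetical callable standing in for the real MAST status query:

    import time

    def wait_for_retrieval(tracking_id, check_status):
        """Poll every 15 minutes until the submission is done or killed."""
        while True:
            # check_status is assumed to return 'done', 'killed', or 'pending'
            status = check_status(tracking_id)
            if status in ('done', 'killed'):
                return status == 'done', status == 'killed'
            time.sleep(15 * 60)  # wait 15 minutes between checks
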
lightcurve_pipeline.scripts.download_hstlc.get_filesystem_rootnames()

Return a list of the rootnames in the hstlc database.

This list is compared to the MAST database to determine which datasets to download.

Returns:

filesystem_rootnames : list

A list of rootnames that are in the hstlc filesystem.

lightcurve_pipeline.scripts.download_hstlc.get_mast_rootnames()

Return a list of rootnames of all COS & STIS TIMETAG data in MAST.

The following target names are ignored: DARK, BIAS, DEUTERIUM, WAVE, ANY, and NONE.
Returns:

mast_rootnames : list

A list of rootnames of COS/STIS TIMETAG data in the MAST archive.

lightcurve_pipeline.scripts.download_hstlc.main()

The main function of the download_hstlc script

lightcurve_pipeline.scripts.download_hstlc.save_submission_results(submission_results)

Save the submission results to an XML file.

Submission results are saved in a separate XML file and stored in the ‘download_dir’ directory, as determined by the config file.

Parameters:

submission_results : httplib object

The submission results returned by MAST after the XML request is submitted.

lightcurve_pipeline.scripts.download_hstlc.submit_xml_request(xml_request)

Submit the XML request to the MAST archive.

Parameters:

xml_request : string

The request XML string.

Returns:

submission_results : httplib object

The XML request submission results.