hstlc Scripts¶
ingest_hstlc script¶
build_stats_table script¶
make_hstlc_plots script¶
Create various plots that deal with the hstlc filesystem, database,
and output products. This script uses multiprocessing. Users can set
the number of cores used via the num_cores
setting in the config
file (see below).
Authors:
Justin Ely, Matthew Bourque
Use:
This script is intended to be executed as part of the
hstlc_pipeline
shell script. However, users can also execute this script via the command line as such:
>>> make_hstlc_plots
Outputs:
- hlsp_hstlc_*.png - static lightcurve plots for each composite lightcurve, placed in the composite_dir directory, as determined by the config file (see below)
- hlsp_hstlc_*.html - bokeh plots showing a ‘dashboard’ of various plots for each composite lightcurve, placed in the composite_dir directory
- interesting_hstlc.html, boring_hstlc.html, and null_hstlc.html - ‘exploratory’ tables: sortable tables that display statistics and plots for each dataset, placed in the plot_dir directory
- exptime_histogram.html - a histogram showing the cumulative exposure time by target in the form of a bokeh plot, placed in the plot_dir directory
- pie_config_cos_fuv.html, pie_config_cos_nuv.html, pie_config_stis_fuv.html, and pie_config_stis_nuv.html - ‘configuration’ pie charts that show the breakdown of datasets by grating/cenwave for each instrument/detector combination, placed in the plot_dir directory
- opt_elem.html - a histogram showing the number of datasets for each filter, placed in the plot_dir directory
- <dataset name>_periodgram.png - Lomb-Scargle periodograms for each dataset (both individual and composite), placed in the plot_dir directory. Periodograms that are deemed interesting are also saved in a separate periodogram_subset directory under the plot_dir directory.
- a log file in the log_dir directory, as determined by the config file (see below)
Dependencies:
- Users must have access to the hstlc database
- Users must also have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
  - db_connection_string - The hstlc database connection string
  - plot_dir - The path to where hstlc output plots are stored
  - composite_dir - The path to where hstlc composite output products are stored
  - log_dir - The path to where the log file will be stored
  - num_cores - The number of cores to use during multiprocessing
- Other external library dependencies include:
astropy
bokeh
lightcurve_pipeline
matplotlib
numpy
pymysql
sqlalchemy
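Under these assumptions, a minimal config.yaml for this script might look like the following sketch; every path and value below is a placeholder, not a real configuration:

```yaml
# Hypothetical config.yaml for make_hstlc_plots -- all values are placeholders
db_connection_string: 'mysql+pymysql://user:password@hostname:3306/hstlc'
plot_dir: '/path/to/hstlc/plots'
composite_dir: '/path/to/hstlc/composites'
log_dir: '/path/to/hstlc/logs'
num_cores: 8
```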
- lightcurve_pipeline.scripts.make_hstlc_plots.bar_opt_elem()¶
  Create a bar chart showing the number of composite lightcurves for each COS & STIS optical element
- lightcurve_pipeline.scripts.make_hstlc_plots.configuration_piechart()¶
  Create a pie chart showing the distribution of configurations for each instrument/detector combination
- lightcurve_pipeline.scripts.make_hstlc_plots.dataset_dashboard(filename, plot_file='')¶
  Create an interactive bokeh ‘dashboard’ plot for the given filename
  Parameters:
    filename : str - The path to the lightcurve
    plot_file : str - The path to the PNG plot. The user can supply this argument to update the plot or save it to a specific location.
- lightcurve_pipeline.scripts.make_hstlc_plots.exploratory_tables()¶
  Create HTML tables containing data from the stats table as well as plots, broken down into interesting, boring, and null results
- lightcurve_pipeline.scripts.make_hstlc_plots.histogram_exptime()¶
  Create a histogram showing the distribution of exposure times for the composite lightcurves
- lightcurve_pipeline.scripts.make_hstlc_plots.main()¶
  The main function of the make_hstlc_plots script
- lightcurve_pipeline.scripts.make_hstlc_plots.make_exploratory_table(dataset_list, table_name)¶
  Create HTML tables containing data from the stats table as well as plots
  Parameters:
    dataset_list : list - A list of the paths to the datasets to process
    table_name : str - The path to the output file
- lightcurve_pipeline.scripts.make_hstlc_plots.periodogram(dataset)¶
  Create a Lomb-Scargle periodogram for the given dataset
  Parameters:
    dataset : str - The path to the dataset
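The Lomb-Scargle periodogram is used here because HST lightcurves are unevenly sampled, which rules out a plain FFT. As an illustration of the technique only (not the pipeline's actual implementation, which likely relies on astropy), a minimal NumPy sketch of the classic Lomb-Scargle estimator:

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Classic Lomb-Scargle periodogram for unevenly sampled data.

    t, y  : sample times and fluxes (1-D arrays, same length)
    freqs : trial frequencies in cycles per unit time (1-D array, > 0)
    """
    y = y - y.mean()                      # remove the mean flux
    power = np.empty_like(freqs)
    for i, f in enumerate(freqs):
        w = 2.0 * np.pi * f               # angular frequency
        # Time offset tau makes the sin/cos terms orthogonal
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[i] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power
```

For a pure sinusoid sampled at random times, the returned power spectrum peaks at the sinusoid's frequency, which is the property the pipeline exploits when flagging "interesting" periodograms.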
- lightcurve_pipeline.scripts.make_hstlc_plots.plot_dataset(filename, plot_file='')¶
  Create an interactive bokeh lightcurve plot for the given filename
  Parameters:
    filename : str - The path to the lightcurve
    plot_file : str - The path to the PNG plot. The user can supply this argument to update the plot or save it to a specific location.
- lightcurve_pipeline.scripts.make_hstlc_plots.plot_dataset_static(filename, plot_file='')¶
  Create a static PNG lightcurve plot for the given filename
  Parameters:
    filename : str - The path to the lightcurve
    plot_file : str - The path to the PNG plot. The user can supply this argument to update the plot or save it to a specific location.
reset_hstlc_filesystem script¶
Reset the hstlc filesystem by moving files back into the ingestion
directory. Files are moved from the filesystem_dir
directory to
the ingest_dir
directory, as determined by the config file (see
below). Additionally, output products located in the outputs_dir
directory, as determined by the config file (see below), are removed.
Authors:
Matthew Bourque
Use:
This script is intended to be executed via the command line as such:
>>> reset_hstlc_filesystem
Dependencies:
- Users must have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
  - ingest_dir - The path to where files to be ingested are stored
  - filesystem_dir - The path to the hstlc filesystem
  - outputs_dir - The path to where hstlc output products are stored
- Other external library dependencies include:
lightcurve_pipeline
- lightcurve_pipeline.scripts.reset_hstlc_filesystem.main()¶
  The main function of the reset_hstlc_filesystem script
- lightcurve_pipeline.scripts.reset_hstlc_filesystem.move_files_to_ingest()¶
  Move files from the filesystem back to the ingestion directory. If a file already exists in the ingest directory, it is removed rather than moved.
- lightcurve_pipeline.scripts.reset_hstlc_filesystem.remove_filesystem_directories()¶
  Remove parent directories from the filesystem if they are empty
- lightcurve_pipeline.scripts.reset_hstlc_filesystem.remove_output_directories()¶
  Remove all output products and output directories
reset_hstlc_database script¶
Reset all or specific tables in the hstlc database
Authors:
Matthew Bourque
Use:
This script is intended to be executed via the command line as such:
>>> reset_hstlc_database [table]
table (optional) - Reset the given table. Can be any valid table that exists in the hstlc database, all (in which case all tables will be reset), or production (in which case only the metadata, outputs, and stats tables will be reset). If an argument is not provided, the default value of production is used.
Dependencies:
- Users must have access to the hstlc database
- Users must also have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
  - db_connection_string - The hstlc database connection string
  - home_dir - The home hstlc directory, where the bad_data table will be stored in a text file
- Other external library dependencies include:
lightcurve_pipeline
pymysql
sqlalchemy
- lightcurve_pipeline.scripts.reset_hstlc_database.get_valid_tables()¶
  Return a list of table names in the hstlc database
  Returns:
    tables : list - A list of hstlc table names
- lightcurve_pipeline.scripts.reset_hstlc_database.main()¶
  The main function of the reset_hstlc_database script
- lightcurve_pipeline.scripts.reset_hstlc_database.parse_args()¶
  Parse command line arguments
  Returns:
    args : argparse object - An argparse object containing the arguments
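The optional [table] argument described under "Use" could be handled with argparse roughly as follows; this is a hypothetical reconstruction based on the documented behavior, not the script's actual code:

```python
import argparse

def parse_args(argv=None):
    """Parse the optional [table] positional argument, defaulting to
    'production' when none is given (sketch, not the real parser)."""
    parser = argparse.ArgumentParser(
        description='Reset all or specific tables in the hstlc database')
    parser.add_argument(
        'table', nargs='?', default='production',
        help="A valid hstlc table name, 'all', or 'production' (default)")
    return parser.parse_args(argv)
```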
- lightcurve_pipeline.scripts.reset_hstlc_database.rebuild_production_tables()¶
  Rebuild the production tables of the hstlc database, which consist of the metadata, outputs, and stats tables. The bad_data table is treated separately: since it cannot easily be reconstructed (bad data is not necessarily re-ingested), its contents are written out to a text file and re-ingested after the database is reset. This resets the production tables while leaving the bad_data table effectively untouched.
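The dump-and-restore treatment of the bad_data table follows a generic pattern, illustrated below with stdlib sqlite3 purely for demonstration; the real pipeline talks to a MySQL database via sqlalchemy, and the single-column schema here is a stand-in:

```python
import sqlite3

def reset_keeping_bad_data(conn, dump_path):
    """Dump bad_data to a text file, rebuild the tables, then re-ingest
    bad_data from the dump (illustrative pattern, not pipeline code)."""
    # 1. Write the bad_data rows out to a text file
    rows = conn.execute('SELECT filename FROM bad_data').fetchall()
    with open(dump_path, 'w') as f:
        for (filename,) in rows:
            f.write(filename + '\n')
    # 2. Drop and recreate the tables (placeholder schemas)
    for table in ('metadata', 'outputs', 'stats', 'bad_data'):
        conn.execute('DROP TABLE IF EXISTS ' + table)
        conn.execute('CREATE TABLE {} (filename TEXT)'.format(table))
    # 3. Re-ingest the saved bad_data rows into the fresh table
    with open(dump_path) as f:
        for line in f:
            conn.execute('INSERT INTO bad_data VALUES (?)', (line.strip(),))
    conn.commit()
```

The production tables come back empty while the bad_data contents survive the reset, matching the behavior described above.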
download_hstlc script¶
This script retrieves COS & STIS TIMETAG data from the MAST archive by
submitting XML requests. The datasets to download are determined by
comparing the contents of the hstlc database to the contents of the
MAST database; any COS/STIS TIMETAG data that exists in MAST but does
not exist in the hstlc database is retrieved. Data is downloaded to
the ingest_dir directory, as determined by the config file (see below).
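The comparison described above amounts to a set difference between the two rootname lists; a minimal sketch (the helper name is hypothetical):

```python
def datasets_to_download(mast_rootnames, hstlc_rootnames):
    """Rootnames present in MAST but absent from the hstlc database
    (hypothetical helper illustrating the selection logic)."""
    return sorted(set(mast_rootnames) - set(hstlc_rootnames))
```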
Authors:
Matthew Bourque
Use:
This script is intended to be executed via the command line as such:
>>> download_hstlc
Outputs:
The following filetypes are retrieved (if available) and placed in the ingest_dir directory:
- *_x1d.fits - 1-dimensional extracted spectra
- *_tag.fits - STIS TIMETAG data
- *_corrtag.fits - COS NUV TIMETAG data
- *_corrtag_<a or b>.fits - COS FUV TIMETAG data
Submission results are also saved to an XML file and stored in the download_dir directory, as determined by the config file (see below). The submission results indicate whether the XML request was successful or if there were errors.
Executing this script creates a log file in the log_dir directory, as determined by the config file (see below)
Dependencies:
- As of early 2016, submission of XML requests to the MAST archive requires a special Python 2.6 environment with specific XML libraries installed. More information can be found here:
  https://confluence.stsci.edu/display/STScISSOPublic/ArchiveXMLsubmitPKImaterial
- Additionally, tsql must be installed and the tsql executable must be placed in the directory ~/freetds/bin/tsql. tsql can be downloaded using FreeTDS (http://www.freetds.org/).
- Users must have access to the hstlc database
- Users must also have a config.yaml file located in the lightcurve_pipeline/utils/ directory with the following keys:
  - db_connection_string - The hstlc database connection string
  - ingest_dir - The path to where the files will be stored after retrieval
  - log_dir - The path to where the log file will be stored
  - download_dir - The path to where XML submission results will be stored
  - mast_server - The MAST server hostname
  - mast_database - The name of the MAST database
  - mast_account - The MAST account username
  - mast_password - The MAST account password
  - archive_user - The requester username
  - host - The hostname of the machine used for FTP
  - ftp_user - The username of the account on the machine used for FTP
  - dads_host - The hostname of the machine on which the MAST database resides
  - archive - The HTTPS connection hostname
- Other external library dependencies include:
lightcurve_pipeline
- lightcurve_pipeline.scripts.download_hstlc.build_xml_request(datasets)¶
  Build the XML request for the given datasets
  Parameters:
    datasets : list - A list of rootnames to download from MAST.
  Returns:
    xml_request : string - The XML request string.
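Assembling such a request can be sketched with the stdlib xml.etree module. Note that the element and attribute names below (request, dataset, rootname) are placeholders; the actual MAST request schema is not documented here and differs from this sketch:

```python
import xml.etree.ElementTree as ET

def build_xml_request(datasets):
    """Build a request document listing each rootname (placeholder
    schema, purely illustrative of the XML-assembly step)."""
    root = ET.Element('request')
    for rootname in datasets:
        ds = ET.SubElement(root, 'dataset')
        ds.set('rootname', rootname)
    return ET.tostring(root, encoding='unicode')
```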
- lightcurve_pipeline.scripts.download_hstlc.everything_retrieved(tracking_id)¶
  Check every 15 minutes to see if all submitted datasets have been retrieved. Based on code from J. Ely.
  Parameters:
    tracking_id : string - A submission ID string.
  Returns:
    done : bool - Boolean specifying if the data has been retrieved.
    killed : bool - Boolean specifying if the request was killed.
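The 15-minute polling loop can be expressed generically as below; the function and its parameters are hypothetical (the interval is parameterized so the loop can be exercised without sleeping):

```python
import time

def poll_until_done(check, interval=900, max_polls=None):
    """Call check() repeatedly, sleeping `interval` seconds between
    polls, until it reports done or killed (or max_polls is reached).
    check() must return a (done, killed) pair of booleans."""
    polls = 0
    while True:
        done, killed = check()
        polls += 1
        if done or killed:
            return done, killed
        if max_polls is not None and polls >= max_polls:
            return done, killed
        time.sleep(interval)
```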
- lightcurve_pipeline.scripts.download_hstlc.get_filesystem_rootnames()¶
  Return a list of the rootnames in the hstlc database. This list is compared to the MAST database to determine which datasets to download.
  Returns:
    filesystem_rootnames : list - A list of rootnames that are in the hstlc filesystem.
- lightcurve_pipeline.scripts.download_hstlc.get_mast_rootnames()¶
  Return a list of rootnames of all COS & STIS TIMETAG data in MAST. The following target names are ignored: DARK, BIAS, DEUTERIUM, WAVE, ANY, NONE
  Returns:
    mast_rootnames : list - A list of rootnames of COS/STIS TIMETAG data in the MAST archive.
- lightcurve_pipeline.scripts.download_hstlc.main()¶
  The main function of the download_hstlc script
- lightcurve_pipeline.scripts.download_hstlc.save_submission_results(submission_results)¶
  Save the submission results to an XML file. Submission results are saved in a separate XML file and stored in the download_dir directory, as determined by the config file.
  Parameters:
    submission_results : httplib object - The submission results returned by MAST after the XML request is submitted.
- lightcurve_pipeline.scripts.download_hstlc.submit_xml_request(xml_request)¶
  Submit the XML request to the MAST archive.
  Parameters:
    xml_request : string - The request XML string.
  Returns:
    submission_results : httplib object - The XML request submission results.