Overview¶
The Cosmic Origins Spectrograph (COS) and the Space Telescope Imaging Spectrograph (STIS) on board the Hubble Space Telescope (HST) continue to capture spectroscopic observations and deliver them to a steadily growing archive. Observation products are primarily time-averaged spectra; however, many COS and STIS observations are taken in the TIME-TAG observing mode, in which the position and time of each incoming photon is recorded. The resulting observation product is a list of detected events, which can in turn be transformed into a lightcurve and used to discover and characterize unique phenomena in scientific observations.
The hstlc project aims to gather TIME-TAG observations and transform them into High Level Science Products (HLSPs) in the form of lightcurves in an automated way for all publicly available COS and STIS TIME-TAG observations. The project software is written in Python, and uses many supporting materials, including a pipeline, database, filesystem, downloading platform, and a lightcurve code library.
This project is supported by the Hubble Archival Research program 13902 (P.I. Justin Ely).
Filetypes¶
TIME-TAG observations are stored in FITS binary tables. Depending on the instrument and detector of the observation, the FITS files can have several different naming conventions, as described below:
- <rootname>_tag.fits - STIS TIME-TAG observation
- <rootname>_corrtag.fits - COS TIME-TAG observation taken with the NUV detector
- <rootname>_corrtag_a.fits - COS TIME-TAG observation taken with the FUV-A detector
- <rootname>_corrtag_b.fits - COS TIME-TAG observation taken with the FUV-B detector
Additionally, <rootname>_x1d.fits files, which store the 1-dimensional extracted spectra, are used by various routines in the lightcurve library to perform extraction. Thus, these filetypes are also downloaded and ingested by the hstlc pipeline.
Further details about these filetypes and TIME-TAG observations in general can be found in the COS Data Handbook and the STIS Data Handbook.
Pipeline¶
The hstlc pipeline is a series of scripts, executed sequentially, that ingests raw TIME-TAG observations and produces lightcurves as well as various plots that analyze them. The pipeline consists of three scripts: ingest_hstlc, build_stats_table, and make_hstlc_plots. Each is further described below.
ingest_hstlc
The conversion of raw TIME-TAG observations to High Level Science Products (lightcurves) is performed by the ingest_hstlc script. The following algorithm is employed by ingest_hstlc:
- Gather x1d, tag, and corrtag files from an ingest directory to build a list of files to ingest.
- For each tag or corrtag file:
  - Open the file and retrieve the header and data.
  - Perform various quality checks on the data. If the data are deemed bad, update the bad_data table in the hstlc database and remove the file from the ingest directory.
  - Gather metadata and output location information from the file.
  - If the file is a STIS dataset, convert the tag file into a corrtag file by calling the stis_corrtag function of the lightcurve.stis module.
  - Update the metadata table in the hstlc database with the metadata of the file.
  - Create the lightcurve using the lightcurve library code and place the output product in the appropriate outputs directory based on the file's TARGNAME.
  - Set the correct permissions on the output directory and/or files.
  - Update the outputs table of the hstlc database with output location information.
  - Create a quicklook image for the observation and save it in the appropriate outputs directory.
  - Move the file (and its accompanying x1d file) from the ingest directory to the appropriate directory in the filesystem.
- Use the metadata table to query for datasets that require (re)processing of composite lightcurves based on whether new files have been ingested.
- (Re)create a composite lightcurve for each dataset that requires (re)processing and save the composite lightcurve in the appropriate outputs directory.
- Update the outputs table of the hstlc database with composite lightcurve output location information.
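The per-file loop above can be sketched in Python. This is a simplified, hypothetical illustration: passes_quality_checks is a stand-in for the pipeline's real checks, and the target name is taken from the filename for brevity, whereas the real pipeline reads TARGNAME from the FITS header, updates the database, creates lightcurves, and also moves the accompanying x1d file.

```python
import os
import shutil
from glob import glob

def passes_quality_checks(path):
    # Stand-in for the pipeline's quality checks (EXPFLAG, event counts, etc.)
    return os.path.getsize(path) > 0

def ingest(ingest_dir, filesystem_dir):
    """Sketch of the ingest_hstlc per-file loop: quality-check each
    tag/corrtag file, then file it under a per-target subdirectory."""
    ingested, rejected = [], []
    for path in sorted(glob(os.path.join(ingest_dir, '*tag*.fits'))):
        if not passes_quality_checks(path):
            rejected.append(path)  # real pipeline: update the bad_data table
            continue
        # Hypothetical convention: target name from the filename prefix
        targname = os.path.basename(path).split('_')[0]
        dest = os.path.join(filesystem_dir, targname)
        os.makedirs(dest, exist_ok=True)
        shutil.move(path, dest)
        ingested.append(os.path.join(dest, os.path.basename(path)))
    return ingested, rejected
```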
build_stats_table
After the TIME-TAG observations are ingested and output lightcurves are produced, the build_stats_table script calculates various statistics for each individual and composite lightcurve and stores them in the stats table in the hstlc database. The following statistics are calculated:
- total - The total number of counts in the lightcurve
- mean - The mean number of counts in the lightcurve
- mu - The square root of the mean number of counts
- stdev - The standard deviation of the counts in the lightcurve
- poisson_factor - The stdev/mu of the lightcurve. The greater the poisson_factor, the less likely it is that the noise in the lightcurve is due to Poisson noise alone.
- pearson_r - The Pearson r value for the correlation between time and counts. A positive value (close to 1.0) indicates a positive correlation, a negative value (close to -1.0) indicates a negative correlation, and a near-zero value indicates no correlation.
- pearson_p - The Pearson p value for the correlation between time and counts. A low value (close to 0.0) indicates that the null hypothesis that "counts and time are not correlated" can be rejected (i.e. the idea that the correlation is due to random sampling can be rejected; there is reason to believe that the correlation is real). A high value (close to 1.0) indicates the opposite: the data do not give reason to believe that the correlation is real.
- periodogram - A true/false value indicating whether the lightcurve has an 'interesting' Lomb-Scargle periodogram. A lightcurve is deemed to have an 'interesting' periodogram if there exists a period at which the Lomb-Scargle power exceeds 0.30 and the peak power exceeds three sigma above the mean.
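The scalar statistics above can be computed with numpy and scipy. A sketch (the function name and the returned dict are illustrative, not the actual build_stats_table code; the periodogram test is omitted):

```python
import numpy as np
from scipy import stats

def lightcurve_stats(times, counts):
    """Compute hstlc-style statistics for a binned lightcurve.
    Keys match the columns of the stats table."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    mu = np.sqrt(mean)                # expected Poisson noise level
    stdev = counts.std()
    pearson_r, pearson_p = stats.pearsonr(times, counts)
    return {
        'total': int(counts.sum()),
        'mean': mean,
        'mu': mu,
        'stdev': stdev,
        'poisson_factor': stdev / mu,  # >> 1 suggests non-Poisson variability
        'pearson_r': pearson_r,
        'pearson_p': pearson_p,
    }
```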
make_hstlc_plots
Lastly, various plots that analyze and describe the individual and composite lightcurves are created by the make_hstlc_plots script. The following plots are created:
- Static lightcurve plots for each individual and composite lightcurve in the form of a PNG.
- Interactive lightcurve plots for each individual and composite lightcurve in the form of a Bokeh/HTML plot.
- Interactive, sortable ‘exploratory’ tables that display the various statistics and plots for each individual and composite lightcurve.
- A histogram showing the cumulative exposure time for each target.
- ‘Configuration’ pie charts showing the breakdown of lightcurves by grating/cenwave for each instrument/detector combination.
- A histogram showing the number of lightcurves for each filter.
- Lomb-Scargle periodograms for each lightcurve.
Database¶
The hstlc project uses a MySQL database to store useful data. The database schema is defined by the Object-Relational Mappings (ORMs) contained in the database_interface module. The database is populated by the ingest_hstlc and build_stats_table scripts. The database can also easily be reset by the reset_hstlc_database script. Below is a description of each table.
Metadata Table
The metadata table stores information about each observation's location in the hstlc filesystem, as well as useful header keyword values. The table contains the following columns:
Field        Type          Null  Key  Default  Extra
id           int(11)       NO    PRI  NULL     auto_increment
filename     varchar(30)   NO    UNI  NULL
path         varchar(100)  NO         NULL
ingest_date  date          NO         NULL
telescop     varchar(10)   NO         NULL
instrume     varchar(10)   NO         NULL
targname     varchar(30)   NO         NULL
cal_ver      varchar(30)   NO         NULL
obstype      varchar(30)   NO         NULL
cenwave      int(11)       NO         NULL
aperture     varchar(30)   NO         NULL
detector     varchar(30)   NO         NULL
opt_elem     varchar(30)   NO         NULL
fppos        int(11)       NO         NULL
- id - A unique integer ID number that serves as the primary key.
- filename - The filename of the observation.
- path - The location of the file in the hstlc filesystem.
- ingest_date - The date on which the file was last ingested.
- telescop - The value of the observation's TELESCOP header keyword. Currently, this is always HST.
- instrume - The value of the observation's INSTRUME header keyword. This is either COS or STIS.
- targname - The value of the observation's TARGNAME header keyword (i.e. the target name of the observation).
- cal_ver - The value of the observation's CAL_VER header keyword (i.e. the version of the calibration pipeline that was used to calibrate the observation).
- obstype - The value of the observation's OBSTYPE header keyword. This is either SPECTROSCOPIC or IMAGING.
- cenwave - The value of the observation's CENWAVE header keyword (i.e. the central wavelength of the observation).
- aperture - The value of the observation's APERTURE header keyword (i.e. the aperture name).
- detector - The value of the observation's DETECTOR header keyword. This is either FUV-MAMA or NUV-MAMA for STIS, or FUV or NUV for COS.
- opt_elem - The value of the observation's OPT_ELEM header keyword (i.e. the optical element used).
- fppos - The value of the observation's FPPOS header keyword (i.e. the grating offset index).
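Since the schema is defined via ORMs in the database_interface module, the metadata table corresponds to a SQLAlchemy model roughly like the following. This is a hypothetical sketch inferred from the column listing above, not the actual database_interface code:

```python
from sqlalchemy import Column, Date, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Metadata(Base):
    """Hypothetical ORM for the metadata table, inferred from the schema."""
    __tablename__ = 'metadata'

    id = Column(Integer, primary_key=True)
    filename = Column(String(30), unique=True, nullable=False)
    path = Column(String(100), nullable=False)
    ingest_date = Column(Date, nullable=False)
    telescop = Column(String(10), nullable=False)
    instrume = Column(String(10), nullable=False)
    targname = Column(String(30), nullable=False)
    cal_ver = Column(String(30), nullable=False)
    obstype = Column(String(30), nullable=False)
    cenwave = Column(Integer, nullable=False)
    aperture = Column(String(30), nullable=False)
    detector = Column(String(30), nullable=False)
    opt_elem = Column(String(30), nullable=False)
    fppos = Column(Integer, nullable=False)
```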
Outputs Table
The outputs
table stores information about the output products associated with each filename from the metadata
table. The table contains the following columns:
Field                Type          Null  Key  Default  Extra
id                   int(11)       NO    PRI  NULL     auto_increment
metadata_id          int(11)       NO    UNI  NULL
individual_path      varchar(100)  YES        NULL
individual_filename  varchar(30)   YES        NULL
composite_path       varchar(100)  YES        NULL
composite_filename   varchar(30)   YES        NULL
- id - A unique integer ID number that serves as the primary key.
- metadata_id - A foreign key that points to the primary ID of the metadata table, allowing the outputs table and the metadata table to be joined.
- individual_path - The path to the individual lightcurve output file.
- individual_filename - The filename of the individual lightcurve output file.
- composite_path - The path to the composite lightcurve output file.
- composite_filename - The filename of the composite lightcurve output file.
Bad Data Table
The bad_data
table stores information about files that could not be ingested. The table contains the following columns:
Field        Type                                                                                                Null  Key  Default  Extra
id           int(11)                                                                                             NO    PRI  NULL     auto_increment
filename     varchar(30)                                                                                         NO    UNI  NULL
ingest_date  date                                                                                                NO         NULL
reason       enum('Bad EXPFLAG','Non-linear time','No events','Singular event','Bad Proposal','Short Exposure')  NO         NULL
- id - A unique integer ID number that serves as the primary key.
- filename - The filename of the observation that couldn't be ingested.
- ingest_date - The date on which ingestion of the file was attempted.
- reason - The reason why the file was not ingested. Can be one of:
  - Bad EXPFLAG, which corresponds to observations that have an EXPFLAG header keyword that is not NORMAL
  - Non-linear time, which indicates that time does not progress linearly through the TIME column of the dataset
  - No events, which corresponds to an observation with no observed signal
  - Singular event, which indicates that all events in a dataset occur at a single time
  - Bad Proposal, which indicates that the dataset is part of a problematic proposal
  - Short Exposure, which indicates that the exposure time of the dataset is too short
Stats Table
The stats
table stores useful statistics for each individual and composite lightcurve. The table contains the following columns:
Field                Type          Null  Key  Default  Extra
id                   int(11)       NO    PRI  NULL     auto_increment
lightcurve_path      varchar(100)  NO         NULL
lightcurve_filename  varchar(100)  NO         NULL
total                int(11)       NO         NULL
mean                 float         YES        NULL
mu                   float         YES        NULL
stdev                float         YES        NULL
poisson_factor       float         YES        NULL
pearson_r            float         YES        NULL
pearson_p            float         YES        NULL
periodogram          tinyint(1)    NO         NULL
deliver              tinyint(1)    NO         NULL
Filesystem¶
The hstlc filesystem has several top-level directories:
- ingest/ - Stores files that are to be ingested
- bad_data/ - Stores the files that do not pass the quality checks during ingestion
- filesystem/ - Stores the ingested data based on TARGNAME (see notes below)
- outputs/ - Stores the individual and composite lightcurves, as well as the quicklook PNG plots
- plots/ - Stores the various plots created by the make_hstlc_plots script
- download/ - Stores the returned XML request files from MAST indicating success or failure
- logs/ - Stores the log files that log the execution of the hstlc scripts
The various TIME-TAG files are stored in a directory structure located in the filesystem/
directory. The files are stored in a subdirectory associated with their TARGNAME
header keyword. For example:
filesystem/
    TARGNAME1/
        file1_corrtag.fits
        file1_x1d.fits
        file2_corrtag.fits
        file2_x1d.fits
    TARGNAME2/
        ...
    TARGNAME3/
        ...
    ...
Files are moved from the ingest
directory to their appropriate subdirectory in filesystem
as determined by the logic in the ingest_hstlc
script. A similar structure is used for the outputs
directory, with the exception of the composite
subdirectory, which stores composite lightcurves:
outputs/
    TARGNAME1/
        file1_curve.fits
        file2_curve.fits
        file3_curve.fits
    TARGNAME2/
        ...
    TARGNAME3/
        ...
    composite/
        composite_file1_curve.fits
        composite_file2_curve.fits
        composite_file3_curve.fits
        ...
The filesystem and outputs directories can be 'reset' by the reset_hstlc_filesystem script. This will move files from the filesystem directory back to the ingest directory and remove the subdirectories under filesystem, as well as remove all of the files and subdirectories from the outputs directory.
Permissions¶
The permissions of hstlc data files, directories, subdirectories, logs, and output products are all uniformly set. The permissions are governed by the set_permissions
function of the utils module.
All permissions are set to rwxrwx---
with STSCI/hstlc
group permissions.
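A minimal sketch of such a function, assuming it wraps os.chmod (rwxrwx--- corresponds to mode 0o770; the optional group argument is illustrative, not necessarily the actual utils.set_permissions signature):

```python
import os
import shutil
import stat

def set_permissions(path, group=None):
    """Set rwxrwx--- (0o770) permissions on a file or directory,
    optionally assigning a group (e.g. the STSCI/hstlc group)."""
    os.chmod(path, stat.S_IRWXU | stat.S_IRWXG)  # 0o770 -> rwxrwx---
    if group is not None:
        shutil.chown(path, group=group)
```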
Downloads¶
New COS and STIS TIME-TAG observations are retrieved from the MAST archive on a periodic basis. This is done by the download_hstlc script. The script queries the dadsops_rep
table in the MAST archive for new datasets based on the following query:
SELECT asm_member_name
FROM assoc_member
WHERE asm_member_type IN ('EXP-FP', 'SCIENCE')
AND asm_asn_id IN (SELECT sci_data_set_name
FROM science
WHERE sci_instrume IN ('COS', 'STIS')
AND sci_operating_mode = 'TIME-TAG'
AND sci_targname NOT IN ('DARK', 'BIAS', 'DEUTERIUM', 'WAVE', 'ANY', 'NONE')
AND sci_release_date < <today>)
As you can see, the query avoids certain targets that do not contain any useful data (e.g. DARK
, BIAS
, etc.). The query also uses the assoc_member
table to determine individual association members.
High Level Science Products¶
The composite lightcurves that are created by the hstlc pipeline are delivered to MAST as High Level Science Products (HLSPs). A composite lightcurve is comprised of one or more individual lightcurves, all having the same configuration of TARGNAME
, DETECTOR
, OPT_ELEM
, CENWAVE
, and APERTURE
. In other words, all datasets taken under the same observing conditions are aggregated together to form a composite lightcurve.
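That aggregation rule can be sketched as a grouping over metadata records (plain dicts here for brevity; the real pipeline queries the metadata table in the database):

```python
from collections import defaultdict

# A composite lightcurve aggregates all individual lightcurves that share
# the same TARGNAME/DETECTOR/OPT_ELEM/CENWAVE/APERTURE configuration.
CONFIG_KEYS = ('targname', 'detector', 'opt_elem', 'cenwave', 'aperture')

def group_by_configuration(records):
    """Group metadata records by observing configuration."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[k] for k in CONFIG_KEYS)
        groups[key].append(record['filename'])
    return dict(groups)
```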
The composite lightcurves are FITS binary tables consisting of the following columns:
- bins - the step size, in seconds, into which events are binned (i.e. a bin of 1 means that all events are binned into 1-second intervals)
- times - the times of each event in the dataset, relative to the start of the observation
- mjd - the Modified Julian Date of each event in the dataset
- gross - the total number of counts in the dataset
- counts - calculated as gross - background
- net - calculated as counts / time
- flux - the flux of each event in ergs/s
- flux_error - the error of each flux measurement, in ergs/s
- background - the background measurement for each event, in counts
- error - calculated as sqrt(gross + background)
- dataset - the dataset that the event corresponds to (i.e. dataset=2 corresponds to the second individual lightcurve that comprises the composite lightcurve)
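The derived columns follow directly from those definitions. A sketch with numpy (the argument name exptime is a hypothetical stand-in for the per-bin time used in the net calculation):

```python
import numpy as np

def derived_columns(gross, background, exptime):
    """Compute the derived composite-lightcurve columns (counts, net, error)
    from the gross and background count arrays, per the definitions above."""
    gross = np.asarray(gross, dtype=float)
    background = np.asarray(background, dtype=float)
    counts = gross - background          # counts = gross - background
    net = counts / exptime               # net = counts / time
    error = np.sqrt(gross + background)  # error = sqrt(gross + background)
    return counts, net, error
```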
To adhere to the MAST HLSP contribution guidelines for time-series/lightcurves, the following naming convention is used for the composite lightcurves:
hlsp_hstlc_hst_<instrument>-<detector>_<targname>_<opt_elem>_<cenwave>_<aperture>_v1_sci.fits
where:
- instrument is the instrument (cos or stis)
- detector is the detector (nuv-mama or fuv-mama for STIS, nuv or fuv for COS)
- targname is the target name (e.g. ngc6905)
- opt_elem is the filter (e.g. e230m)
- cenwave is the central wavelength (e.g. 2561)
- aperture is the aperture (e.g. PSA)
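Assembling such a name from metadata values can be sketched as follows (this assumes, as a convention, that all fields are lowercased in the final filename; the function is illustrative, not pipeline code):

```python
def hlsp_filename(instrument, detector, targname, opt_elem, cenwave, aperture):
    """Build a composite-lightcurve HLSP filename per the hstlc convention."""
    fields = [str(f).lower()
              for f in (instrument, detector, targname, opt_elem, cenwave, aperture)]
    return 'hlsp_hstlc_hst_{}-{}_{}_{}_{}_{}_v1_sci.fits'.format(*fields)
```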
Installation¶
Users must first install the lightcurve package. Users can obtain the latest release using pip:
>>> pip install lightcurve
or by downloading/cloning the code from GitHub and running setup.py
:
>>> git clone https://github.com/justincely/lightcurve.git
>>> python setup.py install
Similarly, users can install the lightcurve_pipeline
package via pip
:
>>> pip install lightcurve_pipeline
or by downloading/cloning from GitHub and running setup.py
:
>>> git clone https://github.com/justincely/lightcurve_pipeline
>>> python setup.py install
Package Structure¶
The lightcurve_pipeline
package has the following structure:
lightcurve_pipeline/
    database/
        database_interface.py
    download/
        SignStsciRequest.py
    ingest/
        make_lightcurve.py
        resolve_target.py
    quality/
        data_checks.py
    scripts/
        build_stats_table.py
        download_hstlc.py
        ingest_hstlc.py
        make_hstlc_plots.py
        reset_hstlc_database.py
        reset_hstlc_filesystem.py
    utils/
        config.yaml
        periodogram_stats.py
        targname_dict.py
        utils.py
scripts/
    hstlc_pipeline
setup.py
Note that the hstlc_pipeline script exists outside of the package itself. Additionally, the setup.py
module defines the scripts under the lightcurve_pipeline.scripts
directory as entry_points
, so that these scripts can be executed from the command line.
System Requirements¶
The hstlc software requires Python 2.7 and the following external libraries:
astropy
bokeh
lightcurve
matplotlib
numpy
pyyaml
scipy
sqlalchemy
Also required is a configuration file named config.yaml
placed in the lightcurve_pipeline.utils
directory. This config file holds the hard-coded paths that determine the various hstlc directories (e.g. filesystem/
, outputs/
, etc.) as well as the connection credentials to the hstlc database. Thus, a config.yaml
file might look like:
'db_connection_string' : 'mysql+pymysql://username:password@hostname:port/hstlc'
'home_dir' : '/mydir/'
'ingest_dir' : '/mydir/ingest/'
'filesystem_dir' : '/mydir/filesystem/'
'outputs_dir' : '/mydir/outputs/'
'composite_dir' : '/mydir/outputs/composite/'
'log_dir' : '/mydir/logs/'
'download_dir' : '/mydir/download/'
'plot_dir' : '/mydir/plots/'
'bad_data_dir' : '/mydir/bad_data/'
Users wishing to run the pipeline must ensure that these directories exist, and have proper hstlc permissions.
Usage¶
Users can run the pipeline by simply executing the hstlc_pipeline
script:
>>> hstlc_pipeline [-corrtag_extract]
Supplying the -corrtag_extract
parameter is optional, and will cause the extraction of corrtag data to be performed.
Users can also execute individual parts of the pipeline, as such:
>>> ingest_hstlc [-corrtag_extract]
>>> build_stats_table
>>> make_hstlc_plots
Users wishing to download new TIME-TAG data can execute the download_hstlc
script:
>>> download_hstlc
Users wishing to reset the hstlc filesystem or database can execute the reset_hstlc_filesystem
and reset_hstlc_database
scripts, respectively:
>>> reset_hstlc_filesystem
>>> reset_hstlc_database [table]
See the reset_hstlc_database documentation for further details on the use of the table
parameter.