zip2wd_mp: Get Weather Data For a List of Zip Codes For a Range of Dates (Multi-processing version)¶
Given a zip code and a date or a range of dates, it gets weather data (you get to specify which data) from the closest weather station from which the data are available. If given a range of dates, it fetches all the specified columns for each of the days in the intervening p period.
How it does it:
This script is based of the script that calculates nearest weather station based on variety of metrics.
You can use a variety of options to choose the kinds of weather stations from which you want data. For instance, you can get data only from USAF stations.
The script features on demand data downloads. So it pings the local directory and sees if weather data for a particular day and time are present and if they are not, then it tries to download it from the NOAA website. On occassion the script may run into bandwidth bottlenecks and you may want to run the script again to download all the data that is needed.
Prerequisites:¶
zip2ws.sqlite is based off finding the nearest weather station project.
Input File Types:
- Basic: The input file format should be CSV and should contain 6
columns with following columns names: | uniqid, zip, year, month, day | See sample-input-basic.csv for sample input file.
- Extended: The input file format contain 9 columns with the
following columns names: | `uniqid, zip, from.year, from.month, from.day, to.year, to.month, to.day | See sample-input-extend.csv for sample input file.
- Column Name File: This file contain list of weather data columns
chosen for output file. | The column names begining with character ‘#’ will not be appear in the output file. | (see column-names.txt for sample file)
For what these column names stand for, see column-names-info.txt
GHCN Weather Data in SQLite3 database: These files create by a script import-db.sh for each year.
e.g. for year 2000
cd data ./import-db.sh 2000
The script will download daily weather data (GHCN-Daily) from NOAA server for year 2000 and import to SQLite3 database file (e.g. ghcn_2000.sqlite3)
Configuration file¶
There are script settings in the configuration. zip2wd.cfg
[manager]
ip = 127.0.0.1
port = 9999
authkey = 1234
batch_size = 10
[worker]
uses_sqlite = yes
processes = 4
nth = 0
distance = 30
[output]
columns = column-names.txt
[db]
zip2ws = zip2ws.sqlite
path = ./data/
- ip and port - IP address and port of manager process that the worker will be connect to.
- authkey - A shared password which is used to authenticate between manager and worker processes.
- batch_size - A number of zipcodes that manager process dispatch to worker process each time.
- uses_sqlite - Uses weather data from imported SQLite3 database if yes (recommend for speed) or download weather data for individual weather station on demand if no
- processes - A number of process will be forked on the worker process.
- nth - Search within n-th closest station [set to 0 for unlimited]
- distance - Search within distance (KM) [set to 0 for unlimited]
- column - A column file that contains list of weather data column to be output
- zip2ws - SQLite3 database of zip codes and weather stations
- path - Path relative to database files
Usage¶
Manager process¶
usage: manager.py [-h] [--config CONFIG] [-o OUTFILE] [-v] inputs [inputs ...]
Weather search by ZIP (Manager)
positional arguments:
inputs CSV input file(s) name
optional arguments:
-h, --help show this help message and exit
--config CONFIG Default configuration file (default: zip2wd.cfg)
-o OUTFILE, --out OUTFILE
Search results in CSV (default: output.csv)
-v, --verbose Verbose message
Worker process¶
usage: worker.py [-h] [--config CONFIG] [-v]
Weather search by ZIP (Worker)
optional arguments:
-h, --help show this help message and exit
--config CONFIG Default configuration file (default: zip2wd.cfg)
-v, --verbose Verbose message
Example:¶
Run manager process search weather data for the input file sample-input-extend.csv
python manager.py sample-input-extend.csv
The default output file is output.csv
Run worker process
python worker.py
The manager will dispatch job (list of zip codes and date range) to the connected workers. The worker process also forks a number of process (specify by processes in the configuration file) to search the weather data for each zip code and put back the results to the manager process.
We can have multiple workers run on same or difference machine.
Output¶
in meters