PyStretch Basic Functionality

Input & Output

PyStretch uses GDAL for all raster input and output, through the SWIG binding to gdal.RasterIO. It can therefore handle any GDAL-supported format.

Note

GDAL does not, by default, support JPEG 2000 (JP2) robustly. If you wish to read and write JPEG 2000, you should install the freely available ECW or MrSID plugins for GDAL. Alternatively, if you have a Kakadu license working with your GDAL installation, you can use that. You may need to modify the source code to force the use of your preferred driver by explicitly unloading all other JPEG 2000 drivers.

The input dataset does not need a preceding flag, but it should be the final argument in your argument list. For example:

$ pystretcher.py -flag -flag -arg -flag -arg input_file.jp2

By default, the output format is GeoTIFF, the output data type is identical to the input data type, and the output name is ‘output.tif’. All of these parameters can be changed at the user's discretion.
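The argument order and defaults above can be sketched with Python's standard `argparse` module. This is a hypothetical parser written for illustration, not PyStretch's actual argument handling; it covers only the output flags described in this section (`-f`, `-ot`, `-o`/`--output`) plus the positional input. "GTiff" is GDAL's short name for GeoTIFF, and `None` for the output type stands for "same as the input".

```python
import argparse

# Data types listed in this manual as supported output types.
GDAL_DTYPES = ["Byte", "UInt16", "Int16", "UInt32", "Int32", "Float32", "Float64"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented I/O flags."""
    parser = argparse.ArgumentParser(prog="pystretcher.py")
    # -f: any GDAL format, by short name; GeoTIFF ("GTiff") by default.
    parser.add_argument("-f", dest="format", default="GTiff")
    # -ot: output data type; None means "keep the input data type".
    parser.add_argument("-ot", dest="output_type", default=None, choices=GDAL_DTYPES)
    # -o / --output: output file name.
    parser.add_argument("-o", "--output", default="output.tif")
    # The input dataset is positional; by convention it is given last.
    parser.add_argument("input")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-f", "HFA", "-o", "out.img", "input_file.tif"])
    print(args.format, args.output_type, args.output, args.input)
```

Note that `argparse` itself does not force the positional argument to come last; putting the input dataset at the end is simply the convention this manual asks for.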

  • To alter the output format:

    $ pystretcher.py -f ANY_GDAL_SUPPORTED_FORMAT_BY_NAME

Note

If you do not know the GDAL format names, consult GDAL's list of supported raster formats or run:

$ gdalinfo --formats

  • To alter the datatype:

    $ pystretcher.py -ot SOME_GDAL_OUTPUT_DTYPE

Note

Supported DTYPES include:

  • Byte
  • UInt16
  • Int16
  • UInt32
  • Int32
  • Float32
  • Float64

Warning

You must scale the data when converting from a data type with a larger range or more precision to one with a smaller range or less precision (for example, from UInt16 or Float32 to Byte).

  • To specify the output name:

    $ pystretcher.py -o MY_OUTPUT.FORMAT input_file.tif

or:

$ pystretcher.py --output MY_OUTPUT.FORMAT input_file.img
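The Warning about converting to a less precise data type can be made concrete by comparing the data's value range with the numeric limits of each supported type. The limits below are the standard ranges of these types; the `needs_scaling` helper is hypothetical and is not part of PyStretch. It checks range only: even values that fit will lose any fractional part when written to an integer type.

```python
import sys

# Standard numeric limits of the GDAL data types listed above.
DTYPE_LIMITS = {
    "Byte": (0, 255),
    "UInt16": (0, 65535),
    "Int16": (-32768, 32767),
    "UInt32": (0, 4294967295),
    "Int32": (-2147483648, 2147483647),
    "Float32": (-3.4028234663852886e38, 3.4028234663852886e38),
    "Float64": (-sys.float_info.max, sys.float_info.max),
}


def needs_scaling(data_min, data_max, target_dtype):
    """Return True if [data_min, data_max] does not fit in target_dtype."""
    lo, hi = DTYPE_LIMITS[target_dtype]
    return data_min < lo or data_max > hi


if __name__ == "__main__":
    # 16-bit data with values up to 1000 will not fit in Byte without scaling.
    print(needs_scaling(0, 1000, "Byte"), needs_scaling(0, 1000, "UInt16"))
```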

Scaling the Data

It is often advantageous to process 32- or 16-bit data and scale it to the smaller, more easily disseminated 8-bit format. PyStretch does not scale your data automatically; you must explicitly tell it to scale. By default, when the scale flag is found, data are scaled to the range [1, 255]. Alternatively, you can specify the minimum and maximum values to use for scaling.

To scale to the defaults:

$ pystretcher.py --scale <some_input>

To scale to a user defined range:

$ pystretcher.py -s integer integer <some_input>

Note

The pystretcher.py script expects two integer arguments when the user specifies the scaling range. This intentionally precludes scaling from 8-bit to 32-bit data, since the additional bit depth would not increase the actual precision of the data.
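Scaling of this kind is commonly a linear min-max mapping: the smallest input value maps to the lower bound and the largest to the upper bound. The sketch below assumes that is what `--scale` does and that the two integers given to `-s` are the output minimum and maximum; PyStretch's exact formula and rounding are not stated here, so treat `linear_scale` as a hypothetical illustration.

```python
def linear_scale(values, out_min=1, out_max=255):
    """Linearly map values onto the integer range [out_min, out_max].

    Hypothetical helper: assumes a plain min-max stretch with rounding.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant input has no range to stretch; map it to out_min.
        return [out_min for _ in values]
    factor = (out_max - out_min) / (hi - lo)
    return [round(out_min + (v - lo) * factor) for v in values]


if __name__ == "__main__":
    # Default range [1, 255], as with --scale.
    print(linear_scale([0, 50, 100]))
    # User-defined range, as with -s 0 10.
    print(linear_scale([0, 100], 0, 10))
```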

No Data Value (NDV)

PyStretch supports propagating an NDV from the input dataset to the output dataset. The user can also specify an NDV for the output dataset. This is especially important when using the scale flag with a 32-bit dataset whose NDV you wish to remap to some other value. By default, if the NDV flag is found without a value, the NDV is set to 0. For example:

$ pystretcher.py --NDV <some_input>

or:

$ pystretcher.py --NDV FLOAT <some_input>
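Remapping an NDV during scaling generally means two things: no-data pixels are excluded when finding the minimum and maximum, and they are written out as the output NDV instead of being stretched. The sketch below assumes that behavior; `scale_with_ndv` is a hypothetical helper, not PyStretch's code. It uses the documented defaults (output NDV 0, range [1, 255]); note that with these defaults, 0 is never produced by the stretch itself, so it stays distinguishable as no-data.

```python
import math


def scale_with_ndv(values, in_ndv, out_ndv=0, out_min=1, out_max=255):
    """Stretch valid pixels to [out_min, out_max]; write NDV pixels as out_ndv."""

    def is_nodata(v):
        if in_ndv is None:
            return False
        if math.isnan(in_ndv):
            # NaN never compares equal to itself, so test for it explicitly.
            return math.isnan(v)
        return v == in_ndv

    valid = [v for v in values if not is_nodata(v)]
    if not valid:
        return [out_ndv] * len(values)
    lo, hi = min(valid), max(valid)  # NDV pixels do not affect the stretch
    span = hi - lo
    out = []
    for v in values:
        if is_nodata(v):
            out.append(out_ndv)
        elif span == 0:
            out.append(out_min)
        else:
            out.append(round(out_min + (v - lo) * (out_max - out_min) / span))
    return out


if __name__ == "__main__":
    print(scale_with_ndv([-9999.0, 0.0, 10.0], in_ndv=-9999.0))
```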

Image Segmentation

At its core, PyStretch is about segmenting images. Raster data can be segmented horizontally, vertically, or into user-defined boxes. Keep the following considerations in mind when choosing a segmentation scheme:

  • As of version 0.1.2, memmove is used to copy the NumPy array into a ctypes array. Unfortunately, this requires an in-memory copy, so a 1 GB segment briefly requires 2 GB of memory. In our experience, POSIX-based systems handle a spike larger than the available physical memory by paging into virtual memory. This costs some speed but does not cause a segfault. Windows does not handle the spike as gracefully, and Python crashes. Keep this in mind on a Windows machine.
  • Unless the number of pixels is evenly divisible by the number of segments, you will always have n+1 segments to process. Later releases may provide a flag that bundles the final segment with the penultimate one.
  • Finally, keep the following in mind:

Warning

Reading segments that are not aligned with the intrinsic block size of the input format causes thrashing and long I/O times. This was a design trade-off between writing our own format readers and using GDAL; our own readers would likely have suffered from the same limitation. You are free to use segments that do not match the block size (or a multiple of it); just do not expect great performance.
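The n+1 behavior and the block-size advice above can be sketched for horizontal (row) segments. The helpers below are hypothetical: `segment_bounds` assumes the segment height is the floor of total rows divided by the requested segment count, which leaves any remainder as one extra, shorter segment; `align_to_block` rounds a segment height up to a whole multiple of the format's block height. PyStretch's own segmentation code may differ.

```python
def segment_bounds(total_rows, n_segments):
    """Return (row_offset, row_count) pairs covering total_rows.

    Hypothetical sketch: each segment is total_rows // n_segments rows,
    so a non-zero remainder becomes an extra (n+1)th segment.
    """
    if total_rows <= 0 or n_segments <= 0:
        raise ValueError("total_rows and n_segments must be positive")
    size = max(1, total_rows // n_segments)
    return [(off, min(size, total_rows - off)) for off in range(0, total_rows, size)]


def align_to_block(segment_rows, block_rows):
    """Round segment_rows up to a whole multiple of block_rows."""
    return -(-segment_rows // block_rows) * block_rows


if __name__ == "__main__":
    # 10 rows in 3 segments -> 3 full segments plus a 1-row remainder.
    print(segment_bounds(10, 3))
    # A 300-row segment over 256-row blocks would be widened to 512 rows.
    print(align_to_block(300, 256))
```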

Visualization

PyStretch currently supports basic visualization of the output dataset's histogram. Future releases should add visualization of the input dataset and of the cumulative distribution function (CDF). We understand that these visualization capabilities are important because they help determine logical threshold values; they are coming.

To visualize the output histogram:

$ pystretcher.py --visualize SOME_STRETCH <some_input>

or:

$ pystretcher.py -z SOME_STRETCH <some_input>

Note

Matplotlib must be installed to allow for visualization.
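The data behind a histogram plot, and the CDF mentioned above as a future feature, are straightforward to compute for 8-bit output. The sketch below uses only the standard library and is a hypothetical illustration, not PyStretch's visualization code; it assumes integer pixel values in [0, 255]. With Matplotlib installed, the resulting counts could be drawn with `matplotlib.pyplot.bar(range(256), counts)`.

```python
from itertools import accumulate


def histogram_and_cdf(values, n_bins=256):
    """Histogram and normalized CDF for integer data in [0, n_bins)."""
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * n_bins
    for v in values:
        counts[v] += 1  # one bin per possible 8-bit value
    total = len(values)
    # Running sum of counts, normalized so the last entry is 1.0.
    cdf = [c / total for c in accumulate(counts)]
    return counts, cdf


if __name__ == "__main__":
    counts, cdf = histogram_and_cdf([0, 0, 1, 255])
    print(counts[0], counts[1], counts[255], cdf[0], cdf[-1])
```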
