radialx Documentation

Introduction

RadialX is a python package for working with x-ray powder diffraction data and for simulating x-ray powder diffraction patterns from models.

The most complete documentation is available at http://pythonhosted.org/radialx/.

Fully Documented Functionalities

At present, the only fully documented functionalities of RadialX are:

  1. The simulation of powder diffraction patterns from PDB files using the utility called powderx.
  2. Display of diffraction image header information using the utility called headerx.

Detailed usage instructions for powderx and headerx are given below.

Less Documented Functionalities

Other less documented functionalities correspond to “modes” of the utility called profilex:

  • centering mode: Finding the centers of powder diffraction images in adxv binary format.
  • averaging mode: Radial integration of one or more experimental powder diffraction patterns in adxv binary format.
  • averaging mode: Scaling of several experimental or simulated powder diffraction patterns to a single experimental or simulated pattern.
  • difference mode: Calculating the difference of two scaled radially integrated experimental powder diffraction patterns.

Although profilex is fully functional and heavily tested, its user interface (e.g., the config file format) is very likely to change. For example, the modes will probably be split into separate utilities, and several config file parameters will probably be renamed. The profilex utility is fairly well documented by comments in the config files in the test/test-profilex directory of the source distribution, as explained below.

It is hoped that the entire RadialX package will be fully and heavily documented soon.

Installation

Dependencies

pip & setuptools

The installation of RadialX and many other python packages is made easier by pip. So, before going any further toward installation, it is advisable to follow the pip installation instructions, including the installation of setuptools described therein, because pip depends on setuptools.

CCTBX

At present, the CCTBX package is needed only to simulate powder diffraction patterns with the powderx utility.

For powderx, it will be necessary to have the full CCTBX package installed and the cctbx.python executable in your path. Downloads are available for numerous operating systems, including Mac OS X, Windows 7 & XP, and several flavors of Linux. Additionally, it is possible to build CCTBX from a source bundle or, for the more ambitious, the SVN repository at sourceforge.

Because of the unusual python interpreter behavior forced by the cctbx.python executable, it is necessary to have all dependencies (except CCTBX itself) installed both to the cctbx.python interpreter and to a system python interpreter (e.g. at /usr/local/bin/python).

The difficulty lies in using pip with CCTBX if you are using one of the pre-built CCTBX distributions called “cctbx+Python” or “cctbx plus”, in which the CCTBX distribution python differs from your system python.

One way to overcome this difficulty is simply to install all of the packages twice, once to the CCTBX python and once to the system python.

Installing to the system python interpreter is easy with pip. For example, to install the pyfscache package:

% sudo pip install pyfscache

Note that “%” is the prompt and is not actually typed.

For the CCTBX python, things are slightly more complicated. First, when following the pip installation instructions, use cctbx.python with ez_setup.py and get-pip.py. For example:

% sudo cctbx.python ez_setup.py
% sudo cctbx.python get-pip.py

Once this latter command completes, you’ll see among the final lines of output something similar to:

Installing pip script to /opt/cctbx/Python.framework/Versions/2.7/bin

The directory path in this output points to the location of CCTBX’s pip, which can be used directly. Using pyfscache as an example:

% sudo /opt/cctbx/Python.framework/Versions/2.7/bin/pip install pyfscache

Other Dependencies

Other python dependencies are (in alphabetical order):

If not already present on your system, most (if not all) of these dependencies can be installed by the python package manager called pip. The availability of each of these packages will be checked during the build of RadialX by the setup.py script.

Download

Because of the CCTBX dependency, it is not yet recommended to install RadialX by using pip.

For now, the best way to obtain RadialX is to download the source code from the GitHub repository:

% git clone https://github.com/jcstroud/radialx.git

This command downloads the complete RadialX repository, which includes all of the code and some test data.

Build

It is advisable to look at the Makefile.inc file inside the radialx directory to ensure that the settings reflect your build environment. Most notably, ensure that the PYTHON setting points to the desired python version and that the bin directory under the directory specified by the PREFIX setting is in your path.

For example, if PREFIX is set to /usr/local, then ensure that /usr/local/bin is in your path.

These settings only affect how RadialX is built and where it is installed, not how it will execute once installed.

Once downloaded, build and installation are easy:

% cd radialx
% make
% sudo make install

The make command will automatically call setup.py and install the utilities (powderx, headerx, and profilex) into the appropriate location, specified by the PREFIX setting.

Usage

Complete examples of how to use all of the RadialX utilities are currently in the test directory of the source distribution. These examples are documented by comments in the config files called powder.yml and profile.cfg, the latter serving presently as the only source of documentation for the profilex utility.

Detailed instructions for the headerx and powderx utilities follow.

headerx

The headerx utility is the most straightforward to use. First, convert an image file from the synchrotron or a home-source detector to an adxv binary file. This function is found under the File > Save... menu of adxv. Ensure that the “Image” and “Binary” checkboxes are checked in the Adxv Save window. I prefer to name these adxv binary files with the “.bin” extension.

Using a file called stsaa_119a_0_003.bin as an example:

% headerx stsaa_119a_0_003.bin

This file is in the test/testdata directory and yields the following output:

     ===============  ===============
        HEADER_BYTES: 1024
                 DIM: 2
          BYTE_ORDER: little_endian
                TYPE: unsigned_short
               SIZE1: 3072
               SIZE2: 3072
          PIXEL_SIZE: 0.10259
                 BIN: 2x2
                 ADC: slow
         DETECTOR_SN: 911
            BEAMLINE: 24_ID_C
                DATE: Mon Jun  8 02:28:42 2009
                TIME: 10.0
            DISTANCE: 400.0
           OSC_RANGE: 1.0
                 PHI: 47.0
           OSC_START: 47.0
            TWOTHETA: 0.0
                AXIS: phi
          WAVELENGTH: 0.9793
       BEAM_CENTER_X: 157.11
       BEAM_CENTER_Y: 156.05
        TRANSMISSION: 10.0871
                PUCK: C
              SAMPLE: 2
            RING_CUR: 102.2
           RING_MODE: 0+24x1, ~1.3% Coupling
        MD2_APERTURE: 30
            UNIF_PED: 1500
CCD_IMAGE_SATURATION: 65535
     ===============  ===============
         Sanity Test
            4.7 Angs: 2357,1550 px
            4.7 Angs: 1531,2376 px
     ===============  ===============

The “Sanity Test” is based on the header beam center. Hovering the mouse over the given pixels in adxv should produce approximately the given resolutions (depending on what adxv thinks is the beam center).
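
The resolutions in the “Sanity Test” can be checked by hand with flat-detector geometry and Bragg's law. The sketch below is not taken from headerx; it assumes the detector face is normal to the beam (TWOTHETA is 0.0 in this header), that PIXEL_SIZE and DISTANCE are both in millimeters, and that the radius from the beam center is already known in pixels. The function name resolution_at_radius is hypothetical.

```python
import math

def resolution_at_radius(r_px, pixel_size, distance, wavelength):
    """Return the d-spacing (Angstroms) at r_px pixels from the beam center.

    Assumes a flat detector normal to the beam, with pixel_size and
    distance in the same length unit (mm for the header above).
    """
    r = r_px * pixel_size                    # radial distance on the detector face
    two_theta = math.atan2(r, distance)      # scattering angle 2-theta (radians)
    return wavelength / (2.0 * math.sin(two_theta / 2.0))  # Bragg's law

# Header values from stsaa_119a_0_003.bin
PIXEL_SIZE, DISTANCE, WAVELENGTH = 0.10259, 400.0, 0.9793

# BEAM_CENTER_X converted from mm to pixels
cx = 157.11 / PIXEL_SIZE
# Assumption: the first sanity-test pixel (2357,1550) lies along x from the beam center
d = resolution_at_radius(2357 - cx, PIXEL_SIZE, DISTANCE, WAVELENGTH)
print(f"{d:.2f} Angstroms")
```

For this header, an offset of roughly 826 pixels from BEAM_CENTER_X corresponds to about 4.7 Å, in line with the first Sanity Test line. How headerx maps BEAM_CENTER_X and BEAM_CENTER_Y onto adxv pixel coordinates is not covered by this sketch.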

powderx

The powderx utility simulates powder diffraction patterns from PDB files. These patterns are presented graphically and also written to a file whose name is designated by the user, as described below.

The powderx Config File

A yaml formatted config file controls the behavior of powderx. This config file is specified as an argument on the command line:

% powderx powder.yml

An example config file named powder.yml is in the test/test-powder directory of the source distribution. The powder config file will be referred to as “powder.yml” herein. The provided example file has comments that briefly describe each parameter. Because the config file format may change slightly between versions of RadialX, it is best simply to copy and modify this example file.

An introduction to the yaml config format is given below and provides everything users need to know about yaml to write a config file for powderx. For the curious, the full yaml specification (version 1.2) can be found at http://www.yaml.org/spec/1.2/spec.html.

The powder.yml file has four sections:

  • general: parameters that affect the user experience
  • simulation: parameters for the powder diffraction simulation
  • plot: parameters that modify the appearance of the plot
  • experiment: parameters of the simulated diffraction experiment

A detailed discussion of each section follows.

general

Parameters in the general section affect the user experience to a limited extent.

  • powderx_version: version number of the powderx program; it is critical for the config file version to match the version of the powderx program
  • verbosity: controls how verbose the output is; values may be DEBUG (most verbose), INFO, WARNING, ERROR, or CRITICAL (least verbose)

simulation

Of the four sections in powder.yml, the simulation section has the most parameters. Most of these parameters are self-explanatory.

  • pdb_name: pdb file from which to make a simulated pattern

  • pattern_name: the simulated pattern is written to a file of this name; the simulated pattern format is described below

  • d_max: maximum d-spacing (lowest resolution) for the simulation, given in Ångstroms.

  • d_min: minimum d-spacing (highest resolution) for the simulation, given in Ångstroms.

  • extinction_correction_x: an optional parameter refined during extinction correction, which is applied to the simulated pattern; this correction is discussed in the SHELXL 97 manual on page 7-7 (http://shelx.uni-ac.gwdg.de/SHELX/shelx97.pdf); use null or 0 if extinction correction is not desired

  • v & w: for the summation of reflections, the square of the full-width at half-max (FWHM) of a Lorentzian diffraction peak is equal to v tan(θ) + w

  • B: isotropic temperature factor; if the simulated pattern is to be scaled with experimental data, then B should be set to 0 because it will be refined during scaling

  • apply_Lp: apply Lorentz polarization correction (True or False)

  • pattern_shells: number of points in the simulated pattern; each point represents the integrated intensity of one resolution shell

  • peak_widths: the intensity of a reflection is taken to be 0 beyond this number of FWHM from the center of the Lorentzian reflection peak

  • combine_reflections: reflections may be combined by resolution such that all the reflections within a shell (specified by pattern_shells) are taken to have the same center and peak shape (i.e. the same FWHM), making the calculations significantly faster at the expense of a small decrease in accuracy; values for combine_reflections may be True or False
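
The extinction_correction_x parameter refers to the extinction correction described in the SHELXL 97 manual, where a calculated amplitude F_c is multiplied by [1 + 0.001·x·F_c²·λ³/sin(2θ)]^(-1/4). The sketch below transcribes that formula, omitting the overall scale factor; how powderx applies it internally is an assumption, and the function name extinction_corrected_F is hypothetical.

```python
import math

def extinction_corrected_F(F_c, x, wavelength, two_theta):
    """SHELXL-style extinction correction of a calculated amplitude F_c.

    F_c* = F_c * [1 + 0.001 * x * F_c**2 * wavelength**3 / sin(2 theta)]**(-1/4)
    two_theta is in radians. x = 0 or None (mirroring a null config value)
    leaves F_c unchanged.
    """
    if not x:
        return F_c
    factor = 1.0 + 0.001 * x * F_c ** 2 * wavelength ** 3 / math.sin(two_theta)
    return F_c * factor ** -0.25

# Hypothetical reflection: |F_c| = 250 at 2-theta = 12 degrees, 0.9793 Angstrom radiation
F_plain = extinction_corrected_F(250.0, None, 0.9793, math.radians(12.0))
F_ext = extinction_corrected_F(250.0, 0.05, 0.9793, math.radians(12.0))
```

Because the correction grows with F_c², strong reflections are attenuated the most; the corresponding intensity factor is the square of the amplitude factor.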

plot

The plot section controls the appearance of the plot.

  • window_name: name of the plot window
  • left, right, top, bottom: margins between plot and page border; note that axes labels are in the margins
  • plot_points: the data are rebinned solely for the purposes of the plot, which will have plot_points points
  • x_ticks: number of ticks on the x-axis, which is labeled by 2θ

experiment

A simulated diffraction pattern is the result of a simulated experiment. The parameters of the experiment section specify simulated experimental details.

  • WAVELENGTH: radiation wavelength
  • DISTANCE: the distance from the sample to the detector

Please note that yaml is case-sensitive, so these two parameter names must be in all caps.

Summation

The simulated powder diffraction pattern is the spherical summation of diffraction intensity over each resolution shell, where the number of resolution shells is specified by the pattern_shells setting of the powderx config file. Each shell is summed independently. All shells have approximately the same width in \(\varrho\), or \(\sin(\theta)/\lambda\), where \(\theta\) is the Bragg angle and \(\lambda\) is the wavelength.
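
As an illustration of shells of equal width in \(\varrho\), the sketch below divides the range between d_max and d_min into pattern_shells shells, using \(\varrho = \sin(\theta)/\lambda = 1/(2d)\) from Bragg's law, and converts each shell middle back to 2θ. Placing the shell edges exactly at 1/(2·d_max) and 1/(2·d_min) is an assumption made for illustration; powderx's own edge convention may differ, and the function name shell_middles is hypothetical.

```python
import math

def shell_middles(d_max, d_min, pattern_shells, wavelength):
    """Return (rho_middles, two_theta_middles_deg) for equal-width shells in rho.

    rho = sin(theta)/lambda = 1/(2 d), so the low-resolution limit d_max
    gives the smallest rho and d_min gives the largest.
    """
    rho_lo = 1.0 / (2.0 * d_max)
    rho_hi = 1.0 / (2.0 * d_min)
    width = (rho_hi - rho_lo) / pattern_shells
    rhos = [rho_lo + (i + 0.5) * width for i in range(pattern_shells)]
    # theta = asin(lambda * rho); report 2-theta in degrees
    two_thetas = [2.0 * math.degrees(math.asin(wavelength * r)) for r in rhos]
    return rhos, two_thetas

# Hypothetical settings: 20 to 4 Angstroms, 5 shells, Cu K-alpha wavelength
rhos, tts = shell_middles(d_max=20.0, d_min=4.0, pattern_shells=5, wavelength=1.5418)
for r, tt in zip(rhos, tts):
    print(f"rho = {r:.4f} 1/A   2-theta = {tt:.3f} deg")
```

Equal steps in \(\varrho\) are not equal steps in 2θ, so the printed 2θ middles are spaced slightly unevenly.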

Given that the structure factor for a Miller index, \(hk\ell\), is \(F_{hk\ell}\), the corresponding intensity \(I_{hk\ell}\) is

(1)\[I_{hk\ell} = \dfrac{M_{hk\ell}}{V^{2}} \cdot L_{p} \cdot F_{hk\ell} \cdot F_{hk\ell}^{*} \cdot \exp \left ( \dfrac{-2B\sin^{2}\theta}{\lambda^{2}} \right )\]

\(V\) is the unit cell volume, \(M_{hk\ell}\) is the multiplicity of the reflection, \(B\) the isotropic temperature factor, and \(F_{hk\ell}^{*}\) is the complex conjugate of \(F_{hk\ell}\). The term \(L_{p}\) is the Lorentz polarization correction:

(2)\[L_{p} = \left [ \dfrac{1 + \cos^{2}(2\theta)} {\sin^{2}\theta \cos\theta} \right ]\]

The Lorentz polarization correction may be applied to the simulated pattern using the apply_Lp setting within the simulation section of the powderx config file.
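
Equations (1) and (2) translate directly into code. In this sketch the structure factor is a Python complex number, θ is in radians, and the function names are hypothetical; it computes the intensity of a single reflection and is not taken from the powderx source.

```python
import cmath
import math

def lorentz_polarization(theta):
    """Equation (2): Lp = (1 + cos^2(2 theta)) / (sin^2(theta) cos(theta))."""
    return (1.0 + math.cos(2.0 * theta) ** 2) / (math.sin(theta) ** 2 * math.cos(theta))

def reflection_intensity(F, multiplicity, volume, B, theta, wavelength, apply_Lp=True):
    """Equation (1): intensity of one reflection from its structure factor F."""
    FFstar = (F * F.conjugate()).real           # F times its complex conjugate, |F|^2
    lp = lorentz_polarization(theta) if apply_Lp else 1.0
    debye_waller = math.exp(-2.0 * B * math.sin(theta) ** 2 / wavelength ** 2)
    return multiplicity / volume ** 2 * lp * FFstar * debye_waller

F = cmath.rect(120.0, 0.7)   # hypothetical |F| = 120 with phase 0.7 rad
I = reflection_intensity(F, multiplicity=4, volume=1.0e5, B=0.0,
                         theta=math.radians(5.0), wavelength=1.0)
```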

If the simulated pattern is to be scaled to experimental data using the profilex utility, \(B\) should be set to 0 because \(B\) is refined during scaling. Note that when \(B\) is set to 0, the expression for diffraction intensity simplifies to

(3)\[I_{hk\ell} = \dfrac{M_{hk\ell}}{V^{2}} \cdot L_{p} \cdot F_{hk\ell} \cdot F_{hk\ell}^{*}\]

Diffraction intensities are distributed throughout a profile, approximated by powderx as a Lorentzian distribution (also called a “Cauchy distribution”). This distribution takes the form

(4)\[A_{J}^{L} = \dfrac{2}{\pi H_{B}} \left [1 + \dfrac{4}{H_{B}^{2}} \left ( 2\theta_{i} - 2\theta_{hk\ell} \right )^2 \right ]^{-1}\]

\(A_{J}^{L}\) is the intensity contribution at angle \(\theta_{i}\), \(H_{B}\) is the FWHM of the peak, and \(\theta_{hk\ell}\) is the angle of diffraction for the Miller index \(hk\ell\). For the purposes of the simulation, values of \(\theta_{i}\) are taken from the middles (in \(\varrho\)) of the shells.
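
A direct transcription of Equation (4) is sketched below. The angles and the FWHM are assumed to be in the same units (degrees of 2θ here), and the function name lorentzian is hypothetical.

```python
import math

def lorentzian(two_theta_i, two_theta_hkl, H_B):
    """Equation (4): normalized Lorentzian (Cauchy) peak profile.

    The profile has unit area, a maximum of 2/(pi*H_B) at two_theta_hkl,
    and falls to half that maximum at two_theta_hkl +/- H_B/2.
    """
    x = two_theta_i - two_theta_hkl
    return 2.0 / (math.pi * H_B) / (1.0 + 4.0 / H_B ** 2 * x ** 2)

peak = lorentzian(10.0, 10.0, 0.2)   # value at the peak center
half = lorentzian(10.1, 10.0, 0.2)   # H_B/2 away from the center
```

The factor 4/H_B² in the denominator is what makes H_B the full width at half maximum.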

The FWHM, or \(H_{B}\), can be expressed as a function of the Bragg angle of the reflection:

(5)\[H_{B}^{2} = v \tan \theta_{hk\ell} + w\]

The two parameters of this expression, \(v\) and \(w\), may be adjusted using the powderx config file settings v and w, respectively.

Because it is computationally expensive to calculate the contribution of every reflection to every shell, powderx allows the user to limit the calculations to the shells in the neighborhood of each reflection's diffraction angle, \(\theta_{hk\ell}\), using the peak_widths setting. This setting specifies the width of the neighborhood in multiples of the FWHM.
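
The sketch below combines Equations (4) and (5) into a simple summation loop. Each reflection, given as a (2θ_hkl, I_hkl) pair in degrees, spreads its intensity only over shells whose 2θ_i lies within peak_widths × H_B of the peak center; bisect locates that range so that distant shells are never visited. Expressing H_B, v, and w in degrees of 2θ, the sorted list of shell middles, and all function names are assumptions for illustration, not the powderx implementation.

```python
import bisect
import math

def lorentzian(two_theta_i, two_theta_hkl, H_B):
    """Equation (4): normalized Lorentzian peak profile."""
    x = two_theta_i - two_theta_hkl
    return 2.0 / (math.pi * H_B) / (1.0 + 4.0 * x * x / (H_B * H_B))

def fwhm(two_theta_hkl_deg, v, w):
    """Equation (5): H_B = sqrt(v tan(theta_hkl) + w)."""
    theta = math.radians(two_theta_hkl_deg / 2.0)
    return math.sqrt(v * math.tan(theta) + w)

def sum_pattern(reflections, shell_two_thetas, v, w, peak_widths):
    """Accumulate each reflection's profile into the shells near its center.

    shell_two_thetas must be sorted in ascending order.
    """
    pattern = [0.0] * len(shell_two_thetas)
    for two_theta_hkl, intensity in reflections:
        H_B = fwhm(two_theta_hkl, v, w)
        cutoff = peak_widths * H_B
        # Contributions beyond peak_widths FWHMs are taken to be 0, so skip those shells
        lo = bisect.bisect_left(shell_two_thetas, two_theta_hkl - cutoff)
        hi = bisect.bisect_right(shell_two_thetas, two_theta_hkl + cutoff)
        for j in range(lo, hi):
            pattern[j] += intensity * lorentzian(shell_two_thetas[j], two_theta_hkl, H_B)
    return pattern

shells = [5.0 + 0.05 * k for k in range(200)]   # hypothetical shell middles (degrees)
refl = [(7.0, 100.0), (9.5, 40.0)]               # hypothetical (2-theta, I) pairs
profile = sum_pattern(refl, shells, v=0.01, w=0.005, peak_widths=10)
```

The cost per reflection is then proportional to the number of shells inside its neighborhood rather than to the total number of shells.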

Combining Reflections

With a small sacrifice in accuracy, a significant improvement in the speed of the summation can be realized by combining reflections by bin before applying Equation (4). For each bin, intensities are summed over each Miller index of the bin, \(hk\ell(\text{bin})\):

(6)\[I_{\text{bin}} = \sum I_{hk\ell(\text{bin})}\]

Each bin intensity \(I_{\text{bin}}\) has a peak center, \(2\theta_{\text{bin}}\), at the middle of the bin as determined by taking the mean of the \(2\theta\) angles of the reflections therein. If \(N_{\text{bin}}\) is the number of reflections in the bin, the middle of the bin is given by Equation (7):

(7)\[2\theta_{\text{bin}} = \dfrac {\sum 2\theta_{hk\ell(\text{bin})}} {N_{\text{bin}}}\]

When reflections are combined in this manner, Equation (4) becomes

(8)\[ A_{J}^{L} = \dfrac{2}{\pi H_{B}} \left [1 + \dfrac{4}{H_{B}^{2}} \left ( 2\theta_{i} - 2\theta_{\text{bin}} \right )^2 \right ]^{-1}\]

This approximation can be applied with the combine_reflections setting of the powderx config file.
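
A sketch of the binning in Equations (6) and (7) follows: reflections are grouped by shell, their intensities are summed, and their 2θ values are averaged to give the bin center. Assigning each reflection to a shell by its index in equal-width \(\varrho\) bins, through the hypothetical helper shell_index, is an assumption for illustration; combine_reflections here is likewise a hypothetical function, not the config setting itself.

```python
import math
from collections import defaultdict

def shell_index(two_theta_deg, wavelength, rho_lo, width):
    """Hypothetical helper: index of the equal-width rho shell holding a reflection."""
    rho = math.sin(math.radians(two_theta_deg / 2.0)) / wavelength
    return int((rho - rho_lo) // width)

def combine_reflections(reflections, wavelength, rho_lo, width):
    """Return {shell: (two_theta_bin, I_bin)} following Equations (6) and (7)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])   # [sum of 2-theta, sum of I, N_bin]
    for two_theta, intensity in reflections:
        acc = sums[shell_index(two_theta, wavelength, rho_lo, width)]
        acc[0] += two_theta
        acc[1] += intensity
        acc[2] += 1
    # Equation (7): mean 2-theta; Equation (6): summed intensity
    return {k: (s_tt / n, s_i) for k, (s_tt, s_i, n) in sums.items()}

refl = [(7.00, 10.0), (7.02, 5.0), (9.50, 8.0)]   # hypothetical (2-theta, I) pairs
bins = combine_reflections(refl, wavelength=1.0, rho_lo=0.05, width=0.005)
```

Each resulting (2θ_bin, I_bin) pair can then take the place of (2θ_hkl, I_hkl) in Equation (8), so the profile is evaluated once per occupied bin instead of once per reflection.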

File Formats

Simulated Pattern Format

The simulated integrated diffraction pattern is written to a yaml formatted file specified by the pattern_name setting in the simulation section of powder.yml. In yaml terms, the pattern is stored as a list of [2θ, intensity] pairs keyed by the word “pattern”. A python program can make a 2-D numpy array from the pattern easily if numpy (http://numpy.scipy.org/) and pyYAML (http://pyyaml.org/) are installed. For example, if the pattern is stored in the file “pattern.yml”:

import numpy
import yaml
# safe_load is used because yaml.load() requires an explicit Loader in PyYAML 6+
with open('pattern.yml') as f:
    ary = numpy.array(yaml.safe_load(f)['pattern'])

The array called “ary” is an N×2 array, with each of the N rows being a [2θ, intensity] pair.

More generally, the pattern starts on the fourth line of the yaml file and each data line conforms to the following FORTRAN formatted read:

REAL X, Y
READ '(5X, F10.0, 1X, F10.0)', X, Y

The following are the first six lines of a yaml simulated powder diffraction pattern file:

model : "../testdata/stg06-phi06.4-wc-03.8-rc1.0-m4-12.pdb"
pattern :
  # [   2-theta, intensity ]
  - [  5.205029, 0.5671240 ]
  - [  5.285076, 0.5882654 ]
  - [  5.365124, 0.6002413 ]
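
For readers using neither FORTRAN nor a yaml library, the same fixed-column read can be done in Python by slicing each data line. The sketch assumes, as stated above, that data begin on the fourth line and that every data line follows the (5X, F10.0, 1X, F10.0) layout shown in the sample lines; the function name read_pattern_columns is hypothetical.

```python
def read_pattern_columns(text):
    """Parse [2-theta, intensity] pairs from a simulated pattern file's text.

    Mirrors READ '(5X, F10.0, 1X, F10.0)': skip 5 columns, read 10,
    skip 1, read 10.
    """
    pairs = []
    for line in text.splitlines()[3:]:   # data start on the fourth line
        if not line.strip():
            continue                     # tolerate trailing blank lines
        x = float(line[5:15])            # columns 6-15: 2-theta
        y = float(line[16:26])           # columns 17-26: intensity
        pairs.append((x, y))
    return pairs

sample = '''model : "../testdata/stg06-phi06.4-wc-03.8-rc1.0-m4-12.pdb"
pattern :
  # [   2-theta, intensity ]
  - [  5.205029, 0.5671240 ]
  - [  5.285076, 0.5882654 ]
'''
print(read_pattern_columns(sample))
```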

YAML Config Format

The powder.yml file has a simple structure, which can be understood from the following listing:

%YAML 1.2
---
section_1 :
   parameter_a : value_a
   parameter_b : value_b
section_2 :
   parameter_c : value_c

Here, the first line is optional and indicates to a yaml parser that the file conforms to the yaml specification version 1.2. The second line of three dashes indicates the beginning of a yaml document. Each section name is on a line by itself and followed by a colon. Each parameter key-value pair is indented relative to the section names. All parameter key-value pairs are indented the same number of spaces. A colon separates the parameter key from its associated value.

Note that yaml is case sensitive.

Scaling Patterns

Scaling is achieved using the averaging mode of the profilex utility, which is still not officially documented. However, this functionality is available, so a brief discussion of the method follows.

Method of Scaling

When scaling, the integrated pattern intensity in the resolution range of interest (specified by the roi setting) is partitioned into the number of bins specified by the setting roi_bins. Each bin has a center at \(\varrho\), or \(\sin(\theta)/\lambda\), where \(\theta\) is the Bragg angle. The scaled intensity \(I_{\varrho}^{\circ}\) for a bin with center \(\varrho\) is related to the experimental intensity \(I_{\varrho}\) for that bin by Equation (9):

(9)\[I_{\varrho}^{\circ} = \alpha \left \{ I_{\varrho} - \left (m\varrho + b \right ) \right \} \exp \left \{ -2B\varrho^{2} \right \}\]

The terms \(\alpha\), \(m\), \(b\), and \(B\) are fit during scaling to the pattern specified by the scaled_to setting. The parameter \(B\) is the isotropic temperature factor and \(\alpha\) is the overall scale.
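
To make Equation (9) concrete, the sketch below fits \(\alpha\), \(m\), \(b\), and \(B\) by a grid search over \(B\) combined with linear least squares: for a fixed \(B\), Equation (9) is linear in \(\alpha\), \(\alpha m\), and \(\alpha b\). This is one possible fitting strategy chosen for illustration; it is not necessarily the optimizer profilex uses, the function names are hypothetical, and numpy is required.

```python
import numpy as np

def scale_pattern(I, rho, alpha, m, b, B):
    """Equation (9): scaled intensities from experimental intensities I."""
    return alpha * (I - (m * rho + b)) * np.exp(-2.0 * B * rho ** 2)

def fit_scaling(I, I_ref, rho, B_grid):
    """Fit alpha, m, b, B so that scale_pattern(I, ...) best matches I_ref."""
    best = None
    for B in B_grid:
        e = np.exp(-2.0 * B * rho ** 2)
        # For fixed B: I_ref ~ p0*(I*e) + p1*(rho*e) + p2*e,
        # with p0 = alpha, p1 = -alpha*m, p2 = -alpha*b
        A = np.column_stack([I * e, rho * e, e])
        p, *_ = np.linalg.lstsq(A, I_ref, rcond=None)
        resid = float(np.sum((A @ p - I_ref) ** 2))
        if best is None or resid < best[0]:
            alpha = p[0]
            best = (resid, alpha, -p[1] / alpha, -p[2] / alpha, B)
    _, alpha, m, b, B = best
    return alpha, m, b, B

# Synthetic, noise-free check: build I from a hypothetical reference with known parameters
rho = np.linspace(0.02, 0.12, 50)
I_ref = 1.0 + 0.5 * np.sin(60.0 * rho)
alpha_t, m_t, b_t, B_t = 2.0, 3.0, 0.4, 20.0
I = I_ref * np.exp(2.0 * B_t * rho ** 2) / alpha_t + m_t * rho + b_t
alpha, m, b, B = fit_scaling(I, I_ref, rho, np.linspace(0.0, 50.0, 501))
```

Without background correction, the last two columns of A would be dropped so that \(m\) and \(b\) stay at zero.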

Background Correction

The parameters \(m\) and \(b\) in Equation (9) estimate the contribution of background scatter, such that Equation (10) describes the background correction of experimental intensities \(I_{\varrho}\):

(10)\[I_{\varrho}^{\circ} \exp \left \{ 2B\varrho^{2} \right \} \alpha^{-1} = I_{\varrho} - \left (m\varrho + b \right )\]

Note that the expression \(m\varrho + b\) in Equation (10) describes a straight line, meaning that the background scatter is coarsely estimated to require a simple linear correction. This coarseness reduces the risk of overfitting the background.

Background correction is applied with the background_correction setting using the reference pattern specified by scaled_to. In other words, the background correction is estimated directly from scaling, which assumes that the reference pattern has no background (i.e. the reference pattern comes from a model).

Background correction will fail if an attempt is made to scale more than one pattern to the reference pattern. The reason can be seen on the left-hand side of Equation (10), which implies that the reference intensities (analogous to the scaled intensities \(I_{\varrho}^{\circ}\)) are scaled with \(\alpha\) and corrected for the isotropic temperature factor \(B\) to make them comparable to the background-corrected experimental intensities. Thus, the scaled reference intensities \(I_{\varrho}^{\prime\circ}\) are described by Equation (11):

(11)\[I_{\varrho}^{\prime\circ} = I_{\varrho}^{\prime} \exp \left \{ 2B\varrho^{2} \right \} \alpha^{-1}\]

Since scaling produces unique values of these fitting parameters for each pair of patterns, profilex will terminate with an error if more than one pair is specified for scaling while background correction is turned on. Scaling more than two patterns simultaneously to find a common set of scaling parameters is not yet supported.

Experimental Considerations in Background Correction

It should be emphasized that the proper experimental way to correct for background is to take a background exposure (i.e. an exposure with no sample but with a holder, etc.) and subtract the blank from the otherwise equivalent exposure of a sample. This subtraction can be achieved with the difference mode of the profilex utility.

Goodness of Fit

The goodness of fit between patterns is measured if a reference pattern is specified by the scaled_to setting. The metric for goodness of fit is called \(R_{o}^{2}\), which is calculated using Equation (12).

(12)\[\begin{split}R_{o}^{2} = \dfrac{\sum_{\varrho} (I_{\varrho}^{\circ} - I_{\varrho}')^{2}} {\sum_{\varrho} (\left < I' \right > - I_{\varrho}')^{2}}\end{split}\]

\(I_{\varrho}'\) is the intensity of the bin with a middle at \(\varrho\) for the reference pattern. The term \(\left < I' \right >\) is the average bin intensity for the reference pattern.

The \(R_{o}^{2}\) metric has been described in Proc Natl Acad Sci U S A. 2012 May 15;109(20):7717-22.
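
Equation (12) is straightforward to compute. In this sketch, scaled and reference are equal-length sequences of bin intensities, \(I_{\varrho}^{\circ}\) and \(I_{\varrho}'\), over the same bins; the function name r_o_squared is hypothetical.

```python
def r_o_squared(scaled, reference):
    """Equation (12): squared misfit divided by the spread of the reference about its mean."""
    if len(scaled) != len(reference):
        raise ValueError("patterns must have the same number of bins")
    mean_ref = sum(reference) / len(reference)
    misfit = sum((s - r) ** 2 for s, r in zip(scaled, reference))
    spread = sum((mean_ref - r) ** 2 for r in reference)
    return misfit / spread

ref = [1.0, 3.0, 2.0, 5.0]
print(r_o_squared(ref, ref))          # 0.0: identical patterns
print(r_o_squared([2.75] * 4, ref))   # 1.0: a flat pattern at the reference mean
```

Lower values indicate a better fit; a value of 1 means the scaled pattern matches the reference no better than the reference's own mean does.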
