brainiak.fcma package

Full correlation matrix analysis

The implementation is based on the following publications:

[Wang2015-1]Full correlation matrix analysis (FCMA): An unbiased method for task-related functional connectivity”, Yida Wang, Jonathan D Cohen, Kai Li, Nicholas B Turk-Browne. Journal of Neuroscience Methods, 2015.
[Wang2015-2]“Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors”, Yida Wang, Michael J. Anderson, Jonathan D. Cohen, Alexander Heinecke, Kai Li, Nadathur Satish, Narayanan Sundaram, Nicholas B. Turk-Browne, Theodore L. Willke. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2015.

Submodules

brainiak.fcma.classifier module

Full Correlation Matrix Analysis (FCMA)

Correlation-based training and prediction

class brainiak.fcma.classifier.Classifier(clf, epochs_per_subj=0)

Bases: sklearn.base.BaseEstimator

Correlation-based classification component of FCMA

Parameters:
  • clf (class) – The classifier used, normally a classifier class of sklearn
  • epochs_per_subj (int, default 0) – The number of epochs of each subject within-subject normalization will be performed during classifier training if epochs_per_subj is specified default 0 means no within-subject normalization
training_data_

2D numpy array in shape [num_samples, num_features] – default None training_data_ is None except clf is SVM.SVC with precomputed kernel, in which case training data is needed to compute the similarity vector for each sample to be classified

test_raw_data_

a list of 2D array in shape [num_TRs, num_voxels] – default None test_raw_data_ is set after a prediction is called, if the new input data equals test_raw_data_, test_data_ can be reused

test_data_

2D numpy array in shape [num_samples, num_features] – default None test_data_ is set after a prediction is called, so that the test data does not need to be regenerated in the subsequent operations, e.g. getting decision values of the prediction

num_voxels_

int – The number of voxels per brain used in this classifier this is defined by the applied mask, normally the top voxels selected by FCMA voxel selection num_voxels_ must be consistent in both training and classification

num_samples_

int – The number of samples of the training set

decision_function(X)

output the decision value of the prediction

if X is not equal to self.test_raw_data_, i.e. predict is not called, first generate the test_data after getting the test_data, get the decision value via self.clf.

Parameters:X (a list of numpy array in shape [num_TRs, self.num_voxels_]) – X contains the activity data filtered by top voxels and prepared for correlation computation. len(X) is the number of test samples if len(X) > 1: normalization is done on all test samples
Returns:confidence
Return type:the predictions confidence values of X, in shape [len(X),]
fit(X, y)

use correlation data to train a model

first compute the correlation of the input data, and then normalize within subject if more than one sample in one subject, and then fit to a model defined by self.clf.

Parameters:
  • X (a list of numpy array in shape [num_TRs, num_voxels]) – X contains the activity data filtered by top voxels and prepared for correlation computation. assuming all elements of X has the same num_voxels value
  • y (labels, len(X) equals len(Y)) –
Returns:

self

Return type:

return the object itself

predict(X)

use a trained model to predict correlation data

first compute the correlation of the input data, and then normalize across all samples in the list if len(X) > 1, and then predict via self.clf.

Parameters:X (a list of numpy array in shape [num_TRs, self.num_voxels_]) – X contains the activity data filtered by top voxels and prepared for correlation computation. len(X) is the number of test samples if len(X) > 0: normalization is done on all test samples
Returns:y_pred
Return type:the predicted label of X, in shape [len(X),]

brainiak.fcma.cython_blas module

brainiak.fcma.cython_blas.compute_correlation()

use blas API sgemm wrapped by scipy to compute correlation

The blas APIs process matrices in column-major, but our matrices are in row-major, so we play the transpose trick here, i.e. A*B=(B^T*A^T)^T. The resulting matrix in shape [num_assigned_voxels, num_voxels] is stored in an alternate way to make sure that the correlation vectors of the same voxel stored continuously

Parameters:
  • py_trans_a (str) –
  • transpose or not for the first matrix A (do) –
  • py_trans_b (str) –
  • transpose or not for the first matrix B (do) –
  • py_m (int) –
  • row of the resulting matrix C (the) –
  • our case, is num_voxels (in) –
  • py_n (int) –
  • column of the resulting matrix C (the) –
  • our case, is num_assigned_voxels (in) –
  • py_k (int) –
  • collapsed dimension of the multiplying matrices (the) –
  • the column of the first matrix after transpose if necessary (i.e.) –
  • row of the second matrix after transpose if necessary (the) –
  • py_alpha (float) –
  • weight applied to the first matrix A (the) –
  • py_a (2D array in shape [epoch_length, num_voxels] in our case) –
  • activity data of an epoch (the) –
  • py_lda (int) –
  • stride of the first matrix A (the) –
  • py_start_voxel (int) –
  • starting voxel of assigned voxels (the) –
  • to locate the second matrix B (used) –
  • py_ldb (int) –
  • stride of the second matrix B (the) –
  • py_beta (float) –
  • weight applied to the resulting matrix C (the) –
  • py_c (3D array in shape [num_selected_voxels, num_epochs, num_voxels]) –
  • to store the resulting correlation values (place) –
  • py_ldc (int) –
  • stride of the resulting matrix (the) –
  • our case, num_voxels*num_epochs (in) –
  • py_start_epoch (int) –
  • epoch over which the correlation is computed (the) –
Returns:

  • py_c (3D array in shape [num_selected_voxels, num_epochs, num_voxels])
  • write the resulting correlation values in an alternate way
  • for the processing epoch

brainiak.fcma.cython_blas.compute_kernel_matrix()

use blas API syrk wrapped by scipy to compute kernel matrix of SVM

The blas APIs process matrices in column-major, but our matrices are in row-major, so we play the transpose trick here, i.e. A*B=(B^T*A^T)^T

In SVM with linear kernel, the distance of two samples is essentially the dot product of them. Therefore, the kernel matrix can be obtained by matrix multiplication. Since the kernel matrix is symmetric, ssyrk is used, the other half of the matrix is assigned later. In our case, the dimension of samples is much larger than the number samples, so we proportionally shrink the values of the kernel matrix for getting more robust alpha values in SVM iteration.

Parameters:
  • py_uplo (str) –
  • the upper or lower triangle of the matrix (getting) –
  • py_trans (str) –
  • transpose or not for the input matrix A (do) –
  • py_n (int) –
  • row and column of the resulting matrix C (the) –
  • our case, is num_epochs (in) –
  • py_k (int) –
  • collapsed dimension of the multiplying matrices (the) –
  • the column of the first matrix after transpose if necessary (i.e.) –
  • row of the second matrix after transpose if necessary (the) –
  • our case, is num_voxels (in) –
  • py_alpha (float) –
  • weight applied to the input matrix A (the) –
  • py_a (3D array in shape [num_assigned_voxels, num_epochs, num_voxels]) –
  • our case the normalized correlation values of a voxel (in) –
  • py_start_voxel (int) –
  • processed voxel (the) –
  • to locate the input matrix A (used) –
  • py_lda (int) –
  • stride of the input matrix A (the) –
  • py_beta (float) –
  • weight applied to the resulting matrix C (the) –
  • py_c (2D array in shape [num_epochs, num_epochs]) –
  • to store the resulting kernel matrix (place) –
  • py_ldc (int) –
  • stride of the resulting matrix (the) –
Returns:

  • py_c (2D array in shape [num_epochs, num_epochs])
  • write the resulting kernel_matrix
  • for the processing voxel

brainiak.fcma.cython_blas.compute_single_matrix_multiplication()

use blas API gemm wrapped by scipy to compute similarity matrix of SVM

This is to compute the similarity matrix between test sample(s) and training samples. The blas APIs process matrices in column-major, but our matrices are in row-major, so we play the transpose trick here, i.e. A*B=(B^T*A^T)^T

Parameters:
  • py_trans_a (str) –
  • transpose or not for the first matrix A (do) –
  • py_trans_b (str) –
  • transpose or not for the first matrix B (do) –
  • py_m (int) –
  • row of the resulting matrix C (the) –
  • our case, is num_training_samples (in) –
  • py_n (int) –
  • column of the resulting matrix C (the) –
  • our case, is num_test_samples (in) –
  • py_k (int) –
  • collapsed dimension of the multiplying matrices (the) –
  • the column of the first matrix after transpose if necessary (i.e.) –
  • row of the second matrix after transpose if necessary (the) –
  • our case, is num_selected_voxels*num_selected_voxels (in) –
  • py_alpha (float) –
  • weight applied to the input matrix A (the) –
  • py_a (2D array) –
  • shape [num_training_samples, num_selected_voxels*num_selected_voxels] (in) –
  • py_lda (int) –
  • stride of the input matrix A (the) –
  • py_b (2D array) –
  • shape [num_test_samples, num_selected_voxels*num_selected_voxels] (in) –
  • py_ldb (int) –
  • stride of the input matrix B (the) –
  • py_beta (float) –
  • weight applied to the resulting matrix C (the) –
  • py_c (2D array) –
  • shape [num_training_samples, num_test_samples] of column-major (in) –
  • fact it is (in) –
  • shape [num_test_samples, num_training_samples] of row-major (in) –
  • to store the resulting similarity matrix (place) –
  • py_ldc (int) –
  • stride of the resulting matrix (the) –
Returns:

  • py_c (3D array)
  • in shape [num_training_samples, num_test_samples] of column-major
  • write the resulting similarity matrix

brainiak.fcma.cython_blas.compute_single_self_correlation_gemm()

use blas API gemm wrapped by scipy to compute correlation matrix

This is to compute the correlation between selected voxels for final training and classification. Although the resulting correlation matrix is symmetric, in most cases, gemm performs better than syrk. Here we assume that the resulting matrix is stored in a compact way, i.e. py_ldc == py_n.

Parameters:
  • py_trans_a (str) –
  • transpose or not for the first matrix A (do) –
  • py_trans_b (str) –
  • transpose or not for the first matrix B (do) –
  • py_m (int) –
  • row of the resulting matrix C (the) –
  • our case, is num_selected_voxels (in) –
  • py_n (int) –
  • column of the resulting matrix C (the) –
  • our case, is num_selected_voxels
  • py_k (int) –
  • collapsed dimension of the multiplying matrices (the) –
  • the column of the first matrix after transpose if necessary (i.e.) –
  • row of the second matrix after transpose if necessary (the) –
  • our case, is num_TRs (in) –
  • py_alpha (float) –
  • weight applied to the input matrix A (the) –
  • py_a (2D array in shape [num_TRs, num_selected_voxels]) –
  • our case the normalized activity values (in) –
  • multipliers are specified here as the same one (both) –
  • py_lda (int) –
  • stride of the input matrix A (the) –
  • py_ldb (int) –
  • stride of the input matrix B (the) –
  • our case, the same as py_lda (in) –
  • py_beta (float) –
  • weight applied to the resulting matrix C (the) –
  • py_c (3D array) –
  • shape [num_samples, num_selected_voxels, num_selected_voxels] (in) –
  • to store the resulting kernel matrix (place) –
  • py_ldc (int) –
  • stride of the resulting matrix (the) –
  • py_start_sample (int) –
  • processed sample (the) –
  • to locate the resulting matrix C (used) –
Returns:

  • py_c (3D array)
  • in shape [num_samples, num_selected_voxels, num_selected_voxels]
  • write the resulting correlation matrices
  • for the processed sample

brainiak.fcma.cython_blas.compute_single_self_correlation_syrk()

use blas API syrk wrapped by scipy to compute correlation matrix

This is to compute the correlation between selected voxels for final training and classification. Since the resulting correlation matrix is symmetric, syrk is used. However, it looks like that in most cases, syrk performs much worse than gemm (the next function). Here we assume that the resulting matrix is stored in a compact way, i.e. py_ldc == py_n.

Parameters:
  • py_uplo (str) –
  • the upper or lower triangle of the matrix (getting) –
  • py_trans (str) –
  • transpose or not for the input matrix A (do) –
  • py_n (int) –
  • row and column of the resulting matrix C (the) –
  • our case, is num_selected_voxels (in) –
  • py_k (int) –
  • collapsed dimension of the multiplying matrices (the) –
  • the column of the first matrix after transpose if necessary (i.e.) –
  • row of the second matrix after transpose if necessary (the) –
  • our case, is num_TRs (in) –
  • py_alpha (float) –
  • weight applied to the input matrix A (the) –
  • py_a (2D array in shape [num_TRs, num_selected_voxels]) –
  • our case the normalized activity values (in) –
  • py_lda (int) –
  • stride of the input matrix A (the) –
  • py_beta (float) –
  • weight applied to the resulting matrix C (the) –
  • py_c (3D array) –
  • shape [num_samples, num_selected_voxels, num_selected_voxels] (in) –
  • to store the resulting kernel matrix (place) –
  • py_ldc (int) –
  • stride of the resulting matrix (the) –
  • py_start_sample (int) –
  • processed sample (the) –
  • to locate the resulting matrix C (used) –
Returns:

  • py_c (3D array)
  • in shape [num_samples, num_selected_voxels, num_selected_voxels]
  • write the resulting correlation matrices
  • for the processed sample

brainiak.fcma.io module

Full Correlation Matrix Analysis (FCMA)

FCMA related IO routines

brainiak.fcma.io.generate_epochs_info(epoch_list)

use epoch_list to generate epoch_info defined below

Parameters:epoch_list (list of 3D (binary) array in shape [condition, nEpochs, nTRs]) – Contains specification of epochs and conditions, assuming 1. all subjects have the same number of epochs; 2. len(epoch_list) equals the number of subjects; 3. an epoch is always a continuous time course.
Returns:epoch_info – label is the condition labels of the epochs; sid is the subject id, corresponding to the index of raw_data; start is the start TR of an epoch (inclusive); end is the end TR of an epoch(exclusive). Assuming len(labels) labels equals the number of epochs and the epochs of the same sid are adjacent in epoch_info
Return type:list of tuple (label, sid, start, end)
brainiak.fcma.io.prepare_fcma_data(data_dir, extension, mask_file, epoch_file)

obtain the data for correlation-based computation and analysis

read the data in and generate epochs of interests, then broadcast to all workers

Parameters:
  • data_dir (str) – the path to all subject files
  • extension (str) – the file extension, usually nii.gz or nii
  • mask_file (str) – the absolute path of the mask file, we apply the mask right after reading a file for saving memory
  • epoch_file (str) – the absolute path of the epoch file
Returns:

  • raw_data (list of 2D array in shape [epoch length, nVoxels]) – the data organized in epochs len(raw_data) equals the number of epochs
  • labels (list of 1D array) – the condition labels of the epochs len(labels) labels equals the number of epochs

brainiak.fcma.io.prepare_mvpa_data(data_dir, extension, mask_file, epoch_file)

obtain the data for activity-based model training and prediction

Average the activity within epochs and z-scoring within subject.

Parameters:
  • data_dir (str) – the path to all subject files
  • extension (str) – the file extension, usually nii.gz or nii
  • mask_file (str) – the absolute path of the mask file, we apply the mask right after reading a file for saving memory
  • epoch_file (str) – the absolute path of the epoch file
Returns:

  • processed_data (2D array in shape [num_voxels, num_epochs]) – averaged epoch by epoch processed data
  • labels (1D array) – contains labels of the data

brainiak.fcma.io.read_activity_data(dir, file_extension, mask_file=None)

read data in NIfTI from a dir and apply the spatial mask to them

Parameters:
  • dir (str) – the path to all subject files
  • file_extension (str) – the file extension, usually nii.gz or nii
  • mask_file (Optional[str]) – the absolute path of the mask file, we apply the mask right after reading a file for saving memory if it is not specified, the data will not be masked and remain in 4D
Returns:

activity_data – if masked, in shape [nVoxels, nTRs], organized in voxel*TR formats if not masked, in shape [x, y, z, t] or [brain 3D, nTRs] len(activity_data) equals the number of subjects the data type is float32

Return type:

list of array of brain activity data

brainiak.fcma.io.write_nifti_file(data, affine, filename)

write a nifti file given data and affine

Parameters:
  • data (3D/4D numpy array) – the brain data with/without time dimension
  • affine (2D numpy array) – affine of the image, usually inherited from an existing image
  • filename (string) – the output filename

brainiak.fcma.mvpa_voxelselector module

Full Correlation Matrix Analysis (FCMA)

Activity-based voxel selection

class brainiak.fcma.mvpa_voxelselector.MVPAVoxelSelector(raw_data, mask, epoch_info, num_folds, sl)

Bases: object

Activity-based voxel selection component of FCMA

Parameters:
  • raw_data (list of 4D array) – each element of the list is the raw data of a subject, in the shape of [x, y, z, t]
  • mask (3D array) –
  • epoch_info (list of tuple (label, sid, start, end)) – label is the condition labels of the epochs; sid is the subject id, corresponding to the index of raw_data; start is the start TR of an epoch (inclusive); end is the end TR of an epoch(exclusive). Assuming len(labels) labels equals the number of epochs and the epochs of the same sid are adjacent in epoch_info
  • num_folds (int) – the number of folds to be conducted in the cross validation
  • sl (Searchlight) – the distributed Searchlight object
processed_data_

4D array in shape [brain 3D + epoch] – contains the averaged and normalized brain data epoch by epoch it is generated from raw_data and epoch_info

labels_

1D array – contains the labels of the epochs, extracted from epoch_info

run(clf)

run activity-based voxel selection

Sort the voxels based on the cross-validation accuracy of their activity vectors within the searchlight

Parameters:clf (classification function) – the classifier to be used in cross validation
Returns:
  • result_volume (3D array of accuracy numbers) – contains the voxelwise accuracy numbers obtained via Searchlight
  • results (list of tuple (voxel_id, accuracy)) – the accuracy numbers of all voxels, in accuracy descending order the length of array equals the number of voxels

brainiak.fcma.util module

Full Correlation Matrix Analysis (FCMA)

Correlation related high performance routines

brainiak.fcma.util.compute_correlation(matrix1, matrix2)

compute correlation between two sets of variables

Correlate the rows of matrix1 with the rows of matrix2. If matrix1 == matrix2, it is auto-correlation computation resulting in a symmetric correlation matrix. The number of columns MUST agree between set1 and set2. The correlation being computed here is the Pearson’s correlation coefficient, which can be expressed as

\[corr(X, Y) = \frac{cov(X, Y)}{\sigma_X\sigma_Y}\]

where cov(X, Y) is the covariance of variable X and Y, and

\[\sigma_X\]

is the standard deviation of variable X

Reducing the correlation computation to matrix multiplication and using BLAS GEMM API wrapped by Scipy can speedup the numpy built-in correlation computation (numpy.corrcoef) by one order of magnitude

\[\begin{split}corr(X, Y) &= \frac{\sum\limits_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{(n-1) \sqrt{\frac{\sum\limits_{j=1}^n x_j^2-n\bar{x}}{n-1}} \sqrt{\frac{\sum\limits_{j=1}^{n} y_j^2-n\bar{y}}{n-1}}}\\ &= \sum\limits_{i=1}^n(\frac{(x_i-\bar{x})} {\sqrt{\sum\limits_{j=1}^n x_j^2-n\bar{x}}} \frac{(y_i-\bar{y})}{\sqrt{\sum\limits_{j=1}^n y_j^2-n\bar{y}}})\end{split}\]
Parameters:
  • matrix1 (2D array in shape [r1, c]) – MUST be continuous and row-major
  • matrix2 (2D array in shape [r2, c]) – MUST be continuous and row-major
Returns:

corr_data – continuous and row-major in np.float32

Return type:

2D array in shape [r1, r2]

brainiak.fcma.voxelselector module

Full Correlation Matrix Analysis (FCMA)

Correlation-based voxel selection

class brainiak.fcma.voxelselector.VoxelSelector(raw_data, epochs_per_subj, labels, num_folds, voxel_unit=100, master_rank=0)

Bases: object

Correlation-based voxel selection component of FCMA

Parameters:
  • raw_data (list of 2D array in shape [epoch length, nVoxels]) –
    Assumption: 1. all activity data contains the same number of voxels
    1. the activity data has been z-scored, ready to compute correlation as matrix multiplication
    2. all subjects have the same number of epochs
    3. epochs belonging to the same subject are adjacent in the list
    4. voxel selection is always done in the auto-correlation, i.e. raw_data correlate with themselves
    5. if MPI jobs are running on multiple nodes, the user home directory is shared by all nodes
  • epochs_per_subj (int) – The number of epochs of each subject
  • labels (list of 1D array) – the condition labels of the epochs len(labels) labels equals the number of epochs
  • num_folds (int) – The number of folds to be conducted in the cross validation
  • voxel_unit (int, default 100) – The number of voxels assigned to a worker each time
  • master_rank (int, default 0) – The process which serves as the master
run(clf)

run correlation-based voxel selection in master-worker model

Sort the voxels based on the cross-validation accuracy of their correlation vectors

Parameters:clf (classification function) – the classifier to be used in cross validation
Returns:results – the accuracy numbers of all voxels, in accuracy descending order the length of array equals the number of voxels
Return type:list of tuple (voxel_id, accuracy)