brainiak.fcma package¶
Full correlation matrix analysis
The implementation is based on the following publications:
[Wang20151]  Full correlation matrix analysis (FCMA): An unbiased method for taskrelated functional connectivity”, Yida Wang, Jonathan D Cohen, Kai Li, Nicholas B TurkBrowne. Journal of Neuroscience Methods, 2015. 
[Wang20152]  “Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors”, Yida Wang, Michael J. Anderson, Jonathan D. Cohen, Alexander Heinecke, Kai Li, Nadathur Satish, Narayanan Sundaram, Nicholas B. TurkBrowne, Theodore L. Willke. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2015. 
Submodules¶
brainiak.fcma.classifier module¶
Full Correlation Matrix Analysis (FCMA)
Correlationbased training and prediction

class
brainiak.fcma.classifier.
Classifier
(clf, epochs_per_subj=0)¶ Bases:
sklearn.base.BaseEstimator
Correlationbased classification component of FCMA
Parameters:  clf (class) – The classifier used, normally a classifier class of sklearn
 epochs_per_subj (int, default 0) – The number of epochs of each subject withinsubject normalization will be performed during classifier training if epochs_per_subj is specified default 0 means no withinsubject normalization

training_data_
¶ 2D numpy array in shape [num_samples, num_features] – default None training_data_ is None except clf is SVM.SVC with precomputed kernel, in which case training data is needed to compute the similarity vector for each sample to be classified

test_raw_data_
¶ a list of 2D array in shape [num_TRs, num_voxels] – default None test_raw_data_ is set after a prediction is called, if the new input data equals test_raw_data_, test_data_ can be reused

test_data_
¶ 2D numpy array in shape [num_samples, num_features] – default None test_data_ is set after a prediction is called, so that the test data does not need to be regenerated in the subsequent operations, e.g. getting decision values of the prediction

num_voxels_
¶ int – The number of voxels per brain used in this classifier this is defined by the applied mask, normally the top voxels selected by FCMA voxel selection num_voxels_ must be consistent in both training and classification

num_samples_
¶ int – The number of samples of the training set

decision_function
(X)¶ output the decision value of the prediction
if X is not equal to self.test_raw_data_, i.e. predict is not called, first generate the test_data after getting the test_data, get the decision value via self.clf.
Parameters: X (a list of numpy array in shape [num_TRs, self.num_voxels_]) – X contains the activity data filtered by top voxels and prepared for correlation computation. len(X) is the number of test samples if len(X) > 1: normalization is done on all test samples Returns: confidence Return type: the predictions confidence values of X, in shape [len(X),]

fit
(X, y)¶ use correlation data to train a model
first compute the correlation of the input data, and then normalize within subject if more than one sample in one subject, and then fit to a model defined by self.clf.
Parameters:  X (a list of numpy array in shape [num_TRs, num_voxels]) – X contains the activity data filtered by top voxels and prepared for correlation computation. assuming all elements of X has the same num_voxels value
 y (labels, len(X) equals len(Y)) –
Returns: self
Return type: return the object itself

predict
(X)¶ use a trained model to predict correlation data
first compute the correlation of the input data, and then normalize across all samples in the list if len(X) > 1, and then predict via self.clf.
Parameters: X (a list of numpy array in shape [num_TRs, self.num_voxels_]) – X contains the activity data filtered by top voxels and prepared for correlation computation. len(X) is the number of test samples if len(X) > 0: normalization is done on all test samples Returns: y_pred Return type: the predicted label of X, in shape [len(X),]
brainiak.fcma.cython_blas module¶

brainiak.fcma.cython_blas.
compute_correlation
()¶ use blas API sgemm wrapped by scipy to compute correlation
The blas APIs process matrices in columnmajor, but our matrices are in rowmajor, so we play the transpose trick here, i.e. A*B=(B^T*A^T)^T. The resulting matrix in shape [num_assigned_voxels, num_voxels] is stored in an alternate way to make sure that the correlation vectors of the same voxel stored continuously
Parameters:  py_trans_a (str) –
 transpose or not for the first matrix A (do) –
 py_trans_b (str) –
 transpose or not for the first matrix B (do) –
 py_m (int) –
 row of the resulting matrix C (the) –
 our case, is num_voxels (in) –
 py_n (int) –
 column of the resulting matrix C (the) –
 our case, is num_assigned_voxels (in) –
 py_k (int) –
 collapsed dimension of the multiplying matrices (the) –
 the column of the first matrix after transpose if necessary (i.e.) –
 row of the second matrix after transpose if necessary (the) –
 py_alpha (float) –
 weight applied to the first matrix A (the) –
 py_a (2D array in shape [epoch_length, num_voxels] in our case) –
 activity data of an epoch (the) –
 py_lda (int) –
 stride of the first matrix A (the) –
 py_start_voxel (int) –
 starting voxel of assigned voxels (the) –
 to locate the second matrix B (used) –
 py_ldb (int) –
 stride of the second matrix B (the) –
 py_beta (float) –
 weight applied to the resulting matrix C (the) –
 py_c (3D array in shape [num_selected_voxels, num_epochs, num_voxels]) –
 to store the resulting correlation values (place) –
 py_ldc (int) –
 stride of the resulting matrix (the) –
 our case, num_voxels*num_epochs (in) –
 py_start_epoch (int) –
 epoch over which the correlation is computed (the) –
Returns:  py_c (3D array in shape [num_selected_voxels, num_epochs, num_voxels])
 write the resulting correlation values in an alternate way
 for the processing epoch

brainiak.fcma.cython_blas.
compute_kernel_matrix
()¶ use blas API syrk wrapped by scipy to compute kernel matrix of SVM
The blas APIs process matrices in columnmajor, but our matrices are in rowmajor, so we play the transpose trick here, i.e. A*B=(B^T*A^T)^T
In SVM with linear kernel, the distance of two samples is essentially the dot product of them. Therefore, the kernel matrix can be obtained by matrix multiplication. Since the kernel matrix is symmetric, ssyrk is used, the other half of the matrix is assigned later. In our case, the dimension of samples is much larger than the number samples, so we proportionally shrink the values of the kernel matrix for getting more robust alpha values in SVM iteration.
Parameters:  py_uplo (str) –
 the upper or lower triangle of the matrix (getting) –
 py_trans (str) –
 transpose or not for the input matrix A (do) –
 py_n (int) –
 row and column of the resulting matrix C (the) –
 our case, is num_epochs (in) –
 py_k (int) –
 collapsed dimension of the multiplying matrices (the) –
 the column of the first matrix after transpose if necessary (i.e.) –
 row of the second matrix after transpose if necessary (the) –
 our case, is num_voxels (in) –
 py_alpha (float) –
 weight applied to the input matrix A (the) –
 py_a (3D array in shape [num_assigned_voxels, num_epochs, num_voxels]) –
 our case the normalized correlation values of a voxel (in) –
 py_start_voxel (int) –
 processed voxel (the) –
 to locate the input matrix A (used) –
 py_lda (int) –
 stride of the input matrix A (the) –
 py_beta (float) –
 weight applied to the resulting matrix C (the) –
 py_c (2D array in shape [num_epochs, num_epochs]) –
 to store the resulting kernel matrix (place) –
 py_ldc (int) –
 stride of the resulting matrix (the) –
Returns:  py_c (2D array in shape [num_epochs, num_epochs])
 write the resulting kernel_matrix
 for the processing voxel

brainiak.fcma.cython_blas.
compute_single_matrix_multiplication
()¶ use blas API gemm wrapped by scipy to compute similarity matrix of SVM
This is to compute the similarity matrix between test sample(s) and training samples. The blas APIs process matrices in columnmajor, but our matrices are in rowmajor, so we play the transpose trick here, i.e. A*B=(B^T*A^T)^T
Parameters:  py_trans_a (str) –
 transpose or not for the first matrix A (do) –
 py_trans_b (str) –
 transpose or not for the first matrix B (do) –
 py_m (int) –
 row of the resulting matrix C (the) –
 our case, is num_training_samples (in) –
 py_n (int) –
 column of the resulting matrix C (the) –
 our case, is num_test_samples (in) –
 py_k (int) –
 collapsed dimension of the multiplying matrices (the) –
 the column of the first matrix after transpose if necessary (i.e.) –
 row of the second matrix after transpose if necessary (the) –
 our case, is num_selected_voxels*num_selected_voxels (in) –
 py_alpha (float) –
 weight applied to the input matrix A (the) –
 py_a (2D array) –
 shape [num_training_samples, num_selected_voxels*num_selected_voxels] (in) –
 py_lda (int) –
 stride of the input matrix A (the) –
 py_b (2D array) –
 shape [num_test_samples, num_selected_voxels*num_selected_voxels] (in) –
 py_ldb (int) –
 stride of the input matrix B (the) –
 py_beta (float) –
 weight applied to the resulting matrix C (the) –
 py_c (2D array) –
 shape [num_training_samples, num_test_samples] of columnmajor (in) –
 fact it is (in) –
 shape [num_test_samples, num_training_samples] of rowmajor (in) –
 to store the resulting similarity matrix (place) –
 py_ldc (int) –
 stride of the resulting matrix (the) –
Returns:  py_c (3D array)
 in shape [num_training_samples, num_test_samples] of columnmajor
 write the resulting similarity matrix

brainiak.fcma.cython_blas.
compute_single_self_correlation_gemm
()¶ use blas API gemm wrapped by scipy to compute correlation matrix
This is to compute the correlation between selected voxels for final training and classification. Although the resulting correlation matrix is symmetric, in most cases, gemm performs better than syrk. Here we assume that the resulting matrix is stored in a compact way, i.e. py_ldc == py_n.
Parameters:  py_trans_a (str) –
 transpose or not for the first matrix A (do) –
 py_trans_b (str) –
 transpose or not for the first matrix B (do) –
 py_m (int) –
 row of the resulting matrix C (the) –
 our case, is num_selected_voxels (in) –
 py_n (int) –
 column of the resulting matrix C (the) –
 our case, is num_selected_voxels –
 py_k (int) –
 collapsed dimension of the multiplying matrices (the) –
 the column of the first matrix after transpose if necessary (i.e.) –
 row of the second matrix after transpose if necessary (the) –
 our case, is num_TRs (in) –
 py_alpha (float) –
 weight applied to the input matrix A (the) –
 py_a (2D array in shape [num_TRs, num_selected_voxels]) –
 our case the normalized activity values (in) –
 multipliers are specified here as the same one (both) –
 py_lda (int) –
 stride of the input matrix A (the) –
 py_ldb (int) –
 stride of the input matrix B (the) –
 our case, the same as py_lda (in) –
 py_beta (float) –
 weight applied to the resulting matrix C (the) –
 py_c (3D array) –
 shape [num_samples, num_selected_voxels, num_selected_voxels] (in) –
 to store the resulting kernel matrix (place) –
 py_ldc (int) –
 stride of the resulting matrix (the) –
 py_start_sample (int) –
 processed sample (the) –
 to locate the resulting matrix C (used) –
Returns:  py_c (3D array)
 in shape [num_samples, num_selected_voxels, num_selected_voxels]
 write the resulting correlation matrices
 for the processed sample

brainiak.fcma.cython_blas.
compute_single_self_correlation_syrk
()¶ use blas API syrk wrapped by scipy to compute correlation matrix
This is to compute the correlation between selected voxels for final training and classification. Since the resulting correlation matrix is symmetric, syrk is used. However, it looks like that in most cases, syrk performs much worse than gemm (the next function). Here we assume that the resulting matrix is stored in a compact way, i.e. py_ldc == py_n.
Parameters:  py_uplo (str) –
 the upper or lower triangle of the matrix (getting) –
 py_trans (str) –
 transpose or not for the input matrix A (do) –
 py_n (int) –
 row and column of the resulting matrix C (the) –
 our case, is num_selected_voxels (in) –
 py_k (int) –
 collapsed dimension of the multiplying matrices (the) –
 the column of the first matrix after transpose if necessary (i.e.) –
 row of the second matrix after transpose if necessary (the) –
 our case, is num_TRs (in) –
 py_alpha (float) –
 weight applied to the input matrix A (the) –
 py_a (2D array in shape [num_TRs, num_selected_voxels]) –
 our case the normalized activity values (in) –
 py_lda (int) –
 stride of the input matrix A (the) –
 py_beta (float) –
 weight applied to the resulting matrix C (the) –
 py_c (3D array) –
 shape [num_samples, num_selected_voxels, num_selected_voxels] (in) –
 to store the resulting kernel matrix (place) –
 py_ldc (int) –
 stride of the resulting matrix (the) –
 py_start_sample (int) –
 processed sample (the) –
 to locate the resulting matrix C (used) –
Returns:  py_c (3D array)
 in shape [num_samples, num_selected_voxels, num_selected_voxels]
 write the resulting correlation matrices
 for the processed sample
brainiak.fcma.io module¶
Full Correlation Matrix Analysis (FCMA)
FCMA related IO routines

brainiak.fcma.io.
generate_epochs_info
(epoch_list)¶ use epoch_list to generate epoch_info defined below
Parameters: epoch_list (list of 3D (binary) array in shape [condition, nEpochs, nTRs]) – Contains specification of epochs and conditions, assuming 1. all subjects have the same number of epochs; 2. len(epoch_list) equals the number of subjects; 3. an epoch is always a continuous time course. Returns: epoch_info – label is the condition labels of the epochs; sid is the subject id, corresponding to the index of raw_data; start is the start TR of an epoch (inclusive); end is the end TR of an epoch(exclusive). Assuming len(labels) labels equals the number of epochs and the epochs of the same sid are adjacent in epoch_info Return type: list of tuple (label, sid, start, end)

brainiak.fcma.io.
prepare_fcma_data
(data_dir, extension, mask_file, epoch_file)¶ obtain the data for correlationbased computation and analysis
read the data in and generate epochs of interests, then broadcast to all workers
Parameters:  data_dir (str) – the path to all subject files
 extension (str) – the file extension, usually nii.gz or nii
 mask_file (str) – the absolute path of the mask file, we apply the mask right after reading a file for saving memory
 epoch_file (str) – the absolute path of the epoch file
Returns:  raw_data (list of 2D array in shape [epoch length, nVoxels]) – the data organized in epochs len(raw_data) equals the number of epochs
 labels (list of 1D array) – the condition labels of the epochs len(labels) labels equals the number of epochs

brainiak.fcma.io.
prepare_mvpa_data
(data_dir, extension, mask_file, epoch_file)¶ obtain the data for activitybased model training and prediction
Average the activity within epochs and zscoring within subject.
Parameters:  data_dir (str) – the path to all subject files
 extension (str) – the file extension, usually nii.gz or nii
 mask_file (str) – the absolute path of the mask file, we apply the mask right after reading a file for saving memory
 epoch_file (str) – the absolute path of the epoch file
Returns:  processed_data (2D array in shape [num_voxels, num_epochs]) – averaged epoch by epoch processed data
 labels (1D array) – contains labels of the data

brainiak.fcma.io.
read_activity_data
(dir, file_extension, mask_file=None)¶ read data in NIfTI from a dir and apply the spatial mask to them
Parameters:  dir (str) – the path to all subject files
 file_extension (str) – the file extension, usually nii.gz or nii
 mask_file (Optional[str]) – the absolute path of the mask file, we apply the mask right after reading a file for saving memory if it is not specified, the data will not be masked and remain in 4D
Returns: activity_data – if masked, in shape [nVoxels, nTRs], organized in voxel*TR formats if not masked, in shape [x, y, z, t] or [brain 3D, nTRs] len(activity_data) equals the number of subjects the data type is float32
Return type: list of array of brain activity data

brainiak.fcma.io.
write_nifti_file
(data, affine, filename)¶ write a nifti file given data and affine
Parameters:  data (3D/4D numpy array) – the brain data with/without time dimension
 affine (2D numpy array) – affine of the image, usually inherited from an existing image
 filename (string) – the output filename
brainiak.fcma.mvpa_voxelselector module¶
Full Correlation Matrix Analysis (FCMA)
Activitybased voxel selection

class
brainiak.fcma.mvpa_voxelselector.
MVPAVoxelSelector
(raw_data, mask, epoch_info, num_folds, sl)¶ Bases:
object
Activitybased voxel selection component of FCMA
Parameters:  raw_data (list of 4D array) – each element of the list is the raw data of a subject, in the shape of [x, y, z, t]
 mask (3D array) –
 epoch_info (list of tuple (label, sid, start, end)) – label is the condition labels of the epochs; sid is the subject id, corresponding to the index of raw_data; start is the start TR of an epoch (inclusive); end is the end TR of an epoch(exclusive). Assuming len(labels) labels equals the number of epochs and the epochs of the same sid are adjacent in epoch_info
 num_folds (int) – the number of folds to be conducted in the cross validation
 sl (Searchlight) – the distributed Searchlight object

processed_data_
¶ 4D array in shape [brain 3D + epoch] – contains the averaged and normalized brain data epoch by epoch it is generated from raw_data and epoch_info

labels_
¶ 1D array – contains the labels of the epochs, extracted from epoch_info

run
(clf)¶ run activitybased voxel selection
Sort the voxels based on the crossvalidation accuracy of their activity vectors within the searchlight
Parameters: clf (classification function) – the classifier to be used in cross validation Returns:  result_volume (3D array of accuracy numbers) – contains the voxelwise accuracy numbers obtained via Searchlight
 results (list of tuple (voxel_id, accuracy)) – the accuracy numbers of all voxels, in accuracy descending order the length of array equals the number of voxels
brainiak.fcma.util module¶
Full Correlation Matrix Analysis (FCMA)
Correlation related high performance routines

brainiak.fcma.util.
compute_correlation
(matrix1, matrix2)¶ compute correlation between two sets of variables
Correlate the rows of matrix1 with the rows of matrix2. If matrix1 == matrix2, it is autocorrelation computation resulting in a symmetric correlation matrix. The number of columns MUST agree between set1 and set2. The correlation being computed here is the Pearson’s correlation coefficient, which can be expressed as
\[corr(X, Y) = \frac{cov(X, Y)}{\sigma_X\sigma_Y}\]where cov(X, Y) is the covariance of variable X and Y, and
\[\sigma_X\]is the standard deviation of variable X
Reducing the correlation computation to matrix multiplication and using BLAS GEMM API wrapped by Scipy can speedup the numpy builtin correlation computation (numpy.corrcoef) by one order of magnitude
\[\begin{split}corr(X, Y) &= \frac{\sum\limits_{i=1}^n (x_i\bar{x})(y_i\bar{y})}{(n1) \sqrt{\frac{\sum\limits_{j=1}^n x_j^2n\bar{x}}{n1}} \sqrt{\frac{\sum\limits_{j=1}^{n} y_j^2n\bar{y}}{n1}}}\\ &= \sum\limits_{i=1}^n(\frac{(x_i\bar{x})} {\sqrt{\sum\limits_{j=1}^n x_j^2n\bar{x}}} \frac{(y_i\bar{y})}{\sqrt{\sum\limits_{j=1}^n y_j^2n\bar{y}}})\end{split}\]Parameters:  matrix1 (2D array in shape [r1, c]) – MUST be continuous and rowmajor
 matrix2 (2D array in shape [r2, c]) – MUST be continuous and rowmajor
Returns: corr_data – continuous and rowmajor in np.float32
Return type: 2D array in shape [r1, r2]
brainiak.fcma.voxelselector module¶
Full Correlation Matrix Analysis (FCMA)
Correlationbased voxel selection

class
brainiak.fcma.voxelselector.
VoxelSelector
(raw_data, epochs_per_subj, labels, num_folds, voxel_unit=100, master_rank=0)¶ Bases:
object
Correlationbased voxel selection component of FCMA
Parameters:  raw_data (list of 2D array in shape [epoch length, nVoxels]) –
 Assumption: 1. all activity data contains the same number of voxels
 the activity data has been zscored, ready to compute correlation as matrix multiplication
 all subjects have the same number of epochs
 epochs belonging to the same subject are adjacent in the list
 voxel selection is always done in the autocorrelation, i.e. raw_data correlate with themselves
 if MPI jobs are running on multiple nodes, the user home directory is shared by all nodes
 epochs_per_subj (int) – The number of epochs of each subject
 labels (list of 1D array) – the condition labels of the epochs len(labels) labels equals the number of epochs
 num_folds (int) – The number of folds to be conducted in the cross validation
 voxel_unit (int, default 100) – The number of voxels assigned to a worker each time
 master_rank (int, default 0) – The process which serves as the master

run
(clf)¶ run correlationbased voxel selection in masterworker model
Sort the voxels based on the crossvalidation accuracy of their correlation vectors
Parameters: clf (classification function) – the classifier to be used in cross validation Returns: results – the accuracy numbers of all voxels, in accuracy descending order the length of array equals the number of voxels Return type: list of tuple (voxel_id, accuracy)
 raw_data (list of 2D array in shape [epoch length, nVoxels]) –