Utilities

pytadbit.utils.tadmaths.calinski_harabasz(scores, clusters)[source]

Implementation of the CH score [CalinskiHarabasz1974], which has been shown to be one of the most accurate ways to compare clustering methods [MilliganCooper1985] [Tibshirani2001].

The CH score is:

\[CH(k) = \frac{B(k) / (k-1)}{W(k)/(n - k)}\]

where \(B(k)\) and \(W(k)\) are the between-cluster and within-cluster sums of squares for \(k\) clusters, and \(n\) is the total number of points (models in this case).

Parameters:
  • scores – a dict with, as keys, a tuple with a pair of models; and, as value, the distance between these models.
  • clusters – a dict with, as key, the cluster number, and as value a list of models
  • nmodels – total number of models
Returns:

the CH score
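
The formula above can be illustrated with a minimal sketch that takes inputs shaped like the scores and clusters dicts described in the parameters. This is not TADbit's implementation: the helper name ch_score_from_distances is hypothetical, and the sketch assumes the distances behave like Euclidean distances, so that a sum of squares around a centroid equals the sum of squared pairwise distances divided by the group size.

```python
from itertools import combinations


def ch_score_from_distances(scores, clusters):
    """Illustrative CH score from pairwise distances (hypothetical helper)."""

    def sum_of_squares(members):
        # Sum of squared pairwise distances over unordered pairs,
        # divided by the group size (Euclidean-like assumption).
        total = 0.0
        for a, b in combinations(members, 2):
            # Accept either ordering of the pair in the scores dict.
            dist = scores[(a, b)] if (a, b) in scores else scores[(b, a)]
            total += dist ** 2
        return total / len(members)

    models = [m for members in clusters.values() for m in members]
    n, k = len(models), len(clusters)
    within = sum(sum_of_squares(members) for members in clusters.values())
    # Total sum of squares splits into between + within.
    between = sum_of_squares(models) - within
    return (between / (k - 1)) / (within / (n - k))


# Toy data: four models placed at 0, 1, 10 and 11 on a line.
toy_scores = {
    ("a", "b"): 1.0, ("a", "c"): 10.0, ("a", "d"): 11.0,
    ("b", "c"): 9.0, ("b", "d"): 10.0, ("c", "d"): 1.0,
}
toy_clusters = {0: ["a", "b"], 1: ["c", "d"]}
toy_ch = ch_score_from_distances(toy_scores, toy_clusters)
print(toy_ch)  # 200.0
```

In the toy data, \(W(k) = 1\) and \(B(k) = 100\), so the score is \((100 / 1) / (1 / 2) = 200\). The sketch does not handle \(k = 1\) or a zero within-cluster sum, where the ratio is undefined.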

pytadbit.utils.tadmaths.calc_eqv_rmsd(models, nloci, dcutoff=200, var='score', one=False)[source]
Parameters:
  • nloci – number of particles per model
  • dcutoff (200) – distance, in nanometers, beyond which two particles are considered to be separated.
  • fact (0.75) – Factor for equivalent positions
  • var (‘score’) –

    value to return; either (i) ‘drmsd’ (symmetry independent: mirrors will show no differences) or (ii) ‘score’, defined as:

                          dRMSD[i] / max(dRMSD)
    score[i] = eqvs[i] * -----------------------
                           RMSD[i] / max(RMSD)

    where eqvs[i] is the number of equivalent positions for the ith pairwise model comparison.

Returns:

a score (depends on ‘var’ argument)
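
As a worked illustration of the ‘score’ formula above, the following sketch applies it to plain Python lists. The helper name eqv_scores and the input lists are hypothetical; how TADbit computes eqvs, dRMSD and RMSD internally is not shown here.

```python
def eqv_scores(eqvs, drmsds, rmsds):
    """Apply score[i] = eqvs[i] * (dRMSD[i] / max(dRMSD)) / (RMSD[i] / max(RMSD)).

    Hypothetical helper: each list holds one value per pairwise
    model comparison, in the same order.
    """
    max_drmsd = max(drmsds)
    max_rmsd = max(rmsds)
    return [
        eqv * (drmsd / max_drmsd) / (rmsd / max_rmsd)
        for eqv, drmsd, rmsd in zip(eqvs, drmsds, rmsds)
    ]


# Two pairwise comparisons with made-up values.
example_scores = eqv_scores(eqvs=[10, 5], drmsds=[2.0, 4.0], rmsds=[5.0, 10.0])
print(example_scores)  # [10.0, 5.0]
```

Both comparisons here have the same normalized dRMSD to RMSD ratio, so the score reduces to the number of equivalent positions.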

pytadbit.utils.extraviews.plot_3d_optimization_result(result, axes=('scale', 'maxdist', 'upfreq', 'lowfreq'))[source]

Displays a three dimensional scatter plot representing the result of the optimization.

Parameters:
  • result – 3D numpy array containing correlation values
  • axes (‘scale’,’maxdist’,’upfreq’,’lowfreq’) – tuple of axes to represent. The order defines which parameter is placed on the w, z, y or x axis.
pytadbit.utils.extraviews.plot_2d_optimization_result(result, axes=('scale', 'maxdist', 'upfreq', 'lowfreq'), show_best=0, skip=None)[source]

A grid of heatmaps representing the result of the optimization.

Parameters:
  • result – 3D numpy array containing correlation values
  • axes (‘scale’,’maxdist’,’upfreq’,’lowfreq’) – tuple of axes to represent. The order defines which parameter is placed on the w, z, y or x axis.
  • show_best (0) – number of best correlation values to identify.
  • skip (None) – a dict can be passed here in order to fix a given axis, e.g.: {'scale': 0.001, 'maxdist': 500}
pytadbit.utils.hic_filtering.hic_filtering_for_modelling(matrx, method='mean')[source]
Parameters:
  • matrx – Hi-C matrix of a given experiment
  • method (mean) – method to use for filtering Hi-C columns. Aims to remove columns with an abnormally low count of interactions
Returns:

the indexes of the columns not to be considered for the calculation of the z-score