Chromosome class

class pytadbit.chromosome.Chromosome(name, experiment_resolutions=None, experiment_tads=None, experiment_hic_data=None, experiment_names=None, max_tad_size=5000000, chr_len=0, parser=None, centromere_search=False)

A Chromosome object designed to handle Topologically Associating Domain (TAD) predictions from different experiments, possibly in different cell types, for a given DNA chromosome, and to compare them.

Parameters:
  • name – name of the chromosome (e.g. a chromosome identifier such as 'chr19')
  • experiment_resolutions (None) – list of resolutions corresponding to the list of experiments passed
  • experiment_hic_data (None) – list of paths to files containing the Hi-C count matrices of the different experiments
  • experiment_tads (None) – list of paths to files containing the TAD definitions of the different experiments
  • experiment_names (None) – list of the names of each experiment
  • max_tad_size (5000000) – maximum TAD size allowed. TADs longer than this value will not be considered, and the size of the corresponding chromosome will be reduced accordingly
  • chr_len (0) – size of the DNA chromosome in bp. By default it will be inferred from the distribution of TADs
  • parser (None) –

    a parser function that returns a tuple of lists representing the data matrix and the length of a row/column. With a file example.tsv containing:

    chrT_001       chrT_002        chrT_003        chrT_004
    chrT_001       629     164     88      105
    chrT_002       164     612     175     110
    chrT_003       88      175     437     100
    chrT_004       105     110     100     278

    the output of parser('example.tsv') would be: [([629, 164, 88, 105, 164, 612, 175, 110, 88, 175, 437, 100, 105, 110, 100, 278]), 4]

Returns:

Chromosome object
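The parser contract described above (a flat list of counts plus the row/column length) can be met with a few lines of Python. The sketch below is hypothetical: the name parse_square_matrix is ours, it assumes a tab-separated file with one header row of column labels and one label at the start of each row, and returning a (counts, size) tuple is our reading of the documented output, so check what your pytadbit version expects before passing it as parser=.

```python
import math


def parse_square_matrix(path):
    """Hypothetical parser: read a tab-separated square Hi-C matrix that has
    a header row of column labels and a label at the start of each row.

    Returns a tuple (counts, size): counts lists the matrix row by row and
    size is the number of rows (equal to the number of columns).
    """
    counts = []
    with open(path) as handle:
        next(handle)  # skip the header row of column labels
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # tolerate blank lines
            # the first field is the row label, the rest are counts
            counts.extend(int(value) for value in fields[1:])
    size = math.isqrt(len(counts))
    if size * size != len(counts):
        raise ValueError("matrix in %s is not square" % path)
    return counts, size
```

Such a function would then be passed as parser=parse_square_matrix when creating a Chromosome or calling add_experiment().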

add_experiment(name, resolution=None, tad_def=None, hic_data=None, replace=False, parser=None, conditions=None, **kwargs)

Add a Hi-C experiment to the Chromosome.

Parameters:
  • name – name of the experiment, or an Experiment object
  • resolution – resolution of the experiment (needed if name is not an Experiment object)
  • hic_data (None) – either a file or a list of lists containing the Hi-C data
  • tad_def (None) – a file or a dict with precomputed TADs for this experiment
  • replace (False) – if True, overwrite any experiment already loaded under the same name
  • parser (None) –

    a parser function that returns a tuple of lists representing the data matrix and the length of a row/column. With a file example.tsv containing:

    chrT_001   chrT_002        chrT_003        chrT_004
    chrT_001   629     164     88      105
    chrT_002   164     612     175     110
    chrT_003   88      175     437     100
    chrT_004   105     110     100     278

    the output of parser('example.tsv') would be: [([629, 164, 88, 105, 164, 612, 175, 110, 88, 175, 437, 100, 105, 110, 100, 278]), 4]

align_experiments(names=None, verbose=False, randomize=False, rnd_method='interpolate', rnd_num=1000, **kwargs)

Align the predicted boundaries of two or more experiments. The resulting alignment will be stored in the self.experiment list.

Parameters:
  • names (None) – list of names of the experiments to align. If None, align all
  • experiment1 – name of the first experiment to align
  • experiment2 – name of the second experiment to align
  • penalty (-0.1) – penalty for inserting a gap in the alignment
  • max_dist (100000) – maximum distance (in bp) between two boundaries for them to be matched (100 kb seems reasonable for human chromosomes)
  • verbose (False) – if True, print some information about the alignments
  • randomize (False) – check the alignment quality by comparing it with alignments of randomized boundaries over chromosomes of the same size. This returns an extra value: the p-value for the hypothesis that the observed alignment is no better than a random alignment
  • rnd_method (interpolate) – by default, uses an interpolation of the TAD distribution. The alternative method is 'shuffle', where TADs are simply shuffled
  • rnd_num (1000) – number of randomizations to do
  • method (reciprocal) – if global, Needleman-Wunsch is used to align (see pytadbit.boundary_aligner.globally.needleman_wunsch()); if reciprocal, a method based on reciprocal closest boundaries is used (see pytadbit.boundary_aligner.reciprocally.reciprocal())
Returns:

the alignment and the score of the alignment (by default)
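The method='reciprocal' option pairs boundaries that are each other's closest neighbour. The sketch below illustrates only that idea and is not pytadbit.boundary_aligner.reciprocally.reciprocal(): it assumes boundaries given as bp positions, applies max_dist as described above, and ignores gap penalties and scoring. The function names closest and reciprocal_matches are hypothetical.

```python
from bisect import bisect_left


def closest(sorted_positions, x):
    """Return the element of a sorted, non-empty list that is closest to x."""
    i = bisect_left(sorted_positions, x)
    # only the neighbours around the insertion point can be the closest
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda p: abs(p - x))


def reciprocal_matches(bounds1, bounds2, max_dist=100000):
    """Pair boundaries that are mutually closest and at most max_dist bp apart."""
    a, b = sorted(bounds1), sorted(bounds2)
    if not a or not b:
        return []
    pairs = []
    for x in a:
        y = closest(b, x)
        # keep the pair only if x is also y's closest boundary in the first list
        if closest(a, y) == x and abs(x - y) <= max_dist:
            pairs.append((x, y))
    return pairs
```

For example, with boundaries [100000, 500000, 900000] and [120000, 480000, 1500000], the first two boundaries of each list are paired, while 900000 stays unmatched: its closest counterpart (480000) is itself closer to 500000.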

find_tad(experiments, name=None, n_cpus=1, verbose=True, max_tad_size='auto', no_heuristic=False, batch_mode=False, use_visibility=False)

Call the pytadbit.tadbit.tadbit() function to calculate the positions of Topologically Associating Domains.

Parameters:
  • experiments – a square matrix of interaction counts of Hi-C data, or a list of such matrices for replicated experiments. The counts must be evenly sampled and not normalized. 'experiments' can be either a list of lists, a path to a file or a file handle
  • n_cpus (1) – the number of CPUs to allocate to TADBit. If n_cpus='max', all available CPUs will be used
  • max_tad_size (auto) – an integer defining the maximum size of a TAD. The default (auto) sets it to the number of rows/columns
  • no_heuristic (False) – if True, do not use heuristics
  • batch_mode (False) – if True, all the experiments are concatenated into one for the search of TADs. The resulting TADs are stored under the name 'batch' followed by the concatenated experiment names (e.g. with experiments=['exp1', 'exp2'], the name would be 'batch_exp1_exp2').
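The documented defaults of find_tad() can be summarized in plain Python. resolve_find_tad_options below is a hypothetical helper, not part of pytadbit: it only shows how n_cpus='max', max_tad_size='auto' and the batch-mode naming rule described above would resolve to concrete values.

```python
import multiprocessing


def resolve_find_tad_options(experiment_names, matrix_size, n_cpus=1,
                             max_tad_size="auto", batch_mode=False):
    """Hypothetical helper mirroring the documented find_tad() defaults.

    Returns (n_cpus, max_tad_size, batch_name); batch_name is None
    when batch_mode is False.
    """
    # n_cpus='max' means: use every CPU available on this machine
    if n_cpus == "max":
        n_cpus = multiprocessing.cpu_count()
    # 'auto' sets the maximum TAD size to the number of rows/columns
    if max_tad_size == "auto":
        max_tad_size = matrix_size
    # batch mode stores results under 'batch_' + the joined experiment names
    batch_name = "batch_" + "_".join(experiment_names) if batch_mode else None
    return n_cpus, max_tad_size, batch_name
```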


get_experiment(name)

Select an experiment by name. This can also be done directly with Chromosome.experiments[name].

Parameters: name – name of the experiment to select
Returns: pytadbit.Experiment

get_tad_hic(tad, x_name, normed=True, matrix_num=0)

Retrieve the Hi-C data matrix corresponding to a given TAD.

Parameters:
  • tad – a given TAD (dict)
  • x_name – name of the experiment
  • normed (True) – if True, normalize the Hi-C data
Returns:

Hi-C data matrix for the given TAD
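Retrieving the Hi-C data of a TAD amounts to slicing the square block of the matrix that the TAD spans. This hypothetical sketch (tad_submatrix is not a pytadbit function) assumes the matrix is a list of lists and that tad['start'] and tad['end'] are 0-based, inclusive bin indices; it does not reproduce the normed or matrix_num options.

```python
def tad_submatrix(matrix, tad):
    """Return the square block of `matrix` covered by `tad`.

    Assumes tad['start'] and tad['end'] are 0-based, inclusive bin indices.
    """
    start, end = tad["start"], tad["end"]
    if not 0 <= start <= end < len(matrix):
        raise ValueError("TAD lies outside the matrix")
    # rows start..end, and within each row the columns start..end
    return [row[start:end + 1] for row in matrix[start:end + 1]]
```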

iter_tads(x_name, normed=True)

Iterate over the TADs corresponding to the given experiment.

Parameters:
  • x_name – name of the experiment
  • normed (True) – if True, normalize the returned Hi-C data
Yields:

Hi-C data corresponding to each TAD

save_chromosome(out_f, fast=True, divide=True, force=False)

Save a Chromosome object to a file (it uses pickle.dump() from cPickle). Once saved, the object can be loaded with load_chromosome().

Parameters:
  • out_f – path to the file where the pickled object will be stored
  • fast (True) – if True, skip Hi-C data and weights
  • divide (True) – if True, write two pickles: one with what would result from using the fast option, and a second one with the Hi-C and weights data. The second file name is extended with '_hic' (i.e., with out_f='chromosome12.pik' you obtain chromosome12.pik and chromosome12.pik_hic). When loading, load_chromosome() automatically searches for both files
  • force (False) – overwrite the existing file
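The divide option can be sketched with the standard pickle module (in Python 3 it already uses the C accelerator that cPickle provided in Python 2). save_divided and load_divided are hypothetical helpers, not pytadbit functions: they store a light object in out_f and the bulky Hi-C data in out_f + '_hic', mirroring the naming rule above, and the loader reads the second file only when asked, in the spirit of load_chromosome(fast=...).

```python
import os
import pickle


def save_divided(light, hic_data, out_f, force=False):
    """Hypothetical helper: pickle `light` to out_f and `hic_data` to
    out_f + '_hic'; refuse to overwrite existing files unless force=True."""
    hic_f = out_f + "_hic"
    for path in (out_f, hic_f):
        if os.path.exists(path) and not force:
            raise FileExistsError(path)
    with open(out_f, "wb") as handle:
        pickle.dump(light, handle)
    with open(hic_f, "wb") as handle:
        pickle.dump(hic_data, handle)


def load_divided(in_f, load_hic=False):
    """Hypothetical helper: load the light pickle and, if load_hic is True
    and the companion file exists, the '_hic' pickle as well."""
    with open(in_f, "rb") as handle:
        light = pickle.load(handle)
    hic_data = None
    if load_hic and os.path.exists(in_f + "_hic"):
        with open(in_f + "_hic", "rb") as handle:
            hic_data = pickle.load(handle)
    return light, hic_data
```

Keeping the heavy matrices in a separate file means the light part can be reloaded quickly when only the TAD definitions are needed.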

set_max_tad_size(value)

Change the maximum size allowed for TADs. It also applies to the computed experiments.

Parameters: value – an integer (default is 5000000)
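Applying a maximum TAD size boils down to discarding TADs whose length exceeds it. In this hypothetical sketch (filter_tads is not a pytadbit function), TADs are stored in a dict keyed by TAD index, and each TAD holds 0-based inclusive bin indices, so its length in bp is (end - start + 1) * resolution; both are assumptions, not pytadbit's internal layout.

```python
def filter_tads(tads, resolution, max_tad_size=5000000):
    """Keep only the TADs whose length in bp does not exceed max_tad_size.

    Assumes each TAD dict holds 0-based, inclusive 'start'/'end' bin indices
    and that `resolution` is the bin size in bp.
    """
    return {key: tad for key, tad in tads.items()
            if (tad["end"] - tad["start"] + 1) * resolution <= max_tad_size}
```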

visualize(name, tad=None, focus=None, paint_tads=False, axe=None, show=True, logarithm=True, normalized=False, relative=True, decorate=True)

Visualize the matrix of Hi-C interactions.

Parameters:
  • name – name of the experiment to visualize
  • tad (None) –

    a given TAD in the form:

    {'start': start,
     'end'  : end,
     'brk'  : end,
     'score': score}
    

    Alternatively, a list of TADs can be passed (all the TADs between the first and the last one passed will be shown, so passing more than two TADs may be superfluous)

  • focus (None) – a tuple with the start and end positions of the region to visualize
  • paint_tads (False) – draw a box around the TADs defined for this experiment
  • axe (None) – a matplotlib Axes object can be passed in order to customize the picture
  • show (True) – whether to pop up the matplotlib figure or not
  • logarithm (True) – show the logarithm values
  • normalized (False) – show the normalized data (the weights need to have been calculated beforehand). Note: white rows/columns may appear in the matrix displayed; these correspond to filtered rows (see pytadbit.utils.hic_filtering.hic_filtering_for_modelling())
  • relative (True) – color scale is relative to the whole matrix of data, not only to the region displayed
  • decorate (True) – draws color bar, title and axes labels
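The logarithm, focus and relative options can be illustrated without plotting: the hypothetical prepare_view below (not a pytadbit function) returns the region to draw plus the colour limits. It assumes focus holds 1-based, inclusive bin indices and uses log1p (the log of 1 + count) so that empty cells stay finite; pytadbit's own scaling may differ.

```python
import math


def prepare_view(matrix, focus=None, logarithm=True, relative=True):
    """Return (region, vmin, vmax) for an image plot of a square matrix.

    focus is assumed to be a 1-based, inclusive (start, end) pair of bins.
    """
    transform = math.log1p if logarithm else float
    full = [[transform(v) for v in row] for row in matrix]
    if focus is None:
        region = full
    else:
        start, end = focus
        region = [row[start - 1:end] for row in full[start - 1:end]]
    # relative=True: colour limits come from the whole matrix,
    # not only from the displayed region
    scale_from = full if relative else region
    values = [v for row in scale_from for v in row]
    return region, min(values), max(values)
```

The result could then be drawn with matplotlib, e.g. axe.imshow(region, vmin=vmin, vmax=vmax), so that a zoomed-in region keeps the colour scale of the full matrix.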

Load chromosome

pytadbit.chromosome.load_chromosome(in_f, fast=2)

Load a Chromosome object from a file. A Chromosome object can be saved with the Chromosome.save_chromosome() function.

Parameters:
  • in_f – path to a saved Chromosome object file
  • fast (2) – if fast=2, do not load the Hi-C data (when they were saved in a separate file, see Chromosome.save_chromosome()). If fast=1, skip loading the weights to save memory. If fast=0, load both the weights and the Hi-C data
Returns:

a Chromosome object


ExperimentList class

class pytadbit.chromosome.ExperimentList(thing, crm)

Inherits from the Python built-in list, modified for pytadbit.Experiment objects.

Mainly, __getitem__, __setitem__ and append were modified so that experiments can be retrieved by index or by name, and added simply with Chromosome.experiments.append(Experiment).

The whole ExperimentList object is linked to a Chromosome instance (pytadbit.Chromosome).
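The lookup-by-name behaviour described above can be sketched as a small list subclass. NamedList is hypothetical and much simpler than ExperimentList (no link to a Chromosome, append is inherited unchanged), and a namedtuple stands in for pytadbit.Experiment.

```python
from collections import namedtuple

# stand-in for pytadbit.Experiment: only a name and a resolution
Exp = namedtuple("Exp", "name resolution")


class NamedList(list):
    """List whose items have a .name and can be accessed by index or by name."""

    def _index_of(self, key):
        # strings are looked up by item name; anything else is used as-is
        if isinstance(key, str):
            for i, item in enumerate(self):
                if item.name == key:
                    return i
            raise KeyError(key)
        return key

    def __getitem__(self, key):
        return super().__getitem__(self._index_of(key))

    def __setitem__(self, key, value):
        super().__setitem__(self._index_of(key), value)
```

For instance, NamedList([Exp('exp1', 20000), Exp('exp2', 20000)])['exp2'] returns the same object as index 1, and assigning to an existing name replaces that experiment in place.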