Chromosome class

class pytadbit.chromosome.Chromosome(name, experiment_resolutions=None, experiment_tads=None, experiment_hic_data=None, experiment_names=None, max_tad_size=5000000, chr_len=0, parser=None, centromere_search=False)

A Chromosome object is designed to handle Topologically Associating Domain (TAD) predictions from different experiments and different cell types for a given chromosome, and to compare them.

Parameters:

name – name of the chromosome
experiment_resolutions (None) – list of resolutions, one for each experiment passed
experiment_hic_data (None) – list() of paths to files containing the Hi-C count matrices corresponding to different experiments
experiment_tads (None) – list() of paths to files containing the definition of TADs corresponding to different experiments
experiment_names (None) – list() of the names of each experiment
max_tad_size (5000000) – maximum TAD size allowed. TADs longer than this value will not be considered, and the size of the corresponding chromosome will be reduced accordingly
chr_len (0) – size of the DNA chromosome in bp. By default it will be inferred from the distribution of TADs
parser (None) –
a parser function that returns a tuple of lists representing the data matrix and the length of a row/column. With the file example.tsv:
```
chrT_001       chrT_002        chrT_003        chrT_004
chrT_001       629     164     88      105
chrT_002       164     612     175     110
chrT_003       88      175     437     100
chrT_004       105     110     100     278
```
the output of parser('example.tsv') would be: [([629, 164, 88, 105, 164, 612, 175, 110, 88, 175, 437, 100, 105, 110, 100, 278]), 4]
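
A minimal sketch of a function that follows this contract, assuming a whitespace-separated file with one header row and one label column, as in example.tsv. The name `parse_matrix` is hypothetical, and the exact nesting of the returned containers that TADbit expects is an assumption here; this is not TADbit's own reader.

```python
import os
import tempfile


def parse_matrix(path):
    """Read a square, labelled matrix and return (flat_values, size).

    Hypothetical helper: assumes the first line is a header and the first
    field of every following line is a row label.
    """
    with open(path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    body = rows[1:]                      # drop the header row
    size = len(body)
    values = []
    for fields in body:
        counts = [int(v) for v in fields[1:]]   # drop the row label
        if len(counts) != size:
            raise ValueError("matrix is not square")
        values.extend(counts)            # row-major flattening
    return values, size


EXAMPLE = """chrT_001 chrT_002 chrT_003 chrT_004
chrT_001 629 164 88 105
chrT_002 164 612 175 110
chrT_003 88 175 437 100
chrT_004 105 110 100 278
"""

# Write the example to a temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    example_path = os.path.join(tmp, "example.tsv")
    with open(example_path, "w") as out:
        out.write(EXAMPLE)
    values, size = parse_matrix(example_path)
```

The flat list is in row-major order, so the count for row i and column j sits at index i * size + j.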

Returns:

Chromosome object

add_experiment(name, resolution=None, tad_def=None, hic_data=None, replace=False, parser=None, conditions=None, **kwargs)

Add a Hi-C experiment to Chromosome

Parameters:

name – name of the experiment or of the Experiment object
resolution – resolution of the experiment (needed if name is not an Experiment object)
hic_data (None) – either a file or a list of lists corresponding to the Hi-C data
tad_def (None) – a file or a dict with precomputed TADs for this experiment
replace (False) – overwrite the experiments loaded under the same name
parser (None) –
a parser function that returns a tuple of lists representing the data matrix and the length of a row/column. With a file example.tsv containing:
```
chrT_001   chrT_002        chrT_003        chrT_004
chrT_001   629     164     88      105
chrT_002   164     612     175     110
chrT_003   88      175     437     100
chrT_004   105     110     100     278
```
the output of parser('example.tsv') would be: [([629, 164, 88, 105, 164, 612, 175, 110, 88, 175, 437, 100, 105, 110, 100, 278]), 4]

align_experiments(names=None, verbose=False, randomize=False, rnd_method='interpolate', rnd_num=1000, **kwargs)

Align the predicted boundaries of two different experiments. The resulting alignment will be stored in the self.experiment list.

Parameters:

names (None) – list of names of the experiments to align. If None, align all
experiment1 – name of the first experiment to align
experiment2 – name of the second experiment to align
penalty (-0.1) – penalty for inserting a gap in the alignment
max_dist (100000) – maximum distance between two boundaries allowing a match (100 kb seems reasonable for human chromosomes)
verbose (False) – if True, print some information about the alignments
randomize (False) – check the alignment quality by comparing randomized boundaries over chromosomes of the same size. This will return an extra value, the p-value of accepting that the observed alignment is not better than a random alignment
rnd_method (interpolate) – by default uses the interpolation of TAD distribution. The alternative method is ‘shuffle’, where TADs are simply shuffled
rnd_num (1000) – number of randomizations to do
method (reciprocal) – if global, Needleman-Wunsch is used to align (see pytadbit.boundary_aligner.globally.needleman_wunsch()); if reciprocal, a method based on reciprocal closest boundaries is used (see pytadbit.boundary_aligner.reciprocally.reciprocal())

Returns:

the alignment and the score of the alignment (by default)
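
The idea behind reciprocal closest boundaries and the max_dist cutoff can be illustrated with a toy sketch. This is not TADbit's implementation (see pytadbit.boundary_aligner.reciprocally.reciprocal() for that); the function name `reciprocal_matches` and the boundary positions are made up for illustration, and no gap penalty or score is computed.

```python
def reciprocal_matches(bounds1, bounds2, max_dist=100000):
    """Pair boundaries that are each other's closest neighbour.

    Toy illustration: a pair (a, b) is kept only if b is the boundary of
    bounds2 closest to a, a is the boundary of bounds1 closest to b, and
    the two lie within max_dist of each other.
    """
    if not bounds1 or not bounds2:
        return []

    def closest(pos, candidates):
        return min(candidates, key=lambda c: abs(c - pos))

    pairs = []
    for a in bounds1:
        b = closest(a, bounds2)
        if closest(b, bounds1) == a and abs(a - b) <= max_dist:
            pairs.append((a, b))
    return pairs


# Hypothetical boundary positions (bp) for two experiments.
exp1 = [400000, 1200000, 2500000]
exp2 = [450000, 1900000, 2480000]
pairs = reciprocal_matches(exp1, exp2)
```

Here the boundary at 1200000 stays unmatched: its closest partner (1900000) is closer to 2500000 than to it, so the relation is not reciprocal.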

find_tad(experiments, name=None, n_cpus=1, verbose=True, max_tad_size='auto', no_heuristic=False, batch_mode=False, use_visibility=False)

Call the pytadbit.tadbit.tadbit() function to calculate the position of Topologically Associating Domains

Parameters:

experiments – a square matrix of interaction counts of Hi-C data, or a list of such matrices for replicated experiments. The counts must be evenly sampled and not normalized. 'experiments' can be a list of lists, a path to a file, or a file handler
n_cpus (1) – the number of CPUs to allocate to TADBit. If n_cpus='max', all available CPUs will be used
max_tad_size (auto) – an integer defining the maximum size of a TAD. The default (auto) defines it as the number of rows/columns
no_heuristic (False) – whether or not to use some heuristics
batch_mode (False) – if True, all the experiments will be concatenated into one for the search of TADs. The resulting TADs are stored under the name 'batch' plus a concatenation of the experiment names passed (e.g. if experiments=['exp1', 'exp2'], the name would be 'batch_exp1_exp2').
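
As a small illustration of the batch naming rule, the sketch below rebuilds the name from the experiment names. The helper `batch_name` is hypothetical, and joining with underscores is inferred from the single example given above.

```python
def batch_name(experiment_names):
    """Build 'batch' plus the experiment names, joined by underscores.

    Hypothetical helper; the separator is an assumption based on the
    documented example 'batch_exp1_exp2'.
    """
    return "_".join(["batch"] + list(experiment_names))


name = batch_name(["exp1", "exp2"])
```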

TODO: check option -> name for batch mode... some dirty changes....

get_experiment(name)

Fetch an Experiment object by name. This can also be done directly with Chromosome.experiments[name].

Parameters:	name – name of the experiment to select
Returns:	`pytadbit.Experiment`

get_tad_hic(tad, x_name, normed=True, matrix_num=0)

Retrieve the Hi-C data matrix corresponding to a given TAD.

Parameters:

tad – a given TAD (`dict`)
x_name – name of the experiment
normed (True) – if True, normalize the Hi-C data

Returns:

Hi-C data matrix for the given TAD
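
Conceptually, retrieving the matrix of a TAD means cutting a square block out of the full matrix. A minimal sketch follows, assuming the matrix is a row-major flat list and that the TAD's 'start' and 'end' are 0-based, inclusive bin indices. Both are assumptions, and `tad_submatrix` is a hypothetical helper, not the method's actual code.

```python
def tad_submatrix(flat, size, tad):
    """Return the square block of a row-major flat matrix covered by a TAD.

    Assumption: tad['start'] and tad['end'] are 0-based, inclusive bins.
    """
    start, end = int(tad["start"]), int(tad["end"])
    if not 0 <= start <= end < size:
        raise ValueError("TAD lies outside the matrix")
    return [
        [flat[i * size + j] for j in range(start, end + 1)]
        for i in range(start, end + 1)
    ]


flat = [629, 164, 88, 105,
        164, 612, 175, 110,
        88, 175, 437, 100,
        105, 110, 100, 278]

# Hypothetical TAD covering the second and third bins.
tad = {"start": 1, "end": 2, "brk": 2, "score": 8}
block = tad_submatrix(flat, 4, tad)
```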

iter_tads(x_name, normed=True)

Iterate over the TADs corresponding to the given experiment.

Parameters:

x_name – name of the experiment
normed (True) – if True, normalize the Hi-C data returned

Yields:

Hi-C data corresponding to each TAD

save_chromosome(out_f, fast=True, divide=True, force=False)

Save a Chromosome object to a file (it uses dump() from the cPickle module). Once saved, the object can be loaded with load_chromosome().

Parameters:

out_f – path to the file where to store the cPickle object
fast (True) – if True, skip Hi-C data and weights
divide (True) – if True, write two pickles: one with what would result from using the fast option, and a second with the Hi-C and weights data. The second file name is extended by '_hic' (e.g. with out_f='chromosome12.pik' we would obtain chromosome12.pik and chromosome12.pik_hic). When loading, load_chromosome() will automatically search for both files
force (False) – overwrite the existing file
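
The two-file layout produced by divide can be sketched with the standard pickle module. A plain dict stands in for the Chromosome object here, and `save_divided` and `load_divided` are hypothetical helpers. This illustrates the naming scheme only, not TADbit's on-disk format.

```python
import os
import pickle
import tempfile


def save_divided(obj, out_f, heavy_keys=("hic", "weights")):
    """Write two pickles: out_f (light data) and out_f + '_hic' (heavy data)."""
    light = {k: v for k, v in obj.items() if k not in heavy_keys}
    heavy = {k: v for k, v in obj.items() if k in heavy_keys}
    with open(out_f, "wb") as out:
        pickle.dump(light, out)
    with open(out_f + "_hic", "wb") as out:
        pickle.dump(heavy, out)


def load_divided(in_f):
    """Load the light pickle and merge in the '_hic' companion if present."""
    with open(in_f, "rb") as handle:
        obj = pickle.load(handle)
    if os.path.exists(in_f + "_hic"):
        with open(in_f + "_hic", "rb") as handle:
            obj.update(pickle.load(handle))
    return obj


# Made-up stand-in data for a chromosome record.
record = {"name": "chr12", "tads": [(0, 10), (11, 25)],
          "hic": [629, 164, 164, 612], "weights": [1.0, 0.9, 0.9, 1.1]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chromosome12.pik")
    save_divided(record, path)
    files = sorted(os.listdir(tmp))
    restored = load_divided(path)
```

Keeping the bulky matrices in a separate file lets a loader skip them entirely when only the TAD definitions are needed.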

set_max_tad_size(value)

Change the maximum size allowed for TADs. It also applies to the computed experiments.

Parameters:	value – an int value (default is 5000000)

visualize(name, tad=None, focus=None, paint_tads=False, axe=None, show=True, logarithm=True, normalized=False, relative=True, decorate=True)

Visualize the matrix of Hi-C interactions.

Parameters:

name – name of the experiment to visualize
tad (None) –
a given TAD in the form:
```
{'start': start,
 'end'  : end,
 'brk'  : end,
 'score': score}
```
Alternatively, a list of TADs can be passed (all the TADs between the first and last one passed will be shown, so passing more than two TADs might be superfluous)
focus (None) – a tuple with the start and end positions of the region to visualize
paint_tads (False) – draw a box around the TADs defined for this experiment
axe (None) – a matplotlib Axes object can be passed in order to customize the picture
show (True) – whether or not to pop up the matplotlib image
logarithm (True) – show the logarithm values
normalized (False) – show the normalized data (weights might have been calculated previously). Note: white rows/columns may appear in the matrix displayed; these rows correspond to filtered rows (see pytadbit.utils.hic_filtering.hic_filtering_for_modelling())
relative (True) – color scale is relative to the whole matrix of data, not only to the region displayed
decorate (True) – draws color bar, title and axes labels

Load chromosome

pytadbit.chromosome.load_chromosome(in_f, fast=2)

Load a Chromosome object from a file. A Chromosome object can be saved with the Chromosome.save_chromosome() function.

Parameters:

in_f – path to a saved Chromosome object file
fast (2) – if fast=2, do not load the Hi-C data (in case they were saved in a separate file, see `Chromosome.save_chromosome()`). If fast=1, the weights are skipped when loading to save memory. Finally, if fast=0, both the weights and the Hi-C data are loaded

Returns:

a Chromosome object

TODO: remove first try/except type error... this is loading old experiments

ExperimentList class

class pytadbit.chromosome.ExperimentList(thing, crm)

Inherits from Python's built-in list, modified to hold pytadbit.Experiment objects.

Mainly, getitem, setitem, and append were modified in order to be able to search for experiments by index or by name, and to add experiments simply using Chromosome.experiments.append(Experiment).

The whole ExperimentList object is linked to a Chromosome instance (pytadbit.Chromosome).
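
The lookup by index or by name can be sketched with a small list subclass. The classes `NamedList` and `Exp` below are hypothetical stand-ins written for illustration. They do not reproduce ExperimentList or pytadbit.Experiment, and the link to a Chromosome instance is left out.

```python
class NamedList(list):
    """List whose items can be fetched by position or by their .name."""

    def __getitem__(self, key):
        if isinstance(key, str):
            # Name lookup: scan the items for a matching .name attribute.
            for item in self:
                if item.name == key:
                    return item
            raise KeyError("no item named %r" % key)
        # Integer or slice lookup: fall back to normal list behaviour.
        return super().__getitem__(key)


class Exp:
    """Hypothetical stand-in for an experiment; carries a name and resolution."""

    def __init__(self, name, resolution):
        self.name = name
        self.resolution = resolution


exps = NamedList()
exps.append(Exp("exp1", 20000))
exps.append(Exp("exp2", 40000))

by_name = exps["exp2"]    # lookup by experiment name
by_index = exps[1]        # lookup by position
```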
