Alignment of TAD boundaries

TADBit can combine the information from different Hi-C experiments in order to decide whether TAD boundaries are conserved between them.

Continuing the example from the previous section (Getting started), we will load one additional experiment (from the same work, [Lieberman-Aiden2009]):

from pytadbit import Chromosome

# create a Chromosome object that will store all Hi-C data and analyses
my_chrom = Chromosome(name='My first chromosome')

# load Hi-C data
my_chrom.add_experiment('First Hi-C experiment', xp_handler="sample_data/HIC_k562_chr19_chr19_100000_obs.txt", resolution=100000)
my_chrom.add_experiment('Second Hi-C experiment', xp_handler="sample_data/HIC_gm06690_chr19_chr19_100000_obs.txt", resolution=100000)

# run core tadbit function to find TADs, on each experiment
my_chrom.find_tad('First Hi-C experiment')
my_chrom.find_tad('Second Hi-C experiment')

print(my_chrom.experiments)

This is the output of the last print statement:

[Experiment First Hi-C experiment (resolution: 100Kb, TADs: 42, Hi-C rows: 639),
Experiment Second Hi-C experiment (resolution: 100Kb, TADs: 31, Hi-C rows: 639)]

We have now loaded two Hi-C experiments, both at 100 Kb resolution, and have predicted the location of TADs in each of them (42 TADs detected in the first experiment and 31 in the second).

Aligning boundaries

Several algorithms have been implemented to align TAD boundaries (see pytadbit.chromosome.Chromosome.align_experiments()); our recommendation, however, is to use the default “reciprocal” method (pytadbit.boundary_aligner.reciprocally.reciprocal()).
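The idea of a reciprocal alignment can be sketched in a few lines of plain Python. The sketch below is not TADBit's code: it assumes that two boundaries are aligned only when each one is the closest boundary to the other and they lie within a maximum distance. The helper names (nearest, reciprocal_pairs) and the max_dist value of 100 Kb are illustrative assumptions; the input positions (in Kb) are the first few boundaries of the alignment printed further down this page.

```python
def nearest(value, candidates):
    """Return the candidate closest to value (the first one wins on ties)."""
    return min(candidates, key=lambda c: abs(c - value))


def reciprocal_pairs(bounds1, bounds2, max_dist):
    """Pair boundaries that are each other's closest neighbour.

    Hypothetical helper for illustration only, not part of pytadbit.
    """
    pairs = []
    for b1 in bounds1:
        b2 = nearest(b1, bounds2)
        # keep the pair only if the match is mutual and close enough
        if nearest(b2, bounds1) == b1 and abs(b1 - b2) <= max_dist:
            pairs.append((b1, b2))
    return pairs


# boundary positions in Kb, copied from the start of the printed alignment
first = [500, 1200, 3100, 4500, 5800, 6900, 7700]
second = [400, 1100, 1700, 2600, 4100, 4600, 5600, 7800]

# max_dist=100 (Kb) is an assumed threshold chosen for this illustration
pairs = reciprocal_pairs(first, second, max_dist=100)
print(pairs)
```

Boundaries left unpaired by this rule (for instance 3100 and 2600, which are mutual closest neighbours but 500 Kb apart) would end up in separate columns with a gap in the other experiment.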

Continuing with the example, the two loaded experiments are aligned as follows:

my_chrom.align_experiments(names=["First Hi-C experiment", "Second Hi-C experiment"])

print(my_chrom.alignment)

If the align_experiments function is called with no arguments, all the loaded experiments are aligned by default.

The result of the print statement is:

{('First Hi-C experiment', 'Second Hi-C experiment'): Alignment of boundaries (length: 60, number of experiments: 2)}

All the alignments computed between experiments belonging to the same chromosome are stored in the alignment dictionary attached to the Chromosome object. Each alignment is itself an object (see pytadbit.alignment.Alignment).

Check alignment consistency through randomization

In order to check that the alignment makes sense and that it does not correspond to a random association of boundaries, the “randomize” parameter can be set to True when aligning:

score, pval = my_chrom.align_experiments(randomize=True, rnd_method="interpolate", rnd_num=1000)

Two methods are available to randomize the TAD boundaries: “shuffle” and “interpolate” (the default). The first shuffles the observed TADs, while the second calculates the distribution of TAD lengths and generates a random set of TADs according to this distribution (see pytadbit.alignment.randomization_test() for more details).
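The difference between the two randomization strategies can be illustrated with a small, self-contained sketch. It is not TADBit's implementation: all function names are hypothetical, and the second strategy is approximated here by resampling observed TAD lengths with replacement, a simple stand-in for drawing from the length distribution that may well differ from what “interpolate” actually does.

```python
import random


def boundaries_to_lengths(boundaries):
    """Convert sorted boundary positions to TAD lengths (first TAD starts at 0)."""
    starts = [0] + boundaries[:-1]
    return [end - start for start, end in zip(starts, boundaries)]


def lengths_to_boundaries(lengths):
    """Convert TAD lengths back to cumulative boundary positions."""
    total, bounds = 0, []
    for length in lengths:
        total += length
        bounds.append(total)
    return bounds


def shuffle_tads(boundaries, rng):
    """Keep the observed TAD lengths but place them in a random order."""
    lengths = boundaries_to_lengths(boundaries)
    rng.shuffle(lengths)
    return lengths_to_boundaries(lengths)


def resample_tads(boundaries, rng):
    """Draw TAD lengths from the observed ones until the chromosome is covered.

    Assumed stand-in for sampling from the TAD length distribution.
    """
    lengths = boundaries_to_lengths(boundaries)
    chrom_end = boundaries[-1]
    total, bounds = 0, []
    while True:
        total += rng.choice(lengths)
        if total >= chrom_end:
            bounds.append(chrom_end)  # clip the last TAD at the chromosome end
            return bounds
        bounds.append(total)


observed = [500, 1200, 3100, 4500, 5800, 6900, 7700]  # Kb, illustrative
rng = random.Random(0)
shuffled = shuffle_tads(observed, rng)
resampled = resample_tads(observed, rng)
print(shuffled)
print(resampled)
```

Shuffling preserves both the number of TADs and their exact lengths, whereas resampling only preserves the overall length distribution, so the number of random TADs can differ from the observed one.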

In the example used in this tutorial, the score of the alignment is 0.27 and its p-value is < 0.001.
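One common way to turn a set of randomized alignments into a p-value is to count how often a random alignment scores at least as well as the observed one. The sketch below assumes this empirical approach; it is not taken from TADBit, empirical_pvalue is a hypothetical helper, and the null scores are made-up numbers used only to exercise the function.

```python
def empirical_pvalue(observed, null_scores):
    """Fraction of random scores at least as high as the observed score.

    One is added to numerator and denominator so that the estimate is
    never exactly zero (an assumption of this sketch).
    """
    n_extreme = sum(1 for score in null_scores if score >= observed)
    return (n_extreme + 1.0) / (len(null_scores) + 1.0)


# made-up scores standing in for alignments of randomized boundaries
null_scores = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.13, 0.14, 0.10, 0.12]

p = empirical_pvalue(0.27, null_scores)
print(p)
```

Under this scheme, 1000 randomizations in which no random alignment reaches the observed score would give 1/1001, which is in line with a reported p-value below 0.001.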

Alignment objects

Visualization

The first function to call in order to check the quality of the generated alignments is pytadbit.alignment.Alignment.write_alignment():

ali = my_chrom.alignment[('First Hi-C experiment', 'Second Hi-C experiment')]
ali.write_alignment()

Alignment shown in Kb (2 experiments) (scores: 0 1 2 3 4 5 6 7 8 9 10)
  First Hi-C experiment :|   500|  1200| ---- | ---- |  3100| ---- |  4500| ---- |  5800|  6900|  7700| ---- | ---- | 10300| 10800| 11400| 12400| ---- | 13100| 13600| 14400| 16300| 18300| 18800| 19400| 24400| 32900| 34700| 35500| 37700| 38300| ---- | 39900| ---- | 41200| ---- | 43400| 44600| 45200| 45700| 47100| 47700| 48500| 49700| 50500| ---- | 52300| 53000| 55300| 56200| ---- | 59300| 60800| ---- | 63800
  Second Hi-C experiment:|   400|  1100|  1700|  2600| ---- |  4100|  4600|  5600| ---- | ---- |  7800|  8500|  9700| ---- | ---- | 11400| ---- | 12600| ---- | ---- | ---- | ---- | ---- | ---- | 19400| 24500| ---- | ---- | ---- | 37700| ---- | 39600| ---- | 40100| 41200| 42900| ---- | ---- | ---- | ---- | ---- | 47700| 48500| 49700| ---- | 50900| ---- | 53000| 55300| 56200| 56800| 59200| 60800| 62300| 63800

The different colors, corresponding to the TADBit confidence in detecting the boundaries, show how conserved the boundaries are between (in this case) cell types.
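When colors are not available (for example in a plain-text log), conserved boundaries can still be read off the printed rows: they are the columns in which every experiment has a position rather than a gap. The following sketch parses a shortened excerpt of the two rows above (their first eight columns); parse_row and conserved_columns are hypothetical helpers written for this illustration, not part of pytadbit.

```python
def parse_row(line):
    """Split one printed alignment row into (name, positions); gaps become None."""
    name, cells = line.split(':', 1)
    values = []
    for cell in cells.strip().strip('|').split('|'):
        cell = cell.strip()
        values.append(None if set(cell) == {'-'} else int(cell))
    return name.strip(), values


def conserved_columns(rows):
    """Return indices of columns where no experiment has a gap."""
    return [index for index, column in enumerate(zip(*rows))
            if all(value is not None for value in column)]


# first eight columns of the alignment printed above
excerpt = [
    "  First Hi-C experiment :|   500|  1200| ---- | ---- |  3100| ---- |  4500| ---- ",
    "  Second Hi-C experiment:|   400|  1100|  1700|  2600| ---- |  4100|  4600|  5600",
]

parsed = [parse_row(line) for line in excerpt]
conserved = conserved_columns([values for _, values in parsed])
print(conserved)
```

For this excerpt the conserved columns are those at indices 0, 1 and 6, counting from 0.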

The get_column function

The pytadbit.alignment.Alignment.get_column() function selects specific columns of an alignment.

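To select, for example, the column with index 3 (column indices start at 0 in the outputs shown here, so this is the fourth column of the alignment printed above):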

ali.get_column(3)

This will return:

[(3, [>-<, >2600<])]

The first element of the tuple is the column index, while the two values in the second element are the TADs associated with the aligned boundaries in that column (one per experiment). Note that TAD objects are represented between the ‘>’ and ‘<’ symbols (see: pytadbit.alignment.TAD).

The pytadbit.alignment.Alignment.get_column() function can also take a function as an argument, in order to select one or several columns that satisfy a specific condition. For example, to select all the boundaries with a score higher than 7:

cond1 = lambda x: x['score'] > 7

and to get the selected columns:

ali.get_column(cond1=cond1)

resulting, in this example, in the following 3 columns:

[(24, [>19400<, >19400<]), (25, [>24400<, >24500<]), (54, [>63800<, >63800<])]

To add a second condition, e.g. to select only the columns after the 50th column of the alignment:

cond2 = lambda x: x['pos'] > 50
ali.get_column(cond1=cond1, cond2=cond2)

Resulting in:

[(54, [>63800<, >63800<])]

Finally, for more flexibility, these conditions can be required of only a given number of experiments (in this example of a pairwise alignment, this does not make much sense):

ali.get_column(cond1=cond1, cond2=cond2, min_num=1)

This results in:

[(52, [>60800<, >60800<]), (54, [>63800<, >63800<])]
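The selection logic described above can be mimicked in plain Python to make the role of min_num explicit. This is only a sketch under stated assumptions, not TADBit's code: it assumes that a column is kept when at least min_num of its TADs satisfy all the conditions, and that by default every experiment must satisfy them. The select_columns helper, the dictionary-based TADs, and all scores are made up for illustration.

```python
def select_columns(columns, conditions, min_num=None):
    """Return indices of columns in which enough TADs pass every condition.

    columns: list of columns, each a list with one TAD-like dict per experiment.
    min_num: minimum number of experiments that must pass (default: all).
    """
    selected = []
    for index, column in enumerate(columns):
        needed = len(column) if min_num is None else min_num
        passing = sum(1 for tad in column
                      if all(cond(tad) for cond in conditions))
        if passing >= needed:
            selected.append(index)
    return selected


# toy pairwise alignment with made-up scores (two experiments per column)
columns = [
    [{'pos': 0, 'score': 9}, {'pos': 0, 'score': 8}],
    [{'pos': 1, 'score': 9}, {'pos': 1, 'score': 4}],
    [{'pos': 2, 'score': 3}, {'pos': 2, 'score': 2}],
    [{'pos': 3, 'score': 10}, {'pos': 3, 'score': 10}],
]

cond1 = lambda x: x['score'] > 7
cond2 = lambda x: x['pos'] > 0

both = select_columns(columns, [cond1])
both_late = select_columns(columns, [cond1, cond2])
any_late = select_columns(columns, [cond1, cond2], min_num=1)
print(both)
print(both_late)
print(any_late)
```

With min_num=1 the toy column at index 1 is also selected, because one of its two boundaries has a high score; this parallels the extra column that appears in the last get_column call above.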