genomfart.popgen package

Submodules

genomfart.popgen.genetic_relationship module

class genomfart.popgen.genetic_relationship.genetic_relationship

Class used to calculate pairwise genetic relationships between taxa

Methods

get_genetic_relationships(snp_generator[, ...]) Gets the genetic relationship matrix, as defined in the GCTA paper.
get_genetic_relationships_with_missing(...) Gets the genetic relationship matrix, as defined in the GCTA paper, allowing for missing genotypes.
get_genetic_relationships_with_missing_Endelman(...) Gets the genetic relationship matrix, as defined in the Endelman and Jannink paper, allowing for missing genotypes.
static get_genetic_relationships(snp_generator, min_MAF=0.025, verbosity=None)

Gets the genetic relationship matrix, as defined in the GCTA paper (see eq. 3 in http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3014363/).

Note that every pair is assumed to use the same number of SNPs in the calculation of its relationship; missing genotypes are not accounted for.

Parameters:

snp_generator : generator

Generator yielding one numpy array of doubles per marker. Each array gives the genotype of every sample as a number between 0 and 2.

min_MAF : float

Minimum minor allele frequency for a marker

verbosity : int, optional

If not None, a progress message is printed to the screen each time this many SNPs have been processed

Returns:

A tuple of (1) a square numpy array whose lower triangle contains the genetic relationship values, with rows and columns in the same order as in the generator, and (2) the number of markers used.
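The calculation described above can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not the genomfart source: the function name `gcta_style_grm` is hypothetical, each array from the generator is assumed to hold one marker with no missing values, and the off-diagonal expression is the commonly cited GCTA form. For simplicity the same expression is applied on the diagonal, which may differ from both the paper and the library.

```python
import numpy as np


def gcta_style_grm(snp_generator, min_maf=0.025):
    """Illustrative GCTA-style relationship matrix (hypothetical helper).

    Each item from snp_generator is a 1-D array of genotypes (0 to 2),
    one entry per sample, with no missing values.
    """
    total = None
    n_used = 0
    for genotypes in snp_generator:
        g = np.asarray(genotypes, dtype=float)
        p = g.mean() / 2.0                    # frequency of the counted allele
        maf = min(p, 1.0 - p)
        if maf == 0.0 or maf < min_maf:       # skip monomorphic or rare markers
            continue
        # Standardise: centre on 2p and scale by sqrt(2p(1-p)).
        z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
        if total is None:
            total = np.zeros((g.size, g.size))
        total += np.outer(z, z)               # adds z_j * z_k for every pair (j, k)
        n_used += 1
    if n_used == 0:
        raise ValueError("no markers passed the MAF filter")
    # Every pair shares the same denominator: the number of markers used.
    return np.tril(total / n_used), n_used


markers = [
    np.array([0.0, 1.0, 2.0, 1.0]),
    np.array([2.0, 2.0, 0.0, 0.0]),
    np.array([0.0, 0.0, 0.0, 0.0]),           # monomorphic, filtered out
]
grm, n_markers = gcta_style_grm(iter(markers))
```

Because the sum is accumulated marker by marker, the full genotype matrix never has to be held in memory, which is presumably why the method takes a generator rather than a 2-D array.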

static get_genetic_relationships_with_missing(snp_generator, min_MAF=0.025, max_missing=500, verbosity=None)

Gets the genetic relationship matrix, as defined in the GCTA paper (see eq. 3 in http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3014363/).

Here, pairs can have different numbers of non-missing SNPs. In effect, a missing genotype is imputed to the mean genotype for that marker, so its term in the relationship equation is zero.

Parameters:

snp_generator : generator

Generator of numpy arrays of double, giving the genotype of each sample as a number between 0 and 2.

min_MAF : float

Minimum minor allele frequency for a marker

max_missing : int

Maximum number of taxa that may have a missing genotype at a marker

verbosity : int, optional

If not None, a progress message is printed to the screen each time this many SNPs have been processed

Returns:

A tuple of two square numpy arrays of the same dimension. The lower triangle of the first contains the genetic relationship values, with rows and columns in the same order as in the generator. The lower triangle of the second contains the number of SNPs used to calculate each relationship.
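The effect of mean imputation can be seen in a short sketch. Everything here is an assumption made for illustration rather than the library's code: the helper name `grm_with_missing` is hypothetical, missing genotypes are assumed to be encoded as NaN, and each pair's sum is assumed to be divided by that pair's own count of shared non-missing SNPs.

```python
import numpy as np


def grm_with_missing(snp_generator, min_maf=0.025, max_missing=500):
    """Illustrative sketch only; missing genotypes are assumed to be NaN."""
    total = None
    counts = None
    for genotypes in snp_generator:
        g = np.asarray(genotypes, dtype=float)
        present = ~np.isnan(g)
        n_missing = g.size - int(present.sum())
        if n_missing > max_missing or not present.any():
            continue
        p = g[present].mean() / 2.0           # allele frequency from observed samples
        maf = min(p, 1.0 - p)
        if maf == 0.0 or maf < min_maf:
            continue
        # A missing genotype imputed to the mean (2p) centres to exactly zero,
        # so it contributes nothing to any pairwise product.
        z = np.zeros(g.size)
        z[present] = (g[present] - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
        if total is None:
            total = np.zeros((g.size, g.size))
            counts = np.zeros((g.size, g.size))
        total += np.outer(z, z)
        mask = present.astype(float)
        counts += np.outer(mask, mask)        # 1 where both samples were observed
    if total is None:
        raise ValueError("no markers passed the filters")
    with np.errstate(divide="ignore", invalid="ignore"):
        grm = np.where(counts > 0, total / counts, np.nan)
    return np.tril(grm), np.tril(counts)


markers = [
    np.array([0.0, 1.0, 2.0, np.nan]),        # last sample is missing here
    np.array([2.0, 2.0, 0.0, 0.0]),
]
grm, n_snps = grm_with_missing(iter(markers))
```

In this toy input, any pair involving the last sample is based on one SNP while every other pair is based on two, which is the information the second returned array is documented to carry.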

static get_genetic_relationships_with_missing_Endelman(snp_generator, min_MAF=0.025, max_missing=500, verbosity=None)

Gets the genetic relationship matrix, as defined in the Endelman and Jannink paper (see eq. 13 in http://www.g3journal.org/content/2/11/1405.full.pdf).

Here, pairs can have different numbers of non-missing SNPs. In effect, a missing genotype is imputed to the mean genotype for that marker, so its term in the relationship equation is zero.

Parameters:

snp_generator : generator

Generator of numpy arrays of double, giving the genotype of each sample as a number between 0 and 2.

min_MAF : float

Minimum minor allele frequency for a marker

max_missing : int

Maximum number of taxa that may have a missing genotype at a marker

verbosity : int, optional

If not None, a progress message is printed to the screen each time this many SNPs have been processed

Returns:

A tuple of two square numpy arrays of the same dimension. The lower triangle of the first contains the genetic relationship values, with rows and columns in the same order as in the generator. The lower triangle of the second contains the number of SNPs used to calculate each relationship.
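The methods above document only the lower part of each returned matrix as meaningful, while many downstream tools expect a full symmetric matrix. A minimal sketch of mirroring it follows. The helper name `symmetrize_lower` is hypothetical, and the sketch assumes the diagonal is filled in by the library, which the documentation does not state explicitly.

```python
import numpy as np


def symmetrize_lower(lower):
    """Mirror the lower triangle of a square array onto the upper triangle."""
    lower = np.asarray(lower, dtype=float)
    strict = np.tril(lower, k=-1)             # entries strictly below the diagonal
    # Whatever sits above the diagonal in the input is discarded.
    return strict + strict.T + np.diag(np.diag(lower))


example = np.array([
    [1.0, 9.0, 9.0],
    [2.0, 3.0, 9.0],
    [4.0, 5.0, 6.0],
])
full = symmetrize_lower(example)
```

Using `np.tril` rather than adding the transpose directly means the result does not depend on what the upper triangle of the input happens to contain.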

Module contents