genomfart.popgen package
Submodules
genomfart.popgen.genetic_relationship module
class genomfart.popgen.genetic_relationship.genetic_relationship
Class used to calculate pairwise genetic relationships between taxa.
Methods
get_genetic_relationships(snp_generator[, ...]) : Gets the genetic relationship matrix, as defined in the GCTA paper.
get_genetic_relationships_with_missing(...) : Gets the genetic relationship matrix, as defined in the GCTA paper.
get_genetic_relationships_with_missing_Endelman(...) : Gets the genetic relationship matrix, as defined in the Endelman and Jannink paper.
static get_genetic_relationships(snp_generator, min_MAF=0.025, verbosity=None)
Gets the genetic relationship matrix, as defined in the GCTA paper (e.g. eq. 3 from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3014363/).
Note that all pairs are assumed to use the same number of SNPs in the calculation of their relationship; missing genotypes are not accounted for.
Parameters:
snp_generator : generator
Generator of numpy arrays of double, each giving the genotype of every sample as a number between 0 and 2.
min_MAF : float
Minimum minor allele frequency for a marker
verbosity : int, optional
If not None, how often to print the number of SNPs processed so far
Returns:
A tuple of (Numpy square array, in which the lower triangle contains the genetic relationship values; the order of rows and columns is the same as in the generator) and the number of markers used.
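As an illustration of the calculation described above, here is a minimal sketch of a GCTA-style relationship matrix built from a SNP generator. It is not genomfart's implementation: the function name `grm_sketch` is hypothetical, the allele frequency is assumed to be estimated as half the mean genotype, and the off-diagonal form of the equation is assumed to apply to every pair, including the diagonal. Missing genotypes are not handled, matching the note above.

```python
import numpy as np

def grm_sketch(snp_generator, min_maf=0.025):
    """Illustrative GCTA-style relationship matrix (hypothetical helper)."""
    total = None
    n_used = 0
    for genotypes in snp_generator:
        g = np.asarray(genotypes, dtype=float)
        p = g.mean() / 2.0  # assumed allele frequency estimate
        if min(p, 1.0 - p) < min_maf:
            continue  # skip markers below the MAF threshold
        # Standardize genotypes: (x - 2p) / sqrt(2p(1 - p))
        z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
        if total is None:
            total = np.zeros((g.size, g.size))
        total += np.outer(z, z)  # this marker's contribution to every pair
        n_used += 1
    if n_used == 0:
        raise ValueError("no markers passed the MAF filter")
    # Average over markers and keep only the lower triangle
    return np.tril(total / n_used), n_used

# Three markers for four samples; the last is monomorphic and is filtered out
snps = [np.array([0., 1., 2., 1.]),
        np.array([2., 2., 0., 0.]),
        np.array([0., 0., 0., 0.])]
rel, n_used = grm_sketch(iter(snps))
```

Accumulating one outer product per marker keeps memory use proportional to the square of the number of samples, regardless of how many SNPs the generator yields.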
static get_genetic_relationships_with_missing(snp_generator, min_MAF=0.025, max_missing=500, verbosity=None)
Gets the genetic relationship matrix, as defined in the GCTA paper (e.g. eq. 3 from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3014363/).
Here, pairs can have different numbers of non-missing SNPs. Missing genotypes are effectively imputed to the mean genotype, which zeroes out that term in the relationship equation.
Parameters:
snp_generator : generator
Generator of numpy arrays of double, each giving the genotype of every sample as a number between 0 and 2.
min_MAF : float
Minimum minor allele frequency for a marker
max_missing : int
The maximum number of taxa allowed to be missing for a marker
verbosity : int, optional
If not None, how often to print the number of SNPs processed so far
Returns:
A tuple of (Numpy square array, in which the lower triangle contains the genetic relationship values; the order of rows and columns is the same as in the generator) and (Numpy square array of the same dimension as the relationship matrix, in which the lower triangle contains the number of SNPs used to calculate that relationship).
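The effect of mean imputation can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the name `grm_with_missing_sketch` is hypothetical, missing genotypes are assumed to be encoded as NaN, the allele frequency is assumed to come from the non-missing samples only, and each pair's sum is assumed to be divided by that pair's own count of non-missing SNPs.

```python
import numpy as np

def grm_with_missing_sketch(snp_generator, min_maf=0.025, max_missing=500):
    """Illustrative relationship matrix with missing data (hypothetical helper)."""
    total = None
    counts = None
    for genotypes in snp_generator:
        g = np.asarray(genotypes, dtype=float)
        present = ~np.isnan(g)  # assumption: NaN marks a missing genotype
        if (~present).sum() > max_missing or not present.any():
            continue  # too many taxa missing for this marker
        p = g[present].mean() / 2.0
        if min(p, 1.0 - p) < min_maf:
            continue
        # Missing entries stay at 0, which is what imputing the mean
        # genotype (2p) gives after centering
        z = np.zeros(g.size)
        z[present] = (g[present] - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
        if total is None:
            total = np.zeros((g.size, g.size))
            counts = np.zeros((g.size, g.size), dtype=int)
        total += np.outer(z, z)
        m = present.astype(int)
        counts += np.outer(m, m)  # SNPs observed in both members of a pair
    if total is None:
        raise ValueError("no markers passed the filters")
    rel = np.full(total.shape, np.nan)
    np.divide(total, counts, out=rel, where=counts > 0)
    return np.tril(rel), np.tril(counts)

# The fourth sample is missing at the first marker
snps = [np.array([0., 1., 2., np.nan]),
        np.array([2., 2., 0., 0.])]
rel, counts = grm_with_missing_sketch(iter(snps))
```

In this example, pairs involving the fourth sample are based on one SNP while all other pairs are based on two, which is the kind of information the second returned array carries.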
static get_genetic_relationships_with_missing_Endelman(snp_generator, min_MAF=0.025, max_missing=500, verbosity=None)
Gets the genetic relationship matrix, as defined in the Endelman and Jannink paper (e.g. eq. 13 from http://www.g3journal.org/content/2/11/1405.full.pdf).
Here, pairs can have different numbers of non-missing SNPs. Missing genotypes are effectively imputed to the mean genotype, which zeroes out that term in the relationship equation.
Parameters:
snp_generator : generator
Generator of numpy arrays of double, each giving the genotype of every sample as a number between 0 and 2.
min_MAF : float
Minimum minor allele frequency for a marker
max_missing : int
The maximum number of taxa allowed to be missing for a marker
verbosity : int, optional
If not None, how often to print the number of SNPs processed so far
Returns:
A tuple of (Numpy square array, in which the lower triangle contains the genetic relationship values; the order of rows and columns is the same as in the generator) and (Numpy square array of the same dimension as the relationship matrix, in which the lower triangle contains the number of SNPs used to calculate that relationship).
static