stats

Below is an auto-generated summary of the xftsim.stats submodule API.

class xftsim.stats.GWAS_Estimator(component_index=None, metadata={}, filter_sample=False, std_X=True, std_Y=True)

Bases: Statistic

Perform linear association studies for the given simulation.

When called within a Simulation, will add to Simulation.results['GWAS'] a 3-D array indexed as follows:

  • the first dimension indexes variants via xft.index.DiploidVariantIndex

  • the second dimension indexes four association statistics: slope, se, test-statistic, and p-value

  • the third dimension indexes phenotypic components via xft.index.ComponentIndex
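As a rough illustration of this layout, the sketch below builds an array with the same three axes using plain NumPy. It is not xftsim's implementation: the function name marginal_gwas and the toy data are assumptions, inputs are left unstandardized, and the p-values use a normal approximation to the t distribution to avoid extra dependencies.

```python
import math
import numpy as np

def marginal_gwas(G, Y):
    """Regress every phenotype column of Y on every variant column of G, one pair at a time.

    Returns an array of shape (n_variants, 4, n_components); the second axis
    holds slope, standard error, test statistic, and two-sided p-value.
    """
    n = G.shape[0]
    Gc = G - G.mean(axis=0)                    # centering absorbs the intercept
    Yc = Y - Y.mean(axis=0)
    sxx = (Gc ** 2).sum(axis=0)                # (m,)
    syy = (Yc ** 2).sum(axis=0)                # (k,)
    sxy = Gc.T @ Yc                            # (m, k)
    slope = sxy / sxx[:, None]
    rss = syy[None, :] - slope * sxy           # residual sum of squares per pair
    se = np.sqrt(rss / (n - 2) / sxx[:, None])
    stat = slope / se
    # Normal approximation to the two-sided t-test p-value.
    pval = np.vectorize(math.erfc)(np.abs(stat) / math.sqrt(2.0))
    return np.stack([slope, se, stat, pval], axis=1)

# Toy data: 500 individuals, 20 variants, 2 phenotype components.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.5, size=(500, 20)).astype(float)
beta = np.zeros((20, 2))
beta[0, 0] = 0.5                               # a single causal variant for component 0
Y = G @ beta + rng.normal(size=(500, 2))

res = marginal_gwas(G, Y)
print(res.shape)                               # (20, 4, 2)
```

Here res[j, :, c] holds the four statistics for variant j and component c, mirroring the indexing described above.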

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.

Type:

xft.index.ComponentIndex, optional

estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)

class xftsim.stats.HasemanElstonEstimator(component_index=None, genetic_correlation=True, randomized=True, prettify=True, n_probe=100, dask=True, metadata={}, filter_sample=False)

Bases: Statistic

Estimate Haseman-Elston regression for the given simulation.

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all components.

Type:

xft.index.ComponentIndex, optional

genetic_correlation

If True, calculate and return the genetic correlation matrix.

Type:

bool

randomized

If True, use a randomized trace estimator.

Type:

bool

prettify

If True, prettify the output by converting it to a pandas DataFrame.

Type:

bool

n_probe

The number of random probes for trace estimation.

Type:

int

dask

If True, use dask for calculations.

Type:

bool
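The randomized and n_probe options refer to randomized trace estimation. A minimal sketch of one common variant, the Hutchinson estimator with random sign probes, is given below; whether xftsim uses this exact probe distribution is an assumption, and hutchinson_trace and K_squared are hypothetical names.

```python
import numpy as np

def hutchinson_trace(matvec, n, n_probe=100, seed=0):
    """Estimate tr(A) as the average of z^T A z over random +/-1 probe vectors z."""
    rng = np.random.default_rng(seed)
    Z = rng.choice([-1.0, 1.0], size=(n, n_probe))
    return float(np.mean(np.sum(Z * matvec(Z), axis=0)))

# Toy standardized genotype-like matrix; K = G G^T / m is never formed explicitly.
rng = np.random.default_rng(1)
n, m = 200, 500
G = rng.normal(size=(n, m))
G = (G - G.mean(axis=0)) / G.std(axis=0)

def K_squared(Z):
    """Apply K twice to the probe block Z using only products with G and G^T."""
    KZ = G @ (G.T @ Z) / m
    return G @ (G.T @ KZ) / m

approx = hutchinson_trace(K_squared, n, n_probe=100)
exact = ((G.T @ G) ** 2).sum() / m ** 2        # tr(K^2) = ||G^T G||_F^2 / m^2
print(approx, exact)
```

More probes reduce the variance of the estimate at the cost of additional matrix products.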

estimator(sim: xft.sim.Simulation) -> Dict

Estimate and return the Haseman-Elston regression for the given simulation.

estimator(phenotypes, current_std_phenotypes, current_std_genotypes)

class xftsim.stats.MatingStatistics(component_index=None, full=False, metadata={}, filter_sample=False)

Bases: Statistic

Calculate and return various mating statistics for the given simulation.

Parameters:
  • component_index (xft.index.ComponentIndex, optional) – Index of the component for which the statistics are calculated.

  • full (bool) – If True, ignore component_index and compute statistics for all components. If component_index is not provided and full is False, calculate statistics for phenotype components only.

estimator(sim: xft.sim.Simulation) -> Dict

Calculate and return the requested mating statistics for the given simulation.
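This page does not list which mating statistics are computed. One quantity commonly reported for mate pairs is the cross-mate correlation matrix between component values, sketched below with plain NumPy as an assumption about the kind of output involved; cross_mate_correlation and the toy data are hypothetical.

```python
import numpy as np

def cross_mate_correlation(female, male):
    """Correlate each female component with each male component across mate pairs.

    Row i of `female` and row i of `male` are assumed to belong to the same pair.
    """
    f = (female - female.mean(axis=0)) / female.std(axis=0)
    m = (male - male.mean(axis=0)) / male.std(axis=0)
    return f.T @ m / f.shape[0]

# Toy assortative mating: each male component correlates 0.5 with the
# matching female component and not at all with the other one.
rng = np.random.default_rng(0)
n = 5000
female = rng.normal(size=(n, 2))
male = 0.5 * female + np.sqrt(0.75) * rng.normal(size=(n, 2))

r = cross_mate_correlation(female, male)
print(r.round(2))
```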

estimator(phenotypes, mating)

class xftsim.stats.Pop_GWAS_Estimator(component_index=None, metadata={}, std_X=False, std_Y=False, assume_pairs=True, n_sub=0, PGS=True, PGS_sub_divisions=50, training_fraction=0.8)

Bases: Statistic

Perform one-sib-only linear association studies for the given simulation.

NOTE! Currently assumes each mate-pair produces exactly 2 offspring.

When called within a Simulation, will add to Simulation.results['GWAS'] a 3-D array indexed as follows:

  • the first dimension indexes variants via xft.index.DiploidVariantIndex

  • the second dimension indexes four association statistics: slope, se, test-statistic, and p-value

  • the third dimension indexes phenotypic components via xft.index.ComponentIndex
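The one-sib-only design can be pictured as keeping a single offspring from each pair before running ordinary per-variant regressions. The sketch below assumes, purely for illustration, that rows 2i and 2i+1 of the offspring arrays are siblings; xftsim's actual ordering and selection rule are not documented here, and one_sib_per_pair is a hypothetical helper.

```python
import numpy as np

def one_sib_per_pair(n_offspring, rng):
    """Pick one row index per sibling pair, assuming rows 2i and 2i+1 are siblings."""
    n_pairs = n_offspring // 2
    return 2 * np.arange(n_pairs) + rng.integers(0, 2, size=n_pairs)

# Toy arrays standing in for offspring genotypes and a phenotype
# (rows are independent draws here, not simulated siblings).
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.5, size=(1000, 3)).astype(float)
y = 0.4 * G[:, 0] + rng.normal(size=1000)

idx = one_sib_per_pair(G.shape[0], rng)
Gs, ys = G[idx], y[idx]

# Ordinary per-variant least-squares slopes on the retained offspring.
Gc = Gs - Gs.mean(axis=0)
slopes = Gc.T @ (ys - ys.mean()) / (Gc ** 2).sum(axis=0)
print(slopes.round(2))
```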

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.

Type:

xft.index.ComponentIndex, optional

estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)

class xftsim.stats.SampleStatistics(means=True, variance_components=True, variances=True, vcov=True, corr=True, prettify=True, metadata={}, filter_sample=False)

Bases: Statistic

Calculate and return various sample statistics for the given simulation.

means

If True, calculate and return the mean of each phenotype.

Type:

bool

variance_components

If True, calculate and return the variance components of each phenotype.

Type:

bool

variances

If True, calculate and return the variances of each phenotype.

Type:

bool

vcov

If True, calculate and return the variance-covariance matrix.

Type:

bool

corr

If True, calculate and return the correlation matrix.

Type:

bool

prettify

If True, prettify the output by converting it to a pandas DataFrame.

Type:

bool

estimator(sim: xft.sim.Simulation) -> Dict

Calculate and return the requested sample statistics for the given simulation.
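For orientation, the quantities toggled by means, variances, vcov, and corr correspond to standard NumPy reductions over a phenotype matrix. The sketch below is a stand-in under that assumption, not the estimator itself: sample_statistics and the component names are hypothetical, and variance components and the pandas conversion done by prettify are left out.

```python
import numpy as np

def sample_statistics(Y, names, means=True, variances=True, vcov=True, corr=True):
    """Collect the requested summaries of the columns of Y into a dict."""
    out = {}
    if means:
        out["means"] = dict(zip(names, Y.mean(axis=0)))
    if variances:
        out["variances"] = dict(zip(names, Y.var(axis=0, ddof=1)))
    if vcov:
        out["vcov"] = np.cov(Y, rowvar=False)        # columns are variables
    if corr:
        out["corr"] = np.corrcoef(Y, rowvar=False)
    return out

# Toy phenotype matrix with two hypothetical component names.
rng = np.random.default_rng(0)
Y = rng.normal(size=(300, 2))
stats = sample_statistics(Y, ["height", "bmi"])
print(sorted(stats))
```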

estimator(phenotypes)

class xftsim.stats.Sib_GWAS_Estimator(component_index=None, metadata={}, std_X=False, std_Y=False, assume_pairs=True, n_sub=0, PGS=True, PGS_sub_divisions=50, training_fraction=0.8)

Bases: Statistic

Perform sib-difference linear association studies for the given simulation.

NOTE! Currently assumes each mate-pair produces exactly 2 offspring.

When called within a Simulation, will add to Simulation.results['GWAS'] a 3-D array indexed as follows:

  • the first dimension indexes variants via xft.index.DiploidVariantIndex

  • the second dimension indexes four association statistics: slope, se, test-statistic, and p-value

  • the third dimension indexes phenotypic components via xft.index.ComponentIndex
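The idea behind a sib-difference design is to regress within-pair phenotype differences on within-pair genotype differences, which removes effects shared by both siblings. The sketch below illustrates this on toy families using regression through the origin; that choice, the name sib_difference_slopes, and the simulated confound are assumptions rather than a description of xftsim's internals.

```python
import numpy as np

def sib_difference_slopes(G1, G2, y1, y2):
    """Per-variant slope of (y1 - y2) on (G1 - G2), regression through the origin."""
    dG = G1 - G2
    dy = y1 - y2
    return (dG * dy[:, None]).sum(axis=0) / (dG ** 2).sum(axis=0)

rng = np.random.default_rng(1)
n_fam, m = 5000, 5
mother = rng.binomial(2, 0.5, size=(n_fam, m))
father = rng.binomial(2, 0.5, size=(n_fam, m))

def transmit(parent):
    """Draw one transmitted allele per variant from a parental genotype in {0, 1, 2}."""
    return rng.binomial(1, parent / 2.0)

G1 = transmit(mother) + transmit(father)
G2 = transmit(mother) + transmit(father)

beta = np.array([0.3, 0.0, 0.0, 0.0, 0.0])
family = 0.5 * (mother[:, 0] + father[:, 0])    # shared effect tied to parental genotype
y1 = G1 @ beta + family + rng.normal(size=n_fam)
y2 = G2 @ beta + family + rng.normal(size=n_fam)

sib = sib_difference_slopes(G1, G2, y1, y2)

# Population slope for variant 0 using one sibling only, for comparison.
g = G1[:, 0] - G1[:, 0].mean()
pop = (g * (y1 - y1.mean())).sum() / (g ** 2).sum()
print(sib[0], pop)
```

In this toy setup the shared family term inflates the population slope, while the sib-difference slope stays close to the simulated direct effect of 0.3.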

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.

Type:

xft.index.ComponentIndex, optional

estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)

class xftsim.stats.Statistic(estimator, parser, name, metadata={}, filter_sample=False, s_args=None)

Bases: object

Base class for defining statistic estimators.

name

The name of the statistic.

Type:

str

estimator

The function that estimates the statistic.

Type:

Callable

metadata

Any additional metadata

Type:

Dict

filter_sample

If True, apply the global filter prior to estimation.

Type:

bool

estimate(sim: xft.sim.Simulation) -> None

Estimate the statistic and update the results.

update_results(sim: xft.sim.Simulation, results: object) -> None

Update the simulation’s results_store with the estimated results.

estimate(sim=None, **kwargs)
static null_parser(self, *args, **kwargs)
parse_results(sim)
update_results(sim, results)

xftsim.stats.apply_threshold_PGS(estimates, G, thresholds=array([5.00000000e-08, 1.03849902e-07, 2.15696043e-07, 4.48000259e-07, 9.30495659e-07, 1.93263766e-06, 4.01408463e-06, 8.33724592e-06, 1.73164434e-05, 3.59662191e-05, 7.47017665e-05, 1.55155423e-04, 3.22257509e-04, 6.69328214e-04, 1.39019339e-03, 2.88742895e-03, 5.99718426e-03, 1.24561400e-02, 2.58713783e-02, 5.37348020e-02, 1.11607078e-01, 2.31807683e-01, 4.81464104e-01, 1.00000000e+00]))

xftsim.stats.apply_threshold_PGS_all(gwas_results, G, minp=5e-08, maxp=1, nthresh=25)

xftsim.stats.haseman_elston(G, Y, n_probe=500, dtype=np.float32, dask=False)

Perform Haseman-Elston regression, with the option to choose randomized, deterministic, or randomized dask-based methods.

Parameters:
  • G (np.ndarray) – A 2D numpy array representing standardized (but not scaled) diploid genotypes.

  • Y (np.ndarray) – A 2D numpy array representing standardized phenotypes.

  • n_probe (int, optional, default 500) – The number of random probes for trace estimation. If n_probe is set to inf, use deterministic method.

  • dtype (numpy data type, optional, default np.float32) – The data type for the input arrays.

  • dask (bool, optional, default False) – If True, use dask for calculations.

Returns:

np.ndarray – A 2D numpy array representing the estimated genetic covariances.
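For readers unfamiliar with the method, a deterministic Haseman-Elston fit can be written as a two-parameter method-of-moments problem, Y Yᵀ ≈ σ_g K + σ_e I with K = G Gᵀ / m. The sketch below solves the resulting normal equations in closed form. It is an independent illustration, not the function documented above: he_regression is a hypothetical name, the scaling of K by m is an assumption about what "standardized (but not scaled)" implies, and tr(K²) is computed exactly rather than with random probes.

```python
import numpy as np

def he_regression(G, Y):
    """Method-of-moments fit of Y Y^T ~ sigma_g * K + sigma_e * I, with K = G G^T / m.

    Returns the (k, k) matrix of estimated genetic covariances.
    """
    n, m = G.shape
    GtY = G.T @ Y
    YKY = GtY.T @ GtY / m                       # Y^T K Y without forming the n x n K
    YtY = Y.T @ Y
    tr_K = (G ** 2).sum() / m
    tr_K2 = ((G.T @ G) ** 2).sum() / m ** 2     # tr(K^2) = ||G^T G||_F^2 / m^2
    return (n * YKY - tr_K * YtY) / (n * tr_K2 - tr_K ** 2)

# Toy data with a known genetic covariance matrix.
rng = np.random.default_rng(0)
n, m = 2000, 500
G = rng.binomial(2, 0.5, size=(n, m)).astype(float)
G = (G - G.mean(axis=0)) / G.std(axis=0)        # standardized genotypes

sigma_g = np.array([[0.5, 0.25], [0.25, 0.5]])
B = rng.multivariate_normal(np.zeros(2), sigma_g / m, size=m)   # per-variant effects
Y = G @ B + rng.normal(scale=np.sqrt(0.5), size=(n, 2))
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)        # standardized phenotypes

est = he_regression(G, Y)
print(est.round(2))
```

The diagonal of est approximates each trait's heritability and the off-diagonal entries approximate the genetic covariance, both up to sampling noise.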

xftsim.stats.threshold_PGS(estimates, threshold, G)
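The three PGS helpers are listed here without descriptions. The default thresholds shown for apply_threshold_PGS match 24 geometrically spaced values between 5e-08 and 1, and the sketch below pairs such a grid with a generic p-value-thresholded score. The function pgs_at_threshold, its argument layout, and the toy inputs are assumptions; they are not the signatures of the xftsim functions.

```python
import numpy as np

def pgs_at_threshold(slopes, pvalues, threshold, G):
    """Sum genotype-weighted slopes over variants with p-value <= threshold."""
    keep = pvalues <= threshold
    return G[:, keep] @ slopes[keep]

# Geometric grid reproducing the default threshold values listed above.
thresholds = np.geomspace(5e-8, 1.0, 24)

# Toy inputs: random slopes and p-values, not real GWAS output.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.5, size=(100, 50)).astype(float)
slopes = rng.normal(size=50)
pvalues = rng.uniform(1e-6, 1.0, size=50)

# One score column per threshold.
scores = np.column_stack([pgs_at_threshold(slopes, pvalues, t, G) for t in thresholds])
print(scores.shape)                             # (100, 24)
```

At the strictest threshold no toy variant passes and the score is zero for everyone; at a threshold of 1 every variant contributes.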