stats

Below is an auto-generated summary of the xftsim.stats submodule API.

class xftsim.stats.GWAS_Estimator(component_index=None, metadata={}, sample_filter=<xftsim.filters.PassFilter object>, std_X=True, std_Y=True, name='GWAS')

Bases: Statistic

BROKEN: Perform linear association studies for the given simulation.

When called within a Simulation, will add to Simulation.results['GWAS'] a 3-D array indexed as follows:

the first dimension indexes variants via xft.index.DiploidVariantIndex

the second dimension indexes four association statistics: slope, se, test-statistic, and p-value

the third dimension indexes phenotypic components via xft.index.ComponentIndex
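To make the layout concrete, here is a minimal NumPy sketch that produces an array of the same shape: variants along the first axis, the four statistics along the second, and phenotype components along the third. It assumes ordinary marginal least squares with an intercept for each variant and phenotype, and uses a normal approximation for the two-sided p-value. `marginal_gwas` is a hypothetical helper, not xftsim's implementation, and it ignores the std_X and std_Y options.

```python
import math
import numpy as np

def marginal_gwas(G, Y):
    """Per-variant simple linear regression of each phenotype on each variant.

    G: (n, m) genotype matrix; Y: (n, k) phenotype matrix.
    Returns an (m, 4, k) array holding [slope, se, t-statistic, p-value].
    """
    n = G.shape[0]
    Gc = G - G.mean(axis=0)                     # center genotypes (fits an intercept)
    Yc = Y - Y.mean(axis=0)                     # center phenotypes
    ss_g = np.sum(Gc ** 2, axis=0)              # (m,) genotype sum of squares
    slope = (Gc.T @ Yc) / ss_g[:, None]         # (m, k) OLS slopes
    ss_y = np.sum(Yc ** 2, axis=0)              # (k,) phenotype sum of squares
    # residual sum of squares for simple regression: SS_y - slope^2 * SS_g
    rss = ss_y[None, :] - slope ** 2 * ss_g[:, None]
    se = np.sqrt(rss / (n - 2) / ss_g[:, None])
    t = slope / se
    # two-sided p-value under a normal approximation to the t distribution
    p = np.vectorize(math.erfc)(np.abs(t) / math.sqrt(2))
    return np.stack([slope, se, t, p], axis=1)  # (m, 4, k)
```

Stacking along axis 1 keeps the statistic axis in the middle, so `out[:, 3, :]` selects every p-value for every variant and component.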

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.

Type:: xft.index.ComponentIndex, optional

estimator(phenotypes, haplotypes)

class xftsim.stats.HasemanElstonEstimator(component_index=None, genetic_correlation=True, randomized=True, prettify=True, n_probe=100, dask=True, metadata={}, sample_filter=<xftsim.filters.PassFilter object>, name='HE_regression')

Bases: Statistic

Estimate Haseman-Elston regression for the given simulation.

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all components.

Type:: xft.index.ComponentIndex, optional

genetic_correlation

If True, calculate and return the genetic correlation matrix.

Type:: bool

randomized

If True, use a randomized trace estimator.

Type:: bool

prettify

If True, prettify the output by converting it to a pandas DataFrame.

Type:: bool

n_probe

The number of random probes for trace estimation.

Type:: int

dask

If True, use dask for calculations.

Type:: bool

estimator(sim: xft.sim.Simulation) → Dict: Estimate and return the Haseman-Elston regression for the given simulation.

estimator(phenotypes, haplotypes)

class xftsim.stats.HasemanElstonEstimatorSibship(component_index=None, genetic_correlation=True, randomized=True, prettify=True, n_probe=100, dask=True, metadata={}, sample_filter=<xftsim.filters.SibpairSampleFilter object>, name='HE_regression_sibship')

Bases: Statistic

Estimate Haseman-Elston regression for the given simulation, using sibling pairs by default (the default sample_filter is xftsim.filters.SibpairSampleFilter).

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all components.

Type:: xft.index.ComponentIndex, optional

genetic_correlation

If True, calculate and return the genetic correlation matrix.

Type:: bool

randomized

If True, use a randomized trace estimator.

Type:: bool

prettify

If True, prettify the output by converting it to a pandas DataFrame.

Type:: bool

n_probe

The number of random probes for trace estimation.

Type:: int

dask

If True, use dask for calculations.

Type:: bool

estimator(sim: xft.sim.Simulation) → Dict: Estimate and return the Haseman-Elston regression for the given simulation.

estimator(phenotypes, haplotypes)

class xftsim.stats.MatingStatistics(component_index=None, full=False, metadata={}, sample_filter=<xftsim.filters.PassFilter object>, name='mating_statistics')

Bases: Statistic

Calculate and return various mating statistics for the given simulation.

Parameters:

component_index (xft.index.ComponentIndex, optional) – Index of the component for which the statistics are calculated.
full (bool) – If True, ignore component_index and compute statistics for all components. If component_index is not provided and full is False, calculate statistics for phenotype components only.
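The specific mating statistics are not listed here. As an illustration of one such quantity, the sketch below computes the cross-mate phenotypic correlation matrix. The function name and the `female_idx`/`male_idx` pairing arrays are hypothetical, and MatingStatistics may report different or additional statistics.

```python
import numpy as np

def cross_mate_correlation(Y, female_idx, male_idx):
    """Correlation between mates' phenotypes.

    Y: (n, k) phenotypes; female_idx, male_idx: equal-length integer arrays
    where entry i pairs row Y[female_idx[i]] with row Y[male_idx[i]].
    Returns a (k, k) matrix whose (a, b) entry is corr(female trait a, male trait b).
    """
    F = Y[female_idx]
    M = Y[male_idx]
    k = Y.shape[1]
    joint = np.corrcoef(F, M, rowvar=False)  # (2k, 2k) correlation of all columns
    return joint[:k, k:]                     # female-by-male block
```

Off-diagonal entries of the returned block capture cross-trait assortment, for example a mate's trait a tracking the partner's trait b.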

estimator(sim: xft.sim.Simulation) → Dict: Calculate and return the requested mating statistics for the given simulation.

estimator(phenotypes, mating)

class xftsim.stats.Pop_GWAS_Estimator(component_index=None, metadata={}, std_X=False, std_Y=False, assume_pairs=True, n_sub=0, PGS=True, PGS_sub_divisions=50, training_fraction=0.8, name='pop_GWAS', sample_filter=<xftsim.filters.UnrelatedSampleFilter object>)

Bases: Statistic

Perform linear association studies for the given simulation, using only one sibling from each sibship.

Note: currently assumes that each mate pair produces exactly two offspring.

When called within a Simulation, will add to Simulation.results['GWAS'] a 3-D array indexed as follows:

the first dimension indexes variants via xft.index.DiploidVariantIndex

the second dimension indexes four association statistics: slope, se, test-statistic, and p-value

the third dimension indexes phenotypic components via xft.index.ComponentIndex
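Restricting a population GWAS to one sibling per sibship can be sketched by keeping the first individual seen in each family before running a marginal regression. This assumes a hypothetical per-individual `family_ids` array; the default UnrelatedSampleFilter may choose individuals differently.

```python
import numpy as np

def one_per_family(family_ids):
    """Return sorted row indices, keeping the first individual seen in each family."""
    # return_index gives the position of the first occurrence of each unique id
    _, first = np.unique(family_ids, return_index=True)
    return np.sort(first)
```

The returned indices would then subset both genotypes and phenotypes (for example `G[idx]` and `Y[idx]`) before the association step.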

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.

Type:: xft.index.ComponentIndex, optional

estimator(phenotypes, haplotypes)

class xftsim.stats.SampleStatistics(means=True, variance_components=True, variances=True, vcov=True, corr=True, prettify=True, metadata={}, sample_filter=<xftsim.filters.PassFilter object>, name='sample_statistics', component_index='all')

Bases: Statistic

Calculate and return various sample statistics for the given simulation.

means

If True, calculate and return the mean of each phenotype.

Type:: bool

variance_components

If True, calculate and return the variance components of each phenotype. Overridden to False if component_index is 'pheno'.

Type:: bool

variances

If True, calculate and return the variances of each phenotype.

Type:: bool

vcov

If True, calculate and return the variance-covariance matrix.

Type:: bool

corr

If True, calculate and return the correlation matrix.

Type:: bool

prettify

If True, prettify the output by converting it to a pandas DataFrame.

Type:: bool

component_index

If 'all', compute statistics for all phenotype components. If 'pheno', compute statistics for phenotypes only. If a ComponentIndex object, compute statistics for the specified components only.

Type:: str, xft.index.ComponentIndex

estimator(sim: xft.sim.Simulation) → Dict: Calculate and return the requested sample statistics for the given simulation.

estimator(phenotypes)
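A minimal sketch of the flag-driven computation, assuming an (n, k) NumPy array of phenotype components and sample (ddof=1) estimators; the normalization xftsim uses is not stated here. The hypothetical `sample_statistics` function omits variance_components and prettify.

```python
import numpy as np

def sample_statistics(Y, means=True, variances=True, vcov=True, corr=True):
    """Compute the requested summary statistics for an (n, k) phenotype matrix."""
    out = {}
    if means:
        out["means"] = Y.mean(axis=0)
    if variances:
        out["variances"] = Y.var(axis=0, ddof=1)    # sample variance per column
    if vcov:
        out["vcov"] = np.cov(Y, rowvar=False)       # (k, k), normalized by n - 1
    if corr:
        out["corr"] = np.corrcoef(Y, rowvar=False)  # (k, k)
    return out
```

Only the requested keys are present in the result, so callers can check membership rather than handling placeholder values.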

class xftsim.stats.Sib_GWAS_Estimator(component_index=None, metadata={}, std_X=False, std_Y=False, assume_pairs=True, n_sub=0, PGS=True, PGS_sub_divisions=50, training_fraction=0.8, name='sib_GWAS', sample_filter=<xftsim.filters.SibpairSampleFilter object>)

Bases: Statistic

Perform sib-difference linear association studies for the given simulation.

Note: currently assumes that each mate pair produces exactly two offspring.

When called within a Simulation, will add to Simulation.results['GWAS'] a 3-D array indexed as follows:

the first dimension indexes variants via xft.index.DiploidVariantIndex

the second dimension indexes four association statistics: slope, se, test-statistic, and p-value

the third dimension indexes phenotypic components via xft.index.ComponentIndex
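The sib-difference design can be sketched as a per-variant regression of within-pair phenotype differences on within-pair genotype differences. The sketch assumes the sample has already been split into two aligned arrays per sibling (one row per pair, consistent with the two-offspring assumption noted above). The function and argument names are hypothetical, and xftsim's estimator, which also receives standardized current phenotypes and genotypes, may scale or test differently.

```python
import numpy as np

def sib_difference_gwas(G1, G2, Y1, Y2):
    """Regress within-pair phenotype differences on genotype differences.

    G1, G2: (p, m) genotypes of sibling 1 and sibling 2 in each of p pairs.
    Y1, Y2: (p, k) phenotypes in the same pair order.
    Returns (slope, se), each of shape (m, k).
    """
    dG = G1 - G2
    dY = Y1 - Y2
    # center the differences, i.e. fit each regression with an intercept
    dG = dG - dG.mean(axis=0)
    dY = dY - dY.mean(axis=0)
    ss_g = np.sum(dG ** 2, axis=0)                  # (m,)
    slope = dG.T @ dY / ss_g[:, None]               # (m, k)
    rss = np.sum(dY ** 2, axis=0)[None, :] - slope ** 2 * ss_g[:, None]
    se = np.sqrt(rss / (dG.shape[0] - 2) / ss_g[:, None])
    return slope, se
```

Because effects shared by both siblings cancel in the differences, the slope is not inflated by family-level confounding; the cost is larger standard errors than a population GWAS of the same number of individuals.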

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.

Type:: xft.index.ComponentIndex, optional

estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)

class xftsim.stats.Statistic(estimator, parser, name=None, metadata={}, sample_filter=<xftsim.filters.PassFilter object>, s_args=None)

Bases: object

Base class for defining statistic estimators.

name

The name of the statistic.

Type:: str

estimator

The function that estimates the statistic.

Type:: Callable

metadata

Any additional metadata.

Type:: Dict

filter_sample

Global sample filter applied prior to estimation.

Type:: xft.filters.SampleFilter

estimate(sim: xft.sim.Simulation) → None:: Estimate the statistic and update the results.

update_results(sim: xft.sim.Simulation, results: object) → None:: Update the simulation’s results_store with the estimated results.

estimate(sim=None, **kwargs)

static null_parser(self, *args, **kwargs)

parse_results(sim)

update_results(sim, results)
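The documented attributes suggest a simple flow: optionally filter the sample, run the estimator, and store the result under the statistic's name. The toy class below sketches that flow with a plain dict standing in for xft.sim.Simulation. How xftsim actually wires estimator arguments and the parser is not stated here, so two choices are assumptions: estimator arguments are pulled from the simulation by parameter name (matching signatures such as estimator(phenotypes, haplotypes)), and parser post-processes the result. `ToyStatistic` is hypothetical.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

Sim = Dict[str, Any]  # a plain dict stands in for xft.sim.Simulation

@dataclass
class ToyStatistic:
    """Hypothetical mirror of the documented Statistic attributes (not xftsim code)."""
    estimator: Callable[..., Any]
    parser: Callable[[Any], Any] = lambda results: results  # assumed role: post-process results
    name: str = "statistic"
    metadata: Dict[str, Any] = field(default_factory=dict)  # fresh dict per instance
    sample_filter: Optional[Callable[[Sim], Sim]] = None

    def estimate(self, sim: Sim) -> None:
        data = self.sample_filter(sim) if self.sample_filter is not None else sim
        # assumed: pass the simulation fields whose names match the estimator's parameters
        wanted = inspect.signature(self.estimator).parameters
        results = self.estimator(**{key: data[key] for key in wanted})
        self.update_results(sim, self.parser(results))

    def update_results(self, sim: Sim, results: Any) -> None:
        # store results on the original (unfiltered) simulation under this statistic's name
        sim.setdefault("results", {})[self.name] = results
```

Using `field(default_factory=dict)` gives every instance its own metadata dict, avoiding the shared state that a literal `metadata={}` default can introduce if the dict is ever mutated.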

xftsim.stats.apply_threshold_PGS(estimates, G, thresholds=array([5.00000000e-08, 1.03849902e-07, 2.15696043e-07, 4.48000259e-07, 9.30495659e-07, 1.93263766e-06, 4.01408463e-06, 8.33724592e-06, 1.73164434e-05, 3.59662191e-05, 7.47017665e-05, 1.55155423e-04, 3.22257509e-04, 6.69328214e-04, 1.39019339e-03, 2.88742895e-03, 5.99718426e-03, 1.24561400e-02, 2.58713783e-02, 5.37348020e-02, 1.11607078e-01, 2.31807683e-01, 4.81464104e-01, 1.00000000e+00]))

xftsim.stats.apply_threshold_PGS_all(gwas_results, G, minp=5e-08, maxp=1, nthresh=25, alpha=0.05, thresholds=None)

xftsim.stats.haseman_elston(G, Y, n_probe=500, dtype=<class 'numpy.float32'>, dask=False)

Perform Haseman-Elston regression, with the option to choose randomized, deterministic, or randomized dask-based methods.

Parameters:

G (np.ndarray) – A 2D numpy array representing standardized (but not scaled) diploid genotypes.
Y (np.ndarray) – A 2D numpy array representing standardized phenotypes.
n_probe (int, optional, default 500) – The number of random probes for trace estimation. If n_probe is infinite (e.g. np.inf), the deterministic method is used.
dtype (numpy data type, optional, default np.float32) – The data type for the input arrays.
dask (bool, optional, default False) – If True, use dask for calculations.

Returns:

np.ndarray – A 2D numpy array representing the estimated genetic covariances.
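One standard method-of-moments form of Haseman-Elston regression can be sketched as follows. Assume that for each pair of traits cov(y_a, y_b) = g_ab K + e_ab I, with GRM K = GGᵀ/m and column-standardized G (so tr(K) = n). Matching E[y_aᵀ K y_b] and E[y_aᵀ y_b] to their observed values then gives the genetic covariance matrix (YᵀKY − YᵀY) / (tr(K²) − n). The only expensive term is tr(K²): it is computed exactly when n_probe is infinite, mirroring the documented convention, and otherwise by Hutchinson's estimator with Rademacher probes. The scaling of K and all names below are assumptions; this is not xftsim's code.

```python
import numpy as np

def standardize(a):
    """Center and scale each column to mean 0 and (population) variance 1."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def trace_K2(G, n_probe=np.inf, seed=None):
    """tr(K @ K) for the GRM K = G @ G.T / m, exact or via Hutchinson's estimator."""
    n, m = G.shape
    if np.isinf(n_probe):
        # exact: tr(K^2) = ||G^T G||_F^2 / m^2, using an m x m product instead of n x n
        return np.sum((G.T @ G) ** 2) / m**2
    rng = np.random.default_rng(seed)
    # Rademacher probes satisfy E[z z^T] = I, so E[z^T A z] = tr(A)
    Z = rng.choice([-1.0, 1.0], size=(n, int(n_probe)))
    KZ = G @ (G.T @ Z) / m
    # K is symmetric, so z^T K K z = ||K z||^2
    return np.sum(KZ ** 2) / n_probe

def he_regression(G, Y, n_probe=np.inf, seed=None):
    """Method-of-moments HE estimate of the (k, k) genetic covariance matrix.

    Assumes G and Y are column-standardized so that tr(K) = n.
    """
    n, m = G.shape
    GtY = G.T @ Y
    YKY = GtY.T @ GtY / m   # Y^T K Y without forming the n x n GRM
    YY = Y.T @ Y
    return (YKY - YY) / (trace_K2(G, n_probe, seed) - n)
```

The randomized path never forms K or GᵀG, which is what makes it attractive when both n and m are large; more probes trade compute for a lower-variance trace estimate.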

xftsim.stats.threshold_PGS(estimates, threshold, G)
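threshold_PGS and apply_threshold_PGS carry no description here. Their names and arguments suggest p-value thresholding without clumping: keep the variants whose p-value falls at or below a threshold, and score each individual as a weighted sum of genotypes using the estimated slopes. The sketch below assumes that reading, with hypothetical separate `slopes` and `pvalues` arrays in place of the `estimates` argument. The default threshold grid in apply_threshold_PGS's signature appears to be 24 log-spaced values from 5e-8 to 1, which np.geomspace(5e-8, 1.0, 24) reproduces.

```python
import numpy as np

def threshold_pgs(slopes, pvalues, threshold, G):
    """P-value-thresholded score: sum of slope * genotype over variants with p <= threshold."""
    weights = np.where(pvalues <= threshold, slopes, 0.0)  # (m,) zero out weaker variants
    return G @ weights                                     # (n,) one score per individual

def pgs_over_thresholds(slopes, pvalues, G, thresholds=None):
    """Scores for a grid of thresholds; returns an (n, len(thresholds)) array."""
    if thresholds is None:
        # log-spaced grid matching the default shown for apply_threshold_PGS
        thresholds = np.geomspace(5e-8, 1.0, 24)
    return np.column_stack([threshold_pgs(slopes, pvalues, t, G) for t in thresholds])
```

The last column (threshold 1) uses every variant, while the first keeps only genome-wide significant ones, so comparing columns shows how predictive accuracy changes as weaker signals are admitted.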