stats
Below is an auto-generated summary of the xftsim.stats submodule API.
- class xftsim.stats.GWAS_Estimator(component_index=None, metadata={}, filter_sample=False, std_X=True, std_Y=True)
Bases: Statistic
Perform linear association studies for the given simulation.
When called within a Simulation, will add to Simulation.results[‘GWAS’] a 3-D array indexed as follows:
the first dimension indexes variants via xft.index.DiploidVariantIndex
the second dimension indexes four association statistics: slope, se, test-statistic, and p-value
the third dimension indexes phenotypic components via xft.index.ComponentIndex
- component_index
Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.
- Type:
xft.index.ComponentIndex, optional
- estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)
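The three-dimensional result described above can be illustrated with a plain NumPy sketch of marginal (one-variant-at-a-time) least-squares regression. This is not xftsim's implementation. The function name marginal_gwas, the (n, m) genotype and (n, k) phenotype layouts, and the normal approximation used for the p-value are assumptions made for illustration, and monomorphic (zero-variance) variants are assumed absent. The std_X and std_Y arguments mirror the constructor flags of the same name.

```python
import math

import numpy as np


def marginal_gwas(G, Y, std_X=True, std_Y=True):
    """Hypothetical sketch: per-variant, per-phenotype simple linear regression.

    G: (n, m) diploid genotype dosages; Y: (n, k) phenotypes.
    Returns an (m, 4, k) array whose second axis is [slope, se, t-statistic, p-value].
    """
    G = np.asarray(G, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = G.shape[0]
    # Centering both sides absorbs the intercept; scaling is optional.
    X = G - G.mean(axis=0)
    Z = Y - Y.mean(axis=0)
    if std_X:
        X = X / X.std(axis=0)
    if std_Y:
        Z = Z / Z.std(axis=0)
    sxx = (X ** 2).sum(axis=0)                      # (m,)
    slope = (X.T @ Z) / sxx[:, None]                # (m, k)
    # Residual sum of squares for each simple regression.
    rss = (Z ** 2).sum(axis=0)[None, :] - slope ** 2 * sxx[:, None]
    se = np.sqrt(rss / (n - 2) / sxx[:, None])
    t = slope / se
    # Two-sided p-value via a large-sample normal approximation; an exact
    # version would use the t distribution with n - 2 degrees of freedom.
    p = np.vectorize(lambda v: math.erfc(abs(v) / math.sqrt(2.0)))(t)
    return np.stack([slope, se, t, p], axis=1)


# Usage: 500 individuals, 10 variants, 2 phenotypes (only the first is genetic).
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(500, 10)).astype(float)
Y = np.column_stack([G[:, 0] + rng.normal(size=500), rng.normal(size=500)])
res = marginal_gwas(G, Y)
```

With both sides standardized, each slope equals the Pearson correlation between the variant and the phenotype.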
- class xftsim.stats.HasemanElstonEstimator(component_index=None, genetic_correlation=True, randomized=True, prettify=True, n_probe=100, dask=True, metadata={}, filter_sample=False)
Bases: Statistic
Perform Haseman-Elston regression for the given simulation.
- component_index
Index of the component for which the statistics are calculated. If not provided, calculate statistics for all components.
- Type:
xft.index.ComponentIndex, optional
- genetic_correlation
If True, calculate and return the genetic correlation matrix.
- Type:
bool
- randomized
If True, use a randomized trace estimator.
- Type:
bool
- prettify
If True, prettify the output by converting it to a pandas DataFrame.
- Type:
bool
- n_probe
The number of random probes for trace estimation.
- Type:
int
- dask
If True, use dask for calculations.
- Type:
bool
- estimator(sim: xft.sim.Simulation) → Dict
Perform Haseman-Elston regression on the given simulation and return the estimates.
- estimator(phenotypes, current_std_phenotypes, current_std_genotypes)
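The randomized and n_probe options refer to stochastic trace estimation. A common technique is Hutchinson's estimator with Rademacher (±1) probes; the sketch below assumes that technique (the probe distribution xftsim uses is not stated here) and applies it to tr(K²) for a genetic relatedness matrix K = GGᵀ/m, a quantity that appears in the Haseman-Elston denominator. The function name hutchinson_trace_K2 is hypothetical.

```python
import numpy as np


def hutchinson_trace_K2(G, n_probe=100, seed=0):
    """Hypothetical sketch: randomized estimate of tr(K @ K) for K = G @ G.T / m.

    For Rademacher probes z, E[z.T @ A @ z] = tr(A). With A = K @ K,
    z.T @ K @ K @ z = ||K @ z||^2, and K @ z = G @ (G.T @ z) / m, so the
    n x n matrix K is never formed.
    """
    G = np.asarray(G, dtype=float)
    n, m = G.shape
    rng = np.random.default_rng(seed)
    Z = rng.choice([-1.0, 1.0], size=(n, n_probe))   # one probe per column
    KZ = G @ (G.T @ Z) / m                           # (n, n_probe)
    return float((KZ ** 2).sum() / n_probe)


# Usage: compare against the exact trace on a small example.
rng = np.random.default_rng(1)
G = rng.standard_normal((300, 1000))
K = G @ G.T / G.shape[1]
exact = np.trace(K @ K)
approx = hutchinson_trace_K2(G, n_probe=200)
```

Avoiding an explicit n × n matrix is presumably what makes the randomized option attractive for large samples; the estimate's variance shrinks in proportion to 1/n_probe.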
- class xftsim.stats.MatingStatistics(component_index=None, full=False, metadata={}, filter_sample=False)
Bases: Statistic
Calculate and return various mating statistics for the given simulation.
- Parameters:
component_index (xft.index.ComponentIndex, optional) – Index of the component for which the statistics are calculated.
full (bool) – Ignore component_index and compute statistics for all components. If component_index is not provided and full = False, calculate statistics for phenotype components only.
- estimator(sim: xft.sim.Simulation) → Dict
Calculate and return the requested mating statistics for the given simulation.
- estimator(phenotypes, mating)
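One statistic commonly used to summarize assortative mating is the cross-mate phenotypic correlation matrix. The sketch below computes it from two row-aligned arrays; the layout (row i of each array holds one member of mate pair i) and the function name are assumptions, since this page does not say how xftsim stores mate pairs or which statistics it reports.

```python
import numpy as np


def cross_mate_correlation(female, male):
    """Hypothetical sketch: correlate each female phenotype with each male phenotype.

    female, male: (n_pairs, k) arrays; row i holds the two members of mate pair i.
    Returns a (k, k) matrix with entry [a, b] = corr(female trait a, male trait b).
    """
    f = np.asarray(female, dtype=float)
    m = np.asarray(male, dtype=float)
    k = f.shape[1]
    # corrcoef on the stacked columns gives a (2k, 2k) matrix;
    # the upper-right block holds the female-by-male correlations.
    full = np.corrcoef(np.hstack([f, m]), rowvar=False)
    return full[:k, k:]


# Usage: mates resemble each other on each trait but not across traits.
rng = np.random.default_rng(2)
female = rng.normal(size=(1000, 2))
male = female + rng.normal(size=(1000, 2))
r = cross_mate_correlation(female, male)
```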
- class xftsim.stats.Pop_GWAS_Estimator(component_index=None, metadata={}, std_X=False, std_Y=False, assume_pairs=True, n_sub=0, PGS=True, PGS_sub_divisions=50, training_fraction=0.8)
Bases: Statistic
Perform linear association studies for the given simulation, using only one sibling from each family.
Note: currently assumes that each mate pair produces exactly two offspring.
When called within a Simulation, will add to Simulation.results[‘GWAS’] a 3-D array indexed as follows:
the first dimension indexes variants via xft.index.DiploidVariantIndex
the second dimension indexes four association statistics: slope, se, test-statistic, and p-value
the third dimension indexes phenotypic components via xft.index.ComponentIndex
- component_index
Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.
- Type:
xft.index.ComponentIndex, optional
- estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)
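Given the note that each mate pair produces exactly two offspring, one simple way to keep a single sibling per family is to take every other row. The ordering (rows 2i and 2i + 1 are the offspring of mate pair i) and the helper name one_sib_per_pair are hypothetical; the subset can then be passed to an ordinary marginal regression.

```python
import numpy as np


def one_sib_per_pair(G, Y):
    """Hypothetical sketch: keep the first sibling of each consecutive pair of rows.

    Assumes rows 2*i and 2*i + 1 hold the two offspring of mate pair i.
    """
    G = np.asarray(G)
    Y = np.asarray(Y)
    if G.shape[0] % 2:
        raise ValueError("expected an even number of offspring (two per mate pair)")
    return G[0::2], Y[0::2]


# Usage: six offspring from three mate pairs.
G = np.arange(12).reshape(6, 2)
Y = np.arange(6).reshape(6, 1)
G1, Y1 = one_sib_per_pair(G, Y)
```

Keeping one sibling per family means no two observations in the regression are siblings, whose shared genetic and environmental effects would otherwise violate the independence assumption behind ordinary standard errors.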
- class xftsim.stats.SampleStatistics(means=True, variance_components=True, variances=True, vcov=True, corr=True, prettify=True, metadata={}, filter_sample=False)
Bases: Statistic
Calculate and return various sample statistics for the given simulation.
- means
If True, calculate and return the mean of each phenotype.
- Type:
bool
- variance_components
If True, calculate and return the variance components of each phenotype.
- Type:
bool
- variances
If True, calculate and return the variances of each phenotype.
- Type:
bool
- vcov
If True, calculate and return the variance-covariance matrix.
- Type:
bool
- corr
If True, calculate and return the correlation matrix.
- Type:
bool
- prettify
If True, prettify the output by converting it to a pandas DataFrame.
- Type:
bool
- estimator(sim: xft.sim.Simulation) → Dict
Calculate and return the requested sample statistics for the given simulation.
- estimator(phenotypes)
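The means, variances, vcov, and corr flags correspond to standard sample summaries, sketched below with NumPy for an (n, k) phenotype array. The function name and the use of sample (ddof=1) variances are assumptions; variance_components is omitted because it depends on xftsim's phenotype component structure, and with prettify=True the class returns pandas DataFrames rather than the plain arrays returned here.

```python
import numpy as np


def sample_statistics(Y):
    """Hypothetical sketch: basic summaries of an (n, k) phenotype matrix."""
    Y = np.asarray(Y, dtype=float)
    return {
        "means": Y.mean(axis=0),                  # per-phenotype mean
        "variances": Y.var(axis=0, ddof=1),       # per-phenotype sample variance
        "vcov": np.cov(Y, rowvar=False),          # k x k covariance (ddof=1)
        "corr": np.corrcoef(Y, rowvar=False),     # k x k correlation
    }


# Usage
rng = np.random.default_rng(3)
Y = rng.normal(size=(200, 3))
s = sample_statistics(Y)
```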
- class xftsim.stats.Sib_GWAS_Estimator(component_index=None, metadata={}, std_X=False, std_Y=False, assume_pairs=True, n_sub=0, PGS=True, PGS_sub_divisions=50, training_fraction=0.8)
Bases: Statistic
Perform sib-difference linear association studies for the given simulation.
Note: currently assumes that each mate pair produces exactly two offspring.
When called within a Simulation, will add to Simulation.results[‘GWAS’] a 3-D array indexed as follows:
the first dimension indexes variants via xft.index.DiploidVariantIndex
the second dimension indexes four association statistics: slope, se, test-statistic, and p-value
the third dimension indexes phenotypic components via xft.index.ComponentIndex
- component_index
Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.
- Type:
xft.index.ComponentIndex, optional
- estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)
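A sib-difference design regresses within-pair phenotype differences on within-pair genotype differences, so anything shared by both siblings (family environment, parental genotype means) cancels out. The sketch below computes only the slopes and assumes, hypothetically, that rows 2i and 2i + 1 hold the two offspring of mate pair i; the function name is also hypothetical.

```python
import numpy as np


def sib_difference_slopes(G, Y):
    """Hypothetical sketch: within-family slopes from sibling differences.

    G: (2p, m) dosages; Y: (2p, k) phenotypes, with rows 2*i and 2*i + 1 siblings.
    Returns an (m, k) array of slopes.
    """
    G = np.asarray(G, dtype=float)
    Y = np.asarray(Y, dtype=float)
    dG = G[0::2] - G[1::2]           # within-pair genotype differences
    dY = Y[0::2] - Y[1::2]           # within-pair phenotype differences
    # Centering the differences is equivalent to fitting an intercept.
    dG = dG - dG.mean(axis=0)
    dY = dY - dY.mean(axis=0)
    return (dG.T @ dY) / (dG ** 2).sum(axis=0)[:, None]


# Usage: a strong shared family effect does not bias the within-family slope.
rng = np.random.default_rng(4)
p = 2000
G = rng.binomial(2, 0.3, size=(2 * p, 5)).astype(float)
family = np.repeat(rng.normal(scale=3.0, size=p), 2)
Y = (0.5 * G[:, 0] + family + rng.normal(size=2 * p))[:, None]
b = sib_difference_slopes(G, Y)
```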
- class xftsim.stats.Statistic(estimator, parser, name, metadata={}, filter_sample=False, s_args=None)
Bases: object
Base class for defining statistic estimators.
- name
The name of the statistic.
- Type:
str
- estimator
The function that estimates the statistic.
- Type:
Callable
- metadata
Any additional metadata.
- Type:
Dict
- filter_sample
If True, apply the global filter prior to estimation.
- Type:
bool
- estimate(sim: xft.sim.Simulation) → None
Estimate the statistic and update the results.
- update_results(sim: xft.sim.Simulation, results: object) → None
Update the simulation’s results_store with the estimated results.
- estimate(sim=None, **kwargs)
- static null_parser(self, *args, **kwargs)
- parse_results(sim)
- update_results(sim, results)
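The Statistic base class pairs a parser (which extracts inputs from a Simulation) with an estimator and a name under which results are stored. The class below is a simplified, hypothetical stand-in for that division of labor, not xftsim's code: it omits filter_sample and s_args, and storing results in a dict attribute sim.results keyed by name is an assumption for illustration.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Optional


class MiniStatistic:
    """Hypothetical sketch of the parser -> estimator -> results pattern."""

    def __init__(self, estimator: Callable[..., Any],
                 parser: Callable[[Any], Dict[str, Any]],
                 name: str, metadata: Optional[Dict[str, Any]] = None):
        self.estimator = estimator
        self.parser = parser
        self.name = name
        self.metadata = {} if metadata is None else metadata

    def estimate(self, sim) -> None:
        # The parser pulls keyword arguments for the estimator out of the simulation.
        results = self.estimator(**self.parser(sim))
        self.update_results(sim, results)

    def update_results(self, sim, results) -> None:
        sim.results[self.name] = results


# Usage with a stand-in simulation object.
sim = SimpleNamespace(phenotypes=[1.0, 2.0, 3.0], results={})
stat = MiniStatistic(
    estimator=lambda phenotypes: sum(phenotypes) / len(phenotypes),
    parser=lambda s: {"phenotypes": s.phenotypes},
    name="mean",
)
stat.estimate(sim)
```

The sketch uses metadata=None rather than a {} default so that instances do not share one mutable dictionary.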
- xftsim.stats.apply_threshold_PGS(estimates, G, thresholds=array([5.00000000e-08, 1.03849902e-07, 2.15696043e-07, 4.48000259e-07, 9.30495659e-07, 1.93263766e-06, 4.01408463e-06, 8.33724592e-06, 1.73164434e-05, 3.59662191e-05, 7.47017665e-05, 1.55155423e-04, 3.22257509e-04, 6.69328214e-04, 1.39019339e-03, 2.88742895e-03, 5.99718426e-03, 1.24561400e-02, 2.58713783e-02, 5.37348020e-02, 1.11607078e-01, 2.31807683e-01, 4.81464104e-01, 1.00000000e+00]))
- xftsim.stats.apply_threshold_PGS_all(gwas_results, G, minp=5e-08, maxp=1, nthresh=25)
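The default thresholds printed in the apply_threshold_PGS signature form a geometric (log-spaced) grid of 24 values from 5e-8 to 1, which np.geomspace reproduces. Whether apply_threshold_PGS_all builds its grid from minp, maxp, and nthresh in the same way is an assumption (note that its default nthresh=25 differs from the 24 values shown), and the helper name threshold_grid is hypothetical.

```python
import numpy as np


def threshold_grid(minp=5e-8, maxp=1.0, nthresh=24):
    """Hypothetical helper: log-spaced p-value thresholds, both endpoints included."""
    return np.geomspace(minp, maxp, nthresh)


grid = threshold_grid()
```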
- xftsim.stats.haseman_elston(G, Y, n_probe=500, dtype=<class 'numpy.float32'>, dask=False)
Perform Haseman-Elston regression, with the option to choose randomized, deterministic, or randomized dask-based methods.
- Parameters:
G (np.ndarray) – A 2D numpy array representing standardized (but not scaled) diploid genotypes.
Y (np.ndarray) – A 2D numpy array representing standardized phenotypes.
n_probe (int, optional, default 500) – The number of random probes for trace estimation. If n_probe is set to inf, use the deterministic method.
dtype (numpy data type, optional, default np.float32) – The data type for the input arrays.
dask (bool, optional, default False) – If True, use dask for calculations.
- Returns:
np.ndarray – A 2D numpy array representing the estimated genetic covariances.
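For small samples, Haseman-Elston estimates can be computed exactly by forming the genetic relatedness matrix K = GGᵀ/m and regressing products of standardized phenotypes on the off-diagonal entries of K without an intercept: for traits a and b, the estimate is Σ_{i≠j} K_ij y_ia y_jb / Σ_{i≠j} K_ij². The sketch below assumes this no-intercept form and reads "standardized (but not scaled)" as column-standardized G not divided by √m. It resembles the deterministic case in spirit but is not xftsim's code, and the function name is hypothetical.

```python
import numpy as np


def haseman_elston_exact(G, Y):
    """Hypothetical sketch: deterministic, no-intercept Haseman-Elston estimates.

    G: (n, m) column-standardized genotypes; Y: (n, k) standardized phenotypes.
    Returns a (k, k) matrix of genetic (co)variance estimates.
    """
    G = np.asarray(G, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m = G.shape[1]
    K = G @ G.T / m                       # GRM, n x n (practical for small n only)
    K_off = K - np.diag(np.diag(K))       # drop the i == j terms
    denom = (K_off ** 2).sum()            # sum over i != j of K_ij^2
    return Y.T @ K_off @ Y / denom


# Usage: one trait with SNP heritability 0.5 from 500 independent variants.
rng = np.random.default_rng(5)
n, m = 1000, 500
G = rng.binomial(2, 0.5, size=(n, m)).astype(float)
G = (G - G.mean(axis=0)) / G.std(axis=0)
beta = rng.normal(scale=np.sqrt(0.5 / m), size=m)
y = G @ beta + rng.normal(scale=np.sqrt(0.5), size=n)
Y = ((y - y.mean()) / y.std())[:, None]
est = haseman_elston_exact(G, Y)
```

Forming K costs O(n²m) time and O(n²) memory, which is why a randomized trace estimator becomes attractive as n grows.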
- xftsim.stats.threshold_PGS(estimates, threshold, G)
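A p-value-thresholded polygenic score sums each individual's genotype dosages weighted by GWAS slopes, keeping only variants whose p-value passes the threshold. The layout of the estimates argument is not documented on this page, so the sketch assumes separate slope and p-value vectors, a strict "<" comparison, and a hypothetical function name.

```python
import numpy as np


def threshold_pgs_sketch(slopes, pvalues, threshold, G):
    """Hypothetical sketch: P+T polygenic score without clumping.

    slopes, pvalues: (m,) per-variant GWAS results; G: (n, m) genotypes.
    Returns an (n,) vector of scores.
    """
    slopes = np.asarray(slopes, dtype=float)
    # Zero out the weight of every variant that does not pass the threshold.
    weights = np.where(np.asarray(pvalues) < threshold, slopes, 0.0)
    return np.asarray(G, dtype=float) @ weights


# Usage: only variants 0 and 2 pass p < 0.01.
G = np.array([[0, 1, 2], [2, 1, 0]])
scores = threshold_pgs_sketch([1.0, 2.0, 3.0], [1e-9, 0.5, 1e-3], 0.01, G)
```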