xftsim package¶
Submodules¶
xftsim.arch module¶
- class xftsim.arch.AdditiveGeneticComponent(beta=None, metadata={}, component_name='addGenetic')¶
Bases:
ArchitectureComponent
A genetic component with additive effects.
- Parameters:
beta (xft.effect.AdditiveEffects, optional) – Additive effects, by default None.
metadata (Dict, optional) – Additional metadata, by default an empty dictionary.
- effects¶
Additive effects.
- Type:
xft.effect.AdditiveEffects
- compute_component(haplotypes, phenotypes)¶
Compute the additive genetic component of the phenotype.
- Parameters:
haplotypes (xr.DataArray) – Haplotypes to be used in the computation.
phenotypes (xr.DataArray) – Phenotypes to be modified.
- property true_cov_beta¶
Compute the covariance matrix of the additive effects.
- Returns:
ndarray
– Covariance matrix of the additive effects.
- property true_rho_beta¶
Compute the correlation coefficient matrix of the additive effects.
- Returns:
ndarray
– Correlation coefficient matrix of the additive effects.
- class xftsim.arch.AdditiveNoiseComponent(variances=None, sds=None, means=None, phenotype_name=None, component_index=None, component_name='addNoise')¶
Bases:
ArchitectureComponent
An independent Gaussian noise component.
- Parameters:
variances (Iterable, optional) – Variances of the noise components, by default None.
sds (Iterable, optional) – Standard deviations of the noise components, by default None.
means (Iterable, optional) – Means of the noise components, by default set to zero.
phenotype_name (Iterable, optional) – Names of the phenotypes, by default None. Included for backwards compatibility. Do not specify if providing component_index.
component_index (xftsim.index.ComponentIndex, optional) – Alternatively, provide the output component index.
- variances¶
Variances of the noise components.
- Type:
ndarray
- sds¶
Standard deviations of the noise components.
- Type:
ndarray
- compute_component(haplotypes, phenotypes)¶
Compute the noise component of the phenotype.
- Parameters:
haplotypes (xr.DataArray) – Haplotypes, not used in the computation.
phenotypes (xr.DataArray) – Phenotypes to be modified.
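The relationship between the variances, sds and means arguments can be illustrated with a minimal standard-library sketch. It does not call xftsim; the helper name draw_independent_noise and the seed argument are hypothetical, and the real component writes into an xr.DataArray rather than returning lists.

```python
import math
import random
import statistics


def draw_independent_noise(n, variances=None, sds=None, means=None, seed=0):
    """Draw n rows of independent Gaussian noise, one column per component."""
    if sds is None:
        if variances is None:
            raise ValueError("provide either variances or sds")
        # A standard deviation is the square root of the variance.
        sds = [math.sqrt(v) for v in variances]
    if means is None:
        means = [0.0] * len(sds)  # zero means by default, as in the docstring
    rng = random.Random(seed)
    return [[rng.gauss(mu, sd) for mu, sd in zip(means, sds)] for _ in range(n)]


noise = draw_independent_noise(5000, variances=[1.0, 4.0])
# Empirical variance of the second column; it should be close to 4.0.
second_column_variance = statistics.pvariance([row[1] for row in noise])
```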
- class xftsim.arch.Architecture(components=None, metadata={}, depth=1, expand_components=False)¶
Bases:
object
Class representing a phenogenetic architecture.
- Parameters:
components (Iterable, optional) – An iterable collection of ArchitectureComponent objects.
metadata (Dict, optional) – A dictionary for holding metadata about the Architecture object.
depth (int, optional) – The generational depth of the architecture, by default 1.
expand_components (bool, optional) – A boolean flag indicating whether to expand the components, by default False.
- metadata¶
A dictionary for holding metadata about the Architecture object
- Type:
Dict
- components¶
An iterable collection of ArchitectureComponent objects
- Type:
Iterable
- depth¶
The depth of the architecture
- Type:
int
- expand_components¶
A boolean flag indicating whether to expand the components
- Type:
bool
- founder_initializations() -> List: ¶
Get a list of the founder initialization of each component
- merged_component_indexer() -> xft.index.ComponentIndex: ¶
Get the merged component indexer
- initialize_phenotype_array(haplotypes: xr.DataArray, control: dict = None) -> xr.DataArray: ¶
Initialize a new phenotype array
- initialize_founder_phenotype_array(haplotypes: xr.DataArray, control: dict = None) -> xr.DataArray: ¶
Initialize a new founder phenotype array
- compute_phenotypes(haplotypes: xr.DataArray = None, phenotypes: xr.DataArray = None, control: dict = None) -> None: ¶
Compute phenotypes for the given haplotypes and phenotypes
- check_dependencies()¶
- compute_phenotypes(haplotypes=None, phenotypes=None, control=None)¶
Compute phenotypes.
- Parameters:
haplotypes (xr.DataArray, optional) – Input haplotypes.
phenotypes (xr.DataArray, optional) – Input phenotypes.
control (dict, optional) – Dictionary containing control parameters.
- property dependency_graph¶
- property dependency_graph_edges¶
- draw_dependency_graph(node_color='none', node_size=1200, font_size=5, margins=0.1, edge_color='#222222', arrowsize=6, number_edges=True, **kwargs)¶
- property founder_initializations¶
Get a list of the founder initialization of each component
- initialize_founder_phenotype_array(haplotypes, control=None)¶
Initialize a founder generation phenotype array from haplotypes under the specified architecture. In the absence of vertical transmission, this is equivalent to initialize_phenotype_array().
- Parameters:
haplotypes (xr.DataArray) – Input haplotypes.
control (dict, optional) – Dictionary containing control parameters.
- Returns:
xr.DataArray
– Phenotype array with the merged component indexer and sample indexer.
- initialize_phenotype_array(haplotypes, control=None)¶
Initialize a phenotype array from haplotypes under the specified architecture.
- Parameters:
haplotypes (xr.DataArray) – Input haplotypes.
control (dict, optional) – Dictionary containing control parameters.
- Returns:
xr.DataArray
– Phenotype array with the merged component indexer and sample indexer.
- property merged_component_indexer¶
Get the merged ComponentIndex indexer across all architecture components
- class xftsim.arch.ArchitectureComponent(compute_component=None, input_cindex=None, output_cindex=None, input_haplotypes=False, founder_initialization=None, component_name='generic')¶
Bases:
object
Class representing a component of a genetic architecture.
- Parameters:
compute_component (Callable, optional) – Function that accesses haplotypes and/or phenotypes and modifies phenotypes by reference, by default None.
input_cindex (xft.index.ComponentIndex, optional) – Index of the input component, by default None.
output_cindex (xft.index.ComponentIndex, optional) – Index of the output component, by default None.
input_haplotypes (bool or xft.index.HaploidVariantIndex, optional) – Boolean or HaploidVariantIndex indicating if input haplotypes are used, by default False.
founder_initialization (Callable, optional) – Function that initializes founder haplotypes for the component, by default None.
- _compute_component¶
Function that accesses haplotypes and/or phenotypes and modifies phenotypes by reference.
- Type:
Callable or None
- input_haplotypes¶
Boolean or HaploidVariantIndex indicating if input haplotypes are used.
- Type:
bool or xft.index.HaploidVariantIndex
- input_cindex¶
Index of the input component.
- Type:
xft.index.ComponentIndex
- output_cindex¶
Index of the output component.
- Type:
xft.index.ComponentIndex
- founder_initialization¶
Function that initializes founder haplotypes for the component.
- Type:
Callable or None
- property component_name¶
- compute_component(haplotypes=None, phenotypes=None)¶
Function that accesses haplotypes and/or phenotypes and modifies phenotypes by reference.
- Parameters:
haplotypes (xr.DataArray, optional) – Haplotypes to be accessed, by default None.
phenotypes (xr.DataArray, optional) – Phenotypes to be accessed and modified, by default None.
- static default_input_cindex(*args, **kwargs)¶
Static method to define the default input component index.
- static default_output_cindex(*args, **kwargs)¶
Static method to define the default output component index.
- property dependency_graph¶
- property dependency_graph_edges¶
- draw_dependency_graph(node_color='none', node_size=1500, arrowsize=7, font_size=6, margins=0.1, **kwargs)¶
- property input_component_name¶
- property input_phenotype_name¶
- property input_vorigin_relative¶
- property merged_phenotype_indexer¶
- property output_component_name¶
- property output_phenotype_name¶
- property output_vorigin_relative¶
- property phenotype_name¶
- property vorigin_relative¶
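The phrase "modifies phenotypes by reference" means that a component's function returns nothing and instead writes its result into the shared phenotype container. A minimal sketch of that calling pattern follows, using plain dictionaries in place of xr.DataArray objects. The functions genetic_component and outcome_component are hypothetical stand-ins, not xftsim API.

```python
def genetic_component(haplotypes, phenotypes):
    # Toy genetic component: count the alleles carried by each individual.
    phenotypes["genetic"] = [float(sum(row)) for row in haplotypes]


def outcome_component(haplotypes, phenotypes):
    # Depends on the output of genetic_component, so it must run afterwards.
    phenotypes["phenotype"] = [g + 1.0 for g in phenotypes["genetic"]]


haplotypes = [[0, 1, 1, 0], [1, 1, 1, 1]]
phenotypes = {}  # shared container that every component mutates in place
return_values = []
for component in (genetic_component, outcome_component):
    return_values.append(component(haplotypes, phenotypes))
```

Because each component only mutates the shared container, the order of evaluation matters, which is presumably why the classes expose dependency graph properties.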
- class xftsim.arch.BinarizingTransformation(thresholds, input_cindex, output_cindex, component_name='binarize')¶
Bases:
ArchitectureComponent
An architecture component that binarizes specified phenotypes based on specified thresholds under a liability-threshold model.
Attributes:¶
- thresholds (Iterable)
A list or array of thresholds used for binarization.
- input_cindex (xft.index.ComponentIndex)
The input component index.
- output_cindex (xft.index.ComponentIndex)
The output component index.
- phenotype_name (Iterable)
The name of the phenotype.
- liability_component (str)
The liability component to be used. Default is 'phenotype'.
- vorigin_relative (Iterable)
The relative V origin. Default is [-1].
- output_component (str)
The name of the output component. Default is 'binary_phenotype'.
Methods:¶
- construct_input_cindex(phenotype_name: Iterable, liability_component: str = 'phenotype', vorigin_relative: Iterable = [-1]) -> xft.index.ComponentIndex
Constructs the input component index based on given phenotype names.
- construct_output_cindex(phenotype_name: Iterable, output_component: str = 'binary_phenotype', vorigin_relative: Iterable = [-1]) -> xft.index.ComponentIndex
Constructs the output component index based on given phenotype names.
- construct_cindexes(phenotype_name: Iterable, liability_component: str = 'phenotype', output_component: str = 'binary_phenotype', vorigin_relative: Iterable = [-1]) -> Tuple[xft.index.ComponentIndex, xft.index.ComponentIndex]
Constructs both the input and output component indexes based on given phenotype names.
- compute_component(self, haplotypes: xr.DataArray, phenotypes: xr.DataArray) -> None
Computes the binary phenotype based on the given thresholds.
- compute_component(haplotypes, phenotypes)¶
Computes the binarizing transformation.
- Parameters:
haplotypes (xr.DataArray) – The haplotypes.
phenotypes (xr.DataArray) – The phenotypes.
- static construct_cindexes(phenotype_name, liability_component='phenotype', output_component='binary_phenotype', vorigin_relative=[-1])¶
Constructs both input and output component indexes for the binarizing transformation.
- Parameters:
phenotype_name (Iterable) – Names of the phenotypes.
liability_component (str, optional) – Name of the liability component. Default is “phenotype”.
output_component (str, optional) – Name of the output component. Default is “binary_phenotype”.
vorigin_relative (Iterable, optional) – Relative origin of each component (-1 for proband, 0 for mother, 1 for father). Default is [-1].
- Returns:
Tuple[xft.index.ComponentIndex, xft.index.ComponentIndex]
– The input and output component indexes.
- static construct_input_cindex(phenotype_name, liability_component='phenotype', vorigin_relative=[-1])¶
Constructs the input component index for the binarizing transformation.
- Parameters:
phenotype_name (Iterable) – Names of the phenotypes.
liability_component (str, optional) – Name of the liability component. Default is “phenotype”.
vorigin_relative (Iterable, optional) – Relative origin of each component (-1 for proband, 0 for mother, 1 for father). Default is [-1].
- Returns:
xft.index.ComponentIndex
– The input component index.
- static construct_output_cindex(phenotype_name, output_component='binary_phenotype', vorigin_relative=[-1])¶
Constructs the output component index for the binarizing transformation.
- Parameters:
phenotype_name (Iterable) – Names of the phenotypes.
output_component (str, optional) – Name of the output component. Default is “binary_phenotype”.
vorigin_relative (Iterable, optional) – Relative origin of each component (-1 for proband, 0 for mother, 1 for father). Default is [-1].
- Returns:
xft.index.ComponentIndex
– The output component index.
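Under a liability-threshold model, a continuous liability is mapped to a binary outcome by comparing it with a threshold. The sketch below shows that mapping with one threshold per phenotype column. It is an illustration only: the helper binarize is hypothetical, and the strict greater-than comparison is an assumption, since the text above does not say how values exactly at the threshold are treated.

```python
def binarize(liabilities, thresholds):
    """Map liabilities to 0/1 outcomes.

    liabilities: one row per individual, one column per phenotype.
    thresholds: one threshold per phenotype column.
    """
    # Assumption: an outcome is 1 when the liability strictly exceeds the threshold.
    return [
        [1 if value > threshold else 0 for value, threshold in zip(row, thresholds)]
        for row in liabilities
    ]


liabilities = [[-0.5, 2.0], [1.2, 0.3], [0.1, 1.5]]
binary = binarize(liabilities, thresholds=[0.0, 1.0])
```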
- class xftsim.arch.ConstantFounderInitialization(component_index=None, constants=None)¶
Bases:
FounderInitialization
Founder initialization that sets all haplotypes to constant values.
Bases:
ArchitectureComponent
Multivariate Gaussian noise component.
- Parameters:
vcov (ndarray, optional) – Variance-covariance matrix.
means (Iterable, optional) – Means of the noise components, by default set to zero.
phenotype_name (Iterable, optional) – Names of the phenotypes, by default None. Included for backwards compatibility. Do not specify if providing component_index.
component_index (xftsim.index.ComponentIndex, optional) – Alternatively, provide the output component index.
Compute the noise component of the phenotype.
- Parameters:
haplotypes (xr.DataArray) – Haplotypes, not used in the computation.
phenotypes (xr.DataArray) – Phenotypes to be modified.
- class xftsim.arch.FounderInitialization(component_index=None, initialize_component=None)¶
Bases:
object
Base class for founder initialization.
- initialize_component(phenotypes)¶
Initialize founder haplotypes for a single phenotype component.
- Parameters:
phenotypes (xr.DataArray) – Phenotypes for a single phenotype component.
- Raises:
Warning – If no initialization method is defined.
- class xftsim.arch.GCTA_Architecture(h2, Rg=None, phenotype_name=None, variant_indexer=None, haplotypes=None)¶
Bases:
Architecture
Additive genetic architecture object under the GCTA infinitesimal model <CITE>
Under this genetic architecture, all variants are causal, and standardized genetic variants / sqrt(m) have the user-specified (possibly diagonal) genetic correlation matrix and variance equal to h2.
- Parameters:
h2 (Iterable) – Vector of genetic variances or genetic variance/covariance matrix.
Rg (numpy.ndarray) – Optional genetic correlation matrix.
phenotype_name (Iterable) – Optional names of phenotypes.
variant_indexer (xft.index.HaploidVariantIndex | xft.index.DiploidVariantIndex) – Variant indexer, will determine ploidy automatically. Phenotype component indexer, defaults to xft.index.ComponentIndex.RangeIndex if not provided.
haplotypes (xr.DataArray) – Alternatively, one can simply provide haplotypes instead of the variant indexer. Ignored if variant_indexer is supplied.
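One way to read the infinitesimal model described above is that every variant receives an effect drawn with variance h2, and the genetic value is the sum of effect times standardized genotype, divided by sqrt(m). The single-trait sketch below checks numerically that this yields a genetic variance near h2. It is an interpretation under stated assumptions, not the library's implementation: the function name, the common allele frequency p and the independent-variant simulation are all hypothetical.

```python
import math
import random
import statistics


def simulate_genetic_values(n, m, h2, p=0.5, seed=1):
    """Simulate genetic values for n individuals and m independent variants."""
    rng = random.Random(seed)
    # Every variant is causal; effects have variance h2.
    beta = [rng.gauss(0.0, math.sqrt(h2)) for _ in range(m)]
    sd_genotype = math.sqrt(2 * p * (1 - p))  # genotype SD under Hardy-Weinberg
    values = []
    for _ in range(n):
        total = 0.0
        for b in beta:
            genotype = (rng.random() < p) + (rng.random() < p)  # 0, 1 or 2
            total += b * (genotype - 2 * p) / sd_genotype
        values.append(total / math.sqrt(m))  # the 1 / sqrt(m) scaling
    return values


genetic_values = simulate_genetic_values(n=1000, m=500, h2=0.5)
genetic_variance = statistics.pvariance(genetic_values)  # expected to be near 0.5
```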
- class xftsim.arch.GaussianFounderInitialization(component_index=None, variances=None, sds=None, means=None)¶
Bases:
FounderInitialization
A class for initializing founder haplotypes by drawing iid samples from normal distributions with the specified means and standard deviations.
- Parameters:
component_index (xft.index.ComponentIndex, optional) – A ComponentIndex object containing the indexing information of the components. If not provided, then the initialization will be null.
variances (Iterable, optional) – An iterable object of length k_total specifying the variances of the Gaussian distribution. Either variances or sds must be provided.
sds (Iterable, optional) – An iterable object of length k_total specifying the standard deviations of the Gaussian distribution. Either variances or sds must be provided.
means (Iterable, optional) – An iterable object of length k_total specifying the means of the Gaussian distribution. If not provided, then the means will be set to 0.
- Raises:
AssertionError – If neither variances nor sds is provided or if the length of component_index does not match the length of sds.
- sds¶
An array of standard deviations.
- Type:
numpy.ndarray
- means¶
An array of means.
- Type:
numpy.ndarray
- component_index¶
An object containing the indexing information of the components.
- Type:
xft.index.ComponentIndex
- class xftsim.arch.HorizontalComponent(input_cindex, output_cindex, coefficient_matrix=None, normalize=True, component_name='linHoriz')¶
- class xftsim.arch.InfinitessimalArchitecture¶
Bases:
object
- class xftsim.arch.LinearTransformationComponent(input_cindex=None, output_cindex=None, coefficient_matrix=None, normalize=True, founder_initialization=None, component_name='linear')¶
Bases:
ArchitectureComponent
A linear transformation component. Maps input phenotypes to output phenotypes using linear map represented by coefficient_matrix.
- Parameters:
input_cindex (xft.index.ComponentIndex, optional) – Input component index, by default None.
output_cindex (xft.index.ComponentIndex, optional) – Output component index, by default None.
coefficient_matrix (ndarray, optional) – Coefficient matrix, by default None.
normalize (bool, optional) – If True, normalize the input by subtracting the mean and dividing by the standard deviation, by default True.
founder_initialization (FounderInitialization, optional) – Founder initialization, by default None.
- v_input_dimension¶
Input dimension.
- Type:
int
- v_output_dimension¶
Output dimension.
- Type:
int
- normalize¶
If True, normalize the input by subtracting the mean and dividing by the standard deviation.
- Type:
bool
- coefficient_matrix¶
Coefficient matrix.
- Type:
ndarray
- compute_component(haplotypes, phenotypes)¶
Compute the linear transformation component of the phenotype.
- Parameters:
haplotypes (xr.DataArray) – Haplotypes, not used in the computation.
phenotypes (xr.DataArray) – Phenotypes to be modified.
- property linear_transformation¶
Get the linear transformation matrix.
- Returns:
pd.DataFrame
– Linear transformation matrix.
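The normalize-then-multiply behaviour described for this component can be sketched in plain Python. The helpers below are hypothetical; the matrix layout (rows index inputs, columns index outputs) and the use of the population standard deviation are assumptions that the text above does not settle.

```python
import statistics


def standardize_columns(rows):
    """Subtract each column's mean and divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]  # assumption: population SD
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, sds)] for row in rows]


def linear_transform(rows, coefficient_matrix, normalize=True):
    """Apply a linear map; coefficient_matrix[i][j] maps input i to output j."""
    x = standardize_columns(rows) if normalize else rows
    n_outputs = len(coefficient_matrix[0])
    return [
        [sum(xi * coefficient_matrix[i][j] for i, xi in enumerate(row))
         for j in range(n_outputs)]
        for row in x
    ]


inputs = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
coefficients = [[0.5], [0.5]]  # two inputs averaged into one output
outputs = linear_transform(inputs, coefficients)
raw_outputs = linear_transform(inputs, coefficients, normalize=False)
```

With normalize=True both input columns are on the same scale before they are combined, so the coefficients act on standardized units rather than on the raw measurement units.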
- class xftsim.arch.LinearVerticalComponent(input_cindex=None, output_cindex=None, coefficient_matrix=None, normalize=True, founder_variances=None, founder_initialization=None, component_name='linVert')¶
Bases:
LinearTransformationComponent
A vertical transmission component. Requires a way to generate “transmitted” components in the founder generation.
- Parameters:
input_cindex (xft.index.ComponentIndex, optional) – Input component index, by default None.
output_cindex (xft.index.ComponentIndex, optional) – Output component index, by default None.
coefficient_matrix (ndarray, optional) – Coefficient matrix, by default None.
normalize (bool, optional) – If True, normalize the input by subtracting the mean and dividing by the standard deviation, by default True.
founder_variances (Iterable, optional) – Variances of the founders, by default None.
founder_initialization (FounderInitialization, optional) – Founder initialization, by default None.
- v_input_dimension¶
Input dimension.
- Type:
int
- v_output_dimension¶
Output dimension.
- Type:
int
- normalize¶
If True, normalize the input by subtracting the mean and dividing by the standard deviation.
- Type:
bool
- coefficient_matrix¶
Coefficient matrix.
- Type:
ndarray
- class xftsim.arch.ProductComponent(input_cindex, output_cindex, output_coef=1.0, coefficient_vector=None, mean_deviate=True, normalize=False)¶
Bases:
ArchitectureComponent
Multiplies existing components
- Parameters:
input_cindex (xft.index.ComponentIndex) – Index of components to multiply.
output_cindex (xft.index.ComponentIndex) – Output component index.
output_coef (float, optional) – Coefficient to multiply output by, by default 1.0.
coefficient_vector (ndarray, optional) – Coefficients to premultiply inputs by, by default all ones.
mean_deviate (bool, optional) – If True, mean deviate the inputs by subtracting the mean. Defaults to True.
normalize (bool, optional) – If True, normalize the inputs by subtracting the mean and dividing by the standard deviation prior to multiplying. Defaults to False.
- compute_component(haplotypes, phenotypes)¶
Compute the product component of the phenotype.
- Parameters:
haplotypes (xr.DataArray) – Haplotypes, not used in the computation.
phenotypes (xr.DataArray) – Phenotypes to be modified.
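A product of mean-deviated inputs is a common way to build an interaction term. The sketch below follows the parameter descriptions above for output_coef, coefficient_vector and mean_deviate, but the helper product_component is hypothetical and omits the normalize option.

```python
import statistics


def product_component(rows, output_coef=1.0, coefficient_vector=None, mean_deviate=True):
    """Multiply the input columns together, row by row."""
    columns = list(zip(*rows))
    if coefficient_vector is None:
        coefficient_vector = [1.0] * len(columns)  # all ones by default
    # Column means are subtracted only when mean_deviate is True.
    means = [statistics.mean(c) if mean_deviate else 0.0 for c in columns]
    products = []
    for row in rows:
        value = output_coef
        for x, mu, coef in zip(row, means, coefficient_vector):
            value *= coef * (x - mu)
        products.append(value)
    return products


rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 9.0]]
interaction = product_component(rows)
uncentered = product_component(rows, mean_deviate=False)
```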
- class xftsim.arch.SpikeSlabArchitecture¶
Bases:
object
- class xftsim.arch.SumAllTransformation(input_cindex, output_component_name='phenotype', output_comp_type='outcome', component_name='sumAll')¶
Bases:
ArchitectureComponent
Sum all intermediate phenotype components to generate outcome phenotype components.
- Parameters:
input_cindex (xft.index.ComponentIndex) – Input component index.
- input_haplotypes¶
If True, haplotypes are input.
- Type:
bool
- input_cindex¶
Input component index.
- Type:
xft.index.ComponentIndex
- output_cindex¶
Output component index.
- Type:
xft.index.ComponentIndex
- founder_initialization¶
Founder initialization.
- Type:
None
- compute_component(haplotypes, phenotypes)¶
Compute the sum of the input components and assign them to the output component.
Parameters:¶
- haplotypes (xr.DataArray)
Haplotypes.
- phenotypes (xr.DataArray)
Phenotypes.
Returns:¶
None
- static construct_input_cindex(phenotype_name, sum_components=['additiveGenetic', 'additiveNoise'], vorigin_relative=[-1])¶
Construct input component index.
- Parameters:
phenotype_name (Iterable) – Phenotype name.
sum_components (Iterable, optional) – Components to sum, by default [‘additiveGenetic’, ‘additiveNoise’].
vorigin_relative (Iterable, optional) – Relative vorigins, by default [-1].
- Returns:
xft.index.ComponentIndex
– Component index.
- class xftsim.arch.SumTransformation(input_cindex, output_cindex, component_name='sumTrans')¶
Bases:
ArchitectureComponent
Sum components to generate phenotypes.
- Parameters:
input_cindex (xft.index.ComponentIndex) – Input component index.
output_cindex (xft.index.ComponentIndex) – Output component index.
- input_haplotypes¶
If True, haplotypes are input.
- Type:
bool
- input_cindex¶
Input component index.
- Type:
xft.index.ComponentIndex
- output_cindex¶
Output component index.
- Type:
xft.index.ComponentIndex
- founder_initialization¶
Founder initialization.
- Type:
None
- compute_component(haplotypes, phenotypes)¶
Compute the sum of the input components and assign them to the output component.
Parameters:¶
- haplotypes (xr.DataArray)
Haplotypes.
- phenotypes (xr.DataArray)
Phenotypes.
Returns:¶
None
- static construct_cindexes(phenotype_name, sum_components=['additiveGenetic', 'additiveNoise'], vorigin_relative=[-1], output_component='phenotype', comp_type='outcome')¶
Construct input and output ComponentIndex objects for SumTransformation.
Parameters:¶
- phenotype_name (Iterable)
Names of the phenotypes.
- sum_components (Iterable, optional; default=["additiveGenetic", "additiveNoise"])
Names of the components to be summed.
- vorigin_relative (Iterable, optional; default=[-1])
Relative origin of the component with respect to the phenotype.
- output_component (str, optional; default="phenotype")
Name of the output component.
Returns:¶
- Tuple[xft.index.ComponentIndex, xft.index.ComponentIndex]:
A tuple containing input and output ComponentIndex objects.
- static construct_input_cindex(phenotype_name, sum_components=['additiveGenetic', 'additiveNoise'], vorigin_relative=[-1])¶
Construct input component index.
- Parameters:
phenotype_name (Iterable) – Phenotype name.
sum_components (Iterable, optional) – Components to sum, by default [‘additiveGenetic’, ‘additiveNoise’].
vorigin_relative (Iterable, optional) – Relative vorigins, by default [-1].
- Returns:
xft.index.ComponentIndex
– Component index.
- static construct_output_cindex(phenotype_name, sum_components=['additiveGenetic', 'additiveNoise'], vorigin_relative=[-1], comp_type='outcome', output_name='phenotype')¶
Construct output component index.
- Parameters:
phenotype_name (Iterable) – Phenotype name.
sum_components (Iterable, optional) – Components to sum, by default [‘additiveGenetic’, ‘additiveNoise’].
vorigin_relative (Iterable, optional) – Relative vorigins, by default [-1].
output_name (str, optional) – Output name, by default ‘phenotype’.
- Returns:
xft.index.ComponentIndex
– Component index.
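Summing named components into an outcome can be pictured with a dictionary keyed by (phenotype name, component name). The sketch below uses the default component names from the signatures above; the data layout and the helper sum_named_components are hypothetical and stand in for the xr.DataArray indexing that the library performs.

```python
def sum_named_components(components, phenotype_names,
                         names_to_sum=("additiveGenetic", "additiveNoise"),
                         output_component="phenotype"):
    """Add the listed components element-wise for each phenotype."""
    outcome = {}
    for name in phenotype_names:
        columns = [components[(name, component)] for component in names_to_sum]
        # zip(*columns) walks the individuals; sum adds across components.
        outcome[(name, output_component)] = [sum(values) for values in zip(*columns)]
    return outcome


components = {
    ("height", "additiveGenetic"): [0.25, -0.5, 1.0],
    ("height", "additiveNoise"): [0.5, 0.25, -1.0],
    ("bmi", "additiveGenetic"): [1.0, 0.0, -1.0],
    ("bmi", "additiveNoise"): [0.5, 0.5, 0.5],
}
outcome = sum_named_components(components, ["height", "bmi"])
```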
- xftsim.arch.VerticalComponent¶
alias of
LinearVerticalComponent
- class xftsim.arch.ZeroFounderInitialization(component_index=None)¶
Bases:
ConstantFounderInitialization
Founder initialization that sets all haplotypes to zero.
xftsim.data module¶
xftsim.effect module¶
Summary
- class xftsim.effect.AdditiveEffects(beta, variant_indexer=None, component_indexer=None, standardized=True, scaled=True)¶
Bases:
object
Additive genetic effects object. Given matrix / vector of effects will provide various scalings / offsets for computation
- Parameters:
beta (NDArray[Any, Any]) – Vector of diploid effects.
variant_indexer (xft.index.HaploidVariantIndex | xft.index.DiploidVariantIndex) – Variant indexer, will determine ploidy automatically.
component_indexer (xft.index.ComponentIndex, optional) – Phenotype component indexer, defaults to xft.index.ComponentIndex.RangeIndex if not provided.
standardized (bool, optional) – True implies these are effects of standardized variants, by default True.
scaled (bool, optional) – True implies these are effects of variants * sqrt(m_causal), by default True.
- AF¶
diploid allele frequencies
- Type:
NDArray
- beta_scaled_standardized_diploid¶
Diploid effects of standardized variants, scaled by the number of causal variants per phenotype
- Type:
NDArray
- beta_scaled_standardized_haploid¶
haploid variant of above
- Type:
NDArray
- beta_scaled_unstandardized_diploid¶
Diploid effects of unstandardized variants, scaled by the number of causal variants per phenotype
- Type:
NDArray
- beta_scaled_unstandardized_haploid¶
haploid variant of above
- Type:
NDArray
- beta_unscaled_standardized_diploid¶
Diploid effects of standardized variants, not scaled by the number of causal variants per phenotype
- Type:
NDArray
- beta_unscaled_standardized_haploid¶
haploid variant of above
- Type:
NDArray
- beta_unscaled_unstandardized_diploid¶
Diploid effects of unstandardized variants, not scaled by the number of causal variants. Multiply these against (0,1,2) raw genotypes and subtract offset to obtain phenotypes
- Type:
NDArray
- beta_unscaled_unstandardized_haploid¶
Haploid variant of above
- Type:
NDArray
- beta_raw_diploid¶
Alias for beta_unscaled_unstandardized_diploid
- Type:
NDArray
- beta_raw_haploid¶
Alias for beta_unscaled_unstandardized_haploid
- Type:
NDArray
- component_indexer¶
- Type:
xft.index.ComponentIndex
- k¶
Number of phenotypes (columns of effect matrix)
- Type:
int
- m¶
Number of diploid variants
- Type:
int
- offset¶
To compute phenotypes, add offset after multiplying by beta_raw_* to mean deviate under HWE
- Type:
NDArray
- variant_indexer¶
- Type:
xft.index.HaploidVariantIndex
- property beta_raw_diploid¶
- property beta_raw_haploid¶
- property beta_scaled_standardized_haploid¶
- property beta_scaled_unstandardized_haploid¶
- property beta_unscaled_standardized_haploid¶
- property beta_unscaled_unstandardized_haploid¶
- corr()¶
- property m_causal¶
- property offset¶
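The link between standardized and raw effects can be checked with a small worked example. Under Hardy-Weinberg equilibrium a 0/1/2 genotype with allele frequency p has mean 2p and variance 2p(1-p), so applying a standardized effect to a standardized genotype equals applying a rescaled raw effect to the raw genotype and then removing a constant. The helpers below are hypothetical, and whether the library's offset attribute equals this constant or its negative is not established by the text above.

```python
import math


def raw_effects(beta_standardized, afs):
    """Rescale standardized effects so they apply to raw 0/1/2 genotypes."""
    return [b / math.sqrt(2 * p * (1 - p)) for b, p in zip(beta_standardized, afs)]


def centering_constant(beta_raw, afs):
    """Expected raw score under Hardy-Weinberg equilibrium: sum of 2 * p * beta."""
    return sum(2 * p * b for b, p in zip(beta_raw, afs))


afs = [0.5, 0.2]
beta_standardized = [0.3, -0.1]
genotype = [1, 2]  # raw allele counts for one individual

beta_raw = raw_effects(beta_standardized, afs)
score_standardized = sum(
    b * (g - 2 * p) / math.sqrt(2 * p * (1 - p))
    for b, g, p in zip(beta_standardized, genotype, afs)
)
score_raw = (sum(b * g for b, g in zip(beta_raw, genotype))
             - centering_constant(beta_raw, afs))
```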
- class xftsim.effect.GCTAEffects(vg, variant_indexer=None, component_indexer=None)¶
Bases:
AdditiveEffects
Additive genetic effects object under the GCTA infinitesimal model <CITE>
Under this genetic architecture, all variants are causal, and standardized genetic variants / sqrt(m) have the user-specified (possibly diagonal) variance-covariance matrix.
- Parameters:
vg (Iterable | NDArray) – Vector of genetic variances or genetic variance/covariance matrix.
variant_indexer (xft.index.HaploidVariantIndex | xft.index.DiploidVariantIndex) – Variant indexer, will determine ploidy automatically.
component_indexer (xft.index.ComponentIndex, optional) – Phenotype component indexer, defaults to xft.index.ComponentIndex.RangeIndex if not provided.
- class xftsim.effect.NonOverlappingEffects(vg, proportions=None, variant_indexer=None, component_indexer=None, permute=True)¶
Bases:
AdditiveEffects
Additive genetic effects object under a non-infinitesimal model with no pleiotropy
Under this genetic architecture, the genome is partitioned into k+1 sets of variants: k sets, each causal for one trait, together with a final set of variants not causal for any trait. Within each kth set of causal variants, standardized variants are Gaussian with variance vg[k] / sqrt(proportions[k])
- Parameters:
vg (Iterable) – Vector of genetic variances or genetic variance/covariance matrix.
proportions (Iterable) – Proportion of variants causal for each trait. If an extra value is provided, this will be the number of variants that are noncausal for all traits. Defaults to an equal number of variants per trait.
permute (bool) – Permute variants? If False, causal variants for each phenotype will fall into contiguous blocks, defaults to True.
variant_indexer (xft.index.HaploidVariantIndex | xft.index.DiploidVariantIndex) – Variant indexer, will determine ploidy automatically.
component_indexer (xft.index.ComponentIndex, optional) – Phenotype component indexer, defaults to xft.index.ComponentIndex.RangeIndex if not provided.
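The partition into non-overlapping causal sets, and the effect of permute, can be sketched as a labelling of variants. This is a simplified illustration: the helper assign_causal_sets is hypothetical, it treats whatever is left over after the listed proportions as the noncausal set, and it does not draw effects.

```python
import random


def assign_causal_sets(m, proportions, permute=True, seed=0):
    """Label each of m variants with a trait index, or None if noncausal."""
    labels = []
    for trait, proportion in enumerate(proportions):
        labels.extend([trait] * int(round(proportion * m)))
    # Variants not claimed by any trait form the final, noncausal set.
    labels = labels[:m] + [None] * max(0, m - len(labels))
    if permute:
        # Scatter causal variants across the genome instead of contiguous blocks.
        random.Random(seed).shuffle(labels)
    return labels


contiguous = assign_causal_sets(10, [0.3, 0.3], permute=False)
scattered = assign_causal_sets(10, [0.3, 0.3], permute=True)
```

Each variant carries at most one trait label, which is what "no pleiotropy" means here: no variant is causal for more than one trait.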
xftsim.founders module¶
- xftsim.founders.founder_haplotypes_from_AFs(n, afs, diploid=True)¶
Generate founder haplotypes from specified allele frequencies.
- Parameters:
n (int) – Number of haplotypes to simulate.
afs (Iterable) – Allele frequencies as an iterable of floats.
diploid (bool, optional) – Flag indicating if the generated haplotypes should be diploid or haploid.
- Returns:
xft.struct.HaplotypeArray
– An object representing a set of haplotypes generated from the given allele frequencies.
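Generating haplotypes from allele frequencies amounts to independent Bernoulli draws, one per allele copy. The sketch below illustrates the idea with nested lists. The helper simulate_haplotypes is hypothetical, and the column layout (the two copies of a variant stored side by side) is an assumption rather than the layout of xft.struct.HaplotypeArray.

```python
import random


def simulate_haplotypes(n, afs, diploid=True, seed=0):
    """Return n rows of 0/1 alleles, with one or two copies per variant."""
    rng = random.Random(seed)
    copies = 2 if diploid else 1
    return [
        # Each allele copy is 1 with probability equal to the allele frequency.
        [int(rng.random() < p) for p in afs for _ in range(copies)]
        for _ in range(n)
    ]


haplotypes = simulate_haplotypes(50, afs=[0.0, 1.0, 0.5])
haploid = simulate_haplotypes(50, afs=[0.0, 1.0, 0.5], diploid=False)
```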
- xftsim.founders.founder_haplotypes_from_plink_bfile(path)¶
Reads in PLINK 1 binary genotype data and returns a HaplotypeArray object containing pseudo-haplotypes by randomly assigning haplotypes at heterozygous sites.
- Parameters:
path (str) – The file path to the PLINK 1 binary genotype data.
- Returns:
xr.DataArray
– Founder pseudo-haplotype array with samples indexed by an xftsim.index.SampleIndex object and variants indexed by an xftsim.index.HaploidVariantIndex object. The “pseudo-” prefix refers to the fact that the PLINK bfile format doesn’t track phase.
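Because unphased genotypes record only allele counts, homozygous sites determine both haplotypes, while heterozygous sites need an arbitrary choice. The sketch below shows that random assignment for a single individual. The helper pseudo_haplotypes is hypothetical and does not read PLINK files.

```python
import random


def pseudo_haplotypes(genotypes, seed=0):
    """Split unphased 0/1/2 genotypes into pairs of 0/1 alleles."""
    rng = random.Random(seed)
    pairs = []
    for g in genotypes:
        if g == 0:
            pairs.append((0, 0))
        elif g == 2:
            pairs.append((1, 1))
        elif g == 1:
            # Phase is unknown, so choose which copy carries the allele at random.
            pairs.append(rng.choice([(0, 1), (1, 0)]))
        else:
            raise ValueError(f"unexpected genotype {g!r}")
    return pairs


genotypes = [0, 1, 2, 1, 1, 0]
pairs = pseudo_haplotypes(genotypes)
```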
- xftsim.founders.founder_haplotypes_from_sgkit_dataset(gdat)¶
Construct a founder haplotypes array from an sgkit Dataset. Useful in conjunction with sgkit.io.vcf.vcf_to_zarr() and sgkit.load_dataset().
- Parameters:
gdat (xr.Dataset) – Dataset generated by sgkit.load_dataset().
generation (int) – Used to populate the generation attribute of xftsim.index.SampleIndex.
- Returns:
xr.DataArray
– Array of founder haplotypes with samples indexed by an xftsim.index.SampleIndex object and variants indexed by an xftsim.index.HaploidVariantIndex object.
- xftsim.founders.founder_haplotypes_uniform_AFs(n, m, minMAF=0.1)¶
Generate founder haplotypes from uniform-distributed allele frequencies.
- Parameters:
n (int) – Number of haplotypes to simulate.
m (int) – Number of variants.
minMAF (float, optional) – Minimum minor allele frequency for generated haplotypes.
- Returns:
xft.struct.HaplotypeArray
– An object representing a set of haplotypes generated with uniform allele frequencies.
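A minimum minor allele frequency bounds the allele frequency on both sides, since the minor allele may be either allele. The sketch below draws frequencies uniformly from the interval [minMAF, 1 - minMAF]. That interval is an assumption about how the bound is applied, and the helper uniform_allele_frequencies is hypothetical.

```python
import random


def uniform_allele_frequencies(m, min_maf=0.1, seed=0):
    """Draw m allele frequencies uniformly from [min_maf, 1 - min_maf]."""
    rng = random.Random(seed)
    return [rng.uniform(min_maf, 1.0 - min_maf) for _ in range(m)]


afs = uniform_allele_frequencies(1000, min_maf=0.1)
```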
xftsim.index module¶
- class xftsim.index.ComponentIndex(phenotype_name=None, component_name=None, vorigin_relative=None, comp_type=None, comp_type_map={'phenotype': 'outcome'}, frame=None, k_total=None)¶
Bases:
XftIndex
Index object for phenotype components, including origin relative to proband.
- Parameters:
phenotype_name (iterable, optional) – Names of phenotypes. Either phenotype_name, frame, or k_total must be provided.
component_name (iterable, optional) – Names of phenotype components.
vorigin_relative (iterable, optional) – Relative origin of phenotype component. -1 for proband, 0 for mother, 1 for father.
comp_type (iterable, optional) – Elements are either ‘intermediate’ or ‘outcome’ to distinguish between phenotype components versus phenotypes themselves.
frame (pandas.DataFrame, optional) – Pre-existing frame to initialize index.
k_total (int, optional) – Total number of phenotypes to generate generic names.
- phenotype_name¶
Names of phenotypes.
- Type:
numpy.ndarray
- component_name¶
Names of phenotype components.
- Type:
numpy.ndarray
- vorigin_relative¶
Relative origin of phenotype component. -1 for proband, 0 for mother, 1 for father.
- Type:
numpy.ndarray
- k_total¶
Total number of phenotypes.
- Type:
int
- k_phenotypes¶
Number of unique phenotypes.
- Type:
int
- k_components¶
Number of unique phenotype components.
- Type:
int
- k_relative¶
Number of unique relative origins.
- Type:
int
- depth¶
Generational depth from binary relative encoding.
- Type:
float
- unique_identifier¶
Unique identifier for the index.
- Type:
numpy.ndarray
- to_vorigin(origin)¶
Returns a new ComponentIndex with all vorigin_relative set to origin.
- to_proband()¶
Returns a new ComponentIndex with all vorigin_relative set to -1 (proband).
- from_frame(df)¶
Returns a new ComponentIndex initialized from a Pandas DataFrame.
- from_arrays(phenotype_name, component_name, vorigin_relative=None)¶
Returns a new ComponentIndex initialized from numpy arrays.
- from_product(phenotype_name, component_name, vorigin_relative=None)¶
Returns a new ComponentIndex initialized from a Cartesian product of phenotype_name, component_name, and vorigin_relative.
- range_index(c, component_name=['generic'], vorigin_relative=[-1], prefix='phenotype')¶
Returns a new ComponentIndex with generic phenotype names.
- property comp_type¶
- property component_name¶
- property depth¶
- static from_arrays(phenotype_name, component_name, vorigin_relative=None, comp_type=None)¶
- static from_frame(df)¶
- static from_product(phenotype_name, component_name, vorigin_relative=None, comp_type_map={'phenotype': 'outcome'})¶
- property k_components¶
- property k_phenotypes¶
- property k_relative¶
- property k_total¶
- property phenotype_name¶
- static range_index(c, component_name=['generic'], vorigin_relative=[-1], prefix='phenotype')¶
- to_proband()¶
- to_vorigin(origin)¶
- property unique_identifier¶
- property vorigin_relative¶
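As a rough illustration of what from_product describes (a Cartesian product over phenotype names, component names, and relative origins), the following standalone sketch builds the corresponding rows in plain Python. The helper name product_index and the period-joined identifier format are assumptions made for illustration; this is not the xftsim implementation.

```python
from itertools import product


def product_index(phenotype_name, component_name, vorigin_relative=(-1,)):
    """Hypothetical helper: rows of a Cartesian-product component index."""
    rows = []
    for pheno, comp, origin in product(phenotype_name, component_name, vorigin_relative):
        rows.append({
            "phenotype_name": pheno,
            "component_name": comp,
            "vorigin_relative": origin,  # -1 proband, 0 mother, 1 father
            # Assumed identifier format: index variables joined by periods.
            "unique_identifier": f"{pheno}.{comp}.{origin}",
        })
    return rows


rows = product_index(["height", "bmi"], ["addGenetic", "addNoise"])
for row in rows:
    print(row["unique_identifier"])
```

Two phenotypes crossed with two components and the default proband origin yield four rows, one per combination.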
- class xftsim.index.DiploidVariantIndex(vid=None, chrom=None, zero_allele=None, one_allele=None, af=None, annotation_array=None, annotation_names=None, frame=None, m=None, n_chrom=1, h_copy=None, pos_bp=None, pos_cM=None)¶
Bases:
XftIndex
This class is used to index diploid genetic variants. Variants are defined by a set of unique IDs and may have additional annotations. Each variant is associated with two alleles, represented as strings.
- Parameters:
vid (NDArray[Shape["*"], Object], optional) – Variant IDs, by default None.
chrom (NDArray[Shape["*"], Int], optional) – Chromosome of variant, by default None.
zero_allele (NDArray[Shape["*"], Object], optional) – First allele of variant, by default None.
one_allele (NDArray[Shape["*"], Object], optional) – Second allele of variant, by default None.
af (Iterable, optional) – Allele frequency of variant, by default None.
annotation_array (Union[NDArray, pd.DataFrame], optional) – Additional variant annotations, by default None.
annotation_names (Iterable, optional) – Names of the additional variant annotations, by default None.
frame (pd.DataFrame, optional) – A pandas DataFrame containing variant information, by default None.
m (int, optional) – The number of variants, by default None.
n_chrom (int, optional) – The number of chromosomes, by default 1.
h_copy (NDArray[Shape["*"], Object], optional) – A string indicating the haplotype of each variant, by default None.
pos_bp (Iterable, optional) – Base-pair positions of the variant, by default None.
pos_cM (Iterable, optional) – Centimorgan positions of the variant, by default None.
- vid¶
Variant IDs.
- Type:
ndarray
- chrom¶
Chromosome of variant.
- Type:
ndarray
- zero_allele¶
First allele of variant.
- Type:
ndarray
- one_allele¶
Second allele of variant.
- Type:
ndarray
- hcopy¶
A string indicating the copy of each variant.
- Type:
ndarray
- af¶
Allele frequency of variant.
- Type:
ndarray
- pos_bp¶
Base-pair positions of the variant.
- Type:
ndarray
- pos_cM¶
Centimorgan positions of the variant.
- Type:
ndarray
- ploidy¶
A string indicating the ploidy of the variant (always “Diploid” for this class).
- Type:
str
- annotation¶
A pandas DataFrame containing additional variant annotations.
- Type:
pd.DataFrame
- annotation_array¶
A numpy array containing additional variant annotations.
- Type:
Union[ndarray, None]
- annotation_names¶
An array containing names of additional variant annotations.
- Type:
ndarray
- m¶
The number of variants.
- Type:
int
- n_chrom¶
The number of chromosomes.
- Type:
int
- n_annotations¶
The number of additional variant annotations.
- Type:
int
- maf¶
Minor allele frequency of variant.
- Type:
ndarray
- Raises:
AssertionError – If vid, m, or frame is not provided. If both zero_allele and one_allele are not provided.
- property af¶
- annotate()¶
- property annotation¶
- property annotation_array¶
- property annotation_names¶
- property chrom¶
- property hcopy¶
- property m¶
- property maf¶
- property n_annotations¶
- property n_chrom¶
- property one_allele¶
- property ploidy¶
- property pos_bp¶
- property pos_cM¶
- to_haploid()¶
- property vid¶
- property zero_allele¶
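The maf attribute is described as the minor allele frequency. Under the standard definition, that is the smaller of the allele frequency and its complement. The sketch below shows that arithmetic in plain Python; the function name is hypothetical and the snippet does not call xftsim.

```python
def minor_allele_frequency(af):
    """Hypothetical helper: minor allele frequency for each variant."""
    # The minor allele is whichever of the two alleles is rarer.
    return [min(p, 1.0 - p) for p in af]


af = [0.1, 0.5, 0.9]
maf = minor_allele_frequency(af)
print(maf)
```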
- class xftsim.index.HaploidVariantIndex(vid=None, chrom=None, zero_allele=None, one_allele=None, af=None, annotation_array=None, annotation_names=None, frame=None, m=None, n_chrom=1, h_copy=None, pos_bp=None, pos_cM=None)¶
Bases:
DiploidVariantIndex
A class representing a haploid variant index.
- vid¶
Variant IDs.
- Type:
numpy.ndarray
- chrom¶
Chromosome numbers.
- Type:
numpy.ndarray
- zero_allele¶
Alleles with value zero.
- Type:
numpy.ndarray
- one_allele¶
Alleles with value one.
- Type:
numpy.ndarray
- af¶
Allele frequencies.
- Type:
numpy.ndarray
- pos_bp¶
Positions of variants in base pairs.
- Type:
numpy.ndarray
- pos_cM¶
Positions of variants in centiMorgans.
- Type:
numpy.ndarray
- m¶
Number of unique variant IDs.
- Type:
int
- n_chrom¶
Number of unique chromosome numbers.
- Type:
int
- n_annotations¶
Number of annotations.
- Type:
int
- maf¶
Minor allele frequencies.
- Type:
numpy.ndarray
- ploidy¶
The ploidy of the variant index. In this case, “Haploid”.
- Type:
str
- hcopy¶
A string indicating the copy of each variant.
- Type:
ndarray
- to_diploid()¶
Converts the haploid variant index to diploid.
- property ploidy¶
- to_diploid()¶
- class xftsim.index.NullFilter¶
Bases:
SampleFilter
- class xftsim.index.RandomSiblingFilter¶
Bases:
SampleFilter
Randomly select one sibling per family
- class xftsim.index.RandomSiblingSubsampleFilter(k)¶
Bases:
SampleFilter
Randomly subsample k families, choosing one offspring per family
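The sibling filters above can be pictured as a group-then-sample step. The following is a minimal sketch in plain Python, assuming samples are given as parallel lists of individual IDs and family IDs; the helper name and its arguments are illustrative and are not part of the xftsim API.

```python
import random


def one_sibling_per_family(iids, fids, k=None, seed=None):
    """Hypothetical helper: pick one individual per family, optionally from k families."""
    rng = random.Random(seed)
    # Group individual IDs by family ID.
    families = {}
    for iid, fid in zip(iids, fids):
        families.setdefault(fid, []).append(iid)
    fam_ids = list(families)
    # Optionally keep only k randomly chosen families.
    if k is not None and k < len(fam_ids):
        fam_ids = rng.sample(fam_ids, k)
    # Draw one sibling uniformly at random within each retained family.
    return [rng.choice(families[fid]) for fid in fam_ids]


iids = ["a1", "a2", "b1", "c1", "c2", "c3"]
fids = ["A", "A", "B", "C", "C", "C"]
picked = one_sibling_per_family(iids, fids, seed=0)
print(picked)
```

Leaving k unset corresponds to the behaviour described for RandomSiblingFilter; passing k mirrors RandomSiblingSubsampleFilter.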
- class xftsim.index.RandomSubsampleFilter(k)¶
Bases:
SampleFilter
Randomly subsample k individuals
- class xftsim.index.SampleFilter(filter_function, filter_name=None, metadata={})¶
Bases:
object
- filter(sindex, **kwargs)¶
- class xftsim.index.SampleIndex(iid=None, fid=None, sex=None, frame=None, n=None, generation=0)¶
Bases:
XftIndex
Index for individual samples.
This class is used to keep track of information for individual samples.
- Parameters:
iid (Iterable, optional) – Iterable of individual IDs.
fid (Iterable, optional) – Iterable of family IDs.
sex (Iterable, optional) – Iterable of biological sexes.
frame (pd.DataFrame, optional) – Dataframe containing information for each sample.
n (int, optional) – Number of samples to generate a random ID set for.
generation (int, optional) – Generation number for samples.
- n¶
Number of individuals.
- Type:
int
- n_fam¶
Number of families.
- Type:
int
- n_female¶
Number of biological females.
- Type:
int
- n_male¶
Number of biological males.
- Type:
int
- iid¶
Array of individual IDs.
- Type:
ndarray
- fid¶
Array of family IDs.
- Type:
ndarray
- sex¶
Array of biological sexes.
- Type:
ndarray
- property fid¶
- property iid¶
- iloc(key)¶
- property n¶
- property n_fam¶
- property n_female¶
- property n_male¶
- property sex¶
- property unique_identifier¶
- class xftsim.index.SiblingPairFilter(k=None)¶
Bases:
SampleFilter
Subsample 2 siblings each from k families with at least two siblings
- class xftsim.index.XftIndex¶
Bases:
object
XftIndex is the base class for indexes in the XftSim simulation model. It is a superclass and is not intended for direct use.
Attributes:¶
- _coord_variables: List[str]
List of names of the coordinate variables.
- _index_variables: List[str]
List of names of the index variables.
- _dimension: str
Name of the dimension variable.
- _frame: pandas.DataFrame
Dataframe representing the index.
Methods:¶
- validate():
Validates the index by checking if the _coord_variables, _index_variables, and _dimension attributes are not None. Raises an AssertionError if any of these attributes is None.
- frame:
Property representing the _frame attribute. Getter: Returns the _frame attribute. Setter: Sets the _frame attribute and generates a new index using the unique_identifier property.
- frame_copy():
Returns a copy of the _frame attribute.
- unique_identifier:
Property representing the unique identifier of the index. Returns a string representing the concatenation of all index variables, separated by a period.
- coords:
Property representing the coordinates of the index. Returns a dictionary where the keys are the coordinate variables and the values are the corresponding values in the _frame attribute.
- coord_dict:
Property representing the coordinate dictionary of the index. Returns a dictionary where the keys are the variables and the values are tuples representing the (dimension, value) of each coordinate.
- coord_frame:
Property representing the coordinate frame of the index. Returns a dataframe where the columns are the coordinate variables and the rows correspond to each row in the _frame attribute.
- coord_mindex:
Property representing the coordinate multi-index of the index. Returns a multi-index where the levels correspond to the coordinate variables and the values correspond to the corresponding values in the _frame attribute.
- coord_index:
Property representing the coordinate index of the index. Returns an index representing the unique identifier of the index.
- __getitem__(arg):
Returns a new instance of the XftIndex class, corresponding to a subset of the _frame attribute. If arg is a dictionary, returns the rows where the values of the keys in the dictionary match the corresponding values in the _frame attribute. If arg is an integer or slice, returns the row(s) at the corresponding index in the _frame attribute.
- iloc(key):
Returns a new instance of the XftIndex class, corresponding to a subset of the _frame attribute. Returns the row(s) at the corresponding index in the _frame attribute.
- merge(other):
Merges the _frame attribute of the current instance with another instance of the XftIndex class. If the two instances have a different _dimension attribute or a different class type, raises a TypeError. Returns a new instance of the XftIndex class representing the merged index.
- reduce_merge(args):
Static method that reduces the list of args by calling the merge method on each pair of consecutive elements. Returns the final merged index.
- stack(other):
Stacks the _frame attribute of the current instance with another instance of the XftIndex class. If the two instances have a different _dimension attribute or a different class type, raises a TypeError. Returns a new instance of the XftIndex class representing the stacked index.
- at_most(n_new):
Downsamples the _frame attribute at random to contain at most n_new rows. If the number of rows in the _frame attribute is already less than or equal to n_new, returns a copy of the current instance. Returns a new instance of the XftIndex class representing the downsampled index.
- at_most(n_new)¶
- property coord_dict¶
- property coord_frame¶
- property coord_index¶
- property coord_mindex¶
- property coords¶
- property frame¶
- frame_copy()¶
- iloc(key)¶
- merge(other, deduplicate=True)¶
- static reduce_merge(args, deduplicate=True)¶
- stack(other)¶
- property unique_identifier¶
- validate()¶
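The at_most behaviour described above (return a copy when the index is already small enough, otherwise downsample at random) can be sketched on a plain list of rows. This is an illustrative stand-in, not the xftsim code. Keeping the retained rows in their original order is an assumption.

```python
import random


def at_most(rows, n_new, seed=None):
    """Hypothetical helper: keep at most n_new rows, chosen at random."""
    if len(rows) <= n_new:
        # Already small enough: return a copy, unchanged.
        return list(rows)
    rng = random.Random(seed)
    # Sample row positions without replacement; sort to preserve the
    # original row order (an assumption, not documented behaviour).
    keep = sorted(rng.sample(range(len(rows)), n_new))
    return [rows[i] for i in keep]


small = at_most(["s0", "s1"], 5)
subset = at_most([f"s{i}" for i in range(10)], 3, seed=1)
print(small, subset)
```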
- xftsim.index.sampleIndex_from_VCF()¶
- xftsim.index.sampleIndex_from_plink()¶
- xftsim.index.variantIndex_from_VCF()¶
- xftsim.index.variantIndex_from_plink()¶
xftsim.io module¶
- xftsim.io.genotypes_to_pseudo_haplotypes(x)¶
Converts genotype data in an xarray DataArray to pseudo-haplotype data.
- Parameters:
x (xr.DataArray) – An xarray DataArray containing genotype data.
- Returns:
xr.DataArray – An xarray DataArray containing pseudo-haplotype data.
- xftsim.io.haplotypes_from_sgkit_dataset(gdat, generation=0)¶
Construct a haplotype array from an sgkit Dataset. Useful in conjunction with sgkit.io.vcf.vcf_to_zarr() and sgkit.load_dataset().
- Parameters:
gdat (xr.Dataset) – Dataset generated by sgkit.load_dataset().
generation (int) – Used to populate the generation attribute of xftsim.index.SampleIndex.
- Returns:
xr.DataArray
– Haplotype array with samples indexed by an xftsim.index.SampleIndex object and variants indexed by an xftsim.index.HaploidVariantIndex object.
- xftsim.io.load_haplotype_zarr(path, compute=True, slice_x=slice(None, None, None), slice_y=slice(None, None, None), **kwargs)¶
Load haplotype data from a Zarr store.
- Parameters:
path (str) – The path to the Zarr store.
compute (bool, optional) – Whether to compute the data immediately, by default True.
**kwargs (dict) – Additional keyword arguments to pass to xr.open_dataset().
- Returns:
xr.DataArray
– The loaded haplotype data as a DataArray.
- xftsim.io.plink1_sample_index(ppxr, generation=0)¶
Create a SampleIndex object from a plink file DataArray generated by pandas_plink.
- Parameters:
ppxr (xr.DataArray) – An xarray DataArray representing a plink file.
generation (int, optional) – The generation of the individuals, by default 0.
- Returns:
xft.index.SampleIndex
– A SampleIndex object.
- xftsim.io.plink1_variant_index(ppxr)¶
Create a DiploidVariantIndex object from a plink file DataArray generated by pandas_plink.
- Parameters:
ppxr (xr.DataArray) – An xarray DataArray representing a plink file.
- Returns:
xft.index.DiploidVariantIndex – A DiploidVariantIndex object.
- xftsim.io.read_plink1_as_pseudohaplotypes(path, generation=0)¶
Reads in PLINK 1 binary genotype data and returns a HaplotypeArray object containing pseudo-haplotypes by randomly assigning haplotypes at heterozygous sites.
- Parameters:
path (str) – The file path to the PLINK 1 binary genotype data.
generation (int) – Used to populate the generation attribute of xftsim.index.SampleIndex.
- Returns:
xr.DataArray – Pseudo-haplotype array with samples indexed by an xftsim.index.SampleIndex object and variants indexed by an xftsim.index.HaploidVariantIndex object. The “pseudo-” prefix refers to the fact that the plink bfile format doesn’t track phase.
- Raises:
ValueError – If the specified file path does not exist or is not in the expected format.
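To make the pseudo-haplotype idea concrete: unphased genotypes (0, 1, or 2 copies of an allele) can be split into two haplotype copies, with the phase at heterozygous sites chosen at random. The sketch below shows that logic on a list of genotype counts. The function name is hypothetical, and the real xftsim functions operate on xarray DataArrays, not lists.

```python
import random


def pseudo_haplotypes_sketch(genotypes, seed=None):
    """Hypothetical helper: split allele counts into two haplotype copies."""
    rng = random.Random(seed)
    haplotypes = []
    for g in genotypes:
        if g == 0:
            pair = (0, 0)  # homozygous for the zero allele
        elif g == 2:
            pair = (1, 1)  # homozygous for the one allele
        elif g == 1:
            # Heterozygous: the phase is unknown, so assign it at random.
            pair = rng.choice([(0, 1), (1, 0)])
        else:
            raise ValueError(f"unexpected genotype value: {g!r}")
        haplotypes.append(pair)
    return haplotypes


genotypes = [0, 1, 2, 1, 1]
haps = pseudo_haplotypes_sketch(genotypes, seed=0)
print(haps)
```

Summing each pair recovers the original genotype, which is why writing such data back to a PLINK 1 file loses only the (arbitrary) phase.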
- xftsim.io.save_haplotype_zarr(haplotypes, path, **kwargs)¶
Save haplotype data to a Zarr store.
- Parameters:
haplotypes (xr.DataArray) – The haplotype data to save.
path (str) – The path to the Zarr store.
**kwargs (dict) – Additional keyword arguments to pass to xr.Dataset.to_zarr().
- Returns:
None
- xftsim.io.write_to_plink1(hh, path, verbose=True)¶
Writes a DataArray to a PLINK 1 binary file. Breaks phasing.
- Parameters:
hh (xr.DataArray) – A DataArray containing the genotype data to write.
path (str) – The path to the output PLINK file. The ‘.bed’ extension will be added automatically.
verbose (bool, optional) – Whether to print verbose output during writing, by default True.
- Returns:
None
xftsim.lsmate module¶
xftsim.mate module¶
This module contains functions and classes for implementing different mating regimes in the context of forward time genetics simulations.
Functions:
_solve_qap_ls: Private function that solves the Quadratic Assignment Problem using LocalSolver.
Classes:
MatingRegime: Base class for defining mating regimes.
RandomMatingRegime: A class for implementing random mating.
LinearAssortativeMatingRegime: A class for implementing linear assortative mating.
KAssortativeMatingRegime: A class for implementing k-assortative mating.
BatchedMatingRegime: A class for batching individuals to improve mating regime performance.
- class xftsim.mate.BatchedMatingRegime(regime, max_batch_size)¶
Bases:
MatingRegime
BatchedMatingRegime class that batches mating assignments, either for the sake of efficiency or to simulate stratification.
- Parameters:
regime (MatingRegime) – The mating regime object.
max_batch_size (int) – Maximum size of each batch.
- regime¶
The mating regime object.
- Type:
MatingRegime
- max_batch_size¶
Maximum size of each batch.
- Type:
int
- batch(haplotypes, phenotypes, control)¶
Split samples into batches.
- mate(haplotypes, phenotypes, control)¶
Generate mating assignments in batches.
- batch(haplotypes=None, phenotypes=None, control=None)¶
Split samples into batches.
- Parameters:
haplotypes (xarray.DataArray, optional) – Haplotypes array.
phenotypes (xarray.DataArray, optional) – Phenotypes array.
control (dict, optional) – Control parameters.
- Returns:
batches (list) – List of batches of samples.
num_batches (int) – Number of batches.
- mate(haplotypes=None, phenotypes=None, control=None)¶
Generate mating assignments in batches and merge into single assignment object.
- Parameters:
haplotypes (xarray.DataArray, optional) – Haplotypes array.
phenotypes (xarray.DataArray, optional) – Phenotypes array.
control (dict, optional) – Control parameters.
- Returns:
mate_assignments (
MateAssignment
) – Mating assignments.
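The batching step can be pictured as splitting the sample into the smallest number of groups that each respect max_batch_size. The sketch below is one way to do that in plain Python. Shuffling before splitting and the strided assignment are assumptions made for illustration; the documentation above does not say how xftsim forms its batches.

```python
import math
import random


def split_into_batches(sample_ids, max_batch_size, seed=None):
    """Hypothetical helper: split samples into batches of bounded size."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)  # assumed: batches are formed at random
    # Smallest number of batches that keeps every batch within the limit.
    num_batches = math.ceil(len(shuffled) / max_batch_size)
    # Strided slicing gives batch sizes that differ by at most one.
    batches = [shuffled[i::num_batches] for i in range(num_batches)]
    return batches, num_batches


batches, num_batches = split_into_batches(range(10), max_batch_size=4, seed=0)
print(num_batches, [len(b) for b in batches])
```

Each batch would then be mated independently and the per-batch assignments merged, which is the role the documentation gives to MateAssignment.reduce_merge.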
- class xftsim.mate.GeneralAssortativeMatingRegime(component_index, cross_corr, offspring_per_pair=<xftsim.utils.ConstantCount object>, mates_per_female=<xftsim.utils.ConstantCount object>, female_offspring_per_pair='balanced', sex_aware=False, exhaustive=True, control={})¶
Bases:
MatingRegime
A class that implements general assortative mating. That is, it matches two sets of individuals with K phenotypes to achieve an arbitrary K x K cross-mate cross-correlation structure.
- Parameters:
component_index (xft.index.ComponentIndex) – An object containing information about the components.
cross_corr (ndarray) – The cross-correlation matrix of size K x K.
offspring_per_pair (Union[int, xft.utils.VariableCount], optional) – The number of offspring per mating pair. Default is 1.
mates_per_female (Union[int, xft.utils.VariableCount], optional) – The number of mates for each female. Default is 2.
female_offspring_per_pair (Union[str, int, xft.utils.VariableCount], optional) – The number of offspring per mating pair for females. Default is ‘balanced’.
sex_aware (bool, optional) – Whether to consider sex in mating pairs. Default is False.
exhaustive (bool, optional) – Whether to enumerate all possible pairs. Default is True.
control (dict, optional) – A dictionary of control parameters passed to LocalSolver. Defaults are as follows: nb_threads=4, time_limit=120, tolerance=1e-5, verbosity=1, time_between_displays=15.
- cross_corr¶
The cross-correlation matrix of size K x K.
- Type:
ndarray
- component_index¶
An object containing information about the components.
- Type:
xft.index.ComponentIndex
- K¶
The total number of components.
- Type:
int
- mate(haplotypes: xr.DataArray = None, phenotypes: xr.DataArray = None, control: dict = None) -> xft.mate.MateAssignment¶
Mate haplotypes and phenotypes based on the K-assortative mating regime.
- mate(haplotypes=None, phenotypes=None, control={})¶
Mate haplotypes and phenotypes based on the K-assortative mating regime.
- Parameters:
haplotypes (xr.DataArray, optional) – The haplotype data to be mated. Default is None.
phenotypes (xr.DataArray, optional) – The phenotype data to be mated. Default is None.
- Returns:
assignment (xft.mate.MateAssignment) – The assignment of haplotypes to parents.
- class xftsim.mate.LinearAssortativeMatingRegime(component_index, r=0, offspring_per_pair=<xftsim.utils.ConstantCount object>, mates_per_female=<xftsim.utils.ConstantCount object>, female_offspring_per_pair='balanced', sex_aware=False, exhaustive=True)¶
Bases:
MatingRegime
A linear assortative mating regime that performs mate selection based on a specified component index. Specifically, individuals are mated such that the cross-mate correlations across all specified components are equal to r. This reflects mating on a linear combination of phenotypes and does not generalize to many cross-mate correlation structures observed in practice, but is more efficient.
- Parameters:
component_index (xft.index.ComponentIndex) – The component index used to select mating pairs based on the correlation between the phenotype values.
r (float, optional) – The correlation coefficient, a value between -1 and 1. Defaults to 0.
offspring_per_pair (Union[int, xft.utils.VariableCount], optional) – The number of offspring per pair. If int, it will be converted to a ConstantCount object. Defaults to 1.
mates_per_female (Union[int, xft.utils.VariableCount], optional) – The number of mates per female. If int, it will be converted to a ConstantCount object. Defaults to 1.
female_offspring_per_pair (Union[str, int, xft.utils.VariableCount], optional) – The number of female offspring per mating pair. If ‘balanced’, the number of females is randomly selected for each pair to balance the sex ratio. If int, it will be converted to a ConstantCount object. Defaults to ‘balanced’.
sex_aware (bool, optional) – If True, only mating pairs with different sex are allowed. Defaults to False.
exhaustive (bool, optional) – If True, all possible mating pairs will be enumerated. If False, pairs will be randomly selected. Defaults to True.
- Raises:
AssertionError – If r is not between -1 and 1. If the correlation r is not feasible for the number of phenotypes in the component index.
- mate(haplotypes=None, phenotypes=None, control=None)¶
Mate individuals.
- Parameters:
haplotypes (xarray.DataArray, optional) – The haplotypes of the individuals, by default None.
phenotypes (xarray.DataArray, optional) – The phenotypes of the individuals, by default None.
control (dict, optional) – The mating control parameters, by default None.
- Returns:
MateAssignment
– The mate assignment result.
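For intuition about what a target cross-mate correlation r means, the sketch below pairs females and males on a single standardized phenotype so that the achieved correlation is close to r. It uses a generic noisy-score rank-matching technique chosen for illustration: it is not the algorithm xftsim uses, it handles only one phenotype, and it assumes roughly Gaussian phenotypes with unit variance. All function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def assortative_pairs(female_pheno, male_pheno, r, seed=None):
    """Hypothetical helper: index of the male matched to each female."""
    assert -1 <= r <= 1, "r must lie between -1 and 1"
    assert len(female_pheno) == len(male_pheno)
    rng = random.Random(seed)
    # Noisy score whose correlation with the female phenotype is r.
    score = [r * f + math.sqrt(1 - r * r) * rng.gauss(0, 1) for f in female_pheno]
    # Match the k-th ranked score to the k-th ranked male phenotype.
    female_order = sorted(range(len(score)), key=score.__getitem__)
    male_order = sorted(range(len(male_pheno)), key=male_pheno.__getitem__)
    match = [None] * len(female_pheno)
    for f_idx, m_idx in zip(female_order, male_order):
        match[f_idx] = m_idx
    return match


rng = random.Random(1)
females = [rng.gauss(0, 1) for _ in range(5000)]
males = [rng.gauss(0, 1) for _ in range(5000)]
match = assortative_pairs(females, males, r=0.4, seed=2)
achieved = pearson(females, [males[j] for j in match])
print(round(achieved, 2))
```

The achieved correlation fluctuates around the target because of sampling noise. With r=1 the score equals the female phenotype and mates are matched rank for rank.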
- class xftsim.mate.MateAssignment(generation, maternal_sample_index, paternal_sample_index, previous_generation_sample_index, n_offspring_per_pair, n_females_per_pair, sex_aware=False)¶
Bases:
object
Represents a mate assignment for a given generation of individuals.
- Parameters:
generation (int) – The generation number.
maternal_sample_index (xft.index.SampleIndex) – The sample index for the maternal individuals.
paternal_sample_index (xft.index.SampleIndex) – The sample index for the paternal individuals.
previous_generation_sample_index (xft.index.SampleIndex) – The sample index for the previous generation.
n_offspring_per_pair (NDArray[Shape["*"], Int64]) – An array containing the number of offspring per mating pair.
n_females_per_pair (NDArray[Shape["*"], Int64]) – An array containing the number of female offspring per mating pair.
sex_aware (bool, optional, default False) – Whether the mate assignment is sex-aware.
- get_mate_phenotypes(phenotypes, component_index=None, full=True)¶
Retrieves mate phenotypes based on the given phenotypes data.
- Parameters:
phenotypes (xr.DataArray) – The phenotypes data array.
component_index (xft.index.ComponentIndex, optional) – The component index for the phenotypes data array.
full (bool) – If True, ignore component_index and get all components.
- Returns:
pd.DataFrame
– A DataFrame containing the mate phenotypes.
- get_mating_frame()¶
Constructs a DataFrame containing mate phenotypes regardless of reproductive success.
- Returns:
pd.DataFrame
– A DataFrame containing mating information.
- get_reproduction_frame()¶
Constructs a DataFrame containing information relating to mates and offspring.
- Returns:
pd.DataFrame
– A DataFrame containing reproduction information.
- property is_constant_population¶
Whether the population size is constant (marked TODO in the source).
- Returns:
bool
– True if the population is constant, False otherwise.
- property maternal_integer_index¶
The integer index for the maternal individuals.
- Returns:
np.ndarray
– An array containing the integer index for the maternal individuals.
- property n_females¶
The total number of female offspring.
- Returns:
int
– The total number of female offspring.
- property n_males¶
The total number of male offspring.
- Returns:
int
– The total number of male offspring.
- property n_reproducing_pairs¶
The total number of reproducing pairs.
- Returns:
int
– The total number of reproducing pairs.
- property n_total_offspring¶
The total number of offspring.
- Returns:
int
– The total number of offspring.
- property offspring_fids¶
The family identifiers for the offspring.
- Returns:
np.ndarray
– An array containing the family identifiers for the offspring.
- property offspring_iids¶
The unique identifiers for the offspring.
- Returns:
np.ndarray
– An array containing the unique identifiers for the offspring.
- property offspring_sample_index¶
The sample index for the offspring.
- Returns:
xft.index.SampleIndex
– The sample index for the offspring.
- property offspring_sex¶
The sex of the offspring.
- Returns:
np.ndarray
– An array containing the sex of the offspring.
- property paternal_integer_index¶
The integer index for the paternal individuals.
- Returns:
np.ndarray
– An array containing the integer index for the paternal individuals.
- static reduce_merge(assignments)¶
Merges a list of MateAssignment objects into a single MateAssignment object.
- Parameters:
assignments (Iterable) – An iterable of MateAssignment objects to be merged.
- Returns:
MateAssignment – A new MateAssignment object resulting from the merge of the input assignments.
- property reproducing_maternal_index¶
The maternal index for reproducing individuals.
- Returns:
xft.index.SampleIndex
– The maternal index for reproducing individuals.
- property reproducing_paternal_index¶
The paternal index for reproducing individuals.
- Returns:
xft.index.SampleIndex
– The paternal index for reproducing individuals.
- trio_view(pheno_parental, pheno_offspring)¶
Returns an array with the phenotypes of offspring, followed by the phenotypes of their parents in the same order as the order of offspring in this MateAssignment.
- Parameters:
pheno_parental (xr.DataArray) – An xarray DataArray containing the phenotypes of the parents.
pheno_offspring (xr.DataArray) – An xarray DataArray containing the phenotypes of the offspring.
- Returns:
np.ndarray
– An array with the phenotypes of offspring, followed by the phenotypes of their parents.
- update_pedigree(pedigree)¶
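The count properties of MateAssignment follow from the two per-pair arrays passed to the constructor. The sketch below shows the arithmetic one would expect from the property descriptions. It is inferred from those descriptions, not taken from the xftsim source, and the helper name is hypothetical.

```python
def summarize_offspring(n_offspring_per_pair, n_females_per_pair):
    """Hypothetical helper: aggregate per-pair offspring counts."""
    assert len(n_offspring_per_pair) == len(n_females_per_pair)
    # A pair cannot have more female offspring than offspring in total.
    assert all(f <= n for f, n in zip(n_females_per_pair, n_offspring_per_pair))
    n_total = sum(n_offspring_per_pair)
    n_females = sum(n_females_per_pair)
    return {
        "n_total_offspring": n_total,
        "n_females": n_females,
        "n_males": n_total - n_females,
        # Pairs with zero offspring are mated but not reproducing.
        "n_reproducing_pairs": sum(1 for n in n_offspring_per_pair if n > 0),
    }


summary = summarize_offspring([2, 0, 3], [1, 0, 2])
print(summary)
```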
- class xftsim.mate.MatingRegime(mateFunction=None, offspring_per_pair=<xftsim.utils.ConstantCount object>, mates_per_female=<xftsim.utils.ConstantCount object>, female_offspring_per_pair='balanced', sex_aware=False, exhaustive=True, component_index=None, haplotypes=False)¶
Bases:
object
A class for defining a mating regime to simulate the reproductive behavior of a population.
- Parameters:
mateFunction (Callable, optional) – A function that specifies how the mating process is carried out. Default is None.
offspring_per_pair (Union[Callable, int, xft.utils.VariableCount], optional) – The number of offspring per mating pair. This can be a callable function, an integer, or a VariableCount object. Default is xft.utils.ConstantCount(1).
mates_per_female (Union[Callable, int, xft.utils.VariableCount], optional) – The number of mating partners each female has. This can be a callable function, an integer, or a VariableCount object. Default is xft.utils.ConstantCount(1).
female_offspring_per_pair (Union[Callable, str, int, xft.utils.VariableCount], optional) – The number of female offspring per mating pair. This can be a callable function, a string, an integer, or a VariableCount object. If set to ‘balanced’, the number of female offspring will be randomly assigned from a balanced range (0, …, total_offspring). Default is ‘balanced’.
sex_aware (bool, optional) – Whether the mating process should take sex into account. If True, females and males will be paired up based on their sex. If False, the pairs will be randomly assigned. Default is False.
exhaustive (bool, optional) – Whether the mating pairs should be enumerated exhaustively or randomly. If True, all possible pairings will be enumerated before repeating. If False, the pairings will be randomly assigned with replacement. Default is True.
component_index (xft.index.ComponentIndex, optional) – Which phenotype components (if any) are used in assigning mates.
haplotypes (bool, optional) – Flag indicating whether haplotype data is used to assign mates. Default is False.
- sex_aware¶
Whether the mating process should take sex into account.
- Type:
bool
- offspring_per_pair¶
The number of offspring per mating pair.
- Type:
Union[Callable, int, xft.utils.VariableCount]
- mates_per_female¶
The number of mating partners each female has.
- Type:
Union[Callable, int, xft.utils.VariableCount]
- female_offspring_per_pair¶
The number of female offspring per mating pair.
- Type:
Union[Callable, str, int, xft.utils.VariableCount]
- exhaustive¶
Whether the mating pairs should be enumerated exhaustively or randomly.
- Type:
bool
- mateFunction¶
A function that specifies how the mating process is carried out.
- Type:
Callable
- expected_offspring_per_pair¶
The expected number of offspring per mating pair.
- Type:
float
- expected_mates_per_female¶
The expected number of mating partners each female has.
- Type:
float
- expected_female_offspring_per_pair¶
The expected number of female offspring per mating pair.
- Type:
float
- population_growth_factor¶
The population growth factor.
- Type:
float
- get_potential_mates(haplotypes: xr.DataArray = None, phenotypes: xr.DataArray = None)¶
Returns the potential female and male mating partners based on the sex awareness parameter.
- enumerate_assignment(female_indices: NDArray, male_indices: NDArray, haplotypes: xr.DataArray = None, phenotypes: xr.DataArray = None) -> MateAssignment¶
Enumerates the mating assignments.
- mate(haplotypes: xr.DataArray = None, phenotypes: xr.DataArray = None, control: dict = None) -> MateAssignment¶
Calls the mateFunction to perform the mating process.
- property dependency_graph¶
- property dependency_graph_edges¶
- draw_dependency_graph(node_color='none', node_size=1500, arrowsize=7, font_size=6, margins=0.1, **kwargs)¶
- enumerate_assignment(female_indices, male_indices, haplotypes=None, phenotypes=None)¶
Enumerate the mate assignments.
- Parameters:
female_indices (NDArray) – The indices of the females to mate.
male_indices (NDArray) – The indices of the males to mate.
haplotypes (xr.DataArray) – The haplotypes to use for mating.
phenotypes (xr.DataArray) – The phenotypes to use for mating.
- Returns:
MateAssignment
– The mate assignments.
- property expected_female_offspring_per_pair¶
Get the expected female offspring per pair.
- Returns:
float – The expected female offspring per pair.
- Raises:
NotImplementedError – If the female offspring count is not an integer or a VariableCount.
- property expected_mates_per_female¶
Get the expected mates per female.
- Returns:
float – The expected mates per female.
- Raises:
NotImplementedError – If the mates count is not an integer or a VariableCount.
- property expected_offspring_per_pair¶
Get the expected offspring per pair.
- Returns:
float – The expected offspring per pair.
- Raises:
NotImplementedError – If the offspring count is not an integer or a VariableCount.
- get_potential_mates(haplotypes=None, phenotypes=None)¶
Return potential mating pairs.
- Parameters:
haplotypes (xr.DataArray) – The haplotypes to use for mating.
phenotypes (xr.DataArray) – The phenotypes to use for mating.
- Returns:
(NDArray, NDArray) – The potential female and male mating indices.
- mate(haplotypes=None, phenotypes=None, control=None)¶
Mate individuals.
- Parameters:
haplotypes (xarray.DataArray, optional) – The haplotypes of the individuals, by default None.
phenotypes (xarray.DataArray, optional) – The phenotypes of the individuals, by default None.
control (dict, optional) – The mating control parameters, by default None.
- Returns:
MateAssignment
– The mate assignment result.
- property mateFunction¶
- property population_growth_factor¶
Get the population growth factor.
- Returns:
float
– The population growth factor.
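The documentation does not state how population_growth_factor is computed. One plausible reading, shown below purely as an assumption, is that each mating pair replaces two parents, so the per-generation factor is the expected offspring per pair times the expected mates per female, divided by two. Both function names are hypothetical.

```python
def growth_factor(expected_offspring_per_pair, expected_mates_per_female):
    """Assumed formula: offspring produced per parent, per generation."""
    # Each pair consists of two parents, hence the division by two.
    return expected_offspring_per_pair * expected_mates_per_female / 2.0


def projected_size(n_founders, factor, generations):
    """Expected population size after a number of generations."""
    return n_founders * factor ** generations


stable = growth_factor(2, 1)   # two offspring per pair: constant size
growing = growth_factor(3, 1)  # three offspring per pair: growth
print(stable, growing, projected_size(1000, growing, 2))
```

Under this reading, two offspring per monogamous pair keeps the population size constant in expectation.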
- class xftsim.mate.RandomMatingRegime(offspring_per_pair=<xftsim.utils.ConstantCount object>, mates_per_female=<xftsim.utils.ConstantCount object>, female_offspring_per_pair='balanced', sex_aware=False, exhaustive=True)¶
Bases:
MatingRegime
A mating regime that randomly pairs individuals and produces offspring with balanced numbers of males and females.
- Parameters:
offspring_per_pair (xft.utils.VariableCount, optional) – Number of offspring produced per mating pair, by default xft.utils.ConstantCount(1).
mates_per_female (xft.utils.VariableCount, optional) – Number of males that mate with each female, by default xft.utils.ConstantCount(1).
female_offspring_per_pair (Union[str, xft.utils.VariableCount], optional) – The number of female offspring per mating pair. If “balanced”, the number is balanced with the number of male offspring. By default, “balanced”.
sex_aware (bool, optional) – If True, randomly paired individuals are selected so that there is an equal number of males and females. Otherwise, random pairing is performed. By default, False.
exhaustive (bool, optional) – If True, perform exhaustive enumeration of potential mates. If False, perform random sampling. By default, True.
- mate(haplotypes=None, phenotypes=None, control=None)¶
Mate individuals randomly with balanced numbers of males and females.
- Parameters:
haplotypes (
xr.DataArray
, optional) – Array containing haplotypes, by default Nonephenotypes (
xr.DataArray
, optional) – Array containing phenotypes, by default Nonecontrol (
dict
, optional) – Control dictionary, by default None
- Returns:
MateAssignment
– An object containing the maternal and paternal sample indices, the number of offspring per pair, and the number of female offspring per pair.
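The random-pairing idea described above can be sketched with NumPy alone. This is not the xftsim implementation: the helper name `random_pairs`, the 0/1 sex coding, and the fixed integer `offspring_per_pair` are assumptions made for illustration only.

```python
import numpy as np

def random_pairs(sex, offspring_per_pair=2, rng=None):
    """Randomly pair females with males (hypothetical helper, not xftsim API).

    sex: 1-D array with 0 for female and 1 for male (assumed coding).
    Returns (maternal_inds, paternal_inds, n_offspring, n_female_offspring).
    """
    rng = np.random.default_rng(rng)
    sex = np.asarray(sex)
    # Shuffle each sex independently, then pair position by position
    females = rng.permutation(np.flatnonzero(sex == 0))
    males = rng.permutation(np.flatnonzero(sex == 1))
    n_pairs = min(females.size, males.size)  # unmatched individuals do not mate
    maternal, paternal = females[:n_pairs], males[:n_pairs]
    n_offspring = np.full(n_pairs, offspring_per_pair)
    # "balanced": half of each pair's offspring are female; odd counts round at random
    n_female = n_offspring // 2 + rng.binomial(1, 0.5, n_pairs) * (n_offspring % 2)
    return maternal, paternal, n_offspring, n_female

sex = np.array([0, 1, 0, 1, 1, 0])
mat, pat, n_off, n_fem = random_pairs(sex, offspring_per_pair=2, rng=0)
```

The four returned arrays mirror the fields the MateAssignment description lists: maternal and paternal sample indices, offspring per pair, and female offspring per pair.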
xftsim.ped module¶
- class xftsim.ped.Pedigree(founder_sample_index)¶
Bases:
object
A class representing a pedigree as a graph.
- G¶
The directed graph representing the pedigree.
- Type:
nx.DiGraph
- _generation¶
A dictionary containing node generations.
- Type:
dict
- _fid¶
A dictionary containing node family IDs.
- Type:
dict
- _generational_depth¶
The generational depth of the tree.
- Type:
int
- generation(K: int):
Returns the subgraph of nodes with generation K.
- generations(gens):
Returns the subgraph of nodes with generations in the given iterable.
- get_current_generation():
Returns the subgraph of nodes in the current generation.
- get_most_recent_K_generations(K):
Returns the subgraph of nodes in the most recent K generations.
- _add_edges_from_arrays(x, y):
Adds edges from arrays x and y.
- add_offspring(mating: xft.mate.MateAssignment):
Adds offspring nodes and edges to the pedigree based on a MateAssignment object.
- _get_trios():
TODO
- generation(K)¶
Returns the subgraph of nodes with generation K.
- Parameters:
K (
int
) – The generation number.- Returns:
nx.subgraph_view
– The subgraph of nodes with generation K.
- property generational_depth¶
- generations(gens)¶
Returns the subgraph of nodes with generations in the given iterable.
- Parameters:
gens (
iterable
) – An iterable containing generations.- Returns:
nx.subgraph_view
– The subgraph of nodes with generations in the given iterable.
- get_current_generation()¶
Returns the subgraph of nodes in the current generation.
- Returns:
nx.subgraph_view
– The subgraph of nodes in the current generation.
- get_most_recent_K_generations(K)¶
Returns the subgraph of nodes in the most recent K generations.
- Parameters:
K (
int
) – The number of most recent generations to include.
- Returns:
nx.subgraph_view
– The subgraph of nodes in the most recent K generations.
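The generation-based views above amount to filtering nodes by a per-node generation label. The following dependency-free sketch mimics that behaviour with plain dictionaries; the library itself is documented as using an nx.DiGraph with subgraph views, and the helper names and toy pedigree here are hypothetical.

```python
def nodes_in_generations(generation, gens):
    """Return the nodes whose recorded generation is in `gens`."""
    gens = set(gens)
    return {node for node, g in generation.items() if g in gens}

def most_recent_k(generation, k):
    """Return the nodes in the k most recent generations."""
    depth = max(generation.values())
    return nodes_in_generations(generation, range(depth - k + 1, depth + 1))

# Toy pedigree: founders are generation 0, their children generation 1, and so on
generation = {"f1": 0, "f2": 0, "c1": 1, "c2": 1, "g1": 2}
edges = [("f1", "c1"), ("f2", "c1"), ("f1", "c2"), ("f2", "c2"), ("c1", "g1")]

current = most_recent_k(generation, 1)
recent_two = most_recent_k(generation, 2)
# The induced subgraph keeps only parent -> child edges inside the node set
sub_edges = [(p, c) for p, c in edges if p in recent_two and c in recent_two]
```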
xftsim.proc module¶
Module to define classes for post-processing xft simulation data. Classes:
PostProcessor: Base class for defining post-processing operations on xft simulation data.
LimitMemory(PostProcessor): Class to limit the amount of memory used by the simulation by deleting old haplotype and/or phenotype data.
WriteToDisk(PostProcessor): Class to write simulation data to disk.
- class xftsim.proc.LimitMemory(n_haplotype_generations=-1, n_phenotype_generations=-1)¶
Bases:
PostProcessor
Class to limit the amount of memory used by the simulation by deleting old haplotype and/or phenotype data.
- Parameters:
n_haplotype_generations (
int
, optional) – The number of haplotype generations to keep. If -1, keep all generations. By default, -1.
n_phenotype_generations (
int
, optional) – The number of phenotype generations to keep. If -1, keep all generations. By default, -1.
- processor(sim: xft.sim.Simulation) -> None:
Deletes old haplotype and/or phenotype data from the simulation.
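The pruning rule can be illustrated on a plain dictionary keyed by generation, which is how the Simulation stores are described elsewhere in this reference. This is a sketch only: `prune_store` is a hypothetical helper, and the real processor may differ in detail.

```python
def prune_store(store, n_keep):
    """Drop all but the `n_keep` most recent generations from a dict keyed by generation.

    n_keep == -1 keeps everything, mirroring the documented default.
    """
    if n_keep == -1:
        return store
    n_drop = max(len(store) - n_keep, 0)
    # sorted() returns a new list, so deleting while looping is safe
    for gen in sorted(store)[:n_drop]:
        del store[gen]
    return store

haplotype_store = {0: "gen0", 1: "gen1", 2: "gen2", 3: "gen3"}
prune_store(haplotype_store, n_keep=2)
```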
- class xftsim.proc.PostProcessor(processor, name)¶
Bases:
object
Base class for defining post-processing operations on XFT simulation data.
- Parameters:
processor (
Callable
) – A callable object that takes a single argument of type xft.sim.Simulation and performs some post-processing operation on it.
name (
str
) – A name for the post-processing operation being defined.
- process(sim: xft.sim.Simulation) -> None:
Applies the post-processing operation to the given simulation.
- class xftsim.proc.WriteToDisk(arg)¶
Bases:
PostProcessor
docstring for PostProcess
xftsim.reproduce module¶
- class xftsim.reproduce.Meiosis(rmap=None, p=None)¶
Bases:
object
A class representing the process of meiosis.
- recombinationMap¶
A pre-defined recombination map.
- Type:
RecombinationMap
, optional
- p¶
A probability used when generating an exchangeable recombination map on the fly.
- Type:
float
, optional
- get_recombination_map(haplotypes):
Returns the recombination map, either pre-defined or generated on the fly.
- reproduce(parental_haplotypes=None, mating=None, control=None):
Returns a HaplotypeArray representing the offspring after meiosis.
- get_recombination_map(haplotypes)¶
Get the recombination map, either pre-defined or generated on the fly.
- Parameters:
haplotypes (
xr.DataArray
) – The haplotype data.- Returns:
RecombinationMap
– The recombination map.
- reproduce(parental_haplotypes=None, mating=None, control=None)¶
Return a HaplotypeArray representing the offspring after meiosis.
- Parameters:
parental_haplotypes (
xr.DataArray
, optional) – The parental haplotype data.mating (
MateAssignment
, optional) – The mate assignment object.control (
dict
, optional) – A dictionary containing control parameters.
- Returns:
HaplotypeArray
– The HaplotypeArray representing the offspring after meiosis.
- class xftsim.reproduce.RecombinationMap(p=None, vindex=None, vid=None, chrom=None)¶
Bases:
object
A class to represent a diploid recombination map. In the future, will require XftIndex object instead of vid and chrom.
- Parameters:
p (
float
or numpy.ndarray
, optional) – Recombination probabilities, by default None. A single value results in an exchangeable map; an array gives the probabilities of recombination between the specified loci.
vindex (
xft.index.HaploidVariantIndex | xft.index.DiploidVariantIndex
) – Variant index. Only provide if not providing vid / chrom.
vid (
NDArray[Shape["*"], Any]
, optional) – Variant IDs, by default None.
chrom (
NDArray[Shape["*"], Int64]
, optional) – Chromosomes, by default None.
- static constant_map_from_haplotypes(haplotypes=<class 'xarray.core.dataarray.DataArray'>, p=0.5)¶
Create a constant recombination map from haplotypes.
- Parameters:
haplotypes (
xr.DataArray
) – Haplotypes data array.p (
np.float64
, optional) – Probability, default is 0.5.
- Returns:
RecombinationMap
– A constant recombination map.
- static variable_map_from_haplotypes_with_cM(haplotypes=<class 'xarray.core.dataarray.DataArray'>)¶
Create a variable recombination map from haplotypes with centimorgan distances.
- Parameters:
haplotypes (
xr.DataArray
) – Haplotypes data array.- Returns:
RecombinationMap
– A variable recombination map.- Raises:
ValueError – If distance in centimorgans is required and not present in the input.
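The docstring above does not state how centimorgan distances are converted to recombination probabilities. One standard choice is Haldane's map function, sketched below; whether xftsim uses this particular function is an assumption, and `haldane_recombination_prob` is a hypothetical helper name.

```python
import math

def haldane_recombination_prob(d_cM):
    """Haldane map function: recombination probability for a map distance in cM."""
    d_morgans = d_cM / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

pos_cM = [0.0, 1.0, 1.5, 51.5]
# Probability of recombination between each pair of adjacent loci
p = [haldane_recombination_prob(b - a) for a, b in zip(pos_cM, pos_cM[1:])]
```

For short distances the probability is close to the distance in Morgans (1 cM gives roughly 0.0099), and it approaches 0.5 as loci become effectively unlinked.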
- xftsim.reproduce.meiosis(parental_haplotypes, recombination_p, maternal_inds, paternal_inds)¶
Performs meiosis on parental haplotypes.
- Parameters:
parental_haplotypes (
numpy.ndarray[int8]
) – An array of parental haplotypes.recombination_p (
numpy.ndarray[float64]
) – An array of recombination probabilities.maternal_inds (
numpy.ndarray[int64]
) – An array of maternal indices.paternal_inds (
numpy.ndarray[int64]
) – An array of paternal indices.
- Returns:
numpy.ndarray[int8]
– An array of offspring haplotypes.
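A toy version of this operation, assuming a (sample, variant, copy) array layout and a per-locus switch probability, might look as follows. The layout, the meaning given to `recombination_p[0]`, and the function name `meiosis_sketch` are illustrative assumptions rather than the library's internals.

```python
import numpy as np

def meiosis_sketch(parental_haplotypes, recombination_p,
                   maternal_inds, paternal_inds, rng=None):
    """Toy meiosis on an array of shape (n, m, 2): sample, variant, copy (assumed layout).

    recombination_p[j] is the probability of switching copies between loci j-1 and j;
    recombination_p[0] is the probability of starting on copy 1.
    """
    rng = np.random.default_rng(rng)
    m = parental_haplotypes.shape[1]

    def gametes(parent_inds):
        # A switch at locus j flips which parental copy is transmitted from j onward
        switches = rng.random((len(parent_inds), m)) < recombination_p
        copy = np.cumsum(switches, axis=1) % 2
        chosen = np.take_along_axis(
            parental_haplotypes[parent_inds], copy[:, :, None], axis=2
        )
        return chosen[:, :, 0]

    # Offspring copy 0 comes from the mother, copy 1 from the father
    kids = np.stack([gametes(maternal_inds), gametes(paternal_inds)], axis=2)
    return kids.astype(np.int8)

parents = np.zeros((2, 5, 2), dtype=np.int8)
parents[1] = 1  # sample 1 is homozygous for the alternate allele
offspring = meiosis_sketch(parents, np.full(5, 0.5),
                           np.array([0, 0, 0]), np.array([1, 1, 1]), rng=1)
```

With every entry of `recombination_p` set to 0.5, each locus is transmitted independently, which corresponds to the exchangeable map mentioned in the RecombinationMap documentation.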
- xftsim.reproduce.transmit_parental_phenotypes(mating, parental_phenotypes, offspring_phenotypes, control=None)¶
Transmits parental phenotypes to offspring.
- Parameters:
mating (
MateAssignment
) – An object representing mating assignments.parental_phenotypes (
xr.DataArray
) – A data array containing parental phenotypes.offspring_phenotypes (
xr.DataArray
) – A data array containing offspring phenotypes.control (
dict
, optional) – A dictionary containing additional control parameters, default is None.
- Returns:
None
xftsim.sim module¶
- class xftsim.sim.DemoSimulation(routine='BGRM', n=2000, m=400)¶
Bases:
Simulation
- demo_routines = {'BGRM': 'Bivariate GCTA with balanced random mating demo\n', 'UGRM': 'Univariate GCTA with balanced random mating demo\n'}¶
- class xftsim.sim.Simulation(founder_haplotypes, mating_regime, recombination_map, architecture, statistics=[], post_processors=[], generation=-1, control={}, reproduction_method=<class 'xftsim.reproduce.Meiosis'>, metadata={}, filter_sample=False, sample_filter=None)¶
Bases:
object
A class for running an xft simulation.
- mating_regime¶
Mating regime.
- Type:
xft.mate.MatingRegime
- recombination_map¶
Recombination map.
- Type:
xft.reproduce.RecombinationMap
- architecture¶
Phenogenetic architecture.
- Type:
xft.arch.Architecture
- statistics¶
Iterable of statistics to compute each generation, by default empty list.
- Type:
Iterable
, optional
- post_processors¶
Iterable of post processors to apply each generation, by default empty list.
- Type:
Iterable
, optional
- generation¶
Initial generation, by default -1, corresponding to an uninitialized simulation
- Type:
int
, optional
- control¶
Control parameters for the simulation, by default an empty dictionary.
- Type:
Dict
, optional
- reproduction_method¶
Reproduction method for the simulation, by default xft.reproduce.Meiosis.
- Type:
xft.reproduce.ReproductionMethod
, optional
- control¶
Control parameters for the simulation
- Type:
dict
- haplotypes¶
Haplotypes for the current generation.
- Type:
xr.DataArray
- phenotypes¶
Phenotypes for the current generation.
- Type:
xr.DataArray
- mating¶
Mating information for the current generation.
- Type:
xr.DataArray
- parent_mating¶
Mating information for the previous generation.
- Type:
xr.DataArray
- parent_haplotypes¶
Haplotypes for the previous generation.
- Type:
xr.DataArray
- parent_phenotypes¶
Phenotypes for the previous generation.
- Type:
xr.DataArray
- results¶
Results for the current generation.
- Type:
xr.DataArray
- current_afs_empirical¶
Current empirical allele frequencies.
- Type:
xr.DataArray
- current_std_genotypes¶
Current standardized genotypes.
- Type:
xr.DataArray
- current_std_phenotypes¶
Current standardized phenotypes.
- Type:
xr.DataArray
- phenotype_store¶
Dictionary storing phenotypes for each generation.
- Type:
Dict[int, xr.DataArray]
- haplotype_store¶
Dictionary storing haplotypes for each generation.
- Type:
Dict[int, xr.DataArray]
- mating_store¶
Dictionary storing mating information for each generation.
- Type:
Dict[int, xr.DataArray]
- results_store¶
Dictionary storing results for each generation.
- Type:
Dict[int, xr.DataArray]
- pedigree¶
Pedigree information for the simulation (currently not implemented).
- Type:
Any
- metadata¶
Dictionary containing user specified metadata
- Type:
Dict
- run(n_generations: int):
Run the simulation for a specified number of generations.
- run_generation():
Run a single generation of the simulation.
- compute_phenotypes():
Compute phenotypes for the current generation.
- mate():
Perform mating for the current generation.
- reproduce():
Perform reproduction for the current generation.
- estimate_statistics():
Estimate statistics for the current generation.
- process():
Process the current generation using post-processors.
- update_pedigree():
Update pedigree information for the current generation.
- increment_generation():
Increment the current generation.
- move_forward(n_generations: int):
Move the simulation forward by a specified number of generations.
- apply_filter()¶
Apply sample filters to the current generation
- compute_phenotypes()¶
Compute phenotypes for the current generation.
- property control¶
- property current_afs_empirical¶
- property current_std_genotypes¶
- property current_std_genotypes_filtered¶
- property current_std_phenotypes¶
- property current_std_phenotypes_filtered¶
- property dependency_graph¶
- property dependency_graph_edges¶
- draw_dependency_graph(node_color='none', node_size=1200, font_size=5, margins=0.1, edge_color='#222222', arrowsize=6, number_edges=True, **kwargs)¶
- estimate_statistics()¶
Estimate statistics for the current generation.
- property generation¶
- property haplotypes¶
- property haplotypes_filtered¶
- increment_generation()¶
- mate()¶
Perform mating for the current generation.
- property mating¶
- move_forward(n_generations)¶
- property parent_haplotypes¶
- property parent_mating¶
- property parent_phenotypes¶
- property phenotypes¶
- property phenotypes_filtered¶
- pickle_results(path, metadata={}, results_store=True, architecture=True, mating_store=True, phenotype_store=True, mating_regime=False, haplotype_store=False)¶
- process()¶
Apply post-processors to the current generation.
- reproduce()¶
Perform reproduction for the current generation.
- property results¶
- run(n_generations)¶
Run the simulation for a specified number of generations.
- Parameters:
n_generations (
int
) – Number of generations to run the simulation.
- run_generation()¶
Run a single generation of the simulation.
- update_pedigree()¶
Update pedigree information (NOT IMPLEMENTED).
xftsim.stats module¶
- class xftsim.stats.GWAS_Estimator(component_index=None, metadata={}, filter_sample=False, std_X=True, std_Y=True)¶
Bases:
Statistic
Perform linear association studies for the given simulation.
When called within a Simulation, will add to Simulation.results[‘GWAS’] a 3-D array indexed as follows:
the first dimension indexes variants via xft.index.DiploidVariantIndex
the second dimension indexes four association statistics: slope, se, test-statistic, and p-value
the third dimension indexes phenotypic components via xft.index.ComponentIndex
- component_index¶
Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.
- Type:
xft.index.ComponentIndex
, optional
- estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)¶
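The 3-D result layout described above (variants by four statistics by components) can be reproduced with a small NumPy sketch of marginal least-squares regression. This is not the estimator's actual code: `gwas_sketch` is a hypothetical helper, and the p-value uses a normal approximation chosen here to avoid extra dependencies.

```python
import math
import numpy as np

def gwas_sketch(G, Y):
    """Marginal least-squares regression of each phenotype on each variant.

    G: (n, m) genotypes, Y: (n, k) phenotypes. Returns an (m, 4, k) array whose
    second axis is (slope, se, test statistic, p-value).
    """
    n = G.shape[0]
    Gc = G - G.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    sxx = (Gc ** 2).sum(axis=0)                      # (m,)
    slope = (Gc.T @ Yc) / sxx[:, None]               # (m, k)
    syy = (Yc ** 2).sum(axis=0)                      # (k,)
    rss = syy[None, :] - slope ** 2 * sxx[:, None]   # residual sum of squares
    se = np.sqrt(rss / (n - 2) / sxx[:, None])
    stat = slope / se
    # Two-sided p-value from a normal approximation to the t statistic
    pval = np.vectorize(math.erfc)(np.abs(stat) / math.sqrt(2.0))
    return np.stack([slope, se, stat, pval], axis=1)

rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(500, 10)).astype(float)
Y = np.column_stack([0.5 * G[:, 0] + rng.normal(size=500), rng.normal(size=500)])
res = gwas_sketch(G, Y)  # res[v, :, c] holds the four statistics for variant v, component c
```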
- class xftsim.stats.HasemanElstonEstimator(component_index=None, genetic_correlation=True, randomized=True, prettify=True, n_probe=100, dask=True, metadata={}, filter_sample=False)¶
Bases:
Statistic
Estimate Haseman-Elston regression for the given simulation.
- component_index¶
Index of the component for which the statistics are calculated. If not provided, calculate statistics for all components.
- Type:
xft.index.ComponentIndex
, optional
- genetic_correlation¶
If True, calculate and return the genetic correlation matrix.
- Type:
bool
- randomized¶
If True, use a randomized trace estimator.
- Type:
bool
- prettify¶
If True, prettify the output by converting it to a pandas DataFrame.
- Type:
bool
- n_probe¶
The number of random probes for trace estimation.
- Type:
int
- dask¶
If True, use dask for calculations.
- Type:
bool
- estimator(sim: xft.sim.Simulation) Dict ¶
Estimate and return the Haseman-Elston regression for the given simulation.
- estimator(phenotypes, current_std_phenotypes, current_std_genotypes)¶
- class xftsim.stats.MatingStatistics(component_index=None, full=False, metadata={}, filter_sample=False)¶
Bases:
Statistic
Calculate and return various mating statistics for the given simulation.
- Parameters:
component_index (
xft.index.ComponentIndex
, optional) – Index of the component for which the statistics are calculated.full (
bool
) – If True, ignore component_index and compute statistics for all components. If component_index is not provided and full is False, calculate statistics for phenotype components only.
- estimator(sim: xft.sim.Simulation) Dict ¶
Calculate and return the requested mating statistics for the given simulation.
- estimator(phenotypes, mating)¶
- class xftsim.stats.Pop_GWAS_Estimator(component_index=None, metadata={}, std_X=False, std_Y=False, assume_pairs=True, n_sub=0, PGS=True, PGS_sub_divisions=50, training_fraction=0.8)¶
Bases:
Statistic
Perform one-sibling-only linear association studies for the given simulation.
NOTE! Currently assumes each mate-pair produces exactly 2 offspring
When called within a Simulation, will add to Simulation.results[‘GWAS’] a 3-D array indexed as follows:
the first dimension indexes variants via xft.index.DiploidVariantIndex
the second dimension indexes four association statistics: slope, se, test-statistic, and p-value
the third dimension indexes phenotypic components via xft.index.ComponentIndex
- component_index¶
Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.
- Type:
xft.index.ComponentIndex
, optional
- estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)¶
- class xftsim.stats.SampleStatistics(means=True, variance_components=True, variances=True, vcov=True, corr=True, prettify=True, metadata={}, filter_sample=False)¶
Bases:
Statistic
Calculate and return various sample statistics for the given simulation.
- means¶
If True, calculate and return the mean of each phenotype.
- Type:
bool
- variance_components¶
If True, calculate and return the variance components of each phenotype.
- Type:
bool
- variances¶
If True, calculate and return the variances of each phenotype.
- Type:
bool
- vcov¶
If True, calculate and return the variance-covariance matrix.
- Type:
bool
- corr¶
If True, calculate and return the correlation matrix.
- Type:
bool
- prettify¶
If True, prettify the output by converting it to a pandas DataFrame.
- Type:
bool
- estimator(sim: xft.sim.Simulation) Dict ¶
Calculate and return the requested sample statistics for the given simulation.
- estimator(phenotypes)¶
- class xftsim.stats.Sib_GWAS_Estimator(component_index=None, metadata={}, std_X=False, std_Y=False, assume_pairs=True, n_sub=0, PGS=True, PGS_sub_divisions=50, training_fraction=0.8)¶
Bases:
Statistic
Perform sib-difference linear association studies for the given simulation.
NOTE! Currently assumes each mate-pair produces exactly 2 offspring
When called within a Simulation, will add to Simulation.results[‘GWAS’] a 3-D array indexed as follows:
the first dimension indexes variants via xft.index.DiploidVariantIndex
the second dimension indexes four association statistics: slope, se, test-statistic, and p-value
the third dimension indexes phenotypic components via xft.index.ComponentIndex
- component_index¶
Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.
- Type:
xft.index.ComponentIndex
, optional
- estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)¶
- class xftsim.stats.Statistic(estimator, parser, name, metadata={}, filter_sample=False, s_args=None)¶
Bases:
object
Base class for defining statistic estimators.
- name¶
The name of the statistic.
- Type:
str
- estimator¶
The function that estimates the statistic.
- Type:
Callable
- metadata¶
Any additional metadata
- Type:
Dict
- filter_sample¶
Whether to apply the global sample filter prior to estimation.
- Type:
bool
- estimate(sim: xft.sim.Simulation) None: ¶
Estimate the statistic and update the results.
- update_results(sim: xft.sim.Simulation, results: object) None: ¶
Update the simulation’s results_store with the estimated results.
- estimate(sim=None, **kwargs)¶
- static null_parser(self, *args, **kwargs)¶
- parse_results(sim)¶
- update_results(sim, results)¶
- xftsim.stats.apply_threshold_PGS(estimates, G, thresholds=array([5.00000000e-08, 1.03849902e-07, 2.15696043e-07, 4.48000259e-07, 9.30495659e-07, 1.93263766e-06, 4.01408463e-06, 8.33724592e-06, 1.73164434e-05, 3.59662191e-05, 7.47017665e-05, 1.55155423e-04, 3.22257509e-04, 6.69328214e-04, 1.39019339e-03, 2.88742895e-03, 5.99718426e-03, 1.24561400e-02, 2.58713783e-02, 5.37348020e-02, 1.11607078e-01, 2.31807683e-01, 4.81464104e-01, 1.00000000e+00]))¶
- xftsim.stats.apply_threshold_PGS_all(gwas_results, G, minp=5e-08, maxp=1, nthresh=25)¶
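The default `thresholds` array in the apply_threshold_PGS signature appears to be a log-spaced grid of 24 p-value cutoffs from 5e-8 to 1. The sketch below regenerates that grid and shows one plausible thresholded score. The scoring rule and the helper name `threshold_pgs_sketch` are assumptions, since these functions are undocumented here.

```python
import numpy as np

# 24 log-spaced p-value thresholds from 5e-8 to 1
thresholds = np.geomspace(5e-8, 1.0, 24)

def threshold_pgs_sketch(beta, pvals, G, threshold):
    """Hypothetical thresholded score: G @ beta restricted to variants with p <= threshold."""
    keep = pvals <= threshold
    return G[:, keep] @ beta[keep]

beta = np.array([0.2, -0.1, 0.05])
pvals = np.array([1e-9, 1e-4, 0.3])
G = np.array([[0.0, 1.0, 2.0],
              [2.0, 1.0, 0.0]])
# One column of scores per threshold
scores = np.column_stack(
    [threshold_pgs_sketch(beta, pvals, G, t) for t in thresholds]
)
```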
- xftsim.stats.haseman_elston(G, Y, n_probe=500, dtype=<class 'numpy.float32'>, dask=False)¶
Perform Haseman-Elston regression, with the option to choose randomized, deterministic, or randomized dask-based methods.
- Parameters:
G (
np.ndarray
) – A 2D numpy array representing standardized (but not scaled) diploid genotypes.
Y (
np.ndarray
) – A 2D numpy array representing standardized phenotypes.
n_probe (
int
, optional, default 500
) – The number of random probes for trace estimation. If n_probe is set to inf, the deterministic method is used.
dtype (
numpy data type
, optional, default np.float32
) – The data type for the input arrays.
dask (
bool
, optional, default False
) – If True, use dask for calculations.
- Returns:
np.ndarray
– A 2D numpy array representing the estimated genetic covariances.
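A deterministic Haseman-Elston estimate, together with the kind of randomized trace estimate that `n_probe` refers to, can be sketched as follows. The scaling of the relatedness matrix by the number of variants and both helper names are assumptions; the library's own routine may organise the computation differently.

```python
import numpy as np

def he_regression_sketch(G, Y):
    """Deterministic Haseman-Elston estimate of genetic (co)variances.

    G: (n, m) column-standardized genotypes, Y: (n, k) standardized phenotypes.
    Regresses y_i * y_j on K_ij over all pairs i != j, with K = G G^T / m.
    """
    n, m = G.shape
    K = G @ G.T / m
    d = np.diag(K)
    # Sums over i != j: full quadratic form minus the diagonal contribution
    num = Y.T @ K @ Y - (Y * d[:, None]).T @ Y
    den = (K ** 2).sum() - (d ** 2).sum()
    return num / den

def trace_K2_randomized(G, n_probe=100, rng=None):
    """Hutchinson estimate of tr(K^2) with Rademacher probes, without forming K."""
    rng = np.random.default_rng(rng)
    n, m = G.shape
    Z = rng.choice([-1.0, 1.0], size=(n, n_probe))
    KZ = G @ (G.T @ Z) / m
    return (KZ ** 2).sum() / n_probe

rng = np.random.default_rng(0)
G = rng.normal(size=(6, 4))
Y = rng.normal(size=(6, 2))
est = he_regression_sketch(G, Y)  # (2, 2) matrix of estimated genetic covariances
```

The randomized estimator only needs matrix-vector products with G, which is what makes it attractive when the number of samples is too large to hold K in memory.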
- xftsim.stats.threshold_PGS(estimates, threshold, G)¶
xftsim.struct module¶
- class xftsim.struct.GeneticMap(chrom, pos_bp, pos_cM)¶
Bases:
object
Map between physical and genetic distances.
- Parameters:
chrom (
Iterable
) – Chromosomes on which the variants are located.
pos_bp (
Iterable
) – Physical positions of variants.
pos_cM (
Iterable
) – Map distances in cM.
- frame¶
Pandas DataFrame with the above columns
- Type:
pd.DataFrame
- chroms¶
Unique chromosomes present in map
- Type:
np.ndarray
- classmethod from_pyrho_maps(paths, sep='\t', **kwargs)¶
Construct genetic map objects from the maps provided at https://github.com/popgenmethods/pyrho. Please cite their work if you use their maps.
- Parameters:
paths (
Iterable
) – Paths for each chromosome.
sep (
str
, optional) – Passed to pd.read_csv().
**kwargs – Additional arguments to pd.read_csv().
- Returns:
- interpolate_cM_chrom(pos_bp, chrom, **kwargs)¶
Interpolate cM values in a specified chromosome based on genetic map information.
- Parameters:
pos_bp (
Iterable
) – Physical positions for which to interpolate cM values.
chrom (
str
) – Chromosome on which to interpolate.
**kwargs – Additional keyword arguments to be passed to scipy.interpolate.interp1d.
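Interpolating genetic positions for one chromosome is essentially a one-dimensional lookup. The sketch below uses `np.interp` in place of the `scipy.interpolate.interp1d` call named above, purely to stay dependency-light; the helper name and the toy map values are hypothetical.

```python
import numpy as np

def interpolate_cM_sketch(pos_bp, map_bp, map_cM):
    """Linearly interpolate genetic positions (cM) at physical positions (bp)."""
    return np.interp(pos_bp, map_bp, map_cM)

# Toy genetic map for a single chromosome
map_bp = np.array([1_000, 2_000, 4_000])
map_cM = np.array([0.0, 1.0, 1.5])
cM = interpolate_cM_sketch([1_500, 3_000, 5_000], map_bp, map_cM)
```

Note that `np.interp` clamps positions outside the map to the end values, whereas `interp1d` raises an error by default unless told otherwise through its keyword arguments.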
- class xftsim.struct.HaplotypeArray(haplotypes=None, variant_indexer=None, sample_indexer=None, generation=0, n=None, m=None, dask=False, **kwargs)¶
Bases:
object
Represents a 2D array of binary haplotypes with accompanying row and column indices. Dummy class used for generation of DataArrays and static methods
- class xftsim.struct.PhenotypeArray(components=None, component_indexer=None, sample_indexer=None, generation=0, n=None, k_total=None)¶
Bases:
object
An array that stores phenotypes for a set of individuals. Dummy class used for generation of DataArrays and static methods
- Parameters:
components (
ndarray
, optional) – n x k_total array of phenotype component values.
component_indexer (
xft.index.ComponentIndex
, optional) – Indexer for components.
sample_indexer (
xft.index.SampleIndex
, optional) – Indexer for samples.
generation (
int
, optional) – The generation this PhenotypeArray belongs to.
n (
int
, optional) – The number of samples.
k_total (
int
, optional) – The total number of components.
- Returns:
xr.DataArray
– The initialized PhenotypeArray.- Raises:
AssertionError – If components is provided, then n and k_total must not be provided. If component_indexer is provided, then k_total must not be provided. If sample_indexer is provided, then n must not be provided. If components is provided and sample_indexer is provided, then the shape of components must match the size of the sample dimension of sample_indexer. If components is provided and component_indexer is provided, then the shape of components must match the size of the component dimension of component_indexer. If component_indexer is provided, then the size of the component dimension of component_indexer must match k_total.
- static from_product(phenotype_name, component_name, vorigin_relative, components=None, sample_indexer=None, generation=None, haplotypes=None, n=None)¶
Create a PhenotypeArray from a product of names.
- Parameters:
phenotype_name (
iterable
) – The names of the phenotypes.component_name (
iterable
) – The names of the components.vorigin_relative (
iterable
) – The relative origins of each component.components (
xr.DataArray
, optional) – The array to use as the components.sample_indexer (
xft.index.SampleIndex
, optional) – The sample indexer to use.generation (
int
, optional) – The generation of the PhenotypeArray.haplotypes (
xr.DataArray
, optional) – The haplotypes to use.n (
int
, optional) – The number of samples to use.
- Returns:
xr.DataArray
– The new PhenotypeArray.- Raises:
AssertionError – If exactly one of generation and sample_indexer is provided, or exactly one of haplotypes and sample_indexer/generation or n/generation is provided.
- class xftsim.struct.XftAccessor(xarray_obj)¶
Bases:
object
Accessor for Xarray DataArrays with specialized functionality for HaplotypeArray and PhenotypeArray objects.
- Parameters:
xarray_obj (
xarray.DataArray
) – The DataArray to be accessed.
- _obj¶
The DataArray to be accessed.
- Type:
xarray.DataArray
- _array_type¶
The type of the DataArray, either ‘HaplotypeArray’ or ‘componentArray’.
- Type:
str
- _non_annotation_vars¶
The non-annotation variables in the DataArray.
- Type:
list of str
- _variant_vars¶
The variant annotation variables in the DataArray.
- Type:
list of str
- _sample_vars¶
The sample annotation variables in the DataArray.
- Type:
list of str
- _component_vars¶
The component annotation variables in the DataArray.
- Type:
list of str
- _row_dim¶
The label of the row dimension.
- Type:
str
- _col_dim¶
The label of the column dimension.
- Type:
str
- shape¶
The shape of the DataArray.
- Type:
tuple
- n¶
The number of rows in the DataArray.
- Type:
int
- data¶
The data in the DataArray.
- Type:
numpy.ndarray
- row_vars¶
List of coordinate variable names for the row dimension.
- Type:
list
- column_vars¶
List of coordinate variable names for the column dimension.
- Type:
list
- sample_mindex¶
MultiIndex object for the ‘sample’ dimension, containing iid, fid, and sex columns.
- Type:
pd.MultiIndex
- component_mindex¶
MultiIndex object for the ‘component’ dimension, containing phenotype_name, component_name, and vorigin_relative columns.
- Type:
pd.MultiIndex
- Raises:
NotImplementedError – If the DataArray dimensions are not (‘sample’, ‘variant’) or (‘sample’, ‘component’).
- property af_empirical¶
Empirical allele frequencies. Specific to HaplotypeArray objects.
- Returns:
numpy.ndarray
– Empirical allele frequencies.- Raises:
TypeError – If _col_dim is not ‘variant’.
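As an illustration of what an empirical allele frequency computation involves, the sketch below averages both haploid copies of each variant over all samples. It assumes an (n, 2m) binary array in which the two copies of variant j occupy adjacent columns 2j and 2j + 1; that layout and the helper name are assumptions, not a description of the accessor's internals.

```python
import numpy as np

def af_empirical_sketch(haplotypes):
    """Per-variant allele frequency from an (n, 2m) binary haplotype array.

    Assumes the two copies of variant j sit in adjacent columns 2j and 2j + 1.
    """
    n, two_m = haplotypes.shape
    per_variant = haplotypes.reshape(n, two_m // 2, 2)
    # Average over samples (axis 0) and over the two copies (axis 2)
    return per_variant.mean(axis=(0, 2))

H = np.array([[0, 1, 1, 1],
              [0, 0, 1, 0]], dtype=np.int8)
af = af_empirical_sketch(H)
```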
- property all_components¶
Returns an array of all the unique component names. Specific to PhenotypeArray objects.
- Returns:
numpy.ndarray
– An array of all the unique component names.- Raises:
TypeError – If the column dimension is not ‘component’.
- property all_phenotypes¶
Returns an array of all the unique phenotype component names. Specific to PhenotypeArray objects.
- Returns:
numpy.ndarray
– An array of all the unique phenotype component names.- Raises:
TypeError – If the column dimension is not ‘component’.
- property all_relatives¶
Returns an array of all the unique origin relative values. Specific to PhenotypeArray objects.
- Returns:
numpy.ndarray
– An array of all the unique origin relative values.- Raises:
TypeError – If the column dimension is not ‘component’.
- as_pd(prettify=True)¶
Returns the data as a Pandas DataFrame. Specific to PhenotypeArray objects.
- Parameters:
prettify (
bool
, optional) – If True, the multi-index columns will be prettified by replacing -1, 0, 1 with ‘proband’, ‘mother’, ‘father’, respectively.- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
pd.DataFrame
– A Pandas DataFrame representing the data.
- property column_vars¶
Get the column coordinate variables for the DataArray object.
- Returns:
XftIndex
– The column coordinate variables of the current column dimension.
- property component_mindex¶
Get a Pandas MultiIndex object for the component dimension.
- Returns:
pandas.MultiIndex
– MultiIndex object with phenotype_name, component_name, and vorigin_relative as index levels.- Raises:
NotImplementedError – If the column dimension is not ‘component’.
- property data¶
The data in the DataArray.
- Returns:
numpy.ndarray
– The data in the DataArray.
- property depth¶
Returns the generational depth from binary relative encoding. Specific to PhenotypeArray objects.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
Union[float
,np.nan]
– The generational depth from binary relative encoding, or NaN if the relative origin is empty.
- property diploid_chrom¶
Diploid chromosome numbers. Specific to HaplotypeArray objects.
- Returns:
numpy.ndarray
– Diploid chromosome numbers.- Raises:
TypeError – If _col_dim is not ‘variant’.
- property diploid_vid¶
Diploid variant ID. Specific to HaplotypeArray objects.
- Returns:
numpy.ndarray
– Diploid variant IDs.- Raises:
TypeError – If _col_dim is not ‘variant’.
- property generation¶
Generation of the data. Specific to HaplotypeArray objects.
- Returns:
int
– Generation attribute.- Raises:
TypeError – If _col_dim is not ‘variant’.
- get_annotation_dict()¶
Return a dictionary of all annotation variables associated with the variants in the object. Specific to HaplotypeArray objects.
- Returns:
dict
– A dictionary where the keys are the annotation variable names and the values are the corresponding arrays.
- Raises:
TypeError – If the _col_dim attribute is not equal to ‘variant’.
- get_column_indexer()¶
Get the column indexer object for the PhenotypeArray object.
- Returns:
xft.index.Indexer
– The indexer object based on the current column dimension.- Raises:
TypeError – If the current column dimension is not recognized.
- get_comp_type(ctype='intermediate')¶
Returns the index array of components with comp_type == ctype. Specific to PhenotypeArray objects.
- Returns:
XftIndex
– The index of components that match the given keyword.- Raises:
TypeError – If the column dimension is not ‘component’.
- get_component_indexer()¶
Get the component indexer of a PhenotypeArray.
- Returns:
xft.index.ComponentIndex
– A ComponentIndex object.
- get_intermediate_components()¶
Returns the index array of components with comp_type == ‘intermediate’. Specific to PhenotypeArray objects.
- Returns:
XftIndex
– The index of components that match the given keyword.- Raises:
TypeError – If the column dimension is not ‘component’.
- get_k_rel(rel)¶
Returns the number of components with the given relative origin. Specific to PhenotypeArray objects.
- Parameters:
rel (
int
) – The relative origin of the components.
- Returns:
int
– The number of components with the given relative origin.
- Raises:
TypeError – If the column dimension is not ‘component’.
- get_non_annotation_dict()¶
Return a dictionary of all non-annotation variables associated with the variants in the object. Specific to HaplotypeArray objects.
- Returns:
dict
– A dictionary where the keys are the non-annotation variable names and the values are the corresponding arrays.
- Raises:
TypeError – If the _col_dim attribute is not equal to ‘variant’.
- get_outcome_components()¶
Returns the index array of components with comp_type == ‘outcome’. Specific to PhenotypeArray objects.
- Returns:
XftIndex
– The index of components that match the given keyword.- Raises:
TypeError – If the column dimension is not ‘component’.
- get_row_indexer()¶
Get the row indexer.
- Returns:
xft.index.SampleIndex
– A SampleIndex object.- Raises:
TypeError – If the row dimension is not ‘sample’.
- get_sample_indexer()¶
Returns an instance of xft.index.SampleIndex representing the sample indexer constructed from the input data.
- Raises:
NotImplementedError – If _row_dim is not ‘sample’.
- Returns:
SampleIndex
– An instance of xft.index.SampleIndex constructed from the sample data in the input object.
- get_variant_indexer()¶
Get the variant indexer of a HaplotypeArray.
- Returns:
xft.index.HaploidVariantIndex
– A HaploidVariantIndex object.
- grep_component_index(keyword='phenotype')¶
Returns the index array of components whose names contain the given keyword. Specific to PhenotypeArray objects.
- Parameters:
keyword (
str
, optional) – The keyword to search for in component names, by default ‘phenotype’.
- Returns:
XftIndex
– The index of components that match the given keyword.
- Raises:
TypeError – If the column dimension is not ‘component’.
- interpolate_cM(gmap, **kwargs)¶
Interpolate cM values based on genetic map information. Specific to HaplotypeArray objects.
- Parameters:
gmap (
GeneticMap
) – Genetic map data.
**kwargs – Additional keyword arguments to be passed to scipy.interpolate.interp1d.
- Raises:
TypeError – If the column dimension is not ‘variant’.
ValueError – If not all chromosomes required are present in the genetic map
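The interpolation step can be illustrated without xftsim. This is a minimal sketch, assuming a genetic map for a single chromosome given as base-pair positions with matching cM values. All array names are hypothetical, and `np.interp` stands in for the `scipy.interpolate.interp1d` call that the method is documented to use.

```python
import numpy as np

# Hypothetical genetic map for one chromosome: physical position (bp) -> cM.
map_pos_bp = np.array([1_000, 5_000, 20_000, 50_000])
map_pos_cM = np.array([0.0, 0.4, 1.0, 2.5])

# Hypothetical variant positions whose cM values are unknown.
variant_bp = np.array([3_000, 20_000, 35_000])

# Linear interpolation between the flanking map entries.
variant_cM = np.interp(variant_bp, map_pos_bp, map_pos_cM)
print(variant_cM)
```

In a multi-chromosome setting the same step would be repeated per chromosome, which is consistent with the documented ValueError when a required chromosome is missing from the map.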
- property k_components¶
Returns the number of unique component names. Specific to PhenotypeArray objects.
- Returns:
int
– The number of unique component names.
- Raises:
TypeError – If the column dimension is not ‘component’.
- property k_current¶
Returns the number of all current-gen specific components. Specific to PhenotypeArray objects.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
int
– The number of all current-gen specific components.
- property k_phenotypes¶
Returns the number of unique phenotype components. Specific to PhenotypeArray objects.
- Returns:
int
– The number of unique phenotype components.
- Raises:
TypeError – If the column dimension is not ‘component’.
- property k_relative¶
Returns the number of unique origin relative values. Specific to PhenotypeArray objects.
- Returns:
int
– The number of unique origin relative values.
- Raises:
TypeError – If the column dimension is not ‘component’.
- property k_total¶
Returns the total number of components. Specific to PhenotypeArray objects.
- Returns:
int
– The total number of components.
- Raises:
TypeError – If the column dimension is not ‘component’.
- property m¶
Return the number of distinct diploid variants. Specific to HaplotypeArray objects.
- Returns:
int
– The number of distinct diploid variants in the array.
- Raises:
TypeError – If the _col_dim attribute is not equal to ‘variant’.
- property maf_empirical¶
Empirical minor allele frequencies. Specific to HaplotypeArray objects.
- Returns:
numpy.ndarray
– Empirical minor allele frequencies.
- Raises:
TypeError – If _col_dim is not ‘variant’.
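As an illustration of what an empirical minor allele frequency is, the following sketch works on a plain NumPy array. The layout (one row per haplotype copy, one column per variant, alleles coded 0/1) is an assumption made for the example, and whether xftsim folds frequencies in exactly this way is not stated in this reference.

```python
import numpy as np

# Hypothetical haploid genotypes: 4 haplotype copies (rows) x 3 variants (columns),
# coded 0/1 for the reference/alternate allele.
haploid = np.array([[0, 1, 1],
                    [0, 1, 0],
                    [1, 1, 0],
                    [0, 1, 1]], dtype=np.int8)

af = haploid.mean(axis=0)        # empirical alternate-allele frequency per variant
maf = np.minimum(af, 1.0 - af)   # fold so the value refers to the rarer allele
```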
- property n¶
The number of rows in the DataArray.
- Returns:
int
– The number of rows in the DataArray.
- reindex_components(value)¶
Reindex the components.
- Parameters:
value (
xft.index.ComponentIndex
) – A ComponentIndex object.
- Returns:
PhenotypeArray
– A new PhenotypeArray object.
- property row_vars¶
Get the row coordinate variables for the PhenotypeArray object.
- Returns:
XftIndex
– The row coordinate variables of the row dimension.
- property sample_mindex¶
Get the sample multi-index for the PhenotypeArray object.
- Returns:
pd.MultiIndex
– A multi-index object containing sample IDs, family IDs, and sex information.
- Raises:
NotImplementedError – If the current row dimension is not ‘sample’.
- set_column_indexer(value)¶
Set the column indexer object for the PhenotypeArray object.
- Parameters:
value (
xft.index.Indexer
) – The new indexer object for the PhenotypeArray object.
- Returns:
None
- Raises:
TypeError – If the current column dimension is not recognized.
- set_row_indexer()¶
- set_sample_indexer(value)¶
- set_variant_indexer(value)¶
- property shape¶
The shape of the DataArray.
- Returns:
tuple
– The shape of the DataArray.
- split_by_component()¶
Splits the data by component name. Specific to PhenotypeArray objects.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
Dict[str
,pd.DataFrame]
– A dictionary of dataframes, where the keys are the unique component names and the values are dataframes containing the data for each component.
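The shape of the returned dictionary can be sketched with pandas alone. This is a toy illustration, assuming the columns carry a two-level index of phenotype name and component name. The level names, the example phenotypes, and the helper `split_columns_by_level` are hypothetical and are not part of xftsim.

```python
import pandas as pd

# Hypothetical phenotype table: columns indexed by (phenotype_name, component_name).
columns = pd.MultiIndex.from_tuples(
    [("height", "addGenetic"), ("height", "addNoise"),
     ("bmi", "addGenetic"), ("bmi", "addNoise")],
    names=["phenotype_name", "component_name"],
)
df = pd.DataFrame([[0.1, 0.2, 0.3, 0.4],
                   [0.5, 0.6, 0.7, 0.8]], columns=columns)

def split_columns_by_level(frame, level):
    """Return {level value: sub-frame holding the columns with that value}."""
    values = frame.columns.get_level_values(level)
    return {key: frame.loc[:, values == key] for key in values.unique()}

by_component = split_columns_by_level(df, "component_name")
by_phenotype = split_columns_by_level(df, "phenotype_name")
```

The same helper covers splitting by phenotype name, which mirrors how the split_by_* methods differ only in the grouping variable.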
- split_by_phenotype()¶
Splits the data by phenotype name. Specific to PhenotypeArray objects.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
Dict[str
,pd.DataFrame]
– A dictionary of dataframes, where the keys are the unique phenotype names and the values are dataframes containing the data for each phenotype.
- split_by_phenotype_vorigin()¶
Splits the data by phenotype name and relative origin. Specific to PhenotypeArray objects.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
Dict[Tuple[str
,int]
,pd.DataFrame]
– A dictionary of dataframes, where the keys are tuples of phenotype name and relative origin and the values are dataframes containing the data for each combination of phenotype name and relative origin.
- split_by_vorigin()¶
Splits the data by relative origin. Specific to PhenotypeArray objects.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
Dict[int
,pd.DataFrame]
– A dictionary of dataframes, where the keys are the unique relative origins and the values are dataframes containing the data for each relative origin.
- standardize()¶
- to_diploid()¶
Convert the object to a diploid representation by adding the two haplotypes for each variant. Specific to HaplotypeArray objects.
- Raises:
TypeError – If the _col_dim attribute is not equal to ‘variant’.
- to_diploid_standardized(af=None, scale=False)¶
Standardize the HaplotypeArray object and convert it to a diploid representation. Specific to HaplotypeArray objects.
- Parameters:
af (
NDArray
, optional) – An array containing the allele frequencies of each variant. If not provided, empirical allele frequencies will be used.
scale (
bool
, optional) – Whether or not to scale the standardized array by the square root of the number of variants.
- Returns:
ndarray
– A standardized diploid array where each variant is represented as the sum of two haplotypes.
- Raises:
TypeError – If the _col_dim attribute is not equal to ‘variant’.
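The two steps described above (summing haplotypes, then standardizing) can be sketched in NumPy. This is not the xftsim implementation: it assumes the two haplotypes arrive as separate (n, m) arrays, uses the usual Hardy-Weinberg standardization with mean 2p and variance 2p(1-p), and treats `scale=True` as division by the square root of the number of variants, which is one reading of the parameter description.

```python
import numpy as np

def to_diploid_standardized_sketch(hap_a, hap_b, af=None, scale=False):
    """hap_a, hap_b: (n, m) 0/1 arrays holding the two haplotypes of each sample."""
    diploid = hap_a.astype(float) + hap_b.astype(float)   # allele counts 0/1/2
    if af is None:
        af = diploid.mean(axis=0) / 2.0                   # empirical frequencies
    z = (diploid - 2.0 * af) / np.sqrt(2.0 * af * (1.0 - af))
    if scale:
        z = z / np.sqrt(diploid.shape[1])                 # assumed direction of scaling
    return z

hap_a = np.array([[0], [0], [1]])
hap_b = np.array([[0], [1], [1]])
z = to_diploid_standardized_sketch(hap_a, hap_b, af=np.array([0.5]))
```

Monomorphic variants (frequency 0 or 1) give a zero denominator in this sketch and would need separate handling.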
- use_empirical_afs()¶
Sets allele frequencies to the empirical frequencies. Specific to HaplotypeArray objects.
- Raises:
TypeError – If _col_dim is not ‘variant’.
xftsim.utils module¶
- class xftsim.utils.ConstantCount(count)¶
Bases:
VariableCount
Class representing a constant count of individuals in a population.
- draw¶
a function that generates an array of counts
- Type:
Callable
- expectation¶
expected count
- Type:
float
- nonzero_fraction¶
the fraction of the population that is nonzero
- Type:
float
- Parameters:
count (
int
) – The constant count of individuals in the population.
- class xftsim.utils.MixtureCount(componentCounts, mixture_probabilities)¶
Bases:
VariableCount
Class representing a mixture of VariableCounts of individuals in a population.
- draw¶
a function that generates an array of counts
- Type:
Callable
- expectation¶
expected count
- Type:
float
- nonzero_fraction¶
the fraction of the population that is nonzero
- Type:
float
- Parameters:
componentCounts (
Iterable
) – An iterable of VariableCount instances, representing the components of the mixture.
mixture_probabilities (
NDArray[Shape[“*”], Float64]
) – An array of probabilities associated with each component in the mixture.
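A mixture of count distributions can be sketched as follows. The component draw functions, their expectations, and the helper `draw_mixture` are hypothetical stand-ins rather than xftsim objects. The sketch only shows the general technique: choose a component for each draw, then take that component's count.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical components: each maps a sample size to an array of counts.
component_draws = [
    lambda n: np.full(n, 2),        # constant count of 2
    lambda n: rng.poisson(1.5, n),  # Poisson count with rate 1.5
]
component_expectations = np.array([2.0, 1.5])
mixture_probabilities = np.array([0.3, 0.7])

def draw_mixture(n):
    """Pick a component per draw, then take that component's count."""
    which = rng.choice(len(component_draws), size=n, p=mixture_probabilities)
    all_draws = np.stack([draw(n) for draw in component_draws])  # shape (k, n)
    return all_draws[which, np.arange(n)]

# The mixture expectation is the probability-weighted mean of the components.
mixture_expectation = float(mixture_probabilities @ component_expectations)
counts = draw_mixture(1000)
```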
- class xftsim.utils.NegativeBinomialCount(r, p)¶
Bases:
VariableCount
Class representing a negative binomial-distributed count of individuals in a population.
- draw¶
a function that generates an array of counts
- Type:
Callable
- expectation¶
expected count
- Type:
float
- nonzero_fraction¶
the fraction of the population that is nonzero
- Type:
float
- Parameters:
r (
float
) – The number of successes in the negative binomial distribution.
p (
float
) – The probability of success in the negative binomial distribution.
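For orientation, the sketch below shows how the expectation and nonzero fraction follow from r and p under NumPy's convention, where the count is the number of failures before the r-th success. Whether xftsim uses the same parameterization is an assumption, and the values of r and p are arbitrary.

```python
import numpy as np

r, p = 3.0, 0.4
rng = np.random.default_rng(1)

# NumPy's convention: number of failures observed before the r-th success.
draws = rng.negative_binomial(r, p, size=100_000)

expectation = r * (1.0 - p) / p     # mean under this convention
nonzero_fraction = 1.0 - p ** r     # P(count > 0) = 1 - P(count == 0)
```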
- class xftsim.utils.PoissonCount(rate)¶
Bases:
VariableCount
Class representing a Poisson-distributed count of individuals in a population.
- draw¶
a function that generates an array of counts
- Type:
Callable
- expectation¶
expected count
- Type:
float
- nonzero_fraction¶
the fraction of the population that is nonzero
- Type:
float
- Parameters:
rate (
float
) – The Poisson rate parameter.
- class xftsim.utils.VariableCount(draw, expectation=None, nonzero_fraction=None)¶
Bases:
object
A class to represent random count variables
…
- draw¶
a function that generates an array of counts
- Type:
Callable
- expectation¶
expected count
- Type:
float
- nonzero_fraction¶
the fraction of the population that is nonzero
- Type:
float
- None()¶
- property expectation¶
Getter function for expectation attribute.
- Returns:
float
– Expected count.
- property nonzero_fraction¶
Getter function for nonzero_fraction attribute.
- Returns:
float
– The fraction of the population that is nonzero.
- class xftsim.utils.ZeroTruncatedPoissonCount(rate)¶
Bases:
VariableCount
Class representing a zero-truncated Poisson-distributed count of individuals in a population.
- draw¶
a function that generates an array of counts
- Type:
Callable
- expectation¶
expected count
- Type:
float
- nonzero_fraction¶
the fraction of the population that is nonzero
- Type:
float
- Parameters:
rate (
float
) – The Poisson rate parameter prior to zero-truncation.
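The effect of zero-truncation on the expectation can be checked numerically. This is a minimal sketch using rejection sampling, not the xftsim implementation. The function name and the chosen rate are hypothetical.

```python
import numpy as np

def zero_truncated_poisson(rate, size, rng):
    """Draw Poisson(rate) counts conditioned on being at least 1 (rejection sampling)."""
    out = rng.poisson(rate, size)
    zero = out == 0
    while zero.any():
        out[zero] = rng.poisson(rate, zero.sum())   # redraw only the zeros
        zero = out == 0
    return out

rate = 1.2
rng = np.random.default_rng(2)
draws = zero_truncated_poisson(rate, 100_000, rng)

expectation = rate / (1.0 - np.exp(-rate))   # mean of the truncated distribution
nonzero_fraction = 1.0                       # zeros are excluded by construction
```

For comparison, an untruncated Poisson count with the same rate has expectation `rate` and nonzero fraction `1 - exp(-rate)`.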
- xftsim.utils.cartesian_product(*args)¶
Returns a list of columns comprising a cartesian product of input arrays. Emulates R function expand.grid()
- Parameters:
*args (
NDArray[Any
,Any]
) – The input arrays.
- Returns:
List[NDArray[Any
,Any]]
– The list of columns.
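The "list of columns" output can be illustrated with NumPy's `meshgrid`. This is a sketch under the assumption that rows are ordered as in R's expand.grid(), with the first input varying fastest. The actual row order produced by xftsim is not stated here, and `cartesian_product_sketch` is a hypothetical name.

```python
import numpy as np

def cartesian_product_sketch(*arrays):
    """Return one flat column per input array, covering every combination."""
    grids = np.meshgrid(*arrays, indexing="ij")
    # Column-major flattening makes the first input vary fastest, as in expand.grid().
    return [g.ravel(order="F") for g in grids]

first, second = cartesian_product_sketch(np.array([1, 2]), np.array([10, 20, 30]))
```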
- xftsim.utils.cov2cor(A)¶
Converts covariance matrix to correlation matrix.
Parameters:¶
- A: Union[np.ndarray, pd.DataFrame, xr.DataArray]
Input covariance matrix.
Returns:¶
- Union[np.ndarray, pd.DataFrame, xr.DataArray]
Correlation matrix.
Raises:¶
None
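The conversion divides each covariance by the product of the two corresponding standard deviations. A minimal NumPy sketch of that formula follows. It handles plain arrays only, whereas the documented function also accepts pandas and xarray inputs.

```python
import numpy as np

def cov2cor_sketch(cov):
    """Rescale a covariance matrix so that its diagonal becomes 1."""
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))          # standard deviations
    return cov / np.outer(sd, sd)       # cor[i, j] = cov[i, j] / (sd[i] * sd[j])

cor = cov2cor_sketch([[4.0, 2.0], [2.0, 9.0]])
```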
- xftsim.utils.ensure2D(x)¶
Ensures the input array is 2D, by adding a new dimension if needed.
- Parameters:
x (
arraylike
) – The input array, by default None.
- Returns:
NDArray[Any
,Any]
– The 2D input array.
- Raises:
ValueError – If the input array is not valid.
- xftsim.utils.exhaustive_enumerate(a, n_per_a)¶
Repeat the ith element of array a n_per_a[i] times, ordered such that every element appears min(j, n_per_a[i]) times before any element appears j+1 times.
Parameters:¶
- a: array-like
1-D array of any length and data type.
- n_per_a: array-like
1-D array of int, representing the number of times each element in a needs to be repeated.
Returns:¶
- out: array-like
1-D array of shape (n,), with the same data type as a, in which the ith element of a appears n_per_a[i] times and no element appears j+1 times before every element has appeared min(j, n_per_a[i]) times.
Raises:¶
Warning : If the output array is empty.
Examples:¶
>>> exhaustive_enumerate(np.array((1, 2, 3, 4)), np.array((3, 2, 1, 0)))
array([1, 2, 3, 1, 2, 1])
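The ordering rule can be read as a sequence of rounds: round j emits, in order, every element that still has repetitions left. The sketch below implements that reading and reproduces the documented example. It is an independent illustration, not the library source.

```python
import numpy as np

def exhaustive_enumerate_sketch(a, n_per_a):
    """Emit a[i] n_per_a[i] times, cycling through a one round at a time."""
    a = np.asarray(a)
    n_per_a = np.asarray(n_per_a)
    n_rounds = int(n_per_a.max(initial=0))
    # Round j keeps the elements that must appear more than j times.
    rounds = [a[n_per_a > j] for j in range(n_rounds)]
    return np.concatenate(rounds) if rounds else a[:0]

result = exhaustive_enumerate_sketch(np.array((1, 2, 3, 4)), np.array((3, 2, 1, 0)))
```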
- xftsim.utils.exhaustive_permutation(a, n_sample)¶
Returns a random permutation of the input array, such that each element is selected exactly once before any element is selected twice, and so forth
Parameters:¶
- a: NDArray[Shape[“*”], Any]
A numpy array to be permuted.
- n_sample: int
An integer specifying the size of the permutation to be returned.
Returns:¶
- np.ndarray
A 1D numpy array containing the permuted elements.
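One way to obtain this property is to concatenate independent full permutations of the input and truncate to the requested length. The following is a sketch of that idea with a hypothetical function name and an explicit random generator argument. The library may implement it differently.

```python
import numpy as np

def exhaustive_permutation_sketch(a, n_sample, rng):
    """Sample n_sample elements so that no element leads another by more than one use."""
    a = np.asarray(a)
    if n_sample == 0:
        return a[:0]
    n_rounds = -(-n_sample // len(a))                  # ceiling division
    rounds = [rng.permutation(a) for _ in range(n_rounds)]
    return np.concatenate(rounds)[:n_sample]

rng = np.random.default_rng(3)
sample = exhaustive_permutation_sketch(np.array([10, 20, 30]), 7, rng)
```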
- xftsim.utils.ids_from_generation(generation, indices=None)¶
Generates and returns a new array of IDs using the given generation number and the given indices. The new array contains the given indices with the generation number prefixed to each index.
- Parameters:
generation (
int
) – The generation number to use in the prefix of the IDs.
indices (
NDArray[Shape[“*”], Int64]
, optional) – A numpy array of indices.
- Returns:
ndarray
– A new numpy array of IDs with the given generation number prefixed to each index.
- xftsim.utils.ids_from_generation_range(generation, n=None)¶
Returns an array of string IDs of length n, created by concatenating the input generation with an increasing sequence of integers from 0 to n-1.
Parameters:¶
- generation: int
An integer representing the generation of the IDs to be created.
- n: NDArray[Shape[“*”], Int64], optional (default=None)
An integer specifying the number of IDs to be generated. If None, a range of IDs starting from 0 is created.
Returns:¶
- np.ndarray
A 1D numpy array containing the IDs in string format.
- xftsim.utils.ids_from_n_generation(n, generation)¶
Creates an array of individual IDs based on the specified number of elements and generation.
- Parameters:
n (
int
) – The number of individuals.
generation (
int
) – The generation number.
- Returns:
numpy.ndarray
– An array of individual IDs.
- xftsim.utils.match(a, b)¶
Finds the indices in b that match the elements in a, and returns the corresponding index of each element in b.
Parameters:¶
- a: List[Hashable]
List of elements to find matches for.
- b: List[Hashable]
List of elements to find matches in.
Returns:¶
- List[int]
A list of indices in b that match the elements in a.
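The lookup can be sketched with a dictionary from value to position. Two behaviours here are assumptions rather than documented facts: the sketch returns the first occurrence when b contains duplicates, and it raises KeyError when an element of a is absent from b.

```python
def match_sketch(a, b):
    """For each element of a, return the position of its first occurrence in b."""
    first_position = {}
    for i, value in enumerate(b):
        first_position.setdefault(value, i)          # keep the earliest index
    return [first_position[value] for value in a]    # KeyError if a value is absent from b

positions = match_sketch(["c", "a"], ["a", "b", "c", "a"])
```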
- xftsim.utils.matching_indices_conditional(a, b, condition)¶
Returns the indices of matches between a and b arrays, given a boolean condition.
- xftsim.utils.merge_duplicate_pairs(a, b, n, sort=False)¶
Merge duplicate pairs of values in a and b based on their corresponding values in n.
Parameters:¶
- a: NDArray[Shape[“*”], Any]
First array to merge.
- b: NDArray[Shape[“*”], Any]
Second array to merge.
- n: NDArray[Shape[“*”], Any]
Array of corresponding values that determine how the duplicates are merged.
- sort: bool, optional
Whether to sort the values in a and b before merging the duplicates. Default is False.
Returns:¶
- Tuple[NDArray[Shape[“*”], Any], NDArray[Shape[“*”], Any], NDArray[Shape[“*”], Any]]
The merged arrays, with duplicates removed based on the corresponding values in n.
- xftsim.utils.merge_duplicates(it)¶
Merge duplicates in the input array by checking if any pasted elements are the same.
- Parameters:
it (
Iterable
) – A numpy array with elements to be checked for duplication.
- Returns:
list
– Returns the input list with duplicates merged if present.
- xftsim.utils.paste(it, sep='_')¶
Concatenates elements in a list-like object with a specified separator.
- Parameters:
it (
list-like
) – The list-like object containing elements to concatenate.
sep (
str
, optional) – The separator used to concatenate the elements. Defaults to “_”.
- Returns:
numpy.ndarray
– An array of concatenated string elements.
- xftsim.utils.print_tree(x, depth=0)¶
Print dict of dict(of dict(…)s)s in an easy-to-read tree, similar to the bash program ‘tree’. Modified from https://stackoverflow.com/questions/47131263/python-3-6-print-dictionary-data-in-readable-tree-structure
- Parameters:
x (
Any
) – Dict of dicts
- xftsim.utils.profiled(call, level=1, message=None, sep=' | ')¶
A decorator that prints the duration of a function call when the specified logging level is met.
- Parameters:
call (
function
) – The function being decorated.
level (
int
, optional) – The logging level at which the duration of the function call is printed. Defaults to 1.
message (
str
, optional) – A custom message to display in the log output. If not provided, the name of the decorated function will be used.
- Returns:
TYPE
– Description
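The general pattern of a timing decorator gated by a verbosity level can be sketched as below. This is not the xftsim implementation: `PRINT_LEVEL` is a hypothetical stand-in for the package's print level setting, and the sketch is written as a decorator factory, whereas the documented signature takes the function as its first argument.

```python
import functools
import time

PRINT_LEVEL = 1   # hypothetical stand-in for a package-wide verbosity setting

def profiled_sketch(level=1, message=None, sep=" | "):
    """Build a decorator that reports wall-clock duration when PRINT_LEVEL >= level."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            if PRINT_LEVEL >= level:
                label = message if message is not None else func.__name__
                print(f"{label}{sep}{time.perf_counter() - start:.4f}s")
            return result
        return wrapper
    return decorator

@profiled_sketch(level=1, message="toy sum")
def toy_sum(n):
    return sum(range(n))

total = toy_sum(1000)
```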
- xftsim.utils.sort_and_paste(x)¶
Sorts the input array in ascending order and concatenates the first element with an underscore separator followed by the second element.
Parameters:¶
- x: array-like
1-D array of any shape and data type.
Returns:¶
- out: array-like
1-D array of strings with shape (n,) and the same length as x, where each element is formed by concatenating two sorted string representations of each element in x, separated by an underscore.
Examples:¶
>>> sort_and_paste(np.array((3, 1, 2)))
array(['1_2', '2_3', '1_3'], dtype='<U3')
- xftsim.utils.standardize_array(a)¶
Standardizes columns of a 2D array.
Parameters:¶
- a: ArrayLike
Input 2D array.
Returns:¶
- np.ndarray
Standardized 2D array.
Raises:¶
None
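Column standardization subtracts each column's mean and divides by its standard deviation. The sketch below uses the population standard deviation (ddof=0). Which convention xftsim uses is not stated in this reference, so treat that choice as an assumption.

```python
import numpy as np

def standardize_columns(a):
    """Center each column at 0 and rescale it to unit standard deviation."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)   # population SD (ddof=0), an assumption

z = standardize_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
```

Constant columns have zero standard deviation and would produce non-finite values in this sketch.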
- xftsim.utils.standardize_array_hw(haplotypes, af)¶
Wraps _standardize_array_hw to prevent segfaults.
Parameters:¶
- haplotypes: NDArray[Shape[“*, *”], Int8]
Input array of int8 haploid genotypes.
- af: NDArray[Shape[“*”], Float]
Input array of allele frequencies.
Returns:¶
- np.ndarray
Standardized genotypes.
Raises:¶
None
- xftsim.utils.to_proportions(*args)¶
Converts input values to proportional values.
Parameters:¶
- *args: Union[float, int]
Input values.
Returns:¶
- np.ndarray
Proportional values.
Raises:¶
None
- xftsim.utils.to_simplex(*args)¶
Converts input values to a simplex vector.
Parameters:¶
- *args: Union[float, int]
Input values.
Returns:¶
- np.ndarray
Simplex vector.
Raises:¶
- ValueError
If all input values are less than or equal to zero.
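Mapping values onto the simplex amounts to dividing by their sum so the result adds up to 1. A minimal sketch, assuming non-negative inputs and mirroring the documented ValueError, is shown below. How the library treats mixed-sign inputs is not specified here.

```python
import numpy as np

def to_simplex_sketch(*args):
    """Rescale the inputs so that they sum to 1."""
    values = np.asarray(args, dtype=float)
    if np.all(values <= 0):
        raise ValueError("at least one input must be positive")
    return values / values.sum()

weights = to_simplex_sketch(1, 1, 2)
```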
- xftsim.utils.unique_identifier(frame, index_variables, prefix=None)¶
Returns a unique identifier string generated from index variables of a dataframe.
Parameters:¶
- frame: pd.DataFrame
Input dataframe.
- index_variables: List[str]
List of column names to be used as index.
- prefix: str
Optional prefix
Returns:¶
- str
Unique identifier string of the form [<prefix>..]<index_var1>.<index_var2>…
Raises:¶
None
Module contents¶
- class xftsim.Config¶
Bases:
object
A class to store configuration settings. Instantiated as xftsim.config when the package is loaded.
- nthreads¶
Number of threads to use for parallel execution.
- Type:
int
- print_level¶
Verbosity level for print statements.
- Type:
int
- print_durations_threshold¶
Threshold for printing durations.
- Type:
float
- get_pdurations()¶
Get the current print durations threshold.
- Returns:
float
– The print durations threshold.
- get_plevel()¶
Get the current print level.
- Returns:
int
– The print level.