xftsim package

Submodules

xftsim.arch module

class xftsim.arch.AdditiveGeneticComponent(beta=None, metadata={}, component_name='addGenetic')

Bases: ArchitectureComponent

A genetic component with additive effects.

Parameters:
  • beta (xft.effect.AdditiveEffects, optional) – Additive effects, by default None.

  • metadata (Dict, optional) – Additional metadata, by default an empty dictionary.

effects

Additive effects.

Type:

xft.effect.AdditiveEffects

compute_component(haplotypes, phenotypes)

Compute the additive genetic component of the phenotype.

Parameters:
  • haplotypes (xr.DataArray) – Haplotypes to be used in the computation.

  • phenotypes (xr.DataArray) – Phenotypes to be modified.

property true_cov_beta

Compute the covariance matrix of the additive effects.

Returns:

ndarray – Covariance matrix of the additive effects.

property true_rho_beta

Compute the correlation coefficient matrix of the additive effects.

Returns:

ndarray – Correlation coefficient matrix of the additive effects.
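
As a rough illustration of what these two properties describe, the sketch below computes a covariance matrix and a correlation matrix across the columns of an effect matrix with plain NumPy. The variable names and the simulated effect matrix are hypothetical, and whether xftsim uses these exact estimators internally is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 500, 2  # m variants, k phenotypes (illustrative sizes)
target_cov = np.array([[1.0, 0.5],
                       [0.5, 1.0]])

# Hypothetical effect matrix: one row per variant, one column per phenotype.
beta = rng.multivariate_normal(np.zeros(k), target_cov, size=m)

cov_beta = np.cov(beta, rowvar=False)       # k x k covariance across variants
rho_beta = np.corrcoef(beta, rowvar=False)  # k x k correlation across variants
```

The correlation matrix is the covariance matrix rescaled so that its diagonal equals one.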

class xftsim.arch.AdditiveNoiseComponent(variances=None, sds=None, means=None, phenotype_name=None, component_index=None, component_name='addNoise')

Bases: ArchitectureComponent

An independent Gaussian noise component.

Parameters:
  • variances (Iterable, optional) – Variances of the noise components, by default None.

  • sds (Iterable, optional) – Standard deviations of the noise components, by default None.

  • means (Iterable, optional) – Means of the noise components, by default set to zero.

  • phenotype_name (Iterable, optional) – Names of the phenotypes, by default None. Included for backwards compatibility. Do not specify if providing component_index.

  • component_index (xftsim.index.ComponentIndex, optional) – Alternatively, provide the output component index.

variances

Variances of the noise components.

Type:

ndarray

sds

Standard deviations of the noise components.

Type:

ndarray

compute_component(haplotypes, phenotypes)

Compute the noise component of the phenotype.

Parameters:
  • haplotypes (xr.DataArray) – Haplotypes, not used in the computation.

  • phenotypes (xr.DataArray) – Phenotypes to be modified.
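
The following is a minimal NumPy sketch of independent Gaussian noise specified through either variances or standard deviations. The helper names `resolve_sds` and `draw_noise` are hypothetical, and the rule that exactly one of `variances` or `sds` must be given is an assumption, not documented xftsim behavior.

```python
import numpy as np

def resolve_sds(variances=None, sds=None):
    """Return standard deviations from whichever of variances/sds was given."""
    # Assumption: exactly one of the two arguments is supplied.
    if (variances is None) == (sds is None):
        raise ValueError("provide exactly one of variances or sds")
    if sds is None:
        return np.sqrt(np.asarray(variances, dtype=float))
    return np.asarray(sds, dtype=float)

def draw_noise(n, variances=None, sds=None, means=None, seed=None):
    """Draw an (n, k) array of independent Gaussian noise, one column per phenotype."""
    sd = resolve_sds(variances, sds)
    mu = np.zeros_like(sd) if means is None else np.asarray(means, dtype=float)
    rng = np.random.default_rng(seed)
    return rng.normal(loc=mu, scale=sd, size=(n, sd.shape[0]))

noise = draw_noise(10_000, variances=[1.0, 4.0], seed=1)
```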

class xftsim.arch.Architecture(components=None, metadata={}, depth=1, expand_components=False)

Bases: object

Class representing a phenogenetic architecture.

Parameters:
  • components (Iterable, optional) – An iterable collection of ArchitectureComponent objects

  • metadata (Dict, optional) – A dictionary for holding metadata about the Architecture object

  • depth (int, optional) – The generational depth of the architecture, default to 1

  • expand_components (bool, optional) – A boolean flag indicating whether to expand the components, default to False

metadata

A dictionary for holding metadata about the Architecture object

Type:

Dict

components

An iterable collection of ArchitectureComponent objects

Type:

Iterable

depth

The depth of the architecture

Type:

int

expand_components

A boolean flag indicating whether to expand the components

Type:

bool

founder_initializations() List:

Get a list of the founder initialization of each component

merged_component_indexer() xft.index.ComponentIndex:

Get the merged component indexer

initialize_phenotype_array(haplotypes: xr.DataArray, control: dict = None) xr.DataArray:

Initialize a new phenotype array

initialize_founder_phenotype_array(haplotypes: xr.DataArray, control: dict = None) xr.DataArray:

Initialize a new founder phenotype array

compute_phenotypes(haplotypes: xr.DataArray = None, phenotypes: xr.DataArray = None, control: dict = None) None:

Compute phenotypes for the given haplotypes and phenotypes

check_dependencies()
compute_phenotypes(haplotypes=None, phenotypes=None, control=None)

Compute phenotypes.

Parameters:
  • haplotypes (xr.DataArray, optional) – Input haplotypes.

  • phenotypes (xr.DataArray, optional) – Input phenotypes.

  • control (dict, optional) – Dictionary containing control parameters.

property dependency_graph
property dependency_graph_edges
draw_dependency_graph(node_color='none', node_size=1200, font_size=5, margins=0.1, edge_color='#222222', arrowsize=6, number_edges=True, **kwargs)
property founder_initializations

Get a list of the founder initialization of each component

initialize_founder_phenotype_array(haplotypes, control=None)

Initialize a founder generation phenotype array from haplotypes under the specified architecture. In the absence of vertical transmission, this is equivalent to initialize_phenotype_array().

Parameters:
  • haplotypes (xr.DataArray) – Input haplotypes.

  • control (dict, optional) – Dictionary containing control parameters.

Returns:

xr.DataArray – Phenotype array with the merged component indexer and sample indexer.

initialize_phenotype_array(haplotypes, control=None)

Initialize a phenotype array from haplotypes under the specified architecture.

Parameters:
  • haplotypes (xr.DataArray) – Input haplotypes.

  • control (dict, optional) – Dictionary containing control parameters.

Returns:

xr.DataArray – Phenotype array with the merged component indexer and sample indexer.

property merged_component_indexer

Get the merged ComponentIndex indexer across all architecture components.

class xftsim.arch.ArchitectureComponent(compute_component=None, input_cindex=None, output_cindex=None, input_haplotypes=False, founder_initialization=None, component_name='generic')

Bases: object

Class representing a component of a genetic architecture.

Parameters:
  • compute_component (Callable, optional) – Function that accesses haplotypes and/or phenotypes and modifies phenotypes by reference, by default None.

  • input_cindex (xft.index.ComponentIndex, optional) – Index of the input component, by default None.

  • output_cindex (xft.index.ComponentIndex, optional) – Index of the output component, by default None.

  • input_haplotypes (bool or xft.index.HaploidVariantIndex, optional) – Boolean or HaploidVariantIndex indicating if input haplotypes are used, by default False.

  • founder_initialization (Callable, optional) – Function that initializes founder haplotypes for the component, by default None.

_compute_component

Function that accesses haplotypes and/or phenotypes and modifies phenotypes by reference.

Type:

Callable or None

input_haplotypes

Boolean or HaploidVariantIndex indicating if input haplotypes are used.

Type:

bool or xft.index.HaploidVariantIndex

input_cindex

Index of the input component.

Type:

xft.index.ComponentIndex

output_cindex

Index of the output component.

Type:

xft.index.ComponentIndex

founder_initialization

Function that initializes founder haplotypes for the component.

Type:

Callable or None

property component_name
compute_component(haplotypes=None, phenotypes=None)

Function that accesses haplotypes and/or phenotypes and modifies phenotypes by reference.

Parameters:
  • haplotypes (xr.DataArray, optional) – Haplotypes to be accessed, by default None.

  • phenotypes (xr.DataArray, optional) – Phenotypes to be accessed and modified, by default None.
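
To make "modifies phenotypes by reference" concrete, here is a small sketch in which a plain NumPy array stands in for the xr.DataArray. The function `add_one_to_first_column` is hypothetical and only illustrates the calling convention: the callable writes into the array it receives and returns nothing.

```python
import numpy as np

def add_one_to_first_column(haplotypes, phenotypes):
    """Hypothetical compute function: writes its output into `phenotypes` in place."""
    # Column 0 plays the role of the input component, column 1 the output component.
    phenotypes[:, 1] = phenotypes[:, 0] + 1.0

phenotypes = np.zeros((3, 2))
result = add_one_to_first_column(None, phenotypes)  # haplotypes unused in this sketch
```

Because the array is mutated in place, the caller sees the updated output column without using the return value.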

static default_input_cindex(*args, **kwargs)

Static method to define the default input component index.

static default_output_cindex(*args, **kwargs)

Static method to define the default output component index.

property dependency_graph
property dependency_graph_edges
draw_dependency_graph(node_color='none', node_size=1500, arrowsize=7, font_size=6, margins=0.1, **kwargs)
property input_component_name
property input_phenotype_name
property input_vorigin_relative
property merged_phenotype_indexer
property output_component_name
property output_phenotype_name
property output_vorigin_relative
property phenotype_name
property vorigin_relative
class xftsim.arch.BinarizingTransformation(thresholds, input_cindex, output_cindex, component_name='binarize')

Bases: ArchitectureComponent

An architecture component that binarizes specified phenotypes based on specified thresholds under a liability-threshold model.

Attributes:
  • thresholds (Iterable) – A list or array of thresholds used for binarization.

  • input_cindex (xft.index.ComponentIndex) – The input component index.

  • output_cindex (xft.index.ComponentIndex) – The output component index.

  • phenotype_name (Iterable) – The name of the phenotype.

  • liability_component (str) – The liability component to be used. Default is ‘phenotype’.

  • vorigin_relative (Iterable) – The relative V origin. Default is [-1].

  • output_component (str) – The name of the output component. Default is ‘binary_phenotype’.

Methods:

construct_input_cindex(phenotype_name: Iterable, liability_component: str = 'phenotype', vorigin_relative: Iterable = [-1]) -> xft.index.ComponentIndex

Constructs the input component index based on given phenotype names.

construct_output_cindex(phenotype_name: Iterable, output_component: str = 'binary_phenotype', vorigin_relative: Iterable = [-1]) -> xft.index.ComponentIndex

Constructs the output component index based on given phenotype names.

construct_cindexes(phenotype_name: Iterable, liability_component: str = 'phenotype', output_component: str = 'binary_phenotype', vorigin_relative: Iterable = [-1]) -> Tuple[xft.index.ComponentIndex, xft.index.ComponentIndex]

Constructs both the input and output component indexes based on given phenotype names.

compute_component(self, haplotypes: xr.DataArray, phenotypes: xr.DataArray) -> None

Computes the binary phenotype based on the given thresholds.

compute_component(haplotypes, phenotypes)

Computes the binarizing transformation.

Parameters:
  • haplotypes (xr.DataArray) – The haplotypes.

  • phenotypes (xr.DataArray) – The phenotypes.

static construct_cindexes(phenotype_name, liability_component='phenotype', output_component='binary_phenotype', vorigin_relative=[-1])

Constructs both input and output component indexes for the binarizing transformation.

Parameters:
  • phenotype_name (Iterable) – Names of the phenotypes.

  • liability_component (str, optional) – Name of the liability component. Default is “phenotype”.

  • output_component (str, optional) – Name of the output component. Default is “binary_phenotype”.

  • vorigin_relative (Iterable, optional) – v-origin relative. Default is [-1].

Returns:

Tuple[xft.index.ComponentIndex, xft.index.ComponentIndex] – The input and output component indexes.

static construct_input_cindex(phenotype_name, liability_component='phenotype', vorigin_relative=[-1])

Constructs the input component index for the binarizing transformation.

Parameters:
  • phenotype_name (Iterable) – Names of the phenotypes.

  • liability_component (str, optional) – Name of the liability component. Default is “phenotype”.

  • vorigin_relative (Iterable, optional) – v-origin relative. Default is [-1].

Returns:

xft.index.ComponentIndex – The input component index.

static construct_output_cindex(phenotype_name, output_component='binary_phenotype', vorigin_relative=[-1])

Constructs the output component index for the binarizing transformation.

Parameters:
  • phenotype_name (Iterable) – Names of the phenotypes.

  • output_component (str, optional) – Name of the output component. Default is “binary_phenotype”.

  • vorigin_relative (Iterable, optional) – v-origin relative. Default is [-1].

Returns:

xft.index.ComponentIndex – The output component index.
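
A liability-threshold model can be sketched in a few lines of NumPy: each continuous liability is compared with a per-phenotype threshold to yield a 0/1 outcome. The function name `binarize` is hypothetical, and the use of a strict `>` comparison with float output is an assumption about the convention, not a statement of what xftsim does.

```python
import numpy as np

def binarize(liability, thresholds):
    """Return 1.0 where liability exceeds its phenotype's threshold, else 0.0."""
    liability = np.asarray(liability, dtype=float)    # shape (n, k)
    thresholds = np.asarray(thresholds, dtype=float)  # shape (k,), broadcast over rows
    return (liability > thresholds).astype(float)

liability = np.array([[-0.5, 1.2],
                      [ 0.3, 0.1],
                      [ 1.5, 2.5]])
binary = binarize(liability, thresholds=[0.0, 1.0])
```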

class xftsim.arch.ConstantFounderInitialization(component_index=None, constants=None)

Bases: FounderInitialization

Founder initialization that sets all haplotypes to constant values.

class xftsim.arch.CorrelatedNoiseComponent(vcov=None, means=None, phenotype_name=None, component_index=None, component_name='corrNoise')

Bases: ArchitectureComponent

Multivariate Gaussian noise component.

Parameters:
  • vcov (ndarray, optional) – Variance-covariance matrix.

  • means (Iterable, optional) – Means of the noise components, by default set to zero.

  • phenotype_name (Iterable, optional) – Names of the phenotypes, by default None. Included for backwards compatibility. Do not specify if providing component_index.

  • component_index (xftsim.index.ComponentIndex, optional) – Alternatively, provide the output component index.

compute_component(haplotypes, phenotypes)

Compute the noise component of the phenotype.

Parameters:
  • haplotypes (xr.DataArray) – Haplotypes, not used in the computation.

  • phenotypes (xr.DataArray) – Phenotypes to be modified.
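
For intuition, multivariate Gaussian noise with a given variance-covariance matrix can be drawn directly with NumPy, as in the sketch below. The matrix values and sample size are illustrative only, and the sketch does not claim to mirror how xftsim draws its samples.

```python
import numpy as np

rng = np.random.default_rng(2)
vcov = np.array([[1.0, 0.8],
                 [0.8, 2.0]])   # illustrative variance-covariance matrix
means = np.zeros(2)             # default of zero means, as described above

# One row per individual, one column per phenotype.
noise = rng.multivariate_normal(means, vcov, size=20_000)

# With enough samples, the empirical covariance approaches vcov.
empirical = np.cov(noise, rowvar=False)
```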

class xftsim.arch.FounderInitialization(component_index=None, initialize_component=None)

Bases: object

Base class for founder initialization.

initialize_component(phenotypes)

Initialize founder haplotypes for a single phenotype component.

Parameters:

phenotypes (xr.DataArray) – Phenotypes for a single phenotype component.

Raises:

Warning – If no initialization method is defined.

class xftsim.arch.GCTA_Architecture(h2, Rg=None, phenotype_name=None, variant_indexer=None, haplotypes=None)

Bases: Architecture

Additive genetic architecture object under GCTA infinitesimal model <CITE>

Under this genetic architecture, all variants are causal and standardized genetic variants / sqrt(m) have the user specified (possibly diagonal) genetic correlation matrix and variance equal to h2.

Parameters:
  • h2 (Iterable) – Vector of genetic variances or genetic variance/covariance matrix

  • Rg (numpy.ndarray) – Optional genetic correlation matrix

  • phenotype_name (Iterable) – Optional names of phenotypes

  • variant_indexer (xft.index.HaploidVariantIndex | xft.index.DiploidVariantIndex) – Variant indexer; ploidy is determined automatically. Phenotype component indexer defaults to xft.index.ComponentIndex.RangeIndex if not provided.

  • haplotypes (xr.DataArray) – Alternatively, one can simply provide haplotypes instead of the variant indexer. Ignored if variant_indexer is supplied.
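
The infinitesimal model described above can be sketched numerically: combine h2 and Rg into a genetic covariance matrix, draw per-variant effects with covariance Vg / m, and apply them to standardized genotypes. Everything here is an assumption for illustration: standard normal draws stand in for standardized genotypes, the sizes are arbitrary, and the code does not use or reproduce the xftsim implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 2000, 1000                 # individuals, variants (illustrative sizes)
h2 = np.array([0.5, 0.3])         # genetic variances
Rg = np.array([[1.0, 0.4],
               [0.4, 1.0]])       # genetic correlation matrix

# Genetic covariance matrix: Vg[i, j] = Rg[i, j] * sqrt(h2[i] * h2[j]).
sd = np.sqrt(h2)
Vg = Rg * np.outer(sd, sd)

# Stand-in for standardized genotypes (mean 0, variance 1, independent variants).
Z = rng.standard_normal((n, m))

# Every variant is causal; effects are drawn with covariance Vg / m.
beta = rng.multivariate_normal(np.zeros(2), Vg / m, size=m)

G = Z @ beta                      # additive genetic values, shape (n, 2)
empirical = np.cov(G, rowvar=False)
```

Under these assumptions the empirical covariance of the genetic values is close to Vg, so the diagonal approximates h2.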

class xftsim.arch.GaussianFounderInitialization(component_index=None, variances=None, sds=None, means=None)

Bases: FounderInitialization

A class for initializing founder haplotypes by drawing iid samples from normal distributions with the specified means and standard deviations.

Parameters:
  • component_index (xft.index.ComponentIndex, optional) – A ComponentIndex object containing the indexing information of the components. If not provided, then the initialization will be null.

  • variances (Iterable, optional) – An iterable object of length k_total specifying the variances of the Gaussian distribution. Either variances or sds must be provided.

  • sds (Iterable, optional) – An iterable object of length k_total specifying the standard deviations of the Gaussian distribution. Either variances or sds must be provided.

  • means (Iterable, optional) – An iterable object of length k_total specifying the means of the Gaussian distribution. If not provided, then the means will be set to 0.

Raises:

AssertionError – If neither variances nor sds is provided or if the length of component_index does not match the length of sds.

sds

An array of standard deviations.

Type:

numpy.ndarray

means

An array of means.

Type:

numpy.ndarray

component_index

An object containing the indexing information of the components.

Type:

xft.index.ComponentIndex

class xftsim.arch.HorizontalComponent(input_cindex, output_cindex, coefficient_matrix=None, normalize=True, component_name='linHoriz')

Bases: LinearTransformationComponent

class xftsim.arch.InfinitessimalArchitecture

Bases: object

class xftsim.arch.LinearTransformationComponent(input_cindex=None, output_cindex=None, coefficient_matrix=None, normalize=True, founder_initialization=None, component_name='linear')

Bases: ArchitectureComponent

A linear transformation component. Maps input phenotypes to output phenotypes using linear map represented by coefficient_matrix.

Parameters:
  • input_cindex (xft.index.ComponentIndex, optional) – Input component index, by default None.

  • output_cindex (xft.index.ComponentIndex, optional) – Output component index, by default None.

  • coefficient_matrix (ndarray, optional) – Coefficient matrix, by default None.

  • normalize (bool, optional) – If True, normalize the input by subtracting the mean and dividing by the standard deviation, by default True.

  • founder_initialization (FounderInitialization, optional) – Founder initialization, by default None.

v_input_dimension

Input dimension.

Type:

int

v_output_dimension

Output dimension.

Type:

int

normalize

If True, normalize the input by subtracting the mean and dividing by the standard deviation.

Type:

bool

coefficient_matrix

Coefficient matrix.

Type:

ndarray

compute_component(haplotypes, phenotypes)

Compute the linear transformation component of the phenotype.

Parameters:
  • haplotypes (xr.DataArray) – Haplotypes, not used in the computation.

  • phenotypes (xr.DataArray) – Phenotypes to be modified.

property linear_transformation

Get the linear transformation matrix.

Returns:

pd.DataFrame – Linear transformation matrix.
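
A minimal sketch of the linear map, assuming inputs are arranged as an (n, inputs) array and the coefficient matrix has one row per output and one column per input. That orientation, the use of the population standard deviation, and the function name `linear_transform` are all assumptions made for illustration.

```python
import numpy as np

def linear_transform(inputs, coefficient_matrix, normalize=True):
    """Map input components to output components with a linear map."""
    X = np.asarray(inputs, dtype=float)
    if normalize:
        # Subtract the column mean and divide by the column standard deviation.
        X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Assumed orientation: rows of the matrix are outputs, columns are inputs.
    return X @ np.asarray(coefficient_matrix, dtype=float).T

rng = np.random.default_rng(4)
inputs = rng.normal(loc=5.0, scale=3.0, size=(1000, 2))
coef = np.array([[0.5, 0.0],
                 [0.2, 0.3]])
outputs = linear_transform(inputs, coef)
```

With normalization on, the first output is 0.5 times a unit-variance input, so its standard deviation is 0.5 regardless of the input scale.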

class xftsim.arch.LinearVerticalComponent(input_cindex=None, output_cindex=None, coefficient_matrix=None, normalize=True, founder_variances=None, founder_initialization=None, component_name='linVert')

Bases: LinearTransformationComponent

A vertical transmission component. Requires a way to generate “transmitted” components in the founder generation.

Parameters:
  • input_cindex (xft.index.ComponentIndex, optional) – Input component index, by default None.

  • output_cindex (xft.index.ComponentIndex, optional) – Output component index, by default None.

  • coefficient_matrix (ndarray, optional) – Coefficient matrix, by default None.

  • normalize (bool, optional) – If True, normalize the input by subtracting the mean and dividing by the standard deviation, by default True.

  • founder_variances (Iterable, optional) – Variances of the founders, by default None.

  • founder_initialization (FounderInitialization, optional) – Founder initialization, by default None.

v_input_dimension

Input dimension.

Type:

int

v_output_dimension

Output dimension.

Type:

int

normalize

If True, normalize the input by subtracting the mean and dividing by the standard deviation.

Type:

bool

coefficient_matrix

Coefficient matrix.

Type:

ndarray

class xftsim.arch.ProductComponent(input_cindex, output_cindex, output_coef=1.0, coefficient_vector=None, mean_deviate=True, normalize=False)

Bases: ArchitectureComponent

Multiplies existing components

Parameters:
  • input_cindex (xft.index.ComponentIndex) – Index of components to multiply

  • output_cindex (xft.index.ComponentIndex) – Output component index

  • output_coef (float, optional) – Coefficient to multiply output by, by default 1.0

  • coefficient_vector (ndarray, optional) – Coefficients to premultiply inputs by, by default all ones.

  • mean_deviate (bool, optional) – If True, mean deviate the inputs by subtracting the mean. Defaults to True.

  • normalize (bool, optional) – If True, normalize the inputs by subtracting the mean and dividing by the standard deviation prior to multiply. Defaults to False.

compute_component(haplotypes, phenotypes)

Compute the product component of the phenotype.

Parameters:
  • haplotypes (xr.DataArray) – Haplotypes, not used in the computation.

  • phenotypes (xr.DataArray) – Phenotypes to be modified.
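
The parameters above suggest a computation along the following lines: optionally center or normalize the inputs, premultiply by per-input coefficients, take the row-wise product, and scale the result. The function `product_component` is a hypothetical sketch; in particular, the order of centering and premultiplication is an assumption.

```python
import numpy as np

def product_component(inputs, output_coef=1.0, coefficient_vector=None,
                      mean_deviate=True, normalize=False):
    """Row-wise product of (optionally centered or normalized) input components."""
    X = np.asarray(inputs, dtype=float)
    if coefficient_vector is None:
        coefficient_vector = np.ones(X.shape[1])   # default: all ones
    if normalize:
        X = (X - X.mean(axis=0)) / X.std(axis=0)
    elif mean_deviate:
        X = X - X.mean(axis=0)
    X = X * np.asarray(coefficient_vector, dtype=float)
    return output_coef * np.prod(X, axis=1)

inputs = np.array([[1.0, 2.0],
                   [3.0, 6.0],
                   [5.0, 10.0]])
out = product_component(inputs, output_coef=2.0)
```

Here the column means are 3 and 6, so the centered rows are (-2, -4), (0, 0), and (2, 4), and the scaled products are 16, 0, and 16.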

class xftsim.arch.SpikeSlabArchitecture

Bases: object

class xftsim.arch.SumAllTransformation(input_cindex, output_component_name='phenotype', output_comp_type='outcome', component_name='sumAll')

Bases: ArchitectureComponent

Sum all intermediate phenotype components to generate outcome phenotype components.

Parameters:

input_cindex (xft.index.ComponentIndex) – Input component index.

input_haplotypes

If True, haplotypes are input.

Type:

bool

input_cindex

Input component index.

Type:

xft.index.ComponentIndex

output_cindex

Output component index.

Type:

xft.index.ComponentIndex

founder_initialization

Founder initialization.

Type:

None

compute_component(haplotypes, phenotypes)

Compute the sum of the input components and assign them to the output component.

Parameters:
  • haplotypes (xr.DataArray) – Haplotypes.

  • phenotypes (xr.DataArray) – Phenotypes.

Returns:

None

static construct_input_cindex(phenotype_name, sum_components=['additiveGenetic', 'additiveNoise'], vorigin_relative=[-1])

Construct input component index.

Parameters:
  • phenotype_name (Iterable) – Phenotype name.

  • sum_components (Iterable, optional) – Components to sum, by default [‘additiveGenetic’, ‘additiveNoise’].

  • vorigin_relative (Iterable, optional) – Relative vorigins, by default [-1].

Returns:

xft.index.ComponentIndex – Component index.

class xftsim.arch.SumTransformation(input_cindex, output_cindex, component_name='sumTrans')

Bases: ArchitectureComponent

Sum components to generate phenotypes.

Parameters:
  • input_cindex (xft.index.ComponentIndex) – Input component index.

  • output_cindex (xft.index.ComponentIndex) – Output component index.

input_haplotypes

If True, haplotypes are input.

Type:

bool

input_cindex

Input component index.

Type:

xft.index.ComponentIndex

output_cindex

Output component index.

Type:

xft.index.ComponentIndex

founder_initialization

Founder initialization.

Type:

None

compute_component(haplotypes, phenotypes)

Compute the sum of the input components and assign them to the output component.

Parameters:
  • haplotypes (xr.DataArray) – Haplotypes.

  • phenotypes (xr.DataArray) – Phenotypes.

Returns:

None

static construct_cindexes(phenotype_name, sum_components=['additiveGenetic', 'additiveNoise'], vorigin_relative=[-1], output_component='phenotype', comp_type='outcome')

Construct input and output ComponentIndex objects for SumTransformation.

Parameters:
  • phenotype_name (Iterable) – Names of the phenotypes.

  • sum_components (Iterable, optional) – Names of the components to be summed, by default [‘additiveGenetic’, ‘additiveNoise’].

  • vorigin_relative (Iterable, optional) – Relative origin of the component with respect to the phenotype, by default [-1].

  • output_component (str, optional) – Name of the output component, by default ‘phenotype’.

Returns:

Tuple[xft.index.ComponentIndex, xft.index.ComponentIndex] – A tuple containing input and output ComponentIndex objects.

static construct_input_cindex(phenotype_name, sum_components=['additiveGenetic', 'additiveNoise'], vorigin_relative=[-1])

Construct input component index.

Parameters:
  • phenotype_name (Iterable) – Phenotype name.

  • sum_components (Iterable, optional) – Components to sum, by default [‘additiveGenetic’, ‘additiveNoise’].

  • vorigin_relative (Iterable, optional) – Relative vorigins, by default [-1].

Returns:

xft.index.ComponentIndex – Component index.

static construct_output_cindex(phenotype_name, sum_components=['additiveGenetic', 'additiveNoise'], vorigin_relative=[-1], comp_type='outcome', output_name='phenotype')

Construct output component index.

Parameters:
  • phenotype_name (Iterable) – Phenotype name.

  • sum_components (Iterable, optional) – Components to sum, by default [‘additiveGenetic’, ‘additiveNoise’].

  • vorigin_relative (Iterable, optional) – Relative vorigins, by default [-1].

  • output_name (str, optional) – Output name, by default ‘phenotype’.

Returns:

xft.index.ComponentIndex – Component index.
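
Summing components per phenotype can be illustrated without xftsim by labelling each column with a (phenotype, component) pair and accumulating the selected components. The function `sum_by_phenotype`, the column labels, and the values are hypothetical.

```python
import numpy as np

# Each column is identified by a (phenotype_name, component_name) pair.
columns = [("height", "additiveGenetic"), ("height", "additiveNoise"),
           ("bmi", "additiveGenetic"), ("bmi", "additiveNoise")]
values = np.array([[0.2, -0.1,  1.0, 0.5],
                   [0.4,  0.3, -1.0, 0.25]])

def sum_by_phenotype(values, columns, components=("additiveGenetic", "additiveNoise")):
    """Sum the selected components within each phenotype."""
    phenotype_names = list(dict.fromkeys(p for p, _ in columns))  # keep first-seen order
    out = np.zeros((values.shape[0], len(phenotype_names)))
    for j, (pheno, comp) in enumerate(columns):
        if comp in components:
            out[:, phenotype_names.index(pheno)] += values[:, j]
    return phenotype_names, out

names, totals = sum_by_phenotype(values, columns)
```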

xftsim.arch.VerticalComponent

alias of LinearVerticalComponent

class xftsim.arch.ZeroFounderInitialization(component_index=None)

Bases: ConstantFounderInitialization

Founder initialization that sets all haplotypes to zero.

xftsim.data module

xftsim.data.get_ceu_map()

Load the CEU haplotype map.

Returns:

pandas.DataFrame – A DataFrame with the CEU haplotype map.

xftsim.effect module

Summary

class xftsim.effect.AdditiveEffects(beta, variant_indexer=None, component_indexer=None, standardized=True, scaled=True)

Bases: object

Additive genetic effects object. Given a matrix or vector of effects, provides various scalings and offsets for computation.

Parameters:
  • beta (NDArray[Any, Any]) – Vector of diploid effects

  • variant_indexer (xft.index.HaploidVariantIndex | xft.index.DiploidVariantIndex) – Variant indexer, will determine ploidy automatically

  • component_indexer (xft.index.ComponentIndex, optional) – Phenotype component indexer, defaults to xft.index.ComponentIndex.RangeIndex if not provided

  • standardized (bool, optional) – True implies these are effects of standardized variants, by default True

  • scaled (bool, optional) – True implies these are effects of variants * sqrt(m_causal), by default True

AF

diploid allele frequencies

Type:

NDArray

beta_scaled_standardized_diploid

Scaled diploid effects of standardized variants (multiplied by the number of causal variants per phenotype)

Type:

NDArray

beta_scaled_standardized_haploid

haploid variant of above

Type:

NDArray

beta_scaled_unstandardized_diploid

Scaled diploid effects of unstandardized variants (multiplied by the number of causal variants per phenotype)

Type:

NDArray

beta_scaled_unstandardized_haploid

haploid variant of above

Type:

NDArray

beta_unscaled_standardized_diploid

Unscaled diploid effects of standardized variants (not scaled by the number of causal variants per phenotype)

Type:

NDArray

beta_unscaled_standardized_haploid

haploid variant of above

Type:

NDArray

beta_unscaled_unstandardized_diploid

Unscaled diploid effects of unstandardized variants (not scaled by the number of causal variants). Multiply these against (0,1,2) raw genotypes and subtract offset to obtain phenotypes.

Type:

NDArray

beta_unscaled_unstandardized_haploid

Haploid variant of above

Type:

NDArray

beta_raw_diploid

Alias for beta_unscaled_unstandardized_diploid

Type:

NDArray

beta_raw_haploid

Alias for beta_unscaled_unstandardized_haploid

Type:

NDArray

component_indexer
Type:

xft.index.ComponentIndex

k

Number of phenotypes (columns of effect matrix)

Type:

int

m

Number of diploid variants

Type:

int

offset

To compute phenotypes, add offset after multiplying by beta_raw_* to mean deviate under HWE

Type:

NDArray

variant_indexer
Type:

xft.index.HaploidVariantIndex

property beta_raw_diploid
property beta_raw_haploid
property beta_scaled_standardized_haploid
property beta_scaled_unstandardized_haploid
property beta_unscaled_standardized_haploid
property beta_unscaled_unstandardized_haploid
corr()
property m_causal
property offset
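
The relationship between standardized effects, raw effects, and the offset can be checked numerically. The sketch below assumes Hardy-Weinberg equilibrium (HWE), under which a (0, 1, 2) genotype with allele frequency p has mean 2p and variance 2p(1 - p). The variable names are hypothetical, and the sign convention (an offset that is added) follows the description of the offset attribute and is an assumption about the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 200, 50
af = rng.uniform(0.1, 0.9, size=m)                  # allele frequencies
G = rng.binomial(2, af, size=(n, m)).astype(float)  # raw (0, 1, 2) genotypes

beta_std = rng.normal(size=m)             # effects of standardized variants
sd_hwe = np.sqrt(2 * af * (1 - af))       # genotype standard deviation under HWE
beta_raw = beta_std / sd_hwe              # effects of raw genotypes
offset = -np.sum(2 * af * beta_raw)       # added after multiplying by beta_raw

Z = (G - 2 * af) / sd_hwe                 # genotypes standardized by expected moments
pheno_from_std = Z @ beta_std
pheno_from_raw = G @ beta_raw + offset
```

Both routes give the same genetic values, which is why raw effects plus an offset can replace explicit standardization of the genotypes.
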
class xftsim.effect.GCTAEffects(vg, variant_indexer=None, component_indexer=None)

Bases: AdditiveEffects

Additive genetic effects object under GCTA infinitesimal model <CITE>

Under this genetic architecture, all variants are causal and standardized genetic variants / sqrt(m) have the user specified (possibly diagonal) variance covariance matrix

Parameters:
  • vg (Iterable | NDArray) – Vector of genetic variances or genetic variance/covariance matrix

  • variant_indexer (xft.index.HaploidVariantIndex | xft.index.DiploidVariantIndex) – Variant indexer, will determine ploidy automatically

  • component_indexer (xft.index.ComponentIndex, optional) – Phenotype component indexer, defaults to xft.index.ComponentIndex.RangeIndex if not provided

class xftsim.effect.NonOverlappingEffects(vg, proportions=None, variant_indexer=None, component_indexer=None, permute=True)

Bases: AdditiveEffects

Additive genetic effects object under a non-infinitesimal model with no pleiotropy.

Under this genetic architecture, the genome is partitioned into k+1 components: k sets of variants, one set causal for each trait, together with a final set of variants not causal for any trait. Within the kth set of causal variants, standardized variants are Gaussian with variance vg[k] / sqrt(proportions[k]).

Parameters:
  • vg (Iterable) – Vector of genetic variances or genetic variance/covariance matrix

  • proportions (Iterable) – Proportion of variants causal for each trait. If an extra value is provided, this will be the number of variants that are noncausal for all traits. Defaults to an equal number of variants per trait

  • permute (bool) – Permute variants? If False, causal variants for each phenotype will fall into contiguous blocks, defaults to True

  • variant_indexer (xft.index.HaploidVariantIndex | xft.index.DiploidVariantIndex) – Variant indexer, will determine ploidy automatically

  • component_indexer (xft.index.ComponentIndex, optional) – Phenotype component indexer, defaults to xft.index.ComponentIndex.RangeIndex if not provided
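
The partition into k+1 non-overlapping variant sets can be sketched as a label vector: each variant receives the index of the one trait it is causal for, or a final index meaning noncausal. The function `partition_variants` is hypothetical; treating the last entry of `proportions` as the noncausal share and placing the rounding remainder in the last set are assumptions.

```python
import numpy as np

def partition_variants(m, proportions, permute=True, seed=None):
    """Assign each of m variants to exactly one set (no pleiotropy)."""
    proportions = np.asarray(proportions, dtype=float)
    counts = np.floor(proportions / proportions.sum() * m).astype(int)
    counts[-1] += m - counts.sum()   # rounding remainder goes to the last set
    labels = np.repeat(np.arange(len(counts)), counts)
    if permute:
        # Shuffle so causal variants are not in contiguous blocks.
        labels = np.random.default_rng(seed).permutation(labels)
    return labels

# Two traits with 25% of variants each, plus 50% noncausal (label 2).
blocks = partition_variants(100, [0.25, 0.25, 0.5], permute=False)
shuffled = partition_variants(100, [0.25, 0.25, 0.5], permute=True, seed=0)
```

With `permute=False` the labels form contiguous blocks; with `permute=True` the set sizes are unchanged but positions are shuffled.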

xftsim.founders module

xftsim.founders.founder_haplotypes_from_AFs(n, afs, diploid=True)

Generate founder haplotypes from specified allele frequencies.

Parameters:
  • n (int) – Number of haplotypes to simulate.

  • afs (Iterable) – Allele frequencies as an iterable of floats.

  • diploid (bool, optional) – Flag indicating if the generated haplotypes should be diploid or haploid.

Returns:

xft.struct.HaplotypeArray – An object representing a set of haplotypes generated from the given allele frequencies.
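
Conceptually, generating haplotypes from allele frequencies amounts to independent Bernoulli draws per haplotype copy. The sketch below returns a plain NumPy array rather than an xft.struct.HaplotypeArray; the function name `haplotypes_from_afs` and the column layout (the copies of each variant stored side by side) are assumptions.

```python
import numpy as np

def haplotypes_from_afs(n, afs, diploid=True, seed=None):
    """Draw an (n, m * copies) 0/1 array with one Bernoulli(af) draw per allele."""
    afs = np.asarray(afs, dtype=float)
    copies = 2 if diploid else 1
    rng = np.random.default_rng(seed)
    # Repeat each frequency once per haplotype copy, then draw independently.
    p = np.repeat(afs, copies)
    return rng.binomial(1, p, size=(n, p.size)).astype(np.int8)

haps = haplotypes_from_afs(5000, [0.1, 0.5, 0.9], seed=6)
```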

Reads in PLINK 1 binary genotype data and returns a HaplotypeArray object containing pseudo-haplotypes by randomly assigning haplotypes at heterozygous sites.

Parameters:

path (str) – The file path to the PLINK 1 binary genotype data.

Returns:

xr.DataArray – Founder Pseudo-haplotype array with samples indexed by an xftsim.index.SampleIndex object and variants indexed by an xftsim.index.HaploidVariantIndex object. The “pseudo-” prefix refers to the fact that the plink bfile format doesn’t track phase.

xftsim.founders.founder_haplotypes_from_sgkit_dataset(gdat)

Construct founder haplotypes array from sgkit DataArray. Useful in conjunction with sgkit.io.vcf.vcf_to_zarr() and sgkit.load_dataset().

Parameters:
  • gdat (xr.Dataset) – Dataset generated by sgkit.load_dataset()

  • generation (int) – Used to populate the generation attribute of xftsim.index.SampleIndex

Returns:

xr.DataArray – Array of founder haplotypes with samples indexed by an xftsim.index.SampleIndex object and variants indexed by an xftsim.index.HaploidVariantIndex object.

xftsim.founders.founder_haplotypes_uniform_AFs(n, m, minMAF=0.1)

Generate founder haplotypes from uniform-distributed allele frequencies.

Parameters:
  • n (int) – Number of haplotypes to simulate.

  • m (int) – Number of variants.

  • minMAF (float, optional) – Minimum minor allele frequency for generated haplotypes.

Returns:

xft.struct.HaplotypeArray – An object representing a set of haplotypes generated with uniform allele frequencies.
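
One plausible reading of "uniform-distributed allele frequencies" with a minimum minor allele frequency is a draw from Uniform(minMAF, 1 - minMAF), sketched below. This interpretation and the function name `uniform_afs` are assumptions; the source does not state the exact sampling interval.

```python
import numpy as np

def uniform_afs(m, minMAF=0.1, seed=None):
    """Draw m allele frequencies uniformly from [minMAF, 1 - minMAF)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(minMAF, 1.0 - minMAF, size=m)

afs = uniform_afs(1000, minMAF=0.1, seed=7)
maf = np.minimum(afs, 1.0 - afs)   # minor allele frequency of each variant
```

Drawing from this interval keeps every minor allele frequency at or above minMAF.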

xftsim.index module

class xftsim.index.ComponentIndex(phenotype_name=None, component_name=None, vorigin_relative=None, comp_type=None, comp_type_map={'phenotype': 'outcome'}, frame=None, k_total=None)

Bases: XftIndex

Index object for phenotype components, including origin relative to proband.

Parameters:
  • phenotype_name (iterable, optional) – Names of phenotypes. Either phenotype_name, frame, or k_total must be provided.

  • component_name (iterable, optional) – Names of phenotype components.

  • vorigin_relative (iterable, optional) – Relative origin of phenotype component. -1 for proband, 0 for mother, 1 for father.

  • comp_type (iterable, optional) – Elements are either ‘intermediate’ or ‘outcome’ to distinguish between phenotype components versus phenotypes themselves

  • frame (pandas.DataFrame, optional) – Pre-existing frame to initialize index.

  • k_total (int, optional) – Total number of phenotypes to generate generic names.

phenotype_name

Names of phenotypes.

Type:

numpy.ndarray

component_name

Names of phenotype components.

Type:

numpy.ndarray

vorigin_relative

Relative origin of phenotype component. -1 for proband, 0 for mother, 1 for father.

Type:

numpy.ndarray

k_total

Total number of phenotypes.

Type:

int

k_phenotypes

Number of unique phenotypes.

Type:

int

k_components

Number of unique phenotype components.

Type:

int

k_relative

Number of unique relative origins.

Type:

int

depth

Generational depth from binary relative encoding.

Type:

float

unique_identifier

Unique identifier for the index.

Type:

numpy.ndarray

to_vorigin(origin)

Returns a new ComponentIndex with all vorigin_relative set to origin.

to_proband()

Returns a new ComponentIndex with all vorigin_relative set to -1 (proband).

from_frame(df)

Returns a new ComponentIndex initialized from a Pandas DataFrame.

from_arrays(phenotype_name, component_name, vorigin_relative=None)

Returns a new ComponentIndex initialized from numpy arrays.

from_product(phenotype_name, component_name, vorigin_relative=None)

Returns a new ComponentIndex initialized from a Cartesian product of phenotype_name, component_name, and vorigin_relative.

range_index(c, component_name=['generic'], vorigin_relative=[-1], prefix='phenotype')

Returns a new ComponentIndex with generic phenotype names.
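
The Cartesian-product construction described for from_product can be pictured with the standard library alone. The sketch below does not call xftsim; the phenotype names are made-up examples, and the row layout is an assumption about what such an index contains.

```python
from itertools import product

phenotype_name = ["height", "bmi"]            # example names, not from xftsim
component_name = ["addGenetic", "addNoise"]
vorigin_relative = [-1]                        # -1 denotes the proband

# One row per (phenotype, component, relative origin) combination.
rows = [
    {"phenotype_name": p, "component_name": c, "vorigin_relative": v}
    for p, c, v in product(phenotype_name, component_name, vorigin_relative)
]

for row in rows:
    print(row)
```

With two phenotypes, two components, and a single origin, the product has 2 x 2 x 1 = 4 rows.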

property comp_type
property component_name
property depth
static from_arrays(phenotype_name, component_name, vorigin_relative=None, comp_type=None)
static from_frame(df)
static from_product(phenotype_name, component_name, vorigin_relative=None, comp_type_map={'phenotype': 'outcome'})
property k_components
property k_phenotypes
property k_relative
property k_total
property phenotype_name
static range_index(c, component_name=['generic'], vorigin_relative=[-1], prefix='phenotype')
to_proband()
to_vorigin(origin)
property unique_identifier
property vorigin_relative
class xftsim.index.DiploidVariantIndex(vid=None, chrom=None, zero_allele=None, one_allele=None, af=None, annotation_array=None, annotation_names=None, frame=None, m=None, n_chrom=1, h_copy=None, pos_bp=None, pos_cM=None)

Bases: XftIndex

This class is used to index diploid genetic variants. Variants are defined by a set of unique IDs and may have additional annotations. Each variant is associated with two alleles, represented as strings.

Parameters:
  • vid (NDArray[Shape["*"], Object], optional) – Variant IDs, by default None.

  • chrom (NDArray[Shape["*"], Int], optional) – Chromosome of variant, by default None.

  • zero_allele (NDArray[Shape["*"], Object], optional) – First allele of variant, by default None.

  • one_allele (NDArray[Shape["*"], Object], optional) – Second allele of variant, by default None.

  • af (Iterable, optional) – Allele frequency of variant, by default None.

  • annotation_array (Union[NDArray, pd.DataFrame], optional) – Additional variant annotations, by default None.

  • annotation_names (Iterable, optional) – Names of the additional variant annotations, by default None.

  • frame (pd.DataFrame, optional) – A pandas DataFrame containing variant information, by default None.

  • m (int, optional) – The number of variants, by default None.

  • n_chrom (int, optional) – The number of chromosomes, by default 1.

  • h_copy (NDArray[Shape["*"], Object], optional) – A string indicating the haplotype of each variant, by default None.

  • pos_bp (Iterable, optional) – Base-pair positions of the variant, by default None.

  • pos_cM (Iterable, optional) – Centimorgan positions of the variant, by default None.

vid

Variant IDs.

Type:

ndarray

chrom

Chromosome of variant.

Type:

ndarray

zero_allele

First allele of variant.

Type:

ndarray

one_allele

Second allele of variant.

Type:

ndarray

hcopy

A string indicating the copy of each variant.

Type:

ndarray

af

Allele frequency of variant.

Type:

ndarray

pos_bp

Base-pair positions of the variant.

Type:

ndarray

pos_cM

Centimorgan positions of the variant.

Type:

ndarray

ploidy

A string indicating the ploidy of the variant (always “Diploid” for this class).

Type:

str

annotation

A pandas DataFrame containing additional variant annotations.

Type:

pd.DataFrame

annotation_array

A numpy array containing additional variant annotations.

Type:

Union[ndarray, None]

annotation_names

An array containing names of additional variant annotations.

Type:

ndarray

m

The number of variants.

Type:

int

n_chrom

The number of chromosomes.

Type:

int

n_annotations

The number of additional variant annotations.

Type:

int

maf

Minor allele frequency of variant.

Type:

ndarray

Raises:

AssertionError – If vid, m, or frame is not provided. If both zero_allele and one_allele are not provided.
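
The maf attribute relates to af through the usual definition of minor allele frequency: the smaller of a variant's allele frequency and its complement. A minimal standard-library sketch with made-up frequencies follows; whether xftsim computes the value in exactly this way is an assumption.

```python
# Example allele frequencies for four variants (made-up values).
af = [0.05, 0.5, 0.8, 0.97]

# Minor allele frequency: the frequency of the rarer of the two alleles.
maf = [min(p, 1.0 - p) for p in af]
```

By construction, every entry of maf lies between 0 and 0.5.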

property af
annotate()
property annotation
property annotation_array
property annotation_names
property chrom
property hcopy
property m
property maf
property n_annotations
property n_chrom
property one_allele
property ploidy
property pos_bp
property pos_cM
to_haploid()
property vid
property zero_allele
class xftsim.index.HaploidVariantIndex(vid=None, chrom=None, zero_allele=None, one_allele=None, af=None, annotation_array=None, annotation_names=None, frame=None, m=None, n_chrom=1, h_copy=None, pos_bp=None, pos_cM=None)

Bases: DiploidVariantIndex

A class representing a haploid variant index.

vid

Variant IDs.

Type:

numpy.ndarray

chrom

Chromosome numbers.

Type:

numpy.ndarray

zero_allele

Alleles with value zero.

Type:

numpy.ndarray

one_allele

Alleles with value one.

Type:

numpy.ndarray

af

Allele frequencies.

Type:

numpy.ndarray

pos_bp

Positions of variants in base pairs.

Type:

numpy.ndarray

pos_cM

Positions of variants in centiMorgans.

Type:

numpy.ndarray

m

Number of unique variant IDs.

Type:

int

n_chrom

Number of unique chromosome numbers.

Type:

int

n_annotations

Number of annotations.

Type:

int

maf

Minor allele frequencies.

Type:

numpy.ndarray

ploidy

The ploidy of the variant index. In this case, “Haploid”.

Type:

str

hcopy

A string indicating the copy of each variant.

Type:

ndarray

to_diploid()

Converts the haploid variant index to diploid.

property ploidy
to_diploid()
class xftsim.index.NullFilter

Bases: SampleFilter

class xftsim.index.RandomSiblingFilter

Bases: SampleFilter

Randomly select one sibling per family
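
The selection this filter describes can be sketched without xftsim by grouping individual IDs by family ID and drawing one member per group. The helper name one_sibling_per_family and the example IDs are hypothetical, and the real filter operates on a SampleIndex rather than on plain lists.

```python
import random
from collections import defaultdict

def one_sibling_per_family(iids, fids, seed=0):
    """Hypothetical helper: keep one randomly chosen individual per family ID."""
    rng = random.Random(seed)
    families = defaultdict(list)
    for iid, fid in zip(iids, fids):
        families[fid].append(iid)
    # Draw a single member from each family, in order of first appearance.
    return [rng.choice(members) for members in families.values()]

iids = ["a1", "a2", "b1", "c1", "c2", "c3"]
fids = ["A", "A", "B", "C", "C", "C"]
kept = one_sibling_per_family(iids, fids)
```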

class xftsim.index.RandomSiblingSubsampleFilter(k)

Bases: SampleFilter

Randomly subsample k families, choosing one offspring per family

class xftsim.index.RandomSubsampleFilter(k)

Bases: SampleFilter

Randomly subsample k individuals

class xftsim.index.SampleFilter(filter_function, filter_name=None, metadata={})

Bases: object

filter(sindex, **kwargs)
class xftsim.index.SampleIndex(iid=None, fid=None, sex=None, frame=None, n=None, generation=0)

Bases: XftIndex

Index for individual samples.

This class is used to keep track of information for individual samples.

Parameters:
  • iid (Iterable, optional) – Iterable of individual IDs.

  • fid (Iterable, optional) – Iterable of family IDs.

  • sex (Iterable, optional) – Iterable of biological sexes.

  • frame (pd.DataFrame, optional) – Dataframe containing information for each sample.

  • n (int, optional) – Number of samples to generate a random ID set for.

  • generation (int, optional) – Generation number for samples.

n

Number of individuals.

Type:

int

n_fam

Number of families.

Type:

int

n_female

Number of biological females.

Type:

int

n_male

Number of biological males.

Type:

int

iid

Array of individual IDs.

Type:

ndarray

fid

Array of family IDs.

Type:

ndarray

sex

Array of biological sexes.

Type:

ndarray

property fid
property iid
iloc(key)
property n
property n_fam
property n_female
property n_male
property sex
property unique_identifier
class xftsim.index.SiblingPairFilter(k=None)

Bases: SampleFilter

Subsample 2 siblings each from k families with at least two siblings
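
A possible reading of this filter, written against plain lists rather than a SampleIndex: keep only families with at least two members, optionally subsample k of them, and draw two distinct siblings from each. The helper name sibling_pairs, the example IDs, and the treatment of k=None as "all eligible families" are assumptions.

```python
import random
from collections import defaultdict

def sibling_pairs(iids, fids, k=None, seed=0):
    """Hypothetical helper: two siblings from each of up to k eligible families."""
    rng = random.Random(seed)
    families = defaultdict(list)
    for iid, fid in zip(iids, fids):
        families[fid].append(iid)
    # Only families with at least two members can contribute a pair.
    eligible = [fid for fid, members in families.items() if len(members) >= 2]
    # Assumption: k=None means "use every eligible family".
    if k is not None and k < len(eligible):
        eligible = rng.sample(eligible, k)
    # Sample two distinct siblings (without replacement) per family.
    return {fid: rng.sample(families[fid], 2) for fid in eligible}

iids = ["a1", "a2", "b1", "c1", "c2", "c3"]
fids = ["A", "A", "B", "C", "C", "C"]
pairs = sibling_pairs(iids, fids, k=1)
```

Family B has a single member, so it can never contribute a pair regardless of k.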

class xftsim.index.XftIndex

Bases: object

XftIndex is the base class representing an index for the xftsim simulation model. It is a superclass and is not intended for direct use.

Attributes:

_coord_variables: List[str]

List of names of the coordinate variables.

_index_variables: List[str]

List of names of the index variables.

_dimension: str

Name of the dimension variable.

_frame: pandas.DataFrame

Dataframe representing the index.

Methods:

validate():

Validates the index by checking if the _coord_variables, _index_variables, and _dimension attributes are not None. Raises an AssertionError if any of these attributes is None.

frame:

Property representing the _frame attribute. Getter: Returns the _frame attribute. Setter: Sets the _frame attribute and generates a new index using the unique_identifier property.

frame_copy():

Returns a copy of the _frame attribute.

unique_identifier:

Property representing the unique identifier of the index. Returns a string representing the concatenation of all index variables, separated by a period.

coords:

Property representing the coordinates of the index. Returns a dictionary where the keys are the coordinate variables and the values are the corresponding values in the _frame attribute.

coord_dict:

Property representing the coordinate dictionary of the index. Returns a dictionary where the keys are the variables and the values are tuples representing the (dimension, value) of each coordinate.

coord_frame:

Property representing the coordinate frame of the index. Returns a dataframe where the columns are the coordinate variables and the rows correspond to each row in the _frame attribute.

coord_mindex:

Property representing the coordinate multi-index of the index. Returns a multi-index where the levels correspond to the coordinate variables and the values correspond to the corresponding values in the _frame attribute.

coord_index:

Property representing the coordinate index of the index. Returns an index representing the unique identifier of the index.

__getitem__(arg):

Returns a new instance of the XftIndex class, corresponding to a subset of the _frame attribute. If arg is a dictionary, returns the rows where the values of the keys in the dictionary match the corresponding values in the _frame attribute. If arg is an integer or slice, returns the row(s) at the corresponding index in the _frame attribute.

iloc(key):

Returns a new instance of the XftIndex class, corresponding to a subset of the _frame attribute. Returns the row(s) at the corresponding index in the _frame attribute.

merge(other):

Merges the _frame attribute of the current instance with another instance of the XftIndex class. If the two instances have a different _dimension attribute or a different class type, raises a TypeError. Returns a new instance of the XftIndex class representing the merged index.

reduce_merge(args):

Static method that reduces the list of args by calling the merge method on each pair of consecutive elements. Returns the final merged index.

stack(other):

Stacks the _frame attribute of the current instance with another instance of the XftIndex class. If the two instances have a different _dimension attribute or a different class type, raises a TypeError. Returns a new instance of the XftIndex class representing the stacked index.

at_most(n_new):

Downsamples the _frame attribute at random to contain at most n_new rows. If the number of rows in the _frame attribute is already less than or equal to n_new, returns a copy of the current instance. Returns a new instance of the XftIndex class representing the downsampled index.
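
Three of the behaviours listed above (dictionary-based selection, at_most, and unique_identifier) can be mimicked on a plain list of row dictionaries. This is only a sketch of the described semantics: the column names, the ID format, and the choice of index variables are made up, and the real class works on a pandas DataFrame.

```python
import random

rows = [
    {"iid": "0_0", "fid": "0_0", "sex": 0},
    {"iid": "0_1", "fid": "0_1", "sex": 1},
    {"iid": "0_2", "fid": "0_1", "sex": 0},
    {"iid": "0_3", "fid": "0_3", "sex": 1},
]

def select(rows, criteria):
    """Keep rows whose values match every key/value pair in criteria."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]

def at_most(rows, n_new, seed=0):
    """Randomly downsample to at most n_new rows; copy if already small enough."""
    if len(rows) <= n_new:
        return list(rows)
    return random.Random(seed).sample(rows, n_new)

def unique_identifier(row, index_variables):
    """Join the chosen index variables with periods."""
    return ".".join(str(row[v]) for v in index_variables)

females = select(rows, {"sex": 0})
subset = at_most(rows, 2)
# Assumption: "iid" and "fid" stand in for the index variables.
ident = unique_identifier(rows[2], ["iid", "fid"])
```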

at_most(n_new)
property coord_dict
property coord_frame
property coord_index
property coord_mindex
property coords
property frame
frame_copy()
iloc(key)
merge(other, deduplicate=True)
static reduce_merge(args, deduplicate=True)
stack(other)
property unique_identifier
validate()
xftsim.index.sampleIndex_from_VCF()
xftsim.index.variantIndex_from_VCF()

xftsim.io module

xftsim.io.genotypes_to_pseudo_haplotypes(x)

Converts genotype data in an xarray DataArray to pseudo-haplotype data.

Parameters:

x (xr.DataArray) – An xarray DataArray containing genotype data.

Returns:

xr.DataArray – An xarray DataArray containing pseudo-haplotype data.

xftsim.io.haplotypes_from_sgkit_dataset(gdat, generation=0)

Construct a haplotype array from an sgkit Dataset. Useful in conjunction with sgkit.io.vcf.vcf_to_zarr() and sgkit.load_dataset().

Parameters:
  • gdat (xr.Dataset) – Dataset generated by sgkit.load_dataset()

  • generation (int) – Used to populate the generation attribute of xftsim.index.SampleIndex

Returns:

xr.DataArray – Haplotype array with samples indexed by an xftsim.index.SampleIndex object and variants indexed by an xftsim.index.HaploidVariantIndex object.

xftsim.io.load_haplotype_zarr(path, compute=True, slice_x=slice(None, None, None), slice_y=slice(None, None, None), **kwargs)

Load haplotype data from a Zarr store.

Parameters:
  • path (str) – The path to the Zarr store.

  • compute (bool, optional) – Whether to compute the data immediately, by default True.

  • **kwargs (dict) – Additional keyword arguments to pass to xr.open_dataset().

Returns:

xr.DataArray – The loaded haplotype data as a DataArray.

xftsim.io.plink1_sample_index(ppxr, generation=0)

Create a SampleIndex object from a plink file DataArray generated by pandas_plink.

Parameters:
  • ppxr (xr.DataArray) – An xarray DataArray representing a plink file.

  • generation (int, optional) – The generation of the individuals, by default 0.

Returns:

xft.index.SampleIndex – A SampleIndex object.

xftsim.io.plink1_variant_index(ppxr)

Create a DiploidVariantIndex object from a plink file DataArray generated by pandas_plink.

Parameters:

ppxr (xr.DataArray) – An xarray DataArray representing a plink file.

Returns:

xft.index.DiploidVariantIndex – A DiploidVariantIndex object.

xftsim.io.read_plink1_as_pseudohaplotypes(path, generation=0)

Reads in PLINK 1 binary genotype data and returns a HaplotypeArray object containing pseudo-haplotypes by randomly assigning haplotypes at heterozygous sites.

Parameters:
  • path (str) – The file path to the PLINK 1 binary genotype data.

  • generation (int) – Used to populate the generation attribute of xftsim.index.SampleIndex

Returns:

xr.DataArray – Pseudo-haplotype array with samples indexed by an xftsim.index.SampleIndex object and variants indexed by an xftsim.index.HaploidVariantIndex object. The “pseudo-” prefix refers to the fact that the plink bfile format doesn’t track phase.

Raises:

ValueError – If the specified file path does not exist or is not in the expected format.
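
The random assignment at heterozygous sites can be sketched as follows. The example assumes genotypes are coded as 0, 1, or 2 copies of an allele and ignores missing values; the helper name is hypothetical, and the code does not reproduce the library's implementation.

```python
import random

def genotype_to_pseudo_haplotypes(genotype, rng):
    """Hypothetical helper: split one 0/1/2 genotype into two alleles."""
    if genotype == 0:
        return (0, 0)
    if genotype == 2:
        return (1, 1)
    # Heterozygous site: phase is unknown, so place the 1 allele at random.
    return (0, 1) if rng.random() < 0.5 else (1, 0)

rng = random.Random(0)
genotypes = [0, 1, 2, 1, 1, 0]
pseudo = [genotype_to_pseudo_haplotypes(g, rng) for g in genotypes]
# Summing each pair recovers the genotype, which is all an unphased format stores.
recovered = [a + b for a, b in pseudo]
```

Because (0, 1) and (1, 0) both sum to 1, the original phase cannot be recovered from genotypes alone, which is why the result is only a pseudo-haplotype.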

xftsim.io.save_haplotype_zarr(haplotypes, path, **kwargs)

Save haplotype data to a Zarr store.

Parameters:
  • haplotypes (xr.DataArray) – The haplotype data to save.

  • path (str) – The path to the Zarr store.

  • **kwargs (dict) – Additional keyword arguments to pass to xr.Dataset.to_zarr().

Returns:

None

xftsim.io.write_to_plink1(hh, path, verbose=True)

Writes a DataArray to a PLINK 1 binary file. Breaks phasing.

Parameters:
  • hh (xr.DataArray) – A DataArray containing the genotype data to write.

  • path (str) – The path to the output PLINK file. The ‘.bed’ extension will be added automatically.

  • verbose (bool, optional) – Whether to print verbose output during writing, by default True.

Returns:

None

xftsim.lsmate module

xftsim.mate module

This module contains functions and classes for implementing different mating regimes in the context of forward time genetics simulations.

Functions:

_solve_qap_ls: Private function that solves the Quadratic Assignment Problem using LocalSolver.

Classes:

MatingRegime: Base class for defining mating regimes.

RandomMatingRegime: A class for implementing random mating.

LinearAssortativeMatingRegime: A class for implementing linear assortative mating.

KAssortativeMatingRegime: A class for implementing k-assortative mating.

BatchedMatingRegime: A class for batching individuals to improve mating regime performance.

class xftsim.mate.BatchedMatingRegime(regime, max_batch_size)

Bases: MatingRegime

BatchedMatingRegime class that batches mating assignments, either for the sake of efficiency or to simulate stratification.

Parameters:
  • regime (MatingRegime) – The mating regime object.

  • max_batch_size (int) – Maximum size of each batch.

regime

The mating regime object.

Type:

MatingRegime

max_batch_size

Maximum size of each batch.

Type:

int

batch(haplotypes, phenotypes, control)

Split samples into batches.

mate(haplotypes, phenotypes, control)

Generate mating assignments in batches.

batch(haplotypes=None, phenotypes=None, control=None)

Split samples into batches.

Parameters:
  • haplotypes (xarray.DataArray, optional) – Haplotypes array.

  • phenotypes (xarray.DataArray, optional) – Phenotypes array.

  • control (dict, optional) – Control parameters.

Returns:

  • batches (list) – List of batches of samples.

  • num_batches (int) – Number of batches.
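
One way to split samples so that no batch exceeds max_batch_size is sketched below. The documentation does not say how batch membership is decided, so the random shuffle and the strided split are assumptions, and make_batches is a hypothetical helper that works on integer positions rather than on xarray objects.

```python
import math
import random

def make_batches(n_samples, max_batch_size, seed=0):
    """Hypothetical helper: shuffle sample positions, then cut into batches."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)  # assumption: batch membership is random
    # Smallest number of batches such that none exceeds max_batch_size.
    num_batches = math.ceil(n_samples / max_batch_size)
    # Striding spreads samples so batch sizes differ by at most one.
    batches = [order[i::num_batches] for i in range(num_batches)]
    return batches, num_batches

batches, num_batches = make_batches(n_samples=10, max_batch_size=4)
```

Mating assignments could then be generated within each batch and merged afterwards, in the spirit of the mate method of this class.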

mate(haplotypes=None, phenotypes=None, control=None)

Generate mating assignments in batches and merge into single assignment object.

Parameters:
  • haplotypes (xarray.DataArray, optional) – Haplotypes array.

  • phenotypes (xarray.DataArray, optional) – Phenotypes array.

  • control (dict, optional) – Control parameters.

Returns:

mate_assignments (MateAssignment) – Mating assignments.

class xftsim.mate.GeneralAssortativeMatingRegime(component_index, cross_corr, offspring_per_pair=<xftsim.utils.ConstantCount object>, mates_per_female=<xftsim.utils.ConstantCount object>, female_offspring_per_pair='balanced', sex_aware=False, exhaustive=True, control={})

Bases: MatingRegime

A class that implements general assortative mating regimes, i.e., it matches two sets of individuals with K phenotypes to achieve an arbitrary K x K cross-mate cross-correlation structure.

Parameters:
  • component_index (xft.index.ComponentIndex) – An object containing information about the components.

  • cross_corr (ndarray) – The cross-correlation matrix of size K x K.

  • offspring_per_pair (Union[int, xft.utils.VariableCount], optional) – The number of offspring per mating pair. Default is 1.

  • mates_per_female (Union[int, xft.utils.VariableCount], optional) – The number of mates for each female. Default is 2.

  • female_offspring_per_pair (Union[str, int, xft.utils.VariableCount], optional) – The number of offspring per mating pair for females. Default is ‘balanced’.

  • sex_aware (bool, optional) – Whether to consider sex in mating pairs. Default is False.

  • exhaustive (bool, optional) – Whether to enumerate all possible pairs. Default is True.

  • control (dict, optional) – A dictionary of control parameters passed to LocalSolver. Defaults are as follows: nb_threads=4, time_limit=120, tolerance=1e-5, verbosity=1, time_between_displays=15

cross_corr

The cross-correlation matrix of size K x K.

Type:

ndarray

component_index

An object containing information about the components.

Type:

xft.index.ComponentIndex

K

The total number of components.

Type:

int

mate(haplotypes: xr.DataArray = None, phenotypes: xr.DataArray = None, control: dict = None) xft.mate.MateAssignment:

Mate haplotypes and phenotypes based on the K-assortative mating regime.

mate(haplotypes=None, phenotypes=None, control={})

Mate haplotypes and phenotypes based on the K-assortative mating regime.

Parameters:
  • haplotypes (xr.DataArray, optional) – The haplotype data to be mated. Default is None.

  • phenotypes (xr.DataArray, optional) – The phenotype data to be mated. Default is None.

Returns:

assignment (xft.mate.MateAssignment) – The assignment of haplotypes to parents.
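
To make the K x K cross-mate cross-correlation structure concrete, the sketch below computes the empirical matrix for a handful of already-matched pairs. It illustrates the target quantity only, not the matching procedure. The phenotype values are made up, and the convention that rows index female traits and columns index male traits is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_mate_corr(female_pheno, male_pheno):
    """K x K matrix: entry (i, j) correlates female trait i with male trait j."""
    k = len(female_pheno[0])
    f_cols = [[row[i] for row in female_pheno] for i in range(k)]
    m_cols = [[row[j] for row in male_pheno] for j in range(k)]
    return [[pearson(f_cols[i], m_cols[j]) for j in range(k)] for i in range(k)]

# Row p holds the K = 2 phenotypes of the female / male in mating pair p.
females = [[1.0, 0.2], [2.0, 0.1], [3.0, 0.4], [4.0, 0.3]]
males = [[1.1, 0.4], [1.9, 0.3], [3.2, 0.1], [3.8, 0.2]]
cross_corr = cross_mate_corr(females, males)
```

The matrix need not be symmetric, since entry (i, j) and entry (j, i) compare different trait pairs across mates.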

class xftsim.mate.LinearAssortativeMatingRegime(component_index, r=0, offspring_per_pair=<xftsim.utils.ConstantCount object>, mates_per_female=<xftsim.utils.ConstantCount object>, female_offspring_per_pair='balanced', sex_aware=False, exhaustive=True)

Bases: MatingRegime

A linear assortative mating regime that performs mate selection based on a specified component index. Specifically, individuals are mated such that the cross-mate correlations across all specified components are equal to r. This reflects mating on a linear combination of phenotypes and does not generalize to many cross-mate correlation structures observed in practice, but is more efficient.

Parameters:
  • component_index (xft.index.ComponentIndex) – The component index used to select mating pairs based on the correlation between the phenotype values.

  • r (float, optional) – The correlation coefficient, a value between -1 and 1. Defaults to 0.

  • offspring_per_pair (Union[int, xft.utils.VariableCount], optional) – The number of offspring per pair. If int, it will be converted to a ConstantCount object. Defaults to 1.

  • mates_per_female (Union[int, xft.utils.VariableCount], optional) – The number of mates per female. If int, it will be converted to a ConstantCount object. Defaults to 1.

  • female_offspring_per_pair (Union[str, int, xft.utils.VariableCount], optional) – The number of female offspring per mating pair. If ‘balanced’, the number of females is randomly selected for each pair to balance the sex ratio. If int, it will be converted to a ConstantCount object. Defaults to ‘balanced’.

  • sex_aware (bool, optional) – If True, only mating pairs with different sex are allowed. Defaults to False.

  • exhaustive (bool, optional) – If True, all possible mating pairs will be enumerated. If False, pairs will be randomly selected. Defaults to True.

Raises:

AssertionError – If r is not between -1 and 1. If the correlation r is not feasible for the number of phenotypes in the component index.


mate(haplotypes=None, phenotypes=None, control=None)

Mate individuals.

Parameters:
  • haplotypes (xarray.DataArray, optional) – The haplotypes of the individuals, by default None.

  • phenotypes (xarray.DataArray, optional) – The phenotypes of the individuals, by default None.

  • control (dict, optional) – The mating control parameters, by default None.

Returns:

MateAssignment – The mate assignment result.

class xftsim.mate.MateAssignment(generation, maternal_sample_index, paternal_sample_index, previous_generation_sample_index, n_offspring_per_pair, n_females_per_pair, sex_aware=False)

Bases: object

Represents a mate assignment for a given generation of individuals.

Parameters:
  • generation (int) – The generation number.

  • maternal_sample_index (xft.index.SampleIndex) – The sample index for the maternal individuals.

  • paternal_sample_index (xft.index.SampleIndex) – The sample index for the paternal individuals.

  • previous_generation_sample_index (xft.index.SampleIndex) – The sample index for the previous generation.

  • n_offspring_per_pair (NDArray[Shape["*"], Int64]) – An array containing the number of offspring per mating pair.

  • n_females_per_pair (NDArray[Shape["*"], Int64]) – An array containing the number of female offspring per mating pair.

  • sex_aware (bool, optional (default=False)) – Whether the mate assignment is sex-aware.

get_mate_phenotypes(phenotypes, component_index=None, full=True)

Retrieves mate phenotypes based on the given phenotypes data.

Parameters:
  • phenotypes (xr.DataArray) – The phenotypes data array.

  • component_index (xft.index.ComponentIndex, optional) – The component index for the phenotypes data array.

  • full (bool) – Ignore component_index and get all components.

Returns:

pd.DataFrame – A DataFrame containing the mate phenotypes.

get_mating_frame()

Constructs a DataFrame containing mate phenotypes regardless of reproductive success.

Returns:

pd.DataFrame – A DataFrame containing mating information.

get_reproduction_frame()

Constructs a DataFrame containing information relating to mates and offspring.

Returns:

pd.DataFrame – A DataFrame containing reproduction information.

property is_constant_population

TODO property to determine if the population is constant or not.

Returns:

bool – True if the population is constant, False otherwise.

property maternal_integer_index

The integer index for the maternal individuals.

Returns:

np.ndarray – An array containing the integer index for the maternal individuals.

property n_females

The total number of female offspring.

Returns:

int – The total number of female offspring.

property n_males

The total number of male offspring.

Returns:

int – The total number of male offspring.

property n_reproducing_pairs

The total number of reproducing pairs.

Returns:

int – The total number of reproducing pairs.

property n_total_offspring

The total number of offspring.

Returns:

int – The total number of offspring.
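
The count properties above plausibly follow from the two per-pair arrays passed to the constructor. The arithmetic below is a sketch with made-up counts; in particular, treating a pair as "reproducing" when it has at least one offspring is an assumption.

```python
# Made-up per-pair counts for five mating pairs.
n_offspring_per_pair = [2, 0, 3, 1, 2]
n_females_per_pair = [1, 0, 2, 0, 1]

n_total_offspring = sum(n_offspring_per_pair)
n_females = sum(n_females_per_pair)
# Every offspring that is not counted as female is counted as male.
n_males = n_total_offspring - n_females
# Assumption: a pair is "reproducing" when it has at least one offspring.
n_reproducing_pairs = sum(1 for n in n_offspring_per_pair if n > 0)
```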

property offspring_fids

The family identifiers for the offspring.

Returns:

np.ndarray – An array containing the family identifiers for the offspring.

property offspring_iids

The unique identifiers for the offspring.

Returns:

np.ndarray – An array containing the unique identifiers for the offspring.

property offspring_sample_index

The sample index for the offspring.

Returns:

xft.index.SampleIndex – The sample index for the offspring.

property offspring_sex

The sex of the offspring.

Returns:

np.ndarray – An array containing the sex of the offspring.

property paternal_integer_index

The integer index for the paternal individuals.

Returns:

np.ndarray – An array containing the integer index for the paternal individuals.

static reduce_merge(assignments)

Merges a list of MateAssignment objects into a single MateAssignment object.

Parameters:

assignments (Iterable) – An iterable of MateAssignment objects to be merged.

Returns:

MateAssignment – A new MateAssignment object resulting from the merge of the input assignments.

property reproducing_maternal_index

The maternal index for reproducing individuals.

Returns:

xft.index.SampleIndex – The maternal index for reproducing individuals.

property reproducing_paternal_index

The paternal index for reproducing individuals.

Returns:

xft.index.SampleIndex – The paternal index for reproducing individuals.

trio_view(pheno_parental, pheno_offspring)

Returns an array with the phenotypes of offspring, followed by the phenotypes of their parents in the same order as the order of offspring in this MateAssignment.

Parameters:
  • pheno_parental (xr.DataArray) – An xarray DataArray containing the phenotypes of the parents.

  • pheno_offspring (xr.DataArray) – An xarray DataArray containing the phenotypes of the offspring.

Returns:

np.ndarray – An array with the phenotypes of offspring, followed by the phenotypes of their parents.

update_pedigree(pedigree)
class xftsim.mate.MatingRegime(mateFunction=None, offspring_per_pair=<xftsim.utils.ConstantCount object>, mates_per_female=<xftsim.utils.ConstantCount object>, female_offspring_per_pair='balanced', sex_aware=False, exhaustive=True, component_index=None, haplotypes=False)

Bases: object

A class for defining a mating regime to simulate the reproductive behavior of a population.

Parameters:
  • mateFunction (Callable, optional) – A function that specifies how the mating process is carried out. Default is None.

  • offspring_per_pair (Union[Callable, int, xft.utils.VariableCount], optional) – The number of offspring per mating pair. This can be a callable function, an integer, or a VariableCount object. Default is xft.utils.ConstantCount(1).

  • mates_per_female (Union[Callable, int, xft.utils.VariableCount], optional) – The number of mating partners each female has. This can be a callable function, an integer, or a VariableCount object. Default is xft.utils.ConstantCount(1).

  • female_offspring_per_pair (Union[Callable, str, int, xft.utils.VariableCount], optional) – The number of female offspring per mating pair. This can be a callable function, a string, an integer, or a VariableCount object. If set to ‘balanced’, the number of female offspring will be randomly assigned from a balanced range (0, …, total_offspring). Default is ‘balanced’.

  • sex_aware (bool, optional) – Whether the mating process should take sex into account. If True, females and males will be paired up based on their sex. If False, the pairs will be randomly assigned. Default is False.

  • exhaustive (bool, optional) – Whether the mating pairs should be enumerated exhaustively or randomly. If True, all possible pairings will be enumerated before repeating. If False, the pairings will be randomly assigned with replacement. Default is True.

  • component_index (xft.index.ComponentIndex, optional) – Which phenotype components (if any) are used in assigning mates.

  • haplotypes (bool, optional) – Flag indicating whether haplotype data is used to assign mates. Defaults to False.
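
The difference between exhaustive and non-exhaustive enumeration can be pictured with a small standard-library sketch. It reflects one reading of the parameter description (every candidate is used once before any repeats, versus sampling with replacement); draw_partners and the ID strings are hypothetical and not part of xftsim.

```python
import random

def draw_partners(male_ids, n_needed, exhaustive=True, seed=0):
    """Hypothetical helper: choose n_needed partners from male_ids."""
    rng = random.Random(seed)
    if not male_ids:
        return []
    if not exhaustive:
        # Sampling with replacement: any male may appear repeatedly.
        return rng.choices(male_ids, k=n_needed)
    partners = []
    while len(partners) < n_needed:
        # Use every male once, in random order, before anyone repeats.
        partners.extend(rng.sample(male_ids, len(male_ids)))
    return partners[:n_needed]

males = ["m0", "m1", "m2", "m3"]
exhaustive_draw = draw_partners(males, 6, exhaustive=True)
random_draw = draw_partners(males, 6, exhaustive=False)
```

In the exhaustive draw, the first four partners are always a permutation of the four candidates; the non-exhaustive draw carries no such guarantee.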

sex_aware

Whether the mating process should take sex into account.

Type:

bool

offspring_per_pair

The number of offspring per mating pair.

Type:

Union[Callable, int, xft.utils.VariableCount]

mates_per_female

The number of mating partners each female has.

Type:

Union[Callable, int, xft.utils.VariableCount]

female_offspring_per_pair

The number of female offspring per mating pair.

Type:

Union[Callable, str, int, xft.utils.VariableCount]

exhaustive

Whether the mating pairs should be enumerated exhaustively or randomly.

Type:

bool

mateFunction

A function that specifies how the mating process is carried out.

Type:

Callable

expected_offspring_per_pair

The expected number of offspring per mating pair.

Type:

float

expected_mates_per_female

The expected number of mating partners each female has.

Type:

float

expected_female_offspring_per_pair

The expected number of female offspring per mating pair.

Type:

float

population_growth_factor

The population growth factor.

Type:

float

get_potential_mates(haplotypes: xr.DataArray = None, phenotypes: xr.DataArray = None)

Returns the potential female and male mating partners based on the sex awareness parameter.

enumerate_assignment(female_indices: NDArray, male_indices: NDArray, haplotypes: xr.DataArray = None, phenotypes: xr.DataArray = None) MateAssignment

Enumerates the mating assignments.

mate(haplotypes: xr.DataArray = None, phenotypes: xr.DataArray = None, control: dict = None) MateAssignment

Calls the mateFunction to perform the mating process.

property dependency_graph
property dependency_graph_edges
draw_dependency_graph(node_color='none', node_size=1500, arrowsize=7, font_size=6, margins=0.1, **kwargs)
enumerate_assignment(female_indices, male_indices, haplotypes=None, phenotypes=None)

Enumerate the mate assignments.

Parameters:
  • female_indices (NDArray) – The indices of the females to mate.

  • male_indices (NDArray) – The indices of the males to mate.

  • haplotypes (xr.DataArray) – The haplotypes to use for mating.

  • phenotypes (xr.DataArray) – The phenotypes to use for mating.

Returns:

MateAssignment – The mate assignments.

property expected_female_offspring_per_pair

Get the expected female offspring per pair.

Returns:

float – The expected female offspring per pair.

Raises:

NotImplementedError – If the female offspring count is not an integer or a VariableCount.

property expected_mates_per_female

Get the expected mates per female.

Returns:

float – The expected mates per female.

Raises:

NotImplementedError – If the mates count is not an integer or a VariableCount.

property expected_offspring_per_pair

Get the expected offspring per pair.

Returns:

float – The expected offspring per pair.

Raises:

NotImplementedError – If the offspring count is not an integer or a VariableCount.

get_potential_mates(haplotypes=None, phenotypes=None)

Return potential mating pairs.

Parameters:
  • haplotypes (xr.DataArray) – The haplotypes to use for mating.

  • phenotypes (xr.DataArray) – The phenotypes to use for mating.

Returns:

(NDArray, NDArray) – The potential female and male mating indices.

mate(haplotypes=None, phenotypes=None, control=None)

Mate individuals.

Parameters:
  • haplotypes (xarray.DataArray, optional) – The haplotypes of the individuals, by default None.

  • phenotypes (xarray.DataArray, optional) – The phenotypes of the individuals, by default None.

  • control (dict, optional) – The mating control parameters, by default None.

Returns:

MateAssignment – The mate assignment result.

property mateFunction
property population_growth_factor

Get the population growth factor.

Returns:

float – The population growth factor.

class xftsim.mate.RandomMatingRegime(offspring_per_pair=<xftsim.utils.ConstantCount object>, mates_per_female=<xftsim.utils.ConstantCount object>, female_offspring_per_pair='balanced', sex_aware=False, exhaustive=True)

Bases: MatingRegime

A mating regime that randomly pairs individuals and produces offspring with balanced numbers of males and females.

Parameters:
  • offspring_per_pair (xft.utils.VariableCount, optional) – Number of offspring produced per mating pair, by default xft.utils.ConstantCount(1)

  • mates_per_female (xft.utils.VariableCount, optional) – Number of males that mate with each female, by default xft.utils.ConstantCount(1)

  • female_offspring_per_pair (Union[str, xft.utils.VariableCount], optional) – The number of female offspring per mating pair. If “balanced”, the number is balanced with the number of male offspring. By default, “balanced”.

  • sex_aware (bool, optional) – If True, randomly paired individuals are selected so that there is an equal number of males and females. Otherwise, random pairing is performed. By default, False.

  • exhaustive (bool, optional) – If True, perform exhaustive enumeration of potential mates. If False, perform random sampling. By default, True.

mate(haplotypes=None, phenotypes=None, control=None)

Mate individuals randomly with balanced numbers of males and females.

Parameters:
  • haplotypes (xr.DataArray, optional) – Array containing haplotypes, by default None

  • phenotypes (xr.DataArray, optional) – Array containing phenotypes, by default None

  • control (dict, optional) – Control dictionary, by default None

Returns:

MateAssignment – An object containing the maternal and paternal sample indices, the number of offspring per pair, and the number of female offspring per pair.
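
The pairing step can be pictured with a minimal sketch that does not call xftsim. The function name random_pairs, the 0/1 sex coding, and the way unpaired individuals are dropped are assumptions made for illustration, not the library's implementation.

```python
import random

def random_pairs(sexes, sex_aware=False, seed=None):
    """Return (maternal_idx, paternal_idx) for randomly formed pairs.

    sexes: list of 0 (female) / 1 (male); this coding is an assumption.
    """
    rng = random.Random(seed)
    idx = list(range(len(sexes)))
    if sex_aware:
        # Pair only females with males.
        mothers = [i for i in idx if sexes[i] == 0]
        fathers = [i for i in idx if sexes[i] == 1]
        rng.shuffle(mothers)
        rng.shuffle(fathers)
    else:
        # Ignore sex: shuffle everyone and split into two halves.
        rng.shuffle(idx)
        half = len(idx) // 2
        mothers, fathers = idx[:half], idx[half:2 * half]
    n_pairs = min(len(mothers), len(fathers))
    return mothers[:n_pairs], fathers[:n_pairs]

sexes = [0, 1, 0, 1, 0, 1]
mothers, fathers = random_pairs(sexes, sex_aware=True, seed=1)
```

The two returned lists play the role of the maternal and paternal sample indices that a MateAssignment is described as carrying.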

xftsim.ped module

class xftsim.ped.Pedigree(founder_sample_index)

Bases: object

A class representing a pedigree as a graph.

G

The directed graph representing the pedigree.

Type:

nx.DiGraph

_generation

A dictionary containing node generations.

Type:

dict

_fid

A dictionary containing node family IDs.

Type:

dict

_generational_depth

The generational depth of the tree.

Type:

int

generation(K: int):

Returns the subgraph of nodes with generation K.

generations(gens):

Returns the subgraph of nodes with generations in the given iterable.

get_current_generation():

Returns the subgraph of nodes in the current generation.

get_most_recent_K_generations(K):

Returns the subgraph of nodes in the most recent K generations.

_add_edges_from_arrays(x, y):

Adds edges from arrays x and y.

add_offspring(mating: xft.mate.MateAssignment):

Adds offspring nodes and edges to the pedigree based on a MateAssignment object.

_get_trios():

TODO

generation(K)

Returns the subgraph of nodes with generation K.

Parameters:

K (int) – The generation number.

Returns:

nx.subgraph_view – The subgraph of nodes with generation K.

property generational_depth

generations(gens)

Returns the subgraph of nodes with generations in the given iterable.

Parameters:

gens (iterable) – An iterable containing generations.

Returns:

nx.subgraph_view – The subgraph of nodes with generations in the given iterable.

get_current_generation()

Returns the subgraph of nodes in the current generation.

Returns:

nx.subgraph_view – The subgraph of nodes in the current generation.

get_most_recent_K_generations(K)

Returns the subgraph of nodes in the most recent K generations.

Parameters:

K (int) – The number of most recent generations to include.

Returns:

nx.subgraph_view – The subgraph of nodes in the most recent K generations.
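
The generation-based selections above can be illustrated with a plain dictionary standing in for the _generation attribute. This is only a sketch: the real methods return networkx subgraph views, the helper names are hypothetical, and reading "most recent K generations" as the last K generation numbers is an assumption.

```python
def nodes_in_generations(generation_of, gens):
    """Return the nodes whose generation is in `gens`, in insertion order."""
    wanted = set(gens)
    return [node for node, g in generation_of.items() if g in wanted]

def most_recent_generations(generation_of, k):
    """Return nodes in the k most recent generations (depth-k+1 .. depth)."""
    depth = max(generation_of.values())
    return nodes_in_generations(generation_of, range(depth - k + 1, depth + 1))

# Hypothetical node IDs of the form "<generation>_<individual>".
generation_of = {"0_0": 0, "0_1": 0, "1_0": 1, "1_1": 1, "2_0": 2}
founders = nodes_in_generations(generation_of, [0])
last_two = most_recent_generations(generation_of, 2)
```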

xftsim.proc module

Module to define classes for post-processing xft simulation data.

Classes:

PostProcessor: Base class for defining post-processing operations on xft simulation data.

LimitMemory(PostProcessor): Class to limit the amount of memory used by the simulation by deleting old haplotype and/or phenotype data.

WriteToDisk(PostProcessor): Class to write simulation data to disk.

class xftsim.proc.LimitMemory(n_haplotype_generations=-1, n_phenotype_generations=-1)

Bases: PostProcessor

Class to limit the amount of memory used by the simulation by deleting old haplotype and/or phenotype data.

Parameters:
  • n_haplotype_generations (int, optional) – The number of haplotype generations to keep. If -1, keep all generations. Default is -1.

  • n_phenotype_generations (int, optional) – The number of phenotype generations to keep. If -1, keep all generations. Default is -1.

Methods:

processor(sim: xft.sim.Simulation) -> None:

Deletes old haplotype and/or phenotype data from the simulation.

processor(sim)

Deletes old haplotype and/or phenotype data from the simulation.

Parameters:

sim (xft.sim.Simulation) – The simulation to delete old data from.

Returns:

None
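
A minimal sketch of the pruning idea, assuming the stores are dictionaries keyed by generation number (as the haplotype_store and phenotype_store attributes of Simulation are described). The helper prune_store is hypothetical and is not the library's code.

```python
def prune_store(store, n_keep):
    """Keep only the n_keep most recent generations in a dict keyed by generation.

    n_keep = -1 keeps everything, mirroring the documented default.
    """
    if n_keep < 0:
        return store
    newest = set(sorted(store)[-n_keep:]) if n_keep > 0 else set()
    for gen in list(store):
        if gen not in newest:
            del store[gen]
    return store

haplotype_store = {0: "gen0", 1: "gen1", 2: "gen2"}
phenotype_store = {0: "gen0", 1: "gen1", 2: "gen2"}
prune_store(haplotype_store, 1)   # keep only the latest generation
prune_store(phenotype_store, -1)  # keep all generations
```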

class xftsim.proc.PostProcessor(processor, name)

Bases: object

Base class for defining post-processing operations on XFT simulation data.

Parameters:
  • processor (Callable) – A callable object that takes a single argument of type xft.sim.Simulation and performs some post-processing operation on it.

  • name (str) – A name for the post-processing operation being defined.

Methods:

process(sim: xft.sim.Simulation) -> None:

Applies the post-processing operation to the given simulation.

process(sim)

Applies the post-processing operation to the given simulation.

Parameters:

sim (xft.sim.Simulation) – The simulation to apply the post-processing operation to.

Returns:

None
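
The wrapper pattern described here (a named callable applied to a simulation) might look like the following stand-alone sketch. SketchPostProcessor, log_generation, and the SimpleNamespace stand-in for a simulation are all hypothetical names used for illustration.

```python
from types import SimpleNamespace

class SketchPostProcessor:
    """Minimal stand-in for the documented pattern: a named callable wrapper."""

    def __init__(self, processor, name):
        self.processor = processor
        self.name = name

    def process(self, sim):
        # Apply the wrapped callable to the simulation; returns None.
        self.processor(sim)

def log_generation(sim):
    # Hypothetical operation: record which generations were processed.
    sim.metadata.setdefault("processed", []).append(sim.generation)

sim = SimpleNamespace(generation=0, metadata={})
logger = SketchPostProcessor(log_generation, name="log_generation")
logger.process(sim)
sim.generation = 1
logger.process(sim)
```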

class xftsim.proc.WriteToDisk(arg)

Bases: PostProcessor

Class to write simulation data to disk.

xftsim.reproduce module

class xftsim.reproduce.Meiosis(rmap=None, p=None)

Bases: object

A class representing the process of meiosis.

recombinationMap

A pre-defined recombination map.

Type:

RecombinationMap, optional

p

A probability used when generating an exchangeable recombination map on the fly.

Type:

float, optional

get_recombination_map(haplotypes):

Returns the recombination map, either pre-defined or generated on the fly.

reproduce(parental_haplotypes=None, mating=None, control=None):

Returns a HaplotypeArray representing the offspring after meiosis.

get_recombination_map(haplotypes)

Get the recombination map, either pre-defined or generated on the fly.

Parameters:

haplotypes (xr.DataArray) – The haplotype data.

Returns:

RecombinationMap – The recombination map.

reproduce(parental_haplotypes=None, mating=None, control=None)

Return a HaplotypeArray representing the offspring after meiosis.

Parameters:
  • parental_haplotypes (xr.DataArray, optional) – The parental haplotype data.

  • mating (MateAssignment, optional) – The mate assignment object.

  • control (dict, optional) – A dictionary containing control parameters.

Returns:

HaplotypeArray – The HaplotypeArray representing the offspring after meiosis.

class xftsim.reproduce.RecombinationMap(p=None, vindex=None, vid=None, chrom=None)

Bases: object

A class to represent a diploid recombination map. In the future, will require XftIndex object instead of vid and chrom.

Parameters:
  • p (float or numpy.ndarray, optional) – Recombination probabilities, default is None. A single value results in an exchangeable map; an array gives the probabilities of recombination between the specified loci.

  • vindex (xft.index.HaploidVariantIndex | xft.index.DiploidVariantIndex) – Variant index. Only provide if not providing vid / chrom.

  • vid (NDArray[Shape["*"], Any], optional) – Variant IDs, default is None.

  • chrom (NDArray[Shape["*"], Int64], optional) – Chromosomes, default is None.

static constant_map_from_haplotypes(haplotypes=<class 'xarray.core.dataarray.DataArray'>, p=0.5)

Create a constant recombination map from haplotypes.

Parameters:
  • haplotypes (xr.DataArray) – Haplotypes data array.

  • p (np.float64, optional) – Probability, default is 0.5.

Returns:

RecombinationMap – A constant recombination map.

static variable_map_from_haplotypes_with_cM(haplotypes=<class 'xarray.core.dataarray.DataArray'>)

Create a variable recombination map from haplotypes with centimorgan distances.

Parameters:

haplotypes (xr.DataArray) – Haplotypes data array.

Returns:

RecombinationMap – A variable recombination map.

Raises:

ValueError – If distance in centimorgans is required and not present in the input.

xftsim.reproduce.meiosis(parental_haplotypes, recombination_p, maternal_inds, paternal_inds)

Performs meiosis on parental haplotypes.

Parameters:
  • parental_haplotypes (numpy.ndarray[int8]) – An array of parental haplotypes.

  • recombination_p (numpy.ndarray[float64]) – An array of recombination probabilities.

  • maternal_inds (numpy.ndarray[int64]) – An array of maternal indices.

  • paternal_inds (numpy.ndarray[int64]) – An array of paternal indices.

Returns:

numpy.ndarray[int8] – An array of offspring haplotypes.
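
To make the role of the recombination probabilities concrete, here is a small sketch of forming one gamete from a single parent. It is not the library's routine: the function make_gamete, the use of two separate haplotype lists, and the convention that recombination_p[j] is the probability of switching strands just before locus j are all assumptions.

```python
import random

def make_gamete(hap_a, hap_b, recombination_p, rng):
    """Form one gamete from a parent's two haplotypes.

    recombination_p[j] is the probability of switching strands immediately
    before locus j (so recombination_p[0] sets the starting strand).
    """
    strands = (hap_a, hap_b)
    current = 0
    gamete = []
    for j, p in enumerate(recombination_p):
        if rng.random() < p:
            current = 1 - current  # crossover: copy from the other strand
        gamete.append(strands[current][j])
    return gamete

rng = random.Random(0)
hap_a, hap_b = [0, 0, 0, 0], [1, 1, 1, 1]
no_crossover = make_gamete(hap_a, hap_b, [0.0, 0.0, 0.0, 0.0], rng)
always_switch = make_gamete(hap_a, hap_b, [1.0, 1.0, 1.0, 1.0], rng)
free = make_gamete(hap_a, hap_b, [0.5] * 4, rng)
```

Under this convention, a constant probability of 0.5 at every locus draws each allele independently from either strand, which is one way to read the constant map with p=0.5 described above.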

xftsim.reproduce.transmit_parental_phenotypes(mating, parental_phenotypes, offspring_phenotypes, control=None)

Transmits parental phenotypes to offspring.

Parameters:
  • mating (MateAssignment) – An object representing mating assignments.

  • parental_phenotypes (xr.DataArray) – A data array containing parental phenotypes.

  • offspring_phenotypes (xr.DataArray) – A data array containing offspring phenotypes.

  • control (dict, optional) – A dictionary containing additional control parameters, default is None.

Returns:

None

xftsim.sim module

class xftsim.sim.DemoSimulation(routine='BGRM', n=2000, m=400)

Bases: Simulation

demo_routines = {'BGRM': 'Bivariate GCTA with balanced random mating demo\n', 'UGRM': 'Univariate GCTA with balanced random mating demo\n'}

class xftsim.sim.Simulation(founder_haplotypes, mating_regime, recombination_map, architecture, statistics=[], post_processors=[], generation=-1, control={}, reproduction_method=<class 'xftsim.reproduce.Meiosis'>, metadata={}, filter_sample=False, sample_filter=None)

Bases: object

A class for running an xft simulation.

mating_regime

Mating regime.

Type:

xft.mate.MatingRegime

recombination_map

Recombination map.

Type:

xft.reproduce.RecombinationMap

architecture

Phenogenetic architecture.

Type:

xft.arch.Architecture

statistics

Iterable of statistics to compute each generation, by default empty list.

Type:

Iterable, optional

post_processors

Iterable of post processors to apply each generation, by default empty list.

Type:

Iterable, optional

generation

Initial generation, by default -1, corresponding to an uninitialized simulation

Type:

int, optional

control

Control parameters for the simulation, by default an empty dictionary.

Type:

Dict, optional

reproduction_method

Reproduction method for the simulation, by default xft.reproduce.Meiosis.

Type:

xft.reproduce.ReproductionMethod, optional

control

Control parameters for the simulation

Type:

dict

haplotypes

Haplotypes for the current generation.

Type:

xr.DataArray

phenotypes

Phenotypes for the current generation.

Type:

xr.DataArray

mating

Mating information for the current generation.

Type:

xr.DataArray

parent_mating

Mating information for the previous generation.

Type:

xr.DataArray

parent_haplotypes

Haplotypes for the previous generation.

Type:

xr.DataArray

parent_phenotypes

Phenotypes for the previous generation.

Type:

xr.DataArray

results

Results for the current generation.

Type:

xr.DataArray

current_afs_empirical

Current empirical allele frequencies.

Type:

xr.DataArray

current_std_genotypes

Current standardized genotypes.

Type:

xr.DataArray

current_std_phenotypes

Current standardized phenotypes.

Type:

xr.DataArray

phenotype_store

Dictionary storing phenotypes for each generation.

Type:

Dict[int, xr.DataArray]

haplotype_store

Dictionary storing haplotypes for each generation.

Type:

Dict[int, xr.DataArray]

mating_store

Dictionary storing mating information for each generation.

Type:

Dict[int, xr.DataArray]

results_store

Dictionary storing results for each generation.

Type:

Dict[int, xr.DataArray]

pedigree

Pedigree information for the simulation (currently not implemented).

Type:

Any

metadata

Dictionary containing user-specified metadata.

Type:

Dict

run(n_generations: int):

Run the simulation for a specified number of generations.

run_generation():

Run a single generation of the simulation.

compute_phenotypes():

Compute phenotypes for the current generation.

mate():

Perform mating for the current generation.

reproduce():

Perform reproduction for the current generation.

estimate_statistics():

Estimate statistics for the current generation.

process():

Process the current generation using post-processors.

update_pedigree():

Update pedigree information for the current generation.

increment_generation():

Increment the current generation.

move_forward(n_generations: int):

Move the simulation forward by a specified number of generations.

apply_filter()

Apply sample filters to the current generation

compute_phenotypes()

Compute phenotypes for the current generation.

property control

property current_afs_empirical

property current_std_genotypes

property current_std_genotypes_filtered

property current_std_phenotypes

property current_std_phenotypes_filtered

property dependency_graph

property dependency_graph_edges

draw_dependency_graph(node_color='none', node_size=1200, font_size=5, margins=0.1, edge_color='#222222', arrowsize=6, number_edges=True, **kwargs)

estimate_statistics()

Estimate statistics for the current generation.

property generation

property haplotypes

property haplotypes_filtered

increment_generation()

mate()

Perform mating for the current generation.

property mating

move_forward(n_generations)

property parent_haplotypes

property parent_mating

property parent_phenotypes

property phenotypes

property phenotypes_filtered

pickle_results(path, metadata={}, results_store=True, architecture=True, mating_store=True, phenotype_store=True, mating_regime=False, haplotype_store=False)

process()

Apply post-processors to the current generation.

reproduce()

Perform reproduction for the current generation.

property results

run(n_generations)

Run the simulation for a specified number of generations.

Parameters:

n_generations (int) – Number of generations to run the simulation.

run_generation()

Run a single generation of the simulation.
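
How the per-generation methods fit together can be sketched with a toy class that only records the order of calls. The ordering shown is an assumption (the reference above does not state it), and ToySimulation is a hypothetical stand-in rather than the library's Simulation class.

```python
class ToySimulation:
    """Records the order of steps; every step body is a placeholder."""

    def __init__(self):
        self.generation = -1  # -1 marks an uninitialized simulation
        self.log = []

    def _step(self, name):
        self.log.append((self.generation, name))

    def run_generation(self):
        self.generation += 1  # plays the role of increment_generation()
        if self.generation > 0:
            # Assumed: founders (generation 0) need no reproduction step.
            self._step("reproduce")
        for name in ("compute_phenotypes", "mate", "update_pedigree",
                     "estimate_statistics", "process"):
            self._step(name)

    def run(self, n_generations):
        for _ in range(n_generations):
            self.run_generation()

sim = ToySimulation()
sim.run(2)
```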

update_pedigree()

Update pedigree information (NOT IMPLEMENTED).

xftsim.stats module

class xftsim.stats.GWAS_Estimator(component_index=None, metadata={}, filter_sample=False, std_X=True, std_Y=True)

Bases: Statistic

Perform linear association studies for the given simulation.

When called within a Simulation, will add to Simulation.results[‘GWAS’] a 3-D array indexed as follows:

  • the first dimension indexes variants via xft.index.DiploidVariantIndex

  • the second dimension indexes four association statistics: slope, se, test-statistic, and p-value

  • the third dimension indexes phenotypic components via xft.index.ComponentIndex
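
The four statistics along the second dimension can be illustrated for a single variant and a single phenotype with ordinary least squares. This is a sketch only: simple_association is a hypothetical helper, and the p-value uses a large-sample normal approximation, which may differ from what the estimator computes.

```python
import math

def simple_association(x, y):
    """Return (slope, se, test_statistic, p_value) for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    stat = slope / se
    # Two-sided p-value under a normal approximation (assumption).
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return slope, se, stat, p_value

genotypes = [0, 1, 2, 0, 1, 2]
phenotype = [0.1, 1.0, 2.1, -0.1, 1.1, 1.9]
slope, se, stat, p_value = simple_association(genotypes, phenotype)
```

Repeating this for every variant and every phenotype component would fill an array with the variant × statistic × component layout described above.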

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.

Type:

xft.index.ComponentIndex, optional

estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)

class xftsim.stats.HasemanElstonEstimator(component_index=None, genetic_correlation=True, randomized=True, prettify=True, n_probe=100, dask=True, metadata={}, filter_sample=False)

Bases: Statistic

Estimate Haseman-Elston regression for the given simulation.

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all components.

Type:

xft.index.ComponentIndex, optional

genetic_correlation

If True, calculate and return the genetic correlation matrix.

Type:

bool

randomized

If True, use a randomized trace estimator.

Type:

bool

prettify

If True, prettify the output by converting it to a pandas DataFrame.

Type:

bool

n_probe

The number of random probes for trace estimation.

Type:

int

dask

If True, use dask for calculations.

Type:

bool

estimator(sim: xft.sim.Simulation) -> Dict:

Estimate and return the Haseman-Elston regression for the given simulation.

estimator(phenotypes, current_std_phenotypes, current_std_genotypes)

class xftsim.stats.MatingStatistics(component_index=None, full=False, metadata={}, filter_sample=False)

Bases: Statistic

Calculate and return various mating statistics for the given simulation.

Parameters:
  • component_index (xft.index.ComponentIndex, optional) – Index of the component for which the statistics are calculated.

  • full (bool) – Ignore component_index and compute statistics for all components. If component_index is not provided and full = False, calculate statistics for phenotype components only.

estimator(sim: xft.sim.Simulation) -> Dict:

Calculate and return the requested mating statistics for the given simulation.

estimator(phenotypes, mating)

class xftsim.stats.Pop_GWAS_Estimator(component_index=None, metadata={}, std_X=False, std_Y=False, assume_pairs=True, n_sub=0, PGS=True, PGS_sub_divisions=50, training_fraction=0.8)

Bases: Statistic

Perform one-sib-only linear association studies for the given simulation.

NOTE! Currently assumes each mate-pair produces exactly 2 offspring.

When called within a Simulation, will add to Simulation.results[‘GWAS’] a 3-D array indexed as follows:

  • the first dimension indexes variants via xft.index.DiploidVariantIndex

  • the second dimension indexes four association statistics: slope, se, test-statistic, and p-value

  • the third dimension indexes phenotypic components via xft.index.ComponentIndex

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.

Type:

xft.index.ComponentIndex, optional

estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)

class xftsim.stats.SampleStatistics(means=True, variance_components=True, variances=True, vcov=True, corr=True, prettify=True, metadata={}, filter_sample=False)

Bases: Statistic

Calculate and return various sample statistics for the given simulation.

means

If True, calculate and return the mean of each phenotype.

Type:

bool

variance_components

If True, calculate and return the variance components of each phenotype.

Type:

bool

variances

If True, calculate and return the variances of each phenotype.

Type:

bool

vcov

If True, calculate and return the variance-covariance matrix.

Type:

bool

corr

If True, calculate and return the correlation matrix.

Type:

bool

prettify

If True, prettify the output by converting it to a pandas DataFrame.

Type:

bool

estimator(sim: xft.sim.Simulation) -> Dict:

Calculate and return the requested sample statistics for the given simulation.

estimator(phenotypes)

class xftsim.stats.Sib_GWAS_Estimator(component_index=None, metadata={}, std_X=False, std_Y=False, assume_pairs=True, n_sub=0, PGS=True, PGS_sub_divisions=50, training_fraction=0.8)

Bases: Statistic

Perform sib-difference linear association studies for the given simulation.

NOTE! Currently assumes each mate-pair produces exactly 2 offspring.

When called within a Simulation, will add to Simulation.results[‘GWAS’] a 3-D array indexed as follows:

  • the first dimension indexes variants via xft.index.DiploidVariantIndex

  • the second dimension indexes four association statistics: slope, se, test-statistic, and p-value

  • the third dimension indexes phenotypic components via xft.index.ComponentIndex

component_index

Index of the component for which the statistics are calculated. If not provided, calculate statistics for all phenotype components.

Type:

xft.index.ComponentIndex, optional

estimator(phenotypes, current_std_phenotypes, current_std_genotypes, haplotypes)

class xftsim.stats.Statistic(estimator, parser, name, metadata={}, filter_sample=False, s_args=None)

Bases: object

Base class for defining statistic estimators.

name

The name of the statistic.

Type:

str

estimator

The function that estimates the statistic.

Type:

Callable

metadata

Any additional metadata

Type:

Dict

filter_sample

Whether to apply the global sample filter prior to estimation.

Type:

bool

estimate(sim: xft.sim.Simulation) -> None:

Estimate the statistic and update the results.

update_results(sim: xft.sim.Simulation, results: object) -> None:

Update the simulation’s results_store with the estimated results.

estimate(sim=None, **kwargs)

static null_parser(self, *args, **kwargs)

parse_results(sim)

update_results(sim, results)

xftsim.stats.apply_threshold_PGS(estimates, G, thresholds=array([5.00000000e-08, 1.03849902e-07, 2.15696043e-07, 4.48000259e-07, 9.30495659e-07, 1.93263766e-06, 4.01408463e-06, 8.33724592e-06, 1.73164434e-05, 3.59662191e-05, 7.47017665e-05, 1.55155423e-04, 3.22257509e-04, 6.69328214e-04, 1.39019339e-03, 2.88742895e-03, 5.99718426e-03, 1.24561400e-02, 2.58713783e-02, 5.37348020e-02, 1.11607078e-01, 2.31807683e-01, 4.81464104e-01, 1.00000000e+00]))

xftsim.stats.apply_threshold_PGS_all(gwas_results, G, minp=5e-08, maxp=1, nthresh=25)

xftsim.stats.haseman_elston(G, Y, n_probe=500, dtype=<class 'numpy.float32'>, dask=False)

Perform Haseman-Elston regression, with the option to choose randomized, deterministic, or randomized dask-based methods.

Parameters:
  • G (np.ndarray) – A 2D numpy array representing standardized (but not scaled) diploid genotypes.

  • Y (np.ndarray) – A 2D numpy array representing standardized phenotypes.

  • n_probe (int, optional, default 500) – The number of random probes for trace estimation. If n_probe is set to inf, use deterministic method.

  • dtype (numpy data type, optional, default np.float32) – The data type for the input arrays.

  • dask (bool, optional, default False) – If True, use dask for calculations.

Returns:

np.ndarray – A 2D numpy array representing the estimated genetic covariances.
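
For intuition, the deterministic form of Haseman-Elston regression for a single trait can be written in a few lines: products of standardized phenotypes for each pair of individuals are regressed on the corresponding off-diagonal entries of the genetic relatedness matrix. The sketch below is a hypothetical single-trait helper in plain Python; it does not implement the randomized trace estimator or the multi-trait covariance output described above.

```python
def haseman_elston_h2(G, y):
    """Single-trait Haseman-Elston estimate from standardized G (n x m) and y.

    Regresses y_i * y_j on K_ij = (G G^T / m)_ij over all pairs i < j,
    with no intercept.
    """
    n, m = len(G), len(G[0])
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            k_ij = sum(G[i][v] * G[j][v] for v in range(m)) / m
            numerator += k_ij * y[i] * y[j]
            denominator += k_ij ** 2
    return numerator / denominator

# Toy data: one variant, phenotype exactly 0.5 times the genotype.
G = [[1.0], [-1.0], [0.5], [-0.5]]
y = [0.5 * row[0] for row in G]
h2 = haseman_elston_h2(G, y)
```

In this noiseless toy case every phenotype product equals 0.25 times the relatedness entry, so the slope recovers 0.25.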

xftsim.stats.threshold_PGS(estimates, threshold, G)
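
These functions are undocumented, so the following is only a sketch of p-value thresholding for polygenic scores under stated assumptions. The default thresholds printed in the apply_threshold_PGS signature appear consistent with 24 log-spaced values between 5e-08 and 1, which log_spaced reproduces. The helper names, the separate slopes and pvalues arguments, and the "keep variants with p at or below the threshold" rule are all hypothetical.

```python
def log_spaced(minp, maxp, n):
    """Return n values spaced evenly on a log scale from minp to maxp."""
    ratio = (maxp / minp) ** (1.0 / (n - 1))
    return [minp * ratio ** i for i in range(n)]

def threshold_pgs_sketch(slopes, pvalues, threshold, G):
    """Score each row of G using only variants with p-value <= threshold."""
    keep = [j for j, p in enumerate(pvalues) if p <= threshold]
    return [sum(row[j] * slopes[j] for j in keep) for row in G]

thresholds = log_spaced(5e-8, 1.0, 24)

slopes = [1.0, 2.0, -1.0]
pvalues = [1e-9, 0.5, 1e-3]
G = [[1, 1, 1], [0, 2, 1]]
strict = threshold_pgs_sketch(slopes, pvalues, 1e-2, G)   # uses variants 0 and 2
lenient = threshold_pgs_sketch(slopes, pvalues, 1.0, G)   # uses all variants
```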

xftsim.struct module

class xftsim.struct.GeneticMap(chrom, pos_bp, pos_cM)

Bases: object

Map between physical and genetic distances.

Parameters:
  • chrom (Iterable) – Chromosomes on which variants are located

  • pos_bp (Iterable) – Physical positions of variants

  • pos_cM (Iterable) – Map distances in cM

frame

Pandas DataFrame with the above columns

Type:

pd.DataFrame

chroms

Unique chromosomes present in map

Type:

np.ndarray

classmethod from_pyrho_maps(paths, sep='\t', **kwargs)

Construct genetic map objects from maps provided at https://github.com/popgenmethods/pyrho. Please cite their work if you use their maps.

Parameters:
  • paths (Iterable) – Paths for each chromosome

  • sep (str, optional) – Passed to pd.read_csv()

  • **kwargs – Additional arguments to pd.read_csv()

Returns:

GeneticMap

interpolate_cM_chrom(pos_bp, chrom, **kwargs)

Interpolate cM values in a specified chromosome based on genetic map information.

Parameters:
  • pos_bp (Iterable) – Physical positions for which to interpolate cM values

  • chrom (str) – Chromosome on which to interpolate

  • **kwargs – Additional keyword arguments to be passed to scipy.interpolate.interp1d.
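
The interpolation itself amounts to mapping physical positions onto the genetic map by linear interpolation between neighbouring map points. The sketch below does this in plain Python for one chromosome. The function name interpolate_cm_sketch is hypothetical, and clamping positions outside the map to the end values is a choice made here for simplicity, not necessarily how the method or scipy.interpolate.interp1d behaves with its defaults.

```python
from bisect import bisect_right

def interpolate_cm_sketch(query_bp, map_bp, map_cM):
    """Linearly interpolate cM values at query_bp from a sorted (bp, cM) map."""
    out = []
    for q in query_bp:
        if q <= map_bp[0]:
            out.append(map_cM[0])      # clamp below the map (assumption)
        elif q >= map_bp[-1]:
            out.append(map_cM[-1])     # clamp above the map (assumption)
        else:
            hi = bisect_right(map_bp, q)
            lo = hi - 1
            frac = (q - map_bp[lo]) / (map_bp[hi] - map_bp[lo])
            out.append(map_cM[lo] + frac * (map_cM[hi] - map_cM[lo]))
    return out

map_bp = [100, 200, 400]
map_cM = [0.0, 1.0, 2.0]
cm = interpolate_cm_sketch([150, 300, 50, 500], map_bp, map_cM)
```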

class xftsim.struct.HaplotypeArray(haplotypes=None, variant_indexer=None, sample_indexer=None, generation=0, n=None, m=None, dask=False, **kwargs)

Bases: object

Represents a 2D array of binary haplotypes with accompanying row and column indices. Dummy class used for generation of DataArrays and static methods.

class xftsim.struct.PhenotypeArray(components=None, component_indexer=None, sample_indexer=None, generation=0, n=None, k_total=None)

Bases: object

An array that stores phenotypes for a set of individuals. Dummy class used for generation of DataArrays and static methods.

Parameters:
  • components (ndarray, optional) – n x k_total array of phenotype components.

  • component_indexer (xft.index.ComponentIndex, optional) – Indexer for components.

  • sample_indexer (xft.index.SampleIndex, optional) – Indexer for samples.

  • generation (int, optional) – The generation this PhenotypeArray belongs to.

  • n (int, optional) – The number of samples.

  • k_total (int, optional) – The total number of components.

Returns:

xr.DataArray – The initialized PhenotypeArray.

Raises:

AssertionError – Raised if any of the following conditions is violated:
  • If components is provided, then n and k_total must not be provided.

  • If component_indexer is provided, then k_total must not be provided.

  • If sample_indexer is provided, then n must not be provided.

  • If components is provided and sample_indexer is provided, then the shape of components must match the size of the sample dimension of sample_indexer.

  • If components is provided and component_indexer is provided, then the shape of components must match the size of the component dimension of component_indexer.

  • If component_indexer is provided, then the size of the component dimension of component_indexer must match k_total.

static from_product(phenotype_name, component_name, vorigin_relative, components=None, sample_indexer=None, generation=None, haplotypes=None, n=None)

Create a PhenotypeArray from a product of names.

Parameters:
  • phenotype_name (iterable) – The names of the phenotypes.

  • component_name (iterable) – The names of the components.

  • vorigin_relative (iterable) – The relative origins of each component.

  • components (xr.DataArray, optional) – The array to use as the components.

  • sample_indexer (xft.index.SampleIndex, optional) – The sample indexer to use.

  • generation (int, optional) – The generation of the PhenotypeArray.

  • haplotypes (xr.DataArray, optional) – The haplotypes to use.

  • n (int, optional) – The number of samples to use.

Returns:

xr.DataArray – The new PhenotypeArray.

Raises:

AssertionError – If exactly one of generation and sample_indexer is provided, or exactly one of haplotypes and sample_indexer/generation or n/generation is provided.

class xftsim.struct.XftAccessor(xarray_obj)

Bases: object

Accessor for Xarray DataArrays with specialized functionality for HaplotypeArray and PhenotypeArray objects.

Parameters:

xarray_obj (xarray.DataArray) – The DataArray to be accessed.

_obj

The DataArray to be accessed.

Type:

xarray.DataArray

_array_type

The type of the DataArray, either ‘HaplotypeArray’ or ‘componentArray’.

Type:

str

_non_annotation_vars

The non-annotation variables in the DataArray.

Type:

list of str

_variant_vars

The variant annotation variables in the DataArray.

Type:

list of str

_sample_vars

The sample annotation variables in the DataArray.

Type:

list of str

_component_vars

The component annotation variables in the DataArray.

Type:

list of str

_row_dim

The label of the row dimension.

Type:

str

_col_dim

The label of the column dimension.

Type:

str

shape

The shape of the DataArray.

Type:

tuple

n

The number of rows in the DataArray.

Type:

int

data

The data in the DataArray.

Type:

numpy.ndarray

row_vars

List of coordinate variable names for the row dimension.

Type:

list

column_vars

List of coordinate variable names for the column dimension.

Type:

list

sample_mindex

MultiIndex object for the ‘sample’ dimension, containing iid, fid, and sex columns.

Type:

pd.MultiIndex

component_mindex

MultiIndex object for the ‘component’ dimension, containing phenotype_name, component_name, and vorigin_relative columns.

Type:

pd.MultiIndex

Raises:

NotImplementedError – If the DataArray dimensions are not (‘sample’, ‘variant’) or (‘sample’, ‘component’).

property af_empirical

Empirical allele frequencies. Specific to HaplotypeArray objects.

Returns:

numpy.ndarray – Empirical allele frequencies.

Raises:

TypeError – If _col_dim is not ‘variant’.

property all_components

Returns an array of all the unique component names. Specific to PhenotypeArray objects.

Returns:

numpy.ndarray – An array of all the unique component names.

Raises:

TypeError – If the column dimension is not ‘component’.

property all_phenotypes

Returns an array of all the unique phenotype component names. Specific to PhenotypeArray objects.

Returns:

numpy.ndarray – An array of all the unique phenotype component names.

Raises:

TypeError – If the column dimension is not ‘component’.

property all_relatives

Returns an array of all the unique origin relative values. Specific to PhenotypeArray objects.

Returns:

numpy.ndarray – An array of all the unique origin relative values.

Raises:

TypeError – If the column dimension is not ‘component’.

as_pd(prettify=True)

Returns the data as a Pandas DataFrame. Specific to PhenotypeArray objects.

Parameters:

prettify (bool, optional) – If True, the multi-index columns will be prettified by replacing -1, 0, 1 with ‘proband’, ‘mother’, ‘father’, respectively.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

pd.DataFrame – A Pandas DataFrame representing the data.
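
The relabelling performed by prettify can be sketched on plain column tuples of the form (phenotype_name, component_name, vorigin_relative). The mapping of -1, 0, 1 to proband, mother, father is taken from the description above; the helper name and the example column names are hypothetical.

```python
RELATIVE_LABELS = {-1: "proband", 0: "mother", 1: "father"}

def prettify_columns(columns):
    """Replace numeric relative codes with readable labels in column tuples."""
    return [
        (phenotype, component, RELATIVE_LABELS.get(relative, relative))
        for phenotype, component, relative in columns
    ]

columns = [
    ("height", "addGenetic", -1),
    ("height", "phenotype", 0),
    ("height", "phenotype", 1),
]
pretty = prettify_columns(columns)
```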

property column_vars

Get the column coordinate variables for the DataArray object.

Returns:

XftIndex – The column coordinate variables of the current column dimension.

property component_mindex

Get a Pandas MultiIndex object for the component dimension.

Returns:

pandas.MultiIndex – MultiIndex object with phenotype_name, component_name, and vorigin_relative as index levels.

Raises:

NotImplementedError – If the column dimension is not ‘component’.

property data

The data in the DataArray.

Returns:

numpy.ndarray – The data in the DataArray.

property depth

Returns the generational depth from binary relative encoding. Specific to PhenotypeArray objects.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

Union[float, np.nan] – The generational depth from binary relative encoding, or NaN if the relative origin is empty.

property diploid_chrom

Diploid chromosome numbers. Specific to HaplotypeArray objects.

Returns:

numpy.ndarray – Diploid chromosome numbers.

Raises:

TypeError – If _col_dim is not ‘variant’.

property diploid_vid

Diploid variant ID. Specific to HaplotypeArray objects.

Returns:

numpy.ndarray – Diploid variant IDs.

Raises:

TypeError – If _col_dim is not ‘variant’.

property generation

Generation of the data. Specific to HaplotypeArray objects.

Returns:

int – Generation attribute.

Raises:

TypeError – If _col_dim is not ‘variant’.

get_annotation_dict()

Return a dictionary of all annotation variables associated with the variants in the object. Specific to HaplotypeArray objects.

Returns:

dict – A dictionary where the keys are the annotation variable names and the values are the corresponding arrays.

Raises:

TypeError – If the _col_dim attribute is not equal to ‘variant’.

get_column_indexer()

Get the column indexer object for the PhenotypeArray object.

Returns:

xft.index.Indexer – The indexer object based on the current column dimension.

Raises:

TypeError – If the current column dimension is not recognized.

get_comp_type(ctype='intermediate')

Returns the index array of components with comp_type == ctype. Specific to PhenotypeArray objects.

Parameters:

ctype (str, optional) – The component type to select, by default ‘intermediate’.

Returns:

XftIndex – The index of components that match the given keyword.

Raises:

TypeError – If the column dimension is not ‘component’.

get_component_indexer()

Get the component indexer of a PhenotypeArray.

Returns:

xft.index.ComponentIndex – A ComponentIndex object.

get_intermediate_components()

Returns the index array of components with comp_type == ‘intermediate’. Specific to PhenotypeArray objects.

Returns:

XftIndex – The index of components that match the given keyword.

Raises:

TypeError – If the column dimension is not ‘component’.

get_k_rel(rel)

Returns the number of components with the given relative origin. Specific to PhenotypeArray objects.

Parameters:

rel (int) – The relative origin of the components.

Returns:

int – The number of components with the given relative origin.

Raises:

TypeError – If the column dimension is not ‘component’.

get_non_annotation_dict()

Return a dictionary of all non-annotation variables associated with the variants in the object. Specific to HaplotypeArray objects.

Returns:

dict – A dictionary where the keys are the non-annotation variable names and the values are the corresponding arrays.

Raises:

TypeError – If the _col_dim attribute is not equal to ‘variant’.

get_outcome_components()

Returns the index array of components with comp_type==’outcome’. Specific to PhenotypeArray objects.

Returns:

XftIndex – The index of components that match the given keyword.

Raises:

TypeError – If the column dimension is not ‘component’.

get_row_indexer()

Get the row indexer.

Returns:

xft.index.SampleIndex – A SampleIndex object.

Raises:

TypeError – If the row dimension is not ‘sample’.

get_sample_indexer()

Returns an instance of xft.index.SampleIndex representing the sample indexer constructed from the input data.

Raises:

NotImplementedError – If _row_dim is not ‘sample’.

Returns:

SampleIndex – An instance of xft.index.SampleIndex constructed from the sample data in the input object.

get_variant_indexer()

Get the variant indexer of a HaplotypeArray.

Returns:

xft.index.HaploidVariantIndex – A HaploidVariantIndex object.

grep_component_index(keyword='phenotype')

Returns the index array of components whose names contain the given keyword. Specific to PhenotypeArray objects.

Parameters:

keyword (str, optional) – The keyword to search for in component names, by default ‘phenotype’.

Returns:

XftIndex – The index of components that match the given keyword.

Raises:

TypeError – If the column dimension is not ‘component’.

interpolate_cM(gmap, **kwargs)

Interpolate cM values based on genetic map information. Specific to HaplotypeArray objects.

Parameters:
  • gmap (GeneticMap) – Genetic map data

  • **kwargs – Additional keyword arguments to be passed to scipy.interpolate.interp1d.

Raises:
  • TypeError – If the column dimension is not ‘variant’.

  • ValueError – If not all chromosomes required are present in the genetic map
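
As a rough illustration of the interpolation step, the sketch below linearly interpolates a cM value from map knots, which is the default behaviour of scipy.interpolate.interp1d. The function name, the hypothetical map values, and the out-of-range check are assumptions made for this example and are not taken from xftsim.

```python
from bisect import bisect_right

def interpolate_cm(pos_bp, map_bp, map_cm):
    """Linearly interpolate cM at pos_bp from sorted (map_bp, map_cm) knots."""
    if not map_bp[0] <= pos_bp <= map_bp[-1]:
        # this check belongs to the sketch only
        raise ValueError("position outside the range of the genetic map")
    # index of the right-hand knot, clamped so the last knot is handled
    j = min(bisect_right(map_bp, pos_bp), len(map_bp) - 1)
    i = j - 1
    frac = (pos_bp - map_bp[i]) / (map_bp[j] - map_bp[i])
    return map_cm[i] + frac * (map_cm[j] - map_cm[i])

# hypothetical map for one chromosome: base-pair positions and cM values
map_bp = [1_000, 2_000, 4_000]
map_cm = [0.0, 1.0, 2.0]
cm = [interpolate_cm(p, map_bp, map_cm) for p in (1_500, 3_000, 4_000)]
```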

property k_components

Returns the number of unique component names. Specific to PhenotypeArray objects.

Returns:

int – The number of unique component names.

Raises:

TypeError – If the column dimension is not ‘component’.

property k_current

Returns the number of all current-gen specific components. Specific to PhenotypeArray objects.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

int – The number of all current-gen specific components.

property k_phenotypes

Returns the number of unique phenotype components. Specific to PhenotypeArray objects.

Returns:

int – The number of unique phenotype components.

Raises:

TypeError – If the column dimension is not ‘component’.

property k_relative

Returns the number of unique origin relative values. Specific to PhenotypeArray objects.

Returns:

int – The number of unique origin relative values.

Raises:

TypeError – If the column dimension is not ‘component’.

property k_total

Returns the total number of components. Specific to PhenotypeArray objects.

Returns:

int – The total number of components.

Raises:

TypeError – If the column dimension is not ‘component’.

property m

Return the number of distinct diploid variants. Specific to HaplotypeArray objects.

Returns:

int – The number of distinct diploid variants in the array.

Raises:

TypeError – If the _col_dim attribute is not equal to ‘variant’.

property maf_empirical

Empirical minor allele frequencies. Specific to HaplotypeArray objects.

Returns:

numpy.ndarray – Empirical minor allele frequencies.

Raises:

TypeError – If _col_dim is not ‘variant’.

property n

The number of rows in the DataArray.

Returns:

int – The number of rows in the DataArray.

reindex_components(value)

Reindex the components.

Parameters:

value (xft.index.ComponentIndex) – A ComponentIndex object.

Returns:

PhenotypeArray – A new PhenotypeArray object.

property row_vars

Get the row coordinate variables for the PhenotypeArray object.

Returns:

XftIndex – The row coordinate variables of the row dimension.

property sample_mindex

Get the sample multi-index for the PhenotypeArray object.

Returns:

pd.MultiIndex – A multi-index object containing sample IDs, family IDs, and sex information.

Raises:

NotImplementedError – If the current row dimension is not ‘sample’.

set_column_indexer(value)

Set the column indexer object for the PhenotypeArray object.

Parameters:

value (xft.index.Indexer) – The new indexer object for the PhenotypeArray object.

Returns:

None

Raises:

TypeError – If the current column dimension is not recognized.

set_row_indexer()

set_sample_indexer(value)

set_variant_indexer(value)

property shape

The shape of the DataArray.

Returns:

tuple – The shape of the DataArray.

split_by_component()

Splits the data by component name. Specific to PhenotypeArray objects.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

Dict[str, pd.DataFrame] – A dictionary of dataframes, where the keys are the unique component names and the values are dataframes containing the data for each component.

split_by_phenotype()

Splits the data by phenotype name. Specific to PhenotypeArray objects.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

Dict[str, pd.DataFrame] – A dictionary of dataframes, where the keys are the unique phenotype names and the values are dataframes containing the data for each phenotype.

split_by_phenotype_vorigin()

Splits the data by phenotype name and relative origin. Specific to PhenotypeArray objects.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

Dict[Tuple[str, int], pd.DataFrame] – A dictionary of dataframes, where the keys are tuples of phenotype name and relative origin and the values are dataframes containing the data for each combination of phenotype name and relative origin.

split_by_vorigin()

Splits the data by relative origin. Specific to PhenotypeArray objects.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

Dict[int, pd.DataFrame] – A dictionary of dataframes, where the keys are the unique relative origins and the values are dataframes containing the data for each relative origin.

standardize()

to_diploid()

Convert the object to a diploid representation by adding the two haplotypes for each variant. Specific to HaplotypeArray objects.

Raises:

TypeError – If the _col_dim attribute is not equal to ‘variant’.
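
A minimal sketch of the haploid-to-diploid conversion, using plain lists instead of a HaplotypeArray. It assumes that the two haplotype copies of each variant occupy adjacent columns; that layout is an assumption of this example, not something the documentation states.

```python
def to_diploid_sketch(haplotypes):
    """Sum the two haploid copies of each variant (assumed adjacent columns)."""
    return [
        [row[j] + row[j + 1] for j in range(0, len(row), 2)]
        for row in haplotypes
    ]

# two samples, two variants, two haploid copies per variant
haps = [
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
dip = to_diploid_sketch(haps)  # diploid genotype counts in {0, 1, 2}
```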

to_diploid_standardized(af=None, scale=False)

Standardize the HaplotypeArray object and convert it to a diploid representation. Specific to HaplotypeArray objects.

Parameters:
  • af (NDArray, optional) – An array containing the allele frequencies of each variant. If not provided, empirical allele frequencies will be used.

  • scale (bool, optional) – Whether or not to scale the standardized array by the square root of the number of variants.

Returns:

ndarray – A standardized diploid array where each variant is represented as the sum of two haplotypes.

Raises:

TypeError – If the _col_dim attribute is not equal to ‘variant’.
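
The standardization can be sketched as follows, assuming the usual Hardy-Weinberg form (g - 2p) / sqrt(2p(1 - p)) for a diploid genotype g with allele frequency p, and assuming that scale=True divides by the square root of the number of variants. Both choices are assumptions of this example; the library's exact convention is not given here.

```python
from math import sqrt

def standardize_diploid(genotypes, af, scale=False):
    """Center and scale diploid genotypes under Hardy-Weinberg proportions."""
    m = len(af)
    denom = sqrt(m) if scale else 1.0
    # frequencies must lie strictly between 0 and 1 to avoid dividing by zero
    return [
        [(g - 2 * p) / sqrt(2 * p * (1 - p)) / denom for g, p in zip(row, af)]
        for row in genotypes
    ]

geno = [[0, 1], [2, 1]]  # two samples, two variants
af = [0.5, 0.5]
z = standardize_diploid(geno, af)
z_scaled = standardize_diploid(geno, af, scale=True)
```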

use_empirical_afs()

Sets allele frequencies to the empirical frequencies. Specific to HaplotypeArray objects.

Raises:

TypeError – If _col_dim is not ‘variant’.

xftsim.utils module

class xftsim.utils.ConstantCount(count)

Bases: VariableCount

Class representing a constant count of individuals in a population.

draw

a function that generates an array of counts

Type:

Callable

expectation

expected count

Type:

float

nonzero_fraction

the fraction of the population that is nonzero

Type:

float

Parameters:

count (int) – The constant count of individuals in the population.

class xftsim.utils.MixtureCount(componentCounts, mixture_probabilities)

Bases: VariableCount

Class representing a mixture of VariableCounts of individuals in a population.

draw

a function that generates an array of counts

Type:

Callable

expectation

expected count

Type:

float

nonzero_fraction

the fraction of the population that is nonzero

Type:

float

Parameters:
  • componentCounts (Iterable) – An iterable of VariableCount instances, representing the components of the mixture.

  • mixture_probabilities (NDArray[Shape[“*”], Float64]) – An array of probabilities associated with each component in the mixture.
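
To illustrate how a mixture of counts behaves, the sketch below picks a component for each individual according to the mixture probabilities and then draws from it. The mixture expectation and nonzero fraction are the probability-weighted averages of the component values. All names are hypothetical and the code does not use xftsim.

```python
import random

def mixture_expectation(values, probs):
    """Probability-weighted average of per-component values."""
    return sum(p * v for p, v in zip(probs, values))

def draw_mixture(draw_fns, probs, n, rng):
    """Pick one component per individual, then draw a count from it."""
    picks = rng.choices(range(len(draw_fns)), weights=probs, k=n)
    return [draw_fns[i]() for i in picks]

rng = random.Random(0)
draw_fns = [lambda: 0, lambda: 2]   # two constant-count components
probs = [0.25, 0.75]
counts = draw_mixture(draw_fns, probs, 1000, rng)
expected = mixture_expectation([0, 2], probs)      # component expectations
nonzero = mixture_expectation([0.0, 1.0], probs)   # component nonzero fractions
```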

class xftsim.utils.NegativeBinomialCount(r, p)

Bases: VariableCount

Class representing a negative binomial-distributed count of individuals in a population.

draw

a function that generates an array of counts

Type:

Callable

expectation

expected count

Type:

float

nonzero_fraction

the fraction of the population that is nonzero

Type:

float

Parameters:
  • r (float) – The number of successes in the negative binomial distribution.

  • p (float) – The probability of success in the negative binomial distribution.

class xftsim.utils.PoissonCount(rate)

Bases: VariableCount

Class representing a Poisson-distributed count of individuals in a population.

draw

a function that generates an array of counts

Type:

Callable

expectation

expected count

Type:

float

nonzero_fraction

the fraction of the population that is nonzero

Type:

float

Parameters:

rate (float) – The Poisson rate parameter.
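
For a Poisson count with a given rate, the expectation equals the rate and the probability of a nonzero count is 1 - exp(-rate). The helper names below are hypothetical and only restate these standard identities.

```python
from math import exp, log

def poisson_expectation(rate):
    return rate

def poisson_nonzero_fraction(rate):
    # P(X > 0) = 1 - P(X = 0) = 1 - exp(-rate)
    return 1.0 - exp(-rate)

rate = log(2)  # chosen so that half of the draws are expected to be zero
frac = poisson_nonzero_fraction(rate)
```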

class xftsim.utils.VariableCount(draw, expectation=None, nonzero_fraction=None)

Bases: object

A class to represent random count variables

draw

a function that generates an array of counts

Type:

Callable

expectation

expected count

Type:

float

nonzero_fraction

the fraction of the population that is nonzero

Type:

float

property expectation

Getter function for expectation attribute.

Returns:

float – Expected count.

property nonzero_fraction

Getter function for nonzero_fraction attribute.

Returns:

float – The fraction of the population that is nonzero.

class xftsim.utils.ZeroTruncatedPoissonCount(rate)

Bases: VariableCount

Class representing a zero-truncated Poisson-distributed count of individuals in a population.

draw

a function that generates an array of counts

Type:

Callable

expectation

expected count

Type:

float

nonzero_fraction

the fraction of the population that is nonzero

Type:

float

Parameters:

rate (float) – The Poisson rate parameter prior to zero-truncation.
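
A zero-truncated Poisson count can be sketched by rejection: draw Poisson values until one is nonzero. Its expectation is rate / (1 - exp(-rate)) and its nonzero fraction is 1 by construction. The sampler below uses Knuth's multiplication method and hypothetical names; it is not the library's implementation.

```python
import random
from math import exp, log

def draw_poisson(rate, rng):
    """Knuth's method: count uniforms until their product drops to exp(-rate)."""
    limit, k, prod = exp(-rate), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def draw_ztp(rate, rng):
    """Redraw until the Poisson value is nonzero."""
    while True:
        k = draw_poisson(rate, rng)
        if k > 0:
            return k

def ztp_expectation(rate):
    return rate / (1.0 - exp(-rate))

rng = random.Random(1)
draws = [draw_ztp(1.5, rng) for _ in range(200)]
```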

xftsim.utils.cartesian_product(*args)

Returns a list of columns comprising a cartesian product of input arrays. Emulates the R function expand.grid().

Parameters:

*args (NDArray[Any, Any]) – The input arrays.

Returns:

List[NDArray[Any, Any]] – The list of columns.
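
The column-wise output can be illustrated with itertools. This sketch follows the ordering of R's expand.grid, where the first input varies fastest; whether xftsim uses exactly this row order is an assumption. The function name is hypothetical.

```python
from itertools import product

def expand_grid_columns(*args):
    """Return one list per input, together enumerating all combinations."""
    # itertools.product varies the last input fastest, so reverse twice
    rows = [combo[::-1] for combo in product(*reversed(args))]
    return [list(col) for col in zip(*rows)]

cols = expand_grid_columns([1, 2], ["a", "b", "c"])
```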

xftsim.utils.cov2cor(A)

Converts covariance matrix to correlation matrix.

Parameters:

A: Union[np.ndarray, pd.DataFrame, xr.DataArray]

Input covariance matrix.

Returns:

Union[np.ndarray, pd.DataFrame, xr.DataArray]

Correlation matrix.

Raises:

None
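
The conversion divides each covariance by the product of the two corresponding standard deviations, cor[i][j] = cov[i][j] / (sd[i] * sd[j]). A minimal sketch with nested lists and a hypothetical function name:

```python
from math import sqrt

def cov_to_cor(cov):
    """Rescale a covariance matrix so that its diagonal is 1."""
    sd = [sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]

cov = [[4.0, 2.0],
       [2.0, 9.0]]
cor = cov_to_cor(cov)  # off-diagonal entries equal 2 / (2 * 3)
```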

xftsim.utils.ensure2D(x)

Ensures the input array is 2D, by adding a new dimension if needed.

Parameters:

x (arraylike) – The input array, by default None.

Returns:

NDArray[Any, Any] – The 2D input array.

Raises:

ValueError – If the input array is not valid.

xftsim.utils.exhaustive_enumerate(a, n_per_a)

Repeats the ith element of array a n_per_a[i] times, ordered so that every element appears min(j, n_per_a[i]) times before any element appears j+1 times.

Parameters:

a: array-like

1-D array of any shape and data type.

n_per_a: array-like

1-D array of int, representing the number of times each element in a needs to be repeated.

Returns:

out: array-like

1-D array of shape (n,) with the same data type as a, where n is the sum of n_per_a.

Raises:

Warning – If the output array is empty.

Examples:

>>> exhaustive_enumerate(np.array((1, 2, 3, 4)), np.array((3, 2, 1, 0)))
array([1, 2, 3, 1, 2, 1])
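
The ordering rule can be sketched round by round: in round j, every element that still has repetitions left is emitted once. This pure-Python version (hypothetical name, lists instead of arrays) reproduces the example above.

```python
def exhaustive_enumerate_sketch(a, n_per_a):
    """Emit elements in rounds; element i takes part in n_per_a[i] rounds."""
    out = []
    for rnd in range(max(n_per_a, default=0)):
        out.extend(x for x, k in zip(a, n_per_a) if k > rnd)
    return out

result = exhaustive_enumerate_sketch([1, 2, 3, 4], [3, 2, 1, 0])
```
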

xftsim.utils.exhaustive_permutation(a, n_sample)

Returns a random permutation of the input array, such that each element is selected exactly once before any element is selected twice, and so forth

Parameters:

a: NDArray[Shape[“*”], Any]

A numpy array to be permuted.

n_sample: int

An integer specifying the size of the permutation to be returned.

Returns:

np.ndarray

A 1D numpy array containing the permuted elements.
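
One way to obtain this property is to concatenate independent shuffles of the whole input and truncate to n_sample. The sketch below takes that approach with the standard library; the name and the explicit rng argument are assumptions of the example.

```python
import random

def exhaustive_permutation_sketch(a, n_sample, rng):
    """Concatenate full shuffles of a, then keep the first n_sample items."""
    if n_sample > 0 and len(a) == 0:
        raise ValueError("cannot sample from an empty input")
    out = []
    while len(out) < n_sample:
        block = list(a)
        rng.shuffle(block)  # each block contains every element exactly once
        out.extend(block)
    return out[:n_sample]

rng = random.Random(0)
out = exhaustive_permutation_sketch(["a", "b", "c"], 7, rng)
```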

xftsim.utils.ids_from_generation(generation, indices=None)

Generates and returns a new array of IDs using the given generation number and the given indices. The new array contains the given indices with the generation number prefixed to each index.

Parameters:
  • generation (int) – The generation number to use in the prefix of the IDs.

  • indices (NDArray[Shape[“*”], Int64], optional) – A numpy array of indices.

Returns:

ndarray – A new numpy array of IDs with the given generation number prefixed to each index.

xftsim.utils.ids_from_generation_range(generation, n=None)

Returns an array of string IDs of length n, created by concatenating the input generation with an increasing sequence of integers from 0 to n-1.

Parameters:

generation: int

An integer representing the generation of the IDs to be created.

n: NDArray[Shape[“*”], Int64], optional (default=None)

An integer specifying the number of IDs to be generated. If None, a range of IDs starting from 0 is created.

Returns:

np.ndarray

A 1D numpy array containing the IDs in string format.

xftsim.utils.ids_from_n_generation(n, generation)

Creates an array of individual IDs based on the specified number of elements and generation.

Parameters:
  • n (int) – The number of individuals.

  • generation (int) – The generation number.

Returns:

numpy.ndarray – An array of individual IDs.

xftsim.utils.match(a, b)

For each element of a, finds the index of the matching element in b.

Parameters:

a: List[Hashable]

List of elements to find matches for.

b: List[Hashable]

List of elements to find matches in.

Returns:

List[int]

A list of indices in b that match the elements in a.
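
A dictionary lookup gives a compact sketch of this behaviour. The version below returns the first position in b for each element of a and raises KeyError for elements that are missing from b; how xftsim treats duplicates and missing elements is not stated here, so both choices are assumptions.

```python
def match_sketch(a, b):
    """Return, for each element of a, the first index at which it occurs in b."""
    first_pos = {}
    for i, x in enumerate(b):
        first_pos.setdefault(x, i)  # keep the earliest position only
    return [first_pos[x] for x in a]

idx = match_sketch(["c", "a"], ["a", "b", "c"])
```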

xftsim.utils.matching_indices_conditional(a, b, condition)

Returns the indices of matches between a and b arrays, given a boolean condition.

Parameters:
  • a (List[Hashable]) – The first input array.

  • b (List[Hashable]) – The second input array.

  • condition (NDArray[Shape[“*”], Any]) – The boolean condition array to apply.

Returns:

NDArray[Shape[“*”], Int64] – The matching indices array.

xftsim.utils.merge_duplicate_pairs(a, b, n, sort=False)

Merge duplicate pairs of values in a and b based on their corresponding values in n.

Parameters:

a: NDArray[Shape[“*”], Any]

First array to merge.

b: NDArray[Shape[“*”], Any]

Second array to merge.

n: NDArray[Shape[“*”], Any]

Array of corresponding values that determine how the duplicates are merged.

sort: bool, optional

Whether to sort the values in a and b before merging the duplicates. Default is False.

Returns:

Tuple[NDArray[Shape[“*”], Any], NDArray[Shape[“*”], Any], NDArray[Shape[“*”], Any]]

The merged arrays, with duplicates removed based on the corresponding values in n.

xftsim.utils.merge_duplicates(it)

Merge duplicates in the input array by checking if any pasted elements are the same.

Parameters:

it (Iterable) – A numpy array with elements to be checked for duplication.

Returns:

list – Returns the input list with duplicates merged if present.

xftsim.utils.paste(it, sep='_')

Concatenates elements in a list-like object with a specified separator.

Parameters:
  • it (list-like) – The list-like object containing elements to concatenate.

  • sep (str, optional) – The separator used to concatenate the elements. Defaults to “_”.

Returns:

numpy.ndarray – An array of concatenated string elements.

xftsim.utils.print_tree(x, depth=0)

Prints a dict of dicts (of dicts, and so on) as an easy-to-read tree, similar to the bash program ‘tree’. Modified from https://stackoverflow.com/questions/47131263/python-3-6-print-dictionary-data-in-readable-tree-structure

Parameters:

x (Any) – Dict of dicts
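
The recursion behind such a tree printer can be sketched as below. The branch marker and indentation are choices made for this example and may differ from what print_tree emits.

```python
def tree_lines(x, depth=0):
    """Return one indented line per key of a nested dict, depth first."""
    lines = []
    for key, value in x.items():
        lines.append("    " * depth + "|__ " + str(key))
        if isinstance(value, dict):
            lines.extend(tree_lines(value, depth + 1))
    return lines

tree = {"arch": {"beta": 1}, "utils": 2}
lines = tree_lines(tree)
print("\n".join(lines))
```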

xftsim.utils.profiled(call, level=1, message=None, sep='     | ')

A decorator that prints the duration of a function call when the specified logging level is met.

Parameters:
  • call (function) – The function being decorated.

  • level (int, optional) – The logging level at which the duration of the function call is printed. Defaults to 1.

  • message (str, optional) – A custom message to display in the log output. If not provided, the name of the decorated function will be used.

Returns:

Callable – The wrapped function.
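
A timing decorator of this kind can be sketched with functools.wraps and time.perf_counter. The sketch is written as a decorator factory and reads a module-level PRINT_LEVEL constant, a hypothetical stand-in for the package's configured print level; the documented signature, which takes call as its first argument, differs from this form.

```python
import functools
import time

PRINT_LEVEL = 1  # hypothetical stand-in for a configured verbosity level

def profiled_sketch(level=1, message=None):
    """Print the duration of each call when PRINT_LEVEL >= level."""
    def decorator(call):
        @functools.wraps(call)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = call(*args, **kwargs)
            elapsed = time.perf_counter() - start
            if PRINT_LEVEL >= level:
                print(f"{message or call.__name__}: {elapsed:.4f}s")
            return result
        return wrapper
    return decorator

@profiled_sketch(level=1, message="adding")
def add(x, y):
    return x + y

total = add(2, 3)
```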

xftsim.utils.sort_and_paste(x)

Sorts the input array in ascending order and concatenates the first element with an underscore separator followed by the second element.

Parameters:

x: array-like

1-D array of any shape and data type.

Returns:

out: array-like

1-D array of strings with shape (n,) and the same length as x, where each element is formed by concatenating two sorted string representations of each element in x, separated by an underscore.

Examples:

>>> sort_and_paste(np.array((3, 1, 2)))
array(['1_2', '2_3', '1_3'], dtype='<U3')

xftsim.utils.standardize_array(a)

Standardizes columns of a 2D array.

Parameters:

a: ArrayLike

Input 2D array.

Returns:

np.ndarray

Standardized 2D array.

Raises:

None
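
Column standardization subtracts each column's mean and divides by its standard deviation. The sketch below uses the population standard deviation (divisor n), which matches NumPy's default for std; whether standardize_array uses the same divisor is an assumption.

```python
from statistics import fmean, pstdev

def standardize_columns(a):
    """Z-score each column of a row-major 2D list."""
    cols = list(zip(*a))
    mu = [fmean(c) for c in cols]
    sd = [pstdev(c) for c in cols]  # constant columns would give sd == 0
    return [[(x - m) / s for x, m, s in zip(row, mu, sd)] for row in a]

a = [[1.0, 10.0],
     [3.0, 30.0]]
z = standardize_columns(a)
```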

xftsim.utils.standardize_array_hw(haplotypes, af)

Wraps _standardize_array_hw to prevent segfaults.

Parameters:

haplotypes: NDArray[Shape[“*, *”], Int8]

Input array of int8 haploid genotypes.

af: NDArray[Shape[“*”], Float]

Input array of allele frequencies.

Returns:

np.ndarray

Standardized genotypes.

Raises:

None

xftsim.utils.to_proportions(*args)

Converts input values to proportional values.

Parameters:

*args: Union[float, int]

Input values.

Returns:

np.ndarray

Proportional values.

Raises:

None
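
Converting values to proportions amounts to dividing each value by the total. A minimal sketch, returning a list rather than an ndarray and using a hypothetical name:

```python
def to_proportions_sketch(*args):
    """Divide each input by the sum of all inputs."""
    total = sum(args)
    return [x / total for x in args]

props = to_proportions_sketch(1, 1, 2)
```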

xftsim.utils.to_simplex(*args)

Converts input values to a simplex vector.

Parameters:

*args: Union[float, int]

Input values.

Returns:

np.ndarray

Simplex vector.

Raises:

ValueError

If all input values are less than or equal to zero.

xftsim.utils.unique_identifier(frame, index_variables, prefix=None)

Returns a unique identifier string generated from index variables of a dataframe.

Parameters:

frame: pd.DataFrame

Input dataframe.

index_variables: List[str]

List of column names to be used as index.

prefix: str

Optional prefix

Returns:

str

Unique identifier string of the form [<prefix>..]<index_var1>.<index_var2>…

Raises:

None

Module contents

class xftsim.Config

Bases: object

A class to store configuration settings. Instantiated as xftsim.config when the package is loaded.

nthreads

Number of threads to use for parallel execution.

Type:

int

print_level

Verbosity level for print statements.

Type:

int

print_durations_threshold

Threshold for printing durations.

Type:

float

get_pdurations()

Get the current print durations threshold.

Returns:

float – The print durations threshold.

get_plevel()

Get the current print level.

Returns:

int – The print level.