struct

Below is an auto-generated summary of the xftsim.struct submodule API.

class xftsim.struct.GeneticMap(chrom, pos_bp, pos_cM)

Bases: object

Map between physical and genetic distances.

Parameters:
  • chrom (Iterable) – Chromosomes on which the variants are located

  • pos_bp (Iterable) – Physical positions of variants

  • pos_cM (Iterable) – Map distances in cM

frame

Pandas DataFrame with the above columns

Type:

pd.DataFrame

chroms

Unique chromosomes present in map

Type:

np.ndarray

classmethod from_pyrho_maps(paths, sep='\t', **kwargs)

Construct genetic map objects from the maps provided at https://github.com/popgenmethods/pyrho. Please cite their work if you use their maps.

Parameters:
  • paths (Iterable) – Paths for each chromosome

  • sep (str, optional) – Passed to pd.read_csv()

  • **kwargs – Additional arguments to pd.read_csv()

Returns:

GeneticMap

interpolate_cM_chrom(pos_bp, chrom, **kwargs)

Interpolate cM values in a specified chromosome based on genetic map information.

Parameters:
  • pos_bp (Iterable) – Physical positions for which to interpolate cM values

  • chrom (str) – Chromosome on which to interpolate

  • **kwargs – Additional keyword arguments to be passed to scipy.interpolate.interp1d.
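The idea behind the interpolation can be sketched in plain Python. This is an illustration only: the helper name interpolate_cm and the toy map values are hypothetical, and the actual method delegates to scipy.interpolate.interp1d, whose handling of positions outside the map range may differ from the clamping assumed here.

```python
from bisect import bisect_right

def interpolate_cm(query_bp, map_bp, map_cm):
    """Linearly interpolate cM values at query_bp from a map sorted by bp.

    Hypothetical helper. Positions outside the map are clamped to the
    first or last map value, which is an assumption of this sketch.
    """
    out = []
    for x in query_bp:
        if x <= map_bp[0]:
            out.append(map_cm[0])
        elif x >= map_bp[-1]:
            out.append(map_cm[-1])
        else:
            j = bisect_right(map_bp, x)  # map_bp[j - 1] <= x < map_bp[j]
            x0, x1 = map_bp[j - 1], map_bp[j]
            y0, y1 = map_cm[j - 1], map_cm[j]
            out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out

# Toy single-chromosome map (made-up values).
map_bp = [1000, 2000, 4000]
map_cm = [0.0, 1.0, 2.0]
positions_cm = interpolate_cm([500, 1500, 3000, 5000], map_bp, map_cm)
```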

class xftsim.struct.HaplotypeArray(haplotypes=None, variant_indexer=None, sample_indexer=None, generation=0, n=None, m=None, dask=False, **kwargs)

Bases: object

Represents a 2D array of binary haplotypes with accompanying row and column indices. Dummy class used for generation of DataArrays and static methods.

class xftsim.struct.PhenotypeArray(components=None, component_indexer=None, sample_indexer=None, generation=0, n=None, k_total=None)

Bases: object

An array that stores phenotypes for a set of individuals. Dummy class used for generation of DataArrays and static methods.

Parameters:
  • components (ndarray, optional) – n x k_total array of phenotype component values.

  • component_indexer (xft.index.ComponentIndex, optional) – Indexer for components.

  • sample_indexer (xft.index.SampleIndex, optional) – Indexer for samples.

  • generation (int, optional) – The generation this PhenotypeArray belongs to.

  • n (int, optional) – The number of samples.

  • k_total (int, optional) – The total number of components.

Returns:

xr.DataArray – The initialized PhenotypeArray.

Raises:

AssertionError – If components is provided, then n and k_total must not be provided. If component_indexer is provided, then k_total must not be provided. If sample_indexer is provided, then n must not be provided. If components is provided and sample_indexer is provided, then the shape of components must match the size of the sample dimension of sample_indexer. If components is provided and component_indexer is provided, then the shape of components must match the size of the component dimension of component_indexer. If component_indexer is provided, then the size of the component dimension of component_indexer must match k_total.

static from_product(phenotype_name, component_name, vorigin_relative, components=None, sample_indexer=None, generation=None, haplotypes=None, n=None)

Create a PhenotypeArray from a product of names.

Parameters:
  • phenotype_name (iterable) – The names of the phenotypes.

  • component_name (iterable) – The names of the components.

  • vorigin_relative (iterable) – The relative origins of each component.

  • components (xr.DataArray, optional) – The array to use as the components.

  • sample_indexer (xft.index.SampleIndex, optional) – The sample indexer to use.

  • generation (int, optional) – The generation of the PhenotypeArray.

  • haplotypes (xr.DataArray, optional) – The haplotypes to use.

  • n (int, optional) – The number of samples to use.

Returns:

xr.DataArray – The new PhenotypeArray.

Raises:

AssertionError – If exactly one of generation and sample_indexer is provided, or exactly one of haplotypes and sample_indexer/generation or n/generation is provided.
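As an illustration of what a "product of names" means, the sketch below enumerates every (phenotype_name, component_name, vorigin_relative) combination with itertools.product. The example names and the resulting column order are assumptions made for illustration; the ordering used by the library's own index is not confirmed here.

```python
from itertools import product

# Example inputs (hypothetical names).
phenotype_name = ["height", "bmi"]
component_name = ["genetic", "noise"]
vorigin_relative = [-1]  # -1 corresponds to the proband

# One column label per combination of the three inputs.
columns = list(product(phenotype_name, component_name, vorigin_relative))
k_total = len(columns)
```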

class xftsim.struct.XftAccessor(xarray_obj)

Bases: object

Accessor for Xarray DataArrays with specialized functionality for HaplotypeArray and PhenotypeArray objects.

Parameters:

xarray_obj (xarray.DataArray) – The DataArray to be accessed.

_obj

The DataArray to be accessed.

Type:

xarray.DataArray

_array_type

The type of the DataArray, either ‘HaplotypeArray’ or ‘componentArray’.

Type:

str

_non_annotation_vars

The non-annotation variables in the DataArray.

Type:

list of str

_variant_vars

The variant annotation variables in the DataArray.

Type:

list of str

_sample_vars

The sample annotation variables in the DataArray.

Type:

list of str

_component_vars

The component annotation variables in the DataArray.

Type:

list of str

_row_dim

The label of the row dimension.

Type:

str

_col_dim

The label of the column dimension.

Type:

str

shape

The shape of the DataArray.

Type:

tuple

n

The number of rows in the DataArray.

Type:

int

data

The data in the DataArray.

Type:

numpy.ndarray

row_vars

List of coordinate variable names for the row dimension.

Type:

list

column_vars

List of coordinate variable names for the column dimension.

Type:

list

sample_mindex

MultiIndex object for the ‘sample’ dimension, containing iid, fid, and sex columns.

Type:

pd.MultiIndex

component_mindex

MultiIndex object for the ‘component’ dimension, containing phenotype_name, component_name, and vorigin_relative columns.

Type:

pd.MultiIndex

Raises:

NotImplementedError – If the DataArray dimensions are not (‘sample’, ‘variant’) or (‘sample’, ‘component’).

property af_empirical

Empirical allele frequencies. Specific to HaplotypeArray objects.

Returns:

numpy.ndarray – Empirical allele frequencies.

Raises:

TypeError – If _col_dim is not ‘variant’.
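A minimal sketch of an empirical allele frequency computation, assuming the frequency is the column-wise mean of the binary haplotype entries. The toy matrix is made up, and whether the property reports one value per haploid column or per diploid variant is not confirmed here.

```python
# Rows are samples; columns are haploid variant copies coded 0/1.
haplotypes = [
    [0, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
]

n = len(haplotypes)
n_cols = len(haplotypes[0])
# The mean of each binary column is the frequency of the allele coded 1.
af = [sum(row[j] for row in haplotypes) / n for j in range(n_cols)]
```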

property all_components

Returns an array of all the unique component names. Specific to PhenotypeArray objects.

Returns:

numpy.ndarray – An array of all the unique component names.

Raises:

TypeError – If the column dimension is not ‘component’.

property all_phenotypes

Returns an array of all the unique phenotype names. Specific to PhenotypeArray objects.

Returns:

numpy.ndarray – An array of all the unique phenotype names.

Raises:

TypeError – If the column dimension is not ‘component’.

property all_relatives

Returns an array of all the unique origin relative values. Specific to PhenotypeArray objects.

Returns:

numpy.ndarray – An array of all the unique origin relative values.

Raises:

TypeError – If the column dimension is not ‘component’.

as_pd(prettify=True)

Returns the data as a Pandas DataFrame. Specific to PhenotypeArray objects.

Parameters:

prettify (bool, optional) – If True, the multi-index columns will be prettified by replacing -1, 0, 1 with ‘proband’, ‘mother’, ‘father’, respectively.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

pd.DataFrame – A Pandas DataFrame representing the data.
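The relabeling that prettify describes can be sketched with a plain dictionary lookup. The column tuples below are hypothetical examples, and the real method operates on a Pandas MultiIndex rather than a list.

```python
# Mapping stated in the description of `prettify`.
RELATIVE_LABELS = {-1: "proband", 0: "mother", 1: "father"}

# Hypothetical (phenotype_name, component_name, vorigin_relative) labels.
columns = [
    ("height", "phenotype", -1),
    ("height", "phenotype", 0),
    ("height", "phenotype", 1),
]

# Replace the integer code with its label; unknown codes are left unchanged.
pretty = [(ph, comp, RELATIVE_LABELS.get(rel, rel)) for ph, comp, rel in columns]
```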

property column_vars

Get the column coordinate variables for the DataArray object.

Returns:

XftIndex – The column coordinate variables of the current column dimension.

property component_mindex

Get a Pandas MultiIndex object for the component dimension.

Returns:

pandas.MultiIndex – MultiIndex object with phenotype_name, component_name, and vorigin_relative as index levels.

Raises:

NotImplementedError – If the column dimension is not ‘component’.

property data

The data in the DataArray.

Returns:

numpy.ndarray – The data in the DataArray.

property depth

Returns the generational depth from binary relative encoding. Specific to PhenotypeArray objects.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

Union[float, np.nan] – The generational depth from binary relative encoding, or NaN if the relative origin is empty.

property diploid_chrom

Diploid chromosome numbers. Specific to HaplotypeArray objects.

Returns:

numpy.ndarray – Diploid chromosome numbers.

Raises:

TypeError – If _col_dim is not ‘variant’.

property diploid_vid

Diploid variant ID. Specific to HaplotypeArray objects.

Returns:

numpy.ndarray – Diploid variant IDs.

Raises:

TypeError – If _col_dim is not ‘variant’.

property generation

Generation of the data. Specific to HaplotypeArray objects.

Returns:

int – Generation attribute.

Raises:

TypeError – If _col_dim is not ‘variant’.

get_annotation_dict()

Return a dictionary of all annotation variables associated with the variants in the object. Specific to HaplotypeArray objects.

Returns:

dict – A dictionary where the keys are the annotation variable names and the values are the corresponding arrays.

Raises:

TypeError – If the _col_dim attribute is not equal to ‘variant’.

get_column_indexer()

Get the column indexer object for the PhenotypeArray object.

Returns:

xft.index.Indexer – The indexer object based on the current column dimension.

Raises:

TypeError – If the current column dimension is not recognized.

get_comp_type(ctype='intermediate')

Returns the index array of components with comp_type==ctype. Specific to PhenotypeArray objects.

Returns:

XftIndex – The index of components that match the given keyword.

Raises:

TypeError – If the column dimension is not ‘component’.

get_component_indexer()

Get the component indexer of a PhenotypeArray.

Returns:

xft.index.ComponentIndex – A ComponentIndex object.

get_intermediate_components()

Returns the index array of components with comp_type==‘intermediate’. Specific to PhenotypeArray objects.

Returns:

XftIndex – The index of components that match the given keyword.

Raises:

TypeError – If the column dimension is not ‘component’.

get_k_rel(rel)

Returns the number of components with the given relative origin. Specific to PhenotypeArray objects.

Parameters:

rel (int) – The relative origin of the components.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

int – The number of components with the given relative origin.

get_non_annotation_dict()

Return a dictionary of all non-annotation variables associated with the variants in the object. Specific to HaplotypeArray objects.

Returns:

dict – A dictionary where the keys are the non-annotation variable names and the values are the corresponding arrays.

Raises:

TypeError – If the _col_dim attribute is not equal to ‘variant’.

get_outcome_components()

Returns the index array of components with comp_type==‘outcome’. Specific to PhenotypeArray objects.

Returns:

XftIndex – The index of components that match the given keyword.

Raises:

TypeError – If the column dimension is not ‘component’.

get_row_indexer()

Get the row indexer.

Returns:

xft.index.SampleIndex – A SampleIndex object.

Raises:

TypeError – If the row dimension is not ‘sample’.

get_sample_indexer()

Returns an instance of xft.index.SampleIndex representing the sample indexer constructed from the input data.

Raises:

NotImplementedError – If _row_dim is not ‘sample’.

Returns:

SampleIndex – An instance of xft.index.SampleIndex constructed from the sample data in the input object.

get_variant_indexer()

Get the variant indexer of a HaplotypeArray.

Returns:

xft.index.HaploidVariantIndex – A HaploidVariantIndex object.

grep_component_index(keyword='phenotype')

Returns the index array of components whose names contain the given keyword. Specific to PhenotypeArray objects.

Parameters:

keyword (str, optional) – The keyword to search for in component names, by default ‘phenotype’.

Returns:

XftIndex – The index of components that match the given keyword.

Raises:

TypeError – If the column dimension is not ‘component’.

interpolate_cM(gmap, **kwargs)

Interpolate cM values based on genetic map information. Specific to HaplotypeArray objects.

Parameters:
  • gmap (GeneticMap) – Genetic map data

  • **kwargs – Additional keyword arguments to be passed to scipy.interpolate.interp1d.

Raises:
  • TypeError – If the column dimension is not ‘variant’.

  • ValueError – If not all chromosomes required are present in the genetic map
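The ValueError condition amounts to a set comparison between the chromosomes carried by the variants and those present in the map. The sketch below shows that check in isolation; the function name check_chromosomes and the error message are hypothetical, not the library's.

```python
def check_chromosomes(required, available):
    """Raise ValueError if any required chromosome is absent from the map."""
    missing = sorted(set(required) - set(available))
    if missing:
        raise ValueError(f"chromosomes missing from genetic map: {missing}")

# All required chromosomes are present: no exception.
check_chromosomes(["1", "2"], ["1", "2", "3"])

# Chromosome "X" is absent from the map: ValueError is raised.
try:
    check_chromosomes(["1", "X"], ["1", "2"])
    raised = False
except ValueError:
    raised = True
```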

property k_components

Returns the number of unique component names. Specific to PhenotypeArray objects.

Returns:

int – The number of unique component names.

Raises:

TypeError – If the column dimension is not ‘component’.

property k_current

Returns the number of all current-gen specific components. Specific to PhenotypeArray objects.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

int – The number of all current-gen specific components.

property k_phenotypes

Returns the number of unique phenotype components. Specific to PhenotypeArray objects.

Returns:

int – The number of unique phenotype components.

Raises:

TypeError – If the column dimension is not ‘component’.

property k_relative

Returns the number of unique origin relative values. Specific to PhenotypeArray objects.

Returns:

int – The number of unique origin relative values.

Raises:

TypeError – If the column dimension is not ‘component’.

property k_total

Returns the total number of components. Specific to PhenotypeArray objects.

Returns:

int – The total number of components.

Raises:

TypeError – If the column dimension is not ‘component’.

property m

Return the number of distinct diploid variants. Specific to HaplotypeArray objects.

Returns:

int – The number of distinct diploid variants in the array.

Raises:

TypeError – If the _col_dim attribute is not equal to ‘variant’.

property maf_empirical

Empirical minor allele frequencies. Specific to HaplotypeArray objects.

Returns:

numpy.ndarray – Empirical minor allele frequencies.

Raises:

TypeError – If _col_dim is not ‘variant’.
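A minor allele frequency is conventionally the smaller of p and 1 - p for an allele frequency p. The sketch below applies that definition to made-up frequencies; it is assumed, not confirmed, that the property follows the same convention.

```python
# Made-up allele frequencies for four variants.
af = [0.5, 0.875, 0.25, 1.0]

# The minor allele is whichever of the two alleles is rarer.
maf = [min(p, 1.0 - p) for p in af]
```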

property n

The number of rows in the DataArray.

Returns:

int – The number of rows in the DataArray.

reindex_components(value)

Reindex the components.

Parameters:

value (xft.index.ComponentIndex) – A ComponentIndex object.

Returns:

PhenotypeArray – A new PhenotypeArray object.

property row_vars

Get the row coordinate variables for the PhenotypeArray object.

Returns:

XftIndex – The row coordinate variables of the row dimension.

property sample_mindex

Get the sample multi-index for the PhenotypeArray object.

Returns:

pd.MultiIndex – A multi-index object containing sample IDs, family IDs, and sex information.

Raises:

NotImplementedError – If the current row dimension is not ‘sample’.

set_column_indexer(value)

Set the column indexer object for the PhenotypeArray object.

Parameters:

value (xft.index.Indexer) – The new indexer object for the PhenotypeArray object.

Returns:

None

Raises:

TypeError – If the current column dimension is not recognized.

set_row_indexer()

set_sample_indexer(value)

set_variant_indexer(value)

property shape

The shape of the DataArray.

Returns:

tuple – The shape of the DataArray.

split_by_component()

Splits the data by component name. Specific to PhenotypeArray objects.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

Dict[str, pd.DataFrame] – A dictionary of dataframes, where the keys are the unique component names and the values are dataframes containing the data for each component.

split_by_phenotype()

Splits the data by phenotype name. Specific to PhenotypeArray objects.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

Dict[str, pd.DataFrame] – A dictionary of dataframes, where the keys are the unique phenotype names and the values are dataframes containing the data for each phenotype.

split_by_phenotype_vorigin()

Splits the data by phenotype name and relative origin. Specific to PhenotypeArray objects.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

Dict[Tuple[str, int], pd.DataFrame] – A dictionary of dataframes, where the keys are tuples of phenotype name and relative origin and the values are dataframes containing the data for each combination of phenotype name and relative origin.
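The grouping behind this split can be sketched by collecting column positions under (phenotype name, relative origin) keys. The column labels are hypothetical, and the sketch stores column indices where the real method returns DataFrames.

```python
from collections import defaultdict

# Hypothetical (phenotype_name, component_name, vorigin_relative) labels.
columns = [
    ("height", "genetic", -1),
    ("height", "noise", -1),
    ("height", "genetic", 0),
    ("bmi", "genetic", -1),
]

# Group column positions by (phenotype name, relative origin).
groups = defaultdict(list)
for idx, (phenotype, _component, origin) in enumerate(columns):
    groups[(phenotype, origin)].append(idx)
```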

split_by_vorigin()

Splits the data by relative origin. Specific to PhenotypeArray objects.

Raises:

TypeError – If the column dimension is not ‘component’.

Returns:

Dict[int, pd.DataFrame] – A dictionary of dataframes, where the keys are the unique relative origins and the values are dataframes containing the data for each relative origin.

standardize()

to_diploid()

Convert the object to a diploid representation by adding the two haplotypes for each variant. Specific to HaplotypeArray objects.

Raises:

TypeError – If the _col_dim attribute is not equal to ‘variant’.
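A sketch of the haplotype-to-diploid conversion on a toy n x 2m matrix. It assumes the two haplotype copies of each variant sit in adjacent columns; that layout is an assumption of this example and may not match the library's internal ordering.

```python
# Two samples, two variants, with two haplotype columns per variant.
haplotypes = [
    [0, 1, 1, 1],  # sample 0: variant A = (0, 1), variant B = (1, 1)
    [1, 1, 0, 0],  # sample 1: variant A = (1, 1), variant B = (0, 0)
]

# Sum each adjacent pair of columns to obtain genotype counts in {0, 1, 2}.
diploid = [
    [row[j] + row[j + 1] for j in range(0, len(row), 2)]
    for row in haplotypes
]
```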

to_diploid_standardized(af=None, scale=False)

Standardize the HaplotypeArray object and convert it to a diploid representation. Specific to HaplotypeArray objects.

Parameters:
  • af (NDArray, optional) – An array containing the allele frequencies of each variant. If not provided, empirical allele frequencies will be used.

  • scale (bool, optional) – Whether or not to scale the standardized array by the square root of the number of variants.

Returns:

ndarray – A standardized diploid array where each variant is represented as the sum of two haplotypes.

Raises:

TypeError – If the _col_dim attribute is not equal to ‘variant’.
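A common way to standardize a diploid genotype g with allele frequency p is (g - 2p) / sqrt(2p(1 - p)). The sketch below assumes that formula, and assumes that scale=True divides the result by the square root of the number of variants; neither is confirmed against the library source. The helper name standardize_diploid is hypothetical.

```python
from math import sqrt

def standardize_diploid(genotypes, af, scale=False):
    """Center and scale diploid genotype counts (hypothetical helper).

    Variants with af equal to 0 or 1 would divide by zero and are not
    handled in this sketch.
    """
    m = len(af)
    out = []
    for row in genotypes:
        z = [(g - 2.0 * p) / sqrt(2.0 * p * (1.0 - p)) for g, p in zip(row, af)]
        if scale:
            # Assumed meaning of `scale`: divide by sqrt(number of variants).
            z = [v / sqrt(m) for v in z]
        out.append(z)
    return out

# Three samples, two variants, genotype counts in {0, 1, 2}.
genotypes = [[0, 1], [1, 2], [2, 1]]
af = [0.5, 0.5]
z = standardize_diploid(genotypes, af)
z_scaled = standardize_diploid(genotypes, af, scale=True)
```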

use_empirical_afs()

Sets allele frequencies to the empirical frequencies. Specific to HaplotypeArray objects.

Raises:

TypeError – If _col_dim is not ‘variant’.