struct
Below is an auto-generated summary of the xftsim.struct submodule API.
- class xftsim.struct.GeneticMap(chrom, pos_bp, pos_cM)
Bases:
object
Map between physical and genetic distances.
- Parameters:
chrom (Iterable) – Chromosomes on which variants are located
pos_bp (Iterable) – Physical positions of variants
pos_cM (Iterable) – Map distances in cM
- frame
Pandas DataFrame with the above columns
- Type:
pd.DataFrame
- chroms
Unique chromosomes present in map
- Type:
np.ndarray
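As a rough illustration of the data this class describes, the sketch below builds a pandas DataFrame with the three documented columns and derives the unique chromosomes. It uses plain pandas and NumPy rather than xftsim itself, and all positions and cM values are made-up illustrative numbers.

```python
import numpy as np
import pandas as pd

# Illustrative values only; not taken from any real genetic map.
chrom = ["1", "1", "1", "2", "2"]
pos_bp = [1_000, 50_000, 120_000, 2_000, 90_000]
pos_cM = [0.0, 0.05, 0.15, 0.0, 0.11]

# A table with the same three columns the GeneticMap docs describe.
frame = pd.DataFrame({"chrom": chrom, "pos_bp": pos_bp, "pos_cM": pos_cM})

# Unique chromosomes, analogous to the documented `chroms` attribute.
chroms = np.unique(frame["chrom"].to_numpy())
```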
- classmethod from_pyrho_maps(paths, sep='\t', **kwargs)
Construct genetic map objects from the maps provided at https://github.com/popgenmethods/pyrho. Please cite their work if you use their maps.
- Parameters:
paths (Iterable) – Paths for each chromosome
sep (str, optional) – Passed to pd.read_csv()
**kwargs – Additional arguments to pd.read_csv()
- Returns:
- interpolate_cM_chrom(pos_bp, chrom, **kwargs)
Interpolate cM values in a specified chromosome based on genetic map information.
- Parameters:
pos_bp (Iterable) – Physical positions for which to interpolate cM values
chrom (str) – Chromosome on which to interpolate
**kwargs – Additional keyword arguments to be passed to scipy.interpolate.interp1d.
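To make the idea of interpolating map distances concrete, here is a minimal sketch of linear interpolation on a single chromosome. The documentation says keyword arguments are forwarded to scipy.interpolate.interp1d; this sketch swaps in numpy.interp for simplicity, and the helper name `interpolate_cM_linear` and the map values are hypothetical.

```python
import numpy as np

# Hypothetical map for a single chromosome (illustrative values).
map_bp = np.array([1_000, 50_000, 120_000])
map_cM = np.array([0.0, 0.05, 0.15])


def interpolate_cM_linear(pos_bp, map_bp, map_cM):
    """Linearly interpolate cM at the requested physical positions.

    map_bp must be sorted in increasing order, as numpy.interp requires.
    """
    return np.interp(np.asarray(pos_bp, dtype=float), map_bp, map_cM)


query = [1_000, 25_500, 85_000]
cM = interpolate_cM_linear(query, map_bp, map_cM)
```

Note that numpy.interp clamps positions outside the map range to the end values, which may differ from how interp1d behaves with its default settings.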
- class xftsim.struct.HaplotypeArray(haplotypes=None, variant_indexer=None, sample_indexer=None, generation=0, n=None, m=None, dask=False, **kwargs)
Bases:
object
Represents a 2D array of binary haplotypes with accompanying row and column indices. Dummy class used for generation of DataArrays and static methods.
- class xftsim.struct.PhenotypeArray(components=None, component_indexer=None, sample_indexer=None, generation=0, n=None, k_total=None)
Bases:
object
An array that stores phenotypes for a set of individuals. Dummy class used for generation of DataArrays and static methods.
- Parameters:
components (ndarray, optional) – n x 2m array of binary haplotypes.
component_indexer (xft.index.ComponentIndex, optional) – Indexer for components.
sample_indexer (xft.index.SampleIndex, optional) – Indexer for samples.
generation (int, optional) – The generation this PhenotypeArray belongs to.
n (int, optional) – The number of samples.
k_total (int, optional) – The total number of components.
- Returns:
xr.DataArray – The initialized PhenotypeArray.
- Raises:
AssertionError – If components is provided, then n and k_total must not be provided. If component_indexer is provided, then k_total must not be provided. If sample_indexer is provided, then n must not be provided. If components is provided and sample_indexer is provided, then the shape of components must match the size of the sample dimension of sample_indexer. If components is provided and component_indexer is provided, then the shape of components must match the size of the component dimension of component_indexer. If component_indexer is provided, then the size of the component dimension of component_indexer must match k_total.
- static from_product(phenotype_name, component_name, vorigin_relative, components=None, sample_indexer=None, generation=None, haplotypes=None, n=None)
Create a PhenotypeArray from a product of names.
- Parameters:
phenotype_name (iterable) – The names of the phenotypes.
component_name (iterable) – The names of the components.
vorigin_relative (iterable) – The relative origins of each component.
components (xr.DataArray, optional) – The array to use as the components.
sample_indexer (xft.index.SampleIndex, optional) – The sample indexer to use.
generation (int, optional) – The generation of the PhenotypeArray.
haplotypes (xr.DataArray, optional) – The haplotypes to use.
n (int, optional) – The number of samples to use.
- Returns:
xr.DataArray – The new PhenotypeArray.
- Raises:
AssertionError – If exactly one of generation and sample_indexer is provided, or exactly one of haplotypes and sample_indexer/generation or n/generation is provided.
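The "product of names" can be pictured as a Cartesian product over the three name lists. The sketch below shows that product with pandas alone; it assumes (the documentation does not state this explicitly) that every combination of phenotype, component, and relative origin becomes one column. The phenotype and component names are made up.

```python
import pandas as pd

# Hypothetical names, chosen only for illustration.
phenotype_name = ["height", "bmi"]
component_name = ["additiveGenetic", "noise"]
vorigin_relative = [-1]  # -1 is labeled 'proband' by as_pd(prettify=True)

# One entry per (phenotype, component, relative origin) combination.
component_index = pd.MultiIndex.from_product(
    [phenotype_name, component_name, vorigin_relative],
    names=["phenotype_name", "component_name", "vorigin_relative"],
)

# Total number of component columns implied by the product.
k_total = len(component_index)
```

The level names mirror those documented for component_mindex, so the resulting index has the same three-level layout.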
- class xftsim.struct.XftAccessor(xarray_obj)
Bases:
object
Accessor for Xarray DataArrays with specialized functionality for HaplotypeArray and PhenotypeArray objects.
- Parameters:
xarray_obj (xarray.DataArray) – The DataArray to be accessed.
- _obj
The DataArray to be accessed.
- Type:
xarray.DataArray
- _array_type
The type of the DataArray, either ‘HaplotypeArray’ or ‘componentArray’.
- Type:
str
- _non_annotation_vars
The non-annotation variables in the DataArray.
- Type:
list of str
- _variant_vars
The variant annotation variables in the DataArray.
- Type:
list of str
- _sample_vars
The sample annotation variables in the DataArray.
- Type:
list of str
- _component_vars
The component annotation variables in the DataArray.
- Type:
list of str
- _row_dim
The label of the row dimension.
- Type:
str
- _col_dim
The label of the column dimension.
- Type:
str
- shape
The shape of the DataArray.
- Type:
tuple
- n
The number of rows in the DataArray.
- Type:
int
- data
The data in the DataArray.
- Type:
numpy.ndarray
- row_vars
List of coordinate variable names for the row dimension.
- Type:
list
- column_vars
List of coordinate variable names for the column dimension.
- Type:
list
- sample_mindex
MultiIndex object for the ‘sample’ dimension, containing iid, fid, and sex columns.
- Type:
pd.MultiIndex
- component_mindex
MultiIndex object for the ‘component’ dimension, containing phenotype_name, component_name, and vorigin_relative columns.
- Type:
pd.MultiIndex
- Raises:
NotImplementedError – If the DataArray dimensions are not (‘sample’, ‘variant’) or (‘sample’, ‘component’).
- property af_empirical
Empirical allele frequencies. Specific to HaplotypeArray objects.
- Returns:
numpy.ndarray – Empirical allele frequencies.
- Raises:
TypeError – If _col_dim is not ‘variant’.
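For intuition, an empirical allele frequency is the mean of the binary haplotype values for a variant. The sketch below computes it with NumPy on a toy n x 2m matrix. It assumes that columns 2j and 2j+1 hold the two haplotypes of variant j; that layout is an assumption of this example, not something the documentation confirms.

```python
import numpy as np

# Toy n x 2m binary haplotype matrix (n=4 samples, m=2 variants).
# Assumption: columns 2j and 2j+1 hold the two haplotypes of variant j.
haplotypes = np.array(
    [
        [0, 1, 1, 1],
        [0, 0, 1, 0],
        [1, 1, 0, 0],
        [0, 1, 1, 1],
    ],
    dtype=np.int8,
)

n, two_m = haplotypes.shape
m = two_m // 2

# Allele frequency per variant: mean over samples and both haplotypes.
af = haplotypes.reshape(n, m, 2).mean(axis=(0, 2))
```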
- property all_components
Returns an array of all the unique component names. Specific to PhenotypeArray objects.
- Returns:
numpy.ndarray – An array of all the unique component names.
- Raises:
TypeError – If the column dimension is not ‘component’.
- property all_phenotypes
Returns an array of all the unique phenotype component names. Specific to PhenotypeArray objects.
- Returns:
numpy.ndarray – An array of all the unique phenotype component names.
- Raises:
TypeError – If the column dimension is not ‘component’.
- property all_relatives
Returns an array of all the unique origin relative values. Specific to PhenotypeArray objects.
- Returns:
numpy.ndarray – An array of all the unique origin relative values.
- Raises:
TypeError – If the column dimension is not ‘component’.
- as_pd(prettify=True)
Returns the data as a Pandas DataFrame. Specific to PhenotypeArray objects.
- Parameters:
prettify (bool, optional) – If True, the multi-index columns will be prettified by replacing -1, 0, 1 with ‘proband’, ‘mother’, ‘father’, respectively.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
pd.DataFrame – A Pandas DataFrame representing the data.
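The relabeling that prettify describes can be sketched with pandas alone. The example below builds a DataFrame with three-level columns and replaces the integer relative-origin codes with the documented labels. The phenotype and component names are hypothetical, and this shows only the relabeling idea, not how as_pd is actually implemented.

```python
import numpy as np
import pandas as pd

# Hypothetical columns: one phenotype, one component, three relative origins.
columns = pd.MultiIndex.from_product(
    [["height"], ["phenotype"], [-1, 0, 1]],
    names=["phenotype_name", "component_name", "vorigin_relative"],
)
df = pd.DataFrame(np.zeros((2, 3)), columns=columns)

# Mapping stated in the as_pd documentation.
labels = {-1: "proband", 0: "mother", 1: "father"}

# Relabel only the vorigin_relative level of the column MultiIndex.
pretty = df.rename(columns=labels, level="vorigin_relative")
```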
- property column_vars
Get the column coordinate variables for the DataArray object.
- Returns:
XftIndex – The column coordinate variables of the current column dimension.
- property component_mindex
Get a Pandas MultiIndex object for the component dimension.
- Returns:
pandas.MultiIndex – MultiIndex object with phenotype_name, component_name, and vorigin_relative as index levels.
- Raises:
NotImplementedError – If the column dimension is not ‘component’.
- property data
The data in the DataArray.
- Returns:
numpy.ndarray – The data in the DataArray.
- property depth
Returns the generational depth from binary relative encoding. Specific to PhenotypeArray objects.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
Union[float, np.nan] – The generational depth from binary relative encoding, or NaN if the relative origin is empty.
- property diploid_chrom
Diploid chromosome numbers. Specific to HaplotypeArray objects.
- Returns:
numpy.ndarray – Diploid chromosome numbers.
- Raises:
TypeError – If _col_dim is not ‘variant’.
- property diploid_vid
Diploid variant ID. Specific to HaplotypeArray objects.
- Returns:
numpy.ndarray – Diploid variant IDs.
- Raises:
TypeError – If _col_dim is not ‘variant’.
- property generation
Generation of the data. Specific to HaplotypeArray objects.
- Returns:
int – Generation attribute.
- Raises:
TypeError – If _col_dim is not ‘variant’.
- get_annotation_dict()
Return a dictionary of all annotation variables associated with the variants in the object. Specific to HaplotypeArray objects.
- Returns:
dict – A dictionary where the keys are the annotation variable names and the values are the corresponding arrays.
- Raises:
TypeError – If the _col_dim attribute is not equal to ‘variant’.
- get_column_indexer()
Get the column indexer object for the PhenotypeArray object.
- Returns:
xft.index.Indexer – The indexer object based on the current column dimension.
- Raises:
TypeError – If the current column dimension is not recognized.
- get_comp_type(ctype='intermediate')
Returns the index array of components with comp_type==ctype. Specific to PhenotypeArray objects.
- Returns:
XftIndex – The index of components whose comp_type equals ctype.
- Raises:
TypeError – If the column dimension is not ‘component’.
- get_component_indexer()
Get the component indexer of a PhenotypeArray.
- Returns:
xft.index.ComponentIndex – A ComponentIndex object.
- get_intermediate_components()
Returns the index array of components with comp_type==‘intermediate’. Specific to PhenotypeArray objects.
- Returns:
XftIndex – The index of components whose comp_type is ‘intermediate’.
- Raises:
TypeError – If the column dimension is not ‘component’.
- get_k_rel(rel)
Returns the number of components with the given relative origin. Specific to PhenotypeArray objects.
- Parameters:
rel (int) – The relative origin of the components.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
int – The number of components with the given relative origin.
- get_non_annotation_dict()
Return a dictionary of all non-annotation variables associated with the variants in the object. Specific to HaplotypeArray objects.
- Returns:
dict – A dictionary where the keys are the non-annotation variable names and the values are the corresponding arrays.
- Raises:
TypeError – If the _col_dim attribute is not equal to ‘variant’.
- get_outcome_components()
Returns the index array of components with comp_type==‘outcome’. Specific to PhenotypeArray objects.
- Returns:
XftIndex – The index of components whose comp_type is ‘outcome’.
- Raises:
TypeError – If the column dimension is not ‘component’.
- get_row_indexer()
Get the row indexer.
- Returns:
xft.index.SampleIndex – A SampleIndex object.
- Raises:
TypeError – If the row dimension is not ‘sample’.
- get_sample_indexer()
Returns an instance of xft.index.SampleIndex representing the sample indexer constructed from the input data.
- Raises:
NotImplementedError – If _row_dim is not ‘sample’.
- Returns:
SampleIndex – An instance of xft.index.SampleIndex constructed from the sample data in the input object.
- get_variant_indexer()
Get the variant indexer of a HaplotypeArray.
- Returns:
xft.index.HaploidVariantIndex – A HaploidVariantIndex object.
- grep_component_index(keyword='phenotype')
Returns the index array of components whose names contain the given keyword. Specific to PhenotypeArray objects.
- Parameters:
keyword (str, optional) – The keyword to search for in component names, by default ‘phenotype’.
- Returns:
XftIndex – The index of components that match the given keyword.
- Raises:
TypeError – If the column dimension is not ‘component’.
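The keyword search can be thought of as a substring filter over component names. The sketch below shows that filter in plain NumPy; the component names are invented for illustration, and the real method returns an XftIndex rather than integer positions.

```python
import numpy as np

# Hypothetical component names, for illustration only.
component_names = np.array(
    ["additiveGenetic", "noise", "phenotype", "phenotype_lagged"]
)
keyword = "phenotype"

# True wherever the keyword occurs anywhere in the component name.
mask = np.array([keyword in name for name in component_names])

# Integer positions of the matching components.
matching_positions = np.flatnonzero(mask)
```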
- interpolate_cM(gmap, **kwargs)
Interpolate cM values based on genetic map information. Specific to HaplotypeArray objects.
- Parameters:
gmap (GeneticMap) – Genetic map data
**kwargs – Additional keyword arguments to be passed to scipy.interpolate.interp1d.
- Raises:
TypeError – If the column dimension is not ‘variant’.
ValueError – If not all chromosomes required are present in the genetic map
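The following sketch illustrates the two behaviors described here: interpolating per chromosome and failing when a required chromosome is absent from the map. It works on a plain pandas DataFrame with chrom, pos_bp, and pos_cM columns rather than a real GeneticMap, uses numpy.interp in place of scipy.interpolate.interp1d, and the function name `interpolate_all` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical two-chromosome map with illustrative values.
gmap = pd.DataFrame(
    {
        "chrom": ["1", "1", "2", "2"],
        "pos_bp": [0, 100, 0, 200],
        "pos_cM": [0.0, 1.0, 0.0, 4.0],
    }
)


def interpolate_all(variant_chrom, variant_bp, gmap):
    """Interpolate cM per chromosome; fail if a chromosome is not in the map."""
    variant_chrom = np.asarray(variant_chrom)
    variant_bp = np.asarray(variant_bp, dtype=float)
    missing = set(variant_chrom.tolist()) - set(gmap["chrom"])
    if missing:
        raise ValueError(f"chromosomes missing from map: {sorted(missing)}")
    out = np.empty(len(variant_bp))
    for c in np.unique(variant_chrom):
        # Map rows for this chromosome, sorted as numpy.interp requires.
        sub = gmap[gmap["chrom"] == c].sort_values("pos_bp")
        mask = variant_chrom == c
        out[mask] = np.interp(
            variant_bp[mask], sub["pos_bp"].to_numpy(), sub["pos_cM"].to_numpy()
        )
    return out


cM = interpolate_all(["1", "2", "1"], [50, 100, 100], gmap)
```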
- property k_components
Returns the number of unique component names. Specific to PhenotypeArray objects.
- Returns:
int – The number of unique component names.
- Raises:
TypeError – If the column dimension is not ‘component’.
- property k_current
Returns the number of all current-gen specific components. Specific to PhenotypeArray objects.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
int – The number of all current-gen specific components.
- property k_phenotypes
Returns the number of unique phenotype components. Specific to PhenotypeArray objects.
- Returns:
int – The number of unique phenotype components.
- Raises:
TypeError – If the column dimension is not ‘component’.
- property k_relative
Returns the number of unique origin relative values. Specific to PhenotypeArray objects.
- Returns:
int – The number of unique origin relative values.
- Raises:
TypeError – If the column dimension is not ‘component’.
- property k_total
Returns the total number of components. Specific to PhenotypeArray objects.
- Returns:
int – The total number of components.
- Raises:
TypeError – If the column dimension is not ‘component’.
- property m
Return the number of distinct diploid variants. Specific to HaplotypeArray objects.
- Returns:
int – The number of distinct diploid variants in the array.
- Raises:
TypeError – If the _col_dim attribute is not equal to ‘variant’.
- property maf_empirical
Empirical minor allele frequencies. Specific to HaplotypeArray objects.
- Returns:
numpy.ndarray – Empirical minor allele frequencies.
- Raises:
TypeError – If _col_dim is not ‘variant’.
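A minor allele frequency is the frequency of whichever allele is rarer, so it can be obtained by folding allele frequencies around 0.5. The short sketch below shows that relationship with illustrative numbers; it is a generic definition, not a copy of the library's implementation.

```python
import numpy as np

# Illustrative allele frequencies for three variants.
af = np.array([0.5, 0.625, 0.9])

# The minor allele is whichever allele is rarer, so fold around 0.5.
maf = np.minimum(af, 1.0 - af)
```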
- property n
The number of rows in the DataArray.
- Returns:
int – The number of rows in the DataArray.
- reindex_components(value)
Reindex the components.
- Parameters:
value (xft.index.ComponentIndex) – A ComponentIndex object.
- Returns:
PhenotypeArray – A new PhenotypeArray object.
- property row_vars
Get the row coordinate variables for the PhenotypeArray object.
- Returns:
XftIndex – The row coordinate variables of the row dimension.
- property sample_mindex
Get the sample multi-index for the PhenotypeArray object.
- Returns:
pd.MultiIndex – A multi-index object containing sample IDs, family IDs, and sex information.
- Raises:
NotImplementedError – If the current row dimension is not ‘sample’.
- set_column_indexer(value)
Set the column indexer object for the PhenotypeArray object.
- Parameters:
value (xft.index.Indexer) – The new indexer object for the PhenotypeArray object.
- Returns:
None
- Raises:
TypeError – If the current column dimension is not recognized.
- set_row_indexer()
- set_sample_indexer(value)
- set_variant_indexer(value)
- property shape
The shape of the DataArray.
- Returns:
tuple – The shape of the DataArray.
- split_by_component()
Splits the data by component name. Specific to PhenotypeArray objects.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
Dict[str, pd.DataFrame] – A dictionary of dataframes, where the keys are the unique component names and the values are dataframes containing the data for each component.
- split_by_phenotype()
Splits the data by phenotype name. Specific to PhenotypeArray objects.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
Dict[str, pd.DataFrame] – A dictionary of dataframes, where the keys are the unique phenotype names and the values are dataframes containing the data for each phenotype.
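The shape of such a result (a dictionary keyed by phenotype name, with one DataFrame per key) can be sketched with pandas alone. The example below selects columns by one level of a three-level column MultiIndex. The names and values are invented, and this is only one way such a split might be done, not necessarily how the library does it.

```python
import numpy as np
import pandas as pd

# Hypothetical component columns: two phenotypes with two components each.
columns = pd.MultiIndex.from_product(
    [["height", "bmi"], ["genetic", "noise"], [-1]],
    names=["phenotype_name", "component_name", "vorigin_relative"],
)
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=columns)

# One sub-frame per unique phenotype name, keeping all three column levels.
by_phenotype = {
    name: df.xs(name, axis=1, level="phenotype_name", drop_level=False)
    for name in df.columns.unique(level="phenotype_name")
}
```

Splitting by component name or by relative origin would follow the same pattern with a different level name.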
- split_by_phenotype_vorigin()
Splits the data by phenotype name and relative origin. Specific to PhenotypeArray objects.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
Dict[Tuple[str, int], pd.DataFrame] – A dictionary of dataframes, where the keys are tuples of phenotype name and relative origin and the values are dataframes containing the data for each combination of phenotype name and relative origin.
- split_by_vorigin()
Splits the data by relative origin. Specific to PhenotypeArray objects.
- Raises:
TypeError – If the column dimension is not ‘component’.
- Returns:
Dict[int, pd.DataFrame] – A dictionary of dataframes, where the keys are the unique relative origins and the values are dataframes containing the data for each relative origin.
- standardize()
- to_diploid()
Convert the object to a diploid representation by adding the two haplotypes for each variant. Specific to HaplotypeArray objects.
- Raises:
TypeError – If the _col_dim attribute is not equal to ‘variant’.
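Adding the two haplotypes for each variant turns an n x 2m binary matrix into an n x m matrix of genotype counts in {0, 1, 2}. The sketch below shows this with NumPy, assuming that columns 2j and 2j+1 hold the two haplotypes of variant j; that column layout is an assumption of this example.

```python
import numpy as np

# Toy n x 2m binary haplotype matrix (n=4 samples, m=2 variants).
# Assumption: columns 2j and 2j+1 hold the two haplotypes of variant j.
haplotypes = np.array(
    [
        [0, 1, 1, 1],
        [0, 0, 1, 0],
        [1, 1, 0, 0],
        [0, 1, 1, 1],
    ],
    dtype=np.int8,
)

n, two_m = haplotypes.shape
m = two_m // 2

# Sum each adjacent pair of haplotype columns into one genotype count.
genotypes = haplotypes.reshape(n, m, 2).sum(axis=2)
```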
- to_diploid_standardized(af=None, scale=False)
Standardize the HaplotypeArray object and convert it to a diploid representation. Specific to HaplotypeArray objects.
- Parameters:
af (NDArray, optional) – An array containing the allele frequencies of each variant. If not provided, empirical allele frequencies will be used.
scale (bool, optional) – Whether or not to scale the standardized array by the square root of the number of variants.
- Returns:
ndarray – A standardized diploid array where each variant is represented as the sum of two haplotypes.
- Raises:
TypeError – If the _col_dim attribute is not equal to ‘variant’.
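One common way to standardize diploid genotypes is to center each variant at 2p and divide by sqrt(2p(1-p)), where p is the allele frequency. The sketch below implements that convention under two assumptions that the documentation does not confirm: that the binomial variance 2p(1-p) is the scaling used, and that scale=True means dividing by the square root of the number of variants. The function name `standardize_diploid` is hypothetical.

```python
import numpy as np


def standardize_diploid(genotypes, af=None, scale=False):
    """Center and scale an n x m matrix of genotype counts (0, 1, 2).

    Assumes no variant is monomorphic (af strictly between 0 and 1).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    if af is None:
        # Empirical allele frequency: mean genotype count divided by two.
        af = genotypes.mean(axis=0) / 2.0
    af = np.asarray(af, dtype=float)
    z = (genotypes - 2.0 * af) / np.sqrt(2.0 * af * (1.0 - af))
    if scale:
        # Assumed meaning of `scale`: divide by sqrt(number of variants).
        z = z / np.sqrt(genotypes.shape[1])
    return z


genotypes = np.array([[0, 2], [1, 1], [2, 0], [1, 1]])
z = standardize_diploid(genotypes)
z_scaled = standardize_diploid(genotypes, scale=True)
```

With empirical frequencies, each column of `z` has mean zero by construction.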
- use_empirical_afs()
Sets allele frequencies to the empirical frequencies. Specific to HaplotypeArray objects.
- Raises:
TypeError – If _col_dim is not ‘variant’.