index

Below is an auto-generated summary of the xftsim.index submodule API.

class xftsim.index.ComponentIndex(phenotype_name=None, component_name=None, vorigin_relative=None, comp_type=None, comp_type_map={'phenotype': 'outcome'}, frame=None, k_total=None)

Bases: XftIndex

Index object for phenotype components, including origin relative to proband.

Parameters:
  • phenotype_name (iterable, optional) – Names of phenotypes. Either phenotype_name, frame, or k_total must be provided.

  • component_name (iterable, optional) – Names of phenotype components.

  • vorigin_relative (iterable, optional) – Relative origin of phenotype component. -1 for proband, 0 for mother, 1 for father.

  • comp_type (iterable, optional) – Each element is either ‘intermediate’ or ‘outcome’, distinguishing phenotype components from the phenotypes themselves.

  • frame (pandas.DataFrame, optional) – Pre-existing frame to initialize index.

  • k_total (int, optional) – Total number of phenotypes to generate generic names.

phenotype_name

Names of phenotypes.

Type:

numpy.ndarray

component_name

Names of phenotype components.

Type:

numpy.ndarray

vorigin_relative

Relative origin of phenotype component. -1 for proband, 0 for mother, 1 for father.

Type:

numpy.ndarray

k_total

Total number of phenotypes.

Type:

int

k_phenotypes

Number of unique phenotypes.

Type:

int

k_components

Number of unique phenotype components.

Type:

int

k_relative

Number of unique relative origins.

Type:

int

depth

Generational depth from binary relative encoding.

Type:

float

unique_identifier

Unique identifier for the index.

Type:

numpy.ndarray

to_vorigin(origin)

Returns a new ComponentIndex with all vorigin_relative set to origin.

to_proband()

Returns a new ComponentIndex with all vorigin_relative set to -1 (proband).

from_frame(df)

Returns a new ComponentIndex initialized from a Pandas DataFrame.

from_arrays(phenotype_name, component_name, vorigin_relative=None)

Returns a new ComponentIndex initialized from numpy arrays.

from_product(phenotype_name, component_name, vorigin_relative=None)

Returns a new ComponentIndex initialized from a Cartesian product of phenotype_name, component_name, and vorigin_relative.
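
The Cartesian-product construction can be pictured with the standard library alone. The sketch below does not call xftsim: the helper name product_rows, the fallback value (-1,) for vorigin_relative, and the row ordering are assumptions made for illustration, and the real method may order or default things differently.

```python
from itertools import product


def product_rows(phenotype_name, component_name, vorigin_relative=(-1,)):
    """Hypothetical helper: one row per combination of the three inputs."""
    return [
        {"phenotype_name": p, "component_name": c, "vorigin_relative": v}
        for p, c, v in product(phenotype_name, component_name, vorigin_relative)
    ]


rows = product_rows(
    ["height", "bmi"],                          # two phenotypes
    ["additiveGenetic", "noise", "phenotype"],  # three components
    [-1, 0, 1],                                 # proband, mother, father
)
print(len(rows))  # 2 * 3 * 3 combinations
```

Every phenotype is paired with every component and every relative origin, so the number of rows is the product of the three input lengths.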

range_index(c, component_name=['generic'], vorigin_relative=[-1], prefix='phenotype')

Returns a new ComponentIndex with generic phenotype names.

property comp_type
property component_name
property depth
static from_arrays(phenotype_name, component_name, vorigin_relative=None, comp_type=None)
static from_frame(df)
static from_product(phenotype_name, component_name, vorigin_relative=None, comp_type_map={'phenotype': 'outcome'})
property k_components
property k_phenotypes
property k_relative
property k_total
property phenotype_name
static range_index(c, component_name=['generic'], vorigin_relative=[-1], prefix='phenotype')
to_proband()
to_vorigin(origin)
property unique_identifier
property vorigin_relative
class xftsim.index.DiploidVariantIndex(vid=None, chrom=None, zero_allele=None, one_allele=None, af=None, annotation_array=None, annotation_names=None, frame=None, m=None, n_chrom=1, h_copy=None, pos_bp=None, pos_cM=None)

Bases: XftIndex

This class is used to index diploid genetic variants. Variants are defined by a set of unique IDs and may have additional annotations. Each variant is associated with two alleles, represented as strings.

Parameters:
  • vid (NDArray[Shape["*"], Object], optional) – Variant IDs, by default None.

  • chrom (NDArray[Shape["*"], Int], optional) – Chromosome of variant, by default None.

  • zero_allele (NDArray[Shape["*"], Object], optional) – First allele of variant, by default None.

  • one_allele (NDArray[Shape["*"], Object], optional) – Second allele of variant, by default None.

  • af (Iterable, optional) – Allele frequency of variant, by default None.

  • annotation_array (Union[NDArray, pd.DataFrame], optional) – Additional variant annotations, by default None.

  • annotation_names (Iterable, optional) – Names of the additional variant annotations, by default None.

  • frame (pd.DataFrame, optional) – A pandas DataFrame containing variant information, by default None.

  • m (int, optional) – The number of variants, by default None.

  • n_chrom (int, optional) – The number of chromosomes, by default 1.

  • h_copy (NDArray[Shape["*"], Object], optional) – A string indicating the haplotype of each variant, by default None.

  • pos_bp (Iterable, optional) – Base-pair positions of the variant, by default None.

  • pos_cM (Iterable, optional) – Centimorgan positions of the variant, by default None.

vid

Variant IDs.

Type:

ndarray

chrom

Chromosome of variant.

Type:

ndarray

zero_allele

First allele of variant.

Type:

ndarray

one_allele

Second allele of variant.

Type:

ndarray

hcopy

A string indicating the copy of each variant.

Type:

ndarray

af

Allele frequency of variant.

Type:

ndarray

pos_bp

Base-pair positions of the variant.

Type:

ndarray

pos_cM

Centimorgan positions of the variant.

Type:

ndarray

ploidy

A string indicating the ploidy of the variant (always “Diploid” for this class).

Type:

str

annotation

A pandas DataFrame containing additional variant annotations.

Type:

pd.DataFrame

annotation_array

A numpy array containing additional variant annotations.

Type:

Union[ndarray, None]

annotation_names

An array containing names of additional variant annotations.

Type:

ndarray

m

The number of variants.

Type:

int

n_chrom

The number of chromosomes.

Type:

int

n_annotations

The number of additional variant annotations.

Type:

int

maf

Minor allele frequency of variant.

Type:

ndarray
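
As a reminder of how a minor allele frequency relates to an allele frequency, the sketch below applies the standard definition, the smaller of p and 1 - p. It is a plain-Python illustration with a hypothetical helper name, not the library's implementation, and it assumes af holds the frequency of one of the two alleles.

```python
def minor_allele_frequency(af):
    """Hypothetical helper: fold each allele frequency onto [0, 0.5]."""
    return [min(p, 1.0 - p) for p in af]


maf = minor_allele_frequency([0.1, 0.5, 0.9, 0.75])
print(maf)
```

Frequencies above 0.5 are reflected, so 0.75 maps to 0.25, while values at or below 0.5 are unchanged.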

Raises:

AssertionError – If none of vid, m, or frame is provided, or if neither zero_allele nor one_allele is provided.

property af
annotate()
property annotation
property annotation_array
property annotation_names
property chrom
property hcopy
property m
property maf
property n_annotations
property n_chrom
property one_allele
property ploidy
property pos_bp
property pos_cM
to_haploid()
property vid
property zero_allele
class xftsim.index.HaploidVariantIndex(vid=None, chrom=None, zero_allele=None, one_allele=None, af=None, annotation_array=None, annotation_names=None, frame=None, m=None, n_chrom=1, h_copy=None, pos_bp=None, pos_cM=None)

Bases: DiploidVariantIndex

A class representing a haploid variant index.

vid

Variant IDs.

Type:

numpy.ndarray

chrom

Chromosome numbers.

Type:

numpy.ndarray

zero_allele

Alleles with value zero.

Type:

numpy.ndarray

one_allele

Alleles with value one.

Type:

numpy.ndarray

af

Allele frequencies.

Type:

numpy.ndarray

pos_bp

Positions of variants in base pairs.

Type:

numpy.ndarray

pos_cM

Positions of variants in centiMorgans.

Type:

numpy.ndarray

m

Number of unique variant IDs.

Type:

int

n_chrom

Number of unique chromosome numbers.

Type:

int

n_annotations

Number of annotations.

Type:

int

maf

Minor allele frequencies.

Type:

numpy.ndarray

ploidy

The ploidy of the variant index. In this case, “Haploid”.

Type:

str

hcopy

A string indicating the copy of each variant.

Type:

ndarray

to_diploid()

Converts the haploid variant index to diploid.

property ploidy
to_diploid()
class xftsim.index.NullFilter

Bases: SampleFilter

class xftsim.index.RandomSiblingFilter

Bases: SampleFilter

Randomly select one sibling per family.
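
The selection this filter describes can be sketched without xftsim by grouping individual IDs by family ID and drawing one member from each group. The helper name, the seed argument, and the example IDs are hypothetical; the sketch only shows the idea and makes no claim about how the filter is implemented.

```python
import random
from collections import defaultdict


def one_sibling_per_family(iid, fid, seed=None):
    """Hypothetical helper: pick one individual ID at random from each family."""
    rng = random.Random(seed)
    families = defaultdict(list)
    for individual, family in zip(iid, fid):
        families[family].append(individual)
    # One uniformly random member per family, in order of first appearance.
    return [rng.choice(members) for members in families.values()]


iid = ["a1", "a2", "b1", "b2", "b3", "c1"]
fid = ["A", "A", "B", "B", "B", "C"]
chosen = one_sibling_per_family(iid, fid, seed=0)
print(chosen)
```

The result always has one entry per distinct family ID, and a family with a single member contributes that member.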

class xftsim.index.RandomSiblingSubsampleFilter(k)

Bases: SampleFilter

Randomly subsample k families, choosing one offspring per family.

class xftsim.index.RandomSubsampleFilter(k)

Bases: SampleFilter

Randomly subsample k individuals.

class xftsim.index.SampleFilter(filter_function, filter_name=None, metadata={})

Bases: object

filter(sindex, **kwargs)
class xftsim.index.SampleIndex(iid=None, fid=None, sex=None, frame=None, n=None, generation=0)

Bases: XftIndex

Index for individual samples.

This class is used to keep track of information for individual samples.

Parameters:
  • iid (Iterable, optional) – Iterable of individual IDs.

  • fid (Iterable, optional) – Iterable of family IDs.

  • sex (Iterable, optional) – Iterable of biological sexes.

  • frame (pd.DataFrame, optional) – Dataframe containing information for each sample.

  • n (int, optional) – Number of samples to generate a random ID set for.

  • generation (int, optional) – Generation number for samples.

n

Number of individuals.

Type:

int

n_fam

Number of families.

Type:

int

n_female

Number of biological females.

Type:

int

n_male

Number of biological males.

Type:

int

iid

Array of individual IDs.

Type:

ndarray

fid

Array of family IDs.

Type:

ndarray

sex

Array of biological sexes.

Type:

ndarray

property fid
property iid
iloc(key)
property n
property n_fam
property n_female
property n_male
property sex
property unique_identifier
class xftsim.index.SiblingPairFilter(k=None)

Bases: SampleFilter

Subsample two siblings from each of k families that have at least two siblings.
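
A minimal sketch of sibling-pair subsampling, using only the standard library. The helper name and seed argument are hypothetical, and treating k=None as "use every eligible family" is an assumption, since the default behaviour is not described here.

```python
import random
from collections import defaultdict


def sibling_pairs(iid, fid, k=None, seed=None):
    """Hypothetical helper: two random siblings from each of up to k families."""
    rng = random.Random(seed)
    families = defaultdict(list)
    for individual, family in zip(iid, fid):
        families[family].append(individual)
    # Only families with at least two members can supply a pair.
    eligible = [fam for fam, members in families.items() if len(members) >= 2]
    if k is not None:  # assumption: None keeps all eligible families
        eligible = rng.sample(eligible, min(k, len(eligible)))
    return {fam: rng.sample(families[fam], 2) for fam in eligible}


iid = ["a1", "a2", "b1", "b2", "b3", "c1"]
fid = ["A", "A", "B", "B", "B", "C"]
pairs = sibling_pairs(iid, fid, k=1, seed=1)
print(pairs)
```

Family C is never selected because it has only one member, and each returned pair contains two distinct individuals from the same family.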

class xftsim.index.XftIndex

Bases: object

XftIndex is the base class for indexes in the XftSim simulation model. It is a superclass and is not intended for direct use.

Attributes:

_coord_variables: List[str]

List of names of the coordinate variables.

_index_variables: List[str]

List of names of the index variables.

_dimension: str

Name of the dimension variable.

_frame: pandas.DataFrame

Dataframe representing the index.

Methods:

validate():

Validates the index by checking if the _coord_variables, _index_variables, and _dimension attributes are not None. Raises an AssertionError if any of these attributes is None.

frame:

Property representing the _frame attribute. Getter: Returns the _frame attribute. Setter: Sets the _frame attribute and generates a new index using the unique_identifier property.

frame_copy():

Returns a copy of the _frame attribute.

unique_identifier:

Property representing the unique identifier of the index. Returns a string representing the concatenation of all index variables, separated by a period.
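
The period-separated concatenation can be illustrated in plain Python. This is a sketch with a hypothetical helper, not the property itself; the choice of index variables in the example is an assumption.

```python
def unique_identifiers(rows, index_variables):
    """Hypothetical helper: join each row's index variables with periods."""
    return [".".join(str(row[var]) for var in index_variables) for row in rows]


rows = [
    {"phenotype_name": "height", "component_name": "noise", "vorigin_relative": -1},
    {"phenotype_name": "height", "component_name": "noise", "vorigin_relative": 0},
]
ids = unique_identifiers(
    rows, ["phenotype_name", "component_name", "vorigin_relative"]
)
print(ids)  # ['height.noise.-1', 'height.noise.0']
```

Rows that differ in any index variable receive different identifiers, which is what lets the identifier serve as a row label.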

coords:

Property representing the coordinates of the index. Returns a dictionary where the keys are the coordinate variables and the values are the corresponding values in the _frame attribute.

coord_dict:

Property representing the coordinate dictionary of the index. Returns a dictionary where the keys are the variables and the values are tuples representing the (dimension, value) of each coordinate.
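
The shape of such a dictionary can be sketched as follows. The helper name, the dimension name "sample", and the example columns are assumptions made for illustration. Tuples of the form (dimension, values) match the form xarray accepts when coordinates are supplied as a dictionary, but whether the property is used that way is not stated here.

```python
def coord_dict(columns, dimension):
    """Hypothetical helper: map each variable to a (dimension, values) tuple."""
    return {name: (dimension, list(values)) for name, values in columns.items()}


columns = {"iid": ["0_0", "0_1"], "fid": ["0_0", "0_1"], "sex": [0, 1]}
coords = coord_dict(columns, "sample")
print(coords["sex"])  # ('sample', [0, 1])
```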

coord_frame:

Property representing the coordinate frame of the index. Returns a dataframe where the columns are the coordinate variables and the rows correspond to each row in the _frame attribute.

coord_mindex:

Property representing the coordinate multi-index of the index. Returns a multi-index where the levels correspond to the coordinate variables and the values correspond to the corresponding values in the _frame attribute.

coord_index:

Property representing the coordinate index of the index. Returns an index representing the unique identifier of the index.

__getitem__(arg):

Returns a new instance of the XftIndex class, corresponding to a subset of the _frame attribute. If arg is a dictionary, returns the rows where the values of the keys in the dictionary match the corresponding values in the _frame attribute. If arg is an integer or slice, returns the row(s) at the corresponding index in the _frame attribute.
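
The two selection modes described for __getitem__ can be sketched on a list of row dictionaries. The helper is hypothetical and assumes dictionary values are single scalars to match exactly; the real method may also accept other forms.

```python
def select_rows(rows, arg):
    """Hypothetical helper mirroring dict-based and positional selection."""
    if isinstance(arg, dict):
        # Keep rows whose value matches for every key in the dictionary.
        return [row for row in rows if all(row[k] == v for k, v in arg.items())]
    if isinstance(arg, slice):
        return rows[arg]
    return [rows[arg]]  # single integer position


rows = [
    {"phenotype_name": "height", "vorigin_relative": -1},
    {"phenotype_name": "height", "vorigin_relative": 0},
    {"phenotype_name": "bmi", "vorigin_relative": -1},
]
by_dict = select_rows(rows, {"phenotype_name": "height", "vorigin_relative": -1})
by_slice = select_rows(rows, slice(0, 2))
by_int = select_rows(rows, 2)
```

In each case the result is a new list of rows, in the same way that the method returns a new index rather than modifying the existing one.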

iloc(key):

Returns a new instance of the XftIndex class, corresponding to a subset of the _frame attribute. Returns the row(s) at the corresponding index in the _frame attribute.

merge(other):

Merges the _frame attribute of the current instance with another instance of the XftIndex class. If the two instances have a different _dimension attribute or a different class type, raises a TypeError. Returns a new instance of the XftIndex class representing the merged index.

reduce_merge(args):

Static method that folds the list args into a single index by repeatedly calling merge, combining each element with the result accumulated so far. Returns the final merged index.
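
The folding pattern can be sketched with functools.reduce. Both functions below are hypothetical stand-ins operating on lists of identifiers, and the merge shown here (concatenate, then optionally drop repeats while keeping order) is an assumption; the library's merge works on the underlying frames and may behave differently.

```python
from functools import reduce


def merge(left, right, deduplicate=True):
    """Hypothetical stand-in: concatenate, optionally dropping repeats."""
    combined = list(left) + list(right)
    if deduplicate:
        combined = list(dict.fromkeys(combined))  # keeps first occurrences
    return combined


def reduce_merge(args, deduplicate=True):
    """Fold a list of indexes into one by merging left to right."""
    return reduce(lambda a, b: merge(a, b, deduplicate=deduplicate), args)


merged = reduce_merge([["a", "b"], ["b", "c"], ["c", "d"]])
print(merged)  # ['a', 'b', 'c', 'd']
```

With deduplicate=False the same call would keep the repeated "b" and "c" entries.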

stack(other):

Stacks the _frame attribute of the current instance with another instance of the XftIndex class. If the two instances have a different _dimension attribute or a different class type, raises a TypeError. Returns a new instance of the XftIndex class representing the stacked index.

at_most(n_new):

Randomly downsamples the _frame attribute to at most n_new rows and returns the result as a new instance of the XftIndex class. If the _frame attribute already has n_new or fewer rows, returns a copy of the current instance.
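
The downsampling rule can be sketched on a plain list. The helper name and seed argument are hypothetical, and keeping the surviving rows in their original order is an assumption rather than documented behaviour.

```python
import random


def at_most(rows, n_new, seed=None):
    """Hypothetical helper: random subset of at most n_new rows."""
    if len(rows) <= n_new:
        return list(rows)  # already small enough: return a copy
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(rows)), n_new))  # assumption: keep order
    return [rows[i] for i in keep]


rows = list(range(10))
sub = at_most(rows, 3, seed=0)
same = at_most(rows, 20)
```

Here sub holds three of the ten rows, while same is an unchanged copy because the input already fits within the limit.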

at_most(n_new)
property coord_dict
property coord_frame
property coord_index
property coord_mindex
property coords
property frame
frame_copy()
iloc(key)
merge(other, deduplicate=True)
static reduce_merge(args, deduplicate=True)
stack(other)
property unique_identifier
validate()
xftsim.index.sampleIndex_from_VCF()
xftsim.index.variantIndex_from_VCF()