index

Below is an auto-generated summary of the xftsim.index submodule API.

class xftsim.index.ComponentIndex(phenotype_name=None, component_name=None, vorigin_relative=None, comp_type=None, comp_type_map={'phenotype': 'outcome'}, frame=None, k_total=None)

Bases: XftIndex

Index object for phenotype components, including origin relative to proband.

Parameters:

phenotype_name (iterable, optional) – Names of phenotypes. Either phenotype_name, frame, or k_total must be provided.
component_name (iterable, optional) – Names of phenotype components.
vorigin_relative (iterable, optional) – Relative origin of phenotype component. -1 for proband, 0 for mother, 1 for father.
comp_type (iterable, optional) – Elements are either ‘intermediate’ or ‘outcome’, distinguishing phenotype components from the phenotypes themselves.
frame (pandas.DataFrame, optional) – Pre-existing frame to initialize index.
k_total (int, optional) – Total number of phenotypes to generate generic names.

phenotype_name

Names of phenotypes.

Type:: numpy.ndarray

component_name

Names of phenotype components.

Type:: numpy.ndarray

vorigin_relative

Relative origin of phenotype component. -1 for proband, 0 for mother, 1 for father.

Type:: numpy.ndarray

k_total

Total number of phenotypes.

Type:: int

k_phenotypes

Number of unique phenotypes.

Type:: int

k_components

Number of unique phenotype components.

Type:: int

k_relative

Number of unique relative origins.

Type:: int

depth

Generational depth from binary relative encoding.

Type:: float

unique_identifier

Unique identifier for the index.

Type:: numpy.ndarray

to_vorigin(origin): Returns a new ComponentIndex with all vorigin_relative set to origin.

to_proband(): Returns a new ComponentIndex with all vorigin_relative set to -1 (proband).

from_frame(df): Returns a new ComponentIndex initialized from a Pandas DataFrame.

from_arrays(phenotype_name, component_name, vorigin_relative=None): Returns a new ComponentIndex initialized from numpy arrays.

from_product(phenotype_name, component_name, vorigin_relative=None): Returns a new ComponentIndex initialized from a Cartesian product of phenotype_name, component_name, and vorigin_relative.

range_index(c, component_name=['generic'], vorigin_relative=[-1], prefix='phenotype'): Returns a new ComponentIndex with generic phenotype names.
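As a rough illustration of what a Cartesian-product construction and a proband reset produce, the sketch below builds the rows in plain Python. It does not call xftsim: the helper names `rows_from_product` and `to_proband_rows`, the row order, and the dictionary layout are assumptions made for illustration only.

```python
from itertools import product


def rows_from_product(phenotype_name, component_name, vorigin_relative=(-1,)):
    """Hypothetical helper: one row per (phenotype, component, origin) combination."""
    return [
        {"phenotype_name": p, "component_name": c, "vorigin_relative": v}
        for p, c, v in product(phenotype_name, component_name, vorigin_relative)
    ]


def to_proband_rows(rows):
    """Return copies of the rows with every vorigin_relative set to -1 (proband)."""
    return [{**row, "vorigin_relative": -1} for row in rows]


# Two phenotypes x two components x two relative origins (0 = mother, 1 = father)
rows = rows_from_product(["height", "bmi"], ["genetic", "noise"], [0, 1])
proband_rows = to_proband_rows(rows)
```

Here `rows` has 2 × 2 × 2 = 8 entries, and `proband_rows` has the same entries with the origin replaced; the original rows are left unchanged.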

property comp_type

property component_name

property depth

static from_arrays(phenotype_name, component_name, vorigin_relative=None, comp_type=None)

static from_frame(df)

static from_product(phenotype_name, component_name, vorigin_relative=None, comp_type_map={'phenotype': 'outcome'})

property k_components

property k_phenotypes

property k_relative

property k_total

property phenotype_name

static range_index(c, component_name=['generic'], vorigin_relative=[-1], prefix='phenotype')

to_proband()

to_vorigin(origin)

property unique_identifier

property vorigin_relative

class xftsim.index.DiploidVariantIndex(vid=None, chrom=None, zero_allele=None, one_allele=None, af=None, annotation_array=None, annotation_names=None, frame=None, m=None, n_chrom=1, h_copy=None, pos_bp=None, pos_cM=None)

Bases: XftIndex

This class is used to index diploid genetic variants. Variants are defined by a set of unique IDs and may have additional annotations. Each variant is associated with two alleles, represented as strings.

Parameters:

vid (NDArray[Shape["*"], Object], optional) – Variant IDs, by default None.
chrom (NDArray[Shape["*"], Int], optional) – Chromosome of variant, by default None.
zero_allele (NDArray[Shape["*"], Object], optional) – First allele of variant, by default None.
one_allele (NDArray[Shape["*"], Object], optional) – Second allele of variant, by default None.
af (Iterable, optional) – Allele frequency of variant, by default None.
annotation_array (Union[NDArray, pd.DataFrame], optional) – Additional variant annotations, by default None.
annotation_names (Iterable, optional) – Names of the additional variant annotations, by default None.
frame (pd.DataFrame, optional) – A pandas DataFrame containing variant information, by default None.
m (int, optional) – The number of variants, by default None.
n_chrom (int, optional) – The number of chromosomes, by default 1.
h_copy (NDArray[Shape["*"], Object], optional) – A string indicating the haplotype of each variant, by default None.
pos_bp (Iterable, optional) – Base-pair positions of the variant, by default None.
pos_cM (Iterable, optional) – Centimorgan positions of the variant, by default None.

vid

Variant IDs.

Type:: ndarray

chrom

Chromosome of variant.

Type:: ndarray

zero_allele

First allele of variant.

Type:: ndarray

one_allele

Second allele of variant.

Type:: ndarray

hcopy

A string indicating the copy of each variant.

Type:: ndarray

af

Allele frequency of variant.

Type:: ndarray

pos_bp

Base-pair positions of the variant.

Type:: ndarray

pos_cM

Centimorgan positions of the variant.

Type:: ndarray

ploidy

A string indicating the ploidy of the variant (always “Diploid” for this class).

Type:: str

annotation

A pandas DataFrame containing additional variant annotations.

Type:: pd.DataFrame

annotation_array

A numpy array containing additional variant annotations.

Type:: Union[ndarray, None]

annotation_names

An array containing names of additional variant annotations.

Type:: ndarray

m

The number of variants.

Type:: int

n_chrom

The number of chromosomes.

Type:: int

n_annotations

The number of additional variant annotations.

Type:: int

maf

Minor allele frequency of variant.

Type:: ndarray

Raises:: AssertionError – If vid, m, or frame is not provided. If both zero_allele and one_allele are not provided.
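The af and maf attributes above are related by the standard definition of minor allele frequency: for an allele frequency p, the minor allele frequency is min(p, 1 - p). The sketch below shows that relationship in plain Python; the function name is hypothetical, and whether xftsim computes maf in exactly this way is not stated in this summary.

```python
def minor_allele_frequency(af):
    """Map each allele frequency p to min(p, 1 - p)."""
    return [min(p, 1.0 - p) for p in af]


af = [0.1, 0.5, 0.75]
maf = minor_allele_frequency(af)  # the third variant's minor allele is the other allele
```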

property af

annotate()

property annotation

property annotation_array

property annotation_names

property chrom

property hcopy

property m

property maf

property n_annotations

property n_chrom

property one_allele

property ploidy

property pos_bp

property pos_cM

to_haploid()

property vid

property zero_allele

class xftsim.index.HaploidVariantIndex(vid=None, chrom=None, zero_allele=None, one_allele=None, af=None, annotation_array=None, annotation_names=None, frame=None, m=None, n_chrom=1, h_copy=None, pos_bp=None, pos_cM=None)

Bases: DiploidVariantIndex

A class representing a haploid variant index.

vid

Variant IDs.

Type:: numpy.ndarray

chrom

Chromosome numbers.

Type:: numpy.ndarray

zero_allele

Alleles with value zero.

Type:: numpy.ndarray

one_allele

Alleles with value one.

Type:: numpy.ndarray

af

Allele frequencies.

Type:: numpy.ndarray

pos_bp

Positions of variants in base pairs.

Type:: numpy.ndarray

pos_cM

Positions of variants in centiMorgans.

Type:: numpy.ndarray

m

Number of unique variant IDs.

Type:: int

n_chrom

Number of unique chromosome numbers.

Type:: int

n_annotations

Number of annotations.

Type:: int

maf

Minor allele frequencies.

Type:: numpy.ndarray

ploidy

The ploidy of the variant index. In this case, “Haploid”.

Type:: str

hcopy

A string indicating the copy of each variant.

Type:: ndarray

to_diploid(): Converts the haploid variant index to diploid.

property ploidy

to_diploid()

class xftsim.index.NullFilter: Bases: SampleFilter

class xftsim.index.RandomSiblingFilter

Bases: SampleFilter

Randomly select one sibling per family
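The selection this filter describes can be sketched without xftsim: group individuals by family ID, then draw one member per family. The function below is a hypothetical stand-in, not the library's implementation; the `seed` argument is added here only to make the example reproducible.

```python
import random
from collections import defaultdict


def random_sibling_per_family(iids, fids, seed=None):
    """Pick one individual ID at random from each family."""
    rng = random.Random(seed)
    families = defaultdict(list)
    for iid, fid in zip(iids, fids):
        families[fid].append(iid)
    # dicts preserve insertion order, so results follow first appearance of each family
    return [rng.choice(members) for members in families.values()]


iids = ["a1", "a2", "b1", "c1", "c2", "c3"]
fids = ["A", "A", "B", "C", "C", "C"]
chosen = random_sibling_per_family(iids, fids, seed=0)
```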

class xftsim.index.RandomSiblingSubsampleFilter(k)

Bases: SampleFilter

Randomly subsample k families, choosing one offspring per family

class xftsim.index.RandomSubsampleFilter(k)

Bases: SampleFilter

Randomly subsample k individuals

class xftsim.index.SampleFilter(filter_function, filter_name=None, metadata={})

Bases: object

filter(sindex, **kwargs)

class xftsim.index.SampleIndex(iid=None, fid=None, sex=None, frame=None, n=None, generation=0)

Bases: XftIndex

Index for individual samples.

This class is used to keep track of information for individual samples.

Parameters:

iid (Iterable, optional) – Iterable of individual IDs.
fid (Iterable, optional) – Iterable of family IDs.
sex (Iterable, optional) – Iterable of biological sexes.
frame (pd.DataFrame, optional) – Dataframe containing information for each sample.
n (int, optional) – Number of samples to generate a random ID set for.
generation (int, optional) – Generation number for samples.

n

Number of individuals.

Type:: int

n_fam

Number of families.

Type:: int

n_female

Number of biological females.

Type:: int

n_male

Number of biological males.

Type:: int

iid

Array of individual IDs.

Type:: ndarray

fid

Array of family IDs.

Type:: ndarray

sex

Array of biological sexes.

Type:: ndarray
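The count attributes above can be read as simple summaries of the iid, fid, and sex arrays. The sketch below derives them in plain Python; the sex coding (0 for female, 1 for male) is an assumption made for this example and is not specified in this summary.

```python
from collections import Counter

iid = ["i0", "i1", "i2", "i3", "i4"]
fid = ["f0", "f0", "f1", "f1", "f2"]
sex = [0, 1, 0, 0, 1]  # assumed coding: 0 = female, 1 = male

n = len(iid)            # number of individuals
n_fam = len(set(fid))   # number of distinct family IDs
sex_counts = Counter(sex)
n_female = sex_counts[0]
n_male = sex_counts[1]
```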

property fid

property iid

iloc(key)

property n

property n_fam

property n_female

property n_male

property sex

property unique_identifier

class xftsim.index.SiblingPairFilter(k=None)

Bases: SampleFilter

Subsample two siblings from each of k families that have at least two siblings
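A minimal sketch of this kind of selection, written without xftsim: keep only families with at least two members, optionally subsample k of them, and draw two distinct siblings from each. The function name, the `seed` argument, and the treatment of `k=None` as "use every eligible family" are assumptions for illustration.

```python
import random
from collections import defaultdict


def sibling_pairs(iids, fids, k=None, seed=None):
    """Draw two distinct siblings from each of up to k families with >= 2 members."""
    rng = random.Random(seed)
    families = defaultdict(list)
    for iid, fid in zip(iids, fids):
        families[fid].append(iid)
    eligible = [members for members in families.values() if len(members) >= 2]
    if k is not None:
        eligible = rng.sample(eligible, min(k, len(eligible)))
    return [tuple(rng.sample(members, 2)) for members in eligible]


iids = ["a1", "a2", "b1", "c1", "c2", "c3"]
fids = ["A", "A", "B", "C", "C", "C"]
one_pair = sibling_pairs(iids, fids, k=1, seed=1)
all_pairs = sibling_pairs(iids, fids, seed=1)
```

Family "B" has a single member, so it can never contribute a pair.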

class xftsim.index.XftIndex

Bases: object

XftIndex is a class representing an index for the XftSim simulation model. It is a superclass and is not intended for direct use.

Attributes:

_coord_variables: List[str]: List of names of the coordinate variables.
_index_variables: List[str]: List of names of the index variables.
_dimension: str: Name of the dimension variable.
_frame: pandas.DataFrame: Dataframe representing the index.

Methods:

validate():: Validates the index by checking if the _coord_variables, _index_variables, and _dimension attributes are not None. Raises an AssertionError if any of these attributes is None.
frame:: Property representing the _frame attribute. Getter: Returns the _frame attribute. Setter: Sets the _frame attribute and generates a new index using the unique_identifier property.
frame_copy():: Returns a copy of the _frame attribute.
unique_identifier:: Property representing the unique identifier of the index. Returns a string representing the concatenation of all index variables, separated by a period.
coords:: Property representing the coordinates of the index. Returns a dictionary where the keys are the coordinate variables and the values are the corresponding values in the _frame attribute.
coord_dict:: Property representing the coordinate dictionary of the index. Returns a dictionary where the keys are the variables and the values are tuples representing the (dimension, value) of each coordinate.
coord_frame:: Property representing the coordinate frame of the index. Returns a dataframe where the columns are the coordinate variables and the rows correspond to each row in the _frame attribute.
coord_mindex:: Property representing the coordinate multi-index of the index. Returns a multi-index where the levels correspond to the coordinate variables and the values correspond to the corresponding values in the _frame attribute.
coord_index:: Property representing the coordinate index of the index. Returns an index representing the unique identifier of the index.
__getitem__(arg):: Returns a new instance of the XftIndex class, corresponding to a subset of the _frame attribute. If arg is a dictionary, returns the rows where the values of the keys in the dictionary match the corresponding values in the _frame attribute. If arg is an integer or slice, returns the row(s) at the corresponding index in the _frame attribute.
iloc(key):: Returns a new instance of the XftIndex class, corresponding to a subset of the _frame attribute. Returns the row(s) at the corresponding index in the _frame attribute.
merge(other):: Merges the _frame attribute of the current instance with another instance of the XftIndex class. If the two instances have a different _dimension attribute or a different class type, raises a TypeError. Returns a new instance of the XftIndex class representing the merged index.
reduce_merge(args):: Static method that reduces the list of args by calling the merge method on each pair of consecutive elements. Returns the final merged index.
stack(other):: Stacks the _frame attribute of the current instance with another instance of the XftIndex class. If the two instances have a different _dimension attribute or a different class type, raises a TypeError. Returns a new instance of the XftIndex class representing the stacked index.
at_most(n_new):: Randomly downsamples the _frame attribute to contain at most n_new rows and returns a new instance of the XftIndex class representing the downsampled index. If the _frame attribute already has n_new or fewer rows, returns a copy of the current instance.
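Three of the behaviours described above (period-joined identifiers, dictionary-based selection, and random downsampling) can be sketched on a plain list of rows. This is not the xftsim implementation: the function names, the choice of index variables, the `seed` argument, and the decision to preserve row order when downsampling are all assumptions for illustration.

```python
import random

rows = [
    {"iid": "0_0", "fid": "0_0", "sex": 0},
    {"iid": "0_1", "fid": "0_0", "sex": 1},
    {"iid": "0_2", "fid": "0_1", "sex": 0},
]
index_variables = ["iid", "fid"]  # hypothetical; the real list depends on the subclass


def unique_identifier(rows, index_variables):
    """Join the index variables of each row with a period."""
    return [".".join(str(row[v]) for v in index_variables) for row in rows]


def select(rows, criteria):
    """Dictionary-style selection: keep rows matching every key/value in criteria."""
    return [row for row in rows if all(row[k] == v for k, v in criteria.items())]


def at_most(rows, n_new, seed=None):
    """Randomly keep at most n_new rows; return a copy if already small enough."""
    if len(rows) <= n_new:
        return list(rows)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(rows)), n_new))  # sorted to preserve row order
    return [rows[i] for i in keep]


ids = unique_identifier(rows, index_variables)
females = select(rows, {"sex": 0})
downsampled = at_most(rows, 2, seed=0)
unchanged = at_most(rows, 10)
```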

at_most(n_new)

property coord_dict

property coord_frame

property coord_index

property coord_mindex

property coords

property frame

frame_copy()

iloc(key)

merge(other, deduplicate=True)

static reduce_merge(args, deduplicate=True)

stack(other)

property unique_identifier

validate()

xftsim.index.sampleIndex_from_VCF()

xftsim.index.sampleIndex_from_plink()

xftsim.index.variantIndex_from_VCF()

xftsim.index.variantIndex_from_plink()