index
Below is an auto-generated summary of the xftsim.index submodule API.
- class xftsim.index.ComponentIndex(phenotype_name=None, component_name=None, vorigin_relative=None, comp_type=None, comp_type_map={'phenotype': 'outcome'}, frame=None, k_total=None)
Bases:
XftIndex
Index object for phenotype components, including origin relative to proband.
- Parameters:
phenotype_name (iterable, optional) – Names of phenotypes. Either phenotype_name, frame, or k_total must be provided.
component_name (iterable, optional) – Names of phenotype components.
vorigin_relative (iterable, optional) – Relative origin of phenotype component. -1 for proband, 0 for mother, 1 for father.
comp_type (iterable, optional) – Elements are either ‘intermediate’ or ‘outcome’ to distinguish between phenotype components versus phenotypes themselves.
frame (pandas.DataFrame, optional) – Pre-existing frame to initialize index.
k_total (int, optional) – Total number of phenotypes to generate generic names.
- phenotype_name
Names of phenotypes.
- Type:
numpy.ndarray
- component_name
Names of phenotype components.
- Type:
numpy.ndarray
- vorigin_relative
Relative origin of phenotype component. -1 for proband, 0 for mother, 1 for father.
- Type:
numpy.ndarray
- k_total
Total number of phenotypes.
- Type:
int
- k_phenotypes
Number of unique phenotypes.
- Type:
int
- k_components
Number of unique phenotype components.
- Type:
int
- k_relative
Number of unique relative origins.
- Type:
int
- depth
Generational depth from binary relative encoding.
- Type:
float
- unique_identifier
Unique identifier for the index.
- Type:
numpy.ndarray
- to_vorigin(origin)
Returns a new ComponentIndex with all vorigin_relative set to origin.
- to_proband()
Returns a new ComponentIndex with all vorigin_relative set to -1 (proband).
- from_frame(df)
Returns a new ComponentIndex initialized from a Pandas DataFrame.
- from_arrays(phenotype_name, component_name, vorigin_relative=None)
Returns a new ComponentIndex initialized from numpy arrays.
- from_product(phenotype_name, component_name, vorigin_relative=None)
Returns a new ComponentIndex initialized from a Cartesian product of phenotype_name, component_name, and vorigin_relative.
- range_index(c, component_name=['generic'], vorigin_relative=[-1], prefix='phenotype')
Returns a new ComponentIndex with generic phenotype names.
- property comp_type
- property component_name
- property depth
- static from_arrays(phenotype_name, component_name, vorigin_relative=None, comp_type=None)
- static from_frame(df)
- static from_product(phenotype_name, component_name, vorigin_relative=None, comp_type_map={'phenotype': 'outcome'})
- property k_components
- property k_phenotypes
- property k_relative
- property k_total
- property phenotype_name
- static range_index(c, component_name=['generic'], vorigin_relative=[-1], prefix='phenotype')
- to_proband()
- to_vorigin(origin)
- property unique_identifier
- property vorigin_relative
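The Cartesian-product construction described for from_product, and the origin rewrite described for to_vorigin and to_proband, can be illustrated without xftsim. The sketch below is a plain-Python stand-in, not the library's implementation: product_rows and with_origin are hypothetical helper names, and the default origin of -1 (proband) is an assumption.

```python
from itertools import product


def product_rows(phenotype_name, component_name, vorigin_relative=(-1,)):
    """Enumerate every (phenotype, component, origin) combination.

    Hypothetical stand-in for the Cartesian product described for
    ComponentIndex.from_product; it does not call xftsim.
    """
    return [
        {"phenotype_name": p, "component_name": c, "vorigin_relative": v}
        for p, c, v in product(phenotype_name, component_name, vorigin_relative)
    ]


def with_origin(rows, origin):
    """Copy the rows with every vorigin_relative replaced by `origin`."""
    return [{**row, "vorigin_relative": origin} for row in rows]


# Two phenotypes x two components x two parental origins (0 = mother, 1 = father).
parental_rows = product_rows(["height", "bmi"], ["additive", "noise"], [0, 1])

# Maternal components re-labelled as proband components (-1),
# analogous to what to_proband() is documented to do.
maternal_rows = product_rows(["height", "bmi"], ["additive", "noise"], [0])
proband_rows = with_origin(maternal_rows, -1)
```

Each dictionary plays the role of one row of the index frame; the real class stores these columns in a pandas DataFrame.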
- class xftsim.index.DiploidVariantIndex(vid=None, chrom=None, zero_allele=None, one_allele=None, af=None, annotation_array=None, annotation_names=None, frame=None, m=None, n_chrom=1, h_copy=None, pos_bp=None, pos_cM=None)
Bases:
XftIndex
This class is used to index diploid genetic variants. Variants are defined by a set of unique IDs and may have additional annotations. Each variant is associated with two alleles, represented as strings.
- Parameters:
vid (NDArray[Shape["*"], Object], optional) – Variant IDs, by default None.
chrom (NDArray[Shape["*"], Int], optional) – Chromosome of variant, by default None.
zero_allele (NDArray[Shape["*"], Object], optional) – First allele of variant, by default None.
one_allele (NDArray[Shape["*"], Object], optional) – Second allele of variant, by default None.
af (Iterable, optional) – Allele frequency of variant, by default None.
annotation_array (Union[NDArray, pd.DataFrame], optional) – Additional variant annotations, by default None.
annotation_names (Iterable, optional) – Names of the additional variant annotations, by default None.
frame (pd.DataFrame, optional) – A pandas DataFrame containing variant information, by default None.
m (int, optional) – The number of variants, by default None.
n_chrom (int, optional) – The number of chromosomes, by default 1.
h_copy (NDArray[Shape["*"], Object], optional) – A string indicating the haplotype of each variant, by default None.
pos_bp (Iterable, optional) – Base-pair positions of the variant, by default None.
pos_cM (Iterable, optional) – Centimorgan positions of the variant, by default None.
- vid
Variant IDs.
- Type:
ndarray
- chrom
Chromosome of variant.
- Type:
ndarray
- zero_allele
First allele of variant.
- Type:
ndarray
- one_allele
Second allele of variant.
- Type:
ndarray
- hcopy
A string indicating the copy of each variant.
- Type:
ndarray
- af
Allele frequency of variant.
- Type:
ndarray
- pos_bp
Base-pair positions of the variant.
- Type:
ndarray
- pos_cM
Centimorgan positions of the variant.
- Type:
ndarray
- ploidy
A string indicating the ploidy of the variant (always “Diploid” for this class).
- Type:
str
- annotation
A pandas DataFrame containing additional variant annotations.
- Type:
pd.DataFrame
- annotation_array
A numpy array containing additional variant annotations.
- Type:
Union[ndarray, None]
- annotation_names
An array containing names of additional variant annotations.
- Type:
ndarray
- m
The number of variants.
- Type:
int
- n_chrom
The number of chromosomes.
- Type:
int
- n_annotations
The number of additional variant annotations.
- Type:
int
- maf
Minor allele frequency of variant.
- Type:
ndarray
- Raises:
AssertionError – If none of vid, m, or frame is provided. If both zero_allele and one_allele are not provided.
- property af
- annotate()
- property annotation
- property annotation_array
- property annotation_names
- property chrom
- property hcopy
- property m
- property maf
- property n_annotations
- property n_chrom
- property one_allele
- property ploidy
- property pos_bp
- property pos_cM
- to_haploid()
- property vid
- property zero_allele
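The maf attribute is derived from af. Under the standard definition, the minor allele frequency of a biallelic variant with allele frequency p is min(p, 1 - p). The sketch below applies that definition in plain Python. Whether xftsim computes maf in exactly this way is an assumption, and minor_allele_frequency is a hypothetical helper name.

```python
def minor_allele_frequency(af):
    """Return min(p, 1 - p) for each allele frequency p.

    Standard biallelic definition; assumed (not confirmed) to match
    the maf attribute of DiploidVariantIndex.
    """
    return [min(p, 1.0 - p) for p in af]


af = [0.1, 0.5, 0.75]
maf = minor_allele_frequency(af)
```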
- class xftsim.index.HaploidVariantIndex(vid=None, chrom=None, zero_allele=None, one_allele=None, af=None, annotation_array=None, annotation_names=None, frame=None, m=None, n_chrom=1, h_copy=None, pos_bp=None, pos_cM=None)
Bases:
DiploidVariantIndex
A class representing a haploid variant index.
- vid
Variant IDs.
- Type:
numpy.ndarray
- chrom
Chromosome numbers.
- Type:
numpy.ndarray
- zero_allele
Alleles with value zero.
- Type:
numpy.ndarray
- one_allele
Alleles with value one.
- Type:
numpy.ndarray
- af
Allele frequencies.
- Type:
numpy.ndarray
- pos_bp
Positions of variants in base pairs.
- Type:
numpy.ndarray
- pos_cM
Positions of variants in centiMorgans.
- Type:
numpy.ndarray
- m
Number of unique variant IDs.
- Type:
int
- n_chrom
Number of unique chromosome numbers.
- Type:
int
- n_annotations
Number of annotations.
- Type:
int
- maf
Minor allele frequencies.
- Type:
numpy.ndarray
- ploidy
The ploidy of the variant index. In this case, “Haploid”.
- Type:
str
- hcopy
A string indicating the copy of each variant.
- Type:
ndarray
- to_diploid()
Converts the haploid variant index to diploid.
- property ploidy
- to_diploid()
- class xftsim.index.NullFilter
Bases:
SampleFilter
- class xftsim.index.RandomSiblingFilter
Bases:
SampleFilter
Randomly select one sibling per family
- class xftsim.index.RandomSiblingSubsampleFilter(k)
Bases:
SampleFilter
Randomly subsample k families, choosing one offspring per family
- class xftsim.index.RandomSubsampleFilter(k)
Bases:
SampleFilter
Randomly subsample k individuals
- class xftsim.index.SampleFilter(filter_function, filter_name=None, metadata={})
Bases:
object
- filter(sindex, **kwargs)
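The sibling filters above are described only by their one-line summaries. As an illustration of that selection logic, the sketch below picks row positions from a list of family IDs using the standard library. It does not use SampleFilter or SampleIndex. The function names, and returning sorted positions, are assumptions made for the example.

```python
import random
from collections import defaultdict


def group_by_family(fid):
    """Map each family ID to the row positions of its members."""
    members = defaultdict(list)
    for position, family in enumerate(fid):
        members[family].append(position)
    return members


def random_sibling(fid, rng):
    """Pick one random member per family (cf. RandomSiblingFilter)."""
    members = group_by_family(fid)
    return sorted(rng.choice(positions) for positions in members.values())


def random_sibling_subsample(fid, k, rng):
    """Pick k families at random, then one member from each
    (cf. RandomSiblingSubsampleFilter)."""
    members = group_by_family(fid)
    families = rng.sample(sorted(members), k)
    return sorted(rng.choice(members[f]) for f in families)


fid = ["f1", "f1", "f2", "f3", "f3", "f3"]
rng = random.Random(0)  # seeded so the sketch is reproducible
one_per_family = random_sibling(fid, rng)
two_families = random_sibling_subsample(fid, 2, rng)
```

Passing an explicit random.Random instance keeps the selection reproducible, which is convenient when comparing simulation runs.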
- class xftsim.index.SampleIndex(iid=None, fid=None, sex=None, frame=None, n=None, generation=0)
Bases:
XftIndex
Index for individual samples.
This class is used to keep track of information for individual samples.
- Parameters:
iid (Iterable, optional) – Iterable of individual IDs.
fid (Iterable, optional) – Iterable of family IDs.
sex (Iterable, optional) – Iterable of biological sexes.
frame (pd.DataFrame, optional) – Dataframe containing information for each sample.
n (int, optional) – Number of samples to generate a random ID set for.
generation (int, optional) – Generation number for samples.
- n
Number of individuals.
- Type:
int
- n_fam
Number of families.
- Type:
int
- n_female
Number of biological females.
- Type:
int
- n_male
Number of biological males.
- Type:
int
- iid
Array of individual IDs.
- Type:
ndarray
- fid
Array of family IDs.
- Type:
ndarray
- sex
Array of biological sexes.
- Type:
ndarray
- property fid
- property iid
- iloc(key)
- property n
- property n_fam
- property n_female
- property n_male
- property sex
- property unique_identifier
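The count attributes n, n_fam, n_female, and n_male follow directly from the iid, fid, and sex arrays. A minimal sketch of that bookkeeping is below. It assumes sex is coded 0 for female and 1 for male, which the summary above does not state, and summarize_samples is a hypothetical helper rather than part of xftsim.

```python
def summarize_samples(iid, fid, sex):
    """Derive the SampleIndex-style counts from parallel ID arrays.

    Assumption: sex is coded 0 = female, 1 = male.
    """
    if not (len(iid) == len(fid) == len(sex)):
        raise ValueError("iid, fid, and sex must have the same length")
    return {
        "n": len(iid),
        "n_fam": len(set(fid)),
        "n_female": sum(1 for s in sex if s == 0),
        "n_male": sum(1 for s in sex if s == 1),
    }


summary = summarize_samples(
    iid=["i1", "i2", "i3", "i4", "i5"],
    fid=["f1", "f1", "f2", "f2", "f3"],
    sex=[0, 1, 0, 0, 1],
)
```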
- class xftsim.index.SiblingPairFilter(k=None)
Bases:
SampleFilter
Subsample 2 siblings each from k families with at least two siblings
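The sibling-pair selection can be sketched in the same spirit: keep only families with at least two members, choose k of them, and draw two members from each. The helper below is hypothetical and independent of xftsim. Treating k=None as "use every eligible family" is an assumption about the default.

```python
import random
from collections import defaultdict


def sibling_pairs(fid, k=None, rng=None):
    """Return {family ID: [pos_a, pos_b]} for k families with >= 2 members.

    Assumption: k=None means all eligible families are used.
    """
    rng = rng or random.Random()
    members = defaultdict(list)
    for position, family in enumerate(fid):
        members[family].append(position)
    eligible = sorted(f for f, pos in members.items() if len(pos) >= 2)
    # random.sample raises ValueError if k exceeds the number of eligible families.
    chosen = eligible if k is None else rng.sample(eligible, k)
    return {f: sorted(rng.sample(members[f], 2)) for f in chosen}


fid = ["f1", "f1", "f2", "f3", "f3", "f3"]
pairs = sibling_pairs(fid, k=2, rng=random.Random(1))
```

Family f2 has a single member, so it can never appear in the result.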
- class xftsim.index.XftIndex
Bases:
object
XftIndex is a class representing an index for the XftSim simulation model. It is a base class and is not intended for direct use.
Attributes:
- _coord_variables: List[str]
List of names of the coordinate variables.
- _index_variables: List[str]
List of names of the index variables.
- _dimension: str
Name of the dimension variable.
- _frame: pandas.DataFrame
Dataframe representing the index.
Methods:
- validate():
Validates the index by checking if the _coord_variables, _index_variables, and _dimension attributes are not None. Raises an AssertionError if any of these attributes is None.
- frame:
Property representing the _frame attribute. Getter: Returns the _frame attribute. Setter: Sets the _frame attribute and generates a new index using the unique_identifier property.
- frame_copy():
Returns a copy of the _frame attribute.
- unique_identifier:
Property representing the unique identifier of the index. Returns a string representing the concatenation of all index variables, separated by a period.
- coords:
Property representing the coordinates of the index. Returns a dictionary where the keys are the coordinate variables and the values are the corresponding values in the _frame attribute.
- coord_dict:
Property representing the coordinate dictionary of the index. Returns a dictionary where the keys are the variables and the values are tuples representing the (dimension, value) of each coordinate.
- coord_frame:
Property representing the coordinate frame of the index. Returns a dataframe where the columns are the coordinate variables and the rows correspond to each row in the _frame attribute.
- coord_mindex:
Property representing the coordinate multi-index of the index. Returns a multi-index where the levels correspond to the coordinate variables and the values correspond to the corresponding values in the _frame attribute.
- coord_index:
Property representing the coordinate index of the index. Returns an index representing the unique identifier of the index.
- __getitem__(arg):
Returns a new instance of the XftIndex class, corresponding to a subset of the _frame attribute. If arg is a dictionary, returns the rows where the values of the keys in the dictionary match the corresponding values in the _frame attribute. If arg is an integer or slice, returns the row(s) at the corresponding index in the _frame attribute.
- iloc(key):
Returns a new instance of the XftIndex class, corresponding to a subset of the _frame attribute. Returns the row(s) at the corresponding index in the _frame attribute.
- merge(other):
Merges the _frame attribute of the current instance with another instance of the XftIndex class. If the two instances have a different _dimension attribute or a different class type, raises a TypeError. Returns a new instance of the XftIndex class representing the merged index.
- reduce_merge(args):
Static method that reduces the list of args by calling the merge method on each pair of consecutive elements. Returns the final merged index.
- stack(other):
Stacks the _frame attribute of the current instance with another instance of the XftIndex class. If the two instances have a different _dimension attribute or a different class type, raises a TypeError. Returns a new instance of the XftIndex class representing the stacked index.
- at_most(n_new):
Downsamples the _frame attribute at random to contain at most n_new rows. If the number of rows in the _frame attribute is already less than or equal to n_new, returns a copy of the current instance. Returns a new instance of the XftIndex class representing the downsampled index.
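Two of the behaviours listed above are easy to show on toy data: building unique_identifier by joining the index variables with a period, and dictionary-based selection as described for __getitem__. The sketch uses plain dictionaries in place of the _frame DataFrame. The helper names are hypothetical, and matching on scalar equality only is an assumption about how dictionary keys are compared.

```python
def unique_identifier(rows, index_variables):
    """Join each row's index variables with '.' to form its identifier."""
    return [".".join(str(row[v]) for v in index_variables) for row in rows]


def select(rows, query):
    """Keep rows whose values equal every key/value pair in `query`."""
    return [row for row in rows if all(row[k] == v for k, v in query.items())]


rows = [
    {"phenotype_name": "height", "component_name": "additive", "vorigin_relative": -1},
    {"phenotype_name": "height", "component_name": "noise", "vorigin_relative": -1},
    {"phenotype_name": "bmi", "component_name": "additive", "vorigin_relative": 0},
]
index_variables = ["phenotype_name", "component_name", "vorigin_relative"]

ids = unique_identifier(rows, index_variables)
height_rows = select(rows, {"phenotype_name": "height"})
```

Here ids[0] is "height.additive.-1", and height_rows holds the first two rows.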
- at_most(n_new)
- property coord_dict
- property coord_frame
- property coord_index
- property coord_mindex
- property coords
- property frame
- frame_copy()
- iloc(key)
- merge(other, deduplicate=True)
- static reduce_merge(args, deduplicate=True)
- stack(other)
- property unique_identifier
- validate()
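The combining and downsampling methods can be pictured on lists of unique identifiers. The sketch below is a rough model, not the library's code. It assumes merge concatenates two indexes and, when deduplicate is true, keeps only the first occurrence of each identifier. It models reduce_merge as a left fold over merge, and at_most as sampling without replacement while preserving the original order. The dimension and class checks that raise TypeError are omitted.

```python
import random
from functools import reduce


def merge(left, right, deduplicate=True):
    """Concatenate two identifier lists, optionally dropping repeats."""
    combined = list(left) + list(right)
    if not deduplicate:
        return combined
    # dict.fromkeys keeps the first occurrence of each identifier, in order.
    return list(dict.fromkeys(combined))


def reduce_merge(indexes, deduplicate=True):
    """Fold merge over a sequence of identifier lists, left to right."""
    return reduce(lambda a, b: merge(a, b, deduplicate), indexes)


def at_most(ids, n_new, rng):
    """Randomly keep at most n_new identifiers; copy if already small enough."""
    if len(ids) <= n_new:
        return list(ids)
    keep = sorted(rng.sample(range(len(ids)), n_new))
    return [ids[i] for i in keep]


parts = [["a", "b"], ["b", "c"], ["c", "d"]]
merged = reduce_merge(parts)
stacked = reduce_merge(parts, deduplicate=False)
downsampled = at_most(merged, 2, random.Random(0))
```

With these inputs, merged is ["a", "b", "c", "d"], while stacked keeps the repeated "b" and "c".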
- xftsim.index.sampleIndex_from_VCF()
- xftsim.index.sampleIndex_from_plink()
- xftsim.index.variantIndex_from_VCF()
- xftsim.index.variantIndex_from_plink()