struct
Below is an auto-generated summary of the xftsim.struct submodule API.
- class xftsim.struct.GeneticMap(chrom, pos_bp, pos_cM)[source]
Bases: object
Map between physical and genetic distances.
- Parameters:
chrom (Iterable) – Chromosomes the variants are located on
pos_bp (Iterable) – Physical positions of variants
pos_cM (Iterable) – Map distances in cM
- frame
Pandas DataFrame with the above columns
- Type:
pd.DataFrame
- chroms
Unique chromosomes present in map
- Type:
np.ndarray
- classmethod from_pyrho_maps(paths, sep='\\t', **kwargs)[source]
Construct genetic map objects from the maps provided at https://github.com/popgenmethods/pyrho. Please cite their work if you use their maps.
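As a rough illustration of what a map between physical and genetic distances holds, the sketch below builds a pandas DataFrame with the three documented columns and interpolates a cM position for an arbitrary bp position within one chromosome. The helper bp_to_cM and the toy values are hypothetical and are not part of xftsim.

```python
import numpy as np
import pandas as pd

# Toy map with the three documented columns (values are made up).
frame = pd.DataFrame({
    "chrom": ["1", "1", "1", "2", "2"],
    "pos_bp": [1_000, 5_000, 9_000, 2_000, 8_000],
    "pos_cM": [0.0, 0.4, 1.0, 0.0, 0.6],
})

def bp_to_cM(frame, chrom, pos_bp):
    """Hypothetical helper: linear interpolation of cM within one chromosome."""
    sub = frame[frame["chrom"] == chrom].sort_values("pos_bp")
    return np.interp(pos_bp, sub["pos_bp"].to_numpy(), sub["pos_cM"].to_numpy())

chroms = frame["chrom"].unique()       # unique chromosomes present in the map
mid_cM = bp_to_cM(frame, "1", 3_000)   # halfway between the first two markers
```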
- class xftsim.struct.SampleMeta(iid, fid=None, sex=None, generation=0, extra=<factory>)[source]
Bases: object
Immutable metadata for samples/individuals.
- Parameters:
iid (np.ndarray) – Individual IDs (required).
fid (np.ndarray, optional) – Family IDs. Defaults to iid if not provided.
sex (np.ndarray, optional) – Biological sex (0=female, 1=male). Defaults to alternating 0,1.
generation (int, optional) – Generation number. Default is 0.
extra (dict, optional) – Arbitrary metadata arrays (ancestry PCs, batch IDs, etc.).
- property unique_identifier: ndarray
Unique identifier for each sample, combining generation, iid, and fid. Format: ‘{generation}.{iid}.{fid}’
- with_generation(generation)[source]
Return a new SampleMeta with a different generation.
- Parameters:
generation (int)
- Return type:
SampleMeta
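The documented behaviour (immutability, the fid and sex defaults, the identifier format, and with_generation returning a new object) can be sketched with a frozen dataclass. SampleMetaSketch is a hypothetical stand-in written for illustration; it is not the xftsim implementation and may differ from it in detail.

```python
from dataclasses import dataclass, field, replace
import numpy as np

@dataclass(frozen=True)
class SampleMetaSketch:
    """Hypothetical stand-in for SampleMeta (illustration only)."""
    iid: np.ndarray
    fid: np.ndarray = None
    sex: np.ndarray = None
    generation: int = 0
    extra: dict = field(default_factory=dict)

    def __post_init__(self):
        # Frozen dataclasses need object.__setattr__ to fill in defaults.
        if self.fid is None:
            object.__setattr__(self, "fid", self.iid)  # fid defaults to iid
        if self.sex is None:
            object.__setattr__(self, "sex", np.arange(len(self.iid)) % 2)  # 0,1,0,1,...

    @property
    def unique_identifier(self):
        # '{generation}.{iid}.{fid}' for every sample
        return np.array([f"{self.generation}.{i}.{f}" for i, f in zip(self.iid, self.fid)])

    def with_generation(self, generation):
        # Returns a new object; the original is left untouched.
        return replace(self, generation=generation)

founders = SampleMetaSketch(iid=np.array(["a", "b", "c"]))
children = founders.with_generation(1)
```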
- class xftsim.struct.VariantMeta(vid, chrom=None, pos_bp=None, pos_cM=None, af=None, zero_allele=None, one_allele=None, extra=<factory>)[source]
Bases: object
Immutable metadata for genetic variants.
- Parameters:
vid (np.ndarray) – Variant IDs (required).
chrom (np.ndarray, optional) – Chromosome for each variant.
pos_bp (np.ndarray, optional) – Base pair position.
pos_cM (np.ndarray, optional) – Centimorgan position.
af (np.ndarray, optional) – Allele frequencies.
zero_allele (np.ndarray, optional) – Reference allele (e.g., ‘A’).
one_allele (np.ndarray, optional) – Alternate allele (e.g., ‘G’).
extra (dict, optional) – Arbitrary metadata arrays (annotation flags, etc.).
- to_variant_index(af=None)[source]
Convert to legacy DiploidVariantIndex for compatibility.
- Parameters:
af (np.ndarray, optional) – Allele frequencies. If not provided, uses stored af or NaN.
- Return type:
xft.index.DiploidVariantIndex
- Returns:
xft.index.DiploidVariantIndex – A DiploidVariantIndex with the same data.
- Parameters:
af (ndarray)
- class xftsim.struct.HaplotypeOperator[source]
Bases: ABC
Abstract base class for all genotype representations.
Defines the interface for matrix-vector operations on haplotype data, used by the architecture’s genetic components. All implementations must provide samples (SampleMeta) and variants (VariantMeta) attributes.
Concrete implementations:
DenseHaplotypeArray – NumPy-backed (n, m, 2) array
GraphHaplotypeOperator – GRG wrapper (pygrgl graph traversal)
- samples
Sample metadata (set by concrete implementations).
- Type:
SampleMeta
- variants
Variant metadata (set by concrete implementations).
- Type:
VariantMeta
- abstract standardized_matvec(v, af=None)[source]
Per-SNP standardized diploid matvec.
Computes ((G - 2p) / sqrt(2p(1-p))) @ v where p is the allele frequency vector. Each column is centered AND scaled to unit variance under HWE.
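A minimal NumPy sketch of this formula on a toy diploid matrix, assuming empirical allele frequencies and no monomorphic columns. It forms the standardized matrix explicitly, which concrete operators may avoid; the variable names are illustrative only.

```python
import numpy as np

# Toy diploid genotype counts, shape (n=4 samples, m=3 variants).
G = np.array([[0, 1, 2],
              [1, 0, 0],
              [2, 0, 1],
              [1, 1, 1]], dtype=float)

p = G.mean(axis=0) / 2                       # empirical allele frequencies
Z = (G - 2 * p) / np.sqrt(2 * p * (1 - p))   # center, then scale each column

v = np.array([0.3, -1.2, 0.7])               # one weight per variant
result = Z @ v                               # one value per sample
```

With empirical frequencies every column of Z has mean zero, so result also sums to zero across samples.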
- abstract recompute_af()[source]
Recompute empirical allele frequencies from current data.
- Return type:
np.ndarray
- Returns:
np.ndarray – Allele frequencies of shape (m,).
- abstract to_dense()[source]
Materialize as a DenseHaplotypeArray.
- Return type:
DenseHaplotypeArray
- Returns:
DenseHaplotypeArray – Dense representation of the haplotype data.
- abstract meiosis(assignment, recombination_map, rng=None)[source]
Perform meiosis to produce offspring haplotypes.
- Parameters:
assignment (MateAssignment) – Mate assignment with maternal/paternal indices and offspring metadata.
recombination_map (RecombinationMap) – Recombination probabilities between loci.
rng (numpy.random.RandomState, optional) – Master RNG used to derive per-offspring crossover seeds. Pass self.rng from the simulation loop to keep crossover sampling tied to the simulation’s seed; None falls back to a fresh RandomState (non-deterministic).
- Return type:
HaplotypeOperator
- Returns:
HaplotypeOperator – Offspring haplotypes.
- class xftsim.struct.DenseHaplotypeArray(genotypes, generation=0, samples=None, variants=None)[source]
Bases: HaplotypeOperator
Dense NumPy-backed haplotype array implementing both the old HaplotypeArray interface and the new HaplotypeOperator ABC.
Stores haplotypes as a 3D array with shape (n_samples, m_variants, 2), where the last dimension represents the two haplotype copies. Convention: genotypes[:,:,0] = maternal, genotypes[:,:,1] = paternal.
- Parameters:
genotypes (np.ndarray) – 3D array of shape (n, m, 2) containing haplotype data.
generation (int, optional) – Generation number. Default is 0.
samples (SampleMeta, optional) – Sample metadata (iid, fid, sex).
variants (VariantMeta, optional) – Variant metadata (vid, chrom, pos_bp, pos_cM, af, alleles).
- Parameters:
genotypes (Int8))
generation (int)
samples (Optional[SampleMeta])
variants (Optional[VariantMeta])
- property diploid_genotypes: ndarray
Return diploid genotype counts (0, 1, or 2) as 2D array (n, m).
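For illustration, the relationship between the (n, m, 2) haplotype layout and the 2D diploid counts can be sketched in plain NumPy. This assumes the counts are simply the sum of the two haplotype copies, which is consistent with the 0/1/2 values described but is not taken from the xftsim source.

```python
import numpy as np

# Two samples, three variants, two haplotype copies each: shape (2, 3, 2).
genotypes = np.array([
    [[0, 1], [1, 1], [0, 0]],
    [[1, 0], [0, 0], [1, 1]],
], dtype=np.int8)

maternal = genotypes[:, :, 0]      # documented convention: copy 0 is maternal
paternal = genotypes[:, :, 1]      # copy 1 is paternal
diploid = genotypes.sum(axis=2)    # allele counts in {0, 1, 2}, shape (n, m)
af = diploid.mean(axis=0) / 2      # empirical allele frequency per variant
```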
- standardized_matvec(v, af=None)[source]
Per-SNP standardized matvec: ((G - 2p) / sqrt(2p(1-p))) @ v.
- standardized_rmatvec(v, af=None)[source]
Per-SNP standardized rmatvec: ((G - 2p) / sqrt(2p(1-p))).T @ v.
- standardized_haploid_matvec(u, haploid)[source]
Standardized matvec for one haplotype (0 or 1): center & scale each variant column, then multiply by u.
- subset(sample_idx=None, variant_idx=None, copy=True)[source]
Return a new DenseHaplotypeArray with a subset of samples and/or variants.
- Return type:
DenseHaplotypeArray
- Returns:
DenseHaplotypeArray – Subsetted haplotype array.
- Parameters:
copy (bool)
- drop_isel(sample=None, variant=None)[source]
Drop samples or variants by index (xarray-style compatibility).
- Parameters:
sample (array-like, optional) – Indices of samples to drop.
variant (array-like, optional) – Indices of variants to drop.
- Return type:
DenseHaplotypeArray
- Returns:
DenseHaplotypeArray – Haplotype array with specified samples/variants removed.
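Keeping a subset and dropping the complement are two views of the same operation on the underlying array. The sketch below shows this on a raw (n, m, 2) NumPy array; it ignores the sample and variant metadata that the real methods also carry along, and it is not the library's code.

```python
import numpy as np

genotypes = np.arange(4 * 3 * 2).reshape(4, 3, 2)   # distinct values so rows are traceable

# Keep samples 0 and 2 and variant 1 (what subset-style indexing would select).
kept = genotypes[np.ix_([0, 2], [1])]

# Drop samples 1 and 3 and variants 0 and 2 (what drop-style indexing would remove).
dropped = np.delete(np.delete(genotypes, [1, 3], axis=0), [0, 2], axis=1)
```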
- property shape: Tuple[int, int]
Shape as (n_samples, 2*m_variants) for compatibility with 2D expectations.
- to_diploid_standardized(af=None, scale=False)[source]
Return standardized diploid genotypes.
- Parameters:
af (np.ndarray, optional) – Allele frequencies to use for standardization. If None, uses empirical frequencies.
scale (bool, optional) – If True, scale by sqrt(2*p*(1-p)). Default False.
- Return type:
np.ndarray
- Returns:
np.ndarray – Standardized diploid genotypes (n, m).
- get_sample_indexer()[source]
Create a SampleIndex from this haplotype array.
Deprecated: use the samples attribute directly instead.
- Return type:
xft.index.SampleIndex
- Returns:
xft.index.SampleIndex – Sample indexer with data from this array.
- get_variant_indexer()[source]
Create a DiploidVariantIndex from this haplotype array.
Deprecated: use the variants attribute directly instead.
- Return type:
xft.index.DiploidVariantIndex
- Returns:
xft.index.DiploidVariantIndex – Variant indexer with data from this array.
- meiosis(assignment, recombination_map, rng=None)[source]
Perform meiosis to produce offspring haplotypes.
Delegates to the existing numba-jitted _meiosis_3d kernel in reproduce.py.
- Parameters:
assignment (MateAssignment) – Mate assignment with maternal/paternal indices and offspring metadata.
recombination_map (RecombinationMap) – Recombination probabilities between loci.
rng (numpy.random.RandomState, optional) – Master RNG; forwarded to reproduce.meiosis for per-offspring crossover seeding. None preserves the prior non-deterministic behavior.
- Return type:
DenseHaplotypeArray
- Returns:
DenseHaplotypeArray – Offspring haplotypes with inherited VariantMeta.
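To make the idea of per-locus recombination probabilities concrete, here is a simplified single-offspring sketch in NumPy. It is not the _meiosis_3d kernel: the function gamete, the rule that a crossover drawn at a locus switches the transmitted copy from that locus onward, and the toy probabilities are all assumptions made for illustration.

```python
import numpy as np

def gamete(parent_haps, crossover_prob, rng):
    """Hypothetical sketch: one gamete from a parent's (m, 2) haplotypes."""
    m = parent_haps.shape[0]
    crossovers = rng.random_sample(m) < crossover_prob   # Bernoulli draw per locus
    start = rng.randint(2)                                # copy transmitted before any switch
    phase = (start + np.cumsum(crossovers)) % 2           # copy transmitted at each locus
    return parent_haps[np.arange(m), phase]

m = 5
mother = np.tile([0, 1], (m, 1))   # copy 0 is all 0s, copy 1 is all 1s
father = np.tile([1, 0], (m, 1))   # copy 0 is all 1s, copy 1 is all 0s
prob = np.array([0.5, 0.1, 0.1, 0.1, 0.1])   # toy per-locus probabilities

rng = np.random.RandomState(7)
# Column 0 holds the maternal gamete, column 1 the paternal one.
child = np.stack([gamete(mother, prob, rng), gamete(father, prob, rng)], axis=1)

# With zero crossover probability a gamete is an exact copy of one parental haplotype.
no_crossover = gamete(mother, np.zeros(m), np.random.RandomState(0))
```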
- property xft: HaplotypeArrayAccessor
Return accessor object for compatibility with xarray .xft interface.
- class xftsim.struct.GraphHaplotypeOperator(grg, generation=0, samples=None, variants=None)[source]
Bases: HaplotypeOperator
GRG-backed haplotype operator using pygrgl graph traversal.
Provides O(nodes)-per-variant matvec without materializing the full genotype matrix. After meiosis, offspring revert to DenseHaplotypeArray since GRG has no native recombination support.
- Parameters:
grg (pygrgl.GRG) – Loaded GRG object (via pygrgl.load_immutable_grg).
generation (int) – Generation number (default 0).
samples (SampleMeta, optional) – Sample metadata. If None, extracted from GRG individual IDs.
variants (VariantMeta, optional) – Variant metadata. If None, extracted from GRG mutation data.
- Parameters:
generation (int)
samples (Optional[SampleMeta])
variants (Optional[VariantMeta])
- standardized_matvec(v, af=None)[source]
Per-SNP standardized diploid matvec (no materialization).
Computes ((G - 2p) / sqrt(2pq)) @ v = (G - 2p) @ (v / sqrt(2pq)).
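The identity above means only a raw product with G is needed, plus a scalar correction, so the standardized matrix never has to be formed. A small NumPy check of that algebra follows; a dense G stands in for the graph here, which is an assumption for illustration since the real operator obtains G @ w by traversal.

```python
import numpy as np

G = np.array([[0, 1, 2],
              [1, 0, 0],
              [2, 0, 1],
              [1, 1, 1]], dtype=float)   # dense stand-in for the graph-backed genotypes
p = G.mean(axis=0) / 2
q = 1 - p
v = np.array([0.3, -1.2, 0.7])

w = v / np.sqrt(2 * p * q)               # fold the scaling into the vector
lazy = G @ w - 2 * np.dot(p, w)          # raw matvec minus a scalar centering term

dense = ((G - 2 * p) / np.sqrt(2 * p * q)) @ v   # reference: explicit standardization
```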
- recompute_af()[source]
Compute allele frequencies via GRG UP traversal: G.T @ 1 / (2n).
- Return type:
np.ndarray
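The expression G.T @ 1 / (2n) is the column sum of allele counts divided by the number of haplotypes. A dense NumPy equivalent, shown only to illustrate the arithmetic (the real method uses graph traversal rather than a dense matrix):

```python
import numpy as np

G = np.array([[0, 1, 2],
              [1, 0, 0],
              [2, 0, 1],
              [1, 1, 1]], dtype=float)   # diploid counts, n=4 samples, m=3 variants
n = G.shape[0]

af = G.T @ np.ones(n) / (2 * n)          # allele count per variant over 2n haplotypes
```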
- meiosis(assignment, recombination_map, rng=None)[source]
Perform meiosis natively on the GRG via the bubble-insertion algorithm.
The underlying pygrgl.MutableGRG is mutated in place: offspring sample nodes and bubble nodes are added, then set_samples demotes the parent generation’s samples to internal nodes. The returned operator wraps the same GRG with the offspring SampleMeta; the original parent operator becomes stale.
Crossover positions are sampled per locus from recombination_map._probabilities (matching the dense _meiosis_3d kernel’s distribution), then translated into bp-space segments using self.variants.pos_bp.
- Parameters:
assignment (MateAssignment) – Maternal/paternal indices into the parent generation and offspring SampleMeta.
recombination_map (RecombinationMap) – Per-locus recombination probabilities.
rng (numpy.random.RandomState, optional) – Master RNG. Used to derive one independent seed per offspring via SeedSequence.spawn; each seed is applied via np.random.seed immediately before that offspring’s two _meiosis_i phase draws, so crossover sampling is deterministic given rng’s state. This matches the seed-derivation strategy used by the dense kernel: both paths consume one rng.randint draw before spawning.
- Return type:
GraphHaplotypeOperator
- Returns:
GraphHaplotypeOperator – Offspring operator wrapping the (mutated) same GRG.
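The seed-derivation strategy described for rng can be sketched as follows: one integer is drawn from the master RandomState, used as SeedSequence entropy, and spawned into one child per offspring. The function name offspring_seeds, the randint bounds, and the use of generate_state to obtain a 32-bit seed are assumptions for illustration, not the library's exact code.

```python
import numpy as np

def offspring_seeds(rng, n_offspring):
    """Hypothetical sketch of per-offspring seed derivation."""
    entropy = int(rng.randint(0, 2**31 - 1))              # the single draw from the master RNG
    children = np.random.SeedSequence(entropy).spawn(n_offspring)
    # One 32-bit integer per offspring, in the range accepted by np.random.seed.
    return [int(child.generate_state(1)[0]) for child in children]

seeds_a = offspring_seeds(np.random.RandomState(123), 4)
seeds_b = offspring_seeds(np.random.RandomState(123), 4)   # same master state, same seeds
```

Because the seeds depend only on the master RNG's state, rerunning a simulation with the same seed reproduces the same crossover draws.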
- class xftsim.struct.StandardizedHaplotypeOperator(haplotypes, means=None, stds=None)[source]
Bases: HaplotypeOperator
Wraps a HaplotypeOperator so that matvec/rmatvec act on the column-standardized matrix S = (X - mu) / sigma without materializing S.
Identities used (mu, sigma broadcast across rows of X):
S @ u = H.matvec(u / sigma) - <mu, u / sigma>
S.T @ v = (H.rmatvec(v) - mu * sum(v)) / sigma
Defaults follow the HWE convention used elsewhere: mu = 2p, sigma = sqrt(2 p (1 - p)) where p comes from H.recompute_af(). Loci with sigma == 0 (monomorphic) keep sigma = 1 to avoid division by zero, matching the existing standardized_matvec implementations.
All other HaplotypeOperator methods (matvec_maternal, matvec_paternal, recompute_af, to_dense, meiosis, __getitem__) forward to the underlying operator and return raw (un-standardized) results. Re-wrap explicitly if you need standardized semantics on the result.
- Parameters:
haplotypes (HaplotypeOperator) – Underlying operator providing raw genotype matvec.
means (np.ndarray, optional) – Per-variant means (length m). Defaults to 2 * af.
stds (np.ndarray, optional) – Per-variant standard deviations (length m). Defaults to sqrt(2 * af * (1 - af)). Zeros are replaced with 1.
- Parameters:
haplotypes (HaplotypeOperator)
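The two identities can be checked numerically with a dense matrix X standing in for the wrapped operator, so that H.matvec(u) is X @ u and H.rmatvec(v) is X.T @ v. This is an illustrative check under that assumption, including one monomorphic column to exercise the sigma = 1 rule.

```python
import numpy as np

X = np.array([[0, 1, 2, 0],
              [1, 0, 0, 0],
              [2, 0, 1, 0],
              [1, 1, 1, 0]], dtype=float)   # last column is monomorphic

p = X.mean(axis=0) / 2
mu = 2 * p
sigma = np.sqrt(2 * p * (1 - p))
sigma[sigma == 0] = 1.0                     # avoid division by zero at monomorphic loci

u = np.array([0.5, -1.0, 2.0, 3.0])         # length m
v = np.array([1.0, 0.0, -2.0, 0.5])         # length n

Su = X @ (u / sigma) - np.dot(mu, u / sigma)    # S @ u without forming S
STv = (X.T @ v - mu * v.sum()) / sigma          # S.T @ v without forming S

S = (X - mu) / sigma                            # explicit reference matrix
```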
- property samples
- property variants
- standardized_matvec(v, af=None)[source]
Already standardized: equivalent to matvec.
The af argument is ignored; standardization parameters are fixed at construction time.
- recompute_af()[source]
Recompute empirical allele frequencies from current data.
- Return type:
np.ndarray
- Returns:
np.ndarray – Allele frequencies of shape (m,).
- to_dense()[source]
Materialize as a DenseHaplotypeArray.
- Return type:
DenseHaplotypeArray
- Returns:
DenseHaplotypeArray – Dense representation of the haplotype data.
- meiosis(assignment, recombination_map, rng=None)[source]
Perform meiosis to produce offspring haplotypes.
- Parameters:
assignment (MateAssignment) – Mate assignment with maternal/paternal indices and offspring metadata.
recombination_map (RecombinationMap) – Recombination probabilities between loci.
rng (numpy.random.RandomState, optional) – Master RNG used to derive per-offspring crossover seeds. Pass self.rng from the simulation loop to keep crossover sampling tied to the simulation’s seed; None falls back to a fresh RandomState (non-deterministic).
- Return type:
HaplotypeOperator
- Returns:
HaplotypeOperator – Offspring haplotypes.
- class xftsim.struct.HaplotypeArrayAccessor(haplotypes)[source]
Bases: object
Accessor class that mimics the xarray .xft interface for DenseHaplotypeArray. Provides compatibility with code expecting xarray-style access.
- Parameters:
haplotypes (DenseHaplotypeArray)
- property samples: SampleMeta
Sample metadata.
- property variants: VariantMeta
Variant metadata.
- class xftsim.struct.PhenotypeArray(samples, values=None)[source]
Bases: object
Thin wrapper around a flat dict of named 1-D arrays.
Each key is a component/phenotype name (e.g. ‘height.G’, ‘height’). The dot is purely a human convention — not parsed.
- Parameters:
samples (SampleMeta) – Sample metadata that travels with the data.
values (dict, optional) – Initial name → (n,) array mapping.
- Parameters:
samples (SampleMeta)
- property keys
Return the names of all stored components.
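The "flat dict of named 1-D arrays" idea can be pictured with a plain Python dict. The sketch below uses only a dict and NumPy, not PhenotypeArray itself, and the component names are illustrative.

```python
import numpy as np

# One (n,) array per component name; n = 4 samples here.
values = {
    "height.G": np.array([0.2, -0.1, 0.5, 0.0]),
    "height.E": np.array([0.1, 0.3, -0.2, 0.4]),
}
values["height"] = values["height.G"] + values["height.E"]

names = list(values.keys())
# The dot is only a naming convention, so grouping by prefix is done by hand.
height_parts = [name for name in values if name.startswith("height.")]
```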
- class xftsim.struct.PedigreeArray(offspring_samples, maternal_idx, paternal_idx, parent_n)[source]
Bases: object
Integer index arrays linking offspring to parents.
Produced at reproduction time; consumed by parent/mother/father references and by filters (TrioFilter, SibPairFilter).
- Parameters:
offspring_samples (SampleMeta) – Metadata for the offspring generation.
maternal_idx (np.ndarray) – (n,) indices into the parent generation’s SampleMeta for each offspring’s mother.
paternal_idx (np.ndarray) – (n,) indices into the parent generation’s SampleMeta for each offspring’s father.
parent_n (int) – Number of individuals in the parent generation (for bounds checking).
- Parameters:
offspring_samples (SampleMeta)
maternal_idx (ndarray)
paternal_idx (ndarray)
parent_n (int)
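As an illustration of how such index arrays are typically used, the sketch below bounds-checks them against parent_n, looks up parental values by fancy indexing, and identifies full siblings as offspring sharing both parents. It works on raw NumPy arrays only; the family-key encoding is a hypothetical convenience, not something PedigreeArray is documented to provide.

```python
import numpy as np

parent_n = 4
maternal_idx = np.array([0, 0, 2, 2, 0])   # mother of each of 5 offspring
paternal_idx = np.array([1, 1, 3, 3, 3])   # father of each of 5 offspring

# Bounds check: every index must refer to an individual in the parent generation.
in_bounds = all(
    idx.min() >= 0 and idx.max() < parent_n for idx in (maternal_idx, paternal_idx)
)

# Look up a parental quantity for each offspring by fancy indexing.
parent_trait = np.array([1.0, 2.0, 3.0, 4.0])
midparent = (parent_trait[maternal_idx] + parent_trait[paternal_idx]) / 2

# Hypothetical family key: unique per (mother, father) pair.
family_key = maternal_idx * parent_n + paternal_idx
full_sibs_of_first = np.flatnonzero(family_key == family_key[0])
```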
- offspring_samples: SampleMeta