founders

Below is an auto-generated summary of the xftsim.founders submodule API.

xftsim.founders.founder_haplotypes_from_AFs(n, afs, diploid=True, generation=0)[source]

Generate founder haplotypes from specified allele frequencies.

Parameters:
  • n (int) – Number of individuals to simulate.

  • afs (Iterable) – Allele frequencies as an iterable of floats (one per variant).

  • diploid (bool, optional) – Whether to generate diploid (True) or haploid (False) haplotypes. Default is True.

  • generation (int, optional) – Generation number for the founders. Default is 0.

Return type:

DenseHaplotypeArray

Returns:

xft.struct.DenseHaplotypeArray – An object representing a set of haplotypes generated from the given allele frequencies.
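As a rough illustration of what generating haplotypes from allele frequencies means, the sketch below draws each allele independently as a Bernoulli trial with the variant's frequency. It is a standard-library toy, not the xftsim implementation: the helper name `sample_haplotypes_from_afs`, the `seed` argument, and the nested-list layout (individual, then haplotype copy, then variant) are assumptions made for illustration, and the real function returns a DenseHaplotypeArray whose internal layout may differ.

```python
import random

def sample_haplotypes_from_afs(n, afs, diploid=True, seed=None):
    """Hypothetical helper: draw each allele independently as Bernoulli(af)."""
    rng = random.Random(seed)
    afs = list(afs)
    copies = 2 if diploid else 1  # haplotype copies per individual
    return [
        [[1 if rng.random() < af else 0 for af in afs] for _ in range(copies)]
        for _ in range(n)
    ]

# Four diploid individuals at three variants with the given frequencies.
haps = sample_haplotypes_from_afs(n=4, afs=[0.1, 0.5, 0.9], seed=0)
```

Indexing `haps[i][c][j]` gives the allele (0 or 1) carried by individual `i` on copy `c` at variant `j`.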

xftsim.founders.founder_haplotypes_uniform_AFs(n, m, minMAF=0.1, generation=0)[source]

Generate founder haplotypes from uniform-distributed allele frequencies.

Parameters:
  • n (int) – Number of individuals to simulate.

  • m (int) – Number of variants.

  • minMAF (float, optional) – Minimum minor allele frequency for generated haplotypes. Default is 0.1.

  • generation (int, optional) – Generation number for the founders. Default is 0.

Return type:

DenseHaplotypeArray

Returns:

xft.struct.DenseHaplotypeArray – An object representing a set of haplotypes generated with uniform allele frequencies.
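The effect of minMAF can be pictured with a small sketch. It assumes that allele frequencies are drawn uniformly from the interval [minMAF, 1 - minMAF], which is one natural reading of the description above but is not confirmed by this page; the helper name `draw_uniform_afs` and its `seed` argument are hypothetical.

```python
import random

def draw_uniform_afs(m, minMAF=0.1, seed=None):
    """Hypothetical helper: m allele frequencies, uniform on [minMAF, 1 - minMAF]."""
    rng = random.Random(seed)
    return [rng.uniform(minMAF, 1.0 - minMAF) for _ in range(m)]

afs = draw_uniform_afs(m=5, minMAF=0.1, seed=1)

# The minor allele frequency is the smaller of af and 1 - af,
# so under this assumption it never falls below minMAF.
minor_afs = [min(af, 1.0 - af) for af in afs]
```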

xftsim.founders.founder_haplotypes_from_sgkit_dataset(gdat, generation=0)[source]

Construct a founder haplotype array from an sgkit Dataset. Useful in conjunction with sgkit.io.vcf.vcf_to_zarr() and sgkit.load_dataset().

Parameters:
  • gdat (xr.Dataset) – Dataset generated by sgkit.load_dataset()

  • generation (int, optional) – Generation number for the founders. Default is 0.

Return type:

DenseHaplotypeArray

Returns:

xft.struct.DenseHaplotypeArray – Array of founder haplotypes.

Reads in PLINK 1 binary genotype data and returns a DenseHaplotypeArray object containing pseudo-haplotypes by randomly assigning haplotypes at heterozygous sites.

Parameters:
  • path (str) – The file path to the PLINK 1 binary genotype data.

  • generation (int, optional) – Generation number for the founders. Default is 0.

Return type:

DenseHaplotypeArray

Returns:

xft.struct.DenseHaplotypeArray – Founder pseudo-haplotype array. The “pseudo-” prefix reflects the fact that the PLINK bfile format does not track phase.
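The random assignment at heterozygous sites can be sketched as follows. This is a standard-library illustration of the idea, not the library's code: the helper name `pseudo_phase` is hypothetical, genotypes are assumed to be alternate-allele counts (0, 1, or 2), and missing calls are not handled.

```python
import random

def pseudo_phase(genotypes, seed=None):
    """Hypothetical helper: split unphased allele counts into two pseudo-haplotypes."""
    rng = random.Random(seed)
    hap_a, hap_b = [], []
    for g in genotypes:
        if g == 1:
            # Heterozygous: phase is unknown, so choose at random
            # which copy carries the alternate allele.
            a = rng.randint(0, 1)
            b = 1 - a
        else:
            # Homozygous (0 or 2): both copies are identical.
            a = b = g // 2
        hap_a.append(a)
        hap_b.append(b)
    return hap_a, hap_b

genotypes = [0, 1, 2, 1, 0]
hap_a, hap_b = pseudo_phase(genotypes, seed=42)
```

Whatever the random draws, the two pseudo-haplotypes always sum back to the original genotype counts; only the phase at heterozygous sites is arbitrary.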

xftsim.founders.founder_haplotypes_from_msprime_grg(n, sequence_length, Ne=10000, recombination_rate=1e-08, mutation_rate=1e-08, generation=0, *, binary_muts=False, use_node_times=False, no_simplify=False, maintain_topo=False, ts_coals=False)[source]

Generate founder haplotypes using msprime and return them as a GraphHaplotypeOperator.

This function simulates ancestry and mutations for a population of size n over a sequence of length sequence_length, then converts the resulting TreeSequence into a Genotype Representation Graph (GRG) via the grgl CLI.

Parameters:
  • n (int) – Number of diploid individuals to simulate.

  • sequence_length (int) – The length of the genomic region to simulate (in base pairs).

  • Ne (float, optional) – Effective population size. Default is 10000.

  • recombination_rate (float, optional) – Recombination rate per base pair per generation. Default is 1e-8.

  • mutation_rate (float, optional) – Mutation rate per base pair per generation. Default is 1e-8.

  • generation (int, optional) – Generation number for the founders. Default is 0.

  • binary_muts (bool, optional) – Flag to pass --binary-muts to grgl.

  • use_node_times (bool, optional) – Flag to pass --ts-node-times to grgl.

  • no_simplify (bool, optional) – Flag to pass --no-simplify to grgl.

  • maintain_topo (bool, optional) – Flag to pass --maintain-topo to grgl.

  • ts_coals (bool, optional) – Flag to pass --ts-coals to grgl to calculate diploid coalescence information.

Return type:

GraphHaplotypeOperator

Returns:

xft.struct.GraphHaplotypeOperator – The operator containing the simulated founder graph and metadata.
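The keyword-only booleans map one-to-one onto grgl command-line flags. The sketch below collects the flags enabled by a given set of keyword arguments. Only the keyword-to-flag pairs come from the parameter list above; the helper name `grgl_flags` is hypothetical, and how xftsim actually assembles and invokes the grgl command is not shown on this page.

```python
def grgl_flags(binary_muts=False, use_node_times=False, no_simplify=False,
               maintain_topo=False, ts_coals=False):
    """Hypothetical helper: list the grgl flags implied by the keyword arguments."""
    pairs = [
        (binary_muts, "--binary-muts"),
        (use_node_times, "--ts-node-times"),
        (no_simplify, "--no-simplify"),
        (maintain_topo, "--maintain-topo"),
        (ts_coals, "--ts-coals"),
    ]
    # Keep only the flags whose keyword argument is True, in a fixed order.
    return [flag for enabled, flag in pairs if enabled]

flags = grgl_flags(binary_muts=True, ts_coals=True)
```

With all arguments left at their defaults, no extra flags are produced.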

xftsim.founders.founder_haplotypes_from_stdpopsim_grg(samples, model_id='OutOfAfrica_3G09', chromosome='chr22', species_id='HomSap', genetic_map=None, left=None, right=None, mutation_rate=None, engine_name='msprime', generation=0, *, binary_muts=False, use_node_times=False, no_simplify=False, maintain_topo=False, ts_coals=False)[source]

Generate founder haplotypes from a stdpopsim demographic model and return them as a GraphHaplotypeOperator.

Simulates a TreeSequence using a published stdpopsim demographic model (e.g. HomSap / OutOfAfrica_3G09) and converts the result to a Genotype Representation Graph (GRG) via the grgl CLI.

Parameters:
  • samples (Dict[str, int]) – Mapping of stdpopsim population name to number of diploid individuals to draw from that population (e.g. {"YRI": 100, "CEU": 100, "CHB": 100}). The available population names depend on the chosen demographic model.

  • model_id (str, optional) – Identifier of the stdpopsim demographic model. Default is "OutOfAfrica_3G09".

  • chromosome (str, optional) – Chromosome identifier passed to species.get_contig. Default is "chr22".

  • species_id (str, optional) – Species identifier used by stdpopsim. Default is "HomSap" (Homo sapiens).

  • genetic_map (str or None, optional) – Optional stdpopsim genetic map identifier (e.g. "HapMapII_GRCh38"). If None, the contig uses a uniform recombination rate.

  • left (int or None, optional) – Left coordinate (in base pairs, 0-based inclusive) of a sub-region of the chromosome to simulate. If None, simulation starts at position 0. Use together with right to shorten simulations for faster tests.

  • right (int or None, optional) – Right coordinate (in base pairs, exclusive) of a sub-region of the chromosome to simulate. If None, simulation runs to the end of the contig.

  • mutation_rate (float or None, optional) – Override for the contig’s mutation rate. If None, defaults to the demographic model’s calibrated mutation rate when one is published (model.mutation_rate); otherwise stdpopsim’s species/contig default is used.

  • engine_name (str, optional) – stdpopsim simulation engine to use. Default is "msprime".

  • generation (int, optional) – Generation number for the founders. Default is 0.

  • binary_muts (bool, optional) – Flag to pass --binary-muts to grgl.

  • use_node_times (bool, optional) – Flag to pass --ts-node-times to grgl.

  • no_simplify (bool, optional) – Flag to pass --no-simplify to grgl.

  • maintain_topo (bool, optional) – Flag to pass --maintain-topo to grgl.

  • ts_coals (bool, optional) – Flag to pass --ts-coals to grgl to calculate diploid coalescence information.

Return type:

GraphHaplotypeOperator

Returns:

xft.struct.GraphHaplotypeOperator – The operator containing the simulated founder graph and metadata.
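A possible call might look like the sketch below. The keyword names and defaults are taken from the signature above, while the 1 Mb window given by `left` and `right` is an arbitrary illustrative choice and the wrapper name `simulate_founders` is hypothetical. The import is placed inside the wrapper so the snippet can be loaded without xftsim, stdpopsim, or grgl installed; calling the wrapper requires all of them.

```python
# Keyword arguments taken from the documented signature.
stdpopsim_kwargs = {
    "samples": {"YRI": 100, "CEU": 100, "CHB": 100},  # diploid individuals per population
    "model_id": "OutOfAfrica_3G09",
    "chromosome": "chr22",
    "left": 20_000_000,    # illustrative window start (bp, 0-based inclusive)
    "right": 21_000_000,   # illustrative window end (bp, exclusive)
    "ts_coals": True,
}

def simulate_founders(kwargs):
    """Hypothetical wrapper around the documented function."""
    import xftsim  # imported lazily; needs xftsim and its simulation dependencies
    return xftsim.founders.founder_haplotypes_from_stdpopsim_grg(**kwargs)

# Quick sanity checks on the request before running a simulation.
total_individuals = sum(stdpopsim_kwargs["samples"].values())
region_length_bp = stdpopsim_kwargs["right"] - stdpopsim_kwargs["left"]
```

Restricting the region with `left` and `right` keeps test runs short, as the parameter description suggests.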

Notes

Sample IIDs are prefixed with the stdpopsim population name (e.g. "YRI_0", "YRI_1", …). The full per-individual population label is also stored on samples.extra["population"] so it can be used as a grouping variable by xftsim.arch.GroupingComponent.

Variant positions and alleles are read directly from the GRG; pos_cM is computed by integrating the contig’s recombination map. vid is formatted as "{chromosome}:{pos_bp}:{ref}:{alt}".
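The naming conventions in these notes can be mimicked with a few lines of plain Python. The sketch makes three assumptions that this page does not confirm: the IID index restarts at 0 within each population, the chromosome string is used verbatim in vid, and the recombination rate is uniform, in which case integrating the map reduces to a product (1 Morgan = 100 cM). All helper names are hypothetical.

```python
def make_iids(samples):
    """Hypothetical helper: IIDs prefixed with the population name, e.g. "YRI_0"."""
    # Assumption: the index restarts at 0 within each population.
    return [f"{pop}_{i}" for pop, count in samples.items() for i in range(count)]

def make_vid(chromosome, pos_bp, ref, alt):
    """Format a variant ID as "{chromosome}:{pos_bp}:{ref}:{alt}"."""
    return f"{chromosome}:{pos_bp}:{ref}:{alt}"

def pos_cm_uniform(pos_bp, rate_per_bp):
    """Genetic position in cM under a constant per-bp, per-generation rate."""
    # Integrating a constant rate from 0 to pos_bp gives Morgans; scale to cM.
    return pos_bp * rate_per_bp * 100.0

iids = make_iids({"YRI": 2, "CEU": 1})
vid = make_vid("chr22", 12345, "A", "G")
cm = pos_cm_uniform(1_000_000, 1e-8)
```

With a non-uniform genetic map, the product in `pos_cm_uniform` would be replaced by a sum over the map's intervals up to the variant position.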
