io
Below is an auto-generated summary of the xftsim.io submodule API.
I/O functions for xftsim data structures.
Provides serialization (save/load) for haplotypes, phenotypes, effects, architectures, and full simulation checkpoints. Also provides import functions for PLINK and sgkit datasets, and GRG loading.
Public API
- save_haplotypes_npz / load_haplotypes_npz
Round-trip DenseHaplotypeArray to/from compressed .npz.
- save_phenotypes_npz / load_phenotypes_npz
Round-trip PhenotypeArray to/from compressed .npz.
- save_effects_npz / load_effects_npz
Round-trip any EffectSpec subclass to/from compressed .npz.
- save_architecture / load_architecture
Round-trip Architecture to/from a directory (JSON + .npz).
- save_simulation_checkpoint / load_simulation_checkpoint
Round-trip full simulation state to/from a directory.
- load_grg
Load a GRG file as a GraphHaplotypeOperator.
- read_plink1_as_pseudohaplotypes
Import PLINK 1 binary files as DenseHaplotypeArray.
- haplotypes_from_sgkit_dataset
Import sgkit Dataset as DenseHaplotypeArray.
- xftsim.io.genotypes_to_pseudo_haplotypes(genotypes)
Converts genotype data to pseudo-haplotype 3D array.
- xftsim.io.read_plink1_as_pseudohaplotypes(path, generation=0)
Reads in PLINK 1 binary genotype data and returns a DenseHaplotypeArray object containing pseudo-haplotypes by randomly assigning haplotypes at heterozygous sites.
- Returns:
xft.struct.DenseHaplotypeArray – Pseudo-haplotype array. The “pseudo-” prefix refers to the fact that the PLINK bfile format doesn’t track phase.
- Raises:
ValueError – If the specified file path does not exist or is not in the expected format.
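The random assignment at heterozygous sites can be sketched in plain Python. This is an illustration only, not xftsim's implementation: the function name `genotypes_to_pseudo_haplotypes_sketch`, the nested-list layout indexed as [individual][variant][haplotype copy], and the `seed` argument are all assumptions, and missing genotypes are not handled.

```python
import random


def genotypes_to_pseudo_haplotypes_sketch(genotypes, seed=None):
    """Split 0/1/2 alternate-allele counts into pairs of 0/1 alleles.

    `genotypes` is a list of per-individual lists of allele counts.
    Returns a nested list indexed as [individual][variant][haplotype copy].
    (Hypothetical helper; the layout is an assumption, not xftsim's.)
    """
    rng = random.Random(seed)
    haplotypes = []
    for individual in genotypes:
        row = []
        for count in individual:
            if count == 0:
                row.append([0, 0])
            elif count == 2:
                row.append([1, 1])
            elif count == 1:
                # Heterozygous site: phase is unknown, so choose at random
                # which copy carries the alternate allele.
                first = rng.randint(0, 1)
                row.append([first, 1 - first])
            else:
                raise ValueError(f"expected an allele count of 0, 1, or 2, got {count!r}")
        haplotypes.append(row)
    return haplotypes


genotypes = [[0, 1, 2], [1, 1, 0]]
pseudo = genotypes_to_pseudo_haplotypes_sketch(genotypes, seed=0)
```

Whatever the random draw, the two copies at each site sum back to the original genotype, so the conversion only adds an arbitrary phase and loses no genotype information.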
- xftsim.io.haplotypes_from_sgkit_dataset(gdat, generation=0)
Construct a haplotype array from an sgkit Dataset. Useful in conjunction with sgkit.io.vcf.vcf_to_zarr() and sgkit.load_dataset().
- xftsim.io.save_haplotypes_npz(haplotypes, path)
Save DenseHaplotypeArray to compressed numpy format.
- Parameters:
haplotypes (xft.struct.DenseHaplotypeArray) – The haplotype data to save.
path (str) – The path to save to (the .npz extension is added if not present).
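The extension rule for `path` can be illustrated with a small helper. The name `with_npz_suffix` is hypothetical, and whether xftsim applies the rule itself or relies on numpy.savez_compressed (which appends .npz when the filename lacks it) is not stated in this summary.

```python
import os


def with_npz_suffix(path):
    """Return `path` unchanged if it ends in '.npz', else append '.npz'.

    Hypothetical helper that only mirrors the documented rule.
    """
    path = os.fspath(path)
    return path if path.endswith(".npz") else path + ".npz"


print(with_npz_suffix("founders"))      # founders.npz
print(with_npz_suffix("founders.npz"))  # founders.npz
print(with_npz_suffix("founders.v1"))   # founders.v1.npz
```

Under this rule the suffix is appended rather than substituted, so a name such as founders.v1 keeps its existing dotted component.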
- xftsim.io.load_haplotypes_npz(path)
Load DenseHaplotypeArray from compressed numpy format.
- xftsim.io.load_grg(path, generation=0, bim_path=None)
Load a GRG file and return a GraphHaplotypeOperator.
- Returns:
xft.struct.GraphHaplotypeOperator
- xftsim.io.save_phenotypes_npz(phenotypes, path)
Save PhenotypeArray to compressed numpy format.
- Parameters:
phenotypes (xft.struct.PhenotypeArray) – The phenotype data to save.
path (str) – The path to save to (the .npz extension is added if not present).
- xftsim.io.load_phenotypes_npz(path)
Load PhenotypeArray from compressed numpy format.
- xftsim.io.save_effects_npz(effects, path)
Save an EffectSpec (any subclass) to compressed numpy format.
- Parameters:
effects (xft.effect.EffectSpec) – The effect specification to save.
path (str) – The path to save to.
- xftsim.io.load_effects_npz(path)
Load an EffectSpec from compressed numpy format.
- xftsim.io.save_architecture(arch, dir_path)
Save an Architecture to a directory (JSON metadata + effect .npz files).
- Parameters:
arch (xft.arch.Architecture) – The architecture to save.
dir_path (str) – Directory path (created if it doesn’t exist).
- xftsim.io.load_architecture(dir_path)
Load an Architecture from a directory.
- xftsim.io.save_simulation_checkpoint(sim, dir_path)
Save a simulation checkpoint to a directory.
What is saved
architecture (DAG of ArchComponent; see save_architecture for the list of supported component types)
mating regime (RandomMating, LinearAssortativeMating, GeneralAssortativeMating, and BatchedMating wrapping any of the above; other regimes raise at save time)
recombination map
generation counter and retention settings
RNG state (so resumed simulations stay deterministic)
haplotype history (DenseHaplotypeArray as compressed .npz; GraphHaplotypeOperator as a native .grg file plus metadata sidecar)
phenotype history and pedigree history
per-generation Statistic results (sim.results)
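Why saving the RNG state keeps a resumed run deterministic can be shown with Python's standard-library generator. This is a stand-in for illustration only: which generator xftsim checkpoints, and how it stores the state on disk, are not described in this summary.

```python
import random

# An "original" run that has already consumed some random numbers.
rng = random.Random(2024)
_ = [rng.random() for _ in range(5)]

# Checkpoint: capture the generator state at this point.
saved_state = rng.getstate()

# What the original run would draw next if it simply kept going.
expected = [rng.random() for _ in range(3)]

# Resume: a fresh generator restored from the saved state.
resumed = random.Random()
resumed.setstate(saved_state)
replayed = [resumed.random() for _ in range(3)]

print(replayed == expected)  # True
```

Without the restored state, the resumed generator would start from an unrelated seed and the continued run would diverge from an uninterrupted one.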
What is NOT saved
sim.statistics (the registered Statistic instances): these are arbitrary user code and may not be pickleable. The outputs they produced are saved (in sim.results), but to keep collecting new results after resume you must re-pass statistics=... to Simulation.from_checkpoint.
sim.filters and sim.callbacks: same reasoning. Re-pass them to from_checkpoint if you want them active on the resumed run.
Failures are loud: an unsupported mating regime, architecture component, or haplotype type raises before any disk writes occur, so a partial checkpoint directory is never left behind.
- Parameters:
sim (xft.sim.Simulation) – The simulation to checkpoint.
dir_path (str) – Directory path (created if it doesn’t exist).
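The guarantee that an unsupported input raises before any disk writes follows a validate-then-write pattern, sketched here with the standard library only. Everything in the sketch is hypothetical: the `save_checkpoint_sketch` function, the dict-based state, the `checkpoint.json` file name, the two-entry supported set, and the use of TypeError do not reflect xftsim's actual layout or exception types.

```python
import json
import os
import tempfile

# Hypothetical subset of supported regimes, for illustration only.
SUPPORTED_REGIMES = {"RandomMating", "LinearAssortativeMating"}


def save_checkpoint_sketch(state, dir_path):
    # Phase 1: validate and serialize in memory; nothing touches the disk yet.
    regime = state["mating_regime"]
    if regime not in SUPPORTED_REGIMES:
        raise TypeError(f"unsupported mating regime: {regime}")
    payload = json.dumps(state)  # raises here if state is not serializable

    # Phase 2: only after validation succeeds, create the directory and write.
    os.makedirs(dir_path, exist_ok=True)
    with open(os.path.join(dir_path, "checkpoint.json"), "w") as handle:
        handle.write(payload)


with tempfile.TemporaryDirectory() as tmp:
    bad_dir = os.path.join(tmp, "bad")
    try:
        save_checkpoint_sketch({"mating_regime": "CustomMating", "generation": 3}, bad_dir)
        raised = False
    except TypeError:
        raised = True
    bad_dir_created = os.path.exists(bad_dir)

    good_dir = os.path.join(tmp, "good")
    save_checkpoint_sketch({"mating_regime": "RandomMating", "generation": 3}, good_dir)
    good_files = sorted(os.listdir(good_dir))
```

Because the directory is created only in the second phase, the rejected call leaves no `bad` directory behind, which is the behavior the entry describes as never leaving a partial checkpoint.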
- xftsim.io.load_simulation_checkpoint(dir_path)
Load a simulation checkpoint from a directory.
Returns a dict with all saved state — use this to inspect results or to reconstruct a simulation for continued execution.
- xftsim.io.plink1_variant_index(ppxr)
Create a DiploidVariantIndex object from a plink file DataArray generated by pandas_plink.
- Parameters:
ppxr (xr.DataArray) – An xarray DataArray representing a plink file.
- Returns:
xft.index.DiploidVariantIndex – A DiploidVariantIndex object.
- xftsim.io.plink1_sample_index(ppxr, generation=0)
Create a SampleIndex object from a plink file DataArray generated by pandas_plink.