I/O

Serialization and deserialization for all xftsim data structures: haplotypes, phenotypes, effects, architectures, and full simulation checkpoints. Also provides import from PLINK/sgkit and GRG loading.

Public API

save_haplotypes_npz / load_haplotypes_npz

Round-trip DenseHaplotypeArray to/from compressed .npz.

save_phenotypes_npz / load_phenotypes_npz

Round-trip PhenotypeArray to/from compressed .npz.

save_effects_npz / load_effects_npz

Round-trip any EffectSpec subclass to/from compressed .npz.

save_architecture / load_architecture

Round-trip Architecture to/from a directory (JSON + .npz).

save_simulation_checkpoint / load_simulation_checkpoint

Round-trip full simulation state to/from a directory.

load_grg

Load a GRG file as a GraphHaplotypeOperator.

read_plink1_as_pseudohaplotypes

Import PLINK 1 binary files as DenseHaplotypeArray.

haplotypes_from_sgkit_dataset

Import sgkit Dataset as DenseHaplotypeArray.

xftsim.io.genotypes_to_pseudo_haplotypes(genotypes)[source]

Converts genotype data to a 3D pseudo-haplotype array.

Parameters:

genotypes (np.ndarray) – 2D array of genotype data (n, m) with values 0, 1, 2.

Return type:

ndarray

Returns:

np.ndarray – 3D array of pseudo-haplotype data (n, m, 2).
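
The shape change described above can be illustrated with plain NumPy. This is a minimal sketch and not the xftsim implementation: the helper name to_pseudo_haplotypes is hypothetical, and the coin flip at heterozygous sites is an assumption based on the description of read_plink1_as_pseudohaplotypes.

```python
import numpy as np


def to_pseudo_haplotypes(genotypes, rng):
    """Split an (n, m) matrix of 0/1/2 allele counts into an (n, m, 2) array.

    Hypothetical helper for illustration only; xftsim's own function may
    assign alleles differently.
    """
    genotypes = np.asarray(genotypes)
    # One coin flip per site; it is only used where the genotype is 1.
    coin = rng.integers(0, 2, size=genotypes.shape)
    # First copy: 0 for genotype 0, 1 for genotype 2, coin flip for genotype 1.
    first = np.where(genotypes == 1, coin, genotypes // 2)
    # Second copy carries whatever allele count remains.
    second = genotypes - first
    return np.stack([first, second], axis=-1).astype(np.int8)


rng = np.random.default_rng(0)
genotypes = np.array([[0, 1, 2], [1, 1, 0]])
haps = to_pseudo_haplotypes(genotypes, rng)
```

Summing haps over its last axis recovers the original genotype matrix, which is the invariant any pseudo-phasing scheme of this kind has to preserve.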

xftsim.io.read_plink1_as_pseudohaplotypes(path, generation=0)[source]

Reads PLINK 1 binary genotype data and returns a DenseHaplotypeArray of pseudo-haplotypes, built by randomly assigning alleles to the two haplotypes at heterozygous sites.

Parameters:
  • path (str) – The file path to the PLINK 1 binary genotype data.

  • generation (int, optional) – Generation number. Default is 0.

Return type:

DenseHaplotypeArray

Returns:

xft.struct.DenseHaplotypeArray – Pseudo-haplotype array. The “pseudo-” prefix reflects that the PLINK 1 binary (bfile) format does not record phase.

Raises:

ValueError – If the specified file path does not exist or is not in the expected format.

xftsim.io.haplotypes_from_sgkit_dataset(gdat, generation=0)[source]

Construct a haplotype array from an sgkit Dataset. Useful in conjunction with sgkit.io.vcf.vcf_to_zarr() and sgkit.load_dataset().

Parameters:
  • gdat (xr.Dataset) – Dataset generated by sgkit.load_dataset()

  • generation (int, optional) – Generation number. Default is 0.

Return type:

DenseHaplotypeArray

Returns:

xft.struct.DenseHaplotypeArray – Haplotype array.

xftsim.io.save_haplotypes_npz(haplotypes, path)[source]

Save DenseHaplotypeArray to compressed numpy format.

Parameters:
  • haplotypes (xft.struct.DenseHaplotypeArray) – The haplotype data to save.

  • path (str) – The path to save to (will add .npz extension if not present).

Return type:

None

xftsim.io.load_haplotypes_npz(path)[source]

Load DenseHaplotypeArray from compressed numpy format.

Parameters:

path (str) – The path to load from.

Return type:

DenseHaplotypeArray

Returns:

xft.struct.DenseHaplotypeArray – The loaded haplotype array.
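
For readers unfamiliar with the compressed .npz container, the sketch below shows a round trip using NumPy alone, including the automatic .npz suffix that np.savez_compressed appends when the path lacks one. The array key names (haplotypes, sample_ids) are hypothetical; the keys xftsim actually writes are not documented here.

```python
import os
import tempfile

import numpy as np

# Toy data standing in for a haplotype matrix and its sample labels.
haplotypes = np.random.default_rng(1).integers(0, 2, size=(4, 6), dtype=np.int8)
sample_ids = np.array(["s0", "s1", "s2", "s3"])

tmpdir = tempfile.mkdtemp()
stem = os.path.join(tmpdir, "founders")  # deliberately no extension

# NumPy appends ".npz" when the given path does not already end with it.
np.savez_compressed(stem, haplotypes=haplotypes, sample_ids=sample_ids)
written = stem + ".npz"

# Read the arrays back; the context manager closes the archive afterwards.
with np.load(written) as archive:
    restored = archive["haplotypes"]
    restored_ids = archive["sample_ids"]
```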

xftsim.io.load_grg(path, generation=0, bim_path=None)[source]

Load a GRG file and return a GraphHaplotypeOperator.

Parameters:
  • path (str) – Path to the .grg file.

  • generation (int, optional) – Generation number. Default 0.

  • bim_path (str, optional) – Path to a PLINK .bim file for chromosome/allele metadata. If None, variant metadata is extracted from the GRG itself.

Return type:

GraphHaplotypeOperator

Returns:

xft.struct.GraphHaplotypeOperator

xftsim.io.save_phenotypes_npz(phenotypes, path)[source]

Save PhenotypeArray to compressed numpy format.

Parameters:
  • phenotypes (xft.struct.PhenotypeArray) – The phenotype data to save.

  • path (str) – The path to save to (will add .npz extension if not present).

Return type:

None

xftsim.io.load_phenotypes_npz(path)[source]

Load PhenotypeArray from compressed numpy format.

Parameters:

path (str) – The path to load from.

Return type:

PhenotypeArray

Returns:

xft.struct.PhenotypeArray

xftsim.io.save_effects_npz(effects, path)[source]

Save an EffectSpec (any subclass) to compressed numpy format.

Parameters:
  • effects (xft.effect.EffectSpec) – The effect specification to save.

  • path (str) – The path to save to.

Return type:

None

xftsim.io.load_effects_npz(path)[source]

Load an EffectSpec from compressed numpy format.

Parameters:

path (str) – The path to load from.

Return type:

EffectSpec

Returns:

xft.effect.EffectSpec – The loaded effect specification (concrete subclass).

xftsim.io.save_architecture(arch, dir_path)[source]

Save an Architecture to a directory (JSON metadata + effect .npz files).

Parameters:
  • arch (xft.arch.Architecture) – The architecture to save.

  • dir_path (str) – Directory path (created if it doesn’t exist).

Return type:

None

xftsim.io.load_architecture(dir_path)[source]

Load an Architecture from a directory.

Parameters:

dir_path (str) – Directory containing architecture.json and effect .npz files.

Return type:

Architecture

Returns:

xft.arch.Architecture
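
The "JSON metadata plus effect .npz files" layout can be pictured with the following sketch. Only the file name architecture.json comes from the description above; the manifest schema, the effect_<i>.npz naming, the beta key, and the helpers save_layout and load_layout are all hypothetical and should not be read as xftsim's actual on-disk format.

```python
import json
import os
import tempfile

import numpy as np


def save_layout(dir_path, names, effects):
    """Write one .npz per effect vector plus a JSON manifest (hypothetical schema)."""
    os.makedirs(dir_path, exist_ok=True)  # create the directory if missing
    manifest = {"components": []}
    for i, (name, beta) in enumerate(zip(names, effects)):
        fname = f"effect_{i}.npz"
        np.savez_compressed(os.path.join(dir_path, fname), beta=beta)
        manifest["components"].append({"name": name, "effects_file": fname})
    with open(os.path.join(dir_path, "architecture.json"), "w") as fh:
        json.dump(manifest, fh, indent=2)


def load_layout(dir_path):
    """Read the manifest, then load each effect file it points to."""
    with open(os.path.join(dir_path, "architecture.json")) as fh:
        manifest = json.load(fh)
    loaded = {}
    for entry in manifest["components"]:
        with np.load(os.path.join(dir_path, entry["effects_file"])) as archive:
            loaded[entry["name"]] = archive["beta"]
    return loaded


arch_dir = os.path.join(tempfile.mkdtemp(), "arch")
save_layout(arch_dir, ["height", "bmi"], [np.array([0.1, -0.2]), np.array([0.3])])
round_tripped = load_layout(arch_dir)
```

Keeping small structural metadata in JSON and bulk numeric arrays in .npz means the manifest stays human-readable while the arrays stay compact.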

xftsim.io.save_simulation_checkpoint(sim, dir_path)[source]

Save a simulation checkpoint to a directory.

Return type:

None

What is saved

  • architecture (DAG of ArchComponent — see save_architecture for the list of supported component types)

  • mating regime (RandomMating, LinearAssortativeMating, GeneralAssortativeMating, and BatchedMating wrapping any of the above; other regimes raise at save time)

  • recombination map

  • generation counter and retention settings

  • RNG state (so resumed simulations stay deterministic)

  • haplotype history (DenseHaplotypeArray as compressed .npz; GraphHaplotypeOperator as a native .grg file plus metadata sidecar)

  • phenotype history and pedigree history

  • per-generation Statistic results (sim.results)
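
To see why saving RNG state keeps a resumed run deterministic, consider this small NumPy sketch. It assumes a numpy.random.Generator backed by PCG64 (the NumPy default); whether xftsim stores its RNG this way is an assumption, and the JSON serialization is chosen here purely for illustration.

```python
import json

import numpy as np

rng = np.random.default_rng(12345)
rng.random(3)  # advance the stream, as a running simulation would

# Capture the bit generator state at "checkpoint time".
# For PCG64 this is a dict of plain Python values, so JSON can hold it.
saved_state = json.dumps(rng.bit_generator.state)

# Draws the original run would make after the checkpoint.
expected = rng.random(5)

# "Resume": build a fresh generator and restore the captured state.
resumed = np.random.default_rng()
resumed.bit_generator.state = json.loads(saved_state)
replayed = resumed.random(5)
```

After the state is restored, replayed matches expected exactly, so the resumed stream continues as if the run had never been interrupted.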

What is NOT saved

  • sim.statistics (the registered Statistic instances) — these are arbitrary user code and may not be pickleable. The outputs they produced are saved (in sim.results) but to keep collecting new results after resume you must re-pass statistics=... to Simulation.from_checkpoint.

  • sim.filters and sim.callbacks — same reasoning. Re-pass them to from_checkpoint if you want them active on the resumed run.

Failures are loud: an unsupported mating regime, architecture component, or haplotype type raises before any disk writes occur, so a partial checkpoint directory is never left behind.
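
The validate-first behaviour described above follows a common pattern: check everything that could be rejected, and only then touch the filesystem. The sketch below is a generic illustration; the checkpoint function, the SUPPORTED set of name strings, the state.txt file, and the choice of NotImplementedError are all hypothetical, since the exception type xftsim raises is not stated here.

```python
import os
import tempfile

# Hypothetical stand-in for the set of regimes a checkpoint writer accepts.
SUPPORTED = {"RandomMating", "LinearAssortativeMating", "GeneralAssortativeMating"}


def checkpoint(dir_path, regime_name, payload):
    """Validate first, write second, so a failure leaves nothing on disk."""
    if regime_name not in SUPPORTED:
        # Raised before os.makedirs, so no directory is created.
        raise NotImplementedError(f"cannot checkpoint regime {regime_name!r}")
    os.makedirs(dir_path, exist_ok=True)
    with open(os.path.join(dir_path, "state.txt"), "w") as fh:
        fh.write(payload)


target = os.path.join(tempfile.mkdtemp(), "ckpt")

try:
    checkpoint(target, "CustomMating", "generation=3")
    raised = False
except NotImplementedError:
    raised = True

# The failed attempt must not have created the checkpoint directory.
left_behind = os.path.exists(target)

# A supported regime goes through and creates the directory.
checkpoint(target, "RandomMating", "generation=3")
created = os.path.exists(os.path.join(target, "state.txt"))
```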

Parameters:
  • sim (xft.sim.Simulation) – The simulation to checkpoint.

  • dir_path (str) – Directory path (created if it doesn’t exist).

xftsim.io.load_simulation_checkpoint(dir_path)[source]

Load a simulation checkpoint from a directory.

Returns a dict with all saved state — use this to inspect results or to reconstruct a simulation for continued execution.

Parameters:

dir_path (str) – Directory containing checkpoint files.

Return type:

dict[str, object]

Returns:

dict – Keys: architecture, generation, retain_haplotypes, retain_phenotypes, rng, haplotype_history, phenotype_history, pedigree_history, recombination_map, mating_regime, results.

xftsim.io.plink1_variant_index(ppxr)[source]

Create a DiploidVariantIndex object from a plink file DataArray generated by pandas_plink.

Parameters:

ppxr (xr.DataArray) – An xarray DataArray representing a plink file.

Return type:

DiploidVariantIndex

Returns:

xft.index.DiploidVariantIndex – A DiploidVariantIndex object.

xftsim.io.plink1_sample_index(ppxr, generation=0)[source]

Create a SampleIndex object from a plink file DataArray generated by pandas_plink.

Parameters:
  • ppxr (xr.DataArray) – An xarray DataArray representing a plink file.

  • generation (int, optional) – The generation of the individuals, by default 0.

Return type:

SampleIndex

Returns:

xft.index.SampleIndex – A SampleIndex object.
