io

Below is an auto-generated summary of the xftsim.io submodule API.

xftsim.io.genotypes_to_pseudo_haplotypes(x)

Converts genotype data in an xarray DataArray to pseudo-haplotype data.

Parameters:

x (xr.DataArray) – An xarray DataArray containing genotype data.

Returns:

xr.DataArray – An xarray DataArray containing pseudo-haplotype data.

xftsim.io.haplotypes_from_sgkit_dataset(gdat, generation=0)

Construct a haplotype array from an sgkit Dataset. Useful in conjunction with sgkit.io.vcf.vcf_to_zarr() and sgkit.load_dataset().

Parameters:
  • gdat (xr.Dataset) – Dataset generated by sgkit.load_dataset().

  • generation (int) – Used to populate the generation attribute of xftsim.index.SampleIndex.

Returns:

xr.DataArray – Haplotype array with samples indexed by an xftsim.index.SampleIndex object and variants indexed by an xftsim.index.HaploidVariantIndex object.

xftsim.io.load_haplotype_zarr(path, compute=True, **kwargs)

Load haplotype data from a Zarr store.

Parameters:
  • path (str) – The path to the Zarr store.

  • compute (bool, optional) – Whether to compute the data immediately, by default True.

  • **kwargs (dict) – Additional keyword arguments to pass to xr.open_dataset().

Returns:

xr.DataArray – The loaded haplotype data as a DataArray.

xftsim.io.plink1_sample_index(ppxr, generation=0)

Create a SampleIndex object from a plink file DataArray generated by pandas_plink.

Parameters:
  • ppxr (xr.DataArray) – An xarray DataArray representing a plink file.

  • generation (int, optional) – The generation of the individuals, by default 0.

Returns:

xftsim.index.SampleIndex – A SampleIndex object.

xftsim.io.plink1_variant_index(ppxr)

Create a DiploidVariantIndex object from a plink file DataArray generated by pandas_plink.

Parameters:

ppxr (xr.DataArray) – An xarray DataArray representing a plink file.

Returns:

xftsim.index.DiploidVariantIndex – A DiploidVariantIndex object.

xftsim.io.read_plink1_as_pseudohaplotypes(path, generation=0)

Reads PLINK 1 binary genotype data and returns a HaplotypeArray object containing pseudo-haplotypes, formed by randomly assigning alleles to haplotypes at heterozygous sites.

Parameters:
  • path (str) – The file path to the PLINK 1 binary genotype data.

  • generation (int) – Used to populate the generation attribute of xftsim.index.SampleIndex.

Returns:

xr.DataArray – Pseudo-haplotype array with samples indexed by an xftsim.index.SampleIndex object and variants indexed by an xftsim.index.HaploidVariantIndex object. The “pseudo-” prefix reflects that the PLINK bfile format does not track phase.

Raises:

ValueError – If the specified file path does not exist or is not in the expected format.
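
The random assignment at heterozygous sites described above can be illustrated in plain Python. This is a conceptual sketch only, not xftsim's implementation: the function name pseudo_haplotypes, the list-based data layout, and the 0/1/2 alternate-allele-count encoding are all assumptions made for illustration.

```python
import random


def pseudo_haplotypes(genotypes, rng):
    """Split unphased allele counts (0, 1, 2) into two haplotype rows.

    Hypothetical helper for illustration; not part of xftsim.
    """
    hap_a, hap_b = [], []
    for g in genotypes:
        if g == 0:
            a, b = 0, 0  # homozygous reference: both copies carry 0
        elif g == 2:
            a, b = 1, 1  # homozygous alternate: both copies carry 1
        elif g == 1:
            # Heterozygous: phase is unknown, so choose the carrier at random.
            a = rng.randint(0, 1)
            b = 1 - a
        else:
            raise ValueError(f"unexpected genotype {g!r}")
        hap_a.append(a)
        hap_b.append(b)
    return hap_a, hap_b


rng = random.Random(0)  # seeded so the sketch is reproducible
genotypes = [0, 1, 2, 1, 1, 0]
hap_a, hap_b = pseudo_haplotypes(genotypes, rng)

# Whatever phase was drawn, the two haplotypes sum back to the genotypes.
recovered = [a + b for a, b in zip(hap_a, hap_b)]
```

Homozygous sites are fully determined, so only heterozygous sites involve a random draw. The resulting haplotypes reproduce the observed genotypes but carry no real phase information.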

xftsim.io.save_haplotype_zarr(haplotypes, path, **kwargs)

Save haplotype data to a Zarr store.

Parameters:
  • haplotypes (xr.DataArray) – The haplotype data to save.

  • path (str) – The path to the Zarr store.

  • **kwargs (dict) – Additional keyword arguments to pass to haplotypes.to_dataset().to_zarr().

Returns:

None

xftsim.io.write_to_plink1(hh, path, verbose=True)

Writes a DataArray to a PLINK 1 binary file. Phase information is lost, since the PLINK 1 binary format stores unphased genotypes.

Parameters:
  • hh (xr.DataArray) – A DataArray containing the genotype data to write.

  • path (str) – The path to the output PLINK file. The ‘.bed’ extension will be added automatically.

  • verbose (bool, optional) – Whether to print verbose output during writing, by default True.

Returns:

None
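
Why writing to PLINK 1 loses phase can be seen in a small plain-Python sketch. It assumes, for illustration only, that each individual is represented as two lists of 0/1 alleles and that the written genotype is their per-variant sum. The helper collapse_to_genotypes is hypothetical and is not part of xftsim.

```python
def collapse_to_genotypes(hap_a, hap_b):
    """Sum two haplotypes into unphased allele counts per variant.

    Hypothetical helper for illustration; not part of xftsim.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same variants")
    return [a + b for a, b in zip(hap_a, hap_b)]


# Two individuals with different phase at the first two variants.
phased_1 = ([1, 0, 1], [0, 1, 1])
phased_2 = ([0, 1, 1], [1, 0, 1])

g1 = collapse_to_genotypes(*phased_1)
g2 = collapse_to_genotypes(*phased_2)

# Distinct haplotype pairs map to identical genotype vectors, so the
# original phase cannot be recovered from the genotypes alone.
same_genotypes = g1 == g2
```

Because distinct haplotype pairs collapse to the same genotype vector, data written this way and read back with read_plink1_as_pseudohaplotypes() should not be expected to match the original haplotypes at heterozygous sites.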