io
Below is an auto-generated summary of the xftsim.io submodule API.
- xftsim.io.genotypes_to_pseudo_haplotypes(x)
Converts genotype data in an xarray DataArray to pseudo-haplotype data.
- Parameters:
x (xr.DataArray) – An xarray DataArray containing genotype data.
- Returns:
xr.DataArray – An xarray DataArray containing pseudo-haplotype data.
- xftsim.io.haplotypes_from_sgkit_dataset(gdat, generation=0)
Constructs a haplotype array from an sgkit Dataset. Useful in conjunction with sgkit.io.vcf.vcf_to_zarr() and sgkit.load_dataset().
- Parameters:
gdat (xr.Dataset) – Dataset generated by sgkit.load_dataset().
generation (int) – Used to populate the generation attribute of xftsim.index.SampleIndex.
- Returns:
xr.DataArray – Haplotype array with samples indexed by an xftsim.index.SampleIndex object and variants indexed by an xftsim.index.HaploidVariantIndex object.
- xftsim.io.load_haplotype_zarr(path, compute=True, **kwargs)
Load haplotype data from a Zarr store.
- Parameters:
path (str) – The path to the Zarr store.
compute (bool, optional) – Whether to compute the data immediately, by default True.
**kwargs (dict) – Additional keyword arguments to pass to xr.open_dataset().
- Returns:
xr.DataArray – The loaded haplotype data as a DataArray.
- xftsim.io.plink1_sample_index(ppxr, generation=0)
Create a SampleIndex object from a plink file DataArray generated by pandas_plink.
- Parameters:
ppxr (xr.DataArray) – An xarray DataArray representing a plink file.
generation (int, optional) – The generation of the individuals, by default 0.
- Returns:
xft.index.SampleIndex – A SampleIndex object.
- xftsim.io.plink1_variant_index(ppxr)
Create a DiploidVariantIndex object from a plink file DataArray generated by pandas_plink.
- Parameters:
ppxr (xr.DataArray) – An xarray DataArray representing a plink file.
- Returns:
xft.index.DiploidVariantIndex – A DiploidVariantIndex object.
- xftsim.io.read_plink1_as_pseudohaplotypes(path, generation=0)
Reads in PLINK 1 binary genotype data and returns a HaplotypeArray object containing pseudo-haplotypes by randomly assigning haplotypes at heterozygous sites.
- Parameters:
path (str) – The file path to the PLINK 1 binary genotype data.
generation (int) – Used to populate the generation attribute of xftsim.index.SampleIndex.
- Returns:
xr.DataArray – Pseudo-haplotype array with samples indexed by an xftsim.index.SampleIndex object and variants indexed by an xftsim.index.HaploidVariantIndex object. The “pseudo-” prefix refers to the fact that the plink bfile format doesn’t track phase.
- Raises:
ValueError – If the specified file path does not exist or is not in the expected format.
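The random assignment at heterozygous sites described above can be illustrated with a minimal standard-library sketch. It is not the xftsim implementation: the function name `random_phase_sketch`, the 0/1/2 alternate-allele count encoding, and the layout of two adjacent haplotype values per variant are assumptions made for illustration only.

```python
import random


def random_phase_sketch(genotypes, seed=None):
    """Expand allele counts (0, 1, 2) into pairs of 0/1 haplotype values.

    genotypes: list of rows (one per sample), each a list of per-variant counts.
    Returns rows twice as wide, with two adjacent entries per variant
    (an assumed layout, chosen here for readability).
    """
    rng = random.Random(seed)
    haplotypes = []
    for row in genotypes:
        hap_row = []
        for g in row:
            if g == 0:
                pair = [0, 0]
            elif g == 2:
                pair = [1, 1]
            elif g == 1:
                # Phase is unknown, so the copy carrying the allele is chosen at random.
                pair = [0, 1] if rng.random() < 0.5 else [1, 0]
            else:
                raise ValueError(f"unexpected genotype value: {g!r}")
            hap_row.extend(pair)
        haplotypes.append(hap_row)
    return haplotypes


# Two samples, three variants each (hypothetical data).
genotypes = [[0, 1, 2], [1, 1, 0]]
pseudo = random_phase_sketch(genotypes, seed=0)
```

Homozygous sites are fully determined by the count, so only heterozygous sites depend on the random draw. Summing each adjacent pair in `pseudo` recovers the original counts regardless of the seed.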
- xftsim.io.save_haplotype_zarr(haplotypes, path, **kwargs)
Save haplotype data to a Zarr store.
- Parameters:
haplotypes (xr.DataArray) – The haplotype data to save.
path (str) – The path to the Zarr store.
**kwargs (dict) – Additional keyword arguments to pass to haplotypes.to_dataset().to_zarr().
- Returns:
None
- xftsim.io.write_to_plink1(hh, path, verbose=True)
Writes a DataArray to a PLINK 1 binary file. Breaks phasing.
- Parameters:
hh (xr.DataArray) – A DataArray containing the genotype data to write.
path (str) – The path to the output PLINK file. The ‘.bed’ extension will be added automatically.
verbose (bool, optional) – Whether to print verbose output during writing, by default True.
- Returns:
None
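To see why writing to PLINK 1 breaks phasing, consider the following sketch, which collapses pairs of haplotype values into unphased allele counts. It only illustrates the idea and does not reproduce what write_to_plink1 does internally: the helper name `haplotypes_to_counts` and the adjacent-pair layout are hypothetical.

```python
def haplotypes_to_counts(hap_row):
    """Sum each adjacent pair of 0/1 haplotype values into a 0/1/2 count."""
    if len(hap_row) % 2:
        raise ValueError("expected an even number of haplotype values")
    return [hap_row[i] + hap_row[i + 1] for i in range(0, len(hap_row), 2)]


# Two individuals that differ only in phase at the first variant.
phased_a = [0, 1, 1, 0]  # variant 1: 0|1, variant 2: 1|0
phased_b = [1, 0, 1, 0]  # variant 1: 1|0, variant 2: 1|0

counts_a = haplotypes_to_counts(phased_a)
counts_b = haplotypes_to_counts(phased_b)
```

Both individuals end up with identical counts, so the original phase cannot be recovered from the counts alone. This is why data read back with read_plink1_as_pseudohaplotypes() is described as pseudo-haplotypes.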