utils

Below is an auto-generated summary of the xftsim.utils submodule API.

class xftsim.utils.ConstantCount(count)

Bases: VariableCount

Class representing a constant count of individuals in a population.

draw

a function that generates an array of counts

Type:: Callable

expectation

expected count

Type:: float

nonzero_fraction

the fraction of the population with a nonzero count

Type:: float

Parameters:: count (int) – The constant count of individuals in the population.

class xftsim.utils.MixtureCount(componentCounts, mixture_probabilities)

Bases: VariableCount

Class representing a count of individuals in a population, drawn from a mixture of VariableCount distributions.

draw

a function that generates an array of counts

Type:: Callable

expectation

expected count

Type:: float

nonzero_fraction

the fraction of the population with a nonzero count

Type:: float

Parameters:

componentCounts (Iterable) – An iterable of VariableCount instances, representing the components of the mixture.
mixture_probabilities (NDArray[Shape["*"], Float64]) – An array of probabilities associated with each component in the mixture.
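As a rough illustration of how a mixture count can behave, the sketch below assumes that each draw first picks component k with probability mixture_probabilities[k] and then draws from that component, so the expectation and the nonzero fraction are probability-weighted averages over the components. The helpers `draw_mixture`, `mixture_expectation`, and `mixture_nonzero_fraction` are hypothetical, and plain NumPy samplers stand in for VariableCount components; this is not the xftsim implementation.

```python
import numpy as np

def draw_mixture(n, component_draws, probs, rng):
    """Draw n counts from a mixture (hypothetical helper, not xftsim API).

    component_draws: list of callables f(size, rng) -> integer array
    probs: mixture probabilities, one per component (must sum to 1)
    """
    probs = np.asarray(probs, dtype=float)
    # pick a component index for every draw
    which = rng.choice(len(component_draws), size=n, p=probs)
    out = np.empty(n, dtype=np.int64)
    for k, draw in enumerate(component_draws):
        mask = which == k
        # fill the positions assigned to component k with its own draws
        out[mask] = draw(int(mask.sum()), rng)
    return out

def mixture_expectation(expectations, probs):
    # E[X] = sum_k p_k * E[X_k]
    return float(np.dot(probs, expectations))

def mixture_nonzero_fraction(nonzero_fractions, probs):
    # P(X > 0) = sum_k p_k * P(X_k > 0)
    return float(np.dot(probs, nonzero_fractions))

rng = np.random.default_rng(0)
components = [
    lambda size, rng: np.full(size, 2),             # constant count of 2
    lambda size, rng: rng.poisson(3.0, size=size),  # Poisson(3) count
]
counts = draw_mixture(10_000, components, [0.25, 0.75], rng)
```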

class xftsim.utils.NegativeBinomialCount(r, p)

Bases: VariableCount

Class representing a negative binomial-distributed count of individuals in a population.

draw

a function that generates an array of counts

Type:: Callable

expectation

expected count

Type:: float

nonzero_fraction

the fraction of the population with a nonzero count

Type:: float

Parameters:

r (float) – The number of successes in the negative binomial distribution.
p (float) – The probability of success in the negative binomial distribution.
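If r and p follow NumPy's `Generator.negative_binomial(n, p)` convention, where a draw counts the failures before the r-th success (an assumption; the xftsim parametrization is not stated here), the expectation and nonzero fraction have closed forms: E[X] = r(1 - p)/p and P(X > 0) = 1 - p^r. The hypothetical helpers below compute both and compare them against simulation.

```python
import numpy as np

def negbin_expectation(r: float, p: float) -> float:
    # mean number of failures before the r-th success
    return r * (1 - p) / p

def negbin_nonzero_fraction(r: float, p: float) -> float:
    # P(X = 0) = p**r, so P(X > 0) = 1 - p**r
    return 1 - p ** r

rng = np.random.default_rng(1)
r, p = 2.0, 0.4
sample = rng.negative_binomial(r, p, size=200_000)
print(sample.mean(), negbin_expectation(r, p))             # both close to 3.0
print((sample > 0).mean(), negbin_nonzero_fraction(r, p))  # both close to 0.84
```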

class xftsim.utils.PoissonCount(rate)

Bases: VariableCount

Class representing a Poisson-distributed count of individuals in a population.

draw

a function that generates an array of counts

Type:: Callable

expectation

expected count

Type:: float

nonzero_fraction

the fraction of the population with a nonzero count

Type:: float

Parameters:: rate (float) – The Poisson rate parameter.

class xftsim.utils.VariableCount(draw, expectation=None, nonzero_fraction=None)

Bases: object

A class representing random count variables.

…

draw

a function that generates an array of counts

Type:: Callable

expectation

expected count

Type:: float

nonzero_fraction

the fraction of the population with a nonzero count

Type:: float


property expectation

Getter for the expectation attribute.

Returns:: float – Expected count.

property nonzero_fraction

Getter for the nonzero_fraction attribute.

Returns:: float – The fraction of the population with a nonzero count.
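The attribute and property layout above can be mirrored by a small stand-in class. In this sketch, `VariableCountSketch`, `constant_count`, and `poisson_count` are hypothetical names, and `draw` is assumed to take the number of draws and return an integer array. The closed forms used are standard: a Poisson(rate) count has expectation rate and nonzero fraction 1 - exp(-rate), and a constant count c > 0 has expectation c and nonzero fraction 1.

```python
import numpy as np
from typing import Callable, Optional

class VariableCountSketch:
    """Hypothetical stand-in mirroring VariableCount(draw, expectation, nonzero_fraction)."""

    def __init__(self, draw: Callable[[int], np.ndarray],
                 expectation: Optional[float] = None,
                 nonzero_fraction: Optional[float] = None):
        self.draw = draw
        self._expectation = expectation
        self._nonzero_fraction = nonzero_fraction

    @property
    def expectation(self) -> Optional[float]:
        return self._expectation

    @property
    def nonzero_fraction(self) -> Optional[float]:
        return self._nonzero_fraction

rng = np.random.default_rng(2)

def constant_count(count: int) -> VariableCountSketch:
    # every draw equals `count`
    return VariableCountSketch(lambda n: np.full(n, count), float(count),
                               1.0 if count > 0 else 0.0)

def poisson_count(rate: float) -> VariableCountSketch:
    # Poisson(rate): mean = rate, P(X > 0) = 1 - exp(-rate)
    return VariableCountSketch(lambda n: rng.poisson(rate, size=n), rate,
                               1.0 - np.exp(-rate))
```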

class xftsim.utils.ZeroTruncatedPoissonCount(rate)

Bases: VariableCount

Class representing a zero-truncated Poisson-distributed count of individuals in a population.

draw

a function that generates an array of counts

Type:: Callable

expectation

expected count

Type:: float

nonzero_fraction

the fraction of the population with a nonzero count

Type:: float

Parameters:: rate (float) – The Poisson rate parameter prior to zero-truncation.
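One simple way to sample a zero-truncated Poisson count is rejection: draw Poisson values and redraw any zeros. The sketch below uses hypothetical helper names and is not the xftsim code; it also gives the standard expectation rate / (1 - exp(-rate)), while the nonzero fraction is 1 by construction. Rejection becomes slow for very small rates, where most raw draws are zero.

```python
import numpy as np

def draw_zero_truncated_poisson(n, rate, rng):
    # rejection sampling: redraw any zeros until every count is positive
    out = rng.poisson(rate, size=n)
    zeros = out == 0
    while zeros.any():
        out[zeros] = rng.poisson(rate, size=int(zeros.sum()))
        zeros = out == 0
    return out

def zero_truncated_poisson_expectation(rate):
    # E[X | X > 0] = rate / (1 - exp(-rate))
    return rate / (1.0 - np.exp(-rate))
```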

xftsim.utils.cartesian_product(*args)

Returns a list of columns comprising the Cartesian product of the input arrays. Emulates the R function expand.grid().

Parameters:: *args (NDArray[Any, Any]) – The input arrays.
Returns:: List[NDArray[Any, Any]] – The list of columns.
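In R, expand.grid() varies its first argument fastest. A NumPy sketch of that column layout, using `numpy.meshgrid` with matrix (`"ij"`) indexing followed by column-major flattening, is shown below; `cartesian_product_sketch` is a hypothetical name and may differ from xftsim's implementation in details such as dtype handling.

```python
import numpy as np

def cartesian_product_sketch(*args):
    # meshgrid with "ij" indexing gives one grid per input; flattening in
    # Fortran (column-major) order makes the first input vary fastest,
    # matching R's expand.grid()
    grids = np.meshgrid(*args, indexing="ij")
    return [g.ravel(order="F") for g in grids]

cols = cartesian_product_sketch(np.array([1, 2]), np.array([10, 20, 30]))
# cols[0] -> array([1, 2, 1, 2, 1, 2])
# cols[1] -> array([10, 10, 20, 20, 30, 30])
```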

xftsim.utils.cov2cor(A)

Converts covariance matrix to correlation matrix.

Parameters:

A: Union[np.ndarray, pd.DataFrame, xr.DataArray]: Input covariance matrix.

Returns:

Union[np.ndarray, pd.DataFrame, xr.DataArray]: Correlation matrix.

Raises:

None
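The conversion divides each covariance by the product of the corresponding standard deviations, R_ij = A_ij / sqrt(A_ii * A_jj). The NumPy-only sketch below (hypothetical name `cov2cor_sketch`) handles plain arrays; the documented function also accepts pandas and xarray inputs, whose types this sketch does not attempt to preserve.

```python
import numpy as np

def cov2cor_sketch(A):
    A = np.asarray(A, dtype=float)
    sd = np.sqrt(np.diag(A))     # standard deviations from the diagonal
    return A / np.outer(sd, sd)  # R_ij = A_ij / (sd_i * sd_j)

S = np.array([[4.0, 2.0], [2.0, 9.0]])
R = cov2cor_sketch(S)
# off-diagonal entries: 2 / (2 * 3) = 1/3
```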

xftsim.utils.ensure2D(x)

Ensures the input array is 2D by adding a new dimension if needed.

Parameters:: x (array-like) – The input array.
Returns:: NDArray[Any, Any] – The 2D input array.
Raises:: ValueError – If the input array is not valid.

xftsim.utils.exhaustive_enumerate(a, n_per_a)

Repeats the ith element of array a n_per_a[i] times, in an order such that every element i appears min(j, n_per_a[i]) times before any element appears j+1 times.

Parameters:

a (array-like) – 1-D array of any data type.
n_per_a (array-like) – 1-D array of int giving the number of times each element of a is repeated.

Returns:

out (array-like) – 1-D array of length sum(n_per_a) with the same data type as a, in which each element is repeated according to n_per_a in the order described above.

Raises:

Warning – If the output array is empty.

Examples:

>>> exhaustive_enumerate(np.array((1, 2, 3, 4)), np.array((3, 2, 1, 0)))
array([1, 2, 3, 1, 2, 1])
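The round-robin ordering can be produced by emitting, in round j, every element whose repeat count exceeds j. The sketch below (hypothetical name `exhaustive_enumerate_sketch`) reproduces the example above; it is an illustration, not necessarily how xftsim implements the function.

```python
import numpy as np

def exhaustive_enumerate_sketch(a, n_per_a):
    a = np.asarray(a)
    n_per_a = np.asarray(n_per_a)
    rounds = int(n_per_a.max(initial=0))
    # in round j, emit (in original order) every element still owed a copy
    pieces = [a[n_per_a > j] for j in range(rounds)]
    return np.concatenate(pieces or [a[:0]])

exhaustive_enumerate_sketch(np.array((1, 2, 3, 4)), np.array((3, 2, 1, 0)))
# -> array([1, 2, 3, 1, 2, 1])
```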

xftsim.utils.exhaustive_permutation(a, n_sample)

Returns a random permutation of the input array of length n_sample, such that each element is selected exactly once before any element is selected twice, and so forth.

Parameters:

a (NDArray[Shape["*"], Any]) – A numpy array to be permuted.
n_sample (int) – An integer specifying the size of the permutation to be returned.

Returns:

np.ndarray: A 1D numpy array containing the permuted elements.
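One way to obtain this ordering is to concatenate independent shuffles of a, one per round, and truncate the result to n_sample. The sketch below assumes a is non-empty and accepts an optional NumPy `Generator`; the name `exhaustive_permutation_sketch` and the `rng` parameter are hypothetical.

```python
import numpy as np

def exhaustive_permutation_sketch(a, n_sample, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    # number of full shuffles needed (ceiling division), at least one
    n_rounds = max(1, -(-n_sample // len(a)))
    # each shuffle uses every element once, so no element repeats
    # until all elements have appeared
    stacked = np.concatenate([rng.permutation(a) for _ in range(n_rounds)])
    return stacked[:n_sample]

out = exhaustive_permutation_sketch(np.arange(5), 12, np.random.default_rng(0))
```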

xftsim.utils.hierarchical_subsample(a1, a2, n1, n2)

Selects indices of random elements from a2 corresponding to random elements from a1. TODO: remove a2 argument.

Parameters:

a1 (np.ndarray) – A numpy array of elements, where each element can be repeated.
a2 (np.ndarray) – A numpy array of elements of the same length as a1.
n1 (int) – The number of unique random elements to select from a1.
n2 (int) – The number of random elements to select from a2 for each selected element in a1.

Returns:

np.ndarray – A numpy array of indices from a2 corresponding to the randomly selected elements from a1.

Raises:

ValueError – If the lengths of a1 and a2 do not match.
ValueError – If n1 is greater than the number of unique elements in a1.
ValueError – If there are not enough elements in a2 corresponding to the selected elements in a1.

Example:

>>> a1 = np.array([1, 1, 2, 2, 2])
>>> a2 = np.array([1, 2, 1, 2, 3])
>>> n1 = 2
>>> n2 = 1
>>> hierarchical_subsample(a1, a2, n1, n2)  # might return [0, 4] or [1, 3], etc.
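A two-stage sketch of this selection: sample n1 distinct values of a1 without replacement, then sample n2 positions within each chosen group. In line with the TODO note, the hypothetical `hierarchical_subsample_sketch` ignores a2; returning the indices in sorted order is also an assumption.

```python
import numpy as np

def hierarchical_subsample_sketch(a1, n1, n2, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    a1 = np.asarray(a1)
    groups = np.unique(a1)
    if n1 > groups.size:
        raise ValueError("n1 exceeds the number of unique elements in a1")
    # stage 1: choose n1 distinct group labels
    chosen = rng.choice(groups, size=n1, replace=False)
    picked = []
    for g in chosen:
        idx = np.flatnonzero(a1 == g)  # positions belonging to group g
        if n2 > idx.size:
            raise ValueError("not enough elements in a selected group")
        # stage 2: choose n2 positions within the group
        picked.append(rng.choice(idx, size=n2, replace=False))
    if not picked:
        return np.empty(0, dtype=np.intp)
    return np.sort(np.concatenate(picked))

idx = hierarchical_subsample_sketch(np.array([1, 1, 2, 2, 2]), n1=2, n2=1,
                                    rng=np.random.default_rng(0))
```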

xftsim.utils.ids_from_generation(generation, indices=None)

Generates and returns a new array of IDs using the given generation number and the given indices. The new array contains the given indices with the generation number prefixed to each index.

Parameters:

generation (int) – The generation number to use in the prefix of the IDs.
indices (NDArray[Shape["*"], Int64], optional) – A numpy array of indices.

Returns:

ndarray – A new numpy array of IDs with the given generation number prefixed to each index.

xftsim.utils.ids_from_generation_range(generation, n=None)

Returns an array of string IDs of length n, created by concatenating the input generation with an increasing sequence of integers from 0 to n-1.

Parameters:

generation (int) – An integer representing the generation of the IDs to be created.
n (int, optional) – An integer specifying the number of IDs to be generated. Defaults to None, in which case a range of IDs starting from 0 is created.

Returns:

np.ndarray: A 1D numpy array containing the IDs in string format.

xftsim.utils.ids_from_n_generation(n, generation)

Creates an array of individual IDs based on the specified number of elements and generation.

Parameters:

n (int) – The number of individuals.
generation (int) – The generation number.

Returns:

numpy.ndarray – An array of individual IDs.

xftsim.utils.match(a, b)

For each element of a, finds the matching element in b and returns its index in b.

Parameters:

a (List[Hashable]) – List of elements to find matches for.
b (List[Hashable]) – List of elements to find matches in.

Returns:

List[int]: A list of indices in b that match the elements in a.
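The behaviour resembles R's match(). A pure-Python sketch, assuming the first occurrence in b is returned and that every element of a is present in b (a missing element raises KeyError here; xftsim may handle this differently), uses a dictionary for linear-time lookup. `match_sketch` is a hypothetical name.

```python
from typing import Hashable, List

def match_sketch(a: List[Hashable], b: List[Hashable]) -> List[int]:
    # map each value in b to the position of its first occurrence
    first_pos = {}
    for i, value in enumerate(b):
        first_pos.setdefault(value, i)
    # look up every element of a (KeyError if it is absent from b)
    return [first_pos[value] for value in a]

match_sketch(["c", "a"], ["a", "b", "c"])  # -> [2, 0]
```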

xftsim.utils.matching_indices_conditional(a, b, condition)

Returns the indices of matches between a and b arrays, given a boolean condition.

Parameters:

a (List[Hashable]) – The first input array.
b (List[Hashable]) – The second input array.
condition (NDArray[Shape["*"], Any]) – The boolean condition array to apply.

Returns:

NDArray[Shape["*"], Int64] – The matching indices array.

xftsim.utils.merge_duplicate_pairs(a, b, n, sort=False)

Merges duplicate pairs of values in a and b based on their corresponding values in n.

Parameters:

a (NDArray[Shape["*"], Any]) – First array to merge.
b (NDArray[Shape["*"], Any]) – Second array to merge.
n (NDArray[Shape["*"], Any]) – Array of corresponding values that determine how the duplicates are merged.
sort (bool, optional) – Whether to sort the values in a and b before merging the duplicates. Default is False.

Returns:

Tuple[NDArray[Shape["*"], Any], NDArray[Shape["*"], Any], NDArray[Shape["*"], Any]] – The merged arrays, with duplicates removed based on the corresponding values in n.

xftsim.utils.merge_duplicates(it)

Merges duplicates in the input array by checking whether any pasted elements are the same.

Parameters:: it (Iterable) – A numpy array with elements to be checked for duplication.
Returns:: list – Returns the input list with duplicates merged if present.

xftsim.utils.paste(it, sep='_')

Concatenates elements in a list-like object with a specified separator.

Parameters:

it (list-like) – The list-like object containing elements to concatenate.
sep (str, optional) – The separator used to concatenate the elements. Defaults to “_”.

Returns:

numpy.ndarray – An array of concatenated string elements.
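Assuming `it` is a sequence of equal-length arrays that are joined element-wise, in the style of R's paste(..., sep = "_"), a sketch could look like the following; the name `paste_sketch` is hypothetical and the element-wise interpretation is an assumption.

```python
import numpy as np

def paste_sketch(it, sep="_"):
    # zip the inputs element-wise and join each tuple's string forms
    return np.array([sep.join(map(str, parts)) for parts in zip(*it)])

paste_sketch([np.array([1, 2]), np.array(["x", "y"])])  # -> array(['1_x', '2_y'], dtype='<U3')
```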

xftsim.utils.print_tree(x, depth=0)

Prints a nested dict (a dict of dicts of dicts, and so on) as an easy-to-read tree, similar to the bash program ‘tree’. Modified from https://stackoverflow.com/questions/47131263/python-3-6-print-dictionary-data-in-readable-tree-structure

Parameters:: x (Any) – Dict of dicts
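A recursive sketch of the same idea, using plain indentation rather than the box-drawing connectors of `tree`. The names `tree_lines` and `print_tree_sketch` are hypothetical, and printing leaves as `key: value` is an assumption.

```python
def tree_lines(x: dict, depth: int = 0) -> list:
    lines = []
    for key, value in x.items():
        indent = "    " * depth
        if isinstance(value, dict):
            lines.append(f"{indent}{key}")
            # descend one level for nested dicts
            lines.extend(tree_lines(value, depth + 1))
        else:
            lines.append(f"{indent}{key}: {value}")
    return lines

def print_tree_sketch(x: dict) -> None:
    print("\n".join(tree_lines(x)))

print_tree_sketch({"a": {"b": 1, "c": {"d": 2}}})
# a
#     b: 1
#     c
#         d: 2
```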

xftsim.utils.profiled(call, level=1, message=None, sep=' | ')

A decorator that prints the duration of a function call when the specified logging level is met.

Parameters:

call (function) – The function being decorated.
level (int, optional) – The logging level at which the duration of the function call is printed. Defaults to 1.
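message (str, optional) – A custom message to display in the log output. If not provided, the name of the decorated function is used.
sep (str, optional) – The separator used in the log output. Defaults to ' | '.

Returns:

function – The wrapped function, which prints its call duration when the logging level is met.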
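A minimal timing decorator in the same spirit is sketched below. `profiled_sketch` and the module-level `VERBOSITY` threshold are hypothetical stand-ins for xftsim's own logging configuration; a non-default level can be supplied with `functools.partial`.

```python
import functools
import time

VERBOSITY = 1  # hypothetical global print level, not an xftsim setting

def profiled_sketch(call, level=1, message=None, sep=" | "):
    label = message if message is not None else call.__name__

    @functools.wraps(call)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = call(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # only report when the configured verbosity reaches this level
        if VERBOSITY >= level:
            print(f"{label}{sep}{elapsed:.3f} s")
        return result

    return wrapper

@profiled_sketch
def add(x, y):
    return x + y

@functools.partial(profiled_sketch, level=2)
def quiet(x):
    return x  # level 2 > VERBOSITY, so no timing line is printed

add(2, 3)  # prints something like "add | 0.000 s" and returns 5
```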

xftsim.utils.sort_and_paste(x)

Sorts the input array in ascending order and concatenates the first element with an underscore separator followed by the second element.

Parameters:

x (array-like) – 1-D array of any data type.

Returns:

out (array-like) – 1-D array of strings with shape (n,) and the same length as x, where each element is formed by concatenating two sorted string representations of each element in x, separated by an underscore.

Examples:

>>> sort_and_paste(np.array((3, 1, 2)))
array(['1_2', '2_3', '1_3'], dtype='<U3')

xftsim.utils.standardize_array(a)

Standardizes columns of a 2D array.

Parameters:

a: ArrayLike: Input 2D array.

Returns:

np.ndarray: Standardized 2D array.

Raises:

None
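Column-wise standardization subtracts each column's mean and divides by its standard deviation. The sketch below assumes the population standard deviation (NumPy's default ddof=0), which may differ from xftsim's choice; constant columns would cause division by zero. `standardize_array_sketch` is a hypothetical name.

```python
import numpy as np

def standardize_array_sketch(a):
    a = np.asarray(a, dtype=float)
    # subtract each column's mean and divide by its standard deviation
    return (a - a.mean(axis=0)) / a.std(axis=0)

z = standardize_array_sketch([[1, 10], [3, 30]])
# each column becomes [-1, 1]
```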

xftsim.utils.standardize_array_hw(haplotypes, af)

Wraps _standardize_array_hw to prevent segfaults.

Parameters:

haplotypes: NDArray[Shape["*, *"], Int8]: Input array of int8 haploid genotypes.
af: NDArray[Shape["*"], Float]: Input array of allele frequencies.

Returns:

np.ndarray: Standardized genotypes.

Raises:

None
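Hardy-Weinberg standardization centers each diploid genotype g at 2p and scales it by sqrt(2p(1 - p)), the genotype mean and standard deviation under Hardy-Weinberg equilibrium with allele frequency p. The sketch below assumes a hypothetical layout in which columns 2j and 2j + 1 hold the two haplotypes of variant j; the real xftsim layout and the behaviour of _standardize_array_hw may differ, and `standardize_hw_sketch` is a hypothetical name. Monomorphic variants (p equal to 0 or 1) would cause division by zero.

```python
import numpy as np

def standardize_hw_sketch(haplotypes, af):
    h = np.asarray(haplotypes, dtype=float)
    af = np.asarray(af, dtype=float)
    # hypothetical layout: columns 2j and 2j+1 hold the two haplotypes of variant j
    g = h[:, 0::2] + h[:, 1::2]  # diploid genotype in {0, 1, 2}
    # under Hardy-Weinberg equilibrium, E[g] = 2p and Var[g] = 2p(1 - p)
    return (g - 2 * af) / np.sqrt(2 * af * (1 - af))

haps = np.array([[0, 1, 1, 1],
                 [0, 0, 0, 1]], dtype=np.int8)
z = standardize_hw_sketch(haps, [0.5, 0.5])
```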

xftsim.utils.to_proportions(*args)

Converts input values to proportional values.

Parameters:

*args: Union[float, int]: Input values.

Returns:

np.ndarray: Proportional values.

Raises:

None

xftsim.utils.to_simplex(*args)

Converts input values to a simplex vector.

Parameters:

*args: Union[float, int]: Input values.

Returns:

np.ndarray: Simplex vector.

Raises:

ValueError: If all input values are less than or equal to zero.
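A sketch of one normalization consistent with the documented error: negative inputs are clipped to zero (an assumption), the vector is rescaled to sum to one, and a ValueError is raised when nothing positive remains, which happens exactly when all inputs are less than or equal to zero. `to_simplex_sketch` is a hypothetical name.

```python
import numpy as np

def to_simplex_sketch(*args):
    # assumption: negative entries are clipped to zero before rescaling
    values = np.clip(np.asarray(args, dtype=float), 0.0, None)
    total = values.sum()
    if total <= 0:
        raise ValueError("at least one input value must be positive")
    # rescale so the entries are non-negative and sum to one
    return values / total

to_simplex_sketch(1, 1, 2)  # -> array([0.25, 0.25, 0.5])
```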

xftsim.utils.unique_identifier(frame, index_variables, prefix=None)

Returns a unique identifier string generated from index variables of a dataframe.

Parameters:

frame: pd.DataFrame: Input dataframe.
index_variables: List[str]: List of column names to be used as index.
prefix: str, optional: Optional prefix for the identifier.

Returns:

str: Unique identifier string of the form [<prefix>..]<index_var1>.<index_var2>…

Raises:

None