Bivariate Traits with Assortative Mating
This notebook demonstrates how to simulate two genetically correlated traits (e.g., height and BMI) under assortative mating using xftsim.
We will:
Define multivariate genetic effects with specified heritabilities and genetic correlation
Build a bivariate architecture using the formula DSL
Configure linear assortative mating on one trait
Run for 5 generations and track how assortative mating affects variance
Use
SampleStatisticsto monitor covariance structure over generations
1. Imports
[ ]:
import numpy as np
from xftsim.effect import MultivariateEffects
from xftsim.arch import Architecture
from xftsim.mate import RandomMating, LinearAssortativeMating
from xftsim.reproduce import RecombinationMap
from xftsim.sim import Simulation
from xftsim.stats import SampleStatistics
from xftsim.founders import founder_haplotypes_uniform_AFs
2. Create Founder Haplotypes
We use 1000 founders and 200 variants, the same as the univariate example. Both traits share the same set of causal variants.
[ ]:
np.random.seed(42)
n = 1000
m = 200
founder_hap = founder_haplotypes_uniform_AFs(n=n, m=m)
print(f"Founders: {founder_hap.n} individuals, {founder_hap.m} variants")
3. Define Multivariate Genetic Effects
MultivariateEffects.from_h2_rg() generates a (m, k) effect matrix for k traits. The per-SNP covariance is:
Cov_snp = diag(sqrt(h2/m)) @ R @ diag(sqrt(h2/m))
where R is the genetic correlation matrix. This produces effects such that: - E[sum(beta_1^2)] = h2_height = 0.5 - E[sum(beta_2^2)] = h2_bmi = 0.3 - Genetic correlation between traits = 0.3
The rg parameter is a scalar applied to all off-diagonal entries of R.
[ ]:
mv_eff = MultivariateEffects.from_h2_rg(
h2=[0.5, 0.3], # heritabilities for height and BMI
rg=0.3, # genetic correlation
m=m,
seed=123,
)
print(mv_eff)
print(f"Effect matrix shape: {mv_eff.effects.shape} (m={m}, k=2)")
print(f"Sum of squared effects per trait: "
f"{np.sum(mv_eff.effects**2, axis=0)} (targets: [0.5, 0.3])")
4. Define the Bivariate Architecture
The formula DSL supports multi-output nodes via tuple LHS syntax:
(height.G, bmi.G) ~ mvGenetic(eff): joint genetic values for both traitsheight.E ~ noise(0.5): environmental noise for height (Var = 1 - h2_height)bmi.E ~ noise(0.7): environmental noise for BMI (Var = 1 - h2_bmi)Aggregation lines combine genetic + environmental for each trait
Each line is one statement. The parser auto-detects inputs for aggregation expressions and performs topological sorting.
[ ]:
formula = """
(height.G, bmi.G) ~ mvGenetic(eff)
height.E ~ noise(0.5)
bmi.E ~ noise(0.7)
height ~ height.G + height.E
bmi ~ bmi.G + bmi.E
"""
arch = Architecture(formula=formula, effects={'eff': mv_eff})
print(arch)
print(f"\nNodes:")
for node in arch.nodes:
print(f" {node.outputs} <- {node.component}")
5. Configure Assortative Mating
LinearAssortativeMating implements rank-order pairing on a phenotypic composite. The algorithm:
Standardize each component in
component_namesto mean 0, sd 1Compute a mating score:
sqrt(|r|) * composite + sqrt(1-|r|) * noiseSort each sex by mating score, pair in rank order
Here we assort on height with spousal correlation r=0.3. This means tall individuals tend to mate with other tall individuals, which should increase genetic variance for height over generations.
[ ]:
mating_am = LinearAssortativeMating(
component_names=['height'], # assort on height only
r=0.3, # spousal correlation
offspring_per_pair=2,
)
print(mating_am)
recomb = RecombinationMap.constant_map(m=m, p=0.5)
6. Run Simulation with Assortative Mating
We run for 5 generations with SampleStatistics to track the covariance matrix each generation. We also set retain_phenotypes=5 to keep all generations’ phenotypes in memory for analysis.
[ ]:
sim_am = Simulation(
founder_haplotypes=founder_hap,
architecture=arch,
mating_regime=mating_am,
recombination_map=recomb,
retain_haplotypes=2,
retain_phenotypes=5,
statistics=[SampleStatistics()],
seed=42,
)
sim_am.run(5)
print(f"Simulation completed: generation {sim_am.generation}")
print(f"Phenotype keys: {list(sim_am.phenotypes.keys)}")
7. Run a Control Simulation with Random Mating
For comparison, we run the same architecture under random mating. This lets us see the effect of assortative mating on variance and covariance.
We regenerate founders with the same seed so initial conditions match.
[ ]:
np.random.seed(42)
founder_hap_ctrl = founder_haplotypes_uniform_AFs(n=n, m=m)
sim_rm = Simulation(
founder_haplotypes=founder_hap_ctrl,
architecture=arch,
mating_regime=RandomMating(offspring_per_pair=2),
recombination_map=recomb,
retain_haplotypes=2,
retain_phenotypes=5,
statistics=[SampleStatistics()],
seed=42,
)
sim_rm.run(5)
print(f"Control simulation completed: generation {sim_rm.generation}")
8. Track Variance Over Generations
Under assortative mating on height, we expect:
Var(height.G) to increase over generations (positive assortment concentrates alleles that increase height)
Var(bmi.G) to also increase slightly due to genetic correlation with height
Var(height.E) and Var(bmi.E) to stay constant (noise is redrawn each gen)
Under random mating, genetic variances should remain approximately stable
[ ]:
def extract_variances(sim, component_name):
"""Extract variance of a component across generations from statistics."""
gens = []
vars_ = []
for result in sim.results:
stats = result.statistics['SampleStatistics']
keys = stats['keys']
if component_name in keys:
idx = keys.index(component_name)
gens.append(result.generation)
vars_.append(stats['var'][idx])
return gens, vars_
print("Genetic variance of height over generations:")
print(f"{'Gen':<5} {'AM Var(height.G)':<20} {'RM Var(height.G)':<20}")
print("-" * 45)
gens_am, var_hg_am = extract_variances(sim_am, 'height.G')
gens_rm, var_hg_rm = extract_variances(sim_rm, 'height.G')
for g_am, v_am, v_rm in zip(gens_am, var_hg_am, var_hg_rm):
print(f"{g_am:<5} {v_am:<20.4f} {v_rm:<20.4f}")
[ ]:
print("\nGenetic variance of BMI over generations:")
print(f"{'Gen':<5} {'AM Var(bmi.G)':<20} {'RM Var(bmi.G)':<20}")
print("-" * 45)
_, var_bg_am = extract_variances(sim_am, 'bmi.G')
_, var_bg_rm = extract_variances(sim_rm, 'bmi.G')
for g, v_am, v_rm in zip(gens_am, var_bg_am, var_bg_rm):
print(f"{g:<5} {v_am:<20.4f} {v_rm:<20.4f}")
9. Spousal Correlation
One direct consequence of assortative mating is that spouses should have correlated phenotypes for the trait under assortment.
We can check this by looking at mate assignments from the last generation. The _mate_assignments dict stores MateAssignment objects with maternal_idx and paternal_idx arrays.
[ ]:
# Compute spouse correlation from the penultimate generation
# (which determined the last generation's offspring)
last_gen = sim_am.generation
prev_gen = last_gen - 1
if prev_gen in sim_am._mate_assignments and prev_gen in sim_am.phenotype_history:
assignment = sim_am._mate_assignments[prev_gen]
parent_pheno = sim_am.phenotype_history[prev_gen]
mother_height = parent_pheno['height'][assignment.maternal_idx]
father_height = parent_pheno['height'][assignment.paternal_idx]
# Take unique pairs (each pair is repeated offspring_per_pair times)
opp = 2
mother_height_unique = mother_height[::opp]
father_height_unique = father_height[::opp]
spouse_corr = np.corrcoef(mother_height_unique, father_height_unique)[0, 1]
print(f"Spousal correlation for height (gen {prev_gen}): {spouse_corr:.3f}")
print(f"(Target r was 0.3)")
else:
# If mate assignments were pruned, compute from available generations
print("Mate assignments for the penultimate generation were pruned.")
print("Try increasing retain_phenotypes or checking an earlier generation.")
10. Full Covariance Matrix at Final Generation
The SampleStatistics output includes the full covariance matrix across all phenotype components. This shows how genetic and environmental components relate to each other and to the total phenotypes.
[ ]:
final_stats = sim_am.results[-1].statistics['SampleStatistics']
keys = final_stats['keys']
cov = final_stats['cov']
print(f"Final generation covariance matrix (gen {sim_am.results[-1].generation}):")
print(f"Components: {keys}\n")
# Print formatted covariance matrix
header = f"{'':>12s}" + "".join(f"{k:>12s}" for k in keys)
print(header)
for i, ki in enumerate(keys):
row = f"{ki:>12s}" + "".join(f"{cov[i, j]:>12.4f}" for j in range(len(keys)))
print(row)
# Phenotypic correlation between height and BMI
h_idx = keys.index('height')
b_idx = keys.index('bmi')
rp = cov[h_idx, b_idx] / np.sqrt(cov[h_idx, h_idx] * cov[b_idx, b_idx])
print(f"\nPhenotypic correlation(height, bmi) = {rp:.3f}")
Summary
This notebook demonstrated:
MultivariateEffects for genetically correlated traits
Tuple LHS in the formula DSL for multi-output components:
(a, b) ~ mvGenetic(eff)LinearAssortativeMating with rank-order pairing on phenotype
How assortative mating increases genetic variance over generations
Extracting variance and covariance statistics from
sim.results
Key observations: - Assortative mating on height increases Var(height.G) relative to random mating - Due to genetic correlation, Var(bmi.G) may also increase slightly - Spousal correlation tracks the target r parameter (stochastically)