Bivariate Traits with Assortative Mating

This notebook demonstrates how to simulate two genetically correlated traits (e.g., height and BMI) under assortative mating using xftsim.

We will:

  1. Define multivariate genetic effects with specified heritabilities and genetic correlation

  2. Build a bivariate architecture using the formula DSL

  3. Configure linear assortative mating on one trait

  4. Run for 5 generations and track how assortative mating affects variance

  5. Use SampleStatistics to monitor covariance structure over generations

1. Imports

[ ]:
import numpy as np

from xftsim.effect import MultivariateEffects
from xftsim.arch import Architecture
from xftsim.mate import RandomMating, LinearAssortativeMating
from xftsim.reproduce import RecombinationMap
from xftsim.sim import Simulation
from xftsim.stats import SampleStatistics
from xftsim.founders import founder_haplotypes_uniform_AFs

2. Create Founder Haplotypes

We use 1000 founders and 200 variants, the same as the univariate example. Both traits share the same set of causal variants.

[ ]:
np.random.seed(42)

n = 1000
m = 200

founder_hap = founder_haplotypes_uniform_AFs(n=n, m=m)
print(f"Founders: {founder_hap.n} individuals, {founder_hap.m} variants")

3. Define Multivariate Genetic Effects

MultivariateEffects.from_h2_rg() generates a (m, k) effect matrix for k traits. The per-SNP covariance is:

Cov_snp = diag(sqrt(h2/m)) @ R @ diag(sqrt(h2/m))

where R is the genetic correlation matrix. This produces effects such that: - E[sum(beta_1^2)] = h2_height = 0.5 - E[sum(beta_2^2)] = h2_bmi = 0.3 - Genetic correlation between traits = 0.3

The rg parameter is a scalar applied to all off-diagonal entries of R.

[ ]:
mv_eff = MultivariateEffects.from_h2_rg(
    h2=[0.5, 0.3],   # heritabilities for height and BMI
    rg=0.3,           # genetic correlation
    m=m,
    seed=123,
)
print(mv_eff)
print(f"Effect matrix shape: {mv_eff.effects.shape}  (m={m}, k=2)")
print(f"Sum of squared effects per trait: "
      f"{np.sum(mv_eff.effects**2, axis=0)}  (targets: [0.5, 0.3])")

4. Define the Bivariate Architecture

The formula DSL supports multi-output nodes via tuple LHS syntax:

  • (height.G, bmi.G) ~ mvGenetic(eff) : joint genetic values for both traits

  • height.E ~ noise(0.5) : environmental noise for height (Var = 1 - h2_height)

  • bmi.E ~ noise(0.7) : environmental noise for BMI (Var = 1 - h2_bmi)

  • Aggregation lines combine genetic + environmental for each trait

Each line is one statement. The parser auto-detects inputs for aggregation expressions and performs topological sorting.

[ ]:
formula = """
(height.G, bmi.G) ~ mvGenetic(eff)
height.E ~ noise(0.5)
bmi.E ~ noise(0.7)
height ~ height.G + height.E
bmi ~ bmi.G + bmi.E
"""

arch = Architecture(formula=formula, effects={'eff': mv_eff})
print(arch)
print(f"\nNodes:")
for node in arch.nodes:
    print(f"  {node.outputs} <- {node.component}")

5. Configure Assortative Mating

LinearAssortativeMating implements rank-order pairing on a phenotypic composite. The algorithm:

  1. Standardize each component in component_names to mean 0, sd 1

  2. Compute a mating score: sqrt(|r|) * composite + sqrt(1-|r|) * noise

  3. Sort each sex by mating score, pair in rank order

Here we assort on height with spousal correlation r=0.3. This means tall individuals tend to mate with other tall individuals, which should increase genetic variance for height over generations.

[ ]:
mating_am = LinearAssortativeMating(
    component_names=['height'],  # assort on height only
    r=0.3,                       # spousal correlation
    offspring_per_pair=2,
)
print(mating_am)

recomb = RecombinationMap.constant_map(m=m, p=0.5)

6. Run Simulation with Assortative Mating

We run for 5 generations with SampleStatistics to track the covariance matrix each generation. We also set retain_phenotypes=5 to keep all generations’ phenotypes in memory for analysis.

[ ]:
sim_am = Simulation(
    founder_haplotypes=founder_hap,
    architecture=arch,
    mating_regime=mating_am,
    recombination_map=recomb,
    retain_haplotypes=2,
    retain_phenotypes=5,
    statistics=[SampleStatistics()],
    seed=42,
)

sim_am.run(5)
print(f"Simulation completed: generation {sim_am.generation}")
print(f"Phenotype keys: {list(sim_am.phenotypes.keys)}")

7. Run a Control Simulation with Random Mating

For comparison, we run the same architecture under random mating. This lets us see the effect of assortative mating on variance and covariance.

We regenerate founders with the same seed so initial conditions match.

[ ]:
np.random.seed(42)
founder_hap_ctrl = founder_haplotypes_uniform_AFs(n=n, m=m)

sim_rm = Simulation(
    founder_haplotypes=founder_hap_ctrl,
    architecture=arch,
    mating_regime=RandomMating(offspring_per_pair=2),
    recombination_map=recomb,
    retain_haplotypes=2,
    retain_phenotypes=5,
    statistics=[SampleStatistics()],
    seed=42,
)

sim_rm.run(5)
print(f"Control simulation completed: generation {sim_rm.generation}")

8. Track Variance Over Generations

Under assortative mating on height, we expect:

  • Var(height.G) to increase over generations (positive assortment concentrates alleles that increase height)

  • Var(bmi.G) to also increase slightly due to genetic correlation with height

  • Var(height.E) and Var(bmi.E) to stay constant (noise is redrawn each gen)

  • Under random mating, genetic variances should remain approximately stable

[ ]:
def extract_variances(sim, component_name):
    """Extract variance of a component across generations from statistics."""
    gens = []
    vars_ = []
    for result in sim.results:
        stats = result.statistics['SampleStatistics']
        keys = stats['keys']
        if component_name in keys:
            idx = keys.index(component_name)
            gens.append(result.generation)
            vars_.append(stats['var'][idx])
    return gens, vars_

print("Genetic variance of height over generations:")
print(f"{'Gen':<5} {'AM Var(height.G)':<20} {'RM Var(height.G)':<20}")
print("-" * 45)

gens_am, var_hg_am = extract_variances(sim_am, 'height.G')
gens_rm, var_hg_rm = extract_variances(sim_rm, 'height.G')

for g_am, v_am, v_rm in zip(gens_am, var_hg_am, var_hg_rm):
    print(f"{g_am:<5} {v_am:<20.4f} {v_rm:<20.4f}")
[ ]:
print("\nGenetic variance of BMI over generations:")
print(f"{'Gen':<5} {'AM Var(bmi.G)':<20} {'RM Var(bmi.G)':<20}")
print("-" * 45)

_, var_bg_am = extract_variances(sim_am, 'bmi.G')
_, var_bg_rm = extract_variances(sim_rm, 'bmi.G')

for g, v_am, v_rm in zip(gens_am, var_bg_am, var_bg_rm):
    print(f"{g:<5} {v_am:<20.4f} {v_rm:<20.4f}")

9. Spousal Correlation

One direct consequence of assortative mating is that spouses should have correlated phenotypes for the trait under assortment.

We can check this by looking at mate assignments from the last generation. The _mate_assignments dict stores MateAssignment objects with maternal_idx and paternal_idx arrays.

[ ]:
# Compute spouse correlation from the penultimate generation
# (which determined the last generation's offspring)
last_gen = sim_am.generation
prev_gen = last_gen - 1

if prev_gen in sim_am._mate_assignments and prev_gen in sim_am.phenotype_history:
    assignment = sim_am._mate_assignments[prev_gen]
    parent_pheno = sim_am.phenotype_history[prev_gen]

    mother_height = parent_pheno['height'][assignment.maternal_idx]
    father_height = parent_pheno['height'][assignment.paternal_idx]

    # Take unique pairs (each pair is repeated offspring_per_pair times)
    opp = 2
    mother_height_unique = mother_height[::opp]
    father_height_unique = father_height[::opp]

    spouse_corr = np.corrcoef(mother_height_unique, father_height_unique)[0, 1]
    print(f"Spousal correlation for height (gen {prev_gen}): {spouse_corr:.3f}")
    print(f"(Target r was 0.3)")
else:
    # If mate assignments were pruned, compute from available generations
    print("Mate assignments for the penultimate generation were pruned.")
    print("Try increasing retain_phenotypes or checking an earlier generation.")

10. Full Covariance Matrix at Final Generation

The SampleStatistics output includes the full covariance matrix across all phenotype components. This shows how genetic and environmental components relate to each other and to the total phenotypes.

[ ]:
final_stats = sim_am.results[-1].statistics['SampleStatistics']
keys = final_stats['keys']
cov = final_stats['cov']

print(f"Final generation covariance matrix (gen {sim_am.results[-1].generation}):")
print(f"Components: {keys}\n")

# Print formatted covariance matrix
header = f"{'':>12s}" + "".join(f"{k:>12s}" for k in keys)
print(header)
for i, ki in enumerate(keys):
    row = f"{ki:>12s}" + "".join(f"{cov[i, j]:>12.4f}" for j in range(len(keys)))
    print(row)

# Phenotypic correlation between height and BMI
h_idx = keys.index('height')
b_idx = keys.index('bmi')
rp = cov[h_idx, b_idx] / np.sqrt(cov[h_idx, h_idx] * cov[b_idx, b_idx])
print(f"\nPhenotypic correlation(height, bmi) = {rp:.3f}")

Summary

This notebook demonstrated:

  • MultivariateEffects for genetically correlated traits

  • Tuple LHS in the formula DSL for multi-output components: (a, b) ~ mvGenetic(eff)

  • LinearAssortativeMating with rank-order pairing on phenotype

  • How assortative mating increases genetic variance over generations

  • Extracting variance and covariance statistics from sim.results

Key observations: - Assortative mating on height increases Var(height.G) relative to random mating - Due to genetic correlation, Var(bmi.G) may also increase slightly - Spousal correlation tracks the target r parameter (stochastically)