Vertical Transmission and Trio Analysis
This notebook demonstrates vertical transmission (VT) – a model where parental phenotype directly influences offspring phenotype beyond genetic inheritance. This is common in traits like educational attainment, dietary habits, and some behavioral traits.
We will:
Define a single-trait architecture with genetic, noise, and parental transmission
Use
parent()with afounder=noise()fallback for generation 0Run the simulation and show how VT inflates parent-offspring correlation
Use
TrioFilterandSampleStatisticsto extract trio viewsCompare parent-offspring regression with and without VT
1. Imports
[ ]:
import numpy as np
from xftsim.effect import AdditiveEffects
from xftsim.arch import Architecture
from xftsim.mate import RandomMating
from xftsim.reproduce import RecombinationMap
from xftsim.sim import Simulation
from xftsim.filters import TrioFilter
from xftsim.stats import SampleStatistics
from xftsim.founders import founder_haplotypes_uniform_AFs
2. Create Founder Haplotypes
[ ]:
np.random.seed(42)
n = 1000
m = 200
founder_hap = founder_haplotypes_uniform_AFs(n=n, m=m)
print(f"Founders: {founder_hap.n} individuals, {founder_hap.m} variants")
3. Define the Architecture with Vertical Transmission
The phenotype model is:
Y = G + E + VT
where: - G = additive genetic value (h2 = 0.5) - E = environmental noise (variance = 0.3) - VT = midparent phenotype from previous generation
The parent(Y, founder=noise(0.2)) component: - At generation 0: draws from N(0, 0.2) since there are no parents yet - At generation t>0: computes the midparent value = 0.5 * (mother_Y + father_Y)
This creates a feedback loop: parental phenotype influences offspring phenotype, which then becomes the parent of the next generation. Over time, this inflates phenotypic variance and parent-offspring correlation beyond what genetics alone would predict.
[ ]:
eff = AdditiveEffects.from_h2(h2=0.5, m=m, seed=123)
formula_vt = """
Y.G ~ genetic(eff)
Y.E ~ noise(0.3)
Y.VT ~ parent(Y, founder=noise(0.2))
Y ~ Y.G + Y.E + Y.VT
"""
arch_vt = Architecture(formula=formula_vt, effects={'eff': eff})
print(arch_vt)
print(f"\nNodes:")
for node in arch_vt.nodes:
print(f" {node.outputs} <- {node.component}")
4. Define a Control Architecture (No VT)
For comparison, we build the same model but without the VT component. The noise variance is increased to keep total phenotypic variance comparable.
[ ]:
formula_ctrl = """
Y.G ~ genetic(eff)
Y.E ~ noise(0.5)
Y ~ Y.G + Y.E
"""
arch_ctrl = Architecture(formula=formula_ctrl, effects={'eff': eff})
print("Control architecture (no VT):")
print(arch_ctrl)
5. Run Simulations with TrioFilter
The TrioFilter() extracts aligned trio data (offspring + mother + father phenotypes) from adjacent generations. It returns None at generation 0 (no parents) and a TrioView at generation 1+.
Important: TrioFilter() and SampleStatistics() take NO constructor arguments.
The filters parameter is a dict, not a list: {'trio': TrioFilter()}.
[ ]:
# Common setup
mating = RandomMating(offspring_per_pair=2)
recomb = RecombinationMap.constant_map(m=m, p=0.5)
# Run VT simulation
sim_vt = Simulation(
founder_haplotypes=founder_hap,
architecture=arch_vt,
mating_regime=mating,
recombination_map=recomb,
retain_haplotypes=2,
retain_phenotypes=5,
statistics=[SampleStatistics()],
filters={'trio': TrioFilter()},
seed=42,
)
sim_vt.run(5)
print(f"VT simulation: generation {sim_vt.generation}")
# Run control simulation (regenerate founders for matching initial conditions)
np.random.seed(42)
founder_hap_ctrl = founder_haplotypes_uniform_AFs(n=n, m=m)
sim_ctrl = Simulation(
founder_haplotypes=founder_hap_ctrl,
architecture=arch_ctrl,
mating_regime=mating,
recombination_map=recomb,
retain_haplotypes=2,
retain_phenotypes=5,
statistics=[SampleStatistics()],
filters={'trio': TrioFilter()},
seed=42,
)
sim_ctrl.run(5)
print(f"Control simulation: generation {sim_ctrl.generation}")
6. Variance Components Over Generations
With VT, we expect the total phenotypic variance Var(Y) to be inflated relative to the control, because the VT component adds parental phenotype variance on top of genetic and environmental variance.
[ ]:
def get_var(sim, component):
"""Extract variance of a component across generations."""
result = []
for r in sim.results:
stats = r.statistics['SampleStatistics']
keys = stats['keys']
if component in keys:
idx = keys.index(component)
result.append((r.generation, stats['var'][idx]))
return result
print("Phenotypic variance Var(Y) over generations:")
print(f"{'Gen':<5} {'VT model':<15} {'Control':<15}")
print("-" * 35)
var_y_vt = get_var(sim_vt, 'Y')
var_y_ctrl = get_var(sim_ctrl, 'Y')
for (g_vt, v_vt), (_, v_ctrl) in zip(var_y_vt, var_y_ctrl):
print(f"{g_vt:<5} {v_vt:<15.4f} {v_ctrl:<15.4f}")
[ ]:
# Show VT component variance
var_vt_comp = get_var(sim_vt, 'Y.VT')
print("\nVertical transmission component Var(Y.VT) over generations:")
for g, v in var_vt_comp:
print(f" Gen {g}: {v:.4f}")
7. Extract Trio Views
The TrioFilter produces TrioView objects with aligned phenotype dictionaries for offspring, mothers, and fathers. This is stored in the filtered_views passed to statistics, but we can also apply the filter manually to compute parent-offspring correlations.
A TrioView has: - offspring_phenotypes: dict of name -> (n_trios,) array - mother_phenotypes: dict of name -> (n_trios,) array - father_phenotypes: dict of name -> (n_trios,) array - n_trios: number of trios
[ ]:
trio_filter = TrioFilter()
def compute_parent_offspring_corr(sim, gen):
"""Compute midparent-offspring correlation for trait Y at a given generation."""
trio_view = trio_filter.apply(
gen,
sim.phenotype_history,
sim.pedigree_history,
)
if trio_view is None:
return None
offspring_y = trio_view.offspring_phenotypes['Y']
mother_y = trio_view.mother_phenotypes['Y']
father_y = trio_view.father_phenotypes['Y']
midparent_y = 0.5 * (mother_y + father_y)
return np.corrcoef(offspring_y, midparent_y)[0, 1]
print("Midparent-offspring correlation for Y:")
print(f"{'Gen':<5} {'VT model':<15} {'Control':<15}")
print("-" * 35)
for gen in range(1, 5):
if gen in sim_vt.phenotype_history and gen in sim_ctrl.phenotype_history:
r_vt = compute_parent_offspring_corr(sim_vt, gen)
r_ctrl = compute_parent_offspring_corr(sim_ctrl, gen)
r_vt_str = f"{r_vt:.3f}" if r_vt is not None else "N/A"
r_ctrl_str = f"{r_ctrl:.3f}" if r_ctrl is not None else "N/A"
print(f"{gen:<5} {r_vt_str:<15} {r_ctrl_str:<15}")
8. Understanding the VT Inflation
Without VT, midparent-offspring correlation is approximately h2/2 for the midparent regression (or h2 for a single parent regression, divided by the phenotypic SD). The exact realized value depends on allele frequencies.
With VT, the parent-offspring correlation is inflated because: - Offspring Y = G + E + VT - VT = midparent(Y) from previous generation - So offspring Y is directly correlated with parental Y through the VT pathway in addition to the genetic pathway
Let us verify by looking at the correlation between Y.VT and midparent Y.
[ ]:
# Examine the trio view for the last available generation
last_gen = sim_vt.generation
trio_view = trio_filter.apply(
last_gen,
sim_vt.phenotype_history,
sim_vt.pedigree_history,
)
if trio_view is not None:
print(f"Trio view at generation {last_gen}:")
print(f" Number of trios: {trio_view.n_trios}")
print(f" Offspring phenotype keys: {list(trio_view.offspring_phenotypes.keys())}")
print(f" Mother phenotype keys: {list(trio_view.mother_phenotypes.keys())}")
print()
# Offspring VT component vs midparent Y
if 'Y.VT' in trio_view.offspring_phenotypes:
offspring_vt = trio_view.offspring_phenotypes['Y.VT']
midparent_y = 0.5 * (
trio_view.mother_phenotypes['Y'] +
trio_view.father_phenotypes['Y']
)
r_vt_mp = np.corrcoef(offspring_vt, midparent_y)[0, 1]
print(f" Corr(offspring VT, midparent Y) = {r_vt_mp:.3f}")
print(f" This should be close to 1.0 (VT IS midparent Y)")
# Component-wise correlations with offspring Y
offspring_y = trio_view.offspring_phenotypes['Y']
for comp in ['Y.G', 'Y.E', 'Y.VT']:
if comp in trio_view.offspring_phenotypes:
r = np.corrcoef(offspring_y, trio_view.offspring_phenotypes[comp])[0, 1]
print(f" Corr(offspring Y, offspring {comp}) = {r:.3f}")
else:
print("No trio view available at the last generation (phenotypes may be pruned).")
9. Genetic vs Total Parent-Offspring Regression
Let us compare the parent-offspring regression for the genetic component alone (Y.G) vs the total phenotype (Y). Under VT, the total Y regression will be inflated relative to the genetic-only regression.
[ ]:
def parent_offspring_regression(trio_view, component):
"""Compute midparent-offspring regression coefficient."""
if component not in trio_view.offspring_phenotypes:
return None
if component not in trio_view.mother_phenotypes:
return None
offspring = trio_view.offspring_phenotypes[component]
mother = trio_view.mother_phenotypes[component]
father = trio_view.father_phenotypes[component]
midparent = 0.5 * (mother + father)
# Regression coefficient: Cov(offspring, midparent) / Var(midparent)
cov = np.cov(offspring, midparent)[0, 1]
var_mp = np.var(midparent)
if var_mp > 0:
return cov / var_mp
return None
# Use the last generation trio view
if trio_view is not None:
print(f"Midparent-offspring regression (generation {last_gen}):")
print()
for component in ['Y.G', 'Y']:
beta = parent_offspring_regression(trio_view, component)
if beta is not None:
print(f" {component}: beta = {beta:.3f}")
print()
print("Under purely genetic transmission, the midparent-offspring")
print("regression on total Y should equal the regression on Y.G.")
print("The VT component inflates the total Y regression.")
10. Continue the Simulation
Simulations can be extended using continue_run(). This adds more generations from the current state without re-running from scratch. This is useful for running additional generations after inspecting intermediate results.
[ ]:
print(f"Before continue: generation {sim_vt.generation}, "
f"{len(sim_vt.results)} result entries")
sim_vt.continue_run(3) # Run 3 more generations
print(f"After continue: generation {sim_vt.generation}, "
f"{len(sim_vt.results)} result entries")
# Show final statistics
for result in sim_vt.results[-3:]:
stats = result.statistics['SampleStatistics']
keys = stats['keys']
y_idx = keys.index('Y')
print(f" Gen {result.generation}: Var(Y) = {stats['var'][y_idx]:.4f}")
Summary
This notebook demonstrated:
Vertical transmission via
parent(Y, founder=noise(0.2))in the formula DSLThe
founder=keyword provides a fallback for generation 0 when no parents existTrioFilter for extracting aligned parent-offspring phenotype data
How VT inflates parent-offspring correlation beyond genetic prediction
continue_run() for extending a simulation from its current state
Key findings: - VT creates a direct pathway from parental to offspring phenotype - This inflates Var(Y) and parent-offspring correlations - The effect accumulates over generations as VT feeds back - TrioFilter provides a convenient way to extract family-based views for analysis