Vertical Transmission and Trio Analysis

This notebook demonstrates vertical transmission (VT) – a model where parental phenotype directly influences offspring phenotype beyond genetic inheritance. This is common in traits like educational attainment, dietary habits, and some behavioral traits.

We will:

  1. Define a single-trait architecture with genetic, noise, and parental transmission

  2. Use parent() with a founder=noise() fallback for generation 0

  3. Run the simulation and show how VT inflates parent-offspring correlation

  4. Use TrioFilter and SampleStatistics to extract trio views

  5. Compare parent-offspring regression with and without VT

1. Imports

[ ]:
import numpy as np

from xftsim.effect import AdditiveEffects
from xftsim.arch import Architecture
from xftsim.mate import RandomMating
from xftsim.reproduce import RecombinationMap
from xftsim.sim import Simulation
from xftsim.filters import TrioFilter
from xftsim.stats import SampleStatistics
from xftsim.founders import founder_haplotypes_uniform_AFs

2. Create Founder Haplotypes

[ ]:
np.random.seed(42)

n = 1000
m = 200

founder_hap = founder_haplotypes_uniform_AFs(n=n, m=m)
print(f"Founders: {founder_hap.n} individuals, {founder_hap.m} variants")

3. Define the Architecture with Vertical Transmission

The phenotype model is:

Y = G + E + VT

where: - G = additive genetic value (h2 = 0.5) - E = environmental noise (variance = 0.3) - VT = midparent phenotype from previous generation

The parent(Y, founder=noise(0.2)) component: - At generation 0: draws from N(0, 0.2) since there are no parents yet - At generation t>0: computes the midparent value = 0.5 * (mother_Y + father_Y)

This creates a feedback loop: parental phenotype influences offspring phenotype, which then becomes the parent of the next generation. Over time, this inflates phenotypic variance and parent-offspring correlation beyond what genetics alone would predict.

[ ]:
eff = AdditiveEffects.from_h2(h2=0.5, m=m, seed=123)

formula_vt = """
Y.G ~ genetic(eff)
Y.E ~ noise(0.3)
Y.VT ~ parent(Y, founder=noise(0.2))
Y ~ Y.G + Y.E + Y.VT
"""

arch_vt = Architecture(formula=formula_vt, effects={'eff': eff})
print(arch_vt)
print(f"\nNodes:")
for node in arch_vt.nodes:
    print(f"  {node.outputs} <- {node.component}")

4. Define a Control Architecture (No VT)

For comparison, we build the same model but without the VT component. The noise variance is increased to keep total phenotypic variance comparable.

[ ]:
formula_ctrl = """
Y.G ~ genetic(eff)
Y.E ~ noise(0.5)
Y ~ Y.G + Y.E
"""

arch_ctrl = Architecture(formula=formula_ctrl, effects={'eff': eff})
print("Control architecture (no VT):")
print(arch_ctrl)

5. Run Simulations with TrioFilter

The TrioFilter() extracts aligned trio data (offspring + mother + father phenotypes) from adjacent generations. It returns None at generation 0 (no parents) and a TrioView at generation 1+.

Important: TrioFilter() and SampleStatistics() take NO constructor arguments.

The filters parameter is a dict, not a list: {'trio': TrioFilter()}.

[ ]:
# Common setup
mating = RandomMating(offspring_per_pair=2)
recomb = RecombinationMap.constant_map(m=m, p=0.5)

# Run VT simulation
sim_vt = Simulation(
    founder_haplotypes=founder_hap,
    architecture=arch_vt,
    mating_regime=mating,
    recombination_map=recomb,
    retain_haplotypes=2,
    retain_phenotypes=5,
    statistics=[SampleStatistics()],
    filters={'trio': TrioFilter()},
    seed=42,
)
sim_vt.run(5)
print(f"VT simulation: generation {sim_vt.generation}")

# Run control simulation (regenerate founders for matching initial conditions)
np.random.seed(42)
founder_hap_ctrl = founder_haplotypes_uniform_AFs(n=n, m=m)

sim_ctrl = Simulation(
    founder_haplotypes=founder_hap_ctrl,
    architecture=arch_ctrl,
    mating_regime=mating,
    recombination_map=recomb,
    retain_haplotypes=2,
    retain_phenotypes=5,
    statistics=[SampleStatistics()],
    filters={'trio': TrioFilter()},
    seed=42,
)
sim_ctrl.run(5)
print(f"Control simulation: generation {sim_ctrl.generation}")

6. Variance Components Over Generations

With VT, we expect the total phenotypic variance Var(Y) to be inflated relative to the control, because the VT component adds parental phenotype variance on top of genetic and environmental variance.

[ ]:
def get_var(sim, component):
    """Extract variance of a component across generations."""
    result = []
    for r in sim.results:
        stats = r.statistics['SampleStatistics']
        keys = stats['keys']
        if component in keys:
            idx = keys.index(component)
            result.append((r.generation, stats['var'][idx]))
    return result

print("Phenotypic variance Var(Y) over generations:")
print(f"{'Gen':<5} {'VT model':<15} {'Control':<15}")
print("-" * 35)

var_y_vt = get_var(sim_vt, 'Y')
var_y_ctrl = get_var(sim_ctrl, 'Y')

for (g_vt, v_vt), (_, v_ctrl) in zip(var_y_vt, var_y_ctrl):
    print(f"{g_vt:<5} {v_vt:<15.4f} {v_ctrl:<15.4f}")
[ ]:
# Show VT component variance
var_vt_comp = get_var(sim_vt, 'Y.VT')

print("\nVertical transmission component Var(Y.VT) over generations:")
for g, v in var_vt_comp:
    print(f"  Gen {g}: {v:.4f}")

7. Extract Trio Views

The TrioFilter produces TrioView objects with aligned phenotype dictionaries for offspring, mothers, and fathers. This is stored in the filtered_views passed to statistics, but we can also apply the filter manually to compute parent-offspring correlations.

A TrioView has: - offspring_phenotypes: dict of name -> (n_trios,) array - mother_phenotypes: dict of name -> (n_trios,) array - father_phenotypes: dict of name -> (n_trios,) array - n_trios: number of trios

[ ]:
trio_filter = TrioFilter()

def compute_parent_offspring_corr(sim, gen):
    """Compute midparent-offspring correlation for trait Y at a given generation."""
    trio_view = trio_filter.apply(
        gen,
        sim.phenotype_history,
        sim.pedigree_history,
    )
    if trio_view is None:
        return None

    offspring_y = trio_view.offspring_phenotypes['Y']
    mother_y = trio_view.mother_phenotypes['Y']
    father_y = trio_view.father_phenotypes['Y']
    midparent_y = 0.5 * (mother_y + father_y)

    return np.corrcoef(offspring_y, midparent_y)[0, 1]

print("Midparent-offspring correlation for Y:")
print(f"{'Gen':<5} {'VT model':<15} {'Control':<15}")
print("-" * 35)

for gen in range(1, 5):
    if gen in sim_vt.phenotype_history and gen in sim_ctrl.phenotype_history:
        r_vt = compute_parent_offspring_corr(sim_vt, gen)
        r_ctrl = compute_parent_offspring_corr(sim_ctrl, gen)

        r_vt_str = f"{r_vt:.3f}" if r_vt is not None else "N/A"
        r_ctrl_str = f"{r_ctrl:.3f}" if r_ctrl is not None else "N/A"

        print(f"{gen:<5} {r_vt_str:<15} {r_ctrl_str:<15}")

8. Understanding the VT Inflation

Without VT, midparent-offspring correlation is approximately h2/2 for the midparent regression (or h2 for a single parent regression, divided by the phenotypic SD). The exact realized value depends on allele frequencies.

With VT, the parent-offspring correlation is inflated because: - Offspring Y = G + E + VT - VT = midparent(Y) from previous generation - So offspring Y is directly correlated with parental Y through the VT pathway in addition to the genetic pathway

Let us verify by looking at the correlation between Y.VT and midparent Y.

[ ]:
# Examine the trio view for the last available generation
last_gen = sim_vt.generation
trio_view = trio_filter.apply(
    last_gen,
    sim_vt.phenotype_history,
    sim_vt.pedigree_history,
)

if trio_view is not None:
    print(f"Trio view at generation {last_gen}:")
    print(f"  Number of trios: {trio_view.n_trios}")
    print(f"  Offspring phenotype keys: {list(trio_view.offspring_phenotypes.keys())}")
    print(f"  Mother phenotype keys: {list(trio_view.mother_phenotypes.keys())}")
    print()

    # Offspring VT component vs midparent Y
    if 'Y.VT' in trio_view.offspring_phenotypes:
        offspring_vt = trio_view.offspring_phenotypes['Y.VT']
        midparent_y = 0.5 * (
            trio_view.mother_phenotypes['Y'] +
            trio_view.father_phenotypes['Y']
        )
        r_vt_mp = np.corrcoef(offspring_vt, midparent_y)[0, 1]
        print(f"  Corr(offspring VT, midparent Y) = {r_vt_mp:.3f}")
        print(f"  This should be close to 1.0 (VT IS midparent Y)")

    # Component-wise correlations with offspring Y
    offspring_y = trio_view.offspring_phenotypes['Y']
    for comp in ['Y.G', 'Y.E', 'Y.VT']:
        if comp in trio_view.offspring_phenotypes:
            r = np.corrcoef(offspring_y, trio_view.offspring_phenotypes[comp])[0, 1]
            print(f"  Corr(offspring Y, offspring {comp}) = {r:.3f}")
else:
    print("No trio view available at the last generation (phenotypes may be pruned).")

9. Genetic vs Total Parent-Offspring Regression

Let us compare the parent-offspring regression for the genetic component alone (Y.G) vs the total phenotype (Y). Under VT, the total Y regression will be inflated relative to the genetic-only regression.

[ ]:
def parent_offspring_regression(trio_view, component):
    """Compute midparent-offspring regression coefficient."""
    if component not in trio_view.offspring_phenotypes:
        return None
    if component not in trio_view.mother_phenotypes:
        return None

    offspring = trio_view.offspring_phenotypes[component]
    mother = trio_view.mother_phenotypes[component]
    father = trio_view.father_phenotypes[component]
    midparent = 0.5 * (mother + father)

    # Regression coefficient: Cov(offspring, midparent) / Var(midparent)
    cov = np.cov(offspring, midparent)[0, 1]
    var_mp = np.var(midparent)
    if var_mp > 0:
        return cov / var_mp
    return None

# Use the last generation trio view
if trio_view is not None:
    print(f"Midparent-offspring regression (generation {last_gen}):")
    print()

    for component in ['Y.G', 'Y']:
        beta = parent_offspring_regression(trio_view, component)
        if beta is not None:
            print(f"  {component}: beta = {beta:.3f}")

    print()
    print("Under purely genetic transmission, the midparent-offspring")
    print("regression on total Y should equal the regression on Y.G.")
    print("The VT component inflates the total Y regression.")

10. Continue the Simulation

Simulations can be extended using continue_run(). This adds more generations from the current state without re-running from scratch. This is useful for running additional generations after inspecting intermediate results.

[ ]:
print(f"Before continue: generation {sim_vt.generation}, "
      f"{len(sim_vt.results)} result entries")

sim_vt.continue_run(3)  # Run 3 more generations

print(f"After continue:  generation {sim_vt.generation}, "
      f"{len(sim_vt.results)} result entries")

# Show final statistics
for result in sim_vt.results[-3:]:
    stats = result.statistics['SampleStatistics']
    keys = stats['keys']
    y_idx = keys.index('Y')
    print(f"  Gen {result.generation}: Var(Y) = {stats['var'][y_idx]:.4f}")

Summary

This notebook demonstrated:

  • Vertical transmission via parent(Y, founder=noise(0.2)) in the formula DSL

  • The founder= keyword provides a fallback for generation 0 when no parents exist

  • TrioFilter for extracting aligned parent-offspring phenotype data

  • How VT inflates parent-offspring correlation beyond genetic prediction

  • continue_run() for extending a simulation from its current state

Key findings: - VT creates a direct pathway from parental to offspring phenotype - This inflates Var(Y) and parent-offspring correlations - The effect accumulates over generations as VT feeds back - TrioFilter provides a convenient way to extract family-based views for analysis