Advanced genetic architectures

Here we consider more complicated genetic architecture components.

Partially overlapping non-infinitesimal models

Here we consider genetic effects for two traits, \(Y\) and \(Z\), such that the genome is partitioned into five sets:

variants causal for \(Y\) but not \(Z\)
variants causal for \(Z\) but not \(Y\)
variants causal for \(Y\) and \(Z\) with orthogonal effects
variants causal for \(Y\) and \(Z\) with correlated effects (\(r = 0.5\))
non-causal variants

We set the heritabilities of \(Y\) and \(Z\) to 0.5 and 0.4, respectively, and let each of the three causal sets for a given trait account for one third of that trait's total heritability.
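As a short worked step, the per-variant effect variance implied by this design can be written out. This is a sketch that assumes standardized, approximately independent variants (an idealization; linkage disequilibrium will perturb it). The symbols \(S\) (a causal set), \(m_S\) (its number of variants), and \(x_j\) (a standardized genotype) are introduced here for illustration only:

\[
\beta_j \sim N\!\left(0,\ \frac{h^2}{3\,m_S}\right)
\quad\Longrightarrow\quad
\operatorname{Var}\Big(\sum_{j \in S} x_j \beta_j\Big) \approx \sum_{j \in S} \beta_j^2 \approx m_S \cdot \frac{h^2}{3\,m_S} = \frac{h^2}{3}.
\]

For example, with 2000 variants split into five sets of 400, the effect standard deviation is \(\sqrt{0.5 / (3 \cdot 400)} \approx 0.0204\) for \(Y\) and \(\sqrt{0.4 / (3 \cdot 400)} \approx 0.0183\) for \(Z\).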

The simplest way to achieve this is to manually construct a set of effects:

[1]:

import xftsim as xft
import numpy as np
np.random.seed(123)

h2_y, h2_z = 0.5, 0.4
r_yz=.5

## our haplotypes
haplotypes = xft.sim.DemoSimulation(n=8000,m=2000).haplotypes

## divide genome into five equally sized components
variant_set_indices = [np.sort(x) for x in np.array_split(np.random.permutation(haplotypes.xft.m), 5)]

[2]:

## initialize effects matrix
beta = np.zeros((haplotypes.xft.m, 2))

##  1. variants causal for $Y$ but not $Z$
beta[variant_set_indices[0],0] = np.random.randn(len(variant_set_indices[0]))*np.sqrt(h2_y/3 / len(variant_set_indices[0]))
##  2. variants causal for $Z$ but not $Y$
beta[variant_set_indices[1],1] = np.random.randn(len(variant_set_indices[1]))*np.sqrt(h2_z/3 / len(variant_set_indices[1]))
## 3. variants causal for $Y$ and $Z$ with orthogonal effects
beta[variant_set_indices[2],0] = np.random.randn(len(variant_set_indices[2]))*np.sqrt(h2_y/3 / len(variant_set_indices[2]))
beta[variant_set_indices[2],1] = np.random.randn(len(variant_set_indices[2]))*np.sqrt(h2_z/3 / len(variant_set_indices[2]))
## 4. variants causal for $Y$ and $Z$ with correlated effects
cov = np.array([[h2_y/3, r_yz*np.sqrt(h2_y*h2_z)/3],
                [r_yz*np.sqrt(h2_y*h2_z)/3,h2_z/3]])/len(variant_set_indices[3])
beta[variant_set_indices[3],:] = np.random.multivariate_normal(mean = np.zeros(2),
                                                               cov = cov,
                                                               size = len(variant_set_indices[3]))
## 5. non-causal variants are already zero
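To see why the covariance matrix used for the correlated set yields effects with correlation `r_yz`, the following stand-alone sketch may help. It uses only NumPy, repeats the same covariance construction, and draws from a deliberately large hypothetical set (`m_set`, not a value from the simulation above) so that sampling noise is small. The seed and the names `implied_r`, `empirical_r`, and `beta_demo` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this illustration

h2_y, h2_z, r_yz = 0.5, 0.4, 0.5
m_set = 100_000  # hypothetical set size, much larger than the 400 used above

# same covariance construction as for the correlated causal set
cov = np.array([[h2_y / 3, r_yz * np.sqrt(h2_y * h2_z) / 3],
                [r_yz * np.sqrt(h2_y * h2_z) / 3, h2_z / 3]]) / m_set

# correlation implied by the covariance matrix itself
implied_r = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# empirical correlation of effects drawn from that covariance
beta_demo = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=m_set)
empirical_r = np.corrcoef(beta_demo, rowvar=False)[0, 1]

print(f"implied r: {implied_r:.3f}, empirical r: {empirical_r:.3f}")
```

With only 400 variants per set, as in the simulation, the empirical correlation of the drawn effects will vary more around 0.5 than it does here.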

We can then construct the additive effects component:

[3]:

add_effects_object = xft.effect.AdditiveEffects(scaled=False, standardized=True,
                                                beta=beta,
                                                variant_indexer=haplotypes.xft.get_variant_indexer(),
                                                component_indexer=xft.index.ComponentIndex.from_product(['y','z'],
                                                                                                        ['addGen']))

add_comp = xft.arch.AdditiveGeneticComponent(beta=add_effects_object)

We then verify that the realized genetic covariances are close to what we wanted:

[4]:

print('Total genetic covariance matrix:')
print(np.cov(haplotypes.data @ add_effects_object.beta_raw_haploid, rowvar=False))

for i,variant_set in enumerate(variant_set_indices):
    print(f"\nSet {i} genetic covariance matrix::")
    vset =np.array(list(zip(variant_set*2,variant_set*2 +1))).ravel() ## convert to haploid positions
    print(np.cov(haplotypes[:,vset].data @add_effects_object.beta_raw_haploid[vset,:], rowvar=False))

Total genetic covariance matrix:
[[0.4814071  0.07346577]
 [0.07346577 0.40713364]]

Set 0 genetic covariance matrix::
[[0.1776528 0.       ]
 [0.        0.       ]]

Set 1 genetic covariance matrix::
[[0.         0.        ]
 [0.         0.13200676]]

Set 2 genetic covariance matrix::
[[0.15190867 0.00185606]
 [0.00185606 0.14192617]]

Set 3 genetic covariance matrix::
[[0.15447392 0.06978447]
 [0.06978447 0.12699539]]

Set 4 genetic covariance matrix::
[[0. 0.]
 [0. 0.]]
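For comparison, the target covariance matrices implied by the design can be computed directly. The sketch below is NumPy-only and ignores linkage disequilibrium and sampling variation in the drawn effects, which is why the realized values above differ slightly from these targets. The names `expected`, `expected_total`, and `expected_rg` are introduced here for illustration.

```python
import numpy as np

h2_y, h2_z, r_yz = 0.5, 0.4, 0.5
cross = r_yz * np.sqrt(h2_y * h2_z) / 3  # covariance contributed by the correlated set

# target 2x2 genetic covariance matrix for each variant set
expected = [
    np.array([[h2_y / 3, 0.0], [0.0, 0.0]]),           # set 0: Y only
    np.array([[0.0, 0.0], [0.0, h2_z / 3]]),           # set 1: Z only
    np.array([[h2_y / 3, 0.0], [0.0, h2_z / 3]]),      # set 2: both traits, orthogonal effects
    np.array([[h2_y / 3, cross], [cross, h2_z / 3]]),  # set 3: both traits, correlated effects
    np.zeros((2, 2)),                                  # set 4: non-causal
]

# totals assume the sets contribute independently
expected_total = sum(expected)
expected_rg = expected_total[0, 1] / np.sqrt(expected_total[0, 0] * expected_total[1, 1])

for i, e in enumerate(expected):
    print(f"Set {i} target covariance:\n{e.round(4)}")
print(f"Total target covariance:\n{expected_total.round(4)}")
print(f"Target genome-wide genetic correlation: {expected_rg:.4f}")
```

Because only one of the three causal sets per trait carries correlated effects, the target genome-wide genetic correlation is `r_yz / 3`, not `r_yz`.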

We can then construct the noise and sum transformations needed to complete the phenogenetic architecture:

[5]:

noise_comp = xft.arch.AdditiveNoiseComponent(variances=[1-h2_y,1-h2_z],
    component_index=xft.index.ComponentIndex.from_product(['y','z'],
                                                          ['noise']))
sum_trans = xft.arch.SumAllTransformation(xft.index.ComponentIndex.from_product(['y','z'],
                                                                                ['addGen','noise']))
arch = xft.arch.Architecture([add_comp,noise_comp,sum_trans])
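The choice of noise variances `1-h2_y` and `1-h2_z` is what gives each phenotype unit variance and the intended heritability in the founder generation. The following NumPy-only sketch illustrates the arithmetic with stand-in genetic values; it does not call xftsim, and the sample size, seed, and variable names are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for this illustration
n = 200_000                     # hypothetical sample size, large to limit sampling noise
h2 = np.array([0.5, 0.4])       # heritabilities of y and z

# stand-in genetic values with variance h2 (in the simulation these come from haplotypes and beta)
genetic = rng.standard_normal((n, 2)) * np.sqrt(h2)
# independent noise with variance 1 - h2
noise = rng.standard_normal((n, 2)) * np.sqrt(1 - h2)
# the sum plays the role of the phenotype component
phenotype = genetic + noise

pheno_var = phenotype.var(axis=0)
h2_realized = genetic.var(axis=0) / pheno_var
print("phenotypic variances:", pheno_var.round(3))
print("realized heritabilities:", h2_realized.round(3))
```

Under assortative mating the genetic variance grows across generations, so this unit-variance property should only be expected to hold for the founders.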

Finally, we run a simulation under linear assortative mating on \(Y\) and \(Z\), using an exchangeable cross-mate correlation structure with \(r_\text{mate} = 0.5\):

[6]:

rmap = xft.reproduce.RecombinationMap.constant_map_from_haplotypes(haplotypes, p=.1)
mate = xft.mate.LinearAssortativeMatingRegime(mates_per_female=2,
                                              offspring_per_pair=1,
                                              r=.5,
    component_index=xft.index.ComponentIndex.from_product(['y','z'],
                                                          ['phenotype']))
sim = xft.sim.Simulation(founder_haplotypes=haplotypes,
                         mating_regime=mate,
                         recombination_map=rmap,
                         architecture=arch,
                         statistics=[xft.stats.SampleStatistics(),
                                     xft.stats.MatingStatistics(),
                                     xft.stats.HasemanElstonEstimator(randomized=True)])
sim.run(5)
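To make the exchangeable structure concrete, the sketch below builds the cross-mate correlation matrix, reading "exchangeable" as every cross-mate correlation (within and across traits) being equal to `r_mate`; that reading is an assumption. It also checks that the resulting joint correlation matrix of both mates' phenotypes is positive definite. It is NumPy-only and independent of xftsim; the within-person correlation `rho_yz` is a hypothetical value taken from the founder-generation design target.

```python
import numpy as np

r_mate = 0.5
k = 2  # number of phenotypes used for mating (y and z)

# exchangeable structure: every cross-mate correlation equals r_mate
cross_mate = np.full((k, k), r_mate)

# hypothetical within-person correlation between y and z
# (founder design target, assuming unit phenotypic variances)
rho_yz = 0.5 * np.sqrt(0.5 * 0.4) / 3
within = np.array([[1.0, rho_yz], [rho_yz, 1.0]])

# joint correlation matrix of (mate 1 phenotypes, mate 2 phenotypes)
joint = np.block([[within, cross_mate], [cross_mate.T, within]])
min_eig = np.linalg.eigvalsh(joint).min()

print(cross_mate)
print(f"smallest eigenvalue of joint matrix: {min_eig:.4f}")
```

A positive smallest eigenvalue indicates that this combination of within-person and cross-mate correlations is attainable; a much larger `r_mate` with weakly correlated traits would not be.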

We can see that Haseman-Elston (HE) regression estimates of the genetic correlation become increasingly inflated with each generation of cross-trait assortative mating:

[7]:

xft.utils.print_tree(sim.results)
sim.results['HE_regression']['cov_HE']

sample_statistics:
|__means: <class 'pandas.core.series.Series'>
|__variances: <class 'pandas.core.series.Series'>
|__variance_components: <class 'pandas.core.series.Series'>
|__vcov: <class 'pandas.core.frame.DataFrame'>
|__corr: <class 'pandas.core.frame.DataFrame'>
mating_statistics:
|__n_reproducing_pairs: <class 'numpy.int64'>
|__n_total_offspring: <class 'numpy.int64'>
|__mean_n_offspring_per_pair: <class 'numpy.float64'>
|__mean_n_female_offspring_per_pair: <class 'numpy.float64'>
|__mate_correlations: <class 'pandas.core.frame.DataFrame'>
HE_regression:
|__cov_HE: <class 'pandas.core.frame.DataFrame'>
|__corr_HE: <class 'pandas.core.frame.DataFrame'>

[7]:

phenotype_name                                          y          z
component_name                                  phenotype  phenotype
vorigin_relative                                  proband    proband
phenotype_name component_name vorigin_relative
y              phenotype      proband            0.884511   0.491227
z              phenotype      proband            0.491227   0.713657
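The covariance estimates above can be converted to a correlation by dividing by the product of the square roots of the diagonal entries. The sketch below does this by hand with NumPy, using the numbers copied from the output above. It illustrates the arithmetic and is not necessarily how `corr_HE` is computed internally.

```python
import numpy as np

# HE covariance estimates copied from the output above
cov_he = np.array([[0.884511, 0.491227],
                   [0.491227, 0.713657]])

# standard covariance-to-correlation conversion
sd = np.sqrt(np.diag(cov_he))
corr_he = cov_he / np.outer(sd, sd)

print(corr_he.round(3))
```

The implied correlation of roughly 0.62 is far above the founder-generation design target of `r_yz / 3`, which is about 0.17.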

[8]:

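## seaborn is needed for plotting and may need to be installed separately (e.g. pip install seaborn)
import seaborn as sns
import pandas as pd

## collect per-generation correlation measures into a long-format table and plot them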
results = pd.DataFrame.from_records([{'generation':key,
  'rho_beta_HE':value['HE_regression']['corr_HE'].iloc[1,0],
  'rho_score_true':value['sample_statistics']['vcov'].iloc[1,0],
  'rho_beta_true':sim.architecture.components[0].true_rho_beta[1,0]} for key,value in sim.results_store.items()]
                         )

pdat = pd.melt(results, id_vars='generation', var_name='quantity',
               value_name='genetic correlation measure')
sns.lineplot(data=pdat,
           x='generation',
           y='genetic correlation measure',
           hue='quantity',)
sns.scatterplot(data=pdat,
           x='generation',
           y='genetic correlation measure',
           hue='quantity',legend=False)

[ ]:

xft.io.write_to_plink1(sim.haplotypes,'/tmp/test')