Quickstart

In this brief tutorial, we’ll run a quick simulation that illustrates the impact of cross-trait assortative mating on the the Haseman-Elston regression genetic correlation estimator. In the process, we’ll learning about the essential components of an xftsim simulation.

Running a simulation

Our simulation is constructed by supplying each of the above to the Simulation constructor:

[7]:

sim = Simulation(founder_haplotypes=founder_haplotypes,
                 mating_regime=mating_regime,
                 recombination_map=recombination_map,
                 architecture=architecture,
                 statistics=statistics,
                 post_processors=post_processors,)

We will run the simulation for 5 generations, which takes just a few seconds on a laptop:

[8]:

%time sim.run(5)

CPU times: user 18.3 s, sys: 5.36 s, total: 23.7 s
Wall time: 8.19 s

The most recent generations are accessible in as a dict object in sim.results, and all generations’ results are stored in sim.results_store. First let’s check that mating proceded as expected:

[9]:

## generation 0
sim.results_store[0]['mating_statistics']['mate_correlations']

[9]:

component	pheno_1.phenotype.mother	pheno_2.phenotype.mother	pheno_1.phenotype.father	pheno_2.phenotype.father
component
pheno_1.phenotype.mother	1.000000	-0.047681	0.463318	0.448655
pheno_2.phenotype.mother	-0.047681	1.000000	0.449102	0.462603
pheno_1.phenotype.father	0.463318	0.449102	1.000000	-0.043387
pheno_2.phenotype.father	0.448655	0.462603	-0.043387	1.000000

[10]:

## generation 4
sim.results_store[4]['mating_statistics']['mate_correlations']
## same as
sim.results['mating_statistics']['mate_correlations']

[10]:

component	pheno_1.phenotype.mother	pheno_2.phenotype.mother	pheno_1.phenotype.father	pheno_2.phenotype.father
component
pheno_1.phenotype.mother	1.000000	0.094649	0.479426	0.511044
pheno_2.phenotype.mother	0.094649	1.000000	0.508266	0.498006
pheno_1.phenotype.father	0.479426	0.508266	1.000000	0.132462
pheno_2.phenotype.father	0.511044	0.498006	0.132462	1.000000

Next, let’s compare three quanitites across generations: - rho_beta_true, the correlation between true genetic effects for pheno_1 and pheno_2, which doesn’t vary over time - rho_score_true, the correlation between true polygenic scores for pheno_1 and pheno_2, which reflects changes in genetic architecture induced by xAM - rho_beta_HE, the HE-regression estimate of rho_beta_true, which is biased upwards under xAM

[11]:

results = pd.DataFrame.from_records([{'generation':key,
  'rho_beta_HE':value['HE_regression']['corr_HE'].iloc[1,0],
  'rho_score_true':value['sample_statistics']['vcov'].iloc[1,0],
  'rho_beta_true':sim.architecture.components[0].true_rho_beta[1,0]} for key,value in sim.results_store.items()]
                         )

pdat = pd.melt(results, id_vars='generation', var_name='quantity',
               value_name='genetic correlation measure')
sns.lineplot(data=pdat,
           x='generation',
           y='genetic correlation measure',
           hue='quantity',)
sns.scatterplot(data=pdat,
           x='generation',
           y='genetic correlation measure',
           hue='quantity',legend=False)

[11]:

<Axes: xlabel='generation', ylabel='genetic correlation measure'>

../_images/gettingstarted_quickstart_20_1.png

Writing output to disk

We save the generation 4 genotypes to disk. Here, we write to zarr, an efficient parallel format ideal for cases where we’d like to reload the data into python, but also to the plink binary format:

[12]:

xft.io.write_to_plink1(sim.haplotypes, '/tmp/plink_output')

/home/rsb/Dropbox/ftsim/xftsim/xftsim/io.py:194: UserWarning: Writing to the plink bfile format removes phase information
  warnings.warn("Writing to the plink bfile format removes phase information")
Writing BED: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.34it/s]

Writing FAM... done.
Writing BIM... done.


/home/rsb/miniconda3/lib/python3.9/site-packages/pandas_plink/_write.py:261: FutureWarning: the 'line_terminator'' keyword is deprecated, use 'lineterminator' instead.
  df.to_csv(
/home/rsb/miniconda3/lib/python3.9/site-packages/pandas_plink/_write.py:288: FutureWarning: the 'line_terminator'' keyword is deprecated, use 'lineterminator' instead.
  df.to_csv(

[13]:

xft.io.save_haplotype_zarr(sim.haplotypes, '/tmp/zarr_output', mode='w')

We can convert the phenotypes to Pandas dataframes, which we can then save using any method we like:

[14]:

pheno_df = sim.phenotypes.xft.as_pd()
pheno_df

[14]:

		phenotype_name	pheno_1	pheno_2	pheno_1	pheno_2	pheno_1	pheno_2
		component_name	additiveGenetic	additiveGenetic	additiveNoise	additiveNoise	phenotype	phenotype
		vorigin_relative	proband	proband	proband	proband	proband	proband
iid	fid	sex
4_0	4_0	0	-1.966416	-1.934732	-0.002914	-0.362107	-1.969329	-2.296839
4_1	4_0	0	-2.926725	-1.776690	0.189355	0.207299	-2.737370	-1.569391
4_2	4_1	0	-1.092239	-1.597979	1.325921	0.199160	0.233683	-1.398819
4_3	4_1	0	-0.711714	-1.140154	-0.652625	1.048390	-1.364339	-0.091765
4_4	4_2	0	-0.574885	-0.999057	-0.593752	-0.475246	-1.168637	-1.474302
...	...	...	...	...	...	...	...	...
4_7995	4_3997	1	1.663567	1.608789	1.210872	0.930285	2.874439	2.539074
4_7996	4_3998	0	1.173014	0.511302	1.246304	-0.215005	2.419319	0.296297
4_7997	4_3998	1	1.179811	1.610954	-0.950225	0.467568	0.229586	2.078522
4_7998	4_3999	0	0.418514	1.636576	-0.764532	-0.758268	-0.346019	0.878308
4_7999	4_3999	0	2.185312	1.953812	0.093433	0.410642	2.278744	2.364453

8000 rows × 6 columns

[15]:

pheno_df.to_csv('/tmp/pheno_out.csv')

Quickstart

Setting up a simulation

Running a simulation

Writing output to disk