This repository contains in-development analysis files for a novel method that estimates lifetime reproductive success by quantifying sexual selection and male gametic selection from whole-genome sequencing data.
At present, these scripts are provided as they were used to generate a preliminary analysis of whole-genome sequencing data from individual wild-caught Drosophila melanogaster females, pools of their offspring, and pools of males (putative sires). This analysis is part of a forthcoming manuscript that describes the method and provides a proof of concept.
The sequence data will be available on the NCBI Sequence Read Archive (SRA) under BioProject accession PRJNA1294250.
In the future, we plan to generalize these scripts for broader use.
The scripts in this library were developed and tested with Python 3 and depend on functions from the standard-library sys and math modules, as well as several functions from SciPy and NumPy.
These scripts operate on variant calls from whole-genome sequencing data in the standard VCF format, which most standard variant-calling pipelines can produce. The data were additionally filtered to include only:
- Biallelic sites
- Sites with >= 600 calls
- Mapping quality >= 20
- Minor allele frequency >= 0.05
- Euchromatic regions of the genome
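The filters above can be expressed as a single per-site check. This is a minimal sketch, not the filtering code used for the analysis: the function names are hypothetical, the number of calls is assumed to be a count already computed for the site, and euchromatin membership is assumed to be supplied by the caller (for example, from a separate annotation of euchromatic coordinates).

```python
# Hypothetical helpers illustrating the site filters; thresholds match the list above.
MIN_CALLS = 600
MIN_MAPQ = 20
MIN_MAF = 0.05


def minor_allele_frequency(ref_count, alt_count):
    """Frequency of the rarer allele, from summed reference and alternate counts."""
    total = ref_count + alt_count
    if total == 0:
        return 0.0
    alt_freq = alt_count / total
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(alt, n_calls, mapq, ref_count, alt_count, in_euchromatin):
    """Return True if a site meets every filtering criterion.

    alt: the VCF ALT column; a comma means more than one alternate allele.
    in_euchromatin: assumed to be looked up elsewhere and passed in as a bool.
    """
    is_biallelic = alt not in (".", "") and "," not in alt
    return (
        is_biallelic
        and n_calls >= MIN_CALLS
        and mapq >= MIN_MAPQ
        and minor_allele_frequency(ref_count, alt_count) >= MIN_MAF
        and in_euchromatin
    )
```

For example, `passes_filters("T", 650, 30, 500, 150, True)` returns True, while the same site with ALT `T,G` fails the biallelic check.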
Some scripts also require text files describing sample metadata, which are provided for the PRJNA1294250 data. These files (Sample.key.txt, male.pools.txt, and Female.bulks.txt) are expected to be in the working directory.
Run the scripts in the order described below, because many of them depend on intermediate outputs from earlier scripts.
Calculates the allele frequency divergence between independent pools within the male pool collection.
python Null.var.male.pools.py ${chromosome}
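If the male pools are independent samples of the same population, allele-frequency differences among them reflect sampling noise alone, which gives a null expectation for later tests. One simple summary of that noise is the mean squared allele-frequency difference across all pairs of pools. The sketch below assumes that statistic, which may differ from what Null.var.male.pools.py actually computes; all function names are hypothetical.

```python
from itertools import combinations


def pool_frequencies(alt_counts, depths):
    """Alternate-allele frequency per SNP for one pool (NaN where depth is zero)."""
    return [a / d if d else float("nan") for a, d in zip(alt_counts, depths)]


def null_divergence(pools):
    """Mean squared allele-frequency difference over all pairs of pools.

    pools: one list of per-SNP frequencies per male pool, in the same SNP order.
    SNPs with a NaN frequency in either pool of a pair are skipped.
    """
    squared = []
    for freqs_a, freqs_b in combinations(pools, 2):
        for pa, pb in zip(freqs_a, freqs_b):
            if pa == pa and pb == pb:  # NaN != NaN, so this drops missing values
                squared.append((pa - pb) ** 2)
    return sum(squared) / len(squared) if squared else float("nan")
```

Identical pools give a divergence of 0.0; larger values indicate more sampling noise between pools.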
Calculates allele frequency divergence between two artificial female pools, formed by randomly assigning individual females to one pool or the other.
python Null.var.female.pools.py ${chromosome}
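The random-split idea can be sketched as follows: shuffle the females once, divide them into two halves, and compare the pooled read-based allele frequency of each half at every SNP. This is an illustration under stated assumptions, not the code in Null.var.female.pools.py; the input layout (one list of `(alt_reads, total_reads)` tuples per SNP, in a fixed female order) and the function names are hypothetical.

```python
import random


def split_females(n_females, rng):
    """Randomly assign female indices to two artificial pools of (nearly) equal size."""
    order = list(range(n_females))
    rng.shuffle(order)
    half = n_females // 2
    return order[:half], order[half:]


def pooled_frequency(site_counts, members):
    """Alt-allele frequency from reads summed over one pool's members.

    site_counts: list of (alt_reads, total_reads), one tuple per female.
    """
    alt = sum(site_counts[i][0] for i in members)
    total = sum(site_counts[i][1] for i in members)
    return alt / total if total else float("nan")


def artificial_pool_differences(snps, rng):
    """Allele-frequency difference between two random female pools at each SNP.

    The same random split is reused for every SNP, as it would be for
    physically pooled DNA.
    """
    pool_a, pool_b = split_females(len(snps[0]), rng)
    return [pooled_frequency(s, pool_a) - pooled_frequency(s, pool_b) for s in snps]
```

Passing a seeded generator, such as `artificial_pool_differences(snps, random.Random(1))`, makes a given split reproducible; repeating with different seeds builds up a null distribution.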
Tests for male-female divergence in allele frequency in two ways: first treating the females as a pool, then treating them as individuals.
python GSC.adults.v3.py ${chromosome}
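A standard two-proportion z-test on read counts is one way to compare male and female allele frequencies at a single SNP. It is shown only as an illustration and is not necessarily the test implemented in GSC.adults.v3.py: it treats reads as independent draws, so it ignores the extra variance from pooling and from sampling individuals that the null-variance scripts estimate. The function name is hypothetical.

```python
import math


def two_proportion_z_test(alt_a, depth_a, alt_b, depth_b):
    """Two-sided z-test for a difference between two allele frequencies.

    Inputs are alternate-allele read counts and total read depths for two
    groups (e.g. males and females). Returns (z, p_value).
    """
    p_a = alt_a / depth_a
    p_b = alt_b / depth_b
    p_pooled = (alt_a + alt_b) / (depth_a + depth_b)
    se = math.sqrt(p_pooled * (1.0 - p_pooled) * (1.0 / depth_a + 1.0 / depth_b))
    if se == 0.0:
        # Monomorphic across both groups, so the frequencies cannot differ
        return 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p_value
```

Equal frequencies, such as 50/100 in both groups, give z = 0 and p = 1; a large difference, such as 80/100 versus 20/100, gives a very small p-value.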
Generates simulations used to set thresholds for suppressing SNPs whose allele frequencies deviate between the pooled females and the individual females.
python GSC.adults.sim2.py ${chromosome} ${sim_rep}
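One way to obtain such thresholds is to simulate both estimates under a shared true allele frequency and take an upper quantile of the simulated deviations. The sketch below assumes Hardy-Weinberg genotypes, equal DNA contributions from each female to the pool, and binomial read sampling. The function names and parameters, and the idea of using `${sim_rep}` as a random seed, are hypothetical and are not taken from GSC.adults.sim2.py.

```python
import random


def simulate_deviation(p, n_females, depth_per_female, pool_depth, rng):
    """One simulated |pool estimate - individual estimate| at true frequency p."""
    # Assumption: diploid genotypes under Hardy-Weinberg (0, 1 or 2 alt alleles)
    genotypes = [(rng.random() < p) + (rng.random() < p) for _ in range(n_females)]
    # Individual-based estimate: mean of each female's alt-read fraction
    indiv_fracs = []
    for g in genotypes:
        alt_reads = sum(rng.random() < g / 2 for _ in range(depth_per_female))
        indiv_fracs.append(alt_reads / depth_per_female)
    indiv_est = sum(indiv_fracs) / n_females
    # Pool-based estimate: reads drawn from pooled DNA, assuming equal contributions
    pool_freq = sum(genotypes) / (2 * n_females)
    pool_alt = sum(rng.random() < pool_freq for _ in range(pool_depth))
    pool_est = pool_alt / pool_depth
    return abs(pool_est - indiv_est)


def deviation_threshold(p, n_females, depth_per_female, pool_depth,
                        n_sims=1000, quantile=0.99, seed=0):
    """Empirical upper quantile of simulated deviations (hypothetical helper)."""
    rng = random.Random(seed)  # a replicate number such as ${sim_rep} could serve as the seed
    devs = sorted(
        simulate_deviation(p, n_females, depth_per_female, pool_depth, rng)
        for _ in range(n_sims)
    )
    k = min(int(quantile * n_sims), n_sims - 1)
    return devs[k]
```

Under this scheme, a SNP whose observed pool-versus-individual deviation exceeds the threshold simulated at a matching frequency and depth would be suppressed.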
For each SNP, tests for a difference in allele frequency between all males (pMale) and the males that were successful sires (pMaleSuccessful). The output files also include the male/female test results, aligned to the pMale/pMaleSuccessful test.
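At autosomal loci, each offspring inherits one allele from its mother and one from its father, so in expectation the offspring allele frequency is the average of the maternal and sire frequencies, p_offspring = (p_mothers + p_sires) / 2. Rearranging gives an estimate of the successful-sire frequency from the females and their offspring pools. The sketch assumes pMaleSuccessful can be obtained this way, with equal contributions from each family; the function names are hypothetical. X-linked loci would need different weighting because sons inherit their X only from their mother.

```python
def sire_frequency(p_offspring, p_mothers):
    """Estimate the allele frequency among successful sires at an autosomal SNP.

    Assumes E[p_offspring] = (p_mothers + p_sires) / 2 (Mendelian inheritance,
    equal family contributions). Hypothetical helper, not from the repository.
    """
    p_sires = 2.0 * p_offspring - p_mothers
    return min(1.0, max(0.0, p_sires))  # clip sampling noise back into [0, 1]


def successful_minus_all_males(p_male, p_offspring, p_mothers):
    """pMaleSuccessful - pMale for one SNP."""
    return sire_frequency(p_offspring, p_mothers) - p_male
```

For example, with p_mothers = 0.4 and p_offspring = 0.5, the inferred sire frequency is 0.6, so a male population at pMale = 0.5 would show a difference of about 0.1.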
Calculates total sequencing depth for (a) the male pools and (b) the females, and outputs the depth frequency distribution for each group, separately for the autosomes and the X chromosome.
python depth.distributions.py ${VCF}
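Per-sample depth can be read from the AD (allelic depth) field of each VCF record. The sketch below sums AD across the samples in each group and tallies how many sites have each total depth, separately for the X and the autosomes. It assumes the X chromosome is named `X` in the CHROM column and that group membership is given as sets of sample names (for example, built from the metadata files); the function name and inputs are hypothetical, and depth.distributions.py may use DP or a different output layout.

```python
from collections import Counter


def depth_distributions(vcf_lines, male_samples, female_samples):
    """Count sites by total AD depth for males and females, split by X vs autosomes.

    Returns a dict keyed by (group, chromosome_class) -> Counter{depth: n_sites}.
    """
    dists = {(g, c): Counter() for g in ("male", "female") for c in ("autosome", "X")}
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        format_keys = fields[8].split(":")
        if "AD" not in format_keys:
            continue
        ad_index = format_keys.index("AD")
        chrom_class = "X" if fields[0] == "X" else "autosome"  # assumed naming
        totals = {"male": 0, "female": 0}
        for name, value in zip(samples, fields[9:]):
            if name in male_samples:
                group = "male"
            elif name in female_samples:
                group = "female"
            else:
                continue
            parts = value.split(":")
            if ad_index < len(parts):
                # Sum reference and alternate read counts; '.' marks missing data
                totals[group] += sum(int(x) for x in parts[ad_index].split(",") if x != ".")
        for group, depth in totals.items():
            dists[(group, chrom_class)][depth] += 1
    return dists
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, e.g. `with open(vcf_path) as fh: depth_distributions(fh, males, females)`.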
Calculates allele frequencies from the VCF AD (allelic depth) field, first for the combined male pool and then for the females. The females are scored first as a pool and then by individual read depth.
python p.by.depth.py ${chromosome}
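For a biallelic site, the AD field lists the reference and alternate read counts (for example `9,1`). The sketch below contrasts two estimators: pooling reads across samples, which weights each sample by its depth, and averaging each sample's own allele fraction, which weights samples equally. Whether p.by.depth.py uses exactly these estimators is an assumption, and the function names are hypothetical.

```python
def parse_ad(ad):
    """Split a biallelic AD string such as '9,1' into (ref_reads, alt_reads).

    Assumes exactly two non-missing counts.
    """
    ref, alt = (int(x) for x in ad.split(","))
    return ref, alt


def pooled_alt_frequency(ad_values):
    """Pool estimate: total alt reads divided by total reads across samples."""
    counts = [parse_ad(ad) for ad in ad_values]
    total = sum(r + a for r, a in counts)
    return sum(a for _, a in counts) / total if total else float("nan")


def individual_alt_frequency(ad_values):
    """Individual estimate: mean of each sample's own alt-read fraction."""
    fracs = []
    for ad in ad_values:
        ref, alt = parse_ad(ad)
        if ref + alt > 0:  # samples with no reads carry no information
            fracs.append(alt / (ref + alt))
    return sum(fracs) / len(fracs) if fracs else float("nan")
```

With AD values `["9,1", "2,2"]`, the pooled estimate is 3/14 (about 0.214), while the individual mean is 0.3; the deeper first sample pulls the pooled estimate toward its own fraction of 0.1.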