KU-GDSC/selection-components

Quantifying Selection Components from WGS Data

This repository contains the in-development analysis files for a novel method for estimating lifetime reproductive success by quantifying sexual selection and male gametic selection using whole-genome sequencing data.

Background

At present these scripts are provided as they were used to generate a preliminary analysis of whole-genome sequencing data from wild-caught individual Drosophila melanogaster females, pools of their offspring, and pools of males, which are putative sires. This analysis is part of a forthcoming manuscript describing this method and providing a proof-of-concept.

The sequence data will be available on the NCBI Sequence Read Archive under BioProject accession PRJNA1294250.

In the future, we plan to generalize these scripts for broader use.

Requirements

The scripts in this repository were developed and tested with Python 3. They depend on the standard sys and math modules, as well as several functions from scipy and numpy.

These scripts operate on variant calls from whole-genome sequencing data in the standard VCF format, which can be obtained using many standard variant calling pipelines. The data were additionally filtered to include only:

  1. Biallelic sites
  2. Sites with >= 600 calls
  3. Mapping quality >= 20
  4. Minor allele frequency >= 0.05
  5. Euchromatic regions of the genome
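
As a rough illustration of how the five filters above combine, here is a minimal sketch. It is not part of the repository's pipeline: the function name and its arguments are hypothetical, and it assumes the per-site values (number of calls, mapping quality, allele frequency, euchromatin membership) have already been extracted from the VCF by whatever tool produced it.

```python
def passes_site_filters(alt_alleles, n_calls, mapping_quality,
                        allele_freq, in_euchromatin):
    """Return True if a site meets all five filters listed above.

    All argument names are illustrative; how each value is obtained
    from a VCF record depends on the variant-calling pipeline.
    """
    # Biallelic: one REF allele plus exactly one ALT allele.
    is_biallelic = len(alt_alleles) == 1
    # The minor allele is whichever of the two alleles is rarer.
    minor_allele_freq = min(allele_freq, 1.0 - allele_freq)
    return bool(
        is_biallelic
        and n_calls >= 600
        and mapping_quality >= 20
        and minor_allele_freq >= 0.05
        and in_euchromatin
    )
```

For example, `passes_site_filters(["T"], 750, 42.0, 0.93, True)` passes, while the same site with two ALT alleles or with an allele frequency of 0.97 does not.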

Some scripts additionally require text files that describe sample metadata, which have been provided for the PRJNA1294250 data. These files (Sample.key.txt, male.pools.txt, and Female.bulks.txt) are expected to be in the working directory.

These scripts should be run sequentially in the order described below, as many of the scripts depend on the intermediate outputs of the earlier scripts.

Usage

Analysis Scripts

Calculates the allele frequency divergence between independent pools within the male pool collection.

python Null.var.male.pools.py ${chromosome}

Calculates the allele frequency divergence between two artificial female pools, created by randomly sorting individual females into pools.

python Null.var.female.pools.py ${chromosome}
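
The idea of an artificial two-pool split can be sketched as follows. This is a hypothetical illustration, not the code in Null.var.female.pools.py: the function name is invented, the per-individual read counts are assumed to be available as (ref, alt) tuples, and the absolute frequency difference is used as the divergence measure only as an assumption.

```python
import random


def random_pool_divergence(ad_counts, seed=None):
    """Split individuals at random into two pools and return |p1 - p2|.

    ad_counts: list of (ref_reads, alt_reads) tuples, one per female.
    Returns None if either pool has no reads at the site.
    """
    rng = random.Random(seed)      # seeded for reproducible splits
    shuffled = list(ad_counts)     # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    pools = (shuffled[:half], shuffled[half:])

    freqs = []
    for pool in pools:
        ref_total = sum(ref for ref, _ in pool)
        alt_total = sum(alt for _, alt in pool)
        depth = ref_total + alt_total
        if depth == 0:
            return None            # frequency undefined without reads
        freqs.append(alt_total / depth)
    return abs(freqs[0] - freqs[1])
```

Repeating the split with different seeds gives a distribution of divergence values expected from random pooling alone.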

Tests for male-female divergence in allele frequency in two ways: first treating the females as a pool, then treating them as individuals.

python GSC.adults.v3.py ${chromosome}
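
This README does not state which statistical test GSC.adults.v3.py applies. Purely to illustrate what a test for frequency divergence between two groups of reads looks like, the sketch below uses a standard two-proportion z-test as a stand-in. The function name and arguments are hypothetical, and the test treats reads as independent draws, which pooled sequencing does not guarantee.

```python
import math


def two_proportion_z_test(alt_a, depth_a, alt_b, depth_b):
    """Return (z, two-sided p-value) for H0: equal allele frequency.

    alt_*: alternate-allele read counts; depth_*: total read counts.
    A stand-in test for illustration, not the repository's method.
    """
    p_a = alt_a / depth_a
    p_b = alt_b / depth_b
    # Frequency under the null hypothesis, pooling both groups.
    p_pooled = (alt_a + alt_b) / (depth_a + depth_b)
    variance = p_pooled * (1 - p_pooled) * (1 / depth_a + 1 / depth_b)
    if variance == 0:
        return 0.0, 1.0  # allele fixed in both groups: no divergence
    z = (p_a - p_b) / math.sqrt(variance)
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With equal frequencies in both groups, for example `two_proportion_z_test(50, 100, 50, 100)`, the statistic is zero and the p-value is 1.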

Generates simulations that are used to determine thresholds for suppressing SNPs based on deviations between the pools of females and the individual females.

python GSC.adults.sim2.py ${chromosome} ${sim_rep}

Applies the test for the difference between the allele frequencies of all males (pMale) vs. males that were successful sires (pMaleSuccessful) for each SNP. The output files additionally have the male/female test aligned to the pMale/pMaleSuccessful test.

Summary Scripts

Calculates the total sequence depth for (a) male pools and (b) females. Outputs the frequency distribution of each, separately for autosomes and the X chromosome.

python depth.distributions.py ${VCF} 

Calculates allele frequencies from the AD field, first for the male pool (combined) and then for females. The females are first scored as a pool and then by individual read depth.

python p.by.depth.py ${chromosome}
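
To make the difference between the two scoring approaches concrete, here is a minimal sketch of computing an alternate-allele frequency from VCF AD values at a biallelic site. It is an assumption about what "as a pool" and "by individual" mean, not a copy of p.by.depth.py: the function names are hypothetical, pooled scoring is taken to mean summing reads across samples, and individual scoring is taken to mean averaging per-sample read fractions.

```python
def parse_ad(ad_field):
    """Parse a biallelic AD value such as '12,3' into (ref, alt) reads.

    A missing value ('.' or empty) is treated as zero depth.
    """
    if ad_field in (".", ""):
        return 0, 0
    ref_reads, alt_reads = (int(x) for x in ad_field.split(","))
    return ref_reads, alt_reads


def pooled_frequency(ad_fields):
    """Alt-allele frequency with reads summed across all samples."""
    ref_total = alt_total = 0
    for ad in ad_fields:
        ref_reads, alt_reads = parse_ad(ad)
        ref_total += ref_reads
        alt_total += alt_reads
    depth = ref_total + alt_total
    return alt_total / depth if depth else None


def per_individual_frequency(ad_fields):
    """Mean of per-sample alt fractions; zero-depth samples are skipped."""
    fractions = []
    for ad in ad_fields:
        ref_reads, alt_reads = parse_ad(ad)
        depth = ref_reads + alt_reads
        if depth:
            fractions.append(alt_reads / depth)
    return sum(fractions) / len(fractions) if fractions else None
```

For the AD values `["10,10", "30,0", "."]`, the pooled estimate is 10/50 = 0.2, whereas the per-individual estimate is (0.5 + 0.0)/2 = 0.25, because the deeply sequenced second sample no longer dominates.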
