This walkthrough describes the initial processing of 2bRAD reads generated from six coral species collected across urbanized and reef habitats in southeast Florida. It covers (1) setting up the workspace; (2) downloading, unzipping, and concatenating reads; (3) demultiplexing row pools and removing Illumina adapters; and (4) quality filtering. For the species-specific pipelines covering genome alignment onward, see below:
Library prep, bioinformatics, and analysis protocols are credited to the 2bRAD pipeline originally developed by Misha Matz: https://doi.org/10.1038/nmeth.2023, and further refined by Ryan Eckert: https://ryaneckert.github.io/Stephanocoenia_FKNMS_PopGen/code/
Download necessary scripts and load modules for processing/analysis
cd ~/bin
git clone https://github.com/mstudiva/Urban-coral-population-genetics
mv Urban-coral-population-genetics/scripts/* .
git clone https://github.com/xiaoming-liu/stairway-plot-v2.git
mv stairway-plot-v2/stairway_plot_v2.1.2.zip .
unzip stairway_plot_v2.1.2.zip
rm stairway_plot_v2.1.2.zip
wget https://github.com/ANGSD/angsd/releases/download/0.940/angsd0.940.tar.gz;
tar xf angsd0.940.tar.gz;
cd htslib; make;
cd ..; cd angsd;
make HTSSRC=../htslib;
cd ~/bin
git clone https://github.com/brannala/BA3.git
cd BA3
make BA3SNP
Make all bin scripts executable
cd ~/bin
chmod +x *.sh *.pl *.py
These packages do not play well with KoKo's modules, or they require dependencies that conflict with base installations, so install each in its own conda environment
module load anaconda3-2021.05-gcc-9.4.0-llhdqho
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda create -n 2bRAD bowtie2 samtools bcftools
conda create -n cutadapt cutadapt
conda create -n pcangsd bioconda::pcangsd
conda create -n moments bioconda::moments
conda create -n pgdspider bioconda::pgdspider
conda create -n bayescan bioconda::bayescan
wget "https://launch.basespace.illumina.com/CLI/latest/amd64-linux/bs" -O $HOME/bin/bs
chmod +x ~/bin/bs
Run the following, then open the printed URL and log in to your BaseSpace account to confirm authorization
bs auth
mkdir rawReads
cd rawReads
Transfer reads directly from Illumina BaseSpace
echo '#!/bin/bash' > downloadReads.sh
echo 'bs download project --concurrency=high -q -n ######## -o .' >> downloadReads.sh
# -n is the project name and -o is the output directory
echo "find . -name '*.gz' -exec mv {} . \;" >> downloadReads.sh
echo 'rmdir SA*' >>downloadReads.sh
echo 'mkdir ../concatReads' >> downloadReads.sh
echo 'cp *.gz ../concatReads' >> downloadReads.sh
echo 'cd ../concatReads' >> downloadReads.sh
echo 'mergeReads.sh -o mergeTemp' >> downloadReads.sh
# -o is the directory to put output files in
echo 'rm *L00*' >> downloadReads.sh
echo "find . -name '*.gz' -exec mv {} . \;" >> downloadReads.sh
echo 'gunzip *.gz' >> downloadReads.sh
echo 'rmdir mergeTemp' >> downloadReads.sh
chmod +x downloadReads.sh
launcher_creator.py -b 'srun downloadReads.sh' -n downloadReads -q shortq7 -t 06:00:00 -e studivanms@gmail.com
sbatch --mem=200GB downloadReads.slurm
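Once the download job finishes, it can be worth a quick integrity check on the transferred files before counting reads. This is an optional sanity check, not part of the original pipeline:

```shell
# Optional sanity check: verify each downloaded gzip file is intact;
# prints nothing if all files pass
check_gz() {
  for f in "$@"; do
    [ -e "$f" ] || continue
    gzip -t "$f" 2>/dev/null || echo "corrupt: $f"
  done
}
check_gz *.gz
```

Any file flagged as corrupt should be re-downloaded before proceeding.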
Counting raw reads
echo '#!/bin/bash' >rawReads.sh
echo 'readCounts.sh -e gz -o Raw' >> rawReads.sh
sbatch -o rawReads.o%j -e rawReads.e%j --mail-type=ALL --mail-user=studivanms@gmail.com rawReads.sh
scp RawReadCounts to local machine
Demultiplexes row pools into separate samples by 3Ill barcode (1-12) and removes PCR duplicates, using the reverse complement of each barcode as the sample ID
2bRAD_trim_launch_dedup.pl fastq > trims.sh
launcher_creator.py -j trims.sh -n trims -q shortq7 -t 06:00:00 -e studivanms@gmail.com
sbatch --mem=200GB trims.slurm
Do we have the correct number of files?
ls -l *.tr0 | wc -l
mkdir trimmedReads
srun mv *.tr0 trimmedReads/ &
# Rezips the raw fastq's for storage
zipper.py -f fastq -a -9 --launcher -e studivanms@gmail.com
sbatch --mem=200GB zip.slurm
cd ../trimmedReads
Rename files based on two column lookup table (sampleID.csv): filename, then sample ID
srun sampleRename.py -i sampleID -f tr0
If any files are not renamed, their barcodes were mis-sequenced. Move them to a different directory (the hyphen appears only in non-renamed files)
mkdir unused
mv *-* unused/
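sampleRename.py is a custom script; for readers without it, the renaming step above amounts to a lookup-and-rename loop over the two-column table. A rough shell sketch (the exact CSV layout, no header and first column without the .tr0 extension, is an assumption based on the description above):

```shell
# Sketch only (hypothetical equivalent of sampleRename.py):
# rename <filename>.tr0 to <sampleID>.tr0 for each row of the lookup table
rename_from_table() {
  [ -f "$1" ] || return 0
  while IFS=, read -r fname sid; do
    if [ -f "${fname}.tr0" ]; then mv "${fname}.tr0" "${sid}.tr0"; fi
  done < "$1"
}
rename_from_table sampleID.csv
```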
For loop appending a cutadapt command for each .tr0 file to the job script
echo '#!/bin/bash' > trimse.sh
echo 'module load miniconda3-4.6.14-gcc-8.3.0-eenl5dj' >> trimse.sh
echo 'conda activate cutadapt' >> trimse.sh
for file in *.tr0; do
echo "cutadapt -q 15,15 -m 36 -o ${file/.tr0/}.trim $file > ${file/.tr0/}.trimlog.txt" >> trimse.sh;
done
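The `${file/.tr0/}` expansion in the loop above is bash's pattern substitution: it deletes the first occurrence of `.tr0`, leaving just the sample name for the output filenames. A quick check (Acer001 is a made-up sample name):

```shell
# bash pattern substitution: ${var/pattern/replacement} with an empty
# replacement deletes the first match, stripping the .tr0 extension
file="Acer001.tr0"
echo "output: ${file/.tr0/}.trim, log: ${file/.tr0/}.trimlog.txt"
```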
Since these cutadapt commands run serially within a single job, split the command list into seven scripts (each keeping the three header lines) and submit them as separate jobs
head -n 3 trimse.sh > trimseHeader
tail -n +4 trimse.sh | split -d -n l/7 - trimseCmds
i=1
for f in trimseCmds*; do cat trimseHeader "$f" > "trimse$i.sh"; i=$((i+1)); done
rm trimseHeader trimseCmds* trimse.sh
for i in {1..7}; do
sbatch -o trimse$i.o%j -e trimse$i.e%j --mem=200GB --mail-type=ALL --mail-user=studivanms@gmail.com "trimse$i.sh"
done
Do we have the correct number of files?
ls -l *.trim | wc -l
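Beyond checking the file count, each sample's cutadapt log reports how many reads survived trimming; those summary lines can be collected in one pass (the grep pattern assumes cutadapt's standard "Reads written (passing filters)" report wording):

```shell
# Spot-check: pull each sample's read-survival summary line from its cutadapt log
trim_summary() { grep -H "passing filters" "$@" 2>/dev/null || true; }
trim_summary *.trimlog.txt
```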
Counting the trimmed reads
echo '#!/bin/bash' > cleanReads.sh
echo 'readCounts.sh -e trim -o Filt' >> cleanReads.sh
sbatch --mem=200GB --mail-type=ALL --mail-user=studivanms@gmail.com cleanReads.sh
scp FiltReadCounts to local machine
mkdir ../filteredReads
mv *.trim ../filteredReads
# Rezips the row pools for storage
zipper.py -f tr0 -a -9 --launcher -e studivanms@gmail.com
sbatch zip.slurm
Now that your reads are demultiplexed and trimmed, follow the species-specific pipelines below: