This walkthrough describes the initial processing of 2bRAD reads generated from six coral species collected across urbanized and reef habitats in southeast Florida. It covers (1) setting up the workspace; (2) downloading, unzipping, and concatenating reads; (3) demultiplexing row pools and removing Illumina adapters; and (4) quality filtering. For the species-specific pipelines covering genome alignment onward, see below:
Library prep, bioinformatics, and analysis protocols are credited to the 2bRAD pipeline originally developed by Misha Matz: https://doi.org/10.1038/nmeth.2023, and further refined by Ryan Eckert: https://ryaneckert.github.io/Stephanocoenia_FKNMS_PopGen/code/
Download necessary scripts and load modules for processing/analysis
cd ~/bin
git clone https://github.com/mstudiva/Urban-coral-population-genetics
mv Urban-coral-population-genetics/scripts/* .
git clone https://github.com/xiaoming-liu/stairway-plot-v2.git
mv stairway-plot-v2/stairway_plot_v2.1.2.zip .
unzip stairway_plot_v2.1.2.zip
rm stairway_plot_v2.1.2.zip
wget https://github.com/ANGSD/angsd/releases/download/0.940/angsd0.940.tar.gz;
tar xf angsd0.940.tar.gz;
cd htslib; make;
cd ..; cd angsd;
make HTSSRC=../htslib;
cd ~/bin
git clone https://github.com/brannala/BA3.git
cd BA3
make BA3SNP
Make all bin scripts executable
cd ~/bin
chmod +x *.sh *.pl *.py
These packages do not play well with KoKo's modules, or they require dependencies that conflict with base installations, so install each in its own conda environment
module load anaconda3-2021.05-gcc-9.4.0-llhdqho
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda create -n 2bRAD bowtie2 samtools bcftools
conda create -n cutadapt cutadapt
conda create -n pcangsd bioconda::pcangsd
conda create -n moments bioconda::moments
conda create -n pgdspider bioconda::pgdspider
conda create -n bayescan bioconda::bayescan
wget "https://launch.basespace.illumina.com/CLI/latest/amd64-linux/bs" -O $HOME/bin/bs
chmod +x ~/bin/bs
Run the following, then open the printed URL and log in to your BaseSpace account to confirm authorization
bs auth
mkdir rawReads
cd rawReads
Transfer reads directly from Illumina BaseSpace
echo '#!/bin/bash' > downloadReads.sh
echo 'bs download project --concurrency=high -q -n ######## -o .' >> downloadReads.sh
# -n is the project name and -o is the output directory
echo "find . -name '*.gz' -exec mv {} . \;" >> downloadReads.sh
echo 'rmdir SA*' >>downloadReads.sh
echo 'mkdir ../concatReads' >> downloadReads.sh
echo 'cp *.gz ../concatReads' >> downloadReads.sh
echo 'cd ../concatReads' >> downloadReads.sh
echo 'mergeReads.sh -o mergeTemp' >> downloadReads.sh
# -o is the directory to put output files in
echo 'rm *L00*' >> downloadReads.sh
echo "find . -name '*.gz' -exec mv {} . \;" >> downloadReads.sh
echo 'gunzip *.gz' >> downloadReads.sh
echo 'rmdir mergeTemp' >> downloadReads.sh
chmod +x downloadReads.sh
launcher_creator.py -b 'srun downloadReads.sh' -n downloadReads -q shortq7 -t 06:00:00 -e studivanms@gmail.com
sbatch --mem=200GB downloadReads.slurm
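Once the download job finishes, it can be worth a quick integrity check on the transferred files before counting reads. This is an optional sanity check, not part of the original pipeline:

```shell
# Optional sanity check: verify each downloaded gzip file is intact;
# prints nothing if all files pass
check_gz() {
  for f in "$@"; do
    [ -e "$f" ] || continue
    gzip -t "$f" 2>/dev/null || echo "corrupt: $f"
  done
}
check_gz *.gz
```

Any file flagged as corrupt should be re-downloaded before proceeding.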
Counting raw reads
echo '#!/bin/bash' >rawReads.sh
echo 'readCounts.sh -e gz -o Raw' >> rawReads.sh
sbatch -o rawReads.o%j -e rawReads.e%j --mail-type=ALL --mail-user=studivanms@gmail.com rawReads.sh
scp RawReadCounts to local machine
Demultiplexes row pools into separate samples by 3Ill barcode (1-12) and removes PCR duplicates, using the reverse complement of each barcode as the sample ID
2bRAD_trim_launch_dedup.pl fastq > trims.sh
launcher_creator.py -j trims.sh -n trims -q shortq7 -t 06:00:00 -e studivanms@gmail.com
sbatch --mem=200GB trims.slurm
Do we have the correct number of files?
ls -l *.tr0 | wc -l
mkdir trimmedReads
srun mv *.tr0 trimmedReads/ &
# Rezips the raw fastq's for storage
zipper.py -f fastq -a -9 --launcher -e studivanms@gmail.com
sbatch --mem=200GB zip.slurm
cd ../trimmedReads
Rename files based on two column lookup table (sampleID.csv): filename, then sample ID
srun sampleRename.py -i sampleID -f tr0
If any files are not renamed, their barcodes were mis-sequenced. Move them to a different directory (the hyphen appears only in non-renamed files)
mkdir unused
mv *-* unused/
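sampleRename.py is a custom script; for readers without it, the renaming step above amounts to a lookup-and-rename loop over the two-column table. A rough shell sketch (the exact CSV layout, no header and first column without the .tr0 extension, is an assumption based on the description above):

```shell
# Sketch only (hypothetical equivalent of sampleRename.py):
# rename <filename>.tr0 to <sampleID>.tr0 for each row of the lookup table
rename_from_table() {
  [ -f "$1" ] || return 0
  while IFS=, read -r fname sid; do
    if [ -f "${fname}.tr0" ]; then mv "${fname}.tr0" "${sid}.tr0"; fi
  done < "$1"
}
rename_from_table sampleID.csv
```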
For loop appending a cutadapt command for each .tr0 file to the job script
echo '#!/bin/bash' > trimse.sh
echo 'module load miniconda3-4.6.14-gcc-8.3.0-eenl5dj' >> trimse.sh
echo 'conda activate cutadapt' >> trimse.sh
for file in *.tr0; do
echo "cutadapt -q 15,15 -m 36 -o ${file/.tr0/}.trim $file > ${file/.tr0/}.trimlog.txt" >> trimse.sh;
done
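The `${file/.tr0/}` expansion in the loop above is bash's pattern substitution: it deletes the first occurrence of `.tr0`, leaving just the sample name for the output filenames. A quick check (Acer001 is a made-up sample name):

```shell
# bash pattern substitution: ${var/pattern/replacement} with an empty
# replacement deletes the first match, stripping the .tr0 extension
file="Acer001.tr0"
echo "output: ${file/.tr0/}.trim, log: ${file/.tr0/}.trimlog.txt"
```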
Since these cutadapt commands run serially within a single job, split the command list into seven scripts (each keeping the three header lines) and submit them as separate jobs
head -n 3 trimse.sh > trimseHeader
tail -n +4 trimse.sh | split -d -n l/7 - trimseCmds
i=1
for f in trimseCmds*; do cat trimseHeader "$f" > "trimse$i.sh"; i=$((i+1)); done
rm trimseHeader trimseCmds* trimse.sh
for i in {1..7}; do
sbatch -o trimse$i.o%j -e trimse$i.e%j --mem=200GB --mail-type=ALL --mail-user=studivanms@gmail.com "trimse$i.sh"
done
Do we have the correct number of files?
ls -l *.trim | wc -l
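Beyond checking the file count, each sample's cutadapt log reports how many reads survived trimming; those summary lines can be collected in one pass (the grep pattern assumes cutadapt's standard "Reads written (passing filters)" report wording):

```shell
# Spot-check: pull each sample's read-survival summary line from its cutadapt log
trim_summary() { grep -H "passing filters" "$@" 2>/dev/null || true; }
trim_summary *.trimlog.txt
```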
Counting the trimmed reads
echo '#!/bin/bash' > cleanReads.sh
echo 'readCounts.sh -e trim -o Filt' >> cleanReads.sh
sbatch --mem=200GB --mail-type=ALL --mail-user=studivanms@gmail.com cleanReads.sh
scp FiltReadCounts to local machine
mkdir ../filteredReads
mv *.trim ../filteredReads
# Rezips the row pools for storage
zipper.py -f tr0 -a -9 --launcher -e studivanms@gmail.com
sbatch zip.slurm
Now that your reads are demultiplexed and trimmed, follow the species-specific pipelines below: