ABySS¶

ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. It can handle very large data sets and was one of the first assemblers to demonstrate the use of the de Bruijn graph technique.

Prepare Input Files

ABySS requires paired-end sequencing data as input, typically in FASTQ format. Make sure you have your input files ready in the cluster. For this tutorial, we'll assume that your files are named reads1.fastq and reads2.fastq.

Submit Abyss job

First create the following submission file abyss-slurm.sh using a text editor:

abyss-slurm.sh

#!/bin/bash
#SBATCH -J abyss            # job name
#SBATCH -o log_slurm.o%j    # output and error file name (%j expands to jobID)
#SBATCH -n 1                # total number of tasks requested
#SBATCH -N 1                # number of nodes you want to run on
#SBATCH --cpus-per-task 48
#SBATCH -p bsudfq           # queue (partition)
#SBATCH -t 12:00:00         # run time (hh:mm:ss)


# Load the abyss module
module load abyss

# Run abyss
abyss-pe k=64 name=my_assembly in='reads1.fastq reads2.fastq'

Replace 64 with the k-mer size appropriate for your data, my_assembly with the name you want for your output, and 'reads1.fastq reads2.fastq' with the names of your actual input files.

Then submit your job to the scheduler:

sbatch abyss-slurm.sh

Check Output

ABySS will create several output files. The main output is a FASTA file containing the assembled sequences, which will have the name you specified (e.g., my_assembly-contigs.fa).
```
ls -l my_assembly*
```

Resources¶

ABySS Documentation: Official documentation.
ABySS GitHub Repository: Provides the latest version of ABySS, including source code, release notes, and additional documentation.
ABySS: A parallel assembler for short read sequence data: Paper introducing ABySS.
SeqAnswers and Biostars: Bioinformatics forums.