Explanations

Purpose of each Nextflow script:

         start.nf

                    This script initializes the environment needed for proper execution of the main.nf
                    script. start.nf only needs to be run once, as it is primarily tasked with downloading
                    test data and databases, and with pre-processing those databases.

         main.nf

                    This script is the primary analysis tool. main.nf can be run as many times as desired,
                    with various processing options, and can be parallelized for large-scale analysis.
                    main.nf takes in shotgun metagenomic data and quantifies the absolute abundance of
                    antimicrobial resistance genes (ARGs) and 16S rRNA genes in the input samples.
                    Input data can be provided as local FASTQ files or fetched from the NCBI
                    Sequence Read Archive (SRA) when given an SRA ID.
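
                    As a rough sketch of how these two input modes can be expressed in Nextflow (not
                    necessarily how main.nf does it), the snippet below chooses between local paired
                    FASTQ files and an SRA lookup. The parameter names params.reads and params.sra_id
                    are hypothetical and are not taken from this repository.

                    ```nextflow
                    // Hypothetical sketch: params.reads and params.sra_id are illustrative names,
                    // not parameters confirmed to exist in main.nf.
                    params.reads  = null   // e.g. 'data/*_{1,2}.fastq'
                    params.sra_id = null   // an SRA accession

                    workflow {
                        if (params.sra_id) {
                            // Channel.fromSRA queries NCBI and needs an API key, supplied via the
                            // apiKey option or the NCBI_API_KEY environment variable.
                            reads_ch = Channel.fromSRA(params.sra_id)
                        } else {
                            // Groups files that share a prefix into (sample_id, [read1, read2]).
                            reads_ch = Channel.fromFilePairs(params.reads)
                        }

                        // Print what would be handed to the downstream processes.
                        reads_ch.view()
                    }
                    ```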

         modules.nf

                    This script houses all the processes (functions) needed for the start.nf and main.nf
                    workflows. These processes are the functional code and commands that drive the
                    analysis. Individual processes in modules.nf can be called independently of the
                    workflow if users wish to carry out only a fraction of the entire bioinformatic
                    pipeline, or prefer a non-automated, step-by-step analysis.
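
                    A minimal sketch of this import pattern, assuming the scripts use Nextflow DSL2, is
                    shown below. The process name EXAMPLE_PROCESS and the parameter params.input are
                    hypothetical placeholders, not names taken from modules.nf.

                    ```nextflow
                    // Hypothetical sketch: EXAMPLE_PROCESS and params.input are placeholders,
                    // not a process or parameter confirmed to exist in this repository.
                    nextflow.enable.dsl = 2

                    // Import a single process from the shared modules file.
                    include { EXAMPLE_PROCESS } from './modules.nf'

                    workflow {
                        // Call only the imported process on the given input file(s).
                        EXAMPLE_PROCESS(Channel.fromPath(params.input))
                    }
                    ```

                    A small script like this is also one way to run a single process on its own, without
                    going through the full start.nf or main.nf workflow.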

When to use each Nextflow script:

         start.nf

                    This script should be the first script executed after downloading the GitHub repository.
                    If users already have all their data and processed databases stored locally, this script
                    can be skipped.

         main.nf

                    This script is run when the user wishes to know the absolute ARG and 16S rRNA gene
                    abundance in paired-end shotgun metagenomic sequencing data of animal agricultural
                    origin. The pipeline is run once for every set of paired-end reads provided to the
                    workflow as input.

         modules.nf

                    This script is only run when the user wishes to execute a single process individually and
                    not carry out the entire workflow. Otherwise, it serves as a formatted group of
                    functions that are imported by the actual workflows: start.nf and main.nf.

Potential issues with each Nextflow script:

         start.nf

                    -Needs a significant amount of memory and time; a lack of either can cause errors.

                    -Downloading databases other than the defaults can be tricky at times.

         main.nf

                    -Currently relies on access to my lab's software directory for
                    BBDuk and CD-HIT-EST. Future Docker container development will resolve this issue.

         modules.nf

                    -Some processes are hard-coded for Slurm resource allocations, which requires
                    access to an HPC cluster and the Slurm scheduler.
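
                    One common Nextflow technique for avoiding hard-coded scheduler settings is to move
                    them into profiles in nextflow.config. The fragment below is a sketch of that general
                    technique, not the repository's current setup; the profile names and resource values
                    are illustrative assumptions.

                    ```nextflow
                    // nextflow.config (hypothetical sketch; profile names and values are examples only)
                    profiles {
                        slurm {
                            // Submit every process as a Slurm job with these default resources.
                            process.executor = 'slurm'
                            process.cpus     = 8
                            process.memory   = '32 GB'
                            process.time     = '12h'
                        }
                        local {
                            // Run every process on the current machine instead.
                            process.executor = 'local'
                        }
                    }
                    ```

                    With such a file in place, a profile would be selected at launch, for example:
                    nextflow run main.nf -profile slurm. Note that directives written directly inside a
                    process definition take precedence over generic process.* settings like these, so
                    the hard-coded values would need to be removed from modules.nf or overridden with a
                    withName selector.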