4.50. metagenomic_denovo

Assisted de novo viral genome assembly (SPAdes, scaffolding, and polishing) from metagenomic raw reads. Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2 and optionally using BWA, BLASTN, and/or BMTagger databases), and FastQC/MultiQC of reads.

4.50.1. Inputs

4.50.1.1. Required inputs

metagenomic_denovo.fastq_1
File — Default: None
Unaligned read1 file in fastq format

metagenomic_denovo.kraken2_db_tgz
File — Default: None
Pre-built Kraken2 database tarball containing three files: hash.k2d, opts.k2d, and taxo.k2d.

metagenomic_denovo.krona_taxonomy_tab
File — Default: None
Krona taxonomy database: either an archive containing a single file, taxonomy.tab, or just a compressed taxonomy.tab.

metagenomic_denovo.ncbi_taxdump_tgz
File — Default: None
An NCBI taxdump.tar.gz file that contains, at minimum, the files nodes.dmp and names.dmp.

metagenomic_denovo.reference_genome_fasta
Array[File]+ — Default: None
After de novo assembly, large contigs are scaffolded against a reference genome to determine orientation and to join contigs together, before further polishing by reads. You must supply at least one reference genome (all segments/chromosomes in a single fasta file). If more than one reference is provided, contigs will be scaffolded against all of them and the one with the most complete assembly will be chosen for downstream polishing.

metagenomic_denovo.sample_name
String — Default: None
Sample name. This is required and will populate the 'SM' read group value and will be used as the output filename (must be filename-friendly).

metagenomic_denovo.sequencing_platform
String — Default: None
Sequencing platform. This is required and will populate the 'PL' read group value. Must be one of CAPILLARY, DNBSEQ, HELICOS, ILLUMINA, IONTORRENT, LS454, ONT, PACBIO, or SOLID.

metagenomic_denovo.trim_clip_db
File — Default: None
Adapter sequences to remove via trimmomatic prior to SPAdes assembly
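
WDL runners typically take inputs as a JSON object keyed by the fully qualified input names listed above. The following is a minimal sketch in Python of assembling that object for the required inputs. The `build_inputs` helper, every file path, and the character set used to interpret "filename-friendly" are assumptions made for illustration; none of them come from the workflow itself.

```python
import json
import re

# Platform values listed in the sequencing_platform description above.
VALID_PLATFORMS = {
    "CAPILLARY", "DNBSEQ", "HELICOS", "ILLUMINA", "IONTORRENT",
    "LS454", "ONT", "PACBIO", "SOLID",
}


def build_inputs(sample_name, platform, fastq_1, reference_fastas,
                 kraken2_db_tgz, krona_taxonomy_tab, ncbi_taxdump_tgz,
                 trim_clip_db):
    """Return the required metagenomic_denovo inputs as a dict (hypothetical helper)."""
    # Assumption: "filename-friendly" means letters, digits, '.', '_' and '-'.
    if not re.fullmatch(r"[A-Za-z0-9._-]+", sample_name):
        raise ValueError(f"sample_name is not filename-friendly: {sample_name!r}")
    if platform not in VALID_PLATFORMS:
        raise ValueError(f"unsupported sequencing_platform: {platform!r}")
    # reference_genome_fasta is Array[File]+, so it must not be empty.
    if not reference_fastas:
        raise ValueError("at least one reference genome fasta is required")
    prefix = "metagenomic_denovo."
    return {
        prefix + "fastq_1": fastq_1,
        prefix + "kraken2_db_tgz": kraken2_db_tgz,
        prefix + "krona_taxonomy_tab": krona_taxonomy_tab,
        prefix + "ncbi_taxdump_tgz": ncbi_taxdump_tgz,
        prefix + "reference_genome_fasta": list(reference_fastas),
        prefix + "sample_name": sample_name,
        prefix + "sequencing_platform": platform,
        prefix + "trim_clip_db": trim_clip_db,
    }


# All paths below are placeholders, not files shipped with the pipeline.
example_inputs = build_inputs(
    sample_name="sample01",
    platform="ILLUMINA",
    fastq_1="reads/sample01_R1.fastq.gz",
    reference_fastas=["refs/reference_a.fasta"],
    kraken2_db_tgz="db/kraken2_db.tar.gz",
    krona_taxonomy_tab="db/taxonomy.tab.gz",
    ncbi_taxdump_tgz="db/taxdump.tar.gz",
    trim_clip_db="db/adapters.fasta",
)
print(json.dumps(example_inputs, indent=2))
```

The printed JSON can be saved to a file and passed to whichever WDL runner is in use. Optional inputs such as `metagenomic_denovo.fastq_2` would be added to the same object under their own keys.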

4.50.1.2. Other common inputs

metagenomic_denovo.refine.trim_coords_bed
File? — Default: None
Optional primers to trim, given in reference coordinate space (0-based BED format).
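
BED intervals are 0-based and half-open, so a primer usually described as covering 1-based positions 30 through 54 is written with start 29 and end 54. A small sketch of that conversion follows; the `primer_to_bed_line` helper and the sequence and primer names are hypothetical.

```python
def primer_to_bed_line(chrom, start_1based, end_1based, name):
    """Convert a 1-based inclusive interval to a tab-separated BED line."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    # BED start is 0-based (subtract 1); BED end is exclusive, which equals
    # the 1-based inclusive end, so it is left unchanged.
    return f"{chrom}\t{start_1based - 1}\t{end_1based}\t{name}"


# Hypothetical reference and primer names, for illustration only.
print(primer_to_bed_line("ref_segment_1", 30, 54, "primer_1_LEFT"))
```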

4.50.1.3. Other inputs


metagenomic_denovo.assemble.docker
String — Default: "quay.io/broadinstitute/viral-assemble:2.1.33.0"
???

metagenomic_denovo.assemble.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.assemble.spades_min_contig_len
Int? — Default: None
???

metagenomic_denovo.assemble.spades_n_reads
Int — Default: 10000000
???

metagenomic_denovo.assemble.spades_options
String? — Default: None
???

metagenomic_denovo.deplete_blastDbs
Array[File] — Default: []
Optional list of databases to use for blastn-based depletion. Sequences in fasta format will be indexed on the fly; pre-blast-indexed databases may be provided as tarballs.

metagenomic_denovo.deplete_bmtaggerDbs
Array[File] — Default: []
Optional list of databases to use for bmtagger-based depletion. Sequences in fasta format will be indexed on the fly; pre-bmtagger-indexed databases may be provided as tarballs.

metagenomic_denovo.deplete_bwaDbs
Array[File] — Default: []
Optional list of databases to use for bwa mem-based depletion. Sequences in fasta format will be indexed on the fly; pre-bwa-indexed databases may be provided as tarballs.

metagenomic_denovo.deplete_k2.docker
String — Default: "quay.io/broadinstitute/viral-classify:2.1.33.0"
???

metagenomic_denovo.deplete_k2.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.deplete_k2.minimum_hit_groups
Int? — Default: None
???

metagenomic_denovo.deplete_k2.taxonomic_ids
Array[Int]? — Default: None
???

metagenomic_denovo.deplete_k2.withoutChildren
Boolean — Default: false
???

metagenomic_denovo.deplete_taxa.clear_tags
Boolean? — Default: false
???

metagenomic_denovo.deplete_taxa.cpu
Int? — Default: 8
???

metagenomic_denovo.deplete_taxa.docker
String — Default: "quay.io/broadinstitute/viral-classify:2.1.33.0"
???

metagenomic_denovo.deplete_taxa.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.deplete_taxa.query_chunk_size
Int? — Default: None
???

metagenomic_denovo.deplete_taxa.tags_to_clear_space_separated
String? — Default: "XT X0 X1 XA AM SM BQ CT XN OC OP"
???

metagenomic_denovo.fastq_2
File? — Default: None
Unaligned read2 file in fastq format. Leave this empty for single-end reads; it is required for paired-end reads. If provided, it must match fastq_1 in length and order.
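
The length-and-order requirement can be checked before submitting a run. The sketch below compares read names record by record. It assumes well-formed four-line FASTQ records and treats a trailing `/1` or `/2` as a mate suffix; both function names are hypothetical and are not part of the pipeline.

```python
import gzip
import itertools


def read_names(path):
    """Yield the read name of each four-line FASTQ record in the file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 != 0:
                continue  # only header lines carry the read name
            fields = line[1:].split()  # drop leading '@' and any comment
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # drop the mate suffix
            yield name


def fastq_pairs_match(fastq_1, fastq_2):
    """Return True if both files hold the same read names in the same order."""
    pairs = itertools.zip_longest(read_names(fastq_1), read_names(fastq_2))
    # zip_longest pads the shorter file with None, so a length mismatch
    # shows up as an unequal pair.
    return all(name_1 == name_2 for name_1, name_2 in pairs)
```

Calling `fastq_pairs_match("sample_R1.fastq.gz", "sample_R2.fastq.gz")` with real file paths returns `False` as soon as a name differs or one file runs out of records first.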

metagenomic_denovo.fastqc_dehosted.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.fastqc_raw.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.fastqc_taxfilt.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.FastqToUBAM.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.FastqToUBAM.platform_unit
String? — Default: None
???

metagenomic_denovo.FastqToUBAM.readgroup_name
String? — Default: None
???

metagenomic_denovo.FastqToUBAM.sequencing_center
String? — Default: None
???

metagenomic_denovo.filter_acellular.docker
String — Default: "quay.io/broadinstitute/viral-classify:2.1.33.0"
???

metagenomic_denovo.filter_acellular.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.filter_acellular.minimum_hit_groups
Int? — Default: None
???

metagenomic_denovo.filter_acellular.taxonomic_ids
Array[Int]? — Default: None
???

metagenomic_denovo.filter_acellular.withoutChildren
Boolean — Default: false
???

metagenomic_denovo.filter_to_taxon.docker
String — Default: "quay.io/broadinstitute/viral-classify:2.1.33.0"
???

metagenomic_denovo.filter_to_taxon.error_on_reads_in_neg_control
Boolean? — Default: false
???

metagenomic_denovo.filter_to_taxon.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.filter_to_taxon.neg_control_prefixes_space_separated
String? — Default: "neg water NTC"
???

metagenomic_denovo.filter_to_taxon.negative_control_reads_threshold
Int? — Default: 0
???

metagenomic_denovo.filter_to_taxon_db
File? — Default: None
Optional database to use to filter read set to those that match by LASTAL. Sequences in fasta format will be indexed on the fly.

metagenomic_denovo.kraken2.confidence_threshold
Float? — Default: None
Kraken2 confidence score threshold (0.0-1.0). See https://ccb.jhu.edu/software/kraken2/index.shtml?t=manual#confidence-scoring
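
As a rough illustration of what the confidence threshold does, the sketch below follows a simplified reading of the scoring rule in the Kraken2 manual: the score of a label is the fraction of a read's k-mers assigned to the clade rooted at that label, and a label that scores below the threshold is replaced by its parent until the threshold is met or the read is left unclassified. This is not Kraken2's code. The function names, the parent map, and the k-mer counts are made up for the example, and ambiguous k-mers are assumed to have been excluded already.

```python
def in_clade(taxid, root, parent):
    """Return True if taxid equals root or descends from it."""
    while taxid is not None:
        if taxid == root:
            return True
        taxid = parent.get(taxid)
    return False


def apply_confidence(label, kmer_hits, parent, threshold):
    """Walk up from label until the clade score reaches threshold.

    kmer_hits maps taxid -> number of k-mers; taxid 0 stands for k-mers
    with no database hit. Returns the accepted taxid, or None if the read
    ends up unclassified.
    """
    total = sum(kmer_hits.values())
    while label is not None and total > 0:
        clade_kmers = sum(count for taxid, count in kmer_hits.items()
                          if in_clade(taxid, label, parent))
        if clade_kmers / total >= threshold:
            return label
        label = parent.get(label)  # move one rank up and rescore
    return None


# Toy taxonomy (child -> parent) and k-mer counts, invented for illustration.
toy_parent = {562: 561, 561: 543, 543: 1}
toy_hits = {562: 16, 561: 4, 0: 20}
print(apply_confidence(562, toy_hits, toy_parent, 0.5))
```

In this toy case, taxon 562 scores 16/40 and falls short of 0.5, while its parent 561 scores 20/40 and is accepted. Raising the threshold therefore pushes classifications toward higher ranks or leaves reads unclassified.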

metagenomic_denovo.kraken2.docker
String — Default: "quay.io/broadinstitute/viral-classify:2.1.33.0"
???

metagenomic_denovo.kraken2.machine_mem_gb
Int — Default: 72
???

metagenomic_denovo.kraken2.min_base_qual
Int? — Default: None
Minimum base quality used in classification

metagenomic_denovo.library_name
String — Default: "1"
???

metagenomic_denovo.refine.align_to_ref.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.refine.align_to_ref.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.refine.align_to_ref.sample_name
String — Default: basename(basename(basename(reads_unmapped_bam,".bam"),".taxfilt"),".clean")
???

metagenomic_denovo.refine.align_to_ref_options
Map[String,String] — Default: {"novoalign": "-r Random -l 40 -g 40 -x 20 -t 501 -k", "bwa": "-k 12 -B 1", "minimap2": ""}
???

metagenomic_denovo.refine.align_to_self.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.refine.align_to_self.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.refine.align_to_self.sample_name
String — Default: basename(basename(basename(reads_unmapped_bam,".bam"),".taxfilt"),".clean")
???

metagenomic_denovo.refine.align_to_self_options
Map[String,String] — Default: {"novoalign": "-r Random -l 40 -g 40 -x 20 -t 100", "bwa": "", "minimap2": ""}
???

metagenomic_denovo.refine.aligner
String — Default: "minimap2"
Read aligner software to use. Options: novoalign, bwa, minimap2. Minimap2 can automatically handle Illumina, PacBio, or Oxford Nanopore reads as long as the 'PL' field in the BAM read group header is set properly (novoalign and bwa are Illumina-only).

metagenomic_denovo.refine.alignment_metrics.amplicon_set
String? — Default: None
???

metagenomic_denovo.refine.alignment_metrics.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.refine.alignment_metrics.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.refine.alignment_metrics.max_amp_len
Int? — Default: 5000
???

metagenomic_denovo.refine.alignment_metrics.max_amplicons
Int? — Default: 500
???

metagenomic_denovo.refine.call_consensus.docker
String — Default: "quay.io/broadinstitute/viral-assemble:2.1.33.0"
???

metagenomic_denovo.refine.call_consensus.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.refine.call_consensus.mark_duplicates
Boolean — Default: false
???

metagenomic_denovo.refine.isnvs_ref.docker
String — Default: "quay.io/biocontainers/lofreq:2.1.5--py38h588ecb2_4"
???

metagenomic_denovo.refine.isnvs_ref.out_basename
String — Default: basename(aligned_bam,'.bam')
???

metagenomic_denovo.refine.isnvs_self.docker
String — Default: "quay.io/biocontainers/lofreq:2.1.5--py38h588ecb2_4"
???

metagenomic_denovo.refine.isnvs_self.out_basename
String — Default: basename(aligned_bam,'.bam')
???

metagenomic_denovo.refine.ivar_trim.docker
String — Default: "andersenlabapps/ivar:1.3.1"
???

metagenomic_denovo.refine.ivar_trim.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.refine.ivar_trim.min_keep_length
Int? — Default: None
Minimum length of read to retain after trimming (ivar default: 30)

metagenomic_denovo.refine.ivar_trim.min_quality
Int? — Default: 1
Minimum quality threshold for sliding window to pass (ivar default: 20; this workflow sets it to 1 by default)

metagenomic_denovo.refine.ivar_trim.primer_offset
Int? — Default: None
???

metagenomic_denovo.refine.ivar_trim.sliding_window
Int? — Default: None
Width of sliding window for quality trimming (ivar default: 4)

metagenomic_denovo.refine.major_cutoff
Float — Default: 0.75
If the major allele is present at a frequency higher than this cutoff, we will call an unambiguous base at that position. If it is equal to or below this cutoff, we will call an ambiguous base representing all possible alleles at that position.

metagenomic_denovo.refine.merge_align_to_ref.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.refine.merge_align_to_ref.reheader_table
File? — Default: None
???

metagenomic_denovo.refine.merge_align_to_self.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.refine.merge_align_to_self.reheader_table
File? — Default: None
???

metagenomic_denovo.refine.min_coverage
Int — Default: 3
Minimum read coverage required to call a position unambiguous.
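
Together with refine.major_cutoff above, this threshold determines how each consensus position is called. The sketch below shows one way to read the two descriptions. It is not the pipeline's implementation: `call_base` is a hypothetical function, "all possible alleles" is interpreted as every allele observed at the position, and positions below min_coverage are assumed to be reported as N.

```python
# IUPAC nucleotide codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_base(counts, major_cutoff=0.75, min_coverage=3):
    """Call one consensus base from per-allele read counts (A/C/G/T keys)."""
    depth = sum(counts.values())
    if depth == 0 or depth < min_coverage:
        return "N"  # assumption: too little coverage is reported as N
    major, major_count = max(counts.items(), key=lambda item: item[1])
    # Strictly greater than the cutoff gives an unambiguous call.
    if major_count / depth > major_cutoff:
        return major
    # At or below the cutoff: ambiguity code for every observed allele.
    observed = frozenset(base for base, count in counts.items() if count > 0)
    return IUPAC[observed]


print(call_base({"A": 9, "G": 1}))  # major allele frequency 0.9
print(call_base({"A": 3, "G": 1}))  # major allele frequency exactly 0.75
print(call_base({"A": 2}))          # depth below min_coverage
```

With the default values, the three calls print A, R, and N: a frequency of exactly 0.75 is not above the cutoff, so the second position gets the A/G ambiguity code.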

metagenomic_denovo.refine.novocraft_license
File? — Default: None
The Novoalign short read aligner is commercially licensed software that is also available for free in a much slower, single-threaded version. If you use novoalign and have a paid license file, provide it here to run in multi-threaded mode. If this is omitted, novoalign will run in single-threaded mode.

metagenomic_denovo.refine.plot_ref_coverage.base_q_threshold
Int? — Default: None
???

metagenomic_denovo.refine.plot_ref_coverage.bin_large_plots
Boolean — Default: false
???

metagenomic_denovo.refine.plot_ref_coverage.binning_summary_statistic
String? — Default: "max"
???

metagenomic_denovo.refine.plot_ref_coverage.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.refine.plot_ref_coverage.mapping_q_threshold
Int? — Default: None
???

metagenomic_denovo.refine.plot_ref_coverage.max_coverage_depth
Int? — Default: None
???

metagenomic_denovo.refine.plot_ref_coverage.plot_height_pixels
Int? — Default: 850
???

metagenomic_denovo.refine.plot_ref_coverage.plot_only_non_duplicates
Boolean — Default: false
???

metagenomic_denovo.refine.plot_ref_coverage.plot_pixels_per_inch
Int? — Default: 100
???

metagenomic_denovo.refine.plot_ref_coverage.plot_width_pixels
Int? — Default: 1100
???

metagenomic_denovo.refine.plot_ref_coverage.plotXLimits
String? — Default: None
???

metagenomic_denovo.refine.plot_ref_coverage.plotYLimits
String? — Default: None
???

metagenomic_denovo.refine.plot_ref_coverage.read_length_threshold
Int? — Default: None
???

metagenomic_denovo.refine.plot_ref_coverage.skip_mark_dupes
Boolean — Default: false
???

metagenomic_denovo.refine.plot_self_coverage.base_q_threshold
Int? — Default: None
???

metagenomic_denovo.refine.plot_self_coverage.bin_large_plots
Boolean — Default: false
???

metagenomic_denovo.refine.plot_self_coverage.binning_summary_statistic
String? — Default: "max"
???

metagenomic_denovo.refine.plot_self_coverage.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.refine.plot_self_coverage.mapping_q_threshold
Int? — Default: None
???

metagenomic_denovo.refine.plot_self_coverage.max_coverage_depth
Int? — Default: None
???

metagenomic_denovo.refine.plot_self_coverage.plot_height_pixels
Int? — Default: 850
???

metagenomic_denovo.refine.plot_self_coverage.plot_only_non_duplicates
Boolean — Default: false
???

metagenomic_denovo.refine.plot_self_coverage.plot_pixels_per_inch
Int? — Default: 100
???

metagenomic_denovo.refine.plot_self_coverage.plot_width_pixels
Int? — Default: 1100
???

metagenomic_denovo.refine.plot_self_coverage.plotXLimits
String? — Default: None
???

metagenomic_denovo.refine.plot_self_coverage.plotYLimits
String? — Default: None
???

metagenomic_denovo.refine.plot_self_coverage.read_length_threshold
Int? — Default: None
???

metagenomic_denovo.refine.plot_self_coverage.skip_mark_dupes
Boolean — Default: false
???

metagenomic_denovo.refine.run_discordance.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.refine.skip_mark_dupes
Boolean — Default: false
Skip the Picard MarkDuplicates step after alignment. Setting this to true is recommended for PCR amplicon-based data.

metagenomic_denovo.rmdup_ubam.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.rmdup_ubam.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.rmdup_ubam.method
String — Default: "mvicuna"
Read deduplication method to use: mvicuna or cdhit.

metagenomic_denovo.run_date_iso
String? — Default: None
???

metagenomic_denovo.scaffold.aligner
String — Default: "muscle"
???

metagenomic_denovo.scaffold.docker
String — Default: "quay.io/broadinstitute/viral-assemble:2.1.33.0"
???

metagenomic_denovo.scaffold.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.scaffold.min_length_fraction
Float? — Default: None
???

metagenomic_denovo.scaffold.min_unambig
Float? — Default: None
???

metagenomic_denovo.scaffold.nucmer_max_gap
Int? — Default: None
???

metagenomic_denovo.scaffold.nucmer_min_cluster
Int? — Default: None
???

metagenomic_denovo.scaffold.nucmer_min_match
Int? — Default: None
???

metagenomic_denovo.scaffold.replace_length
Int — Default: 55
???

metagenomic_denovo.scaffold.scaffold_min_contig_len
Int? — Default: None
???

metagenomic_denovo.scaffold.scaffold_min_pct_contig_aligned
Float? — Default: None
???

metagenomic_denovo.spikein.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

metagenomic_denovo.spikein.machine_mem_gb
Int? — Default: None
???

metagenomic_denovo.spikein.topNHits
Int — Default: 3
???

metagenomic_denovo.spikein_db
File? — Default: None
ERCC/SDSI spike-in sequences

metagenomic_denovo.taxa_to_avoid_assembly
Array[String] — Default: ["Vertebrata", "other sequences", "Bacteria"]
???

metagenomic_denovo.taxa_to_dehost
Array[String] — Default: ["Vertebrata"]
???

4.50.2. Outputs

metagenomic_denovo.aligned_bam
File
???

metagenomic_denovo.aligned_only_reads_bam
File
???

metagenomic_denovo.aligned_only_reads_fastqc
File
???

metagenomic_denovo.assembly_length
Int
???

metagenomic_denovo.assembly_length_unambiguous
Int
???

metagenomic_denovo.assembly_preimpute_length
Int
???

metagenomic_denovo.assembly_preimpute_length_unambiguous
Int
???

metagenomic_denovo.bases_aligned
Float
???

metagenomic_denovo.contigs_fasta
File
???

metagenomic_denovo.coverage_plot
File
???

metagenomic_denovo.coverage_tsv
File
???

metagenomic_denovo.dedup_bam
File
???

metagenomic_denovo.dedup_fastqc
File
???

metagenomic_denovo.denovo_in_bam
File
???

metagenomic_denovo.depleted_bam
File
???

metagenomic_denovo.depleted_fastqc
File
???

metagenomic_denovo.final_assembly_fasta
File
???

metagenomic_denovo.intermediate_gapfill_fasta
File
???

metagenomic_denovo.intermediate_scaffold_fasta
File
???

metagenomic_denovo.isnvs_vcf
File
???

metagenomic_denovo.kraken2_krona_plot
File
???

metagenomic_denovo.kraken2_summary_report
File
???

metagenomic_denovo.mean_coverage
Float
???

metagenomic_denovo.num_libraries
Int
???

metagenomic_denovo.num_read_groups
Int
???

metagenomic_denovo.raw_fastqc
File
???

metagenomic_denovo.raw_unmapped_bam
File
???

metagenomic_denovo.read_counts_dedup
Int
???

metagenomic_denovo.read_counts_denovo_in
Int
???

metagenomic_denovo.read_counts_depleted
Int
???

metagenomic_denovo.read_counts_raw
Int
???

metagenomic_denovo.read_counts_taxfilt
Int
???

metagenomic_denovo.read_pairs_aligned
Int
???

metagenomic_denovo.reads_aligned
Int
???

metagenomic_denovo.replicate_concordant_sites
Int
???

metagenomic_denovo.replicate_discordant_indels
Int
???

metagenomic_denovo.replicate_discordant_snps
Int
???

metagenomic_denovo.replicate_discordant_vcf
File
???

metagenomic_denovo.scaffold_fasta
File
???

metagenomic_denovo.scaffolding_alt_contigs
File
???

metagenomic_denovo.scaffolding_chosen_ref_names
Array[String]
???

metagenomic_denovo.scaffolding_stats
File
???

metagenomic_denovo.spikein_hits
File?
???

metagenomic_denovo.taxfilt_bam
File
???

metagenomic_denovo.taxfilt_fastqc
File
???

metagenomic_denovo.viral_assemble_version
String
???

metagenomic_denovo.viral_classify_version
String
???


Generated using WDL AID (1.0.0)