# assemble_denovo_metagenomic

Performs viral de novo assembly on metagenomic reads against a large range of possible reference genomes. Raw reads are run through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and read QC (FastQC/MultiQC). The de novo contigs are then scaffolded against a set of candidate references and polished with the reads.

This workflow accepts a very large set of input reference genomes. It subsets them to those with ANI hits to the provided contigs/MAGs, then clusters those reference hits by ANI similarity to each other (any ANI hit between two references links them). It chooses the top reference from each cluster and produces one assembly per cluster. This allows for multiple diverse viral taxa (coinfections) in one sample while forcing a choice of the best assembly from each group of related reference genomes.

## Inputs

### Required inputs

assemble_denovo_metagenomic.batch_id_list
Array[String] — Default: None
???

assemble_denovo_metagenomic.download_ref_genomes_from_tsv.emailAddress
String — Default: None
???

assemble_denovo_metagenomic.kraken2_db_tgz
File — Default: None
Pre-built Kraken2 database tarball containing three files: hash.k2d, opts.k2d, and taxo.k2d.
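
Before launching a long run, it can help to confirm that the tarball really contains the three files listed above. The following is a minimal sketch using Python's standard `tarfile` module; the function name is hypothetical, and matching by basename (so files nested in a subdirectory still count) is an assumption about acceptable layouts, not a statement of what the workflow itself checks.

```python
import tarfile
from pathlib import PurePosixPath

# The three files the Kraken2 database tarball is documented to contain.
REQUIRED_K2_FILES = {"hash.k2d", "opts.k2d", "taxo.k2d"}


def missing_kraken2_files(tgz_path: str) -> set:
    """Return the required Kraken2 files that are absent from the tarball.

    Members are matched by basename, so a tarball that nests the files
    in a subdirectory (an assumption; layouts vary) still passes.
    """
    with tarfile.open(tgz_path, mode="r:*") as tar:  # "r:*" auto-detects compression
        names = {PurePosixPath(m.name).name for m in tar.getmembers() if m.isfile()}
    return REQUIRED_K2_FILES - names
```

An empty result means all three files were found; for example, `missing_kraken2_files("kraken2_db.tgz")` (hypothetical filename) returning `{"taxo.k2d"}` would indicate an incomplete database.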

assemble_denovo_metagenomic.krona_taxonomy_db_kraken2_tgz
File — Default: None
Krona taxonomy database: a tarball containing a single file, taxonomy.tab, or possibly just a compressed taxonomy.tab.

assemble_denovo_metagenomic.ncbi_taxdump_tgz
File — Default: None
An NCBI taxdump.tar.gz file that contains, at a minimum, nodes.dmp and names.dmp.
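
The two required files are pipe-delimited tables: nodes.dmp maps each tax ID to its parent and rank, and names.dmp maps tax IDs to names. As a sketch of how they fit together, the code below loads both (from already-extracted lines) and walks a lineage up to the root, which is its own parent. The function names are hypothetical and only the first few columns are read.

```python
from typing import Dict, Iterable, List, Tuple


def _split_dmp(line: str) -> List[str]:
    # .dmp rows look like "a\t|\tb\t|\tc\t|\n"; drop the row terminator first.
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_taxonomy(nodes_lines: Iterable[str], names_lines: Iterable[str]
                  ) -> Tuple[Dict[int, int], Dict[int, str], Dict[int, str]]:
    """Return (parent, rank, scientific_name) dictionaries keyed by tax ID."""
    parent: Dict[int, int] = {}
    rank: Dict[int, str] = {}
    name: Dict[int, str] = {}
    for line in nodes_lines:
        f = _split_dmp(line)  # tax_id, parent tax_id, rank, ...
        parent[int(f[0])] = int(f[1])
        rank[int(f[0])] = f[2]
    for line in names_lines:
        f = _split_dmp(line)  # tax_id, name_txt, unique name, name class
        if f[3] == "scientific name":
            name[int(f[0])] = f[1]
    return parent, rank, name


def lineage(taxid: int, parent: Dict[int, int]) -> List[int]:
    """Tax IDs from taxid up to the root (the root is its own parent)."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path
```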

assemble_denovo_metagenomic.reads_bams
Array[File]+ — Default: None
Reads to classify. May be unmapped or mapped or both, paired-end or single-end. Multiple input files will be merged first.

assemble_denovo_metagenomic.sample_id
String — Default: None
???

assemble_denovo_metagenomic.spades.spades_n_reads
Int — Default: 10000000
Maximum number of reads to keep when subsampling prior to assembly; inputs with more reads than this are subsampled down to it. Default: 10000000.
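
To illustrate what a read cap like `spades_n_reads` means, here is a minimal reservoir-sampling sketch: inputs at or below the cap pass through unchanged, and larger inputs are reduced to a uniform random sample of exactly the cap. This is an assumed technique chosen for illustration; the SPAdes task may subsample with a different tool or strategy.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def subsample(reads: Iterable[T], n_reads: int, seed: int = 0) -> List[T]:
    """Keep at most n_reads items, chosen uniformly at random (reservoir sampling).

    Inputs with n_reads or fewer items are returned unchanged and in order.
    For paired-end data, pass (read1, read2) tuples so mates stay together.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < n_reads:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # uniform over [0, i], inclusive
            if j < n_reads:
                reservoir[j] = read
    return reservoir
```

Reservoir sampling needs only one pass and memory proportional to the cap, which suits inputs too large to count up front.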

assemble_denovo_metagenomic.taxid_to_ref_accessions_tsv
File — Default: None
???

assemble_denovo_metagenomic.trim_clip_db
File — Default: None
Adapter sequences to remove via trimmomatic prior to SPAdes assembly

### Other common inputs

assemble_denovo_metagenomic.refine.trim_coords_bed
File? — Default: None
Optional primer coordinates to trim, in reference coordinate space (0-based BED format).
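
BED intervals are 0-based and half-open: column 2 is the first base (counting from 0) and column 3 is one past the last base. The sketch below reads primer rows and converts them to 1-based inclusive positions; the class and function names and the example primer name are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class PrimerInterval(NamedTuple):
    chrom: str
    start0: int  # 0-based, inclusive (BED column 2)
    end0: int    # 0-based, exclusive (BED column 3)
    name: str

    def one_based_inclusive(self) -> Tuple[int, int]:
        """Same interval as 1-based, inclusive (first, last) positions."""
        return self.start0 + 1, self.end0


def read_bed(lines: Iterable[str]) -> List[PrimerInterval]:
    out = []
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        cols = line.rstrip("\n").split("\t")
        name = cols[3] if len(cols) > 3 else ""
        out.append(PrimerInterval(cols[0], int(cols[1]), int(cols[2]), name))
    return out
```

For example, the row `ref1	0	20	p1_LEFT` covers the first 20 bases of `ref1`, which are positions 1 through 20 in 1-based terms; its length is simply `end0 - start0`.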

### Advanced inputs

assemble_denovo_metagenomic.refine.call_consensus.mark_duplicates
Boolean — Default: false
Instead of removing duplicates, simply marks them.

assemble_denovo_metagenomic.refine.ivar_trim.min_keep_length
Int? — Default: None
Minimum length of read to retain after trimming (iVar default: 30).

assemble_denovo_metagenomic.refine.ivar_trim.min_quality
Int? — Default: 1
Minimum quality threshold for the sliding window to pass (this workflow's default is 1; iVar's own default is 20).

assemble_denovo_metagenomic.refine.ivar_trim.sliding_window
Int? — Default: None
Width of sliding window for quality trimming (iVar default: 4).
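
The three `ivar_trim` settings above work together. Below is a simplified sketch of sliding-window quality trimming: a window of width `sliding_window` moves from the 5' end, the read is cut at the start of the first window whose mean quality falls below `min_quality`, and the read is dropped if fewer than `min_keep_length` bases remain. This approximates the approach iVar describes; its exact clipping position and length comparison may differ. The defaults here (1, 4, 30) mirror the values listed above.

```python
from typing import List, Optional


def quality_trim(quals: List[int], min_quality: int = 1, window: int = 4,
                 min_length: int = 30) -> Optional[int]:
    """Return how many 5' bases to keep, or None if the read should be dropped.

    Simplified sketch: cut at the start of the first window whose mean
    quality is below min_quality; reads shorter than the window are not
    quality-trimmed at all.
    """
    keep = len(quals)
    for start in range(max(len(quals) - window + 1, 0)):
        if sum(quals[start:start + window]) / window < min_quality:
            keep = start
            break
    return keep if keep >= min_length else None
```

With the workflow's low default of `min_quality = 1`, almost no read is quality-trimmed; raising it to 20 makes a read whose last ten bases have quality 2 lose those bases plus two more.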

assemble_denovo_metagenomic.scaffold.nucmer_max_gap
Int? — Default: None
When scaffolding contigs to the reference via nucmer, this specifies the -g parameter to nucmer (the maximum allowed gap between adjacent matches in a cluster). Our default is 200 (up from the nucmer default of 90); the MUMmer documentation suggests it is valid to increase this up to 1000 to allow for more diversity.

assemble_denovo_metagenomic.scaffold.nucmer_min_cluster
Int? — Default: None
When scaffolding contigs to the reference via nucmer, this specifies the -c parameter to nucmer (minimum cluster length). Our default is the nucmer default of 65 bp.

assemble_denovo_metagenomic.scaffold.nucmer_min_match
Int? — Default: None
When scaffolding contigs to the reference via nucmer, this specifies the -l parameter to nucmer (the minimal size of a maximal exact match). Our default is 10 (down from nucmer default of 20) to allow for more divergence.
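
The three nucmer inputs above correspond to nucmer's `-g`, `-c`, and `-l` flags. The sketch below only shows that mapping: it builds an argument list and falls back to this workflow's stated defaults (200, 65, 10) when an input is left unset. The function name, file names, and the `-p` output prefix are illustrative; the scaffold task's actual invocation may differ.

```python
from typing import List, Optional

# Workflow defaults as described above (nucmer's own are -g 90, -c 65, -l 20).
DEFAULT_MAX_GAP = 200
DEFAULT_MIN_CLUSTER = 65
DEFAULT_MIN_MATCH = 10


def nucmer_args(ref_fasta: str, contigs_fasta: str, prefix: str,
                max_gap: Optional[int] = None,
                min_cluster: Optional[int] = None,
                min_match: Optional[int] = None) -> List[str]:
    """Build a nucmer command line, using the workflow defaults for unset inputs."""
    gap = DEFAULT_MAX_GAP if max_gap is None else max_gap
    cluster = DEFAULT_MIN_CLUSTER if min_cluster is None else min_cluster
    match = DEFAULT_MIN_MATCH if min_match is None else min_match
    return ["nucmer", "-g", str(gap), "-c", str(cluster), "-l", str(match),
            "-p", prefix, ref_fasta, contigs_fasta]
```

Lowering `-l` and raising `-g` both make nucmer more permissive, which is the stated intent for divergent viral contigs.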

assemble_denovo_metagenomic.scaffold.replace_length
Int — Default: 55
The first and last replace_length base pairs of each segment in the output genome will be replaced with the equivalent sequences in the reference genome as a mechanism to handle common assembly errors in repetitive or inverted regions that are common to chromosome/segment ends. Valid values are any non-negative integer. Default is 55 bp.
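
The end-replacement rule can be sketched as follows. To keep it short, the sketch assumes the assembled segment and its reference are already in the same coordinate space (for example, equal-length rows of a pairwise alignment). It also caps the replaced length at half the segment so the two ends never overlap; that cap is a choice made for this sketch, not documented workflow behavior. The function name is hypothetical.

```python
def replace_ends(assembled: str, reference: str, replace_length: int = 55) -> str:
    """Replace the first and last replace_length bases with the reference's bases.

    Assumes `assembled` and `reference` are co-aligned and of equal length.
    """
    if replace_length < 0:
        raise ValueError("replace_length must be a non-negative integer")
    if len(assembled) != len(reference):
        raise ValueError("sequences must be co-aligned (equal length)")
    n = min(replace_length, len(assembled) // 2)  # keep the two ends from overlapping
    if n == 0:
        return assembled  # assembled[0:-0] would be empty, so handle 0 separately
    return reference[:n] + assembled[n:-n] + reference[-n:]
```

Setting `replace_length` to 0 leaves the assembly untouched.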

assemble_denovo_metagenomic.scaffold.scaffold_min_contig_len
Int? — Default: None
Any sequences in contigs_fasta that are shorter than this length will be ignored for scaffolding.

assemble_denovo_metagenomic.scaffold.scaffold_min_pct_contig_aligned
Float? — Default: None
Any contig alignments to the reference scaffold that account for less than this fraction of the contig's length will be rejected for scaffolding. Valid values are fractions from 0 to 1; the default value is 0.3.
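
`scaffold_min_contig_len` and `scaffold_min_pct_contig_aligned` act as two successive filters on contigs before scaffolding. The sketch below applies them to a simplified alignment summary (one record per contig with its length and aligned bases). The record type and function name are hypothetical, and real alignment output would need to be summarized into this form first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContigAlignment:
    contig_id: str
    contig_length: int
    aligned_bases: int  # contig bases covered by the alignment to the reference


def usable_contigs(alignments: List[ContigAlignment],
                   min_contig_len: Optional[int] = None,
                   min_pct_aligned: float = 0.3) -> List[str]:
    kept = []
    for aln in alignments:
        if aln.contig_length <= 0:
            continue  # guard against empty records
        if min_contig_len is not None and aln.contig_length < min_contig_len:
            continue  # too short to be considered for scaffolding
        if aln.aligned_bases / aln.contig_length < min_pct_aligned:
            continue  # alignment covers too little of the contig
        kept.append(aln.contig_id)
    return kept
```

A contig aligned over exactly 30% of its length is kept, because only alignments covering *less* than the fraction are rejected.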

### Other inputs

assemble_denovo_metagenomic.assembly_stats_empty.cpus
Int — Default: 4
???

assemble_denovo_metagenomic.assembly_stats_non_empty.cpus
Int — Default: 4
???

assemble_denovo_metagenomic.bbnorm_bam.cpu
Int? — Default: None
???

assemble_denovo_metagenomic.bbnorm_bam.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-core"
???

assemble_denovo_metagenomic.bbnorm_bam.kmer_length
Int? — Default: None
K-mer length for BBNorm analysis. Longer k-mers are more specific but require more memory. (default: BBNorm's default of 31)

assemble_denovo_metagenomic.bbnorm_bam.machine_mem_gb
Int? — Default: None
???

assemble_denovo_metagenomic.bbnorm_bam.passes
Int? — Default: None
Number of normalization passes. More passes give more accurate normalization but take longer. (default: 2 for inputs < 15GB, 1 for inputs >= 15GB to optimize runtime)

assemble_denovo_metagenomic.bbnorm_bam.target
Int — Default: 1000
BBNorm target normalization depth. Reads are downsampled to approximately this coverage depth (default: 1000).
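
The BBNorm settings above can be combined into a command line as sketched below, including the documented rule for `passes` when it is unset (2 passes below 15 GB of input, 1 pass at or above). The function names and file names are hypothetical. It is assumed that "15GB" means decimal gigabytes and that the task runs `bbnorm.sh` on FASTQ derived from the BAM; neither is confirmed here.

```python
from typing import List, Optional

FIFTEEN_GB = 15 * 10**9  # assumption: decimal gigabytes


def bbnorm_passes(input_bytes: int, passes: Optional[int] = None) -> int:
    """Explicit value wins; otherwise 2 passes below 15 GB, 1 pass at or above."""
    if passes is not None:
        return passes
    return 2 if input_bytes < FIFTEEN_GB else 1


def bbnorm_args(in_fastq: str, out_fastq: str, input_bytes: int,
                target: int = 1000, kmer_length: Optional[int] = None,
                passes: Optional[int] = None) -> List[str]:
    args = ["bbnorm.sh", f"in={in_fastq}", f"out={out_fastq}",
            f"target={target}", f"passes={bbnorm_passes(input_bytes, passes)}"]
    if kmer_length is not None:
        args.append(f"k={kmer_length}")  # omitted: BBNorm uses its own default (31)
    return args
```

The pass rule trades accuracy for runtime: a second pass improves normalization but roughly doubles the work on very large inputs.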

assemble_denovo_metagenomic.biosample_accession
String? — Default: None
???

assemble_denovo_metagenomic.deplete.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-classify"
???

assemble_denovo_metagenomic.deplete.machine_mem_gb
Int — Default: 8
???

assemble_denovo_metagenomic.deplete.minimum_hit_groups
Int? — Default: None
???

assemble_denovo_metagenomic.deplete.taxonomic_ids
Array[Int]? — Default: None
???

assemble_denovo_metagenomic.deplete.withoutChildren
Boolean — Default: false
???

assemble_denovo_metagenomic.download_ref_genomes_from_tsv.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-phylo"
???

assemble_denovo_metagenomic.filter_acellular.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-classify"
???

assemble_denovo_metagenomic.filter_acellular.machine_mem_gb
Int — Default: 8
???

assemble_denovo_metagenomic.filter_acellular.minimum_hit_groups
Int? — Default: None
???

assemble_denovo_metagenomic.filter_acellular.taxonomic_ids
Array[Int]? — Default: None
???

assemble_denovo_metagenomic.filter_acellular.withoutChildren
Boolean — Default: false
???

assemble_denovo_metagenomic.kraken2.confidence_threshold
Float? — Default: 0.05
Kraken2 confidence score threshold (0.0-1.0). See https://ccb.jhu.edu/software/kraken2/index.shtml?t=manual#confidence-scoring
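
The Kraken2 manual linked above defines a label's confidence as C/Q: C is the number of k-mers whose LCA falls in the clade rooted at the label, and Q is the number of k-mers without ambiguous bases. If the score is below the threshold, the label moves up the tree until it passes, or the read becomes unclassified. The sketch below reimplements that rule for a single-end per-read k-mer string. The function names are hypothetical, and the parent map would come from the taxonomy.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Tuple


def parse_kmer_mappings(field: str) -> List[Tuple[str, int]]:
    """Parse a Kraken2 k-mer LCA string such as '562:13 561:4 A:31 0:1 562:3'."""
    out = []
    for token in field.split():
        if token == "|:|":
            continue  # paired-end mate separator; mates are pooled here
        taxon, count = token.rsplit(":", 1)
        out.append((taxon, int(count)))
    return out


def in_clade(taxid: int, label: int, parent: Dict[int, int]) -> bool:
    """True if taxid is label or one of its descendants."""
    while taxid != label:
        up = parent.get(taxid)
        if up is None or up == taxid:
            return False
        taxid = up
    return True


def confidence(mappings, label: int, parent: Dict[int, int]) -> Fraction:
    q = sum(n for t, n in mappings if t != "A")  # k-mers without ambiguous bases
    c = sum(n for t, n in mappings
            if t not in ("A", "0") and in_clade(int(t), label, parent))
    return Fraction(c, q) if q else Fraction(0)


def apply_threshold(mappings, label: int, parent: Dict[int, int],
                    threshold: float) -> Optional[int]:
    """Walk the label up the tree until C/Q >= threshold; None means unclassified."""
    while confidence(mappings, label, parent) < threshold:
        up = parent.get(label)
        if up is None or up == label:
            return None
        label = up
    return label
```

Using the manual's example string with 561 as the parent of 562, label 562 scores 16/21 and 561 scores 20/21, so this workflow's default threshold of 0.05 keeps the label at 562.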

assemble_denovo_metagenomic.kraken2.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-classify"
???

assemble_denovo_metagenomic.kraken2.machine_mem_gb
Int — Default: 90
???

assemble_denovo_metagenomic.kraken2.min_base_qual
Int? — Default: None
Minimum base quality used in classification

assemble_denovo_metagenomic.max_reads_for_assembly
Int — Default: 10000000
???

assemble_denovo_metagenomic.merge_raw_reads.disk_size
Int — Default: 750
???

assemble_denovo_metagenomic.merge_raw_reads.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-core"
???

assemble_denovo_metagenomic.merge_raw_reads.machine_mem_gb
Int — Default: 8
???

assemble_denovo_metagenomic.merge_raw_reads.reheader_table
File? — Default: None
???

assemble_denovo_metagenomic.merge_raw_reads.sample_name
String? — Default: None
???

assemble_denovo_metagenomic.min_reads_for_rmdup
Int — Default: 5000000
???

assemble_denovo_metagenomic.refine.align_to_ref.cpu
Int? — Default: None
???

assemble_denovo_metagenomic.refine.align_to_ref.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-core"
???

assemble_denovo_metagenomic.refine.align_to_ref.machine_mem_gb
Int? — Default: None
???

assemble_denovo_metagenomic.refine.align_to_ref.read_count_downsample_threshold
Int — Default: 150000000
???

assemble_denovo_metagenomic.refine.align_to_ref.sample_name
String — Default: basename(basename(basename(reads_unmapped_bam,".bam"),".taxfilt"),".clean")
???

assemble_denovo_metagenomic.refine.align_to_ref_options
Map[String,String] — Default: {"novoalign": "-r Random -l 40 -g 40 -x 20 -t 501 -k", "bwa": "-k 12 -B 1", "minimap2": ""}
???

assemble_denovo_metagenomic.refine.align_to_self.cpu
Int? — Default: None
???

assemble_denovo_metagenomic.refine.align_to_self.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-core"
???

assemble_denovo_metagenomic.refine.align_to_self.machine_mem_gb
Int? — Default: None
???

assemble_denovo_metagenomic.refine.align_to_self.read_count_downsample_threshold
Int — Default: 150000000
???

assemble_denovo_metagenomic.refine.align_to_self.sample_name
String — Default: basename(basename(basename(reads_unmapped_bam,".bam"),".taxfilt"),".clean")
???

assemble_denovo_metagenomic.refine.align_to_self_options
Map[String,String] — Default: {"novoalign": "-r Random -l 40 -g 40 -x 20 -t 100", "bwa": "", "minimap2": ""}
???

assemble_denovo_metagenomic.refine.aligner
String — Default: "minimap2"
Read aligner software to use. Options: novoalign, bwa, minimap2. Minimap2 can automatically handle Illumina, PacBio, or Oxford Nanopore reads as long as the 'PL' field in the BAM read group header is set properly (novoalign and bwa are Illumina-only).

assemble_denovo_metagenomic.refine.alignment_metrics.amplicon_set
String? — Default: None
???

assemble_denovo_metagenomic.refine.alignment_metrics.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-core"
???

assemble_denovo_metagenomic.refine.alignment_metrics.machine_mem_gb
Int — Default: 16
???

assemble_denovo_metagenomic.refine.alignment_metrics.max_amp_len
Int — Default: 5000
???

assemble_denovo_metagenomic.refine.alignment_metrics.max_amplicons
Int — Default: 500
???

assemble_denovo_metagenomic.refine.call_consensus.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-assemble"
???

assemble_denovo_metagenomic.refine.call_consensus.machine_mem_gb
Int — Default: 8
???

assemble_denovo_metagenomic.refine.isnvs_ref.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-phylo"
???

assemble_denovo_metagenomic.refine.isnvs_ref.out_basename
String — Default: basename(aligned_bam,'.bam')
???

assemble_denovo_metagenomic.refine.isnvs_self.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-phylo"
???

assemble_denovo_metagenomic.refine.isnvs_self.out_basename
String — Default: basename(aligned_bam,'.bam')
???

assemble_denovo_metagenomic.refine.ivar_trim.bam_basename
String — Default: basename(aligned_bam,".bam")
???

assemble_denovo_metagenomic.refine.ivar_trim.disk_size
Int — Default: 375
???

assemble_denovo_metagenomic.refine.ivar_trim.docker
String — Default: "andersenlabapps/ivar:1.3.1"
???

assemble_denovo_metagenomic.refine.ivar_trim.machine_mem_gb
Int? — Default: None
???

assemble_denovo_metagenomic.refine.ivar_trim.primer_offset
Int? — Default: None
???

assemble_denovo_metagenomic.refine.major_cutoff
Float — Default: 0.75
If the major allele is present at a frequency higher than this cutoff, we will call an unambiguous base at that position. If it is equal to or below this cutoff, we will call an ambiguous base representing all possible alleles at that position.

assemble_denovo_metagenomic.refine.merge_align_to_ref.disk_size
Int — Default: 750
???

assemble_denovo_metagenomic.refine.merge_align_to_ref.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-core"
???

assemble_denovo_metagenomic.refine.merge_align_to_ref.machine_mem_gb
Int — Default: 8
???

assemble_denovo_metagenomic.refine.merge_align_to_ref.reheader_table
File? — Default: None
???

assemble_denovo_metagenomic.refine.merge_align_to_self.disk_size
Int — Default: 750
???

assemble_denovo_metagenomic.refine.merge_align_to_self.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-core"
???

assemble_denovo_metagenomic.refine.merge_align_to_self.machine_mem_gb
Int — Default: 8
???

assemble_denovo_metagenomic.refine.merge_align_to_self.reheader_table
File? — Default: None
???

assemble_denovo_metagenomic.refine.min_coverage
Int — Default: 3
Minimum read coverage required to call a position unambiguous.
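
`major_cutoff` and `min_coverage` together decide each consensus base. The sketch below shows one per-position rule consistent with both descriptions. If depth is below `min_coverage`, the base is N (an assumption: the text only says such positions are not called unambiguously). If the major allele's frequency is strictly above `major_cutoff`, that allele is called. Otherwise the IUPAC code for every observed allele is emitted. Indels are ignored, and the function name is hypothetical.

```python
from typing import Dict

IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def call_base(counts: Dict[str, int], major_cutoff: float = 0.75,
              min_coverage: int = 3) -> str:
    """Call one consensus base from per-allele read counts (keys A/C/G/T)."""
    depth = sum(counts.values())
    if depth == 0 or depth < min_coverage:
        return "N"  # assumed: low-coverage positions are masked
    major, major_n = max(counts.items(), key=lambda kv: kv[1])
    if major_n / depth > major_cutoff:
        return major  # unambiguous call
    observed = frozenset(b for b, n in counts.items() if n > 0)
    return IUPAC[observed]  # ambiguity code covering all observed alleles
```

With the defaults, 3 A and 1 G reads give exactly 0.75, which is not above the cutoff, so the call is R rather than A.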

assemble_denovo_metagenomic.refine.novocraft_license
File? — Default: None
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.base_q_threshold
Int? — Default: None
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.bin_large_plots
Boolean — Default: false
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.binning_summary_statistic
String? — Default: "max"
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-core"
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.mapping_q_threshold
Int? — Default: None
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.max_coverage_depth
Int? — Default: None
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.plot_height_pixels
Int? — Default: 850
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.plot_only_non_duplicates
Boolean — Default: false
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.plot_pixels_per_inch
Int? — Default: 100
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.plot_width_pixels
Int? — Default: 1100
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.plotXLimits
String? — Default: None
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.plotYLimits
String? — Default: None
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.read_length_threshold
Int? — Default: None
???

assemble_denovo_metagenomic.refine.plot_ref_coverage.skip_mark_dupes
Boolean — Default: false
???

assemble_denovo_metagenomic.refine.plot_self_coverage.base_q_threshold
Int? — Default: None
???

assemble_denovo_metagenomic.refine.plot_self_coverage.bin_large_plots
Boolean — Default: false
???

assemble_denovo_metagenomic.refine.plot_self_coverage.binning_summary_statistic
String? — Default: "max"
???

assemble_denovo_metagenomic.refine.plot_self_coverage.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-core"
???

assemble_denovo_metagenomic.refine.plot_self_coverage.mapping_q_threshold
Int? — Default: None
???

assemble_denovo_metagenomic.refine.plot_self_coverage.max_coverage_depth
Int? — Default: None
???

assemble_denovo_metagenomic.refine.plot_self_coverage.plot_height_pixels
Int? — Default: 850
???

assemble_denovo_metagenomic.refine.plot_self_coverage.plot_only_non_duplicates
Boolean — Default: false
???

assemble_denovo_metagenomic.refine.plot_self_coverage.plot_pixels_per_inch
Int? — Default: 100
???

assemble_denovo_metagenomic.refine.plot_self_coverage.plot_width_pixels
Int? — Default: 1100
???

assemble_denovo_metagenomic.refine.plot_self_coverage.plotXLimits
String? — Default: None
???

assemble_denovo_metagenomic.refine.plot_self_coverage.plotYLimits
String? — Default: None
???

assemble_denovo_metagenomic.refine.plot_self_coverage.read_length_threshold
Int? — Default: None
???

assemble_denovo_metagenomic.refine.plot_self_coverage.skip_mark_dupes
Boolean — Default: false
???

assemble_denovo_metagenomic.refine.run_discordance.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-core"
???

assemble_denovo_metagenomic.refine.sample_original_name
String? — Default: None
???

assemble_denovo_metagenomic.refine.skip_mark_dupes
Boolean — Default: false
Skip the Picard MarkDuplicates step after alignment. Setting this to true is recommended for PCR amplicon-based data. (Default: false)

assemble_denovo_metagenomic.report_primary_kraken_taxa.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-classify"
???

assemble_denovo_metagenomic.report_primary_kraken_taxa.focal_taxon
String — Default: "Viruses"
???

assemble_denovo_metagenomic.sample_name
String? — Default: None
???

assemble_denovo_metagenomic.scaffold.aligner
String — Default: "muscle"
Alignment tool used to align the reference sequence to the aligned contigs. Options: muscle, mafft, mummer (= nucmer). Default: muscle.

assemble_denovo_metagenomic.scaffold.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-assemble"
???

assemble_denovo_metagenomic.scaffold.machine_mem_gb
Int? — Default: None
???

assemble_denovo_metagenomic.scaffold.sample_name
String — Default: basename(basename(contigs_fasta,".fasta"),".assembly1-spades")
???

assemble_denovo_metagenomic.scaffold.skani_c
Int? — Default: None
???

assemble_denovo_metagenomic.scaffold.skani_m
Int? — Default: None
???

assemble_denovo_metagenomic.scaffold.skani_s
Int? — Default: None
???

assemble_denovo_metagenomic.select_references.cpu
Int — Default: 2
???

assemble_denovo_metagenomic.select_references.disk_size
Int — Default: 100
???

assemble_denovo_metagenomic.select_references.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-assemble"
???

assemble_denovo_metagenomic.select_references.machine_mem_gb
Int — Default: 4
???

assemble_denovo_metagenomic.select_references.skani_c
Int? — Default: None
???

assemble_denovo_metagenomic.select_references.skani_m
Int? — Default: None
???

assemble_denovo_metagenomic.select_references.skani_n
Int? — Default: None
???

assemble_denovo_metagenomic.select_references.skani_s
Int? — Default: None
???
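
The reference-selection step described in the workflow summary can be sketched as follows. References with no ANI hit to the contigs are discarded, and the rest are single-linkage clustered (any ANI hit between two references joins their clusters, via union-find). Each cluster then contributes one reference. Several details here are assumptions, not the `select_references` task's documented logic: ranking a cluster's members by their ANI to the contigs, breaking ties by name, and the input shapes (which a real run would build from skani output).

```python
from typing import Dict, List, Tuple


def select_references(contig_ani: Dict[str, float],
                      ref_ref_ani: Dict[Tuple[str, str], float]) -> List[str]:
    """Return one top reference per ANI cluster (sorted by name).

    contig_ani:  best ANI of the contigs to each reference; refs without a hit are absent.
    ref_ref_ani: pairwise ANI between references; any positive value links a pair.
    """
    hits = [r for r, ani in contig_ani.items() if ani > 0]
    parent = {r: r for r in hits}

    def find(r: str) -> str:
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for (a, b), ani in ref_ref_ani.items():
        if a in parent and b in parent and ani > 0:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for r in hits:
        clusters.setdefault(find(r), []).append(r)
    # Assumed ranking: highest ANI to the contigs, ties broken by name.
    tops = [max(members, key=lambda r: (contig_ani[r], r)) for members in clusters.values()]
    return sorted(tops)
```

Each returned reference would then seed one scaffold-and-refine assembly, so the number of clusters bounds the number of assemblies.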

assemble_denovo_metagenomic.spades.cpu
Int? — Default: None
???

assemble_denovo_metagenomic.spades.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-assemble"
???

assemble_denovo_metagenomic.spades.machine_mem_gb
Int? — Default: None
???

assemble_denovo_metagenomic.spades.sample_name
String — Default: basename(basename(reads_unmapped_bam,".bam"),".taxfilt")
???

assemble_denovo_metagenomic.spades.spades_min_contig_len
Int? — Default: None
Minimum length of output contig.

assemble_denovo_metagenomic.spades.spades_options
String? — Default: None
Additional options to pass to the SPAdes assembler.

assemble_denovo_metagenomic.spikein.cpu
Int? — Default: None
???

assemble_denovo_metagenomic.spikein.docker
String — Default: "ghcr.io/broadinstitute/viral-ngs:3.0.4-core"
???

assemble_denovo_metagenomic.spikein.machine_mem_gb
Int? — Default: None
???

assemble_denovo_metagenomic.spikein.topNHits
Int — Default: 3
???

assemble_denovo_metagenomic.spikein_db
File? — Default: None
ERCC spike-in sequences

assemble_denovo_metagenomic.table_name
String — Default: "sample"
???

assemble_denovo_metagenomic.tar_extract.disk_size
Int — Default: 375
???

assemble_denovo_metagenomic.tar_extract.tar_opts
String — Default: "-z"
???

assemble_denovo_metagenomic.tax_lookup.set_default_keys
Array[String] — Default: []
???

assemble_denovo_metagenomic.taxa_to_avoid_assembly
Array[String] — Default: ["Vertebrata", "other sequences", "Bacteria"]
???

assemble_denovo_metagenomic.taxa_to_dehost
Array[String] — Default: ["Vertebrata"]
???

assemble_denovo_metagenomic.unique_batch_ids.separator
String — Default: ","
???

## Outputs

assemble_denovo_metagenomic.assembly_all_fastas
Array[File]
???

assemble_denovo_metagenomic.assembly_all_lengths_unambig
Array[Int]
???

assemble_denovo_metagenomic.assembly_all_pct_ref_cov
Array[Float]
???

assemble_denovo_metagenomic.assembly_all_taxids
Array[String]
???

assemble_denovo_metagenomic.assembly_all_taxnames
Array[String]
???

assemble_denovo_metagenomic.assembly_method
String
???

assemble_denovo_metagenomic.assembly_stats_by_taxon
Array[Map[String,String]]
???

assemble_denovo_metagenomic.assembly_stats_by_taxon_tsv
File
???

assemble_denovo_metagenomic.batch_ids
String
???

assemble_denovo_metagenomic.contigs_fasta
File
???

assemble_denovo_metagenomic.kraken2_focal_taxon_name
String
???

assemble_denovo_metagenomic.kraken2_focal_total_reads
Int
???

assemble_denovo_metagenomic.kraken2_krona_plot
File
Krona plot of the Kraken2 results; Krona displays hierarchical taxonomic data as a multi-layered pie chart.

assemble_denovo_metagenomic.kraken2_reads_report
File
???

assemble_denovo_metagenomic.kraken2_summary_report
File
Kraken report output file.
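
If this file is in Kraken2's standard six-column report format (assumed here), it is tab-separated. The columns are the clade percentage, clade read count, directly assigned read count, rank code, tax ID, and a name indented two spaces per tree level. The parser below is a minimal sketch with hypothetical function names; it skips rows with extra columns (such as those produced with minimizer data) rather than handling them.

```python
from typing import Iterable, List, NamedTuple


class ReportRow(NamedTuple):
    pct: float         # % of reads in the clade rooted at this taxon
    clade_reads: int   # reads in the clade
    direct_reads: int  # reads assigned directly to this taxon
    rank: str          # rank code, e.g. "U", "R", "D", "G", "S", "S1"
    taxid: int
    name: str
    depth: int         # indentation level (2 spaces per level)


def parse_kraken2_report(lines: Iterable[str]) -> List[ReportRow]:
    rows = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 6:
            continue  # not a standard six-column row
        raw_name = cols[5]
        name = raw_name.lstrip(" ")
        rows.append(ReportRow(float(cols[0]), int(cols[1]), int(cols[2]), cols[3],
                              int(cols[4]), name, (len(raw_name) - len(name)) // 2))
    return rows


def clade_read_count(rows: List[ReportRow], name: str) -> int:
    """Reads in the clade with this scientific name, or 0 if absent."""
    for row in rows:
        if row.name == name:
            return row.clade_reads
    return 0
```

For instance, `clade_read_count(rows, "Viruses")` returns the read count for the clade named by `report_primary_kraken_taxa.focal_taxon` at its default value.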

assemble_denovo_metagenomic.kraken2_top_taxa_report
File
???

assemble_denovo_metagenomic.kraken2_top_taxon_id
String
???

assemble_denovo_metagenomic.kraken2_top_taxon_name
String
???

assemble_denovo_metagenomic.kraken2_top_taxon_num_reads
Int
???

assemble_denovo_metagenomic.kraken2_top_taxon_pct_of_focal
Float
???

assemble_denovo_metagenomic.kraken2_top_taxon_rank
String
???

assemble_denovo_metagenomic.read_counts_acellular
Int
???

assemble_denovo_metagenomic.read_counts_assembly_input
Int
???

assemble_denovo_metagenomic.read_counts_dehosted
Int
???

assemble_denovo_metagenomic.read_counts_prespades_subsample
Int
???

assemble_denovo_metagenomic.read_counts_raw
Int
???

assemble_denovo_metagenomic.reads_acellular_ubam
File
???

assemble_denovo_metagenomic.reads_assembly_input_ubam
File
???

assemble_denovo_metagenomic.reads_dehosted_ubam
File
???

assemble_denovo_metagenomic.skani_contigs_to_refs_dist_tsv
File
???

assemble_denovo_metagenomic.skani_num_ref_clusters
Int
???

assemble_denovo_metagenomic.spikein_pct_lesser_hits
String?
???

assemble_denovo_metagenomic.spikein_pct_of_total_reads
String?
???

assemble_denovo_metagenomic.spikein_report
File?
???

assemble_denovo_metagenomic.spikein_tophit
String?
???

assemble_denovo_metagenomic.viral_assemble_version
String
???

assemble_denovo_metagenomic.viral_classify_version
String
???


> Generated using WDL AID (1.0.0)