4.24. genbank

Prepare assemblies for Genbank submission. Each assembly is annotated by simple coordinate transfer: features from existing Genbank annotations are mapped onto it through a multiple alignment. See https://viral-pipelines.readthedocs.io/en/latest/ncbi_submission.html for details.

4.24.1. Inputs

4.24.1.1. Required inputs

genbank.assemblies_fasta
Array[File]+ — Default: None
Genomes to prepare for Genbank submission. Provide one file per genome, with all segments/chromosomes of that genome in the same file. Every fasta file must contain exactly the same number of sequences as reference_fasta, and that number must in turn equal the number of files in reference_annot_tbl.
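
The count constraint above can be checked before launching the workflow. The following is a minimal sketch and not part of the pipeline: count_fasta_records and check_counts are hypothetical helper names, and a sequence is counted simply as any line starting with '>'.

```python
def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines ('>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_counts(assemblies_fasta, reference_fasta, reference_annot_tbl):
    """Return a list of problems; an empty list means the counts agree.

    Hypothetical pre-flight check: the argument names mirror the workflow
    inputs, but the function itself is not provided by the pipeline.
    """
    problems = []
    n_ref = count_fasta_records(reference_fasta)
    if n_ref != len(reference_annot_tbl):
        problems.append(
            f"reference_fasta has {n_ref} sequences but "
            f"{len(reference_annot_tbl)} feature tables were given"
        )
    for fasta in assemblies_fasta:
        n_seqs = count_fasta_records(fasta)
        if n_seqs != n_ref:
            problems.append(
                f"{fasta} has {n_seqs} sequences, expected {n_ref}"
            )
    return problems
```

Returning a list of messages rather than raising on the first mismatch lets all inconsistent files be reported in one pass.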

genbank.authors_sbt
File — Default: None
A genbank submission template file (SBT) with the author list, created at https://submit.ncbi.nlm.nih.gov/genbank/template/submission/

genbank.reference_annot_tbl
Array[File]+ — Default: None
NCBI Genbank feature tables, one file for each segment/chromosome described in reference_fasta.

genbank.reference_fasta
File — Default: None
Reference genome, all segments/chromosomes in one fasta file. Headers must be Genbank accessions.
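
WDL runners such as Cromwell and miniwdl conventionally read workflow inputs from a JSON object keyed by the fully qualified input names listed above. The sketch below builds such an object for the four required inputs. The helper name build_required_inputs and every file name are placeholders chosen for illustration, not values defined by the pipeline.

```python
import json


def build_required_inputs(assemblies, authors_sbt, annot_tbls, reference_fasta):
    """Collect the four required inputs under their documented names."""
    return {
        "genbank.assemblies_fasta": list(assemblies),
        "genbank.authors_sbt": authors_sbt,
        "genbank.reference_annot_tbl": list(annot_tbls),
        "genbank.reference_fasta": reference_fasta,
    }


# Placeholder file names, for illustration only: a two-segment reference,
# so two feature tables, and two assemblies to submit.
inputs = build_required_inputs(
    assemblies=["sample1.fasta", "sample2.fasta"],
    authors_sbt="authors.sbt",
    annot_tbls=["segment1.tbl", "segment2.tbl"],
    reference_fasta="reference.fasta",
)
inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```

Optional inputs, when needed, can be added to the same object under their own fully qualified names.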

4.24.1.2. Other common inputs

genbank.biosampleMap
File? — Default: None
A two-column, tab-delimited text file mapping sample IDs (first column) to NCBI BioSample accession numbers (second column). Accessions typically take the form 'SAMN****' and are obtained by first registering your samples at https://submit.ncbi.nlm.nih.gov/
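
To illustrate the two-column layout described above, here is a minimal sketch of a reader for such a file. It is an assumption-laden example, not the pipeline's own parser: read_biosample_map is a hypothetical name, every non-blank line is treated as data (no header row is assumed), and the accessions shown are made-up placeholders.

```python
import csv
import io


def read_biosample_map(handle):
    """Parse tab-delimited lines of 'sample ID<TAB>BioSample accession'."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(
                f"expected 2 tab-separated columns, got {len(row)}: {row!r}"
            )
        sample_id, accession = (cell.strip() for cell in row)
        mapping[sample_id] = accession
    return mapping


# Placeholder content standing in for a real map file.
example = io.StringIO("sample1\tSAMN00000001\nsample2\tSAMN00000002\n")
biosample_map = read_biosample_map(example)
```

Rejecting rows that do not have exactly two columns catches the common mistake of separating the columns with spaces instead of a tab.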

genbank.coverage_table
File? — Default: None
A two-column, tab-delimited text file mapping sample IDs (first column) to average sequencing coverage (second column, a floating-point number).

genbank.genbankSourceTable
File? — Default: None
A tab-delimited text file containing the metadata Genbank requires (a 'source modifier table'). See https://www.ncbi.nlm.nih.gov/WebSub/html/help/genbank-source-table.html

genbank.molType
String? — Default: None
The type of molecule being described. This defaults to 'viral cRNA' as this pipeline is most commonly used for viral submissions, but any value allowed by the INSDC controlled vocabulary may be used here. Valid values are described at http://www.insdc.org/controlled-vocabulary-moltype-qualifier

genbank.organism
String? — Default: None
The scientific name for the organism being submitted. This is typically the species name and should match the name given by the NCBI Taxonomy database. For more info, see: https://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp.html#Organism

genbank.sequencingTech
String? — Default: None
The type of sequencer used to generate reads. NCBI has a controlled vocabulary for this value which can be found here: https://submit.ncbi.nlm.nih.gov/structcomment/nongenomes/

4.24.1.3. Other inputs

genbank.annot.docker
String — Default: "quay.io/broadinstitute/viral-phylo"
Docker image in which the annot task runs.

genbank.comment
String? — Default: None
Optional comments to display in the COMMENT section of the Genbank record. These may include disclaimers about assembly quality, notes about pre-publication availability, or requests to discuss pre-publication use with the authors.

genbank.mafft.docker
String — Default: "quay.io/broadinstitute/viral-phylo"
Docker image in which the mafft task runs.

genbank.mafft.machine_mem_gb
Int? — Default: None
Memory, in GB, to request for the machine that runs the mafft task.

genbank.mafft.mafft_ep
Float? — Default: None
MAFFT offset value (MAFFT's --ep option), which acts like a gap extension penalty.

genbank.mafft.mafft_gapOpeningPenalty
Float? — Default: None
Gap opening penalty used by MAFFT.

genbank.mafft.mafft_maxIters
Int? — Default: None
Maximum number of iterative refinement cycles MAFFT performs.

genbank.prep_genbank.assembly_method
String? — Default: None
Very short description of the software approach used to assemble the genome. We typically provide a GitHub link here. If this is specified, assembly_method_version should also be specified.

genbank.prep_genbank.assembly_method_version
String? — Default: None
The version of the software used. If this is specified, assembly_method should also be specified.
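
Because assembly_method and assembly_method_version are meant to be given together, an inputs object can be checked for that pairing before submission. The sketch below is illustrative only: check_assembly_method_pair is a hypothetical helper, and the method and version strings are placeholders.

```python
def check_assembly_method_pair(inputs):
    """Return True when both inputs are set or both are unset."""
    method = inputs.get("genbank.prep_genbank.assembly_method")
    version = inputs.get("genbank.prep_genbank.assembly_method_version")
    # Exactly one of the two being set is the case the descriptions warn about.
    return (method is None) == (version is None)


# Placeholder values, for illustration only.
paired = {
    "genbank.prep_genbank.assembly_method": "example-assembler",
    "genbank.prep_genbank.assembly_method_version": "1.0.0",
}
unpaired = {"genbank.prep_genbank.assembly_method": "example-assembler"}

print(check_assembly_method_pair(paired))
print(check_assembly_method_pair(unpaired))
```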

genbank.prep_genbank.docker
String — Default: "quay.io/broadinstitute/viral-phylo"
Docker image in which the prep_genbank task runs.

genbank.prep_genbank.machine_mem_gb
Int? — Default: None
Memory, in GB, to request for the machine that runs the prep_genbank task.

Generated using WDL AID (0.1.1)