4.31. genbank

Prepare assemblies for Genbank submission. This includes annotation by simple coordinate transfer from Genbank annotations and a multiple alignment. See https://viral-pipelines.readthedocs.io/en/latest/ncbi_submission.html for details.

4.31.1. Inputs

4.31.1.1. Required inputs

genbank.assemblies_fasta
Array[File]+ — Default: None
Genomes to prepare for Genbank submission. One file per genome, with all segments/chromosomes included in that file. Every fasta file must contain exactly as many sequences as there are files in reference_fastas (which must in turn equal the number of files in reference_feature_tables).

genbank.authors_sbt
File — Default: None
A Genbank submission template (SBT) file containing the author list, created at https://submit.ncbi.nlm.nih.gov/genbank/template/submission/

genbank.biosample_attributes
File — Default: None
A post-submission attributes file from NCBI BioSample, available at https://submit.ncbi.nlm.nih.gov/subs/ by clicking on 'Download attributes file with BioSample accessions'.

genbank.reference_fastas
Array[File]+ — Default: None
Reference genome, with each segment/chromosome in a separate fasta file, in exactly the same count and order as the segments/chromosomes in each assemblies_fasta file. Headers must be Genbank accessions.

genbank.reference_feature_tables
Array[File]+ — Default: None
NCBI Genbank feature tables, with each segment/chromosome in a separate TBL file, in exactly the same count and order as the segments/chromosomes in assemblies_fasta and reference_fastas. Accession numbers in the TBL files must correspond exactly to those in reference_fastas.

genbank.taxid
Int — Default: None
The NCBI taxonomy ID for the species being submitted in this batch (all sequences in this batch must belong to the same taxid). https://www.ncbi.nlm.nih.gov/taxonomy/
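
The count constraints that tie assemblies_fasta, reference_fastas, and reference_feature_tables together can be checked before launching the workflow. The following is a minimal Python sketch and not part of the pipeline: the helper names count_fasta_records and check_counts are hypothetical, and it assumes uncompressed plain-text fasta files in which every record starts with a '>' header line.

```python
def count_fasta_records(path):
    """Count records in an uncompressed fasta file by counting '>' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_counts(assemblies_fasta, reference_fastas, reference_feature_tables):
    """Return a list of problems; an empty list means all counts agree."""
    problems = []
    # One reference fasta per segment/chromosome.
    n_segments = len(reference_fastas)
    if len(reference_feature_tables) != n_segments:
        problems.append(
            f"{n_segments} reference fastas but "
            f"{len(reference_feature_tables)} feature tables"
        )
    # Each assembly file must hold one sequence per reference segment.
    for fasta in assemblies_fasta:
        n_records = count_fasta_records(fasta)
        if n_records != n_segments:
            problems.append(
                f"{fasta}: {n_records} sequences, expected {n_segments}"
            )
    return problems
```

This sketch only compares counts. It does not verify that segments appear in the same order or that accessions in the TBL files match the fasta headers.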

4.31.1.2. Other common inputs

genbank.coverage_table
File? — Default: None
A two-column, tab-delimited text file mapping sample IDs (first column) to average sequencing coverage (second column, a floating point number).

genbank.molType
String? — Default: 'cRNA'
The type of molecule being described. This defaults to 'cRNA' as this pipeline is most commonly used for viral submissions, but any value allowed by the INSDC controlled vocabulary may be used here. Valid values are described at http://www.insdc.org/controlled-vocabulary-moltype-qualifier

genbank.organism
String? — Default: None
The scientific name for the organism being submitted. This is typically the species name and should match the name given by the NCBI Taxonomy database. For more info, see: https://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp.html#Organism

genbank.sequencingTech
String? — Default: None
The type of sequencer used to generate reads. NCBI has a controlled vocabulary for this value which can be found here: https://submit.ncbi.nlm.nih.gov/structcomment/nongenomes/
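
The coverage_table input above is described as a two-column, tab-delimited file. A minimal Python sketch of writing and re-reading such a file follows. The function names and any sample IDs or coverage values passed to them are made up for illustration, and the sketch assumes the file has no header row, which the description above does not state.

```python
import csv


def write_coverage_table(path, coverage_by_sample):
    """Write sample ID and average coverage as two tab-separated columns.

    Assumes no header row is expected.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for sample_id, coverage in coverage_by_sample.items():
            writer.writerow([sample_id, f"{float(coverage):.2f}"])


def read_coverage_table(path):
    """Parse the table back into a dict, failing loudly on malformed rows."""
    coverage = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            if len(row) != 2:
                raise ValueError(f"expected 2 columns, got {len(row)}: {row}")
            coverage[row[0]] = float(row[1])
    return coverage
```

Reading the file back before submission is a cheap way to catch rows with a missing column or a non-numeric coverage value.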

4.31.1.3. Other inputs


genbank.annot.docker
String — Default: "quay.io/broadinstitute/viral-phylo:2.1.10.0"
Docker image used to run the annot step of the workflow.

genbank.biosample_to_genbank.docker
String — Default: "quay.io/broadinstitute/viral-phylo:2.1.10.0"
Docker image used to run the biosample_to_genbank step of the workflow.

genbank.comment
String? — Default: None
Optional comments to display in the COMMENT section of the Genbank record, such as disclaimers about assembly quality, notes about pre-publication availability, or requests to discuss pre-publication use with the authors.

genbank.prep_genbank.assembly_method
String? — Default: None
Very short description of the software approach used to assemble the genome. We typically provide a github link here. If this is specified, assembly_method_version should also be specified.

genbank.prep_genbank.assembly_method_version
String? — Default: None
The version of the software used. If this is specified, assembly_method should also be specified.

genbank.prep_genbank.docker
String — Default: "quay.io/broadinstitute/viral-phylo:2.1.10.0"
Docker image used to run the prep_genbank step of the workflow.

genbank.prep_genbank.machine_mem_gb
Int? — Default: None
Memory, in GB, to request for the prep_genbank step of the workflow.
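
WDL runners typically take workflow inputs as a JSON file keyed by the fully qualified input names listed on this page. The sketch below uses a hypothetical helper, build_inputs, which is not part of the pipeline. It assembles such a dictionary and treats the pairing of assembly_method and assembly_method_version described above as a hard requirement, which is stricter than the "should" in those descriptions. All values are placeholders.

```python
import json


def build_inputs(required, assembly_method=None, assembly_method_version=None):
    """Merge required inputs with the optional assembly-method pair.

    `required` maps fully qualified input names (e.g. "genbank.taxid") to values.
    """
    # Reject the case where only one of the two paired inputs is given.
    if (assembly_method is None) != (assembly_method_version is None):
        raise ValueError(
            "assembly_method and assembly_method_version must be given together"
        )
    inputs = dict(required)
    if assembly_method is not None:
        inputs["genbank.prep_genbank.assembly_method"] = assembly_method
        inputs["genbank.prep_genbank.assembly_method_version"] = (
            assembly_method_version
        )
    return inputs


# Placeholder values for illustration only.
example = build_inputs(
    {"genbank.taxid": 12345, "genbank.authors_sbt": "authors.sbt"},
    assembly_method="example-assembler",
    assembly_method_version="0.0.0",
)
print(json.dumps(example, indent=2, sort_keys=True))
```

Other optional inputs, such as genbank.prep_genbank.machine_mem_gb, can be added to the same dictionary under their fully qualified names.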

Generated using WDL AID (0.1.1)