4.71. subsample_by_metadata_with_focal

Filter and subsample a global sequence set with a bias towards a geographic area of interest.

4.71.1. Inputs

4.71.1.1. Required inputs

subsample_by_metadata_with_focal.sample_metadata_tsvs
Array[File]+ — Default: None
Tab-separated metadata files that contain the binning variables and values. Together they must contain all samples: output will be filtered to the IDs present in these files.

subsample_by_metadata_with_focal.sequences_fasta
File — Default: None
Sequences in fasta format.
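As a minimal sketch of how the two required inputs might be supplied, the snippet below builds an inputs JSON document keyed by the fully qualified input names listed above. The file names are hypothetical placeholders, and the helper name `render_inputs` is an illustration only; it is not part of the workflow.

```python
import json

# Hypothetical file paths: replace with your own metadata and sequence files.
inputs = {
    # Array[File]+ : a non-empty list of tab-separated metadata files
    "subsample_by_metadata_with_focal.sample_metadata_tsvs": [
        "metadata_a.tsv",
        "metadata_b.tsv",
    ],
    # File : a single FASTA file of sequences
    "subsample_by_metadata_with_focal.sequences_fasta": "sequences.fasta",
}


def render_inputs(values):
    """Serialize the inputs after checking that the metadata list is non-empty."""
    key = "subsample_by_metadata_with_focal.sample_metadata_tsvs"
    if not values.get(key):
        raise ValueError(key + " must contain at least one file")
    return json.dumps(values, indent=2)


inputs_json = render_inputs(inputs)
print(inputs_json)
```

Optional inputs from the next section can be added to the same dictionary under their fully qualified names.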

4.71.1.2. Other inputs

subsample_by_metadata_with_focal.cat_fasta.cpus
Int — Default: 4
???

subsample_by_metadata_with_focal.derived_cols.disk_size
Int — Default: 50
???

subsample_by_metadata_with_focal.derived_cols.docker
String — Default: "quay.io/broadinstitute/viral-core:2.1.33"
???

subsample_by_metadata_with_focal.derived_cols.lab_highlight_loc
String? — Default: None
This option copies the 'originating_lab' and 'submitting_lab' columns to new ones including a prefix, but only if they match certain criteria. The value of this string must be of the form prefix;col_header=value:col_header=value. For example, 'MA;country=USA:division=Massachusetts' will copy the originating_lab and submitting_lab columns to MA_originating_lab and MA_submitting_lab, but only for those rows where country=USA and division=Massachusetts.
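The prefix;col_header=value:col_header=value format described above can be illustrated with a short sketch. This is not the task's actual implementation: the function names are hypothetical, and leaving the new columns empty for non-matching rows is an assumption, since the description does not say what those rows receive.

```python
def parse_lab_highlight_loc(spec):
    """Split 'prefix;col=value:col=value' into (prefix, {col: value})."""
    prefix, _, criteria_text = spec.partition(";")
    criteria = {}
    for item in criteria_text.split(":"):
        column, _, value = item.partition("=")
        criteria[column] = value
    return prefix, criteria


def apply_lab_highlight(row, prefix, criteria):
    """Return a copy of row with prefixed lab columns added.

    The lab columns are copied only when every criterion matches.
    Assumption: non-matching rows get empty strings in the new columns.
    """
    matches = all(row.get(col) == val for col, val in criteria.items())
    out = dict(row)
    for lab_col in ("originating_lab", "submitting_lab"):
        out[prefix + "_" + lab_col] = row.get(lab_col, "") if matches else ""
    return out


prefix, criteria = parse_lab_highlight_loc("MA;country=USA:division=Massachusetts")
row = {
    "country": "USA",
    "division": "Massachusetts",
    "originating_lab": "Lab A",
    "submitting_lab": "Lab B",
}
print(apply_lab_highlight(row, prefix, criteria))
```

With the example string from the description, a row where country=USA and division=Massachusetts gains MA_originating_lab and MA_submitting_lab columns holding the copied values.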

subsample_by_metadata_with_focal.derived_cols.table_map
Array[File] — Default: []
Mapping tables. Each mapping table is a TSV file with a header row. The first column is the output column name for this mapping (it will be created or overwritten). The subsequent columns are matching criteria. When a row's criteria match, the value in its first column is written to the output column. The exception is when all match columns are '*': in that case, the value in the first column is the name of the column to copy over.
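One possible reading of the mapping-table rules is sketched below. It is a sketch under stated assumptions, not the task's code: matching is treated as exact equality on every match column, the all-'*' row is treated as a fallback used only when no exact row matches, and the table contents and function names are hypothetical.

```python
import csv
import io


def load_table_map(tsv_text):
    """Parse a mapping table into (out_col, match_cols, exact, copy_from)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out_col, *match_cols = reader.fieldnames
    exact = {}        # tuple of match values -> value to write
    copy_from = None  # column to copy when the all-'*' row applies
    for rec in reader:
        key = tuple(rec[c] for c in match_cols)
        if all(v == "*" for v in key):
            copy_from = rec[out_col]
        else:
            exact[key] = rec[out_col]
    return out_col, match_cols, exact, copy_from


def apply_table_map(row, table):
    """Return a copy of row with the output column created or overwritten."""
    out_col, match_cols, exact, copy_from = table
    key = tuple(row.get(c, "") for c in match_cols)
    out = dict(row)
    if key in exact:
        out[out_col] = exact[key]
    elif copy_from is not None:
        # Assumption: the all-'*' row acts as a fallback for unmatched rows.
        out[out_col] = row.get(copy_from, "")
    return out


# Hypothetical mapping table: output column 'division_short', matched on
# 'country' and 'division'.
table_text = (
    "division_short\tcountry\tdivision\n"
    "MA\tUSA\tMassachusetts\n"
    "division\t*\t*\n"
)
table = load_table_map(table_text)
print(apply_table_map({"country": "USA", "division": "Massachusetts"}, table))
print(apply_table_map({"country": "Canada", "division": "Ontario"}, table))
```

In this sketch the first row receives division_short=MA from the exact match, while the second falls through to the all-'*' row and copies its own division value.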

subsample_by_metadata_with_focal.focal_bin_max
Int — Default: 50
The output will contain no more than this number of focal samples from each discrete value in the focal_bin_variable column.

subsample_by_metadata_with_focal.focal_bin_variable
String — Default: "division"
The focal subset of samples will be evenly subsampled across the discrete values of this column header.

subsample_by_metadata_with_focal.focal_value
String — Default: "North America"
The dataset will be bifurcated based on whether the focal_variable column matches this value or not. Rows that match this value are considered to be part of the 'focal' set of interest; rows that do not are part of the 'global' set.

subsample_by_metadata_with_focal.focal_variable
String — Default: "region"
The dataset will be bifurcated based on this column header.

subsample_by_metadata_with_focal.global_bin_max
Int — Default: 50
The output will contain no more than this number of global samples from each discrete value in the global_bin_variable column.

subsample_by_metadata_with_focal.global_bin_variable
String — Default: "country"
The global subset of samples will be evenly subsampled across the discrete values of this column header.
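The interaction of the six focal and global parameters above can be sketched as a split followed by a per-bin cap. This illustrates the described behavior only and is not the workflow's implementation: the function name, the `seed` argument, and the use of uniform random sampling to choose which samples survive in an over-full bin are all assumptions, and the real workflow may select samples differently.

```python
import random
from collections import defaultdict


def subsample_with_focal(rows,
                         focal_variable="region",
                         focal_value="North America",
                         focal_bin_variable="division",
                         focal_bin_max=50,
                         global_bin_variable="country",
                         global_bin_max=50,
                         seed=0):
    """Split rows into focal and global sets, then cap each bin's size."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only

    # Bifurcate on whether the focal_variable column equals focal_value.
    focal = [r for r in rows if r.get(focal_variable) == focal_value]
    global_ = [r for r in rows if r.get(focal_variable) != focal_value]

    def cap_per_bin(subset, bin_variable, bin_max):
        bins = defaultdict(list)
        for r in subset:
            bins[r.get(bin_variable, "")].append(r)
        kept = []
        for members in bins.values():
            if len(members) > bin_max:
                # Assumption: uniform random choice within an over-full bin.
                members = rng.sample(members, bin_max)
            kept.extend(members)
        return kept

    return (cap_per_bin(focal, focal_bin_variable, focal_bin_max),
            cap_per_bin(global_, global_bin_variable, global_bin_max))


# Hypothetical metadata rows.
rows = (
    [{"region": "North America", "division": "Massachusetts", "country": "USA"}] * 5
    + [{"region": "North America", "division": "Ontario", "country": "Canada"}] * 2
    + [{"region": "Europe", "division": "Paris", "country": "France"}] * 4
)
focal_kept, global_kept = subsample_with_focal(rows, focal_bin_max=3, global_bin_max=2)
print(len(focal_kept), len(global_kept))
```

Because each cap applies per discrete value, a heavily sampled division or country cannot crowd out sparsely sampled ones, which is what lets the focal region be represented more densely than the rest of the world.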

subsample_by_metadata_with_focal.prefilter.disk_size
Int — Default: 750
???

subsample_by_metadata_with_focal.prefilter.docker
String — Default: "nextstrain/base:build-20230905T192825Z"
???

subsample_by_metadata_with_focal.prefilter.exclude
File? — Default: None
???

subsample_by_metadata_with_focal.prefilter.exclude_where
Array[String]? — Default: None
???

subsample_by_metadata_with_focal.prefilter.group_by
String? — Default: None
???

subsample_by_metadata_with_focal.prefilter.include
File? — Default: None
???

subsample_by_metadata_with_focal.prefilter.include_where
Array[String]? — Default: None
???

subsample_by_metadata_with_focal.prefilter.max_date
Float? — Default: None
???

subsample_by_metadata_with_focal.prefilter.min_date
Float? — Default: None
???

subsample_by_metadata_with_focal.prefilter.min_length
Int? — Default: None
???

subsample_by_metadata_with_focal.prefilter.non_nucleotide
Boolean — Default: true
???

subsample_by_metadata_with_focal.prefilter.priority
File? — Default: None
???

subsample_by_metadata_with_focal.prefilter.sequences_per_group
Int? — Default: None
???

subsample_by_metadata_with_focal.prefilter.subsample_seed
Int? — Default: None
???

subsample_by_metadata_with_focal.priorities
File? — Default: None
???

subsample_by_metadata_with_focal.subsample_focal.disk_size
Int — Default: 750
???

subsample_by_metadata_with_focal.subsample_focal.docker
String — Default: "nextstrain/base:build-20230905T192825Z"
???

subsample_by_metadata_with_focal.subsample_focal.exclude
File? — Default: None
???

subsample_by_metadata_with_focal.subsample_focal.include
File? — Default: None
???

subsample_by_metadata_with_focal.subsample_focal.include_where
Array[String]? — Default: None
???

subsample_by_metadata_with_focal.subsample_focal.max_date
Float? — Default: None
???

subsample_by_metadata_with_focal.subsample_focal.min_date
Float? — Default: None
???

subsample_by_metadata_with_focal.subsample_focal.min_length
Int? — Default: None
???

subsample_by_metadata_with_focal.subsample_focal.non_nucleotide
Boolean — Default: true
???

subsample_by_metadata_with_focal.subsample_focal.subsample_seed
Int? — Default: None
???

subsample_by_metadata_with_focal.subsample_global.disk_size
Int — Default: 750
???

subsample_by_metadata_with_focal.subsample_global.docker
String — Default: "nextstrain/base:build-20230905T192825Z"
???

subsample_by_metadata_with_focal.subsample_global.exclude
File? — Default: None
???

subsample_by_metadata_with_focal.subsample_global.include
File? — Default: None
???

subsample_by_metadata_with_focal.subsample_global.include_where
Array[String]? — Default: None
???

subsample_by_metadata_with_focal.subsample_global.max_date
Float? — Default: None
???

subsample_by_metadata_with_focal.subsample_global.min_date
Float? — Default: None
???

subsample_by_metadata_with_focal.subsample_global.min_length
Int? — Default: None
???

subsample_by_metadata_with_focal.subsample_global.non_nucleotide
Boolean — Default: true
???

subsample_by_metadata_with_focal.subsample_global.subsample_seed
Int? — Default: None
???

subsample_by_metadata_with_focal.tsv_join.machine_mem_gb
Int — Default: 7
???

subsample_by_metadata_with_focal.tsv_join.out_suffix
String — Default: ".txt"
???

subsample_by_metadata_with_focal.tsv_join.prefer_first
Boolean — Default: true
???

4.71.2. Outputs

subsample_by_metadata_with_focal.focal_kept
Int
???

subsample_by_metadata_with_focal.global_kept
Int
???

subsample_by_metadata_with_focal.keep_list
File
???

subsample_by_metadata_with_focal.metadata_merged
File
???

subsample_by_metadata_with_focal.sequences_kept
Int
???

subsample_by_metadata_with_focal.subsampled_sequences
File
???


Generated using WDL AID (1.0.0)