Preparing a Source Modifiers Table File for All Source Modifiers
Important updates:
- As previously announced, all GenBank sequence submissions require collection-date and geo_loc_name starting December 2024.
- Modifiers marked as deprecated in the list below will no longer be accepted as separate modifiers in new submissions starting January 2025. They may still appear on older records, and taxonomic terms will still be shown in the BioSource organism name. This will not affect BioSample submissions.
BankIt accepts source modifiers (e.g., specimen voucher and isolate) in two ways: as a tab-delimited text file containing a Source Modifiers Table (as described below), or by applying the same source modifier value to all sequences in the submission using the input form. Source modifiers can be changed by uploading a new table to overwrite a previous one, or by correcting or removing a previously entered value in the form. The current values of all source modifiers appear at the bottom of the page.
For submissions with multiple sequences, it is recommended that you use a single table file containing all the source modifiers you want to add, rather than combining a table with values entered in the input form.
Setting up the Source Modifiers Table
The Source Modifiers Table is a
tab-delimited text file of the source modifiers for all
specimens in a BankIt set.
When any of the following modifiers is used in a source modifier table, the only value it may have is 'TRUE':
- Germline
- Metagenomic
- Rearranged
- Transgenic
See below for an annotated list of source modifiers.
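The TRUE-only rule above can be checked before upload. The following is a minimal sketch, not part of BankIt: the function name `bad_true_only_values` is hypothetical, and it assumes that an empty cell means the modifier is simply not applied to that specimen and that the value is matched case-sensitively.

```python
# Modifiers that, per the list above, may only carry the value TRUE.
TRUE_ONLY = {"Germline", "Metagenomic", "Rearranged", "Transgenic"}


def bad_true_only_values(header, rows):
    """Return (row_number, column_name, value) for each offending cell.

    Assumption: an empty cell means "modifier not applied" and is allowed.
    Row numbers are 1-based and count the header as row 1.
    """
    problems = []
    for col_index, name in enumerate(header):
        if name not in TRUE_ONLY:
            continue
        for row_number, row in enumerate(rows, start=2):
            value = row[col_index] if col_index < len(row) else ""
            if value and value != "TRUE":
                problems.append((row_number, name, value))
    return problems
```

An empty result means every TRUE-only column contains only 'TRUE' or blank cells.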
Contents of the Source Modifiers Table
The first row in the table contains the labels for each column. Each column in the table is a different source modifier. See below for the complete list of source modifiers.
The first column contains the Sequence_IDs used to identify each sequence in the nucleotide FASTA file.
Specimens are identified in the Source Modifiers Table by the same Sequence_ID used in the FASTA file.
The heading for the first column must be exactly Sequence_ID as shown in the sample below.
Each specimen in the set must have a line in the source modifiers file, even if there are no modifiers to apply to the specimen.
Each Sequence_ID may appear only once in the source modifier file.
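The structural rules above (exact first-column heading, one row per specimen, no repeated Sequence_ID) lend themselves to a quick local check. This is a sketch under stated assumptions rather than BankIt's own validation: the function names are hypothetical, and the Sequence_ID in the FASTA file is assumed to be the first whitespace-delimited token after '>' on each definition line.

```python
import csv
import io


def fasta_ids(fasta_text):
    """Collect Sequence_IDs from FASTA definition lines.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def check_table(table_text, fasta_text):
    """Return a list of human-readable problems; empty if none were found."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    rows = [row for row in reader if row]  # skip completely blank lines
    errors = []
    if not rows or rows[0][0] != "Sequence_ID":
        errors.append("first column heading must be exactly Sequence_ID")
    seen = set()
    for row in rows[1:]:
        seq_id = row[0]
        if seq_id in seen:
            errors.append(f"duplicate Sequence_ID: {seq_id}")
        seen.add(seq_id)
    # Every specimen in the FASTA file needs a row, even with no modifiers.
    for seq_id in fasta_ids(fasta_text):
        if seq_id not in seen:
            errors.append(f"no table row for {seq_id}")
    return errors
```

A row that contains only a Sequence_ID and empty modifier cells still counts as present, matching the rule that every specimen needs a line.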
Shown below are the contents of a Sample Source Modifiers Table file. Right-click on the link to save as a tab-delimited text file.
| Sequence_ID | Collected_by | Collection_date | Country (geo_loc_name) | Isolation_source | Isolate | Lat_Lon | Specimen_voucher |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Seq1 | C. Grant | 31-Jan-2001 | USA | soil | A | 13.57 N 24.68 W | MKP 334 |
| Seq2 | S. Tracy | 28-Feb-2002 | Slovakia | contaminated soil | B | 13.24 N 24.35 W | MKP 1230 |
| Seq3 | A. Gardner | 16-Apr-2001 | France | farm soil | C | 43.21 N 56.78 W | 1B-2526 |
| Seq4 | F. McMurray | 26-May-2002 | Germany | farm runoff water | D | 45.32 N 21.34 E | WBM 86-64 |
| Seq5 | V. Leigh | 13-Jun-2003 | Brazil | forest soil | E | 46.80 N 13.57 E | 1B-2518 |
| Seq6 | E. Flynn | 15-Aug-2000 | Australia | river water | F | 68.53 S 57.42 E | WBM 86-65 |
| Seq7 | G. Kelly | 26-Oct-2002 | Mexico | river bed soil | G | 22.44 S 55.77 W | 1B-2355 |
Saving the Source Modifiers Table
When using a spreadsheet program,
be sure to save your file as tab-delimited text.
If you are not sure that the "Save" option in your program will do this for you, use "Save As..."
In Excel, select "Save As..." from the File menu. In the "Save as type:" pull-down menu, select "Text (Tab delimited) (*.txt)".
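If the table is produced by a script rather than a spreadsheet, the same tab-delimited layout can be written directly. The sketch below uses Python's standard `csv` module; the function name `write_modifiers_table` is hypothetical, and the UTF-8 encoding and '\n' line endings are assumptions, not documented BankIt requirements.

```python
import csv


def write_modifiers_table(path, header, rows):
    """Write a header row and one row per specimen as tab-delimited text."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    write_modifiers_table(
        "source_modifiers.txt",
        ["Sequence_ID", "Collected_by", "Collection_date", "Isolate"],
        [
            ["Seq1", "C. Grant", "31-Jan-2001", "A"],
            ["Seq2", "S. Tracy", "28-Feb-2002", "B"],
        ],
    )
```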
Source Modifiers
Commonly used Source Modifiers
The following source modifiers are available to further describe the
sequences in a submission:
- Altitude - Altitude in metres above or below sea level of where the sample was collected.
- Authority - deprecated - do not use.
- Bio_material - An identifier for the biological material from which the nucleotide sequence was obtained, with optional institution code and collection code for the place where it is currently stored.
This should be provided using the following format 'institution-code:collection-code:material_id'.
material_id is mandatory, institution-code and collection-code are optional; institution-code is mandatory when collection-code is present.
This qualifier should be used to annotate the identifiers of material in biological collections which include zoos and aquaria, stock centers, seed banks, germplasm repositories and DNA banks.
- Biotype - deprecated - do not use.
- Biovar - deprecated - do not use.
- Breed - The named breed from which sequence was obtained (usually applied to domesticated mammals).
- Cell_line - Cell line from which sequence was obtained.
- Cell_type - Type of cell from which sequence was obtained.
- Chemovar - deprecated - do not use.
- Clone - Name of clone from which sequence was obtained.
- Collected_by - Name of person who collected the sample.
- Collection_date - Date the specimen was collected.
Use the format DD-Mon-YYYY, that is, a 2-digit day, a three-character abbreviation of the month, and a 4-digit year (e.g., 11-Feb-2002).
Mon-YYYY and YYYY are alternate formats to use when date information is less complete.
- Country (geo_loc_name) - Where the sequence's organism was located. May be a country, an ocean, or a major sea. Additional region or locality information must follow the country, ocean, or major sea name, separated by a ':'. For example: USA: Riverview Park, Ripkentown, MD
- Cultivar - Cultivated variety of plant from which sequence was obtained.
- Culture_collection - Institution code and identifier for the culture from which the nucleotide sequence was obtained, with optional collection code.
This should be provided using the following format 'institution-code:collection-code:culture-id'. culture-id and institution-code are mandatory.
This qualifier should be used to annotate live microbial and viral cultures, and cell lines that have been deposited in curated culture collections.
- Dev_stage - Developmental stage of organism.
- Ecotype - The named ecotype (population adapted to a local habitat) from which sequence was obtained (customarily applied to populations of Arabidopsis thaliana).
- Forma - deprecated - do not use.
- Forma_specialis - deprecated - do not use.
- Fwd_primer_name - Name of forward PCR primer.
- Fwd_primer_seq - Nucleotide sequence of forward PCR primer.
- Genotype - Genotype of the organism.
- Haplogroup - Name for a group of similar haplotypes that share some sequence variation.
- Haplotype - Haplotype of the organism.
- Host - When the sequence submission is from an organism that exists in a symbiotic, parasitic, or other special relationship with some second organism, the 'host' modifier can be used to identify the name of the host species.
- Identified_by - deprecated - do not use.
- Isolate - Identification or description of the specific individual from which this sequence was obtained.
- Isolation_source - Describes the local geographical source of the organism from which the sequence was obtained.
- Lab_host - Laboratory host used to propagate the organism from which the sequence was obtained.
- Lat_Lon - Latitude and longitude, in decimal degrees, of where the sample was collected.
- Note - Any additional information that you wish to provide about the sequence.
- Pathovar - deprecated - do not use.
- Pop_variant - deprecated - do not use.
- Rev_primer_name - Name of reverse PCR primer.
- Rev_primer_seq - Nucleotide sequence of reverse PCR primer.
- Segment - Name of viral or phage segment sequenced.
- Serogroup - deprecated - do not use.
- Serotype - Serological variety of a species characterized by its antigenic properties.
- Serovar - Serological variety of a species (usually a prokaryote) characterized by its antigenic properties.
- Sex - Sex of the organism from which the sequence was obtained.
- Specimen_voucher - An identifier of the individual or collection of the source organism and the place where it is currently stored, usually an institution.
This should be provided using the following format 'institution-code:collection-code:specimen-id'. specimen-id is mandatory, collection-code is optional; institution-code is mandatory when collection-code is provided. Examples:
- 99-SRNP
- UAM:Mamm:52179
- personal collection:Joe Smith:99-SRNP
- AMCC:101706
- Strain - Strain of organism from which sequence was obtained.
- Sub_species - Subspecies of organism from which sequence was obtained.
- Subclone - deprecated - do not use.
- Subtype - deprecated - do not use.
- Substrain - deprecated - do not use.
- Tissue_lib - Tissue library from which the sequence was obtained.
- Tissue_type - Type of tissue from which sequence was obtained.
- Type - deprecated - do not use.
- Variety - Variety of organism from which sequence was obtained.
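Two of the formats described in the list above, Collection_date and the colon-separated Specimen_voucher, can be checked mechanically. The following is a rough sketch, not BankIt's validator: the function names are hypothetical, month abbreviations are assumed to be the English forms Jan through Dec with an initial capital (as in the examples), and a voucher is assumed to contain at most two ':' separators.

```python
import calendar
import re

# Assumption: English three-letter month abbreviations, capitalised as in 11-Feb-2002.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Matches DD-Mon-YYYY, Mon-YYYY, or YYYY.
DATE_RE = re.compile(r"^(?:(?:(\d{2})-)?([A-Z][a-z]{2})-)?(\d{4})$")


def is_valid_collection_date(text):
    """True if text is DD-Mon-YYYY, Mon-YYYY, or YYYY with a real calendar date."""
    match = DATE_RE.match(text)
    if not match:
        return False
    day, month, year = match.groups()
    if month is None:
        return True  # year only
    if month not in MONTHS:
        return False
    if day is None:
        return True  # month and year only
    days_in_month = calendar.monthrange(int(year), MONTHS.index(month) + 1)[1]
    return 1 <= int(day) <= days_in_month


def split_voucher(value):
    """Split a voucher into (institution_code, collection_code, specimen_id).

    Missing optional parts are returned as None.
    """
    parts = [part.strip() for part in value.split(":")]
    if len(parts) > 3 or not parts[-1]:
        raise ValueError(f"unexpected voucher format: {value!r}")
    if len(parts) == 1:
        return (None, None, parts[0])
    if len(parts) == 2:
        return (parts[0], None, parts[1])
    return (parts[0], parts[1], parts[2])
```

With a two-part value such as AMCC:101706, the sketch treats the first part as the institution code, which follows from the rule that institution-code is mandatory whenever collection-code is provided.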