GEO Metadata Validation Rules

To speed up processing and maintain a high standard of metadata, GEO has implemented an automated service that pre-checks the metadata spreadsheet for completeness, formatting, and content. After the FTP transfer of raw and processed data files is complete, upload the completed metadata file on the Submit Metadata page.

Upon upload, the metadata file is scanned and checked for formatting and content within seconds. For example, if a required section (STUDY, SAMPLES, PROTOCOLS, or, for paired-end studies, PAIRED-END EXPERIMENTS) is missing, you will receive the error message "Uploaded file is missing mandatory section" together with a table that names the missing section. If you receive an error message, correct the indicated fields in your metadata file and upload it again. When the file passes all checks, you will see the message "Your metadata file has been successfully uploaded". A successful upload places your submission in GEO's processing queue, and you will receive an email notification with a summary of your submission.
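The section check described above can be approximated locally before uploading. The Python sketch below is a hypothetical illustration, not GEO's validator. It assumes the "Metadata" worksheet has already been read into a list of rows (for example with a spreadsheet library), and that each section begins with a row whose first cell is exactly the section name. The function name `find_missing_sections` is invented for this example.

```python
# Hypothetical local approximation of the "missing_section" check.
# NOT GEO's implementation. Assumptions: the worksheet is already loaded as a
# list of rows (lists of cell values), and each section starts with a row
# whose first cell is the section name, matched exactly after trimming spaces.
from typing import Iterable, List, Optional

REQUIRED_SECTIONS = ("STUDY", "SAMPLES", "PROTOCOLS")
PAIRED_END_SECTION = "PAIRED-END EXPERIMENTS"


def find_missing_sections(rows: Iterable[List[Optional[str]]],
                          paired_end: bool) -> List[str]:
    """Return the mandatory section names that never appear in the first column."""
    first_cells = {
        str(row[0]).strip()
        for row in rows
        if row and row[0] is not None  # skip empty rows and blank first cells
    }
    required = list(REQUIRED_SECTIONS)
    if paired_end:
        # The paired-end section is only mandatory for paired-end studies.
        required.append(PAIRED_END_SECTION)
    return [name for name in required if name not in first_cells]


if __name__ == "__main__":
    sheet = [
        ["STUDY"],
        ["title", "My study"],
        ["SAMPLES"],
        ["library name", "title", "organism"],
        ["lib1", "Control 1", "Mus musculus"],
    ]
    print(find_missing_sections(sheet, paired_end=True))
    # prints ['PROTOCOLS', 'PAIRED-END EXPERIMENTS']
```

The real service may recognize section headers differently, so treat an empty result from this sketch as a hint, not a guarantee that the upload will pass.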

error name	error message that you will receive	explanation and how to fix
excel_parse_failure	Uploaded file cannot be read. The file must be in Excel version 2007 or higher with .xlsx extension.	The file is not an Excel 2007 (or later) workbook with the .xlsx extension. GEO cannot process metadata files submitted as .txt, .csv, or .tsv files. Do not compress the metadata spreadsheet; a compressed spreadsheet cannot be read.
discontinued_template	It appears that you have used a discontinued version of the metadata spreadsheet. Please use the above link to download the newest version and resubmit.	Old versions of the metadata spreadsheet are not supported. Please download, complete, and submit the newest version of the metadata spreadsheet.
missing_worksheet	Uploaded file is missing required worksheet named "Metadata". Please make sure you are using our newest metadata template.	The Excel tab (also called a worksheet) containing the metadata information must be named "Metadata" or "2. Metadata Template". Any other tab name will produce the "missing_worksheet" error. For example, do not rename the tab "RNAseq" or "ChIPseq". Do not include multiple tabs with metadata for separate studies in the same file. GEO needs one metadata file per study.
missing_section	Uploaded file is missing mandatory section:	The metadata tab must have sections titled STUDY, SAMPLES and PROTOCOLS. If it is a paired-end sequencing study, the metadata file must also contain a PAIRED-END EXPERIMENTS section.
empty_samples_section	SAMPLES section does not list any samples. Please make sure that library names do not start with "#" symbol since such lines are treated as comments and ignored.	At least one sample must be listed in the SAMPLES section. Check that no library name begins with "#", because such rows are treated as comments and ignored.
missing_mandatory_info	Uploaded file is missing mandatory information in the STUDY or PROTOCOLS sections:	Required fields in STUDY and PROTOCOLS sections are: title, summary (abstract), experimental design, extract protocol, library construction protocol, library strategy, data processing description, assembly or genome build, and processed data files format and content. Library strategy refers to the experiment type such as RNA-seq, ATAC-seq, or Hi-C. A table will be provided that lists the fields in STUDY and/or PROTOCOLS sections that are empty.
missing_sample_header	SAMPLES section is missing required headers for the table:	Deleting columns from the metadata template in the SAMPLES section is not allowed and will produce the "missing_sample_header" error. A table will be provided which lists the missing headers in the SAMPLES section. You can add columns to the SAMPLES section for additional characteristics appropriate for your samples. For example, you could use the header "overall survival" and provide survival data for each sample.
empty_library_name	At least one of the samples has empty library name.	At least one sample in the SAMPLES section has an empty library name. This error can also be caused by stray non-empty cells in the SAMPLES section that do not belong to any listed sample; remove any such cells.
missing_sample_info	SAMPLES section is missing required information:	Every sample in the SAMPLES section must include information for library name, title, organism, molecule, single or paired-end, and instrument model. A table will be provided which lists the missing field for each library name.
duplicate_library_names	Identical library names were found. Library names must be unique. This check is case insensitive, meaning that "Control1" and "control1" will be considered identical. Identical names are:	Every library name in the SAMPLES section must be unique. A table will be provided which lists the non-unique library name and the number of times it was found (occurrences) in the SAMPLES section.
duplicate_sample_titles	Identical sample titles were found. Sample titles must be unique. This check is case insensitive, meaning that "Control1" and "control1" will be considered identical. Identical titles are:	Every title in the SAMPLES section must be unique. A table will be provided which lists the non-unique title and the number of times (occurrences) it was found in the SAMPLES section.
invalid_contributor_format	The contributor name is not correctly formatted. The format is: 'Firstname, I, Lastname' or 'Firstname, Lastname'. First (given) name must be at least one character long. 'I' represents middle name initial and must be exactly one character. Last (family) name must be at least two characters long. List only one contributor name per row. Examples and guidance for contributor name format are available in the metadata template.	Provide each contributor name in the format 'Firstname, Lastname' or 'Firstname, I, Lastname', where I is the middle initial, if present. Separate the parts of the name with commas. List one contributor name per row; you can add as many extra rows with the field name "contributor" as you need.
long_sample_title	Sample title is too long. Maximum length allowed is 120 characters.	Sample titles can be no longer than 120 characters. A short sample title of 3-5 words is easy to read and displays clearly on the website.
empty_field_name	The following rows in STUDY and/or PROTOCOLS sections are missing the field name such as "contributor" or "data processing step". Add the correct field name in the cell to the left of the cell with text listed below.
out_of_bound_text	Extra text was found beyond the first two columns in STUDY and/or PROTOCOLS sections. Please remove it. If you need to include different protocols for subsets of samples, please add all PROTOCOLS fields (extract protocol, library protocol, data processing step, etc) to the SAMPLES section as additional columns.
raw_file_not_found	The metadata file lists raw files that are not found in your personalized upload space. Upload any missing files OR correct the metadata file by listing the exact file names (names are case-sensitive, cannot include paths, and must include file extensions such as ".gz" when compressed). The following raw files are not found in your personalized upload space:
no_paths_allowed	A directory path to a file name has been found in the metadata file. All raw data, processed data, and supplementary files must be listed without a path. For example, use "data_matrix.txt" instead of "/Home/RNAseq/Data/Processed/data_matrix.txt". Please remove paths and resubmit.	Inclusion of a path in a file name prevents file detection on GEO's server. List the file name without path.
invalid_organism_name	Organism name(s) could not be resolved automatically in NCBI Taxonomy database. The name was either not found, or it returned multiple entries. Please check spelling of organism name. Make sure you have provided a valid scientific name at species level (or lower rank, such as subspecies), e.g., Mus musculus. Do not include taxonomic authority in the name such as L. for Linnaeus. If the organism name is valid but not yet included in NCBI Taxonomy database, contact GEO using the "email us" link located above this message.	Make sure that the 'organism' field contains the scientific name of the organism at species level or below. The organism name cannot include additional text such as tissue information (for example, "Mus musculus heart" is not accepted). List one name per column, and add extra 'organism' columns if the sample includes material from more than one organism.
missing_sample_column_name	Some columns in the SAMPLES section are not named. Add column names to the header row.	The header row in the SAMPLES section must have a name for each column for which there is sample information. Remove any unintentional text that you do not want on the sample record.
duplicate_raw_file_names	Identical raw data file names have been found in the SAMPLES section. All samples must be associated with unique raw data files. Please check raw file names for typos or inadvertent copy/paste errors. For single-cell studies with multiplexed raw data, please see the metadata template worksheet "scMulti-omics seq EXAMPLE" for guidance. If you have questions or need help, contact GEO using the "email us" link located above this message.	Each sample must be associated with independent raw data files. If your single-cell samples have been multiplexed, create one sample per sequencing library and create separate samples for individual library types such as GEX, HTO, ADT, TCR, etc.
processed_data_file_not_found	The metadata file lists processed data files that are not found in your personalized upload space. Upload any missing files OR correct the metadata file by listing the exact file names (names are case-sensitive, cannot include paths, and must include file extensions such as ".gz" when compressed). List one processed data file per "processed data file" column in the SAMPLES section or "supplementary file" field in the STUDY section. If a sample (for example, input) does not have any associated processed data, leave the "processed data file" cell empty for that sample. The following processed data files are not found in your personalized upload space:
processed_data_required	Your submission does not contain any processed data file(s). Include a processed data file that contains data for all samples as a "supplementary file" in the STUDY section or provide sample-specific processed data file(s) listed in the "processed data file" field of the SAMPLES section. You can add as many "processed data file" columns as you need. Enter only one file per spreadsheet cell. If some samples (such as input) do not have associated processed data, leave the field empty for those samples.
paired_end_section_invalid_header	The PAIRED-END EXPERIMENTS section header row is not formatted correctly. There should be up to 4 columns, named as "file name 1", "file name 2", "file name 3" and "file name 4". All columns with file names must include a header.
paired_end_section_column_limit	Each row of the PAIRED-END EXPERIMENTS section can include a maximum of four files. Each row should include paired-end files from one run. The following file names were found beyond the fourth column:
paired_end_section_raw_file_omitted	Paired-end raw files must be listed in both sections of the metadata file. List one set of paired-end raw files (R1, R2 or I1, R1, R2, for example) per row in the PAIRED-END EXPERIMENTS section. The following raw files from SAMPLES section are not found in the PAIRED-END EXPERIMENTS section:
paired_end_section_with_non_paired_end_file	PAIRED-END EXPERIMENTS section includes raw files that are marked as "single" in SAMPLES section or files that are not included in "raw file" columns in SAMPLES section:
paired_end_section_library_mismatch	PAIRED-END EXPERIMENTS section contains at least one row with files from different libraries or samples.
paired_end_section_duplicate_file_names	The PAIRED-END EXPERIMENTS section contains non-unique file names. Please correct all file names so that they are unique.
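Several of the rules in the table can be approximated locally to catch common problems before uploading. The Python sketch below is an illustration only, not GEO's validator: every function name is hypothetical, it assumes cell values have already been read from the spreadsheet into Python strings, and it covers only the duplicate-name, title-length, contributor-format, path, file-presence, and paired-end column rules as they are described above.

```python
# Hypothetical local approximations of some GEO metadata rules.
# NOT GEO's implementation; cell values are assumed to be plain strings.
from collections import Counter
from typing import Dict, List, Optional, Set

MAX_TITLE_LENGTH = 120  # limit stated in the "long_sample_title" rule
MAX_PAIRED_END_FILES = 4  # limit stated in "paired_end_section_column_limit"


def duplicate_names(names: List[str]) -> Dict[str, int]:
    """Case-insensitive duplicates with occurrence counts.

    Mirrors duplicate_library_names / duplicate_sample_titles: "Control1" and
    "control1" count as the same name. Keys are returned lowercased.
    """
    counts = Counter(name.lower() for name in names)
    return {name: n for name, n in counts.items() if n > 1}


def is_valid_contributor(value: str) -> bool:
    """'Firstname, Lastname' or 'Firstname, I, Lastname' (invalid_contributor_format)."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) == 2:
        first, last = parts
        return len(first) >= 1 and len(last) >= 2
    if len(parts) == 3:
        first, initial, last = parts
        # The middle initial must be exactly one character.
        return len(first) >= 1 and len(initial) == 1 and len(last) >= 2
    return False


def title_too_long(title: str) -> bool:
    """True if a sample title exceeds the 120-character limit."""
    return len(title) > MAX_TITLE_LENGTH


def has_path(file_name: str) -> bool:
    """True if a file name includes a directory path (no_paths_allowed)."""
    return "/" in file_name or "\\" in file_name


def files_not_found(listed: List[str], uploaded: Set[str]) -> List[str]:
    """Listed files absent from the upload space; comparison is case-sensitive."""
    return [name for name in listed if name not in uploaded]


def paired_end_overflow(row: List[Optional[str]]) -> List[str]:
    """Non-empty file names beyond the fourth column of a PAIRED-END EXPERIMENTS row."""
    return [cell for cell in row[MAX_PAIRED_END_FILES:] if cell]


if __name__ == "__main__":
    print(duplicate_names(["Control1", "control1", "Treated1"]))  # {'control1': 2}
    print(is_valid_contributor("John, H, Doe"))  # True
    print(is_valid_contributor("John H Doe"))  # False (no commas)
    print(has_path("/Home/RNAseq/Data/Processed/data_matrix.txt"))  # True
```

The file-presence check is deliberately case-sensitive because the rules above state that file names are case-sensitive, while the duplicate-name check lowercases names because GEO describes that check as case-insensitive.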