Foreign Contamination Screen (FCS)

Foreign Contamination Screen (FCS) is a tool suite for detecting various types of contaminants in genome assemblies including synthetic sequences (adaptors, vectors, sequencing controls) and biological sequences from non-target organisms. NCBI uses FCS to conduct a set of screens based on the taxonomic identity of the target genome assembly, for both GenBank and RefSeq assemblies.

FCS reports

The report files fcs_summary_genbank.txt.gz and fcs_summary_refseq.txt.gz provide contamination statistics for all GenBank and RefSeq assemblies, respectively.

The files fcs_details_genbank.txt.gz and fcs_details_refseq.txt.gz provide aggregated details for all sequences identified as contaminants in GenBank and RefSeq assemblies, respectively.

Each GenBank and RefSeq assembly also has an individual contamination report, which can be accessed from the FTP link on the NCBI Datasets genome page for that assembly. The file ending in *fcs_report.txt provides contamination details. For genome assemblies with no contamination identified by FCS, the report contains two header rows detailing the parameters of the FCS run but is otherwise empty, i.e. it lists no contaminated sequence accessions.
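
As a rough illustration of the point above, the following Python sketch checks whether an individual report lists any contaminated sequences by counting the rows that follow the two header rows. The function names and the handling of gzip-compressed files are assumptions made for this example, not part of FCS; the sketch does not parse the header or column layout.

```python
import gzip
from pathlib import Path

# Per the description above, every report starts with two header rows.
HEADER_ROWS = 2


def count_contaminated_rows(report_path):
    """Return the number of non-blank rows after the two header rows.

    Hypothetical helper: it only counts rows and does not interpret columns.
    """
    path = Path(report_path)
    # Assumption: a report may be plain text or gzip-compressed (.gz).
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        rows = [line for line in handle if line.strip()]
    return max(0, len(rows) - HEADER_ROWS)


def is_clean(report_path):
    """True when the report has headers only, i.e. no contaminants listed."""
    return count_contaminated_rows(report_path) == 0
```

Calling is_clean on a downloaded *fcs_report.txt file would then return True for an assembly with no contamination identified by FCS.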

See the FCS README for additional information, including options on how to use the reports to identify assemblies with lower contamination levels or hardmask contaminant sequences to aid downstream analyses.

For more information on FCS, see Astashyn and Tvedte et al. 2024 and the FCS GitHub page.

FCS criteria for assigning a genome as contaminated

Prokaryotes

adaptor & vector contamination

  • If a non-WGS/complete genome assembly, at least one sequence flagged as contaminant
  • If a WGS genome assembly, at least 100 contigs flagged as contaminant
  • At least 60% of the genome flagged as contaminant
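
The adaptor and vector criteria above can be sketched as a small Python check. This is only an illustration: the function and parameter names are hypothetical, and it assumes that meeting any one applicable criterion is enough to mark the assembly as contaminated, which the list does not state explicitly.

```python
def adaptor_vector_contaminated(is_wgs, flagged_sequences, flagged_bp, genome_bp):
    """Apply the prokaryote adaptor/vector thresholds listed above.

    Assumption: the criteria are alternatives (any one suffices).
    """
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")

    # WGS assemblies need at least 100 flagged contigs;
    # non-WGS/complete assemblies need at least one flagged sequence.
    min_flagged = 100 if is_wgs else 1
    count_hit = flagged_sequences >= min_flagged

    # At least 60% of the genome flagged (integer arithmetic avoids rounding).
    fraction_hit = flagged_bp * 100 >= 60 * genome_bp

    return count_hit or fraction_hit
```

For example, under these assumptions a WGS assembly with 99 flagged contigs covering a small fraction of the genome would not be marked, while a complete genome with a single flagged sequence would be.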

non-target organism contamination

  • At least 10 kb of high-value contamination (primate, eukaryotic virus, synthetic sequence)
  • At least 100 kb of medium-value contamination (other eukaryotes)
  • At least 200 kb of total contamination (including prokaryote-in-prokaryote)
  • At least 5% of the genome flagged with any contamination
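
The non-target organism thresholds above can be sketched in the same way. The names below are hypothetical, 1 kb is taken to mean 1,000 bp, and the criteria are assumed to be alternatives (any one suffices); none of these assumptions is confirmed by the list itself.

```python
KB = 1_000  # assumption: 1 kb = 1,000 bp


def non_target_contaminated(high_value_bp, medium_value_bp, total_bp, genome_bp):
    """Apply the prokaryote non-target organism thresholds listed above.

    high_value_bp:   primate, eukaryotic virus, synthetic sequence
    medium_value_bp: other eukaryotes
    total_bp:        all contamination, including prokaryote-in-prokaryote
    """
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")

    return (
        high_value_bp >= 10 * KB
        or medium_value_bp >= 100 * KB
        or total_bp >= 200 * KB
        # At least 5% of the genome flagged with any contamination.
        or total_bp * 100 >= 5 * genome_bp
    )
```

Under these assumptions, 12 kb of primate sequence in a 5 Mbp genome would meet the high-value threshold even though it is far below 5% of the genome.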

Eukaryotes

manual review

  • At least 10 Mbp of total contamination, or
  • Older assembly versions that have been cleaned and replaced by a newer version

Generated November 25, 2024