Querying GEO DataSets and GEO Profiles
Quick examples
The GEO DataSets database stores original submitter-supplied study descriptions, as well as curated gene expression DataSets. DataSets form the basis of GEO's advanced data display and analysis tools, including gene expression profile charts and clusters.
Search Examples:
Search by... | Search text |
---|---|
Free text | smoking cancer |
Keywords and species | (smok* OR diet) AND (mammals[organism] NOT human[organism]) |
Studies in the NIH Roadmap Epigenomics project | "roadmap epigenomics"[Project] |
Study type | "expression profiling by high throughput sequencing"[DataSet Type] |
Studies with between 100 and 500 samples | 100:500[Number of Samples] |
Studies with CEL files | "cel"[Supplementary Files] |
DataSets that have 'age' as an experimental variable | "age"[Subset Variable Type] |
Author | smith a[Author] |
Published between January and June 2007 | 2007/01:2007/06[Publication Date] |
Platform accession | GPL570 |
Studies with PubMed identifiers | "gds pubmed"[Filter] |
The GEO Profiles database stores individual gene expression profiles from curated DataSets. Search for profiles of interest based on gene annotation or pre-computed profile characteristics.
Search Examples:
Search by... | Search text |
---|---|
Free text | smoking P450 |
Gene symbol | CYP1A1[Gene Symbol] |
Gene symbols in DataSets that contain specific keywords | (CYP1A1[Gene Symbol] OR ME1[Gene Symbol]) AND (smok* OR diet) |
Partial gene name in a specific DataSet | kinase[Gene Description] AND GDS182 |
GenBank accession | NM_014033 |
Gene Ontology(GO) term in a specific DataSet | apoptosis[Gene Ontology] AND GDS182 |
Chromosome region and species | (8[Chromosome] AND 10000:3000000[Base Position]) AND mouse[organism] |
Genes that show subset effects in DataSets that examine the effect of an agent | agent[Flag Information] AND "value subset effect"[Flag Type] |
Platform accession | GPL570 |
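
The search texts listed in the tables above can also be submitted programmatically. The following is a minimal sketch, not an official client: it assumes the NCBI E-utilities ESearch endpoint and the Entrez database names gds (GEO DataSets) and geoprofiles (GEO Profiles). The helper name esearch_url and the retmax default are illustrative choices. The code only builds the request URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint: the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL for an Entrez database and a search text.

    Hypothetical helper: db is assumed to be 'gds' (GEO DataSets) or
    'geoprofiles' (GEO Profiles); term is any search text from the tables.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-encodes quotes, brackets and spaces in the search text.
    return f"{ESEARCH}?{urlencode(params)}"


datasets_url = esearch_url("gds", '"roadmap epigenomics"[Project]')
profiles_url = esearch_url("geoprofiles", "kinase[Gene Description] AND GDS182")

print(datasets_url)
print(profiles_url)
```

Building the URL separately from sending it keeps the search text easy to inspect before any request is made.
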
How to construct queries
GEO DataSets and GEO Profiles are part of NCBI's network of Entrez databases. As with these other databases, data of interest may be located simply by entering keywords into the GEO DataSets or GEO Profiles search boxes. The Advanced Search and Limits pages, linked at the head of the GEO DataSets and GEO Profiles pages, assist greatly in the construction of complex queries.
To construct a complex query, specify the search terms, their fields, and the Boolean operations to perform on the terms using the following syntax:
term [field] OPERATOR term [field]
where term is a search term, field is the search field to look in, and OPERATOR is a Boolean operator (AND, OR, or NOT, which must be capitalized).
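
The syntax above can be assembled mechanically. The sketch below is one possible way to do so in Python; the helper names field_term and combine are hypothetical, and only the term[field] OPERATOR term[field] pattern comes from this page.

```python
# Hypothetical helpers; only the query syntax itself comes from the text above.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def field_term(term, field=None):
    """Return 'term[field]', or the bare term when no field is given."""
    return f"{term}[{field}]" if field else term


def combine(operator, *clauses):
    """Join clauses with one capitalized Boolean operator, inside parentheses."""
    op = operator.upper()  # operators must be capitalized in the final query
    if op not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return "(" + f" {op} ".join(clauses) + ")"


query = combine(
    "AND",
    combine("OR", field_term("smok*"), field_term("diet")),
    field_term("human", "organism"),
)
print(query)  # ((smok* OR diet) AND human[organism])
```

Wrapping every combined clause in parentheses is a deliberate choice in this sketch: it makes the grouping explicit and avoids relying on the order in which the operators are processed.
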
Additional query construction notes and features are provided in the following table:
Notes and features | Example |
---|---|
Complete listings and descriptions of all supported fields are provided in the tables below. | A search example for each field is provided within the tables. |
Fields may be specified either by their full name or an alias. Full names and aliases are listed in the tables below. | gds[Entry Type] and gds[ETYP] perform the same search |
Some fields have a fixed list of allowed search terms, others are free text. The tables below indicate which fields have fixed lists. Lists of allowed terms may be browsed on the Advanced Search page by selecting the relevant field from the drop-down menu and clicking 'Show Index'. | 'age' is a fixed term for the Subset Variable Type field age[Subset Variable Type] |
Use quotes to indicate a phrase. | salt stress retrieves studies that mention both salt and stress anywhere in the description, whereas "salt stress" retrieves studies where the words exist as a phrase |
Use parentheses to properly combine multiple search criteria. The terms inside the parentheses are processed as a unit and then incorporated into the overall search. | human[organism] AND (smok* OR diet) specifically retrieves human studies that mention either smoking or diet, whereas human[organism] AND smok* OR diet also returns all studies that mention diet, regardless of organism |
Use an asterisk to expand your search with a wildcard. Wildcards can be placed at the beginning or end of a text string, but not in the middle. | smok* will retrieve documents that contain words like smoke, smoking or smoker |
Use a colon to indicate a range. | 2007/01:2007/06[Publication Date] retrieves studies published between January and June 2007 |
Use the 'History' section at the foot of the Advanced Search pages to combine previous queries or find the intersection of multiple queries. Each query you have performed recently is assigned a specific number which can be included within the search statement. | #1 NOT #2; (#1 OR #2) AND human[organism] |
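
The effect of parentheses described in the table can be illustrated with ordinary set operations. This is a toy simulation only: the record IDs are made up, and it assumes that operators without parentheses are applied from left to right, which is what the human[organism] AND smok* OR diet example implies.

```python
# Toy record sets (made-up IDs) standing in for the hits of each search term.
human = {1, 2, 3}
smoking = {2, 5}
diet = {3, 6, 7}

# human[organism] AND (smok* OR diet): the parenthesized OR is resolved first.
grouped = human & (smoking | diet)

# human[organism] AND smok* OR diet: operators applied left to right,
# so the AND is resolved first and every diet record is then added.
ungrouped = (human & smoking) | diet

print(sorted(grouped))    # [2, 3]
print(sorted(ungrouped))  # [2, 3, 6, 7]
```

Records 6 and 7 mention diet but are not human studies, which is why they appear only in the ungrouped result.
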
Query fields and examples
GEO DataSets fields:
Field full name | Field aliases | Description | Search term values and rules | Example |
---|---|---|---|---|
All Fields | ALL, * | All terms from all searchable fields. Default field. | free text, wildcard (*) supported | Find any record that contains the word 'cancer' cancer[All fields] |
Author | AUTH, AU, AUTHOR NAME | Contributors or authors associated with the study | free text, wildcard (*) supported, author initials are optional | Find records authored by A Smith smith a[Author] |
DataSet Type | GTYP, gdsType | DataSet or Series type | fixed list, check Advanced Search page for list of indexed terms | Find all studies that examine gene expression by high throughput sequencing expression profiling by high throughput sequencing[DataSet Type] |
Description | DESC, DSC, DESCR | Text provided in the DataSet, Series or Sample description, summary and other metadata fields | free text, wildcard (*) supported | Find studies that contain smoking-related terms in their descriptions smok*[DESC] |
Entry Type | ETYP, entryType | Record type | fixed list, use gds (DataSet), gse (Series) or gpl (Platform) | Find only DataSet records gds[Entry Type] |
Filter | FILT, FLTR, SUBSET, SB, FIL | Filters for records that have links to other NCBI databases | fixed list, check Advanced Search page for list of indexed terms | Find records that have PubMed links gds pubmed[Filter] |
GEO Accession | ACCN, accession | GEO accession number | valid DataSet (GDS), Platform (GPL), Sample (GSM) or Series (GSE) accession | Find all studies performed on Platform GPL570 GPL570[GEO Accession] |
MeSH Terms | MESH, MH, SUBH, SH, Subheading | Medical Subject Headings (MeSH) terms | Medical Subject Headings (MeSH) terms, wildcard (*) supported | Find records that have MeSH term methylation methylation[MeSH Terms] |
Number of Platform Probes | NPRO, n_probes | Number of Platform probe IDs | integer, range function supported | Find Platforms that have over 1 million probes 1000000:100000000[Number of Platform Probes] |
Number of Samples | NSAM, n_samples | Number of Samples in the DataSet or Series | integer, range function supported | Find studies with between 100 and 500 samples 100:500[Number of Samples] |
Organism | ORGN, PORGN, primary organism | Name of the organism | NCBI taxonomy terms, wildcard (*) supported, all levels in the taxonomy lineage and common names are indexed | Find studies performed on mouse Mus musculus[Organism] |
Platform Technology Type | PTYP, ptechType | Platform type | fixed list, check Advanced Search page for list of indexed terms | Find all studies performed with next-generation sequencing technology high throughput sequencing[Platform Technology Type] |
Project | PROJ | Featured project data | fixed list, use roadmap epigenomics, encode, pilot encode, or modencode | Find studies in the NIH Roadmap Epigenomics project roadmap epigenomics[Project] |
Publication Date | PDAT, DP | Date on which record was released | format YYYY/MM, range function supported | Find studies published between January and June 2007 2007/01:2007/06[Publication Date] |
Related Platform | RGPL, relatedGPL | Retrieves the Platform(s) for a specified DataSet or Series | valid DataSet (GDS) or Series (GSE) accession | Find Platforms related to GSE22474 GSE22474[Related Platform] |
Related Series | RGSE, relatedGSE | Retrieves the Series for a specified DataSet or Platform | valid DataSet (GDS) or Platform (GPL) accession | Find Series related to GPL570 GPL570[Related Series] |
Reporter Identifier | GEID, seqacc, clone, orf, unigene, Gene Identifier | Name or identifier of Platform probe; pertains only to Platforms that have been subjected to re-annotation pipeline | free text, wildcard (*) supported | Find DataSets that include a probe corresponding to Arg1 Arg1[Reporter Identifier] |
Sample Source | SRC, source | The source of the biological material of the Sample; warning: submitter-supplied field, not curated | free text, wildcard (*) supported | Find studies with samples from brain brain[Sample Source] |
Sample Type | STYP, sampType | Sample type or molecule | fixed list, check Advanced Search page for list of indexed terms | Find studies that use protein samples protein[Sample Type] |
Sample Value Type | VTYP, valType | Sample value type; pertains only to curated DataSets | fixed list, check Advanced Search page for list of indexed terms | Find DataSets with log ratio sample values log ratio[Sample Value Type] |
Submitter Institute | INST, institute | Institute or organization as given in submitter account | free text | Find data submitted by the Broad Institute Broad Institute[Institute] |
Subset Description | SSDE, SSDESC | DataSet subset descriptions | free text, wildcard (*) supported | Find DataSets that include the term 'male' in subset description male[Subset Description] |
Subset Variable Type | SSTP, SSTYPE | Name of DataSet experimental variable | fixed list, check Advanced Search page for list of indexed terms | Find DataSets that have 'age' as an experimental variable age[Subset Variable Type] |
Supplementary Files | SFIL, SFILE, suppFile | Supplementary file type names | free text, wildcard (*) supported | Find studies that have Affymetrix CEL files cel[Supplementary Files] |
Tag Length | TAGL, taglength | SAGE or MPSS tag length in base pairs | integer | Find 10 base pair SAGE data 10[Tag Length] |
Title | TITL, TITLE, TI | Text from titles of DataSets, Series, Platforms, and Samples | free text, wildcard (*) supported | Find records where 'Affymetrix' appears in a title Affymetrix[Title] |
Update Date | UDAT | Date on which record was last updated | format YYYY/MM, range function supported | Find records updated during June 2010 2010/06[Update Date] |
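
Several fields in the table above accept a range written as low:high. The following sketch shows one way to format such ranges; the function names date_range and int_range are hypothetical, and the YYYY/MM format and the field names and aliases are taken from the table.

```python
def date_range(start, end, field="Publication Date"):
    """Format a YYYY/MM:YYYY/MM range; start and end are (year, month) tuples."""
    (y1, m1), (y2, m2) = start, end
    if (y1, m1) > (y2, m2):
        raise ValueError("start must not be later than end")
    return f"{y1:04d}/{m1:02d}:{y2:04d}/{m2:02d}[{field}]"


def int_range(low, high, field):
    """Format an integer low:high range for a numeric field."""
    if low > high:
        raise ValueError("low must not exceed high")
    return f"{low}:{high}[{field}]"


print(date_range((2007, 1), (2007, 6)))          # 2007/01:2007/06[Publication Date]
print(int_range(100, 500, "Number of Samples"))  # 100:500[Number of Samples]
print(int_range(100, 500, "NSAM"))               # 100:500[NSAM]
```

The last call uses the NSAM alias in place of the full field name; per the notes on aliases, both forms perform the same search.
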
GEO Profiles fields:
Field full name | Field aliases | Description | Search term values and rules | Example |
---|---|---|---|---|
All Fields | ALL, * | All terms from all searchable fields. Default field. | free text, wildcard (*) supported | Find P450 genes in DataSets that investigate smoking smok* AND P450 |
Annotation Type | ATYP, annot_type | Source of annotation | fixed list, use gene, nucleotide, unigene or protein | Find profiles with Gene-based annotation gene[Annotation Type] |
Base Position | CPOS, CPOSITION, CHRPOS | Base pair position on chromosome | integer, range function supported, must be used in conjunction with Chromosome field | Find profiles that lie between base positions 10000 to 3000000 on chromosome 8 in mouse (8[Chromosome] AND 10000:3000000[Base Position]) AND mouse[organism] |
Chromosome | CHR, CHROMOSOME, CH, CHROM | Chromosome number or name | chromosome number or name | Find profiles that lie between base positions 10000 to 3000000 on chromosome 8 in mouse (8[Chromosome] AND 10000:3000000[Base Position]) AND mouse[organism] |
DataSet Type | GTYP, gdsType | DataSet type | fixed list, check Advanced Search page for list of indexed terms | Find MPSS profiles expression profiling by mpss[DataSet Type] |
Filter | FILT, FLTR, SUBSET, SB, FIL | Filters for records that have links to other NCBI databases | fixed list, check Advanced Search Preview/Index page for list of indexed terms | Find profiles that have links to NCBI's Gene database geo gene[Filter] |
Flag Information | FINF, FLAG_INFO, NOTE | Profiles of specific subset types and for which a subset effect is found. GEO DataSets are partitioned into subsets that reflect experimental design. Profiles are flagged as having subset effects if they display differential expression across experimental variables. CAUTION: The subset effect scoring method is ad hoc, taking into account group medians, means, deviation inside the groups, penalties and arbitrary cutoff thresholds. This flag is simply an attempt to give potentially differentially-regulated genes higher visibility, and is not intended to provide an absolute determination of significance. | fixed list, check Advanced Search page for list of indexed terms | Find profiles that exhibit subset effects with respect to age or development stage age[Flag Information] OR development stage[Flag Information] |
Flag Type | FTYP, FLAG_TYPE | Profiles that exhibit specific types of subset effects. GEO DataSets are partitioned into subsets that reflect experimental design. Profiles are flagged as having subset effects if they display differential expression across experimental variables. CAUTION: The subset effect scoring method is ad hoc, taking into account group medians, means, deviation inside the groups, penalties and arbitrary cutoff thresholds. This flag is simply an attempt to give potentially differentially-regulated genes higher visibility, and is not intended to provide an absolute determination of significance. | fixed list, check Advanced Search for list of indexed terms | Find profiles that exhibit rank subset effects rank subset effect[Flag Type] |
GDS Text | GDST, GDStxt | Text from DataSet title and summary | free text, wildcard (*) supported | Find profiles for DataSets that investigate muscular dystrophy muscular dystrophy[GDS Text] |
GEO Accession | ACCN, accession | GEO accession number | valid DataSet (GDS), Platform (GPL), Sample (GSM) or Series (GSE) accession | Find profiles for Platform GPL570 GPL570[GEO Accession] |
GEO Description/Title Text | GEOT, TI, GEOtxt | Text provided in the DataSet or Series description, title and other metadata fields | free text, wildcard (*) supported | Find profiles from studies that examine aspirin aspirin[GEO Description/Title Text] |
GI | GI | Mapped GenBank Identifier | integer | Find profiles for GenBank Identifier 89145416 89145416[GI] |
Gene Description | GDSC, GEND, aliases, GENE, GeneDesc | Gene description and aliases from Gene, title from UniGene. | free text, wildcard (*) supported | Find kinase genes in GDS182 kinase[Gene Description] AND GDS182 |
Gene Ontology | GO | Gene Ontology terms | Gene Ontology (GO) terms, wildcard (*) supported | Find apoptosis genes in GDS182 apoptosis[Gene Ontology] AND GDS182 |
Gene Symbol | SYMB, GeneSymbol | Gene Symbol from Gene or UniGene | free text, wildcard (*) supported | Find CYP1A1 gene CYP1A1[Gene Symbol] |
ID_REF | ID, ID_REF | ID from GEO Platform, SAGE tag, Affy ProbeSet ID | free text, wildcard (*) supported | Find profiles for Affymetrix probeset ID 218973_at 218973_at[ID_REF] |
Max Value Rank | RMAX, RNKMX | The maximum value percentile rank for any Sample within DataSet | integer, 0-100, range function supported | Find profiles where the maximum rank percentile is in the 1st percentile (ie, genes with low expression) 1[Max Value Rank] |
Min Value Rank | RMIN, RNKMN | The minimum value percentile rank for any Sample within DataSet | integer, 0-100, range function supported | Find profiles where the minimum rank percentile is in the 100th percentile (ie, highly expressed genes) 100[Min Value Rank] |
Number of Samples | NSAM, n_samples | Number of Samples in the DataSet | integer, range function supported | Find profiles with between 100 and 200 samples 100:200[Number of Samples] |
Organism | ORGN | Name of the organism | NCBI taxonomy terms, wildcard (*) supported, all levels in the taxonomy lineage and common names are indexed | Find mouse profiles Mus musculus[Organism] |
Platform Reporter Type | RTYP, rep_type | Platform reporter type used for annotation | fixed list, check Advanced Search page for list of indexed terms | Find profiles where a CLONE ID is the basis for annotation Mus musculus[Organism] |
Ranked Standard Deviation | RSTD, RNSTD | Percentile rank of profile standard deviation compared to all other profiles in a DataSet | integer, 0-100, range function supported | Find profiles with a high level of standard deviation 100[Ranked Standard Deviation] |
Reporter Identifier | NAME, identifier, Gene Identifier | Name or identifier of Platform probe | free text, wildcard (*) supported | Find profiles that include a probe corresponding to Arg1 D00636[Reporter Identifier] |
Sample Source | SRC, source | The source of the biological material of the Sample; warning: submitter-supplied field, not curated | free text, wildcard (*) supported | Find profiles with samples from brain brain[Sample Source] |
Sample Value Type | VTYP, value_type | Sample value type | fixed list, check Advanced Search page for list of indexed terms | Find profiles with log ratio sample values log ratio[Sample Value Type] |
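
Because the Base Position field must be used together with the Chromosome field, it can help to generate both clauses at once. The sketch below does this for GEO Profiles queries; the function name region_query is hypothetical, and the output follows the chromosome region example given in the table above.

```python
def region_query(chromosome, start, end, organism):
    """Build a chromosome-region query for GEO Profiles.

    Hypothetical helper: pairs the Base Position range with its Chromosome
    clause, then restricts the result to one organism.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    region = f"({chromosome}[Chromosome] AND {start}:{end}[Base Position])"
    return f"{region} AND {organism}[organism]"


print(region_query(8, 10000, 3000000, "mouse"))
# (8[Chromosome] AND 10000:3000000[Base Position]) AND mouse[organism]
```
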