MINiML (MIAME Notation in Markup Language)
What is MINiML?
MINiML (MIAME Notation in Markup Language, pronounced 'minimal') is a data exchange format optimized for microarray gene expression data, as well as many other types of high-throughput molecular abundance data. MINiML assumes only very basic relations between objects: Platform (e.g., array), Sample (e.g., hybridization), and Series (experiment). MINiML captures all components of the MIAME checklist, as well as any additional information that the submitter wants to provide. Its syntax is defined by an XML Schema.
The MINiML XML Schema definition is available.
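To make the three object types concrete, here is a minimal sketch (Python standard library only) that builds a bare MINiML tree with one Platform, one Sample and one Series and links them. The namespace URI, the use of a `ref` attribute on the `*-Ref` elements, and the identifiers `GPL1`/`GSM1`/`GSE1` are assumptions for illustration; check them against the published XML Schema before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; confirm it against the published MINiML XML Schema.
MINIML_NS = "http://www.ncbi.nlm.nih.gov/geo/info/MINiML"
ET.register_namespace("", MINIML_NS)  # serialize MINiML tags without a prefix


def q(tag: str) -> str:
    """Qualify a tag name with the MINiML namespace."""
    return f"{{{MINIML_NS}}}{tag}"


def build_skeleton() -> ET.Element:
    """Build a bare tree: one Platform, one Sample run on it, one Series grouping it.

    The iid values and the 'ref' attribute on *-Ref elements are illustrative.
    """
    root = ET.Element(q("MINiML"))

    platform = ET.SubElement(root, q("Platform"), iid="GPL1")
    ET.SubElement(platform, q("Title")).text = "FHCRC Mouse 15K v1.0"

    sample = ET.SubElement(root, q("Sample"), iid="GSM1")
    ET.SubElement(sample, q("Title")).text = "Muscle_exercised_60min_rep2"
    ET.SubElement(sample, q("Platform-Ref"), ref="GPL1")  # Sample -> Platform

    series = ET.SubElement(root, q("Series"), iid="GSE1")
    ET.SubElement(series, q("Title")).text = "Example study"
    ET.SubElement(series, q("Sample-Ref"), ref="GSM1")  # Series -> Sample
    return root


if __name__ == "__main__":
    print(ET.tostring(build_skeleton(), encoding="unicode"))
```

A real submission fills in the remaining required elements from the tables below; the skeleton only shows how the three objects point at each other.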
Why another data exchange format?
GEO has been using SOFT (Simple Omnibus Format in Text) as a data exchange format. An advantage of SOFT is its simplicity, which makes it easy to parse and generate with virtually any text-processing language. However, mature tools now exist for working with XML programmatically, and XML provides better document structure, formal syntax definitions, and data rendering. MINiML is effectively an XML rendering of SOFT.
GEO fully supports both SOFT and MINiML.
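Because MINiML mirrors SOFT, a SOFT block can be rendered as XML fairly mechanically. The sketch below handles only the `^SAMPLE` entity line and two `!Sample_...` attribute lines; the label-to-element table `SOFT_TO_MINIML` is a hypothetical, deliberately incomplete mapping, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny mapping from SOFT attribute labels to
# MINiML element names; the real correspondence covers many more fields.
SOFT_TO_MINIML = {
    "Sample_title": "Title",
    "Sample_description": "Description",
}


def soft_sample_to_xml(soft_text: str) -> ET.Element:
    """Render one SOFT Sample block as a (namespace-free) MINiML-style element."""
    sample = None
    for raw in soft_text.splitlines():
        line = raw.strip()
        if not line or " = " not in line:
            continue  # skip blank lines and anything that is not 'label = value'
        label, value = line.split(" = ", 1)
        if label == "^SAMPLE":
            # Entity line: its value becomes the Sample's local identifier.
            sample = ET.Element("Sample", iid=value)
        elif label.startswith("!") and sample is not None:
            tag = SOFT_TO_MINIML.get(label[1:])
            if tag is not None:  # unmapped attributes are silently ignored
                ET.SubElement(sample, tag).text = value
    if sample is None:
        raise ValueError("no ^SAMPLE line found")
    return sample


if __name__ == "__main__":
    soft = "^SAMPLE = GSM1\n!Sample_title = Muscle_exercised_60min_rep2\n"
    print(ET.tostring(soft_sample_to_xml(soft), encoding="unicode"))
```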
MINiML Elements and Content Guidelines
The table below provides content guidelines and constraints for most MINiML elements; it is not exhaustive.
Platform elements
Element name | Number of allowed labels | Allowed values and constraints | Content Guidelines |
---|---|---|---|
Title | required | string of length 1-120 characters, must be unique within local file and over all previously submitted Platforms for that submitter | Provide a unique title that describes your Platform. We suggest that you use the system '[institution/lab][species][number of features][version]', e.g., "FHCRC Mouse 15K v1.0". |
Distribution | required | commercial, non-commercial, custom-commercial, or virtual | Microarrays are 'commercial', 'non-commercial', or 'custom-commercial' according to how the array was manufactured. Use 'virtual' only when creating a virtual definition for MS, MPSS, SARST, or RT-PCR data. |
Technology | required | spotted DNA/cDNA, spotted oligonucleotide, in situ oligonucleotide, antibody, tissue, SARST, RT-PCR, MS, or MPSS | Select the category that best describes the Platform technology. |
Organism | required and unbounded | use standard NCBI Taxonomy nomenclature | Identify the organism(s) from which the features on the Platform were designed or derived. |
Manufacturer | required | any | Provide the name of the company, facility or laboratory where the array was manufactured or produced. |
Manufacture-Protocol | required | any | Describe the array manufacture protocol. Include as much detail as possible, e.g., clone/primer set identification and preparation, strandedness/length, arrayer hardware/software, spotting protocols. Please provide complete protocol descriptions within your submission. |
Catalog-Number | optional | any | Provide the manufacturer catalog number for commercially-available arrays. |
Web-Link | optional and unbounded | valid URL | Specify a Web link that directs users to supplementary information about the array. Please restrict to Web sites that you know are stable. |
Support | optional | any | Provide the surface type of the array, e.g., glass, nitrocellulose, nylon, silicon, unknown. |
Coating | optional | any | Provide the coating of the array, e.g., aminosilane, quartz, polysine, unknown. |
Description | optional | any | Provide any additional descriptive information not captured in another field, e.g., array and/or feature physical dimensions, element grid system. |
Contributor-Ref | optional and unbounded | | List all people associated with this array design. |
Pubmed-ID | optional and unbounded | an integer | Specify a valid PubMed identifier (PMID) of a published article that describes the array. |
Data-Table | required | a plain text (ASCII) tab-delimited table | Data-Tables can be supplied either within the MINiML file (Internal-Data) or as external files (External-Data). External-Data files should be zipped or tarred together with the MINiML file at the time of submission. A full description of Platform data tables, required columns, content and restrictions is provided in the Platform data table guidelines. One difference from those guidelines: data tables in MINiML files have no header row; table columns are defined by position. |
Supplementary-Data | optional and unbounded | a link or path to supplementary data | Examples of Platform supplementary data include original GAL and CSV files. Supplementary files can be zipped or tarred together with the MINiML file at the time of submission. |
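The controlled vocabularies and the length limit in the Platform table lend themselves to a simple pre-submission check. This is a minimal sketch assuming the Platform's fields have already been read into a Python dict keyed by element name; `check_platform` is a hypothetical helper, not a GEO tool, and it cannot check title uniqueness against earlier submissions.

```python
from typing import Dict, List

# Controlled vocabularies copied from the Platform table.
DISTRIBUTIONS = {"commercial", "non-commercial", "custom-commercial", "virtual"}
TECHNOLOGIES = {
    "spotted DNA/cDNA", "spotted oligonucleotide", "in situ oligonucleotide",
    "antibody", "tissue", "SARST", "RT-PCR", "MS", "MPSS",
}
VIRTUAL_TECHNOLOGIES = {"MS", "MPSS", "SARST", "RT-PCR"}
REQUIRED = [
    "Title", "Distribution", "Technology", "Organism",
    "Manufacturer", "Manufacture-Protocol", "Data-Table",
]


def check_platform(fields: Dict[str, object]) -> List[str]:
    """Return human-readable problems for one Platform (empty list if none found)."""
    # Required elements must be present and non-empty (Organism may be a list).
    problems = [f"missing required element {name}"
                for name in REQUIRED if not fields.get(name)]

    title = fields.get("Title") or ""
    if len(title) > 120:
        problems.append("Title must be 1-120 characters")

    dist = fields.get("Distribution")
    tech = fields.get("Technology")
    if dist and dist not in DISTRIBUTIONS:
        problems.append(f"unknown Distribution {dist!r}")
    if tech and tech not in TECHNOLOGIES:
        problems.append(f"unknown Technology {tech!r}")
    # 'virtual' is reserved for MS, MPSS, SARST or RT-PCR definitions.
    if dist == "virtual" and tech and tech not in VIRTUAL_TECHNOLOGIES:
        problems.append("'virtual' Distribution is only for MS, MPSS, SARST or RT-PCR")
    return problems
```

Returning a list of messages instead of raising on the first error lets a submitter fix every problem in one pass.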
Sample elements
Element name | Number of allowed labels | Allowed values and constraints | Content Guidelines |
---|---|---|---|
Title | required | string of length 1-120 characters, must be unique within local file and over all previously submitted Samples for that submitter | Provide a unique title that describes this Sample. We suggest that you use the system [biomaterial]-[condition(s)]-[replicate number], e.g., Muscle_exercised_60min_rep2. |
Channel-Count | required | an integer | State the number of channels in the experiment, e.g., two-color hybridizations are typically 2-channel, while Affymetrix hybridizations are typically 1-channel. |
Source | required per channel | any | Briefly identify the biological material and the experimental variable(s) for this Sample, e.g., vastus lateralis muscle, exercised, 60 min. |
Organism | required and unbounded per channel | use standard NCBI Taxonomy nomenclature | Identify the organism(s) from which the biological material was derived. |
Characteristics | required per channel | any | List all available characteristics of the biological source, e.g., Strain: C57BL/6; Gender: female; Age: 45 days; Tissue: bladder tumor; Tumor stage: Ta. |
Biomaterial-Provider | optional per channel | any | Specify the name of the company, laboratory or person that provided the biological material. |
Treatment-Protocol | optional per channel | any | Describe any treatments applied to the biological material prior to extract preparation. Please provide complete protocol descriptions within your submission. |
Growth-Protocol | optional per channel | any | Describe the conditions that were used to grow or maintain organisms or cells prior to extract preparation. Please provide complete protocol descriptions within your submission. |
Molecule | required per channel | total RNA, polyA RNA, cytoplasmic RNA, nuclear RNA, genomic DNA, protein, or other | Specify the type of molecule that was extracted from the biological material. |
Extract-Protocol | optional per channel | any | Describe the protocol used to isolate the extract material. Please provide complete protocol descriptions within your submission. |
Label | required per channel | any | Specify the compound used to label the extract, e.g., biotin, Cy3, Cy5, 33P. |
Label-Protocol | optional per channel | any | Describe the protocol used to label the extract. Please provide complete protocol descriptions within your submission. |
Hybridization-Protocol | optional | any | Describe the protocols used for hybridization, blocking and washing, and any post-processing steps such as staining. Please provide complete protocol descriptions within your submission. |
Scan-Protocol | optional | any | Describe the scanning and image acquisition protocols, hardware, and software. Please provide complete protocol descriptions within your submission. |
Data-Processing | required | any | Provide details of how data in the VALUE column of your table were generated and calculated, i.e., normalization method, data selection procedures and parameters, transformation algorithm and scaling parameters (e.g., MAS5.0, scaled to 100). |
Description | required | any | Include any additional information not provided in the other fields, or paste in broad descriptions that cannot be easily dissected into the other fields. |
Platform-Ref | required | a valid Platform identifier | Reference the Platform iid upon which this hybridization was performed. |
Data-Table | required | a plain text (ASCII) tab-delimited table | Data-Tables can be supplied either within the MINiML file (Internal-Data) or as external files (External-Data). External-Data files should be zipped or tarred together with the MINiML file at the time of submission. Note that data tables in MINiML files have no header row; table columns are defined by position. |
Supplementary-Data | required | a reference to supplementary data, or type="none" | Examples of Sample supplementary data include original GPR, CEL, EXP, RPT, CAB, and TIFF files. Supplementary files should be zipped or tarred together with the MINiML file at the time of submission. Providing supplementary raw data files enables unambiguous interpretation of the data and potential verification of conclusions, as set forth in the MIAME guidelines. |
Anchor | required for SAGE Samples | NlaIII or Sau3A | Supply for SAGE submissions only. State the enzyme anchor. |
Type | required for SAGE Samples | RNA, genomic, protein, SAGE, MPSS, SARST, mixed | Supply for SAGE submissions only (this field is derived automatically for other Sample types using the Molecule field). |
Tag-Count | required for SAGE Samples | an integer | Supply for SAGE submissions only. State the total number of tags quantified in this Sample. |
Tag-Length | required for SAGE Samples | an integer | Supply for SAGE submissions only. State the base pair length of the SAGE tags, excluding anchor sequence. |
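Because MINiML data tables carry no header row, a reader has to attach column names by position. The sketch below assumes, for illustration, that columns are declared as `<Column position="N"><Name>...</Name></Column>` alongside an `<Internal-Data>` block of tab-delimited rows; namespaces are omitted, and both the element layout and the example column names should be checked against the XML Schema and the data table guidelines.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def read_internal_data(data_table: ET.Element) -> List[Dict[str, str]]:
    """Turn a headerless Internal-Data block into row dicts keyed by column name."""
    # Column names come from declared positions, never from a header row.
    columns = sorted(data_table.findall("Column"), key=lambda c: int(c.get("position")))
    names = [c.findtext("Name", default="") for c in columns]

    rows: List[Dict[str, str]] = []
    text = data_table.findtext("Internal-Data", default="")
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines around the data block
        cells = line.split("\t")
        if len(cells) != len(names):
            raise ValueError(f"expected {len(names)} cells, got {len(cells)}: {line!r}")
        rows.append(dict(zip(names, cells)))
    return rows


# Hypothetical two-column Sample table; columns are declared out of order on purpose.
EXAMPLE = (
    "<Data-Table>"
    '<Column position="2"><Name>VALUE</Name></Column>'
    '<Column position="1"><Name>ID_REF</Name></Column>'
    "<Internal-Data>\n1\t10.5\n2\t3.2\n</Internal-Data>"
    "</Data-Table>"
)

if __name__ == "__main__":
    for row in read_internal_data(ET.fromstring(EXAMPLE)):
        print(row)
```

Checking the cell count of every row catches the typical failure of position-based tables: a missing or extra tab that silently shifts every later column.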
Series elements
Element name | Number of allowed labels | Allowed values and constraints | Content Guidelines |
---|---|---|---|
Title | required | string of length 1-120 characters, must be unique within local file and over all previously submitted Series for that submitter | Provide a unique title that describes the overall study. |
Summary | required | any | Summarize the goals and objectives of this study. The abstract from the associated publication may be suitable. |
Type | required | any | Enter keyword(s) that generally describe the type of study. Examples include: time course, dose response, comparative genomic hybridization, ChIP-chip, cell type comparison, disease state analysis, stress response, genetic modification, etc. |
Overall-Design | required | any | Provide a brief description of the experimental design. Indicate how many Samples are analyzed, whether replicates are included, and whether there are control and/or reference Samples, dye-swaps, etc. |
Pubmed-ID | optional and unbounded | an integer | Specify a valid PubMed identifier (PMID) that references a published article describing this study. This information is usually not available at the time of submission; it can be added later, once the data are published. |
Web-Link | optional and unbounded | valid URL | Specify a Web link that directs users to supplementary information about the study. Please restrict to Web sites that you know are stable. |
Contributor-Ref | optional and unbounded | | List all people associated with this study. |
Sample-Ref | required and unbounded | valid Sample identifiers | Reference the iids of the Samples that make up this experiment. |
Variable (contains Factor, Description, Sample-Ref) | optional and unbounded | Allowed 'Factors' include: dose, time, tissue, strain, gender, cell line, development stage, age, agent, cell type, infection, isolate, metabolism, shock, stress, temperature, specimen, disease state, protocol, growth protocol, genotype/variation, species, individual, or other | Indicate and describe the variable type(s) investigated in this study. NOTE: this information does not appear in Series records or downloads, but will be used to assemble corresponding GEO DataSet records. |
Repeats (contains Factor, Sample-Ref) | optional and unbounded | Allowed 'Factors' include: biological replicate, technical replicate - extract, technical replicate - labeled-extract | Indicate the repeat type(s). NOTE: this information does not appear in Series records or downloads, but will be used to assemble corresponding GEO DataSet records. |
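Series-level references and factor vocabularies can be checked the same way. In this minimal sketch, `check_series` is a hypothetical helper that takes the iids of the Samples in the file plus the Series' Sample-Ref targets and Factor values; the allowed factor lists are copied from the Variable and Repeats rows, and treating a Sample-Ref as valid only when it matches a Sample iid in the same file is an assumption.

```python
from typing import Iterable, List

# Allowed 'Factor' values, copied from the Variable and Repeats rows.
VARIABLE_FACTORS = {
    "dose", "time", "tissue", "strain", "gender", "cell line",
    "development stage", "age", "agent", "cell type", "infection", "isolate",
    "metabolism", "shock", "stress", "temperature", "specimen",
    "disease state", "protocol", "growth protocol", "genotype/variation",
    "species", "individual", "other",
}
REPEAT_FACTORS = {
    "biological replicate",
    "technical replicate - extract",
    "technical replicate - labeled-extract",
}


def check_series(sample_iids: Iterable[str], sample_refs: Iterable[str],
                 variable_factors: Iterable[str] = (),
                 repeat_factors: Iterable[str] = ()) -> List[str]:
    """Return problems with one Series' references and factors (empty if none)."""
    known = set(sample_iids)
    refs = list(sample_refs)
    problems: List[str] = []
    # Sample-Ref is required, so an empty Series is an error.
    if not refs:
        problems.append("Series must reference at least one Sample")
    # Assumption: a Sample-Ref is valid only if it names a Sample iid in this file.
    problems += [f"Sample-Ref {r!r} does not match any Sample iid"
                 for r in refs if r not in known]
    problems += [f"unsupported Variable factor {f!r}"
                 for f in variable_factors if f not in VARIABLE_FACTORS]
    problems += [f"unsupported Repeats factor {f!r}"
                 for f in repeat_factors if f not in REPEAT_FACTORS]
    return problems


if __name__ == "__main__":
    print(check_series(["GSM1", "GSM2"], ["GSM1", "GSM3"], variable_factors=["time", "season"]))
```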