Summary of JSON, JSON Lines, and CSV/TSV tabular formats
Summary and usage guidance for metadata files included in NCBI data packages.
Summary of JSON, JSON Lines, and CSV/TSV tabular formats
Summary and usage guidance for metadata files included in NCBI data packages.
Key Attributes | JSON | JSON Lines | CSV | TSV | Simple Tabular | Frictionless Data |
---|---|---|---|---|---|---|
Full Name | JavaScript Object Notation | JavaScript Object Notation Lines, formerly Newline Delimited JavaScript Object Notation | Comma Separated Values | Tab Separated Values | None | Frictionless Data Package, Tabular Data Package |
Adoption and Support | Industry-wide | Prevalent in Data Science | Ubiquitous | Ubiquitous | Ubiquitous | Proposed, poor adoption |
Operating System Support (“out-of-the-box” experience) | Limited: Modern Unix distros may include jq | Limited: Modern Unix distros may include jq | Limited (view as text, edit with syntax-aware programmer’s text editor) | Supported (e.g. Unix Core Utilities) | Supported (e.g. Unix Core Utilities) | Unsupported |
Excel Support | Limited: Use advanced functions like Excel Power Query M Language | Limited: Use advanced functions like Excel Power Query M Language | Yes, but possible data corruption | Yes, but possible data corruption | Yes, but possible data corruption | No |
Major Use Cases | Lingua franca of API services | Data Science | Tabular data containing text with special characters: literal whitespace (tabs, newlines) and punctuation | Numerical analysis (simple tabular data without overhead of quoting or escaping), tabular data with literal punctuation (commas, double-quotes) but no literal tabs | Unix text processing | Proposal discussions of data interchange among advocates of FAIR data practices |
Standards Compliance | High (non-compliance in corner cases): Parsing JSON is a Minefield | High | Low (files in the wild often violate formal specifications); CSV/TSV often confused | Low (files in the wild often violate formal specifications); CSV/TSV often confused | None (no standards); often confused with CSV/TSV | High |
Formal Specifications | RFC8259 , json.org | jsonlines.org , ndjson.org , ndjson-spec | RFC4180 , RFC7111 | Unofficial | None | Frictionless Standards : Tabular Data Package , CSV Dialect |
IANA Media Type | application/json | application/x-ndjson (unregistered) | text/csv | text/tab-separated-values | None | Various (multiple files) |
Filename Extension | *.json | *.jsonl (recommended), *.ndjson (historical) | *.csv | *.tsv , *.txt | *.txt | Various (multiple files) |
Extensibility and Format Versioning | None (no way to identify files conforming to older versus more recent revisions of formal specifications) | None (no way to identify files conforming to older versus more recent revisions of formal specifications) | None | None | None | Extensibility mechanisms; optional format versioning (metadata may specify profiles and schemas) |
Format Identification (presence of file signatures or magic numbers) | None | None | None | None | None | None |
Schema Languages | JSON Schema | None (every line is JSON which may have its own schema) | CSV Schema (draft; no adoption) | None | None | Table Schema |
Supported Data Types | Nested object, array, string, number, boolean, null | Nested object, array, string, number, boolean, null | string | string | string | Various Types and Formats : string, number, integer, boolean, date, time, duration, etc. |
Supports Binary Data | No; binary data should be BASE64 encoded | No; binary data should be BASE64 encoded | Yes, formally; unreliable in practice | No | No | Varies (depends on choice of formats) |
Schema Support | Yes: Typed data, optional schema, embedded or linked to an external definition | None (every line is JSON which may have its own schema) | Limited: Column names, embedded | None | None | Yes |
Data Model | Structured (hierarchical) data | Tabular (conventionally); list of structured data (formally supported by specification; tools may lack support) | Tabular | Tabular | Tabular | Relational (multiple tables) |
Human Readable | Partial | Partial | High | Very High | Varies | Partial to High (depends on choice of formats) |
Support for File Metadata (non-data) | No | No | No | No | Varies | Yes |
Support for Comments (non-data) | No | No | No | No | Varies | Yes, optional |
Compactness | Verbose | Verbose | Compact | Compact | Compact | Varies (depends on choice of formats) |
Support for Streaming (appendable format) | No | Yes | Yes | Yes | Yes | No (multiple files, even if some support append mode updates) |
Resilience and Security | Partial: Well-formed constraints, start/stop tags guard against premature truncation; no checksums | Poor: No guard against premature truncation | Poor: No guard against premature truncation, weak standards compliance, frequent errors and corruption when parsing data | Poor: No guard against premature truncation, weak standards compliance, frequent errors and corruption when parsing data | Poor: No guard against premature truncation, weak standards compliance, frequent errors and corruption when parsing data | Optional checksums |
Performance | Slow to Very Fast (highly optimized modern parsing libraries) | Slow to Very Fast (highly optimized modern parsing libraries) | Fast | Very Fast (simple syntax: no quote or escape processing) | Varies | Varies |