Summary of JSON, JSON Lines, and CSV/TSV tabular formats

Summary and usage guidance for metadata files included in NCBI data packages.

The following table summarizes key attributes of several generic file formats that NCBI Datasets uses or that informed our choices. For practical guidance and examples of using these file formats, see Working with data reports and Tools for JSON and JSON Lines, as well as the explanation in Why JSON and JSON Lines.

| Key Attributes | JSON | JSON Lines | CSV | TSV | Simple Tabular | Frictionless Data |
| --- | --- | --- | --- | --- | --- | --- |
| Full Name | JavaScript Object Notation | JavaScript Object Notation Lines, formerly Newline Delimited JavaScript Object Notation | Comma Separated Values | Tab Separated Values | None | Frictionless Data Package, Tabular Data Package |
| Adoption and Support | Industry-wide | Prevalent in Data Science | Ubiquitous | Ubiquitous | Ubiquitous | Proposed, poor adoption |
| Operating System Support (“out-of-the-box” experience) | Limited: Modern Unix distros may include jq | Limited: Modern Unix distros may include jq | Limited (view as text, edit with syntax-aware programmer’s text editor) | Supported (e.g. Unix Core Utilities) | Supported (e.g. Unix Core Utilities) | Unsupported |
| Excel Support | Limited: Use advanced functions like Excel Power Query M Language | Limited: Use advanced functions like Excel Power Query M Language | Yes, but possible data corruption | Yes, but possible data corruption | Yes, but possible data corruption | No |
| Major Use Cases | Lingua franca of API services | Data Science | Tabular data containing text with special characters: literal whitespace (tabs, newlines) and punctuation | Numerical analysis (simple tabular data without overhead of quoting or escaping), tabular data with literal punctuation (commas, double-quotes) but no literal tabs | Unix text processing | Proposal discussions of data interchange among advocates of FAIR data practices |
| Standards Compliance | High (non-compliance in corner cases): Parsing JSON is a Minefield | High | Low (files in the wild often violate formal specifications); CSV/TSV often confused | Low (files in the wild often violate formal specifications); CSV/TSV often confused | None (no standards); often confused with CSV/TSV | High |
| Formal Specifications | RFC8259, json.org | jsonlines.org, ndjson.org, ndjson-spec | RFC4180, RFC7111 | Unofficial | None | Frictionless Standards: Tabular Data Package, CSV Dialect |
| IANA Media Type | application/json | application/x-ndjson (unregistered) | text/csv | text/tab-separated-values | None | Various (multiple files) |
| Filename Extension | *.json | *.jsonl (recommended), *.ndjson (historical) | *.csv | *.tsv, *.txt | *.txt | Various (multiple files) |
| Extensibility and Format Versioning | None (no way to identify files conforming to older versus more recent revisions of formal specifications) | None (no way to identify files conforming to older versus more recent revisions of formal specifications) | None | None | None | Extensibility mechanisms; optional format versioning (metadata may specify profiles and schemas) |
| Format Identification (presence of file signatures or magic numbers) | None | None | None | None | None | None |
| Schema Languages | JSON Schema | None (every line is JSON which may have its own schema) | CSV Schema (draft; no adoption) | None | None | Table Schema |
| Supported Data Types | Nested object, array, string, number, boolean, null | Nested object, array, string, number, boolean, null | string | string | string | Various Types and Formats: string, number, integer, boolean, date, time, duration, etc. |
| Supports Binary Data | No; binary data should be BASE64 encoded | No; binary data should be BASE64 encoded | Yes, formally; unreliable in practice | No | No | Varies (depends on choice of formats) |
| Schema Support | Yes: Typed data, optional schema, embedded or linked to an external definition | None (every line is JSON which may have its own schema) | Limited: Column names, embedded | None | None | Yes |
| Data Model | Structured (hierarchical) data | Tabular (conventionally); list of structured data (formally supported by specification; tools may lack support) | Tabular | Tabular | Tabular | Relational (multiple tables) |
| Human Readable | Partial | Partial | High | Very High | Varies | Partial to High (depends on choice of formats) |
| Support for File Metadata (non-data) | No | No | No | No | Varies | Yes |
| Support for Comments (non-data) | No | No | No | No | Varies | Yes, optional |
| Compactness | Verbose | Verbose | Compact | Compact | Compact | Varies (depends on choice of formats) |
| Support for Streaming (appendable format) | No | Yes | Yes | Yes | Yes | No (multiple files, even if some support append mode updates) |
| Resilience and Security | Partial: Well-formed constraints, start/stop tags guard against premature truncation; no checksums | Poor: No guard against premature truncation | Poor: No guard against premature truncation, weak standards compliance, frequent errors and corruption when parsing data | Poor: No guard against premature truncation, weak standards compliance, frequent errors and corruption when parsing data | Poor: No guard against premature truncation, weak standards compliance, frequent errors and corruption when parsing data | Optional checksums |
| Performance | Slow to Very Fast (highly optimized modern parsing libraries) | Slow to Very Fast (highly optimized modern parsing libraries) | Fast | Very Fast (simple syntax: no quote or escape processing) | Varies | Varies |

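
The streaming and resilience rows of the table can be illustrated with a minimal Python sketch that uses only the standard library. It is an illustration under stated assumptions, not NCBI tooling. The records, the field names (`accession`, `length`), and the helper functions are hypothetical and do not reflect any actual NCBI Datasets schema. The sketch shows that JSON Lines can be appended to and read one record at a time, that a JSON Lines file cut off at a line boundary still parses without complaint, and that a truncated JSON document fails to parse.

```python
import io
import json

# Hypothetical records; the field names are illustrative only and are not
# taken from any NCBI Datasets schema.
records = [
    {"accession": "EXAMPLE_1", "length": 100},
    {"accession": "EXAMPLE_2", "length": 250},
    {"accession": "EXAMPLE_3", "length": 75},
]


def write_jsonl(stream, rows):
    """Write each row as one JSON document per line (JSON Lines)."""
    for row in rows:
        stream.write(json.dumps(row) + "\n")


def read_jsonl(stream):
    """Yield one parsed record per non-empty line, one line at a time."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def json_truncation_detected(text):
    """Return True if the text fails to parse as a single JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False


# JSON Lines is appendable: a second write extends the data without
# rewriting what is already there.
buffer = io.StringIO()
write_jsonl(buffer, records[:2])
write_jsonl(buffer, records[2:])
jsonl_text = buffer.getvalue()
parsed = list(read_jsonl(io.StringIO(jsonl_text)))

# Truncation at a line boundary goes unnoticed: the reader simply
# returns fewer records and raises no error.
truncated_jsonl = "".join(jsonl_text.splitlines(keepends=True)[:2])
survivors = list(read_jsonl(io.StringIO(truncated_jsonl)))

# A single JSON document must be well formed, so cutting it short
# removes the closing bracket and the parser rejects it.
json_text = json.dumps(records)
truncated_json_detected = json_truncation_detected(json_text[: len(json_text) // 2])
```

In this sketch, `survivors` holds only the first two records with no warning, while the truncated JSON array is rejected outright. If silent truncation matters for a JSON Lines file, one option is to compare the record count or a checksum against a value published alongside the file, since the format itself carries neither.
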
Generated November 25, 2024