Despite the categorization, CSV is not a standard. It is a collection of ad-hoc formats for tabular and jagged array data.

> CSV is a very old, very simple and very common “standard” for (tabular) data. We say “standard” in quotes because there was never a formal standard for CSV
> \- https://frictionlessdata.io/blog/2018/07/09/csv/

# Notability

In 2013 or so I had to write a custom CSV parser because my team was being sent a massive, absolutely garbage file with poor escaping and multiple embedded encodings. With that parser I managed to ingest around three nines (99.9%) of the data, where the parsers we had tried before would die or simply refuse.

CSV is really appealing because it seems so simple, yet as soon as you try to do anything with it, things become messy and complicated.

# Platform Support

Everything and nothing.

# Features

Simple yet complicated.

## Missing Features

- CSV lacks any way to specify type information: there is no standardized way to distinguish “1” the string from 1 the number (see the sketch at the end of this page).
- No support for relationships between different “tables”.
- CSV is really only for tabular data – it is not so good for nested data or data whose structure is not especially tabular.

# Tips

# References

- https://frictionlessdata.io/blog/2018/07/09/csv/
- https://datatracker.ietf.org/doc/html/rfc4180

## Schemas

- [CSV Dialect Description Format](https://specs.frictionlessdata.io/csv-dialect/) - Schema sidecar files in JSON, lighter weight than CSVW
- [CSVW](https://csvw.org/) - Sidecar JSON, but in a *very* XML style
- [Table Schema](https://frictionlessdata.io/table-schema/) - Provides a very simple way to describe your schema externally, including data types and relationships
- [Linked CSV](http://jenit.github.io/linked-csv/) - Embeds schema information into the CSV file

## Alternatives

- [[WSV]]
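
The first point under Missing Features is easy to demonstrate. Here is a minimal Python sketch, using the standard `csv` module with made-up column names and in-memory data: every field comes back as a string, and any typing has to be bolted on by the consumer with an ad-hoc column map, which is exactly the gap sidecar schemas like Table Schema try to fill.

```python
import csv
import io

# Tiny in-memory CSV; the columns (name, age, score) are hypothetical.
raw = "name,age,score\nAda,36,91.5\nGrace,45,88\n"

with io.StringIO(raw) as f:
    rows = list(csv.DictReader(f))

# CSV itself carries no type information, so every field is a string.
print(rows[0])                # {'name': 'Ada', 'age': '36', 'score': '91.5'}
print(type(rows[0]["age"]))   # <class 'str'>

# The consumer has to supply its own column-to-type mapping.
COLUMN_TYPES = {"name": str, "age": int, "score": float}

typed = [
    {col: COLUMN_TYPES[col](value) for col, value in row.items()}
    for row in rows
]
print(typed[0])               # {'name': 'Ada', 'age': 36, 'score': 91.5}
```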