# SQL Test Data Generator
[[💡 App ideas]] [[golang]] [[Software Testing]]
**Github**: [GitHub - wesen/squeak: SQL Test Data Generator](https://github.com/wesen/squeak/tree/main)
**Youtrack**: [squeak - Youtrack](https://wesen.youtrack.cloud/projects/5f728a0e-c12b-4517-91fe-96c8acbb0477)
The idea is to build a small self-contained binary that can be used to quickly scaffold relatively complex [[SQL]] [[Relational Databases|database]] schemas and data ([[MySQL]], [[PostgreSQL]] or [[SQLite]]). This is useful for [[Unit tests]], and could potentially be extended to be used for [[Property Based Testing]].
> [!note] I want this library to [[Yak Shaving|shave a yak]], really. I want to have some reasonably complex test database to use for the tutorials I want to write for [[pgchameleon]].
> [!warning] 🧠🧠The writing below is all brainstorming and designing the application. Nothing is at the stage of being documentation yet. I am using the [[ZK - 2g1a - Design software by writing it out with words]] and [[ZK - 3d2 - Design systems using arrows and boxes]]
## General design
### 🧠Brainstorm
I want this to be a standard CLI application with
- (viper +) cobra command parsing
- should I have viper use the given file as a config file?
- should the user be able to override parts of the file with command line flags? in this case viper might make sense
- I think adding viper to the mix is maybe a bad idea
- using [gofakeit](https://github.com/brianvoe/gofakeit) with is a [[faker]] for [[golang]]
- not sure if I want to use a SQL builder, or if I should just use golang templates and code generation
- besides generating full sqlite DB files, should we generate anything besides CREATE and INSERT raw SQL files?
- do we want to deal with generating migrations (because that's interesting to execute as part of testing say, [[Idea - majipoor - no nonsense mysql to postgresql replication]])
### Overall structure
[[Idea - SQL Test Data Generator 2022-07-05 09.58.01.excalidraw]]
![[Idea - SQL Test Data Generator 2022-07-05 09.58.01.excalidraw.svg]]
### Input YAML format
[[Idea - SQL Test Data Generator 2022-07-05 10.02.59.excalidraw]]
![[Idea - SQL Test Data Generator 2022-07-05 10.02.59.excalidraw.svg]]
## Schema generator
Given an input [[YAML]] file (exact format will be determined while building), it will:
- generate [[SQL]] `CREATE` statements
- parse, interpret and cast data types
- enums, numbers, strings, blobs
- support NULL and default values
- create foreign key relationships
- allow constraints to be specified
Verbs that are related to schema generator:
- `create` (or maybe `generate` ?) (but later we might want say, `import` and `migrate`)
- `generate` sounds like a better name, since we generate both the schema and the data
- we would also need specific names to generate `UPDATE` and `DELETE` and `ALTER` statements. Maybe these would be specified separately in the [[YAML]] input file?
- do we need anything else? it would be cool to inspect an existing `CREATE` statement and generate a YAML file from that, but that's for the future (especially seeing how complex CREATE statements can be). This could come in useful for generating migrations however.
### Simple example YAML file
The first thing I want to try is just generate a 1000 users for SQLite.
```yaml
tables:
users:
id: auto
firstName: firstName
lastName: lastName
email: email
generate:
createTables: true
tables:
users:
count: 1000
dialect: sqlite
# this could be sql or db
output: sql
```
## Test data generation
Once the schema has been generated, the YAML file will be used to specify how to generate test data:
- using [[faker]] style generation for each field
- arity indication for [[Relational Algebra|relational]] data
- handle dates intelligently
- compute DAG to know which data to insert first when foreign keys are at play
- allow for more complex SQL [[Go templates|template]] [[Code Generation]] (if deemed necessary for creating say, complex inventory metrics)
## Bundle common tables
Similar to [[faker]], the library will come with preconfigured "commonly used" tables:
- users, products, addresses, posts, categories, etc...
- generate a starter YAML to be further tweaked
This will allow users to quickly scaffold a complex DB directly with a single [[Command Line|CLI]] command (no [[YAML]] file).
## See also
- [[Scaffolding a database for unit testing]]