Core concepts
This section covers the core concepts found in Synth.
# Namespaces
The namespace is the top-level abstraction in Synth. Namespaces are the equivalent of traditional schemas in the world of relational databases like PostgreSQL. References can exist between fields in a given namespace, but never across namespaces.
Namespaces are simply directories from which synth reads a collection of schema files. For example, a namespace blog could have the following structure:
└── blog/
    ├── users.json
    └── posts.json
Any file whose extension is .json in a namespace directory will be opened by the synth generate subcommand and considered part of the namespace's schema.
# Collections
Every namespace has zero or more collections. Collections are addressable by their name and correspond to tables in the world of relational databases. Strictly speaking, collections are a superset of tables, as they are in fact arbitrarily deep JSON document trees.
Collections are represented in a namespace directory as JSON files. The name of a collection (the way it is referred to by synth) is its filename without the extension. For example, the file bank/transactions.json defines a collection named transactions in a namespace bank.
For a more comprehensive example, let's imagine our namespace bank has a collection transactions and another collection users. The directory structure then looks like this:
└── bank/
    ├── transactions.json
    └── users.json
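The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated here (files ending in .json, name equals filename without extension), not how synth itself is implemented; the helper names `collection_names` and `build_demo_namespace` are hypothetical.

```python
from pathlib import Path
import tempfile


def collection_names(namespace_dir):
    """Return the collection names found in a namespace directory.

    Assumption: only files directly inside the directory whose extension
    is .json count, and the name is the filename without the extension.
    """
    return sorted(
        p.stem
        for p in Path(namespace_dir).iterdir()
        if p.is_file() and p.suffix == ".json"
    )


def build_demo_namespace(root):
    """Create a throwaway bank/ namespace with two collection files."""
    bank = Path(root) / "bank"
    bank.mkdir()
    for filename in ("transactions.json", "users.json"):
        # Placeholder content: these are not valid Synth schemas.
        (bank / filename).write_text("{}")
    # A non-JSON file, which the rule above ignores.
    (bank / "README.md").write_text("not a collection")
    return bank


with tempfile.TemporaryDirectory() as root:
    names = collection_names(build_demo_namespace(root))
    print(names)  # ['transactions', 'users']
```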
Collections must be valid instances of the synth schema that describe an array. This means that, at the top level, all collections must be array generators.
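As a rough illustration of the top-level requirement, the following Python sketch checks only that a collection file's outermost generator has "type" set to "array". It does not validate the rest of the schema, and the function name `is_array_collection` is hypothetical rather than part of synth.

```python
import json


def is_array_collection(schema_text):
    """Check that the outermost generator of a collection is an array.

    This inspects only the top-level "type" key; it says nothing about
    whether the remainder of the schema is valid.
    """
    schema = json.loads(schema_text)
    return isinstance(schema, dict) and schema.get("type") == "array"


# Top-level array generator: acceptable as a collection under this check.
ok = '{"type": "array", "length": 1, "content": {"type": "object"}}'
# Top-level object generator: not a collection under this check.
not_ok = '{"type": "object"}'

print(is_array_collection(ok))      # True
print(is_array_collection(not_ok))  # False
```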
# Field references
A field reference is a special kind of field that is useful for declaring relations between different parts of a collection or between different collections in the same namespace.
A field reference can be specified by using the same_as generator type. The value of the "ref" field should be the address of the field you want to refer to. A field address takes the form <collection name>.<level_0>.<level_1>... For example, say we have a collection users.json containing the following schema:
{
  "type": "array",
  "length": {
    "type": "number",
    "subtype": "u64",
    "range": { "low": 1, "high": 4, "step": 1 }
  },
  "content": {
    "type": "object",
    "username": {
      "type": "string",
      "faker": { "generator": "username" }
    },
    "credit_card": {
      "type": "string",
      "faker": { "generator": "credit_card" }
    },
    "id": {
      "type": "number",
      "subtype": "u64",
      "id": {}
    }
  }
}
A reference to the username field would have the address users.content.username. If we want to add a reference to this field from another collection, we would simply add:
{
  "type": "array",
  "length": 1,
  "content": {
    "type": "object",
    "username": {
      "type": "same_as",
      "ref": "users.content.username"
    }
  }
}
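To make the address form concrete, here is a small Python sketch that reads an address literally: the first segment selects the collection and each following segment selects a key one level deeper in the schema. How synth resolves references internally is not described in this section, so treat `resolve_address` as a hypothetical helper written for illustration only.

```python
def resolve_address(namespace, address):
    """Walk a field address of the form <collection>.<level_0>.<level_1>...

    `namespace` maps collection names to their parsed schemas. A missing
    collection or level raises KeyError.
    """
    collection, *levels = address.split(".")
    node = namespace[collection]
    for level in levels:
        node = node[level]
    return node


# A trimmed version of the users schema shown above.
namespace = {
    "users": {
        "type": "array",
        "length": 1,
        "content": {
            "type": "object",
            "username": {"type": "string", "faker": {"generator": "username"}},
        },
    }
}

target = resolve_address(namespace, "users.content.username")
print(target)  # {'type': 'string', 'faker': {'generator': 'username'}}
```

An address that names a field absent from the schema, such as users.content.email in this trimmed example, raises KeyError.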
# Schema
The schema is the core data structure that you need to understand to be productive with Synth. The schema represents your data model: it tells Synth exactly how to generate data, which fields are needed, what their types are, and so on. This is a little involved, so there is a section devoted to just the schema.
# Scenarios
Since collections correspond closely to database collections, many use cases need only a subset of the collections in a namespace. This is where scenarios come in.
Scenarios allow us to define a specific use case for the data in a namespace. So, expanding on our bank example, we can create a scenario which only generates data for users by having the following directory structure:
└── bank/
    ├── scenarios
    │   └── users-only.json
    ├── transactions.json
    └── users.json
This creates a scenario called users-only by having a [scenario-name].json file inside the scenarios/ directory inside our namespace.
The definition for this scenario looks as follows:
{
  "users": {}
}
This definition explicitly marks the users collection for inclusion inside this scenario.
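The inclusion behaviour can be pictured with a short Python sketch. It assumes that the keys of the scenario definition are simply the names of the collections to keep; the meaning of the empty object attached to each key is not covered here, so the sketch ignores it. The function `apply_scenario` is hypothetical and is not part of synth.

```python
def apply_scenario(namespace, scenario):
    """Keep only the collections that the scenario names.

    Assumption: each key in the scenario is a collection name marked for
    inclusion; the value attached to the key is ignored in this sketch.
    """
    unknown = [name for name in scenario if name not in namespace]
    if unknown:
        raise KeyError(f"scenario names unknown collections: {unknown}")
    return {name: namespace[name] for name in scenario}


# Stand-ins for the parsed bank/transactions.json and bank/users.json.
bank = {
    "transactions": {"type": "array"},
    "users": {"type": "array"},
}
users_only = {"users": {}}

selected = apply_scenario(bank, users_only)
print(sorted(selected))  # ['users']
```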
# Importing datasets
Synth can ingest and build schemas on the fly with the synth import command.
# Generating data
To generate data from an existing namespace, use the synth generate command.
synth uses a seedable pseudo-random source of entropy. By default, the seed is set to a constant value of 0 using the Rust-native rand::SeedableRng::seed_from_u64 function. This means that, by default, the data that synth generates is deterministic: it is only a function of your schema files.
This behavior can be tuned (the seed can be changed or randomized) using the --seed or --random flag.
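The effect of a fixed seed can be demonstrated with any seedable generator. The sketch below uses Python's standard random module purely as a stand-in for the Rust generator mentioned above, so the numbers it prints have no relation to what synth produces; `sample_values` is a hypothetical helper.

```python
import random


def sample_values(seed, count=3):
    """Draw `count` pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


first_run = sample_values(seed=0)
second_run = sample_values(seed=0)
other_seed = sample_values(seed=1)

# The same seed gives the same sequence on every run.
print(first_run == second_run)  # True
# A different seed gives a different sequence.
print(first_run == other_seed)  # False
```

With a constant seed, repeated runs are reproducible; changing the seed is what changes the output.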