
3 posts tagged with "story"


Andre Bogus · 8 min read

Rust Nightly

When my colleague Christos wrote about using Rust for our startup, he made no mention of the fact that we actually use a nightly Rust compiler, and our code incorporates a set of nightly features. So I thought it might be beneficial to look at the cost-benefit ratio of using Rust nightly, for us, for the open source ecosystem, and for Rust.

Disclaimer: I wasn't the one who introduced nightly Rust features into this particular codebase, but I have a good few years of experience with nightly Rust, having worked on clippy since 2015.

The Good

Everybody loves features! Features are great! Can't have too many of them. And even if you don't use any nightly features, a nightly Rust compiler gives you a 12-week head start on performance improvements before they hit stable Rust. In fact, for a good four years I only had a nightly compiler installed, because my Chromebook had very little disk space and I needed nightly for clippy anyway. Besides, who's going to find the bugs nightly may have if no one tests it? Apart from officially unstable features (like, say, the internal compiler APIs that clippy used), I only ever encountered one incompatibility – which I will get to later.

That said, once you have nightly, those features are a real temptation. Curiosity had me playing with new toys on more than one crate: bytecount, initially flamer and overflower, and mutagen, which is still nightly-only. Apart from the latter, most of them only played with one feature at a time. Even though unstable, many of those features were already pretty solid, and changes were often easy enough to follow. And if you are only writing a prototype (for which you probably shouldn't be using Rust – but if, like me, you are very well versed in it and feel sufficiently productive, why not), an unstable feature or two may give you the extra oomph to get it done quickly.

Many of the features unlock certain powers that our code can use (e.g. async_closure), or give us a better programming experience (e.g. box_patterns). The latter kind is often undervalued: if, for example, you have a chain of objects on the heap, matching through all of them in one pattern makes the code much easier to read and maintain (as long as the feature is available, of course). Having to write two, three, four nested match statements for this instead becomes cumbersome quickly.
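To illustrate the box_patterns point, here is a minimal sketch; the Node type is hypothetical, not from our codebase:

```rust
#![feature(box_patterns)]

// A hypothetical chain of heap-allocated nodes.
struct Node {
    value: u32,
    next: Option<Box<Node>>,
}

// One pattern reaches through the heap allocation; on stable,
// the Box would force a nested match (or a guard) per level.
fn starts_with_1_2(node: Node) -> bool {
    match node {
        Node { value: 1, next: Some(box Node { value: 2, .. }) } => true,
        _ => false,
    }
}
```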

The Bad

But there's a psychological factor at play: once you give in to temptation with one feature, your hesitation to adopt yet another one has already diminished. Adding a third one looks even more benign, and the fourth one is a no-brainer, right? And so at synth, our core/src/lib.rs starts with this:

```rust
#![feature(
    format_args_capture,
    async_closure,
    map_first_last,
    box_patterns,
    error_iter,
    try_blocks
)]
```

There even used to be a try_trait feature, until I removed it because replacing it with a custom implementation was easier than changing the whole codebase over to try_trait_v2 when the trait design was improved. synth/src/lib.rs uses yet another feature, concat_idents, in some macros, so that's not even all of it. git blame tells us that the first four features were introduced in January '21, and the latter two were added in July. Coming back to that try_trait change: when #88223 hit, I found our code no longer compiled with the latest nightly. Of course we had pinned an older nightly, but I regularly checked whether there was trouble on the horizon.

Now this trait was mostly used to get try-like semantics for generators (the part of synth that builds the test data), which can either return a result or yield yet another part of the thing to generate. While this design is very elegant, it does not really depend on the Try trait implementation at all; it only reused a few of its types (and even question marks were rare). So when this code stopped working, I tried to update it to the new implementation, and when that proved too hairy, I simply removed the feature along with the one or two question marks, added our own GeneratorResult trait, and changed the few occurrences of Try trait usage over to our own trait, so the code worked again.
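For flavour, here is a minimal sketch of what such a replacement can look like; this is an assumption-laden reconstruction, not our actual GeneratorResult:

```rust
// How one generation step ended: another piece of the value,
// or a final result. All names here are hypothetical.
pub enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}

// A small stand-in for the unstable `Try` trait: anything that can
// be split into "keep going" and "we're done", without `?` support.
pub trait GeneratorResult {
    type Yield;
    type Return;

    fn into_state(self) -> GeneratorState<Self::Yield, Self::Return>;
}

// For example, a fallible step can treat an error as completion:
impl<Y, E> GeneratorResult for Result<Y, E> {
    type Yield = Y;
    type Return = E;

    fn into_state(self) -> GeneratorState<Y, E> {
        match self {
            Ok(y) => GeneratorState::Yielded(y),
            Err(e) => GeneratorState::Complete(e),
        }
    }
}
```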

I think this case shows us two things:

  1. Sometimes the cost-benefit ratio of a feature can change over time, and the costs are often paid later rather than sooner. In this case, it was the cost of replacing the feature with our own implementation, which we could have written right away without using the feature. On the other hand, the feature had worked well for us for a good while, so removing it was basically repaying technical debt.
  2. Often it's not too hard to replace a feature. Still, it's a chore that takes up time and resources, so unless there's an acute need, the opportunity cost means other things will usually be more valuable right now. So features, once used, tend to linger.

Regarding the pinned nightly – and the only incompatibility I ever encountered: I removed the pin when the May 2021 nightly we had been using stopped working with an updated proc_macro2 crate. The pin was later re-established with a newer version, which we now use in all our CI jobs and in our rust-toolchain file. There also was some trouble when CI and the main toolchain inadvertently went out of sync, but that was fixed up quickly.
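Pinning itself is a one-file affair. A minimal sketch, assuming the rust-toolchain.toml format (the date below is hypothetical; pick whatever nightly you last verified):

```toml
# rust-toolchain.toml – keeps local builds and CI on the same nightly
[toolchain]
channel = "nightly-2021-05-25"  # hypothetical known-good date
```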

For a version that is labeled "unstable", Rust nightly is actually surprisingly solid. There are two reasons for that: 1. Every PR that gets merged has been extensively tested by CI, so obvious errors get caught before the code even hits nightly. 2. Whenever a change looks risky, the infrastructure team supplies a "crater run": Crater is a tool that tries to compile every crate on crates.io with a given version of the compiler and reports which crates stop compiling. Since crates.io had 68798 crates in stock at the time of this writing, there's a pretty good chance that whatever weird thing you might encounter in live code has been thrown at the compiler. I have done such a rustc change once, and it was very reassuring to know that my PR didn't break any code out there.

The Conclusion

If you want to compile your code with the currently fastest version, you can use nightly now. As long as you don't use any features, your code should still compile on stable (however, I would still use a stable compiler in CI to check, because some changes become insta-stable, e.g. added trait implementations; Rust cannot put those behind a feature gate). There is a very small risk of breakage, but if it hits, you can revert to a beta or stable compiler with no hassle.

Using nightly features is in a way like every other form of technical debt: a bit of risk-taking that can give you a potentially big payoff now, at the price of possible future breakage. Whether you want to take that risk depends a lot on your project and the phase it lives in. If you're a startup desperate to get your product out there, forgoing that feature now may mean there won't be a project left to fix later. On the other hand, if you are writing code that should live for a while, or a library that aims to be widely used, avoiding nightly features is likely your best bet.

If you are on nightly, you have two options: go all in and embrace the instability, or pin a known good version. I now think that pinning is the only sane option for all but hobby projects (where a bit of breakage can be acceptable every now and then). Note that clippy has a special place here, because it's essentially tied to the current rust version by design, and we get away with syncing every two weeks and staying on master (not even nightly) otherwise. Once you decide on a pinned version, you may as well pin all your dependencies and update them very cautiously, because any update could break your build. Even then, it may be a good idea to test with a current nightly every now and then, to gauge which incompatibilities will hit you should you decide to update.

If you encounter breakage, take a step back and check whether the feature is still pulling its weight, or whether it's cheaper to get rid of it. To sum up: going nightly carries some risk. Being aware of that risk and mitigating it can give you benefits now, at the cost of a price tag in the future. As always, your mileage may vary.

Andre Bogus · 10 min read

50 ways to crash our product

I personally think that the software we build should make more people's lives better than it makes worse. So when users recently started filing bug reports, I read them with mixed feelings. On one hand, it meant that those particular users were actually using synth, on the other hand, it also meant that we were failing to give them a polished experience. So when it was my turn to write more stuff about what we do here, I set myself a challenge: Find as many ways as I can to break our product.

I briefly considered fuzzing, but decided against it. It felt like cheating. Where's the challenge in that? Also, I wanted to be sure that the bugs would be reachable by ordinary (or perhaps at least exceptional) users, and I would count misleading error messages (which a fuzzer cannot judge) as bugs. Finally, I'm convinced that I learn more about code when actively trying to break it, and that's always a plus. So "let's get cracking!" I quoth, and off I went.

Overview

Before we start, I should give a short architectural overview of synth. Basically, the software has four parts:

  1. The DSL (which is implemented by a set of types in core/src/schema that get deserialized from JSON),
  2. a compiler that turns the schema into a generator graph (a directed acyclic graph of items that can generate values),
  3. export (writing to the data sink) and
  4. import facilities (for creating a synth namespace from a database schema)

My plan was to look at each of the components and see if I could find inputs to break them in interesting ways. For example, leaving out certain elements, or supplying incorrect JSON data (that would not trip up the deserialization, but would lead to incorrect compilation later on) might be fruitful targets. Starting from an empty schema:

{    "type": "array",    "length": 1,    "content": {        "type": "object"    }}

I then called synth generate until I found a problem. First, I attempted to insert confusing command line arguments, but the clap-based parser handled all of them gracefully. Kudos!

#1 The first thing I tried was using a negative length:

{    "type": "array",    "length": -1,    "content": {        "type": "object"    }}

This was met with BadRequest: could not convert from value 'i64(-1)': Type { expected: "U32", got: "i64(-1)" }. Not exactly a crash, but the error message could be friendlier and carry more context. I should note that this is a very unspecialized error variant from within the generator framework. It would make sense to validate the length before compiling the generator and emit a more user-friendly error.

Bonus: if we make the length "optional": true (which could happen because of a copy & paste error), we will get another BadRequest error, depending on the seed. The evil thing is that this only happens with about half of the seeds, so you may or may not be lucky here (or may even become unlucky if another version slightly changes the seed handling).

#2 Changing the length field to {} makes for another befuddling error:

```console
Error: Unable to open the namespace
Caused by:
    0: at file 2_unitlength/unitlength.json
    1: Failed to parse collection
    2: missing field `type` at line 8 column 1
```

The line number is wrong here; the length should be in line six in the content object, not in line eight.

#3 It hasn't been that long since we could use literal numbers for number constants here (for example for the length); the old way would use a number generator. A recent improvement lets us generate arbitrary numbers; however, this is likely not a good idea for a length field:

{    "type": "array",    "length": {        "type": "number",        "subtype": "u32"    },    "content": {        "type": "object"    }}

This might finish very quickly, but far more likely it will work for a long time, exhausting memory in the process, because this actually generates a whole lot of empty objects (which are internally BTreeMaps, so an empty one comes in at 24 bytes) – up to 4,294,967,295 of them, which would fill 96GB! While this is not an error per se, we should probably at least warn about this mistake. We could also think about streaming the result instead of storing it all in memory before writing it out (at least as long as there are no references that need to be stored), which would also allow us to produce output more quickly.
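Those 24 bytes are just the empty map's header; you can check that yourself (a quick sketch, assuming a 64-bit target):

```rust
use std::collections::BTreeMap;
use std::mem::size_of;

fn main() {
    // An empty BTreeMap holds only a root and a length,
    // so the struct itself is 24 bytes on 64-bit targets.
    println!("{}", size_of::<BTreeMap<String, String>>());
    // 4,294,967,295 of those headers alone come to roughly 96GB,
    // before counting any actual entries.
}
```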

#4 Let's now add a string:

```json synth[expect = "`string` generator is missing a subtype"]
{
    "type": "array",
    "length": {
        "type": "number",
        "subtype": "u32"
    },
    "content": {
        "type": "object",
        "s": {
            "type": "string"
        }
    }
}
```


Oops, I forgot to specify which kind of string. But I wouldn't know that from the error:
```console
Error: Unable to open the namespace
Caused by:
    0: at file 4_unknownstring/unknownstring.json
    1: Failed to parse collection
    2: invalid value: map, expected map with a single key at line 10 column 1
```

#5 Ok, let's make that a format then. However, I forgot that the format must contain a map with the keys "format" and "arguments", and put them into the s map directly:

```json synth[expect = "`arguments` is expected to be a field of `format`"]
{
    "type": "array",
    "length": {
        "type": "number",
        "subtype": "u32"
    },
    "content": {
        "type": "object",
        "s": {
            "type": "string",
            "format": "say my {name}",
            "arguments": {
                "name": "name"
            }
        }
    }
}
```


```console
Error: Unable to open the namespace
Caused by:
    0: at file 5_misformat/misformat.json
    1: Failed to parse collection
    2: invalid value: map, expected map with a single key at line 14 column 1
```
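For reference, the correct nesting (the same shape the stack-depth script in #12 below uses) keeps both keys inside a format map:

```json
{
    "type": "string",
    "format": {
        "format": "say my {name}",
        "arguments": {
            "name": "name"
        }
    }
}
```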

#6 Ok, then let's try to use a faker. Unfortunately, I haven't really read the docs, so I'll just try the first thing that comes to mind:

```json synth[expect = "`faker` is expected to have a `generator` field. Try '"faker": {"generator": "name"}'"]
{
    "type": "array",
    "length": {
        "type": "number",
        "subtype": "u32"
    },
    "content": {
        "type": "object",
        "name": {
            "type": "string",
            "faker": "name"
        }
    }
}
```


This gets us:
```console
Error: Unable to open the namespace
Caused by:
    0: at file empty/empty.json
    1: Failed to parse collection
    2: invalid type: string "name", expected struct FakerContent at line 11 column 1
```

One could say that the error is not exactly misleading, but not exactly helpful either. As I'd tried a number of things already, I'll take it. Once I got the syntax right ("faker": { "generator": "name" }), the rest of the faker stuff seemed to be rock solid.

#7 Trying to mess up with date_time, I mistakenly specify a date format for a naive_time value.

```json synth[expect = "unknown variant `date_time`, expected one of `pattern`, `faker`, `categorical`, `serialized`, `uuid`, `truncated`, `sliced`, `format`, `constant`"]
{
    "type": "array",
    "length": 1,
    "content": {
        "type": "object",
        "date": {
            "type": "string",
            "date_time": {
                "format": "%Y-%m-%d",
                "subtype": "naive_time",
                "begin": "1999-01-01",
                "end": "2199-31-12"
            }
        }
    }
}
```


This gets me the following error, which is again misplaced at the end of the input, and not exactly understandable. The same happens if I select a date format of `"%H"` and bounds of `0` to `23`.
```console
Error: Unable to open the namespace
Caused by:
    0: at file 7_datetime/datetime.json
    1: Failed to parse collection
    2: input is not enough for unique date and time at line 16 column 1
```

I believe that since the time is not constrained in any way by the input, we should just issue a warning and generate an unconstrained time instead, so the user will at least get some data. Interestingly, seconds seem to be optional, so %H:%M works.

#8 Moreover, if I use naive_date instead, but make the minimum 0-0-0, we get the technically correct but still mis-spanned:

```console
Error: Unable to open the namespace
Caused by:
    0: at file 8_endofdays/endofdays.json
    1: Failed to parse collection
    2: input is out of range at line 16 column 1
```

For the record, the error is on line 11.

#9 Now we let date_time have some rest and go on to categorical. Having just one variant with a weight of 0 will actually trigger an unreachable error:

{    "type": "array",    "length": 1,    "content": {        "type": "object",        "cat": {            "type": "string",            "categorical": {                "empty": 0            }        }    }}

Well, the code thinks we should not be able to reach it. Surprise!

```console
thread 'main' panicked at 'internal error: entered unreachable code', /home/andre/projects/synth/core/src/schema/content/categorical.rs:82:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

For the record, this is the first internal error I was able to uncover so far. Given this success with categorical strings, it was natural to look if one_of could be similarly broken, but the generator just chose the one variant despite its 0.0 weight.
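To illustrate how a zero-weight categorical can reach code that "cannot happen", here is a hypothetical reconstruction of weighted choice (not synth's actual code): the picker draws a roll below the total weight and walks the variants, so with a single variant of weight 0 no arm ever matches.

```rust
// Hypothetical sketch of weighted choice, not synth's actual implementation.
fn pick<'a>(variants: &[(&'a str, u64)], mut roll: u64) -> &'a str {
    // `roll` is assumed to be drawn from 0..total_weight beforehand.
    for &(name, weight) in variants {
        if roll < weight {
            return name;
        }
        roll -= weight;
    }
    // With a single variant of weight 0, we always land here:
    // the total is 0, `roll` is 0, and `0 < 0` is false.
    unreachable!()
}

fn main() {
    pick(&[("empty", 0)], 0); // panics: entered unreachable code
}
```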

#10 Unsupported types on import

Databases can sometimes contain strange things, and so far the import support is in beta, so it was expected that I would find types for which we currently don't implement import. These include JSON for mysql and postgres, the mysql spatial datatypes as well as postgres' geometric types, user-defined enumerations, postgres' network address types, postgres arrays (soon only nested ones), etc.

The way to reproduce this is to create a table with a field of such a type, e.g. here with mysql:

```sql
CREATE TABLE IF NOT EXISTS json (
    data JSON
);
DELETE FROM json;
INSERT INTO json (data) VALUES ('{ "a": ["b", 42] }');
```

Now call `synth import jsonnamespace --from mysql://<user>:<password>@<host>:<port>/<database>` to get:

```console
Error: We haven't implemented a converter for json
```

Since the error is mostly the same for all types, and was somewhat expected, I won't claim a point for each type here.

#11 Exporting an array of nulls into postgres is not correctly implemented, so

{    "type": "array",    "length": 5,    "content": {        "type": "object",        "s": {            "type": "array",            "length": 1,            "content": {                "type": "null"            }        }    }}

will give us a wrong data type error from postgres. The problem here is that we lose the type information from the generator and just emit null values, which do not allow us to construct the right types for encoding into a postgres buffer. The solution would be to re-architect the whole system to reinstate that type information, possibly side-stepping sqlx in the process. Note that this is not the same as issue #171, which relates to nested arrays.

#12 Going back to #3, I thought about other ways to make the code over-consume resources. But time and memory are not the only things to consume; in fact, it's easy enough to exhaust another: the stack. The following bash script:

X='{ "type": "null" }'
for i in $(seq 0 4096)do    X="{ \"type\": \"string\", \"format\": { \"format\": \"{x}\", \"arguments\": { \"x\": $X } } }"done
echo $X > 12_stack_depth/stack_depth.jsonsynth gen --size 1 12_stack_depth

will generate the following error:

```console
Error: Unable to open the namespace
Caused by:
    0: at file 12_stack_depth/stack_depth.json
    1: Failed to parse collection
    2: recursion limit exceeded at line 1 column 2929
```

So I give up. I've found one way to crash our product with an unintended error, reproduced some known limitations, and outlined a number of error messages we can improve on. I fell far short of my original goal, which either means I'm really bad at finding errors, or our code is incredibly reliable. Given the track record of software written in Rust, I'd like to think it's the latter, but I'll leave the judgement to you.

Anyway, this was a fun exercise, and I looked at many more things that turned out to just work well, so that's a good thing™. With all the test automation we have today, it's easy to forget that the manual approach also has its upsides. So feel free to try to break your (or our) code!

· 12 min read

So you want to mock an API

API mocking refers to the process of simulating the behaviour of a real API using a fake replacement.

There are many great use-cases for API mocking. You may want to mock an API to eliminate development dependencies between engineering teams. If, for example, a service that your front-end depends on isn't ready, you may want a placeholder to unblock your front-end team.

API mocking is also a really powerful tool for doing integration tests against 3rd party APIs. This can be broken down roughly into functional and non-functional testing:

  • Functional Testing: You care about the semantics of the API. It is important to be able to make a request and get an appropriate response which takes your request into account. Unfortunately API providers often have sub-par testing environments which can make this a real pain.
  • Non-Functional Testing: You don't really care about the semantics of the API. Mocking can be used to speed up integration tests, as requests don't need to travel to the API provider's servers. You may also want to verify your SLAs, load test / stress test your systems, etc. In this case, being rate-limited by the API provider's testmode can be a real constraint.

At Synth, we're building a declarative data generator. We wanted to apply our data generation engine to mocking a subset of a popular API and see how far we could go. We set out to prototype a solution over (roughly) 5 days as a side project - this blog post is an overview of that journey.

Day 1: Picking an API

So much to do, so little time. We decided we wanted to mock a popular API but didn't know where to start. Companies like Stripe have an excellent testmode and even an open source http server which you can use instead.

We decided to ask the internet 'Which API have you been struggling to test against?' on various forums like Reddit, HN and others. Ok, so now we wait and see what the internet has to say.

Day 2: Choosing Shopify

Lo and behold, the internet responded! A bunch of people replied, primarily complaining about payment processors (except for Stripe, which was explicitly praised yet again!). A few products and companies came up repeatedly as being difficult to test against. We qualitatively evaluated the internet's feedback and reviewed the documentation of the different APIs mentioned, to understand the implementation complexity. After all, we had 3.5 days left, so we couldn't pick anything too complex. In the end we decided to go with the Shopify API!

Just as a disclaimer: we have absolutely no issues with Shopify, it just so happens that a lot of the feedback we got pointed us in that direction.

Now, the Shopify API is pretty big, and we were building a mock server POC from scratch, so we decided to narrow down and try to mock a single endpoint first. We chose the Event API, which seemed pretty straightforward. Looking at the Event API, there are three dimensions to consider when designing our POC solution.

1. The Data Model

The Event API returns a JSON payload which is a collection of Events. An example Event can be seen below:

```json
{
    // Refers to a certain event and its resources.
    "arguments": "Ipod Nano - 8GB",
    // A text field containing information about the event.
    "body": null,
    // The date and time (ISO 8601 format) when the event was created.
    "created_at": "2015-04-20T08:33:57-11:00",
    // The ID of the event.
    "id": 164748010,
    // A human readable description of the event.
    "description": "Received a new order",
    // A relative URL to the resource the event is for, if applicable.
    "path": "/admin/orders/406514653/transactions/#1145",
    // A human readable description of the event. Can contain some HTML formatting.
    "message": "Received a new order",
    // The ID of the resource that generated the event.
    "subject_id": 406514653,
    // The type of the resource that generated the event.
    "subject_type": "Order",
    // The type of event that occurred.
    "verb": "confirmed"
}
```

Off the bat it's clear that there is some business logic that needs to be implemented. For example, there is some notion of causality, i.e. an Order cannot be closed before it's been placed. This non-trivial business logic was good news - it means we can showcase some complex data generation logic that's built into synth.

Since we don't have access to the code that runs the Shopify API, we have to simulate the behaviour of the Event data model. There are varying degrees of depth into which one can go, and we broke it into 4 levels:

  1. Level 1 - Stub: Level 1 is just about exposing an endpoint where the data 'looks' right on a per-element basis. You have the correct types, but you don't really care about correctness across elements. For example, you care that path has the correct subject_id in the URI, but you don't care that a given Order goes from placed to closed to re_opened etc...
  2. Level 2 - Mock: Level 2 involves maintaining the semantics of the Events collection as a whole. For example, created_at should always increase as id increases (a larger id means an event was generated at a later date), verbs should follow proper causality (as per the order example above), etc.
  3. Level 3 - Emulate: Level 3 is about maintaining semantics across endpoints. For example, creating an order through a different Shopify API endpoint should create an order_placed event in the Event API.
  4. Level 4 - Simulate: Here you are basically reverse-engineering all the business logic of the API. It should be indistinguishable from the real thing.

Really these levels can be seen as increasing in scope as you simulate semantics per-element, per-endpoint, cross-endpoint and finally for the entire API.

2. The Endpoint Behaviour

The Event API exposes 2 endpoints:

  • GET /admin/api/2021-07/events.json which retrieves a list of all events
  • GET /admin/api/2021-07/events/{event_id}.json which retrieves a single event by its ID.

The first endpoint exposes various query parameters (which are basically filters) which alter the response body:

```
limit:          The number of results to show. (default: 50, maximum: 250)
since_id:       Show only results after the specified ID.
created_at_min: Show events created at or after this date and time. (format: 2014-04-25T16:15:47-04:00)
created_at_max: Show events created at or before this date and time. (format: 2014-04-25T16:15:47-04:00)
filter:         Show events specified in this filter.
verb:           Show events of a certain type.
fields:         Show only certain fields, specified by a comma-separated list of field names.
```

Luckily the filtering behaviour is simple, and as long as the implementation stays true to the description in the docs it should be easy to emulate.
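For example, a filtered request (here against the mock we eventually built; the parameter values are made up) looks like this:

```console
$ curl "localhost:3000/admin/api/2021-07/events.json?limit=10&verb=placed&created_at_min=2018-01-01T00:00:00-00:00"
```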

The second endpoint takes one query parameter, which is a comma-separated list of fields to return for a given event. Again, this should be easy enough.

3. Authentication

We decided not to touch authentication for now as the scope would blow up for a 5-day POC. Interestingly we got a bunch of feedback that mocking OAuth flows or similar would be really useful, regardless of any specific API. We may come back to this at a future date.

Day 3: Evaluating Implementation Alternatives

And then there was Day 3. We'd done our due diligence to pick a popular yet underserved API, and we'd drilled down on how deep we could go in trying to faithfully represent the implementation.

As any self-respecting engineer would, we decided to scour the internet for off-the-shelf solutions to automate as much of the grunt work as possible. Some naive Googling brought up a mock server called JSON server – an API automation solution which spins up a REST API for you, given a data definition. Excited by this, we quickly wrote up 2 fake Event API events and started JSON server, feeding it the fake events – and it worked!

Well, almost: we were initially excited by the fact that it did exactly what it said on the tin, and did it very well; however, it didn't have an easy way to specify the custom query parameters we needed to faithfully reproduce the API, for example returning results before or after a given created_at timestamp (feel free to let us know if we missed something here!).

So we needed something a little more sophisticated. The internet came to the rescue again with a comprehensive list of API simulation tools. Our basic precondition was that the API simulator had to be OSS with a permissive license we could build on. This immediately disqualified 50% of the available solutions, and we did a divide-and-conquer exercise to quickly evaluate the rest.

[Image: API simulation tools]

The remaining tools were either not built for this purpose, or they were incredibly complex pieces of software that would take a while to get acquainted with.

In the end we decided to implement the endpoint functionality ourselves - we figured that a 50 LOC node/express server would do a fine job for a POC.

Day 4: Implementing the core functionality

Day 4 was the most straightforward. Let's get this thing to work!

1. The Data Model

We decided to reproduce the API at level 1-2, since we didn't really have any other endpoints. We used synth to quickly whip up a data model that generates data which looks like responses from the Event API. I won't go into depth on how this works here, as it's been covered in other posts. After about 15 minutes of tweaking the synth schema, we had generated ~10 MB of data that looks like this:

```json
[
    {
        "arguments": "generate virtual platforms",
        "body": null,
        "created_at": "2019-09-17T14:16:47",
        "description": "Received new order",
        "id": 477306,
        "message": "Received new order",
        "path": "/admin/orders/83672/transactions/#6020",
        "subject_id": 1352997,
        "subject_type": "Order",
        "verb": "closed"
    },
    {
        "arguments": "innovate best-of-breed schemas",
        "body": null,
        "created_at": "2017-05-20T00:04:41",
        "description": "Received new order",
        "id": 370051,
        "message": "Received new order",
        "path": "/admin/orders/82607/transactions/#9154",
        "subject_id": 1226112,
        "subject_type": "Order",
        "verb": "sale_pending"
    },
    {
        "arguments": "incentivize scalable mindshare",
        "body": null,
        "created_at": "2018-02-21T12:51:36",
        "description": "Received new order",
        "id": 599084,
        "message": "Received new order",
        "path": "/admin/orders/01984/transactions/#3595",
        "subject_id": 1050540,
        "subject_type": "Order",
        "verb": "placed"
    }
]
```

We then dumped it all in a MongoDB collection with the one-liner:

```console
$ synth generate shopify --to mongodb://localhost:27017/shopify --size 40000
```

Next step is to re-create the Event API endpoint.

2. Creating the API

Creating the API was pretty straightforward. We wrote a prisma model for the responses, which basically worked out of the box with the data synth dumped into MongoDB. This gave us all the filtering we needed, basically for free.

Then we wrote a quick and dirty express server that maps the REST endpoint's query strings into a query for prisma. The whole thing turned out to be ~90 LOC. You can check out the source here.

Day 5: Packaging


The data is ready, the API is ready, time to package this thing up and give it to people to actually use. Let's see if our experiment was a success.

While strategising about distributing our API, we were optimising for two things:

  1. Ease of use - how simple it is for someone to download this thing and get going
  2. Time - we have 2-3 hours to make sure this thing is packaged and ready to go

in that order.

We needed to package the data, the database and a node runtime to actually run the server. Our initial idea was to use docker-compose with 2 services, the database and the web server, plus the network plumbing to make them talk to each other. After discussing this for a few minutes, we decided that docker-compose might be a hurdle for some users who don't have it installed or are not familiar with how it works. This went against our first tenet, 'ease of use'.

So we decided to take the slightly harder and hackier route of packaging the whole thing in a single Docker container. It seemed like the best trade-off between goals 1 and 2.

There were 6 steps to getting this thing over the line (a hypothetical Dockerfile sketch follows the list):

  1. Start with the MongoDB base image. This gives us a Linux environment and a database.
  2. Download and install NodeJS runtime in the container.
  3. Download and install synth in the container.
  4. Copy over the JavaScript sources and the synth data model.
  5. Write a small ENTRYPOINT shell script that starts mongod, generates the data into the database, and then starts the server.
  6. Start the server and expose port 3000
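Roughly, the Dockerfile could look like the sketch below. This is a reconstruction from the steps above, so the image tags, paths and the entrypoint script name are assumptions, not our actual build:

```dockerfile
# 1. Start from the MongoDB base image: a Linux environment plus a database.
FROM mongo:4.4

# 2. Download and install a NodeJS runtime in the container.
RUN apt-get update && apt-get install -y curl \
    && curl -fsSL https://deb.nodesource.com/setup_14.x | bash - \
    && apt-get install -y nodejs

# 3. Download and install synth in the container.
RUN curl -sSL https://getsynth.com/install | sh

# 4. Copy over the JavaScript sources and the synth data model.
WORKDIR /app
COPY . .
RUN npm install

# 5. The entrypoint starts mongod, generates data into it, then starts the server.
COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

# 6. The express server listens on port 3000.
EXPOSE 3000
```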

And we're done! We've (hackily but happily) packaged our mock API into a platform-agnostic one-liner.

Was it a success?

An important aspect of this experiment was to see if we could conceive, research, design and implement a PoC in a week (as a side project, we were working on synth at the same time). I can safely say this was a success! We got it done to spec.

An interesting thing to note is that 60% of the time was spent on ideating, researching and planning, and only 40% on the actual implementation. However, spending all that time planning before writing code definitely saved us a bunch of time; had we not planned so much, the project would likely have overshot or failed.

Now, whether the PoC itself was a success is a different question. This is where you come in. If you're using the Event API, build the image and play around with it.

You can get started by quickly cloning our git repository and then:

```console
cd shopify && docker build -t shopify-mock . && docker run --rm -p 3000:3000 shopify-mock
```

then simply:

curl "localhost:3000/admin/api/2021-07/events.json"

We'd like to keep iterating on the Shopify API and improve it. If there is interest we'll add more endpoints and improve the existing Event data model.

If you'd like to contribute, or are interested in mocks for APIs other than Shopify, feel free to open an issue on GitHub!