I recently wrote the most complex procedural Rust macro I’ve ever attempted. This post tries to outline the problems I’ve encountered and tells how I overcame them.
With synth, we are building a declarative command line test data generator. For now, the specification that declares what test data to build is just JSON that gets deserialised to our data structures using
serde_json. This was a quick and easy way to configure our application without much overhead in terms of code. For example:
However, it’s also not very nice to write (for example JSON has no comments, no formulas, etc.), so we wanted to bind our specification to a scripting language. Our end goal is to extend the language (both in terms of builtin functions and syntax) to make the configuration really elegant. After some testing and benchmarking different runtimes, our choice fell on koto, a nice little scripting language that was built foremost for live coding.
Unfortunately, koto has a very bare interface to bind to external Rust code. Since we are talking about a rather large number of types we want to include, it was clear from the start that we would want to generate the code to bind to koto.
So I started with a somewhat simple macro to wrangle koto types (e.g. Maps) into our Rust types. However, I soon found that the marshalling overhead would have been fine for an initial setup phase, but not for recurrent calls into koto (for example for filter functions called for each row). Thus I changed my approach to try and bind functions, then extended that to types and impl blocks.
I found – as I then thought – a genius technique of generating functions that would call each other, thus daisy-chaining a series of code blocks into one that could then be called with another
bindlang_main!() proc macro:
I also wrote a derive macro to implement the marshalling traits. This worked well for a small example that was entirely contained within one module, but failed once the code was spread out through multiple modules: The functions would no longer be in the same scope and therefore stopped finding each other.
Worse, I needed a number of pre-defined maps with functions for method dispatch for our external types within koto. A type in Rust can have an arbitrary number of impl blocks but I needed exactly one table, and I couldn’t simply daisy-chain those.
It was clear I needed a different solution. After thinking long and hard I came to the conclusion that I needed to pull all the code together in one scope, by the
bindlang_main!() macro. My idea was that I create a
syn::Items to be quoted together into one
TokenStream. A lazy static
Arc<Mutex<Vec<Context>>> was to collect the information from multiple attribute invocations:
This was when I found out that none of
syn's type is
Send and therefore cannot be stored within a
Mutex. My first attempt to circumvent this was moving everything to
Strings and using
syn::parse_str to get the items out. This failed because of macro hygiene: Each identifier in Rust
proc_macros has an identity. Two identifiers resulting from two macro operations will get different identities, no matter if their textual representation is the same.
I also found that
proc_macro_derives have no way to get the
#[derive(..)] attribute of the type, and I wanted to also bind derived trait implementations (at least for
Default, because some types have no other constructors). So I removed the derive and moved the implementation to the
#[bindlang] attribute macro, which now works on types, impl blocks and fns.
Beware: This makes use of the (unspecified, but as of now working) top-down order of macro expansion to work!
There is a
Span::mixed_context() variant that will yield semi-hygienic macros (like with
macro_rules). However, this looked risky (macro hygiene is there to protect us, so we better have a good reason to override it), so I took the data oriented approach, collecting the info I needed to create the code in the
lazy_static to walk within
bindlang_main!(). I still tried to generate the trait impls for marshalling directly in the attribute macro, but this again ran into macro hygiene trouble, because I could not recreate the virtual dispatch table identifiers. After moving this part to the main macro, too, the macro finally expanded successfully.
Except it didn’t compile successfully.
I had forgotten to
use the items I was creating code for in the macro, and koto requires all external types to implement
Display. So I added those imports as macro arguments and added the
impls to be met with a type inference error within the macro invocation. Clearly I needed some type annotations, but the error message only showed me the macro invocation, which was pretty unhelpful.
My solution to debug those was to
cargo expand the code, comment out the macro invocation and copy the expanded portions in its place so that my IDE would pinpoint the error for me. I had to manually un-expand some
format! invocations so the code would resolve correctly, and finally found where I needed more type annotations. With those added, the code finally compiled. Whew!
I then extended the bindings to also cover trait impls and
Options, while my colleague Christos changed the code to marshall Rust values into koto values to mangle
Result::Err(_) into koto’s runtime errors. Remembering that implicit constructors (structs and enum variants) are also useful, I added support to binding those if public. There was another error where intermediate code generated wouldn't parse, but some
eprintln! debugging helped pinpoint the piece of code where it happened.
When trying to bind functions taking a non
self referenced argument (e.g.
fn from(value: &Value) -> Self), I found that the bindings would not work, because my
FromValue implementation could not get references. Remember, a function in Rust cannot return a borrow into a value that lives only within it. It took me a while to remember I blogged about the solution in 2015! Closures to the rescue! The basic idea is to have a function that takes a closure with a reference argument and return the result of that closure:
Having this in a separate trait allows us to distinguish types where the borrow isn't
&T, e.g. for
&str. Also we gain a bit of simplicity by using unified function call syntax (
MyType::my_fn(..) instead of
v0.my_fn). This also meant I had to nest the argument parsing: I did this by creating the innermost
Expr and wrap it in argument extractors in reverse argument order:
match in the
else part is there to introduce a binding without requiring a
Block, a common macro trick. Now all that was left to do was add
#[bindlang] attributes to our
Namespace and its contents, and also add a lot of
Display implementations because koto requires this for all
In conclusion, our test configuration should now look something like:
That's only the beginning: I want to introduce a few coercions, a custom koto prelude and perhaps some syntactic sugar to make this even easier to both read and write.
Macros that collect state to use later are possible (well, as long as the expansion order stays as it is) and useful, especially where code is strewn across various blocks (or even files or modules). However, if code relies on other code, it best be emitted in one go, otherwise module visibility and macro hygiene conspire to make life hard for the macro author. And if at one point the expansion order gets changed in a way that breaks the macro, I can change it to a standalone crate to be called from build.rs thanks to
proc_macro2 being decoupled from the actual implementation.