
Complex Procedural Rust Macros

Andre Bogus · 8 min read

I recently wrote the most complex procedural Rust macro I’ve ever attempted. This post tries to outline the problems I’ve encountered and tells how I overcame them.

The Background

With synth, we are building a declarative command-line test data generator. For now, the specification that declares what test data to build is just JSON that gets deserialised into our data structures using serde_json. This was a quick and easy way to configure our application without much code overhead. For example:

{
    "type": "array",
    "length": {
        "type": "number",
        "constant": 3
    },
    "content": {
        "type": "object",
        "id": {
            "type": "number",
            "id": {}
        },
        "name": {
            "type": "string",
            "faker": {
                "generator": "name"
            }
        },
        "email": {
            "type": "string",
            "faker": {
                "generator": "ascii_email"
            }
        }
    }
}

However, it’s also not very nice to write (JSON has no comments, no formulas, etc.), so we wanted to bind our specification to a scripting language. Our end goal is to extend the language (both in terms of builtin functions and syntax) to make the configuration really elegant. After testing and benchmarking different runtimes, our choice fell on koto, a nice little scripting language built primarily for live coding.

Unfortunately, koto has a very bare interface to bind to external Rust code. Since we are talking about a rather large number of types we want to include, it was clear from the start that we would want to generate the code to bind to koto.

Early Beginnings

So I started with a somewhat simple macro to wrangle koto types (e.g. Maps) into our Rust types. However, I soon found that the marshalling overhead would have been fine for an initial setup phase, but not for recurrent calls into koto (for example, filter functions called for each row). Thus I changed my approach to binding functions, then extended that to types and impl blocks.

I found what I then thought was a genius technique: generating functions that call each other, daisy-chaining a series of code blocks into one that could then be invoked by another proc macro, bindlang_main!():

use proc_macro::TokenStream;
use quote::quote;
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use syn::{parse_quote, Block, Expr, Item};

static FN_NUMBER: AtomicUsize = AtomicUsize::new(0);

// fn_ident(n) and ident(s) are small helpers producing Idents (not shown)
fn next_fn(mut b: Block, arg: &Expr) -> Item {
    let number = FN_NUMBER.fetch_add(1, SeqCst);
    let this_fn = fn_ident(number);
    if let Some(n) = number.checked_sub(1) {
        // daisy-chain: first call the previously generated function
        let last_fn = fn_ident(n);
        b.stmts.push(parse_quote! { #last_fn(#arg); });
    }
    parse_quote! { fn #this_fn(#arg) { #b } }
}

#[proc_macro]
pub fn bindlang_main(arg: TokenStream) -> TokenStream {
    let arg = ident(arg.to_string());
    TokenStream::from(if let Some(n) = FN_NUMBER.load(SeqCst).checked_sub(1) {
        // call the last generated function, which transitively calls all the others
        let last_fn = fn_ident(n);
        quote! { #last_fn(#arg) }
    } else {
        proc_macro2::TokenStream::default()
    })
}
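To make the daisy-chaining concrete, here is roughly what three expansion sites would produce (a hypothetical expansion; the __bindlang_N names and the Ctx placeholder stand in for whatever fn_ident and the real context type are):

// Hypothetical output of three #[bindlang] expansions
struct Ctx;

fn __bindlang_0(ctx: &mut Ctx) {
    // bindings from the first expansion site
}

fn __bindlang_1(ctx: &mut Ctx) {
    __bindlang_0(ctx); // chain to the previously generated function
    // bindings from the second expansion site
}

fn __bindlang_2(ctx: &mut Ctx) {
    __bindlang_1(ctx);
    // bindings from the third expansion site
}

// bindlang_main!(ctx) then expands to just: __bindlang_2(ctx)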

I also wrote a derive macro to implement the marshalling traits. This worked well for a small example that was entirely contained within one module, but failed once the code was spread across multiple modules: the functions would no longer be in the same scope and therefore stopped finding each other.
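For example, with the expansions landing in two different modules, the chain breaks (an illustration reusing the Ctx placeholder from the sketch above; this deliberately does not compile, which is exactly the problem):

mod users {
    // the first #[bindlang] expansion landed here
    fn __bindlang_0(ctx: &mut Ctx) { /* ... */ }
}

mod content {
    // the second expansion lands here, so its generated call
    // no longer resolves:
    fn __bindlang_1(ctx: &mut Ctx) {
        __bindlang_0(ctx); // error[E0425]: cannot find function `__bindlang_0` in this scope
    }
}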

Worse, I needed a number of pre-defined maps with functions for method dispatch for our external types within koto. A type in Rust can have an arbitrary number of impl blocks, but I needed exactly one table per type, and I couldn’t simply daisy-chain those.
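Concretely, what I needed per type was a single dispatch table merged from all of its impl blocks, something like this sketch (the names and the BindingFn signature are assumptions for illustration; bind_namespace_get and bind_namespace_put are hypothetical generated functions):

use std::collections::HashMap;

// Hypothetical signature of a bound method; the real one marshals
// koto values in and out.
type BindingFn = fn(&[Value]) -> Result<Value, RuntimeError>;

lazy_static::lazy_static! {
    // Exactly one table for the type, merged from all of its impl
    // blocks; daisy-chaining would have produced one fragment each.
    static ref NAMESPACE_VTABLE: HashMap<&'static str, BindingFn> = {
        let mut map = HashMap::new();
        map.insert("get", bind_namespace_get as BindingFn);
        map.insert("put", bind_namespace_put as BindingFn);
        map
    };
}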

It was clear I needed a different solution. After thinking long and hard, I came to the conclusion that I needed to pull all the code together into one scope, emitted by the bindlang_main!() macro. My idea was to create a HashMap of syn::Items to be quoted together into one TokenStream. A lazy static Arc<Mutex<Context>> was to collect the information from multiple attribute invocations:

#[derive(Default)]
struct Context {
    bare_fns: Vec<MethodSig>,
    modules: HashMap<String, Vec<MethodSig>>,
    vtables: HashMap<String, Vec<MethodSig>>,
    types: HashMap<String, String>,
}

lazy_static::lazy_static! {
    static ref CONTEXT: Arc<Mutex<Context>> = Arc::new(Mutex::new(Context::default()));
}

#[proc_macro_attribute]
pub fn bindlang(_attrs: TokenStream, code: TokenStream) -> TokenStream {
    let code_cloned = code.clone();
    let input = parse_macro_input!(code_cloned as Item);
    // evaluate input here, and store information in Context
    // CONTEXT.lock().unwrap()...
    code
}

This was when I found out that none of syn's types implement Send, so they cannot be kept in a static Mutex (a Mutex<T> is only Sync, and thus usable from a static, if T is Send). My first attempt to circumvent this was to store everything as Strings and use syn::parse_str to get the items back out. This failed because of macro hygiene: each identifier in a Rust proc macro carries an identity, and two identifiers resulting from two separate macro expansions get different identities, no matter whether their textual representation is the same.
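The abandoned workaround looked roughly like this (a sketch; StringContext and emit are illustrations, not the final code):

// Store generated items as pretty-printed strings (String is Send)...
struct StringContext {
    items: Vec<String>,
}

// ...and re-parse them inside bindlang_main!(). This parses fine, but
// every identifier comes back with a fresh hygiene context, so the
// re-parsed items no longer resolve against each other.
fn emit(ctx: &StringContext) -> proc_macro2::TokenStream {
    let items: Vec<syn::Item> = ctx
        .items
        .iter()
        .map(|s| syn::parse_str(s).expect("stored item should re-parse"))
        .collect();
    quote::quote! { #(#items)* }
}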

I also found that a proc_macro_derive has no way to see the #[derive(..)] attribute of the type it is invoked on, and I wanted to also bind derived trait implementations (at least for Default, because some types have no other constructors). So I removed the derive and moved the implementation into the #[bindlang] attribute macro, which now works on types, impl blocks and fns.

Beware: this relies on the (unspecified, but as of now working) top-down order of macro expansion!
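In practice, that means the bindlang_main!() invocation has to come after every #[bindlang] item in expansion order, along these lines (a usage sketch; the vm argument is an assumption):

#[bindlang]
struct Namespace {
    // ...
}

#[bindlang]
impl Namespace {
    // methods to bind
}

// Must expand after all of the #[bindlang] attributes above: it reads
// the collected Context and emits all bindings in one scope.
bindlang_main!(vm);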

Dirty tricks avoided

There is a Span::mixed_site() constructor that yields semi-hygienic identifiers (as with macro_rules!). However, this looked risky (macro hygiene is there to protect us, so we had better have a good reason to override it), so I took the data-oriented approach: collecting the information I needed in the lazy_static and walking it within bindlang_main!() to create the code. I still tried to generate the trait impls for marshalling directly in the attribute macro, but this again ran into macro hygiene trouble, because I could not recreate the identifiers of the virtual dispatch tables. After moving this part to the main macro, too, the macro finally expanded successfully.

Except it didn’t compile successfully.

I had forgotten to bring the items I was generating code for into the macro's scope, and koto requires all external types to implement Display. So I added those imports as macro arguments and added the Display impls, only to be met with a type inference error within the macro invocation. Clearly I needed some type annotations, but the error message only pointed at the whole macro invocation, which was pretty unhelpful.

Expanding Our Vision

My solution for debugging those was to cargo expand the code, comment out the macro invocation, and copy the expanded portions in its place so that my IDE would pinpoint the error for me. I had to manually un-expand some format! invocations so the code would resolve correctly, and finally found where I needed more type annotations. With those added, the code finally compiled. Whew!

I then extended the bindings to also cover trait impls and Options, while my colleague Christos changed the code that marshals Rust values into koto values to map Result::Err(_) into koto's runtime errors. Remembering that implicit constructors (structs and enum variants) are also useful, I added support for binding those where public. There was another error where generated intermediate code wouldn't parse, but some eprintln! debugging helped pinpoint the piece of code where it happened.

When trying to bind functions that take a reference argument other than &self (e.g. fn from(value: &Value) -> Self), I found that the bindings would not work, because my FromValue implementation could not produce references: a function in Rust cannot return a borrow of a value that lives only within that function. It took me a while to remember that I had blogged about the solution back in 2015: closures to the rescue! The basic idea is a function that takes a closure with a reference argument and returns the result of that closure:

pub trait RefFromValue {
    fn ref_from_value<R, F: Fn(&Self) -> R>(
        key_path: &KeyPath<'_>,
        value: &Value,
        f: F,
    ) -> Result<R, RuntimeError>;
}
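As an example, an implementation for str can hand a borrow of the koto value straight to the closure (a sketch: it assumes koto's Value::Str variant derefs to str, and type_error is a hypothetical error helper):

impl RefFromValue for str {
    fn ref_from_value<R, F: Fn(&Self) -> R>(
        key_path: &KeyPath<'_>,
        value: &Value,
        f: F,
    ) -> Result<R, RuntimeError> {
        match value {
            // Borrow the string in place; the borrow never escapes `f`,
            // so the borrow checker is satisfied.
            Value::Str(s) => Ok(f(s)),
            _ => Err(type_error(key_path, "a string", value)), // hypothetical helper
        }
    }
}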

Having this in a separate trait allows us to distinguish types where the borrow isn't &T, e.g. &str. We also gain a bit of simplicity by using universal function call syntax (MyType::my_fn(..) instead of v0.my_fn(..)). This also meant I had to nest the argument parsing, which I did by creating the innermost Expr and wrapping it in argument extractors in reverse argument order:

let mut expr: Expr = parse_quote! { #path(#(#inner_idents),*) };
for (i, ((a, v), mode)) in idents.iter().zip(args.iter()).enumerate().rev() {
    expr = if mode.is_ref() {
        parse_quote! {
            ::lang_bindings::RefFromValue::ref_from_value(
                &::lang_bindings::KeyPath::Index(#i, None),
                #a,
                |#v| #expr
            )?
        }
    } else {
        parse_quote! {
            match (::lang_bindings::FromValue::from_value(
                &::lang_bindings::KeyPath::Index(#i, None),
                #a
            )?) {
                (#v) => #expr
            }
        }
    };
}
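To see what this nesting produces, here is a hypothetical expansion for a two-argument method fn my_fn(a: i64, b: &str) on MyType, with a0/a1 as the incoming koto values, inside a generated binding function that returns a Result:

// Outermost: plain FromValue extraction for the i64 argument (index 0);
// innermost: RefFromValue hands &str to a closure wrapping the call.
match (::lang_bindings::FromValue::from_value(
    &::lang_bindings::KeyPath::Index(0, None),
    a0,
)?) {
    (v0) => ::lang_bindings::RefFromValue::ref_from_value(
        &::lang_bindings::KeyPath::Index(1, None),
        a1,
        |v1| MyType::my_fn(v0, v1),
    )?,
}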

Note that the match in the else branch is there to introduce a binding without requiring a Block, a common macro trick. Now all that was left to do was to add #[bindlang] attributes to our Namespace and its contents, and to add a lot of Display implementations, because koto requires Display for all ExternalValue implementors.
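Most of those Display implementations are one-liners along these lines (a sketch; what each type actually prints varies):

use std::fmt;

impl fmt::Display for Namespace {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // koto uses Display to render external values, e.g. in errors
        write!(f, "Namespace")
    }
}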

With all this in place, our test configuration should now look something like:

synth.Namespace({
    synth.Name("users"): synth.Content.Array({
        synth.Name("id"): synth.Content.Number(NumberContent.Id(schema.Id)),
        synth.Name("name"): synth.Content.String(StringContent.Faker("firstname", ["EN"])),
        synth.Name("email"): synth.Content.String(StringContent.Faker("email", ["EN"])),
    }, synth.Content.Number(NumberContent.Constant(10)))
})

That's only the beginning: I want to introduce a few coercions, a custom koto prelude and perhaps some syntactic sugar to make this even easier to both read and write.

The Takeaway

Macros that collect state to use later are possible (well, as long as the expansion order stays as it is) and useful, especially where code is spread across various blocks, files, or modules. However, if generated code relies on other generated code, it is best emitted in one go; otherwise module visibility and macro hygiene conspire to make life hard for the macro author. And if the expansion order ever changes in a way that breaks the macro, I can turn it into a standalone crate called from build.rs, thanks to proc_macro2 being decoupled from the compiler's actual implementation.