3 posts tagged with "rust"

The nightly elephant in the room

October 11, 2021 · 8 min read

Chief Rustacean

Rust Nightly

When my colleague Christos wrote about using Rust for our startup, he made no mention of the fact that we actually use a nightly Rust compiler, and our code incorporates a set of nightly features. So I thought it might be beneficial to look at the cost-benefit ratio of using Rust nightly, for us, for the open source ecosystem, and for Rust.

Disclaimer: I wasn't the one who introduced nightly Rust features into this particular codebase, but I have a good few years of experience with nightly Rust, having worked on clippy since 2015.

The Good

Everybody loves features! Features are great! Can't have too many of them. And even if you don't use any nightly features, using a nightly Rust compiler will give you 12 weeks of head start on performance benefits before they hit stable Rust. In fact, for a good four years, I only had a nightly compiler installed, because my Chromebook had very little disk space and I needed nightly for clippy anyway. Besides, who's going to find bugs that nightly may have if no one tests it? Apart from officially unstable features (like, say, internal compiler APIs that clippy used), I only ever once encountered any incompatibility – which I will report later.

That said, once you have nightly, those features are a real temptation. Curiosity had me playing with new toys on more than one crate: bytecount, initially flamer and overflower, and mutagen is still nightly only. Apart from the latter, most only played with one feature at a time. Even with unstable features, many of them were pretty solid already and changes were often easy enough to follow. And if you only write a prototype of something (well, you probably shouldn't be using Rust, but if like me you are very well versed with it, and feel sufficiently productive), an unstable feature or two may give you the extra oomph to get it done, quickly.

Many of the features unlock certain powers that our code can use (e.g. async_closure), or give us a better programming experience (e.g. box_patterns). Those latter features are often undervalued; if for example you have a chain of objects on the heap, matching through all of them in one pattern makes the code much easier to read and maintain (as long as that feature is available, of course). Having to write two, three, four match statements for this becomes cumbersome quickly.
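To make the readability point concrete, here is a minimal sketch that compiles on stable Rust. The `Expr` type and `strip_double_neg` function are invented for illustration and are not taken from synth. On stable, a pattern cannot look through a `Box`, so each level of boxing needs its own `match`; the comment shows roughly how the nightly `box_patterns` form would collapse that.

```rust
// A tiny expression tree whose children live on the heap (hypothetical type).
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Neg(Box<Expr>),
}

// Stable Rust: removing a double negation takes one `match` per Box level.
fn strip_double_neg(e: Expr) -> Expr {
    match e {
        Expr::Neg(outer) => match *outer {
            Expr::Neg(inner) => *inner,
            other => Expr::Neg(Box::new(other)),
        },
        other => other,
    }
}

// With `#![feature(box_patterns)]` on nightly, the same check fits in one arm:
//     Expr::Neg(box Expr::Neg(box inner)) => inner,

fn main() {
    let e = Expr::Neg(Box::new(Expr::Neg(Box::new(Expr::Num(7)))));
    println!("{:?}", strip_double_neg(e));
}
```

With only two levels the nested version is still bearable; every additional heap hop adds another `match` and another fallback arm that has to rebuild the value.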

The Bad

But there's a psychological factor at play: Once you give in to temptation with one feature, your hesitation to adopt yet another one has already diminished. Adding a third one looks even more benign and the fourth one is a no-brainer, right? And so at synth, our core/src/lib.rs starts with this:

#![feature(
    format_args_capture,
    async_closure,
    map_first_last,
    box_patterns,
    error_iter,
    try_blocks
)]
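For readers who have not met these gates, here is a rough sketch of what three of them provide, written against stable Rust. It assumes a toolchain recent enough that inline format arguments (format_args_capture) and the BTreeMap first/last methods (map_first_last) no longer need a feature gate; both were stabilized after this post was written. The try block is only approximated with an immediately invoked closure. All function names are made up for this example.

```rust
use std::collections::BTreeMap;
use std::num::ParseIntError;

// format_args_capture: name a local variable directly inside the format string.
fn greet(name: &str) -> String {
    format!("hello, {name}")
}

// map_first_last: smallest and largest key of an ordered map.
fn key_range(map: &BTreeMap<u32, &str>) -> Option<(u32, u32)> {
    let (first, _) = map.first_key_value()?;
    let (last, _) = map.last_key_value()?;
    Some((*first, *last))
}

// try_blocks: a `try { ... }` block scopes `?` to the block instead of the
// whole function. A closure that is called on the spot is a stable stand-in.
fn add_parsed(a: &str, b: &str) -> Result<i32, ParseIntError> {
    (|| -> Result<i32, ParseIntError> { Ok(a.parse::<i32>()? + b.parse::<i32>()?) })()
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(3, "c");
    map.insert(1, "a");
    println!("{}", greet("synth"));
    println!("{:?}", key_range(&map));
    println!("{:?}", add_parsed("2", "40"));
}
```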

There even used to be a try_trait until I removed it, because replacing it with a custom implementation was easier than moving the whole code over to try_trait_v2 when the trait design was improved. The synth/src/lib.rs uses one more feature, concat_idents, in some macros, so that's not even all of it. git blame tells us that the first four features were introduced in January '21, and the latter two were added in July. Coming back to that try_trait change: when #88223 hit, I found our code no longer compiled with the latest nightly. Of course we had pinned an older nightly, but I regularly checked if there was trouble on the horizon.

Now this trait was mostly used to use try-like semantics for generators (the part of synth that builds the test data) which could either return a result or yield yet another part of the thing to generate. While this design is very elegant, it is not really dependent on the try trait implementation at all, it only reused a few types from it (and even rarely used question marks). So when this code stopped working, I tried to update it to the new implementation, and when that proved too hairy, I simply removed the feature along with the one or two question marks, added our own GeneratorResult trait and changed a few occurrences of Try trait usage to use our own trait so the code worked again.
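The post does not show synth's GeneratorResult trait, so the following is only a guess at the general shape of the idea: a step either yields another piece of output or completes with a final value, and the caller handles both cases with a plain `match` instead of `?`. The names `Step`, `Countdown` and `drive` are invented for this sketch.

```rust
// Hypothetical: one step of a generator either yields a piece of output
// or completes with a final value.
#[derive(Debug, PartialEq)]
enum Step<Y, R> {
    Yielded(Y),
    Complete(R),
}

// A toy generator that yields n, n-1, ..., 1 and then completes with the
// number of values it produced.
struct Countdown {
    left: u32,
    yielded: u32,
}

impl Countdown {
    fn new(n: u32) -> Self {
        Countdown { left: n, yielded: 0 }
    }

    fn step(&mut self) -> Step<u32, u32> {
        if self.left == 0 {
            Step::Complete(self.yielded)
        } else {
            let value = self.left;
            self.left -= 1;
            self.yielded += 1;
            Step::Yielded(value)
        }
    }
}

// Drive the generator to completion; no Try trait and no `?` involved.
fn drive(mut g: Countdown) -> (Vec<u32>, u32) {
    let mut out = Vec::new();
    loop {
        match g.step() {
            Step::Yielded(v) => out.push(v),
            Step::Complete(total) => return (out, total),
        }
    }
}

fn main() {
    println!("{:?}", drive(Countdown::new(3)));
}
```

The point of the sketch is that nothing here depends on compiler internals, which is why swapping out the unstable trait was feasible at all.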

I think this case shows us two things:

Sometimes the cost-benefit ratio of a feature can change over time, and often the costs are paid later rather than sooner. In this case, it was the cost of replacing the feature with our own implementation which we could have made right away without using the feature. On the other hand, the feature had worked well for us for a good while, so that was basically repaying technical debt.
Often, it's not too hard to replace a feature. Still it's a chore that takes up time and resources, so unless there's an acute need to do so, the opportunity cost means other things will often be more valuable right now. So features, once used, tend to linger.

Regarding the pinned nightly and also the only incompatibility I ever encountered, I removed that pin when the May 2021 nightly version we had used stopped working with an updated proc_macro2 crate. It was later re-established with a newer version. We have this version in all our CI jobs and also in our rust-toolchain file. There also were a few troubles when we had CI and the main toolchain inadvertently go out of sync, but those were fixed up quickly.

For a version that is billed as "unstable", Rust nightly is actually surprisingly solid. That has two reasons: 1. Every PR that gets merged has been extensively tested by CI, so obvious errors get caught before the code even hits nightly. 2. Whenever a change looks risky, the infrastructure team supplies a "crater run". Crater is a tool that tries to compile every crate on crates.io with a given version of the compiler and checks whether anything that compiled before now fails. Since crates.io has 68798 crates in stock at the time of this writing, there's a pretty good chance that whatever weird thing you might encounter in live code is thrown at the compiler. I have done such a rustc change once, and it was very reassuring to know that my PR didn't break any code out there.

The Conclusion

If you want to compile your code with the current fastest version, you can use nightly now. As long as you don't use any features, your code should still compile on stable (however, I would still use a stable compiler in CI to check, because some changes may become insta-stable, e.g. added trait implementations; Rust cannot put those behind a feature gate). There is a very small risk of breakage, but you can revert to a beta or stable compiler with no hassle if that happens.

Using nightly features is in a way like every other form of technical debt: a bit of risk taking that can give you some potentially big payoff now, at the price of possible future breakage. Whether you want to take that risk depends a lot on your project and the phase it lives in. If you're a startup desperate to get your project out there, not using that feature may mean that there won't be a project to fix later. On the other hand, if you are writing code that should live for a while, or a library that is meant to be widely used, avoiding nightly features is likely your best bet.

If you are on nightly, you have two options: Go all in and embrace the instability or pin a known good version. I now think that pinning is the only sane option for all but hobby projects (where a bit of breakage can be acceptable every now and then). I note that clippy has a special place here, because it's essentially tied to the current rust version by design, and we get away with syncing every two weeks and staying on master (not even nightly) otherwise. Once you decide on a pinned version, you may as well pin all your dependencies and update them very cautiously, because any update could break your build. Even then it may be a good idea to test with a current nightly every now and then to gauge whether any incompatibility will hit you whenever you should decide to update.

If you encounter breakage, take a step back and look if the feature is still pulling its weight or if it's cheaper to get rid of it. So to sum up: Going nightly carries some risk. Being aware of and mitigating that risk can give you benefits now, at the cost of a price tag in the future. As always, your mileage may vary.

Building a startup with Rust

October 7, 2021 · 8 min read

Christos Hadjiaslanis

Founder

Rust

When building a company you are setting out to fundamentally solve a problem. For this reason, engineers have been systematically attracted by this romantic idea of changing the world with your brain and a laptop. We are at heart problem solvers.

As engineers, we can (and most of us have) become zealous at times about our solutions to these problems. We have pragmatists who just get stuff done - they address the symptom fast and effectively. We have idealists who will grind at an elegant scalable solution and try to treat the disease. Whichever camp you subscribe to, at a certain point you need to form an opinion about which technologies you are going to use to solve the problems you see in the world - and this opinion will inevitably cause contention.

Conventional wisdom is to 'use the right tool for the job'. The choice of programming language for example, depends on the domain of the problem you are trying to solve. If you're implementing some algorithm, in a secluded project, it's easy to make the case about what the language for the job may be. You can run a benchmark and literally test the execution time for each candidate language (if you're optimising for execution time). You can persuade yourself you've made a rational and 'objectively correct' decision.

However, in the context of building a business, your optimisation function is a high-dimensional mess involving performance, development velocity, hiring, server costs, ecosystem, tooling, support, licenses etc. You can assign weights to what is most important for your business, but at the end of the day the decision is inevitably qualitative.

At Synth, we're working on building the best data generator in the world. We made a conscious decision to use Rust for our main line of products. After more than a year of building I've had the opportunity to see Rust at its best and worst in the context of starting a company - this post is a compilation of these (at times cynical) thoughts.

Development Velocity

Rust has a really steep learning curve. Coming from an OO background it took me months to become productive in Rust. This was incredibly frustrating for me as I felt that my lack of productivity was impacting the team, which it was. Even when you eventually do become productive (and you will), Rust forces you to really think deeply about what you're doing and things inevitably take longer to get over the line. A poorly thought out design decision today can come back to haunt you months later. What should be a simple change or refactor can end up resulting in complete tear down as you try to appease the borrow checker. This is deadly.

The entire premise of a startup is that you have to iterate rapidly. Very few companies know what they should be building from day one. It's an iterative process involving a feedback loop of talking to users and making changes to reflect the feedback. The faster you can make that feedback loop, the higher probability you have of success.

Correctness

The evident hit in development velocity is redeemed to an extent by Rust's emphasis on writing correct programs: 'if it compiles, it works', so to speak. I've found this to be true for the most part while building with Rust and it is an absolute joy to work with for this reason.

Even if your program is not perfect, you understand the failure modes much better. The set of unknown failure modes is reduced substantially as your program breaks in exactly the way you expect it to. The lack of null pointers in conjunction with the Result paradigm (vs say, exceptions) compels you to build correct programs where edge cases are well understood and are handled explicitly by you (or unimplemented! but no one is perfect).
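A small sketch of what that looks like in practice. The `ConfigError` type, the `row_count` function and the "rows" key are invented for this example. A lookup that may be absent is an `Option`, a parse that may fail is a `Result`, and the compiler will not let the caller use the number until both cases have been dealt with.

```rust
use std::collections::HashMap;

// Every way this lookup can fail is spelled out in the type (hypothetical).
#[derive(Debug, PartialEq)]
enum ConfigError {
    Missing(&'static str),
    NotANumber(String),
}

fn row_count(cfg: &HashMap<&str, &str>) -> Result<u32, ConfigError> {
    // `get` returns an Option instead of a possibly-null pointer.
    let raw = cfg
        .get("rows")
        .copied()
        .ok_or(ConfigError::Missing("rows"))?;
    // `parse` returns a Result instead of throwing an exception.
    raw.parse::<u32>()
        .map_err(|_| ConfigError::NotANumber(raw.to_string()))
}

fn main() {
    let mut cfg = HashMap::new();
    cfg.insert("rows", "10");
    match row_count(&cfg) {
        Ok(n) => println!("generating {n} rows"),
        Err(e) => println!("bad config: {e:?}"),
    }
}
```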

If you've reached product market fit - correctness may counteract the development velocity hit. When you know what you're building you need to iterate less. Your dev team is also going to be spending less time dealing with bugs as you've already dealt with that while trying to appease the compiler.

If it compiles it works - and this is an invaluable asset when you're aggressively shipping code.

Talent

Getting great talent is unbelievably important for an early stage startup. The fact that the absolute number of competent and experienced Rust developers is so small initially seems detrimental to getting great people. This is exacerbated by Rust's steep learning curve as you need to hire someone with experience, or it's going to take months for them to become productive. However, this is not the full picture.

In our experience the competence of your average Rust developer is much higher than that of developers working in more conventional programming languages. Something spoke to these individuals when they picked up Rust, and it's hard to put your finger on it but it's that same quality that makes a great engineer. It's also been a pleasant surprise to find out that really good engineers will seek you out as an employer because you use Rust. They don't want to work in *script or Java or C++. They want to work with Rust because it's great.

Open Source

At Synth, we've chosen to adopt an open-core business model. The idea behind an open-core business is that you develop an open source product with a permissive license which solves a real technical problem. You work on building a user base, a community and a great product all out in the open. You then structure your business model around solving the corresponding organisational problem - and that's how you make money.

We've been really lucky to have a really active set of contributors - giving ideas, reporting bugs and contributing (at times very significant) code. It is hard to know for sure, but we have a strong hunch that a lot of the contributors are active because they have an interest in Rust projects specifically. A lot of our contributors are also interested in learning Rust - not necessarily being veterans of the language. This has worked out great as the more experienced members of our core team mentor and review code of young rustaceans, building a symbiotic positive feedback loop.

Thank you to all our contributors - you know who you are and you guys are amazing.

Libraries

Rust has an ecosystem of incredibly high quality libraries. The Rust core team has led by example and focused on a high quality and tight standard library. The result of a highly focused standard library is (unfortunately) a lack of canonical libraries for doing things outside the standard library. So you want a webserver? Pick from one of the 100s available. You want a crate (Rust lingo for library) for working with JWT tokens? Here are 9, pick one. I mean, even something as fundamental as an asynchronous runtime is split between tokio and async-std and others. As a young rustacean this can be overwhelming.

What ends up happening over time is certain libraries become implicitly canonical as they receive overwhelming support and start becoming serious dependencies differentiating from their alternatives. Also in a project update from RustConf 2021 it was mentioned that the idea of having 'recommended crates' may be visited in the future.

The lack of canonical non-standard libraries is an issue when you're getting started - but over time this diminishes as you get a better understanding of the ecosystem. What has been constantly detrimental to our development velocity has been the lack of client libraries for Rust. We've had to write a bunch of different integrations ourselves, but they're often clunky as we don't have the time to invest in making them really high quality. For example most of Google's products have at best an unofficial code-generated crate maintained by the community, and at worst absolutely nothing. You need to write it from scratch.

Should you build your startup with Rust?

Well it depends. Assuming you're building a product in the right domain for Rust (say a CLI as opposed to a social media site), even then the answer is not clear-cut. If you don't have close to 100% conviction that you know what you're building, I would be inclined to say no. Development velocity and being able to make rapid iterations is so important for an early stage startup that it outweighs a lot of the benefits that Rust brings to the table.

If your company is later stage, and you now understand exactly what you should be building (assuming this is ever the case) then I would say yes. The 'correctness' of Rust programs and the propensity of Rust to attract great engineers can help in building a great engineering culture and a great company.

Complex Procedural Rust Macros

August 9, 2021 · 8 min read

Andre Bogus

Chief Rustacean

I recently wrote the most complex procedural Rust macro I’ve ever attempted. This post tries to outline the problems I’ve encountered and tells how I overcame them.

The Background

With synth, we are building a declarative command line test data generator. For now, the specification that declares what test data to build is just JSON that gets deserialised to our data structures using serde_json. This was a quick and easy way to configure our application without much overhead in terms of code. For example:

{
    "type": "array",
    "length": {
        "type": "number",
        "constant": 3
    },
    "content": {
        "type": "object",
        "id": {
            "type": "number",
            "id": {}
        },
        "name": {
            "type": "string",
            "faker": {
                "generator": "name"
            }
        },
        "email": {
            "type": "string",
            "faker": {
                "generator": "ascii_email"
            }
        }
    }
}

However, it’s also not very nice to write (for example JSON has no comments, no formulas, etc.), so we wanted to bind our specification to a scripting language. Our end goal is to extend the language (both in terms of builtin functions and syntax) to make the configuration really elegant. After some testing and benchmarking different runtimes, our choice fell on koto, a nice little scripting language that was built foremost for live coding.

Unfortunately, koto has a very bare interface to bind to external Rust code. Since we are talking about a rather large number of types we want to include, it was clear from the start that we would want to generate the code to bind to koto.

Early Beginnings

So I started with a somewhat simple macro to wrangle koto types (e.g. Maps) into our Rust types. However, I soon found that the marshalling overhead would have been fine for an initial setup phase, but not for recurrent calls into koto (for example for filter functions called for each row). Thus I changed my approach to try and bind functions, then extended that to types and impl blocks.

I found – as I then thought – a genius technique of generating functions that would call each other, thus daisy-chaining a series of code blocks into one that could then be called with another bindlang_main!() proc macro:

static FN_NUMBER: AtomicUsize = AtomicUsize::new(0);

fn next_fn(mut b: Block, arg: &Expr) -> Item {
    let number = FN_NUMBER.fetch_add(1, SeqCst);
    let this_fn = fn_ident(number);
    if let Some(n) = number.checked_sub(1) {
        let last_fn = fn_ident(n);
        b.stmts.push(parse_quote! { #last_fn(#arg); });
    }
    b.stmts.extend(last_call);
    parse_quote! { fn #this_fn(#arg) { #b } }
}

#[proc_macro]
fn bindlang_main(arg: TokenStream) -> TokenStream {
    let arg = ident(arg.to_string());
    TokenStream::from(if let Some(n) = FN_NUMBER.load(SeqCst).checked_sub(1) {
        let last_fn = fn_ident(n);
        quote! { #last_fn(#arg) }
    } else {
        proc_macro2::TokenStream::default()
    })
}

I also wrote a derive macro to implement the marshalling traits. This worked well for a small example that was entirely contained within one module, but failed once the code was spread out through multiple modules: The functions would no longer be in the same scope and therefore stopped finding each other.
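To picture what the daisy chain produces, here is a hand-written approximation of the expanded output for three annotated items. The function names, the `Registry` type and the pushed strings are invented; the real macro generates its own identifiers via fn_ident.

```rust
// Stand-in for whatever state each generated block would populate.
type Registry = Vec<&'static str>;

// First annotated item: there is no predecessor to chain to.
fn bindlang_fn_0(reg: &mut Registry) {
    reg.push("first");
}

// Every later function calls its predecessor, then runs its own block.
fn bindlang_fn_1(reg: &mut Registry) {
    bindlang_fn_0(reg);
    reg.push("second");
}

fn bindlang_fn_2(reg: &mut Registry) {
    bindlang_fn_1(reg);
    reg.push("third");
}

// Roughly what the main macro boils down to: one call to the last link.
fn run_all() -> Registry {
    let mut reg = Registry::new();
    bindlang_fn_2(&mut reg);
    reg
}

fn main() {
    println!("{:?}", run_all());
}
```

The weakness is visible in the calls themselves: each link refers to its predecessor by bare name, so as soon as two links are generated in different modules, the name no longer resolves.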

Worse, I needed a number of pre-defined maps with functions for method dispatch for our external types within koto. A type in Rust can have an arbitrary number of impl blocks but I needed exactly one table, and I couldn’t simply daisy-chain those.

It was clear I needed a different solution. After thinking long and hard I came to the conclusion that I needed to pull all the code together in one scope, by the bindlang_main!() macro. My idea was to create a HashMap of syn::Items to be quoted together into one TokenStream. A lazy static Arc<Mutex<Context>> was to collect the information from multiple attribute invocations:

#[derive(Default)]
struct Context {
    bare_fns: Vec<MethodSig>,
    modules: HashMap<String, Vec<MethodSig>>,
    vtables: HashMap<String, Vec<MethodSig>>,
    types: HashMap<String, String>,
}

lazy_static::lazy_static! {
    static ref CONTEXT: Arc<Mutex<Context>> = Arc::new(Mutex::new(Context::default()));
}

#[proc_macro_attribute]
pub fn bindlang(_attrs: TokenStream, code: TokenStream) -> TokenStream {
    let code_cloned = code.clone();
    let input = parse_macro_input!(code_cloned as Item);
    // evaluate input here, and store information in Context
    // CONTEXT.lock().unwrap()...
    code
}

This was when I found out that none of syn's types are Send, and therefore they cannot be stored within a Mutex in a static. My first attempt to circumvent this was moving everything to Strings and using syn::parse_str to get the items out. This failed because of macro hygiene: Each identifier in Rust proc_macros has an identity. Two identifiers resulting from two macro operations will get different identities, even if their textual representation is the same.
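Leaving hygiene aside, the collect-now, emit-later pattern itself can be sketched outside of a proc macro. The version below stores plain Strings, which are Send, and uses the standard library's OnceLock in place of the lazy_static crate to stay dependency-free. The trimmed-down `Context`, the helper functions and the output format are assumptions for illustration, not synth's code.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

#[derive(Default)]
struct Context {
    bare_fns: Vec<String>,
    vtables: HashMap<String, Vec<String>>,
}

// Lazily initialised global state shared by all "invocations".
fn context() -> &'static Mutex<Context> {
    static CONTEXT: OnceLock<Mutex<Context>> = OnceLock::new();
    CONTEXT.get_or_init(|| Mutex::new(Context::default()))
}

// What each attribute invocation would do: record information, emit nothing.
fn record_fn(name: &str) {
    context().lock().unwrap().bare_fns.push(name.to_string());
}

fn record_method(ty: &str, method: &str) {
    context()
        .lock()
        .unwrap()
        .vtables
        .entry(ty.to_string())
        .or_default()
        .push(method.to_string());
}

// What the main macro would do: walk everything collected and emit it in one go.
fn emit_all() -> String {
    let ctx = context().lock().unwrap();
    let mut out = String::new();
    for f in &ctx.bare_fns {
        out.push_str(&format!("fn {f}\n"));
    }
    let mut types: Vec<_> = ctx.vtables.keys().collect();
    types.sort();
    for ty in types {
        out.push_str(&format!("vtable {ty}: {}\n", ctx.vtables[ty].join(", ")));
    }
    out
}

fn main() {
    record_fn("from_json");
    record_method("Content", "accepts");
    record_method("Content", "kind");
    print!("{}", emit_all());
}
```

Because several impl blocks of the same type all append to one entry, the "exactly one table per type" requirement falls out of the data structure.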

I also found that proc_macro_derives have no way to get the #[derive(..)] attribute of the type, and I wanted to also bind derived trait implementations (at least for Default, because some types have no other constructors). So I removed the derive and moved the implementation to the #[bindlang] attribute macro, which now works on types, impl blocks and fns.

Beware: This makes use of the (unspecified, but as of now working) top-down order of macro expansion to work!

Dirty tricks avoided

There is a Span::mixed_site() constructor that will yield semi-hygienic macros (like with macro_rules). However, this looked risky (macro hygiene is there to protect us, so we better have a good reason to override it), so I took the data oriented approach, collecting the info I needed to create the code in the lazy_static to walk within bindlang_main!(). I still tried to generate the trait impls for marshalling directly in the attribute macro, but this again ran into macro hygiene trouble, because I could not recreate the virtual dispatch table identifiers. After moving this part to the main macro, too, the macro finally expanded successfully.

Except it didn’t compile successfully.

I had forgotten to use the items I was creating code for in the macro, and koto requires all external types to implement Display. So I added those imports as macro arguments and added the Display impls, only to be met with a type inference error within the macro invocation. Clearly I needed some type annotations, but the error message only showed me the macro invocation, which was pretty unhelpful.

Expanding Our Vision

My solution to debug those was to cargo expand the code, comment out the macro invocation and copy the expanded portions in its place so that my IDE would pinpoint the error for me. I had to manually un-expand some format! invocations so the code would resolve correctly, and finally found where I needed more type annotations. With those added, the code finally compiled. Whew!

I then extended the bindings to also cover trait impls and Options, while my colleague Christos changed the code that marshals Rust values into koto values to mangle Result::Err(_) into koto’s runtime errors. Remembering that implicit constructors (structs and enum variants) are also useful, I added support for binding those if public. There was another error where intermediate code generated wouldn't parse, but some eprintln! debugging helped pinpoint the piece of code where it happened.

When trying to bind functions taking a reference argument other than self (e.g. fn from(value: &Value) -> Self), I found that the bindings would not work, because my FromValue implementation could not get references. Remember, a function in Rust cannot return a borrow into a value that lives only within it. It took me a while to remember I blogged about the solution in 2015! Closures to the rescue! The basic idea is to have a function that takes a closure with a reference argument and returns the result of that closure:

pub trait RefFromValue {
    fn ref_from_value<R, F: Fn(&Self) -> R>(
        key_path: &KeyPath<'_>,
        value: &Value,
        f: F,
    ) -> Result<R, RuntimeError>;
}

Having this in a separate trait allows us to distinguish types where the borrow isn't &T, e.g. for &str. Also we gain a bit of simplicity by using universal function call syntax (MyType::my_fn(..) instead of v0.my_fn(..)). This also meant I had to nest the argument parsing: I did this by creating the innermost Expr and wrapping it in argument extractors in reverse argument order:

let mut expr: Expr = parse_quote! { #path(#(#inner_idents),*) };
for (i, ((a, v), mode)) in idents.iter().zip(args.iter()).enumerate().rev() {
    expr = if mode.is_ref() {
        parse_quote! {
            ::lang_bindings::RefFromValue::ref_from_value(
                &::lang_bindings::KeyPath::Index(#i, None),
                #a,
                |#v| #expr
            )?
        }
    } else {
        parse_quote! {
            match (::lang_bindings::FromValue::from_value(
                &::lang_bindings::KeyPath::Index(#i, None),
                #a
            )?) {
                (#v) => #expr
            }
        }
    };
}

Note that match in the else part is there to introduce a binding without requiring a Block, a common macro trick. Now all that was left to do was add #[bindlang] attributes to our Namespace and its contents, and also add a lot of Display implementations because koto requires this for all ExternalValue implementors.
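The nesting is easier to see in a hand-written version. The sketch below uses a tiny stand-in `Value` enum and free functions rather than koto's types or the lang_bindings traits, so every name in it is an assumption. It binds a function with one borrowed and one owned argument: the borrowed argument is lent to a closure, and the owned argument is bound with the single-arm `match` trick inside it.

```rust
// Minimal stand-in for a scripting-language value (not koto's actual type).
enum Value {
    Str(String),
    Number(i64),
}

// Owned extraction: fine for data that can simply be copied out.
fn number_from_value(v: &Value) -> Result<i64, String> {
    match v {
        Value::Number(n) => Ok(*n),
        _ => Err("expected a number".to_string()),
    }
}

// Borrowed extraction: instead of returning a `&str`, lend it to a closure
// and return whatever the closure produces.
fn str_ref_from_value<R>(v: &Value, f: impl FnOnce(&str) -> R) -> Result<R, String> {
    match v {
        Value::Str(s) => Ok(f(s.as_str())),
        _ => Err("expected a string".to_string()),
    }
}

// The function to bind: one borrowed and one owned argument.
fn repeat(text: &str, times: i64) -> String {
    text.repeat(times.max(0) as usize)
}

// Hand-expanded binding: argument 0 is the outermost wrapper, the call to
// `repeat` is the innermost expression.
fn call_repeat(args: &[Value]) -> Result<String, String> {
    if args.len() != 2 {
        return Err("expected two arguments".to_string());
    }
    str_ref_from_value(&args[0], |v0| -> Result<String, String> {
        // Single-arm match: introduces the binding `v1` without a block.
        match number_from_value(&args[1])? {
            v1 => Ok(repeat(v0, v1)),
        }
    })?
}

fn main() {
    let args = [Value::Str("ab".to_string()), Value::Number(3)];
    println!("{:?}", call_repeat(&args));
}
```

In this sketch the closure returns a Result itself so that `?` can be used inside it; the outer `?` then unwraps the extraction error and leaves the inner Result as the return value.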

In conclusion, our test configuration should now look something like:

synth.Namespace({
    synth.Name("users"): synth.Content.Array(
        {
            synth.Name("id"): synth.Content.Number(NumberContent.Id(schema.Id)),
            synth.Name("name"): synth.Content.String(StringContent.Faker("firstname", ["EN"])),
            synth.Name("email"): synth.Content.String(StringContent.Faker("email", ["EN"])),
        },
        synth.Content.Number(NumberContent.Constant(10))
    )
})

That's only the beginning: I want to introduce a few coercions, a custom koto prelude and perhaps some syntactic sugar to make this even easier to both read and write.

The Takeaway

Macros that collect state to use later are possible (well, as long as the expansion order stays as it is) and useful, especially where code is strewn across various blocks (or even files or modules). However, if code relies on other code, it best be emitted in one go, otherwise module visibility and macro hygiene conspire to make life hard for the macro author. And if at one point the expansion order gets changed in a way that breaks the macro, I can change it to a standalone crate to be called from build.rs thanks to proc_macro2 being decoupled from the actual implementation.