Rasmus.krats.se

Reminiscing this and that, on the web since 1995

A compiling template system in Rust


When developing web applications, it is often useful to have a template system: something that lets you write generic versions of web pages, which the application can fill with the specific content for each page it should show. There exist lots of "languages" for writing such templates, such as mustache, jinja2, and play 2 scala templates (twirl).

Most of them fit very well with a dynamic language, where you can get properties from an object, or even call a method, by its name in a plain string. In a statically compiled language, the actual names of fields and methods are not relevant, and generally not present, after compilation. This makes a "dynamic" template language a hard match for a compiled language such as rust. So why not try to create a better match?

The rust implementation of mustache uses the rust serialization API (in a way I haven't decided yet if I think is horrible or beautiful) to be able to look up properties by name at run time. It does have the drawback of only being able to access things that are part of the serialization of an object, and only in the serialization format.

However, templates hardly ever change while the app is running, so why should they be dynamically parsed and evaluated at all? Why not have the templates compiled to code?

So I decided to try that. Rust Compiled Templates, or ructe, is my attempt to create a template language where the templates are actually compiled and included in the binary program code. The design criteria of the project are listed at the project page on github. I'll walk through them below, but first a basic overview of what I have done and how to use it:

Usage and implementation

Ructe is intended to be called from a build script in your project. It parses your templates (using the nom parser library) and writes a rust source file containing a module with one function for each template. The generated file can then be include!d in your main.rs (or another module of your choice).

A sample template for a post and its comments in a blog engine (everyone's favourite example) may look something like this, in a file named blogentry.rs.html:

@use models::blog::{Entry, Comment};

@(entry: &Entry, comments: &[Comment])

<!doctype html>
<html>
  <head>
    <title>@entry.title - my blog</title>
  </head>
  <body>
    <h1>@entry.title</h1>
    @Html(entry.content)
    @for comment in comments {
      <h2>Comment by @comment.author</h2>
      @Html(comment.content)
    }
  </body>
</html>

The template consists of three distinct parts. The first line is a preamble. Any number of lines that start with an @ sign and end with a semicolon will be copied to the rust code (except the @ sign).

The second non-empty line declares the template arguments. In this case, the template takes two arguments: entry, which is a reference to an Entry, and comments, which is a reference to a slice of Comments. The generated function takes a &mut Write to write the resulting html code to, followed by the parameters you want your template to take, so the resulting rust function will be:

fn blogentry(out: &mut Write, entry: &Entry, comments: &[Comment]) -> io::Result<()> {
  ...
}

The last, and most important, part is the template body itself. Things that start with @ in the body are special; everything else is just written to the output destination when the template is called. The @entry.title is simply the field title of the parameter entry. If the Entry type does not have that field, an error will be reported when compiling the generated code. Next, @Html(entry.content) is also a simple expression; content is a field of entry just like title. However, while title contains any string data, content in my fictional blog software contains markup. To avoid double escaping, entry.content is wrapped in the Html type (which is provided in the generated code by ructe).

Up next is a for loop. It will be a very similar loop in the generated code; the only difference is that the body to loop over is template code and not rust code, so it will be converted in the same way as the body of the template itself.

The generated code for the above template (in this listing the template is named sample rather than blogentry) is:

mod template_sample { 
    use std::io::{self, Write}; 
    #[allow(unused)] 
    use ::templates::{Html, ToHtml}; 
    use models::blog::{Entry, Comment}; 

    pub fn sample(out: &mut Write, entry: &Entry, comments: &[Comment]) 
    -> io::Result<()> { 
        try!(write!(out, "<!doctype html>\n<html>\n  <head>\n    <title>")); 
        try!(entry.title.to_html(out)); 
        try!(write!(out, " - my blog</title>\n  </head>\n  <body>\n    <h1>")); 
        try!(entry.title.to_html(out)); 
        try!(write!(out, "</h1>\n    ")); 
        try!(Html(entry.content).to_html(out)); 
        try!(write!(out, "\n    ")); 
        for comment in comments { 
            try!(write!(out, "<h2>Comment by ")); 
            try!(comment.author.to_html(out)); 
            try!(write!(out, "</h2>\n      ")); 
            try!(Html(comment.content).to_html(out)); 
            try!(write!(out, "\n    ")); 
        } 
        try!(write!(out, "\n  </body>\n</html>\n")); 
        Ok(()) 
    } 
} 
pub use ::templates::template_sample::sample;

The generated code for a template is wrapped in a mod so the use directives from one template won't clash with directives from another template. (Maybe this module name should start with an underscore?) The first two use directives (from the standard library and from ructe-provided code) are always present. After them come any use directives from the template itself, followed by the actual template function. After the local module, the template function from it is re-exported with pub use, so user code won't have to know or care about the local module.
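As a rough illustration of that layout, the following sketch assembles the wrapper around an already translated template body. It is not ructe's actual code generator; the function name wrap_template and its parameters are made up for this example, and the emitted text simply mirrors the listing above (including the old-style &mut Write in the signature).

```rust
/// Wrap already-translated body statements in the module layout
/// described above. `wrap_template` is a hypothetical helper, not
/// part of ructe.
fn wrap_template(name: &str, uses: &[&str], args: &str, body: &str) -> String {
    let mut src = String::new();
    // One private module per template keeps its `use` lines isolated.
    src.push_str(&format!("mod template_{} {{\n", name));
    // These directives are always present.
    src.push_str("    use std::io::{self, Write};\n");
    src.push_str("    #[allow(unused)]\n");
    src.push_str("    use ::templates::{Html, ToHtml};\n");
    // Then the directives from the template preamble.
    for u in uses {
        src.push_str(&format!("    use {};\n", u));
    }
    // The template function itself.
    src.push_str(&format!(
        "    pub fn {}(out: &mut Write, {}) -> io::Result<()> {{\n",
        name, args
    ));
    for line in body.lines() {
        src.push_str(&format!("        {}\n", line));
    }
    src.push_str("        Ok(())\n    }\n}\n");
    // Re-export so callers never see the wrapper module.
    src.push_str(&format!(
        "pub use ::templates::template_{}::{};\n",
        name, name
    ));
    src
}
```

Calling it with the name sample, the use line from the preamble, and the argument list from the template yields text with the same overall shape as the generated listing.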

As we see, the verbatim parts of the template are simply written to out. Expressions are written by calling to_html(out) on them. The to_html method is declared by the ToHtml trait provided by ructe. The trait has two implementations: One for Html<T> where T: Display that simply writes the T to out, as Html(value) is used to signify that the value is preencoded html and should be written as is. The other is for any T: Display and encodes any <, >, and & characters as xml entities &lt;, &gt;, and &amp;.
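A minimal sketch of such a trait and its two implementations could look like the following. It follows the description above, but the details are assumptions: it uses the newer &mut dyn Write spelling, and the escaping is done character by character, which is not necessarily how ructe does it.

```rust
use std::fmt::Display;
use std::io::{self, Write};

/// Anything that can be written into an html page.
pub trait ToHtml {
    fn to_html(&self, out: &mut dyn Write) -> io::Result<()>;
}

/// Marks the wrapped value as ready-made html.
pub struct Html<T>(pub T);

/// Pre-encoded html is written as is.
impl<T: Display> ToHtml for Html<T> {
    fn to_html(&self, out: &mut dyn Write) -> io::Result<()> {
        write!(out, "{}", self.0)
    }
}

/// Everything else that implements Display is escaped.
/// (Html itself does not implement Display, so the two
/// implementations do not overlap.)
impl<T: Display> ToHtml for T {
    fn to_html(&self, out: &mut dyn Write) -> io::Result<()> {
        for c in self.to_string().chars() {
            match c {
                '<' => out.write_all(b"&lt;")?,
                '>' => out.write_all(b"&gt;")?,
                '&' => out.write_all(b"&amp;")?,
                _ => write!(out, "{}", c)?,
            }
        }
        Ok(())
    }
}
```

With this in place, a plain string containing a < b & c is written as a &lt; b &amp; c, while Html("<b>bold</b>") is written unchanged.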

Design criteria and current status

As many errors as possible should be caught in compile-time. This is true by the basic design, but the current error messages leave a lot to be desired. Depending on the error, either ructe will fail to compile the template or it will generate a templates.rs that will fail to compile. In the first case, the error message is not much better than "failed to parse template". In the second case, the error messages of the rust compiler are actually rather good, but will refer to the generated rust code and not to your template. I'm not sure how to improve this. Maybe there exists something like the file name / line number markers that the old cpp outputs that I may use?

A compiled binary should include all the template code it needs, no need to read template files at runtime. This is true by the basic design. Templates are compiled and optimized before the program is started.

Compilation may take time, running should be fast. This is somewhat true by the basic design, and it is the reason that the template functions take a &mut Write to write to, rather than just creating and returning a String. I guess the generated code could be more efficient in some way, but not in any way that is obvious to me.
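To illustrate the point about &mut Write, here is a small sketch with a hand-written stand-in for a generated template function (greeting and greeting_string are made-up names, and the stand-in does no escaping). The same function can write straight to a socket, a file, or an in-memory buffer, and a caller that really wants a String can still get one with a thin wrapper.

```rust
use std::io::{self, Write};

/// Hand-written stand-in for a generated template function:
/// it writes directly to whatever destination the caller provides.
fn greeting(out: &mut dyn Write, name: &str) -> io::Result<()> {
    writeln!(out, "<p>Hello, {}!</p>", name)
}

/// Optional convenience wrapper for callers that want a String.
/// The buffering cost is paid only here, not by every caller.
fn greeting_string(name: &str) -> io::Result<String> {
    let mut buf = Vec::new();
    greeting(&mut buf, name)?;
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Going the other way, from a function that returns a String to one that streams to a writer, would not remove the intermediate allocation.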

Writing templates should be almost as easy as writing html. I think the current template format is intuitive and ergonomic, but that does not mean it is intuitive to everyone … Comments are welcome!

The template language should be as expressive as possible. Any object / property / function result that implements Display (or templates::ToHtml) can be outputted. The format!() macro can be used. Loops and conditionals (including pattern-matching if let) are supported. Templates can call other templates.

It should be possible to write templates for any text-like format, not only html. In theory, this is no problem. But the default character escaping (& becomes &amp;, etc) is for html. Maybe a future version can support other escaping (e.g. for javascript, json, etc)?

Any value that implements the Display trait should be outputable. Yup, no problem.

By default, all values should be html-escaped. There should be an easy but explicit way to output preformatted html. There is a Html type that wraps any Display object and implies that the wrapped object is ready-made html code (or actually, that formatting the object will yield ready-made html).

Remaining challenges

Of course, work remains to be done before ructe is ready for prime time. I think there are four major areas that need to be improved:

As mentioned above, error messages from template parsing need to be improved, and error messages from compiling the generated code need to be tied to the template source, if at all possible.

Currently ructe, inspired by twirl, uses a special character (@) to signal the start of an expression, but nothing special to signal the end of an expression. This is very nice when writing simple expressions surrounded by whitespace, tags, or anything else that doesn't look like a part of an expression, but it requires ructe to know the syntax of valid rust expressions and it is problematic if an expression should be immediately followed by something that looks like an expression.

Should support for an optional (or required) end-of-expression marker be added? Maybe something like {=expression}? Otherwise, the expression parser will need to be improved. If I do add an end-of-expression marker, it will change the syntax of valid templates, so it should probably be decided one way or the other as soon as possible.
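The ambiguity can be made concrete with a toy scanner. This is only a sketch, not ructe's nom-based parser: scan_expression accepts just identifiers and dots, and scan_delimited handles the hypothetical {=expression} form floated above (and would itself break on expressions that contain a closing brace).

```rust
/// Given the text right after an `@`, split it into
/// (expression, rest). Toy rule: an expression is a run of
/// identifier characters and dots.
fn scan_expression(input: &str) -> (&str, &str) {
    let end = input
        .find(|c: char| !(c.is_alphanumeric() || c == '_' || c == '.'))
        .unwrap_or(input.len());
    input.split_at(end)
}

/// The same for a hypothetical explicit form `{=expression}`:
/// the closing brace says exactly where the expression ends.
fn scan_delimited(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix("{=")?;
    let end = body.find('}')?;
    Some((&body[..end], &body[end + 1..]))
}
```

With the open-ended form, the input entry.title</h1> splits cleanly at the tag, but entry.title. Next swallows the sentence-ending period into the expression. With the delimited form, {=entry.title}. Next leaves the period in the surrounding text.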

The generated code from ructe is currently a single file. This is good because the file can then be include!d in a single place, but it may result in a very large file, which may be impractical. Maybe a build script can tell cargo about an extra path to look for source code in? I could just write files to a subdirectory of the actual src directory, but that doesn't feel clean.

What is the best way to present documentation for the template language itself? Rustdoc is great for rust APIs, but maybe not for documenting a separate language. On the other hand, having the documentation available on docs.rs in the format rust developers are used to might be worth simply writing the documentation as one large docstring for a module. Currently, what little documentation exists is in the main README of the git repo.

Comments

Jethro Beekman,

Nice work!

Regarding multiple output files: you can place all of them in $OUT_DIR and make one index file e.g. $OUT_DIR/templates.rs. This templates.rs will contain statements like `mod a;` and `mod b;`. These will, as normal, refer to files $OUT_DIR/a.rs and $OUT_DIR/b.rs respectively, when writing e.g. include!(concat!(env!("OUT_DIR"), "/templates.rs")); in your main source file somewhere.

Regarding "The generated code for a template is wrapped in a mod so the use directives from one template won't be duplicated." You don't have to use modules for this. You can use `use` inside a fn.

Rasmus Kaj,

Thanks, Jethro! I had no idea modules worked relative to the directory containing an included file like that! It sounds a bit magical but very practical, I'll have to experiment with that.

Using `use` inside a fn is actually something I have done, just didn't think of it in this context.

So thank you kindly for your suggestions, I will look into them during this weekend.

Rasmus Kaj,

Thanks again for the suggestion about how simply it should work with multiple files, Jethro. Yes, it does work and I have committed the change.

Since a separate file is also a mod, the placement of the use statements feels much more natural as well (and putting them inside the function doesn't work well for using types for function arguments).
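For readers who want to try the layout Jethro describes, a build script could write the files roughly as in this sketch. The helper name write_modules and its arguments are made up for illustration, and this is not the code that was committed to ructe. In a real build script, out_dir would come from the OUT_DIR environment variable, and the main source file would pull in the index with include!(concat!(env!("OUT_DIR"), "/templates.rs")).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write one `<name>.rs` file per module into `out_dir`, plus a
/// `templates.rs` index containing one `mod <name>;` line each.
fn write_modules(out_dir: &Path, modules: &[(&str, &str)]) -> io::Result<()> {
    let mut index = String::new();
    for &(name, source) in modules {
        // The module body goes into its own file ...
        fs::write(out_dir.join(format!("{}.rs", name)), source)?;
        // ... and the index only declares it.
        index.push_str(&format!("mod {};\n", name));
    }
    fs::write(out_dir.join("templates.rs"), index)
}
```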
