mda-rs: Custom Mail Delivery Agents in Rust

I had been stubbornly using procmail for two decades to filter my email before deciding that it was finally time for a change. Procmail served me well through the years, but, little by little, the annoyances started adding up. The fact that procmail has been unmaintained since 2001 inspired less and less confidence as the years passed, and the need, in many cases, for external processing before any useful matching could be performed (e.g., dealing with MIME content-transfer-encoded data or non-us-ascii character sets) was a constant pain point. As the complexity of my rules grew, I even had to resort to an external program (a custom C++ program in my case) to perform some of the mailbox selection logic, using the AUTOFOLDER feature of procmail.

At that point, and given all the functionality that I had to implement myself, I seriously started questioning the value procmail was providing to me and started looking for alternatives. I evaluated fdm and maildrop, finding in both a lot that I liked; first and foremost a less arcane filtering language. In the end, I found maildrop to be a closer match to my preferences and requirements, and I especially appreciated the automatic MIME content decoding.

I briefly considered switching to maildrop, but a few experiments on my set of filtering rules indicated that maildrop's performance was significantly worse than procmail's, even though for procmail I had to call out to a few more external programs to achieve similar functionality. I also gave fdm a go, and it was even slower than maildrop. I receive a lot of emails each day, mostly from various FOSS mailing lists, and the performance penalty adds up. While waiting for a few dozen seconds daily wouldn't have been the end of the world, the thought that my original and much older solution was faster disturbed me.

Meanwhile, I noticed that the control flow statements in maildrop's filtering language were similar to those of a general purpose programming language, and, in fact, with the integration of regular expressions, they looked a bit like Perl. So, I thought, what if, instead of using a domain specific language, I could use a general purpose language, augmented by some kind of library to implement the same filtering and delivery functionality? My thoughts then drifted to other things, but the seed of that idea had already been planted in my mind, it seems, as a few years later I found myself implementing the code that would become mda-rs.

With mda-rs I wanted to create an experience as close as possible to using an interpreted domain specific language, the approach followed by typical MDAs, while still having the performance and power of a full, compiled language at one's fingertips. One aspect of this experience was providing an API that feels like a natural fit for the intended purpose. The other aspect was providing a straightforward way to build a custom MDA. For this second aspect, the simplicity of Rust's cargo was one of the reasons I decided to use Rust for this project.

Another decision catering to a similar ease-of-use goal was that the user shouldn't be required to use external programs to transform the data just so they could perform basic matching. To this end, mda-rs, like maildrop, normalizes the email before processing, by decoding and converting all text MIME content parts (including MIME encoded-words in headers) to plain UTF-8.
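As a rough illustration of what decoding an encoded-word involves, here is a minimal sketch of a decoder for the RFC 2047 "Q" encoding, restricted to the UTF-8 charset. The function name `decode_q_encoded_word` is hypothetical and this is not the mda-rs implementation; real normalization also has to deal with other charsets, the base64 "B" encoding, and MIME body parts. It only shows the kind of work that filtering rules no longer have to delegate to an external program.

```rust
/// Decode a single RFC 2047 encoded-word of the form `=?UTF-8?Q?...?=`.
/// Returns None for anything this simplified sketch does not handle
/// (other charsets, the "B" encoding, malformed input).
fn decode_q_encoded_word(word: &str) -> Option<String> {
    let inner = word.strip_prefix("=?")?.strip_suffix("?=")?;
    let mut parts = inner.splitn(3, '?');
    let charset = parts.next()?;
    let encoding = parts.next()?;
    let payload = parts.next()?;
    if !charset.eq_ignore_ascii_case("utf-8") || !encoding.eq_ignore_ascii_case("q") {
        return None;
    }

    let mut bytes = Vec::with_capacity(payload.len());
    let mut iter = payload.bytes();
    while let Some(b) = iter.next() {
        match b {
            // In the Q encoding an underscore stands for a space.
            b'_' => bytes.push(b' '),
            // `=XX` is a byte written as two hexadecimal digits.
            b'=' => {
                let hi = (iter.next()? as char).to_digit(16)?;
                let lo = (iter.next()? as char).to_digit(16)?;
                bytes.push((hi * 16 + lo) as u8);
            }
            other => bytes.push(other),
        }
    }
    // The decoded bytes must form valid UTF-8.
    String::from_utf8(bytes).ok()
}
```

With this sketch, `decode_q_encoded_word("=?UTF-8?Q?Caf=C3=A9_menu?=")` yields `Some("Café menu")`, so a rule can match on the readable text rather than on the encoded form.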

Of course, I also wanted the resulting custom MDAs to be fast; performance was my main disappointment with other MDAs after all. Performance requires an efficient implementation, but also an API that encourages performant use. An example of the effect of this concern on the mda-rs API is that it provides access to the header fields by name, so that one can perform targeted per-header-field string searches, which can be much faster than regular expression searches of the whole header.
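The idea of a targeted per-header-field search can be sketched with the standard library alone. The helpers below, `find_header_field` and `is_addressed_to`, are hypothetical names for illustration and are not part of the mda-rs API; the sketch also assumes unfolded header lines, which real email headers do not guarantee.

```rust
/// Return the value of the first header field named `name` (case-insensitive),
/// or None if the field is absent. Folded continuation lines are not handled.
fn find_header_field<'a>(headers: &'a str, name: &str) -> Option<&'a str> {
    headers.lines().find_map(|line| {
        let (field, value) = line.split_once(':')?;
        if field.trim().eq_ignore_ascii_case(name) {
            Some(value.trim())
        } else {
            None
        }
    })
}

/// Targeted check: only the `To` field is examined, using a plain
/// substring search instead of a regular expression over the whole header.
fn is_addressed_to(headers: &str, address: &str) -> bool {
    find_header_field(headers, "To").map_or(false, |to| to.contains(address))
}
```

Besides doing less work, the targeted search cannot be fooled by the address appearing in an unrelated field, such as a `Subject` line that happens to quote it.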

Finally, an important requirement for all MDAs is that the email delivery is durable, i.e., that no email will be lost in case of a system failure. In other words, at any point in time at least one of the local filesystem and the email server should have a complete and fully accessible copy of the email.

While investigating the best way to provide such durability guarantees, I noticed that all three MDAs mentioned previously fsync the written email file, as expected. However, they do not fsync the containing directory, which could potentially lead to the file not appearing on the filesystem after a crash [1]. The exact guarantees in this scenario seem to depend on the filesystem, but, to provide maximum durability, mda-rs fsyncs both the file and the containing directory by default. This does incur a performance penalty, so I have also included an option to use the file-sync-only approach if preferred.
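The difference between the two modes can be sketched as follows. This is a simplified illustration for Unix-like systems, not the actual mda-rs delivery code: the function `deliver_durably`, its parameters, and the choice to sync both the `tmp` and `new` directories are assumptions made for the example, and a real maildir delivery also needs unique file names.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `data` to `tmp/<name>` inside `maildir`, then move it to `new/<name>`.
/// When `sync_dirs` is true, the directories are fsynced in addition to the file.
fn deliver_durably(maildir: &Path, name: &str, data: &[u8], sync_dirs: bool) -> io::Result<()> {
    let tmp_dir = maildir.join("tmp");
    let new_dir = maildir.join("new");
    fs::create_dir_all(&tmp_dir)?;
    fs::create_dir_all(&new_dir)?;

    let tmp_path = tmp_dir.join(name);
    let new_path = new_dir.join(name);

    let mut file = File::create(&tmp_path)?;
    file.write_all(data)?;
    // fsync the file so its contents reach stable storage.
    file.sync_all()?;

    fs::rename(&tmp_path, &new_path)?;

    if sync_dirs {
        // fsync the directories so that the directory entries themselves
        // (the result of the rename) are persisted too.
        File::open(&new_dir)?.sync_all()?;
        File::open(&tmp_dir)?.sync_all()?;
    }
    Ok(())
}
```

Passing `false` for `sync_dirs` corresponds to the file-sync-only approach, trading some durability for fewer sync operations.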

Since mda-rs was written to cater primarily to my personal needs, it currently has some limitations, or unimplemented functionality, if you will. The most prominent one is that it delivers to maildir only, since that's the only mailbox format I use. The second is that there is no built-in way to change the email data (e.g., add a header field) except by filtering through an external tool, since this is another feature I don't use.

Here is a small taste of what a custom MDA looks like with mda-rs:

use std::path::PathBuf;

use mda::{Email, EmailRegex, Result, DeliveryDurability};

fn main() -> Result<()> {
    let root = PathBuf::from("/tmp/my-personal-mail");

    // Read the email from stdin, passing it through bogofilter first.
    let mut email = Email::from_stdin_filtered(&["/usr/bin/bogofilter", "-ep"])?;
    // Use quicker (but possibly less durable) delivery.
    email.set_delivery_durability(DeliveryDurability::FileSyncOnly);

    let from = email.header_field("From").unwrap_or("");
    let bogosity = email.header_field("X-Bogosity").unwrap_or("");

    if bogosity.contains("Spam, tests=bogofilter") ||
       from.contains("@banneddomain.com") {
        email.deliver_to_maildir(root.join("spam"))?;
        return Ok(());
    }

    let cc = email.header_field("Cc").unwrap_or("");
    let to = email.header_field("To").unwrap_or("");

    if to.contains("myworkemail@example.com") ||
       cc.contains("myworkemail@example.com") {
        if email.body().search("URGENCY RATING: (CRITICAL|URGENT)")? {
            email.deliver_to_maildir(root.join("inbox/myemail/urgent"))?;
        } else {
            email.deliver_to_maildir(root.join("inbox/myemail/normal"))?;
        }
        return Ok(());
    }

    email.deliver_to_maildir(root.join("inbox/unsorted"))?;

    Ok(())
}

and a corresponding minimal Cargo.toml:

[package]
name = "my-mda"
version = "0.1.0"
edition = "2018"

[dependencies]
mda = "0.1"

To provide an idea of the performance gains to expect, I benchmarked a us-ascii only version of my personal mail filtering rules on a sample of 250 of my recently received emails using all the aforementioned MDAs. Of course, the performance results will vary depending on both the rules and the emails themselves, but in my experience what is presented below seems to be a typical outcome. I'd be very interested to know how mda-rs works for others on the performance front.

In the benchmark I included runs both with and without filtering for spam, since such filtering takes up a significant amount of processing and affects the relative results. For mda-rs I included two versions, one that mostly uses per-header-field searches, and a second one that uses regular expressions exclusively, and thus is a bit closer to how the other MDAs work. For each mda-rs version I ran the benchmark both in file sync only mode, which is how the other MDAs work, and full sync (file and directory) mode, which is the default for mda-rs. Note that, for this benchmark, both maildrop and mda-rs performed MIME decoding and translation to UTF-8 (internally and automatically), whereas neither procmail nor fdm did so, and no external program was used to provide such functionality. I verified that the delivery results were the same for all MDAs.

[Figure: mda-benchmark]

mda-rs wins hands down when operating in file sync only mode, while at the same time doing more work (normalizing the email) than the next fastest contender, procmail. In full sync mode, the extra directory syncs prove to be expensive enough to overshadow the other mda-rs performance gains, and procmail takes the lead. Still, even in this case, the performance is acceptable and much better than that of both maildrop and fdm.

If you would like to learn more about how to use mda-rs, you can find detailed documentation at: https://docs.rs/mda

The mda-rs project repository is at: https://github.com/afrantzis/mda-rs

The mda-rs crates.io entry is at: https://crates.io/crates/mda

Enjoy!

[1] See https://lwn.net/Articles/457667/