Linux Magazine / July 2003 / FEATURES
procmail/formail Examples
Personal Post - Supplemental Pages
by Jerry Peek

The article introduces header editing with formail. This page has two more examples:

  • Catching Forwarded Messages: The user a@b.c who receives messages that were forwarded by one of your recipes wants to "catch" the forwarded messages and put them into a special mail folder. If you add a "flag" field as you forward the messages, she can add a recipe to her .procmailrc that tests for your flag.
  • Meeting-Minutes Mailer: These two procmail recipes work together to implement a system that emails meeting minutes to people who want them. It shows how to receive and process an incoming message, archive it to a file, then send the archived file to people who ask for it.

First, here's a note about procmail lockfiles:

Remember to use a lockfile (the trailing colon (:) -- possibly with a lockfile name after it) at the start of any recipe which writes to a local file (such as a mail folder). Why? That's because your MTA could start a second procmail session if a new message comes in before your first message has been processed; the lockfile keeps this recipe from being executed simultaneously.

In general, you won't need a lockfile if the recipe does something that can happen simultaneously. For instance, the recipe that forwards messages to groucho and friends simply starts sendmail to deliver the edited message; because Linux can manage multiple separate processes (like sed), and sendmail handles its own queuing, there's no need to lock this recipe.

Locking can slow down procmail. But use it as needed to avoid corrupted messages.
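
To see what a lockfile buys you, here is a minimal shell sketch of the idea. It is not how procmail implements its locking (procmail creates and removes lockfiles for you when you add the trailing colon); the folder name, the lockfile name, and the deliver function are hypothetical, and the shell's noclobber option stands in for procmail's own lockfile handling.

```shell
#!/bin/sh
# Hypothetical folder and lockfile names, for illustration only.
dir=$(mktemp -d)
folder="$dir/alan-mail"
lock="$folder.lock"

# deliver TEXT: append TEXT to the folder while holding the lock.
deliver() {
  # With noclobber (set -C), ">" fails if the lockfile already exists,
  # so only one writer at a time gets past this loop.
  until ( set -C; : > "$lock" ) 2>/dev/null; do
    sleep 1
  done
  printf '%s\n' "$1" >> "$folder"
  rm -f "$lock"
}

# Two "deliveries" arrive at nearly the same time.
deliver "message one" &
deliver "message two" &
wait

cat "$folder"
```

Whichever writer creates the lockfile first appends its message; the other waits until the lockfile is gone. The waiting is the slowdown mentioned above.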

Catching Forwarded Messages

The article shows a recipe that forwards all of a user's email to the address a@b.c (except the mail that a@b.c originally sent to that user). Let's extend the example. We'll call that user Alan.

When the user a@b.c receives those forwarded messages from alan@bates.xyz, she wants to store them into a particular mail folder called alan-mail.

Remember that the forwarding recipe doesn't add a@b.c to the message header. (The delivery address is in the message envelope!) How can a@b.c "catch" Alan's messages and route them to a special folder on her account? (Testing for messages sent To: alan@bates.xyz isn't enough. Remember, that address may not be in the message header, either!) One easy answer is to add a "flag" field to the header, as you forward the message, that the recipient (here, a@b.c) can test for. To do that, Alan would add a second formail -A option. His forwarding recipe would change from this:

:0 c
* !^X-Loop: alan@bates.xyz
* !^from: .*a@b\.c
| formail -A"X-Loop: alan@bates.xyz" | \
  sendmail -oi -f bounces@bates.xyz a@b.c

to this:

:0 c
* !^X-Loop: alan@bates.xyz
* !^from: .*a@b\.c
| formail -A"X-Loop: alan@bates.xyz" \
    -A"X-forwarded-from: alan-for-a" | \
  sendmail -oi -f bounces@bates.xyz a@b.c

And the recipient (a@b.c) would add a procmail recipe that finds these messages and writes them into her mail folder alan-mail:

:0 :
* ^X-forwarded-from: alan-for-a
alan-mail

Adding specialized fields with names like X-something: is a good technique for automated mail processing. In theory, you should be able to choose any name starting with X-. In practice, though, some of those names have become de facto standards. You might want to search through a bunch of your existing email messages (use a tool like grepmail) to see if the name is already used in other headers.
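
If you don't have grepmail handy, a rough survey is possible with standard tools. The sketch below assumes your saved mail is in mbox format (each message starts with a "From " line); the function name list_x_headers and the sample messages are made up for this example. It counts X- field names in message headers only, so text in a message body isn't mistaken for a header field.

```shell
#!/bin/sh
# list_x_headers FILE: count the X- header field names in an mbox file.
list_x_headers() {
  awk '
    /^From / { in_header = 1; next }   # mbox separator: a header starts
    /^$/     { in_header = 0; next }   # blank line: the header is over
    in_header && /^[Xx]-[^: ]*:/ {
      name = tolower(substr($0, 1, index($0, ":") - 1))
      count[name]++
    }
    END { for (name in count) print count[name], name }
  ' "$1" | sort -k 2
}

# A small sample mbox file (hypothetical messages).
mbox=$(mktemp)
cat > "$mbox" <<'EOF'
From alan@bates.xyz Mon Jun 16 10:00:00 2003
From: alan@bates.xyz
X-Loop: alan@bates.xyz
X-forwarded-from: alan-for-a
Subject: one

X-Fake: this line is in the body, so it is not counted

From zoe@foo.xyz Mon Jun 16 11:00:00 2003
From: zoe@foo.xyz
X-Loop: zoe@foo.xyz
Subject: two

body text
EOF

list_x_headers "$mbox"
```

If the name you've picked shows up in the list, and you didn't put it there, choose a different one.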

Meeting-Minutes Mailer

After each weekly meeting of his project group, jpeek writes the meeting minutes as an HTML file. He emails the file to his mail server, like this:

mail -s update minutes-update@foo.xyz < minutes030615.html

A procmail recipe processes messages sent to minutes-update on the foo.xyz server; it strips off the message header and saves the body in a file named /home/jpeek/lib/minutes.html.

When someone in Jerry's group wants a copy of the minutes, they send an email message addressed to minutes@foo.xyz. The server replies with the latest copy of the minutes.html file.

How does it work? There are a few pieces to this puzzle; let's look at them one by one.

Getting minutes-update and minutes Mail to the jpeek .procmailrc File

First, the mail server for the domain foo.xyz runs the sendmail MTA. It has two aliases that redirect mail for both minutes-update and minutes to the jpeek maildrop, like this:

minutes: jpeek
minutes-update: jpeek

Second, jpeek has two procmail recipes, one for each of those addresses. Since messages for both addresses are sent to his jpeek maildrop, the recipes are in the .procmailrc file in his home directory. The first recipe archives the mail sent to minutes-update, and the second recipe sends auto-replies for minutes. Notice that the recipes don't actually test the recipient's address (from the message envelope); they assume that the address is in the To: or the cc: header field. That's probably a safe assumption in this case, but (as the article explains) it's not always true!

The next two sections look at the two recipes.

The Procmail Recipe for minutes-update

Here's the recipe for minutes-update:

# Email to minutes-update@foo.xyz is aliased (in sendmail) to go
# to the jpeek mailbox.  The following recipe stores the
# message body into the file ~/lib/minutes.html.
# (See later recipe for minutes@foo.xyz; it sends out a copy.)
# Subject has to be "update"; size has to be < 100000 bytes
# (to avoid spam, mail loops, etc... I hope!).
# "c" flag means I always get a copy of update in my local mailbox.
:0 cw:/home/jpeek/lib/lockfile.minutes-update
* ^(to|cc): .*minutes-update@foo\.xyz
* ^subject: update$
* < 100000
| cd /home/jpeek/lib && \
  (mv -f minutes.html minutes.OLD.html ; \
  sed -e '1,/^$/d' > minutes.html )

The second (non-comment) line tests for the address minutes-update@foo.xyz in either the To: or the cc: header field. The third line adds a little security by requiring that the message subject have exactly one word: update. (The $ is a regular expression character that means "end of line.") The fourth line is a "sanity check", making sure that the recipe doesn't match a huge message (over 100,000 characters) sent to minutes-update -- in case there was some sort of bounced-mail loop. The last part is the shell command to execute. How does it work?

  • procmail starts a new instance of the Bourne shell to interpret this part of the recipe. This shell is independent of the shells that run other recipes; these commands don't conflict with or affect any other recipes.
  • The cd command changes the shell's current directory to /home/jpeek/lib. If this fails (because the directory isn't accessible, for instance), the other commands will be skipped. Why? It's the shell operator &&. This operator says "if the preceding command returns a non-zero status (which, to a shell, means the command failed) then abort the rest of this command line."
  • The first line ends with a backslash (\) to tell procmail that the command isn't finished. (You could enter the whole command on a single line, but breaking it with backslashes makes it easier for humans to understand.)
  • The rest of the command is inside subshell operators, ( ). (We could have used the curly-bracket command grouping operators, { }, instead.) This makes the preceding && operator apply to both commands inside the subshell. In other words, these two commands (mv and sed) will be executed only if the cd command succeeded. (If the current directory isn't lib, we don't want this recipe to rename a file or create a new file!)
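  • The mv command renames any existing minutes.html file to minutes.OLD.html. The -f option "forces" the move, overwriting an old minutes.OLD.html without asking. If minutes.html doesn't exist yet, mv fails with an error message, but the semicolon (;) after it lets sed run anyway.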
  • The sed command writes the body of the email message -- which contains the meeting minutes -- into the file minutes.html. There are a few things to note here:
    • As usual, sed reads the text to edit from its standard input. The standard input comes from procmail; it holds the mail message that the recipe is currently processing.
    • How do we know that sed will read the message? That's because it's the first program in this command line that opens and reads its standard input. Neither cd nor mv will read stdin, so we're (probably) safe here. If this trick makes you uncomfortable, you could start the shell command line with a command that collects the message into a temporary file, like this: cat > tempfile
    • The sed editing command is like the one we've seen before: its address range, 1,/^$/, matches every line from the first line of the message through the first empty line. The d command "deletes" (that is, doesn't print) the matched lines: that is, all lines in the header and the blank line after it. So sed outputs just the message body.
    • sed writes the edited text to its standard output. The shell's > operator redirects stdout to the file minutes.html.
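
You can try this command line outside of procmail. The sketch below runs the same cd, mv and sed commands on a small sample message; a scratch directory made by mktemp stands in for /home/jpeek/lib, and the sample message and the "old" minutes are invented for the test.

```shell
#!/bin/sh
# Scratch directory standing in for /home/jpeek/lib (an assumption).
lib=$(mktemp -d)
printf '<p>old minutes</p>\n' > "$lib/minutes.html"

# A tiny sample message: header, blank line, body.
msg='From: jpeek@foo.xyz
To: minutes-update@foo.xyz
Subject: update

<html>
<p>new minutes</p>
</html>'

# The pipe plays the part of procmail, which feeds the message to the
# command on its standard input. Only sed reads it.
printf '%s\n' "$msg" | (
  cd "$lib" && \
  (mv -f minutes.html minutes.OLD.html ; \
   sed -e '1,/^$/d' > minutes.html )
)

printf '%s\n' "--- minutes.html:";     cat "$lib/minutes.html"
printf '%s\n' "--- minutes.OLD.html:"; cat "$lib/minutes.OLD.html"
```

After the run, minutes.html holds only the three HTML lines from the message body, and the previous version has been renamed to minutes.OLD.html.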

The Procmail Recipe for minutes

The previous recipe (for minutes-update) is actually the simpler of the two recipes! ;-) Once you've learned to "read" these recipes and some common shell techniques used in them, though, it's not that tough. Let's look at the second recipe, the one that handles mail to the minutes address.

# Email to minutes@foo.xyz is aliased (in sendmail) to go
# to the jpeek mailbox.  From here, this auto-reply recipe sends
# minutes.html as an HTML message *if* recipient is on the list.
# "c" flag means I always get a copy of request in my local mailbox.
# Lockfile is needed because of the TEMPORARY tee command at the end.
:0 cw:/home/jpeek/lib/lockfile.minutes
* ^(to|cc): .*minutes@foo\.xyz
* !^X-Loop: minutes@foo\.xyz
# Only these addresses, or subject "Jerry says OK", are allowed.
# (The outer parentheses tie every alternative to the start of a field.)
* ^(subject: .*Jerry says OK|(from|reply-to): .*((andy|don|zoe)@foo\.xyz|jpeek@jpeek\.com))
| (formail -r -A"From: minutes@foo.xyz" \
      -A"Reply-to: Jerry Peek <jpeek@foo.xyz>" \
      -A"X-Loop: minutes@foo.xyz" -A"MIME-Version: 1.0" \
      -A"Content-Type: text/html; charset=us-ascii" \
      -A"Content-Transfer-Encoding: 7bit" ; \
   cat /home/jpeek/lib/minutes.html \
  ) | tee -a /home/jpeek/lib/sendmail.input | \
  $SENDMAIL -oi -t

Overall, this recipe checks to be sure the sender is allowed to get a copy of the minutes. Senders who aren't "on the list" of approved recipients can also get the minutes by sending a message with a Subject: field containing the phrase "Jerry says OK". Then formail builds an automatic reply message, we tack on the body and use sendmail to send the message. Let's go through it step-by-step:

  • The recipe starts by specifying a lockfile. We need a lockfile here because there's a tee command at the end of this recipe which appends a copy of the outgoing message to the sendmail.input log file. If we didn't lock this recipe, there's a chance that two people could request the minutes at the same time -- and the recipe could be executed twice, simultaneously, which could make a mess of the sendmail.input log file. So, if you don't use the tee command in your recipe, you can omit the lockfile (:/home/jpeek/lib/lockfile.minutes) too.
  • Notice that we're escaping the dots (.) in the addresses by putting a backslash (\) before them. That's because, in a regular expression, a dot matches any single character. (So, minutes@foo.xyz could conceivably match an address like minutes@foodxyz. It isn't likely, but it's possible!) Escaping the dots means they'll only match literal dots.
  • We're using the loop-stopping X-Loop: header field. (See the article for more info.)
  • The expression starting with ^(subject: is a bit complex. (If you aren't familiar with the "extended" expressions that procmail understands, see a good book about Linux or regular expressions for details. The July, 2002 article Hitting the Motherlode, online at http://www.linux-mag.com/2002-07/regex_01.html, discusses Perl regular expressions, which are similar.) This expression matches in any of the following cases:
    • The Subject: field contains the phrase Jerry says OK. (The regular expression .* matches any number of characters, so that phrase could occur after several spaces, in brackets, and so on.)
    • Either the From: or the Reply-to: header field has one of the addresses andy@foo.xyz, don@foo.xyz, zoe@foo.xyz, or jpeek@jpeek.com.
  • The first part of the shell command is inside subshell operators. We're using a subshell for something different than we did in the first recipe. Here we're using it to collect the output of both the formail and cat processes and pipe them, together, to the following commands: tee saves a copy of our reply and sendmail sends it. Ready for the details? Let's dig in:
      • The handy formail option -r creates an "auto-reply" message. It reads the incoming message from its standard input (which -- as we saw earlier -- procmail pipes to the shell that's running these commands), searches for the sender's address, puts that address into a To: field, and writes the result to its standard output. Formail also looks for a Message-ID: field and writes it to both References: and In-reply-to: fields; this creates the message thread, as explained in the article's sidebar about Message-ID: fields.
      • We're using formail -A options to add a lot of header fields, one by one. Notice the Reply-to: address: if someone who receives the minutes tries to reply to the author, this makes their reply go to the human (jpeek@foo.xyz) who originally wrote them, instead of back to this recipe (which would ignore their message and simply send them another copy of the minutes!). The last three header fields we add are standard MIME. Among other things, they tell MUAs that the message body is in MIME text/html format.
      • When formail has finished, the standard output of the subshell has the complete message header -- followed by a blank line. The Linux cat command adds the message body by writing the minutes.html file to standard output.
      • After the closing subshell operator, we've added a pipe to the Linux tee command with its option -a. This reads all of the subshell output from the pipe, appends all of the message to the file /home/jpeek/lib/sendmail.input, and also writes the message to its standard output. (tee is a "pipe-fitting" command designed for just this use: to save a copy of data as it streams through a pipeline.)
      • Finally (whew!), sendmail reads the completed message from its standard input and sends it. The -t flag tells sendmail to parse the message header and find the recipients' addresses in the To:, cc:, and bcc: header fields.
      • Note that we aren't using the same special "bounce" option (like sendmail -f bounces@foo.xyz) discussed in the article. Why not? That's because bounces won't come back to the minutes address; the default envelope sender address is the user who's running sendmail -- which, in this case, is jpeek. So, if this message bounces, even though its From: address is minutes@foo.xyz, the bounce notification will be sent to jpeek@foo.xyz.
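
The plumbing in this recipe (a subshell that collects two outputs, then tee, then the mailer) can be tested without sending any mail. In the sketch below, fake_formail and fake_sendmail are hypothetical stand-ins written for this example: the first prints a fixed reply header plus the blank line that ends it, and the second saves whatever it's given in a file instead of delivering it.

```shell
#!/bin/sh
work=$(mktemp -d)
printf '<html><p>minutes</p></html>\n' > "$work/minutes.html"

# Stand-in for "formail -r -A...": a reply header, then a blank line.
fake_formail() {
  printf 'To: zoe@foo.xyz\n'
  printf 'From: minutes@foo.xyz\n'
  printf 'X-Loop: minutes@foo.xyz\n'
  printf 'Content-Type: text/html; charset=us-ascii\n'
  printf '\n'
}

# Stand-in for "$SENDMAIL -oi -t": save the message instead of sending it.
fake_sendmail() {
  cat > "$work/sent"
}

# Same shape as the recipe: header and body are collected by the
# subshell, tee appends a copy to the log, and the "mailer" gets the rest.
( fake_formail ; cat "$work/minutes.html" ) \
  | tee -a "$work/sendmail.input" \
  | fake_sendmail

cat "$work/sent"
```

After one run, the sent file and the sendmail.input log are identical: header, blank line, then the HTML body. Run the pipeline again and the log grows (that's tee -a at work), while sent is overwritten.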

This second recipe is complex enough that you might want to put it in a separate shell script file. That could be a good idea if you want to make the script more "bulletproof" by adding more sanity checks.
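
As a starting point for such a script, here's a sketch of one sanity check: the "is this sender allowed?" test, done with sed and grep. The function names is_allowed and check, and the address mallory@example.com, are made up for this example. The pattern follows the recipe's condition, with the alternatives grouped so that each one must start at the beginning of a header field; grep -i imitates procmail's case-insensitive matching. Unlike procmail, this simple version doesn't join continued (folded) header lines.

```shell
#!/bin/sh
# is_allowed: read a message on stdin; succeed if its header passes the test.
is_allowed() {
  sed '/^$/q' |   # pass along the header only: stop at the first blank line
  grep -Eiq '^(subject: .*Jerry says OK|(from|reply-to): .*((andy|don|zoe)@foo\.xyz|jpeek@jpeek\.com))'
}

# check MESSAGE: report the verdict for a message given as a string.
check() {
  if printf '%s\n' "$1" | is_allowed; then
    echo allowed
  else
    echo refused
  fi
}

# An address on the list: prints "allowed"
check 'From: zoe@foo.xyz
Subject: minutes please

Hi!'

# Not on the list, but the subject has the magic phrase: prints "allowed"
check 'From: mallory@example.com
Subject: Re: Jerry says OK

Hi!'

# Listed addresses in To: or in the body do not count: prints "refused"
check 'From: mallory@example.com
To: jpeek@jpeek.com
Subject: hello

From: andy@foo.xyz'
```

A script built around a test like this could do its other checks (message size, an X-Loop: field) the same way before it runs formail and sendmail.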

In Summary (Not Quite...)

There's room for mischief in both of these minutes recipes. For instance, as we saw in the article, almost anyone can set almost any From: or Reply-to: address in their message header... so the address check in this recipe is easy for someone malicious to bypass. These fairly simple recipes aren't great for high-security applications; you might want a dedicated list server or some other setup designed for emailing files. But we hope they give you an idea of what you can do with a combination of procmail, formail, sed and your MTA.

Sending MIME mail, sooner or later

Here's a comment about the interaction between the two recipes.

Note that jpeek uses the simple Linux mail utility to send the HTML file as plain text in the body of a message to minutes-update. That makes extraction simple: the first recipe uses a one-command sed expression to put the message body into the minutes.html file. There are no MIME body-part headers to remove, as there would have been if he used an MUA that put the HTML into a MIME text/html body part.

On the flip side, though, because of the way that jpeek sends the message, the second recipe has to do more work before it can send the minutes as an HTML file: it has to encode the HTML file as a MIME text/html body part. This is one purpose of the example: to show how to make a MIME message with formail.

If you actually make a setup like this, though, consider whether the person who updates the minutes should send them in MIME format instead of plain text. If the minutes are pre-formatted into MIME, the first recipe can simply save the message into a file, with a little formail cleanup to remove header fields like To:, Subject:, and Received:. (If we didn't remove those header fields, people who receive the minutes could figure out the "secret" method that jpeek uses to update them.) The second recipe could simply use formail to generate an auto-reply message header and append the message body; the result is passed to sendmail (or another MTA) and re-sent. The second recipe doesn't need to encode as MIME because the message jpeek originally sent was already in MIME format. Here are those revised recipes:

# Email to minutes-update@foo.xyz is aliased (in sendmail) to go
# to the jpeek mailbox.  This recipe stores the message header
# (without many header fields) and body in the file ~/lib/minutes-email.
# (See the later recipe for minutes@foo.xyz; it sends out a copy.)
# Subject has to be "update"; size has to be < 100000 bytes
# (to avoid spam, mail loops, etc... I hope!).
# "c" flag means I always get a copy of update in my local mailbox.
:0 cw:/home/jpeek/lib/lockfile.minutes-update
* ^(to|cc): .*minutes-update@foo\.xyz
* ^subject: update$
* < 100000
| cd /home/jpeek/lib && \
  (mv -f minutes-email minutes-email-OLD ; \
  formail -bfIReceived -IFrom -IReply-to \
  -ITo -ISubject > minutes-email )

# Email to minutes@foo.xyz is aliased (in sendmail) to go
# to the jpeek mailbox.  From here, this auto-reply recipe resends
# minutes-email *if* recipient is on the list.  The formail -r
# flag generates a blank line after the header; we use sed to
# delete that blank line, then cat to append the original message
# header and body.  Finally, sendmail sends the glued message.
# "c" flag means I always get a copy of request in my local mailbox.
:0 cw
* ^(to|cc): .*minutes@foo\.xyz
* !^X-Loop: minutes@foo\.xyz
# Only these addresses, or subject "Jerry says OK", are allowed:
* ^(subject: .*Jerry says OK|(from|reply-to): .*((andy|don|zoe)@foo\.xyz|jpeek@jpeek\.com))
| (formail -r -A"From: minutes@foo.xyz" \
      -A"Reply-to: Jerry Peek <jpeek@foo.xyz>" \
      -A"X-Loop: minutes@foo.xyz" | \
   sed '/^$/d'; \
   cat /home/jpeek/lib/minutes-email \
  ) | $SENDMAIL -oi -t
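
The "gluing" step in that last recipe is easy to check by hand. In this sketch, reply_header is a hypothetical stand-in for the formail -r command (it prints two header fields and the blank line that formail adds), and a scratch file stands in for /home/jpeek/lib/minutes-email.

```shell
#!/bin/sh
work=$(mktemp -d)

# Stand-in for the stored minutes-email file: the header fields that
# survived the cleanup, a blank line, then the body.
printf '%s\n' \
  'MIME-Version: 1.0' \
  'Content-Type: text/html; charset=us-ascii' \
  '' \
  '<p>minutes</p>' > "$work/minutes-email"

# Stand-in for "formail -r ...": reply header followed by a blank line.
reply_header() {
  printf 'To: zoe@foo.xyz\nFrom: minutes@foo.xyz\n\n'
}

# sed removes the blank line, so the stored header fields join the reply
# header; the blank line inside minutes-email then ends the merged header.
glued=$(reply_header | sed '/^$/d' ; cat "$work/minutes-email")
printf '%s\n' "$glued"
```

The result has a single header (To:, From:, MIME-Version:, Content-Type:), one blank line, and the body. Without the sed step there would be a blank line after From:, and the MIME fields would land in the message body.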
