Linux Magazine | July 2003 | FEATURES

The article introduces a procmail recipe that searches message bodies for an encoded virus. When it finds one of these suspicious messages, it archives the message and emails you a notice.

This webpage goes into detail about a similar but more general recipe: one to catch email messages that contain an executable .exe file for Microsoft Windows -- whether or not the MIME type actually identifies the file correctly. (A favorite technique of virus writers is to embed an executable file but declare it as some other type, such as an audio .wav file. This recipe won't be fooled... at least, not that way.)

First, here are some lines from the header and body of one of these virus messages:

From: Your friend <safawrw3rjj@sfsdfa.com>
To: jpeek@jpeek.com
MIME-Version: 1.0
Content-Type: multipart/mixed;
  boundary="--U395206Hts695282E4YrpGx8Y930bO2xI79"
Subject: Jerry, please check this out!

--U395206Hts695282E4YrpGx8Y930bO2xI79
Content-Type: audio/x-wav; name=Com.bat
Content-Transfer-Encoding: base64
Content-ID: <MsB0jeSy>

TVqQAAMAAAAEAAAA//8AALgAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAuAAAAA4fug4AtAnNIbgBTM0hVGhpcyBwcm9ncmFtIGNhbm5vdCBiZSBydW4gaW4gRE9TIG1v
ZGUuDQ0KJAAAAAAAAAC3Egfb83NpiPNzaYjzc2mIGmxkiPJzaYhSaWNo83NpiAAAAAAAAAAAAAAA
  ....

Here are the .procmailrc lines that catch a message like this one:

EXEFILE=caught_exes.gz
:0B :
* ^TVqQAAMA
| gzip >> $EXEFILE ; \
  (echo "  Messages in $EXEFILE file:"; \
  zcat $EXEFILE | grep "^From ") | \
  mail -s "NOTICE: Windows EXE file caught?" $LOGNAME

The first .procmailrc line isn't actually part of the recipe. It's a variable assignment that procmail evaluates as it reads the file, before it reaches the recipe that follows. The assignment stores the name of the virus archive, a compressed file in GNU gzip format, in the environment variable EXEFILE.

This recipe matches one line of the message: the first line of the base64-encoded attachment. Because that line is in the message body, we've added the procmail B flag; without it, procmail would search only the message header (which is the default).

The recipe needs a lockfile because it's writing to a file (the file caught_exes.gz). Here, a simple colon (:) at the end of the flags line is enough: procmail will use a default lockfile name created from caught_exes.gz (which is the first filename past the redirection character in the shell recipe). If in doubt, give a lockfile pathname after the colon.

The pattern line is a regular expression; ^ means "start of line". Typical Windows .exe files start with the same few bytes (the first two are always the characters MZ); this pattern matches the base64-encoded version of those opening bytes. (You could do a similar thing to catch other encoded file types -- once you know the pattern, that is.) There's some chance that another arbitrary file, somewhere in the universe, could start with these same bytes; that's part of why we archive the message instead of simply discarding it. (You might also actually want to receive a Windows .exe file by email someday.)
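
To see where the pattern comes from, you can encode the opening bytes yourself. This is only a sketch: it assumes a base64 utility (such as the one in GNU coreutils) is installed, the six bytes are the ones that the first line of the sample message above decodes to, and the variable name exe_prefix is made up for illustration.

```shell
# The first six bytes of the sample attachment:
# "MZ", then 0x90 0x00 0x03 0x00 (written here as octal escapes).
exe_prefix=$(printf 'MZ\220\000\003\000' | base64)

# Six bytes encode to exactly eight base64 characters, with no padding.
echo "$exe_prefix"
```

The output should be the same eight characters that follow the ^ in the recipe's condition line.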

The real work here is in the shell command line. These are actually four parts of a single command line, joined by procmail's continuation character \. How does the command line work?

When procmail runs a shell command line, the incoming message is fed to that shell's standard input. The first command that reads its standard input will receive the message. Here, because we don't give gzip a filename to read, it reads its standard input and compresses the message on the fly. We've used the shell's >> operator to append gzip's output to the file where we collect the viruses. The gzip file format has the nice feature that you can append multiple chunks of compressed data to the same file. When you uncompress the file, the pieces come out in the order they were written. This is handy for all sorts of space-efficient email archiving jobs!
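
If you'd like to convince yourself that appending works, here's a small experiment you could run in a scratch directory. It's a sketch, not part of the recipe: the file name archive.gz and the two one-line "messages" are invented for the demonstration, and it uses gzip -dc, which does the same job as zcat.

```shell
# Scratch directory so this experiment doesn't touch a real archive.
demo_dir=$(mktemp -d)
archive="$demo_dir/archive.gz"

# Compress two separate "messages", appending each one to the same file.
printf 'first message\n'  | gzip >> "$archive"
printf 'second message\n' | gzip >> "$archive"

# Decompress the whole file; both pieces come back in the order written.
restored=$(gzip -dc "$archive")
echo "$restored"
```

You should see the first message followed by the second, even though each was compressed separately.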

The first line of the shell command ends with a semicolon (;), the shell command separator, as well as a backslash (\) to tell procmail that the command isn't finished. The next two lines run a subshell; the standard output of every command within the parentheses is collected and piped to a mail command, which sends a notice that a file has been caught. The subshell runs two commands (separated by the semicolon). The first command, echo, simply outputs the text from its command line. The second command is a pipeline: zcat uncompresses the virus archive on the fly and pipes it to grep, which looks for message separator lines (lines starting with From and a space). Finally, mail reads the resulting text from its standard input and sends it to the user as an email message. procmail should automatically set the LOGNAME variable to the current user's login name, but you could also use any email address here.
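
Before trusting a command line like this with real mail, it can help to run its pieces by hand. The sketch below does roughly what the recipe's action does, with some assumptions: the archive lives in a scratch directory, the incoming message is a made-up three-line sample from a hypothetical sample_message function, gzip -dc stands in for zcat, and the final mail command is left off so that nothing is sent.

```shell
# Scratch archive (hypothetical location) instead of the real caught_exes.gz.
EXEFILE="$(mktemp -d)/caught_exes.gz"

# A made-up message standing in for what procmail would feed to the shell.
sample_message() {
  printf 'From lucky1@bork.pe  Thu Jun 26 08:22:33 2003\n'
  printf 'Subject: test\n\n'
  printf 'TVqQAAMAAAAEAAAA\n'
}

# Same steps as the recipe's action, minus the final mail command.
sample_message | gzip >> "$EXEFILE"
notice=$( (echo "  Messages in $EXEFILE file:"; \
  gzip -dc "$EXEFILE" | grep "^From ") )

# This is the text that would have been piped to mail.
echo "$notice"
```

Once the text looks right, you could pipe it to mail by hand to check that the notice arrives.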

Here's a typical email message that mail might send. Please compare this message to the shell command that made it:

From: Jerry Peek <jpeek@jpeek.com>
Date: Thu, 26 Jun 2003 08:45:13 -0400 (EDT)
To: jpeek@jpeek.com
Subject: NOTICE: Windows EXE file caught?

  Messages in caught_exes.gz file:
From lucky1@bork.pe  Thu Jun 26 08:22:33 2003
From mom@home.ci  Thu Jun 26 08:45:09 2003

This is an example of the kind of thing you can do to filter messages automatically. To come up with others, think about characteristics of the incoming messages (a pattern in the header or body) and what you want to do with those messages (file them, forward them, run a series of commands, etc.). Then test, test, test!
