UPDATED 4/14/04 - The crmcheck and
crmlearn scripts were updated to use a new Perl script
crmstrip (below). The crmstrip script strips
locally generated mail headers from messages to be learned so that those
messages better resemble what they looked like when classified (which
happens prior to the insertion of local headers).
Sounds like there aren't too many people running CRM114 from MIMEDefang just yet, so I thought I'd whip up a quick howto, and I do mean quick. For instance, it's assumed you already have MIMEDefang installed.
Getting CRM114 itself is pretty easy and well documented, so I'm not going to waste time with it here. Download it from the CRM114 site, unpack and read the howto.
One thing that's not well covered in the CRM114 docs is installation.
Sure, you do the standard make install thing, but that's
just the binaries. What I did was create a /etc/mail/crm114
directory. Then I went about creating the .css files, copying over
stuff from the source tree that looked important, editing config files,
twiddling ownership, permissions and whatnot.
So now that most of the dust has settled and it's been running for
a while, /etc/mail/crm114 looks like this:
-rw-rw-rw- 1 defang defang  5925848 Mar  4 22:43 allmail.txt
-rw-r--r-- 1 root   root          0 Mar  1 21:35 blacklist.mfp
-rw-r--r-- 1 root   root       4647 Mar  1 21:35 blacklist.mfp.orig
-rwxr-xr-x 1 root   root       1543 Feb 28 21:24 classifymail.crm
-rwxr-xr-x 1 root   root       1214 Feb 28 21:24 mailexpand.crm
-rw-r--r-- 1 root   root       5596 Feb 28 20:12 mailfilter.cf
-rwxr-xr-x 1 root   root      27675 Feb 28 21:23 mailfilter.crm
-rwxr-xr-x 1 root   root        257 Feb 29 13:55 mdcrm
-rw-rw-rw- 1 defang defang 12582924 Feb 28 15:22 nonspam.css
-rw-rw-rw- 1 defang defang   137653 Mar  4 22:17 nonspamtext.txt
-rwxr-xr-x 1 root   root       4884 Feb 28 21:24 pad.crm
-rw-r--r-- 1 root   root          0 Mar  4 00:01 priolist.mfp
-rw-r--r-- 1 root   root         49 Mar  4 00:01 priolist.mfp.orig
-rw-rw-rw- 1 defang defang     4515 Feb 29 22:51 rejected_by_blacklist.txt
-rw-rw-rw- 1 defang defang  1908106 Mar  4 22:17 rejected_by_css.txt
-rw-r--r-- 1 root   root        163 Feb 28 21:30 rewrites.mfp
-rwxr-xr-x 1 root   root       1561 Feb 28 21:24 rewriteutil.crm
-rw-r--r-- 1 root   root        267 Feb 28 21:24 scrub_mailfile_rewrites.mfp
-rwxr-xr-x 1 root   root        310 Feb 28 21:24 shroud.crm
-rw-rw-rw- 1 defang defang 12582924 Feb 28 15:22 spam.css
-rw-rw-rw- 1 defang defang    95993 Mar  4 21:58 spamtext.txt
-rw-r--r-- 1 root   root         48 Mar  2 00:04 whitelist.mfp
-rw-r--r-- 1 root   root         67 Mar  1 21:36 whitelist.mfp.orig
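For anyone retracing the copy step, here is a rough sketch of it as a shell function. The name crm_install_files and its two arguments are made up for illustration, and the file list is only inferred from the directory listing, so your source tree may want a different set. Creating the .css files is deliberately left out; follow the CRM114 howto for that.

```shell
#!/bin/sh
# Sketch only: crm_install_files is a hypothetical helper, not part of CRM114.
# Usage: crm_install_files /path/to/crm114-source /etc/mail/crm114
crm_install_files() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # File names taken from the directory listing; adjust to taste.
    for name in classifymail.crm mailexpand.crm mailfilter.cf mailfilter.crm \
                pad.crm rewriteutil.crm shroud.crm blacklist.mfp priolist.mfp \
                whitelist.mfp rewrites.mfp scrub_mailfile_rewrites.mfp; do
        # Skip anything this particular source tree does not ship.
        [ -f "$src/$name" ] || continue
        # -p keeps the execute bits on the .crm scripts.
        cp -p "$src/$name" "$dest/" || return 1
    done
    # The .css files still have to be created as the CRM114 howto describes.
}
```

Only files named in the list are copied, so stray test scripts in the source tree stay where they are.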
Obviously leaving files at 666 is going to be very non-smart on some boxes, but I'm the only user on mine. If you don't want to leave them wide open on your system, just keep in mind that defang needs write access to some files and whatever user account does the learning also needs write access to at least the .css files.
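If you do want to tighten things up, one possible approach is a shared group for defang and whoever does the learning. A minimal sketch, assuming such a group already exists and both accounts are members of it; the function name crm_tighten_perms and the group name crmlearn in the usage line are hypothetical. It touches only the .css and .txt files, which are the ones shown as world-writable in the listing.

```shell
#!/bin/sh
# Sketch only: crm_tighten_perms and the "crmlearn" group are hypothetical.
# Usage (as root): crm_tighten_perms /etc/mail/crm114 crmlearn
crm_tighten_perms() {
    dir=$1
    group=$2
    # These are the files written at classify or learn time in my listing.
    for f in "$dir"/*.css "$dir"/*.txt; do
        # Skip patterns that matched nothing.
        [ -f "$f" ] || continue
        chgrp "$group" "$f" || return 1
        # Owner and group may read and write; everyone else gets nothing.
        chmod 660 "$f" || return 1
    done
}
```

Ownership is left alone, so files owned by defang stay owned by defang.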
While you could run CRM114 directly from MIMEDefang without a wrapper, I chose to use one. Call me silly, but to me this is "better."
/etc/mail/crm114/mdcrm:
#!/bin/sh
LOG=/tmp/mdcrm.log
echo "======================================" >> $LOG
echo "Start: `date`" >> $LOG
pwd >> $LOG
/etc/mail/crm114/mailfilter.crm --fileprefix=/etc/mail/crm114/ \
    --stats_only < ./INPUTMSG 2>> $LOG
echo "End: `date`" >> $LOG
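Since the wrapper reads ./INPUTMSG from whatever directory it is started in, it can be tried by hand before MIMEDefang ever calls it. A minimal sketch, assuming you have a saved message in a file and are running as a user allowed to write wherever the wrapper and mailfilter.crm write; crm_try_wrapper is a made-up name, and the optional second argument exists only so a different wrapper path can be substituted.

```shell
#!/bin/sh
# Sketch only: crm_try_wrapper is a hypothetical helper for manual testing.
# Usage: crm_try_wrapper saved-message.txt [/etc/mail/crm114/mdcrm]
crm_try_wrapper() {
    msg=$1
    wrapper=${2:-/etc/mail/crm114/mdcrm}
    workdir=$(mktemp -d) || return 1
    # The wrapper expects ./INPUTMSG, so recreate that layout in a scratch dir.
    cp "$msg" "$workdir/INPUTMSG" || { rm -rf "$workdir"; return 1; }
    # Run in a subshell so the caller's working directory is untouched.
    (cd "$workdir" && "$wrapper")
    status=$?
    rm -rf "$workdir"
    return $status
}
```

Whatever the wrapper prints on stdout is what the filter code will see as the first result line.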
I added the following code to MD's filter_begin() function:
# Testing CRM114 here...
if ((-s "./INPUTMSG") <= (500 * 1024)) {    # 500kB limit
    if (open(CRM, "/etc/mail/crm114/mdcrm |")) {
        my @result = <CRM>;
        close(CRM)
            or md_graphdefang_log("Failed closing mailfilter.crm");
        # Only tag when the wrapper actually returned a score; otherwise a
        # failed run would be tagged as HAM with an empty pR.
        if (@result and $result[0] =~ /^\s*-?\d/) {
            chomp $result[0];
            if ($result[0] < 0) {
                action_add_header("X-CRM114-Status", "SPAM ( pR: $result[0] )");
            } else {
                action_add_header("X-CRM114-Status", "HAM ( pR: $result[0] )");
            }
        }
    } else {
        md_graphdefang_log("Failed opening mailfilter.crm");
    }
}
Note that all this code does is tag. That's fine if you are relying on procmail to direct your spam/ham to their respective mailboxen. And since you will need to babysit the filter for some time (indefinitely?), that's probably the best approach. I may at some point add code to refuse mail that gets particularly low crm scores, stay tuned...
The crmcheck and crmlearn scripts below are designed to save a little typing. Both require an email on stdin. Modify as you see fit.
/usr/local/bin/crmcheck:
#!/bin/sh
/usr/local/bin/crmstrip | /etc/mail/crm114/mailfilter.crm --fileprefix=/etc/mail/crm114/ | grep CRM
exit 0
/usr/local/bin/crmlearn:
#!/bin/sh
case "$1" in
    -s)
        /usr/local/bin/crmstrip | /etc/mail/crm114/mailfilter.crm --fileprefix=/etc/mail/crm114/ --learnspam | grep CRM
        ;;
    -h|-n)
        /usr/local/bin/crmstrip | /etc/mail/crm114/mailfilter.crm --fileprefix=/etc/mail/crm114/ --learnnonspam | grep CRM
        ;;
    *)
        cat<<EOF
Try -s for spam and -h or -n for ham.
EOF
        ;;
esac
exit 0
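Because crmlearn takes exactly one message on stdin, training on a pile of misclassified mail means calling it once per message. A minimal sketch of such a loop, assuming the messages are stored one per file (maildir or MH style, not a single mbox); crm_learn_dir is a made-up name and the folder in the usage line is only an example.

```shell
#!/bin/sh
# Sketch only: crm_learn_dir is a hypothetical helper built on crmlearn.
# Usage: crm_learn_dir -s /path/to/missed-spam-folder
crm_learn_dir() {
    flag=$1
    dir=$2
    learner=${3:-/usr/local/bin/crmlearn}
    for msg in "$dir"/*; do
        # Skip subdirectories and patterns that matched nothing.
        [ -f "$msg" ] || continue
        # One message per invocation, fed on stdin.
        "$learner" "$flag" < "$msg"
    done
}
```

For a single message, something like crmlearn -s < message-file (or piping the message from your mail reader) does the same job.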
This script supports the two above...
/usr/local/bin/crmstrip:
#!/usr/bin/perl
# Here's the story...
# CRM114 was giving me grief, saying it didn't need to learn messages that it
# may have misclassified only moments ago. The only explanation I can come
# up with is that the messages being classified via milter have fewer
# headers than when they actually arrive in my mailbox, so basically CRM114
# is learning from a slightly different message than what was misclassified.
# My solution is to use Perl to strip out the added headers and return the
# message to the form it was in when originally classified.
# Using a handy example, we need to go from this:
# From r.ratliffuo@modsim.co.kr Wed Apr 14 20:39:21 2004
# Return-Path: <r.ratliffuo@modsim.co.kr>
# Received: from embassi.de ([218.146.9.3])
# by calvin.boinklabs.com (8.12.8/8.12.8) with SMTP id i3F0dCnI021372;
# Wed, 14 Apr 2004 20:39:16 -0400
# Message-ID: <5d2b01c42282$a3d602b0$8bb757f1@embassi.de>
# From: "Reggie Ratliff" <r.ratliffuo@modsim.co.kr>
# To: cwilkins@boinklabs.com, cwilkins-web@boinklabs.com
# Subject: INC^R.EASE YOUR D'I^C,K WEIGHT ^ gjanjbzw
# Date: Thu, 15 Apr 2004 00:43:09 +0000
# MIME-Version: 1.0
# Content-Type: text/html;
# charset="us-ascii"
# Content-Transfer-Encoding: 8bit
# X-CRM114-Status: SPAM ( pR: -41.5376 )
# X-Scanned-By: MIMEDefang 2.37
# To this:
# Message-ID: <5d2b01c42282$a3d602b0$8bb757f1@embassi.de>
# From: "Reggie Ratliff" <r.ratliffuo@modsim.co.kr>
# To: cwilkins@boinklabs.com, cwilkins-web@boinklabs.com
# Subject: INC^R.EASE YOUR D'I^C,K WEIGHT ^ gjanjbzw
# Date: Thu, 15 Apr 2004 00:43:09 +0000
# MIME-Version: 1.0
# Content-Type: text/html;
# charset="us-ascii"
# Content-Transfer-Encoding: 8bit
# Which means we need to lose:
# From
# Return-Path:
# X-CRM114-Status:
# X-Scanned-By:
# And of course we have a few special cases:
# Only clobber the first (local) Received: header
# Clobber locally generated Message-ID: headers
# Note that the X-CRM114-Status and X-Scanned-By headers were custom for my
# setup. Yours may not have them and/or may feature other custom generated
# headers that need to be stripped. Adjust the code below as needed.
# Also, if you are wondering where to find the two versions of the headers
# compared above:
#
# The version without the local headers (the message as it was classified)
# comes from (in my case) /etc/mail/crm114/rejected_by_css.txt (If you
# don't have that file somewhere, you need to enable it in the crm114
# config file.)
#
# It should be pretty obvious that the version with the local headers
# comes from your inbox, or incoming spam folder.
# Now, getting down to business...
# Set this to match the FQDN of locally generated message IDs. In other
# words, the FQDN of your inbound mail server.
my $localsrv = 'calvin.boinklabs.com';
# loop through the message line by line
my $mode = 'head';
my $gotrcvd = 0;
while( <STDIN> ){
    # just print to stdout and loop if we're no longer in header mode
    if( ($mode ne 'body' and /^$/) or $mode eq 'body' ){
        $mode = 'body';
        print;
        next;
    }
    # If we are here, we've got headers, or a header continuation to deal
    # with.
    # eat continuations (lines starting with a space or a tab) for headers
    # we wish to suppress
    if( $mode eq 'eat' and /^[ \t]/ ){
        next;
    }
    # Here's where we look for headers to clobber
    # (\Q...\E keeps the dots in $localsrv from acting as wildcards)
    if( /^From\s/i
        or /^Return-Path:\s/i
        or /^Message-ID:\s+.*\Q$localsrv\E/i
        or /^X-CRM114-Status:\s/i
        or /^X-Scanned-By:\s/i
        or /^Status:\s/i          # These 3 are added by Mutt
        or /^Content-Length:\s/i
        or /^Lines:\s/i ){
        $mode = 'eat';
        next;
    }
    # Special case - just the topmost (local) received header gets stripped
    # adjust the literal 1 if you need to strip more than one.
    if( /^Received:\s/ and $gotrcvd < 1 ){
        $gotrcvd++;
        $mode = 'eat';
        next;
    }
    # If we got this far, we should print. So we will!
    print;
    # lastly, turn off 'eat' mode so we don't gobble up needed header lines
    $mode = 'head';
}
exit(0);
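After adjusting the header list for your own setup, it helps to see exactly which lines crmstrip removes from a real message. A minimal sketch using diff, assuming a saved message in a file; crm_show_stripped is a made-up name, and the optional second argument is there only so a different path to the stripping script can be substituted.

```shell
#!/bin/sh
# Sketch only: crm_show_stripped is a hypothetical helper for eyeballing crmstrip.
# Usage: crm_show_stripped saved-message.txt [/usr/local/bin/crmstrip]
crm_show_stripped() {
    msg=$1
    stripper=${2:-/usr/local/bin/crmstrip}
    tmpfile=$(mktemp) || return 1
    "$stripper" < "$msg" > "$tmpfile" || { rm -f "$tmpfile"; return 1; }
    # Lines prefixed with "<" exist only in the original, i.e. were stripped.
    diff "$msg" "$tmpfile" | grep '^<'
    rm -f "$tmpfile"
}
```

If a header you expected to lose is missing from the output, or a header you wanted to keep shows up in it, the pattern list in crmstrip needs another look.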
Well that's it for now. Happy spam stomping!
Please direct inquiries to: cwilkins@dtserv.com
All content ©2004 Dauntless Technical Services (except the stuff that isn't mine).