Jan 11 2005

Spam Assassin and Movable Type

Update 2005/01/15: Please consider combining the plugin below with mt-proxyplug for best results.

A few days ago I saw a post on ioerror.us that describes how to hook WordPress’s comment checking system up to Spam Assassin. I run MovableType, so a WordPress solution does not work for me as-is; the code needed a few changes before it was usable on my system.

After enabling it last night and disabling mt-blacklist, I’m happy to report that it has caught every single comment spam attempt (a total of 32 attempts were registered). Spam indications appear in my server’s error_log like this:

[Tue Jan 11 08:33:34 2005] spam from diet pills/jane_doe7082@work.com/148.244.150.58: score 10.7 (limit 5.0)

A message like this means that the ‘CommentFilter’ callback implemented in mt-spamassassin.pl was told by the Spam Assassin daemon (spamd) that the current comment scored above the Spam Assassin threshold: here 10.7 against a limit of 5.0.
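To make the score and limit in that log line less mysterious, here is a minimal sketch of how a spamd reply to a CHECK request can be picked apart. The helper name parse_spamd_reply is hypothetical and not part of the plugin, and the sample reply is made up to match the log line above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mt-spamassassin.pl): pull the flag,
# score and limit out of a spamd CHECK reply. Returns an empty list if
# the text does not look like a spamd reply.
sub parse_spamd_reply {
    my ($reply) = @_;
    return unless defined $reply && $reply =~ /^\s*SPAMD\/[\d.]+/i;
    return unless $reply =~ /^Spam:\s*(\S+)\s*;\s*(-?[\d.]+)\s*\/\s*(-?[\d.]+)/im;
    return (lc $1, $2 + 0, $3 + 0);
}

# Made-up sample reply, chosen to match the log line above.
my $sample = "SPAMD/1.1 0 EX_OK\r\nSpam: True ; 10.7 / 5.0\r\n\r\n";
my ($flag, $score, $limit) = parse_spamd_reply($sample);
printf "flag=%s score=%.1f limit=%.1f\n", $flag, $score, $limit;
# prints: flag=true score=10.7 limit=5.0
```

The score pattern allows a leading minus sign because Spam Assassin scores can be negative for comments that look especially harmless.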

To use the mt-spamassassin.pl plugin you need Spam Assassin’s spamd running on your own network, or access to spamd running on a remote system. Enter the name of the host that runs spamd in $sa_spamd_host (use ‘localhost’ if it runs on the same host as MovableType) and the port where spamd can be reached in $sa_spamd_port. Because I did not find a way to retrieve the blog owner’s email address from within a MovableType plugin, please also enter your email address in $mt_owner. For SpamAssassin’s user_prefs to work, you should also set your real (unix) userid in $mt_userid. Drop the modified file into your blog’s plugins folder and it should be ready to go.
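Before dropping the plugin in, it can help to confirm that spamd is actually reachable at the host and port you plan to enter. The following is a minimal sketch, assuming spamd accepts the PING command of its SPAMC protocol; the helper name spamd_ping and the 5 second timeout are my own choices, and localhost:783 simply mirrors the plugin defaults.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper (not part of mt-spamassassin.pl): returns 1 if
# spamd answers a PING with PONG, 0 if it cannot be reached or replies
# with something unexpected.
sub spamd_ping {
    my ($host, $port) = @_;
    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 5,          # assumption: 5 seconds is long enough
    ) or return 0;
    print {$socket} "PING SPAMC/1.2\r\n";
    $socket->shutdown(1);       # done writing
    my $reply = <$socket>;      # first line of the reply, if any
    close $socket;
    return (defined $reply && $reply =~ /^SPAMD\/[\d.]+\s+0\s+PONG/i) ? 1 : 0;
}

# Same values you would enter in $sa_spamd_host and $sa_spamd_port.
print spamd_ping('localhost', 783)
    ? "spamd is reachable\n"
    : "spamd is not reachable\n";
```

If this reports that spamd is not reachable, the plugin will quietly accept every comment, since it treats a failed connection as "no spam check".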

Thanks to http://www.ioerror.us/ for the cool idea!

You can download the compressed version here: mt-spamassassin.pl.gz (1.5 KB, gzip)

Update 01/14/2005: I’ve since added another plugin called mt-commentproxyblock, which has detected every single spam submission on 01/13/2005 before it was passed through mt-spamassassin. It seems that the majority of spammers do use public proxies and those are easy to detect.

Update 01/20/2005: I just posted a new version of the plugin with a few enhancements. First, if you have both mt-spamassassin and mt-proxyplug on your system, a comment is short-circuited if mt-proxyplug has already determined that it comes from an open proxy. Specifically, mt-spamassassin looks at the visible flag of the comment and skips comments that are not visible. This cuts down on processing time for spam comments.
Second, Justin was nice enough to correct the fake message header I’ve been sending to spamd to make it more RFC 2822 compliant. Thanks!
Third, you can now specify a $mt_moderate threshold value. If a comment submission scores below the spam threshold (defined in Spam Assassin) but above the $mt_moderate value, it is held for moderation instead of being allowed all the way through to the blog.
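The resulting three-way decision can be sketched on its own, separate from the socket handling. This is only an illustration: classify_comment is a hypothetical helper name, and the sample scores are made up; 1.5 is the default $mt_moderate value from the plugin.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mt-spamassassin.pl) mirroring the
# rule described above:
#   spamd flagged it as spam            -> 'deny'
#   not spam, but score > $moderate     -> 'moderate'
#   anything else                       -> 'accept'
sub classify_comment {
    my ($is_spam, $score, $moderate) = @_;
    return 'deny'     if $is_spam;
    return 'moderate' if $score > $moderate;
    return 'accept';
}

# Made-up sample scores, checked against a moderate threshold of 1.5.
for my $case ([1, 10.7], [0, 3.2], [0, 0.4]) {
    my ($is_spam, $score) = @$case;
    printf "score %4.1f -> %s\n", $score,
        classify_comment($is_spam, $score, 1.5);
}
```

With these inputs the three comments come out as deny, moderate and accept respectively. A score exactly equal to the threshold is accepted, because the comparison is strictly greater-than, as in the plugin.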

#!/usr/bin/perl -w
package MT::Plugin::SpamAssassin;
use strict;
use lib '../lib';
use vars qw ($VERSION);
$VERSION='0.4';
# (CHANGE ME) what host is running spamd?
my $sa_spamd_host = q{localhost};
# (CHANGE ME) what port is spamd listening on?
my $sa_spamd_port = 783;
# (CHANGE ME) who is the owner of the blog?
my $mt_owner      = q{me@localhost.com};
# (CHANGE ME) what is the userid for SpamAssassin?
my $mt_userid     = q{me};
# (CHANGE ME) what is the moderate threshold?
my $mt_moderate   = 1.5;
use constant ACCEPT_RESPONSE => 1;
use constant DENY_RESPONSE   => 0;
use MT;
use MT::App::Comments;
use IO::Socket;
use Time::Local qw(timegm);
use POSIX;
eval{ require MT::Plugin };
unless ($@) {
    my $plugin = {
        name => qq{Spamassassin for Movable Type v$VERSION},
        description => qq{Spamassassin for Movable Type},
    };
    MT->add_plugin(new MT::Plugin($plugin));
    # tell MT that we want to be called to filter comments
    MT->add_callback('CommentFilter', 10, $plugin, \&sa_filter);
}
# sa_filter
#
# 'CommentFilter' that is called for each attempt to post a comment
# on your blog. We'll pass the incoming comment to spamd running on
# $sa_spamd_host:$sa_spamd_port. If spamd responds with an indication
# that the comment was spam, then we'll respond with DENY_RESPONSE.
# If spamd says it's no spam or we can't get a good connection to
# spamd, we'll respond with ACCEPT_RESPONSE
sub sa_filter {
    my($eh,$app,$comment)=@_;
    # skip comments that another plugin (e.g. mt-proxyplug) has already hidden
    unless($comment->visible()) {
        return ACCEPT_RESPONSE;
    }
    #print STDERR "[".scalar(localtime())."] mt-spamassassin: " .
    #  join("/",$comment->author,$comment->email,$comment->url,$comment->ip) . "\n";
    my $now=rfc822_date();
    my $hostname=gethostbyaddr(inet_aton($comment->ip), AF_INET);
    my $message="From " . $comment->email . " " . $now . "\n" .
      "Received: from client ([" . $comment->ip . "] ".
      ($hostname?$hostname:$comment->ip) . ")" .
      " by " . $ENV{HTTP_HOST} . " via MovableType; " . $now . "\n" .
      "Message-id: <". sprintf("%x\$%x",time,rand(65535)) .
      "\@" . ($hostname?$hostname:sprintf("[%s]",$comment->ip)) . ">\n" .
      "From: " . $comment->author .
      " <" . $comment->email . ">\nDate: " . $now . "\n" .
      "Subject: MovableType comment\n" .
      "To: $mt_owner\n\n" .
      $comment->url . "\n".
      $comment->text;
    # make sure all lines end in "\r\n";
    $message =~ s/\r\n/\n/gs;
    $message =~ s/\r/\n/gs;
    $message =~ s/\n/\r\n/gs;
    # now send it off to Spamassassin
    my $socket=IO::Socket::INET->new(PeerAddr => $sa_spamd_host,
                                     PeerPort => $sa_spamd_port,
                                     Proto    => "tcp",
                                     Type     => SOCK_STREAM);
    # no socket - no spam check
    return ACCEPT_RESPONSE unless($socket);
    # create the CHECK message for spamd
    $message = "CHECK SPAMC/1.2\r\n" .
      "User: $mt_userid\r\n" .
      "Content-Length: ".length($message).
      "\r\n\r\n".
      $message;
    # print STDERR "[".scalar(localtime())."] sending to spamd:\n$message\n";
    # send it to spamd
    my $toSend=$message;
    while(length($toSend)) {
        my $written = $socket->send($toSend);
        unless($written) {
            # send failed (undef) or wrote nothing (0): give up and accept
            return ACCEPT_RESPONSE;
        }
        $toSend=substr($toSend,$written);
    }
    # close writing end of socket
    $socket->shutdown(1);
    # suck in response from SpamAssassin
    my $response = '';
    while(1) {
        my $buffer;
        unless(defined($socket->recv($buffer, 1024))) {
            return ACCEPT_RESPONSE;
        }
        last unless(length($buffer));
        $response .= $buffer;
    }
    # trim  whitespace off the beginning of the response
    $response =~ s/^\s*//;
    # check if it is really a SpamAssassin response
    return ACCEPT_RESPONSE unless ($response =~ /^spamd\/[\d\.]+/i);
    # now find "Spam: True|False ; score / limit" header (scores may be negative)
    return ACCEPT_RESPONSE
      unless ($response =~ /spam:\s*(\S+)\s*;\s*(-?[\d\.]+)\s*\/\s*(-?[\d\.]+)/is);
    my($flag,$score,$limit)=($1,$2,$3);
    #if($flag =~ /false/i) {
        #print STDERR "[".scalar(localtime())."] no spam:\n$message\n";
    #}
    print STDERR "[".scalar(localtime())."] spam $flag from " .
      join("/",$comment->author,$comment->email,$comment->ip) .
      ": score $score (limit $limit)\n";
    if($flag =~ /false/i) {
        if($score > $mt_moderate) {
            print STDERR "[".scalar(localtime())."] moderating comment\n";
            $comment->visible(0);
        }
        return ACCEPT_RESPONSE;
    }
    # spamd flagged the comment as spam (already logged above): reject it
    return DENY_RESPONSE;
}
# rfc822_date
#
# generate the current local time with its numeric UTC offset, in rfc822 format
sub rfc822_date {
    # offset in hours (from Mail::Sendmail)
    my $offset  = sprintf "%.1f", (timegm(localtime) - time) / 3600;
    my $minutes = sprintf "%02d", abs( $offset - int($offset) ) * 60;
    my $TZ  = sprintf("%+03d", int($offset)) . $minutes;
    return POSIX::strftime("%a, %d %b %Y %T $TZ",localtime(time()));
}
1;
