SpamAssassin and Movable Type
Update 2005/01/15: Please consider combining the plugin below with mt-proxyplug for best results.
A few days ago I saw a post on ioerror.us describing how to connect WordPress's comment checking to SpamAssassin. I run Movable Type, so a WordPress solution doesn't work for me as-is; the code needed some changes before it was usable on my system.
After enabling it last night and disabling mt-blacklist, I'm happy to report that it has caught every comment spam attempt so far (32 in total). Spam hits are logged to my server's error_log like this:
[Tue Jan 11 08:33:34 2005] spam from diet pills/jane_doe7082@work.com/ 148.244.150.58: score 10.7 (limit 5.0)
A line like this means that the CommentFilter callback implemented in mt-spamassassin.pl was told by the SpamAssassin daemon (spamd) that the comment's score is above the SpamAssassin spam threshold.
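spamd answers a CHECK request with a status line (something like SPAMD/1.1 0 EX_OK) followed by a header of the form Spam: True ; 10.7 / 5.0, i.e. flag, score and limit. The standalone sketch below shows how that header can be picked apart. parse_spamd_response is a hypothetical helper name, not part of the plugin; the regex is modelled on the plugin's, with an optional minus sign added because SpamAssassin scores can be negative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract (is_spam, score, limit) from a spamd
# CHECK response. Returns an empty list if the text does not look like
# a spamd reply or carries no "Spam:" header.
sub parse_spamd_response {
    my ($response) = @_;
    $response =~ s/^\s+//;                        # drop leading whitespace
    return () unless $response =~ m{^SPAMD/[\d.]+}i;
    # "Spam: True ; 10.7 / 5.0" -> flag, score, limit
    return () unless $response =~
        m{^Spam:\s*(\S+?)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)}mi;
    my ($flag, $score, $limit) = ($1, $2, $3);
    return (lc($flag) eq 'true' ? 1 : 0, $score, $limit);
}

my $reply = "SPAMD/1.1 0 EX_OK\r\nSpam: True ; 10.7 / 5.0\r\n\r\n";
my ($is_spam, $score, $limit) = parse_spamd_response($reply);
print "spam=$is_spam score=$score limit=$limit\n";   # spam=1 score=10.7 limit=5.0
```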
To use the mt-spamassassin.pl plugin, you need SpamAssassin's spamd running either on your own network or on a remote system you can reach. Enter the name of the host that runs spamd in $sa_spamd_host (use 'localhost' if it runs on the same host as Movable Type) and the port spamd listens on in $sa_spamd_port. Because I did not find a way to retrieve the blog owner's email address from within a Movable Type plugin, please also enter your email address in $mt_owner. For SpamAssassin's user_prefs to work, you should also set your real (Unix) user ID in $mt_userid. Drop the modified file into your blog's plugins folder and it should be ready to go.
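Before dropping the plugin in, it can help to confirm that the values in $sa_spamd_host and $sa_spamd_port actually point at something that is listening. The sketch below assumes a plain TCP connection is a good enough first check: it does not speak the spamd protocol, so it only proves that some listener is there. spamd_reachable is a hypothetical helper name, not part of the plugin.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: returns 1 if something accepts TCP connections on
# $host:$port within $timeout seconds (default 5), 0 otherwise.
# A successful connect does not prove the listener is spamd.
sub spamd_reachable {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => defined $timeout ? $timeout : 5,
    );
    return 0 unless $sock;
    close($sock);
    return 1;
}

# Use the same values you entered in mt-spamassassin.pl
my ($sa_spamd_host, $sa_spamd_port) = ('localhost', 783);
if (spamd_reachable($sa_spamd_host, $sa_spamd_port)) {
    print "$sa_spamd_host:$sa_spamd_port is accepting connections\n";
} else {
    print "cannot connect to $sa_spamd_host:$sa_spamd_port\n";
}
```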
Thanks to http://www.ioerror.us/ for the cool idea!
You can download the compressed version here: mt-spamassassin.pl.gz (1.5 KB, gzip)
Update 01/14/2005: I've since added another plugin, mt-commentproxyblock, which detected every single spam submission on 01/13/2005 before it reached mt-spamassassin. It seems that most spammers use public proxies, and those are easy to detect.
Update 01/20/2005: I just posted a new version of the plugin with a few enhancements. First, if you have both mt-spamassassin and mt-proxyplug on your system, mt-spamassassin will skip a comment that mt-proxyplug has already identified as coming from an open proxy. Specifically, mt-spamassassin checks the comment's visible flag and does not process comments that are not visible. This cuts down on processing time for spam comments.
Second, Justin was nice enough to fix the fake message headers I've been sending to spamd so that they are more RFC 2822 compliant. Thanks!
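The plugin wraps each comment in a fake mail message and hands it to spamd as a CHECK request. The sketch below isolates just that framing step: line endings are normalized to CRLF as the plugin does, and Content-Length is counted in bytes. It assumes the comment text is a Perl character string that should be sent as UTF-8; build_check_request is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: wrap an RFC 2822-style message in a spamd CHECK
# request. Assumes $message is a character string; it is encoded to
# UTF-8 so that Content-Length is a byte count, not a character count.
sub build_check_request {
    my ($message, $user) = @_;
    $message =~ s/\r\n/\n/g;     # CRLF -> LF
    $message =~ s/\r/\n/g;       # lone CR -> LF
    $message =~ s/\n/\r\n/g;     # every LF -> CRLF
    my $bytes = encode('UTF-8', $message);
    return "CHECK SPAMC/1.2\r\n"
         . "User: $user\r\n"
         . "Content-Length: " . length($bytes) . "\r\n\r\n"
         . $bytes;
}

my $msg = "From: Jane <jane\@example.com>\nSubject: MovableType comment\n\nBuy now!";
print build_check_request($msg, 'me');
```

Counting bytes matters because spamd reads exactly Content-Length bytes of body; a character count would come up short as soon as a comment contains non-ASCII text.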
Third, you can now specify a $mt_moderate threshold value. If a comment's score is below the spam threshold (defined in SpamAssassin) but above $mt_moderate, the comment will be held for moderation instead of being published straight to the blog.
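Put together, the 01/20 changes give a short decision order. The sketch below assumes the flag and score have already been parsed from spamd's reply; classify_comment and its 'skip'/'deny'/'moderate'/'accept' labels are hypothetical. The plugin itself returns ACCEPT_RESPONSE or DENY_RESPONSE and holds a comment for moderation by clearing its visible flag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $mt_moderate = 1.5;    # same meaning as in mt-spamassassin.pl

# Hypothetical helper mirroring the plugin's decision order:
#   1. comment already hidden (e.g. by mt-proxyplug) -> not checked again
#   2. spamd flag is not "False"                     -> deny
#   3. not spam, but score > $mt_moderate            -> hold for moderation
#   4. otherwise                                     -> accept and publish
sub classify_comment {
    my ($visible, $flag, $score) = @_;
    return 'skip'     unless $visible;
    return 'deny'     unless $flag =~ /false/i;
    return 'moderate' if $score > $mt_moderate;
    return 'accept';
}

for my $case ([1, 'True', 10.7], [1, 'False', 3.2], [1, 'False', 0.4], [0, 'False', 0]) {
    printf "visible=%d flag=%-5s score=%4.1f -> %s\n", @$case, classify_comment(@$case);
}
```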
#!/usr/bin/perl -w
package MT::Plugin::SpamAssassin;

use strict;
use lib '../lib';
use vars qw($VERSION);
$VERSION = '0.4';

# (CHANGE ME) what host is running spamd?
my $sa_spamd_host = q{localhost};
# (CHANGE ME) what port is spamd listening on?
my $sa_spamd_port = 783;
# (CHANGE ME) who is the owner of the blog?
my $mt_owner = q{me@localhost.com};
# (CHANGE ME) what is the userid for SpamAssassin?
my $mt_userid = q{me};
# (CHANGE ME) what is the moderate threshold?
my $mt_moderate = 1.5;

use constant ACCEPT_RESPONSE => 1;
use constant DENY_RESPONSE   => 0;

use MT;
use MT::App::Comments;
use IO::Socket;               # also exports inet_aton, AF_INET, SOCK_STREAM
use Time::Local qw(timegm);
use POSIX ();

eval { require MT::Plugin };
unless ($@) {
    my $plugin = MT::Plugin->new({
        name        => qq{Spamassassin for Movable Type v$VERSION},
        description => qq{Spamassassin for Movable Type},
    });
    MT->add_plugin($plugin);
    # tell MT that we want to be called to filter comments
    MT->add_callback('CommentFilter', 10, $plugin, \&sa_filter);
}

# sa_filter
#
# 'CommentFilter' callback, called for each attempt to post a comment
# on your blog. We pass the incoming comment to spamd running on
# $sa_spamd_host:$sa_spamd_port. If spamd says the comment is spam,
# we respond with DENY_RESPONSE. If spamd says it's not spam, or we
# can't get a good connection to spamd, we respond with ACCEPT_RESPONSE
# (after hiding the comment for moderation if its score is above
# $mt_moderate).
sub sa_filter {
    my ($eh, $app, $comment) = @_;

    # comments already hidden (e.g. by mt-proxyplug) are not checked again
    return ACCEPT_RESPONSE unless $comment->visible();

    #print STDERR "[".scalar(localtime())."] mt-spamassassin: " .
    #    join("/",$comment->author,$comment->email,$comment->url,$comment->ip) . "\n";

    my $now      = rfc822_date();
    my $packed   = inet_aton($comment->ip);
    my $hostname = $packed ? gethostbyaddr($packed, AF_INET) : undef;
    my $http_host = $ENV{HTTP_HOST} || 'localhost';

    # build a fake RFC 2822 message around the comment
    my $message =
        "From " . $comment->email . " " . $now . "\n" .
        "Received: from client ([" . $comment->ip . "] " .
            ($hostname ? $hostname : $comment->ip) . ")" .
            " by " . $http_host . " via MovableType; " . $now . "\n" .
        "Message-id: <" . sprintf("%x\$%x", time, rand(65535)) . "\@" .
            ($hostname ? $hostname : sprintf("[%s]", $comment->ip)) . ">\n" .
        "From: " . $comment->author . " <" . $comment->email . ">\n" .
        "Date: " . $now . "\n" .
        "Subject: MovableType comment\n" .
        "To: $mt_owner\n\n" .
        $comment->url . "\n" .
        $comment->text;

    # make sure all lines end in "\r\n"
    $message =~ s/\r\n/\n/gs;
    $message =~ s/\r/\n/gs;
    $message =~ s/\n/\r\n/gs;

    # now send it off to SpamAssassin
    my $socket = IO::Socket::INET->new(
        PeerAddr => $sa_spamd_host,
        PeerPort => $sa_spamd_port,
        Proto    => "tcp",
        Type     => SOCK_STREAM,
    );
    # no socket - no spam check
    return ACCEPT_RESPONSE unless $socket;

    # create the CHECK request for spamd; Content-Length counts bytes
    my $length = do { use bytes; length($message) };
    $message = "CHECK SPAMC/1.2\r\n" .
               "User: $mt_userid\r\n" .
               "Content-Length: $length\r\n\r\n" .
               $message;
    # print STDERR "[".scalar(localtime())."] sending to spamd:\n$message\n";

    # send it to spamd
    my $toSend = $message;
    while (length($toSend)) {
        my $written = $socket->send($toSend);
        # oh no, something went wrong :-(
        return ACCEPT_RESPONSE unless defined($written);
        $toSend = substr($toSend, $written);
    }
    # close writing end of socket
    $socket->shutdown(1);

    # suck in response from SpamAssassin
    my $response = '';
    while (1) {
        my $buffer;
        return ACCEPT_RESPONSE unless defined($socket->recv($buffer, 1024));
        last unless length($buffer);
        $response .= $buffer;
    }
    $socket->close();

    # trim whitespace off the beginning of the response
    $response =~ s/^\s*//;
    # check if it is really a SpamAssassin response
    return ACCEPT_RESPONSE unless $response =~ /^spamd\/[\d.]+/i;
    # now find "Spam: True|False ; score / limit" header
    # (scores can be negative, hence the optional minus sign)
    return ACCEPT_RESPONSE
        unless $response =~ /spam:\s*(\S+)\s*;\s*(-?[\d.]+)\s*\/\s*(-?[\d.]+)/is;
    my ($flag, $score, $limit) = ($1, $2, $3);

    # log a line to the error_log
    print STDERR "[" . scalar(localtime()) . "] spam $flag from " .
        join("/", $comment->author, $comment->email, $comment->ip) .
        ": score $score (limit $limit)\n";

    if ($flag =~ /false/i) {
        # not spam, but suspicious enough to hold for moderation
        if ($score > $mt_moderate) {
            print STDERR "[" . scalar(localtime()) . "] moderating comment\n";
            $comment->visible(0);
        }
        return ACCEPT_RESPONSE;
    }
    return DENY_RESPONSE;
}

# rfc822_date
#
# generate the current local time as an RFC 822/2822 date string with a
# numeric offset from GMT, e.g. "Tue, 11 Jan 2005 08:33:34 -0500"
sub rfc822_date {
    # offset in hours (from Mail::Sendmail)
    my $offset  = sprintf "%.1f", (timegm(localtime) - time) / 3600;
    my $minutes = sprintf "%02d", abs($offset - int($offset)) * 60;
    my $TZ      = sprintf("%+03d", int($offset)) . $minutes;
    return POSIX::strftime("%a, %d %b %Y %H:%M:%S $TZ", localtime(time()));
}

1;
January 12th, 2005 at 5:51 pm
You guys should work together:
http://www.hjackson.org/blog/archives/2004/11/moveable_type_s.html
January 14th, 2005 at 5:34 pm
What, no trackback? 🙂 Anyway, I’m glad to hear it’s working for you on MT. As I had posted earlier in the week on my site, I’ve caught 581 and missed 2. Not a bad record at all, I think.
February 4th, 2005 at 7:34 pm
Great info, thanks!
February 10th, 2005 at 12:26 am
Is it technically possible to add a button to the MT interface that would feed the comment to sa-learn and then delete it? Or maybe a hook on comment deletion?