A spam-Attack Detection and Prevention System

Management Overview

presented by

Royce Williams

April 24th, 2001

INTRODUCTION

SCOPE OF THE ANALYSIS

SCOPE OF THE SOLUTION

DESCRIPTION OF THE SOLUTION

System Requirements

Components

Features

Benefits of Proposed System

INSTALLATION AND IMPLEMENTATION RECOMMENDATIONS

PLANNED IMPROVEMENTS

CHANGES TO THE ORIGINAL PROPOSAL

FREQUENTLY ASKED QUESTIONS

INTRODUCTION

spamradar is a Perl program designed to help ISPs and other operators of high-volume mail servers detect, block, and report incoming floods of unsolicited commercial email (UCE) and/or unsolicited bulk email (UBE), collectively known as "spam".

spamradar is unusual because it is not content-based: it never examines the contents of any email message. Instead, it watches how remote mailservers behave while they are connecting to the local mailserver, searching for patterns that typically accompany an inbound spam attack.

spamradar is being released under the GNU General Public License (see http://www.gnu.org/copyleft/gpl.html) in the hope that it will be of general use. spamradar uses no proprietary code, and its full source is available at http://www.tycho.org/spamradar/.

Note that the words SPAM® and Spam® (with the specific capitalization shown) are not used elsewhere in this document because they are registered trademarks of Hormel Foods Corporation. The word "spam" (all lower-case) is used deliberately because it is the common term for UCE and/or UBE. Any use of "SPAM" or "Spam" elsewhere in this document is inadvertent and should be considered a typographical error.

SCOPE OF THE ANALYSIS

The purpose of this analysis was to find the best way to solve a specific customer and business problem: reducing both incoming and outbound spam. In searching for a solution, the analysis evaluated the impact on both customers and system resources.

In particular, we felt that the solution needed to be:

·        Flexible

·        Fast

·        Easy to use

·        Easy to modify

The analysis addresses each of these needs.

SCOPE OF THE SOLUTION

Since most of the large companies that provide email services (AOL, Yahoo, Hotmail, etc.) have their own internal, proprietary methods of reducing spam, this package is aimed at small- and medium-sized ISPs and at small- to medium-sized companies that run their own mail servers.

spamradar is different from some other anti-spam packages in that it does not attempt to examine the content of the message itself to determine whether or not it is spam.  Rather, it attempts to track and analyze information outside the messages themselves (using information that is not subject to most customer privacy regulations), and to intelligently use this information to determine whether or not spam is likely to be coming from a particular source.  spamradar can act on a system-wide basis without invading customer privacy.

Email is a very important and sensitive resource for many people.  Because of this, there are a variety of different approaches to combating spam.  spamradar’s features are flexible enough to accommodate changes in policy, corporate philosophy, or customer opinion and needs.

It is suggested that spamradar be used in conjunction with a customer-configurable personal filtering system (such as Brightmail or various procmail filters) for maximum spam reduction and flexibility.

DESCRIPTION OF THE SOLUTION

System Requirements

·        Perl 5.004 or higher

·        sendmail or Postfix mail server software

·        Mail logging to a text file accessible by the machine that is running spamradar

Components

Log analyzer

This is a Perl program implemented as a daemon to constantly monitor raw mail logs, watching for spamlike activity.  It can also be run in non-daemon mode, reporting on spamlike activity to assist a systems administrator in the investigation of mail problems.  If system resources are limited, many of the benefits of running as a daemon can be achieved by arranging for spamradar to be executed periodically from the Unix cron facility.
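The daemon's core job, watching a log file for newly appended lines, can be illustrated with a short sketch. It is written in Python rather than spamradar's Perl and is not spamradar's code; the `follow` function and its `max_idle_polls` parameter are hypothetical names chosen for illustration, and the sketch ignores log rotation and half-written lines.

```python
import time
from typing import Iterator, Optional, TextIO


def follow(fh: TextIO, poll_interval: float = 1.0, from_end: bool = True,
           max_idle_polls: Optional[int] = None) -> Iterator[str]:
    """Yield lines as they are appended to an open log file, like `tail -f`.

    from_end=True skips what is already in the file (daemon mode).
    max_idle_polls (hypothetical) stops after that many consecutive empty
    polls, which suits a periodic cron-style run or a test.
    Log rotation and half-written lines are not handled in this sketch.
    """
    if from_end:
        fh.seek(0, 2)  # whence=2 is os.SEEK_END
    idle = 0
    while True:
        line = fh.readline()
        if line:
            idle = 0
            yield line.rstrip("\n")
            continue
        idle += 1
        if max_idle_polls is not None and idle >= max_idle_polls:
            return
        time.sleep(poll_interval)  # wait for the mailer to write more
```

Each yielded line would then be handed to the analysis code; the same generator serves both the long-running and the periodic mode by changing `max_idle_polls`.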

Open-relay tester

Once spamradar has determined that a relay may be a spam source, it queues up the IP address of that remote mailserver to be tested for third-party relaying.  If a relay has been recently tested, the test will be skipped.
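The "skip if recently tested" rule amounts to a queue plus a table of last-test times. This is a minimal Python sketch under stated assumptions: the class name and the seven-day `retest_after` default are hypothetical, and a real installation would keep `last_tested` in the persistent relay database rather than in memory.

```python
from collections import deque
from typing import Deque, Dict, Optional, Set


class RelayTestQueue:
    """FIFO of relay IPs waiting for an open-relay test.

    retest_after is a hypothetical setting (seconds): a relay tested more
    recently than this is skipped instead of being queued again.
    """

    def __init__(self, retest_after: float = 7 * 24 * 3600) -> None:
        self.retest_after = retest_after
        self.pending: Deque[str] = deque()
        self.queued: Set[str] = set()
        self.last_tested: Dict[str, float] = {}

    def enqueue(self, ip: str, now: float) -> bool:
        """Queue ip for testing; return False if it was skipped."""
        last = self.last_tested.get(ip)
        if last is not None and now - last < self.retest_after:
            return False  # tested recently
        if ip in self.queued:
            return False  # already waiting
        self.pending.append(ip)
        self.queued.add(ip)
        return True

    def next_to_test(self, now: float) -> Optional[str]:
        """Take the oldest queued relay and record when it was tested."""
        if not self.pending:
            return None
        ip = self.pending.popleft()
        self.queued.discard(ip)
        self.last_tested[ip] = now
        return ip
```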

Test-results analyzer

This component periodically checks the email inbox that has been designated as the recipient of relay tests.  It verifies that the message received matches the one sent by checking the headers for the relay IP address and matching a stored cookie with the cookie in the received message.  Test results are then stored in the database.
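The cookie check can be sketched as follows, assuming the cookie travels in a header named `X-Relay-Test-Cookie`. That header name, and the functions below, are hypothetical; the text does not say where spamradar actually places its cookie. A message counts as proof only if its cookie is one we issued and the tested relay's IP appears as a whole address in a Received: header.

```python
import email
import re
import secrets
from typing import Dict, Optional

COOKIE_HEADER = "X-Relay-Test-Cookie"  # hypothetical header name


def new_cookie(pending: Dict[str, str], relay_ip: str) -> str:
    """Create an unguessable cookie for one relay test and remember its relay."""
    cookie = secrets.token_hex(16)
    pending[cookie] = relay_ip
    return cookie


def verify_test_message(raw: str, pending: Dict[str, str]) -> Optional[str]:
    """Return the relay IP confirmed as open by this message, or None."""
    msg = email.message_from_string(raw)
    cookie = (msg.get(COOKIE_HEADER) or "").strip()
    relay_ip = pending.get(cookie)
    if relay_ip is None:
        return None  # unknown, reused, or forged cookie
    # The IP must appear as a whole address in some Received: header,
    # so 192.0.2.7 does not match 192.0.2.70.
    ip_re = re.compile(r"(?<![\d.])" + re.escape(relay_ip) + r"(?!\d)")
    if not any(ip_re.search(str(hop)) for hop in msg.get_all("Received", [])):
        return None
    del pending[cookie]  # each cookie confirms exactly one test
    return relay_ip
```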

Command-line utilities

While spamradar is designed to run unattended and perform useful protection functions, it requires tuning and is far from a perfect solution.  It may be well-suited to a particular installation at a particular time, but spammer tactics can change over time.  spamradar’s command-line options can be used to examine in closer detail any log entries that are not recognized by spamradar or are recognized incorrectly.  They can also be used to tune and monitor performance.

Data repository and data access module

This is a Berkeley DB database that contains information about known mail relays: their testing timestamps, their behavior, and their testing status. To maximize portability, the database interface is abstracted into a separate module, Relaydata.pm. Because a Berkeley DB hash stores only a single string value per key, Relaydata.pm serializes each relay's data items into one string when writing and parses that string back into individual items when reading. This gives spamradar the speed and portability of Berkeley DB hashes, and it happens transparently to the user.
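The "one string per key" arrangement can be sketched with Python's standard `dbm` module standing in for Berkeley DB. This is an illustration only: the `RelayStore` class is hypothetical, and JSON is this sketch's choice of serialization, since the format Relaydata.pm actually uses is not described here.

```python
import dbm
import json
from typing import Any, Dict, Optional


class RelayStore:
    """Relay records in a dbm hash file, keyed by relay IP.

    A dbm/Berkeley-DB-style hash holds one string value per key, so each
    record (a dict of fields) is serialized to a single string on write and
    parsed back on read. JSON is an assumption made for this sketch.
    """

    def __init__(self, path: str) -> None:
        self.db = dbm.open(path, "c")  # create the file if it does not exist

    def put(self, ip: str, record: Dict[str, Any]) -> None:
        self.db[ip] = json.dumps(record, sort_keys=True)

    def get(self, ip: str) -> Optional[Dict[str, Any]]:
        if ip not in self.db:
            return None
        return json.loads(self.db[ip])  # dbm returns bytes; json accepts them

    def close(self) -> None:
        self.db.close()
```

Callers see only dictionaries, which mirrors the transparency described above.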

Features

Detect and track attempts to deliver to users that don’t exist. Despite what many people believe, the most common way for a spammer to learn your username is exactly that: guessing. The other common approach is to buy a list of email addresses from another source. A natural byproduct of both approaches is that the mail logs will contain attempts to deliver to nonexistent users. If more than a few of these attempts come from one remote mailserver, it is probably a relay being used by a spammer and should be tested.
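Counting these attempts per relay means joining two kinds of log line on sendmail's queue ID: the "User unknown" rejection and the "from=... relay=" line that names the connecting host. The Python sketch below assumes simplified sendmail-style lines; real formats vary by sendmail version and configuration, and the patterns, function names, and threshold are hypothetical rather than spamradar's own.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

# Simplified, assumed sendmail-style patterns; real log formats vary.
UNKNOWN_RE = re.compile(r"sendmail\[\d+\]: (\w+): <[^>]*>\.\.\. User unknown")
RELAY_RE = re.compile(r"sendmail\[\d+\]: (\w+): from=.*relay=(?:\S+ )?\[([\d.]+)\]")


def unknown_user_counts(lines: Iterable[str]) -> Counter:
    """Count 'User unknown' rejections per relay IP, joining lines on queue ID."""
    unknown_by_qid: Dict[str, int] = defaultdict(int)
    relay_by_qid: Dict[str, str] = {}
    for line in lines:
        m = UNKNOWN_RE.search(line)
        if m:
            unknown_by_qid[m.group(1)] += 1
            continue
        m = RELAY_RE.search(line)
        if m:
            relay_by_qid[m.group(1)] = m.group(2)
    counts: Counter = Counter()
    for qid, n in unknown_by_qid.items():
        ip = relay_by_qid.get(qid)
        if ip:
            counts[ip] += n
    return counts


def suspicious_relays(counts: Counter, threshold: int = 5) -> List[str]:
    """Relays with more than `threshold` unknown-user attempts (assumed default)."""
    return sorted(ip for ip, n in counts.items() if n > threshold)
```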

Analyze email deliveries claiming to be from domains that don't exist or don't have MX records. By claiming to come from domains that don't exist, spammers try to obscure where they are really coming from. The current best practice is to block incoming messages whose "From:" addresses use domains that don't exist, or domains that have no way to receive replies (no MX records). spamradar takes this one step further: a large number of messages arriving through a few relays and claiming to come from nonexistent domains is a strong indicator of a spam attack. spamradar watches for these relays, summarizes and displays their activity, and can act when its thresholds are exceeded.
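A threshold of the kind described above might combine an absolute count with the share of a relay's traffic that uses bad sender domains. This Python sketch assumes the domain check has already been done (for example, read from the mailserver's own rejection log entries); the function name and both default thresholds are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def flag_bad_domain_relays(events: Iterable[Tuple[str, bool]],
                           min_count: int = 10,
                           min_fraction: float = 0.5) -> List[str]:
    """Flag relays that send many messages from nonexistent or MX-less domains.

    events: (relay_ip, sender_domain_ok) pairs. Both thresholds are
    hypothetical defaults for this sketch, not spamradar settings.
    """
    total: Dict[str, int] = defaultdict(int)
    bad: Dict[str, int] = defaultdict(int)
    for relay, domain_ok in events:
        total[relay] += 1
        if not domain_ok:
            bad[relay] += 1
    return sorted(
        relay for relay, n_bad in bad.items()
        if n_bad >= min_count and n_bad / total[relay] >= min_fraction
    )
```

Requiring both a count and a fraction keeps a busy, mostly legitimate relay from being flagged for a handful of bad messages.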

Make it easier to manually hunt for spammers.  While actively fighting a spam attack, it is often handy to be able to manually review the last few lines of the log, looking for suspicious activity.  spamradar's -m and -t options make it easy to analyze the mail logs from the command line with a minimum of typing, and the -o option lets you manually test a host to see whether or not it is an open relay.

Track and analyze successful deliveries. If a spammer has a good list of your users, the number of unsuccessful attempts will be relatively low, but the volume will be high. spamradar therefore tracks the number of successful deliveries as well as the number of delivery attempts. As with Kai's SpamShield (http://spamshield.conti.nu/), the reasoning is simple: if a particular host is sending you 1500 messages per minute, it is probably spamming you, and spamradar will flag it as a possible spam source to be reported and tested.
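A per-host sliding window is one straightforward way to implement the "1500 messages per minute" test. In this Python sketch the class name is hypothetical, the defaults simply reuse the example figure from the text, and timestamps are assumed to arrive in non-decreasing order, as they do when reading a log.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class RateWatcher:
    """Flag hosts whose delivery count in a sliding window exceeds a limit.

    Defaults mirror the 1500-messages-per-minute example; they are
    illustrative, not spamradar's actual settings.
    """

    def __init__(self, limit: int = 1500, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self.events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, host: str, timestamp: float) -> bool:
        """Record one delivery; return True if host is now over the limit."""
        q = self.events[host]
        q.append(timestamp)
        while q and q[0] <= timestamp - self.window:
            q.popleft()  # drop deliveries older than the window
        return len(q) > self.limit
```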

Pay special attention to mailing lists. Mailing lists are often not well maintained, and it is common for users to forget to unsubscribe from a list before moving on to a new email address. Ironically, this means that the older and more popular a mailing list is, the more likely it is that a number of your users have subscribed to it and since departed, making it appear as though more and more usernames are being "guessed" by the list server.

To help counter this, hosts that have "lyris", "list", "lists", or "majordomo" in their names are not blocked by default, and other strings can be added. Note that this check has nothing to do with the domain used in the "From:" field of the email; it applies only to the reverse-DNS name of the relaying host itself.
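The exemption is a substring test on the relay's reverse-DNS name. This Python sketch uses the default strings listed above; the function name and the `extra_markers` parameter for site-specific additions are hypothetical.

```python
from typing import Iterable

# Default markers from the text; "lists" is already covered by "list".
DEFAULT_LIST_MARKERS = ("lyris", "list", "lists", "majordomo")


def looks_like_list_server(rdns_name: str, extra_markers: Iterable[str] = ()) -> bool:
    """True if the relay's reverse-DNS name suggests a mailing-list server.

    Only the relaying host's own name is examined, never the From: domain.
    """
    name = rdns_name.lower().rstrip(".")
    markers = tuple(DEFAULT_LIST_MARKERS) + tuple(m.lower() for m in extra_markers)
    return any(m in name for m in markers)
```

Plain substring matching is deliberately loose: a host such as specialist.example.com also matches "list", which errs on the side of not blocking.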

Test to see whether or not a suspicious relay is an open relay.   After detecting possible spam activity, spamradar can be configured to test the servers in question for unauthorized third-party relaying.  This testing can also be manually performed with spamradar’s -o option.  If the server is truly an open relay, an email address of your choice will receive the test message sent by spamradar.
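The test itself consists of asking the suspect server to accept a message for an outside mailbox. The Python sketch below uses the standard `smtplib` module; it is not how rlytest.pl or spamradar performs the test, the function name and cookie header are hypothetical, and `smtp_class` is injectable only so the sketch can be exercised offline. Connection failures (`OSError`) are left to the caller.

```python
import smtplib
from email.message import EmailMessage


def attempt_relay(relay_ip: str, sender: str, test_recipient: str,
                  cookie: str, timeout: float = 30.0,
                  smtp_class=smtplib.SMTP) -> bool:
    """Try to send a test message *through* relay_ip to an outside mailbox.

    True means the relay accepted the message for delivery. Acceptance
    alone is not proof: the relay is confirmed open only when the message
    actually arrives in test_recipient's mailbox.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = test_recipient
    msg["Subject"] = f"Open relay test of {relay_ip}"
    msg["X-Relay-Test-Cookie"] = cookie  # hypothetical header name
    msg.set_content(f"Relay test for {relay_ip}; cookie {cookie}\n")
    try:
        with smtp_class(relay_ip, 25, timeout=timeout) as smtp:
            smtp.send_message(msg)
    except (smtplib.SMTPRecipientsRefused, smtplib.SMTPSenderRefused,
            smtplib.SMTPDataError):
        return False  # the relay refused third-party mail
    return True
```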

Warn the administrators of blocked relays. When a relay is blocked, spamradar automatically sends an email to the postmaster at that host, addressed both by name and by IP address. It can also optionally send an email via abuse.net, a centralized database of known-good contacts for reporting mail abuse. This service requires a free registration. More information about abuse.net can be found at http://www.abuse.net/.
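Addressing the postmaster "by IP address" requires an SMTP address literal, with the IP enclosed in square brackets. This Python sketch builds both addresses; the function name is hypothetical and the optional abuse.net step is omitted.

```python
import ipaddress
from typing import List, Optional


def postmaster_addresses(ip: str, hostname: Optional[str] = None) -> List[str]:
    """Addresses for warning a blocked relay's administrator.

    The host is addressed by name (if reverse DNS gave one) and by IP.
    An IP must be written as a bracketed address literal, for example
    postmaster@[192.0.2.7]; IPv6 literals also carry an "IPv6:" tag.
    """
    addr = ipaddress.ip_address(ip)  # ValueError if ip is malformed
    literal = f"[IPv6:{addr}]" if addr.version == 6 else f"[{addr}]"
    addresses = []
    if hostname:
        addresses.append(f"postmaster@{hostname.rstrip('.')}")
    addresses.append(f"postmaster@{literal}")
    return addresses
```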

Optionally take advantage of existing spam resources.  There are a number of spam resources on the Internet that are designed to allow mailserver sysadmins to centralize their experience with spammers and open relays.  The more sites that contribute to these resources, the harder it becomes for spammers to deliver their messages unimpeded.

For example, some third-party services allow you to submit a host for testing as a possible open relay. The Open Relay Behaviour-modification System (ORBS) is one of the most widely used. spamradar can automatically forward confirmed open relays to any such service that accepts submissions by email. Be aware that some of these services do not accept submissions without some sort of registration. See http://www.orbs.org/ for more information.

Users of spamradar are strongly encouraged to take advantage of shared public databases such as ORBS. The spammer most likely to slip beneath spamradar's notice is one who sends a low number of messages per hour through only a very few open relays at a time, claims to come from real domains that have MX records, attempts delivery to usernames in random order, and uses "From:" usernames that look legitimate. In other words, spammers escape spamradar by becoming low-volume spammers. The only exception is the spammer who is sophisticated enough to spam 10,000 sites simultaneously in an interleaved, distributed fashion, sending a high total volume while each individual site sees only a trickle. This is why you are encouraged to use these features of spamradar to share your experience with others.

Detect multiple sendings from multiple relays.  If you receive five emails from five different mail servers on diverse networks all claiming to be coming from "bill54332@loja.net", there is an increased chance that those mail servers are open relays.  spamradar takes this into account and will be more likely to report such servers as possible spam sources.
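One way to measure "diverse networks" is to count the distinct network blocks that the relays for each envelope sender belong to. This Python sketch approximates a network by the relay's /16, which is an assumption (spamradar's actual criterion is not stated), and the function name and `min_networks` default are hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def multi_relay_senders(deliveries: Iterable[Tuple[str, str]],
                        min_networks: int = 3,
                        prefix: int = 16) -> Dict[str, List[str]]:
    """Map each sender seen via relays on >= min_networks distinct networks
    to the sorted list of those relays.

    A "network" is approximated by the relay's /prefix block (an assumption).
    """
    relays: Dict[str, Set[str]] = defaultdict(set)
    nets: Dict[str, Set[str]] = defaultdict(set)
    for sender, relay_ip in deliveries:
        key = sender.lower()
        relays[key].add(relay_ip)
        block = ipaddress.ip_network(f"{relay_ip}/{prefix}", strict=False)
        nets[key].add(str(block))
    return {s: sorted(relays[s]) for s in relays if len(nets[s]) >= min_networks}
```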

Benefits of Proposed System

Concise. spamradar is designed to show you only as much information as you need. For example, during normal operation, if you (or spamradar) have spotted a spam source and blocked it, spamradar will not display that host in its output by default (though all relays can be displayed with the -a option). spamradar also does not display statistics for hosts that you have marked as never to be blocked (in the spamradar.dontblock file). This can save valuable time when tracking down spam-related problems.

Flexible.  spamradar can understand sendmail and Postfix logging formats.  It can generate machine-readable output for processing by other programs.  You can define hosts that you would like to always block, never block, or temporarily block for a specified period of time.  spamradar can run as a command-line tool or as a daemon.  It can use an external configuration file in a user-specified location.  It has multiple levels of debugging to allow you to examine the spam-detection process in greater detail.  spamradar can be configured to automatically block known open relays, or to simply report suspicious relays and defer testing to a human operator.  In this way, spamradar can change as policies and opinions change.
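The always-block, never-block, and temporary-block policies combine naturally into a single lookup. In this Python sketch the class and method names are hypothetical, times are seconds since the epoch, and a never-block entry is assumed to override everything else.

```python
from typing import Dict, Set


class BlockPolicy:
    """Always-block, never-block, and temporary blocks with an expiry time.

    Never-block entries win over everything else (an assumption).
    """

    def __init__(self) -> None:
        self.always: Set[str] = set()
        self.never: Set[str] = set()
        self.until: Dict[str, float] = {}

    def block_for(self, host: str, seconds: float, now: float) -> None:
        """Block host temporarily, until now + seconds."""
        self.until[host] = now + seconds

    def is_blocked(self, host: str, now: float) -> bool:
        if host in self.never:
            return False
        if host in self.always:
            return True
        expiry = self.until.get(host)
        if expiry is None:
            return False
        if now >= expiry:
            del self.until[host]  # the temporary block has lapsed
            return False
        return True
```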

Portable and self-contained. Written for Perl 5.0x, spamradar should be usable on any platform running sendmail or Postfix that logs to a text file. It uses no platform-specific calls and very few external programs. The only exceptions are the external “tail” command, which remains an external call because it is much faster than any Perl implementation tested so far, and Chip Rosenthal’s single-relay-testing utility rlytest.pl. With some additional research and testing, it is believed that both can be internalized in future revisions. Keeping these external dependencies to a minimum makes spamradar easy to port to other systems.
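The reason an external tail beats a naive implementation is that it reads backwards from the end of the file instead of scanning it from the start. The following Python sketch shows that backward-block approach; it is an illustration, not a measured replacement for tail, and the function name and block size are assumptions.

```python
import os
from typing import List


def last_lines(path: str, n: int, block_size: int = 8192) -> List[str]:
    """Return the last n lines of a file by reading fixed-size blocks
    backwards from the end, rather than scanning the whole file."""
    if n <= 0:
        return []
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        data = b""
        # More than n newlines guarantees n complete lines at the end;
        # otherwise keep reading until the start of the file.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block_size, pos)
            pos -= step
            fh.seek(pos)
            data = fh.read(step) + data
    lines = data.splitlines()
    return [line.decode("utf-8", "replace") for line in lines[-n:]]
```

The cost depends on how many lines are requested rather than on the size of the log, which matters when a busy server's mail log grows large.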

Open source.  You may need to modify spamradar to fit your own needs.  With the full source available, you will be able to do so.  If you modify spamradar in a way that might benefit others, you are encouraged to contribute your changes back to the main source tree.

Persistent. spamradar remembers which servers you have tested, which you have not, and whether each server has been blocked. When a host has been blocked as a possible spam source and later unblocked, spamradar remembers the previous block and will be more likely to block that host in the future. Remembering test results also keeps spamradar from retesting the same relays repeatedly, which makes the best use of the resources dedicated to its processing.
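"More likely to block" could be implemented by lowering the blocking threshold for relays with a block history. The text does not say how spamradar weighs history, so everything in this Python sketch (the suspicion score, the halving factor, the floor, and all names) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RelayHistory:
    times_blocked: int = 0


def block_threshold(base: float, history: RelayHistory,
                    factor: float = 0.5, floor: float = 1.0) -> float:
    """Suspicion score a relay must reach before it is blocked.

    Each earlier block multiplies the threshold by `factor`, so a relay
    that was blocked before is blocked sooner the next time. base, factor,
    and floor are hypothetical tuning knobs, not spamradar settings.
    """
    return max(floor, base * factor ** history.times_blocked)


def should_block(score: float, history: RelayHistory, base: float = 20.0) -> bool:
    return score >= block_threshold(base, history)
```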

Efficient. On a Sun Ultra II with two 300 MHz processors and 512 MB of RAM, with its mail logs stored on a NetApp mounted via NFS, spamradar analyzes mail logs at about 1000 lines per CPU-second. On average, the mail systems it monitors generate about 1100 log records per real-time minute. This should provide ample room for future growth.
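Working through the two figures above shows how much headroom they imply, assuming the 1000-lines-per-CPU-second rate holds at higher volumes and ignoring other load on the machine:

```latex
\begin{aligned}
r_{\text{log}} &= \frac{1100\ \text{lines}}{60\ \text{s}} \approx 18.3\ \text{lines/s} \\
\text{CPU share} &= \frac{18.3\ \text{lines/s}}{1000\ \text{lines per CPU-second}} \approx 0.018 \approx 1.8\%\ \text{of one CPU} \\
\text{headroom} &= \frac{1000}{18.3} \approx 54\times
\end{aligned}
```

In other words, under those assumptions the log volume could grow roughly fifty-fold before spamradar would saturate a single processor.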

INSTALLATION AND IMPLEMENTATION RECOMMENDATIONS

·        Make sure that the email address used as the source of test messages is directed to someone who reads it on a regular basis.  This is especially important during early stages of implementation.

·        Take advantage of spamradar’s export functions to populate a mailserver lookup table that can be used by the mailserver to refuse email.

·        For organizations running multiple mailservers, spamradar is best used by forwarding all mail logging to a centralized, shared log (via the Unix syslog facility), which enables spamradar to monitor patterns across all servers. The list of rejected relays should also be shared, perhaps on a centralized NFS-mounted share or pushed out with ssh and rsync.

·        Use a hard-to-guess username for the recipient of test messages, to minimize attempts to exploit the automated testing system.

·        Run spamradar manually and note which servers it detects as possibly spamming.  If domains you recognize show up frequently (like aol.com), add them to the spamradar.ignore file.
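As an illustration of the export recommendation, a list of blocked relays can be rendered as a plain-text access table. This Python sketch assumes the simplest form, one `IP<TAB>REJECT` entry per line, which should be accepted by both Postfix access(5) tables and sendmail's access database; the function name is hypothetical, and the resulting file still has to be compiled (postmap or makemap) and referenced from the mailserver configuration.

```python
import ipaddress
from typing import Iterable


def access_map_lines(blocked: Iterable[str]) -> str:
    """Render blocked relay IPs as a plain-text access table.

    Duplicates are removed and entries sorted (IPv4 before IPv6, then
    numerically) so the exported file is stable between runs.
    """
    addrs = {ipaddress.ip_address(ip) for ip in blocked}
    ordered = sorted(addrs, key=lambda a: (a.version, int(a)))
    return "".join(f"{ip}\tREJECT\n" for ip in ordered)
```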

PLANNED IMPROVEMENTS

·        Logging via syslog, as an alternative to writing directly to a log file, would be more Unix-friendly.

·        A particularly thorny and interesting improvement: the abstraction of the mail-log parsing into a separate module, so that separate modules for handling other log formats could be easily developed.

·        The two remaining external program calls (tail and rlytest.pl) should be converted to internal routines. Unfortunately, the current version of the File::Tail Perl module is considerably less efficient than the external tail program at extracting the last 8000 lines of a file. Internalizing rlytest.pl might be less difficult.

·        If users redirected their spam to another mailbox (separate from the TEST_RECIPIENT mailbox), spamradar could be taught to retrieve these messages via POP, extract the relay IPs from their headers, and test them.

·        Relaydata.pm could be modified to emulate the DBI interface, so that other DBI modules could be used in its place.

·        The current version of spamradar does not collect statistics on, or act on, error messages generated by attempts to deliver mail that is already denied by another mechanism (such as RSS or a local “reject” list). There are stubs in the code where these features would fit, but they have not yet been implemented.

·        The spamradar.allow and spamradar.ignore files currently do not understand networks expressed in Classless Inter-Domain Routing (CIDR) notation. CIDR support would be useful for networks whose sizes fall between the classful boundaries (/8, /16, and /24).

·        The current method of testing for guessed-username randomness is to examine only the first character of each guessed username.  There are more sophisticated statistical methods to calculate how different one string is from another string.  If the calculation cost is not too high, this might be worth pursuing, perhaps even as a way to detect spam that is being successfully delivered.
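The planned CIDR support is largely a parsing problem, since containment checks are straightforward once entries are parsed as networks. This Python sketch uses the standard `ipaddress` module; the one-entry-per-line file format with `#` comments is an assumption, not the documented format of spamradar.allow or spamradar.ignore.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_entries(lines: Iterable[str]) -> List[Network]:
    """Parse file entries: one IP or CIDR block per line.

    Blank lines and '#' comments are skipped (an assumed file format).
    """
    nets: List[Network] = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            # A bare address becomes a /32 (or /128); strict=False
            # tolerates host bits being set, e.g. 192.0.2.7/24.
            nets.append(ipaddress.ip_network(entry, strict=False))
    return nets


def matches(ip: str, nets: Iterable[Network]) -> bool:
    """True if ip falls inside any parsed entry."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets)
```

A /27 such as 192.0.2.0/27 (32 addresses) is exactly the kind of block that the classful boundaries cannot express.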

CHANGES TO THE ORIGINAL PROPOSAL

From searching the mail forums and the Internet, it is clear that most small- to medium-sized ISPs and companies do not write their mail logs to an SQL database. Therefore, the feature for reading logs from SQL databases was dropped before implementation.

Performing ARIN and domain WHOIS lookups for each host caused an undesirable processing delay and increase in load. The centralized database of contact information at abuse.net, coupled with an automatic attempt to deliver to postmaster@domain.com and postmaster@[192.168.42.73], achieves a similar effect with much less impact.

Further research into the requirements for ORBS submission has raised questions as to whether relay-testing results from an automated service such as spamradar will be accepted by ORBS.  An email inquiring about the feasibility of this has been sent to ORBS, but a response has not yet been received.

In its current incarnation, spamradar calls Chip Rosenthal’s rlytest.pl to perform its individual relay testing. This is primarily because Chip’s implementation is so well designed and optimized that it has proven difficult to improve upon. Because rlytest.pl is not packaged with spamradar, users will need to download it from http://www.unicom.com/sw/rlytest/.

FREQUENTLY ASKED QUESTIONS

What is spam? Spam is usually defined as bulk or commercial email that is unsolicited. It is often called UCE (Unsolicited Commercial Email) or UBE (Unsolicited Bulk Email).

Why do I get so much spam?  Unfortunately, the laws of economics would suggest that the only reasons that spam is so prevalent are that it works, and that it is very inexpensive for the sender.  Quite a large number of people must be responding to spam, because the spammers keep sending it.  Because spam is often relayed through improperly configured mail servers administered by unsuspecting administrators, it is often impractical to try to trace back to where the spam came from (more on this later).

Why are spammers hard to stop? Spammers exploit a vulnerability in some mail servers to transfer the cost of delivering their spam to someone else. Because this vulnerability is somewhat abstract, many mail servers remain vulnerable for years without their system administrators noticing. This vulnerability is known as open relaying or third-party relaying.

In the past, many mail server software packages shipped with no limit on who could use them to relay mail, and system administrators had to deliberately deactivate this feature. In the early days of the Internet, this was harmless and even promoted inter-node communication. On today's Internet, however, this practice is ill-advised and even dangerous, because huge floods of incoming spam can disrupt the normal function of a mail server as badly as a denial-of-service attack. Most mature mail server products now ship with third-party relaying disabled, and the system administrator must enumerate the networks or domain names that are allowed to relay through the server. However, since old software is often inexpensive or has licensing that is no longer vigorously enforced, new installations of older mail server packages are deployed daily all over the world.

What is an open relay?  An open relay is a mail server that allows connections from any network to relay mail through it to any other network. 

Normal mail servers only allow their own internal users to relay mail to addresses outside their systems. This makes sense: the big mail servers pass mail between each other and are connected to the Internet at all times, and there is a cost to maintaining these mail servers and providing enough capacity to handle the load. This cost is incurred by the ISP and passed on to its customers. By limiting relaying to one's own customers, the cost of maintaining enough resources to handle the load is tied directly to revenue. If anyone is allowed to relay through your mailservers, however, all of the associated costs increase without a corresponding increase in revenue.

For example, if I am an Internet Alaska customer, and if I set my outgoing mail server to a mail server that belongs to Internet Alaska, I can freely send email to bob@alaska.net, bob@gci.net, or bob@aol.com.  However, if I am an Internet Alaska customer and I set my outgoing mail server to a GCI mail server, I should only be able to send messages to bob@gci.net.  As an Internet Alaska customer, I should not be able to send email through GCI's mail servers to AOL.  That is why this type of relaying is often called third-party relaying.
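The rule in the example can be written as a two-part test: accept any recipient in the server's own domains, and accept other recipients only from the server's own customers. In this Python sketch the function name and the GCI customer network are hypothetical, and subdomains are not handled; an open relay is simply a server that skips the second check.

```python
import ipaddress
from typing import Iterable


def relay_permitted(client_ip: str, recipient: str,
                    local_networks: Iterable[str],
                    local_domains: Iterable[str]) -> bool:
    """Decide whether a properly closed mail server should accept a recipient.

    Mail *to* the server's own domains is always accepted; mail to anywhere
    else is accepted only from the server's own customers (local networks).
    """
    domain = recipient.rsplit("@", 1)[-1].lower().rstrip(".")
    if domain in {d.lower() for d in local_domains}:
        return True  # final delivery to one of our own domains
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(n, strict=False) for n in local_networks)
```

With a GCI server configured for gci.net and its customers' network, an Internet Alaska customer could reach bob@gci.net but not bob@aol.com.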

How do spammers get my email address? From a spammer's perspective, the more people the spammer can reach, the better. This is why it is vital for a spammer to get a list of known-good email addresses. There are a few ways that a spammer can build a good user list:

·                    Build one by randomly guessing usernames.  Spammers use this tactic more often than you might think.  spamradar's first goal was to figure out which relays were trying to guess usernames and then display them.

·                    Steal a public one.  This is becoming less common.  Fewer and fewer ISPs are exposing their user list (most often as their /etc/passwd file) to the public.  If a spammer has a recent list of your users, the amount of apparent username guessing will be relatively low, making it harder to detect that the messages are spam.

·                    Harvest them from public Internet sources.  These include public Usenet archives, web pages, and even chat rooms and IRC.

·                    Purchase from and/or exchange with other spammers. 

What is an MX record?  An MX or Mail eXchanger record is a type of DNS record attached to a particular domain.  Performing a DNS query about a particular domain and specifying the “MX” record type will show the list of mail servers that are listed as the final destination for any email destined for that domain.  For example:

royce@vegan:~>> host -t mx tycho.org
tycho.org mail is handled (pri=20) by mailhost.alaska.net
tycho.org mail is handled (pri=10) by smtpgate.alaska.net

This is a manual way to perform exactly the same lookup that a mailserver performs whenever it has to deliver mail. If someone is trying to send email to billy@tycho.org, the mailserver will look up the MX records for the tycho.org domain, pick the one with the lowest priority value (in this case, pri=10, smtpgate.alaska.net), look up the IP address of that server, and try to deliver the message to it.
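Choosing among MX records amounts to sorting by the priority (preference) value, lowest first; hosts that share a value are tried in random order, as the SMTP specification recommends. This Python sketch assumes the records have already been fetched (the standard library has no MX lookup), and the function name is hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple


def delivery_order(mx_records: List[Tuple[int, str]],
                   rng: Optional[random.Random] = None) -> List[str]:
    """Order MX hosts for delivery attempts: lowest preference value first.

    Hosts sharing a preference value are shuffled to spread the load.
    """
    if rng is None:
        rng = random.Random()
    by_pref: Dict[int, List[str]] = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host)
    order: List[str] = []
    for pref in sorted(by_pref):
        hosts = by_pref[pref][:]
        rng.shuffle(hosts)
        order.extend(hosts)
    return order
```

For the tycho.org records shown above, smtpgate.alaska.net (pri=10) is tried before mailhost.alaska.net (pri=20).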

The reason that this is a spam issue is that a spammer can configure his or her spam software to use a “From:” address that uses a domain that has no MX record, like so:

royce@vegan:~>> host -t mx delphi.net
Host not found, try again.

This is a domain that exists (its name is registered) but has no MX records. If a message claiming to come from sadie@delphi.net is successfully delivered, the spammer has obscured the origin of the email and made it more difficult to respond. It is rare for spammers to use email addresses that actually exist.

Many mail server software packages (including sendmail) can be configured to reject mail claiming to come from domains that do not have MX records. spamradar takes advantage of the fact that the mailserver must perform this lookup anyway: it reads the resulting log entry and nominates the sending server for testing.

Why is spam bad? For end users, spam is bad because they must waste time wading through it to process their legitimate mail. For some people, it is easy to just hit the Delete key. For others, especially those with young children, spam can be a very frustrating issue.

For system administrators, however, the problem is serious for different reasons.  The sheer volume of spam traversing the Internet has rapidly increased to the point that capacity planning and anti-spam activities take up considerable time and resources.  The company incurs a cost for which it will never be reimbursed.  This is the equivalent of the mailman being forced to deliver any unstamped letter that you drop in the mailbox, whether it is one envelope or ten thousand.

Why is it called "spamradar"? Two reasons. First, spamradar is designed as an early-warning system that can both detect incoming trouble and help you act on it, so the “radar” metaphor seemed appropriate. Second, the name didn't come up in any of the search engines that were tried. :)