Sunday, May 29, 2011

Building a botnet detector with Exim and SQLite, a step-by-step procedure wannabe

In this post, I am going to explain how I employ greylisting to collect IP addresses of botnets. Please read "Botnet Detection with Greylisting" first to get familiar with the general idea. I will mostly talk about implementation details here.

To detect botnets with my greylisting approach, you need:
  • A UNIX-like host
    Linux, FreeBSD, NetBSD, or Solaris will all do. You don't need high-end hardware: a doorstop :) with a single Pentium III CPU, 64MB of RAM, and a 5GB hard disk is plenty. A cheap VPS is even better, because it is easy to install and configure, saves on power and network bills, and is friendly to both the environment and your pocket.

  • Heavily spammed domains with no active mailbox
    We are going to trace spam back to its origin to find botnets, so you need some domains that regularly receive a lot of spam (something like 5K spam messages a day will do). No spam, no bots. To prevent collateral damage to existing mailboxes and to keep the system simple, my detection is designed for domains that are no longer in use, which I call "trap domains" below. You will need to modify the system yourself if you apply it to active domains.
I myself use a Linux VPS running Debian 5. The trap domains I use had been expired for some time, so there should be no active mail accounts in them. You are assumed to be an experienced network administrator, familiar with setting up MX records, troubleshooting mail systems, etc. What my detection system does is really simple:
  • Identify bots with greylisting
    Any host trying to send mail to a trap domain is somewhat suspicious, but I want to focus on botnets, which seldom pass greylisting. Mail servers (or abused open relays) that do retry will get a response like "no such user" for each recipient, and the corresponding sessions are eliminated when compiling the resulting list of botnet IP addresses.

  • Log full mail headers for trap domains
    Most abuse contacts want you to include at least the full mail headers when reporting mail-related abuse, because that makes it easier for them to explain to their clients what happened. So for every mail destined for a trap domain and not originating from a known mail server, I keep its full headers in the log files for later notifications.
Here are the detailed implementation steps:
  1. Install Exim with SQLite support.
    Exim is the SMTP server I am most familiar with, and its powerful ACL is a tremendous help for my detection. I knew from the start that I would need to query the data collected by greylisting a lot, so I based the greylisting I use on the Simple Greylist for Exim, which gives me the ability to use SQL without maintaining a full blown database server.

    If you want to build Exim from source under Debian (as I did), you need at least to make sure the SQLite development files are installed and to change Exim's makefile accordingly. Install the necessary library with the following command:

    apt-get install libsqlite3-dev

    Enable SQLite support from the Local/Makefile (inside the Exim source archive) by uncommenting:

    LOOKUP_SQLITE=yes

    and including the path for SQLite3's library file:

    LOOKUP_LIBS=-L/usr/local/lib -lsqlite3

    The instructions above are probably not enough for you to build the Exim binary. If you have not done so before, please consult the documents inside the Exim source archive or the Exim specification (perhaps the best documentation I have ever read for open source software).

  2. Create the greylisting database.
    The Simple Greylisting for Exim uses two database tables to keep track of greylisting entries and known resenders, respectively. But for the purpose of detecting botnets, I am mainly interested in expired greylisting entries, which represent IP addresses without reasonable retry behaviors. So I add another table for expired entries to the database.
    The SQLite script for creating the greylisting database is as follows:

    CREATE TABLE expired (
    id TEXT,
    time INTEGER,
    host TEXT,
    helo TEXT);
    CREATE TABLE greylist (
    id TEXT,
    time INTEGER,
    host TEXT,
    helo TEXT);
    CREATE TABLE resenders (
    host TEXT,
    helo TEXT,
    time INTEGER,
    PRIMARY KEY (host, helo) );
    CREATE INDEX expired_time on expired (time);
    CREATE INDEX greylist_time on greylist (time);

    Under Debian, you need to install the command line interface for SQLite 3:

    apt-get install sqlite3

    Suppose you save the script in a file named "create_greylisting_db.sql" and want to put the database file, named "greylist.db", under /var/spool/exim/db. The following command will then create the greylisting database:

    sqlite3 /var/spool/exim/db/greylist.db < create_greylisting_db.sql
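If the sqlite3 command-line tool is not at hand, the same schema can probably be created from Python's standard sqlite3 module. The following is only a sketch under that assumption; the helper names create_greylist_db and table_names are mine, not part of Exim or of the Simple Greylisting recipe.

```python
import sqlite3

# Same tables and indexes as the SQL script above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS expired  (id TEXT, time INTEGER, host TEXT, helo TEXT);
CREATE TABLE IF NOT EXISTS greylist (id TEXT, time INTEGER, host TEXT, helo TEXT);
CREATE TABLE IF NOT EXISTS resenders (
    host TEXT, helo TEXT, time INTEGER,
    PRIMARY KEY (host, helo));
CREATE INDEX IF NOT EXISTS expired_time  ON expired (time);
CREATE INDEX IF NOT EXISTS greylist_time ON greylist (time);
"""

def create_greylist_db(path):
    """Create the greylisting tables in the SQLite file at `path` (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn

def table_names(conn):
    """List the tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]

if __name__ == "__main__":
    # ":memory:" keeps the demo harmless; use the real database path in practice.
    conn = create_greylist_db(":memory:")
    print(table_names(conn))
```

Whichever way the file is created, the user Exim runs as needs write access to it and to its directory, since the ACLs insert rows during SMTP sessions.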

  3. Modify the Exim configuration.
    Create the data file, /usr/exim/grey_domains (if you change the paths of data files, remember to adjust relevant settings below accordingly), for the trap domains, with one domain per line, like:

    trapdomain1.org
    trapdomain2.com
    trapdomain3.net


    Though mail bodies of any incoming mail will be discarded, a catch-all alias for each trap domain is still needed, so the aliases file /usr/exim/domain_aliases would contain lines like:

    *@trapdomain1.org: /dev/null
    *@trapdomain2.com: /dev/null
    *@trapdomain3.net: /dev/null


    Now it's time to integrate greylisting with Exim. Add the trap domains to your local domains by adding a reference to the file /usr/exim/grey_domains:

    domainlist local_domains = @ : /usr/exim/grey_domains

    Add the following lines to the main configuration section:

    acl_smtp_helo = acl_check_helo
    GREYDB=/var/spool/exim/db/greylist.db


    The settings above specify the name of the ACL for SMTP HELO and the filename of our SQLite database. For the ACL configuration, add the following (the ACL for SMTP HELO) immediately under the line "begin acl":

    acl_check_helo:

    warn  !hosts = +relay_from_hosts
          !condition = $acl_c_will_retry
          dnslists = list.dnswl.org
          log_message = $sender_host_address whitelisted in \
                        $dnslist_domain=$dnslist_value, \
                        time=$tod_epoch helo=$sender_helo_name
          set acl_c_will_retry = yes

    accept

    The warn clause above looks up the sending host of every inbound connection in list.dnswl.org, which tells us whether the connection comes from a known mail server. The variable acl_c_will_retry is set to "yes" for known mail servers.
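For readers unfamiliar with DNS-based lists, here is a small sketch of how such a lookup is usually formed: the IPv4 octets are reversed and the list's zone is appended, and any answer to that query means the address is listed. The function name dnswl_query_name is my own, and the sketch covers IPv4 only.

```python
import ipaddress

def dnswl_query_name(ip, zone="list.dnswl.org"):
    """Build the DNS name queried for an IPv4 address in a DNSBL/DNSWL-style zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything but IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

if __name__ == "__main__":
    # 192.0.2.10 is a documentation address, used here only as an example.
    print(dnswl_query_name("192.0.2.10"))  # 10.2.0.192.list.dnswl.org
```

Exim performs this query itself through the dnslists condition; the sketch is only meant to show what is being asked of the DNS.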

    The ACL for SMTP RCPT is inserted after the line "acl_check_rcpt:", as shown below:

    deny  domains = /usr/exim/grey_domains
          condition = $acl_c_will_retry
          message = no such user

    The deny clause rejects each recipient from known mail servers, hoping to prevent them from delivering mail here again in the future.

    The ACL for SMTP DATA is much longer, as shown below, and should be appended right after the line "acl_check_data:".

    warn  set acl_m_greyident = ${md5:${mask:$sender_host_address/24}$sender_address$recipients$h_message-id:}

    warn  !hosts = +relay_from_hosts
          logwrite = MD5:$acl_m_greyident $message_headers@@@

    warn  set acl_m_greytime = ${lookup sqlite {GREYDB SELECT time \
                               FROM greylist WHERE \
                               id='${quote_sqlite:$acl_m_greyident}';}{$value}}

    warn  !hosts = +relay_from_hosts
          condition = ${if eq {$acl_m_greytime}{} {1}}
          set acl_m_dontcare = ${lookup sqlite {GREYDB INSERT INTO greylist \
                               (id, time, host, helo) \
                               VALUES ('$acl_m_greyident', \
                               '$tod_epoch', \
                               '$sender_host_address', \
                               '${quote_sqlite:$sender_helo_name}');}}

    defer !hosts = +relay_from_hosts
          condition = ${if eq {$acl_m_greytime}{} {1}}
          condition = ${lookup sqlite {GREYDB SELECT time FROM greylist \
                      WHERE id='${quote_sqlite:$acl_m_greyident}';} {1}}
          message = $sender_host_address is not yet authorized to deliver mail. \
                    Please requeue the mail and try later.
          log_message = Greylisted defer: $acl_m_greyident $tod_epoch

    deny  !hosts = +relay_from_hosts
          condition = ${if eq {$acl_m_greytime}{} {1}}
          message = unknown user
          log_message = Greylist insertion failed. Bypassing greylist.

    defer !hosts = +relay_from_hosts
          condition = ${if < {${eval10:$tod_epoch-$acl_m_greytime}}{900}}
          message = $sender_host_address is not yet authorized to deliver mail. \
                    Please requeue the mail and try later.

    deny  !hosts = +relay_from_hosts
          message = unknown user
          log_message = Greylist passed after \
                        ${eval10:$tod_epoch-$acl_m_greytime} seconds.
          set acl_m_dontcare = ${lookup sqlite {GREYDB INSERT INTO resenders \
                               (host, helo, time) \
                               SELECT host, helo, time FROM greylist \
                               WHERE id='${quote_sqlite:$acl_m_greyident}';}}

    The ACL here is similar to the Simple Greylisting for Exim mentioned above. The second warn clause saves a copy of the mail headers in the log file. The number "900" in this ACL specifies the minimum time (here 900 seconds = 15 minutes) that must pass between the first connection attempt and a retry for the retry to count, so bots that retry immediately after a failed delivery attempt will still be blocked by greylisting.

    From real world experience, I know that not every host which does retry is a healthy, well-functioning mail server. So the table "resenders" is only used to eliminate retried sessions from the final result, and is not for hosts to bypass greylisting.
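The decision flow of the DATA ACL may be easier to follow outside Exim's syntax. Below is a rough Python sketch of it, not a replacement for the ACL: the names grey_ident and check are mine, the key only approximates what ${md5:...} would produce (I assume the recipients are joined with a comma and a space), and the "insertion failed" branch is left out.

```python
import hashlib
import ipaddress
import sqlite3

MIN_RETRY_GAP = 900  # seconds; same threshold as in the ACL

SCHEMA = """
CREATE TABLE greylist (id TEXT, time INTEGER, host TEXT, helo TEXT);
CREATE TABLE resenders (host TEXT, helo TEXT, time INTEGER,
                        PRIMARY KEY (host, helo));
"""

def grey_ident(host, sender, recipients, message_id):
    """Approximate the ACL key: MD5 over the /24 network, sender, recipients, Message-ID."""
    net = ipaddress.ip_network(f"{host}/24", strict=False)  # e.g. 192.0.2.0/24
    raw = f"{net}{sender}{', '.join(recipients)}{message_id}"
    return hashlib.md5(raw.encode()).hexdigest()

def check(conn, ident, host, helo, now):
    """Return 'defer' or 'deny' for one delivery attempt seen at epoch time `now`."""
    row = conn.execute(
        "SELECT time FROM greylist WHERE id = ?", (ident,)).fetchone()
    if row is None:
        # First sighting: remember it and ask the sender to come back later.
        conn.execute(
            "INSERT INTO greylist (id, time, host, helo) VALUES (?, ?, ?, ?)",
            (ident, now, host, helo))
        return "defer"
    if now - row[0] < MIN_RETRY_GAP:
        return "defer"  # retried too soon to count
    # A patient retry: record the resender, but still reject the mail.
    # OR IGNORE is my addition so a repeated (host, helo) does not raise an error.
    conn.execute(
        "INSERT OR IGNORE INTO resenders (host, helo, time) "
        "SELECT host, helo, time FROM greylist WHERE id = ?", (ident,))
    return "deny"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    ident = grey_ident("192.0.2.77", "spam@example.com",
                       ["user@trapdomain1.org"], "<1@example.com>")
    for now in (1000, 1500, 2000):
        print(now, check(conn, ident, "192.0.2.77", "bot.example", now))
```

Note that no path through this flow accepts the mail: a host either keeps getting deferred or, once it has proven that it retries, gets "unknown user".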

    To satisfy Exim's routing requirement, a domain_aliases router referencing /usr/exim/domain_aliases is inserted right after the line "begin routers":

    domain_aliases:
    driver = redirect
    domains = /usr/exim/grey_domains
    data = ${lookup{$local_part@$domain}lsearch*@{/usr/exim/domain_aliases}}
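The "lsearch*@" search type is what makes the catch-all entries work. As far as I understand the Exim specification, it tries the full address first, then "*@" plus the domain, and finally a bare "*". A small sketch of that order, using a Python dict in place of the aliases file (the function name lookup_star_at is mine, and file parsing and case handling are ignored):

```python
def lookup_star_at(key, table):
    """Mimic the fallback order of an Exim 'lsearch*@' lookup on a dict."""
    if key in table:                      # 1. exact address
        return table[key]
    if "@" in key:                        # 2. "*@domain"
        wildcard = "*@" + key.rsplit("@", 1)[1]
        if wildcard in table:
            return table[wildcard]
    return table.get("*")                 # 3. bare "*", or None if nothing matches

if __name__ == "__main__":
    aliases = {"*@trapdomain1.org": "/dev/null"}
    print(lookup_star_at("anyone@trapdomain1.org", aliases))  # /dev/null
    print(lookup_star_at("anyone@elsewhere.example", aliases))  # None
```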

  4. Set up crontab to collect expired entries.
    A script, /usr/exim/bin/purge_greylist.sh, is run regularly from crontab to move expired entries to the database table "expired". The script is as follows:

    #!/bin/sh
    # Current epoch time, and the cutoff for purging old rows (3 days = 259200 seconds).
    NOW=$(date +%s)
    THREE_DAYS_AGO=$((NOW - 259200))

    sqlite3 /var/spool/exim/db/greylist.db <<EOF
    .echo off
    .timeout 5000
    delete from expired where time < $THREE_DAYS_AGO;
    delete from greylist
    where exists (select * from resenders
    where greylist.host = resenders.host
    and greylist.helo = resenders.helo
    and greylist.time = resenders.time);
    insert into expired (id, time, host, helo)
    select id, time, host, helo from greylist
    where ($NOW - time) > 28800;
    delete from greylist where ($NOW - time) > 28800;
    vacuum;
    .quit
    EOF


    Entries older than 3 days are removed from the table "expired". Retried entries are removed from the table "greylist". Entries that have not been retried for 28800 seconds (= 8 hours) are considered expired; they are very likely malware-infected computers, and they are moved from the table "greylist" to the table "expired".
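Before putting the script into crontab, it may help to try the same statements on a scratch database. The sketch below repeats the purge logic with Python's sqlite3 module against an in-memory database; the function name purge and the sample rows are made up for illustration.

```python
import sqlite3

EXPIRE_AFTER = 28800   # 8 hours without a retry
KEEP_EXPIRED = 259200  # keep rows in "expired" for 3 days

SCHEMA = """
CREATE TABLE expired  (id TEXT, time INTEGER, host TEXT, helo TEXT);
CREATE TABLE greylist (id TEXT, time INTEGER, host TEXT, helo TEXT);
CREATE TABLE resenders (host TEXT, helo TEXT, time INTEGER,
                        PRIMARY KEY (host, helo));
"""

def purge(conn, now):
    """Run the same four statements as the shell script, with `now` in epoch seconds."""
    cur = conn.cursor()
    cur.execute("DELETE FROM expired WHERE time < ?", (now - KEEP_EXPIRED,))
    # Drop sessions that were retried (they have a matching row in resenders).
    cur.execute("""DELETE FROM greylist WHERE EXISTS (
                       SELECT 1 FROM resenders
                       WHERE greylist.host = resenders.host
                         AND greylist.helo = resenders.helo
                         AND greylist.time = resenders.time)""")
    # Move entries that were never retried in time to "expired".
    cur.execute("""INSERT INTO expired (id, time, host, helo)
                   SELECT id, time, host, helo FROM greylist
                   WHERE (? - time) > ?""", (now, EXPIRE_AFTER))
    cur.execute("DELETE FROM greylist WHERE (? - time) > ?", (now, EXPIRE_AFTER))
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO greylist VALUES (?, ?, ?, ?)", [
        ("a", 0, "198.51.100.1", "bot"),           # never retried, old
        ("b", 0, "203.0.113.5", "mail.example"),   # retried, see resenders
        ("c", 30000, "198.51.100.9", "new"),       # too recent to expire
    ])
    conn.execute("INSERT INTO resenders VALUES ('203.0.113.5', 'mail.example', 0)")
    purge(conn, now=40000)
    print(conn.execute("SELECT host FROM expired").fetchall())
    print(conn.execute("SELECT id FROM greylist").fetchall())
```

With these sample rows, only the first host should end up in "expired", and only the recent entry should remain in "greylist".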

  5. Point the MXes of trap domains to your detection system.
    This is the last step. Change your DNS settings so that the host you have set up is the MX for the trap domains. Restart your Exim process. Sit back and relax: your system is now working.

The minimum time gap (900) and the expiration time (28800) mentioned above can both be changed. If you are going to summarize the botnet IP addresses for the previous day, remember to give the entries enough time to expire. That is, you should not summarize the collected data until 28800 seconds (the expiration time) have passed since 00:00.
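As an illustration of that timing rule, here is a sketch of a daily summary that refuses to run too early. It assumes days are counted in UTC and that the purge job has already run after the expiration time; the function names previous_day_bounds and botnet_hosts are mine.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

EXPIRE_AFTER = 28800  # seconds; must match the purge script

def previous_day_bounds(now):
    """Return (start, end) in epoch seconds for the UTC day before `now`."""
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = today - timedelta(days=1)
    return int(start.timestamp()), int(today.timestamp())

def botnet_hosts(conn, now):
    """List distinct hosts from "expired" first seen during the previous UTC day."""
    start, end = previous_day_bounds(now)
    if now.timestamp() < end + EXPIRE_AFTER:
        raise RuntimeError("too early: yesterday's entries may not have expired yet")
    rows = conn.execute(
        "SELECT DISTINCT host FROM expired "
        "WHERE time >= ? AND time < ? ORDER BY host", (start, end))
    return [row[0] for row in rows]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE expired (id TEXT, time INTEGER, host TEXT, helo TEXT)")
    now = datetime(2011, 5, 29, 9, 0, tzinfo=timezone.utc)  # 09:00, past the 8-hour mark
    start, _ = previous_day_bounds(now)
    conn.execute("INSERT INTO expired VALUES ('a', ?, '198.51.100.1', 'bot')",
                 (start + 100,))
    print(botnet_hosts(conn, now))
```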

I set out to write a step-by-step procedure to make it easy for everyone to help detect botnets. But now I realize that writing documents, let alone good ones, is too hard for me. And for someone with no prior experience with Exim, the description might be impossible to understand.

This is certainly not the only way to detect botnets with greylisting. I just hope that sharing what I have done can give you some ideas for developing your own version of a botnet detector.
