Based on a Debian 10 "Buster" environment.

bsfilter

For spam filtering, a Bayesian filter is essential: it learns from examples and keeps updating how it distinguishes spam from legitimate mail.
There are well-known filters such as SpamAssassin and bogofilter, but I chose bsfilter here. Because bsfilter is developed in Japan, it handles Japanese mail well by default. SpamAssassin can handle Japanese mail too, but it needs some plugins to upgrade the Bayesian tokenizer.

Install

# apt install bsfilter

Global configuration

Create /etc/bsfilter.conf, which is shared by all virtual users.

pipe
insert-flag
insert-probability
auto-update
db gdbm
  • db gdbm guards against overly long tokens that are generated by accident. Such tokens can break the sdbm database and prevent any further learning.
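
The file can also be written from a shell function, which is handy when the same setup is repeated on several hosts. This is only a sketch: the function name write_bsfilter_conf and its optional path argument are hypothetical, and the option lines are the ones listed above.

```shell
#!/bin/sh
# Write the shared bsfilter configuration.
# Hypothetical helper: the destination defaults to /etc/bsfilter.conf,
# but another path can be given to try the function without root.
write_bsfilter_conf() {
    conf=${1:-/etc/bsfilter.conf}
    cat > "$conf" <<'EOF'
pipe
insert-flag
insert-probability
auto-update
db gdbm
EOF
}
```

Calling write_bsfilter_conf without arguments as root writes /etc/bsfilter.conf.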

Integration with Dovecot Sieve

Enable Dovecot Sieve plugins

The plugins are required to call external programs from Sieve scripts.
In this case, bsfilter works as an external filter that adds its own headers to the mail according to the spam probability.
bsfilter is always called before the per-user Sieve scripts, so that the spam headers are already in place when those scripts run.

In /etc/dovecot/conf.d/90-sieve.conf, add the filter extension and enable the extprograms plugin. The filtering runs in the global script only, so the extension is enabled globally only. (Individual users can't use this extension.)

plugin {
  * snip *
  sieve_before = /usr/lib/dovecot/sieve.d/  # I changed the directory from /var to /usr. Where to put it is up to you.
  * snip *
  sieve_global_extensions = +vnd.dovecot.filter
  * snip *
  sieve_plugins = sieve_extprograms
  * snip *
}

In /etc/dovecot/conf.d/90-sieve-extprograms.conf, set the directory where the filter programs are located.

plugin {
  * snip *
  #sieve_pipe_bin_dir = /usr/lib/dovecot/sieve-pipe
  sieve_filter_bin_dir = /usr/lib/dovecot/sieve-filter
  #sieve_execute_bin_dir = /usr/lib/dovecot/sieve-execute
}

Shell script to execute bsfilter

Create a shell script that runs bsfilter with some parameters as /usr/lib/dovecot/sieve-filter/10-bsfilter.sh. The directory doesn't exist yet, so create it along with the first script.
It is just a one-liner.

bsfilter --config-file /etc/bsfilter.conf
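
Dovecot runs the filter as a program, so the file most likely needs the executable bit, and a shebang line does no harm; the one-liner above shows neither. A minimal sketch of creating the directory and the script follows. The function name install_bsfilter_filter and its directory argument are hypothetical, and the shebang is an assumption on my side.

```shell
#!/bin/sh
# Create the filter directory and the wrapper script for bsfilter.
# Hypothetical helper: pass another directory to try it without root.
install_bsfilter_filter() {
    dir=${1:-/usr/lib/dovecot/sieve-filter}
    mkdir -p "$dir" || return 1
    # Assumption: a shebang line is added so the file can be executed directly.
    cat > "$dir/10-bsfilter.sh" <<'EOF'
#!/bin/sh
bsfilter --config-file /etc/bsfilter.conf
EOF
    chmod 755 "$dir/10-bsfilter.sh"
}
```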

Sieve script to execute bsfilter

As specified by "sieve_before", Dovecot Sieve needs a Sieve script in the directory /usr/lib/dovecot/sieve.d. It simply calls the shell script above and files flagged mail into Junk.

require ["fileinto", "mailbox", "vnd.dovecot.filter"];

filter "10-bsfilter.sh";

if header :contains ["X-Spam-Flag"] "Yes" {
  fileinto :create "Junk";
}

After filtering with bsfilter, the script checks the spam header and files flagged mail into the Junk mailbox. The ":create" argument comes from the "mailbox" extension and creates the mailbox if it doesn't exist.
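
To reason about the rule outside Dovecot, the header test can be mimicked in the shell. The sketch below is not how Sieve evaluates the script; it is a rough stand-in that reads a message on standard input, looks only at the header section, and prints the mailbox the rule would choose. The function name sieve_target is hypothetical, and folded header lines are ignored for brevity.

```shell
#!/bin/sh
# Rough shell stand-in for the Sieve test
#   header :contains "X-Spam-Flag" "Yes"
# Prints "Junk" when the X-Spam-Flag value contains "yes" (case-insensitive),
# otherwise "INBOX" (the implicit keep). Only the header section is read.
sieve_target() {
    awk '
        /^\r?$/ { exit }                        # empty line: end of headers
        tolower($0) ~ /^x-spam-flag:/ {
            value = tolower(substr($0, 13))     # text after "X-Spam-Flag:"
            if (index(value, "yes")) found = 1
        }
        END { print (found ? "Junk" : "INBOX") }
    '
}
```

For example, sieve_target < message.eml prints the expected destination for a message file that has already passed through bsfilter.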

The global scripts have to be pre-compiled by hand.

# sievec 10-bsfilter.sieve
  • It also works without pre-compiled scripts: Dovecot compiles them when required, but then tries to save the compiled binary and fails because of permissions.
  • For scripts uploaded through it, ManageSieve handles this compilation automatically.
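
When several global scripts live in the sieve_before directory, it is easy to forget to recompile one after an edit. A small sketch, assuming the compiled binaries sit next to the sources with the .svbin extension (the name sievec uses when no output file is given): the hypothetical function stale_sieve_scripts lists the scripts whose binary is missing or older than the source.

```shell
#!/bin/sh
# List *.sieve files in a directory whose compiled .svbin is missing
# or older than the source. Hypothetical helper, not part of Dovecot.
# Note: the -nt test is a common extension (bash, dash), not strict POSIX.
stale_sieve_scripts() {
    dir=${1:-/usr/lib/dovecot/sieve.d}
    for src in "$dir"/*.sieve; do
        [ -e "$src" ] || continue               # no match: the glob stays literal
        bin=${src%.sieve}.svbin
        if [ ! -e "$bin" ] || [ "$src" -nt "$bin" ]; then
            printf '%s\n' "$src"
        fi
    done
}
```

Each printed path can then be handed to sievec, for example with stale_sieve_scripts | while read -r f; do sievec "$f"; done, run as a user that may write to the directory.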

Now restart Dovecot to apply all of the configuration above.

# systemctl restart dovecot

Test

Send a mail to any user account, and you will find the newly added spam headers in it.

X-Spam-Flag: No
X-Spam-Probability: 0.000000

Since bsfilter hasn't learned anything yet, all mails are judged clean.
If you run into trouble, set "mail_debug" to yes in /etc/dovecot/conf.d/10-logging.conf to see exactly what is going on.


Spam Learning

Handling sdbox

Now bsfilter has to learn which mails are spam.
In my setup, I prepare 3 mailboxes related to spam: Junk at the same level as INBOX, with not_clean and not_spam under it.

Junk: Mails with "X-Spam-Flag: Yes"
 + not_clean: Put spam mails that got "X-Spam-Flag: No" here (false negatives)
 + not_spam:  Put clean mails that got "X-Spam-Flag: Yes" here (false positives picked out of Junk)
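
The three mailboxes can be created up front with doveadm instead of from a mail client. A sketch: the function name create_learning_mailboxes and the DOVEADM override variable are hypothetical, the latter only so the command lines can be inspected without touching a real mail store. doveadm may report an error for a mailbox that already exists.

```shell
#!/bin/sh
# Create and subscribe (-s) the mailboxes used for learning, for one user.
# DOVEADM is a hypothetical override; by default the real doveadm is called.
create_learning_mailboxes() {
    user=$1
    for box in Junk Junk/not_clean Junk/not_spam; do
        ${DOVEADM:-doveadm} mailbox create -u "$user" -s "$box"
    done
}
```

With DOVEADM=echo the function only prints the arguments it would pass, which gives a cheap dry run.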

bsfilter picks up the mails in the not_clean and not_spam mailboxes and updates its database for subsequent filtering.
The issue is that bsfilter can read Maildir files but not Dovecot sdbox files. The doveadm command helps to get the Maildir-equivalent data.
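
As an illustration of that export: doveadm fetch with the text field prints a line reading "text:" before the message, so dropping the first line leaves a plain message that bsfilter can read. The sketch below wraps this in a hypothetical function export_mail; the DOVEADM variable is likewise an assumption, added so the function can be exercised with a stand-in command.

```shell
#!/bin/sh
# Print one message (header + body) from an sdbox mailbox as plain text.
# Arguments: user, mailbox, UID. DOVEADM is a hypothetical override;
# by default the real doveadm is called.
export_mail() {
    ${DOVEADM:-doveadm} fetch -u "$1" text mailbox "$2" uid "$3" |
        tail -n +2                              # drop doveadm's "text:" line
}
```

For example, export_mail alice@example.jp Junk/not_clean 12 > 12.eml writes message 12 to a file (user name and UID are placeholders).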

The process is:

  1. Pick up all virtual users from the userdb
  2. Use doveadm to search for mails in the "Junk/not_clean" and "Junk/not_spam" mailboxes
  3. Use doveadm to export header + body of each mail, one at a time, and cut the first line (doveadm's output string "text:")
  4. Let bsfilter learn from the exported plain mail files
  5. Update the bsfilter probability DB if there were mails to learn

Store the following script as /var/lib/dovecot/scripts/bsfilter_learn.sh and run it from cron as user vmail.

# Check userdb
for VERTUSER in `cat /etc/dovecot/users | awk -F'[:]' '{print $1}'`
do
        # Set homedir
        N=`echo $VERTUSER | awk -F'[@]' '{print $1}'`
        D=`echo $VERTUSER | awk -F'[@]' '{print $2}'`
        HOMEDIR=/home/vmail/$D/$N

        # Check if directories exist
        if [ ! -d $HOMEDIR/dbox/mailboxes/Junk/not_clean ]; then
                continue
        fi
        if [ ! -d $HOMEDIR/dbox/mailboxes/Junk/not_spam ]; then
                continue
        fi

        # Cleanup
        if [ -d $HOMEDIR/bsfilter ]; then
                rm -r $HOMEDIR/bsfilter
        fi

        # update flag
        UPDATE=0

        for TYPE in "not_clean" "not_spam"
        do
                # Set target directory
                MAILDIR=$HOMEDIR/bsfilter/$TYPE
                mkdir -p $MAILDIR

                # Loop over the UIDs of the mails to learn, if any
                for DOVEUID in `doveadm search -u $VERTUSER MAILBOX "Junk/$TYPE" ALL | awk '{print $2}'`
                do
                        # Export mails to Maildir equivalent text files
                        doveadm fetch -u $VERTUSER text MAILBOX "Junk/$TYPE" UID $DOVEUID | tail -n +2 > $MAILDIR/$DOVEUID
                done

                # Delete the exported mails from the mailbox
                doveadm expunge -u $VERTUSER MAILBOX "Junk/$TYPE" ALL

                # Check if there are any mails to learn
                if [ -n "$(ls $MAILDIR)" ]; then
                        UPDATE=1
                        if [ $TYPE = "not_clean" ]; then
                                # Update spam database
                                bsfilter --homedir $HOMEDIR/.bsfilter --config-file /etc/bsfilter.conf --sub-clean --add-spam $MAILDIR/*
                        else
                                # Correct spam database (false positive)
                                bsfilter --homedir $HOMEDIR/.bsfilter --config-file /etc/bsfilter.conf --add-clean --sub-spam $MAILDIR/*
                        fi
                fi
        done

        if [ $UPDATE -eq 1 ]; then
                bsfilter --homedir $HOMEDIR/.bsfilter --config-file /etc/bsfilter.conf --update
        fi
done
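
The user list and the home directory derivation at the top of the loop can also be written with cut and shell parameter expansion instead of awk. This is a sketch under the same layout assumption as the script (/home/vmail/<domain>/<name>); the function names list_users and user_homedir are hypothetical.

```shell
#!/bin/sh
# Print the user names (first colon-separated field) of a passwd-style
# userdb file, skipping blank lines and comments.
list_users() {
    grep -v -e '^[[:space:]]*$' -e '^#' "${1:-/etc/dovecot/users}" | cut -d: -f1
}

# Map user@domain to the home directory layout used in the script above.
user_homedir() {
    name=${1%%@*}      # part before the first "@"
    domain=${1#*@}     # part after the first "@"
    printf '/home/vmail/%s/%s\n' "$domain" "$name"
}
```

Skipping comment and blank lines is an addition; the original loop passes every line of the file to doveadm.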

Cronjob

Since vmail can't log in, use /etc/cron.d/bsfilter_learn instead of a user crontab.

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

*/30 * * * * vmail sh /var/lib/dovecot/scripts/bsfilter_learn.sh > /dev/null

bsfilter writes the content of the learned mails to STDOUT, and cron delivers the STDOUT of cronjobs by mail. As a result, you would receive every spam and ham (clean mail) whenever any user trains the filter. That is annoying and bad for privacy, so STDOUT is discarded to /dev/null.
If the learning process has any errors, they go to STDERR and should still be reported by mail to vmail@example.jp.
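
The effect of redirecting only STDOUT can be checked in isolation. In the sketch below, the hypothetical function noisy_job stands in for the learning script: its normal output is discarded, while the error message still comes through, which is what cron would then mail.

```shell
#!/bin/sh
# Stand-in for a job that writes to both streams (hypothetical).
noisy_job() {
    echo "contents of a learned mail"           # STDOUT: discarded below
    echo "learning failed" >&2                  # STDERR: still visible
}

# Same redirection as in the cron entry: only STDOUT goes to /dev/null.
run_quietly() {
    "$@" > /dev/null
}
```

Running run_quietly noisy_job in a terminal shows only the STDERR line.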

Permission Error

Since the vmail user can't access the Dovecot stats socket, you will see the following error.

doveadm(vmail): Error: net_connect_unix(/var/run/dovecot/stats-writer) failed: Permission denied

This error doesn't seem to stop the main process, so simply ignoring it is one option. Adding the vmail user to the dovecot group eliminates the error. (Another way is to add configuration that changes the socket mode to 666.)

# adduser vmail dovecot
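
Whether the change took effect can be verified by checking the group list of the user. A sketch with a hypothetical function in_group, built on id -nG:

```shell
#!/bin/sh
# Succeed if the given user is a member of the given group.
# Hypothetical helper; relies on "id -nG USER" printing the group names.
in_group() {
    for g in $(id -nG "$1"); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}
```

For example, in_group vmail dovecot && echo "ok" prints ok once the membership is in place.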