This guide is based on a Debian 11 "Bullseye" environment.
A Bayesian filter is a must for spam filtering, because it can learn to classify spam from examples.
Well-known filters such as SpamAssassin and bogofilter exist, but they can't be used for Japanese (and probably Chinese) because of tokenizer issues.
"bsfilter" is a minor and old project, but it comes from Japan and works fine with Japanese spam emails.
The only requirement is bsfilter itself. It tokenizes text with a bigram algorithm.
# apt install bsfilter
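To make the bigram idea mentioned above concrete, here is a minimal sketch that splits a string into overlapping two-character tokens. It is only an illustration, not bsfilter's actual tokenizer, and the function name `bigrams` is made up for this example. It is shown with ASCII input; in bash under a UTF-8 locale the same pattern should split Japanese text per character, while other shells such as dash may work on bytes instead.

```shell
# Print the overlapping two-character tokens (bigrams) of a string,
# one per line. Illustration only: not bsfilter's real tokenizer.
bigrams() {
    s=$1
    while [ "${#s}" -ge 2 ]; do
        rest=${s#?}                     # s without its first character
        rest2=${rest#?}                 # s without its first two characters
        printf '%s\n' "${s%"$rest2"}"   # the first two characters of s
        s=$rest                         # slide the window by one character
    done
}

bigrams "spam"
```

Because the tokens overlap, no word boundaries are needed, which is why this approach suits languages written without spaces.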
Create /etc/bsfilter.conf, which is shared by all virtual users.
pipe
insert-flag
insert-probability
auto-update
db gdbm
Calling external programs from Sieve scripts requires a plugin.
In this setup, bsfilter works as an external filter that adds headers.
It must always run before the per-user Sieve script, so that the spam probability headers are already present when the user's script runs.
In /etc/dovecot/conf.d/90-sieve.conf, add the filter extension and enable the extprograms plugin. The filtering runs in the global script only, so the extension is enabled for global scripts only. (Users can't use it in their personal scripts.)
plugin {
  * snip *

  # Location Sieve of scripts that need to be executed before the user's
  # personal script. If a 'file' location path points to a directory, all the
  # Sieve scripts contained therein (with the proper `.sieve' extension) are
  # executed. The order of execution within that directory is determined by the
  # file names, using a normal 8bit per-character comparison.
  #
  # Multiple script locations can be specified by appending an increasing number
  # to the setting name. The Sieve scripts found from these locations are added
  # to the script execution sequence in the specified order. Reading the
  # numbered sieve_before settings stops at the first missing setting, so no
  # numbers may be skipped.
  sieve_before = /var/lib/dovecot/sieve.d/ # Uncomment this line
  #sieve_before2 = ldap:/etc/sieve-ldap.conf;name=ldap-domain
  #sieve_before3 = (etc...)

  * snip *

  # Which Sieve language extensions are ONLY available in global scripts. This
  # can be used to restrict the use of certain Sieve extensions to administrator
  # control, for instance when these extensions can cause security concerns.
  # This setting has higher precedence than the `sieve_extensions' setting
  # (above), meaning that the extensions enabled with this setting are never
  # available to the user's personal script no matter what is specified for the
  # `sieve_extensions' setting. The syntax of this setting is similar to the
  # `sieve_extensions' setting, with the difference that extensions are
  # enabled or disabled for exclusive use in global scripts. Currently, no
  # extensions are marked as such by default.
  sieve_global_extensions = +vnd.dovecot.filter

  # The Pigeonhole Sieve interpreter can have plugins of its own. Using this
  # setting, the used plugins can be specified. Check the Dovecot wiki
  # (wiki2.dovecot.org) or the pigeonhole website
  # (http://pigeonhole.dovecot.org) for available plugins.
  # The sieve_extprograms plugin is included in this release.
  sieve_plugins = sieve_extprograms

  * snip *
}
In /etc/dovecot/conf.d/90-sieve-extprograms.conf, set the directory that holds the filter programs.
plugin {
  # The directory where the program sockets are located for the
  # vnd.dovecot.pipe, vnd.dovecot.filter and vnd.dovecot.execute extension
  # respectively. The name of each unix socket contained in that directory
  # directly maps to a program-name referenced from the Sieve script.
  #sieve_pipe_socket_dir = sieve-pipe
  #sieve_filter_socket_dir = sieve-filter
  #sieve_execute_socket_dir = sieve-execute

  # The directory where the scripts are located for direct execution by the
  # vnd.dovecot.pipe, vnd.dovecot.filter and vnd.dovecot.execute extension
  # respectively. The name of each script contained in that directory
  # directly maps to a program-name referenced from the Sieve script.
  #sieve_pipe_bin_dir = /usr/lib/dovecot/sieve-pipe
  sieve_filter_bin_dir = /usr/lib/dovecot/sieve-filter # Uncomment this line
  #sieve_execute_bin_dir = /usr/lib/dovecot/sieve-execute
}
Reload Dovecot to apply the new configuration.
# systemctl reload dovecot
Create a shell script, /usr/lib/dovecot/sieve-filter/10-bsfilter.sh, that runs bsfilter with the required parameters.
The directory doesn't exist by default, so create it when you add the first script.
#!/bin/sh
bsfilter --config-file /etc/bsfilter.conf
Add execute permission to this script.
# chmod +x /usr/lib/dovecot/sieve-filter/10-bsfilter.sh
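A filter program receives the message on standard input and must write the complete, possibly modified, message to standard output. The following sketch checks that contract by hand before Sieve is involved. The function name `check_filter` and the sample file name in the usage note are hypothetical, and the check assumes LF line endings in the message.

```shell
# check_filter PROGRAM MESSAGE_FILE
# Feeds MESSAGE_FILE to PROGRAM on stdin, as a filter program is called,
# and succeeds only if the output has an X-Spam-Flag header.
check_filter() {
    prog=$1
    msg=$2
    out=$("$prog" < "$msg") || return 1
    # Look only at the header block (everything up to the first empty line),
    # so a matching line in the body does not count.
    printf '%s\n' "$out" | sed '/^$/q' | grep -q '^X-Spam-Flag:'
}
```

For example, `check_filter /usr/lib/dovecot/sieve-filter/10-bsfilter.sh sample.eml && echo OK` should print OK once bsfilter and its configuration are in place, where sample.eml is any saved message.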
As specified for "sieve_before", Dovecot Sieve needs a Sieve script in that directory. Create /var/lib/dovecot/sieve.d/10-bsfilter.sieve.
require ["fileinto", "mailbox", "vnd.dovecot.filter"];

filter "10-bsfilter.sh";

if header :contains ["X-Spam-Flag"] "Yes" {
  fileinto :create "Junk";
}
After bsfilter has processed the message, the script checks the spam header and files flagged messages into the Junk mailbox.
":create" is provided by the "mailbox" extension and creates the mailbox if it doesn't exist.
Global scripts have to be pre-compiled by hand. Run sievec in /var/lib/dovecot/sieve.d/.
# sievec 10-bsfilter.sieve
Send an email to any user account and check that the delivered message carries the newly added spam probability headers.
X-Spam-Flag: No
X-Spam-Probability: 0.000000
At this point all emails are classified as clean, because bsfilter hasn't learned anything yet.
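To check the headers described above without opening a mail client, a small helper like the following can pull them out of a message source read from standard input. This is a sketch: the name `spam_headers` is made up here, and it assumes you have the raw message text available, for example saved from your MUA.

```shell
# spam_headers
# Reads one raw message on stdin and prints its X-Spam-* header lines.
# Only the header block (up to the first empty line) is inspected.
spam_headers() {
    sed '/^$/q' | grep -i '^X-Spam-'
}

# Example with an inline test message:
printf 'Subject: test\nX-Spam-Flag: No\nX-Spam-Probability: 0.000000\n\nbody\n' | spam_headers
```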
If you run into trouble, set "mail_debug" to yes in /etc/dovecot/conf.d/10-logging.conf to see exactly what is going on.
I use Dovecot's own sdbox format for mailboxes, but bsfilter is not compatible with this format.
Learning spam and clean emails therefore requires some tweaks.
Prerequisites
Steps
In my case, I made a "Junk" folder at the same level as INBOX (top-level), with "not_clean" and "not_spam" folders under it.
Junk: Mails with "X-Spam-Flag: Yes"
  + not_clean: Put spam mails with "X-Spam-Flag: No" (spam emails that went through the filter)
  + not_spam: Put clean mails with "X-Spam-Flag: Yes" (false positive emails)
Make these folders with your MUA.
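The folder layout above implies one correction per folder. Since "auto-update" already registered each mail under its (wrong) class at delivery time, a correction has to remove it from that class and add it to the other one. The sketch below maps a folder to bsfilter options for that. It is an assumption about how such a learning step could look, not the author's script; the function name `learn_flags` is hypothetical, and the "/" separator depends on your namespace settings.

```shell
# learn_flags FOLDER
# Prints the bsfilter options that correct a misclassified mail found in
# FOLDER, or fails if the folder is not a learning folder.
learn_flags() {
    case $1 in
        Junk/not_clean) echo "--sub-clean --add-spam --update" ;;  # missed spam
        Junk/not_spam)  echo "--sub-spam --add-clean --update" ;;  # false positive
        *) return 1 ;;                                             # nothing to learn
    esac
}

learn_flags Junk/not_clean
```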
Because the vmail user can't access the Dovecot stats socket, you will probably see the following error.
doveadm(vmail): Error: net_connect_unix(/var/run/dovecot/stats-writer) failed: Permission denied
This error doesn't seem to stop the main process, so one option is simply to ignore it. Alternatively, add vmail to the dovecot group to get rid of the error.
# adduser vmail dovecot
The learning script runs as the user vmail, but vmail fails when it tries to read the SSL certificate configured in /etc/dovecot/conf.d/10-ssl.conf.
Tweak the SSL configuration so that SSL is enabled only when dovecot (and doveadm) is executed by root.
(This howto was written here.)
# cd /etc/dovecot/conf.d
# cp 10-ssl.conf 10-ssl.conf.ext
# chmod 600 10-ssl.conf.ext
Change 10-ssl.conf as shown below.
# SSL/TLS support: yes, no, required. <doc/wiki/SSL.txt>
ssl = no
!include_try 10-ssl.conf.ext
This doesn't affect the normal Dovecot service, but restart it to make sure.
# systemctl restart dovecot
Store this script in /var/lib/dovecot/scripts/bsfilter_learn.sh.
Add a cron job in /etc/cron.d/bsfilter_learn.
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name command to be executed
*/30 * * * * vmail bash /var/lib/dovecot/scripts/bsfilter_learn.sh > /dev/null
bsfilter writes the content of learned mails to STDOUT, and cron delivers the STDOUT of its jobs by email. As a result, you would receive every spam and clean mail whenever any user does learning. That is just annoying, so STDOUT is discarded to /dev/null.
If the learning process hits an error, it should still be reported by mail to vmail@example.jp, because STDERR is not redirected.
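The effect of redirecting only STDOUT can be tried without cron. In this small sketch, `fake_learn_job` is a made-up stand-in for the learning script: it prints mail content on STDOUT and an error on STDERR, and with the same redirection as in the cron entry only the error remains for cron to mail.

```shell
# Stand-in for the learning job (hypothetical): mail content goes to
# stdout, a problem report goes to stderr.
fake_learn_job() {
    echo "full text of a learned mail"      # what bsfilter would print
    echo "learning failed for user" >&2     # what an error would look like
}

# Same redirection as in the cron entry: stdout is dropped,
# stderr still reaches the caller (cron would mail it).
fake_learn_job > /dev/null
```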
2021-09-20
2021-09-26