USING SQL FOR LOOKUPS, LOG/REPORTING AND QUARANTINE =================================================== This text describes SQL specifics for a PostgreSQL database, and provides a schema. In most respects it also applies to an SQLite database. For general aspects of lookups, please see README.lookups. For general SQL notes and further examples please see README.sql. For MySQL-specific notes and schema please see README.sql-mysql. Upgrade note: field quarantine.mail_text should be of data type 'bytea' and not 'text' as suggested in earlier documentation; this is to prevent it from being unjustifiably associated with a character set, and to be able to store any byte value; to convert existing field from type 'text' to type 'bytea' the following clause may be used: ALTER TABLE quarantine ALTER mail_text TYPE bytea USING decode(replace(mail_text,'\\','\\\\'),'escape'); Starting with amavisd-new-2.6.0 the fields users.email, mailaddr.email, and maddr.email should preferably be declared as byte strings (bytea) with no associated character set (which is what these fields are, according to RFC 2821) instead of CHAR or VARCHAR. The following clauses convert pre-2.6.0 tables into the now preferred and more universal form: ALTER TABLE users ALTER email TYPE bytea USING decode(email,'escape'); ALTER TABLE mailaddr ALTER email TYPE bytea USING decode(email,'escape'); ALTER TABLE maddr ALTER email TYPE bytea USING decode(email,'escape'); If a data type 'bytea' is chosen for these three fields, the setting $sql_allow_8bit_address MUST be set to true to let the amavisd program use the appropriate data type in SQL commands: $sql_allow_8bit_address = 1; # maddr.email: VARCHAR (0), VARBINARY/BYTEA (1) otherwise PostgreSQL will complain with: 'types bytea and character varying cannot be matched' when amavisd tries to execute SQL commands. Starting with amavisd-new-2.7.0, three fields need to be added to table 'msgrcpt', and one to table 'msgs': ALTER TABLE msgrcpt ADD COLUMN rseqnum integer DEFAULT 0 NOT NULL; ALTER TABLE msgrcpt ADD COLUMN content char(1) DEFAULT ' ' NOT NULL; ALTER TABLE msgrcpt ADD COLUMN is_local char(1) DEFAULT ' ' NOT NULL; ALTER TABLE msgs ADD COLUMN originating char(1) DEFAULT ' ' NOT NULL; Table 'policy' received a couple of new optional fields with 2.7.0, and dropped one field. As all fields in this table are optional and any extra field is just ignored by amavisd, it is not necessary to update this table unless one really needs these new fields. The following should adjust a pre-2.7.0 schema: ALTER TABLE policy ADD COLUMN unchecked_lover char(1) default NULL; ALTER TABLE policy ADD COLUMN spam_tag3_level real default NULL; ALTER TABLE policy ADD COLUMN spam_subject_tag3 varchar(64) default NULL; ALTER TABLE policy ADD COLUMN disclaimer_options varchar(64) default NULL; ALTER TABLE policy ADD COLUMN forward_method varchar(64) default NULL; ALTER TABLE policy ADD COLUMN sa_userconf varchar(64) default NULL; ALTER TABLE policy ADD COLUMN sa_username varchar(64) default NULL; ALTER TABLE policy DROP COLUMN spam_modifies_subj; If you need to create a primary key on table msgrcpt for some reason (clustering perhaps?), try something like: UPDATE msgrcpt SET rseqnum=1+floor(999999999*random()) WHERE rseqnum=0; CREATE UNIQUE INDEX msgrcpt_idx_primary ON msgrcpt (partition_tag,mail_id,rseqnum); Also, fields mail_id and secret_id should be treated case-sensitively, so data types char or varchar (as suggested by versions before 2.7.0) should be avoided - a suitable type is bytea. Moreover, starting with version 2.7.0 the size of mail_id is configurable through a setting $mail_id_size_bits, so the previous varchar(12) may not suffice if the size is increased from its default of 72 bytes (12 characters). The following clauses convert the data type of affected fields from varchar or char to bytea: ALTER TABLE msgs ALTER mail_id TYPE bytea USING decode(mail_id,'escape'), ALTER secret_id DROP DEFAULT, ALTER secret_id TYPE bytea USING decode(secret_id,'escape'), ALTER secret_id SET DEFAULT ''; ALTER TABLE msgrcpt ALTER mail_id TYPE bytea USING decode(mail_id,'escape'); ALTER TABLE quarantine ALTER mail_id TYPE bytea USING decode(mail_id,'escape'); Version of Perl module DBD::Pg 1.48 or higher should be used; Short installation notes for PostgreSQL 8.2 are available at: http://www.postgresql.org/docs/8.2/interactive/install-short.html In short: run: 'initdb -D ...' as user postgres, then edit pg_hba.conf providing restricted access to database, create users and create databases. Something like the following may be placed into pg_hba.conf : # TYPE DATABASE USER CIDR-ADDRESS METHOD # # amavis lookups: local mail_prefs vscan md5 host mail_prefs vscan 127.0.0.1/32 md5 host mail_prefs vscan ::1/128 md5 # # amavis logging and pen pals: local mail_log vscan md5 host mail_log vscan 127.0.0.1/32 md5 host mail_log vscan ::1/128 md5 # # spamassassin Bayes and AWL databases: local mail_bayes vscan md5 host mail_bayes vscan 127.0.0.1/32 md5 host mail_bayes vscan ::1/128 md5 local mail_awl vscan md5 host mail_awl vscan 127.0.0.1/32 md5 host mail_awl vscan ::1/128 md5 Create an SQL username (role) for use by amavisd, e.g. vscan: $ createuser -U pgsql -S -D -R -P -e vscan Create databases for amavisd: $ createdb -U pgsql mail_prefs $ createdb -U pgsql mail_log and optionally databases for SpamAssassin: $ createdb -U pgsql mail_bayes $ createdb -U pgsql mail_awl The provided schema can be cut/pasted or fed directly into the client program to create a database. The '--' introduces comments according to SQL specs. Populate databases using the schema below: $ psql -U vscan mail_prefs <... $ psql -U vscan mail_log <... (for SpamAssassin database schema see its documentation: sql/README*) Something like the following can be placed into amavisd.conf (supplying correct passwords): @lookup_sql_dsn = ([ 'DBI:Pg:database=mail_prefs', 'vscan', 'LK40.gtklkKK' ]); @storage_sql_dsn = ([ 'DBI:Pg:database=mail_log', 'vscan', 'LK40.gtklkKK' ]); Equivalent settings for AWL and Bayes databases belong to a SpamAssassin's configuration file local.cf, according to SpamAssassin documentation. Amavisd and SpamAssassin need not use the same username or password, nor do they need to reside on the same SQL server. SQLite notes: - use INTEGER PRIMARY KEY AUTOINCREMENT instead of SERIAL; - SQLite is well suited for lookups database, but is not appropriate for @storage_sql_dsn due to coarse lock granularity; CREATE TABLE policy ( id serial PRIMARY KEY, -- 'id' is the _only_ required field policy_name varchar(32), -- not used by amavisd-new, a comment virus_lover char(1) default NULL, -- Y/N spam_lover char(1) default NULL, -- Y/N unchecked_lover char(1) default NULL, -- Y/N banned_files_lover char(1) default NULL, -- Y/N bad_header_lover char(1) default NULL, -- Y/N bypass_virus_checks char(1) default NULL, -- Y/N bypass_spam_checks char(1) default NULL, -- Y/N bypass_banned_checks char(1) default NULL, -- Y/N bypass_header_checks char(1) default NULL, -- Y/N virus_quarantine_to varchar(64) default NULL, spam_quarantine_to varchar(64) default NULL, banned_quarantine_to varchar(64) default NULL, unchecked_quarantine_to varchar(64) default NULL, bad_header_quarantine_to varchar(64) default NULL, clean_quarantine_to varchar(64) default NULL, archive_quarantine_to varchar(64) default NULL, spam_tag_level real default NULL, -- higher score inserts spam info headers spam_tag2_level real default NULL, -- inserts 'declared spam' header fields spam_tag3_level real default NULL, -- inserts 'blatant spam' header fields spam_kill_level real default NULL, -- higher score triggers evasive actions -- e.g. reject/drop, quarantine, ... -- (subject to final_spam_destiny setting) spam_dsn_cutoff_level real default NULL, spam_quarantine_cutoff_level real default NULL, addr_extension_virus varchar(64) default NULL, addr_extension_spam varchar(64) default NULL, addr_extension_banned varchar(64) default NULL, addr_extension_bad_header varchar(64) default NULL, warnvirusrecip char(1) default NULL, -- Y/N warnbannedrecip char(1) default NULL, -- Y/N warnbadhrecip char(1) default NULL, -- Y/N newvirus_admin varchar(64) default NULL, virus_admin varchar(64) default NULL, banned_admin varchar(64) default NULL, bad_header_admin varchar(64) default NULL, spam_admin varchar(64) default NULL, spam_subject_tag varchar(64) default NULL, spam_subject_tag2 varchar(64) default NULL, spam_subject_tag3 varchar(64) default NULL, message_size_limit integer default NULL, -- max size in bytes, 0 disable banned_rulenames varchar(64) default NULL, -- comma-separated list of ... -- names mapped through %banned_rules to actual banned_filename tables disclaimer_options varchar(64) default NULL, forward_method varchar(64) default NULL, sa_userconf varchar(64) default NULL, sa_username varchar(64) default NULL ); -- local users CREATE TABLE users ( id serial PRIMARY KEY, -- unique id priority integer NOT NULL DEFAULT 7, -- sort field, 0 is low prior. policy_id integer NOT NULL DEFAULT 1 CHECK (policy_id >= 0) REFERENCES policy(id), email bytea NOT NULL UNIQUE, -- email address, non-rfc2822-quoted fullname varchar(255) DEFAULT NULL -- not used by amavisd-new -- local char(1) -- Y/N (optional, see SQL section in README.lookups) ); -- any e-mail address (non- rfc2822-quoted), external or local, -- used as senders in wblist CREATE TABLE mailaddr ( id serial PRIMARY KEY, priority integer NOT NULL DEFAULT 9, -- 0 is low priority email bytea NOT NULL UNIQUE ); -- per-recipient whitelist and/or blacklist, -- puts sender and recipient in relation wb (white or blacklisted sender) CREATE TABLE wblist ( rid integer NOT NULL CHECK (rid >= 0) REFERENCES users(id), sid integer NOT NULL CHECK (sid >= 0) REFERENCES mailaddr(id), wb varchar(10) NOT NULL, -- W or Y / B or N / space=neutral / score PRIMARY KEY (rid,sid) ); -- grant usage rights: GRANT select ON policy TO amavis; GRANT select ON users TO amavis; GRANT select ON mailaddr TO amavis; GRANT select ON wblist TO amavis; -- R/W part of the dataset (optional) -- May reside in the same or in a separate database as lookups database; -- REQUIRES SUPPORT FOR TRANSACTIONS; specified in @storage_sql_dsn -- -- Please create additional indexes on keys when needed, or drop suggested -- ones as appropriate to optimize queries needed by a management application. -- See your database documentation for further optimization hints. -- provide unique id for each e-mail address, avoids storing copies CREATE TABLE maddr ( id serial PRIMARY KEY, partition_tag integer DEFAULT 0, -- see $partition_tag email bytea NOT NULL, -- full e-mail address domain varchar(255) NOT NULL, -- only domain part of the email address -- with subdomain fields in reverse CONSTRAINT part_email UNIQUE (partition_tag,email) ); -- information pertaining to each processed message as a whole; -- NOTE: records with a NULL msgs.content should be ignored by utilities, -- as such records correspond to messages just being processed, or were lost CREATE TABLE msgs ( partition_tag integer DEFAULT 0, -- see $partition_tag mail_id bytea NOT NULL, -- long-term unique mail id, dflt 12 ch secret_id bytea DEFAULT '', -- authorizes release of mail_id, 12 ch am_id varchar(20) NOT NULL, -- id used in the log time_num integer NOT NULL CHECK (time_num >= 0), -- rx_time: seconds since Unix epoch time_iso timestamp WITH TIME ZONE NOT NULL,-- rx_time: ISO8601 UTC ascii time sid integer NOT NULL CHECK (sid >= 0), -- sender: maddr.id policy varchar(255) DEFAULT '', -- policy bank path (like macro %p) client_addr varchar(255) DEFAULT '', -- SMTP client IP address (IPv4 or v6) size integer NOT NULL CHECK (size >= 0), -- message size in bytes originating char(1) DEFAULT ' ' NOT NULL, -- sender from inside or auth'd content char(1), -- content type: V/B/U/S/Y/M/H/O/T/C -- virus/banned/unchecked/spam(kill)/spammy(tag2)/ -- /bad-mime/bad-header/oversized/mta-err/clean -- is NULL on partially processed mail -- (prior to 2.7.0 the CC_SPAMMY was logged as 's', now 'Y' is used; --- to avoid a need for case-insenstivity in queries) quar_type char(1), -- quarantined as: ' '/F/Z/B/Q/M/L -- none/file/zipfile/bsmtp/sql/ -- /mailbox(smtp)/mailbox(lmtp) quar_loc varchar(255) DEFAULT '', -- quarantine location (e.g. file) dsn_sent char(1), -- was DSN sent? Y/N/q (q=quenched) spam_level real, -- SA spam level (no boosts) message_id varchar(255) DEFAULT '', -- mail Message-ID header field from_addr varchar(255) DEFAULT '', -- mail From header field, UTF8 subject varchar(255) DEFAULT '', -- mail Subject header field, UTF8 host varchar(255) NOT NULL, -- hostname where amavisd is running CONSTRAINT msgs_partition_mail UNIQUE (partition_tag,mail_id), PRIMARY KEY (partition_tag,mail_id) --FOREIGN KEY (sid) REFERENCES maddr(id) ON DELETE RESTRICT ); CREATE INDEX msgs_idx_sid ON msgs (sid); CREATE INDEX msgs_idx_mess_id ON msgs (message_id); -- useful with pen pals CREATE INDEX msgs_idx_time_iso ON msgs (time_iso); CREATE INDEX msgs_idx_time_num ON msgs (time_num); -- optional -- per-recipient information related to each processed message; -- NOTE: records in msgrcpt without corresponding msgs.mail_id record are -- orphaned and should be ignored and eventually deleted by external utilities CREATE TABLE msgrcpt ( partition_tag integer DEFAULT 0, -- see $partition_tag mail_id bytea NOT NULL, -- (must allow duplicates) rseqnum integer DEFAULT 0 NOT NULL, -- recip's enumeration within msg rid integer NOT NULL, -- recipient: maddr.id (duplicates allowed) is_local char(1) DEFAULT ' ' NOT NULL, -- recip is: Y=local, N=foreign content char(1) DEFAULT ' ' NOT NULL, -- content type V/B/U/S/Y/M/H/O/T/C ds char(1) NOT NULL, -- delivery status: P/R/B/D/T -- pass/reject/bounce/discard/tempfail rs char(1) NOT NULL, -- release status: initialized to ' ' bl char(1) DEFAULT ' ', -- sender blacklisted by this recip wl char(1) DEFAULT ' ', -- sender whitelisted by this recip bspam_level real, -- per-recipient (total) spam level smtp_resp varchar(255) DEFAULT '', -- SMTP response given to MTA CONSTRAINT msgrcpt_partition_mail_rseq UNIQUE (partition_tag,mail_id,rseqnum), PRIMARY KEY (partition_tag,mail_id,rseqnum) --FOREIGN KEY (rid) REFERENCES maddr(id) ON DELETE RESTRICT, --FOREIGN KEY (mail_id) REFERENCES msgs(mail_id) ON DELETE CASCADE ); CREATE INDEX msgrcpt_idx_mail_id ON msgrcpt (mail_id); CREATE INDEX msgrcpt_idx_rid ON msgrcpt (rid); -- mail quarantine in SQL, enabled by $*_quarantine_method='sql:' -- NOTE: records in quarantine without corresponding msgs.mail_id record are -- orphaned and should be ignored and eventually deleted by external utilities CREATE TABLE quarantine ( partition_tag integer DEFAULT 0, -- see $partition_tag mail_id bytea NOT NULL, -- long-term unique mail id chunk_ind integer NOT NULL CHECK (chunk_ind >= 0), -- chunk number, 1.. mail_text bytea NOT NULL, -- store mail as chunks of octects PRIMARY KEY (partition_tag,mail_id,chunk_ind) --FOREIGN KEY (mail_id) REFERENCES msgs(mail_id) ON DELETE CASCADE ); -- field msgrcpt.rs is primarily intended for use by quarantine management -- software; the value assigned by amavisd is a space; -- a short _preliminary_ list of possible values: -- 'V' => viewed (marked as read) -- 'R' => released (delivered) to this recipient -- 'p' => pending (a status given to messages when the admin received the -- request but not yet released; targeted to banned parts) -- 'D' => marked for deletion; a cleanup script may delete it -- grant usage rights: GRANT select,insert,update,delete ON maddr TO amavis; GRANT usage,update ON maddr_id_seq TO amavis; GRANT select,insert,update,delete ON msgs TO amavis; GRANT select,insert,update,delete ON msgrcpt TO amavis; GRANT select,insert,update,delete ON quarantine TO amavis; Some examples of a query: -- mail from last two minutes: SELECT now()-time_iso AS age, SUBSTRING(policy,1,2) as pb, msgrcpt.content AS c, dsn_sent as dsn, ds, bspam_level AS level, size, SUBSTRING(convert_from(sender.email,'UTF8'),1,18) AS s, SUBSTRING(recip.email,1,18) AS r, SUBSTRING(msgs.subject,1,10) AS subj FROM msgs LEFT JOIN msgrcpt ON msgs.mail_id=msgrcpt.mail_id LEFT JOIN maddr AS sender ON msgs.sid=sender.id LEFT JOIN maddr AS recip ON msgrcpt.rid=recip.id WHERE msgrcpt.content IS NOT NULL AND now() - time_iso < INTERVAL '2 minutes' ORDER BY msgs.time_num DESC; -- clean messages ordered by count, grouped by domain: SELECT count(*) as cnt, avg(bspam_level), sender.domain FROM msgs LEFT JOIN msgrcpt ON msgs.mail_id=msgrcpt.mail_id LEFT JOIN maddr AS sender ON msgs.sid=sender.id LEFT JOIN maddr AS recip ON msgrcpt.rid=recip.id WHERE msgrcpt.content='C' GROUP BY sender.domain ORDER BY cnt DESC LIMIT 50; -- top spamy domains with >10 messages, sorted by spam average, -- grouped by domain: SELECT count(*) as cnt, avg(bspam_level) as spam_avg, sender.domain FROM msgs LEFT JOIN msgrcpt ON msgs.mail_id=msgrcpt.mail_id LEFT JOIN maddr AS sender ON msgs.sid=sender.id LEFT JOIN maddr AS recip ON msgrcpt.rid=recip.id WHERE bspam_level IS NOT NULL GROUP BY sender.domain HAVING count(*) > 10 ORDER BY spam_avg DESC LIMIT 50; -- sender domains with >100 messages, sorted on sender.domain: SELECT count(*) as cnt, avg(bspam_level) as spam_avg, sender.domain FROM msgs LEFT JOIN msgrcpt ON msgs.mail_id=msgrcpt.mail_id LEFT JOIN maddr AS sender ON msgs.sid=sender.id LEFT JOIN maddr AS recip ON msgrcpt.rid=recip.id GROUP BY sender.domain HAVING count(*) > 100 ORDER BY sender.domain DESC LIMIT 100; EXAMPLE of a log/report/quarantine database housekeeping ======================================================== Using a changing partition_tag, perhaps by using an ISO 8601 week number (value 1 to 53) as a partition_tag: $partition_tag = sub { my($msginfo)=@_; sprintf("%02d",iso8601_week($msginfo->rx_time)) }; allows for probably the fastest method of purging old records, e.g.: DELETE FROM msgs WHERE partition_tag >= 13 AND partition_tag <= 23; DELETE FROM msgrcpt WHERE partition_tag >= 13 AND partition_tag <= 23; DELETE FROM quarantine WHERE partition_tag >= 13 AND partition_tag <= 23; DELETE FROM maddr WHERE partition_tag >= 13 AND partition_tag <= 23; Alternatively, purge records from table msgs by their creation time: DELETE FROM msgs WHERE time_iso < now() - INTERVAL '3 weeks'; DELETE FROM msgs WHERE time_iso < now() - INTERVAL '1 h' AND content IS NULL; Optionally certain content types may be given shorter lifetime: DELETE FROM msgs WHERE time_iso < now() - INTERVAL '1 week' AND (content='V' OR (content='S' AND spam_level > 20)); and then delete unreferenced records from tables msgrcpt, quarantine, and maddr: DELETE FROM msgrcpt WHERE mail_id IN (SELECT mail_id FROM msgrcpt LEFT JOIN msgs USING(mail_id) WHERE msgs.mail_id IS NULL); DELETE FROM quarantine WHERE mail_id IN (SELECT mail_id FROM quarantine LEFT JOIN msgs USING(mail_id) WHERE msgs.mail_id IS NULL); DELETE FROM maddr WHERE NOT EXISTS (SELECT 1 FROM msgs WHERE sid=id) AND NOT EXISTS (SELECT 1 FROM msgrcpt WHERE rid=id); On more recent testings, the following 'DELETE FROM maddr' seems to be faster from the one above by a factor of 1.5 to 2, and is functionally equivalent: DELETE FROM maddr WHERE id IN ( SELECT id FROM maddr LEFT JOIN ( SELECT sid AS id, 1 AS f FROM msgs UNION ALL SELECT rid AS id, 1 AS f FROM msgrcpt ) AS u USING(id) WHERE u.f IS NULL); Check also a thread 'Faster purging of SQL logging database' (2007-06) on the amavis-user mailing list, archived at: http://marc.info/?t=118190428300003 The third option for purging old records is to use: FOREIGN KEY ... ON DELETE CASCADE on tables msgrcpt and quarantine, in which case these records will be deleted automatically when a corresponding record in table msgs is deleted. This seems to be the slowest method.