Slashing stats

From: Murray S. Kucherawy <msk_at_cloudmark.com>
Date: Thu, 1 Sep 2011 15:37:18 -0700

One of the primary drivers behind the statistics project was the collection of DKIM usage data needed to produce the DKIM Implementation Report for the IETF. Thanks to all of you who are feeding us data; that report has now been published:

http://www.ietf.org/iesg/implementation/report-rfc4871.txt

With that out of the way, I'd like to re-task this work to continue supporting the idea of an experimental open domain reputation system. I've been doing some work in this area that will start to appear this fall. In doing so, it has become obvious that only a subset of the data the statistics project currently collects will still be needed, i.e., only the parts that observe a signing domain's behavior with respect to what mail it signs (as opposed to which parts of DKIM it uses). As a result, big chunks of the statistics reporting and collection system will be removed in 2.5.0.

My question for those of you who have read this far concerns data you think might be useful for developing a reputation for a DKIM signing domain. We obviously need to store information about signatures and messages; the open question is specifically which information is critical and which can be purged.

So far it's clear to me that we don't need to keep tracking which header fields were signed, or which ones changed between signing and verifying, in order to compute a signer's reputation, so all of that will be dropped. Also, since the revised DKIM RFC drops support for the "g=" key tag, the corresponding columns in the signatures table are being removed.

I'm also considering dropping these columns from the messages table:

adsp_found int unsigned not null,
adsp_unknown int unsigned not null,
adsp_all int unsigned not null,
adsp_discardable int unsigned not null,
adsp_fail int unsigned not null,
mailing_list int unsigned not null,
received_count int unsigned not null,
content_type varchar(64),

...and from the signatures table:

ignored tinyint unsigned not null,
algorithm tinyint unsigned not null,
hdr_canon tinyint unsigned not null,
body_canon tinyint unsigned not null,
keysize smallint not null,
key_s tinyint not null,
sig_i tinyint not null,
sig_i_user tinyint not null,
sig_z tinyint unsigned not null,

If anyone can think of how any of these might be useful inputs to a reputation system, I'd love to hear your ideas. If not, I'll aim to remove them as well.
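If those columns do go away, applying the change to an existing statistics database might look something like the following. This is a hypothetical sketch, not taken from the project's own update scripts. It assumes a MySQL back end (suggested by the `int unsigned` and `tinyint` column types above) and that the tables are literally named `messages` and `signatures`, as in this message.

```sql
-- Hypothetical migration: drop the candidate columns listed above.
-- Assumes MySQL and tables named exactly `messages` and `signatures`.
-- Note: MySQL DDL causes an implicit commit, so these statements
-- cannot be rolled back; take a backup before running them.

ALTER TABLE `messages`
    DROP COLUMN `adsp_found`,
    DROP COLUMN `adsp_unknown`,
    DROP COLUMN `adsp_all`,
    DROP COLUMN `adsp_discardable`,
    DROP COLUMN `adsp_fail`,
    DROP COLUMN `mailing_list`,
    DROP COLUMN `received_count`,
    DROP COLUMN `content_type`;

ALTER TABLE `signatures`
    DROP COLUMN `ignored`,
    DROP COLUMN `algorithm`,
    DROP COLUMN `hdr_canon`,
    DROP COLUMN `body_canon`,
    DROP COLUMN `keysize`,
    DROP COLUMN `key_s`,
    DROP COLUMN `sig_i`,
    DROP COLUMN `sig_i_user`,
    DROP COLUMN `sig_z`;
```

Combining the drops into one ALTER TABLE per table lets MySQL rebuild each table once rather than once per column.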

-MSK
Received on Thu Sep 01 2011 - 22:37:26 PST
