Initial stats results

From: Murray S. Kucherawy <msk_at_blackops.org>
Date: Sun, 1 Aug 2010 20:16:20 -0700 (PDT)

With three sites reporting regularly (and huge thanks to those that are!),
I'm able to extract some information. The latest report is available at
http://www.opendkim.org/stats/report.html and is currently updated on the
hour and the half hour.

It's obvious from this that there's useful data coming in, but also that
our schema needs a little improvement. I've opened a bug report to track
the required changes and plan to introduce them in time for v2.2.0 to
ship. This will mean a small schema change at the server end and some
changes to how opendkim records and reports data. I'll make sure to
prepare users for the migration in advance.

This work is evolving daily, so feel free to ask questions or make
suggestions, and don't be surprised if the report's contents change
from one run to the next. Once it's stable I'll make its presence known
to the DKIM working group.

Some notes about what you're seeing:

Signature algorithm use:
- 0 is rsa-sha1, 1 is rsa-sha256
- this is across all signatures seen; might be interesting to count
individual signing domains instead of signatures
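
To illustrate the difference between counting signatures and counting
signing domains, here is a minimal Python sketch. The record layout (a
signing domain paired with an algorithm code) and the function names are
assumptions made for the example, not the actual opendkim schema; only the
0/1 code mapping comes from the note above.

```python
from collections import Counter

# Code mapping as described above: 0 is rsa-sha1, 1 is rsa-sha256.
ALGORITHMS = {0: "rsa-sha1", 1: "rsa-sha256"}


def count_by_signature(records):
    """Count every signature seen, so busy domains weigh more."""
    return Counter(ALGORITHMS[alg] for _domain, alg in records)


def count_by_domain(records):
    """Count each (domain, algorithm) pair once, however often it appears."""
    distinct = set(records)
    return Counter(ALGORITHMS[alg] for _domain, alg in distinct)


# Hypothetical sample: one domain signs three messages, another signs one.
sample = [
    ("example.com", 1),
    ("example.com", 1),
    ("example.com", 1),
    ("example.net", 0),
]

per_signature = count_by_signature(sample)
per_domain = count_by_domain(sample)
```

With this sample the per-signature tally is dominated by the one busy
domain, while the per-domain tally treats both domains equally.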

Header/Body canonicalization use:
- 0 is simple, 1 is relaxed
- this is across all signatures seen; might be interesting to count
individual signing domains instead of signatures

Various key record statistics:
- indicates whether t= and g= are in use
- yes, that g= stat appears to be right
- we could record "g=" vs. "g=*" vs. no "g="
- we could try to detect DK back-compatibility
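
As a rough illustration of the "g=" vs. "g=*" vs. no "g=" distinction
suggested above, the following Python sketch classifies a key record's
text. The tag parser is deliberately simplified and the category labels
are hypothetical names chosen for the example; this is not how opendkim
parses key records.

```python
def parse_key_record(record):
    """Split a 'tag=value; tag=value' string into a dict (simplified)."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            tags[name.strip()] = value.strip()
    return tags


def classify_granularity(record):
    """Return a hypothetical label describing the g= tag of a key record."""
    tags = parse_key_record(record)
    if "g" not in tags:
        return "absent"
    if tags["g"] == "":
        return "empty"
    if tags["g"] == "*":
        return "wildcard"
    return "specific"


def uses_flags(record):
    """Report whether the t= tag is present at all."""
    return "t" in parse_key_record(record)
```

For example, `classify_granularity("v=DKIM1; g=; p=ABC")` falls into the
"empty" category, which is the case the current statistic cannot tell
apart from the others.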

Overall pass/fail rates:
- Gmail traffic relayed through lists (which break its signatures) appears
to be hugely skewing these numbers from one of our reporting sources
- it is currently difficult if not impossible to exclude data from a
particular reporting source; this will be improved with the next release
- MLM traffic is determined by detecting "List-Id:", "List-Post:",
"List-Unsubscribe:", "Precedence: list" or "Mailing-List:" header fields;
this is obviously not exhaustive but it is a decent first approximation
- "failed(body)" indicates the number of times the "bh=" value did not
match the received body
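
The MLM heuristic described above can be sketched in a few lines of
Python using the standard library's email parser. The function name is
hypothetical and this is only an approximation of the idea, not the code
opendkim uses.

```python
from email import message_from_string

# Header fields whose mere presence suggests mailing list traffic.
LIST_HEADERS = ("List-Id", "List-Post", "List-Unsubscribe", "Mailing-List")


def looks_like_mlm(raw_message):
    """Return True if the message carries any of the list indicators."""
    msg = message_from_string(raw_message)
    # Header name lookups on Message objects are case-insensitive.
    if any(name in msg for name in LIST_HEADERS):
        return True
    # "Precedence:" only counts when its value is "list".
    precedence = msg.get("Precedence", "")
    return precedence.strip().lower() == "list"
```

A message with "Precedence: bulk" and no List-* fields is not counted as
MLM traffic under this sketch, which is one way the heuristic falls short
of being exhaustive.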

Count of unique signing domains:
- since we allow reporters to anonymize their data by hashing it, this
could actually be as much as double the true value; hashing allows privacy
while still allowing data about a particular domain to be aggregated
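
One plausible reading of the "double" figure is that a domain reported in
the clear by one site and hashed by another shows up as two distinct keys.
The Python sketch below illustrates that reading. The choice of SHA-256,
the lowercasing step, and the report layout are all assumptions made for
the example; the actual hashing scheme is not described here.

```python
import hashlib


def anonymize(domain):
    """Hash a domain name so it can be aggregated without being revealed.

    SHA-256 is an assumption for this sketch, not the real scheme.
    """
    return hashlib.sha256(domain.lower().encode("utf-8")).hexdigest()


# Hypothetical reports: site-a sends names in the clear, site-b hashes them.
reports = [
    ("site-a", "example.com"),
    ("site-b", anonymize("example.com")),
    ("site-b", anonymize("example.com")),
]

# Hashed entries for the same domain still aggregate under one key...
hashed_keys = {key for site, key in reports if site == "site-b"}

# ...but the clear and hashed forms of one domain count as two keys.
unique_keys = {key for _site, key in reports}
```

Here a single real domain yields two unique keys, which is the worst-case
inflation the note above describes.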

MLM/non-MLM signature comparison:
- very interesting
- no distinction between a list signature and an author signature (yet)
other than that list signatures will almost always look like a third-party
signature

ADSP policies found and failures:
- a "pass" here includes any message for which ADSP would result in a
"pass", including (a) a query that returns an explicit policy for a
message with a valid author domain signature AND (b) a query that returns
no policy record but there was a valid author domain signature, implying
"unknown"; a schema update will allow distinction between these as well as
reporting of ADSP syntax errors
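
The two cases folded into "pass" above could be separated roughly as in
this Python sketch. The function name, its boolean inputs, and the result
labels are hypothetical and only illustrate the distinction the schema
update is meant to allow.

```python
def classify_adsp(policy_record_found, valid_author_signature):
    """Split the current "pass" bucket into its two underlying cases.

    Both arguments are booleans; the labels are made up for this sketch.
    """
    if not valid_author_signature:
        # Without a valid author domain signature neither (a) nor (b) applies.
        return "not-pass"
    if policy_record_found:
        # Case (a): explicit policy plus a valid author domain signature.
        return "pass-explicit-policy"
    # Case (b): no policy record, implying "unknown", but a valid signature.
    return "pass-no-policy-record"
```

Under the current schema both "pass-explicit-policy" and
"pass-no-policy-record" would be reported together as a single "pass".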

Third-party signatures:
- not sure what sort of useful report I could pull from this information
yet, other than to observe that the percentage of signatures that are
"third-party" (where the From: domain doesn't match the d= domain) is
larger than I'd have expected
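
For reference, the "third-party" test described above (the From: domain
not matching the d= domain) can be sketched in Python as follows. The
helper names are hypothetical, the signature parsing is simplified, and
the comparison is an exact, case-insensitive match, which is an
assumption about what "match" means here.

```python
from email.utils import parseaddr


def signature_tags(header_value):
    """Parse a DKIM-Signature value into a dict of tags (simplified)."""
    tags = {}
    for part in header_value.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            # Drop folding whitespace inside values.
            tags[name.strip()] = "".join(value.split())
    return tags


def is_third_party(from_header, dkim_signature):
    """Return True when the From: domain differs from the d= domain."""
    _display_name, address = parseaddr(from_header)
    from_domain = address.rpartition("@")[2].lower()
    signing_domain = signature_tags(dkim_signature).get("d", "").lower()
    return from_domain != signing_domain
```

A list that re-signs mail with its own domain would be flagged by this
check for any author outside that domain.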
Received on Mon Aug 02 2010 - 03:16:39 PST
