Revised stats gathering approach

From: Murray S. Kucherawy <msk_at_cloudmark.com>
Date: Mon, 9 Aug 2010 15:55:46 -0700

Reviewing the DKIM data collection work with some database people has led me to a new design that departs substantially from what's there now. Here's a summary of what I have so far. Let me know if you have any additional input before I start coding.

At a high level, what we now know as opendkim-stats will go away. Logging will go to a flat file instead of a btree/hash table, so a tool to read it will no longer be needed. What we now have as opendkim-importstats will become a C program that uses libodbx to insert the preprocessed log file into SQL, either locally or over the network (to our main server, where aggregation happens).
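As a rough illustration of why a flat file needs no dedicated reader, here is a minimal Python sketch of append-only, line-oriented logging. The tab-separated layout, the "M"/"S" record tags, the field order, and the names append_record and read_records are all assumptions made up for this example; the actual log format has not been defined yet.

```python
import os
import tempfile


def append_record(path, fields):
    """Append one tab-separated record; no index or database library is involved."""
    with open(path, "a", encoding="utf-8") as log:
        log.write("\t".join(str(f) for f in fields) + "\n")


def read_records(path):
    """Yield each record as a list of strings, in the order it was logged."""
    with open(path, encoding="utf-8") as log:
        for line in log:
            yield line.rstrip("\n").split("\t")


# Hypothetical records: a message line followed by one signature line.
log_path = os.path.join(tempfile.mkdtemp(), "stats.log")
append_record(log_path, ["M", "job-0001", "example.com", 0])
append_record(log_path, ["S", "job-0001", "example.com", 1])

records = list(read_records(log_path))
```

Because every record is a plain line of text, ordinary tools (cat, grep, a short script) can inspect the file, and an importer only has to walk it from top to bottom.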

The file will contain a log entry for every message; for each message, a log entry for each signature on it (0...n of them); and for each signature, a log entry for each signed header field other than From (0...m of them). The message entry will log message-specific properties (e.g., list vs. not list), the signature entry will accumulate data about the signature (e.g., which optional tags were used, pass/fail stats, first- vs. third-party), and the header entries will let us count which header fields got signed. On the SQL side there will be a table for each of these, matched up by a unique "id" field in each. There will also be a domain->ID lookup table for data compression.
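The three-level layout and the domain lookup table might look something like the following Python sketch. It uses the standard sqlite3 module purely as a stand-in for whatever SQL backend libodbx would talk to, and every table name, column name, and helper function here is an assumption for illustration, not the actual schema (which is still to be circulated).

```python
import sqlite3

# Hypothetical schema: one table per record type plus a domain lookup table.
SCHEMA = """
CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    from_domain INTEGER NOT NULL REFERENCES domains(id),
    is_list INTEGER NOT NULL);
CREATE TABLE signatures (
    id INTEGER PRIMARY KEY,
    message INTEGER NOT NULL REFERENCES messages(id),
    domain INTEGER NOT NULL REFERENCES domains(id),
    passed INTEGER NOT NULL);
CREATE TABLE signed_headers (
    id INTEGER PRIMARY KEY,
    signature INTEGER NOT NULL REFERENCES signatures(id),
    name TEXT NOT NULL);
"""


def domain_id(db, name):
    """Map a domain name to a small integer, creating the row on first use."""
    db.execute("INSERT OR IGNORE INTO domains (name) VALUES (?)", (name.lower(),))
    row = db.execute("SELECT id FROM domains WHERE name = ?", (name.lower(),))
    return row.fetchone()[0]


def log_message(db, from_domain, is_list, signatures):
    """Insert one message row, its 0..n signature rows, and their header rows."""
    msg_id = db.execute(
        "INSERT INTO messages (from_domain, is_list) VALUES (?, ?)",
        (domain_id(db, from_domain), int(is_list))).lastrowid
    for sig in signatures:
        sig_id = db.execute(
            "INSERT INTO signatures (message, domain, passed) VALUES (?, ?, ?)",
            (msg_id, domain_id(db, sig["domain"]), int(sig["passed"]))).lastrowid
        db.executemany(
            "INSERT INTO signed_headers (signature, name) VALUES (?, ?)",
            [(sig_id, header.lower()) for header in sig["headers"]])
    return msg_id


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
log_message(db, "example.com", False, [
    {"domain": "example.com", "passed": True, "headers": ["Subject", "Date"]},
    {"domain": "lists.example.net", "passed": False, "headers": ["Subject"]},
])
log_message(db, "example.org", True, [])  # a message with no signatures

# Pass totals split by first-party (1) vs. third-party (0) signatures.
report = db.execute("""
    SELECT s.domain = m.from_domain AS first_party, SUM(s.passed), COUNT(*)
    FROM signatures s JOIN messages m ON m.id = s.message
    GROUP BY first_party ORDER BY first_party
""").fetchall()
```

Storing each domain name once and referring to it by integer ID is what keeps the per-message and per-signature rows compact, and it also makes the first- vs. third-party comparison a cheap integer match.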

This enables several things we can't currently do: the current schema has limitations that get some details wrong and make a few specific correlation reports impossible.

I'll circulate the actual schemas once I have them generated, but that's the general approach I'm taking. I'm still hoping to get this coded and unit-tested before betas start at the beginning of next month.

Comments welcome.
Received on Mon Aug 09 2010 - 22:55:57 PST
