Re: Revised stats gathering approach from Alessandro Vesely on 2010-08-14 (OpenDKIM Developers mailing list)

From: Alessandro Vesely <vesely_at_tana.it>
Date: Sat, 14 Aug 2010 11:28:05 +0200

Murray S. Kucherawy wrote:
> Reviewing the work of collecting DKIM data with some DB people has me at
> a new design that is a pretty substantial departure from what’s there
> now. Here’s a review of what I’ve got. Let me know if you have any
> additional input before I start coding.
>
> At a higher level, what we know now as opendkim-stats will go away.
> Logging will be done to a flat file now instead of a btree/hash table,
> so a tool to read it will not be needed. What we have as
> opendkim-importstats will become a C program that uses libodbx to insert
> the preprocessed log file into SQL, which can be done locally or over
> the network (to our main server where aggregation happens).

I think flat files will make exchanging data easier. In
particular, users will be able to contribute stats even if they
don't enable BDB.

I don't know about opendkim-importstats. I assume it's optional.

> There will be a log entry in the file for every message, and for each
> message a log entry for each signature (0…n of them) on the message, and
> for each signature a log entry for each signed header other than From
> (0…m of them). The message log entry will log message-specific
> properties (e.g. list vs. not list), the signature log entry will
> accumulate data about signatures (e.g. which optional tags were used,
> pass/fail stats, first vs. third-party, etc.), and the header log entry
> will count which header fields got signed. On the SQL side there will
> be a table for each of these, matched up by a unique “id” field in
> each. There will also be a domain->ID lookup table for data compression.

That is quite a lot of structure for a flat file. For users of the
library, I think any decent MTA has a message ID of its own, possibly
different from the 5322.Message-ID, that can work as a key for all
lines originating from that message. Either a sequential number or
header.b may be used to identify signatures. However, do you need a
line for each signature/header? Why not dump the complete
signature, unwrapped into a single line and with the b= and bh= tags
truncated to save space? In that case, you'd end up with one logged
line for each header field, including the added A-R, each prefixed
by the message ID, and truncated or redacted as needed. Would that be
easier?
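
To make the one-line-per-header-field idea concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions:
the function names, the 8-character truncation length, and the
"ID Field: value" layout are all hypothetical, and none of it is
OpenDKIM code.

```python
import re

# Assumed truncation length for b= and bh=; purely illustrative.
KEEP = 8


def unfold(value):
    """Unwrap a folded header value into a single line."""
    return re.sub(r"\r?\n[ \t]+", " ", value).strip()


def truncate_tags(sig, tags=("b", "bh"), keep=KEEP):
    """Shorten the base64 values of the given DKIM-Signature tags."""
    parts = []
    for item in sig.split(";"):
        name, sep, val = item.partition("=")
        name = name.strip()
        if sep and name in tags:
            # base64 data may contain folding whitespace; drop it first
            val = "".join(val.split())[:keep]
            parts.append(name + "=" + val)
        elif item.strip():
            parts.append(item.strip())
    return "; ".join(parts)


def log_lines(msg_id, headers):
    """Yield one log line per header field, prefixed by the MTA's message ID.

    `headers` is a list of (field name, raw value) pairs.
    """
    for name, value in headers:
        line = unfold(value)
        if name.lower() == "dkim-signature":
            line = truncate_tags(line)
        yield "%s %s: %s" % (msg_id, name, line)
```

With something like this, every line for a message shares the same
prefix, so grep or sort on the first column groups them again, and a
later import step can still split the signature into tags if it wants
per-signature rows.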

JM2C
Received on Sat Aug 14 2010 - 09:28:16 PST
