RE: Domain reputation from Murray S. Kucherawy on 2011-06-10 (OpenDKIM Users mailing list)

From: Murray S. Kucherawy <msk_at_cloudmark.com>
Date: Fri, 10 Jun 2011 10:30:05 -0700

> -----Original Message-----
> From: opendkim-users-bounce_at_lists.opendkim.org [mailto:opendkim-users-bounce_at_lists.opendkim.org] On Behalf Of Alessandro Vesely
> Sent: Friday, June 10, 2011 4:13 AM
> To: opendkim-users_at_lists.opendkim.org
> Subject: Re: Domain reputation
>
> I agree a single server's view is very narrow, but it is easily
> available now, and non-secret. I'd consider it a staging post on the
> way there.

It's certainly a viable proof-of-concept. How useful it is depends on what assertions the reputation service will make. Or, put more clearly, the consumers of the data need to understand what the assertions are based on.

> To expand on this, let's name the three levels IP, SMTP, and content.
> Roughly, we have:
>
> *IP* may depend on local BLs (e.g. fail2ban, stockade) besides DNSBLs,
> *SMTP* checks invalid HELO or MAIL FROM, SPF, local spamtraps, and
> *content* based on DKIM signatures, and heuristic fuzzy filters.
>
> Unfortunately, these layers are stacked according to the protocols
> rather than the reliability of the corresponding filters. Despite
> being potentially able to deliver a reliable reputation value, DKIM
> comes quite late in the stack traversal. IP comes early, but IPv6 BLs
> are going to be less reliable due to increased use of ranges.
>
> IOW, in order to assuredly whitelist a DKIM signed message, it is not
> enough to skip any downstream filters (e.g. Bayesian ones, in some
> configurations.) One would also have to pierce upstream filters,
> which is a layer violation.

I don't think this limits a reputation provider. The OpenDKIM data, for example, simply needs to make clear that it was accumulated after RBL filtering and is based on SpamAssassin verdicts. That sets the context for how it was collected and, thus, what it's really telling you. Perhaps, then, the best use of that particular data set is by participants that are using the same or similar filtering architectures.
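
As a rough illustration of a report carrying its collection context, here is a minimal sketch. It is not a format from any proposed protocol: the ReputationReport type, its field names, the 0.0 to 1.0 rating scale, and the context_matches() helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReputationReport:
    """Hypothetical report: a rating plus the context it was collected in."""
    domain: str
    rating: float                # assumed scale: 0.0 (bad) to 1.0 (good)
    upstream_filters: frozenset  # filters applied before the data was collected
    verdict_source: str          # what produced the spam/ham verdicts


def context_matches(report, local_upstream, local_verdict_source):
    """True when the consumer's filtering setup matches the report's context."""
    return (report.upstream_filters == frozenset(local_upstream)
            and report.verdict_source == local_verdict_source)


report = ReputationReport(
    domain="example.com",
    rating=0.9,
    upstream_filters=frozenset({"rbl"}),
    verdict_source="spamassassin",
)

# A site that also filters on RBLs first and judges with SpamAssassin
# can read the rating at face value; a site without RBLs sees a
# different mail stream and should interpret the number more cautiously.
same_setup = context_matches(report, {"rbl"}, "spamassassin")
different_setup = context_matches(report, set(), "spamassassin")
```

The design point is only that the context travels with the number, so the consumer can decide how much the rating applies to its own architecture.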

What's more, one layer's verdict can be used to feed data into the earlier layers, improving overall efficiency.
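
One way to picture that feedback, as a sketch only: content-layer spam verdicts are counted per connecting IP, and an address that crosses a threshold is added to a local blocklist consulted at connection time. The LocalBlocklist class and the threshold of 3 are assumptions made for illustration, not something OpenDKIM provides.

```python
from collections import Counter


class LocalBlocklist:
    """Hypothetical IP-layer blocklist fed by content-layer verdicts."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # spam verdicts needed before blocking
        self.spam_counts = Counter()
        self.blocked = set()

    def record_content_verdict(self, ip, is_spam):
        """Called by the later (content) layer after it has judged a message."""
        if is_spam:
            self.spam_counts[ip] += 1
            if self.spam_counts[ip] >= self.threshold:
                self.blocked.add(ip)

    def allows(self, ip):
        """Called by the earlier (IP) layer when a client connects."""
        return ip not in self.blocked


blocklist = LocalBlocklist(threshold=3)
for _ in range(3):
    blocklist.record_content_verdict("192.0.2.10", is_spam=True)
blocklist.record_content_verdict("192.0.2.20", is_spam=False)
```

After three spam verdicts, mail from 192.0.2.10 is refused at connection time and never reaches the more expensive content filters.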

With those points in mind, I think the layers become less interesting.

> Weight/score is a clean method to run filters independently of
> protocol layers. It consumes more resources, though.

There's nothing in the proposed protocols now, and there should never be, about how the consumer of the data makes use of it. You can use it as an instruction or use it as part of an overall scoring system. What we need to do (at this stage, at least) is provide the communication protocol, nothing more.
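
To make the two styles of consumption concrete, here is a minimal sketch in which the same rating is used once as a direct instruction and once as one term in a score. The function names, the 0.0 to 1.0 rating scale, the weight, and both thresholds are assumptions chosen for illustration; none of them come from the proposed protocols.

```python
def as_instruction(rating, threshold=0.2):
    """Treat the rating as a direct accept/reject instruction."""
    return "reject" if rating < threshold else "accept"


def as_score_component(rating, other_scores, weight=2.0, spam_threshold=5.0):
    """Fold the rating into an overall spam score alongside other filters.

    A low rating adds up to `weight` points; the message is rejected only
    if the combined score reaches `spam_threshold`.
    """
    total = sum(other_scores) + weight * (1.0 - rating)
    return "reject" if total >= spam_threshold else "accept"


rating = 0.1  # the same reputation data handed to two different consumers

hard_decision = as_instruction(rating)
scored_decision = as_score_component(rating, other_scores=[1.0, 0.5])
```

With these made-up numbers the first consumer rejects the message outright, while the second accepts it because the other filters found little wrong. The protocol only has to deliver the rating; the policy stays with the consumer.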
Received on Fri Jun 10 2011 - 17:30:13 PST
