Space munging (fwd) from Murray S. Kucherawy on 2009-09-09 (OpenDKIM Developers mailing list)

From: Murray S. Kucherawy <msk_at_blackops.org>
Date: Wed, 9 Sep 2009 11:44:12 -0700 (PDT)

FYI, I contacted sendmail.org asking for help with the _FFR_COMMAIZE
details. They have yet to reply after about a week, so I may proceed with
developing algorithms based on the results of experiments, with SM's help
in testing.

There's some non-FFR code in the same vicinity that I remember developing
while looking carefully at the sendmail MTA's code, trying to mimic its
space-eating properties, but now I can't seem to reproduce what they do.
OpenDKIM works fine in the space-munging case, but now I can't see why.
I hate that.

Anyway, here's the mail I sent them:

---------- Forwarded message ----------
Date: Thu, 3 Sep 2009 02:36:43 -0700 (PDT)
From: Murray S. Kucherawy <msk_at_blackops.org>
To: sendmail-2009_at_sendmail.org
Subject: Space munging

I'm working on a milter-based application that does crypto of header fields.
The signing and verifying operations depend on header fields not changing
between signing and verifying, but sendmail does some header prettifying, so I
have to anticipate what the MTA will do in order to make sure verification will
work.
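To make the constraint concrete: if the signer hashes a header field byte-for-byte (as DKIM's "simple" header canonicalization in RFC 6376 does, with no whitespace normalization), then a single space inserted by the MTA changes the digest, so the signer has to hash the form the MTA will actually deliver. The sketch below assumes SHA-256 over the raw field; the function name is hypothetical and this is not opendkim code.

```python
import hashlib


def header_digest(field: str) -> str:
    """Hash a header field exactly as given: no whitespace normalization."""
    return hashlib.sha256(field.encode("ascii")).hexdigest()


signed = "Foo:Bar\r\n"      # what the signing milter saw
delivered = "Foo: Bar\r\n"  # what arrives after the MTA adds a space

# The digests differ, so verification of `delivered` would fail unless the
# signer had hashed the rewritten form instead.
print(header_digest(signed) == header_digest(delivered))  # False
```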

A recent read of the relevant code in headers.c suggested to me that 8.13.x
MTAs will drop one leading space from the header field's value if there are at
least two, but will also make sure there's at least one; so:

         Foo:Bar becomes Foo:(sp)Bar
         Foo:(sp)Bar is unchanged
         Foo:(sp)(sp)Bar becomes Foo:(sp)Bar
         Foo:(sp)(sp)(sp)Bar becomes Foo:(sp)(sp)Bar

...etc. However, in testing the above cases by actually sending them through
8.13.8, it looks like they all get normalized down to simply "Foo:(sp)Bar". Is
that correct? Does it make a difference if the first non-space character after
the colon is something like a quote or focus character rather than an
alphanumeric?
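For comparison, here are the two candidate behaviours written out as a sketch: the rule inferred from reading headers.c, and the normalization the 8.13.8 tests appeared to show. The function names are hypothetical, and treating only ASCII spaces (not tabs) as leading whitespace is an assumption; neither function is taken from sendmail.

```python
def _split_field(field):
    """Split "Name:value" into ("Name:", "value") at the first colon."""
    name, sep, value = field.partition(":")
    return name + sep, value


def munge_per_source_reading(field):
    """Hypothesis from headers.c: drop one leading space if there are at
    least two, but guarantee there is at least one."""
    head, value = _split_field(field)
    lead = len(value) - len(value.lstrip(" "))
    if lead == 0:
        value = " " + value   # ensure at least one space
    elif lead >= 2:
        value = value[1:]     # drop exactly one of several spaces
    return head + value


def munge_as_observed(field):
    """Apparent 8.13.8 behaviour: any leading run becomes exactly one space."""
    head, value = _split_field(field)
    return head + " " + value.lstrip(" ")


for f in ("Foo:Bar", "Foo: Bar", "Foo:  Bar", "Foo:   Bar"):
    print(repr(munge_per_source_reading(f)), repr(munge_as_observed(f)))
```

The two only disagree once there are three or more leading spaces, which is where a test message would distinguish them.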

In 8.14.x, with the addition of the "noleadspc" milter negotiation option, I
get the header fields exactly as sent, and I know that's what I'm getting, so
none of this is necessary in that case.

Another area of rewriting inside the code that I need to mimic is the
commaize() function. Without having to replicate everything that prescan()
does, can I apply a set of rules like the following and get pretty close?

1) break the string into substrings, cutting at any non-comment, unescaped,
unquoted comma

2) within each substring, the rightmost focus ("<blah>") is the address and
everything else is the comment

3) if there is no focus but there are parentheses, remove stuff in (and
including) the parentheses before continuing

4) if there is no comma as in (1) and no focus as in (2) (i.e. you're still
where you started except maybe with comments removed), break the string at
non-comment unescaped spaces; each substring thus created contains a single
address

5) for any address not containing an at-sign, append "@$j" (i.e. qualify it
with the local host's canonical name)

6) crush each run of adjacent unquoted, unescaped whitespace characters down
to a single space

7) reassemble the address list, putting ", " between addresses and wrapping at
78 characters
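A minimal sketch of rules 1 through 7 applied literally, useful for checking them against real MTA output. All names (classify, split_at, commaize, etc.) are hypothetical; local_host stands in for $j; nested comments are supported; rule 3 drops comments from the output exactly as written; and folding with a newline plus tab, counting the tab as one character, is an assumption. This is not sendmail's commaize() or prescan().

```python
def classify(s):
    """Label each char of s "plain", "quoted", "comment" or "escaped"."""
    labels, depth, in_quote, escape_next = [], 0, False, False
    for ch in s:
        # A backslash pair takes the label of whatever encloses it.
        ctx = "comment" if depth else ("quoted" if in_quote else "escaped")
        if escape_next or ch == "\\":
            escape_next = not escape_next
            labels.append(ctx)
        elif in_quote:
            in_quote = ch != '"'              # open until the closing quote
            labels.append("quoted")
        elif ch == "(":
            depth += 1                        # comments may nest
            labels.append("comment")
        elif depth:
            if ch == ")":
                depth -= 1
            labels.append("comment")
        elif ch == '"':
            in_quote = True
            labels.append("quoted")
        else:
            labels.append("plain")
    return labels


def split_at(s, is_sep):
    """Cut s at separators that are unquoted, unescaped and not in a comment."""
    parts, start = [], 0
    for i, (ch, lab) in enumerate(zip(s, classify(s))):
        if lab == "plain" and is_sep(ch):
            parts.append(s[start:i])
            start = i + 1
    return parts + [s[start:]]


def strip_comments(s):
    return "".join(ch for ch, lab in zip(s, classify(s)) if lab != "comment")


def crush_ws(s):
    """Collapse runs of plain whitespace to one space; trim both ends."""
    out, pending = [], False
    for ch, lab in zip(s, classify(s)):
        if lab == "plain" and ch.isspace():
            pending = bool(out)               # leading whitespace is dropped
        else:
            if pending:
                out.append(" ")
                pending = False
            out.append(ch)
    return "".join(out)                       # trailing whitespace never emitted


def rightmost_focus(s):
    """Return (i, j) bounding the rightmost plain "<...>" in s, or None."""
    labels = classify(s)
    opens = [i for i, c in enumerate(s) if c == "<" and labels[i] == "plain"]
    closes = [j for j, c in enumerate(s) if c == ">" and labels[j] == "plain"]
    for i in reversed(opens):
        after = [j for j in closes if j > i]
        if after:
            return i, after[0]
    return None


def qualify(addr, local_host):
    """Append "@local_host" (standing in for $j) when there is no at-sign."""
    return addr if not addr or "@" in addr else f"{addr}@{local_host}"


def commaize(value, local_host, name="To", width=78, indent="\t"):
    pieces = split_at(value, lambda c: c == ",")                  # rule 1
    entries = []
    for piece in pieces:
        focus = rightmost_focus(piece)
        if focus:                                                 # rule 2
            i, j = focus
            addr = qualify(crush_ws(piece[i + 1:j]), local_host)
            entries.append(crush_ws(f"{piece[:i]}<{addr}>{piece[j + 1:]}"))
            continue
        bare = strip_comments(piece)                              # rule 3
        words = split_at(bare, str.isspace) if len(pieces) == 1 else [bare]  # rule 4
        entries += [qualify(w, local_host) for w in map(crush_ws, words) if w]  # rules 5, 6
    lines, started = [name + ":"], False                          # rule 7
    for k, entry in enumerate(entries):
        token = entry + ("," if k + 1 < len(entries) else "")
        if started and len(lines[-1]) + 1 + len(token) > width:
            lines.append(indent + token)                          # fold here
        else:
            lines[-1] += " " + token
        started = True
    return "\n".join(lines)
```

For example, commaize("alice bob (Carol)", "example.com") returns "To: alice@example.com, bob@example.com": the comment is removed (rule 3), and since there is neither a comma nor a focus, the value is split at spaces (rule 4).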

Thanks,

-MSK
Received on Wed Sep 09 2009 - 18:44:25 PST
