enable asynchronous dkimpy

Bug #1847002 reported by dkg
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
dkimpy
Fix Released
Wishlist
Scott Kitterman

Bug Description

dkimpy appears to block on DNS requests. if it is used in an asyncio context, a slow DNS response could potentially block all the other code in an async process from working.

Given that the mathematics and verification steps don't appear to need to block, it would be great if either:

 * dkimpy could adopt aiodns (https://github.com/saghul/aiodns) internally and include async functions, or

 * dkimpy could provide an API that lets the user handle the DNS fetch themselves somehow (e.g. the user could query the message for the name of the DNS record needed, and then the user could pass the DNS record into the verification function)

Revision history for this message
Scott Kitterman (kitterman) wrote :

Since, as you know, I use Debian, I try to focus on using things that are actively maintained there. https://tracker.debian.org/pkg/aiodns does not appear to qualify.

All of the DNS queries are in dkim.dnsplug.py, so it shouldn't be too hard to override that if you want to.

I think this is a reasonable goal to provide as an option, but I'd like to see aiodns actually maintained.

Changed in dkimpy:
importance: Undecided → Wishlist
milestone: none → future
status: New → Triaged
Revision history for this message
Scott Kitterman (kitterman) wrote :

I've thought about this some more and I'm struggling with the use case. DKIM has exactly one DNS lookup, so it's not like with (say) SPF where paralleling multiple DNS lookups will help. That DKIM verification task needs to wait on the results of the DNS query for the key record.

If an application were doing large numbers of DKIM validations, wouldn't it make more sense for the application to throw the entire DKIM validation for a single message off into it's own thread to wait for the DNS answer? Maybe it's just that it's late and I"m tired, but I'm not sure I see the benefit of doing this inside dkimpy?

Revision history for this message
dkg (dkg0) wrote : Re: [Bug 1847002] Re: enable asynchronous dkimpy

On Tue 2019-10-08 06:29:13 +0000, Scott Kitterman wrote:
> I've thought about this some more and I'm struggling with the use case.
> DKIM has exactly one DNS lookup, so it's not like with (say) SPF where
> paralleling multiple DNS lookups will help. That DKIM verification task
> needs to wait on the results of the DNS query for the key record.
>
> If an application were doing large numbers of DKIM validations, wouldn't
> it make more sense for the application to throw the entire DKIM
> validation for a single message off into it's own thread to wait for the
> DNS answer? Maybe it's just that it's late and I"m tired, but I'm not
> sure I see the benefit of doing this inside dkimpy?

My concern is an application that doesn't deal with multiple threads at
all, but just uses entirely asyncio code. In that case, you'd want the
code to await the result of the DNS lookup without having to manage
threads, but also without blocking the other asyncio callbacks that are
ready to be invoked by the main loop.

      --dkg

Revision history for this message
Scott Kitterman (kitterman) wrote :

OK. I guess we could add that as another option for dnsplug.

Revision history for this message
Stuart Gathman (stuart-gathman) wrote :

I like the option of passing in DNS record as optional argument.

Revision history for this message
Stuart Gathman (stuart-gathman) wrote :

It would also help to have an API that returns the hostname to lookup in DNS. That is not trivial to construct.

Revision history for this message
Scott Kitterman (kitterman) wrote :

It looks like supporting aiodns won't be that hard.

Changed in dkimpy:
milestone: future → 1.0.0
assignee: nobody → Scott Kitterman (kitterman)
Revision history for this message
Scott Kitterman (kitterman) wrote :

Based on the example code in the README https://github.com/saghul/aiodns and a look at the pycares source, I think it's a new dnsplug DNSLookup variant that is something like:

def get_txt_aiodns(name, timeout=5):
    """Return a TXT record associated with a DNS name. For DKIM we
    can assume there is only one."""

    import asyncio
    import aiodns

    # Note: This will use the existing loop or create one if needed
    loop = asyncio.get_event_loop()
    resolver = aiodns.DNSResolver(loop=loop, timeout=timeout)

    async def query(name, qtype):
        return await resolver.query(name, qtype)

    q = query(name, 'TXT')
        result = loop.run_until_complete(coro)

    if result:
        return result.text
    else:
        return None

Unlike the existing dnsplug selection which is based purely on which modules are installed, I think this should be explicitly selected with some kind of flag in the API:

  #: @param aiodns: use aiodns for async DNS lookups (requires python 3.5,
  #: error if set when aiodns is not available)

I've tested bits and pieces of the above, but not the whole thing. I'd appreciate feedback if anyone has experience with using async in general or aiodns in particular.

Thanks.

Revision history for this message
dkg (dkg0) wrote :

So i think that would work in terms of *using* `aiodns`, but i don't think it would work in terms of integrating dkimpy into a broader asynchronous application.

To do that, we'd need to offer an asynchronous version of `dkim.dkim_verify`, one that could be awaited on for its results. otherwise, `dkim.dkim_verify` ends up blocking the thread that calls it.

does that make sense?

Revision history for this message
Scott Kitterman (kitterman) wrote :

Sort of. I've never worked with asyncio before, so it's not entirely clear. I've Googled a bit and I think I got it. I'll take a whack at it and see if I can make it work. Thanks.

Revision history for this message
Scott Kitterman (kitterman) wrote :

I have a working feature branch for this:

https://code.launchpad.net/~dkimpy-hackers/dkimpy/+git/dkimpy/+ref/async

It works, but it seems like a maintenance challenge to have all the duplicated code. I would appreciate a review of the concept. It this is on the right track, I can move the async bits into smaller functions to reduce the code duplication.

Revision history for this message
Scott Kitterman (kitterman) wrote :

I've finished refactoring and the async ~duplicated code is down to almost nothing. Unless someone has comments, I'll go ahead and merge this as in manual tests is seems to work fine. Need to figure out test suite coverage though (since this is all about DNS processing and the current test suite doesn't test that).

Revision history for this message
Stuart Gathman (stuart-gathman) wrote :

I suggest copying the test suite from pyspf. Async programming is not timing sensitive (unlike multi-threading or responding directly to interrupts). The zonefile data is similar for DKIM, and we should be able to come up with a test case syntax. It is complicated by having to include a complete email messages in each test case. Maybe just a filename in the definition is sufficient.

tests:
  exampletest:
    description: blue sky of what properties are needed
    comment: ...
    spec: 1.2/3
    mailfile: test123.txt
    mailfrom: <email address hidden> # for standards that compare MFROM with DKIM identity
    result: fail/pass/notpresent

zonedata:
  _adsp._domainkey.bar.com:
     - TXT: "dkim=unknown"
  default._domainkey.bar.com:
     - TXT: "k=rsa; p=fag7f942l3fnafda0g7f042342gasdf0g80afdjgllfhgofur72395942jfldsafJFHDLF..."

It might be sufficient to have a bodyhash test property, and include only email headers - but there are variations on how bodyhash is computed that should be tested.

The test driver would run each test in both sync and async mode (option to do only one or the other).

Revision history for this message
Stuart Gathman (stuart-gathman) wrote :

Oh, signing tests need more support. I'll have to get back to this later, however.

Revision history for this message
Scott Kitterman (kitterman) wrote :

There's also https://github.com/ValiMail/arc_test_suite/ which uses dnslib to
simulate on the wire DNS, to an extent, the problem with all of these is that
the override the DNSLookup (for pyspf) or get_text (for dkimpy) functions with
a test variant, so the actual DNS functionality of the package never gets
tested.

I did a little more investigation of dnslib and aiodns and I think it's quite
feasible to make a test harness that actually tests the DNS processing too. I
think starting from the pyspf zone definition for data is good. The ARC test
suite used it as well.

Revision history for this message
Scott Kitterman (kitterman) wrote :

I've finished refactoring and the async ~duplicated code is down to almost nothing. Unless someone has comments, I'll go ahead and merge this as in manual tests is seems to work fine. Need to figure out test suite coverage though (since this is all about DNS processing and the current test suite doesn't test that). I won't hold the next release for that though.

Changed in dkimpy:
status: Triaged → In Progress
Revision history for this message
Scott Kitterman (kitterman) wrote :

Merged now.

Changed in dkimpy:
status: In Progress → Fix Committed
Revision history for this message
Scott Kitterman (kitterman) wrote :

2019-12-09 Version 1.0.0
    - Add support for RFC 8460 tlsrpt DKIM signature processing (LP: #1847020)
    - Add async support with aiodns for DKIM verification (ARC not supported)
      (LP: #1847002)
    - Add new timeout parameter to enable DNS lookup timeouts to be adjusted
    - Add new DKIM.present function to allow applications to test if a DKIM
      signature is present without doing validation (LP: #1851141)
    - Support signature verification with RSAPublicKey formatted keys
      since, although rare, they are RFC 6376 specified (LP: #1851862)
    - Drop usage of pymilter Milter.dns in dnsplug since it doesn't support
      havine a timeout passed to it
    - Catch binascii related key format errors (LP: #1854477)

Changed in dkimpy:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.