Stop trigger Nagios alarm when outgoing mail gets a transient error

Bug #1510266 reported by Paul Everitt
This bug affects 1 person

Affects: KARL4
Status: Fix Released
Importance: High
Assigned to: Paul Everitt

Bug Description

We have a bunch of things going on with mailin/mailout that need some attention. Several of these issues do a log.error which then causes Nagios to tell me KARL is in a critical error state. [wink]

This is the first one. For outgoing mail, gocept's queue is set up to do address verification for each outgoing message, with a cache. If the remote mail server doesn't answer in time, gocept's mail server gives KARL (repoze.sendmail) a warning. That then triggers, I believe, a log.error in KARL, which makes Nagios think KARL is broken.

It isn't a severe error. repoze.sendmail tries again in 3 hours and is always able to deliver. (Unfortunately, that 3h setting is not easily configurable, but correct me if I am wrong on that.)

Ideally this would generate a log.warning or something in KARL. We can know about it, but not go crazy.

Some extra notes:

- The full traceback is below

- The .ini files that configure OSF are in a separate package (osideploy) which Fabric uses to generate

Error while sending mail from to x@y.com

Traceback (most recent call last):
  File "/srv/osfkarl/production/72/eggs/repoze.sendmail-2.3-py2.7.egg/repoze/sendmail/queue.py", line 209, in _send_message
    self.mailer.send(fromaddr, toaddrs, message)
  File "/srv/osfkarl/production/72/eggs/repoze.sendmail-2.3-py2.7.egg/repoze/sendmail/mailer.py", line 75, in send
    connection.sendmail(fromaddr, toaddrs, message)
  File "/usr/lib/python2.7/smtplib.py", line 746, in sendmail
    raise SMTPDataError(code, resp)
SMTPDataError: (450, '4.1.1 id=25378-35 - Temporary MTA failure on relaying, from MTA(smtp:[127.0.0.1]:10025): 450 4.1.1 <x@y.com>: Recipient address rejected: unverified address: Address verification in progress')

Revision history for this message
Carlos de la Guardia (cguardia) wrote :

I looked at all the code involved, and I think the easy way out is to just catch the SMTPDataError and log it. You are right that there is no easy way to configure the 3 hour setting, short of forking repoze.sendmail.

My problem is that I can't easily replicate this. Can you?

Changed in karl4:
status: New → Fix Committed
Revision history for this message
Paul Everitt (paul-agendaless) wrote : Re: [Bug 1510266] Stop trigger Nagios alarm when outgoing mail gets a transient error

It’s really hard to replicate and will take quite a testing setup to do so, I think.

Is the idea that it's an unhandled exception in repoze.sendmail that makes it all the way to Pyramid’s handler? And thus, since you are handling it, Pyramid’s standard logging (which marks it as an error instead of a warning) gets triggered?

—Paul

> On Nov 12, 2015, at 4:35 AM, Carlos de la Guardia <email address hidden> wrote:
>
> I looked at all the code involved, and I think the easy way out is to
> just catch the SMTPDataError and log it. You are right that there is no
> easy way to configure the 3 hour setting, short of forking
> repoze.sendmail.
>
> My problem is that I can't easily replicate this. Can you?

Revision history for this message
Carlos de la Guardia (cguardia) wrote :

Yeah, repoze.sendmail catches all SMTP exceptions, but only acts on error codes between 500 and 599. It re-raises the exception otherwise, and mailout was not handling it.
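
Editorial sketch of the flow described in this comment, for readers following along. It is a minimal stand-in, not the repoze.sendmail or KARL mailout code: `library_send`, `caller_send`, `failing_send`, and the logger name are hypothetical names invented for illustration. Only the `smtplib` exception classes are real. The idea is that the library discards on 5xx and re-raises everything else, so the caller has to catch the re-raised exception and log it at warning level.

```python
import logging
import smtplib

log = logging.getLogger("mailout.sketch")  # hypothetical logger name


def library_send(send, fromaddr, toaddrs, message):
    """Stand-in for the library behaviour described above (not the real code)."""
    try:
        send(fromaddr, toaddrs, message)
    except smtplib.SMTPResponseException as e:
        if 500 <= e.smtp_code <= 599:
            # Permanent error: log it and give up on the message.
            log.error("Discarding message to %s: %s", ", ".join(toaddrs), e)
            return "rejected"
        raise  # anything else is left to the caller
    return "sent"


def caller_send(send, fromaddr, toaddrs, message):
    """Hypothetical caller that downgrades transient failures to warnings."""
    try:
        return library_send(send, fromaddr, toaddrs, message)
    except smtplib.SMTPResponseException as e:
        log.warning("Transient SMTP error %s for %s; will retry later",
                    e.smtp_code, ", ".join(toaddrs))
        return "deferred"


def failing_send(code):
    """Build a fake send callable that always fails with the given reply code."""
    def send(fromaddr, toaddrs, message):
        raise smtplib.SMTPDataError(code, "simulated reply")
    return send
```

With this shape, a 450 reply produces a warning rather than an error, which is the outcome the bug description asks for. Whether a monitoring check keys on the log level is a deployment detail not covered here.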

Changed in karl4:
status: Fix Committed → Fix Released
Revision history for this message
Paul Everitt (paul-agendaless) wrote :

Hmm, production just raised an alarm for this:

SMTPDataError: (450, '4.1.1 id=04334-07 - Temporary MTA failure on relaying, from MTA(smtp:[127.0.0.1]:10025): 450 4.1.1 <email address hidden>: Recipient address rejected: unverified address: Address verification in progress')

Revision history for this message
Carlos de la Guardia (cguardia) wrote :

This is discouraging. Why is the exception not bubbling up? We might need to fork repoze.sendmail.

Revision history for this message
Paul Everitt (paul-agendaless) wrote : Re: [Bug 1510266] Re: Stop trigger Nagios alarm when outgoing mail gets a transient error

Does that mean you think there is a bare except handler somewhere?

—Paul

> On Nov 16, 2015, at 2:02 PM, Carlos de la Guardia <email address hidden> wrote:
>
> This is discouraging. Why is the exception not bubbling up? We might
> need to fork repoze.sendmail.

Revision history for this message
Paul Everitt (paul-agendaless) wrote :

Here is the full traceback:

Traceback (most recent call last):
  File "/srv/osfkarl/production/73/eggs/repoze.sendmail-2.3-py2.7.egg/repoze/sendmail/queue.py", line 209, in _send_message
    self.mailer.send(fromaddr, toaddrs, message)
  File "/srv/osfkarl/production/73/eggs/repoze.sendmail-2.3-py2.7.egg/repoze/sendmail/mailer.py", line 75, in send
    connection.sendmail(fromaddr, toaddrs, message)
  File "/usr/lib/python2.7/smtplib.py", line 746, in sendmail
    raise SMTPDataError(code, resp)
SMTPDataError: (450, '4.1.1 id=04334-07 - Temporary MTA failure on relaying, from MTA(smtp:[127.0.0.1]:10025): 450 4.1.1 <email address hidden>: Recipient address rejected: unverified address: Address verification in progress')

Is this the relevant section? A 450 error code means it hits the “else”. Looks ok to me, your exception handler should have caught it…although it’s weird, the traceback above doesn’t go through our KARL code. Is this because it’s done at TM commit time?

            try:
                self.mailer.send(fromaddr, toaddrs, message)
            except smtplib.SMTPResponseException, e:
                if 500 <= e.smtp_code <= 599:
                    # permanent error, ditch the message
                    self.log.error(
                        "Discarding email from %s to %s due to"
                        " a permanent error: %s",
                        fromaddr, ", ".join(toaddrs), str(e))
                    _os_link(filename, rejected_filename)
                else:
                    # Log an error and retry later
                    raise

—Paul

> On Nov 16, 2015, at 2:02 PM, Carlos de la Guardia <email address hidden> wrote:
>
> This is discouraging. Why is the exception not bubbling up? We might
> need to fork repoze.sendmail.

Revision history for this message
Carlos de la Guardia (cguardia) wrote :

I think I know where to go with this. I'll do it on Wednesday.

Revision history for this message
Paul Everitt (paul-agendaless) wrote :

Hi Carlos. Based on your last comment, this one is still open. We're getting a Nagios alarm several times a day, so it would be nice to quiet this.

Changed in karl4:
importance: Low → High
milestone: 012 → 013
status: Fix Released → In Progress
Revision history for this message
Carlos de la Guardia (cguardia) wrote :

I made a PR to repoze.sendmail to allow us to avoid this or any other transient errors. If it's not merged we could use my branch in the meantime.

https://github.com/repoze/repoze.sendmail/pull/37

Revision history for this message
Paul Everitt (paul-agendaless) wrote :

Should I make our own -agendaless package and put it in our index? Probably off of master plus this branch, so we’d need to be careful when deploying it (big jump forward in repoze.sendmail version).

—Paul

> On Dec 10, 2015, at 3:16 AM, Carlos de la Guardia <email address hidden> wrote:
>
> I made a PR to repoze.sendmail to allow us to avoid this or any other
> transient errors. If it's not merged we could use my branch in the
> meantime.
>
> https://github.com/repoze/repoze.sendmail/pull/37

Revision history for this message
Paul Everitt (paul-agendaless) wrote :

Also, does that repoze.sendmail PR imply a change in our code, to add ignore_transient=False ?

—Paul

> On Dec 10, 2015, at 8:14 AM, Paul Everitt <email address hidden> wrote:
>
>
> Should I make our own -agendaless package and put in our index? Probably off of master plus this branch, so we’d need to be careful when deploying it (big jump forward in repoze.sendmail version.)
>
> —Paul
>
>> On Dec 10, 2015, at 3:16 AM, Carlos de la Guardia <email address hidden> wrote:
>>
>> I made a PR to repoze.sendmail to allow us to avoid this or any other
>> transient errors. If it's not merged we could use my branch in the
>> meantime.
>>
>> https://github.com/repoze/repoze.sendmail/pull/37

Revision history for this message
Carlos de la Guardia (cguardia) wrote :

To avoid unpleasant surprises, maybe I should backport the fix to the branch that we use?

You are right, we would need to add that ignore_transient=True to our QP call.
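
Editorial sketch of what an `ignore_transient` switch could look like inside a queue processor. This is an assumption about the shape of the change, not the code from the pull request: `SketchQueueProcessor`, `always_450`, the return values, and the logger name are all hypothetical, and the choice of `log.info` for the quiet path is a guess. Only the flag name comes from this thread.

```python
import logging
import smtplib

log = logging.getLogger("queue.sketch")  # hypothetical logger name


class SketchQueueProcessor(object):
    """Illustrative stand-in; NOT the repoze.sendmail QueueProcessor."""

    def __init__(self, send, ignore_transient=False):
        self.send = send
        self.ignore_transient = ignore_transient

    def send_message(self, fromaddr, toaddrs, message):
        try:
            self.send(fromaddr, toaddrs, message)
        except smtplib.SMTPResponseException as e:
            if 500 <= e.smtp_code <= 599:
                log.error("Permanent error, discarding: %s", e)
                return "rejected"
            if self.ignore_transient:
                # Assumed behaviour: note the failure quietly and retry later.
                log.info("Transient error, will retry: %s", e)
                return "deferred"
            raise  # default: let the caller see the transient error
        return "sent"


def always_450(fromaddr, toaddrs, message):
    """Fake send callable that simulates the 450 reply from the traceback."""
    raise smtplib.SMTPDataError(450, "Address verification in progress")
```

Defaulting the flag to `False` keeps the old re-raise behaviour for existing callers, so only a caller that opts in stops seeing transient errors.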

Revision history for this message
Paul Everitt (paul-agendaless) wrote :

I think we previously agreed that we should upgrade to the latest, to try and fix the multiple Message-Id issue. Thus, if you agree, I will fork that repo into karlproject and make a package based on your branch.

—Paul

> On Dec 10, 2015, at 1:08 PM, Carlos de la Guardia <email address hidden> wrote:
>
> To avoid unpleasant surprises, maybe I should backport the fix to the
> branch that we use?
>
> You are right, we would need to add that ignore_transient=True to our QP
> call.

Revision history for this message
Carlos de la Guardia (cguardia) wrote :

By the way, which version of repoze.sendmail are we using?

Revision history for this message
Carlos de la Guardia (cguardia) wrote :

Ok, let's fork it into karlproject. I will add the mailout change when that's done.

Revision history for this message
Paul Everitt (paul-agendaless) wrote :

repoze.sendmail 2.3

—Paul

> On Dec 10, 2015, at 1:26 PM, Carlos de la Guardia <email address hidden> wrote:
>
> By the way, which version of repoze.sendmail are we using?

Revision history for this message
Paul Everitt (paul-agendaless) wrote :

Today is a KARL day for me. I’ll focus first on various things related to repoze.sendmail.

I made this fork of master:

  https://github.com/karlproject/repoze.sendmail

I will edit setup.py to use PEP 440-compliant “local version identifiers”. Thus, the version field will be “4.2+agendaless.1”.
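
Editorial note on the version string: in PEP 440 a local version identifier is the public version, a `+`, and then a local label made of dot-separated alphanumeric segments. The sketch below splits such a string with a deliberately simplified pattern; `split_local_version` is a hypothetical helper written for illustration and does not cover the full PEP 440 grammar (pre-releases, epochs, and so on).

```python
import re

# Simplified pattern: a dotted numeric public version, then an optional
# "+local" part of dot-separated alphanumeric segments.
_LOCAL_RE = re.compile(
    r"^(?P<public>[0-9]+(?:\.[0-9]+)*)"
    r"(?:\+(?P<local>[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*))?$"
)


def split_local_version(version):
    """Return (public, local); local is None when there is no '+' part."""
    match = _LOCAL_RE.match(version)
    if match is None:
        raise ValueError("not a simple public[+local] version: %r" % version)
    return match.group("public"), match.group("local")
```

For the version proposed here, the public part is `4.2` and the local label is `agendaless.1`, so the fork stays recognizable as a build on top of upstream 4.2.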

Should I change the repository name to be repoze.sendmail-agendaless? (I doubt it.)

—Paul

> On Dec 10, 2015, at 1:33 PM, Carlos de la Guardia <email address hidden> wrote:
>
> Ok, let's fork it into karlproject. I will add the mailout change when
> that's done.

Revision history for this message
Paul Everitt (paul-agendaless) wrote :

As an FYI and reminder, it occurs to me, now that we have forked this, we can do some other things we wanted to do. For example, the crazy changes I made previously in KARL, to monkey-patch stuff in here, can now be done directly in here. Also, we can make the 3 hour retry delay configurable.

—Paul

> On Dec 10, 2015, at 1:33 PM, Carlos de la Guardia <email address hidden> wrote:
>
> Ok, let's fork it into karlproject. I will add the mailout change when
> that's done.

Changed in karl4:
milestone: 013 → 014
Revision history for this message
Carlos de la Guardia (cguardia) wrote :

I just pushed the change to use the ignore_transient feature in mailout.

Revision history for this message
Paul Everitt (paul-agendaless) wrote :

Over to me for testing/merging/releasing.

Changed in karl4:
assignee: Carlos de la Guardia (cguardia) → Paul Everitt (paul-agendaless)
status: In Progress → Fix Committed
Revision history for this message
Paul Everitt (paul-agendaless) wrote :

This is now in testing on the master branch (with some other things per an email just now to Nat/Carlos) which is set up on karlstaging.

Changed in karl4:
status: Fix Committed → Fix Released
Revision history for this message
Paul Everitt (paul-agendaless) wrote :

Hmm, after the production update, I'm still getting errors logged for:

Traceback (most recent call last):
  File "/srv/osfkarl/production/74/eggs/repoze.sendmail-4.2+agendaless.2-py2.7.egg/repoze/sendmail/queue.py", line 232, in _send_message
    self.mailer.send(fromaddr, toaddrs, message)
  File "/srv/osfkarl/production/74/eggs/repoze.sendmail-4.2+agendaless.2-py2.7.egg/repoze/sendmail/mailer.py", line 100, in send
    connection.sendmail(fromaddr, toaddrs, message)
  File "/usr/lib/python2.7/smtplib.py", line 746, in sendmail
    raise SMTPDataError(code, resp)
SMTPDataError: (450, '4.1.1 id=30340-30 - Temporary MTA failure on relaying, from MTA(smtp:[127.0.0.1]:10025): 450 4.1.1 <xxx@xx>: Recipient address rejected: unverified address: Address verification in progress')

Changed in karl4:
status: Fix Released → In Progress
Changed in karl4:
milestone: 014 → 015
Revision history for this message
Carlos de la Guardia (cguardia) wrote :

Pretty strange. SMTPDataError is a subclass of SMTPResponseException, so it should be caught inside the queue code. Are we 100% sure the ignore_transient=True part is there?
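
Editorial check of the class relationship mentioned in this comment, using only the standard library. `is_caught_as_response_exception` is a hypothetical helper for illustration; the exception classes are the real ones from `smtplib`.

```python
import smtplib


def is_caught_as_response_exception(exc):
    """Raise exc and report whether an SMTPResponseException handler catches it."""
    try:
        raise exc
    except smtplib.SMTPResponseException:
        return True
    except Exception:
        return False


# A 450 reply like the one in the traceback, built by hand for the check.
transient = smtplib.SMTPDataError(450, "4.1.1 Address verification in progress")
```

Since the handler does match, a traceback that still escapes points at the handling path not being active (for example, the flag not being set), rather than at the `except` clause itself.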

Revision history for this message
Paul Everitt (paul-agendaless) wrote : Re: [Bug 1510266] Stop trigger Nagios alarm when outgoing mail gets a transient error

Crap, you’re right, it’s on master but I didn’t put that into the egg. :(

—Paul

> On Jan 4, 2016, at 1:02 AM, Carlos de la Guardia <email address hidden> wrote:
>
> Pretty strange. SMTPDataError is a subclass of SMTPResponseException, so
> it should be caught inside the queue code. Are we 100% sure the
> ignore_transient=True part is there?

Revision history for this message
Paul Everitt (paul-agendaless) wrote :

Paul needs to get this into an egg and into a release.

Changed in karl4:
status: In Progress → Fix Released