Duplicate oops reports logged for each checkwatches oops

Bug #661441 reported by Diogo Matsubara
This bug affects 1 person
Affects: Launchpad itself
Status: Fix Released
Importance: High
Assigned to: Deryck Hodge
Milestone: 10.11

Bug Description

When the checkwatches script logged OOPS-1748CCW994, a duplicate OOPS (OOPS-1748C1004) was also logged.

This is likely due to the change in r11637: "Generate OOPS reports when cronscripts log WARN or higher, and log unhandled exceptions in scripts. Improve logging."
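A minimal sketch of how the r11637 behaviour can double-report one failure. It assumes, as the commit message describes, a handler on the root logger that files an OOPS for every WARNING-or-higher record. OopsHandler, record_oops and check_watches are hypothetical stand-ins built on Python's standard logging module, not Launchpad's actual code.

```python
import logging

# Hypothetical OOPS store: each entry is one "OOPS report".
oops_reports = []


def record_oops(source, message):
    """Hypothetical: file one OOPS report and return its index."""
    oops_reports.append((source, message))
    return len(oops_reports) - 1


class OopsHandler(logging.Handler):
    """Hypothetical handler: files an OOPS for every WARNING-or-higher record."""

    def __init__(self):
        super().__init__(level=logging.WARNING)

    def emit(self, record):
        record_oops("root-logger", record.getMessage())


def check_watches(log):
    """Hypothetical script step that does its own OOPS logging."""
    try:
        raise ValueError("remote bug tracker returned garbage")
    except ValueError as error:
        oops_id = record_oops("checkwatches", str(error))
        # Logging the same failure at ERROR also trips the root handler,
        # so the single failure ends up as two OOPS reports.
        log.error("Failed to update watches (OOPS %d): %s", oops_id, error)


root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(OopsHandler())
check_watches(logging.getLogger("checkwatches"))
print(len(oops_reports))  # 2: one from the script, one from the handler
```

The script files OOPS 0 itself, then logs the same failure at ERROR. The root handler turns that log record into a second report.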


Deryck Hodge (deryck) wrote:

Marking this High per the zero-OOPS policy, but AIUI we're waiting on info from stub about whether the logging he added can be disabled when a script implements its own OOPS logging.

Changed in malone:
status: New → Triaged
importance: Undecided → High

Robert Collins (lifeless) wrote:

The answer should be: don't log a high-severity message if you also generate an OOPS.
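A minimal sketch of this rule. It assumes a hypothetical OopsHandler on the root logger that files an OOPS for every WARNING-or-higher record; the names are illustrative, not Launchpad's code. The script files its own OOPS, then logs the follow-up message at INFO. INFO is below the handler's threshold, so only one report is created.

```python
import logging

# Hypothetical OOPS store: one entry per report.
oops_reports = []


class OopsHandler(logging.Handler):
    """Hypothetical: files an OOPS for every WARNING-or-higher record."""

    def __init__(self):
        super().__init__(level=logging.WARNING)

    def emit(self, record):
        oops_reports.append(("handler", record.getMessage()))


def report_failure(log, error):
    """File one OOPS ourselves, then log below the handler's threshold."""
    oops_reports.append(("script", str(error)))
    oops_id = len(oops_reports) - 1
    # INFO is below WARNING, so the root OopsHandler skips this record.
    log.info("Recorded OOPS %d for: %s", oops_id, error)


root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(OopsHandler())
report_failure(logging.getLogger("checkwatches"), ValueError("bad response"))
print(len(oops_reports))  # 1
```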

Robert Collins (lifeless) wrote: Re: [Bug 661441] Re: Duplicate oops reports logged for each checkwatches oops

@deryck this doesn't need to be High; it's not adding new OOPSes. If
you have no OOPSes, you'll still have none without whatever change is
needed here.

I agree it would be nice to fix this promptly to get a good signal out of
the OOPS system, but I'd probably put fixing the checkwatches OOPSes
higher than addressing this issue.

Stuart Bishop (stub) wrote:

If the new OOPSes are good enough, tear out the custom OOPS logging code so we don't duplicate work.

If the new OOPSes are not good enough, add a flag to LaunchpadCronScript that disables adding the OopsHandler to the root Logger. Open a bug on what is missing, because we are duplicating work. Or just fix the OopsHandler.

Alternatively, we can just filter the duplicates in the reports - the new OOPSes are unique enough to filter with no false positives.

Note that if the global OOPS reporting is filtered or disabled we will not get OOPSes for unhandled exceptions (such as lockfile errors or other error conditions in the infrastructure).
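A minimal sketch of the flag option. It assumes a simplified, hypothetical CronScript base class in place of LaunchpadCronScript and a hypothetical OopsHandler that files an OOPS for each WARNING-or-higher record. The use_oops_handler attribute is an illustrative name, not an existing Launchpad flag. The handler is attached to the root logger only for the duration of run(), so the sketch cleans up after itself.

```python
import logging

# Hypothetical OOPS store: one entry per report.
oops_reports = []


class OopsHandler(logging.Handler):
    """Hypothetical: files an OOPS for every WARNING-or-higher record."""

    def __init__(self):
        super().__init__(level=logging.WARNING)

    def emit(self, record):
        oops_reports.append(("handler", record.getMessage()))


class CronScript:
    """Simplified, hypothetical stand-in for LaunchpadCronScript."""

    # Hypothetical flag: scripts that file their own OOPSes set this to False.
    use_oops_handler = True

    def __init__(self, name):
        self.logger = logging.getLogger(name)

    def main(self):
        raise NotImplementedError

    def run(self):
        root = logging.getLogger()
        handler = OopsHandler() if self.use_oops_handler else None
        if handler is not None:
            root.addHandler(handler)
        try:
            self.main()
        finally:
            # Detach the handler so repeated runs don't stack handlers.
            if handler is not None:
                root.removeHandler(handler)


class CheckWatches(CronScript):
    use_oops_handler = False  # files its own OOPS reports

    def main(self):
        oops_reports.append(("checkwatches", "remote tracker error"))
        self.logger.error("remote tracker error")


class OtherScript(CronScript):
    def main(self):
        self.logger.error("something failed")  # relies on the handler


CheckWatches("checkwatches").run()
OtherScript("other").run()
print(oops_reports)
# [('checkwatches', 'remote tracker error'), ('handler', 'something failed')]
```

With the flag off, checkwatches gets exactly one report per failure. As noted above, it also loses the handler's coverage of unhandled exceptions, so the script's own OOPS logging has to catch those.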

Diogo Matsubara (matsubara) wrote:

Robert, I disagree. I think this should be High because the noise from the checkwatches OOPSes is making the lpnet OOPS report useless.

Stuart, how are the new OOPSes unique enough to filter them out of the reports? I'm not in favor of filtering them in the reports, as we'd basically be hiding the problem, which is the same OOPS being recorded twice.

Deryck Hodge (deryck) wrote:

We can do an easy fix in checkwatches to make sure we don't have the duplicate OOPS. I'll do a branch sometime next week and fix this. Maybe even on the plane to UDS, if I get all my other hacking done. :-)

Robert Collins (lifeless) wrote:

@Diogo "useless" is relative: we have OOPSes, and we have a zero-OOPS
policy. I agree the report would be more useful without the 2x distortion
here, but "useless" means that looking at the report has no value, and
that's simply not true.

Diogo Matsubara (matsubara) wrote:

@Robert, sorry, "useless" was too strong a word. The current situation makes the daily report very difficult to read with all the noise.

Stuart Bishop (stub) wrote:

The duplicate OOPS reports are certainly unique enough to filter them. The request variables are the most obvious difference.

Deryck Hodge (deryck)
Changed in malone:
status: Triaged → In Progress
assignee: nobody → Deryck Hodge (deryck)

Launchpad QA Bot (lpqabot) wrote: Bug fixed by a commit
Changed in malone:
milestone: none → 10.11
tags: added: qa-needstesting
Changed in malone:
status: In Progress → Fix Committed

Deryck Hodge (deryck) wrote:

I'm marking this qa-untestable. There's no easy way to test it, and issues with staging/qastaging currently prevent any run of this script. The script runs fine locally, the tests pass, and the change is low impact since it only changes the logging level. After deployment we can confirm that the double OOPS is fixed and that errors are still logged.

Cheers,
deryck

tags: added: qa-intestable
removed: qa-needstesting
tags: added: qa-untestable
removed: qa-intestable
Changed in malone:
status: Fix Committed → Fix Released