[SRU] MessageTimeout and DuplicateMessage errors after update

Bug #1914437 reported by Chris MacNaughton
This bug affects 2 people
Affects                          Status        Importance  Assigned to  Milestone
Ubuntu Cloud Archive             Invalid       Undecided   Unassigned
  Queens                         Fix Released  Critical    Unassigned
  Rocky                          Fix Released  Critical    Unassigned
  Stein                          Fix Released  Critical    Unassigned
  Train                          New           Undecided   Unassigned
oslo.messaging                   Invalid       Undecided   Unassigned
python-oslo.messaging (Ubuntu)   Invalid       Undecided   Unassigned
  Bionic                         Fix Released  Critical    Unassigned

Bug Description

[Impact]
A recent update to oslo.messaging to resolve #1789177 causes failures.

(The comments below are copied from the original bug):

After a partial upgrade (only one side, producers or consumers), there are a lot of MessageTimeout and DuplicateMessage errors in the logs. Downgrading back to 5.35.0-0ubuntu1~cloud0 fixed the problem.

Right after restarting n-ovs-agent, I saw a lot of errors in the rabbitmq log [1]. They are the same errors seen with the rabbitmq failover issue (the original issue of this LP).

Then, after I upgraded oslo.messaging in the neutron-api unit and restarted neutron-server, those errors were gone and I was able to create instances again.

After upgrading oslo.messaging in n-ovs only, the exchanges the two sides communicate through no longer matched, because a change to the exchanges in use has to be applied on both sides of the publisher-consumer relation.
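
The mismatch can be illustrated with a toy broker model. This is only a sketch under assumptions: it assumes that upgraded services send direct (reply) messages through the AMQP default exchange, as the name of the removed patch "Use default exchange for direct messaging" suggests, while non-upgraded services keep using an exchange named after the reply queue. `ToyBroker`, `consume`, `publish` and `reply_arrives` are hypothetical names for illustration, not oslo.messaging or RabbitMQ APIs.

```python
class ToyBroker:
    """Tiny in-memory stand-in for an AMQP broker (illustration only)."""

    def __init__(self):
        self.exchanges = set()  # named exchanges declared so far
        self.queues = {}        # queue name -> list of delivered messages

    def consume(self, queue, patched):
        """Declare what a consumer sets up before waiting for replies."""
        self.queues.setdefault(queue, [])
        if not patched:
            # Assumed old behaviour: a direct exchange named after the queue.
            self.exchanges.add(queue)

    def publish(self, queue, message, patched):
        """Send a direct message to `queue`; return True if it was routed."""
        if patched:
            # Assumed new behaviour: default exchange, routing key = queue name.
            if queue not in self.queues:
                return False
            self.queues[queue].append(message)
            return True
        if queue not in self.exchanges:
            # Mirrors the broker's "not_found ... no exchange" channel error.
            raise LookupError(f"no exchange '{queue}'")
        self.queues[queue].append(message)
        return True


def reply_arrives(consumer_patched, publisher_patched):
    """Return True if a reply published by one side reaches the other."""
    broker = ToyBroker()
    broker.consume("reply_abc", consumer_patched)
    try:
        return broker.publish("reply_abc", "pong", publisher_patched)
    except LookupError:
        return False


for consumer_patched in (False, True):
    for publisher_patched in (False, True):
        print(consumer_patched, publisher_patched,
              reply_arrives(consumer_patched, publisher_patched))
```

In this model the failing combination is an upgraded consumer with a non-upgraded publisher, which is the situation in the log below: the non-upgraded neutron-api unit publishes to a `reply_...` exchange that the upgraded side never declared.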

So I think there are two options:
1. Revert this patch for Queens (the original failover problem will remain).
2. Upgrade both sides within a maintenance window.

Thanks a lot

[1]
################################################################################
=ERROR REPORT==== 3-Feb-2021::03:25:26 ===
Channel error on connection <0.2379.1> (10.0.0.32:60430 -> 10.0.0.34:5672, vhost: 'openstack', user: 'neutron'), channel 1:
{amqp_error,not_found,
            "no exchange 'reply_7da3cecc31b34bdeb96c866dc84e3044' in vhost 'openstack'",
            'basic.publish'}

10.0.0.32 is neutron-api unit
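
To check whether a RabbitMQ log contains this kind of error, the missing exchange names can be pulled out with a regular expression. This is a minimal sketch that assumes the log lines look like the excerpt above; `missing_exchanges` is a hypothetical helper, and the sample text is copied from that excerpt.

```python
import re

# Matches the message text of the channel error shown in the log excerpt.
NOT_FOUND = re.compile(
    r"no exchange '(?P<exchange>[^']+)' in vhost '(?P<vhost>[^']+)'"
)


def missing_exchanges(log_text):
    """Return the sorted, unique (vhost, exchange) pairs reported as missing."""
    return sorted(
        {(m.group("vhost"), m.group("exchange"))
         for m in NOT_FOUND.finditer(log_text)}
    )


sample = """\
=ERROR REPORT==== 3-Feb-2021::03:25:26 ===
Channel error on connection <0.2379.1> (10.0.0.32:60430 -> 10.0.0.34:5672, vhost: 'openstack', user: 'neutron'), channel 1:
{amqp_error,not_found,
            "no exchange 'reply_7da3cecc31b34bdeb96c866dc84e3044' in vhost 'openstack'",
            'basic.publish'}
"""

for vhost, exchange in missing_exchanges(sample):
    print(vhost, exchange)
```

Many distinct `reply_...` names in the result would point at the partial-upgrade mismatch described in this bug rather than at a single stale client.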

[Test Case]
This SRU needs the following scenarios tested:

1) partial upgrade of n-ovs at 5.35.0-0ubuntu3 [1] and n-api/n-gateway at 5.35.0-0ubuntu1 - instance creation will be successful

2) partial upgrade of n-api/n-gateway at 5.35.0-0ubuntu3 [1] and n-ovs at 5.35.0-0ubuntu1 - instance creation will be successful

3) partial upgrade of n-ovs at 5.35.0-0ubuntu2 [1] and n-api/n-gateway at 5.35.0-0ubuntu3 - instance creation will fail (see regression potential)

4) partial upgrade of n-api/n-gateway at 5.35.0-0ubuntu3 [1] and n-ovs at 5.35.0-0ubuntu2 - instance creation will fail (see regression potential)

5) test all neutron nodes at 5.35.0-0ubuntu3 - instance creation will be successful

[1] and neutron* services restarted

[Regression Potential]
There is regression potential for clouds that have already upgraded to 5.35.0-0ubuntu2. This still needs to be tested, but if a cloud has fully upgraded to 5.35.0-0ubuntu2, the disruption this SRU is trying to resolve may occur again while some services run 5.35.0-0ubuntu2 and others run 5.35.0-0ubuntu3. Once the cloud is entirely at 5.35.0-0ubuntu3, messages will no longer time out.
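
The expected outcomes in the test case and the regression potential follow one rule: instance creation should work when both sides either carry the reverted change or both lack it. The sketch below encodes that rule for the Bionic versions listed above. It assumes 5.35.0-0ubuntu2 is the only revision carrying the change; `CHANGED_REVISIONS`, `carries_change` and `expect_success` are hypothetical names.

```python
# Revisions assumed to carry the change that this SRU reverts.
CHANGED_REVISIONS = {"5.35.0-0ubuntu2"}


def carries_change(version):
    """Return True if this package revision includes the reverted patches."""
    return version in CHANGED_REVISIONS


def expect_success(n_ovs_version, n_api_version):
    """Both sides must agree on whether they carry the change."""
    return carries_change(n_ovs_version) == carries_change(n_api_version)


# Scenario number -> (n-ovs version, n-api/n-gateway version), from the test case.
SCENARIOS = {
    1: ("5.35.0-0ubuntu3", "5.35.0-0ubuntu1"),
    2: ("5.35.0-0ubuntu1", "5.35.0-0ubuntu3"),
    3: ("5.35.0-0ubuntu2", "5.35.0-0ubuntu3"),
    4: ("5.35.0-0ubuntu2", "5.35.0-0ubuntu3"),
    5: ("5.35.0-0ubuntu3", "5.35.0-0ubuntu3"),
}

for number, (n_ovs, n_api) in SCENARIOS.items():
    outcome = "successful" if expect_success(n_ovs, n_api) else "will fail"
    print(f"{number}) n-ovs {n_ovs}, n-api/n-gateway {n_api}: {outcome}")
```

Scenarios 3 and 4 use the same version pair; in the test case they differ only in which side was upgraded and restarted.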

summary: - [SRU]
+ [SRU] Recent update broke message handling
summary: - [SRU] Recent update broke message handling
+ [SRU] MessageTimeout and DuplicateMessage errors after udpate
Changed in python-oslo.messaging (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → Critical
Changed in python-oslo.messaging (Ubuntu):
status: New → Invalid
Changed in cloud-archive:
status: New → Invalid
Chris MacNaughton (chris.macnaughton) wrote : Re: [SRU] MessageTimeout and DuplicateMessage errors after udpate

I'm marking this as not affecting Queens because the change that caused this regression didn't get out of queens-proposed.

no longer affects: cloud-archive/queens
Robie Basak (racb) wrote :

Sorry, I specified the tag name to Chris wrong.

tags: added: regression-update
removed: regression-updates
description: updated
Robie Basak (racb) wrote : Please test proposed package

Hello Chris, or anyone else affected,

Accepted python-oslo.messaging into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/python-oslo.messaging/5.35.0-0ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in python-oslo.messaging (Ubuntu Bionic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-bionic
description: updated
Corey Bryant (corey.bryant) wrote :

Hello Chris, or anyone else affected,

Accepted python-oslo.messaging into stein-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:stein-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-stein-needed to verification-stein-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-stein-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-stein-needed
Corey Bryant (corey.bryant) wrote :

Hello Chris, or anyone else affected,

Accepted python-oslo.messaging into rocky-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:rocky-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-rocky-needed to verification-rocky-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-rocky-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-rocky-needed
Corey Bryant (corey.bryant) wrote :

Hello Chris, or anyone else affected,

Accepted python-oslo.messaging into queens-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:queens-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-queens-needed to verification-queens-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-queens-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-queens-needed
Corey Bryant (corey.bryant) wrote : Re: [SRU] MessageTimeout and DuplicateMessage errors after udpate

verified on stein-proposed, rocky-proposed and bionic-proposed

stein-proposed
--------------

1) partial upgrade of n-ovs at 9.5.0-0ubuntu1~cloud2 [1] and n-api/n-gateway at 9.5.0-0ubuntu1~cloud1 - instance creation successful

2) partial upgrade of n-api/n-gateway at 9.5.0-0ubuntu1~cloud2 [1] and n-ovs at 9.5.0-0ubuntu1~cloud1 - instance creation successful

3) test all neutron nodes at 9.5.0-0ubuntu1~cloud2 - instance creation successful

rocky-proposed
--------------

1) partial upgrade of n-ovs at 8.1.0-0ubuntu1~cloud2.1 [1] and n-api/n-gateway at 8.1.0-0ubuntu1~cloud0 - instance creation successful

2) partial upgrade of n-api/n-gateway at 8.1.0-0ubuntu1~cloud0 [1] and n-ovs at 8.1.0-0ubuntu1~cloud2.1 - instance creation successful

3) test all neutron nodes at 8.1.0-0ubuntu1~cloud2.1 - instance creation successful

bionic-proposed
---------------

1) partial upgrade of n-ovs at 5.35.0-0ubuntu3 [1] and n-api/n-gateway at 5.35.0-0ubuntu1 - instance creation successful

2) partial upgrade of n-api/n-gateway at 5.35.0-0ubuntu3 [1] and n-ovs at 5.35.0-0ubuntu1 - instance creation successful

3) partial upgrade of n-ovs at 5.35.0-0ubuntu2 [1] and n-api/n-gateway at 5.35.0-0ubuntu3 - instance creation failed as expected (see regression potential)

4) partial upgrade of n-api/n-gateway at 5.35.0-0ubuntu3 [1] and n-ovs at 5.35.0-0ubuntu2 - instance creation failed as expected (see regression potential)

5) test all neutron nodes at 5.35.0-0ubuntu3 - instance creation successful

tags: added: verification-done-bionic verification-rocky-done verification-stein-done
removed: verification-needed-bionic verification-queens-needed verification-rocky-needed verification-stein-needed
tags: added: verification-done
removed: verification-needed
tags: added: verification-queens-needed
Corey Bryant (corey.bryant) wrote : Update Released

The verification of the Stable Release Update for python-oslo.messaging has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Corey Bryant (corey.bryant) wrote : Re: [SRU] MessageTimeout and DuplicateMessage errors after udpate

This bug was fixed in the package python-oslo.messaging - 9.5.0-0ubuntu1~cloud2
---------------

 python-oslo.messaging (9.5.0-0ubuntu1~cloud2) bionic-stein; urgency=medium
 .
   * d/p/0001-Use-default-exchange-for-direct-messaging.patch,
     d/p/0002-Cancel-consumer-if-queue-down.patch:
     Removed as it breaks partial upgrades (LP: #1914437).

Corey Bryant (corey.bryant) wrote : Update Released

The verification of the Stable Release Update for python-oslo.messaging has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Corey Bryant (corey.bryant) wrote : Re: [SRU] MessageTimeout and DuplicateMessage errors after udpate

This bug was fixed in the package python-oslo.messaging - 8.1.0-0ubuntu1~cloud2.1
---------------

 python-oslo.messaging (8.1.0-0ubuntu1~cloud2.1) bionic-rocky; urgency=medium
 .
   * d/p/0001-Use-default-exchange-for-direct-messaging.patch,
     d/p/0002-Use-default-exchange-for-direct-messaging.patch,
     d/p/0003-Cancel-consumer-if-queue-down.patch,
     Removed as it breaks partial upgrades (LP: #1914437).

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (python-oslo.messaging/5.35.0-0ubuntu3)

All autopkgtests for the newly accepted python-oslo.messaging (5.35.0-0ubuntu3) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

python-ceilometermiddleware/1.2.0-0ubuntu1 (ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#python-oslo.messaging

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Corey Bryant (corey.bryant) wrote : Re: [SRU] MessageTimeout and DuplicateMessage errors after udpate

queens-proposed tested successfully
-----------------------------------

1) partial upgrade of n-ovs at 5.35.0-0ubuntu3~cloud0 [1] and n-api/n-gateway at 5.35.0-0ubuntu1~cloud0 - instance creation successful
2) partial upgrade of n-api/n-gateway at 5.35.0-0ubuntu1~cloud0 [1] and n-ovs at 5.35.0-0ubuntu3~cloud0 - instance creation successful
3) test all neutron nodes at 5.35.0-0ubuntu3~cloud0 - instance creation successful

tags: added: verification-queens-done
removed: verification-queens-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-oslo.messaging - 5.35.0-0ubuntu3

---------------
python-oslo.messaging (5.35.0-0ubuntu3) bionic; urgency=medium

  * d/p/0001-Use-default-exchange-for-direct-messaging.patch,
    d/p/0002-Use-default-exchange-for-direct-messaging.patch,
    d/p/0003-Cancel-consumer-if-queue-down.patch: Removed as it breaks
    partial upgrades (LP: #1914437).

 -- Chris MacNaughton <email address hidden> Wed, 03 Feb 2021 14:33:27 +0000

Changed in python-oslo.messaging (Ubuntu Bionic):
status: Fix Committed → Fix Released
Corey Bryant (corey.bryant) wrote : Update Released

The verification of the Stable Release Update for python-oslo.messaging has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Corey Bryant (corey.bryant) wrote : Re: [SRU] MessageTimeout and DuplicateMessage errors after udpate

This bug was fixed in the package python-oslo.messaging - 5.35.0-0ubuntu3~cloud0
---------------

 python-oslo.messaging (5.35.0-0ubuntu3~cloud0) xenial-queens; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 python-oslo.messaging (5.35.0-0ubuntu3) bionic; urgency=medium
 .
   * d/p/0001-Use-default-exchange-for-direct-messaging.patch,
     d/p/0002-Use-default-exchange-for-direct-messaging.patch,
     d/p/0003-Cancel-consumer-if-queue-down.patch: Removed as it breaks
     partial upgrades (LP: #1914437).

Changed in oslo.messaging:
status: New → Invalid
Corey Bryant (corey.bryant) wrote :

More details on dates that regression landed:

stein-updates - 9.5.0-0ubuntu1~cloud1 landed 2021-01-14
rocky-updates - not affected (update never made it past rocky-proposed)
queens-updates - not affected (update never made it past queens-proposed)
bionic-updates - 5.35.0-0ubuntu2 landed 2021-01-28

summary: - [SRU] MessageTimeout and DuplicateMessage errors after udpate
+ [SRU] MessageTimeout and DuplicateMessage errors after update