Rabbit password is reset on every upgrade which forces lockstep cluster restarts

Bug #1300507 reported by Julian Edwards
This bug affects 13 people
Affects: maas (Ubuntu)
Status: Fix Released
Importance: Critical
Assigned to: Greg Lutostanski
Milestone: (none)

Bug Description

Every time the maas region controller package is updated it updates the rabbit password.

This means remote clusters are no longer able to connect to rabbit until they are restarted and re-receive credentials.

The packaging *must not* force lockstep upgrades like this; data centres will want to do rolling upgrades.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

I consider this a critical bug, but I cannot set the priority on here.

Revision history for this message
Robie Basak (racb) wrote :

Setting Critical for Julian.

Changed in maas (Ubuntu):
importance: Undecided → Critical
Revision history for this message
Andres Rodriguez (andreserl) wrote :

@Julian,

As we previously discussed, the region needs to be able to tell the cluster about the updated password so that the cluster keeps making requests to rabbitmq without having to be restarted manually. This needs to be fixed in MAAS core regardless of whether the maas packaging changes the password on every upgrade.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in maas (Ubuntu):
status: New → Confirmed
Revision history for this message
Julian Edwards (julian-edwards) wrote :

Andres, I think we're going to have to agree to disagree on this one then.

Changing the password during installation is about the worst time it can be done as there is going to be some instability anyway. I do agree that changes need to be conveyed to the clusters, but if the change is made outside of MAAS's control I don't see how it can know and react unless MAAS has a region controller hook for the packaging to call. It will be quite a lot of work to implement this compared to the simple packaging fix that can be made in the short term.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Why is the rabbit password changed anyway?

Revision history for this message
Julian Edwards (julian-edwards) wrote : Re: [Bug 1300507] Re: Rabbit password is reset on every upgrade which forces lockstep cluster restarts

On Tuesday 22 Jul 2014 10:47:09 you wrote:
> Why is the rabbit password changed anyway?

Absolutely no idea, it seems pointless but Andres wrote that packaging code so
he may be able to explain.

Revision history for this message
Andres Rodriguez (andreserl) wrote :

The MAAS config file gets changed by the packaging because at the time MAAS did not support conf.d/ (and it does not currently support it either). The packaging updates the config file (which it actually shouldn't be doing, but it was the only way of solving the problem).

The problem was that we were shipping a new config file quite often, which meant that the new file had to be installed in place of the older one, causing upgrades to fail because there was no way to obtain the old password. This is not a simple packaging fix; at least, at the time it wasn't, and it required lots of hacky things (since we were doing things we weren't supposed to do by policy anyway).

Now, as I have expressed before, the Region needs to be able to notify the Clusters about its password changes. It doesn't matter who changes the password here, whether it is the user directly or the packaging, the issue still remains and this should be fixed in MAAS and not just go for a quick fix in packaging. This is a bug in MAAS.

Changed in maas:
status: New → Confirmed
importance: Undecided → Critical
Revision history for this message
Gavin Panella (allenap) wrote :

RabbitMQ will be going away this cycle, so we should avoid investing a lot of time in an engineering fix for this.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Precisely why I was advocating a quick packaging fix. :)

Revision history for this message
David Britton (dpb) wrote :

Saw this as well, upgrading from the latest in trusty to 1.6b6.

tags: added: landscape
tags: added: cloud-installer
Revision history for this message
David Britton (dpb) wrote :

I just hit it again -- on upgrading from 1.6rc1

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Same here on another machine. Rabbit was full of these:
=ERROR REPORT==== 12-Aug-2014::19:18:59 ===
closing AMQP connection <0.4488.0> (10.96.0.10:36895 -> 10.96.0.10:5672):
{handshake_error,starting,0,
                 {amqp_error,access_refused,
                             "AMQPLAIN login refused: user 'maas_workers' - invalid credentials",
                             'connection.start_ok'}}

And celery.log was full of these:
[2014-08-12 19:06:55,739: ERROR/MainProcess] consumer: Cannot connect to amqp://maas_workers@10.96.0.10:5672//maas_workers: [Errno 104] Connection reset by peer.

Revision history for this message
Greg Lutostanski (lutostag) wrote :

Found where it happens:
debian/maas-region-controller.postinst:95

Should only happen when creating a new user -- will have to find out why the region-controller does not think it has a user already.

Will dive in further.
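
To illustrate the "only when creating a new user" behaviour described above, here is a minimal Python sketch. It is not the code from debian/maas-region-controller.postinst (which is a shell script); the function name `ensure_password`, the `rabbit_password` key, and the dict standing in for the stored configuration are all hypothetical. The idea is simply that an upgrade reuses a stored password and only generates one when none exists.

```python
import secrets


def ensure_password(config, key="rabbit_password"):
    """Return (password, created).

    Hypothetical helper: `config` is a plain dict standing in for
    whatever store the packaging uses. An existing password is reused;
    a new one is generated only when none is stored yet.
    """
    existing = config.get(key)
    if existing:
        # Upgrade path: keep the credentials the clusters already hold.
        return existing, False
    # Fresh install path: generate and remember a new password.
    password = secrets.token_hex(16)
    config[key] = password
    return password, True


if __name__ == "__main__":
    store = {}
    first, created_first = ensure_password(store)    # fresh install
    second, created_second = ensure_password(store)  # later upgrade
    print(created_first, created_second, first == second)
```

With this shape, running the step again on upgrade leaves the password untouched, so remote clusters would not need to re-receive credentials.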

Changed in maas (Ubuntu):
assignee: nobody → Greg Lutostanski (lutostag)
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package maas - 1.6.1+bzr2550-0ubuntu1

---------------
maas (1.6.1+bzr2550-0ubuntu1) utopic; urgency=medium

  * New upstream bugfix release:
    - Auto-link node MACs to Networks (LP: #1341619)

  [ Julian Edwards ]
  * debian/maas-region-controller.postinst: Don't restart RabbitMQ on
    upgrades, just ensure it's running. Should prevent a race with the
    cluster celery restarting.
  * debian/rules: Pull upstream branch from the right place.

  [ Andres Rodriguez ]
  * debian/maas-region-controller.postinst: Ensure cluster celery is
    started if it also runs on the region.
 -- Julian Edwards <email address hidden> Thu, 21 Aug 2014 18:38:27 +1000

Changed in maas (Ubuntu):
status: Confirmed → Fix Released
no longer affects: maas
Revision history for this message
neel (neel-basu-z) wrote :

I am getting an error with Ubuntu 14.04 Server installed from a downloaded ISO.

[2014-09-26 12:37:35,356: ERROR/Beat] beat: Connection error: timed out. Trying again in 32.0 seconds...
[2014-09-26 12:37:35,357: ERROR/MainProcess] consumer: Cannot connect to amqp://maas_workers@192.168.250.140:5672//maas_workers: timed out.
Trying again in 32.00 seconds...

[2014-09-26 12:38:11,403: ERROR/MainProcess] consumer: Cannot connect to amqp://maas_workers@192.168.250.140:5672//maas_workers: timed out.
Trying again in 32.00 seconds...

[2014-09-26 12:38:11,403: ERROR/Beat] beat: Connection error: timed out. Trying again in 32.0 seconds...

It all started once I started downloading pxe boot images with sudo -E maas-import-pxe-files

Is it related to this bug? What is the workaround?

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Hi neel,

Unlikely. A timeout usually means a network connectivity problem. If it were a password problem, you would get an immediate error and it would say the credentials are incorrect.
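
As a rough aid for telling the two cases apart, the following Python sketch sorts a log line into a category based only on the messages quoted in this thread. The function name `classify_broker_error` and the category labels are hypothetical, and the substring matching is an assumption about what the logs look like, not an exhaustive rule.

```python
def classify_broker_error(line):
    """Guess the kind of broker connection failure from one log line.

    Hypothetical helper based only on the messages quoted in this bug:
    - "invalid credentials" / "access_refused": the broker rejected the login
    - "connection reset by peer": seen on the celery side in this bug when
      the broker refused the login, so check the RabbitMQ log as well
    - "timed out": more likely a network connectivity problem
    """
    text = line.lower()
    if "invalid credentials" in text or "access_refused" in text:
        return "credentials"
    if "connection reset by peer" in text:
        return "reset"
    if "timed out" in text:
        return "timeout"
    return "unknown"


if __name__ == "__main__":
    sample = ("consumer: Cannot connect to "
              "amqp://maas_workers@192.168.250.140:5672//maas_workers: timed out.")
    print(classify_broker_error(sample))
```

A "timeout" result points towards connectivity (firewall, address, broker not listening) rather than this bug; "credentials" or "reset" right after an upgrade is the pattern reported here.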

Revision history for this message
Tuomas Heino (iheino+ub) wrote :

FYI neel, the current version in 14.04.1 LTS (1.5.4+bzr2294-0ubuntu1.1) does not seem to include a fix for this. A backport (or SRU?) would be nice to have. Or a ReleaseNotes entry at least.

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

We'll SRU 1.7 once it's super-solid!

Mark

tags: added: verification-done