Certificate generated by certupdater worker cannot be used by MongoDB

Bug #1434680 reported by Curtis Hovey
This bug affects 1 person
Affects     Status         Importance   Assigned to          Milestone
juju-core   Fix Released   Critical     Ian Booth
1.22        Fix Released   Critical     Menno Finlay-Smits
1.23        Fix Released   Critical     Ian Booth

Bug Description

Juju CI upgraded all its machines to 1.22.0. All upgrade jobs to 1.23-beta1 and 1.24-alpha1 fail. This looked like fallout from bug 1434070, but we have confirmed that 1.22.0 cannot upgrade to a previously blessed 1.23-beta1 revision. Since 1.21.3 can upgrade, it appears there is something about upgrading 1.22.0 to 1.23+ that is not accounted for.

Curtis Hovey (sinzui) wrote :

I hid the comments with the logs because they might contain confidential information. Engineers can review them and they are also available at http://reports.vapour.ws/releases/2466

Curtis Hovey (sinzui) wrote :

Attached is a redacted machine-0.log

Curtis Hovey (sinzui) wrote :

Attached is a redacted all-machines.log

Curtis Hovey (sinzui) wrote :

This is my log from a bootstrap with packaged 1.22.0 followed by an upgrade to packaged (but not yet public) 1.23-beta1.

Changed in juju-core:
assignee: nobody → Menno Smits (menno.smits)
Menno Finlay-Smits (menno.smits) wrote :

It looks like certificates are getting mixed up somehow. The upgrade is triggered and machine-0 reboots into the new tools version and then it looks like the certificate for API server access is being used for connecting to MongoDB! (or something)

Because the state server can't connect to MongoDB the environment can't come up.

I'll keep digging into the cause.

Menno Finlay-Smits (menno.smits) wrote :

The problem is easy to reproduce with the local provider:

$ /usr/bin/juju bootstrap # where /usr/bin/juju is 1.22.0 from the stable PPA
$ juju upgrade-juju --upload-tools # where juju is 1.23-beta1 or current master

The result is the same: connections to mongodb fail with this error:

juju.mongo open.go:122 TLS handshake failed: x509: certificate is valid for localhost, juju-apiserver, not juju-mongodb

Menno Finlay-Smits (menno.smits) wrote :

Using git bisect, I've found that 3734d91 is the culprit. The change seems like it should be fine, but repeated manual upgrades with and without it demonstrate that it's the problem.

I'm still trying to figure out WHY it's the problem.

Menno Finlay-Smits (menno.smits) wrote :

The root cause is actually fairly convoluted.

Rev 3734d91 exposed the problem but it isn't actually the source. That change makes only a small non-functional cleanup to the juju-db upstart script. However, because the upstart script has changed, jujud writes out a new server.pem and restarts juju-db as it starts up into 1.23 or 1.24.

The issue is that the new server.pem is generated from the same cert and key that the API server uses, and since version 1.22 the certupdater worker keeps the API server cert in sync with state server address changes. It also marks the certificate as valid for the "localhost" and "juju-apiserver" hostnames. Juju's mongodb client connection code expects a certificate for "juju-mongodb", so connections to mongo fail once mongo is using the new certificate file.
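The hostname mismatch can be reproduced outside Juju with Go's standard crypto/x509 package. The following is only a minimal sketch, not Juju code: the helper name selfSignedCert and the "illustrative-server" common name are made up for illustration, and the certificate is self-signed rather than issued by an environment CA. It builds a certificate that lists only "localhost" and "juju-apiserver", then checks it against both server names.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSignedCert is a hypothetical helper (not part of Juju). It returns a
// parsed, self-signed certificate whose DNS subject alternative names are
// exactly dnsNames.
func selfSignedCert(dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "illustrative-server"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		DNSNames:     dnsNames,
	}
	// Passing the template as its own parent makes the certificate self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	// Same hostnames as the certificate described in this comment.
	cert, err := selfSignedCert([]string{"localhost", "juju-apiserver"})
	if err != nil {
		panic(err)
	}
	// A client that dials with ServerName "juju-mongodb" performs the
	// equivalent of the second check during the TLS handshake.
	for _, host := range []string{"juju-apiserver", "juju-mongodb"} {
		if err := cert.VerifyHostname(host); err != nil {
			fmt.Printf("%s: rejected: %v\n", host, err)
		} else {
			fmt.Printf("%s: accepted\n", host)
		}
	}
}
```

The check for "juju-mongodb" fails because that name is not among the certificate's DNS names, which is the same class of error as the "TLS handshake failed" log line quoted earlier in this report.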

Although it is possible to trigger this problem through upgrades, the bug isn't really upgrade related. It is also possible to trigger it with 1.22 alone by making any edit to the juju-db upstart script and restarting jujud.

Updating the ticket title to reflect this.

summary: - 1.22.0 cannot upgrade to 1.23-beta1 or 1.24-alpha1
+ Certificate generated by certupdater worker cannot be used by MongoDB
Menno Finlay-Smits (menno.smits) wrote :

The fix for 1.22 was committed in 317ffb1b23f929e28942a32d032c7b99268fb533.

Menno Finlay-Smits (menno.smits) wrote :

The fix for 1.23 and 1.24 (master) is a little more complicated because if the upgrade is coming from 1.22.0 then the certificate in the agent config is already going to be wrong when jujud starts, preventing connections to mongodb and preventing the upgrade from completing.

wallyworld and I have discussed adding some code that runs when the agent's config is first loaded which will fix the cert at that time so that connections to mongodb can work.

There's a proof of concept of how this could work here: http://paste.ubuntu.com/10657968/
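As a rough illustration of the idea (this is neither the linked proof of concept nor the change that was eventually committed), a check at config-load time could look something like the sketch below. All names here (mongoServerName, issueCertPEM, parseCertPEM, repairedNames) are hypothetical. The sketch also self-signs the certificate to stay self-contained, whereas a real fix would need to re-issue the certificate from the environment's CA cert and key.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// mongoServerName is the hostname the mongodb client expects, taken from
// the error message in this report.
const mongoServerName = "juju-mongodb"

// issueCertPEM is a hypothetical stand-in for certificate generation. It
// self-signs for brevity; that is an assumption of this sketch only.
func issueCertPEM(dnsNames []string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(now.UnixNano()),
		Subject:      pkix.Name{CommonName: "illustrative-server"},
		NotBefore:    now.Add(-time.Hour),
		NotAfter:     now.Add(24 * time.Hour),
		DNSNames:     dnsNames,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// parseCertPEM decodes the first PEM block and parses it as a certificate.
func parseCertPEM(certPEM []byte) (*x509.Certificate, error) {
	block, _ := pem.Decode(certPEM)
	if block == nil || block.Type != "CERTIFICATE" {
		return nil, errors.New("no CERTIFICATE PEM block found")
	}
	return x509.ParseCertificate(block.Bytes)
}

// repairedNames returns the DNS names a replacement certificate should
// carry, and whether a replacement is needed at all.
func repairedNames(cert *x509.Certificate) ([]string, bool) {
	if cert.VerifyHostname(mongoServerName) == nil {
		return cert.DNSNames, false
	}
	names := append([]string{}, cert.DNSNames...)
	return append(names, mongoServerName), true
}

func main() {
	// Simulate the certificate left in the agent config by 1.22.0.
	oldPEM, err := issueCertPEM([]string{"localhost", "juju-apiserver"})
	if err != nil {
		panic(err)
	}
	oldCert, err := parseCertPEM(oldPEM)
	if err != nil {
		panic(err)
	}
	names, changed := repairedNames(oldCert)
	fmt.Println("needs repair:", changed)
	if changed {
		newPEM, err := issueCertPEM(names)
		if err != nil {
			panic(err)
		}
		newCert, err := parseCertPEM(newPEM)
		if err != nil {
			panic(err)
		}
		fmt.Println("repaired cert accepted for juju-mongodb:",
			newCert.VerifyHostname(mongoServerName) == nil)
	}
}
```

Keeping the existing names and only appending the missing one means the same certificate would still be accepted by API clients that expect "juju-apiserver".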

Ian Booth (wallyworld)
Changed in juju-core:
assignee: Menno Smits (menno.smits) → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
Ian Booth (wallyworld) wrote :

Several upgrade tests that previously failed have now passed in CI, so marking as Fix Released.

Changed in juju-core:
status: Fix Committed → Fix Released
Aaron Bentley (abentley) wrote :