rsyslog connections fail with certificate verification errors after upgrade to 1.24.2

Bug #1474614 reported by Menno Finlay-Smits
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
juju-core
Fix Released
High
Andrew Wilkins
1.24
Fix Released
Critical
Andrew Wilkins

Bug Description

After upgrading a 1.24.0 envrion to 1.24.2 all non-state unit and machine agents are unable to connect to rsyslogd and continually report the following:

2015-07-15 00:41:36 DEBUG juju.worker runner.go:196 "rsyslog" started
2015-07-15 00:41:36 DEBUG juju.worker.rsyslog worker.go:222 making syslog connection for "juju-machine-8" to 10.254.26.3:6514
2015-07-15 00:41:36 DEBUG juju.worker runner.go:203 "rsyslog" done: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "juju-generated CA for environment \"rsyslog\"")
2015-07-15 00:41:36 ERROR juju.worker runner.go:223 exited "rsyslog": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "juju-generated CA for environment \"rsyslog\"")
2015-07-15 00:41:36 INFO juju.worker runner.go:261 restarting "rsyslog" in 3s

On the server side, the rsyslogd logs look like this following rsyslogd's restart after the upgrade:

Jul 15 00:12:26 ip-10-254-26-3 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="8608" x-info="http://www.rsyslog.com"] exiting on signal 15.
Jul 15 00:12:26 ip-10-254-26-3 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="8720" x-info="http://www.rsyslog.com"] start
Jul 15 00:12:26 ip-10-254-26-3 rsyslogd-2307: warning: ~ action is deprecated, consider using the 'stop' statement instead [try http://www.rsyslog.com/e/2307 ]
Jul 15 00:12:26 ip-10-254-26-3 rsyslogd-2221: module 'imuxsock' already in this config, cannot be added
 [try http://www.rsyslog.com/e/2221 ]
Jul 15 00:12:26 ip-10-254-26-3 rsyslogd: rsyslogd's groupid changed to 104
Jul 15 00:12:26 ip-10-254-26-3 rsyslogd: rsyslogd's userid changed to 101
Jul 15 00:12:26 ip-10-254-26-3 rsyslogd-2039: Could no open output pipe '/dev/xconsole': No such file or directory [try http://www.rsyslog.com/e/2039 ]
Jul 15 00:12:27 ip-10-254-26-3 rsyslogd-2089: netstream session 0x7fbd88006490 will be closed due to error
 [try http://www.rsyslog.com/e/2089 ]
Jul 15 00:12:27 ip-10-254-26-3 rsyslogd-2089: netstream session 0x7fbd88005a00 will be closed due to error
 [try http://www.rsyslog.com/e/2089 ]
Jul 15 00:12:27 ip-10-254-26-3 rsyslogd-2089: netstream session 0x7fbd88007f20 will be closed due to error
 [try http://www.rsyslog.com/e/2089 ]
Jul 15 00:12:27 ip-10-254-26-3 rsyslogd-2089: netstream session 0x7fbd880059c0 will be closed due to error
 [try http://www.rsyslog.com/e/2089 ]
...

Rsyslog 2089 error refers to certificate verification issues.

What I did to reproduce (on EC2):

# With 1.24.0
juju bootstrap --upload-tools
juju ensure-availability # I did this, not sure if relevant
juju deploy ubuntu -n 3

# wait for things to stabilise
juju set-env tools-url=https://streams.canonical.com/juju/tools
juju upgrade-juju --version="1.24.2"

Note: I was unable to reproduce the problem doing the same steps (but without HA) using the local provider.

Tags: regression
Ian Booth (wallyworld)
Changed in juju-core:
status: New → Triaged
Revision history for this message
Andrew Wilkins (axwalk) wrote :

FWIW, I repro'd with ensure-availability but could not without ensure-availability. Not sure why that's relevant yet.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

I just repro'd again, but this time it fixed itself up after a short time. I think the difference is that the first time I did it, one of the state servers didn't upgrade, and stayed on 1.24.0 (not sure why, didn't look into it).

So I *think* what is happening is that each of the state servers individually generates a new CA cert/key, because there's not one in /etc/juju/rsyslog. That's because we're not migrating the files from /var/lib/juju to /etc/juju/rsyslog; we only migrate from /var/log/juju to /etc/juju/rsyslog if you upgrade from <=1.23.x to 1.24.2. So each state server generates a new CA cert/key, then publishes to state, then each other one reacts to that and generates new cert/key for rsyslog; they should converge, *if* they all upgrade.

I think we should probably update the upgrade steps so that going from 1.24.0->1.24.2 migrates the existing certs, rather than creating new ones.

Revision history for this message
Andrew Wilkins (axwalk) wrote :
Curtis Hovey (sinzui)
Changed in juju-core:
importance: Critical → High
milestone: none → 1.25.0
Andrew Wilkins (axwalk)
Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
status: In Progress → Triaged
Revision history for this message
Andrew Wilkins (axwalk) wrote :

Forward-porting fix to 1.25 is currently blocked on other changes to the rsyslog worker being forward ported (i.e. the change that the upgrade step is concerned with).

Revision history for this message
Andrew Wilkins (axwalk) wrote :

I've got a PR up that forward-ports all of the required changes as well as the upgrade fix here: https://github.com/juju/juju/pull/2884

Changed in juju-core:
status: Triaged → In Progress
Andrew Wilkins (axwalk)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
Revision history for this message
Felipe Reyes (freyes) wrote :

Hello,

I'm getting this problem using local provider (kvm), I'm using 1.24.5[0], this environment wasn't upgrade, it was deployed directly using 1.24.5, my configuration is:

environments:
    kvm:
        type: local
        default-series: trusty
        container: kvm
        network-bridge: br0
        apt-http-proxy: http://192.168.0.103:3142

I'm attaching all-machines.log and machine-1.log.

==> /var/log/juju/machine-1.log <==
2015-09-08 20:55:00 INFO juju.worker runner.go:269 start "rsyslog"
2015-09-08 20:55:00 DEBUG juju.worker.rsyslog worker.go:106 starting rsyslog worker mode 1 for "machine-1" "freyes-kvm"
2015-09-08 20:55:00 DEBUG juju.worker runner.go:196 "rsyslog" started
2015-09-08 20:55:00 DEBUG juju.worker.rsyslog worker.go:223 making syslog connection for "juju-freyes-kvm-machine-1" to 192.168.0.103:6514
2015-09-08 20:55:00 INFO juju.worker runner.go:275 stopped "rsyslog", err: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "juju-generated CA for environment \"rsyslog\"")
2015-09-08 20:55:00 DEBUG juju.worker runner.go:203 "rsyslog" done: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "juju-generated CA for environment \"rsyslog\"")
2015-09-08 20:55:00 ERROR juju.worker runner.go:223 exited "rsyslog": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "juju-generated CA for environment \"rsyslog\"")
2015-09-08 20:55:00 INFO juju.worker runner.go:261 restarting "rsyslog" in 3s

[0] juju-core:
  Installed: 1.24.5-0ubuntu1~15.04.1~juju1
  Candidate: 1.24.5-0ubuntu1~15.04.1~juju1
  Version table:
 *** 1.24.5-0ubuntu1~15.04.1~juju1 0
        500 http://ppa.launchpad.net/juju/stable/ubuntu/ vivid/main amd64 Packages
        100 /var/lib/dpkg/status

Revision history for this message
Felipe Reyes (freyes) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.