Looping config-changed hooks in fresh juju-core 1.24.3 Openstack deployment

Bug #1478024 reported by Peter Sabaini
This bug affects 3 people
Affects     Status         Importance   Assigned to     Milestone
juju-core   Fix Released   High         William Reade
1.24        Fix Released   Critical     William Reade

Bug Description

Testing our standard Openstack-HA deployment with juju-core 1.24.3 results in config-changed hooks running continuously, even after letting the deployment settle for more than 2 hours.

Top output:

ubuntu@apollo:~$ top -b -n1 | head -n20
top - 16:15:26 up 8:46, 2 users, load average: 5.77, 6.60, 6.59
Tasks: 843 total, 3 running, 840 sleeping, 0 stopped, 0 zombie
%Cpu(s): 33.0 us, 7.8 sy, 0.0 ni, 58.4 id, 0.3 wa, 0.0 hi, 0.5 si, 0.0 st
KiB Mem: 32908308 total, 32003636 used, 904672 free, 584024 buffers
KiB Swap: 8388604 total, 30100 used, 8358504 free. 21018196 cached Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
1383041 root      20   0  140392  56372   5776 R  92.9  0.2   0:48.82 config-changed
   3495 syslog    20   0 1364728  27108   2424 S  24.8  0.1  55:00.82 rsyslogd
  12128 root      20   0 5914920 1.597g 1.494g S  24.8  5.1  84:07.66 mongod
  85221 root      20   0  544600  27516  14584 S  12.4  0.1   1:33.09 jujud
1420856 root      20   0  115404  27616   5784 S  12.4  0.1   0:01.47 config-changed
   5501 root      20   0  544216  25000  14592 S   6.2  0.1   0:45.19 jujud
  12186 root      20   0 1807572 688820  19452 S   6.2  2.1  16:51.11 jujud
  12449 root      20   0  543000  23860  14480 S   6.2  0.1   0:17.80 jujud
  16968 root      20   0  544600  25956  14640 S   6.2  0.1   2:18.24 jujud
  18693 root      20   0  544600  24096  14652 S   6.2  0.1   1:48.54 jujud
  30913 root      20   0  610264  26240  14620 S   6.2  0.1   5:15.93 jujud
 169544 root      20   0  477464  24732  14584 S   6.2  0.1   0:51.15 jujud
 169545 root      20   0  477464  24588  14520 S   6.2  0.1   0:57.52 jujud

/var/log/juju from machine0 is at chinstrap:/home/sabaini/juju-1.24.3-test

Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.24.4
Curtis Hovey (sinzui) wrote :

@peter
chinstrap:/home/sabaini/juju-1.24.3-test is empty. Can you upload the all-machines.log?

Peter Sabaini (peter-sabaini) wrote :

Oy vey, operator_error. I've moved the logfile into the right place now. (I'm on the phone right now, so uploading won't work, sorry.)

Changed in juju-core:
milestone: 1.24.4 → 1.25.0
importance: High → Critical
Menno Finlay-Smits (menno.smits) wrote :

I've only had a preliminary look through the logs, but what appears to be happening in each unit is that the uniter frequently dies with "leadership failure: unable to make a leadership claim: worker stopped". It then restarts, which triggers a new "config-changed" hook.

I think the uniter failures are caused by these regular errors on the state servers:

...
2015-07-23 16:01:35 DEBUG juju.worker runner.go:196 "lease manager" started
2015-07-23 16:01:56 DEBUG juju.cmd.jujud machine.go:1592 worker "lease manager" exited with writing lease token: could not add token "neutron-openvswitch/0" to data-store: simultaneous lease updates occurred
2015-07-23 16:01:56 INFO juju.worker runner.go:275 stopped "lease manager", err: writing lease token: could not add token "neutron-openvswitch/0" to data-store: simultaneous lease updates occurred
2015-07-23 16:01:56 DEBUG juju.worker runner.go:203 "lease manager" done: writing lease token: could not add token "neutron-openvswitch/0" to data-store: simultaneous lease updates occurred
2015-07-23 16:01:56 ERROR juju.worker runner.go:223 exited "lease manager": writing lease token: could not add token "neutron-openvswitch/0" to data-store: simultaneous lease updates occurred
2015-07-23 16:01:56 INFO juju.worker runner.go:261 restarting "lease manager" in 3s
2015-07-23 16:01:59 INFO juju.worker runner.go:269 start "lease manager"
2015-07-23 16:01:59 DEBUG juju.worker runner.go:196 "lease manager" started
...

These happen at a similar frequency to the uniter failures in the units.

It is somewhat likely that this issue is also the root cause of bug 1478232.

Menno Finlay-Smits (menno.smits) wrote :

This might also explain bug 1474588.

Curtis Hovey (sinzui)
tags: added: leadership upgrade-juju
Curtis Hovey (sinzui)
tags: added: blocker
Curtis Hovey (sinzui)
tags: removed: blocker
Curtis Hovey (sinzui)
Changed in juju-core:
importance: Critical → High
status: Triaged → Fix Committed
assignee: nobody → William Reade (fwereade)
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released