afterHook logic skipped if hook is missing

Bug #1450917 reported by Curtis Hovey
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
juju-core
Fix Released
Critical
Ian Booth
1.24
Fix Released
Critical
Ian Booth

Bug Description

As seen in
     http://reports.vapour.ws/releases/2592
the maas-1_7-deployer job cannot complete. For 1.23 and 1.22, it completed in less than 12 minutes, but for 1.24, it failed to complete in 30 minutes. The agents say they are in pending, which isn't really a deployer issue. We know that other deploy and uprade jobs with the same version of juju passed. as do other jujus pass this test.

The root cause is that if a charm does not implement a hook, it is correctly skipped by the uniter, but the after hook logic is not run. The means that in the case of the landscape charms, the logic to update the workload status from maintenance is never run.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: none → 1.25.0
Revision history for this message
Ian Booth (wallyworld) wrote :

The units that are still shown as pending as recorded as still being in the process of running the install hook. It is not yet clear why the install hook does not complete for those landscape units. Can we get the unit log files of the landscape units to see what might be happening? Could we run status-history on the failing units (in addition to just status) when the deployment fails?

    units:
      landscape/0:
        workload-status:
          current: maintenance
          message: installing charm software
          since: 01 May 2015 18:59:33 UTC
        agent-status:
          current: idle
          since: 01 May 2015 19:01:55 UTC
          version: 1.24-alpha1
        agent-state: pending
        agent-version: 1.24-alpha1
        machine: "3"
        public-address: maas-node-112.maas

Revision history for this message
Ian Booth (wallyworld) wrote :

Looking at the install hook for the landscape charm, all it does is apt-get some packages. So it's possible that is blocking because something else has the apt lock?

Revision history for this message
Ian Booth (wallyworld) wrote :

The deployer logs show that the landscape charm is being installed to machine 3.

2015-05-01 18:59:33 [DEBUG] deployer.env: Delta machine: 3 change:pending
2015-05-01 18:59:33 [DEBUG] deployer.env: Delta machine: 3 change:started
2015-05-01 18:59:33 [DEBUG] deployer.env: Delta unit: landscape/0 change:pending
2015-05-01 18:59:38 [DEBUG] deployer.env: Delta unit: landscape/0 change:pending
2015-05-01 18:59:38 [DEBUG] deployer.env: Delta service: landscape change:{u'Err': None, u'Since': u'2015-05-01T18:59:33Z', u'Current': u'maintenance', u'Version': u'', u'Message': u'installing charm software', u'Data': {}}
2015-05-01 18:59:38 [DEBUG] deployer.env: Delta unit: landscape/0 change:pending
2015-05-01 18:59:38 [DEBUG] deployer.env: Delta unit: landscape/0 change:pending
2015-05-01 19:01:48 [DEBUG] deployer.env: Delta unit: landscape/0 change:pending
2015-05-01 19:01:53 [DEBUG] deployer.env: Delta unit: landscape/0 change:pending
2015-05-01 19:01:58 [DEBUG] deployer.env: Delta unit: landscape/0 change:pending
Traceback (most recent call last):
  File "/usr/bin/juju-deployer", line 9, in <module>

Machine 3 has started at 18:59:33 and then the landscape charm starts installing. From that point it appears no more status updates are received by the deployer for that charm. This needs to be investigated.

Revision history for this message
Ian Booth (wallyworld) wrote :

I deployed the bundle on AWS and as per status-history, the landscape charm does run the install and start hooks. But unlike the other charms in the bundle, the transition from installing to started does not occur. It's only the landscape charms that are affected.

$ juju status-history landscape/0
TIME TYPE STATUS MESSAGE
02 May 2015 10:57:06 AEST workload unknown Waiting for agent initialization to finish
02 May 2015 10:57:06 AEST agent allocating
02 May 2015 10:59:08 AEST agent executing running install hook
02 May 2015 10:59:08 AEST workload maintenance installing charm software
02 May 2015 11:03:37 AEST agent executing running leader-elected hook
02 May 2015 11:03:37 AEST agent executing running config-changed hook
02 May 2015 11:03:44 AEST agent executing running start hook
02 May 2015 11:03:47 AEST agent idle

$ juju status
...
[Units]
ID WORKLOAD-STATE AGENT-STATE VERSION MACHINE PORTS PUBLIC-ADDRESS MESSAGE
apache2/0 unknown idle 1.24-alpha1.1 1 54.166.254.96
haproxy/0 unknown idle 1.24-alpha1.1 2 54.197.123.224
landscape-msg/0 maintenance idle 1.24-alpha1.1 4 54.160.126.130 installing charm software
...

Revision history for this message
Ian Booth (wallyworld) wrote :

These lines from the landscape unit log provide the vital clue. If the start hook is not implemented by the charm, the uniter never runs the logic to update the workload status. We expect to see

INFO juju.worker.uniter.operation runhook.go:115 ran "start" hook

But because the above is not there, the vital logic is skipped.

2015-05-02 01:03:44 INFO juju.worker.uniter modes.go:124 found queued "start" hook
2015-05-02 01:03:44 INFO juju.worker.uniter.operation executor.go:66 running operation run start hook
2015-05-02 01:03:44 INFO juju.worker.uniter.operation executor.go:87 preparing operation "run start hook"
2015-05-02 01:03:44 INFO juju.worker.uniter.operation executor.go:87 executing operation "run start hook"
2015-05-02 01:03:44 DEBUG juju.worker.uniter modes.go:31 [AGENT-STATUS] executing running start hook
2015-05-02 01:03:45 INFO juju.worker.uniter.context runner.go:155 skipped "start" hook (not implemented)
2015-05-02 01:03:45 INFO juju.worker.uniter.context context.go:444 handling reboot
2015-05-02 01:03:45 INFO juju.worker.uniter.operation runhook.go:121 skipped "start" hook (missing)
2015-05-02 01:03:45 INFO juju.worker.uniter.operation executor.go:87 committing operation "run start hook"

Ian Booth (wallyworld)
summary: - deployer on maas 1.7 cannot complete
+ afterHook logic skipped if hook is missing
description: updated
Ian Booth (wallyworld)
Changed in juju-core:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.