deployer sometimes fails with a unit status not found error

Bug #1451283 reported by Ian Booth
32
This bug affects 6 people
Affects Status Importance Assigned to Milestone
juju-core
Fix Released
High
Ian Booth
1.24
Fix Released
High
Ian Booth

Bug Description

Seems to be intermittent.
As reported in http://reports.vapour.ws/releases/2597/job/aws-deployer-bundle/attempt/283

2015-05-03 14:25:57 [DEBUG] deployer.import: Waiting for deploy delay
2015-05-03 14:26:12 [DEBUG] deployer.import: Adding units...
Traceback (most recent call last):
  File "/usr/bin/juju-deployer", line 9, in <module>
    load_entry_point('juju-deployer==0.4.3', 'console_scripts', 'juju-deployer')()
  File "/usr/lib/python2.7/dist-packages/deployer/cli.py", line 130, in main
    run()
  File "/usr/lib/python2.7/dist-packages/deployer/cli.py", line 228, in run
    importer.Importer(env, deployment, options).run()
  File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 202, in run
    self.add_units()
  File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 22, in add_units
    env_status = self.env.status()
  File "/usr/lib/python2.7/dist-packages/deployer/env/go.py", line 203, in status
    return self.client.get_stat()
  File "/usr/lib/python2.7/dist-packages/jujuclient.py", line 599, in get_stat
    return StatusTranslator().run(watch)
  File "/usr/lib/python2.7/dist-packages/jujuclient.py", line 991, in run
    change_set = watch.next()
  File "/usr/lib/python2.7/dist-packages/jujuclient.py", line 211, in next
    'Id': self.watcher_id})
  File "/usr/lib/python2.7/dist-packages/jujuclient.py", line 148, in _rpc
    raise EnvError(result)
jujuclient.EnvError: <Env Error - Details:
 { u'Error': u'status for key "u#python-django/0" not found',
    u'ErrorCode': u'not found',
    u'RequestId': 1,
    u'Response': { }}

Status snippet:

  python-django:
    charm: cs:trusty/python-django-12
    exposed: false
    service-status:
      current: unknown
      message: Waiting for agent initialization to finish
      since: 03 May 2015 14:25:50 UTC
    units:
      python-django/0:
        workload-status:
          current: unknown
          message: Waiting for agent initialization to finish
          since: 03 May 2015 14:25:50 UTC
        agent-status:
          current: allocating
          since: 03 May 2015 14:25:50 UTC
        agent-state: pending
        machine: 2/lxc/0

Ian Booth (wallyworld)
Changed in juju-core:
status: Triaged → In Progress
Revision history for this message
Ian Booth (wallyworld) wrote :

Marking down to High because the last 2 CI runs passed. There's still potentially an intermittent issue though which needs to be investigated.

Changed in juju-core:
importance: Critical → High
Curtis Hovey (sinzui)
summary: - depployer sometimes fails with a unit status not found error
+ deployer sometimes fails with a unit status not found error
tags: added: intermittent-failure
Revision history for this message
Liam Young (gnuoy) wrote :

I can reliable reproduce this when deploying Openstack, unfortunately the bundle is pretty big and when I've tried to reproduce with a cut down bundle I've failed.

Revision history for this message
Liam Young (gnuoy) wrote :

Traceback (most recent call last):
  File "/usr/bin/juju-deployer", line 9, in <module>
    load_entry_point('juju-deployer==0.4.3', 'console_scripts', 'juju-deployer')()
  File "/usr/lib/python2.7/dist-packages/deployer/cli.py", line 130, in main
  File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 22, in add_units
    env_status = self.env.status()
  File "/usr/lib/python2.7/dist-packages/deployer/env/go.py", line 203, in status
    return self.client.get_stat()
  File "/usr/lib/python2.7/dist-packages/jujuclient.py", line 955, in get_stat
    return StatusTranslator().run(watch)
  File "/usr/lib/python2.7/dist-packages/jujuclient.py", line 2017, in run
    change_set = next(watch)
  File "/usr/lib/python2.7/dist-packages/jujuclient.py", line 353, in next
    'Id': self.watcher_id})
  File "/usr/lib/python2.7/dist-packages/jujuclient.py", line 288, in _rpc
    raise EnvError(result)
jujuclient.EnvError: <Env Error - Details:
 { u'Error': u'status for key "u#glance/2" not found',
    u'ErrorCode': u'not found',
    u'RequestId': 1,
    u'Response': { }}
 >
Status: 1
Traceback (most recent call last):
  File "/usr/bin/mojo", line 9, in <module>
    load_entry_point('mojo==0.1.8', 'console_scripts', 'mojo')()
  File "/usr/lib/python2.7/dist-packages/mojo/cli.py", line 570, in main
    run_with_args(args, ap)
  File "/usr/lib/python2.7/dist-packages/mojo/cli.py", line 539, in run_with_args
    args.func(args)
  File "/usr/lib/python2.7/dist-packages/mojo/cli.py", line 300, in run_from_manifest
    manifest.run(project, workspace, series, stage)
  File "/usr/lib/python2.7/dist-packages/mojo/__init__.py", line 266, in run
    auto_secrets=auto_secrets)
  File "/usr/lib/python2.7/dist-packages/mojo/phase.py", line 434, in run
    raise JujuDeployerException
mojo.phase.JujuDeployerException

Revision history for this message
Liam Young (gnuoy) wrote :
Revision history for this message
Stuart Bishop (stub) wrote :

Bug #1437266 also has the crazy unit names containing #, and is reproducible with juju-deployer.

Revision history for this message
Ian Booth (wallyworld) wrote :

Thanks for the logs. Unfortunately the logging level is WARNING so not all of the api calls are logged. It would be great if the --debug option could be used with bootstrap and logs provided. Or even add this:

logging-config: <root>=TRACE

to the environment.yaml. The trace option will provide a lot of detail but if done as a one off when this issue occurs, would be very helpful.

Revision history for this message
Liam Young (gnuoy) wrote :

I've redeployed and this time deployer exploded on ceilometer/1

2015-05-12 05:19:46 DEBUG juju.utils.ssh ssh.go:249 using OpenSSH ssh client
  File "/usr/lib/python2.7/dist-packages/deployer/cli.py", line 228, in run
    importer.Importer(env, deployment, options).run()
  File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 202, in run
    self.add_units()
  File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 22, in add_units
    env_status = self.env.status()
  File "/usr/lib/python2.7/dist-packages/deployer/env/go.py", line 203, in status
    return self.client.get_stat()
  File "/usr/lib/python2.7/dist-packages/jujuclient.py", line 955, in get_stat
    return StatusTranslator().run(watch)
  File "/usr/lib/python2.7/dist-packages/jujuclient.py", line 2017, in run
    change_set = next(watch)
  File "/usr/lib/python2.7/dist-packages/jujuclient.py", line 353, in next
    'Id': self.watcher_id})
  File "/usr/lib/python2.7/dist-packages/jujuclient.py", line 288, in _rpc
    raise EnvError(result)
jujuclient.EnvError: <Env Error - Details:
 { u'Error': u'status for key "u#ceilometer/1" not found',
    u'ErrorCode': u'not found',
    u'RequestId': 1,
    u'Response': { }}
 >
Status: 1

I'll attach logs

Revision history for this message
Liam Young (gnuoy) wrote :
Revision history for this message
Liam Young (gnuoy) wrote :
Revision history for this message
Liam Young (gnuoy) wrote :

Once this error has occurred rerunning the juju-deployer deployment command will end in the same stack trace. If I grep user-admin@local from the all-machines.log while issuing the deployer command I see:

machine-0: 2015-05-12 06:47:50 DEBUG juju.apiserver apiserver.go:287 -> [64A] user-admin@local 454.218708ms {"RequestId":0,"Response":"'body redacted'"} Admin[""].Login
machine-0: 2015-05-12 06:47:50 DEBUG juju.apiserver apiserver.go:287 -> [64B] user-admin@local 469.229133ms {"RequestId":0,"Response":"'body redacted'"} Admin[""].Login
machine-0: 2015-05-12 06:47:50 DEBUG juju.apiserver apiserver.go:273 <- [64B] user-admin@local {"RequestId":0,"Type":"Client","Request":"WatchAll","Params":"'params redacted'"}
machine-0: 2015-05-12 06:47:50 DEBUG juju.apiserver apiserver.go:287 -> [64B] user-admin@local 1.642976ms {"RequestId":0,"Response":"'body redacted'"} Client[""].WatchAll
machine-0: 2015-05-12 06:47:50 DEBUG juju.apiserver apiserver.go:273 <- [64B] user-admin@local {"RequestId":1,"Type":"AllWatcher","Id":"1","Request":"Next","Params":"'params redacted'"}
machine-0: 2015-05-12 06:47:50 DEBUG juju.apiserver apiserver.go:287 -> [64B] user-admin@local 142.503us {"RequestId":1,"Error":"status for key \"u#ceilometer/1\" not found","ErrorCode":"not found","Response":"'body redacted'"} AllWatcher["1"].Next
machine-0: 2015-05-12 06:47:51 INFO juju.apiserver apiserver.go:296 [64B] user-admin@local API connection terminated after 611.092033ms
machine-0: 2015-05-12 06:47:51 INFO juju.apiserver apiserver.go:296 [64A] user-admin@local API connection terminated after 1.124130145s

Revision history for this message
Liam Young (gnuoy) wrote :

On piece of key info I missed out was that I do not see this error with juju 1.23.2. It's only since I've been usint 1.24

David Britton (dpb)
tags: added: landscape
Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
David Britton (dpb)
tags: added: kanban-cross-team
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.