subordinate service not appearing in watch output

Bug #1421315 reported by Greg Lutostanski
This bug affects 2 people
Affects        Status        Importance  Assigned to  Milestone
juju-core      Fix Released  Medium      Unassigned
juju-deployer  In Progress   Wishlist    Adam Israel

Bug Description

When running a deployment we sometimes see the following:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/oil_ci/deploy/oil_deployer.py", line 140, in deploy
    debug=True, verbose=True)
  File "/usr/lib/python2.7/dist-packages/oil_ci/juju/juju_deployer.py", line 85, in run_deployer
    importer.Importer(env, deploy, options=opts).run()
  File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 202, in run
    self.add_units()
  File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 26, in add_units
    cur_units = len(env_status['services'][svc.name].get('units', ()))
KeyError: 'ceilometer-agent'
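
The failing line indexes env_status['services'] directly by service name, so a service that is missing from the status snapshot raises KeyError instead of being reported as "not visible yet". A minimal sketch of a more defensive lookup, assuming env_status is a plain dict shaped like the one in the traceback; count_units is a hypothetical helper, not the deployer's actual code or fix:

```python
from typing import Any, Dict, Optional


def count_units(env_status: Dict[str, Any], svc_name: str) -> Optional[int]:
    """Return the unit count for svc_name, or None if the service is not
    present in this status snapshot (hypothetical helper)."""
    services = env_status.get('services', {})
    svc_status = services.get(svc_name)
    if svc_status is None:
        # The service has not shown up yet (e.g. a subordinate the watch
        # has not reported). Returning None lets the caller wait and retry
        # instead of crashing with KeyError.
        return None
    # Subordinate services may have no 'units' key of their own.
    return len(svc_status.get('units', ()))


status = {'services': {'nova-compute': {'units': {'nova-compute/0': {}}}}}
print(count_units(status, 'nova-compute'))
print(count_units(status, 'ceilometer-agent'))
```

Returning None rather than 0 matters here: treating a not-yet-visible subordinate as "zero units" could make a caller try to add units to it.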

Will paste deployment yaml and all output in a comment.

Greg Lutostanski (lutostag) wrote:

Following are the yaml and output. I will try to reproduce it outside so we can actively poke at it, but we don't hit this error very often, so I am not hopeful that I will be successful every time.

oil_deployment:
  overrides:
    openstack-origin: cloud:trusty-juno
    source: cloud:trusty-updates/juno
  relations:
  - - keystone
    - mysql
  - - nova-cloud-controller
    - glance
  - - nova-cloud-controller
    - keystone
  - - nova-cloud-controller
    - mysql
  - - nova-cloud-controller
    - rabbitmq-server
  - - glance
    - keystone
  - - glance
    - mysql
  - - cinder
    - glance
  - - cinder
    - keystone
  - - cinder
    - nova-cloud-controller
  - - cinder
    - mysql
  - - cinder
    - rabbitmq-server
  - - openstack-dashboard
    - keystone
  - - heat
    - mysql
  - - heat
    - rabbitmq-server
  - - heat
    - keystone
  - - ceilometer:identity-service
    - keystone:identity-service
  - - ceilometer
    - rabbitmq-server
  - - ceilometer
    - mongodb
  - - ceilometer-agent
    - ceilometer
  - - neutron-gateway
    - mysql
  - - neutron-gateway
    - nova-cloud-controller
  - - nova-compute:amqp
    - rabbitmq-server:amqp
  - - nova-compute
    - nova-cloud-controller
  - - nova-compute
    - glance
  - - neutron-api
    - mysql
  - - neutron-api
    - rabbitmq-server
  - - neutron-api
    - nova-cloud-controller
  - - neutron-api
    - neutron-openvswitch
  - - neutron-api
    - keystone
  - - neutron-openvswitch
    - nova-compute
  - - neutron-openvswitch
    - rabbitmq-server
  - - swift-proxy
    - keystone
  - - swift-proxy
    - swift-storage
  - - ceilometer-agent
    - nova-compute
  - - swift-proxy
    - glance
  - - ceph
    - cinder
  - - ceph
    - nova-compute
  series: trusty
  services:
    ceilometer:
      branch: lp:charms/ceilometer
      to:
      - lxc:nova-compute=0
    ceilometer-agent:
      branch: lp:charms/ceilometer-agent
    ceph:
      branch: lp:~lutostag/charms/trusty/ceph/fix-six-bug
      num_units: 3
      options:
        fsid: 6547bd3e-1397-11e2-82e5-53567c8d32dc
        monitor-count: 3
        monitor-secret: AQCXrnZQwI7KGBAAiPofmKEXKxu5bUzoYLVkbQ==
        osd-devices: /dev/sdc /dev/sdd /srv/ceph
        osd-reformat: 'yes'
    cinder:
      branch: lp:~openstack-charmers/charms/trusty/cinder/next
      options:
        block-device: None
        glance-api-version: 2
        remove-missing: true
    glance:
      branch: lp:~openstack-charmers/charms/trusty/glance/next
      to:
      - lxc:nova-compute=1
    heat:
      branch: lp:~openstack-charmers/charms/trusty/heat/next
      to:
      - lxc:nova-compute=2
    keystone:
      branch: lp:~gnuoy/charms/trusty/keystone/next-1385105
      options:
        admin-password: openstack
        admin-token: ubuntutesting
      to:
      - lxc:cinder=0
    mongodb:
      branch: lp:charms/mongodb
      to:
      - lxc:neutron-gateway=0
    mysql:
      branch: lp:~lutostag/charms/trusty/mysql/fix-six-bug
      to:
      - lxc:nova-cloud-controller=0
    neutron-api:
      branch: lp:charms/neutron-api
      options:
        neutron-security-groups: true
      to:
      - lxc:ceph=0
    neutron-gateway:
      branch: lp:charms/quantum-gateway
      ...

tags: added: oil
Curtis Hovey (sinzui)
affects: juju-core → juju-deployer
Changed in juju-deployer:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Kapil Thangavelu (hazmat)
importance: High → Undecided
Kapil Thangavelu (hazmat) wrote:

Investigating.

Kapil Thangavelu (hazmat) wrote:

The root issue is that the subordinate service is not appearing in the watch output, which is a bug in core. The deployer tries to work around this and keep the watch output consistent by waiting for 5.1 seconds (this depends on implementation details in core, which may have changed). Core has implemented status via the API for a while now, so the other workaround is to try to use that API instead of the watch.
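
The status-API workaround can be sketched as bounded polling instead of a fixed wait. This is a minimal Python 3 sketch under stated assumptions: wait_for_service is a hypothetical helper, and get_status is any zero-argument callable that returns a status dict; it stands in for whatever actually queries status and is not a real deployer or python-jujuclient function.

```python
import time
from typing import Any, Callable, Dict


def wait_for_service(get_status: Callable[[], Dict[str, Any]],
                     svc_name: str,
                     timeout: float = 30.0,
                     interval: float = 1.0) -> Dict[str, Any]:
    """Poll get_status() until svc_name appears under 'services'.

    Returns the first status dict that contains the service, or raises
    TimeoutError once `timeout` seconds have passed without seeing it.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if svc_name in status.get('services', {}):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError('%s not in status after %.1fs' % (svc_name, timeout))
        time.sleep(interval)


# Usage with a fake status source (hypothetical): the subordinate only
# becomes visible on the third query.
calls = {'n': 0}


def fake_status() -> Dict[str, Any]:
    calls['n'] += 1
    services: Dict[str, Any] = {'nova-compute': {}}
    if calls['n'] >= 3:
        services['ceilometer-agent'] = {}
    return {'services': services}


status = wait_for_service(fake_status, 'ceilometer-agent', timeout=5, interval=0.01)
```

Compared with a fixed sleep, this returns as soon as the service is visible and fails loudly with a clear error if it never appears, rather than surfacing later as a KeyError.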

summary: - juju-deployer KeyError in add_units during a deploy
+ subordinate service not appearing in watch output
Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → Medium
Richard Harding (rharding) wrote:

Tools like the Juju GUI rely on the AllWatcher to know when services are added, etc. A subordinate with no units should still go across the watcher so we can show it and allow a GUI user to relate it to an existing service in the environment. If the subordinate is not in the AllWatcher output, the GUI will never know that the service already exists and cannot represent it to users in a sane way.
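
To illustrate why a missing delta matters, here is a minimal sketch of a client that builds its view of the environment purely from watcher deltas. It assumes a simplified delta shape of (kind, change_type, info) tuples, loosely modelled on the AllWatcher; the field names ('Name', 'Subordinate', 'Service') are assumptions for illustration, not the exact wire format.

```python
from typing import Any, Dict, Iterable, Tuple

# Simplified, assumed delta shape: (kind, change_type, info).
Delta = Tuple[str, str, Dict[str, Any]]


def apply_deltas(model: Dict[str, Dict[str, Any]], deltas: Iterable[Delta]) -> None:
    """Update a service-name -> info map from a batch of deltas."""
    for kind, change, info in deltas:
        if kind != 'service':
            continue  # units, machines, relations etc. are ignored in this sketch
        if change == 'change':
            model[info['Name']] = info
        elif change == 'remove':
            model.pop(info['Name'], None)


services: Dict[str, Dict[str, Any]] = {}
apply_deltas(services, [
    ('service', 'change', {'Name': 'nova-compute', 'Subordinate': False}),
    ('unit', 'change', {'Name': 'nova-compute/0', 'Service': 'nova-compute'}),
    # No service delta arrives for ceilometer-agent (a subordinate with no
    # units), so this client never learns that it exists.
])
print(sorted(services))
```

Because the client's model is only ever updated from deltas, a service the watcher never reports cannot be shown or offered as a relation target, however it is actually deployed.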

Jay R. Wren (evarlast) wrote:

I checked the AllWatcher and juju-gui with juju tools versions 1.21.3 and 1.22-beta3, and I could not reproduce this bug.

Jay R. Wren (evarlast) wrote:

Update: Confirmed that 1.22-beta4 tools to now exhibit this behavior.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: none → 1.23
importance: Medium → High
tags: added: api regression
tags: added: juju-gui
Dimiter Naydenov (dimitern) wrote:

I'm not quite sure what was expected. Can somebody paste juju status output from 1.22-beta3 (where it works) and from the same scenario with 1.22-beta4 (where it doesn't)? Looking just at the bundle yaml does not help much.

Jay R. Wren (evarlast) wrote:

Comment #6 is a typo and should read:

    Confirmed that 1.22-beta4 tools do not exhibit this behavior.

Dimiter Naydenov (dimitern) wrote:

As 1.22-beta4 does not seem to be affected, I'm marking it as Incomplete; we have a few fixes to land.

Richard Harding (rharding) wrote:

Greg, can you provide additional details to the bug report please? What version of juju, the deployer, and python-jujuclient were you using?

You mention "When running a deployment we sometimes see the following", which makes me wonder whether this is some sort of race condition. Can you specify how often you see it and firm up "sometimes"?

If this is a timing-related issue, does the deployer call succeed if you repeat it?

Thanks for the additional information.

Greg Lutostanski (lutostag) wrote:

juju-core 1.21.1-0ubuntu1~12.04.1~juju1
juju-deployer 0.4.3-0ubuntu1~ubuntu12.04.1~ppa1
python-jujuclient 0.18.4-5

In the data I have from February, it happened in 43 of 3092 runs.

Since the latest update to:
juju-core 1.22-beta4-0ubuntu1~12.04.1~juju1
juju-deployer 0.4.3-0ubuntu1~ubuntu12.04.1~ppa1
python-jujuclient 0.18.4-5

I am running our data collector against the runs since the update and will respond when it's done. Given how rarely we hit it, and that we have only had 568 runs since updating to beta4, even if we haven't hit it yet I cannot say whether it is entirely fixed.

Currently we can't reproduce it effectively, so one-off tests like retrying the deployer call aren't possible at the moment. We will discuss whether we can support this in the future for these hard-to-reproduce bugs.

Greg Lutostanski (lutostag) wrote:

I just looked at the data from the runs. It looks like we haven't hit it since the update to 1.22-beta4-0ubuntu1~12.04.1~juju1, but again, there is not a high degree of confidence in this data.
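
For a rough sense of what "haven't hit it in 568 runs" means against the February rate of 43 in 3092, a back-of-the-envelope sketch. It assumes runs are independent and that the February failure rate would have held unchanged after the update; it does not address any quality issues in the collected data itself.

```python
failures, runs = 43, 3092     # February data quoted earlier in the thread
p = failures / runs           # observed per-run failure rate
new_runs = 568                # runs since updating to 1.22-beta4

# Failures we would expect in the new runs if nothing had changed.
expected = p * new_runs
# Probability of seeing zero failures anyway, under independence.
p_zero = (1 - p) ** new_runs

print('rate %.2f%%, expected %.1f, P(0 hits) %.5f' % (100 * p, expected, p_zero))
```

Under those assumptions this works out to a rate of roughly 1.4%, about 8 expected failures, and well under a 0.1% chance of zero hits by luck alone, so the independence and constant-rate assumptions are what the confidence really rests on.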

Changed in juju-deployer:
assignee: Kapil Thangavelu (hazmat) → nobody
Adam Israel (aisrael) wrote:

I'm running into this error while running unit tests for Postgresql (see https://code.launchpad.net/~stub/charms/precise/postgresql/enable-integration-tests/+merge/238283).

I'm running 1.22.0-trusty-amd64, using the local provider in a Vagrant VM.

If I run individual tests, I haven't seen it error. If I run the full test suite via bundletester, I see it regularly.

Antonio Rosales (arosales) wrote:

@Adam,
If you have the all-machine log handy, could you pastebin it?

juju-core devs,
Is there any other debug information that would be helpful?

-thanks,
Antonio

Antonio Rosales (arosales) wrote:

Per comment 13, updating the status to "New".

-thanks,
Antonio

Ian Booth (wallyworld)
Changed in juju-core:
milestone: 1.23 → 1.24-alpha1
status: Triaged → Incomplete
no longer affects: juju-core/1.22
Adam Israel (aisrael)
Changed in juju-deployer:
status: Triaged → New
Changed in juju-core:
status: Incomplete → New
Ian Booth (wallyworld) wrote:

Setting to incomplete because we don't have the requested all-machine log file attached.

Changed in juju-core:
status: New → Incomplete
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.24-alpha1 → none
Curtis Hovey (sinzui)
Changed in juju-core:
importance: High → Medium
Adam Israel (aisrael)
Changed in juju-deployer:
status: New → In Progress
assignee: nobody → Adam Israel (aisrael)
Curtis Hovey (sinzui)
Changed in juju-core:
status: Incomplete → Fix Released

Tom Haddon (mthaddon)
Changed in juju-deployer:
importance: Undecided → Wishlist