xenial: nova-api-metadata not running post deployment

Bug #1547122 reported by James Page
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
neutron-gateway (Juju Charms Collection) | Fix Released | High | James Page |
nova-cloud-controller (Juju Charms Collection) | Fix Released | High | James Page |
openstack-pkg-tools (Ubuntu) | Won't Fix | Medium | Unassigned |
Nominated for Xenial by Alberto Salvia Novella

Bug Description

For a xenial deploy with next, the nova-api-metadata service is not running post deployment.

Not sure why right now - it manually starts OK and is functional post deployment.

James Page (james-page)
Changed in neutron-gateway (Juju Charms Collection):
importance: Undecided → High
milestone: none → 16.04
tags: added: openstack xenial
James Page (james-page)
Changed in neutron-gateway (Juju Charms Collection):
assignee: nobody → James Page (james-page)
status: New → In Progress
Revision history for this message
James Page (james-page) wrote :

systemd is less aggressive than upstart about restarting services that shut down very quickly. I'm implementing a remote-restart nonce on the relation between nova-cc and neutron-gateway to ensure that nova-api-metadata gets restarted once the nova-cc charm is fully up and running.
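
For readers unfamiliar with the pattern, the remote-restart nonce idea can be sketched roughly as below. This is only an illustration, not the charm code: the function names, the default key name, and the plain dict standing in for relation data are all assumptions.

```python
import uuid


def new_restart_nonce():
    """Return a fresh, unique value to publish on the relation."""
    return str(uuid.uuid4())


def restart_needed(relation_data, last_seen, key="restart_nonce"):
    """Return True when a nonce is present that has not been acted on yet.

    relation_data: dict standing in for data read from the relation (assumption).
    last_seen: the nonce handled previously, or None if none has been handled.
    key: hypothetical relation key name.
    """
    current = relation_data.get(key)
    return current is not None and current != last_seen


# Publishing side: set a new nonce once the controller considers itself ready.
relation = {"restart_nonce": new_restart_nonce()}

# Consuming side: restart only when the nonce differs from the one last handled.
first = restart_needed(relation, last_seen=None)
repeat = restart_needed(relation, last_seen=relation["restart_nonce"])
```

The consuming side has to remember the last nonce it acted on, otherwise every hook run would trigger another restart.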

Changed in nova-cloud-controller (Juju Charms Collection):
status: New → In Progress
importance: Undecided → High
assignee: nobody → James Page (james-page)
milestone: none → 16.04
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-cloud-controller (master)

Fix proposed to branch: master
Review: https://review.openstack.org/305658

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-gateway (master)

Fix proposed to branch: master
Review: https://review.openstack.org/305670

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-gateway (master)

Reviewed: https://review.openstack.org/305670
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-gateway/commit/?id=c2872788818456478092dbd073034862aca18f2e
Submitter: Jenkins
Branch: master

commit c2872788818456478092dbd073034862aca18f2e
Author: James Page <email address hidden>
Date: Thu Apr 14 09:53:55 2016 +0100

    Restart nova-api-metadata on restart_nonce changes

    The nova-cloud-controller presents a restart_nonce key on the
    quantum-network-service relation once db migration has been
    completed and the nova-conductor service is able to respond to
    RPC calls.

    Restart the nova-api-metadata when this data changes to ensure
    a running service post deployment.

    Change-Id: Iafc27fbb2a70e3195fc189e4056a1ca58ff6b663
    Closes-Bug: 1547122

Changed in neutron-gateway (Juju Charms Collection):
status: In Progress → Fix Committed
Changed in nova-cloud-controller (Juju Charms Collection):
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-cloud-controller (master)

Reviewed: https://review.openstack.org/305658
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=4c68802ade2828fa3c463cf573e1831f447e293f
Submitter: Jenkins
Branch: master

commit 4c68802ade2828fa3c463cf573e1831f447e293f
Author: James Page <email address hidden>
Date: Thu Apr 14 09:18:30 2016 +0100

    Trigger restarts for nova-api-metadata

    The neutron-gateway charm operates a nova-api-metadata service
    for instance access to Nova metadata; this needs to be restarted
    after a database initialization as if the conductors are not
    responding, this service will shut itself down.

    Set a remote_restart nonce on the quantum-network-service
    relation post database migration to trigger remote restarts.

    Change-Id: I469d4119dd95cc51378d7ca2e3ef736d94c12226
    Closes-Bug: 1547122

James Page (james-page)
Changed in neutron-gateway (Juju Charms Collection):
status: Fix Committed → Fix Released
Changed in nova-cloud-controller (Juju Charms Collection):
status: Fix Committed → Fix Released
Revision history for this message
James Page (james-page) wrote :

Re-opening as race still exists (but with the AMQP relation instead).

Changed in nova-cloud-controller (Juju Charms Collection):
status: Fix Released → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-cloud-controller (master)

Fix proposed to branch: master
Review: https://review.openstack.org/335104

James Page (james-page)
Changed in nova-cloud-controller (Juju Charms Collection):
milestone: 16.04 → 16.07
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-cloud-controller (master)

Reviewed: https://review.openstack.org/335104
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=928644532b74d9bcb442d1fbb85eef7b03662934
Submitter: Jenkins
Branch: master

commit 928644532b74d9bcb442d1fbb85eef7b03662934
Author: James Page <email address hidden>
Date: Tue Jun 28 16:48:01 2016 +0100

    Restart neutron-gateway services on amqp complete

    The nova-api-metadata service which runs in the neutron-gateway
    charm units will shutdown if it does not receive a response from
    a nova-conductor within a short timeout period; this can happen
    if the nova-conductor is not yet wired to the messaging bus.

    Trigger a remote restart using the existing relation api when
    the amqp relation is complete, and nova-conductor services should
    be (shortly) up and running.

    Change-Id: I47d242a6e4a11d30400b9b787ace752a472b2d1e
    Closes-Bug: 1547122

Changed in nova-cloud-controller (Juju Charms Collection):
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-cloud-controller (stable/16.04)

Fix proposed to branch: stable/16.04
Review: https://review.openstack.org/335847

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-cloud-controller (stable/16.04)

Reviewed: https://review.openstack.org/335847
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=6d0b01f1fc310959876693b407d58c6c3a257146
Submitter: Jenkins
Branch: stable/16.04

commit 6d0b01f1fc310959876693b407d58c6c3a257146
Author: James Page <email address hidden>
Date: Tue Jun 28 16:48:01 2016 +0100

    Restart neutron-gateway services on amqp complete

    The nova-api-metadata service which runs in the neutron-gateway
    charm units will shutdown if it does not receive a response from
    a nova-conductor within a short timeout period; this can happen
    if the nova-conductor is not yet wired to the messaging bus.

    Trigger a remote restart using the existing relation api when
    the amqp relation is complete, and nova-conductor services should
    be (shortly) up and running.

    Change-Id: I47d242a6e4a11d30400b9b787ace752a472b2d1e
    Closes-Bug: 1547122
    (cherry picked from commit 928644532b74d9bcb442d1fbb85eef7b03662934)

James Page (james-page)
Changed in neutron-gateway (Juju Charms Collection):
status: Fix Released → New
Changed in nova-cloud-controller (Juju Charms Collection):
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-gateway (master)

Fix proposed to branch: master
Review: https://review.openstack.org/339712

Changed in neutron-gateway (Juju Charms Collection):
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-gateway (master)

Reviewed: https://review.openstack.org/339712
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-gateway/commit/?id=1fd4c20f5d7531bffb8ab6ce8653697a2f1e841a
Submitter: Jenkins
Branch: master

commit 1fd4c20f5d7531bffb8ab6ce8653697a2f1e841a
Author: James Page <email address hidden>
Date: Fri Jul 8 17:46:58 2016 +0100

    Use correct relation key for restarts

    The nova-cloud-controller charm set the relation key 'restart_trigger';
    this charm was using 'restart_nonce' which obviously never gets set,
    so the nova-api-metadata service would never actually get restarted
    when required.

    Use the correct relation key, fixing remote restart triggers for
    nova-api-metadata, resolving races in deployment.

    Change-Id: Ic3dbdd41f87c0362f7f725d0f58458f5239ea093
    Closes-Bug: 1547122

Changed in neutron-gateway (Juju Charms Collection):
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-gateway (master)

Fix proposed to branch: master
Review: https://review.openstack.org/340832

Revision history for this message
James Page (james-page) wrote :

Raising bug task for openstack-pkg-tools; the behaviour of a systemd unit generated by this package is:

  Restart=on-failure

this is different to upstart, which will always restart a process that exits, irrespective of its exit code, mimicking:

  Restart=always

Let's consider switching the default to behave the same as it has done for the last 4 years.
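
The practical difference between the two policies for a daemon that exits cleanly can be captured in a deliberately simplified model. The function name is hypothetical, and the model only covers plain process exits: signals, timeouts, watchdog events, and settings such as SuccessExitStatus are ignored.

```python
def restarts_after_exit(policy, exit_code):
    """Simplified model: is the service restarted after exiting with exit_code?

    Only the two policies discussed in this bug are modelled.
    """
    if policy == "always":
        # Restart regardless of how the process exited.
        return True
    if policy == "on-failure":
        # A clean exit (code 0) is not treated as a failure.
        return exit_code != 0
    raise ValueError("unmodelled policy: %s" % policy)


# A daemon that gives up and exits 0 stays down under on-failure.
clean_exit_on_failure = restarts_after_exit("on-failure", 0)
clean_exit_always = restarts_after_exit("always", 0)
```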

Changed in openstack-pkg-tools (Ubuntu):
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-gateway (master)

Reviewed: https://review.openstack.org/340832
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-gateway/commit/?id=68ea83e60d4930b6c80a910c3ef5465c5f34416b
Submitter: Jenkins
Branch: master

commit 68ea83e60d4930b6c80a910c3ef5465c5f34416b
Author: James Page <email address hidden>
Date: Tue Jul 12 10:57:28 2016 +0100

    Avoid restart races for nova-api-metadata

    It's possible that the nova-api-metadata will startup during the
    time that the nova-conductor processes on the nova-cloud-controller
    units are still starting up, resulting in a messaging timeout which
    causes the daemon to exit 0.

    Upstart will restart a service in this scenario, however systemd is
    configured in packaging to only restart 'on-failure' so will not
    attempt to restart.

    This points to two other bugs - one that a messaging timeout results
    in a exit code of 0, and that the OpenStack services under systemd
    behave differently to under upstart.

    Install an override file for systemd based installs to mimic the
    behaviour of upstart, and deal with a code logic problem in the
    restart_trigger handling to ensure that the charm does at least
    try to restart the nova-api-metadata service at the right points
    in time.

    Change-Id: Ia08b7840efa33fd301d0e2c55bb30ae1a102cbfa
    Closes-Bug: 1547122
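
A systemd override of the kind this commit describes might look roughly like the drop-in written by the sketch below. This is not the file shipped by the charm: the helper name, the file name, and the unit name used for the directory are assumptions. The example writes to a temporary directory instead of a real systemd configuration path.

```python
import os
import tempfile

# Drop-in content that changes only the restart policy of the unit.
OVERRIDE_TEMPLATE = "[Service]\nRestart=always\n"


def write_restart_override(dropin_dir, filename="override.conf"):
    """Write the override into dropin_dir and return the path of the file."""
    os.makedirs(dropin_dir, exist_ok=True)
    path = os.path.join(dropin_dir, filename)
    with open(path, "w") as handle:
        handle.write(OVERRIDE_TEMPLATE)
    return path


# Demonstrate against a temporary directory; the unit name is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "nova-api-metadata.service.d")
    written = write_restart_override(target)
    with open(written) as handle:
        content = handle.read()
```

On a real host, a drop-in like this would normally live under /etc/systemd/system/<unit>.service.d/ and take effect after `systemctl daemon-reload`.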

Changed in openstack-pkg-tools (Ubuntu):
importance: Undecided → Medium
James Page (james-page)
Changed in neutron-gateway (Juju Charms Collection):
status: Fix Committed → Fix Released
Chuck Short (zulcss)
Changed in openstack-pkg-tools (Ubuntu):
status: Triaged → Won't Fix