Services not running that should be: nova-consoleauth with single-nova-consoleauth=true

Bug #1660244 reported by Nobuto Murata
This bug affects 6 people
Affects: OpenStack Nova Cloud Controller Charm
Status: Fix Released
Importance: Medium
Assigned to: Jorge Niedbalski

Bug Description

May be related: https://bugs.launchpad.net/charms/+source/nova-cloud-controller/+bug/1520545

When setting the configuration like:
 $ juju config nova-cloud-controller single-nova-consoleauth=true console-access-protocol=novnc

Then, juju status says "blocked" on all 3 units with "Services not running that should be: nova-consoleauth". With single-nova-consoleauth=true, only one unit runs nova-consoleauth, so the running-services check appears to be a false positive.
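
Illustratively, the per-unit check behaves roughly like the sketch below (a hypothetical reconstruction, not the charm's actual code): every unit tests its full expected service list against systemd, so with single-nova-consoleauth=true the units not running nova-consoleauth will always report it as missing.

    # Hypothetical sketch of the kind of check that produces the message;
    # not the charm's actual code.
    import subprocess

    EXPECTED = ['nova-api-os-compute', 'nova-scheduler',
                'nova-conductor', 'nova-consoleauth']

    def service_running(name):
        # systemd reports 'active' only for running units
        out = subprocess.run(['systemctl', 'is-active', name],
                             capture_output=True, text=True)
        return out.stdout.strip() == 'active'

    missing = [s for s in EXPECTED if not service_running(s)]
    if missing:
        # With single-nova-consoleauth=true, only one unit runs
        # nova-consoleauth, so every other unit lands here.
        print('blocked: Services not running that should be: %s'
              % ', '.join(missing))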

Tags: sts
Changed in nova-cloud-controller (Juju Charms Collection):
assignee: nobody → Alex Kavanagh (ajkavanagh)
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

This bug is not apparent with ~openstack-charmers-next/nova-cloud-controller. I'll check whether it's a stable-charm-only problem.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

I've tried every combination of single-nova-consoleauth=true & false, and console-access-protocol=none and novnc, and can't reproduce the problem.

Please could you supply a bundle that exhibits the problem?
Many thanks!

Changed in nova-cloud-controller (Juju Charms Collection):
status: New → Incomplete
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Okay, having added hacluster (my error), I've reproduced an error with:

juju config nova-cloud-controller single-nova-consoleauth=false

And the corresponding output:

nova-cloud-controller/0* blocked idle 4 10.5.10.90 8774/tcp Services not running that should be: nova-consoleauth
  hacluster/0 active idle 10.5.10.90 Unit is ready and clustered
nova-cloud-controller/1 blocked idle 5 10.5.10.91 8774/tcp Services not running that should be: nova-consoleauth
  hacluster/1 active idle 10.5.10.91 Unit is ready and clustered
nova-cloud-controller/2 blocked idle 6 10.5.10.92 8774/tcp Services not running that should be: nova-consoleauth
  hacluster/2* active idle 10.5.10.92 Unit is ready and clustered

Changed in nova-cloud-controller (Juju Charms Collection):
status: Incomplete → Confirmed
Revision history for this message
James Page (james-page) wrote :

Setting as Medium, as this is not the preferred approach to HA for the consoleauth service; users can always use the memcached charm to provide a common token cache across all services.

Changed in nova-cloud-controller (Juju Charms Collection):
importance: Undecided → Medium
status: Confirmed → Triaged
Changed in nova-cloud-controller (Juju Charms Collection):
assignee: Alex Kavanagh (ajkavanagh) → nobody
Revision history for this message
Nobuto Murata (nobuto) wrote :

I understand memcached is a preferred approach over single-nova-consoleauth. However, some restricted environments (with a limited number of available IP addresses) may require an HA VNC console without additional memcached containers. In any case, the README and config.yaml in the nova-cloud-controller charm could be updated to reflect the state of HA support, as single-nova-consoleauth is the default right now.

James Page (james-page)
Changed in charm-nova-cloud-controller:
importance: Undecided → Medium
status: New → Triaged
Changed in nova-cloud-controller (Juju Charms Collection):
status: Triaged → Invalid
Revision history for this message
David Ames (thedac) wrote :

Just ran into this as confirmation.

Filed this bug, which may or may not be a duplicate of this one. The root cause looks like the DB not being ready before corosync attempts to start nova-consoleauth.

 https://bugs.launchpad.net/charm-hacluster/+bug/1674683

If we decide single-nova-consoleauth=true is not valid with memcache, we should enforce that early with a clear message in juju status.
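
A minimal sketch of such an early guard, assuming charmhelpers' config/relation_ids/status_set (which are real calls); the function itself and the 'memcache' relation name are assumptions:

    from charmhelpers.core.hookenv import config, relation_ids, status_set

    def validate_consoleauth_config():
        # Hypothetical guard: block early if the two approaches are
        # combined, instead of failing later with a confusing status.
        if config('single-nova-consoleauth') and relation_ids('memcache'):
            status_set('blocked',
                       'single-nova-consoleauth=true should not be '
                       'combined with a memcache relation')
            return False
        return True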

Revision history for this message
David Ames (thedac) wrote :

More information.

When single-nova-consoleauth=false and memcached is related, we still see authentication problems: refreshing the page eventually hits the instance that holds the authentication information, and the console works.

Somehow nova is either not consulting memcached, or memcached is not distributing its information across units.
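
One quick way to check whether tokens are actually reaching memcached is to watch item counts on each server; a diagnostic sketch using the python-memcached client (the server addresses are examples):

    import memcache  # pip install python-memcached

    # Use the memcached units' actual addresses here.
    mc = memcache.Client(['10.5.10.90:11211', '10.5.10.91:11211'])
    for server, stats in mc.get_stats():
        # 'curr_items' should increase on some server each time a console
        # token is issued; if it never moves, nova is not writing there.
        print(server, stats.get('curr_items'))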

Revision history for this message
David Ames (thedac) wrote :

It is possible that comment #7 is wrong. We were performing an upgrade from mitaka to newton, and it may have been a timing problem.

To validate, we need to test an upgrade from mitaka to newton and confirm that console access remains working.

Revision history for this message
Fairbanks. (fairbanks) wrote :

Hello,

I have reported the same bug ( https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/1677566 ).

I can confirm that adding memcached does not solve this. According to the charm's README it should, but it doesn't: I never see any connection from nova-cloud-controller to the memcached units when starting or accessing the console via the HTML5 web interface.

I also tried setting the memcached options manually, but that didn't seem to work either. I have seen this with mitaka, newton and ocata. Setting single-nova-consoleauth to true fixes this, and it fails over to another ncc unit when the one holding the VIP dies.

Using memcached would be preferred, because that properly load-balances these connections.

Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

Hello,

I was able to consistently reproduce this problem by deploying 3 units of ncc + hacluster, and setting single-nova-consoleauth=true and console-access-protocol=vnc.

As can be seen:

nova-cloud-controller/0* blocked idle 12 10.5.2.30 8774/tcp Services not running that should be: nova-consoleauth
  ncc-hacluster/0* active idle 10.5.2.30 Unit is ready and clustered
nova-cloud-controller/1 active idle 13 10.5.2.39 8774/tcp Unit is ready
  ncc-hacluster/2 active idle 10.5.2.39 Unit is ready and clustered
nova-cloud-controller/2 blocked idle 14 10.5.2.40 8774/tcp Services not running that should be: nova-consoleauth
  ncc-hacluster/1 active idle 10.5.2.40 Unit is ready and clustered

The root cause is that, when using single-nova-consoleauth=true, all service management is performed by the pacemaker/OCF agent itself and is not delegated to systemd, so the service status check will always see:

root@juju-495a7a-xenial-newton-12:/home/ubuntu# systemctl is-active nova-consoleauth.service
failed

A possible fix is to make the resource_map ignore the nova-consoleauth service when single-nova-consoleauth=true is set, regardless of the console-access-protocol used; that way the service status check is skipped and service handling is delegated entirely to pacemaker.

I will propose a patch using this approach.
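
A minimal sketch of that approach (the helper names mirror the charm's, but this is not the actual patch):

    def filtered_resource_map(resource_map, config, relation_ids):
        # When pacemaker owns nova-consoleauth, drop it from every
        # 'services' list so the charm's status check no longer expects
        # systemd to be running it.
        rm = dict(resource_map)
        if config('single-nova-consoleauth') and relation_ids('ha'):
            for ctxt in rm.values():
                ctxt['services'] = [s for s in ctxt.get('services', [])
                                    if s != 'nova-consoleauth']
        return rm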

Changed in charm-nova-cloud-controller:
status: Triaged → In Progress
assignee: nobody → Jorge Niedbalski (niedbalski)
no longer affects: nova-cloud-controller (Juju Charms Collection)
tags: added: sts
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-nova-cloud-controller (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/468955

Changed in charm-nova-cloud-controller:
milestone: none → 17.08
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-cloud-controller (master)

Reviewed: https://review.openstack.org/468955
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=0d28006e1cd7db2f37a6faaeea266fed04c57ec7
Submitter: Jenkins
Branch: master

commit 0d28006e1cd7db2f37a6faaeea266fed04c57ec7
Author: Jorge Niedbalski <email address hidden>
Date: Mon May 29 13:52:31 2017 -0400

    Disable the nova-consoleauth service on HA.

    When using the nova-cloud-controller charm in HA with the config
    option single-nova-consoleauth set to true, it's expected for the
    nova-consoleauth service to run on just a single unit at a time.

    The service management (start/stop) is performed by pacemaker in
    accordance with the cluster health using the OCF resource agent[0].

    It's required for the service to be disabled by default on upstart
    (trusty) or systemd (>=xenial).

    This change disables the service by using the service_pause
    charmhelpers call which considers both cases (upstart/systemd) when
    the ha relation is present and the single-nova-consoleauth option is
    used.

    Also, this change fixes LP: #1660244 (Services not running that should be:
    nova-consoleauth) by removing it from the resource_map when
    ha + single-nova-consoleauth is used.

    [0] https://github.com/openstack/openstack-resource-agents/blob/master/ocf/nova-consoleauth

    Closes-Bug: #1693629
    Closes-Bug: #1660244

    Change-Id: Iaffe0456cceb42ee124cb8881d3379d78cac0f3a
    Signed-off-by: Jorge Niedbalski <email address hidden>
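
The service_pause charmhelpers call named in the commit is real; a minimal sketch of how it might be wired in (the function itself and the 'ha' relation name are assumptions):

    from charmhelpers.core.hookenv import config, relation_ids
    from charmhelpers.core.host import service_pause

    def maybe_disable_consoleauth():
        # service_pause() stops the service and disables it at boot,
        # handling both upstart (trusty) and systemd (>=xenial), so
        # pacemaker alone decides where nova-consoleauth runs.
        if config('single-nova-consoleauth') and relation_ids('ha'):
            service_pause('nova-consoleauth')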

Changed in charm-nova-cloud-controller:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-cloud-controller (stable/17.02)

Fix proposed to branch: stable/17.02
Review: https://review.openstack.org/469555

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-cloud-controller (stable/17.02)

Reviewed: https://review.openstack.org/469555
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=0723811cb448903583e208c4925af825e3667176
Submitter: Jenkins
Branch: stable/17.02

commit 0723811cb448903583e208c4925af825e3667176
Author: Jorge Niedbalski <email address hidden>
Date: Mon May 29 13:52:31 2017 -0400

    Disable the nova-consoleauth service on HA.

    When using the nova-cloud-controller charm in HA with the config
    option single-nova-consoleauth set to true, it's expected for the
    nova-consoleauth service to run on just a single unit at a time.

    The service management (start/stop) is performed by pacemaker in
    accordance with the cluster health using the OCF resource agent[0].

    It's required for the service to be disabled by default on upstart
    (trusty) or systemd (>=xenial).

    This change disables the service by using the service_pause
    charmhelpers call which considers both cases (upstart/systemd) when
    the ha relation is present and the single-nova-consoleauth option is
    used.

    Also, this change fixes LP: #1660244 (Services not running that should be:
    nova-consoleauth) by removing it from the resource_map when
    ha + single-nova-consoleauth is used.

    [0] https://github.com/openstack/openstack-resource-agents/blob/master/ocf/nova-consoleauth

    Closes-Bug: #1693629
    Closes-Bug: #1660244
    Depends-On: I9a4245e764e268327466bc0fbe8b5383303ad07f

    Change-Id: Ib012d81e0a1573f6ccf74a7a59f75329c751d7a0
    Signed-off-by: Jorge Niedbalski <email address hidden>

James Page (james-page)
Changed in charm-nova-cloud-controller:
status: Fix Committed → Fix Released