manila in HA does not deploy

Bug #1867358 reported by Marian Gasparovic
This bug affects 2 people
Affects                          Status        Importance  Assigned to        Milestone
OpenStack Manila Charm           Fix Released  High        Unassigned
OpenStack Manila-Ganesha Charm   Fix Released  High        Chris MacNaughton

Bug Description

When I deploy manila as a single unit, all is fine. If I use 3 units with a vip and hacluster as shown in the deployment guide, manila stays blocked:

manila/0 blocked idle 4/lxd/4 10.0.2.90 8786/tcp Services not running that should be: manila-scheduler, manila-data; Ports which should be open, but are not: 8776
  hacluster-manila/2 active idle 10.0.2.90 Unit is ready and clustered
manila/1 blocked idle 5/lxd/3 10.0.2.100 8786/tcp Services not running that should be: manila-scheduler, manila-data; Ports which should be open, but are not: 8776
  hacluster-manila/1 active idle 10.0.2.100 Unit is ready and clustered
manila/2* blocked idle 6/lxd/3 10.0.2.95 8786/tcp Ports which should be open, but are not: 8776
  hacluster-manila/0* active idle 10.0.2.95 Unit is ready and clustered

manila-ganesha is fine:

Unit Workload Agent Machine Public address Ports Message
manila-ganesha/0* active idle 1/lxd/3 10.0.1.67 Unit is ready
  hacluster-manila-ganesha/0* active idle 10.0.1.67 Unit is ready and clustered
manila-ganesha/1 active idle 2/lxd/4 10.0.1.69 Unit is ready
  hacluster-manila-ganesha/2 active idle 10.0.1.69 Unit is ready and clustered
manila-ganesha/2 active idle 3/lxd/3 10.0.1.68 Unit is ready
  hacluster-manila-ganesha/1 active idle 10.0.1.68 Unit is ready and clustered

If I ssh to the units I can start manila-scheduler and manila-data manually, but I cannot get rid of the port error.

manila is listening on port 8786
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=152,fd=13))
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=225,fd=3))
LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=225,fd=4))
LISTEN 0 128 [::1]:11211 [::]:* users:(("memcached",pid=46995,fd=26))
LISTEN 0 128 *:8786 *:* users:(("manila-api",pid=47062,fd=8),("manila-api",pid=47062,fd=7),("manila-api",pid=47061,fd=8),("manila-api",pid=47061,fd=7),("manila-api",pid=47060,fd=8),("manila-api",pid=47060,fd=7),("manila-api",pid=47059,fd=8),("manila-api",pid=47059,fd=7),("manila-api",pid=46977,fd=7))

For comparison, when running a single manila unit, it looks the same:

State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=152,fd=13))
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=235,fd=3))
LISTEN 0 128 [::1]:11211 [::]:* users:(("memcached",pid=27493,fd=26))
LISTEN 0 128 *:8786 *:* users:(("manila-api",pid=27917,fd=8),("manila-api",pid=27917,fd=7),("manila-api",pid=27916,fd=8),("manila-api",pid=27916,fd=7),("manila-api",pid=27915,fd=8),("manila-api",pid=27915,fd=7),("manila-api",pid=27914,fd=8),("manila-api",pid=27914,fd=7),("manila-api",pid=27742,fd=7))
LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=235,fd=4))
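A quick ad-hoc check (my own sketch, not part of the charm) that probes both the port the charm complains about (8776) and the port manila-api actually binds (8786), run from the unit itself:

import socket

def is_listening(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1)
        return sock.connect_ex((host, port)) == 0

for port in (8776, 8786):
    print(port, "listening" if is_listening(port) else "not listening")

This reports 8776 as not listening and 8786 as listening, consistent with the ss output above.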

  manila:
    charm: cs:manila
    num_units: 3
    options:
      openstack-origin: *openstack-origin
      vip: *manila-vip
      # default-share-backend is a somewhat opaque configuration option
      # that gets mapped into Manila's configuration file. The names are
      # prescribed by the share type drivers in Manila. I suspect
      # that we could remove this configuration, but the Manila charm
      # currently goes into a blocked state without it set, and is broken
      # if set incorrectly. In essence, to use CephFS with Ganesha, it must
      # be set to cephfsnfs1
      default-share-backend: cephfsnfs1
      share-protocols: NFS
    bindings:
      "": *oam-space
      public: *public-space
      admin: *admin-space
      internal: *internal-space
      shared-db: *internal-space
    to:
    - lxd:1003
    - lxd:1004
    - lxd:1005

Tags: cdo-qa
tags: added: cdo-qa
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Hi Marian

Please could you add to the bug report:

- The bundle used for deploying the model.
- Any relevant logs from the affected units.

Many thanks

Changed in charm-manila:
status: New → Incomplete
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

I'm able to reproduce this with a fairly minimal bundle:

series: bionic
variables:
  origin: &origin cloud:bionic-stein
applications:
  mysql:
    charm: cs:percona-cluster
    num_units: 1
    options:
      source: *origin
  rabbitmq-server:
    charm: cs:rabbitmq-server
    num_units: 1
    options:
      source: *origin
  keystone:
    charm: cs:keystone
    num_units: 1
    options:
      openstack-origin: *origin
  manila:
    charm: cs:manila
    num_units: 3
    options:
      openstack-origin: *origin
      vip: 10.5.110.241
      default-share-backend: generic
  manila-generic:
    charm: cs:~openstack-charmers/manila-generic
    options:
      openstack-origin: *origin
      driver-handles-share-servers: False
  manila-hacluster:
    charm: cs:hacluster
relations:
  - [ keystone, manila ]
  - [ keystone, mysql:shared-db ]
  - [ manila:manila-plugin, manila-generic]
  - [ manila, manila-hacluster ]
  - [ manila, mysql:shared-db ]
  - [ manila, rabbitmq-server ]

Model Controller Cloud/Region Version SLA Timestamp
icey icey-serverstack serverstack/serverstack 2.7.3 unsupported 09:58:33Z

App Version Status Scale Charm Store Rev OS Notes
keystone 15.0.0 active 1 keystone jujucharms 310 ubuntu
manila 8.0.0 blocked 3 manila jujucharms 17 ubuntu
manila-generic 8.0.0 active 3 manila-generic jujucharms 25 ubuntu
manila-hacluster active 3 hacluster jujucharms 64 ubuntu
mysql 5.7.20 active 1 percona-cluster jujucharms 284 ubuntu
rabbitmq-server 3.6.10 active 1 rabbitmq-server jujucharms 99 ubuntu

Unit Workload Agent Machine Public address Ports Message
keystone/0* active idle 0 10.5.0.6 5000/tcp Unit is ready
manila/0* blocked idle 1 10.5.0.4 8786/tcp Ports which should be open, but are not: 8776
  manila-generic/0* active idle 10.5.0.4 Unit is ready
  manila-hacluster/0* active idle 10.5.0.4 Unit is ready and clustered
manila/1 blocked idle 2 10.5.0.7 8786/tcp Ports which should be open, but are not: 8776
  manila-generic/1 active idle 10.5.0.7 Unit is ready
  manila-hacluster/1 active idle 10.5.0.7 Unit is ready and clustered
manila/2 blocked idle 3 10.5.0.12 8786/tcp Ports which should be open, but are not: 8776
  manila-generic/2 active idle 10.5.0.12 Unit is ready
  manila-hacluster/2 active idle 10.5.0.12 Unit is ready and clustered
mysql/0* active idle 4 10.5.0.19 3306/tcp Unit is ready
rabbitmq-server/0* active idle 5 10.5.0.16 5672/tcp Unit is ready

Machine State DNS Inst id Series AZ Message
0 started 10.5.0.6 8cec7f3e-6141-43fc-8c65-d6290...


Changed in charm-manila:
status: Incomplete → Confirmed
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :
Revision history for this message
Marian Gasparovic (marosg) wrote :

@Chris I tried to make my bundle smaller, similar to yours; then I get the port error as you do and no complaints about the scheduler and data services. When I deploy my whole bundle, I get all three errors; after I restart the manila units, only the port error remains.
It makes no sense, as I did not modify any manila relations. Also, the manila units are in separate containers, so they should not be influenced by other, unrelated units.

Changed in charm-manila:
importance: Undecided → High
Revision history for this message
Ian Marsh (drulgaard) wrote :

(Before I start... confession ... I currently know very little about pacemaker or the inner workings of charms... so apologies if I'm barking up the wrong tree)

It looks like it fails to set up the resources (e.g. the vip):
ubuntu@juju-a55f84-2-lxd-8:~$ sudo crm config show
node 1000: juju-a55f84-2-lxd-8
node 1001: juju-a55f84-0-lxd-8
node 1002: juju-a55f84-1-lxd-8
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.18-2b07d5c5a9 \
        cluster-infrastructure=corosync \
        cluster-name=debian \
        no-quorum-policy=stop \
        cluster-recheck-interval=60 \
        stonith-enabled=false
rsc_defaults rsc-options: \
        resource-stickiness=100 \
        failure-timeout=0

Relevant bit of /var/log/juju/unit-manila-ha-0.log (manila-ha/0 is the leader):
2020-03-26 15:47:43 INFO juju-log hanode:108: Setting cluster symmetry
2020-03-26 15:47:43 WARNING juju-log hanode:108: Inconsistent or absent enable-resources setting []
2020-03-26 15:47:43 WARNING juju-log hanode:108: Unable to calculated desired symmetric-cluster setting
2020-03-26 15:47:43 DEBUG juju-log hanode:108: Deleting Resources
2020-03-26 15:47:43 DEBUG juju-log hanode:108: Configuring Resources: {}
2020-03-26 15:47:44 DEBUG juju-log hanode:108: Configuring Groups: {}
2020-03-26 15:47:44 DEBUG juju-log hanode:108: Configuring Master/Slave (ms): {}
2020-03-26 15:47:44 DEBUG juju-log hanode:108: Configuring Orders: {}
2020-03-26 15:47:44 DEBUG juju-log hanode:108: Configuring Clones: {}
2020-03-26 15:47:44 DEBUG juju-log hanode:108: Configuring Colocations: {}
2020-03-26 15:47:44 DEBUG juju-log hanode:108: Configuring Locations: {}
2020-03-26 15:47:44 INFO juju-log hanode:108: Configuring any remote nodes
2020-03-26 15:47:45 DEBUG juju-log hanode:108: Checking for pacemaker-remote nodes
2020-03-26 15:47:45 WARNING juju-log hanode:108: Inconsistent or absent enable-resources setting []
2020-03-26 15:47:45 WARNING juju-log hanode:108: Unable to calculate whether resources should run on remotes
2020-03-26 15:47:56 DEBUG juju-log hanode:108: Pacemaker is ready
2020-03-26 15:49:34 DEBUG juju-log Pacemaker is ready

Following the code...
unit-manila-ha-0/charm/hooks/hooks.py:ha_relation_changed()
unit-manila-ha-0/charm/hooks/utils.py:set_cluster_symmetry()
unit-manila-ha-0/charm/hooks/utils.py:need_resources_on_remotes()
... it seems like this:
    for relid in relation_ids('pacemaker-remote'):
        for unit in related_units(relid):
            data = parse_data(relid, unit, 'enable-resources')
            # parse_data returns {} if key is absent.
            if type(data) is bool:
                responses.append(data)
... doesn't result in a clean response.
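To illustrate why that loop comes back empty, here is a standalone sketch of the same logic with made-up inputs standing in for the relation_ids()/related_units()/parse_data() hook helpers (my stubs, not the charm's code). With no pacemaker-remote relations at all, or with parse_data() returning {} because enable-resources was never set, the responses list stays empty, which lines up with the "Inconsistent or absent enable-resources setting []" warnings in the log above:

def resources_on_remotes(remote_relations):
    """Sketch: remote_relations maps a relation id to {unit: 'enable-resources' value or {}}."""
    responses = []
    for relid, units in remote_relations.items():
        for unit, data in units.items():
            # parse_data() returns {} when the key is absent, so only a
            # genuine boolean counts as an answer.
            if type(data) is bool:
                responses.append(data)
    if len(set(responses)) != 1:
        # Mirrors the warning seen in the hacluster log.
        print("Inconsistent or absent enable-resources setting %s" % responses)
        return None
    return responses[0]

# The reported deployment has no pacemaker-remote relations:
print(resources_on_remotes({}))  # warns with [] and returns None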

I'll keep digging but in the meantime I'm hoping this triggers an "ah ha" moment in someone more knowledgeable.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-manila (master)

Reviewed: https://review.opendev.org/714040
Committed: https://git.openstack.org/cgit/openstack/charm-manila/commit/?id=4f5c6a9712276ab3e3b99ae99de14507c8bc626d
Submitter: Zuul
Branch: master

commit 4f5c6a9712276ab3e3b99ae99de14507c8bc626d
Author: Chris MacNaughton <email address hidden>
Date: Fri Mar 20 08:12:22 2020 +0100

    Enable HA with Manila

    An indirect requirement of enabling HA with Manila
    is migrating from directly using WSGI to using
    mod-wsgi with Apache.

    Change-Id: I1f501283db3db1338d47a89a7688cf5035d08a7a
    Closes-Bug: #1867358

Changed in charm-manila:
status: In Progress → Fix Committed
Changed in charm-manila-ganesha:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
importance: Undecided → High
Changed in charm-manila:
milestone: none → 20.05
David Ames (thedac)
Changed in charm-manila:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-manila-ganesha (stable/20.05)

Fix proposed to branch: stable/20.05
Review: https://review.opendev.org/730835

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-manila-ganesha (master)

Reviewed: https://review.opendev.org/728929
Committed: https://git.openstack.org/cgit/openstack/charm-manila-ganesha/commit/?id=d4f0215a1b5923645ad0bd0af6912afacbc1ea8f
Submitter: Zuul
Branch: master

commit d4f0215a1b5923645ad0bd0af6912afacbc1ea8f
Author: Chris MacNaughton <email address hidden>
Date: Mon May 18 17:33:45 2020 +0200

    Change Ganesha to HA deploy

    This change also modifies the ganesha + manila-share
    services to run via pacemaker to enable them to be
    colocated with the VIP, a requirement to run manila
    and ganesha in a highly available configuration.

    Change-Id: Idd0b594c24ef029f2415ee9ca13a8aca6d01d2a7
    Closes-Bug: #1867358

Changed in charm-manila-ganesha:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-manila-ganesha (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/743212

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-manila-ganesha (stable/20.05)

Change abandoned by Chris MacNaughton (icey) (<email address hidden>) on branch: stable/20.05
Review: https://review.opendev.org/730835

Changed in charm-manila-ganesha:
milestone: none → 21.04
status: Fix Committed → Fix Released