Cinder host manager doesn't remove disabled/stopped service from host_state_map

Bug #1192416 reported by kaitian521 on 2013-06-19
This bug affects 8 people
Affects: Cinder | Importance: Undecided | Assigned to: Haomai Wang

Bug Description

When I have a bad cinder-volume service:

cinder-manage service list

cinder-volume localhost nova enabled :-) 2013-06-19 07:20:17
cinder-scheduler localhost nova enabled :-) 2013-06-19 07:20:17
cinder-volume other nova enabled XXX 2013-06-19 07:20:17
cinder-scheduler other nova enabled XXX 2013-06-19 07:20:17

#######################################################################################
So it is very easy to reproduce:

step 1: change the hostname to something else: "hostname other"
step 2: restart all cinder services: cinder-api, cinder-scheduler, cinder-volume
step 3: cinder-manage service list
   Then you will see some bad cinder-volume services as above.
########################################################################################

When running "cinder create 1",

the cinder scheduler will log a warning:

Traceback (most recent call last):
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 433, in _process_data
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp **args)
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/dispatcher.py", line 148, in dispatch
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/scheduler/manager.py", line 115, in create_volume
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp context, ex, request_spec)
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp self.gen.next()
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/scheduler/manager.py", line 104, in create_volume
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp filter_properties)
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/scheduler/filter_scheduler.py", line 64, in schedule_create_volume
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp filter_properties)
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/scheduler/filter_scheduler.py", line 197, in _schedule
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp hosts = self.host_manager.get_all_host_states(elevated)
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/cinder/scheduler/host_manager.py", line 268, in get_all_host_states
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp "(host: %s)") % host)
2013-06-18 04:01:38.771 54708 TRACE cinder.openstack.common.rpc.amqp TypeError: not all arguments converted during string formatting
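The final TypeError at the bottom of the traceback is the classic symptom of applying `%` to a string whose placeholders do not match the arguments. A minimal illustration, standalone Python rather than actual Cinder code:

```python
# Applying % with an argument the format string does not consume
# raises exactly the TypeError seen at the bottom of the traceback.
try:
    "volume service is down or disabled." % "host1"   # no %s placeholder
except TypeError as exc:
    print(exc)  # not all arguments converted during string formatting
```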

A possible solution is to remove disabled/stopped services from host_state_map, or simply clear the whole map.

########################################
for service in volume_services:
    if not utils.service_is_up(service) or service['disabled']:
        LOG.warn(_("service is down or disabled. (host: %s)")
                 % service['host'])
        # drop the stale entry for this host, if there is one
        self.host_state_map.pop(service['host'], None)
        continue
    host = service['host']
########################################
The reason is that the get_all_host_states method is used to collect the state of all services, and the cinder scheduler chooses a cinder-volume service from among them. So when a cinder-volume service is down or disabled, the state of the bad one needs to be removed from host_state_map.
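The removal proposed above can be reduced to a plain-dict sketch; `prune_dead_hosts` and the sample data here are hypothetical stand-ins, not Cinder's actual helpers:

```python
def prune_dead_hosts(host_state_map, services, service_is_up):
    """Drop map entries whose backing service is down or disabled
    (mirrors the removal proposed in the bug description)."""
    for service in services:
        if not service_is_up(service) or service['disabled']:
            # pop with a default avoids the get-then-pop dance
            host_state_map.pop(service['host'], None)

# Example: 'bad' had an entry before its service was disabled.
host_state_map = {'good': 'state-good', 'bad': 'state-bad'}
services = [
    {'host': 'good', 'disabled': False},
    {'host': 'bad', 'disabled': True},
]
prune_dead_hosts(host_state_map, services, lambda s: True)
print(sorted(host_state_map))  # ['good']
```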

Huang Zhiteng (zhiteng-huang) wrote :

Please provide more information on how to reproduce this bug. What you have pointed out is not relevant (to the issue).

Changed in cinder:
status: New → Invalid
status: Invalid → Incomplete
kaitian521 (kaitian521) on 2013-06-19
description: updated
tags: added: cinder-scheduler
description: updated
Changed in cinder:
status: Incomplete → New
Huang Zhiteng (zhiteng-huang) wrote :

Did you try the correction? I think the original code is correct. Again please provide the steps to reproduce this bug.

Changed in cinder:
status: New → Incomplete
kaitian521 (kaitian521) wrote :

To "Did you try the correction? I think the original code is correct":

You said you think the original code is right, and I think maybe you are wrong.

Of course I tried it.

kaitian521 (kaitian521) on 2013-06-19
description: updated
Changed in cinder:
status: Incomplete → New
description: updated
Huang Zhiteng (zhiteng-huang) wrote :

OK, what version of Cinder are you using?

kaitian521 (kaitian521) wrote :

VersionInfo(cinder:2013.2)

Thank you.

kaitian521 (kaitian521) wrote :

[root@localhost cinder]# cinder-scheduler --version
2013.2
[root@localhost cinder]# cinder --version
1.0.3.11

should be this one

kaitian521 (kaitian521) wrote :

And I find that on GitHub, https://github.com/openstack/cinder/blob/master/cinder/scheduler/host_manager.py

the code has not been changed yet.

renminmin (rmm0811) wrote :

I found the same problem.
I think the code of host_manager.py get_all_host_states needs to change like this.

########################################
for service in volume_services:
    if not utils.service_is_up(service) or service['disabled']:
        LOG.warn(_("service is down or disabled. (host: %s)")
                 % service['host'])
        # drop the stale entry for this host, if there is one
        self.host_state_map.pop(service['host'], None)
        continue
    host = service['host']
########################################
The reason is that the get_all_host_states method is used to collect the state of all services, and the cinder scheduler chooses a cinder-volume service from among them. So when a cinder-volume service is down or disabled, the state of the bad one needs to be removed from host_state_map.

renminmin (rmm0811) on 2013-06-24
Changed in cinder:
status: New → Incomplete
status: Incomplete → New
Vincent Hou (houshengbo) wrote :

Zhiteng, my team also found the same issue. I think it can be confirmed. We are using Havana-1.

Steps:
1. Start all the cinder services, change the host name of the cinder-volume node, and restart cinder-volume.
2. Run "cinder create 1".
The log is the same as kaitian521 described.

Changed in cinder:
status: New → Incomplete
status: Incomplete → Confirmed
Haomai Wang (haomai) on 2013-07-04
Changed in cinder:
assignee: nobody → Haomai Wang (haomai)

Fix proposed to branch: master
Review: https://review.openstack.org/35676

Changed in cinder:
status: Confirmed → In Progress
summary: - cinder/scheduler/host_manager.py , line 268, in get_all_host_states
+ Cinder host manager doesn't remove disabled/stopped service from
+ host_state_map
description: updated
Haomai Wang (haomai) on 2013-07-09
Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2013-07-17
Changed in cinder:
milestone: none → havana-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2013-10-17
Changed in cinder:
milestone: havana-2 → 2013.2

Hi,

I have installed an openstack havana on ubuntu 12.04 using the cloudarchive repository.
 dpkg -l cinder-scheduler
ii  cinder-scheduler   1:2013.2-0ubuntu1~cloud0   Cinder storage service - Scheduler server

And if I edit /usr/share/pyshared/cinder/scheduler/host_manager.py, I see this:

    def get_all_host_states(self, context):
        """Returns a dict of all the hosts the HostManager
          knows about. Also, each of the consumable resources in HostState
          are pre-populated and adjusted based on data in the db.

          For example:
          {'192.168.1.100': HostState(), ...}
        """

        # Get resource usage across the available volume nodes:
        topic = CONF.volume_topic
        volume_services = db.service_get_all_by_topic(context, topic)
        self.host_state_map.clear()
        for service in volume_services:
            host = service['host']
            if not utils.service_is_up(service) or service['disabled']:
                LOG.warn(_("volume service is down or disabled. "
                           "(host: %s)") % host)
                continue
            capabilities = self.service_states.get(host, None)
            host_state = self.host_state_map.get(host)
            if host_state:
                # copy capabilities to host_state.capabilities
                host_state.update_capabilities(capabilities,
                                               dict(service.iteritems()))
            else:
                host_state = self.host_state_cls(host,
                                                 capabilities=capabilities,
                                                 service=
                                                 dict(service.iteritems()))
                self.host_state_map[host] = host_state
            # update host_state
            host_state.update_from_volume_capability(capabilities)

        return self.host_state_map.itervalues()

It seems like the code that solves the problem has not been added.

Regards,
Gabriel

Vish Ishaya (vishvananda) wrote :

The code is right here:

        self.host_state_map.clear()

Where in the code do you put self.host_state_map.clear()? Do you remove anything else?
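For what it's worth, clearing the map at the top of each get_all_host_states() call has the same effect as explicit removal, because down/disabled hosts are simply never re-added on the rebuild. A stripped-down model with hypothetical names, not the real HostManager:

```python
class TinyHostManager:
    """Stripped-down model of the fixed get_all_host_states() flow."""
    def __init__(self):
        self.host_state_map = {}

    def get_all_host_states(self, services):
        self.host_state_map.clear()  # stale entries vanish here
        for service in services:
            if service['disabled']:
                continue  # down/disabled hosts are never re-added
            self.host_state_map[service['host']] = {'host': service['host']}
        return list(self.host_state_map.values())

mgr = TinyHostManager()
mgr.get_all_host_states([{'host': 'a', 'disabled': False},
                         {'host': 'b', 'disabled': False}])
# 'b' is disabled before the next scheduling pass:
states = mgr.get_all_host_states([{'host': 'a', 'disabled': False},
                                  {'host': 'b', 'disabled': True}])
print([s['host'] for s in states])  # ['a']
```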

PhilippeA (philippe-amelant) wrote :

Hello,
it looks like this bug is still present in the Juno pre-release.
Regards
