Restart of nova service on controller causes sporadic instance creation failures on HA

Bug #1219005 reported by Shweta P
This bug affects 1 person
Affects: Cisco Openstack
Status: Incomplete
Importance: Medium
Assigned to: Shweta P

Bug Description

Instances were being created successfully before the nova services on controller1 were stopped.

Once the nova services on controller1 were restarted, random instance creation failures were observed with the following error:

root@p4-control01:/etc/init.d# nova show net75admin12
+-------------------------------------+--------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+-------------------------------------+--------------------------------------------------------------------------------------------------------------------+
| status | ERROR |
| updated | 2013-08-30T16:16:02Z |
| OS-EXT-STS:task_state | deleting |
| OS-EXT-SRV-ATTR:host | p4-compute01 |
| key_name | None |
| image | cirros-x86_64 (870e0556-90f4-4149-8ba6-033c59e8ed84) |
| hostId | d16a0203935c60288bd326effd6192ee5e8d68d465232c8f8741561f |
| OS-EXT-STS:vm_state | error |
| OS-EXT-SRV-ATTR:instance_name | instance-00000015 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | p4-compute01.ctocllab.cisco.com |
| flavor | m1.tiny (1) |
| id | cebcb012-eb0d-415b-a6d5-21ff804aec43 |
| user_id | 25edc1baf93645c89b9f7ce758fafd70 |
| name | net75admin12 |
| created | 2013-08-30T16:09:18Z |
| tenant_id | 030762b1d2cc40c79436b2036d7d241b |
| OS-DCF:diskConfig | MANUAL |
| metadata | {} |
| accessIPv4 | |
| accessIPv6 | |
| fault | {u'message': u'TypeError', u'code': 500, u'details': u'unsupported operand type(s) for +: \'NoneType\' and \'str\' |
| | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 224, in decorated_function |
| | return function(self, context, *args, **kwargs) |
| | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1416, in terminate_instance |
| | do_terminate_instance(instance, bdms) |
| | File "/usr/lib/python2.7/dist-packages/nova/openstack/common/lockutils.py", line 242, in inner |
| | retval = f(*args, **kwargs) |
| | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1408, in do_terminate_instance |
| | reservations=reservations) |
| | File "/usr/lib/python2.7/dist-packages/nova/hooks.py", line 85, in inner |
| | rv = f(*args, **kwargs) |
| | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1371, in _delete_instance |
| | project_id=project_id) |
| | File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__ |
| | self.gen.next() |
| | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1344, in _delete_instance |
| | self._shutdown_instance(context, instance, bdms) |
| | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1262, in _shutdown_instance |
| | network_info = self._get_instance_nw_info(context, instance) |
| | File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 697, in _get_instance_nw_info |
| | instance, conductor_api=self.conductor_api) |
| | File "/usr/lib/python2.7/dist-packages/nova/network/quantumv2/api.py", line 366, in get_instance_nw_info |
| | result = self._get_instance_nw_info(context, instance, networks) |
| | File "/usr/lib/python2.7/dist-packages/nova/network/quantumv2/api.py", line 374, in _get_instance_nw_info |
| | nw_info = self._build_network_info_model(context, instance, networks) |
| | File "/usr/lib/python2.7/dist-packages/nova/network/quantumv2/api.py", line 793, in _build_network_info_model |
| | data = client.list_ports(**search_opts) |
| | File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 108, in with_params |
| | ret = self.function(instance, *args, **kwargs) |
| | File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 260, in list_ports |
| | **_params) |
| | File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 856, in list |
| | for r in self._pagination(collection, path, **params): |
| | File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 869, in _pagination |
| | res = self.get(path, params=params) |
| | File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 842, in get |
| | headers=headers, params=params) |
| | File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 827, in retry_request |
| | headers=headers, params=params) |
| | File "/usr/lib/python2.7/dist-packages/quantumclient/v2_0/client.py", line 762, in do_request |
| | resp, replybody = self.httpclient.do_request(action, method, body=body) |
| | File "/usr/lib/python2.7/dist-packages/quantumclient/client.py", line 161, in do_request |
| | self.authenticate() |
| | File "/usr/lib/python2.7/dist-packages/quantumclient/client.py", line 190, in authenticate |
| | token_url = self.auth_url + "/tokens" |
| | ', u'created': u'2013-08-30T16:16:03Z'} |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-AZ:availability_zone | nova |
| config_drive | |
+-------------------------------------+--------------------------------------------------------------------------------------------------------------------+
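
The last frame of the traceback above fails on `self.auth_url + "/tokens"`, which suggests that `auth_url` was `None` when the client tried to authenticate. The following is a minimal sketch of that failure mode and of one possible guard. It is not the quantumclient code: `build_token_url`, `reproduce_type_error`, and the example URL are hypothetical names chosen for illustration only.

```python
def reproduce_type_error():
    """Concatenate None with a string, as the last traceback frame does."""
    auth_url = None  # assumption: the client had no auth URL at this point
    try:
        auth_url + "/tokens"
    except TypeError as exc:
        # The message matches the 'fault' field reported by nova show.
        return str(exc)
    return None


def build_token_url(auth_url):
    """Hypothetical helper: build the token URL, failing loudly if unset."""
    if auth_url is None:
        # Raise a descriptive error instead of an opaque TypeError.
        raise ValueError("auth_url is not set; cannot build the token URL")
    return auth_url.rstrip("/") + "/tokens"


if __name__ == "__main__":
    print(reproduce_type_error())
    print(build_token_url("http://keystone.example:5000/v2.0"))
```

A guard like this would not fix the underlying cause (why the auth URL is missing), but it would make the error in the instance fault easier to read.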

Tags: ha
tags: added: ha
Chris Ricker (chris-ricker) wrote :

Mark T. Voelker (mvoelker) wrote :

In a non-HA environment, I brought down all nova service on the compute node, then started them back up. I've since launched about a dozen instances and haven't seen any of them get into an error state.

Mark T. Voelker (mvoelker) wrote :

> I brought down all nova service on the compute node

Err, typo: I brought down all nova services on the *control* node

Changed in openstack-cisco:
status: New → Triaged
importance: High → Medium
Changed in openstack-cisco:
milestone: g.2 → g.3
Mark T. Voelker (mvoelker) wrote :

Reassigning to Shweta to see if she can reproduce, as I'm not able to in my non-HA testbed. Shweta, could you give this a spin with the current g.3 code and see if it still happens? We had discussed a suspicion that it might be something to do with the SLB layer, which means it won't manifest in non-HA setups.

Changed in openstack-cisco:
assignee: nobody → Shweta P (shweta-ap05)
summary: - Restart of nova service on controller causes instance creation failures
+ Restart of nova service on controller causes sporadic instance creation
+ failures on HA
Mark T. Voelker (mvoelker) wrote :

So far we've not been able to reproduce this outside of one HA testbed (other HA testbeds haven't shown similar issues). I'm setting this to incomplete for the moment pending confirmation from Don or others as to whether or not they're able to reproduce it.

Changed in openstack-cisco:
status: Triaged → Incomplete
Changed in openstack-cisco:
milestone: g.3 → h.0
Changed in openstack-cisco:
milestone: h.0 → h.1
Changed in openstack-cisco:
milestone: h.1 → none
Mark T. Voelker (mvoelker) wrote :

I looked at a similar issue with a deployer yesterday, and I'm wondering if we aren't hitting https://bugs.launchpad.net/nova/+bug/1241275 here. If so, there's a fix in Icehouse and a backport to stable/havana that will be present in h.3. It's unclear whether this bug was present in Grizzly, but as that release has been deprecated by the community, there will be no Grizzly backport if it was.
