nova service-delete fails for services on non-child (top) cell

Bug #1361186 reported by Rajesh Tailor
This bug affects 6 people
| Affects | Status | Importance | Assigned to | Milestone |
| OpenStack Compute (nova) | Fix Released | Medium | Sean Dague | - |
| Juno | Fix Released | Undecided | Unassigned | - |

Bug Description

Nova service-delete fails for services on non-child (top) cell.

How to reproduce:

$ nova --os-username admin service-list

+----------------+------------------+---------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----------------+------------------+---------------------+----------+---------+-------+----------------------------+-----------------+
| region!child@1 | nova-conductor | region!child@ubuntu | internal | enabled | up | 2014-08-18T06:06:56.000000 | - |
| region!child@2 | nova-compute | region!child@ubuntu | nova | enabled | up | 2014-08-18T06:06:55.000000 | - |
| region!child@3 | nova-cells | region!child@ubuntu | internal | enabled | up | 2014-08-18T06:06:59.000000 | - |
| region!child@4 | nova-scheduler | region!child@ubuntu | internal | enabled | up | 2014-08-18T06:06:50.000000 | - |
| region@1 | nova-cells | region@ubuntu | internal | enabled | up | 2014-08-18T06:06:59.000000 | - |
| region@2 | nova-cert | region@ubuntu | internal | enabled | up | 2014-08-18T06:06:58.000000 | - |
| region@3 | nova-consoleauth | region@ubuntu | internal | enabled | up | 2014-08-18T06:06:57.000000 | - |
+----------------+------------------+---------------------+----------+---------+-------+----------------------------+-----------------+

Stop one of the services on top cell (e.g. nova-cert).

$ nova --os-username admin service-list

+----------------+------------------+---------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----------------+------------------+---------------------+----------+---------+-------+----------------------------+-----------------+
| region!child@1 | nova-conductor | region!child@ubuntu | internal | enabled | up | 2014-08-18T06:09:26.000000 | - |
| region!child@2 | nova-compute | region!child@ubuntu | nova | enabled | up | 2014-08-18T06:09:25.000000 | - |
| region!child@3 | nova-cells | region!child@ubuntu | internal | enabled | up | 2014-08-18T06:09:19.000000 | - |
| region!child@4 | nova-scheduler | region!child@ubuntu | internal | enabled | up | 2014-08-18T06:09:20.000000 | - |
| region@1 | nova-cells | region@ubuntu | internal | enabled | up | 2014-08-18T06:09:19.000000 | - |
| region@2 | nova-cert | region@ubuntu | internal | enabled | down | 2014-08-18T06:08:28.000000 | - |
| region@3 | nova-consoleauth | region@ubuntu | internal | enabled | up | 2014-08-18T06:09:27.000000 | - |
+----------------+------------------+---------------------+----------+---------+-------+----------------------------+-----------------+

Run nova service-delete on the stopped top cell service:
$ nova --os-username admin service-delete 'region@2'

Check the request id from nova-api.log:

2014-08-18 15:10:23.491 INFO nova.osapi_compute.wsgi.server [req-e134d915-ad66-41ba-a6f8-33ec51b7daee admin demo] 192.168.101.31 "DELETE /v2/d66804d2e78549cd8f5efcedd0abecb2/os-services/region@2 HTTP/1.1" status: 204 len: 179 time: 0.1334069

Error log in n-cell-region service:

2014-08-18 15:10:23.464 ERROR nova.cells.messaging [req-e134d915-ad66-41ba-a6f8-33ec51b7daee admin demo] Error locating next hop for message: 'NoneType' object has no attribute 'count'
2014-08-18 15:10:23.464 TRACE nova.cells.messaging Traceback (most recent call last):
2014-08-18 15:10:23.464 TRACE nova.cells.messaging File "/opt/stack/nova/nova/cells/messaging.py", line 406, in process
2014-08-18 15:10:23.464 TRACE nova.cells.messaging next_hop = self._get_next_hop()
2014-08-18 15:10:23.464 TRACE nova.cells.messaging File "/opt/stack/nova/nova/cells/messaging.py", line 361, in _get_next_hop
2014-08-18 15:10:23.464 TRACE nova.cells.messaging dest_hops = target_cell.count(_PATH_CELL_SEP)
2014-08-18 15:10:23.464 TRACE nova.cells.messaging AttributeError: 'NoneType' object has no attribute 'count'

Appendix:
Services on a child cell can be deleted without any issues.

$ nova --os-username admin service-list

+----------------+------------------+---------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----------------+------------------+---------------------+----------+---------+-------+----------------------------+-----------------+
| region!child@1 | nova-conductor | region!child@ubuntu | internal | enabled | up | 2014-08-18T06:14:46.000000 | - |
| region!child@2 | nova-compute | region!child@ubuntu | nova | enabled | down | 2014-08-18T06:13:15.000000 | - |
| region!child@3 | nova-cells | region!child@ubuntu | internal | enabled | up | 2014-08-18T06:14:49.000000 | - |
| region!child@4 | nova-scheduler | region!child@ubuntu | internal | enabled | up | 2014-08-18T06:14:50.000000 | - |
| region@1 | nova-cells | region@ubuntu | internal | enabled | up | 2014-08-18T06:14:49.000000 | - |
| region@2 | nova-cert | region@ubuntu | internal | enabled | down | 2014-08-18T06:08:28.000000 | - |
| region@3 | nova-consoleauth | region@ubuntu | internal | enabled | up | 2014-08-18T06:14:47.000000 | - |
+----------------+------------------+---------------------+----------+---------+-------+----------------------------+-----------------+

Delete child cell service:
$ nova --os-username admin service-delete 'region!child@2'

$ nova --os-username admin service-list

+----------------+------------------+---------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----------------+------------------+---------------------+----------+---------+-------+----------------------------+-----------------+
| region!child@1 | nova-conductor | region!child@ubuntu | internal | enabled | up | 2014-08-18T06:15:46.000000 | - |
| region!child@3 | nova-cells | region!child@ubuntu | internal | enabled | up | 2014-08-18T06:15:39.000000 | - |
| region!child@4 | nova-scheduler | region!child@ubuntu | internal | enabled | up | 2014-08-18T06:15:40.000000 | - |
| region@1 | nova-cells | region@ubuntu | internal | enabled | up | 2014-08-18T06:15:39.000000 | - |
| region@2 | nova-cert | region@ubuntu | internal | enabled | down | 2014-08-18T06:08:28.000000 | - |
| region@3 | nova-consoleauth | region@ubuntu | internal | enabled | up | 2014-08-18T06:15:47.000000 | - |
+----------------+------------------+---------------------+----------+---------+-------+----------------------------+-----------------+

Tags: cells ntt
Changed in nova:
assignee: nobody → Rajesh Tailor (rajesh-tailor)
RedBaron (dheeraj-gupta4) wrote :

To my mind, this bug occurs because two cases
1. A service is to be deleted on the top-level cell itself
2. A deletion is initiated at the top level (by a nova service-delete command, for instance)
are both handled by the same method, nova.compute.cells_api.HostAPI.service_delete:

def service_delete(self, context, service_id):
        """Deletes the specified service."""
        self.cells_rpcapi.service_delete(context, service_id)

The method does the correct thing when the command is initiated: it splits the ID into the cell name and the service ID, and forwards the message to the target cell. On receiving the message, the target cell calls the corresponding method in nova.cells.messaging._TargetedMessageMethods:

def service_delete(self, message, service_id):
        """Deletes the specified service."""
        self.host_api.service_delete(message.ctxt, service_id)

For child cells this is fine, since self.host_api points to nova.compute.HostAPI. For an API cell, however, self.host_api is nova.compute.cells_api.HostAPI, the same class that initiated the message in the first place. The top-level cell thus has no way of knowing that the user wants to delete a service on _this_ cell, and it forwards the request again (as it did in the beginning).

As an example, if the service ID to delete is 'toplevel@4', i.e. a service in the API cell, the API cell splits out the cell and service using the cells.utils function and gets cell_name: toplevel, service_id: 4. It then forwards the message to itself. On receiving this message it only gets service_id: 4 and (again) tries to split it for forwarding. This time the split returns cell_name: None, and hence the error occurs while trying to route the message.

The same problem exists in the service_update method too.
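
The double split described above can be sketched in a few lines of Python. This is only an illustration, not nova's code: the helper names and separator constants are assumptions modeled on the '!' and '@' characters visible in the service IDs and on the `target_cell.count(_PATH_CELL_SEP)` line in the traceback.

```python
# Hypothetical stand-ins; the separators are taken from IDs such as
# 'region!child@2' shown in the service-list output.
_PATH_CELL_SEP = "!"   # separates cells in a path, e.g. 'region!child'
_CELL_ITEM_SEP = "@"   # separates the cell path from the item ID


def split_cell_and_item(cell_and_item):
    """Split 'cell@item' into (cell, item); cell is None if there is no '@'."""
    parts = cell_and_item.rsplit(_CELL_ITEM_SEP, 1)
    if len(parts) == 1:
        return (None, cell_and_item)
    return tuple(parts)


def count_hops(target_cell):
    """Mimic the failing line from the traceback."""
    return target_cell.count(_PATH_CELL_SEP)


# First pass, when the command is initiated in the API cell.
cell, service_id = split_cell_and_item("toplevel@4")

# Second pass, when the API cell receives its own message and splits again.
cell_again, service_id_again = split_cell_and_item(service_id)

try:
    count_hops(cell_again)
    error = None
except AttributeError as exc:
    # Same class of failure as in the n-cell-region log.
    error = str(exc)
```

After the second split the cell name is None, so the hop count cannot be computed and routing fails with an AttributeError.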

tags: added: cells
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/118674

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/126851

Changed in nova:
assignee: Rajesh Tailor (rajesh-tailor) → RedBaron (dheeraj-gupta4)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Rajesh Tailor (<email address hidden>) on branch: master
Review: https://review.openstack.org/118674
Reason: A similar patch has been submitted to address this issue.
Please refer to: https://review.openstack.org/#/c/126851/

Changed in nova:
assignee: RedBaron (dheeraj-gupta4) → Sean Dague (sdague)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/126851
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b0feaba45ff53d20a93bc4b29936d13550d15bf6
Submitter: Jenkins
Branch: master

commit b0feaba45ff53d20a93bc4b29936d13550d15bf6
Author: Dheeraj Gupta <email address hidden>
Date: Wed Oct 8 09:24:18 2014 +0000

    Make service-delete work in API cells

    nova service-delete is handled by HostAPI.service_delete method.
    Normally, it searches for relevant Service object in DB and calls
    destroy() on it. However, in API cell it is over-ridden so that
    service_delete message can be routed to appropriate cell. Once the
    destination cell is reached, the message processor invokes
    HostAPI.service_delete to perform the actual delete. This works in
    case of child cells.
    However in case of API cells, since HostAPI.service_delete has been
    over-ridden, when the message processor invokes HostAPI.service_delete
    to perform actual delete of the service, the API cell tries to again
    route the message and fails in doing so with an AttributeError (as
    the message is already at the destination cell and can not be further
    routed).
    This patch moves the object destroy() call to a separate function and
    modifies the message processor to call the new function. The original
    service_delete is modified to also call the new function.

    Change-Id: I9148d8ceb5cdeb858dc9741b24cf0e03487c9a62
    Closes-Bug: 1361186
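
The approach in the commit message above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the merged patch: `FakeService`, `CellsHostAPI`, `process_service_delete`, the `router` callable, and the helper name `_service_delete` are all hypothetical stand-ins for the real nova classes and methods.

```python
class FakeService:
    """Hypothetical stand-in for a Service object stored in the DB."""

    def __init__(self):
        self.destroyed = False

    def destroy(self):
        self.destroyed = True


class HostAPI:
    """Plain host API, as used in child cells."""

    def __init__(self, services):
        self._services = services  # maps service ID -> FakeService

    def _service_delete(self, context, service_id):
        # The actual delete, moved into its own function.
        self._services[service_id].destroy()

    def service_delete(self, context, service_id):
        # The original entry point now delegates to the new function.
        self._service_delete(context, service_id)


class CellsHostAPI(HostAPI):
    """API-cell variant: service_delete routes instead of deleting."""

    def __init__(self, services, router):
        super().__init__(services)
        self._router = router  # hypothetical callable that forwards messages

    def service_delete(self, context, service_id):
        self._router(context, service_id)


def process_service_delete(host_api, context, service_id):
    """Message processor at the destination cell.

    It calls the non-overridden helper, so an API cell does not try to
    route the message a second time.
    """
    host_api._service_delete(context, service_id)


routed = []
service = FakeService()
api_cell = CellsHostAPI({"2": service}, lambda ctx, sid: routed.append(sid))
process_service_delete(api_cell, None, "2")
```

Because the processor bypasses the overridden `service_delete`, the delete happens locally in both child and API cells, and nothing is routed again.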

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/147844

Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-2 → 2015.1.0
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/juno)

Reviewed: https://review.openstack.org/147844
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3212242fc36ea501fb1d118c380f3e9c9105ec3a
Submitter: Jenkins
Branch: stable/juno

commit 3212242fc36ea501fb1d118c380f3e9c9105ec3a
Author: Dheeraj Gupta <email address hidden>
Date: Wed Oct 8 09:24:18 2014 +0000

    Make service-delete work in API cells

    Change-Id: I9148d8ceb5cdeb858dc9741b24cf0e03487c9a62
    Closes-Bug: 1361186
    (cherry picked from commit b0feaba45ff53d20a93bc4b29936d13550d15bf6)
