class decorator isn't enabled in nova for some service

Bug #1731864 reported by Matt Simonin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
In Progress
Medium
Ilya Shakhat
Pike
Confirmed
Medium
Unassigned
osprofiler
Invalid
Undecided
Unassigned

Bug Description

* Observation:

When using devstack, nova-api rpc calls to the conductor (client side) aren't shown in the traces.

* Personnal Interpretation:

Nova osprofiler integration makes use of a wrapper around the class decorator[1]

This logic is executed at import time, the issue is that `CONF.profiler` options are set at run time when `config_parse` is called by the service (e.g in [2][3]).

In the nova-api case, setting a breakpoint in the wrapper we see that the decorator is executed when the `from nova import service` is loaded in the api command.

  /usr/local/bin/nova-api(6)<module>()
-> from nova.cmd.api import main
  /opt/stack/nova/nova/cmd/api.py(33)<module>()
-> from nova import service
  /opt/stack/nova/nova/service.py(31)<module>()
-> from nova import conductor
  /opt/stack/nova/nova/conductor/__init__.py(16)<module>()
-> from nova.conductor import api as conductor_api
  /opt/stack/nova/nova/conductor/api.py(21)<module>()
-> from nova.conductor import rpcapi
  /opt/stack/nova/nova/conductor/rpcapi.py(34)<module>()
-> class ConductorAPI(object):
> /opt/stack/nova/nova/profiler.py(69)decorator()
-> if profiler and 'profiler' in CONF and CONF.profiler.enabled:

(Pdb) 'profiler' in CONF
True
(Pdb) CONF.profiler.enabled
False

[1]: https://github.com/openstack/nova/blob/a4fc1bcd08c12324070fde5e500366ff821d468f/nova/profiler.py#L59-L73
[2]: https://github.com/openstack/nova/blob/a4fc1bcd08c12324070fde5e500366ff821d468f/nova/cmd/api.py#L41
[3]: https://github.com/openstack/nova/blob/a4fc1bcd08c12324070fde5e500366ff821d468f/nova/config.py#L45-L52

Revision history for this message
Ilya Shakhat (shakhat) wrote :

Confirmed, the bug is introduced after switching to uWSGI-based deployment. Besides Nova it affects also Cinder. Keystone, Neutron and Heat are not affected.

Changed in osprofiler:
status: New → Confirmed
Revision history for this message
Matt Simonin (matthieu-simonin) wrote :

I took a look to old traces[1] (ocata - no uwsgi). If you take the first RPC in the trace (schedule_and_build_instances), you only get the receiver trace (the conductor) not the sender (the api). It's likely the issue was already here. They are also other cases in which you only get one of the two parts of the RPC (e.g see object_action compute->conductor when instance states are updated).

So my feeling is that it's not related to uwsgi but really rely on the fact that the test[2] is interpreted too early (at import time).

[1]: http://enos.irisa.fr/html/wan_g5k/cpt01-osp/cpt01-osp-lat0/root/rally_home/trace-boot-and-delete.yaml.html
[2]: https://github.com/openstack/nova/blob/a4fc1bcd08c12324070fde5e500366ff821d468f/nova/profiler.py#L68

Revision history for this message
Ilya Shakhat (shakhat) wrote :

As I understand the RPC function `schedule_and_build_instances` is called from https://github.com/openstack/nova/blob/master/nova/conductor/api.py#L130 . Most probably it would be enough to add profiler.trace_cls decorator to the class ComputeTaskAPI.

The main problem here is that all instrumentation points are set manually and there's no guaranty of 100% coverage. More correct approach would be to instrument RPC library itself and catch all function calls.

Revision history for this message
Matt Simonin (matthieu-simonin) wrote :

I maybe missed something in the nova code base but I see it on the nova.conductor.rpcapi.ComputeTaskAPI:

master: https://github.com/openstack/nova/blob/78dfc7f1f5e1037892c20e867686540fccd632c2/nova/conductor/rpcapi.py#L253

pike: https://github.com/openstack/nova/blob/stable/pike/nova/conductor/rpcapi.py#L253

ocata: https://github.com/openstack/nova/blob/stable/ocata/nova/conductor/rpcapi.py#L252

Regarding the proposition to add it directly in oslo.messaging, I totally agree as this will catch all rpcs. But I don't think this will fix this particular issue (I'm still convince this is due to the wrapper evaluated at import time [1])

[1]: https://github.com/openstack/nova/blob/master/nova/profiler.py#L68

Revision history for this message
Matt Simonin (matthieu-simonin) wrote :

Just attached two traces obtained from the latest devstack deployment.
This focuses the test : https://github.com/openstack/nova/blob/57bf7f49cf08dbc0c0d16eb7512ecaa15978fbff/nova/profiler.py#L68

- test disabled: nova-boot-without-test.html
The trace shows 24 rpcs trace points

- test enabled: nova-boot-with-test.html
The trace shows 6 rpcs trace points instead of the 24 expected.

Disabling the test isn't probably a good solution since it will lead to patch the class regarless the osprofiler configuration.

Revision history for this message
Matt Riedemann (mriedem) wrote :
Changed in osprofiler:
status: Confirmed → In Progress
status: In Progress → Invalid
Changed in nova:
status: New → In Progress
assignee: nobody → Ilya Shakhat (shakhat)
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/519664
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0b4608bf0b40d50834bd5ab2e50096ec6c284822
Submitter: Zuul
Branch: master

commit 0b4608bf0b40d50834bd5ab2e50096ec6c284822
Author: Ilya Shakhat <email address hidden>
Date: Tue Nov 14 13:41:50 2017 +0100

    Initialize osprofiler in WSGI application

    This patch adds missing initialization of OSProfiler when
    Nova API is running as WSGI application.

    Change-Id: I779b144cfdfbe4a9a572990bad9720113402ac7f
    Related-Bug: #1731864

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.