Errors in ir-api reported at DEBUG level

Bug #1507589 reported by Sean Dague
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ironic
Invalid
Undecided
Unassigned
WSME
Confirmed
Undecided
Unassigned

Bug Description

In attempting to determine the cause of a recent ironic failure, it was made a lot more difficult by the fact that all the critical errors were getting reported as DEBUG level

An instance of this:

http://logs.openstack.org/01/236901/1/check/gate-tempest-dsvm-ironic-agent_ssh/349672c/logs/screen-ir-api.txt.gz#_2015-10-19_11_37_36_678

2015-10-19 11:37:36.678 18979 DEBUG wsme.api [req-c5ac9709-2b1c-42d7-9016-713289082633 ] Client-side error: Node 1 is locked by host devstack-trusty-rax-iad-5481252, please retry after the current operation is completed.
Traceback (most recent call last):

  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/opt/stack/new/ironic/ironic/conductor/manager.py", line 1723, in update_port
    purpose='port update') as task:

  File "/opt/stack/new/ironic/ironic/conductor/task_manager.py", line 152, in acquire
    driver_name=driver_name, purpose=purpose)

  File "/opt/stack/new/ironic/ironic/conductor/task_manager.py", line 221, in __init__
    self.release_resources()

  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/opt/stack/new/ironic/ironic/conductor/task_manager.py", line 203, in __init__
    self._lock()

  File "/opt/stack/new/ironic/ironic/conductor/task_manager.py", line 242, in _lock
    reserve_node()

  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 212, in call
    raise attempt.get()

  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/opt/stack/new/ironic/ironic/conductor/task_manager.py", line 235, in reserve_node
    self.node_id)

  File "/opt/stack/new/ironic/ironic/objects/node.py", line 228, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/opt/stack/new/ironic/ironic/db/sqlalchemy/api.py", line 226, in reserve_node
    host=node['reservation'])

NodeLocked: Node 1 is locked by host devstack-trusty-rax-iad-5481252, please retry after the current operation is completed.
 format_exception /usr/local/lib/python2.7/dist-packages/wsme/api.py:221

Revision history for this message
Dmitry Tantsur (divius) wrote :
Changed in wsme:
status: New → Confirmed
Changed in ironic:
status: New → Invalid
Revision history for this message
Stéphane Bisinger (kjir) wrote :

I fail to see how this is WSME's responsibility. WSME actually returns the fault string with the response, then it's the client's responsibility to decide what to do with it: https://github.com/openstack/wsme/blob/master/wsme/api.py#L220

That is why WSME logs Client Side Errors as a debug information.

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

Honestly I think Ironic should be doing a log.error() before raising exceptions like this.

Revision history for this message
Dmitry Tantsur (divius) wrote :

Stéphane, WSME, however does log in on the next line. If we log it again, we'll get 2 records.

Jim, I guess we do - in conductor. This is about API service.

Revision history for this message
Stéphane Bisinger (kjir) wrote :

The OT complains about the fact that a critical error for Ironic is logged at debug level. The point is that WSME logs the error at debug level and Ironic does not log it at all. WSME can't have any knowledge about what constitutes a critical error for the application. WSME doesn't know what a NodeLocked exception is and it shouldn't: it is up to the application to decide what to do with the errors returned from WSME.

As far as WSME knows, a ClientSideError could be a user typo, which does not fit the "Critical error" definition and shouldn't be logged as such. So either the client should check the returned error and act appropriately, or the API implementation should not raise a ClientSideError, but another type of exception which indicates a Server Side error, which are logged at error level: https://github.com/openstack/wsme/blob/master/wsme/api.py#L228

Revision history for this message
Dmitry Tantsur (divius) wrote :

NodeLockedError is an interesting case, but it is reported as a client error right now. Also it's not really true that "Ironic does not log it at all". Ironic conductor service does log it separately, Ironic API service relies on WSME to log it.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.