The ironic driver needs to scale back how many errors it traces out

Bug #1525002 reported by Matt Riedemann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ironic
Fix Released
Low
Jim Rollenhagen
python-ironicclient
Fix Released
Low
Dmitry Tantsur

Bug Description

The amount of tracing in this n-cpu log is a bit much:

http://logs.openstack.org/93/255793/3/gate/gate-tempest-dsvm-ironic-agent_ssh/25175ed/logs/screen-n-cpu.txt.gz?level=TRACE

Like these warnings:

http://logs.openstack.org/93/255793/3/gate/gate-tempest-dsvm-ironic-agent_ssh/25175ed/logs/screen-n-cpu.txt.gz?level=TRACE#_2015-12-10_16_11_48_799

2015-12-10 16:11:48.798 WARNING ironicclient.common.http [req-94077ab9-5adf-4720-9fb8-2e027dde9b72 tempest-BaremetalBasicOps-1285004585 tempest-BaremetalBasicOps-1966762451] Request returned failure status.
2015-12-10 16:11:48.799 WARNING ironicclient.common.http [req-94077ab9-5adf-4720-9fb8-2e027dde9b72 tempest-BaremetalBasicOps-1285004585 tempest-BaremetalBasicOps-1966762451] Error contacting Ironic server: Node 1 is locked by host localhost, please retry after the current operation is completed.
Traceback (most recent call last):

  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 150, in inner
    return func(*args, **kwargs)

  File "/opt/stack/new/ironic/ironic/conductor/manager.py", line 1519, in update_port
    purpose='port update') as task:

  File "/opt/stack/new/ironic/ironic/conductor/task_manager.py", line 152, in acquire
    driver_name=driver_name, purpose=purpose)

  File "/opt/stack/new/ironic/ironic/conductor/task_manager.py", line 222, in __init__
    self.release_resources()

  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 204, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/opt/stack/new/ironic/ironic/conductor/task_manager.py", line 203, in __init__
    self._lock()

  File "/opt/stack/new/ironic/ironic/conductor/task_manager.py", line 243, in _lock
    reserve_node()

  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 212, in call
    raise attempt.get()

  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/opt/stack/new/ironic/ironic/conductor/task_manager.py", line 236, in reserve_node
    self.node_id)

  File "/opt/stack/new/ironic/ironic/objects/node.py", line 260, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/opt/stack/new/ironic/ironic/db/sqlalchemy/api.py", line 226, in reserve_node
    host=node['reservation'])

NodeLocked: Node 1 is locked by host localhost, please retry after the current operation is completed.
 (HTTP 409). Attempt 1 of 61

UPD from dtantsur:
The traceback has nothing to do with ironic client or driver. It gets added somewhere between ironic conductor and API levels.

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Low
status: New → Confirmed
Revision history for this message
Matt Riedemann (mriedem) wrote :

I see ironicclient logging errors/warnings here:

https://github.com/openstack/python-ironicclient/blob/master/ironicclient/common/http.py#L183

But I'm not sure where the stacktrace is coming from, unless ironic is sending that off in the error details somehow?

Revision history for this message
Matt Riedemann (mriedem) wrote :
Dmitry Tantsur (divius)
Changed in python-ironicclient:
status: New → Triaged
importance: Undecided → Low
assignee: nobody → Dmitry Tantsur (divius)
Revision history for this message
Dmitry Tantsur (divius) wrote :

Actually it's even more interesting: traceback comes from ironic as part of message instead of details. Investigating on...

Changed in ironic:
status: New → Confirmed
importance: Undecided → Low
assignee: nobody → Dmitry Tantsur (divius)
Revision history for this message
Dmitry Tantsur (divius) wrote :

More intriguing details: exception is normal on the conductor side, but gets traceback appended after passing through RPC. Will chat with Oslo folks to figure out.

Dmitry Tantsur (divius)
Changed in ironic:
assignee: Dmitry Tantsur (divius) → nobody
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-ironicclient (master)

Fix proposed to branch: master
Review: https://review.openstack.org/256375

Changed in python-ironicclient:
status: Triaged → In Progress
Revision history for this message
Dmitry Tantsur (divius) wrote :

oslo.messaging actually does it: https://github.com/openstack/oslo.messaging/blob/master/oslo_messaging/_drivers/common.py#L208

we should probably strip it on API level

Dmitry Tantsur (divius)
Changed in ironic:
status: Confirmed → Triaged
tags: added: low-hanging-fruit
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/256615

Changed in ironic:
assignee: nobody → Jim Rollenhagen (jim-rollenhagen)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-ironicclient (master)

Reviewed: https://review.openstack.org/256375
Committed: https://git.openstack.org/cgit/openstack/python-ironicclient/commit/?id=8224859c02f525fa3b7dbe917d56b889b0dbb073
Submitter: Jenkins
Branch: master

commit 8224859c02f525fa3b7dbe917d56b889b0dbb073
Author: Dmitry Tantsur <email address hidden>
Date: Fri Dec 11 12:38:39 2015 +0100

    Scale back on how many warnings we issue

    * Remove "Request returned failure status" warning, it does not carry
      any helpful information for an operator (or anyone).
    * Demote retry warnings to debug level. Retrying is essentially a normal
      thing in the real world, and currently we're spamming logs with warnings.
      http://logs.openstack.org/93/255793/3/gate/gate-tempest-dsvm-ironic-agent_ssh/25175ed/logs/screen-n-cpu.txt.gz?level=TRACE

    Change-Id: I6c40581b9dc5ab1fe3af4b882ede17a13edacc66
    Closes-Bug: #1525002

Changed in python-ironicclient:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-ironicclient (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/257955

Revision history for this message
Michael Davies (mrda) wrote :

Do we think there is any work to be done in Nova for this? Or is it all covered by the changes to ironic/ironic-client?

Revision history for this message
Dmitry Tantsur (divius) wrote :

I suspect ironic changes should be enough. We should review our logging in the ironic driver, but that's another story

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/256615
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=bd60603e443ec79b0783bebe2cda48a6b9c89ced
Submitter: Jenkins
Branch: master

commit bd60603e443ec79b0783bebe2cda48a6b9c89ced
Author: Jim Rollenhagen <email address hidden>
Date: Fri Dec 11 10:03:09 2015 -0800

    Don't return tracebacks in API response in debug mode

    The API should not return tracebacks in a production environment. As
    deployers often run services in debug mode, because OpenStack is hard to
    debug, we should not return tracebacks in debug mode. To allow
    developers etc to continue to use this feature, add a new config option
    'debug_tracebacks_in_api' that maintains this behavior.

    Also add to troubleshooting docs.

    Change-Id: Idbbf7efc45140e9e3d8b9491edd58905cbba0363
    Closes-Bug: #1525002

Changed in ironic:
status: In Progress → Fix Released
Revision history for this message
Michael Davies (mrda) wrote :

Removed "Nova" as we think this is sufficiently addressed by the ironic and ironicclient changes.

no longer affects: nova
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-ironicclient (stable/liberty)

Reviewed: https://review.openstack.org/257955
Committed: https://git.openstack.org/cgit/openstack/python-ironicclient/commit/?id=ae2c0995adbc71b6eed03fd11087b93acf01446a
Submitter: Jenkins
Branch: stable/liberty

commit ae2c0995adbc71b6eed03fd11087b93acf01446a
Author: Dmitry Tantsur <email address hidden>
Date: Fri Dec 11 12:38:39 2015 +0100

    Scale back on how many warnings we issue

    * Remove "Request returned failure status" warning, it does not carry
      any helpful information for an operator (or anyone).
    * Demote retry warnings to debug level. Retrying is essentially a normal
      thing in the real world, and currently we're spamming logs with warnings.
      http://logs.openstack.org/93/255793/3/gate/gate-tempest-dsvm-ironic-agent_ssh/25175ed/logs/screen-n-cpu.txt.gz?level=TRACE

    Change-Id: I6c40581b9dc5ab1fe3af4b882ede17a13edacc66
    Closes-Bug: #1525002
    (cherry picked from commit 8224859c02f525fa3b7dbe917d56b889b0dbb073)

tags: added: in-stable-liberty
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/python-ironicclient 1.1.0

This issue was fixed in the openstack/python-ironicclient 1.1.0 release.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/ironic 5.0.0

This issue was fixed in the openstack/ironic 5.0.0 release.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/python-ironicclient 0.8.2

This issue was fixed in the openstack/python-ironicclient 0.8.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

This issue was fixed in the openstack/python-ironicclient 0.8.2 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.