API servers hang with 100% CPU if syslog is restarted

Bug #1459726 reported by George Shuklin
This bug affects 3 people
Affects                    Status     Importance  Assigned to  Milestone
Glance                     Invalid    Undecided   Unassigned
OpenStack Compute (nova)   Invalid    Undecided   Unassigned
neutron                    Invalid    Undecided   Unassigned
oslo.log                   Invalid    Undecided   Unassigned
python-eventlet (Ubuntu)   Confirmed  Undecided   Unassigned

Bug Description

Affected:

glance-api
glance-registry
neutron-server
nova-api

If a service is configured to use rsyslog and rsyslog is restarted after the API server has started, the server hangs at 100% CPU on the next log line. If the server has multiple workers, each worker eats its own 100% CPU share.

Steps to reproduce:
1. Configure syslog:
use_syslog=true
syslog_log_facility=LOG_LOCAL4
2. Restart the API service
3. Restart rsyslog

Then execute some command to force logging, e.g. neutron net-create foo, nova boot, etc.
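
For convenience, a minimal standalone sketch of the steps above (assumptions beyond the report: python-eventlet is installed and rsyslog listens on /dev/log; the script name and the 15-second window for the manual restart are illustrative):

# repro_syslog_hang.py -- reproduction sketch (assumptions: python-eventlet
# installed, rsyslog on /dev/log; the 15s restart window is illustrative).
import eventlet
eventlet.monkey_patch()  # green-ify sockets, as the affected API servers do

import logging.handlers
import time

log = logging.getLogger("repro")
log.setLevel(logging.DEBUG)
# SysLogHandler connects a single datagram socket to /dev/log at startup
# and keeps it for the life of the process (use_syslog/LOG_LOCAL4 above)
log.addHandler(logging.handlers.SysLogHandler(
    address="/dev/log",
    facility=logging.handlers.SysLogHandler.LOG_LOCAL4))

log.debug("first line: socket to /dev/log is connected")
print("restart rsyslog now (service rsyslog restart), then wait...")
time.sleep(15)

# rsyslog has re-created /dev/log, so the old endpoint is gone; sendto()
# now fails with ENOTCONN and, with the broken eventlet, the process can
# spin in the poll()/sendto() loop shown in the strace below
log.debug("second line: may pin the CPU at 100%")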

Expected result: normal operation

Actual result:
With some probability (about 30-50%), the API server hangs with 100% CPU usage and stops replying to requests.

Strace on the hung service:

gettimeofday({1432827199, 745141}, NULL) = 0
poll([{fd=3, events=POLLOUT|POLLERR|POLLHUP}, {fd=5, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 2, 60000) = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, "<151>keystonemiddleware.auth_token[12502]: DEBUG Authenticating user token __call__ /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:650\0", 154, 0, NULL, 0) = -1 ENOTCONN (Transport endpoint is not connected)
gettimeofday({1432827199, 745226}, NULL) = 0
poll([{fd=3, events=POLLOUT|POLLERR|POLLHUP}, {fd=5, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 2, 60000) = 1 ([{fd=3, revents=POLLOUT}])
sendto(3, "<151>keystonemiddleware.auth_token[12502]: DEBUG Authenticating user token __call__ /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:650\0", 154, 0, NULL, 0) = -1 ENOTCONN (Transport endpoint is not connected)
gettimeofday({1432827199, 745325}, NULL) = 0
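
Reading the strace: the repeating poll()/sendto() pair is the same log line being retried forever. The descriptor always polls writable (POLLOUT), sendto() fails with ENOTCONN, and the send is retried instead of the error surfacing. As code, the observed loop amounts to something like this (an illustrative paraphrase of the trace, not eventlet's actual source; wait_writable is a stand-in for eventlet's green-socket wait):

import select
import socket

def wait_writable(sock, timeout_ms=60000):
    # stand-in for eventlet's green-socket wait: block until poll()
    # reports the descriptor writable -- here it always does, immediately
    p = select.poll()
    p.register(sock.fileno(),
               select.POLLOUT | select.POLLERR | select.POLLHUP)
    p.poll(timeout_ms)

def spinning_send(sock, data):
    # paraphrase of the trace above: send fails with ENOTCONN, the error
    # never reaches the logging handler, and the retry loop burns a full
    # core per worker
    while True:
        wait_writable(sock)
        try:
            return sock.send(data)
        except socket.error:
            continue

A fixed eventlet lets the send error propagate, so logging's SysLogHandler can reconnect its unix socket to the re-created /dev/log.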

Tested on:
nova, glance, neutron: 1:2014.2.3 (Ubuntu packages).

Tags: ops
description: updated
Revision history for this message
Nobuto Murata (nobuto) wrote :
Revision history for this message
George Shuklin (george-shuklin) wrote :

Maybe. But it affects not only glance, but all the other OpenStack components too.

Tom Fifield (fifieldt)
tags: added: ops
Revision history for this message
Nobuto Murata (nobuto) wrote :

To be clear, python-eventlet is a library depended on by nova, neutron, glance, cinder, etc. A proposed fix is available for testing/verification on Ubuntu 14.04 or later in bug #1452312.

Revision history for this message
Stuart McLaren (stuart-mclaren) wrote :

I think this is a duplicate of https://bugs.launchpad.net/bugs/1076466

Revision history for this message
Stuart McLaren (stuart-mclaren) wrote :
Revision history for this message
Vil Surkin (vill-srk) wrote :

Stuart, no, it's a new bug.

Revision history for this message
Stuart McLaren (stuart-mclaren) wrote :

@vill-srk

What is the difference between the two bugs? (One of the signatures of bug 1076466 was the CPU spinning at 100%.)

Thanks.

Revision history for this message
Vil Surkin (vill-srk) wrote :

@stuart-mclaren, bug #1076466 was fixed (in the oslo logging module). This is a new problem with logging, and it looks like it is in the python-eventlet library. We have an installation on Juno and Trusty; the code already has the patches from #1076466, but it hangs the CPU anyway.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

Bug #1076466 doesn't appear to have been fixed. The patch against the old incubated version of the code was abandoned, and I don't see a related fix in the oslo.log library version of the code.

Revision history for this message
George Shuklin (george-shuklin) wrote :

Yes, I think this is not a duplicate; #1076466 has already been fixed.

Revision history for this message
George Shuklin (george-shuklin) wrote :

I've just tried to use https://launchpad.net/ubuntu/+archive/primary/+files/python-eventlet_0.13.0-1ubuntu3.3_all.deb (proposed) and it helped.

We have two API servers (trusty / Juno cloud archive), so I ran an experiment: one got the update, the second did not.

Working:
python-eventlet 0.13.0-1ubuntu3.3

Broken:
python-eventlet 0.13.0-1ubuntu3.2~cloud0

After a syslog restart, a few iterations of neutron net-create hang neutron-server with 100% CPU on the broken node, while everything works fine on the working one. Same for glance-api/glance-registry.
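
A scripted version of that check, as a sketch (the loop count and the CPU threshold are illustrative assumptions, not from the report); run it on each node right after restarting rsyslog:

# check_hang.py -- verification sketch (loop count and CPU threshold
# are illustrative assumptions); run after 'service rsyslog restart'.
import subprocess

# force a few log lines through neutron-server, as described above
for i in range(5):
    subprocess.call(["neutron", "net-create", "repro-%d" % i])

# a hung worker shows up pinned near 100% CPU
out = subprocess.check_output(
    ["ps", "-C", "neutron-server", "-o", "pid=,pcpu="],
    universal_newlines=True)
for line in out.splitlines():
    pid, pcpu = line.split()
    if float(pcpu) > 90.0:
        print("PID %s looks hung at %s%% CPU" % (pid, pcpu))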

Revision history for this message
Simon Pasquier (simon-pasquier) wrote :

This also looks similar to https://bugs.launchpad.net/oslo.log/+bug/1101404, doesn't it?

Revision history for this message
George Shuklin (george-shuklin) wrote :

Maybe. I'm not sure. Anyway, this is not a nova/glance/neutron bug but a python-eventlet one, and it mostly concerns distributions, not developers.

Ian Cordasco (icordasc)
Changed in glance:
status: New → Invalid
Changed in nova:
status: New → Invalid
Changed in neutron:
status: New → Invalid
Changed in oslo.log:
status: New → Invalid
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in python-eventlet (Ubuntu):
status: New → Confirmed