nova syslog logging to /dev/log race condition

Bug #1101404 reported by Joe Breu
This bug affects 13 people
Affects Status Importance Assigned to Milestone
Glance
Fix Released
Undecided
Unassigned
OpenStack Compute (nova)
Fix Released
Medium
Unassigned
OpenStack Identity (keystone)
Fix Released
Medium
Unassigned
neutron
Fix Released
Medium
ratalevolamena
oslo.log
Fix Released
Medium
Unassigned

Bug Description

running nova-api-ec2
running rsyslog

service rsyslog restart ; service nova-api-ec2 restart

nova-api-ec2 consumes up to 100% of the available CPU (or at least a full core) and is not responsive. /var/log/nova/nova-api-ec2.log states the socket is already in use.

strace the process

sendto(3, "<142>2013-01-18 20:00:22 24882 INFO nova.service [-] Caught SIGTERM, exiting\0", 77, 0, NULL, 0) = -1 ENOTCONN (Transport endpoint is not connected)

service nova-api-ec2 restart fails as upstart already thinks the process has been terminated.

The only way to recover is to pkill -9 nova-api-ec2 and then restart it with 'service nova-api-ec2 restart'.

The same behavior has been seen in all nova-api services.
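The mechanics behind the spin can be sketched with a minimal, self-contained script. Note this is an illustration, not the reporter's setup: a temporary UNIX datagram socket stands in for /dev/log, and the socket path and logger name are made up for the example.

```python
import logging
import logging.handlers
import os
import socket
import tempfile

# Hypothetical stand-in for /dev/log: a UNIX datagram socket we control.
sock_path = os.path.join(tempfile.mkdtemp(), "log")
server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
server.bind(sock_path)

# SysLogHandler connects its own socket to the path at creation time.
handler = logging.handlers.SysLogHandler(address=sock_path)
logger = logging.getLogger("demo")
logger.addHandler(handler)

logger.warning("hello syslog")
data, _ = server.recvfrom(1024)
print(data)  # a priority-prefixed datagram, e.g. b'<12>hello syslog\x00'

# If the daemon side is torn down and recreated (as "service rsyslog
# restart" does to /dev/log), the handler's connected socket goes stale,
# and subsequent sends fail -- the ENOTCONN seen in the strace above.
server.close()
os.unlink(sock_path)
handler.close()
```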

Revision history for this message
Joe Breu (breu) wrote :

forgot to mention - this is against Folsom, but it looks to exist in Grizzly as well

Dolph Mathews (dolph)
Changed in keystone:
status: New → Confirmed
Revision history for this message
Joe Breu (breu) wrote :

Chatted with Dolph and setting 'verbose=False' fixes the issue but it is not a permanent solution.
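For reference, that workaround amounts to a nova.conf change along these lines (a sketch; consult the deployed release's sample config for exact option names):

```ini
[DEFAULT]
# Workaround only: reduces log volume so the stale syslog socket is
# hit less often; it does not remove the underlying race.
verbose = False
use_syslog = True
```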

Revision history for this message
Michael Still (mikal) wrote :

The logging code comes from oslo.

Revision history for this message
Michael Still (mikal) wrote :

What version of python are you using in production? My poking around seems to indicate that http://bugs.python.org/issue15179 might be the solution to your problem, but that hasn't landed in the python Ubuntu 12.04 packages. It has landed in 12.10. I'd be very interested in if you can re-create this problem with that newer version of python.

Revision history for this message
Joe Breu (breu) wrote :

We are using the latest python from 12.04 (I don't have the version number handy at the moment). This looks like it is also being worked on in https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/1081022 for 12.04

Chuck Short (zulcss)
Changed in nova:
status: New → Confirmed
Revision history for this message
Joe Gordon (jogo) wrote :
Revision history for this message
Sean Dague (sdague) wrote :

Setting to medium, as this is going to require some eventlet hacking, it seems.

Changed in nova:
importance: Undecided → Medium
Revision history for this message
Mark McLoughlin (markmc) wrote :

Are we sure this isn't just an eventlet bug?

Changed in oslo:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Jay Pipes (jaypipes) wrote :

We are seeing this as well in our Folsom deployments. In fact, it's pretty much made an entire deployment zone come to its knees. This needs to be addressed ASAP, in my opinion.

For what it's worth, in our case, it's always the metadata API service that seems to get hosed.

Dolph Mathews (dolph)
Changed in keystone:
importance: Undecided → Medium
Revision history for this message
Sam Morrison (sorrison) wrote :

Yeah we are seeing this in our Grizzly cloud, killed all 36 of our nova-api servers.....

Can we get this pushed up in importance?

Revision history for this message
Marcus Furlong (furlongm) wrote :

It looks like the actual python bug was marked as "Fix Released" in Precise but for some reason the patch did not make it into the actual release:

https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/1081022

Revision history for this message
Kieran Spear (kspear) wrote :

I think the python bug is a red herring. We were able to fix this by applying the eventlet patch that jogo links to above in comment #6:

https://gist.github.com/7a2e224f3aafea1a1fc3

Revision history for this message
Kieran Spear (kspear) wrote :

To clarify - we also applied the patch from the python bug and it did not fix the problem.

Revision history for this message
Kieran Spear (kspear) wrote :

I reported a new eventlet issue here, quoting the mailing list post:

https://github.com/eventlet/eventlet/issues/63

Revision history for this message
Dolph Mathews (dolph) wrote :

Is there any work to track against openstack projects then, other than a requirements bump for eventlet when one is available?

Revision history for this message
Stig Telfer (stigtelfer) wrote :

I have hit the same issue on my system here. The test environment is CentOS 6.5, Python 2.6.6, Icehouse RPMs from RDO, eventlet 0.14. Our Nova services are configured with verbose and debug logging enabled, and are configured to log through syslog. Whenever syslog is restarted, if a log event occurs while syslog is down, nova-api spins exactly as described, OpenStack processing stops, and the services cannot be gracefully shut down.

I tried the latest eventlet development snapshot; unfortunately, it does not fix the problem. The proposed patch (linked to above) does work in our configuration.

I haven't tried upgrading to Python 2.7.6 (too disruptive), which has also been reported to fix the issue.

Revision history for this message
Ben Nemec (bnemec) wrote :

Ew, apparently this was fixed upstream in Python 2.7, but 2.6 didn't get the fix. I'm not really sure there's anything we can do about that here - the only solution I think is likely to be accepted is to get the fix backported to py26, but I'm not sure how feasible that is either.

Dolph Mathews (dolph)
summary: - nova syslog logging to /dev/log race condition
+ nova syslog logging to /dev/log race condition in python 2.6
Changed in mos:
status: New → Confirmed
importance: Undecided → High
milestone: none → 5.1
assignee: nobody → MOS Nova (mos-nova)
no longer affects: mos
Revision history for this message
Francois Deppierraz (francois-ctrlaltdel) wrote : Re: nova syslog logging to /dev/log race condition in python 2.6

I've experienced the same problem on an ubuntu 14.04 server running openstack icehouse.

Steps to reproduce:

# service rsyslogd restart

# top
 3736 nova 20 0 298632 105264 3848 R 100.0 0.3 12:45.94 nova-api
10812 glance 20 0 142844 45968 2388 R 100.0 0.1 3:34.85 glance-api
 3737 nova 20 0 301860 108532 3880 R 99.8 0.3 3:32.82 nova-api
[...]

Please note that the default python version is 2.7:

# python -V
Python 2.7.6

affects: oslo-incubator → oslo.log
Revision history for this message
Francois Deppierraz (francois-ctrlaltdel) wrote :

It is possible to reproduce this issue with the following script:

https://gist.github.com/dbishop/7a2e224f3aafea1a1fc3

Changed in neutron:
importance: Undecided → Medium
Revision history for this message
Ian Cordasco (icordasc) wrote :

For what it is worth, there isn't much that can be done for 2.6 unless the oslo.log developers can think of a way around this without having users patch their copy of 2.6. This also appears to be affecting 2.7 so I'm updating the title to be less misleading.

summary: - nova syslog logging to /dev/log race condition in python 2.6
+ nova syslog logging to /dev/log race condition in python 2.6 and 2.7
Revision history for this message
Mehdi Abaakouk (sileht) wrote : Re: nova syslog logging to /dev/log race condition in python 2.6 and 2.7

A new eventlet bug has been opened about this: https://github.com/eventlet/eventlet/issues/166

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Mehdi, I see the patch for logging/handlers.py in my python 2.7 installation - http://bugs.python.org/issue15179 - does it happen even with that patch?

Changed in neutron:
assignee: nobody → ratalevolamena (chris-techno1307)
status: New → Confirmed
status: Confirmed → New
assignee: ratalevolamena (chris-techno1307) → nobody
Changed in neutron:
assignee: nobody → ratalevolamena (chris-techno1307)
Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

I think https://review.openstack.org/#/c/136842/ should address this in oslo.log, since we're no longer assuming /dev/log and are instead using the syslog module to talk to the service directly.

The resolution to https://bugs.launchpad.net/nova/+bug/1459726, which appears to be the same problem, also indicates that updating eventlet helps.
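The difference the oslo.log change relies on can be sketched as follows: the stdlib syslog module wraps the C library's syslog(3), which owns the connection to the log daemon itself (and reopens it as needed), rather than logging over a user-space socket that can go stale. A minimal illustration, not oslo.log's actual code; the ident string is made up for the example:

```python
import syslog

# syslog(3) via the C library: the C runtime manages the connection to
# the local log daemon on each call, so a restarted syslog daemon does
# not leave the application holding a stale user-space socket.
syslog.openlog(ident="demo", logoption=syslog.LOG_PID,
               facility=syslog.LOG_USER)
syslog.syslog(syslog.LOG_INFO, "hello via syslog(3)")
syslog.closelog()
```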

Changed in oslo.log:
status: Confirmed → Fix Committed
Changed in oslo.log:
status: Fix Committed → Fix Released
Revision history for this message
Dolph Mathews (dolph) wrote :

This should now be fixed in other projects consuming oslo_log 1.3.0.

Revision history for this message
Gary Kotton (garyk) wrote :
Changed in neutron:
status: New → Fix Committed
Revision history for this message
Steve Martinelli (stevemar) wrote :

As with neutron, keystone is also using oslo.log > 1.3.0: https://github.com/openstack/keystone/blob/master/requirements.txt#L28

Changed in keystone:
status: Confirmed → Fix Committed
Changed in keystone:
milestone: none → mitaka-1
Changed in keystone:
status: Fix Committed → Fix Released
Changed in neutron:
status: Fix Committed → Fix Released
Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

As with neutron and keystone, nova is also using oslo.log > 1.3.0 so I think this is no longer an issue https://github.com/openstack/nova/blob/a35bd3/requirements.txt#L41

Changed in nova:
status: Confirmed → Fix Released
summary: - nova syslog logging to /dev/log race condition in python 2.6 and 2.7
+ nova syslog logging to /dev/log race condition
Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

Changed the title, as Python 2.6 is no longer used anywhere and this is therefore irrelevant.

Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

Glance is also using oslo.log > 1.3.0 https://github.com/openstack/glance/blob/e3445cc9/requirements.txt#L41 (which makes sense - Global requirements)

Changed in glance:
status: New → Fix Released