neutron-ns-meta invokes oom-killer during gate runs

Bug #1362347 reported by Matthew Treinish
Affects: neutron
Status: Invalid
Importance: High
Assigned to: Salvatore Orlando
Milestone: none

Bug Description

Occasionally a neutron gate job fails because the node runs out of memory. oom-killer is invoked and starts killing processes to save the node (which just causes cascading issues). The kernel logs show that oom-killer is being invoked by neutron-ns-meta.

An example of one such failure is:

http://logs.openstack.org/75/116775/2/check/check-tempest-dsvm-neutron-full/ab17a70/

With the kernel log:

http://logs.openstack.org/75/116775/2/check/check-tempest-dsvm-neutron-full/ab17a70/logs/syslog.txt.gz#_Aug_26_04_59_03
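
For reference, pulling the triggering process out of a gate syslog can be done with a small script along the lines of the sketch below. This is only a sketch, not part of the gate tooling; it assumes the usual kernel "<process> invoked oom-killer" message format.

    #!/usr/bin/env python3
    # Sketch: scan a (possibly gzipped) syslog for oom-killer invocations and
    # report which process triggered each one. The regex assumes the standard
    # kernel "<process> invoked oom-killer: ..." message.
    import gzip
    import re
    import sys

    OOM_RE = re.compile(r'(\S+) invoked oom-killer')

    def find_oom_events(path):
        """Yield (line_number, process_name) for each oom-killer invocation."""
        opener = gzip.open if path.endswith('.gz') else open
        with opener(path, 'rt', errors='replace') as log:
            for lineno, line in enumerate(log, 1):
                match = OOM_RE.search(line)
                if match:
                    yield lineno, match.group(1)

    if __name__ == '__main__':
        for lineno, proc in find_oom_events(sys.argv[1]):
            print('line %d: oom-killer invoked by %s' % (lineno, proc))

Run against the syslog linked above, this should report neutron-ns-meta as the invoking process.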

Using logstash, this failure can be isolated to neutron gate jobs only. So there is probably something triggering neutron to occasionally make the job consume in excess of 8 GB of RAM.
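
(For anyone re-running that search: below is a hedged sketch of how such a query could be expressed against an Elasticsearch backend like the one behind logstash. The endpoint URL and the field names message/build_name/build_uuid are assumptions for illustration, not necessarily the actual gate logstash schema.)

    # Sketch only: query an Elasticsearch endpoint for oom-killer hits in a
    # specific job. The URL and field names are hypothetical placeholders.
    import requests

    ES_URL = 'http://logstash.example.org/elasticsearch/_search'  # hypothetical

    query = {
        'query': {
            'query_string': {
                'query': ('message:"invoked oom-killer" '
                          'AND build_name:"check-tempest-dsvm-neutron-full"'),
            }
        },
        'size': 50,
    }

    resp = requests.post(ES_URL, json=query)
    resp.raise_for_status()
    for hit in resp.json()['hits']['hits']:
        source = hit['_source']
        print(source.get('build_uuid'), source.get('@timestamp'))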

I also noted in the neutron svc log that the first out-of-memory error came from keystone-middleware:

http://logs.openstack.org/75/116775/2/check/check-tempest-dsvm-neutron-full/ab17a70/logs/screen-q-svc.txt.gz#_2014-08-26_04_56_39_602

but that may just be a red herring.

Tags: gate-failure
Revision history for this message
Matthew Treinish (treinish) wrote :
tags: added: gate-failure
summary: - neutron-ns-meta invokes oom-killer during gate runs
+ gneutron-ns-meta invokes oom-killer during gate runs
summary: - gneutron-ns-meta invokes oom-killer during gate runs
+ neutron-ns-meta invokes oom-killer during gate runs
Changed in neutron:
assignee: nobody → Salvatore Orlando (salvatore-orlando)
milestone: none → juno-3
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Failure analysis here: http://blog.kortar.org/?p=52

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

While I was looking at these failures, I noticed that they only occurred with MySQL as the DB backend.
There are not enough data points to confirm this, but the root cause might have something to do with the DBMS itself, with python-mysqldb, or with the SQLAlchemy backend for MySQL.

Or this might be a red herring and it's just neutron exhausting memory.

However, the OOM events usually occur 30 to 40 minutes into the test runs, while Tempest runs usually last longer. Considering that this is not a frequent failure, if it were a progressive memory leak due to system load, the oom-killer messages should have occurred closer to the end of the test run.
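
One way to tell a progressive leak apart from a sudden spike would be to sample the RSS of the neutron processes over the course of a run. A minimal sketch, assuming psutil is available and simply filtering on process names containing "neutron" (both assumptions, not something the gate currently does):

    # Sketch: periodically print the RSS of every process whose name contains
    # "neutron", to see whether memory grows steadily or jumps all at once.
    import time
    import psutil

    def sample_neutron_rss(interval=60, samples=40):
        for _ in range(samples):
            stamp = time.strftime('%H:%M:%S')
            for proc in psutil.process_iter(['pid', 'name', 'memory_info']):
                name = proc.info['name'] or ''
                mem = proc.info['memory_info']
                if 'neutron' in name and mem is not None:
                    print('%s pid=%d %s rss=%.1f MiB'
                          % (stamp, proc.info['pid'], name,
                             mem.rss / (1024.0 * 1024)))
            time.sleep(interval)

    if __name__ == '__main__':
        sample_neutron_rss()

If the per-process RSS climbed steadily for the whole run we would be looking at a leak; if it stayed flat and then jumped shortly before the oom-killer messages, something specific would be triggering the allocation.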

Changed in neutron:
importance: Undecided → Medium
importance: Medium → High
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-3 → juno-rc1
Changed in neutron:
milestone: juno-rc1 → kilo-1
Revision history for this message
Matt Riedemann (mriedem) wrote :

We haven't seen this in 10 days.

Changed in neutron:
status: New → Incomplete
Revision history for this message
Joe Gordon (jogo) wrote :

Since we haven't seen this in a while, I'm closing the bug. If this is seen again, please re-open it.

Changed in neutron:
status: Incomplete → Invalid
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-1 → none