nova services are down after deployment

Bug #1355749 reported by Sergey Murashov
This bug affects 2 people
Affects             Status         Importance  Assigned to          Milestone
Mirantis OpenStack  Fix Released   High        Dmitry Mescheryakov
5.0.x               Fix Committed  High        Dmitry Mescheryakov

Bug Description

ISO:{"build_id": "2014-08-11_12-45-06", "mirantis": "yes", "build_number": "169", "ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f", "nailgun_sha": "04ada3cd7ef14f6741a05fd5d6690260f9198095", "production": "docker", "api": "1.0", "fuelmain_sha": "43374c706b4fdce28aeb4ef11e69a53f41646740", "astute_sha": "6db5f5031b74e67b92fcac1f7998eaa296d68025", "release": "5.0.1", "fuellib_sha": "a31dbac8fff9cf6bc4cd0d23459670e34b27a9ab"}

Steps to reproduce:
1. Install OS (CentOS, Neutron GRE, Murano, Savanna, Ceilometer, 3 controllers, 1 compute)
2. Wait for ~12 hours (a night for example)
3. Go to controller and execute 'nova-manage service list'

Actual result:
We can see:
root@node-1:~# nova-manage service list
Binary            Host    Zone      Status   State  Updated_At
nova-conductor    node-1  internal  enabled  :-)    2014-08-12 08:00:53
nova-consoleauth  node-1  internal  enabled  :-)    2014-08-12 08:00:53
nova-cert         node-1  internal  enabled  :-)    2014-08-12 08:00:53
nova-scheduler    node-1  internal  enabled  :-)    2014-08-12 08:00:53
nova-consoleauth  node-2  internal  enabled  XXX    2014-08-11 19:19:18
nova-conductor    node-2  internal  enabled  XXX    2014-08-11 19:19:11
nova-scheduler    node-2  internal  enabled  :-)    2014-08-12 08:00:48
nova-cert         node-2  internal  enabled  XXX    2014-08-11 19:19:18
nova-conductor    node-3  internal  enabled  XXX    2014-08-11 19:19:12
nova-scheduler    node-3  internal  enabled  XXX    2014-08-11 19:19:16
nova-cert         node-3  internal  enabled  XXX    2014-08-11 19:19:20
nova-consoleauth  node-3  internal  enabled  XXX    2014-08-11 19:19:14
nova-compute      node-4  nova      enabled  :-)    2014-08-12 08:00:48
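
For anyone who wants to spot the affected services without reading the listing by eye, here is a minimal sketch that parses output in the format shown above and returns the entries whose State is XXX. The names parse_service_list and down_services and the SAMPLE excerpt are hypothetical helpers for illustration, not part of Nova, and the parser assumes whitespace-separated columns exactly as in this report.

```python
from datetime import datetime

# Excerpt of the output from this report, used only as sample input.
SAMPLE = """\
Binary Host Zone Status State Updated_At
nova-conductor node-1 internal enabled :-) 2014-08-12 08:00:53
nova-consoleauth node-2 internal enabled XXX 2014-08-11 19:19:18
nova-compute node-4 nova enabled :-) 2014-08-12 08:00:48
"""


def parse_service_list(text):
    """Parse `nova-manage service list` style output into a list of dicts."""
    services = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 7 fields (the timestamp splits into date and time);
        # this skips blank lines and the 6-field header row.
        if len(fields) < 7 or fields[0] == "Binary":
            continue
        binary, host, zone, status, state = fields[:5]
        updated_at = datetime.strptime(
            " ".join(fields[5:7]), "%Y-%m-%d %H:%M:%S"
        )
        services.append({
            "binary": binary,
            "host": host,
            "zone": zone,
            "status": status,
            "state": state,
            "updated_at": updated_at,
        })
    return services


def down_services(services):
    """Return the services that the listing reports as down (State XXX)."""
    return [s for s in services if s["state"] == "XXX"]


if __name__ == "__main__":
    for svc in down_services(parse_service_list(SAMPLE)):
        print(svc["binary"], svc["host"], svc["updated_at"])
```

Note that State only reflects whether the last heartbeat recorded in the database is recent, so a service listed as XXX may still be running.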

Tags: nova db
Revision history for this message
Sergey Murashov (smurashov) wrote :
Changed in fuel:
status: New → Triaged
importance: Undecided → High
tags: added: db nova
no longer affects: fuel/5.0.x
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

After looking at the env a bit, I see the following:

1. Nova services are actually up, but they fail to update their status in the database, so 'nova-manage service list' shows them as down (XXX)

2. Nova services fail to update their state in the database because SQLAlchemy refuses to open a new db connection (http://paste.openstack.org/show/93736/), as it thinks 10+30 connections are already opened (10 in the pool, 30 - overflow)

3. lsof shows that nova-conductor and other services don't have *any* open connections to MySQL

4. This looks very similar to the known SQLAlchemy issue - https://bitbucket.org/zzzeek/sqlalchemy/issue/2772 , which was fixed in 0.8.3 (http://docs.sqlalchemy.org/en/latest/changelog/changelog_08.html#change-a95b0f6765fdf5ebdf844806fb2aa122) and we are using 0.8.2 right now

5. Restart of the services helps, but that's not an option, of course

Upgrade to the latest SQLAlchemy 0.8.x release should fix this.
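
To illustrate how points 2 and 3 can both be true at once, here is a toy model of a bounded pool whose "checked out" counter is not decremented on one code path. It is only a sketch under that assumption: ToyPool and its buggy flag are hypothetical, this is not SQLAlchemy's QueuePool code, and the exact failure path in SQLAlchemy is the one described in the linked issue. The limits 10 and 30 are the pool size and overflow values mentioned above.

```python
class ToyPool:
    """Toy model of a bounded connection pool (NOT SQLAlchemy's implementation)."""

    def __init__(self, pool_size=10, max_overflow=30):
        self.limit = pool_size + max_overflow
        self.checked_out = 0       # what the pool believes is in use
        self.open_connections = 0  # what lsof would actually show

    def connect(self):
        """Hand out a connection, or refuse if the counter hit the limit."""
        if self.checked_out >= self.limit:
            raise RuntimeError("pool limit of %d reached" % self.limit)
        self.checked_out += 1
        self.open_connections += 1

    def release(self, buggy=False):
        """Close a connection; the buggy path forgets to fix the counter."""
        self.open_connections -= 1
        if not buggy:
            self.checked_out -= 1


if __name__ == "__main__":
    pool = ToyPool()
    # Each iteration leaks one unit of the counter while closing the socket.
    for _ in range(pool.limit):
        pool.connect()
        pool.release(buggy=True)
    print(pool.checked_out, pool.open_connections)
    try:
        pool.connect()
    except RuntimeError as exc:
        print("refused:", exc)
```

In this model the pool ends up refusing new connections while holding none, which matches the observed combination of pool-limit errors and an empty lsof listing.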

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

This is not Nova-specific and can affect other services too (off the top of my head, I think Neutron is affected, as it tracks the status of its agents in the same manner).

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

The env also had an unrelated issue: rabbitmq was using CPU heavily on one of the controller nodes (the other 2 nodes worked just fine). This rabbitmq instance was unusable. Restarting rabbitmq-server helped.

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: none → 5.1
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Sergey: please reproduce the issue and provide the environment to devs

Changed in fuel:
assignee: nobody → Sergey Murashov (smurashov)
description: updated
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

This should be related to the 2013 error handling issue in oslo.db, bug https://bugs.launchpad.net/mos/+bug/1352931

Revision history for this message
Sergey Murashov (smurashov) wrote :

reproduced

Revision history for this message
Sergey Murashov (smurashov) wrote :
Changed in fuel:
importance: High → Critical
Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

According to the logs, the environment is not 5.1. 5.0.1 has an old Galera OCF script that has problems assembling the Galera cluster. Fuel 5.1 has a completely different OCF script that doesn't have such problems.

Changed in fuel:
status: Triaged → Invalid
Changed in fuel:
status: Invalid → Confirmed
importance: Critical → High
Revision history for this message
Mike Scherbakov (mihgen) wrote :

How often does this issue occur?
Why was it moved from Invalid to Confirmed? Were you able to reproduce it?

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Mike, it reproduced reliably on 5.0.1: 3 out of 3 attempts. I will see whether the QA folks are able to reproduce the issue on 5.1. BTW, this is the issue that requires the patching/upgrade of sqlalchemy we were talking about. Hence assigning to myself.

Changed in fuel:
assignee: Sergey Murashov (smurashov) → Dmitry Mescheryakov (dmitrymex)
affects: fuel → mos
Changed in mos:
milestone: 5.1 → none
no longer affects: fuel/5.0.x
Changed in mos:
milestone: none → 5.1
Revision history for this message
OSCI Robot (oscirobot) wrote :

Package sqlalchemy has been built from changeset: http://gerrit.mirantis.com/20986
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-20986/ubuntu
You can build an ISO with this package:
make iso EXTRA_DEB_REPOS="http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-20986/ubuntu /"

Revision history for this message
OSCI Robot (oscirobot) wrote :

Package python-sqlalchemy has been built from changeset: http://gerrit.mirantis.com/20992
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable-20992/centos
You can build an ISO with this package:
make iso EXTRA_RPM_REPOS="osci-testing,http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable-20992/centos"

Revision history for this message
OSCI Robot (oscirobot) wrote :

Package python-sqlalchemy has been built from changeset: http://gerrit.mirantis.com/20992
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable/centos
You can build an ISO with this package:
make iso EXTRA_RPM_REPOS="osci-testing,http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable/centos"

Revision history for this message
OSCI Robot (oscirobot) wrote :

Package sqlalchemy has been built from changeset: http://gerrit.mirantis.com/20986
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable/ubuntu
You can build an ISO with this package:
make iso EXTRA_DEB_REPOS="http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable/ubuntu /"

Changed in mos:
status: Confirmed → Fix Committed
Revision history for this message
OSCI Robot (oscirobot) wrote :

Package python-sqlalchemy has been built from changeset: http://gerrit.mirantis.com/21375
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.0.2-stable-21375/centos
You can build an ISO with this package:
make iso EXTRA_RPM_REPOS="osci-testing,http://osci-obs.vm.mirantis.net:82/centos-fuel-5.0.2-stable-21375/centos"

Revision history for this message
OSCI Robot (oscirobot) wrote :

Package sqlalchemy has been built from changeset: http://gerrit.mirantis.com/21376
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.0.2-stable-21376/ubuntu
You can build an ISO with this package:
make iso EXTRA_DEB_REPOS="http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.0.2-stable-21376/ubuntu /"

Revision history for this message
OSCI Robot (oscirobot) wrote :

Package python-sqlalchemy has been built from changeset: http://gerrit.mirantis.com/21375
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.0.2-stable/centos
You can build an ISO with this package:
make iso EXTRA_RPM_REPOS="osci-testing,http://osci-obs.vm.mirantis.net:82/centos-fuel-5.0.2-stable/centos"

Revision history for this message
OSCI Robot (oscirobot) wrote :

Package sqlalchemy has been built from changeset: http://gerrit.mirantis.com/21376
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.0.2-stable/ubuntu
You can build an ISO with this package:
make iso EXTRA_DEB_REPOS="http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.0.2-stable/ubuntu /"

Revision history for this message
Alexander Gubanov (ogubanov) wrote :

Verified it on mos 5.1.1 (build 45) - fixed!
Proof: http://pastebin.com/SjbsHALR

Changed in mos:
status: Fix Committed → Fix Released