RabbitMQ connections lack heartbeat or TCP keepalives

Bug #1341656 reported by Dmitry Mescheryakov
Affects             Status         Importance  Assigned to   Milestone
Mirantis OpenStack  Fix Committed  Critical    Ilya Pekelny
5.0.x               Fix Committed  Critical    Ilya Pekelny
5.1.x               Fix Committed  Critical    Ilya Pekelny

Bug Description

Tags: oslo
Changed in mos:
assignee: nobody → MOS Oslo (mos-oslo)
Igor Marnat (imarnat)
Changed in mos:
assignee: MOS Oslo (mos-oslo) → Alexei Kornienko (alexei-kornienko)
Igor Marnat (imarnat)
Changed in mos:
milestone: 5.1 → 5.0.1
Bogdan Dobrelya (bogdando) wrote :
Igor Marnat (imarnat) wrote :

Patches and requests for building packages: https://bugs.launchpad.net/mos/+bug/1340711

Serg Melikyan (smelikyan) wrote :
Changed in mos:
status: Confirmed → Fix Committed
Dmitry Mescheryakov (dmitrymex) wrote :
OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/20691
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable-20691/centos

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/20691
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-20691/ubuntu

Dmitry Mescheryakov (dmitrymex) wrote :
Serg Melikyan (smelikyan) wrote :

https://gerrit.mirantis.com/20691 - A proper heartbeat implementation has been submitted and is ready to be tested
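
For readers unfamiliar with the idea, the following is a minimal sketch of heartbeat-based liveness tracking. It is not the code from the changeset above, whose contents are not shown in this report. The class name `HeartbeatMonitor`, its parameters, and the two-missed-intervals threshold are assumptions chosen for illustration.

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper: decide whether a peer is alive from the
    time the last frame (heartbeat or data) was received from it."""

    def __init__(self, interval, max_missed=2, clock=time.monotonic):
        self.interval = interval      # seconds between expected heartbeats
        self.max_missed = max_missed  # tolerated silent intervals (assumed)
        self._clock = clock           # injectable so the logic can be tested
        self._last_seen = clock()

    def frame_received(self):
        """Record that any traffic arrived from the peer."""
        self._last_seen = self._clock()

    def is_alive(self):
        """True while the silence is within interval * max_missed seconds."""
        silence = self._clock() - self._last_seen
        return silence <= self.interval * self.max_missed
```

A connection owner would call `frame_received()` whenever traffic arrives and close and re-establish the connection once `is_alive()` returns False, instead of waiting on a socket that will never deliver data.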

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/21233
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable-21233/centos

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/21233
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-21233/ubuntu

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/20691
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable/ubuntu

OSCI Robot (oscirobot) wrote :

Package oslo.messaging has been built from changeset: http://gerrit.mirantis.com/20691
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable/centos

Dmitry Borodaenko (angdraug) wrote :

What is the current status of this bug in 5.1 and 5.0.2?

Serg Melikyan (smelikyan) wrote :

We verified and merged the fix to the 5.1 branch; backports to 5.0.2 are pending:
https://review.openstack.org/117500
http://gerrit.mirantis.com/21389

Serg Melikyan (smelikyan) wrote :

Backport to 5.0.2 merged

no longer affects: mos/6.0.x
Andrey Sledzinskiy (asledzinskiy) wrote :

{

    "build_id": "2014-09-15_00-01-46",
    "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
    "build_number": "8",
    "auth_required": true,
    "api": "1.0",
    "nailgun_sha": "b8d8189cc37d6d1b26f4479be6be7313beefb1c8",
    "production": "docker",
    "fuelmain_sha": "d7ed7973034bde73d3f42c000984423b59b2312b",
    "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13",
    "feature_groups": [
        "experimental"
    ],
    "release": "5.1",
    "release_versions": {
        "2014.1.1-5.1": {
            "VERSION": {
                "build_id": "2014-09-15_00-01-46",
                "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
                "build_number": "8",
                "api": "1.0",
                "nailgun_sha": "b8d8189cc37d6d1b26f4479be6be7313beefb1c8",
                "production": "docker",
                "fuelmain_sha": "d7ed7973034bde73d3f42c000984423b59b2312b",
                "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13",
                "feature_groups": [
                    "experimental"
                ],
                "release": "5.1",
                "fuellib_sha": "395fd9d20a003603cc9ad26e16cb13c1c45e24e6"
            }
        }
    },
    "fuellib_sha": "395fd9d20a003603cc9ad26e16cb13c1c45e24e6"

}

We see problems with oslo.messaging in our tests after restarting nodes.
Steps:
1. Create the following cluster: Ubuntu, Simple, Flat nova-network, Cinder for volumes, Ceph for images, 1 controller, 1 compute, 2 cinder+ceph nodes
2. Deploy the cluster
3. Reboot all nodes one by one
4. Open the Health Check tab
5. Run the tests

Expected: tests pass
Actual: the volume and instance creation tests fail. There are many errors in the nova logs on the compute node:
2014-09-15 13:08:11 ERROR oslo.messaging._drivers.impl_rabbit [req-c4c51ee4-f9b7-4e9e-a949-e694f3b21757 ] AMQP server on 10.108.52.4:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 30 seconds.

And in the cinder logs on the cinder node:
2014-09-15 07:40:19 ERROR oslo.messaging._drivers.impl_rabbit [req-2fd99d10-2de0-478e-ab3e-cabc52935455 - - - - -] AMQP server on 10.108.52.4:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.

Logs are attached
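
When triaging logs like these, it can help to pull out the broker address, the errno, and the retry delay from each "unreachable" message. The sketch below assumes the message format is exactly as quoted in this comment; the function name `parse_unreachable` and the returned keys are hypothetical.

```python
import re

# Matches the message text quoted above; anything before "AMQP server"
# (timestamp, logger name, request id) is ignored.
UNREACHABLE = re.compile(
    r"AMQP server on (?P<host>[\w.\-]+):(?P<port>\d+) is unreachable: "
    r"\[Errno (?P<errno>\d+)\] (?P<name>\w+)\. "
    r"Trying again in (?P<retry>\d+) seconds\."
)


def parse_unreachable(line):
    """Return the fields of an 'unreachable' log line, or None if the
    line does not contain such a message."""
    match = UNREACHABLE.search(line)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "port": int(match.group("port")),
        "errno": int(match.group("errno")),
        "error": match.group("name"),
        "retry_seconds": int(match.group("retry")),
    }
```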

Andrey Sledzinskiy (asledzinskiy) wrote :
Andrey Sledzinskiy (asledzinskiy) wrote :

The same errors also appear in the following configuration: Ubuntu, HA, Flat nova-network, Cinder for volumes, 3 controllers, 2 compute nodes.
After destroying one of the controllers, some OSTF tests failed with a timeout.
Errors on node-4 (compute):
nova.openstack.common.periodic_task [-] Error during ComputeManager.update_available_resource: Timed out waiting for a reply to message ID 2dd0eabc69984cdca76067420d9d6f12

Logs are attached

Andrey Sledzinskiy (asledzinskiy) wrote :
Serg Melikyan (smelikyan) wrote :

This exception has nothing to do with heartbeats; the log clearly shows a problem connecting to RabbitMQ. It looks like the server is down.
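
The distinction drawn here can be sketched in code: ECONNREFUSED means nothing accepted the connection on the target port, while the failures that heartbeats are meant to surface show up as timeouts or resets on a connection that was already established. The function name and category labels below are hypothetical and only illustrate the reasoning.

```python
import errno


def classify_connection_error(exc):
    """Roughly sort a socket-level exception into a triage category."""
    code = getattr(exc, "errno", None)
    if code == errno.ECONNREFUSED:
        # Nothing is listening on the port: the broker is down or
        # unreachable, which a heartbeat cannot fix.
        return "broker-not-listening"
    if code in (errno.ETIMEDOUT, errno.ECONNRESET, errno.EPIPE):
        # An existing connection died; this is the class of failure
        # that heartbeats or keepalives help detect sooner.
        return "connection-lost"
    return "unknown"
```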

Serg Melikyan (smelikyan) wrote :

I suggest opening a new bug and returning the status of this bug to Fix Committed.

Andrey Sledzinskiy (asledzinskiy) wrote :

I agree with that.

Bogdan Dobrelya (bogdando) wrote :