Eventmachine unexpected problem

Bug #1498847 reported by Vladimir Sharshov
This bug affects 3 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Vladimir Sharshov
Milestone: (not listed)

Bug Description

Astute received a task and then froze. The problem appears to be in EventMachine.

We have had many problems with EventMachine, which we use only through ruby-amqp, the library we use to work with RabbitMQ.

A word of warning from the authors of ruby-amqp (https://github.com/ruby-amqp/amqp#a-word-of-warning-use-this-only-if-you-already-use-eventmachine):

Unless you already use EventMachine, there is no real reason to use this client. Consider Bunny or March Hare instead.

amqp gem brings in a fair share of EventMachine complexity which cannot be fully eliminated. Event loop blocking, writes that happen at the end of loop tick, uncaught exceptions in event loop silently killing it: it's not worth the pain unless you've already deeply invested in EventMachine and understand how it works.
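
Two of the failure modes named in the quote above (event loop blocking, and an uncaught exception stopping the loop) can be illustrated with a toy single-threaded loop. This is only a sketch using the Ruby standard library: ToyLoop, heartbeat_delay and stranded_callbacks are hypothetical names made up for illustration, and the code does not use or reproduce EventMachine itself.

```ruby
# Toy single-threaded event loop (NOT EventMachine). Callbacks run one at a
# time, so a slow callback delays every callback queued behind it.
class ToyLoop
  def initialize
    @callbacks = []
  end

  # Queue a block to run on a later loop iteration.
  def next_tick(&block)
    @callbacks << block
  end

  # Number of callbacks that have not run yet.
  def pending
    @callbacks.size
  end

  # Run queued callbacks in order until none remain.
  # An exception raised by a callback propagates and stops the loop.
  def run
    @callbacks.shift.call until @callbacks.empty?
  end
end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Returns how long a "heartbeat" callback waited behind a blocking callback.
def heartbeat_delay(blocking_seconds)
  reactor = ToyLoop.new
  started = monotonic_now
  delay = nil
  reactor.next_tick { sleep(blocking_seconds) }         # blocking work inside the loop
  reactor.next_tick { delay = monotonic_now - started } # e.g. a heartbeat or a report
  reactor.run
  delay
end

# Returns how many callbacks were left unprocessed after one callback raised.
def stranded_callbacks
  reactor = ToyLoop.new
  reactor.next_tick { raise 'boom' }
  reactor.next_tick { :never_runs }
  begin
    reactor.run
  rescue RuntimeError
    # The loop has stopped; the remaining callback was never run.
  end
  reactor.pending
end

puts format('heartbeat delayed by %.2fs', heartbeat_delay(0.2))
puts "callbacks stranded after an exception: #{stranded_callbacks}"
```

In a worker built on such a loop, the delayed "heartbeat" stands in for anything time-sensitive (for example an outgoing report), and the stranded callback stands in for work that silently never happens once the loop has died.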

We have also had problems with ruby-amqp and EventMachine in these bugs:

  - https://bugs.launchpad.net/fuel/+bug/1487397;
  - https://bugs.launchpad.net/fuel/+bug/1485895;
  - https://bugs.launchpad.net/fuel/+bug/1483182;
  - https://bugs.launchpad.net/fuel/+bug/1500901.

And many others.

I suggest replacing the ruby-amqp library with Bunny: http://rubybunny.info/

Backtrace from GDB:

from /usr/lib64/ruby/gems/2.1.0/gems/astute-7.0.0/bin/astuted:84:in `<top (required)>'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:22:in `start'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:53:in `start'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:120:in `master_loop!'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:300:in `maintain_worker_count'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:261:in `spawn_workers'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:261:in `each'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:266:in `block in spawn_workers'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:266:in `fork'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:266:in `block (2 levels) in spawn_workers'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:343:in `worker_loop!'

from /usr/lib64/ruby/gems/2.1.0/gems/astute-7.0.0/lib/astute/server/worker.rb:42:in `run'

from /usr/lib64/ruby/gems/2.1.0/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'

from /usr/lib64/ruby/gems/2.1.0/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'

/usr/lib64/ruby/gems/2.1.0/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine': Raised a special exception with GDB (Exception)

from /usr/lib64/ruby/gems/2.1.0/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'

from /usr/lib64/ruby/gems/2.1.0/gems/astute-7.0.0/lib/astute/server/worker.rb:42:in `run'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:343:in `worker_loop!'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:266:in `block (2 levels) in spawn_workers'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:266:in `fork'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:266:in `block in spawn_workers'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:261:in `each'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:261:in `spawn_workers'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:300:in `maintain_worker_count'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:120:in `master_loop!'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:53:in `start'

from /usr/lib64/ruby/gems/2.1.0/gems/raemon-0.3.0/lib/raemon/master.rb:22:in `start'

from /usr/lib64/ruby/gems/2.1.0/gems/astute-7.0.0/bin/astuted:84:in `<top (required)>'

from -e:1:in `load'

from -e:1:in `<main>'

Changed in fuel:
status: Confirmed → Triaged
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/234665

Changed in fuel:
status: Triaged → In Progress
Dmitry Pyzhov (dpyzhov)
tags: added: area-python
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-main (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/250454

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-main (master)

Reviewed: https://review.openstack.org/250454
Committed: https://git.openstack.org/cgit/openstack/fuel-main/commit/?id=2eca6adc33f02e02cd812e1d4be7c70e05fd07db
Submitter: Jenkins
Branch: master

commit 2eca6adc33f02e02cd812e1d4be7c70e05fd07db
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Thu Nov 26 18:38:10 2015 +0300

    Replace ruby-amqp gem to bunny gem

    Change-Id: Id67b855ef87ea72ac529f9decf744bfe37283b2e
    Related-Bug: #1498847

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/234665
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=b60624ee2c5f1d6d805619b6c27965a973508da1
Submitter: Jenkins
Branch: master

commit b60624ee2c5f1d6d805619b6c27965a973508da1
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Mon Oct 12 19:25:00 2015 +0300

    Move from amqp-gem to bunny

    Differents:

    - separate independent chanel for outgoing report;
    - solid way to redeclare already existed queues;
    - auto recovery mode in case of network problem by default;
    - more solid, modern and simple library for AMQP.

    Also:

    - implement asynchronous logger for event callbacks.

    Short words from both gems authors:

    amqp gem brings in a fair share of EventMachine complexity which
    cannot be fully eliminated. Event loop blocking, writes that
    happen at the end of loop tick, uncaught exceptions in event
    loop silently killing it: it's not worth the pain unless
    you've already deeply invested in EventMachine and
    understand how it works.

    Closes-Bug: #1498847
    Closes-Bug: #1487397
    Closes-Bug: #1461562
    Related-Bug: #1485895
    Related-Bug: #1483182

    Change-Id: I52d005498ccb978ada158bfa64b1c7de1a24e9b0
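
The "asynchronous logger for event callbacks" item in the commit message above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code from the fuel-astute commit: AsyncLogger is a hypothetical class name, and the design (callbacks push messages onto a queue, a background thread does the actual logging I/O) is one common way to keep slow log writes out of message callbacks.

```ruby
require 'logger'
require 'stringio'
require 'thread' # Queue lives here on older Rubies; harmless on newer ones

# Hypothetical sketch: callbacks only enqueue messages, while a background
# thread performs the (potentially slow) write to the wrapped logger.
class AsyncLogger
  def initialize(logger)
    @logger = logger
    @queue = Queue.new
    @writer = Thread.new do
      # Queue#pop blocks until an entry arrives; :stop ends the thread.
      while (entry = @queue.pop) != :stop
        severity, message = entry
        @logger.add(severity, message)
      end
    end
  end

  # Cheap for the caller: only pushes onto the queue.
  def info(message)
    @queue << [Logger::INFO, message]
  end

  def error(message)
    @queue << [Logger::ERROR, message]
  end

  # Write out everything queued so far, then stop the writer thread.
  def close
    @queue << :stop
    @writer.join
  end
end

io = StringIO.new
log = AsyncLogger.new(Logger.new(io))
log.info('task received')
log.error('report failed')
log.close
puts io.string
```

Because a single writer thread drains the queue, messages are written in the order they were enqueued, and close acts as a flush before shutdown.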

Changed in fuel:
status: In Progress → Fix Committed
tags: added: on-verification
Mikhail Samoylov (msamoylov) wrote :

Steps for verification:
1. Deploy cluster with 3 controllers and 1 compute
2. Delete 1 controller
3. Add 1 controller
4. Redeploy
5. Run ostf

Verification passed for fuel version:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  openstack_version: "2015.1.0-8.0"
  api: "1.0"
  build_number: "247"
  build_id: "247"
  fuel-nailgun_sha: "86cebc1d92c7cc9ca25b00f5590954a7c4f880a0"
  python-fuelclient_sha: "91474bd8c526f4f536ab13368feb4a5c1b84d185"
  fuel-agent_sha: "660c6514caa8f5fcd482f1cc4008a6028243e009"
  fuel-nailgun-agent_sha: "a33a58d378c117c0f509b0e7badc6f0910364154"
  astute_sha: "b60624ee2c5f1d6d805619b6c27965a973508da1"
  fuel-library_sha: "032c707ec800f11044b32733dd4d395e06c209d0"
  fuel-ostf_sha: "65de07b5dce50349e7bc414f364505483c34e2b1"
  fuel-mirror_sha: "bfe7af26b7e6fdd46a16480481cc757f67958177"
  fuelmenu_sha: "fcb15df4fd1a790b17dd78cf675c11c279040941"
  shotgun_sha: "a0bd06508067935f2ae9be2523ed0d1717b995ce"
  network-checker_sha: "a3534f8885246afb15609c54f91d3b23d599a5b1"
  fuel-upgrade_sha: "1e894e26d4e1423a9b0d66abd6a79505f4175ff6"
  fuelmain_sha: "fda7c87dea9fb54c08bd3844d277b2e4778924e4"

We need a stress test for this bug. Steps for the stress test:
1. Deploy cluster
2. Run CPU and memory stress tests on the cluster servers during deployment
3. After deployment, run OSTF

Mikhail Samoylov (msamoylov) wrote :

Performance test scenarios:
There must be no freezes after the performance test starts.
I.
1. Create cluster
2. Add 3 nodes with controller roles
3. Add 2 nodes with compute roles
4. Deploy the cluster
5. Load CPU and memory usage up to 90%
6. Stop deployment
7. Start deployment
8. Make snapshot

II.
1. Create new environment
2. Choose Neutron, VLAN
3. Choose Ceph for images
4. Choose Sahara
5. Choose Ceilometer
6. Add 1 controller+ceph
7. Add 1 compute+ceph
8. Add 1 cinder+ceph
9. Add 2 mongo
10. Change the disk configuration for both Mongo nodes: assign the 'MongoDB' volume to vdc
11. Deploy the environment
12. Load CPU and RAM
13. Verify networks
14. Load CPU and RAM

Sergey Arkhipov (sarkhipov) wrote :

The fix has been verified on MOS 8.0 ISO #328. The problem was not reproduced in the scenarios mentioned above.

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification