controller goes offline if cluster is actively used

Bug #1413342 reported by Leontii Istomin
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Invalid
Importance: Undecided
Assigned to: Dina Belova
Milestone:

Bug Description

[root@fuel ~]# fuel --fuel-version
api: '1.0'
astute_sha: f7cda2171b0b677dfaeb59693d980a2d3ee4c3e0
auth_required: true
build_id: 2015-01-20_08-49-23
build_number: '38'
feature_groups:
- mirantis
fuellib_sha: 9aa913096fb93ea4847ee14bfaf33597326886f3
fuelmain_sha: 1ee1766a51bdb5bed75d5c2efdcaaa318118e439
nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
ostf_sha: 3d2f44dcfa32d6ce0372cc64695e9edcc1913ea7
production: docker
release: 6.0.1
release_versions:
  2014.2-6.0:
    VERSION:
      api: '1.0'
      astute_sha: f7cda2171b0b677dfaeb59693d980a2d3ee4c3e0
      build_id: 2015-01-20_08-49-23
      build_number: '38'
      feature_groups:
      - mirantis
      fuellib_sha: 9aa913096fb93ea4847ee14bfaf33597326886f3
      fuelmain_sha: 1ee1766a51bdb5bed75d5c2efdcaaa318118e439
      nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
      ostf_sha: 3d2f44dcfa32d6ce0372cc64695e9edcc1913ea7
      production: docker
      release: 6.0.1

Ubuntu, HA, neutron-gre, Ceilometer, debug, Ceph for volumes, images, ephemerals, objects
controllers: 3
computes: 97

A controller hangs when we put the cluster under load (Rally tests).
The console of one of the controller nodes becomes unreachable via SSH and even via IPMI. The last time, it was node-47 that hung.
If we reboot the hung controller, we see the behaviour described in the following bug:

https://bugs.launchpad.net/fuel/+bug/1413341

The diagnostic snapshot is here: https://drive.google.com/a/mirantis.com/file/d/0Bx4ptZV1Jt7hRTZlUWIzUk5PYmM/view?usp=sharing
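
To narrow down when a controller stops responding during a load run, a small reachability probe could be left running on another host. The following is only a sketch under assumptions: it treats a successful TCP connection to port 22 as a rough stand-in for "reachable via SSH", the function names are made up for illustration, and the host names in the usage comment are hypothetical. It is not part of Fuel or Rally.

```python
import socket
import time
from datetime import datetime, timezone


def is_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def watch(hosts, port=22, interval=10.0, rounds=None):
    """Poll each host; print a UTC-stamped line whenever its state changes.

    rounds=None polls forever; an integer stops after that many passes.
    Returns the last observed state per host.
    """
    last = {}
    done = 0
    while rounds is None or done < rounds:
        for host in hosts:
            up = is_reachable(host, port)
            if last.get(host) != up:
                stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
                print(f"{stamp} {host} {'reachable' if up else 'UNREACHABLE'}")
                last[host] = up
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
    return last


# Example call (hypothetical controller names, replace with real addresses):
# watch(["node-47", "node-48", "node-49"])
```

The timestamp of the first UNREACHABLE line can then be compared against the Rally task log to see which scenario was running when the node hung.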

Tags: scale
Leontii Istomin (listomin) wrote :

It's a bare-metal installation.

Changed in mos:
assignee: nobody → MOS Keystone (mos-keystone)
no longer affects: fuel
arogusskiy (arogusskiy) wrote :

I don't have an answer from anybody about customizing kernel parameters. Aleksandr N said that someone changed something, but not for high load, low latency, real-time, or all of them together. That's why I recommend repeating the next test with two configurations: with and without the special settings.

Changed in mos:
assignee: MOS Keystone (mos-keystone) → arogusskiy (arogusskiy)
arogusskiy (arogusskiy) wrote :

Currently the bug is not reproduced.

Changed in mos:
status: New → Incomplete
Changed in mos:
milestone: none → 6.1
Changed in mos:
assignee: arogusskiy (arogusskiy) → Dina Belova (dbelova)
Dmitry Mescheryakov (dmitrymex) wrote :

So far the Scale team hasn't reproduced the issue, hence moving to Invalid.

Changed in mos:
status: Incomplete → Invalid