Moderate load on OpenStack REST API kills the cloud
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mirantis OpenStack | Confirmed | High | MOS Scale | 10.0
Bug Description
Version: 9.0
Steps to reproduce:
Install a 200-node cloud with Ceilometer. On each compute node, run Ceilometer's pollster with incorrect settings:
ceilometer-polling --polling-namespaces compute central --config-file /etc/ceilometer
These settings cause each pollster to hit every OpenStack REST API endpoint with a basic request once a minute. With 200 nodes polling, each API endpoint receives roughly 3 requests per second, or about 3 * <number of REST API endpoints> requests per second in total.
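As a rough sanity check of the numbers above, here is a minimal sketch of the arithmetic, assuming 200 nodes each polling every endpoint once per minute; the endpoint count is a placeholder, not a figure from this report:

```python
# Back-of-the-envelope estimate of the REST API load from the misconfigured pollsters.
# Assumptions (for illustration only, not measured values from this bug):
#   - 200 nodes, each running one pollster
#   - every pollster hits every REST API endpoint once per minute
#   - 10 REST API endpoints (placeholder; substitute the real count)

nodes = 200
poll_interval_s = 60      # one request per endpoint per minute
endpoints = 10            # hypothetical endpoint count

per_endpoint_rps = nodes / poll_interval_s
total_rps = per_endpoint_rps * endpoints

print(f"per-endpoint load: {per_endpoint_rps:.1f} req/s")  # ~3.3 req/s
print(f"aggregate load:    {total_rps:.1f} req/s")         # ~33 req/s for 10 endpoints
```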
Expected result:
OpenStack services operate normally and the cloud stays healthy.
Actual result:
A number of things went wrong:
1. SSH login from the master node to any controller took 30 seconds.
2. The Pacemaker cluster constantly broke. crmd.log on the controllers showed that the crmd master (crmd is part of Pacemaker) kept migrating, resource checks failed, and so on.
As a result of #2, the RabbitMQ cluster eventually broke.
We checked the following resources:
1. cpu
2. memory
3. disk
4. network
All four appeared to be far from exhausted. In hindsight, we should have checked the entropy pool as well.
We need to reproduce this issue without Ceilometer and identify which resource gets exhausted, because this load seems fairly moderate. (Again, don't forget to check entropy; see the monitoring sketch below.)
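For the next run, a minimal sketch of how the entropy pool and load average could be watched on each controller, assuming Linux hosts that expose the standard /proc files; the 10-second sampling interval is arbitrary:

```python
#!/usr/bin/env python3
"""Periodically sample the kernel entropy pool and 1-minute load average.

Run on each controller while the load test is in progress and correlate the
output with the timestamps of Pacemaker/RabbitMQ failures.
"""
import time

ENTROPY = "/proc/sys/kernel/random/entropy_avail"  # bits of entropy currently available
LOADAVG = "/proc/loadavg"                          # 1/5/15-minute load averages

def sample():
    with open(ENTROPY) as f:
        entropy = int(f.read().strip())
    with open(LOADAVG) as f:
        load1 = f.read().split()[0]
    return entropy, load1

if __name__ == "__main__":
    while True:
        entropy, load1 = sample()
        print(f"{time.strftime('%H:%M:%S')} entropy_avail={entropy} load1={load1}")
        time.sleep(10)
```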
Changed in mos:
  milestone: none → 10.0
Changed in mos:
  assignee: nobody → MOS Scale (mos-scale)
Changed in mos:
  importance: Undecided → High
  status: New → Confirmed