Corosync at controller stucks at reboot or takes a lot of time for it
Bug #1407678 reported by
Bogdan Dobrelya
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Fuel for OpenStack |
Fix Committed
|
High
|
Bogdan Dobrelya |
Bug Description
Steps to reproduce:
1. Find management_vip node:
# ssh node-1 'crm status'
2. Reboot node with any other at the same time:
# for i in node-{1,2}; do ssh $i 'nohup reboot &'; done
3. In ~10% cases one node will hang on shutdown process.
3. In ~30% cases one nodes' shutdown process takes a lot of time
Looks like RabbitMQ RA causes cluster to hangs during reboot.
Updating the shutdown-escalation property would not solve the issue.
description: | updated |
Changed in fuel: | |
assignee: | Bogdan Dobrelya (bogdando) → Fuel Library Team (fuel-library) |
status: | In Progress → Confirmed |
Changed in fuel: | |
assignee: | Fuel Library Team (fuel-library) → Bogdan Dobrelya (bogdando) |
To post a comment you must log in.
I suggest to set shutdown-escalation to at least 5 min as 120 sec looks too low value compared to default 20 min.