HA deployment failed after adding two new controllers: "Failed to call refresh: swift-ring-builder /etc/swift/account.builder rebalance returned 1 instead of one of [0]"
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Released | Critical | Aleksandr Didenko |
6.0.x | Fix Released | Critical | Aleksandr Didenko |
6.1.x | Fix Released | Critical | Aleksandr Didenko |
Bug Description
{
"build_id": "2014-12-
"ostf_sha": "a9afb68710d809
"build_number": "49",
"auth_
"api": "1.0",
"nailgun_sha": "22bd43b89a1784
"production": "docker",
"fuelmain_sha": "3aab16667f47dd
"astute_sha": "16b252d93be6aa
"feature_
"mirantis"
],
"release": "6.0",
"release_
],
}
}
},
"fuellib_sha": "2c99931072d951
}
Steps:
1. Create cluster - HA, CentOS, Flat nova-network, 1 controller
2. Deploy cluster
3. Add 2 new controllers
4. Deploy changes
5. Add 2 new controllers
6. Start cluster re-deployment
Actual result: deployment failed on node-2 with:
2014-12-15 06:51:43 ERR
(/Stage[
Logs are attached.
tags: added: on-verification
tags: removed: on-verification
tags: added: on-verification
Changed in puppet-swift:
assignee: nobody → Aleksandr Didenko (adidenko)
status: New → In Progress
Changed in puppet-swift:
assignee: Aleksandr Didenko (adidenko) → nobody
no longer affects: puppet-swift
-----------
15T06:30:57.405142+00:00 debug: Executing 'swift-ring-builder /etc/swift/account.builder rebalance'
15T06:30:57.894101+00:00 notice: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) Reassigned 512 (100.00%) partitions. Balance is now 0.00.
2014-12-15T06:20:44 info: [410] Processing RPC call 'deploy'
... deploying 1st primary controller ...
2014-12-
2014-12-
------------
15T06:51:43.441763+00:00 debug: Executing 'swift-ring-builder /etc/swift/account.builder rebalance'
15T06:51:43.930819+00:00 notice: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) No partitions could be reassigned.
15T06:51:43.932191+00:00 notice: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) Either none need to be or none can be due to min_part_hours [1].
2014-12-15T06:46:14 info: [418] Processing RPC call 'deploy'
... deploying primary controller +2 new controllers ...
2014-12-
2014-12-
2014-12-
------------
15T07:35:28.645214+00:00 debug: Executing 'swift-ring-builder /etc/swift/account.builder rebalance'
15T07:35:29.196341+00:00 notice: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]/returns) Reassigned 512 (100.00%) partitions. Balance is now 234.64.
2014-12-15T07:29:33 info: [405] Processing RPC call 'deploy'
... deploying primary controller, +2 controllers, +2 new controllers ...
2014-12-
2014-12-
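The three excerpts above differ mainly in what `swift-ring-builder ... rebalance` printed. As a minimal sketch, assuming only the two message shapes quoted in this report (other Swift versions may word them differently), the following Python pulls out the reassigned-partition count and the resulting balance so the runs can be compared side by side. `parse_rebalance_output` and `RebalanceResult` are hypothetical helper names, not part of Swift or Fuel.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns for the two rebalance outcomes quoted in the puppet logs above
# (assumption: only these message shapes are handled).
REASSIGNED_RE = re.compile(
    r"Reassigned (\d+) \(([\d.]+)%\) partitions\. Balance is now ([\d.]+)\."
)
NOTHING_RE = re.compile(r"No partitions could be reassigned\.")


@dataclass
class RebalanceResult:
    reassigned: int           # number of partitions moved
    percent: float            # share of all partitions moved
    balance: Optional[float]  # None when nothing was reassigned


def parse_rebalance_output(text: str) -> Optional[RebalanceResult]:
    """Summarise one 'swift-ring-builder ... rebalance' run from its output."""
    match = REASSIGNED_RE.search(text)
    if match:
        return RebalanceResult(int(match.group(1)), float(match.group(2)),
                               float(match.group(3)))
    if NOTHING_RE.search(text):
        return RebalanceResult(0, 0.0, None)
    return None  # output not recognised by this sketch


if __name__ == "__main__":
    samples = [
        "Reassigned 512 (100.00%) partitions. Balance is now 0.00.",
        "No partitions could be reassigned.",
        "Reassigned 512 (100.00%) partitions. Balance is now 234.64.",
    ]
    for line in samples:
        print(parse_rebalance_output(line))
```

Run on the three quoted messages, it reports 512 partitions at balance 0.00, then nothing reassigned, then 512 partitions at balance 234.64, the same sequence as the logs.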
We set min_part_hours=1, so it is not allowed to rebalance the Swift ring more often than once per hour. So in the end we can see this in the puppet logs for node-2:
2014-12-15T07:35:30.475418+00:00 notice: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) NOTE: Balance of 252.86 indicates you should push this
15T07:35:30.476298+00:00 notice: (/Stage[main]/Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]/returns) ring, wait at least 1 hours, and rebalance/repush.
2014-12-
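The min_part_hours effect can be illustrated with a deliberately simplified Python model. It assumes every partition was last moved at the same moment (the first rebalance reassigned all 512 partitions) and treats each later rebalance as all-or-nothing; the real ring builder tracks the last move per partition, and an allowed rebalance can still leave the ring unbalanced, as the 234.64 balance shows. The timestamps are the account-ring rebalances from the logs above; `simulate_rebalances` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def simulate_rebalances(times: List[datetime],
                        min_part_hours: int = 1) -> List[Tuple[datetime, bool]]:
    """Return (time, partitions_moved) for each rebalance attempt.

    Simplification (assumption): all partitions share one 'last moved'
    timestamp, so a rebalance moves either all of them or none.
    """
    last_moved: Optional[datetime] = None
    results = []
    for t in times:
        # Partitions may move again only once min_part_hours have elapsed.
        allowed = last_moved is None or t - last_moved >= timedelta(hours=min_part_hours)
        if allowed:
            last_moved = t  # the one-hour window restarts from this rebalance
        results.append((t, allowed))
    return results


if __name__ == "__main__":
    # Rebalance timestamps of /etc/swift/account.builder from the logs above.
    runs = [
        datetime(2014, 12, 15, 6, 30, 57),
        datetime(2014, 12, 15, 6, 51, 43),
        datetime(2014, 12, 15, 7, 35, 28),
    ]
    for t, moved in simulate_rebalances(runs):
        print(t.strftime("%H:%M:%S"),
              "reassigned" if moved else "no partitions could be reassigned")
    # 06:30:57 reassigned
    # 06:51:43 no partitions could be reassigned
    # 07:35:28 reassigned
```

The middle attempt, about 21 minutes after the first, lines up with the 06:51:43 error reported on node-2.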
We should adapt our scalability system tests so that they do not add new controllers more often than once per hour in environments with Swift. We might also need to update our documentation accordingly.
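For the proposed test change, one option is to make the scalability test wait until min_part_hours have passed since the last rebalance that actually moved partitions. This is a minimal sketch, assuming the test can obtain that timestamp (for example from the puppet log); `seconds_until_next_addition` is a hypothetical helper, not an existing Fuel or Swift function.

```python
from datetime import datetime, timedelta


def seconds_until_next_addition(last_rebalance: datetime, now: datetime,
                                min_part_hours: int = 1) -> float:
    """Seconds to wait before adding more controllers (hypothetical helper).

    last_rebalance: time of the last rebalance that reassigned partitions.
    Returns 0.0 if min_part_hours have already elapsed.
    """
    ready_at = last_rebalance + timedelta(hours=min_part_hours)
    return max(0.0, (ready_at - now).total_seconds())


if __name__ == "__main__":
    first_rebalance = datetime(2014, 12, 15, 6, 30, 57)
    # The second deployment in this report started at 06:46:14.
    print(seconds_until_next_addition(first_rebalance,
                                      datetime(2014, 12, 15, 6, 46, 14)))  # 2683.0
```

Because the rebalance runs a few minutes into a deployment (about five minutes in the logs above), starting the next deploy after this delay keeps its rebalance outside the min_part_hours window with a small margin.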