Failed to delete controller+ironic node because swift rings were not rebalanced

Bug #1549293 reported by Kyrylo Romanenko
This bug affects 2 people
Affects             Status                    Importance  Assigned to       Milestone
Mirantis OpenStack  Status tracked in 10.0.x
  10.0.x            Confirmed                 Medium      Kyrylo Romanenko
  8.0.x             Won't Fix                 Medium      Kyrylo Romanenko
  9.x               Won't Fix                 Medium      Kyrylo Romanenko

Bug Description

Steps to reproduce:

1) Deploy cluster with nodes:
2 Controller + Ironic
1 Compute
1 Ironic.

Settings:
Compute: QEMU
Network: Neutron with VLAN segmentation
Storage Backends: Cinder LVM over iSCSI for volumes

2) Delete 1 Controller+Ironic node.
3) Redeploy changes.

4) Wait until the redeployment finishes.

Actual result: the redeployment reached 100% and then failed.

Error:
Deployment has failed. Method granular_deploy. Failed to execute hook 'ironic_post_swift_key' Failed to run command cd / && ruby /etc/puppet/modules/osnailyfacter/modular/astute/ironic_post_swift_key.rb

Details from Astute log: http://paste.openstack.org/show/488016/

Environment:
RC2 iso
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "570"
  build_id: "570"
  fuel-nailgun_sha: "558ca91a854cf29e395940c232911ffb851899c1"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "658be72c4b42d3e1436b86ac4567ab914bfb451b"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "c2a335b5b725f1b994f78d4c78723d29fa44685a"
  fuel-ostf_sha: "3bc76a63a9e7d195ff34eadc29552f4235fa6c52"
  fuel-mirror_sha: "fb45b80d7bee5899d931f926e5c9512e2b442749"
  fuelmenu_sha: "78ffc73065a9674b707c081d128cb7eea611474f"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "a43cf96cd9532f10794dce736350bf5bed350e9d"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "d605bcbabf315382d56d0ce8143458be67c53434"

Roman Podoliaka (rpodolyaka) wrote :

Kyrylo, the usual question: do you have enough resources available? Is it a virtualized or a baremetal environment? Timeout errors suggest that you might be short on CPU and/or RAM.

Another point is that it's weird that it's possible to deploy a cluster with 2 controller nodes - my understanding was we can only have an odd number of controller nodes (for HA to work properly).

Changed in mos:
assignee: MOS Ironic (mos-ironic) → Kyrylo Romanenko (kromanenko)
status: New → Incomplete
milestone: 8.0 → 9.0
tags: added: area-ironic
removed: ironic
Kyrylo Romanenko (kromanenko) wrote :

Roman, in this case the resources look OK. This is a virtual environment.
We have no restriction to an odd or even number of controllers, and we use two-controller setups in testing environments quite often.
Also, Vasyl Saienko has found an issue with swift on this environment. He is investigating it now.

Vasyl Saienko (vsaienko) wrote :

The error occurs because the swift ring is not balanced.
This happened because we deleted one controller (which contained swift data) from the cluster before its data had been replicated to the other one.
We should force a swift ring sync before deleting controller nodes.

Roman Podoliaka (rpodolyaka) wrote :

Does swift ring become balanced at all, when we use an even number of controllers?

I'm still struggling to understand how you expect services to work properly if you remove one of two controller nodes. Shouldn't pacemaker forcefully stop all managed resources? How is it different from a split brain?
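
To make the quorum concern concrete, here is a rough sketch of simple-majority arithmetic. It is only a model: the function name `has_quorum` is made up for illustration, and the actual Pacemaker/Corosync quorum behaviour of this environment depends on its configuration, which this report does not show.

```python
def has_quorum(total_nodes: int, alive_nodes: int) -> bool:
    """Simple majority rule: more than half of the configured nodes must be alive."""
    return alive_nodes >= total_nodes // 2 + 1


# Compare removing one controller from a 2-node and a 3-node cluster.
for total in (2, 3):
    survivors = total - 1
    print(f"{total} controllers, 1 removed: quorum={has_quorum(total, survivors)}")
```

Under this model, a two-controller cluster that loses one node no longer has a majority, while a three-controller cluster that loses one node still does.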

Changed in mos:
milestone: 9.0 → 8.0-updates
Kyrylo Romanenko (kromanenko) wrote :

Data replication occurs every 15th minute of every 2nd hour, so we should force a swift ring sync before deleting controller nodes.

Since this is not a very common case and we have a command-line workaround, I'll set this issue to Medium.

And since we cannot test it with 9.0 yet, I am setting it as Invalid for 9.0. I'll recheck it next time.
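
A small sketch of why the timing matters, assuming the schedule above means "at minute 15 of every even hour" (roughly a cron entry of `15 */2 * * *`). That reading is an assumption, not something confirmed in this report, and `next_rebalance` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def next_rebalance(now: datetime) -> datetime:
    """Next run strictly after `now`, assuming runs at minute 15 of even hours."""
    candidate = now.replace(minute=15, second=0, microsecond=0)
    if candidate.hour % 2 == 1:
        # Odd hour: the next even hour is one hour ahead.
        candidate += timedelta(hours=1)
    elif candidate <= now:
        # Even hour, but minute 15 has already passed: wait for the next slot.
        candidate += timedelta(hours=2)
    return candidate


# Arbitrary example time, not taken from the report.
now = datetime(2016, 2, 24, 10, 20)
print("next scheduled run:", next_rebalance(now))
print("wait:", next_rebalance(now) - now)
```

Under that assumption, a controller deleted shortly after a scheduled run could wait almost two hours for the next one, which is why forcing the sync by hand before deletion helps.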

Kyrylo Romanenko (kromanenko) wrote :

Workaround, to be run before redeploying the cluster:
Run on the controllers: /usr/local/bin/swift-rings-rebalance.sh
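
If there are several controllers, the workaround could be scripted along these lines. This is a minimal sketch, assuming passwordless SSH to each controller from the machine running it; the host names (`node-1`, `node-2`) and the helper names are hypothetical, and only the script path comes from this report.

```python
import subprocess

# Script path as given in the workaround above.
REBALANCE_SCRIPT = "/usr/local/bin/swift-rings-rebalance.sh"


def build_rebalance_command(host):
    """Build the ssh command that runs the rebalance script on one controller."""
    return ["ssh", host, REBALANCE_SCRIPT]


def rebalance_all(hosts, run=subprocess.run):
    """Run the script on every host and return {host: exit code}.

    `run` is injectable so the function can be dry-run without SSH.
    """
    results = {}
    for host in hosts:
        completed = run(build_rebalance_command(host), check=False)
        results[host] = completed.returncode
    return results


# Example (hypothetical host names), not executed here:
# rebalance_all(["node-1", "node-2"])
```

A non-zero exit code for any host would suggest checking that controller before deleting a node.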

summary: - Failed to delete controller+ironic node
+ Failed to delete controller+ironic node because swift rings were not
+ rebalanced
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 8.0-updates because of Medium importance

tags: added: wontfix-low
Changed in mos:
status: Confirmed → Won't Fix
Dina Belova (dbelova) wrote :

Added the move-to-10.0 tag because the bug was transferred from 9.0 to 10.0.

tags: added: move-to-10.0