HA not working for 2 controllers, service not available on failover

Bug #1352326 reported by Moshe Levi
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Won't Fix
Importance: High
Assigned to: Fuel Library (Deprecated)
Milestone: 5.1

Bug Description

We installed Fuel with Ubuntu in HA mode.

When we power off one of the controllers, we can't access the dashboard, and none of the OpenStack CLI commands work against the second controller.

{"build_id": "2014-08-02_05-27-27", "ostf_sha": "a3fa823ea0e4e03beb637ae07a91adea82c33182", "build_number": "50", "auth_required": true, "api": "1.0", "nailgun_sha": "bd0127be0061029f9f910547db5e633c82244942", "production": "docker", "fuelmain_sha": "e99879292cf6e96b8991300d947df76b69134bb1", "astute_sha": "5a93fa8f9abbc087ee1c9cca894d781a03167094", "feature_groups": ["experimental"], "release": "5.1", "fuellib_sha": "4e3fdd75f8dabde8e5d07067545d8043a70a176b"}

snapshot attached

Revision history for this message
Moshe Levi (moshele) wrote :
Changed in fuel:
importance: Undecided → High
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 5.1
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

According to the logs, you have a non-HA environment:

deployment_mode: multinode
puppet_manifests_source: rsync://10.20.0.2:/puppet/manifests/
network_scheme:
  transformations:
  - action: add-br
    name: br-eth0
  - action: add-port
    bridge: br-eth0
    name: eth0
  - action: add-br
    name: br-eth1
  - action: add-port
    bridge: br-eth1
    name: eth1
  - action: add-br
    name: br-eth2
  - action: add-port
    bridge: br-eth2
    name: eth2
  - action: add-br
    name: br-eth3
  - action: add-port
    bridge: br-eth3
    name: eth3
  - action: add-br
    name: br-ex
  - action: add-br
    name: br-mgmt
  - action: add-br
    name: br-storage
  - action: add-br
    name: br-fw-admin
  - action: add-patch
    bridges:
    - br-eth1
    - br-storage
    tags:
    - 3
    - 0
  - action: add-patch
    bridges:
    - br-eth3
    - br-ex
    trunks:
    - 0
  - action: add-patch
    bridges:
    - br-eth1
    - br-mgmt
    tags:
    - 2
    - 0
  - action: add-patch
    bridges:
    - br-eth2
    - br-fw-admin
    trunks:
    - 0
  - action: add-br
    name: br-prv
  - action: add-patch
    bridges:
    - br-eth1
    - br-prv
  roles:
    management: br-mgmt
    storage: vlan3
    ex: br-ex
    private: br-prv
    fw-admin: br-fw-admin
  interfaces:
    vlan3:
      L2:
        vlan_splinters: 'off'
    eth3:
      L2:
        vlan_splinters: 'off'
    eth2:
      L2:
        vlan_splinters: 'off'
    eth1:
      L2:
        vlan_splinters: 'off'
    eth0:
      L2:
        vlan_splinters: 'off'
  version: '1.0'
  provider: ovs
  endpoints:
    br-ex:
      IP:
      - 10.209.37.52/22
      gateway: 10.209.36.1
    br-fw-admin:
      IP:
      - 10.20.0.5/24
    vlan3:
      IP:
      - 192.168.1.4/24
      vlandev: eth_iser0
    br-mgmt:
      IP:
      - 192.168.0.4/24
    br-prv:
      IP: none
heat:
  db_password: XGfpST2Y
  user_password: 5CWyRQf2
  enabled: true
  rabbit_password: HlUp3MOY
storage_network_range: 192.168.1.0/24
start_guests_on_host_boot: true
rabbit:
  password: mbK5tFLc
use_cinder: true
management_network_range: 192.168.0.0/24
nodes:
- storage_netmask: 255.255.255.0
  uid: '1'
  public_address: 10.209.37.50
  internal_netmask: 255.255.255.0
  fqdn: node-1.domain.tld
  role: controller
  public_netmask: 255.255.252.0
  internal_address: 192.168.0.2
  storage_address: 192.168.1.2
  name: node-1
- storage_netmask: 255.255.255.0
  uid: '2'
  public_address: 10.209.37.51
  internal_netmask: 255.255.255.0
  fqdn: node-2.domain.tld
  role: compute
  public_netmask: 255.255.252.0
  internal_address: 192.168.0.3
  storage_address: 192.168.1.3
  name: node-2
- storage_netmask: 255.255.255.0
  uid: '3'
  public_address: 10.209.37.52
  internal_netmask: 255.255.255.0
  fqdn: node-3.domain.tld
  role: compute
  public_netmask: 255.255.252.0
  internal_address: 192.168.0.4
  storage_address: 192.168.1.4
  name: node-3
- storage_netmask: 255.255.255.0
  uid: '4'
  public_address: 10.209.37.53
  internal_netmask: 255.255.255.0
  fqdn: node-4.domain.tld
  role: cinder
  public_netmask: 255.255.252.0
  internal_address: 192.168.0.5
  storage_address: 192.168.1.5
  name: node-4

Changed in fuel:
status: New → Invalid
Revision history for this message
Moshe Levi (moshele) wrote :

I uploaded the wrong snapshot.

For now we see issues with the MySQL service on failover:
root@node-1:~# crm resource show
 vip__management_old (ocf::mirantis:ns_IPaddr2): Started
 vip__public_old (ocf::mirantis:ns_IPaddr2): Started
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Masters: [ node-1 ]
     Slaves: [ node-2 ]
 Clone Set: clone_p_mysql [p_mysql]
     p_mysql (ocf::mirantis:mysql-wss): Started (unmanaged) FAILED
     Started: [ node-2 ]
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-1 node-2 ]
 p_heat-engine (ocf::mirantis:heat-engine): Started
 Clone Set: clone_p_neutron-plugin-openvswitch-agent [p_neutron-plugin-openvswitch-agent]
     Started: [ node-1 node-2 ]
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-1 node-2 ]
 p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started
 p_neutron-l3-agent (ocf::mirantis:neutron-agent-l3): Started

I will upload the correct snapshot.
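
(As a side note, a resource stuck in "Started (unmanaged) FAILED" like p_mysql above can usually be reset from the crm shell once the underlying MySQL/Galera problem is sorted out; a minimal sketch, assuming only the resource name shown in the output above:)

 # clear the resource's failure state and hand it back to Pacemaker
 crm resource cleanup p_mysql
 crm resource manage p_mysql
 # then verify the clone set recovers on both controllers
 crm status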

Changed in fuel:
status: Invalid → Incomplete
Revision history for this message
Moshe Levi (moshele) wrote :
Changed in fuel:
status: Incomplete → New
Revision history for this message
Moshe Levi (moshele) wrote :

What happens if I installed 2 controllers in HA and had a split brain? Is there a way to recover?

Changed in fuel:
status: New → Incomplete
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

There are only 2 controllers, according to the updated logs snapshot:
role: primary-controller
last_controller: node-2
  roles:
fqdn: node-1.domain.tld
nodes:
- role: primary-controller
  fqdn: node-1.domain.tld
  name: node-1
- role: controller
  fqdn: node-2.domain.tld
  name: node-2
- role: compute
  fqdn: node-3.domain.tld
  name: node-3
- role: cinder
  fqdn: node-4.domain.tld
  name: node-4

Perhaps we should rename and update this ticket to reflect the fact that HA failover won't work for 2 controllers. And I believe that is expected behavior, because we need at least 3 controllers for HA if we want quorum to be maintained after a failover.
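
(For illustration, the usual quorum arithmetic; nothing Fuel-specific is assumed here, this is just majority voting:)

 # quorum = floor(n/2) + 1; a cluster tolerates only n - quorum failures
 for n in 1 2 3 4 5; do
   q=$(( n / 2 + 1 ))
   echo "$n controllers: quorum=$q, tolerates $(( n - q )) failure(s)"
 done
 # 2 controllers: quorum=2, tolerates 0 failures -> losing one controller loses quorum
 # 3 controllers: quorum=2, tolerates 1 failure  -> failover can work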

summary: - HA not working, service not available on failover
+ HA not working for 2 controllers, service not available on failover
Changed in fuel:
status: Incomplete → Won't Fix
Revision history for this message
Moshe Levi (moshele) wrote :

If HA doesn't work with 2 controllers, why was I able to deploy an HA environment with 2 controllers?

Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

Fuel allows adding controllers one by one; that's a natural part of the deployment process. Sometimes it is possible to install a single controller first and upgrade the entire cluster to a true HA system later, when more resources are available.

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

While deploying clusters of 2 or even 1 controller (via the Fuel CLI) in an HA configuration is possible, failover in such cases is impossible, AFAIK. It *could* work if additional arbitration (garbd) were configured, but that is not something Fuel provides out of the box.
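
(For reference, Galera Arbitrator can be run by hand on a third machine to provide that extra vote; a minimal sketch, where the cluster name and controller management IPs are placeholders you would have to take from your own environment, not something Fuel sets up:)

 # run on a third, non-controller machine reachable over the management network
 garbd --group <wsrep_cluster_name> \
       --address "gcomm://<controller1_mgmt_ip>:4567,<controller2_mgmt_ip>:4567" \
       --daemon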
