HA deployment fails with Neutron GRE

Bug #1359833 reported by Eugene Nikanorov
This bug affects 1 person
Affects             Status   Importance  Assigned to                Milestone
Fuel for OpenStack  Invalid  High        Fuel Library (Deprecated)

Bug Description

{"build_id": "2014-08-19_18-05-30",
 "ostf_sha": "c6ecd0137b5d7c1576fa65baef0fc70f9a150daa",
 "build_number": "457",
 "auth_required": true,
 "api": "1.0",
 "nailgun_sha": "36d27ff737b361f92093986d061bbfc1670bee45",
 "production": "docker",
 "fuelmain_sha": "4291004347e406ef187624ea47be192702db353e",
 "astute_sha": "efe3cb3668b9079e68fb1534fd4649ac45a344e1",
 "feature_groups": ["mirantis"],
 "release": "5.1",
 "fuellib_sha": "e292af206c8d8da242537007b438f8750b5d1efe"}

Env: 3 controllers, 1 compute. Neutron GRE, Ubuntu or CentOS (the issue appears on both).
The deployment fails on all controllers except the first one, and on the compute node too.

Snapshot is attached.

A bunch of logs from Fuelweb:

2014-08-21 13:30:24 ERR
 (/Stage[main]/Rabbitmq::Service/Service[p_rabbitmq-server]) /etc/puppet/modules/corosync/lib/puppet/provider/service/pacemaker.rb:241:in `enabled?'
2014-08-21 13:30:24 ERR
 (/Stage[main]/Rabbitmq::Service/Service[p_rabbitmq-server]) /etc/puppet/modules/corosync/lib/puppet/provider/service/pacemaker.rb:55:in `get_service_hash'
2014-08-21 13:30:24 ERR
 (/Stage[main]/Rabbitmq::Service/Service[p_rabbitmq-server]) Could not evaluate: resource p_rabbitmq-server not found
2014-08-21 13:30:09 ERR
 (/Stage[main]/Openstack::Swift::Storage_node/Swift::Ringsync[container]/Rsync::Get[/etc/swift/container.ring.gz]/Exec[rsync /etc/swift/container.ring.gz]/returns) change from notrun to 0 failed: rsync -q -a rsync://192.168.0.3/swift_server/container.ring.gz /etc/swift/container.ring.gz returned 10 instead of one of [0]

Eugene Nikanorov (enikanorov) wrote :
Changed in fuel:
importance: Undecided → High
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 5.1
Sergey Vasilenko (xenolog) wrote :

Controller nodes can't communicate over the management network, but they can over the admin network.

PRIMARY-CONTROLLER:
[root@node-5 ~]# ip -f inet a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
11: br-ex: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    inet 172.16.161.51/24 brd 172.16.161.255 scope global br-ex
12: br-mgmt: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    inet 192.168.0.3/24 brd 192.168.0.255 scope global br-mgmt
13: br-storage: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    inet 192.168.1.2/24 brd 192.168.1.255 scope global br-storage
14: br-fw-admin: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    inet 10.20.0.3/24 brd 10.20.0.255 scope global br-fw-admin
27: hapr-host: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 240.0.0.1/30 scope global hapr-host
You have new mail in /var/spool/mail/root
[root@node-5 ~]#

SECOND CONTROLLER:
[root@node-6 ~]# ip -f inet a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
11: br-ex: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    inet 172.16.161.52/24 brd 172.16.161.255 scope global br-ex
12: br-mgmt: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    inet 192.168.0.4/24 brd 192.168.0.255 scope global br-mgmt
13: br-storage: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    inet 192.168.1.3/24 brd 192.168.1.255 scope global br-storage
14: br-fw-admin: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    inet 10.20.0.4/24 brd 10.20.0.255 scope global br-fw-admin
17: br-mgmt-hapr: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.0.4/24 scope global br-mgmt-hapr
19: br-ex-hapr: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 172.16.161.52/24 scope global br-ex-hapr
[root@node-6 ~]#

[root@node-6 ~]# ip n
192.168.0.3 dev br-mgmt INCOMPLETE
192.168.0.2 dev br-mgmt-hapr lladdr ce:a3:0d:ea:d8:53 REACHABLE
10.20.0.2 dev br-fw-admin lladdr 52:54:00:b3:76:00 REACHABLE
10.20.0.3 dev br-fw-admin lladdr a6:a1:2b:d6:9f:43 STALE
192.168.0.5 dev br-mgmt INCOMPLETE
192.168.1.2 dev br-storage FAILED
172.16.161.50 dev br-ex-hapr lladdr 66:d4:bc:40:b8:b9 REACHABLE
[root@node-6 ~]#
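
The neighbor table above can be summarized per interface to show which networks fail ARP resolution. The following is only a minimal sketch, not part of Fuel: the function name `unresolved_neighbors` and the `SAMPLE` constant are hypothetical, and it assumes the `<ip> dev <device> [lladdr <mac>] <STATE>` line shape seen in the output above.

```python
from collections import defaultdict

# Neighbor states that mean ARP resolution did not succeed.
UNRESOLVED = {"INCOMPLETE", "FAILED"}


def unresolved_neighbors(ip_neigh_output):
    """Map device name -> list of neighbor IPs with an unresolved ARP entry."""
    bad = defaultdict(list)
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        # Assumed shape: <ip> dev <device> [lladdr <mac>] <STATE>
        if len(fields) < 4 or fields[1] != "dev":
            continue
        ip, device, state = fields[0], fields[2], fields[-1]
        if state in UNRESOLVED:
            bad[device].append(ip)
    return dict(bad)


# Output of `ip n` on node-6, copied from this comment.
SAMPLE = """\
192.168.0.3 dev br-mgmt INCOMPLETE
192.168.0.2 dev br-mgmt-hapr lladdr ce:a3:0d:ea:d8:53 REACHABLE
10.20.0.2 dev br-fw-admin lladdr 52:54:00:b3:76:00 REACHABLE
10.20.0.3 dev br-fw-admin lladdr a6:a1:2b:d6:9f:43 STALE
192.168.0.5 dev br-mgmt INCOMPLETE
192.168.1.2 dev br-storage FAILED
172.16.161.50 dev br-ex-hapr lladdr 66:d4:bc:40:b8:b9 REACHABLE
"""

print(unresolved_neighbors(SAMPLE))
# {'br-mgmt': ['192.168.0.3', '192.168.0.5'], 'br-storage': ['192.168.1.2']}
```

Only br-mgmt and br-storage show unresolved peers, while the entries on br-fw-admin and br-ex-hapr resolve.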

Sergey Vasilenko (xenolog) wrote :

Controller nodes can also communicate over the public network.

The admin and public networks are untagged; the management and storage networks are tagged.

This looks like a low-level network issue.

Eugene Nikanorov (enikanorov) wrote :

Apparently the issue was a wrong interface configuration on the nodes.

The problem is that network verification passes as usual, giving no hint that something is wrong with the configuration.

Also, the resulting errors in the logs give hardly any meaningful hint about the issue.

I suggest lowering the importance and adding some verification steps to the deployment process.
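
One possible shape for such a verification step is sketched below. This is an illustration only, not Fuel's network verifier: the names `ping` and `verify_networks` and the `peers` mapping are hypothetical, and it assumes a Linux `ping` that accepts `-c`, `-W` and `-I`. The idea is to probe each peer address through the bridge that should carry that network once the interfaces are configured, and to report every pair that does not answer.

```python
import subprocess


def ping(address, source_interface, timeout_s=2):
    """Return True if one ICMP echo sent via source_interface is answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), "-I", source_interface, address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def verify_networks(peers, probe=ping):
    """Return (interface, address) pairs that did not respond.

    peers maps a local interface name to the peer addresses that are
    expected to be reachable through it (an assumed input format).
    """
    failures = []
    for interface, addresses in peers.items():
        for address in addresses:
            if not probe(address, interface):
                failures.append((interface, address))
    return failures
```

Run on node-6 with `{"br-mgmt": ["192.168.0.3", "192.168.0.5"], "br-storage": ["192.168.1.2"]}`, a check of this kind would be expected to report the management and storage peers that show as INCOMPLETE or FAILED in the neighbor table, before the Puppet run reaches the Pacemaker and rsync steps.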

Changed in fuel:
status: New → Confirmed
status: Confirmed → Invalid