Keepalived's VRRP child process is constantly dying and respawning on the controller

Bug #1558490 reported by Attila Darazs on 2016-03-17
This bug affects 2 people
Affects: tripleo
Importance: Undecided
Assigned to: Unassigned

Bug Description

While testing the new IPv6 gate jobs, they time out during the overcloud deployment.

Looking at the /var/log/messages of the controller, the following error is repeated multiple times every second:

Mar 17 08:52:53 localhost Keepalived[6947]: VRRP child process(8100) died: Respawning
Mar 17 08:52:53 localhost Keepalived_vrrp[8101]: Netlink reflector reports IP fe80::278:f2ff:fe20:52be added
Mar 17 08:52:53 localhost Keepalived_vrrp[8101]: Netlink reflector reports IP fe80::278:f2ff:fe20:52c0 added
Mar 17 08:52:53 localhost Keepalived_vrrp[8101]: Netlink reflector reports IP fe80::278:f2ff:fe20:52ba added
Mar 17 08:52:53 localhost Keepalived_vrrp[8101]: Netlink reflector reports IP 2001:db8:fd00:1000::11 added
Mar 17 08:52:53 localhost Keepalived_vrrp[8101]: Netlink reflector reports IP fe80::278:f2ff:fe20:52bc added
Mar 17 08:52:53 localhost Keepalived_vrrp[8101]: Registering Kernel netlink reflector
Mar 17 08:52:53 localhost Keepalived_vrrp[8101]: Registering Kernel netlink command channel
Mar 17 08:52:53 localhost Keepalived_vrrp[8101]: Registering gratuitous ARP shared channel
Mar 17 08:52:53 localhost Keepalived_vrrp[8101]: Opening file '/etc/keepalived/keepalived.conf'.
Mar 17 08:52:53 localhost Keepalived_vrrp[8101]: Cant find interface for vrrp_instance 53 !!!
Mar 17 08:52:53 localhost Keepalived_vrrp[8101]: Configuration error: VRRP definition must belong to an interface
Mar 17 08:52:53 localhost Keepalived[6947]: VRRP child process(8101) died: Respawning

[same repeats with different pids over and over]
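To gauge how often the child process is respawning and which messages precede each death, the log can be summarized with a short script. This is only a rough sketch, assuming the syslog line format shown above; the function name `summarize` and the keyword matching ("error", "cant find") are illustrative choices, not anything keepalived provides.

```python
import re
from collections import Counter

# Patterns are based on the log lines quoted in this report.
RESPAWN = re.compile(r"Keepalived\[\d+\]: VRRP child process\((\d+)\) died: Respawning")
VRRP_MSG = re.compile(r"Keepalived_vrrp\[\d+\]: (.*)")


def summarize(log_lines):
    """Count respawn events and tally VRRP messages that look like errors."""
    respawns = 0
    errors = Counter()
    for line in log_lines:
        if RESPAWN.search(line):
            respawns += 1
            continue
        m = VRRP_MSG.search(line)
        if m:
            msg = m.group(1).strip()
            lowered = msg.lower()
            # Heuristic (an assumption): treat these keywords as error markers.
            if "error" in lowered or "cant find" in lowered:
                errors[msg] += 1
    return respawns, errors


if __name__ == "__main__":
    # Path taken from the report; adjust for the system being inspected.
    with open("/var/log/messages", errors="replace") as fh:
        count, errs = summarize(fh)
    print("respawns:", count)
    for msg, n in errs.most_common():
        print(n, msg)
```

If the same one or two error messages account for nearly all entries, the respawn loop is most likely caused by a static configuration problem rather than a transient network event.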

These were the deployment arguments:

OVERCLOUD_DEPLOY_ARGS='--libvirt-type=qemu -t 80 -e /tmp/tripleo-ci/test-environments/swap-partition.yaml --ntp-server 0.centos.pool.ntp.org -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-multiple-nics-v6.yaml -e /tmp/tripleo-ci/test-environments/net-iso.yaml'

Full logs available here: http://logs.openstack.org/45/289445/15/check-tripleo/gate-tripleo-ci-f22-nonha/390d4a3/

It happened during a gate job for: https://review.openstack.org/#/c/289445/15

Javier Peña (jpena-c) wrote :

I can reproduce the issue locally. Looking at keepalived.conf, some vrrp_instance blocks appear to have an invalid configuration, e.g.:

vrrp_instance 53 {
  virtual_router_id 53
  # Advert interval
  advert_int 1
  # for electing MASTER, highest priority wins.
  priority 101
  state MASTER
  interface
  virtual_ipaddress {
      fd00:fd00:fd00:2000::11 dev
  }
  track_script {
  haproxy
  }
}

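Note that the "interface" keyword has no value, and the "dev" keyword in virtual_ipaddress is not followed by an interface name either. This matches the "Cant find interface for vrrp_instance 53" error in the log.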
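For anyone wanting to check a generated keepalived.conf for this condition before keepalived starts, something along these lines could work. It is a minimal sketch and not a real keepalived parser: the function name `find_instances_missing_interface` is made up for illustration, and it assumes each block opens with `vrrp_instance NAME {` on one line, as in the snippet above.

```python
import re


def find_instances_missing_interface(conf_text):
    """Return names of vrrp_instance blocks lacking an `interface <name>` line."""
    missing = []
    current = None        # name of the vrrp_instance block being scanned
    depth = 0             # brace nesting depth inside that block
    has_interface = False
    for raw in conf_text.splitlines():
        # keepalived comments start with '#' or '!'; drop them.
        line = re.split(r"[#!]", raw, maxsplit=1)[0].strip()
        if current is None:
            m = re.match(r"vrrp_instance\s+(\S+)\s*\{", line)
            if m:
                current, depth, has_interface = m.group(1), 1, False
            continue
        # Only an `interface` directive with a value, at the top level
        # of the block, counts as a valid definition.
        if depth == 1 and re.match(r"interface\s+\S+", line):
            has_interface = True
        depth += line.count("{") - line.count("}")
        if depth == 0:
            if not has_interface:
                missing.append(current)
            current = None
    return missing


if __name__ == "__main__":
    with open("/etc/keepalived/keepalived.conf") as fh:
        print(find_instances_missing_interface(fh.read()))
```

Run against the configuration quoted above, this would flag instance 53, since its `interface` line carries no value.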

Javier Peña (jpena-c) wrote :

After some investigation, the issue seems to lie in Facter: https://tickets.puppetlabs.com/browse/FACT-1372

This issue was fixed in the openstack/puppet-tripleo 0.4.0 release.

Emilien Macchi (emilienm) wrote :

This bug was last updated over 180 days ago. As tripleo is a fast-moving project and we'd like to get the tracker down to currently actionable bugs, this is being marked as Invalid. If the issue still exists, please feel free to reopen it.

Changed in tripleo:
status: New → Invalid