Comment 11 for bug 1442257

James Page (james-page) wrote:

AFAICT this issue appears to be isolated to LXC.

I spun up a 3-node PXC cluster, with corosync/pacemaker managing the VIP, in KVM instances on our OpenStack QA cloud in an effort to reproduce this problem on one of our automated testing platforms, but I have not been able to reproduce it after three days of testing. I've been actively rebooting units, moving resources around, unplugging network interfaces, etc., but the corosync/pacemaker cluster always recovers to a clean state with no split brain or wedging in the corosync/pacemaker stack.
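
For reference, the churn testing amounted to something like the following loop. This is a rough Python sketch, not the actual test harness: the node names and the VIP resource name are placeholders, and the exact crm_resource flags vary a little between pacemaker versions.

#!/usr/bin/env python3
"""Sketch of the churn loop: repeatedly move the VIP between nodes and
check that the cluster settles back into a clean state afterwards."""
import random
import subprocess
import time

NODES = ["node1", "node2", "node3"]   # hypothetical unit hostnames
RESOURCE = "res_vip"                  # hypothetical VIP resource name


def cluster_clean():
    """Return True if a one-shot crm_mon run reports no failed actions."""
    out = subprocess.run(["crm_mon", "-1"], capture_output=True, text=True)
    return out.returncode == 0 and "Failed" not in out.stdout


for _ in range(100):
    target = random.choice(NODES)
    # Ask pacemaker to move the VIP to a randomly chosen node...
    subprocess.run(["crm_resource", "--move", "--resource", RESOURCE,
                    "--node", target], check=False)
    time.sleep(60)
    # ...then drop the move constraint and verify the cluster recovered.
    subprocess.run(["crm_resource", "--clear", "--resource", RESOURCE],
                   check=False)
    if not cluster_clean():
        raise SystemExit("cluster did not return to a clean state")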

All interfaces (and veths) are configured with the standard 1500 MTU. Instances are on different compute nodes, but we run jumbo frames on the hypervisor physical NICs to avoid any packet fragmentation issues in the GRE overlay networks that we use on this cloud. Here's how things are connected in OpenStack nova/neutron:

KVM <-> [tap] <-> [bridge] <-> [veth-pair] <-> [OVS br-int] <-> [OVS br-tun] <-> {GRE tunnel}
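
A quick way to confirm that full-size frames survive that path without fragmentation is a DF-bit ping at the largest payload that fits a 1500-byte MTU (1472 bytes of ICMP payload after the 20-byte IP and 8-byte ICMP headers). A minimal Python sketch of that check is below; the peer address is a placeholder.

#!/usr/bin/env python3
"""Check that 1500-byte frames cross the veth/OVS/GRE path unfragmented."""
import subprocess

PEER = "10.0.0.12"  # hypothetical address of another cluster unit

# -M do  : set the Don't Fragment bit
# -s 1472: ICMP payload that fills a 1500-byte MTU exactly
result = subprocess.run(
    ["ping", "-c", "3", "-M", "do", "-s", "1472", PEER],
    capture_output=True, text=True)

if result.returncode == 0:
    print("1500-byte frames traverse the overlay unfragmented")
else:
    print("fragmentation or MTU mismatch on the path:\n" + result.stdout)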

This may or may not be related, but we also see problems restarting pacemaker/corosync under LXC (see bug 1439649) - another issue which I can't yet reproduce under KVM or on real hardware.