KVM containers race with LXD containers and hooks

Bug #1664437 reported by John A Meinel
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
Critical
John A Meinel

Bug Description

It seems when fixing for LXD containers (bug #1662586), I knew to fix for KVM containers, but I failed to actually call the right function.

John A Meinel (jameinel)
Changed in juju:
assignee: nobody → John A Meinel (jameinel)
Revision history for this message
John A Meinel (jameinel) wrote :
Ante Karamatić (ivoks)
tags: added: 4010
Revision history for this message
Ante Karamatić (ivoks) wrote :
Download full text (3.4 KiB)

Here's context from one occurrence:

2017-02-13 22:13:08 DEBUG juju.provisioner.lxd container_initialisation.go:214 Bridging [{DeviceName:bondA BridgeName:br-bondA} {DeviceName:bondM BridgeName:br-bondM}] devices on host "machine-32" with delay=17, acquiring lock "machine-lock"
2017-02-13 22:13:16 INFO juju.utils.packaging.manager utils.go:57 Running: apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet install uvtool
2017-02-13 22:13:17 DEBUG juju.provisioner container_initialisation.go:188 release lock "machine-lock" for container initialisation
2017-02-13 22:13:17 DEBUG juju.worker runner.go:178 stop "32-container-watcher"
2017-02-13 22:13:17 DEBUG juju.worker runner.go:235 killing "32-container-watcher"
2017-02-13 22:13:17 DEBUG juju.cmd.jujud machine.go:1527 stopping so killing worker "32-container-watcher"
2017-02-13 22:13:17 INFO juju.worker runner.go:256 start "kvm-provisioner"
2017-02-13 22:13:17 DEBUG juju.worker runner.go:183 "kvm-provisioner" started
2017-02-13 22:13:17 INFO juju.worker runner.go:262 stopped "32-container-watcher", err: <nil>
2017-02-13 22:13:17 DEBUG juju.worker runner.go:190 "32-container-watcher" done: <nil>
2017-02-13 22:13:17 DEBUG juju.worker runner.go:214 no restart, removing "32-container-watcher" from known workers
2017-02-13 22:13:17 INFO juju.provisioner provisioner_task.go:406 found machine pending provisioning id:32/kvm/0, details:32/kvm/0
2017-02-13 22:13:17 INFO juju.provisioner provisioner_task.go:249 provisioner-harvest-mode is set to destroyed; unknown instances not stopped []
2017-02-13 22:13:17 INFO juju.provisioner.kvm kvm-broker.go:66 starting kvm container for containerMachineID: 32/kvm/0
2017-02-13 22:13:17 INFO juju.network bridge.go:70 bridgescript command=/usr/bin/python2 - --interfaces-to-bridge="bondA bondB bondM" --activate --bridge-prefix=br- --reconfigure-delay=17 /etc/network/interfaces <<'EOF'
<script-redacted>
EOF

2017-02-13 22:13:21 INFO juju.network bridge.go:70 bridgescript command=/usr/bin/python2 - --interfaces-to-bridge="bondA bondM" --activate --bridge-prefix=br- --reconfigure-delay=17 /etc/network/interfaces <<'EOF'
<script-redacted>
EOF

2017-02-13 22:13:21 INFO juju.network bridge.go:75 bridgescript result=0, timeout=false
2017-02-13 22:13:57 DEBUG juju.provisioner container_initialisation.go:241 observed network config updated
2017-02-13 22:13:57 DEBUG juju.provisioner.lxd container_initialisation.go:244 releasing lock "machine-lock" for bridging
2017-02-13 22:13:57 DEBUG juju.provisioner.lxd broker.go:108 using multi-bridge networking for container "32/lxd/0"
2017-02-13 22:13:57 WARNING juju.provisioner lxd-broker.go:96 failed to prepare container "32/lxd/0" network config: unable to find host bridge for space(s) "clo", "oam" for container "32/lxd/0"
2017-02-13 22:13:57 WARNING juju.provisioner broker.go:149 incomplete DNS config found, discovering host's DNS config
2017-02-13 22:13:57 DEBUG juju.provisioner broker.go:162 setting DNS servers [public:100.107.64.42] and domains [maas] on container interface "eth0"
2017-02-13 22:13:57 DEBUG juju.tools.lxdclient client.go:203 connecting to LXD remote "cloud-im...

Read more...

Revision history for this message
Ante Karamatić (ivoks) wrote :

I can confirm that suggested patch does indeed solve this issue.

Revision history for this message
Ante Karamatić (ivoks) wrote :

Well, it looks like there's a corner case. If I'm reading this log correctly, KVM was created:

2017-02-14 05:14:11 INFO juju.provisioner.kvm kvm-broker.go:66 starting kvm container for containerMachineID: 21/kvm/0
2017-02-14 05:14:12 DEBUG juju.provisioner.kvm container_initialisation.go:214 Bridging [{DeviceName:bondA BridgeName:br-bondA} {DeviceName:bondB BridgeName:br-bondB} {DeviceName:bondM BridgeName:br-bondM}] devices on host "machine-21" for container "machine-21-kvm-0" with delay=17, acquiring lock "machine-lock"

Then while it was booting (with correct network interfaces), this happened:

2017-02-14 05:16:33 INFO juju.network bridge.go:70 bridgescript command=/usr/bin/python2 - --interfaces-to-bridge="bondA bondM" --activate --bridge-prefix=br- --reconfigure-delay=17 /etc/network/interfaces <<'EOF'

One can observe that the host lost connection to juju controller, and that is true. This is because bondM hosts the IP and the route to juju controller. From KVM console one can see that the VM was terminated:

cloud-init-nonet[13.55]: waiting 120 seconds for network device^M
qemu: terminating on signal 15 from pid 33833
2017-02-14 05:17:33.091+0000: shutting down

PID 33833 is libvirtd. Timestamp correlates with juju's entry:

2017-02-14 05:17:33 WARNING juju.provisioner.kvm broker.go:190 unexpected error trying to release container "21/kvm/0" addreses: cannot release static addresses for "21/kvm/0": connection is shut down
2017-02-14 05:17:33 INFO juju.worker runner.go:262 stopped "kvm-provisioner", err: connection is shut down

John A Meinel (jameinel)
Changed in juju:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.