vSphere cloud provider stuck at Extending the disk

Bug #1790183 reported by Vuk Vasic
This bug affects 2 people
Affects         Status        Importance  Assigned to         Milestone
Canonical Juju  Fix Released  High        Christian Muirhead

Bug Description

When we deploy Canonical Kubernetes on VMware vSphere 6.0 with a custom root-disk constraint in the bundle, the deployment gets stuck at a step such as "Extending the disk to 150G" without any feedback.

We then cannot delete the machine; passing --force to remove-machine does not help either.

Our VMware vSphere environment has two clusters. If we remove the root-disk constraint, everything deploys smoothly.

Tim Penhey (thumper)
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
tags: added: vsphere-provider
F.B. (flaar) wrote :

We are deploying Canonical Kubernetes on VMware vSphere 6.7 and are probably hitting the same problem.

There is no custom constraint, just a standard deployment with 16GB workers, and the machines remain pending on "extending disk to 16GiB".

In the logs, a normal machine extending to 8GB is OK:
2019-01-16 07:03:59 DEBUG juju.apiserver.provisioner provisioner.go:1139 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-2", Status:"allocating", Info:"VM cloned", Data:map[string]interface {}(nil)}
2019-01-16 07:03:59 DEBUG juju.apiserver.provisioner provisioner.go:1139 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-2", Status:"allocating", Info:"extending disk to 8.0GiB", Data:map[string]interface {}(nil)}
2019-01-16 07:04:00 DEBUG juju.apiserver.provisioner provisioner.go:1139 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-2", Status:"allocating", Info:"powering on", Data:map[string]interface {}(nil)}

But this machine with 16GB is stuck and remains pending:
2019-01-16 07:07:25 DEBUG juju.apiserver.provisioner provisioner.go:1139 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-7", Status:"allocating", Info:"VM cloned", Data:map[string]interface {}(nil)}
2019-01-16 07:07:26 DEBUG juju.apiserver.provisioner provisioner.go:1139 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-7", Status:"allocating", Info:"extending disk to 16GiB", Data:map[string]interface {}(nil)}

On the VM side, the vmdk has been extended to 16GB, but the VM is never started.
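
To find machines stuck this way in a longer controller log, something like the following sketch could help. It keeps the last Info message seen per machine in the SetInstanceStatus lines and lists the machines whose final message still starts with "extending disk". The names lastInstanceInfo, stuckExtending and sampleLog are made up for illustration, and the regular expression assumes the log format quoted above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// statusLine captures the Tag and Info fields of a SetInstanceStatus debug line.
// It assumes the format shown in the log excerpts in this report.
var statusLine = regexp.MustCompile(`Tag:"(machine-\d+)".*Info:"([^"]*)"`)

// sampleLog holds shortened lines copied from the excerpts above.
const sampleLog = `2019-01-16 07:03:59 DEBUG juju.apiserver.provisioner provisioner.go:1139 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-2", Status:"allocating", Info:"extending disk to 8.0GiB", Data:map[string]interface {}(nil)}
2019-01-16 07:04:00 DEBUG juju.apiserver.provisioner provisioner.go:1139 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-2", Status:"allocating", Info:"powering on", Data:map[string]interface {}(nil)}
2019-01-16 07:07:25 DEBUG juju.apiserver.provisioner provisioner.go:1139 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-7", Status:"allocating", Info:"VM cloned", Data:map[string]interface {}(nil)}
2019-01-16 07:07:26 DEBUG juju.apiserver.provisioner provisioner.go:1139 SetInstanceStatus called with: params.EntityStatusArgs{Tag:"machine-7", Status:"allocating", Info:"extending disk to 16GiB", Data:map[string]interface {}(nil)}`

// lastInstanceInfo returns the most recent Info message seen for each machine tag.
func lastInstanceInfo(log string) map[string]string {
	last := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		if !strings.Contains(line, "SetInstanceStatus") {
			continue
		}
		if m := statusLine.FindStringSubmatch(line); m != nil {
			last[m[1]] = m[2]
		}
	}
	return last
}

// stuckExtending lists, in sorted order, the machines whose final message
// is still an "extending disk" status.
func stuckExtending(last map[string]string) []string {
	var stuck []string
	for tag, info := range last {
		if strings.HasPrefix(info, "extending disk") {
			stuck = append(stuck, tag)
		}
	}
	sort.Strings(stuck)
	return stuck
}

func main() {
	fmt.Println(stuckExtending(lastInstanceInfo(sampleLog)))
}
```

A machine that is merely slow would also show up here, so the result is only a hint about where to look.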

I managed to remove the machine with --force after removing the VM.

A new deploy can then be attempted, but it fails at the same point
-> Canonical Kubernetes cannot be deployed on vSphere

Unit                 Workload  Agent       Machine  Public address  Ports     Message
easyrsa/1*           active    idle        10       10.35.20.222              Certificate Authority connected.
etcd/3*              active    idle        11       10.35.20.221    2379/tcp  Healthy with 2 known peers
etcd/4               active    idle        12       10.35.20.220    2379/tcp  Healthy with 2 known peers
etcd/5               waiting   allocating  13                                 waiting for machine
...
kubernetes-worker/5  waiting   allocating  19                                 waiting for machine

Entity  Meter status  Message
model   amber         user verification pending

Machine  State    DNS           Inst id         Series  AZ  Message
10       started  10.35.20.222  juju-c22e19-10  bionic      poweredOn
11       started  10.35.20.221  juju-c22e19-11  bionic      poweredOn
12       started  10.35.20.220  juju-c22e19-12  bionic      poweredOn
...
19       pending                pending         bionic      extending disk to 16GiB

Ian Booth (wallyworld)
Changed in juju:
assignee: nobody → Christian Muirhead (2-xtian)
importance: Medium → High
milestone: none → 2.5.1
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.5.1 → 2.5.2
Vuk Vasic (vukvasic) wrote :

I tested this and it seems that the issue is with user privileges. There should be a document defining which roles and privileges are required to deploy a cluster on VMware.

Changed in juju:
milestone: 2.5.2 → 2.5.3
Christian Muirhead (2-xtian) wrote :

There's definitely a permission error coming back from the vSphere API when we try to check the status of the extend disk task, but I can't work out why it's happening. The task itself succeeds (as you see in the web client), but asking for the task state fails with this error coming back from the PropertyCollector.WaitForUpdatesEx method:

&types.UpdateSet{
    DynamicData: types.DynamicData{},
    Version: "1",
    FilterSet: {
        {
            DynamicData: types.DynamicData{},
            Filter: types.ManagedObjectReference{Type:"PropertyFilter", Value:"session[524ebcf8-83c1-088e-4bff-5cae1d4432f6]52cc13ca-5757-a877-cf7e-11609c8fbc26"},
            ObjectSet: {
                {
                    DynamicData: types.DynamicData{},
                    Kind: "enter",
                    Obj: types.ManagedObjectReference{Type:"Task", Value:"task-153176"},
                    ChangeSet: nil,
                    MissingSet: {
                        {
                            DynamicData: types.DynamicData{},
                            Path: "info",
                            Fault: types.LocalizedMethodFault{
                                DynamicData: types.DynamicData{},
                                Fault: &types.NoPermission{
                                    SecurityError: types.SecurityError{},
                                    Object: types.ManagedObjectReference{Type:"Folder", Value:"group-d1"},
                                    PrivilegeId: "System.Read",
                                },
                                LocalizedMessage: "",
                            },
                        },
                    },
                },
            },
            MissingSet: nil,
        },
    },
    Truncated: (*bool)(nil),
}

The user is an administrator in that data center, and we can query the state of the other tasks we create, so I can't see why this task would need specific permissions.
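
For anyone who wants to check for this condition in their own tooling, here is a rough sketch of walking an update set and reporting any property that came back in a MissingSet with a NoPermission fault. The struct types are simplified stand-ins that mirror only the fields visible in the dump above (the LocalizedMethodFault wrapper is flattened away). They are not the real govmomi definitions, and permissionProblems and sampleUpdateSet are hypothetical names.

```go
package main

import "fmt"

// Simplified stand-ins for the shapes seen in the dump; NOT the govmomi types.
type ManagedObjectReference struct{ Type, Value string }

type NoPermission struct {
	Object      ManagedObjectReference
	PrivilegeId string
}

type MissingProperty struct {
	Path  string
	Fault interface{} // *NoPermission in the dump; other fault kinds are possible
}

type ObjectUpdate struct {
	Kind       string
	Obj        ManagedObjectReference
	MissingSet []MissingProperty
}

type FilterUpdate struct{ ObjectSet []ObjectUpdate }

type UpdateSet struct{ FilterSet []FilterUpdate }

// permissionProblems returns one message per property that could not be read
// because of a NoPermission fault.
func permissionProblems(us UpdateSet) []string {
	var out []string
	for _, f := range us.FilterSet {
		for _, o := range f.ObjectSet {
			for _, m := range o.MissingSet {
				if np, ok := m.Fault.(*NoPermission); ok {
					out = append(out, fmt.Sprintf(
						"%s:%s property %q unreadable: missing %s on %s:%s",
						o.Obj.Type, o.Obj.Value, m.Path,
						np.PrivilegeId, np.Object.Type, np.Object.Value))
				}
			}
		}
	}
	return out
}

// sampleUpdateSet rebuilds the relevant part of the dump above.
func sampleUpdateSet() UpdateSet {
	return UpdateSet{FilterSet: []FilterUpdate{{ObjectSet: []ObjectUpdate{{
		Kind: "enter",
		Obj:  ManagedObjectReference{Type: "Task", Value: "task-153176"},
		MissingSet: []MissingProperty{{
			Path: "info",
			Fault: &NoPermission{
				Object:      ManagedObjectReference{Type: "Folder", Value: "group-d1"},
				PrivilegeId: "System.Read",
			},
		}},
	}}}}}
}

func main() {
	for _, p := range permissionProblems(sampleUpdateSet()) {
		fmt.Println(p)
	}
}
```

The point of the sketch is that a task's "info" landing in MissingSet is distinct from the task failing, which matches what we see here: the extend succeeds but its state cannot be read.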

Christian Muirhead (2-xtian) wrote :

In any case, I've got a PR with a workaround that polls the VM disk size here: https://github.com/juju/juju/pull/9951

Changed in juju:
milestone: 2.5.3 → 2.5.4
Changed in juju:
status: Triaged → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released