panic: malformed characteristic "-" while bootstrapping juju controller for vsphere cloud

Bug #1895756 reported by Kevin
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Ben Hoyt
Milestone: 2.8.4

Bug Description

I am trying to set up a juju controller for vsphere following the guide at https://juju.is/docs/vsphere-cloud

However, the bootstrap fails while trying to connect to the API, reporting "connection refused" on port 17070.
I reran the bootstrap with flags so that the controller is not destroyed on failure, SSH'ed into the failed controller, and got the attached log. The most prominent error was the following:

panic: malformed characteristic "-"

goroutine 517 [running]:
github.com/juju/juju/core/instance.MustParseHardware(...)
        /workspace/_build/src/github.com/juju/juju/core/instance/hardwarecharacteristics.go:86
github.com/juju/juju/core/instance.(*HardwareCharacteristics).Clone(0xc000762c40, 0xc000183680)
        /workspace/_build/src/github.com/juju/juju/core/instance/hardwarecharacteristics.go:77 +0x25e
github.com/juju/juju/core/multiwatcher.(*MachineInfo).Clone(0xc0001b2a00, 0x515a760, 0xc000926d80)
        /workspace/_build/src/github.com/juju/juju/core/multiwatcher/types.go:117 +0x8e
github.com/juju/juju/core/multiwatcher.(*store).ChangesSince(0xc0006fbef0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /workspace/_build/src/github.com/juju/juju/core/multiwatcher/store.go:268 +0x1d5
github.com/juju/juju/worker/multiwatcher.(*Worker).respond(0xc00072bc20)
        /workspace/_build/src/github.com/juju/juju/worker/multiwatcher/worker.go:456 +0x1a9
github.com/juju/juju/worker/multiwatcher.(*Worker).process(0xc00072bc20, 0x51bc060, 0xc00012ee40, 0xc0000fcfc0, 0x4856f1a, 0xa)
        /workspace/_build/src/github.com/juju/juju/worker/multiwatcher/worker.go:388 +0x94
github.com/juju/juju/worker/multiwatcher.(*Worker).inner.func1(0xc00072bc20, 0x51bc060, 0xc00012ee40, 0xc0000fd380, 0xc0006cfaa0)
        /workspace/_build/src/github.com/juju/juju/worker/multiwatcher/worker.go:284 +0x7f
created by github.com/juju/juju/worker/multiwatcher.(*Worker).inner
        /workspace/_build/src/github.com/juju/juju/worker/multiwatcher/worker.go:283 +0x238
2020-09-15 13:26:52 INFO juju.cmd supercommand.go:54 running jujud [2.8.2 0 a44e6eb38430da695737f5e9f37819478b9587c3 gc go1.14.9]
2020-09-15 13:26:52 DEBUG juju.cmd supercommand.go:55 args: []string{"/var/lib/juju/tools/machine-0/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "0", "--debug"}

I am running vSphere 6.7 on an ESXi 6.5 server.
Both the initial node doing the bootstrapping and the to-be controller are on the same ESXi distributed port group, which was passed in as the primary-network adapter. The network they are connected to is a VLAN.

I would be happy to provide any other information you may need.

Kevin (kmanh999) wrote :
description: updated
Ben Hoyt (benhoyt) wrote :

Just posting a quick update here. This is caused by our HardwareCharacteristics.Clone() method, which clones by calling .String() and re-parsing the result. That is not round-trippable if any of the string fields (Arch, RootDiskSource, Tags, AvailabilityZone) contain spaces. In this example I suspect the root disk source or availability zone (cluster/host/resource pool) contains a space followed by "-", which is what triggers the panic.
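
To illustrate why the string round trip breaks, here is a minimal Go sketch. The hardware type, its two fields, and parseHardware are simplified, hypothetical stand-ins, not juju's actual code; the sketch only assumes that the string form is space-separated key=value pairs, as described above.

```go
package main

import (
	"fmt"
	"strings"
)

// hardware is a simplified, hypothetical stand-in for juju's
// HardwareCharacteristics; only two string fields are modelled.
type hardware struct {
	Arch           string
	RootDiskSource string
}

// String renders the fields as space-separated key=value pairs.
func (h hardware) String() string {
	return fmt.Sprintf("arch=%s root-disk-source=%s", h.Arch, h.RootDiskSource)
}

// parseHardware splits on whitespace and expects every token to be key=value.
// A value that itself contains spaces is split into several tokens, so the
// parser sees stray tokens that have no "=".
func parseHardware(s string) (hardware, error) {
	var h hardware
	for _, field := range strings.Fields(s) {
		parts := strings.SplitN(field, "=", 2)
		if len(parts) != 2 {
			return hardware{}, fmt.Errorf("malformed characteristic %q", field)
		}
		switch parts[0] {
		case "arch":
			h.Arch = parts[1]
		case "root-disk-source":
			h.RootDiskSource = parts[1]
		default:
			return hardware{}, fmt.Errorf("unknown characteristic %q", parts[0])
		}
	}
	return h, nil
}

func main() {
	original := hardware{Arch: "amd64", RootDiskSource: "NAS - Data"}
	// "arch=amd64 root-disk-source=NAS - Data" splits into four tokens;
	// the third token, "-", has no "=" and is rejected.
	_, err := parseHardware(original.String())
	fmt.Println(err)
}
```

With a datastore named "NAS - Data", the sketch reports the same message as the panic in the log, because the lone "-" token cannot be parsed as a key=value pair.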

I'm going to fix .Clone() to not clone via "string and re-parse", but by creating a new object and copying the fields over. This should make it into 2.8.4 which is coming out in a week or so.
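
A field-by-field clone along these lines might look like the following Go sketch. The hardware type and its fields are again hypothetical simplifications, and the fix that actually landed in juju may differ in detail; the point is only that values are copied directly and never pass through a parser.

```go
package main

import "fmt"

// hardware is a simplified, hypothetical stand-in for juju's
// HardwareCharacteristics; optional fields are modelled as pointers.
type hardware struct {
	Arch           *string
	Mem            *uint64
	RootDiskSource *string
	Tags           *[]string
}

// Clone returns a deep copy built field by field, so values that contain
// spaces are copied verbatim instead of being formatted and re-parsed.
func (h *hardware) Clone() *hardware {
	if h == nil {
		return nil
	}
	clone := &hardware{}
	if h.Arch != nil {
		arch := *h.Arch
		clone.Arch = &arch
	}
	if h.Mem != nil {
		mem := *h.Mem
		clone.Mem = &mem
	}
	if h.RootDiskSource != nil {
		source := *h.RootDiskSource
		clone.RootDiskSource = &source
	}
	if h.Tags != nil {
		// Copy the slice contents so the clone does not share a backing array.
		tags := make([]string, len(*h.Tags))
		copy(tags, *h.Tags)
		clone.Tags = &tags
	}
	return clone
}

func main() {
	source := "NAS - Data"
	tags := []string{"fast"}
	original := &hardware{RootDiskSource: &source, Tags: &tags}

	clone := original.Clone()
	(*clone.Tags)[0] = "changed" // must not affect the original

	fmt.Println(*clone.RootDiskSource, (*original.Tags)[0])
}
```

Because nothing is serialized, a name such as "NAS - Data" survives the copy unchanged, and mutating the clone's tags leaves the original untouched.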

If you can attach the output of "govc find" (https://github.com/vmware/govmomi/tree/master/govc) we could confirm this in your case.

Changed in juju:
assignee: nobody → Ben Hoyt (benhoyt)
status: New → In Progress
importance: Undecided → High
Kevin (kmanh999) wrote :

Hi Ben, just wanted to follow up immediately to confirm that the data center name has a space, the primary network has a space, and the datastore is titled "NAS - Data". I would guess you are right on the money that the datastore is causing the issue. I will run "govc find" to confirm for you. Would you also like me to rename the datastore to something without spaces to test?

Ben Hoyt (benhoyt) wrote :

Yeah, that'd be great if it's not too much trouble. If you are able to rename it to NAS_Data or something similar, that should work around this issue. I also have a draft fix for the failure to handle spaces, but it won't be submitted until next week.

Kevin (kmanh999) wrote :

Sounds good, easy to rename it as a workaround. Thank you for the fast triage!

Ben Hoyt (benhoyt)
Changed in juju:
status: In Progress → Fix Committed
John A Meinel (jameinel)
Changed in juju:
milestone: none → 2.8.4
Ben Hoyt (benhoyt) wrote :
Changed in juju:
status: Fix Committed → Fix Released