cannot start bootstrap instance: agent binary info mismatch

Bug #1745951 reported by Kevin Wennemuth on 2018-01-29
This bug affects 1 person
Affects: juju (Importance: Medium, Assigned to: Anastasia)
Affects: 2.3 (Importance: Medium, Assigned to: Anastasia)

Bug Description

juju bootstrap --auto-upgrade --config=./lab.yaml vsphere juju
Creating Juju controller "juju" on vsphere/esx.power.lab
Looking for packaged Juju agent version <nil> for amd64
Launching controller instance(s) on vsphere/esx.power.lab...
ERROR failed to bootstrap model: cannot start bootstrap instance: agent binary info mismatch ({2.3.0-xenial-amd64 0dbc7bafc0bb9b6d62846d617591755e50547f13e16881d6571d39b31b20335a 28011270}, {2.3.1-xenial-amd64 72bed4104b050a63f1d026a889d8fc9bdfb13f9b04b47ef3b762967774e1a187 28017324})

lab.yml
-----------
---
primary-network: "VM Network"
external-network: "VM Network"
datastore: ds00

any hints on this?

Kevin Wennemuth (feffi) wrote :

juju bootstrap --auto-upgrade --config=./lab.yaml vsphere juju --debug
10:38:06 INFO juju.cmd supercommand.go:56 running juju [2.3.1 gc go1.9.2]
10:38:06 DEBUG juju.cmd supercommand.go:57 args: []string{"juju", "bootstrap", "--to", "zone=esx.power.lab", "--auto-upgrade", "--config=./lab.yaml", "vsphere", "juju", "--debug"}
10:38:06 DEBUG juju.cmd.juju.commands bootstrap.go:826 authenticating with region "" and credential "feffi" ()
10:38:06 DEBUG juju.cmd.juju.commands bootstrap.go:954 provider attrs: map[]
10:38:06 INFO cmd authkeys.go:114 Adding contents of "/root/.local/share/juju/ssh/juju_id_rsa.pub" to authorized-keys
10:38:06 DEBUG juju.cmd.juju.commands bootstrap.go:1010 preparing controller with config: map[primary-network:VM Network enable-os-upgrade:true http-proxy: enable-os-refresh-update:true transmit-vendor-metrics:true cloudinit-userdata: agent-stream:released apt-mirror: disable-network-management:false test-mode:false proxy-ssh:false https-proxy: apt-https-proxy: net-bond-reconfigure-delay:17 resource-tags: update-status-hook-interval:5m apt-http-proxy: uuid:92eecc07-571e-4510-89c4-9907a0d0a5d5 default-series:xenial fan-config: image-stream:released logforward-enabled:false no-proxy:127.0.0.1,localhost,::1 max-action-results-age:336h container-networking-method: datastore:ds00 ftp-proxy: egress-subnets: name:controller apt-ftp-proxy: logging-config: ssl-hostname-verification:true agent-metadata-url: max-status-history-age:336h authorized-keys:********************************************** juju-client-key
 type:vsphere apt-no-proxy: max-status-history-size:5G max-action-results-size:5G external-network:VM Network image-metadata-url: automatically-retry-hooks:true ignore-machine-addresses:false development:false provisioner-harvest-mode:destroyed firewall-mode:instance]
10:38:06 INFO cmd bootstrap.go:499 Creating Juju controller "juju" on vsphere/esx.power.lab
10:38:06 INFO juju.cmd.juju.commands bootstrap.go:563 combined bootstrap constraints:
10:38:06 DEBUG juju.environs.bootstrap bootstrap.go:199 model "controller" supports service/machine networks: false
10:38:06 DEBUG juju.environs.bootstrap bootstrap.go:201 network management by juju enabled: true
10:38:06 INFO cmd bootstrap.go:233 Loading image metadata
10:38:06 INFO cmd bootstrap.go:296 Looking for packaged Juju agent version <nil> for amd64
10:38:06 INFO juju.environs.bootstrap tools.go:72 looking for bootstrap agent binaries: version=<nil>
10:38:06 DEBUG juju.environs.tools tools.go:102 finding agent binaries in stream: "released"
10:38:06 DEBUG juju.environs.tools tools.go:104 reading agent binaries with major.minor version 2.3
10:38:06 DEBUG juju.environs.tools tools.go:118 filtering agent binaries by architecture: amd64
10:38:06 DEBUG juju.environs.tools urls.go:109 trying datasource "keystone catalog"
10:38:07 DEBUG juju.environs.simplestreams simplestreams.go:683 using default candidate for content id "com.ubuntu.juju:released:tools" are {20161007 mirrors:1.0 content-download streams/v1/cpc-mirrors.sjson []}
10:38:07 INFO juju.environs.bootstrap tools.go:74 found 48 packaged agent binaries
10:38:07 INFO cmd bootstrap.go:371 Starting new instance for in...

Kevin Wennemuth (feffi) wrote :

If I supply an agent version, it works though:

juju bootstrap --auto-upgrade --config=./lab.yaml --agent-version 2.3.1-xenial-amd64 vsphere juju

Kevin Wennemuth (feffi) wrote :

This also happens on juju 2.3.1

Kevin Wennemuth (feffi) wrote :

OK, another hint: if I leave out the "--auto-upgrade" flag, magic happens and the above error goes away on both juju 2.3.1 and 2.3.2.

Kevin Wennemuth (feffi) wrote :

the error message for the above errors:

2018-01-29 10:33:12 DEBUG cmd supercommand.go:459 error stack:
github.com/juju/juju/state/model.go:428: model already exists
github.com/juju/juju/state/model.go:432:
github.com/juju/juju/agent/agentbootstrap/bootstrap.go:216: creating hosted model
github.com/juju/juju/cmd/jujud/agent/agent.go:104:
2018-01-29 10:33:12 DEBUG juju.cmd.jujud main.go:187 jujud complete, code 0, err <nil>
11:33:12 ERROR juju.cmd.juju.commands bootstrap.go:519 failed to bootstrap model: subprocess encountered error code 1
11:33:12 DEBUG juju.cmd.juju.commands bootstrap.go:520 (error details: [{github.com/juju/juju/cmd/juju/commands/bootstrap.go:611: failed to bootstrap model} {subprocess encountered error code 1}])
11:33:12 DEBUG juju.cmd.juju.commands bootstrap.go:1117 cleaning up after failed bootstrap
11:33:12 INFO juju.provider.common destroy.go:20 destroying model "controller"

Again, the "model already exists" error...

John A Meinel (jameinel) wrote :

I'm not sure exactly why this is happening. Some things that seem odd from
your logs:
 Looking for packaged Juju agent version <nil> for amd64

Why would we be looking for a "<nil>" agent version? I would have expected
us to use the client's version information, which is
clearly:
 10:38:06 INFO juju.cmd supercommand.go:56 running juju [2.3.1 gc go1.9.2]

Going further, the error you are seeing is from this code:
func (cfg *InstanceConfig) SetTools(toolsList coretools.List) error {
    if len(toolsList) == 0 {
        return errors.New("need at least 1 agent binary")
    }
    var tools *coretools.Tools
    for _, listed := range toolsList {
        if listed == nil {
            return errors.New("nil entry in agent binaries list")
        }
        info := *listed
        info.URL = ""
        if tools == nil {
            tools = &info
            continue
        }
        if !reflect.DeepEqual(info, *tools) {
            return errors.Errorf("agent binary info mismatch (%v, %v)", *tools, info)
        }
    }
    cfg.tools = copyToolsList(toolsList)
    return nil
}
If I read it correctly, this is essentially saying that every entry in the
list must match: with the URL cleared, each entry has to be identical to the
first one, including its version.
I have a feeling that the original "<nil>" might be part
of the problem.
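
To make the check above concrete, here is a minimal, self-contained sketch of the same comparison. It is not the juju code: toolsInfo is a hypothetical, simplified stand-in for coretools.Tools, checkConsistent is a made-up name, and the URLs are placeholders. The versions, hashes and sizes are the ones from the error message in this report.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// toolsInfo is a hypothetical, simplified stand-in for coretools.Tools.
type toolsInfo struct {
	Version string
	URL     string
	SHA256  string
	Size    int64
}

// checkConsistent follows the same logic as the SetTools loop quoted above:
// after clearing the URL, every entry must be identical to the first one.
func checkConsistent(list []*toolsInfo) error {
	if len(list) == 0 {
		return errors.New("need at least 1 agent binary")
	}
	var first *toolsInfo
	for _, listed := range list {
		if listed == nil {
			return errors.New("nil entry in agent binaries list")
		}
		info := *listed // copy, so the caller's entry is not modified
		info.URL = ""   // the download location is allowed to differ
		if first == nil {
			first = &info
			continue
		}
		if !reflect.DeepEqual(info, *first) {
			return fmt.Errorf("agent binary info mismatch (%v, %v)", *first, info)
		}
	}
	return nil
}

func main() {
	// Two different agent versions in one list, as in the reported error.
	mixed := []*toolsInfo{
		{
			Version: "2.3.0-xenial-amd64",
			URL:     "https://example.invalid/2.3.0", // placeholder URL
			SHA256:  "0dbc7bafc0bb9b6d62846d617591755e50547f13e16881d6571d39b31b20335a",
			Size:    28011270,
		},
		{
			Version: "2.3.1-xenial-amd64",
			URL:     "https://example.invalid/2.3.1", // placeholder URL
			SHA256:  "72bed4104b050a63f1d026a889d8fc9bdfb13f9b04b47ef3b762967774e1a187",
			Size:    28017324,
		},
	}
	fmt.Println("mixed versions:", checkConsistent(mixed))

	// A list filtered down to a single version passes the check.
	fmt.Println("single version:", checkConsistent(mixed[1:]))
}
```

Under these assumptions, a list that contains both 2.3.0 and 2.3.1 is rejected, while a list narrowed to one version is accepted. That would be consistent with the observation that passing --agent-version avoids the error, although the sketch says nothing about why the list was not narrowed in the first place.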

Kevin Wennemuth (feffi) wrote :

Hi John,

I think you're right. As soon as I add the --agent-version flag, the error disappears, but bootstrap crashes later on.

Kevin

Anastasia (anastasia-macmood) wrote :

@Kevin Wennemuth (feffi),

I think we want to know how you've bootstrapped this environment to have agent version <nil> to start with. This is a first for us, I think :)

Changed in juju:
status: New → Incomplete

John A Meinel (jameinel) wrote :

I also think we had a different bug around "model already exists" which
we've fixed. I don't remember the details of what the fix was but I do
remember we encountered that recently.

Kevin Wennemuth (feffi) wrote :

Hi, what do you need? Cloud.yaml? The environment is vSphere 6.5.

@john: yes, the previous "model already exists" error has been fixed in 2.3.2. It was caused by an escaping error in MongoDB for zone names containing a dot (.).

Kevin

Anastasia (anastasia-macmood) wrote :

@Kevin Wennemuth (feffi),

I've marked this report as Incomplete because we have not seen 'agent nil' before. I was going to try to reproduce / find vsphere tomorrow.

Meanwhile, if you can think of anything that would have given us nil, it'd be useful... Maybe you've bootstrapped a different version from an odd client, or specified an odd agent version at bootstrap?

Kevin Wennemuth (feffi) wrote :

Hi Anastasia,

Nope, it's a clean install of juju after removing .local.

The only hint I can give: As soon as I use the --auto-upgrade flag, it breaks. Without the flag, it deploys smoothly.

Kevin

Anastasia (anastasia-macmood) wrote :

I can reproduce this independently of the provider.

It looks like the problem is caused by using the --auto-upgrade option. Working on the patch...

Changed in juju:
status: Incomplete → In Progress
assignee: nobody → Anastasia (anastasia-macmood)
importance: Undecided → Medium

Anastasia (anastasia-macmood) wrote :

Changed in juju:
milestone: none → 2.4-beta1

Anastasia (anastasia-macmood) wrote :

PR against develop (2.4.x): https://github.com/juju/juju/pull/8341

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released