Bootstrap on Openstack fails if there is an IPv6 subnet

Bug #1761706 reported by thomas on 2018-04-06
This bug affects 1 person
| Affects | Status | Importance | Assigned to       | Milestone |
| juju    |        | High       | Eric Claude Jones |           |
| 2.3     |        | High       | Eric Claude Jones |           |

Bug Description

Hi, I am having this problem in the agent installation script when trying to deploy a juju controller on an OpenStack cloud provider (OVH), on an Ubuntu xenial instance.

The install command is:
juju bootstrap ovh-public-cloud ovh-openstack-sbg1 --config image-metadata-url=https://storage.sbg1.cloud.ovh.net/v1/AUTH_f0c04bb34430403982c05c26a9e934b3/simplestreams/images/ --bootstrap-series xenial --show-log --debug

thanks
thomas

thomas (toms130) wrote :
description: updated
Anastasia (anastasia-macmood) wrote :

This looks like a panic coming from github.com/juju/juju/network.CalculateOverlaySegment, according to the log provided.

I am triaging this as Critical for 2.3.6 (as according to the log, this was a 2.3.5 bootstrap).

At this stage, I am not sure that the panic exists in develop (heading into 2.4-b1).

@thomas (toms130),

Considering that this seems to be coming from networking code, is there something special about your networking setup? I am not sure how to reproduce this at the moment, since I am pretty sure we test a lot of different bootstrap scenarios.

no longer affects: juju-core
Changed in juju:
status: New → Incomplete
thomas (toms130) wrote :

I don't think so; I am using the default Ext-Net, which I also use for other instances.

openstack network list

+--------------------------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | Name | Subnets |
+--------------------------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| 3f4e3b19-4a46-4672-aade-5654d1fc0704 | Ext-Net | 1b0dae3a-4146-4b81-b38a-17d4e5b30f2c, bdc559c9-8f89-4e56-a895-133f60b0262f, d6f02615-65e9-4921-8943-8aa72a744a16, e9e8eec1-5c91-40ec-b471-11a96a151b76 |
+--------------------------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------+

Nicholas Skaggs (nskaggs) wrote :

Is it an IPv6-only network?

thomas (toms130) wrote :

No, this is a dual-stack IPv4/IPv6 network.
See below the openstack server list output for the generated instance before rollback:

os server list
+--------------------------------------+--------------------------+--------+-----------------------------------------------------+-----------------+-----------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+--------------------------+--------+-----------------------------------------------------+-----------------+-----------+
| 72aaee9c-de21-4a8c-a78b-7d20721506bc | juju-4758ec-controller-0 | ACTIVE | Ext-Net=2001:41d0:401:2000::e:80df, 167.114.243.80 | Ubuntu 16.04 | vps-ssd-2 |

Tim Penhey (thumper) wrote :

Step 1: stop the panic, and log the expectations.

runtime error: index out of range
goroutine 1 [running]:
main.Main.func1()
 /workspace/src/github.com/juju/juju/cmd/jujud/main.go:203 +0xbc
panic(0x290d860, 0x493a120)
 /snap/go/1473/src/runtime/panic.go:505 +0x229
github.com/juju/juju/network.CalculateOverlaySegment(0xc42077d360, 0x17, 0xc420317e60, 0xc420317e90, 0x49e61e0, 0xc420237f00, 0x42)
 /workspace/src/github.com/juju/juju/network/fan.go:74 +0x3a5
github.com/juju/juju/state.(*State).SaveSubnetsFromProvider(0xc420166480, 0xc42058c600, 0x7, 0x8, 0x0, 0x0, 0xc42058c600, 0x7)
 /workspace/src/github.com/juju/juju/state/spacesdiscovery.go:124 +0x70d
github.com/juju/juju/state.(*State).ReloadSpaces(0xc420166480, 0x3224180, 0xc420d1e780, 0x0, 0x0)
 /workspace/src/github.com/juju/juju/state/spacesdiscovery.go:55 +0x20e
github.com/juju/juju/agent/agentbootstrap.InitializeState(0x2ebbe81, 0x5, 0x0, 0x0, 0x7f365e531bc0, 0xc4200fa580, 0xc420320f10, 0x0, 0xc420762c00, 0x10, ...)
 /workspace/src/github.com/juju/juju/agent/agentbootstrap/bootstrap.go:226 +0x14f9
main.(*BootstrapCommand).Run.func2(0x7f365e531bc0, 0xc4200fa580, 0xc4200fa580, 0x7f365e531bc0)
 /workspace/src/github.com/juju/juju/cmd/jujud/bootstrap.go:266 +0x42f
github.com/juju/juju/cmd/jujud/agent.(*agentConf).ChangeConfig(0xc4205f1200, 0xc4207d4b00, 0x0, 0x0)
 /workspace/src/github.com/juju/juju/cmd/jujud/agent/agent.go:103 +0xb0
main.(*BootstrapCommand).Run(0xc4205f1230, 0xc4204a2a00, 0x0, 0x0)
 /workspace/src/github.com/juju/juju/cmd/jujud/bootstrap.go:250 +0xbc0
github.com/juju/cmd.(*SuperCommand).Run(0xc4204fc480, 0xc4204a2a00, 0xc4204a2a00, 0x0)
 /workspace/src/github.com/juju/cmd/supercommand.go:456 +0x2c0
github.com/juju/cmd.Main(0x31f6ac0, 0xc4204fc480, 0xc4204a2a00, 0xc42004c090, 0x7, 0x7, 0x0)
 /workspace/src/github.com/juju/cmd/cmd.go:317 +0x266
main.jujuDMain(0xc42004c080, 0x8, 0x8, 0xc4204a2a00, 0x0, 0x0, 0x0)
 /workspace/src/github.com/juju/juju/cmd/jujud/main.go:186 +0x894
main.Main(0xc42004c080, 0x8, 0x8, 0x0)
 /workspace/src/github.com/juju/juju/cmd/jujud/main.go:219 +0x1d9
main.MainWrapper(0xc42004c080, 0x8, 0x8)
 /workspace/src/github.com/juju/juju/cmd/jujud/main.go:194 +0x3f
main.main()
 /workspace/src/github.com/juju/juju/cmd/jujud/main_nix.go:22 +0x45

Tim Penhey (thumper) wrote :

Step 2: work around the expectations, either make better decisions or leave fan unconfigured.

Changed in juju:
status: Incomplete → Triaged
importance: Undecided → High

newFanIP := underlayNet.IP.To4()

^- On that line, newFanIP == nil because the underlay IP is an IPv6 address
and thus has no IPv4 representation.
We should do:
if newFanIP == nil {
  continue
}
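
The nil check suggested above can be tried outside Juju with a small standalone sketch. It relies only on the Go standard library: net.IP.To4 returns nil for an address with no IPv4 form, so indexing the result without a check panics. The helper name fanCandidates and the subnet prefix lengths are hypothetical, chosen for illustration; this is not the code in network/fan.go.

```go
package main

import (
	"fmt"
	"net"
)

// fanCandidates returns the 4-byte form of each subnet address, skipping
// any subnet that has no IPv4 representation.
// Hypothetical helper for illustration only, not part of Juju.
func fanCandidates(cidrs []string) ([]net.IP, error) {
	var out []net.IP
	for _, cidr := range cidrs {
		_, ipNet, err := net.ParseCIDR(cidr)
		if err != nil {
			return nil, err
		}
		v4 := ipNet.IP.To4()
		if v4 == nil {
			// IPv6 subnet: To4 returned nil, so v4[0] would panic
			// with an index out of range error. Skip it instead.
			continue
		}
		out = append(out, v4)
	}
	return out, nil
}

func main() {
	// Assumed CIDRs: the addresses come from the server list in this
	// report, the prefix lengths are made up.
	subnets := []string{"167.114.243.0/24", "2001:41d0:401:2000::/56"}
	ips, err := fanCandidates(subnets)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, ip := range ips {
		// Safe to index: every entry is a 4-byte IPv4 address.
		fmt.Println(ip, "first octet:", ip[0])
	}
}
```

Skipping the subnet mirrors the continue proposed above; whether an IPv6 subnet should instead be logged or reported is a separate decision.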

I do think the user can work around this by specifying
container-networking-method=provider or some other setting that disables fan
(e.g. --model-default container-networking-method=local).

On Tue, Apr 10, 2018 at 1:48 AM, Nicholas Skaggs <
<email address hidden>> wrote:

> ** Changed in: juju/2.3
> Assignee: (unassigned) => Eric Claude Jones (ecjones)

thomas, it would be useful if you could confirm that the workaround John mentioned in comment #8 does indeed unblock you. Try

--model-default container-networking-method=local

as he mentions.

thomas (toms130) wrote :

Hi, I tried with model-default config, but it still fails with the same error...

juju bootstrap ovh-public-cloud ovh-openstack-sbg1 --config image-metadata-url=https://storage.sbg1.cloud.ovh.net/v1/AUTH_f0c04bb34430403982c05c26a9e934b3/simplestreams/images/ --bootstrap-series xenial --model-default container-networking-method=local --show-log --debug

...

2018-04-11 14:41:26 DEBUG juju.state spacesdiscovery.go:50 environ does not support space discovery, falling back to subnet discovery
2018-04-11 14:41:28 DEBUG juju.worker runner.go:223 killing runner 0xc42067cb60
2018-04-11 14:41:28 INFO juju.worker runner.go:313 runner is dying
2018-04-11 14:41:28 DEBUG juju.worker runner.go:456 killing "presence"
2018-04-11 14:41:28 DEBUG juju.worker runner.go:456 killing "pingbatcher"
2018-04-11 14:41:28 DEBUG juju.worker runner.go:456 killing "leadership"
2018-04-11 14:41:28 DEBUG juju.worker runner.go:456 killing "singular"
2018-04-11 14:41:28 DEBUG juju.worker runner.go:456 killing "txnlog"
2018-04-11 14:41:28 INFO juju.worker runner.go:483 stopped "txnlog", err: <nil>
2018-04-11 14:41:28 DEBUG juju.worker runner.go:332 "txnlog" done: <nil>
2018-04-11 14:41:28 DEBUG juju.worker runner.go:395 no restart, removing "txnlog" from known workers
2018-04-11 14:41:28 INFO juju.worker runner.go:483 stopped "presence", err: <nil>
2018-04-11 14:41:28 DEBUG juju.worker runner.go:332 "presence" done: <nil>
2018-04-11 14:41:28 DEBUG juju.worker runner.go:395 no restart, removing "presence" from known workers
2018-04-11 14:41:28 INFO juju.worker runner.go:483 stopped "leadership", err: <nil>
2018-04-11 14:41:28 DEBUG juju.worker runner.go:332 "leadership" done: <nil>
2018-04-11 14:41:28 DEBUG juju.worker runner.go:395 no restart, removing "leadership" from known workers
2018-04-11 14:41:28 INFO juju.worker runner.go:483 stopped "pingbatcher", err: <nil>
2018-04-11 14:41:28 DEBUG juju.worker runner.go:332 "pingbatcher" done: <nil>
2018-04-11 14:41:28 DEBUG juju.worker runner.go:395 no restart, removing "pingbatcher" from known workers
2018-04-11 14:41:28 INFO juju.worker runner.go:483 stopped "singular", err: <nil>
2018-04-11 14:41:28 DEBUG juju.worker runner.go:332 "singular" done: <nil>
2018-04-11 14:41:28 DEBUG juju.worker runner.go:395 no restart, removing "singular" from known workers
2018-04-11 14:41:28 DEBUG juju.state open.go:306 closed state without error
2018-04-11 14:41:28 DEBUG juju.cmd.jujud asm_amd64.s:574 jujud complete, code 0, err <nil>
2018-04-11 14:41:28 CRITICAL juju.cmd.jujud main.go:204 Unhandled panic:
runtime error: index out of range
goroutine 1 [running]:
main.Main.func1()
 /workspace/src/github.com/juju/juju/cmd/jujud/main.go:203 +0xbc
panic(0x290d860, 0x493a120)
 /snap/go/1473/src/runtime/panic.go:505 +0x229
github.com/juju/juju/network.CalculateOverlaySegment(0xc42061f480, 0x17, 0xc420582a80, 0xc420582ab0, 0x49e61e0, 0xc4204f0300, 0x42)
 /workspace/src/github.com/juju/juju/network/fan.go:74 +0x3a5
github.com/juju/juju/state.(*State).SaveSubnetsFromProvider(0xc420489d40, 0xc420465200, 0x4, 0x4, 0x0, 0x0, 0xc420465200, 0x4)
 /workspace/src/github.com/juju/juju/state/spacesdiscovery.go:124 +0x70d
github.com/juju/juju/state.(*State).Reload...

So the code has:
    fans, err := cfg.FanConfig()
    if err != nil {
        return errors.Trace(err)
    }
    if len(fans) == 0 {
        return nil
    }

So it should only be trying to CalculateOverlaySegment if FanConfig is not
empty.

I know we do some amount of autodetection of whether we *could* run the
fan. Can you make sure that fan-config is set to ""?

What I also don't understand is that CalculateOverlaySegment is also
doing:

if underlaySize <= subnetSize && fan.Underlay.Contains(underlayNet.IP) {
...
  newFanIP := underlayNet.IP.To4()

I don't quite see how fan.Underlay would end up saying that its CIDR
"Contains" an IPv6 address.
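
The expectation behind that question can be checked with the standard library alone. The sketch below assumes a made-up IPv4 underlay of 167.114.0.0/16 and tests the two addresses from the server list earlier in this report; underlayContains is a hypothetical helper, not a Juju function. With Go's net package, an IPv4 network does not report a genuine IPv6 address as contained, so this illustrates why the panic is surprising rather than explaining how it was reached.

```go
package main

import (
	"fmt"
	"net"
)

// underlayContains reports whether the underlay CIDR contains addr.
// Hypothetical helper for illustration only, not part of Juju.
func underlayContains(underlayCIDR, addr string) (bool, error) {
	_, underlay, err := net.ParseCIDR(underlayCIDR)
	if err != nil {
		return false, err
	}
	ip := net.ParseIP(addr)
	if ip == nil {
		return false, fmt.Errorf("invalid IP address %q", addr)
	}
	return underlay.Contains(ip), nil
}

func main() {
	// The underlay CIDR is an assumption; the addresses are the ones
	// reported for the controller instance.
	addrs := []string{"167.114.243.80", "2001:41d0:401:2000::e:80df"}
	for _, addr := range addrs {
		ok, err := underlayContains("167.114.0.0/16", addr)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s contained: %v\n", addr, ok)
	}
}
```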

I know we do some amount of "autodetect what a possible fan config could
be", but I haven't found that code yet. It's possible we implemented
something on Openstack that somehow automatically generates fan config that
includes IPv6 addresses, when we know that it never should.

Hi John, I tried adding --model-default fan-config="" to be sure, but I get a similar result.

John A Meinel (jameinel) on 2018-04-16
Changed in juju:
assignee: nobody → Eric Claude Jones (ecjones)
Changed in juju:
milestone: none → 2.4-beta1
status: Triaged → Fix Committed
John A Meinel (jameinel) on 2018-04-23
summary: - agent installation fails
+ Bootstrap on Openstack fails if there is an IPv6 subnet
John A Meinel (jameinel) wrote :
Changed in juju:
status: Fix Committed → Fix Released