Can't bootstrap openstack in some cases where compute AZs exist but not networking AZs

Bug #1689683 reported by Heather Lanigan
This bug affects 8 people
Affects | Status | Importance | Assigned to | Milestone
Canonical Juju | Fix Released | High | Heather Lanigan |

Bug Description

The default availability zone for nova is "nova", the same as for neutron. However, there is no direct correlation between nova and neutron availability zones unless the OpenStack administrator creates one. It's possible to have multiple availability zones for compute and only one, the default, for neutron. In that case, juju bootstrap should still succeed.

$ juju bootstrap --debug --metadata-source /home/heather/simplestreams/images --config network=ubuntu-net --config use-floating-ip=true --to zone=second local-openstack local-openstack-azmismatch
20:26:45 INFO juju.cmd supercommand.go:63 running juju [2.2-beta4 gc go1.8.1]
...
20:26:53 DEBUG juju.provider.openstack provider.go:1017 using network id "02a167f4-af37-4068-9283-5c72cc830276"
20:26:54 INFO juju.provider.openstack provider.go:1141 trying to build instance in availability zone "second"
20:27:15 INFO juju.provider.openstack provider.go:1184 started instance "32588d6f-a186-4bc6-bafb-8922241d4a23"2
20:27:15 DEBUG juju.provider.openstack provider.go:1188 allocating public IP address for openstack node
20:27:15 ERROR juju.cmd.juju.commands bootstrap.go:491 failed to bootstrap model: cannot start bootstrap instance: cannot allocate a public IP as needed: could not find an external network in availablity zone
20:27:15 DEBUG juju.cmd.juju.commands bootstrap.go:492 (error details: [{github.com/juju/juju/cmd/juju/commands/bootstrap.go:583: failed to bootstrap model} {github.com/juju/juju/provider/common/bootstrap.go:50: } {github.com/juju/juju/provider/common/bootstrap.go:185: cannot start bootstrap instance} {github.com/juju/juju/provider/openstack/provider.go:1190: cannot allocate a public IP as needed} {github.com/juju/juju/provider/openstack/networking.go:168: could not find an external network in availablity zone}])
...

$ neutron availability-zone-list
+------+----------+-----------+
| name | resource | state |
+------+----------+-----------+
| nova | router | available |
| nova | network | available |
+------+----------+-----------+
$ nova availability-zone-list
+-----------------------+----------------------------------------+
| Name | Status |
+-----------------------+----------------------------------------+
| internal | available |
| |- juju-350b2c-10 | |
| | |- nova-conductor | enabled :-) 2017-05-10T00:30:25.000000 |
| | |- nova-consoleauth | enabled :-) 2017-05-10T00:30:21.000000 |
| | |- nova-scheduler | enabled :-) 2017-05-10T00:30:24.000000 |
| | |- nova-cert | enabled :-) 2017-05-10T00:30:21.000000 |
| second | available |
| |- juju-350b2c-20 | |
| | |- nova-compute | enabled :-) 2017-05-10T00:30:19.000000 |
| nova | not available |
+-----------------------+----------------------------------------+
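The mismatch between the two listings above can be checked mechanically. The following is a minimal Python sketch, not Juju code: the function name is a hypothetical helper, and the zone lists are hard-coded from the output above rather than queried from OpenStack.

```python
def compute_zones_without_network_zone(compute_zones, network_zones):
    """Return compute AZ names that have no identically named Neutron AZ."""
    network_names = set(network_zones)
    return [zone for zone in compute_zones if zone not in network_names]


# Zone names copied from the listings above (assumed complete).
compute_zones = ["internal", "second", "nova"]
network_zones = ["nova"]

# "second" is the zone passed to --to zone=second, and it has no Neutron twin.
print(compute_zones_without_network_zone(compute_zones, network_zones))
```

Any compute zone returned here is one where a lookup for an external network "in the same availability zone" cannot succeed, which matches the failure in the log.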

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

Additional information posted by anrah in:
https://bugs.launchpad.net/juju/+bug/1654144

We also have an OpenStack deployment where the external network has no availability zone, and we get the same error with 2.2.beta4

06:13:51 DEBUG juju.cloudconfig.instancecfg instancecfg.go:825 Setting numa ctl preference to false
06:13:51 DEBUG juju.service discovery.go:63 discovered init system "systemd" from series "xenial"
06:13:51 DEBUG juju.provider.openstack provider.go:1005 openstack user data; 2484 bytes
06:13:51 DEBUG juju.provider.openstack provider.go:1017 using network id "5c7cd500-c581-4491-86fa-af95a71e8c18"
06:13:51 DEBUG goose <autogenerated>:22 performing API version discovery for "https://lab.openstack.example.com:9696/"
06:13:51 DEBUG goose <autogenerated>:22 discovered API versions: [{Version:{major:2 minor:0} Links:[{Href:http://lab.openstack.example.com:9696/v2.0 Rel:self}] Status:CURRENT}]
06:13:58 INFO juju.provider.openstack provider.go:1141 trying to build instance in availability zone "nova"
06:14:22 INFO juju.provider.openstack provider.go:1184 started instance "92b9c8bc-fa5c-4eee-ad1e-5b9dffb5e229"2
06:14:22 DEBUG juju.provider.openstack provider.go:1188 allocating public IP address for openstack node
06:14:24 INFO cmd bootstrap.go:490 bootstrap failed but --keep-broken was specified so resources are not being destroyed.
When you have finished diagnosing the problem, remember to clean up the failed controller.
See `juju kill-controller`.
ERROR failed to bootstrap model: cannot start bootstrap instance: cannot allocate a public IP as needed: could not find an external network in availablity zone
06:14:24 DEBUG cmd supercommand.go:459 error stack:
github.com/juju/juju/provider/openstack/networking.go:168: could not find an external network in availablity zone
github.com/juju/juju/provider/openstack/provider.go:1190: cannot allocate a public IP as needed
github.com/juju/juju/provider/common/bootstrap.go:185: cannot start bootstrap instance
github.com/juju/juju/provider/common/bootstrap.go:50:
github.com/juju/juju/cmd/juju/commands/bootstrap.go:584: failed to bootstrap model

I'm happy to provide more information when needed.

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.2-rc1
importance: Undecided → High
status: New → Triaged
Revision history for this message
Heather Lanigan (hmlanigan) wrote :

Also from 1654144 - Per Bruno Carvalho (brunowcs):

juju version
2.2-beta4-xenial-amd64

21:23:48 DEBUG juju.provider.openstack provider.go:1133 allocating public IP address for openstack node
21:23:49 ERROR juju.cmd.juju.commands bootstrap.go:491 failed to bootstrap model: cannot start bootstrap instance: cannot allocate a public IP as needed: could not find an external network in availablity zone

----

I am using nuage with the neutron plugin for SDN. I believe the incompatibility comes from my implementation, since my external network is configured without an AZ

# openstack network show 42f7deed-1410-408e-ad4b-c1394ca47b40(MY EXTERNAL ID)
....
Availability_zones | None
....

I will check whether the existing external networks can be updated to have an AZ, using the nuage neutron plugin.

Changed in juju:
status: Triaged → In Progress
Ryan Beisner (1chb1n)
tags: added: usability
Changed in juju:
status: In Progress → Triaged
milestone: 2.2-rc1 → none
Revision history for this message
Antti Rahikainen (anrah) wrote :

Goose bug (https://github.com/go-goose/goose/pull/41) is resolved. Any guess when this might be released in Juju?

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

@anrah, goose PR41 was for a different bug (https://bugs.launchpad.net/juju/+bug/1654144) and is not expected to resolve this one. I'm currently investigating a sane way to fix this with the OpenStack folks, and to improve OpenStack network selection along the way if possible.

Ryan Beisner (1chb1n)
tags: added: uosci
Tim Penhey (thumper)
tags: added: new-york
Ante Karamatić (ivoks)
tags: added: cpe-onsite
Revision history for this message
Tomek Osiński (osinstom) wrote :

Hi, what is the status of this bug? Is there a temporary workaround?

Revision history for this message
James Page (james-page) wrote :

Reading this bug, I think I'm missing something about what Juju expects from Neutron AZs in terms of choosing networks etc.

AFAIK the AZ implementation in Neutron is designed to allow network services for a particular resource to be spread across underlying physical/network/power failure domains - so for example you could create a network:

 $ neutron net-create --availability-zone-hint AZ1 \
      --availability-zone-hint AZ2 new_network

which would schedule the various network namespaces and daemons (dnsmasq, metadata proxies etc) across agents residing in zones AZ1 and AZ2. Likewise, when a router is created to provide north/south network traffic flows from the tenant network to the outside world, it's created as an HA router with zone hinting to multiple availability zones:

 $ neutron router-create --availability-zone-hint AZ1 \
      --availability-zone-hint AZ2 --ha new_router

I think that reading:

"21:23:49 ERROR juju.cmd.juju.commands bootstrap.go:491 failed to bootstrap model: cannot start bootstrap instance: cannot allocate a public IP as needed: could not find an external network in availablity zone"

Juju is expecting to find an external network in the same AZ as the instance being created, but that does not sound sensible to me. The instance availability zone locates an instance in a specific failure domain in the DC, whereas the network availability zone allows a network's or router's resources to be spread across underlying failure domains.

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

Based on input (#6) from @james-page, it appears that juju should remove any correlation between compute and networking AZ in the OpenStack provider. Juju should also choose an external network based on the AZ of the instance's network if an external network is not specified by the user, when looking for appropriate FIPs to assign to an instance.
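The selection rule described above can be sketched as follows. This is a minimal Python illustration, not the actual Juju fix: the `Network` class, the function name, and the network names `ext-a` and `ext-b` are hypothetical, and the fallback to any external network when no AZ matches is an assumption added to cover external networks that report no AZ.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    name: str
    external: bool = False
    availability_zones: list = field(default_factory=list)


def choose_external_networks(networks, instance_network, configured_external=None):
    """Pick candidate external networks for floating IPs.

    The compute AZ of the instance is deliberately not an input.
    """
    by_name = {net.name: net for net in networks}
    if configured_external is not None:
        # A user-specified external network is only checked for existence.
        if configured_external not in by_name:
            raise LookupError(f"external network {configured_external!r} not found")
        return [by_name[configured_external]]
    externals = [net for net in networks if net.external]
    wanted = set(by_name[instance_network].availability_zones)
    matching = [net for net in externals if wanted & set(net.availability_zones)]
    # Assumption: if nothing shares a network AZ, fall back to any external network.
    return matching or externals


networks = [
    Network("ubuntu-net", availability_zones=["nova"]),
    Network("ext-a", external=True, availability_zones=["nova"]),
    Network("ext-b", external=True),  # external network with no AZ
]
print([net.name for net in choose_external_networks(networks, "ubuntu-net")])
```

Keeping the compute zone out of the function signature makes the decoupling explicit: only network AZs, or an explicit user choice, influence where floating IPs come from.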

Changed in juju:
milestone: none → 2.3-alpha1
status: Triaged → In Progress
Revision history for this message
Heather Lanigan (hmlanigan) wrote :
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

I think we also need to figure out the same problem for Cinder AZs and multi storage backend scenarios.

https://github.com/juju/juju/blob/d7ab142/provider/openstack/provider.go#L1275-L1279

https://github.com/openstack/cinder/blob/master/releasenotes/notes/per-backend-az-28727aca360a1cc8.yaml

" Availability zones may now be configured per backend in a multi-backend
    configuration. Individual backend sections can now set the configuration
    option ``backend_availability_zone``. If set, this value will override
    the [DEFAULT] ``storage_availability_zone`` setting.
"

In my view, in a certain way we have a domain-specific language (DSL) per cloud to describe compute, storage and networking for machine allocation.

Juju implements availability zones (without specifying which ones exactly) and constraints, however, there are currently no good constraints/abstractions for multi-AZ, multi provider network, multi storage backend scenarios.

https://jujucharms.com/docs/2.2/charms-storage#openstack/cinder-(cinder)
"The OpenStack/Cinder provider does not currently have any specific configuration options."

For the networking side there is only one external network to allocate FIPs from, however, one might have multiple provider networks in general which is not modeled.

I think this deserves a separate wishlist type of issue.

Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
Heather Lanigan (hmlanigan) wrote :

@Tomek,

A fix for this bug has been committed, however I don't have an OpenStack configuration where this can be reproduced right now. It'd be helpful if you could install the juju snap using the edge channel to verify? The fix should be in the snap by tomorrow.

Two things to try:
1. Bootstrap with --config use-floating-ip=true and --config network=<net-name>.

I've also updated the code such that --config external-network will do no validation other than that the network exists, so if the above fails, using external-network should work.

Thank you for your help.
Heather
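For anyone scripting the two attempts above, here is a small Python sketch that builds the bootstrap command lines. The helper name is hypothetical, `ext-net` and `az-test` are placeholder names, and the `external-network=<name>` key=value form is assumed to follow the same pattern as the other `--config` options quoted in this thread.

```python
import shlex


def bootstrap_command(cloud, controller, network, external_network=None):
    """Build a juju bootstrap argument list using the options from this thread."""
    args = [
        "juju", "bootstrap",
        "--config", "use-floating-ip=true",
        "--config", f"network={network}",
    ]
    if external_network is not None:
        # Second attempt: name the external network explicitly.
        args += ["--config", f"external-network={external_network}"]
    return args + [cloud, controller]


# First attempt, then the fallback with an explicit external network.
print(shlex.join(bootstrap_command("local-openstack", "az-test", "ubuntu-net")))
print(shlex.join(bootstrap_command("local-openstack", "az-test", "ubuntu-net", "ext-net")))
```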

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

@Dmitrii, sounds like we need a new wish list bug. Out of curiosity, what granularity are you thinking of for specifying different networks, storage etc?

Revision history for this message
Nobuto Murata (nobuto) wrote :

@Heather,

Although neither my client nor I am the original reporter, my client told me that the edge version of Juju worked for them with 3 AZs in Nova and the one default AZ in Neutron. I will try to confirm that once I get access to the environment.

On the other hand, is it something you could backport to 2.2 stable series?

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

@Heather,

Will do. There is already one filed by Sandor not so long ago https://bugs.launchpad.net/juju/+bug/1719323

On networks you could have multiple VLAN provider networks for whatever reason. You could also attach instances to provider networks directly.

In this case you would have, say, several network interfaces and several (external) networks configurable for a model.

I will give it some more thought as we also have to consider non-standard SDN use-cases.

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

@nobuto, I've backported the fix to the 2.2 branch.

https://github.com/juju/juju/pull/7940

Changed in juju:
status: Fix Committed → Fix Released