g-s-s uses the wrong IP for the keystone endpoint, leads to juju bootstrap failure

Bug #1661392 reported by Francis Ginther
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Autopilot Log Analyser
Fix Committed
High
Francis Ginther
Glance - Simplestreams Sync Charm
Invalid
Undecided
Unassigned
Landscape Server
Fix Released
High
Francis Ginther
OpenStack Keystone Charm
Fix Released
High
Unassigned
keystone (Juju Charms Collection)
Fix Released
High
Unassigned

Bug Description

This problem has been observed several times during landscape autopilot CI testing. The observed failure is a failed juju bootstrap with the debug messages:

keystone charm: cs:~openstack-charmers-next/xenial/keystone-281
glance-simplestreams-sync charm: cs:xenial/glance-simplestreams-sync-11

[from console logs (copied to https://pastebin.canonical.com/178802/)]
05:40:32 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:04 DEBUG juju.environs.bootstrap bootstrap.go:571 constraints for image metadata lookup &{{{region1 http://10.96.64.28:5000/v2.0} [xenial win2012hvr2 win2012r2 precise quantal raring utopic vivid win2008r2 win2012hv genericlinux zesty win2016 centos7 saucy trusty wily yakkety win8 win81 win10 win2012 win2016hv win2016nano win7] [amd64 i386 armhf arm64 ppc64el s390x] released}}
05:40:32 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:05 DEBUG juju.environs.simplestreams simplestreams.go:454 skipping index "http://10.96.64.31:8080/v1/AUTH_5720f9bbc9d34ef59f1aa378be68463a/simplestreams/data/streams/v1/index.json" because of missing information: index file has no data for cloud {region1 http://10.96.64.28:5000/v2.0} not found
05:40:32 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:05 DEBUG juju.environs.bootstrap bootstrap.go:579 ignoring image metadata in keystone catalog: index file has no data for cloud {region1 http://10.96.64.28:5000/v2.0} not found
05:40:33 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:06 DEBUG juju.environs.bootstrap bootstrap.go:583 found 0 image metadata in default cloud images
05:40:33 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:06 DEBUG juju.environs.simplestreams simplestreams.go:454 skipping index "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson" because of missing information: index file has no data for cloud {region1 http://10.96.64.28:5000/v2.0} not found
05:40:33 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:06 DEBUG juju.environs.bootstrap bootstrap.go:579 ignoring image metadata in default ubuntu cloud images: index file has no data for cloud {region1 http://10.96.64.28:5000/v2.0} not found
05:40:33 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:06 DEBUG juju.environs.bootstrap bootstrap.go:587 found 0 image metadata from all image data sources
05:40:33 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:06 INFO juju.provider.common destroy.go:20 destroying model "controller"
05:40:33 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:06 INFO juju.provider.common destroy.go:31 destroying instances
05:40:37 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:10 DEBUG juju.provider.openstack provider.go:1127 terminating instances []
05:40:37 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:10 DEBUG goose <autogenerated>:22 performing API version discovery for "http://10.96.63.239:9696/"
05:40:37 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:10 WARN goose <autogenerated>:38 discovered API versions: [{Version:{major:2 minor:0} Links:[{Href:http://10.96.63.239:9696/v2.0 Rel:self}] Status:CURRENT}]
05:40:37 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:10 INFO juju.provider.common destroy.go:51 destroying storage
05:40:37 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:10 DEBUG juju.provider.openstack cinder.go:81 volume URL: http://10.96.64.32:8776/v2/8a0556c51d51437ba1c1a3ae36e305fb
05:40:40 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 05:40:13 ERROR cmd supercommand.go:458 failed to bootstrap model: no image metadata found

Of note is the line: "index file has no data for cloud {region1 http://10.96.64.28:5000/v2.0} not found".

The simplesteams data it is parsing is:
[from http://10.96.64.31:8080/v1/AUTH_5720f9bbc9d34ef59f1aa378be68463a/simplestreams/data/streams/v1/index.json]
{
 "format": "index:1.0",
 "index": {
  "auto.sync": {
   "cloudname": "glance-simplestreams-sync-openstack",
   "clouds": [
    {
     "endpoint": "http://10.96.64.57:5000/v2.0",
     "region": "region1"
    }
   ],
   "datatype": "image-ids",
   "format": "products:1.0",
   "path": "streams/v1/auto.sync.json",
   "products": [
    "com.ubuntu.cloud:server:14.04:amd64",
    "com.ubuntu.cloud:server:16.04:amd64"
   ],
   "updated": "Thu, 09 Feb 2017 05:09:38 +0000"
  }
 },
 "updated": "Thu, 09 Feb 2017 05:08:26 +0000"
}

There is a mismatch between the endpoint that juju is looking for, "http://10.96.64.28:5000/v2.0" and the one advertised in the index.json, "http://10.96.64.57:5000/v2.0". "10.96.64.28" is the VIP for keystone, while "10.96.64.57" is the private IP of keystone-0. Somehow, glance-simplestreams-sync was given the wrong IP to use for the endpoint and it populated that into the simplestreams data.

Revision history for this message
Francis Ginther (fginther) wrote :
information type: Proprietary → Public
Changed in autopilot-log-analyser:
assignee: nobody → Francis Ginther (fginther)
status: New → In Progress
Revision history for this message
Francis Ginther (fginther) wrote :

Caught a case of this occurring again, but I think it was too late. The glance-simplestreams-sync has already updated things since the cloud was first deployed. I'm now getting a successful juju bootstrap.

The simplestreams data was first created at 03 Feb 2017 02:29:12 and updated at 03 Feb 2017 06:25:35. The failure occcurred at 03:19:49 with the message:

03:19:49 INFO ui.openstack.test_region_juju_bootstrap.RegionBootstrapTest.test_juju_bootstrap_kvm . 03:19:24 DEBUG juju.environs.simplestreams simplestreams.go:454 skipping index "http://10.96.55.183:80/swift/v1/simplestreams/data/streams/v1/index.json" because of missing information: index file has no data for cloud {region1 http://10.96.55.190:5000/v2.0} not found

http://10.96.55.183:80/swift/v1/simplestreams/data/streams/v1/index.json:
https://pastebin.canonical.com/178158/

http://10.96.55.183/swift/v1/simplestreams/data/streams/v1/auto.sync.json:
https://pastebin.canonical.com/178160/

Will try again.

Changed in autopilot-log-analyser:
status: In Progress → Fix Released
status: Fix Released → Fix Committed
importance: Undecided → High
Revision history for this message
Francis Ginther (fginther) wrote :
description: updated
Revision history for this message
Francis Ginther (fginther) wrote :
Revision history for this message
Francis Ginther (fginther) wrote :
Revision history for this message
Francis Ginther (fginther) wrote :

The glance-simplestreams-config from from https://ci.lscape.net/job/landscape-system-tests/5238

summary: - "no image metadata found" during test_juju_bootstrap_kvm
+ g-s-s uses the wrong IP for the keystone endpoint, leads to juju
+ bootstrap failure
Revision history for this message
David Ames (thedac) wrote :

This appears to be a race where keystone initially hands off its private-address before recognizing it has a VIP.

Glance simple streams is then not re-running hooks to regenerate the endpoint data with the updated VIP once it receives it.

Ideally, keystone would wait to respond on the identity-services relation and/or hand of the VIP immediately. Glance simple streams should should re-create configuration and/or data when identity-service-changed is executed.

Changed in keystone (Juju Charms Collection):
status: New → Triaged
importance: Undecided → High
Chad Smith (chad.smith)
Changed in landscape:
milestone: 17.01 → 17.02
Revision history for this message
David Ames (thedac) wrote :

Spent a little more time on this today. It is a difficult race to solve on the g-s-s side due to the asynchronous nature of the glance-simplestreams-sync.sh run from cron. It can easily race and run with the wrong keystone IP information. This is a long running process that will not learn about changes to the configuration until its subsequent run.

In the future, we may consider changes to keystone (and the other APIs that run HA) to wait until hacluster is fully configured before running client relations. This will take some time.

In the meantime, there are two manual work arounds:
One, in the bundle set run to False until the deployment is complete. Then post-deploy set run to True. This guarantees g-s-s has the correct IP info for keystone before it produces the simplestreams data.
Two, set frequency to 1 hour and wait for at least one hour post deploy to test. The second run will have the correct keystone IP information.

I understand these are less than ideal, but they will get testing unblocked.

Revision history for this message
David Ames (thedac) wrote :

I'll add one more manual workaround which is to run the following post-deploy:

juju run --unit glance-simplestreams-sync/0 /usr/share/glance-simplestreams-sync/glance-simplestreams-sync.sh

Revision history for this message
David Britton (dpb) wrote :

We are going to attempt to use cs:~thedac/keystone-0 and ping back if it helps.

David Britton (dpb)
Changed in landscape:
status: New → Triaged
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-keystone (master)

Reviewed: https://review.openstack.org/433391
Committed: https://git.openstack.org/cgit/openstack/charm-keystone/commit/?id=b14c107dc347c31dd8a643a24ab6a7f98b85a7b2
Submitter: Jenkins
Branch: master

commit b14c107dc347c31dd8a643a24ab6a7f98b85a7b2
Author: David Ames <email address hidden>
Date: Mon Feb 13 17:57:36 2017 -0800

    Do not run client relation until clustered if HA

    Check if VIP or dns-ha is set to determine if the unit expects to be
    in HA. This is less racey that just checking for the ha relation.
    Wait until clustered to run the client relation hooks.

    This fixes bugs where client charms receive the private-address
    rather than the VIP on initial client relations.

    Charmhelper sync.

    Change-Id: I48b15113360ef892e38235ec4518173ec78ad143
    Partial-bug: #1661392

James Page (james-page)
Changed in charm-keystone:
importance: Undecided → High
status: New → Triaged
Changed in keystone (Juju Charms Collection):
status: Triaged → Invalid
David Britton (dpb)
Changed in keystone (Juju Charms Collection):
status: Invalid → Fix Released
Chad Smith (chad.smith)
Changed in landscape:
status: Triaged → Fix Committed
Changed in landscape:
status: Fix Committed → Fix Released
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

I'm marking this invalid against charm-glance-simplestreams-sync as it seems to be resolved in charm-keystone

Changed in glance-simplestreams-sync-charm:
status: New → Incomplete
Changed in charm-keystone:
status: Triaged → Fix Released
Changed in glance-simplestreams-sync-charm:
status: Incomplete → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.