Incomplete simplestreams metadata and failed juju bootstrap

Bug #1584938 reported by Francis Ginther on 2016-05-23
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Glance - Simplestreams Sync Charm
Undecided
Данило Шеган
Landscape Server
Undecided
Данило Шеган
16.06
Undecided
Данило Шеган
simplestreams
Undecided
Данило Шеган
simplestreams (Ubuntu)
Medium
Unassigned
Xenial
Medium
Unassigned
Bionic
Undecided
Unassigned
Disco
Undecided
Unassigned

Bug Description

[Impact]

This was seen on a Landscape Autopilot deployment (using ceph/ceph) after the fix for lp:1583276 was committed:

root@juju-machine-0-lxc-1:/var/lib/landscape/juju-homes/1# juju ssh glance-simplestreams-sync/0 apt-cache policy python-simplestreams
Warning: Permanently added 'node-11.vmwarestack,10.245.202.81' (ECDSA) to the list of known hosts.
Warning: Permanently added '10.245.201.73' (ECDSA) to the list of known hosts.
python-simplestreams:
  Installed: 0.1.0~bzr434~trunk-0ubuntu1~ubuntu14.04.1
  Candidate: 0.1.0~bzr434~trunk-0ubuntu1~ubuntu14.04.1

The glance-simplestreams-sync.log [1] shows that the xenial image was not completely updated and got stuck in the queued state:

WARNING * 05-23 16:29:08 [PID:18247] * sstreams * Ignoring inactive image 37d8dfe7-f829-48e0-a66f-aff0cbe0a201 with status 'queued'

The log then shows that another xenial image was downloaded and ultimately the sync process switches to the daily cron.

[Test Case]

The problem occurs with the user tries to juju bootstrap with trusty. It appears that the trusty simplestreams metadata is incomplete [2,3] and leads to a failed bootstrap [4]. Creating a trusty instance via horizon (no juju involved) works fine and a bootstrap with xenial works also.

[Regression Potention]

This part was added specifically for the Xenial backport, including:

- 436-glance-fix-race-conditions.patch (LP: #1584938)

And chances of regression are small based on the MR feedback from SEG and this particular bug already stating the issue was fixed.

[Other Info]

Attached are a collection of logs including all the pastebins linked here.

[1] - https://pastebin.canonical.com/157049/
[2] - https://pastebin.canonical.com/157058/
[3] - https://pastebin.canonical.com/157059/
[4] - https://pastebin.canonical.com/157060/

Related branches

Francis Ginther (fginther) wrote :
tags: added: landscape
Francis Ginther (fginther) wrote :

Here's another pastebin showing the image status after the OSA deployment was finished:
https://pastebin.canonical.com/157048/

fginther@sandman:~/work/16.05/vmstack$ glance image-list
+--------------------------------------+-----------------------------------------------------------------+
| ID | Name |
+--------------------------------------+-----------------------------------------------------------------+
| d12bef49-b0db-4c4a-8506-5033ce99566d | auto-sync/ubuntu-trusty-14.04-amd64-server-20160516-disk1.img |
| 37d8dfe7-f829-48e0-a66f-aff0cbe0a201 | auto-sync/ubuntu-xenial-16.04-amd64-server-20160516.1-disk1.img |
| d24eb4f5-05b1-4e3a-93b6-52f72b255eb0 | auto-sync/ubuntu-xenial-16.04-amd64-server-20160516.1-disk1.img |
+--------------------------------------+-----------------------------------------------------------------+
fginther@sandman:~/work/16.05/vmstack$ nova image-list
+--------------------------------------+-----------------------------------------------------------------+--------+--------+
| ID | Name | Status | Server |
+--------------------------------------+-----------------------------------------------------------------+--------+--------+
| d12bef49-b0db-4c4a-8506-5033ce99566d | auto-sync/ubuntu-trusty-14.04-amd64-server-20160516-disk1.img | ACTIVE | |
| 37d8dfe7-f829-48e0-a66f-aff0cbe0a201 | auto-sync/ubuntu-xenial-16.04-amd64-server-20160516.1-disk1.img | SAVING | |
| d24eb4f5-05b1-4e3a-93b6-52f72b255eb0 | auto-sync/ubuntu-xenial-16.04-amd64-server-20160516.1-disk1.img | ACTIVE | |
+--------------------------------------+-----------------------------------------------------------------+--------+--------+

Francis Ginther (fginther) wrote :

Deleting all of the glance images and re-running the sync command fixed the simplestream data and juju could bootstrap trusty again.

Francis Ginther (fginther) wrote :

Glance logs:

/var/log/glance/api.log: https://pastebin.canonical.com/157099/
/var/log/juju/unit-glance-0.log: https://pastebin.canonical.com/157101/

Glance shows an authentication error at the same time as the failed image upload.

Changed in glance-simplestreams-sync-charm:
assignee: nobody → Данило Шеган (danilo)
Francis Ginther (fginther) wrote :

Logs from glance unit.

The "queued" image state does not seem to be a problem: it behaves exactly the same as "saving" state, and is thus ignored by both simplestreams metadata generation and juju.

However, for some reason trusty metadata is not complete. glance-api is last restarted at 16:26:33, at which point glance-simplestreams-sync is not doing anything:

INFO * 05-23 16:26:02 [PID:17231] * root * Configuration value missing in...
INFO * 05-23 16:27:02 [PID:17703] * root * glance-simplestreams-sync started.

This is the first run that attempts to do anything meaningful since by that point everything seems to be properly set up.

There are these annoying error messages from swiftclient which double the port in the log message:

ERROR * 05-23 16:27:39 [PID:17703] * swiftclient * Object GET failed: http://10.245.201.75:80:80/swift/v1/simplestreams/data/streams/v1/auto.sync.json 404 Not Found

But we've always had them (they come from calls to swiftclient from simplestreams or maybe even glanceclient), and they never caused any trouble, but they might hide an actually useful error message (there are also Object HEAD failed) since this is the file that is not complete upon completion of the sync.

I am not entirely sure what might be the cause of this particular problem, and it's going to be hard to chase it down without a simple way to reproduce it.

FWIW, keystone logs image-stream endpoint creation and authorization at 16:27:46-49: https://pastebin.canonical.com/157115/

This might explain the unauthenticated error: "image-stream" user's password is updated at this point.

Changed in landscape:
assignee: nobody → Данило Шеган (danilo)
Changed in simplestreams:
status: New → In Progress
assignee: nobody → Данило Шеган (danilo)
Changed in glance-simplestreams-sync-charm:
status: New → Invalid
Changed in landscape:
status: New → In Progress
tags: added: kanban
tags: removed: kanban

After discussing with Scott Moser, the problem seems to be that most of the metadata is not stored in glance, so if uploading a second image fails (for whatever reason), the metadata for the first one will be lost. Suggested solution is to store all the metadata in glance anyway (and we'll want to avoid raciness so will have to do that either before or at the same time as the image upload), and then always use that to regenerate the index file.

Francis Ginther (fginther) wrote :

Hit this again with a ceph/ceph deployment on scapestack. This also showed the authorization problem:

[From /var/log/glance-simplestreams-sync.log]
DEBUG * 05-26 21:37:37 [PID:18215] * glanceclient.common.http *
HTTP/1.1 401 Unauthorized
date: Thu, 26 May 2016 21:37:37 GMT
content-length: 46
content-type: text/plain; charset=UTF-8
x-openstack-request-id: req-db3baa69-8643-4460-8ded-d5c8fe3d8327
www-authenticate: Keystone uri='http://10.96.8.159:5000'

401 Unauthorized

Authentication required

This authorization error appears to be connected to a re-generation of the image-stream creds:
[From /var/log/juju/unit-keystone-0.log]
2016-05-26 21:37:17 DEBUG juju-log identity-service:64: Service entry for 'image-stream' already exists.
2016-05-26 21:37:17 INFO juju-log identity-service:64: Endpoint template already exists for 'image-stream' in 'region1'
2016-05-26 21:37:18 DEBUG juju-log identity-service:64: Creating service credentials for 'image-stream'
2016-05-26 21:37:18 DEBUG juju-log identity-service:64: User 'image-stream' already exists - updating password
2016-05-26 21:37:18 INFO juju-log identity-service:64: Updating password for user 'image-stream'
2016-05-26 21:37:19 INFO juju-log identity-service:64: Successfully updated password for user 'image-stream'

The merge proposal for this bug fix is now up: it both re-generates the simplestreams index after every image is uploaded, but also stores the metadata in Glance for re-use on the next run even if the simplestreams index is incomplete.

The Landscape tasks here consist of uploading a fixed version of simplestreams into our PPA for glance-simplestreams-sync.

Scott Moser (smoser) on 2016-06-14
Changed in simplestreams (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Changed in simplestreams:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package simplestreams - 0.1.0~bzr436-0ubuntu1

---------------
simplestreams (0.1.0~bzr436-0ubuntu1) yakkety; urgency=medium

  * New upstream snapshot.
    - GlanceMirror: fix a couple of race-related problems (LP: #1584938)
    - GlanceMirror: refactor insert_item for easier testing
    - GlanceMirror: ignore inactive images (LP: #1583276)

 -- Scott Moser <email address hidden> Tue, 14 Jun 2016 16:23:26 -0400

Changed in simplestreams (Ubuntu):
status: Confirmed → Fix Released
Changed in landscape:
status: In Progress → Fix Committed

This bug is believed to be fixed in simplestreams in version 0.1.0. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in simplestreams:
status: Fix Committed → Fix Released
Changed in simplestreams (Ubuntu Disco):
status: New → Fix Released
Changed in simplestreams (Ubuntu Bionic):
status: New → Fix Released
Changed in simplestreams (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
Changed in simplestreams (Ubuntu):
importance: High → Medium
Changed in simplestreams (Ubuntu Xenial):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
description: updated

An upload of simplestreams to xenial-proposed has been rejected from the upload queue for the following reason: "See https://bugs.launchpad.net/charms/+source/glance-simplestreams-sync/+bug/1611987/comments/21".

Changed in simplestreams (Ubuntu Xenial):
status: In Progress → Won't Fix
Changed in simplestreams (Ubuntu Bionic):
status: Fix Released → In Progress
Changed in simplestreams (Ubuntu Xenial):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in simplestreams (Ubuntu Bionic):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)

TL;DR version:

We are going to fix Bionic simplestreams package and provide this Bionic simplestreams package in Ubuntu Cloud Archive for Xenial. With that, we are fixing Xenial simplestreams behavior (to work with keystone v3) only if end-user enables Ubuntu Cloud Archive.

Conversation about this topic:

https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/simplestreams/+git/simplestreams/+merge/373030/comments/980111

Changed in simplestreams (Ubuntu Bionic):
status: In Progress → Fix Released
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers