Glance upload image returns before image became available for usage

Bug #1556068 reported by Tatyanka
58
This bug affects 7 people
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Released
Critical
Alex Schultz

Bug Description

Deploy HA environment with NeutronVXLAN and 2 nodegroups

Steps To Reproduce:
1. Revert snapshot with 2 networks sets for slaves
2. Create cluster (HA) with Neutron VXLAN
3. Add 3 controller nodes from default nodegroup
4. Add 2 compute nodes from custom nodegroup
5. Deploy cluster
6. Run health checks (OSTF)

Actual Result:
Cluster is ready bot ostf fails that there is no Test VM
- Create volume and boot instance from it (failure) TestVM image not found. Please, download http://download.cirros-cloud.net/0.3.1/cirros-0.3.1-x86_64-disk.img image and register it in Glance with name 'TestVM' as 'admin' tenant.
  - Create volume and attach it to instance (failure) TestVM image not found. Please, download http://download.cirros-cloud.net/0.3.1/cirros-0.3.1-x86_64-disk.img image and register it in Glance with name 'TestVM' as 'admin' tenant.

on primary controller:
root@node-1:~# glance image-list

+----+------+
| ID | Name |
+----+------+
+----+------+

in puppet on the same node
http://paste.openstack.org/show/490129/

at the same time manual execution after revert work fine:

http://paste.openstack.org/show/490130/

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Oleksiy Molchanov (omolchanov)
Changed in fuel:
status: New → In Progress
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

I have found nothing meaningful in logs, seems that the error is related to issues with network on that environment. Marking it as Invalid.

Changed in fuel:
status: In Progress → Invalid
Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

It reproduced on many system tests

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :
Changed in fuel:
status: Invalid → Confirmed
Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :
Changed in fuel:
status: Confirmed → New
assignee: Oleksiy Molchanov (omolchanov) → nobody
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Oleksiy, even if there was some networking issue, deployment should be in error state then

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

BTW looks like it is reproduced reguraly and not only on this type of deploy

Changed in fuel:
assignee: nobody → Oleksiy Molchanov (omolchanov)
Maciej Relewicz (rlu)
Changed in fuel:
status: New → Confirmed
tags: added: area-library
Revision history for this message
Kairat Kushaev (kkushaev) wrote :
Download full text (4.3 KiB)

I tried to check that manually and I succeed to create the image.
After investigating log files I found place where image is created.
See example here:
http://paste.openstack.org/show/490491/

The first thing I have noticed is that we have 2 consequent similar requests in glance.
Look here:
http://paste.openstack.org/show/490494/
and here:
http://paste.openstack.org/show/490493/

Finally, glance stuck here:
2016-03-11T01:08:53.629562+00:00 debug: 2016-03-11 01:08:53.631 22033 DEBUG swiftclient [req-6240f11d-6c68-43f0-8adb-440139d33c1a 68526c0eed764eb6820b56095fb4b993 1c7cac237889499ba3796cb5376d6ad3 - - -] REQ: curl -i http://10.109.1.6:8080/v1/AUTH_1c7cac237889499ba3796cb5376d6ad3/glance -X PUT -H "Content-Length: 0" -H "X-Auth-Token: gAAAAABW4hql7uZqcztasytRkqgYupgowQng03K0KPB8Eqzymymf5eZyhbmKb2JzvejuXQtF7RRVOKarW0naXF0xUAUwoUJeJZ3Vre0EVN0Q4tpFyMJqb7XB8sas96eAkUuYtwSd-5HWCQ0HRNaOi_sIAj4DWmAVJiflxF7qj2FwdBYXyyTA_Tc" http_log /usr/lib/python2.7/dist-packages/swiftclient/client.py:100
2016-03-11T01:08:53.629630+00:00 debug: 2016-03-11 01:08:53.631 22033 DEBUG swiftclient [req-6240f11d-6c68-43f0-8adb-440139d33c1a 68526c0eed764eb6820b56095fb4b993 1c7cac237889499ba3796cb5376d6ad3 - - -] REQ: curl -i http://10.109.1.6:8080/v1/AUTH_1c7cac237889499ba3796cb5376d6ad3/glance -X PUT -H "Content-Length: 0" -H "X-Auth-Token: gAAAAABW4hql7uZqcztasytRkqgYupgowQng03K0KPB8Eqzymymf5eZyhbmKb2JzvejuXQtF7RRVOKarW0naXF0xUAUwoUJeJZ3Vre0EVN0Q4tpFyMJqb7XB8sas96eAkUuYtwSd-5HWCQ0HRNaOi_sIAj4DWmAVJiflxF7qj2FwdBYXyyTA_Tc" http_log /usr/lib/python2.7/dist-packages/swiftclient/client.py:100
2016-03-11T01:08:53.630006+00:00 debug: 2016-03-11 01:08:53.632 22033 DEBUG swiftclient [req-6240f11d-6c68-43f0-8adb-440139d33c1a 68526c0eed764eb6820b56095fb4b993 1c7cac237889499ba3796cb5376d6ad3 - - -] RESP STATUS: 201 Created http_log /usr/lib/python2.7/dist-packages/swiftclient/client.py:101
2016-03-11T01:08:53.630476+00:00 debug: 2016-03-11 01:08:53.632 22033 DEBUG swiftclient [req-6240f11d-6c68-43f0-8adb-440139d33c1a 68526c0eed764eb6820b56095fb4b993 1c7cac237889499ba3796cb5376d6ad3 - - -] RESP STATUS: 201 Created http_log /usr/lib/python2.7/dist-packages/swiftclient/client.py:101
2016-03-11T01:08:53.631980+00:00 debug: 2016-03-11 01:08:53.633 22033 DEBUG swiftclient [req-6240f11d-6c68-43f0-8adb-440139d33c1a 68526c0eed764eb6820b56095fb4b993 1c7cac237889499ba3796cb5376d6ad3 - - -] RESP HEADERS: [('Date', 'Fri, 11 Mar 2016 01:08:53 GMT'), ('Content-Length', '0'), ('Content-Type', 'text/html; charset=UTF-8'), ('Connection', 'close'), ('X-Trans-Id', 'tx53fa14b3cfae4913b5f70-0056e21aa5')] http_log /usr/lib/python2.7/dist-packages/swiftclient/client.py:102
2016-03-11T01:08:53.632057+00:00 debug: 2016-03-11 01:08:53.633 22033 DEBUG swiftclient [req-6240f11d-6c68-43f0-8adb-440139d33c1a 68526c0eed764eb6820b56095fb4b993 1c7cac237889499ba3796cb5376d6ad3 - - -] RESP HEADERS: [('Date', 'Fri, 11 Mar 2016 01:08:53 GMT'), ('Content-Length', '0'), ('Content-Type', 'text/html; charset=UTF-8'), ('Connection', 'close'), ('X-Trans-Id', 'tx53fa14b3cfae4913b5f70-0056e21aa5')] http_log /usr/lib/python2.7/dist-packages/swiftclient/client.py:102
2016-03-11T01:08:53.632742+00:00 debug: 2016-03-11 01:08:53...

Read more...

Revision history for this message
Aleksandra Fedorova (bookwar) wrote :

Set to Critical as it blocks master bvt

Changed in fuel:
importance: High → Critical
Revision history for this message
Alexey Galkin (agalkin) wrote :

I'm not sure, but probably this is related with https://bugs.launchpad.net/mos/+bug/1555004 issue. In any case, I believe that it is necessary to wait for a patch for bug # 1555004. After that, re-check this bug.

Revision history for this message
Mike Fedosin (mfedosin) wrote :

Hi, seems like our glance-registry-paste.ini is broken: https://paste.mirantis.net/show/1975/

There are 2 flavors with name 'keystone'. First one is correct (lines 6-7), second is wrong (lines 37-38) because it uses unauthenticated-context. Moreover all tail lines after #36 are wrong and shouldn't be there.

So, the fix is to remove lines 37-50 from glance-registry-paste.ini

tags: added: swarm-blocker
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :
Changed in fuel:
status: Confirmed → Fix Committed
Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

After fixes TestVM is in 'saving' status and tests still fail with 'No TestVM image'
nova image-list
+--------------------------------------+--------+--------+--------+
| ID | Name | Status | Server |
+--------------------------------------+--------+--------+--------+
| 04b8ff08-98f8-4f2f-a96f-ec580c1da10f | TestVM | SAVING | |
+--------------------------------------+--------+--------+--------+

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :
Changed in fuel:
status: Fix Committed → Confirmed
Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :
Revision history for this message
Bug Checker Bot (esikachev-l) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Changed in fuel:
assignee: Oleksiy Molchanov (omolchanov) → Mike Fedosin (mfedosin)
Changed in fuel:
assignee: Mike Fedosin (mfedosin) → Oleksiy Molchanov (omolchanov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/295828

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Andrey Maximov (maximov) wrote : Re: TestVm is missed on ready cluster

Guys, can you elaborate on this bug ?
My understanding is that
1) upload_cirros.rb successfully uploads image to glance, i.e. glance image-create returns 0
2) but after that when we try to use this image - glance returns 404

if all these statements are correct, why don't we move this bug to glance team ?

Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

Hi.

1) It seems that the image is in !active state, but listed in image-list (trying to be saved)
2) Nope, glance returns empty list, because image was not saved

The problem is that it is swift issue, and AFAIK we do not have swift experts.

Revision history for this message
Ivan Berezovskiy (iberezovskiy) wrote :

Oleksiy, could you please check swift endpoints? Are they correct?

Revision history for this message
Alexander Noskov (anoskov) wrote :

It looks like this bug also affects CI job 'master.fuel-library.pkgs.ubuntu.neutron_vlan_ha'
We are getting issue in https://ci.fuel-infra.org/job/master.fuel-library.pkgs.ubuntu.neutron_vlan_ha/7798/ periodically after rebase/retrigger

Revision history for this message
Ivan (iremizov) wrote :

There is a lot of people who ask to retrigger their builds because of this bug.
And this brings additional load for CI servers.

Please take more care about this issue.

Revision history for this message
Ivan (iremizov) wrote :

https://review.openstack.org/#/c/290635/ : 3 continuosly failed runs

Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

Ivan, it already has a patch on review

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/295828
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=d0b612fade8066655cd0767f4a5d957270ebfb30
Submitter: Jenkins
Branch: master

commit d0b612fade8066655cd0767f4a5d957270ebfb30
Author: Oleksiy Molchanov <email address hidden>
Date: Tue Mar 22 15:42:09 2016 +0200

    Add additional status check for TestVM creation

    * Give possibility to check whether the image is in saving/queued state.

    Closes-Bug: 1556068
    Change-Id: I6da5064a5daf3ed76233a6a32ef3606000a41a28

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

Provided fix is actually a workaround for glance issue. Reopening the issue and changing priority to high.

summary: - TestVm is missed on ready cluster
+ Glance upload image returns before image became available for usage
Changed in fuel:
assignee: Oleksiy Molchanov (omolchanov) → MOS Glance (mos-glance)
importance: Critical → High
status: Fix Committed → Confirmed
tags: added: area-mos
removed: area-library
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

The issue is that glance upload image returns before image became available for usage

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Raised to critical as it blocks CI

Changed in fuel:
importance: High → Critical
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

It is important to note that the issue reproduced not in 100% of cases, because we have red and green builds with the same ISO.

Revision history for this message
Kairat Kushaev (kkushaev) wrote :

From the latest CI fails I see nothing in Glance and:
2016-03-23T12:03:56.114536+00:00 info: 10.109.12.4 - - [23/Mar/2016:12:03:56 +0000] "PUT /2/465/AUTH_f958e736383a4ba584454239aad52012/glance/f2c6d5e1-9d81-4f35-92a5-bfc224ed9977" 499 - "PUT http://10.109.11.3:8080/v1/AUTH_f958e736383a4ba584454239aad52012/glance/f2c6d5e1-9d81-4f35-92a5-bfc224ed9977" "tx33a3da87196a4d9bba73b-0056f285ef" "proxy-server 27407" 60.2053 "-" 24459 0
499 means client disconnect but it is not clear why it happened.

Revision history for this message
Mike Fedosin (mfedosin) wrote :

Afaik it's a bug in Swift and there is the decision to drop its support in MOS 9.0. Should we investigate the cause of this issue in this case?

Revision history for this message
Kairat Kushaev (kkushaev) wrote :

Swift folks, we need to debug swift to understand why swift generates 499 from time to time.
Glanceclient specifies content-length in v1 but it seems that content-length always the same, so it would be perfect to understand root cause of the error.

Changed in fuel:
assignee: MOS Glance (mos-glance) → Alyona Kiseleva (akiselyova)
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Yes.

Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :
Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :
Revision history for this message
Alyona Kiseleva (akiselyova) wrote :

On the last failed test I see a lot of errors in swift logs.

Errors on start:
swift-object-server: UNCAUGHT EXCEPTION .............. IOError: [Errno 2] No such file or directory: '/etc/swift/object.ring.gz'
And later errors on put:
ERROR 500 Trying to PUT ...... OSError: [Errno 13] Permission denied: '/var/lib/glance/node' (txn: tx0b6c2504e67c4eeeb952b-0056f47577)

It means, swift isn't configured correctly and cannot work at all. Reasons are not clear for me just now.

Revision history for this message
Alyona Kiseleva (akiselyova) wrote :

UPD: Errors from previous message exists only in last build. In other cases there are other errors, particularly a lot og "connection refused" But exactly PUT command with image in swift logs mostly fails with 408 error code (timeout). Cannot it be some network problem, because of which image cannot be loaded to swift during existing timeout?
Continue investigating.

PS: I couldn't find any other entries with 499 code.

Revision history for this message
Alex Schultz (alex-schultz) wrote :

It's a permissions thing...

From swift-proxy-server.log...
2016-03-25T18:46:22.820115+00:00 err: ERROR 500 Trying to PUT /AUTH_baf9cfeedba34edab658d944c0a92d7eFrom Container Server 10.109.4.1:6002/1 (txn: tx11b5885e697641f98ee63-0056f5877d) (client_ip: 10.109.2.7)

From swift-container-server.log....
/swift/container/server.py", line 603, in __call__#012 res = method(req)#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line 1412, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/dist-packages/swift/container/server.py", line 549, in REPLICATE#012 ret = self.replicator_rpc.dispatch(post_args, args)#012 File "/usr/lib/python2.7/dist-packages/swift/common/db_replicator.py", line 702, in dispatch#012 mkdirs(os.path.join(self.root, drive, 'tmp'))#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line 1037, in mkdirs#012 os.makedirs(path)#012 File "/usr/lib/python2.7/os.py", line 150, in makedirs#012 makedirs(head, mode)#012 File "/usr/lib/python2.7/os.py", line 150, in makedirs#012 makedirs(head, mode)#012 File "/usr/lib/python2.7/os.py", line 157, in makedirs#012 mkdir(name, mode)#012OSError: [Errno 13] Permission denied: '/var/lib/glance/node'

because...
root@node-1:/var/lib/glance# ls -al
total 4
drwxr-x--- 5 glance glance 48 Mar 25 16:57 .
drwxr-xr-x. 68 root root 4096 Mar 25 18:48 ..
drwxr-xr-x 5 glance glance 64 Mar 25 17:30 image-cache
drwxr-xr-x 2 glance glance 6 Mar 15 19:41 images
drwxr-xr-x 2 glance glance 6 Mar 25 16:47 node

because /var/lib/glance is provided by glance-common which does...
» chown glance:glance -R /var/lib/glance/ /etc/glance/

And because...
2016-03-25 16:47:57 +0000 /Stage[main]/Osnailyfacter::Swift::Storage/File[/var/lib/glance]/group (notice): group changed 'root' to 'swift'

but glance gets installed...
2016-03-25 17:00:32 +0000 /Stage[main]/Glance::Registry/Package[glance-registry]/ensure (notice): ensure changed 'purged' to 'present'
2016-03-25 17:00:54 +0000 /Stage[main]/Glance::Api/Package[glance-api]/ensure (notice): ensure changed 'purged' to 'present'

Swift probably needs to be configured after glance or not share the /var/lib/glance/* folders

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/297833

Changed in fuel:
assignee: Alyona Kiseleva (akiselyova) → Alex Schultz (alex-schultz)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/297833
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=560f319a3062090eb68df5291b6ebfa9336548de
Submitter: Jenkins
Branch: master

commit 560f319a3062090eb68df5291b6ebfa9336548de
Author: Alex Schultz <email address hidden>
Date: Fri Mar 25 13:33:13 2016 -0600

    Run swift after glance is configured

    We use /var/lib/glance/nodes for swift storage as we are backing glance
    with swift. Unfortunately because the glance packages include a
    recursive chmod on /var/lib/glance we need to ensure that we configure
    the permissions for swift after glance is installed.

    Change-Id: I6ba572c75d65f88f0bf3b6eeb289d644f63d41b0
    Closes-Bug: 1556068

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :

@Alex could you please take a look
https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.ha_neutron/60/console

from puppet log:
2016-03-27 22:36:17 ERR (/Stage[main]/Main/Exec[upload_cirros_shell]/returns) change from notrun to 0 failed: /bin/bash "/etc/puppet/shell_manifests/upload_cirros_command.sh" returned 1 instead of one of [0]

from glance-api log
2016-03-28 09:10:56 ERR glance.common.wsgi [req-a2114ffa-4a34-4565-ae18-a6be1583e85f d48a262cd3ac49deb20284097c1b6f91 37a4aa3c38ee47efa707b1ab8c196bd6 - - -] Caught error: Failed to add object to Swift.
Got error from Swift: put_object('glance', 'a0b3a54d-663c-424b-a86f-46bcc4c3098d-00001', ...) failure and no ability to reset contents for reupload..

from swift-object-server
http://paste.openstack.org/show/492026/

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :
Changed in fuel:
status: Fix Committed → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/298292

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Reproduced again on MOS 9.0 #108 (without fix https://review.openstack.org/298292).
Probabbly the latest fix from Alex Schultz will help to fix the issue.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/298456

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/298292
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=7d075ff3bfb266ae0c6ec0701712d2d0862511de
Submitter: Jenkins
Branch: master

commit 7d075ff3bfb266ae0c6ec0701712d2d0862511de
Author: Alex Schultz <email address hidden>
Date: Mon Mar 28 09:41:27 2016 -0600

    Wait for all cluster services before enabling quorum

    This change adds cross depends on the task that enables quorum to wait
    until all the services have been enabled. We are also adding a
    dependency on the upload_cirros task to ensure all services are up on
    the cluster prior to attempting finish the openstack configuration.

    Change-Id: Ie29d2df050a3e067440ed5e4bd0c84d54df21356
    Related-Bug: #1556068

Changed in fuel:
status: Confirmed → In Progress
description: updated
Revision history for this message
Alex Schultz (alex-schultz) wrote :

The bug has been addressed by https://review.openstack.org/298292, I am continuing to look into cleaning up the upload_cirros script to remove the various work arounds we added while attempting to address this bug. But the main issue has been resolved.

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Hi Alex, could you please describe a separate bug where you can track the status of removing of workarounds?

Thank you!

Revision history for this message
Alex Schultz (alex-schultz) wrote :

Moving cleanup item tracking to Bug 1563981

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Looks like the issue was fixed, moved to Fix Released state.
Please move it to Confirmed state if the issue will be reproduced again.

Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
Sergey Shevorakov (sshevorakov) wrote :

Timur, where the bug was verified (#ISO)?
Please mind the rules: https://mirantis.jira.com/wiki/display/QA/How+to+verify+a+bug

Changed in fuel:
status: Fix Released → Fix Committed
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Hi Sergey, the build is MOS 9.0 #140.
The issue reproduced not in 100% of cases, this is why I expect that it can be reproduced for some reasons in the future but I hope that the fix is complete.

So, let's keep it in Fix Released state.

Revision history for this message
Nastya Urlapova (aurlapova) wrote :

Verified on 9.0-#250

Changed in fuel:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.