Image is corrupted on upload

Bug #1537721 reported by Andrey Shestakov
This bug affects 1 person
Affects   Status        Importance   Assigned to   Milestone
Glance    In Progress   High         Unassigned
Liberty   New           High         Unassigned

Bug Description

After this commit https://github.com/openstack/glance_store/commit/a0572ef672512a8ed7ef203816ec256eafd5f9de
image uploads work incorrectly.

Steps to reproduce:

1. Upload image (1.47GB)
md5sum virtual_ubuntu_trasty_ext4_demo
fa9ec35d64d43aefd6356150d361ec24 virtual_ubuntu_trasty_ext4_demo

glance image-create --disk-format raw --container-format bare --file virtual_ubuntu_trasty_ext4_demo --progress --name ubuntu_upload_with_fix
[=============================>] 100%
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | fa9ec35d64d43aefd6356150d361ec24     |
| container_format | bare                                 |
| created_at       | 2016-01-25T11:06:15Z                 |
| disk_format      | raw                                  |
| id               | 5b955bce-61ab-4c55-afd9-8bd6012cf1ab |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | ubuntu_upload_with_fix               |
| owner            | e885a6c5e87c45d38a274de4388241e6     |
| protected        | False                                |
| size             | 1476395008                           |
| status           | active                               |
| tags             | []                                   |
| updated_at       | 2016-01-25T11:10:49Z                 |
| virtual_size     | None                                 |
| visibility       | private                              |
+------------------+--------------------------------------+

2. The image has 10 objects in Swift, but it should have only one (its size is less than 5 GB)
swift list glance
5b955bce-61ab-4c55-afd9-8bd6012cf1ab
5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00001
5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00002
5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00003
5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00004
5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00005
5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00006
5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00007
5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00008
5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00009

The image was split into 7 chunks of 200 MB, 1 chunk of about 43 MB (42795008 bytes), and 1 zero-length chunk.
swift stat glance 5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00007
       Account: v1
     Container: glance
        Object: 5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00007
  Content Type: binary/octet-stream
Content Length: 204800000
 Last Modified: Mon, 25 Jan 2016 11:10:42 GMT
          ETag: eda9a9889837ac4bc81d6387d92c1bec
 Accept-Ranges: bytes
        Server: Apache
   X-Timestamp: 1453720242.00000
    X-Trans-Id: tx0000000000000000da5cc-0056a6064c-5e81-default
swift stat glance 5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00008
       Account: v1
     Container: glance
        Object: 5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00008
  Content Type: binary/octet-stream
Content Length: 42795008
 Last Modified: Mon, 25 Jan 2016 11:10:49 GMT
          ETag: f3c9b36eceea8d2192996ca931f1fa55
 Accept-Ranges: bytes
        Server: Apache
   X-Timestamp: 1453720249.00000
    X-Trans-Id: tx0000000000000000d8c0e-0056a60655-377d-default
swift stat glance 5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00009
       Account: v1
     Container: glance
        Object: 5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00009
  Content Type: binary/octet-stream
Content Length: 0
 Last Modified: Mon, 25 Jan 2016 11:10:49 GMT
          ETag: d41d8cd98f00b204e9800998ecf8427e
 Accept-Ranges: bytes
        Server: Apache
   X-Timestamp: 1453720249.00000
    X-Trans-Id: tx0000000000000000da39b-0056a60660-85c1-default
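
As a cross-check, the segment sizes above add up to the reported image size. A minimal Python sketch, assuming the 204800000-byte segment size seen in the `swift stat` output (the function name is hypothetical and not part of glance_store):

```python
def segment_sizes(total_bytes, segment_bytes):
    """Return the sizes of the non-empty segments needed for an upload."""
    full, remainder = divmod(total_bytes, segment_bytes)
    sizes = [segment_bytes] * full
    if remainder:
        # The last segment holds whatever is left over.
        sizes.append(remainder)
    return sizes


sizes = segment_sizes(1476395008, 204800000)
print(len(sizes), sizes[-1], sum(sizes))  # prints: 8 42795008 1476395008
```

Eight non-empty segments account for the whole image, so the ninth segment (-00009) is the extra zero-length one, and the tenth object is the manifest.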

3. Download image
glance image-download 5b955bce-61ab-4c55-afd9-8bd6012cf1ab --file ubuntu_download_with_fix --progress
[=============================>] 100%[Errno 32] Corrupt image download. Checksum was 9253e738cabf6a3c0beace1a3b07e623 expected fa9ec35d64d43aefd6356150d361ec24

The image is corrupted, and the original and downloaded files differ in length by 1754 bytes:
ls -l virtual_ubuntu_trasty_ext4_demo
-rw-r--r-- 1 root root 1476395008 Jan 25 11:04 virtual_ubuntu_trasty_ext4_demo
ls -l ubuntu_download_with_fix
-rw-r--r-- 1 root root 1476393254 Jan 25 11:17 ubuntu_download_with_fix
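
The same comparison (checksum and length) can be scripted. A minimal sketch in Python, using in-memory data as a stand-in for the two files; the function name is hypothetical, and for real files one would pass `open(path, "rb")` instead of `io.BytesIO`:

```python
import hashlib
import io


def md5_and_length(stream, block_size=1024 * 1024):
    """Read a binary stream block by block; return (md5 hex digest, byte count)."""
    digest = hashlib.md5()
    length = 0
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
        length += len(block)
    return digest.hexdigest(), length


# Stand-in data: a "downloaded" copy that is shorter than the "original".
orig_md5, orig_len = md5_and_length(io.BytesIO(b"x" * 4096))
copy_md5, copy_len = md5_and_length(io.BytesIO(b"x" * 4000))
print(orig_len - copy_len, orig_md5 == copy_md5)  # prints: 96 False
```

For an empty stream the digest is d41d8cd98f00b204e9800998ecf8427e, which is the ETag of the zero-length segment listed in step 2.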

Changed in glance:
assignee: nobody → Kairat Kushaev (kkushaev)
wangxiyuan (wangxiyuan)
Changed in glance:
status: New → Incomplete
status: Incomplete → Confirmed
Stuart McLaren (stuart-mclaren) wrote :

> The image has 10 objects in Swift, but it should have only one (its size is less than 5 GB)

The v2 client doesn't seem to send the size to the server at all:

 PUT /v2/images/36e7def9-f734-4e09-9feb-872e58648a01/file HTTP/1.1.
 Host: 10.0.0.100:9292.
 Accept-Encoding: gzip, deflate.
 Transfer-Encoding: chunked.
 Accept: */*.
 X-Auth-Token: xxx
 Connection: keep-alive.
 User-Agent: python-glanceclient.
 Content-Type: application/octet-stream.

so the server, not having any way to know how many bytes it is going to receive, will segment the upload.
(This may be a client bug.)
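
To illustrate why the missing size matters, here is a minimal sketch of how a store could choose between a single object and a segmented upload. The function name and the 5 GB single-object limit are assumptions based on the numbers mentioned in this report; this is not glance_store's actual code:

```python
# Assumed single-object limit (the "5G" mentioned in the report).
LARGE_OBJECT_SIZE = 5 * 1024 ** 3


def upload_plan(image_size):
    """Return 'single' or 'segmented' for a given image size in bytes.

    image_size is None (or 0) when the client did not send a size,
    for example when it uses Transfer-Encoding: chunked.
    """
    if image_size and image_size < LARGE_OBJECT_SIZE:
        return "single"
    # Unknown or too-large size: the data has to be split into segments.
    return "segmented"


print(upload_plan(1476395008))  # prints: single
print(upload_plan(None))        # prints: segmented
```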

I haven't been able to reproduce this on devstack (so far):

 $ glance image-create --file /tmp/dd.1476395008 --name bug1537721 --container-format bare --disk-format raw
 +------------------+--------------------------------------+
 | Property         | Value                                |
 +------------------+--------------------------------------+
 | checksum         | a9db0ed9b9e467089c5bbc6c5bd1a305     |
 | container_format | bare                                 |
 | created_at       | 2016-01-25T12:29:03Z                 |
 | disk_format      | raw                                  |
 | id               | 94a15b66-1b78-462b-93a0-dbf3a66d26b2 |
 | min_disk         | 0                                    |
 | min_ram          | 0                                    |
 | name             | bug1537721                           |
 | owner            | a03febe481094927a96fe367c15c347b     |
 | protected        | False                                |
 | size             | 1476395008                           |
 | status           | active                               |
 | tags             | []                                   |
 | updated_at       | 2016-01-25T12:29:23Z                 |
 | virtual_size     | None                                 |
 | visibility       | private                              |
 +------------------+--------------------------------------+

 $ glance image-download 94a15b66-1b78-462b-93a0-dbf3a66d26b2 --file /tmp/download.1476395008

$ md5sum /tmp/download.1476395008
 a9db0ed9b9e467089c5bbc6c5bd1a305 /tmp/download.1476395008

 94a15b66-1b78-462b-93a0-dbf3a66d26b2

       Account: AUTH_30514a9dd11f4ed3970e46bfeb0b47ee
     Container: glance
        Object: 94a15b66-1b78-462b-93a0-dbf3a66d26b2
  Content Type: application/octet-stream
 Content Length: 1476395008
 Last Modified: Mon, 25 Jan 2016 12:29:24 GMT
          ETag: "c5c28ea692acc7aa9aa658f991cdc108"
      Manifest: glance/94a15b66-1b78-462b-93a0-dbf3a66d26b2-
 Accept-Ranges: bytes
   X-Timestamp: 1453724963.57836
    X-Trans-Id: tx9e7e24de2cc344c988ebd-0056a6169c

 94a15b66-1b78-462b-93a0-dbf3a66d26b2-00001

       Account: AUTH_30514a9dd11f4ed3970e46bfeb0b47ee
     Container: glance
        Object: 94a15b66-1b78-462b-93a0-dbf3a66d26b2-00001
  Content Type: application/octet-stream
 Content Length: 204800000
 Last Modified: Mon, 25 Jan 2016 12:29:04 GMT
          ETag: eda9a9889837ac4bc81d6387d92c1bec
 Accept-Ranges: bytes
   X-Timestamp: 1453724943.43825
    X-Trans-Id: txd8aa4dc200bc405fabb5d-0056a6169d

 94a15b66-1b78-462b-93a0-dbf3a66d26b2-00002

       Account: AUTH_30514a9dd11f4ed3970e46bfe...


Stuart McLaren (stuart-mclaren) wrote :

In the bug report, the presence of this zero length object

5b955bce-61ab-4c55-afd9-8bd6012cf1ab-00009

 suggests that the patch

https://github.com/openstack/glance_store/commit/a0572ef672512a8ed7ef203816ec256eafd5f9de may not have been applied.

Can you re-run your reproducer with the API server in debug mode (if it isn't already) and search for this log entry:

'Not writing zero-length chunk', e.g.:

 2016-01-25 12:29:23.567 DEBUG glance_store._drivers.swift.store [req-c763b94f-73c5-4577-a27f-bce3fd32943f b2e2adda995f4b34880f956fa85e30b0 a03febe481094927a96fe367c15c347b] Not writing zero-length chunk from (pid=26218) add /usr/local/lib/python2.7/dist-packages/glance_store/_drivers/swift/store.py:555

That will confirm whether or not the referenced patch is present.
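
One way to run that search is a small Python helper. This is a minimal sketch: the function name is hypothetical, and the location of the API log depends on the deployment, so the example feeds it sample lines instead of a real file:

```python
def find_log_entries(lines, needle="Not writing zero-length chunk"):
    """Return the lines (without trailing newline) that contain `needle`."""
    return [line.rstrip("\n") for line in lines if needle in line]


# Sample lines; with a real log, pass the open file object instead.
sample = [
    "DEBUG glance_store._drivers.swift.store Not writing zero-length chunk\n",
    "INFO some unrelated message\n",
]
matches = find_log_entries(sample)
print(len(matches))  # prints: 1
```

An empty result for the upload request would suggest that the patch is not in effect.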

Thanks.

Stuart McLaren (stuart-mclaren) wrote :

@wangxiyuan

You marked the bug as 'confirmed'.
Did you manage to reproduce the issue?

Thanks.

Kairat Kushaev (kkushaev) wrote :

Yep, I analyzed the bug a bit.
Glance does not send the size to Swift, so by default glance_store splits the file into 200 MB chunks.
But that does not seem to be linked to the root cause of the issue: radosgw creates the 0-length chunk and doesn't delete it when glance_store raises the ZeroSize exception. I tested this with a Swift installation and it works well: no zero-length chunk.
I dug into the requests library without success (but I am still analyzing).
I am wondering whether it is a RadosGW issue (it doesn't expect the ZeroLength exception) or an Apache issue.

Changed in glance:
status: Confirmed → New
Kairat Kushaev (kkushaev) wrote :

I am also wondering whether raising StopIteration when deleting the zero-size chunk would be safer. I need to check; perhaps it will show the root cause.

Stuart McLaren (stuart-mclaren) wrote :

> radosgw creates the 0-length chunk and doesn't delete it when glance_store raises the ZeroSize exception.

Ok (I've never used radosgw) but it's possible it could have slightly different behaviour here.

With Swift, if you create a zero length PUT, but don't send any data, Swift returns "499" and doesn't create the object:

 T 10.0.0.100:54416 -> 10.0.0.100:8080 [AP]
 PUT /v1/AUTH_30514a9dd11f4ed3970e46bfeb0b47ee/glance/0480d91e-49ee-401b-aae4-2b9911a33e45 HTTP/1.1.
 Host: 10.0.0.100:8080.
 Accept-Encoding: identity.
 content-length: 0.
 x-auth-token: xxx
 x-object-manifest: glance/0480d91e-49ee-401b-aae4-2b9911a33e45-.
 etag: d41d8cd98f00b204e9800998ecf8427e.
 user-agent: python-swiftclient-2.7.0.
 content-type: .
 .

 ####
 T 10.0.0.100:8080 -> 10.0.0.100:54411 [AP]
 HTTP/1.1 499 Client Disconnect.
 Content-Length: 89.
 Content-Type: text/html; charset=UTF-8.
 X-Trans-Id: txf24cf29ae2f14b0d93155-0056a61d90.
 Date: Mon, 25 Jan 2016 13:05:20 GMT.

It should be possible to check the equivalent radosgw behaviour (without Glance).

Kairat Kushaev (kkushaev) wrote :

Stuart, thanks for pointing this out! Will be back with reply soon.

Stuart McLaren (stuart-mclaren) wrote :

It's a little clunky, but here's an alternative way to prevent writing the zero-size object:

http://paste.openstack.org/show/484876

This makes no request at all to Swift (/rados) so should work in both cases.
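
For readers without access to the paste, the general idea of making no request for an empty segment can be sketched as follows. This is a minimal illustration with hypothetical names, not the code from the paste or from glance_store: it reads from the stream first and only hands non-empty data to the upload callback.

```python
import io


def upload_segments(stream, segment_size, put_segment):
    """Upload `stream` in segments, never calling put_segment without data.

    put_segment(index, data) is a hypothetical callback standing in for
    the object-store PUT. Returns the number of segments uploaded.
    """
    index = 0
    while True:
        data = stream.read(segment_size)
        if not data:
            # End of stream: stop without making any request.
            break
        index += 1
        put_segment(index, data)
    return index


uploaded = []
count = upload_segments(io.BytesIO(b"a" * 10), 4,
                        lambda i, d: uploaded.append((i, len(d))))
print(count, uploaded)  # prints: 3 [(1, 4), (2, 4), (3, 2)]
```

This sketch buffers a whole segment in memory, which would be costly at 200 MB per segment; a real implementation would more likely peek at a small amount of data before starting each PUT.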

I'm interested to know what folks think about this approach.

Kairat Kushaev (kkushaev) wrote :

I tested the fix above with both Rados and Swift. It solves the problem. Thanks!

Changed in glance:
assignee: Kairat Kushaev (kkushaev) → nobody
Kairat Kushaev (kkushaev) wrote :

I think we also need to file a bug for the radosgw developers, because radosgw is not fully compatible with the Swift API.

wangxiyuan (wangxiyuan) wrote :

Sorry for the delayed reply; it is because of the time difference.

I just reproduced the bug yesterday in my env.

Changed in glance:
status: New → In Progress
Changed in glance:
importance: Undecided → High
Hugo Kou (tonytkdk) wrote :

This affects Swift users who have Global Cluster enabled with R/W affinity turned on.
The Glance server gets a 404 from Swift if 1/2 objects remain in the local handoff locations, waiting for the replicator to replicate them to the remote region.

Once Glance receives 404 Not Found while deleting the last 0-byte chunk (segment), it returns a 500 error to the client, whether that is the OpenStack CLI or the glance CLI.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/glance_store 0.9.2

This issue was fixed in the openstack/glance_store 0.9.2 release.
