image import copy-image will start multiple importing threads due to race condition
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Glance | Fix Released | Critical | Dan Smith |
Bug Description
I'm filing this bug a little prematurely because Abhi and I didn't get a chance to fully discuss it. However, looking at the code and the behavior I'm seeing due to another bug (1884587), I feel rather confident.
Especially in a situation where glance is running on multiple control plane nodes (i.e. any real-world situation), I believe there is a race condition whereby two closely-timed requests to copy an image to a store will result in two copy operations in glance proceeding in parallel. I believe this to be the case due to a common "test-and-set that isn't atomic" error.
In the API layer, glance checks that an import copy-to-store operation isn't already in progress here:
And if that passes, it proceeds to set up the task as a thread here:
which may start running immediately or sometime in the future. Once running, that code updates a property on the image to indicate that the task is running here:
Between those two events, if another API user makes the same request, glance will not realize that a thread is already running to complete the initial task and will start another. In a situation where a user spawns a thousand new instances to a thousand compute nodes in a single operation where the image needs copying first, it's highly plausible to have _many_ duplicate glance operations going, impacting write performance on the rbd cluster at the very least.
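The "test-and-set that isn't atomic" window described above can be sketched in a few lines of Python (illustrative only, not Glance code; the property name is real but the handler and image dict are simplified stand-ins). A barrier forces both requests to pass the API-layer check before either task thread records that it is running, which is exactly the interleaving the bug describes:

```python
import threading

image = {"os_glance_importing_to_stores": ""}  # simulated image property
started = []
barrier = threading.Barrier(2)  # models two closely-timed API requests

def handle_copy_request(request_id, store):
    # API-layer check: is a copy to this store already in progress? (the "test")
    if store not in image["os_glance_importing_to_stores"]:
        # Both requests reach this point before either one writes the property.
        barrier.wait()
        # Task thread finally records that it is running (the "set")
        image["os_glance_importing_to_stores"] = store
        started.append(request_id)

t1 = threading.Thread(target=handle_copy_request, args=(1, "slow"))
t2 = threading.Thread(target=handle_copy_request, args=(2, "slow"))
t1.start(); t2.start(); t1.join(); t2.join()
print(len(started))  # 2: duplicate import threads despite the check
```

Because the check and the update are separated in time, the second request never observes the first one's claim, and both threads proceed.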
As evidence that this can happen, we see an abnormally extended race window because of the aforementioned bug (1884587) where we fail to update the property that indicates the task is running. In a test we see a large number of them get started, followed by a cascade of failures when they fail to update that image property, implying that many such threads are running. If this situation is allowed to happen when the property does *not* fail to update, I believe we would end up with glance copying the image to the destination in multiple threads simultaneously. That is much harder to simulate in practice in a development environment, but the other bug makes it happen every time since we never update the image property to prevent it and thus the window is long.
Abhi also brought up the case where if this race occurs on the same node, the second attempt *may* actually start copying the partial image in the staging directory to the destination, finish early, and then mark the image as "copied to $store" such that nova will attempt to use the partial image immediately, resulting in a corrupted disk and various levels of failure after that. Note that it's not clear if that's really possible or not, but I'm putting it here so the glance gurus can validate.
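The same-node hazard above hinges on the fact that a bare existence check on the staging file cannot distinguish a complete download from a partial one. A minimal sketch (assumed, simplified; the staging path layout here is illustrative, not Glance's actual code):

```python
import os
import tempfile

staging = tempfile.mkdtemp()
path = os.path.join(staging, "image-1")

# 1st operation is mid-download: only part of the data is on disk.
with open(path, "wb") as f:
    f.write(b"x" * 100)  # partial: the full image would be 1000 bytes

# 2nd operation's naive check: "already staged, so skip the download".
already_staged = os.path.exists(path)  # True, but the file is incomplete
size = os.path.getsize(path)
print(already_staged, size)  # True 100 -> would copy a truncated image
```

If the second operation then imports this truncated file and marks the store as populated, a consumer like nova would read corrupt data, matching the failure mode hypothesized above.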
The use of the os_glance_
Changed in glance:
assignee: nobody → Abhishek Kekane (abhishek-kekane)
importance: Undecided → Critical
Changed in glance:
status: New → In Progress
summary:
- image import copy-to-store will start multiple importing threads due to race condition
+ image import copy-image will start multiple importing threads due to race condition
Changed in glance:
assignee: Abhishek Kekane (abhishek-kekane) → Dan Smith (danms)
There is definitely a race condition problem here; I ran two different scenarios and below are the observations:
Note: Image size 3 GB
Available stores: ceph:rbd, slow:file, fast:file
Scenario 1: Copy same image in two different stores using two different commands (may be 2 different users running this operation)
Steps to reproduce:
1. Create image in store fast
$ glance image-create-via-import --container-format ami --disk-format ami --name copy-scenario-1 --file <image-file> --store fast
2. Ensure that image is active
$ glance image-show <image-id-from-step-1> | grep status
3. Copy image in store slow
$ glance image-import <image-id-from-step-1> --import-method copy-image --stores slow
This will send an immediate 20X response to the user and 'os_glance_importing_to_stores' will show 'slow'
$ glance image-show <image-id-from-step-1> | grep os_glance_importing_to_stores
4. Copy image in ceph (Task 4d6e443e-4829-4e5d-9016-71a39fbcef5e)
$ glance image-import <image-id-from-step-1> --import-method copy-image --stores ceph
This will send an immediate 20X response to the user and now 'os_glance_importing_to_stores' will overwrite 'slow' with 'ceph'
Observations with the help of g-api logs: https://etherpad.opendev.org/p/glance-copy-to-store-race-scenario-logs
Step 3 Task Id is eefbd9c8-be47-4ba5-b0e2-9b44407d3234
Step 4 Task Id is 4d6e443e-4829-4e5d-9016-71a39fbcef5e
1st copy operation (copy image in slow store) will start copying file into staging area.
2nd copy operation (copy image in ceph store) will exit this step as image is already present in staging area.
2nd operation will start importing image in ceph (rbd) backend.
For 1st operation, after importing image in slow store, while removing it from 'os_glance_importing_to_stores' it will log below debug message:
Store slow not found in property os_glance_importing_to_stores
At this moment image data is imported completely in the slow store and it will delete the staging data from the staging store (line #412), but location metadata is not updated yet.
Moments after, the import task of the 2nd operation (copying image data to the ceph store) will also complete (line #899), and this operation will fail while deleting image data from the staging area (line #920).
This will trigger the revert task for the 2nd operation, and image data will be deleted from the ceph store only.
Image data will remain in the 'slow' store but it no longer shows in the image locations.
Final Output:
Original image created in step 1 remains active and available in store 'fast'
Staging area is clean
Data remains orphaned in 'slow' store (1st copy operation)
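One common remedy for this class of bug (a sketch only, not the actual Glance patch) is to make the test and the set a single atomic operation, analogous to a conditional database update (`UPDATE ... WHERE owner IS NULL`), so that only one request can claim the copy task and the loser is rejected instead of spawning a duplicate thread. The class and method names below are illustrative assumptions:

```python
import threading

class ImportClaim:
    """Illustrative atomic test-and-set: the check and the write happen
    under one lock, so exactly one caller can win the claim."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._owner = None

    def try_claim(self, task_id):
        # Test and set are inseparable here: no window for a second
        # request to sneak through between them.
        with self._mutex:
            if self._owner is None:
                self._owner = task_id
                return True
            return False

claim = ImportClaim()
results = [claim.try_claim("task-1"), claim.try_claim("task-2")]
print(results)  # [True, False]: second request is rejected, no duplicate copy
```

In a multi-node deployment the lock would have to live in shared state (e.g. an atomic image-property update in the database) rather than in process memory, since the two racing requests may land on different control plane nodes.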
Note: Image size 1.5 GB
Available stores: ceph:rbd, slow:file, fast:file, common:file, cheap:file, reliable:file
Scenario 2: User 1 imports image in all stores with allow-failure as True and user 2 tries to copy that image in other store
Steps to reproduce:
1. Create image in all stores with allow-failure as True
$ glance image-create-via-import --container-format ami --disk-format ami --name copy-scenario-1 --file <image-file> --all-stores True --allow-failure True
As allow-failure is True, the image status will be set to active as soon as it is imported to one of the stores, say ceph
At this moment, ...