Race condition: compute intermittently corrupts base images on download from glance

Bug #1350766 reported by Michael Steffens
Affects                        Status        Importance   Assigned to    Milestone
OpenStack Compute (nova)       In Progress   High         Tony Breeds
OpenStack Security Advisory    Won't Fix     Undecided    Unassigned

Bug Description

Under certain conditions, which I happen to meet often on my Icehouse single-node setup, uploaded images or snapshots fail to boot. See also https://ask.openstack.org/en/question/42804/icehouse-how-to-boot-a-snapshot-from-a-running-instance/

Reason: When a QCOW2 image is first instantiated, it is

(1) downloaded as QCOW2 to /var/lib/nova/instances/_base/IMAGEID.part
(2) converted to a RAW base image /var/lib/nova/instances/_base/IMAGEID.converted using qemu-img

Step (1) is performed in nova/image/glance.py, GlanceImageService.download, using buffered IO, which does not guarantee that the resulting data has been written to disk when the file is closed. Consequently, the source image file may not be completely written when the qemu-img sub-process starts reading it in step (2). Whether the result is good or bad depends on the speed of the download, the file size, and how quickly qemu-img can digest its input.

Proposed fix: enforce an fsync on the output file object before returning from download. Patch attached.
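
For illustration, here is a minimal sketch of what the fix amounts to (the function and variable names are illustrative, not the exact code in the attached patch):

    import os

    def write_image(image_chunks, dst):
        # dst is the open file object for the target, e.g.
        # /var/lib/nova/instances/_base/IMAGEID.part
        for chunk in image_chunks:
            dst.write(chunk)
        # The proposed fix: flush Python's userspace buffer, then fsync the
        # file descriptor, so qemu-img never reads a partially written file.
        dst.flush()
        os.fsync(dst.fileno())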

Security considerations:

 * Due to the race involving resources shared between users and tenants (compute node network and filesystem IO), a failure can be triggered across tenants, implying a risk of DoS.

 * To make things worse -- with the default setting of not cleaning the image cache -- any corrupted image will remain in the cache until it is replaced by a fresh upload under a new image ID. Affected snapshots remain unusable forever, until exported and re-imported manually under better conditions.

 * Base image corruptions here are not detected and cannot be caught. Theoretically (a bit esoteric, quite unlikely, but not impossible), an attacker might modulate resource usage to precisely craft an incompletely written image that boots and runs, but has its access control information stripped.

Revision history for this message
Michael Steffens (michael-steffens-b) wrote :
summary: - Race condition: compute intermittanty corrupts base images on download
+ Race condition: compute intermittently corrupts base images on download
from glance
description: updated
tags: added: compute
tags: added: libvirt security
description: updated
information type: Public → Public Security
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Thanks for the report. The OSSA task is set to incomplete, pending additional security details from nova-coresec.

What is the likelihood of triggering this race in production?

Changed in ossa:
status: New → Incomplete
Revision history for this message
Michael Steffens (michael-steffens-b) wrote :

That is really tough to guess. I don't know of any reason why a production environment would be less susceptible in principle.

During my tests almost all QCOW2 instantiations and launches of snapshots failed until I applied the fsync fix. Notable exceptions:

 * The cirros and ubuntu original images, instantiated right after OpenStack setup with no other load on the system at all.
 * Windows (due to its size! qemu-img can consume the head of the disk file without catching up, while the nova download is still far away writing its tail).

The failures of the others varied, from no boot disk being found at all to failures during boot. Thus, I wouldn't be too surprised if even a certain fraction of running instances in production got clipped unnoticed, simply because the lost chunks are not read that early, or not at all.

How big that fraction is depends on too many boundary conditions.

What's bothering me most with respect to security is the failure's stickiness. Once a base image is broken on a compute node, it takes careful intervention to keep it from being promoted into all subsequent instantiations, including those of other users and tenants.

Revision history for this message
Michael Steffens (michael-steffens-b) wrote :

To answer your question more precisely: the race is always triggered, in production as well. Whether the right runner wins, and whether you notice the damage if not, depends.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Thank you for the additional details.

There is a clear risk of compute node DoS through base image corruption. This seems to impact releases up to Havana. Can @nova-coresec have a look at this report, please?

Revision history for this message
Thierry Carrez (ttx) wrote :

This feels like a corruption bug. I'm not sure an "attacker" would act differently from a normal user here, so I'm not sure it really qualifies as a vulnerability.

If the user had to do something special to trigger corruption, I would change my mind, but I think most setups do not hit this condition (otherwise this bug would have surfaced earlier) and normal usage triggers the exact same issue as an attack.

It's definitely a bug, though, and it should definitely be fixed. The question is: should we go through the delays of private security bugfixing, or fix it ASAP?

Revision history for this message
Michael Steffens (michael-steffens-b) wrote :

A vulnerability exploited by normal user behavior (such as putting load on a system), and that can be used to cause corruption across different users' instances, concerns me even more than something that needs special actions. On the other hand, I agree that modulating the load in a manner that causes a specific corruption (such as selectively dropping chunks) would require very sophisticated actions.

Nevertheless, yesterday, after a regular Ubuntu nova-compute update reverted my local fix back to the defective behavior, I observed a new variant of corruption: a new snapshot booted fine, but then exposed filesystem errors. After redoing the whole exercise with the same image once the fsync patch was reapplied, everything was fine.

I wouldn't be surprised if such issues already surface in production now and then (less frequently than in my environment, though), but are then blamed on guest OS issues instead. Let me illustrate:

This is how it looks to the end user: take a snapshot, launch it, it fails. Launch the same snapshot again, it fails the same way. Looks like the snapshot itself is defective, doesn't it? The most likely suspicion: the filesystem was in an inconsistent state when the snapshot was taken. So let's do a new snapshot. And indeed, that either works, or fails consistently in a different way than the first.

Who wouldn't conclude that it's the guest OS or the way the snapshot is done (nothing OpenStack could do anything about) that is at fault, rather than the image being corrupted after download from glance, and then cached?

Is there anything I can provide to get this ticket out of the incomplete and unassigned state?

Revision history for this message
Jeremy Stanley (fungi) wrote :

If you want to propose a patch to nova addressing the issue, you can assign the nova bugtask to yourself; otherwise it will need to wait for the nova bug supervisors to triage it and an interested developer to pick it up and start work on a solution.

The vulnerability management team will probably end up removing/invalidating the security advisory task and switching this to a normal public bug, but that really shouldn't affect the nova triage and development work on it.

Revision history for this message
Michael Steffens (michael-steffens-b) wrote :

The patch was proposed at the very beginning; see https://bugs.launchpad.net/nova/+bug/1350766/+attachment/4166489/+files/nova-glance.patch

It copies a snippet of utils code from swift, in order not to introduce a cross-service dependency. The actual fix is then a one-liner: fsync the image file before returning from download.

The modification itself is quite self-contained. Whether the style is acceptable needs review, but it should be low-hanging fruit.
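
For reference, the helper is along these lines (a sketch modeled on the swift-style fsync utility, not the verbatim patch):

    import fcntl
    import os

    def fsync(fd):
        # Force buffered writes for fd down to stable storage. Platforms
        # that provide F_FULLFSYNC (e.g. OS X) get the stronger variant,
        # everything else falls back to plain os.fsync().
        if hasattr(fcntl, 'F_FULLFSYNC'):
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
        else:
            os.fsync(fd)

    # The one-liner at the end of GlanceImageService.download is then roughly:
    #     data.flush()
    #     fsync(data.fileno())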

melanie witt (melwitt)
Changed in nova:
importance: Undecided → High
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/119866

Changed in nova:
assignee: nobody → Davanum Srinivas (DIMS) (dims-v)
status: Triaged → In Progress
Revision history for this message
Jeremy Stanley (fungi) wrote :

I've marked the OSSA task as "won't fix" to indicate this issue isn't one for which the project vulnerability management team would publish a coordinated security advisory, as the conditions by which it is triggered do not seem to be under the direct control of a malicious actor, but rather a matter of volume and statistical happenstance. This does definitely still sound like an annoying bug, however, and one which the Nova developers will hopefully address in a timely manner.

Changed in ossa:
status: Incomplete → Won't Fix
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Davanum Srinivas (dims) (<email address hidden>) on branch: master
Review: https://review.openstack.org/119866
Reason: Looks like there are major issues with this approach. need to find another way

Revision history for this message
Michael Steffens (michael-steffens-b) wrote :

Filed a bug report for qemu-img: https://bugs.launchpad.net/qemu/+bug/1368815

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/123957

Changed in nova:
assignee: Davanum Srinivas (DIMS) (dims-v) → Tony Breeds (o-tony)