Clean steps don't actually run (CVE-2015-7514)

Bug #1517277 reported by Jim Rollenhagen
Affects                      Status        Importance  Assigned to      Milestone
Ironic                       Fix Released  Critical    aeva black
Ironic (Liberty series)      Fix Released  Critical    Jim Rollenhagen
OpenStack Security Advisory  Won't Fix     Undecided   Unassigned

Bug Description

In playing with the latest code, it seems that Ironic just drops all the clean steps it gets from the agent.

Since devstack drops the priority on erase_devices so it doesn't run, I added a clean step to the agent, and ran through cleaning.

stack@jim-devstack:~/ironic-python-agent$ git diff
diff --git a/ironic_python_agent/hardware.py b/ironic_python_agent/hardware.py
index d755f8c..7590959 100644
--- a/ironic_python_agent/hardware.py
+++ b/ironic_python_agent/hardware.py
@@ -230,6 +230,9 @@ class HardwareManager(object):
         """
         raise errors.IncompatibleHardwareMethodError

+    def test_devstack(self, node, ports):
+        return 'hi devstack!'
+
     def erase_devices(self, node, ports):
         """Erase any device that holds user data.

@@ -305,6 +308,13 @@ class HardwareManager(object):
                 'interface': 'deploy',
                 'reboot_requested': False,
                 'abortable': True
+            },
+            {
+                'step': 'test_devstack',
+                'priority': 20,
+                'interface': 'deploy',
+                'reboot_requested': False,
+                'abortable': True
             }
         ]

This results in the following logs, indicating that Ironic never attempted to run the new clean step:

2015-11-18 01:19:09.669 DEBUG oslo_concurrency.lockutils [req-9b9df105-4fdb-4461-8d07-7298f48ac261 None None] Lock "conductor_worker_spawn" released by "ironic.conductor.manager._spawn_worker" :: held 0.000s from (pid=27761) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:265
2015-11-18 01:19:09.670 DEBUG ironic.drivers.modules.agent_base_vendor [-] Heartbeat from 795a5722-22ad-4537-b17c-fcea5f7dc125, last heartbeat at None. from (pid=27761) heartbeat /opt/stack/ironic/ironic/drivers/modules/agent_base_vendor.py:316
2015-11-18 01:19:09.681 DEBUG ironic.drivers.modules.agent_base_vendor [-] Node 795a5722-22ad-4537-b17c-fcea5f7dc125 just booted to start cleaning. from (pid=27761) heartbeat /opt/stack/ironic/ironic/drivers/modules/agent_base_vendor.py:356
2015-11-18 01:19:09.684 DEBUG ironic.drivers.modules.agent_client [-] Executing agent command clean.get_clean_steps for node 795a5722-22ad-4537-b17c-fcea5f7dc125 from (pid=27761) _command /opt/stack/ironic/ironic/drivers/modules/agent_client.py:69
2015-11-18 01:19:09.945 DEBUG ironic.drivers.modules.agent_client [-] Agent command clean.get_clean_steps for node 795a5722-22ad-4537-b17c-fcea5f7dc125 returned result {u'clean_steps': {u'GenericHardwareManager': [{u'priority': 10, u'interface': u'deploy', u'step': u'erase_devices', u'abortable': True, u'reboot_requested': False}, {u'priority': 20, u'interface': u'deploy', u'step': u'test_devstack', u'abortable': True, u'reboot_requested': False}]}, u'hardware_manager_version': {u'generic_hardware_manager': u'1.0'}}, error None, HTTP status code 200 from (pid=27761) _command /opt/stack/ironic/ironic/drivers/modules/agent_client.py:93
2015-11-18 01:19:09.965 DEBUG ironic.drivers.modules.agent_base_vendor [-] Sending RPC to conductor to resume cleaning for node 795a5722-22ad-4537-b17c-fcea5f7dc125 from (pid=27761) notify_conductor_resume_clean /opt/stack/ironic/ironic/drivers/modules/agent_base_vendor.py:211
2015-11-18 01:19:09.973 DEBUG ironic.conductor.task_manager [-] Successfully released exclusive lock for calling vendor passthru on node 795a5722-22ad-4537-b17c-fcea5f7dc125 (lock was held 0.31 sec) from (pid=27761) release_resources /opt/stack/ironic/ironic/conductor/task_manager.py:311
2015-11-18 01:19:09.976 DEBUG ironic.conductor.manager [req-9b9df105-4fdb-4461-8d07-7298f48ac261 None None] RPC continue_node_clean called for node 795a5722-22ad-4537-b17c-fcea5f7dc125. from (pid=27761) continue_node_clean /opt/stack/ironic/ironic/conductor/manager.py:885
2015-11-18 01:19:09.978 DEBUG ironic.conductor.task_manager [req-9b9df105-4fdb-4461-8d07-7298f48ac261 None None] Attempting to get exclusive lock on node 795a5722-22ad-4537-b17c-fcea5f7dc125 (for node cleaning) from (pid=27761) __init__ /opt/stack/ironic/ironic/conductor/task_manager.py:201
2015-11-18 01:19:09.989 DEBUG ironic.conductor.task_manager [req-9b9df105-4fdb-4461-8d07-7298f48ac261 None None] Node 795a5722-22ad-4537-b17c-fcea5f7dc125 successfully reserved for node cleaning (took 0.01 seconds) from (pid=27761) reserve_node /opt/stack/ironic/ironic/conductor/task_manager.py:239
2015-11-18 01:19:09.992 DEBUG ironic.common.states [req-9b9df105-4fdb-4461-8d07-7298f48ac261 None None] Exiting old state 'clean wait' in response to event 'resume' from (pid=27761) on_exit /opt/stack/ironic/ironic/common/states.py:199
2015-11-18 01:19:09.992 DEBUG ironic.common.states [req-9b9df105-4fdb-4461-8d07-7298f48ac261 None None] Entering new state 'cleaning' in response to event 'resume' from (pid=27761) on_enter /opt/stack/ironic/ironic/common/states.py:205
2015-11-18 01:19:09.999 DEBUG oslo_concurrency.lockutils [req-9b9df105-4fdb-4461-8d07-7298f48ac261 None None] Lock "conductor_worker_spawn" acquired by "ironic.conductor.manager._spawn_worker" :: waited 0.000s from (pid=27761) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:253
2015-11-18 01:19:10.001 DEBUG oslo_concurrency.lockutils [req-9b9df105-4fdb-4461-8d07-7298f48ac261 None None] Lock "conductor_worker_spawn" released by "ironic.conductor.manager._spawn_worker" :: held 0.001s from (pid=27761) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:265
2015-11-18 01:19:10.001 INFO ironic.conductor.manager [-] Executing cleaning on node 795a5722-22ad-4537-b17c-fcea5f7dc125, remaining steps: []

Tracking this down further, it appears that at first boot this conditional is true: https://github.com/openstack/ironic/blob/7ec5a06c3ca55c97a84c3659f7d206be998877b6/ironic/conductor/manager.py#L848-L849, causing that method to return an empty list of clean steps.
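
In sketch form, the broken first-boot path looks like this (a simplified stand-in for _get_node_next_clean_steps, not Ironic's actual code):

# Simplified stand-in for the buggy _get_node_next_clean_steps path.
def get_next_clean_steps_buggy(node):
    if not node.clean_step:
        # First boot: no clean step has run yet, so this early return
        # silently discards every step the agent just reported.
        return []
    next_steps = node.driver_internal_info.get('clean_steps', [])
    # Resume case: trim off the finished step and everything before it.
    ind = next_steps.index(node.clean_step)
    return next_steps[ind + 1:]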

This was released in 4.2.0, so it will also need a backport.

CVE References

CVE-2015-7514

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

Need to figure out how to validate this in the gate so it doesn't happen again. Add the same dummy clean step and check logs to see if it ran?
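
One possible shape for such a check, as a sketch only (the conductor log path and marker string here are assumptions, not an existing gate job):

# Hypothetical post-cleaning gate check: fail the job if the dummy clean
# step never executed. The log location and marker string are assumed.
import sys

CONDUCTOR_LOG = '/opt/stack/logs/ir-cond.log'   # assumed devstack log path
MARKER = "u'step': u'test_devstack'"            # step dict as the conductor logs it

def dummy_step_ran(log_path=CONDUCTOR_LOG):
    with open(log_path) as f:
        return any('Executing' in line and MARKER in line for line in f)

if __name__ == '__main__':
    sys.exit(0 if dummy_step_ran() else 1)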

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

Suggested fix, need to test in more detail and add unit tests:

stack@jim-devstack:~/ironic$ git diff
diff --git a/ironic/conductor/manager.py b/ironic/conductor/manager.py
index 4e639c9..76b1065 100644
--- a/ironic/conductor/manager.py
+++ b/ironic/conductor/manager.py
@@ -845,10 +845,11 @@ class ConductorManager(periodic_task.PeriodicTasks):

         """
         node = task.node
+        next_steps = node.driver_internal_info.get('clean_steps', [])
         if not node.clean_step:
-            return []
+            # first time through, return all steps
+            return next_steps

-        next_steps = node.driver_internal_info.get('clean_steps', [])
         try:
             # Trim off the last clean step (now finished) and
             # all previous steps
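
A minimal unit-test sketch for this behaviour might look like the following; FakeNode and the standalone helper are illustrative stand-ins for Ironic's ConductorManager plumbing, not its actual test code:

# Sketch only: a standalone copy of the fixed lookup logic plus tests
# pinning the first-pass behaviour. Not Ironic's real test suite.
import unittest


class FakeNode(object):
    """Stand-in for an Ironic node object."""
    def __init__(self, clean_step, clean_steps):
        self.clean_step = clean_step
        self.driver_internal_info = {'clean_steps': clean_steps}


def get_next_clean_steps(node):
    """Standalone version of the fixed lookup logic."""
    next_steps = node.driver_internal_info.get('clean_steps', [])
    if not node.clean_step:
        # first time through, return all steps
        return next_steps
    # Trim off the last clean step (now finished) and all previous steps.
    ind = next_steps.index(node.clean_step)
    return next_steps[ind + 1:]


class TestNextCleanSteps(unittest.TestCase):
    steps = [{'step': 'test_devstack', 'priority': 20},
             {'step': 'erase_devices', 'priority': 10}]

    def test_first_pass_returns_all_steps(self):
        node = FakeNode(clean_step=None, clean_steps=self.steps)
        self.assertEqual(self.steps, get_next_clean_steps(node))

    def test_resume_trims_finished_steps(self):
        node = FakeNode(clean_step=self.steps[0], clean_steps=self.steps)
        self.assertEqual(self.steps[1:], get_next_clean_steps(node))


if __name__ == '__main__':
    unittest.main()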

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

Patch with unit tests included

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

The patch also applies cleanly to stable/liberty. Logs with the patch applied:

2015-11-18 03:54:36.294 INFO ironic.conductor.manager [-] Executing cleaning on node 795a5722-22ad-4537-b17c-fcea5f7dc125, remaining steps: [{u'priority': 20, u'interface': u'deploy', u'step': u'test_devstack', u'abortable': True, u'reboot_requested': False}, {u'priority': 10, u'interface': u'deploy', u'step': u'erase_devices', u'abortable': True, u'reboot_requested': False}]
2015-11-18 03:54:36.300 INFO ironic.conductor.manager [-] Executing {u'priority': 20, u'interface': u'deploy', u'step': u'test_devstack', u'abortable': True, u'reboot_requested': False} on node 795a5722-22ad-4537-b17c-fcea5f7dc125
2015-11-18 03:54:36.304 DEBUG ironic.drivers.modules.agent_client [-] Executing agent command clean.execute_clean_step for node 795a5722-22ad-4537-b17c-fcea5f7dc125 from (pid=17795) _command /opt/stack/ironic/ironic/drivers/modules/agent_client.py:69
2015-11-18 03:54:36.447 DEBUG ironic.drivers.modules.agent_client [-] Agent command clean.execute_clean_step for node 795a5722-22ad-4537-b17c-fcea5f7dc125 returned result None, error None, HTTP status code 200 from (pid=17795) _command /opt/stack/ironic/ironic/drivers/modules/agent_client.py:93
2015-11-18 03:54:36.447 INFO ironic.conductor.manager [-] Clean step {u'priority': 20, u'interface': u'deploy', u'step': u'test_devstack', u'abortable': True, u'reboot_requested': False} on node 795a5722-22ad-4537-b17c-fcea5f7dc125 being executed asynchronously, waiting for driver.
2015-11-18 03:54:36.448 DEBUG ironic.common.states [-] Exiting old state 'cleaning' in response to event 'wait' from (pid=17795) on_exit /opt/stack/ironic/ironic/common/states.py:199
2015-11-18 03:54:36.448 DEBUG ironic.common.states [-] Entering new state 'clean wait' in response to event 'wait' from (pid=17795) on_enter /opt/stack/ironic/ironic/common/states.py:205
2015-11-18 03:54:36.460 DEBUG ironic.conductor.task_manager [-] Successfully released exclusive lock for node cleaning on node 795a5722-22ad-4537-b17c-fcea5f7dc125 (lock was held 0.18 sec) from (pid=17795) release_resources /opt/stack/ironic/ironic/conductor/task_manager.py:311
2015-11-18 03:54:36.501 DEBUG oslo_messaging._drivers.amqpdriver [-] received message msg_id: d14e31a90b6d404da8ad5a066f0f1861 reply to reply_b48c2415999f4ac1845b4f36e08cb180 from (pid=17795) __call__ /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:193
2015-11-18 03:54:36.502 DEBUG ironic.conductor.manager [req-51e72e4c-69a7-4225-b21e-d8db3e85b65a None None] RPC vendor_passthru called for node 795a5722-22ad-4537-b17c-fcea5f7dc125. from (pid=17795) vendor_passthru /opt/stack/ironic/ironic/conductor/manager.py:491
2015-11-18 03:54:36.503 DEBUG ironic.conductor.task_manager [req-51e72e4c-69a7-4225-b21e-d8db3e85b65a None None] Attempting to get exclusive lock on node 795a5722-22ad-4537-b17c-fcea5f7dc125 (for calling vendor passthru) from (pid=17795) __init__ /opt/stack/ironic/ironic/conductor/task_manager.py:201
2015-11-18 03:54:36.512 DEBUG ironic.conductor.task_manager [req-51e72e4c-69a7-4225-b21e-d8db3e85b65a None None] Node 795a5722-22ad-4537-b17c-fcea5f7dc125 successfully reserved for c...

Changed in ironic:
importance: Undecided → Critical
status: New → Confirmed
no longer affects: ironic/trunk
Revision history for this message
Grant Murphy (gmurphy) wrote :

Did you mean to mark this as a security issue?

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

@Grant very much yes. The issue here is that Ironic is expected to "clean" a server when a tenant is done, however that is transparently not happening.

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

Sorry, submitted too early. This means a previous tenant's data could be left behind on the disk.

Revision history for this message
Grant Murphy (gmurphy) wrote :

Thanks for clarifying! Was just trying to figure out if an OSSA was needed here.

Ironic doesn't seem to currently have the vulnerability:managed tag in openstack/governance.
(See - http://governance.openstack.org/reference/tags/vulnerability_managed.html)

We will most likely need to look into that if a security advisory is required here. I'll add an OSSA task to the bug to assist the VMT with tracking.

Changed in ossa:
status: New → Incomplete
Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

Hrm, we've had help from the VMT in the past (before that tag was a thing), not sure why we didn't get that tag. I'll look into that at some point. Thanks Grant!

Revision history for this message
aeva black (tenbrae) wrote :

I have confirmed the issue as Jim describes in my bifrost-based lab. Even with cleaning enabled, Ironic does not erase the servers' disks during instance deletion. If, after instance deletion, I manually turn that server on, I can confirm that it still has an OS and all user data is present.

I have also confirmed the patch corrects this behaviour. After applying the patch on top of trunk and deleting an instance, Ironic logs the clean_steps appropriately and the server no longer has any data or operating system on its disks.

Revision history for this message
aeva black (tenbrae) wrote :

confirmed the fix also works on stable/liberty

Revision history for this message
Lucas Alvares Gomes (lucasagomes) wrote :

Hi,

I can confirm the issue as well. I've enabled cleaning and the clean step was just skipped.

I can also confirm that if I apply the patch and enable only the "erase_devices" clean step, it works as expected.

However, I wanted to test some different scenarios with in-band and out-of-band clean steps, and after applying the patch, cleaning seems to get stuck when more than one clean step is enabled. I created a very simple clean step that does nothing but create a file under "/tmp", in the RAID interface of the agent_ssh driver (that interface is enabled by default for that driver). When I run cleaning, I can see it being executed; once it moves to the new step, Ironic executes it (I can see the file under /tmp), but Ironic never finishes the clean process. Here's the current output of node-show and the file created [1].

And here's the diff of @Jim's patch applied plus the new in-band clean step [2].

[1] http://fpaste.org/292697/48015718/ (password: ironic123)

[2] http://fpaste.org/292695/01541214/ (password: ironic123)

@Deva, @Jim, can you guys confirm if the patch works when there's more than one clean step enabled?

Revision history for this message
Lucas Alvares Gomes (lucasagomes) wrote :

More info... it turns out this was my mistake. Returning states.CLEANWAIT from that out-of-band clean step was what caused it to hang.

So I ran it again with that return removed, and everything went fine.

Thanks @Jim for the fix.
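
For reference, the mistake looks roughly like this (CLEANWAIT is stubbed in so the snippet stands alone; the step itself is illustrative, not the actual code from [2]):

# Illustrative out-of-band clean step showing the hang described above.
CLEANWAIT = 'clean wait'  # stub for ironic.common.states.CLEANWAIT


def create_configuration(task):
    """Toy clean step: just drop a marker file under /tmp."""
    with open('/tmp/clean_step_ran', 'w') as f:
        f.write('ran\n')
    # BUG: returning CLEANWAIT marks the step as asynchronous, so the
    # conductor parks the node in 'clean wait' and waits for a completion
    # signal that never arrives. A synchronous step must return None.
    return CLEANWAIT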

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Does this bug also reproduce before 4.2.0?
Also, can you attach a properly formatted patch? (See https://security.openstack.org/#how-to-propose-and-review-a-security-patch)

While the Ironic vmt support tag is being worked on, here is an impact description draft that could be used to request a CVE. Please make sure it is accurate:

Title: Ironic does not honor clean steps
Reporter: Jim Rollenhagen (Rackspace)
Products: Ironic
Affects: >= 4.2.0, <= 4.2.1

Description:
Jim Rollenhagen from Rackspace reported a vulnerability in Ironic. To prevent user data leak, Ironic is expected to "clean" a server after use, however that is transparently not happening. Previous tenant's data may be left behind on the disk and may be available to new users. All Ironic setup are affected.

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

Uploaded a new version of the patch, rebased on master, with proper patch formatting. Also added reno-style release note.

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

Thanks Tristan, uploaded a new patch. That description looks good to me, with the exception that "setup" should be "setups".

We'd like to get this out ASAP - would y'all have the bandwidth to push through the OSSA/CVE process while the VMT support tag is being worked on? Or should I do some of that work?

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

One more thing - Brad Morgan (also from Rackspace) originally reported this bug; he should get the credit for it. :)

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Thanks for the impact description review; I've requested a CVE with it.

Can you please also attach the backport to the stable/liberty branch? (The last patch doesn't apply with "git am".)

Since Ironic doesn't have the tag yet, I would prefer it if you did the disclosure (this step: https://security.openstack.org/vmt-process.html#embargoed-disclosure).
You can reuse the following template: https://security.openstack.org/vmt-process.html#downstream-stakeholders-notification-email-private-issues

So the next step is to get someone to approve the proposed patch.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Updated impact description:

Title: Ironic does not honor clean steps
Reporter: Brad Morgan (Rackspace)
Products: Ironic
Affects: >= 4.2.0, <= 4.2.1

Description:
Grad Morgan from Rackspace reported a vulnerability in Ironic. To prevent user data leak, Ironic is expected to "clean" a server after use, however that is transparently not happening. Previous tenant's data may be left behind on the disk and may be available to new users. All Ironic setups are affected.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Tristan's updated impact description in comment 19 looks good to me.

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

Liberty patch attached.

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

Tristan, almost good to go. s/Grad/Brad/ :)

I've pinged Deva and Lucas to review the new patches.

As an FYI, I'll do a 4.2.2 release immediately after the stable/liberty patch lands.

I'm happy to send the email, if I'm given a list of addresses to send it to. I'd prefer if the VMT or deva sent the email as I'm sure my GPG keys are not as well-trusted. Here's a draft with a proposed disclosure date:

This is an advance warning of a vulnerability discovered in OpenStack,
to give you, as downstream stakeholders, a chance to coordinate the
release of fixes and reduce the vulnerability window. Please treat the
following information as confidential until the proposed public
disclosure date.

Title: Ironic does not honor clean steps
Reporter: Brad Morgan (Rackspace)
Products: Ironic
Affects: >= 4.2.0, <= 4.2.1

Description:
Grad Morgan from Rackspace reported a vulnerability in Ironic. To prevent user data leak, Ironic is expected to "clean" a server after use, however that is transparently not happening. Previous tenant's data may be left behind on the disk and may be available to new users. All Ironic setups are affected.

Proposed patch:
See attached patches. Unless a flaw is discovered in them, these patches
will be merged to the stable/liberty and master branches on the public disclosure date.

CVE: $CVE

Proposed public disclosure date/time:
December 1, 2015, 1500UTC
Please do not make the issue public (or release public patches) before
this coordinated embargo date.

summary: - Clean steps don't actually run
+ Clean steps don't actually run (CVE-2015-7514)
Revision history for this message
aeva black (tenbrae) wrote :

Patches look good to me. I've confirmed they apply cleanly on master and stable/liberty.

Aside from the latest comment still indicating that "Grad Morgan" found the issue (instead of Brad), this looks fine to my non-VMT-experienced eyes.

I'm happy to sign and send the email, but I, too, do not know who all the downstream stakeholders are.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Thanks Devananda for sending the notification.
The public disclosure date is:
December 3, 2015, 1500UTC

Changed in ossa:
status: Incomplete → Fix Committed
information type: Private Security → Public
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/253001

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic (stable/liberty)

Change abandoned by Jim Rollenhagen (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/252994
Reason: I screwed up change-id, new patch is here https://review.openstack.org/#/c/253001/

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Master patch is: https://review.openstack.org/#/c/252993/

Advisory is out, I removed the ossa task now.

Thanks everyone!

Changed in ossa:
status: Fix Committed → Won't Fix
Changed in ironic:
assignee: nobody → Devananda van der Veen (devananda)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (stable/liberty)

Reviewed: https://review.openstack.org/253001
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=6eb970b71cb6ae629b733ced84917d9db5afc78a
Submitter: Jenkins
Branch: stable/liberty

commit 6eb970b71cb6ae629b733ced84917d9db5afc78a
Author: Jim Rollenhagen <email address hidden>
Date: Mon Nov 23 22:40:19 2015 +0000

    Fix bug where clean steps do not run

    A bug was introduced during Liberty where Ironic transparently
    ignores all clean steps and finishes cleaning. This is caused
    by _get_node_next_clean_steps returning an empty list when
    cleaning has just started. Fix this method to return the full
    list of clean steps when cleaning begins.

    This may leave previous tenants' data on disk and available to future
    tenants. Deployers should apply this patch (or upgrade to a new release
    with this patch) ASAP.

    Closes-Bug: #1517277
    (cherry picked from commit 1c700e6d62ad299e3fc9023e30b98d51408e49e1)

    Change-Id: If815f81d7e668244f0d434d4e2933e8f41946107

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/252993
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=bf518cd5407296e32e26686fc6e99be8f71e5731
Submitter: Jenkins
Branch: master

commit bf518cd5407296e32e26686fc6e99be8f71e5731
Author: Jim Rollenhagen <email address hidden>
Date: Mon Nov 23 22:40:19 2015 +0000

    Fix bug where clean steps do not run

    A bug was introduced during Liberty where Ironic transparently
    ignores all clean steps and finishes cleaning. This is caused
    by _get_node_next_clean_steps returning an empty list when
    cleaning has just started. Fix this method to return the full
    list of clean steps when cleaning begins.

    This may leave previous tenants' data on disk and available to future
    tenants. Deployers should apply this patch (or upgrade to a new release
    with this patch) ASAP.

    Depends-On: Id15cf6cc49122b08e557e44871b31a8c0d20b55d

    Change-Id: If815f81d7e668244f0d434d4e2933e8f41946107
    Closes-Bug: #1517277

Changed in ironic:
status: In Progress → Fix Committed
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/ironic 4.2.2

This issue was fixed in the openstack/ironic 4.2.2 release.

Changed in ironic:
status: Fix Committed → Fix Released
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/ironic 4.3.0

This issue was fixed in the openstack/ironic 4.3.0 release.
