soft reboot of instance does not ensure iptables rules are present

Bug #1316822 reported by Marc Heckmann on 2014-05-06
This bug affects 2 people
Affects                   Status        Importance  Assigned to
OpenStack Compute (nova)  Won't Fix     Undecided   Unassigned
OpenStack Security Notes  Fix Released  High        Doug Chivers

Bug Description

The iptables rules needed to implement instance security group rules get inserted by the "_create_domain_and_network" function in nova/virt/libvirt/driver.py

This function is called by the following functions: _hard_reboot, resume and spawn (also in a couple of migration related functions).

Doing "nova reboot <instance_id>" only does a soft reboot (_soft_reboot) and assumes that the rules are already present and therefore does not check or try to add them.

If the instance is stopped (nova stop <instance_id>) and nova-compute is restarted (for example, for maintenance or because of a problem), the iptables rules are removed, as the output of iptables -S shows.
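
A minimal sketch of how the missing rules could be spotted in the output of iptables -S. The chain name prefix "nova-compute-inst-" and the sample output are assumptions made for illustration, as is the helper name chains_for_instance; adjust the prefix to whatever the chains are actually called on the compute host.

```python
SAMPLE_OUTPUT = """\
-P INPUT ACCEPT
-N nova-compute-inst-42
-N nova-compute-sg-fallback
-A nova-compute-local -d 10.0.0.3/32 -j nova-compute-inst-42
-A nova-compute-inst-42 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A nova-compute-inst-42 -j nova-compute-sg-fallback
-A nova-compute-sg-fallback -j DROP
"""


def chains_for_instance(iptables_s_output, instance_id,
                        prefix="nova-compute-inst-"):
    """Return the rule lines that mention the per-instance chain.

    The prefix is an assumed naming convention, not a guaranteed one.
    """
    chain = f"{prefix}{instance_id}"
    # Compare whole tokens so that instance 4 does not match instance 42.
    return [line for line in iptables_s_output.splitlines()
            if chain in line.split()]


protected = bool(chains_for_instance(SAMPLE_OUTPUT, 42))
missing = not chains_for_instance(SAMPLE_OUTPUT, 7)
```

On a live host the text would come from running iptables -S as root, for example through subprocess.check_output(["iptables", "-S"], text=True).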

If the instance is started via nova reboot <instance_id> the rule is NOT reapplied until a service nova-compute restart is issued. I have reports that this may affect "nova start <instance_id>" as well.
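
The sequence above can be modeled with a toy class. This is a simplified sketch, not nova's code: ToyDriver, its rules and running sets, and every method body are hypothetical stand-ins that only mirror which paths reach _create_domain_and_network.

```python
class ToyDriver:
    """Hypothetical stand-in for the libvirt driver; not nova's real code."""

    def __init__(self):
        self.rules = set()    # instance ids with iptables rules applied
        self.running = set()  # instance ids whose domain is running

    def _create_domain_and_network(self, instance_id):
        # The only place in this model where rules get (re)applied.
        self.rules.add(instance_id)
        self.running.add(instance_id)

    def spawn(self, instance_id):
        self._create_domain_and_network(instance_id)

    def _hard_reboot(self, instance_id):
        self.running.discard(instance_id)
        self._create_domain_and_network(instance_id)

    def _soft_reboot(self, instance_id):
        # Assumes the rules are already present; never checks or adds them.
        self.running.add(instance_id)

    def stop(self, instance_id):
        self.running.discard(instance_id)

    def restart_compute_service(self):
        # Models the observed loss of rules for instances that are stopped.
        self.rules &= self.running


driver = ToyDriver()
driver.spawn("vm-1")
driver.stop("vm-1")
driver.restart_compute_service()
driver._soft_reboot("vm-1")

# The instance is up again, but nothing filters its traffic.
unprotected = "vm-1" in driver.running and "vm-1" not in driver.rules
```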

Depending on whether the cloud is public facing, this opens up a potentially huge security vulnerability, as an instance can be powered on without being protected by any security group rules (not even the sg-fallback rule). This is unbeknownst to the instance owner or cloud operators unless they specifically monitor for this situation.

If the code detects that the domain is not running, it should not do a soft reboot/start. It should either error out or fall back to a resume (start) or hard reboot.
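
A rough sketch of that behavior, under the assumption that the caller already knows whether the domain is running. The names safe_reboot, DomainNotRunning, and fall_back are hypothetical and do not exist in nova; the two callables stand in for the driver's soft and hard reboot paths.

```python
class DomainNotRunning(Exception):
    """Raised when a soft reboot is requested for a domain that is not running."""


def safe_reboot(domain_is_running, soft_reboot, hard_reboot, fall_back=True):
    """Soft reboot only a running domain; otherwise fall back or refuse."""
    if domain_is_running:
        soft_reboot()
        return "soft"
    if fall_back:
        # The hard reboot path is the one that re-creates the network rules.
        hard_reboot()
        return "hard"
    raise DomainNotRunning("refusing to soft reboot a stopped domain")


calls = []
result = safe_reboot(
    domain_is_running=False,
    soft_reboot=lambda: calls.append("soft"),
    hard_reboot=lambda: calls.append("hard"),
)
```

With fall_back=False the same call raises DomainNotRunning instead, which corresponds to the "error out" option.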

information type: Private Security → Public Security
Grant Murphy (gmurphy) wrote :

Added OSSA bug task, set to incomplete until confirmed by core developer. Even then I suspect we might issue an OSSN instead of an OSSA for this.

Thoughts?

Changed in ossa:
status: New → Incomplete
Tracy Jones (tjones-i) on 2014-05-07
tags: added: network
Thierry Carrez (ttx) wrote :

nova coresec: please comment -- OSSA vs. OSSN all depends on how likely (and triggerable) the chain of events is here.

Thierry Carrez (ttx) wrote :

Directly subscribing nova-coresec to elicit a response.

Andrew Laski (alaski) wrote :

The most glaring issue from my perspective is that iptables rules are lost upon a nova-compute restart. That has the broadest impact and fixing that would address this. Beyond that it would be nice to have a check that security group rules are in place anytime an instance is started, whether it's reboot or start. Additionally I question whether reboot should be a valid action for a stopped instance, but that's outside the scope of this.

The likelihood is difficult to comment on because it depends on how often a deployer restarts their nova-compute services. That's not something that should be occurring frequently, so I would classify this as rather unlikely. But it is triggerable over a long enough timeframe. As a user, if I stopped my instance and waited long enough, it would almost certainly trigger this at some point. But it should be noted that this is a security hole that a user can open on their own instance, or a deployer could inadvertently open on a user, but it can't be triggered in a targeted manner upon another user's instance.

Marc Heckmann (marc-w-heckmann) wrote :

Andrew's analysis is correct. The only point that I would like to add is the case where all VMs are shut down if the Hypervisor itself is rebooted for maintenance or due to a crash.

I also question the validity of allowing the use of reboot to start a stopped instance.

When the nova compute service or the host is restarted, will instances be put back online with a soft reboot (and thus without iptables rules)? Even in case of a crash?

That seems like a practical attack vector.

Do we know when this was introduced?

Marc Heckmann (marc-w-heckmann) wrote :

No, they will not be put back online with reboot. The problem is that the host will come back up with all the VMs in the shutdown state and the problem arises when the operator uses "reboot" to bring the nodes back online.

In Icehouse, if the operator/user uses "start" instead of "reboot", things should be ok. I haven't tested it, but my quick review of the Icehouse code shows that the "start_instance" API call will call "power_on" in the driver. In Icehouse, "power_on" calls "_hard_reboot" which does the right thing in terms of setting up the security groups.

In Grizzly this is not the case.

The "reboot" call does a "_soft_reboot" and even in Icehouse, this does not set up the network security group rules.

In short, the problem is with the "reboot" call. It too should probably tear down and bring the rules back up. Like Andrew, I also question the validity of even allowing the use of "reboot" on a server in the shutdown state, but that would be a behavioral change.

Thierry Carrez (ttx) wrote :

The chain of events is a bit unpredictable, but I'm leaning towards issuing an advisory here, unless the change ends up not being backportable (for example, if we decide the proper way to fix this is to disable a previously-supported behavior).

Thierry Carrez (ttx) on 2014-05-30
Changed in ossa:
status: Incomplete → Confirmed
Thierry Carrez (ttx) on 2014-06-09
Changed in ossa:
importance: Undecided → Medium
Thierry Carrez (ttx) wrote :

The proposed "related fix" is not backportable. alaski said:
<alaski> fixing reboot would be a better fix, and should be backportable

So as far as the OSSA goes, let's wait a bit for a better patch

Abhishek Chanda (abhishek-i) wrote :

This is not backportable because it depends on one of the fixes for blueprint [1]. That fix introduces a set of checks before executing an action on a VM. This will be backportable if this check is introduced in individual drivers as I was trying to do in an earlier patch [3]. Please let me know if I should go back to that patch.

[1] https://blueprints.launchpad.net/nova/+spec/recover-stuck-state
[2] https://github.com/openstack/nova/commit/cc0be157d005c5588fe5db779fc30fefbf22b44d
[3] https://review.openstack.org/#/c/101021/2/nova/virt/libvirt/driver.py

Jeremy Stanley (fungi) wrote :

This also might be a duplicate of bug 1043886 (nearly two years old now, but looks strikingly similar)?

Yes, it does look related to that bug.

I just checked in Icehouse and it seems that as mentioned above, there is code to prevent soft rebooting a VM in the SHUTOFF state.

My attempt at trying gave me the following error:

"ERROR: Cannot 'reboot' while instance is in vm_state stopped (HTTP 409) (Request-ID: req-9fbb9089-50a2-44b3-8a15-0083fbf67d3c)"
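
For illustration, a state guard of this kind could look roughly like the sketch below. It is not the actual nova code: check_vm_state, InstanceInvalidState as defined here, the dict-based instance, and the allowed set {"active"} are all assumptions chosen to reproduce the shape of the error above.

```python
import functools


class InstanceInvalidState(Exception):
    """Hypothetical stand-in for the conflict the API reports as HTTP 409."""


def check_vm_state(allowed):
    """Reject the decorated action unless the instance is in an allowed state."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(instance, *args, **kwargs):
            state = instance["vm_state"]
            if state not in allowed:
                raise InstanceInvalidState(
                    f"Cannot '{func.__name__}' while instance is in "
                    f"vm_state {state}")
            return func(instance, *args, **kwargs)
        return wrapper
    return decorator


@check_vm_state(allowed={"active"})  # assumed set of states, for illustration
def reboot(instance):
    return "rebooting " + instance["uuid"]


try:
    reboot({"uuid": "vm-1", "vm_state": "stopped"})
except InstanceInvalidState as exc:
    message = str(exc)
```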

Furthermore, since the "start" method in Icehouse does the right thing, Icehouse is NOT affected by this bug.

This only affects older releases, Grizzly in particular.

Jeremy Stanley (fungi) wrote :

Can anyone confirm whether Havana is affected? If not, we wouldn't issue a security advisory (since Grizzly is months out of security support at this point).

Changed in ossa:
status: Confirmed → Incomplete
importance: Medium → Undecided

I'm not able to test on Havana right now, but I don't see the code in Havana that prevents a soft reboot (https://github.com/openstack/nova/commit/2392313f562ba6a90ed1ec3fbc507862043fa44f) if an instance is not currently running.

So, yes, I strongly suspect that Havana is affected.

-m

Jeremy Stanley (fungi) wrote :

https://review.openstack.org/51130 is pretty tiny... perhaps it could be backported to stable/havana without much trouble?

Jeremy Stanley (fungi) wrote :

After discussing with Andrew and Thierry, I'm convinced that the potential behavior change introduced by a backport of that mitigating commit, when weighed against the amount of social engineering needed to exploit this in Havana, means this bug is probably better just documented as a known behavior.

Removed the advisory task and tagged security in case the OSSG has any interest in documenting this.

tags: added: security
information type: Public Security → Public
no longer affects: ossa
Changed in ossn:
assignee: nobody → Doug Chivers (doug-chivers)
status: New → In Progress
Doug Chivers (doug-chivers) wrote :

Does this impact systems using Neutron, or only systems using nova networking?

Doug Chivers (doug-chivers) wrote :

Answering my own question: no, this only impacts nova networking.

Changed in ossn:
importance: Undecided → High
Bryan D. Payne (bdpayne) wrote :

Sorry that I'm a little late to the party here. I was just made aware of this issue today.

Bottom line is that I strongly believe that this should be an OSSA. Security controls are expected to be in place and they are not. That is pretty clear cut to me. The fact that this takes an uncommon path to happen shouldn't mitigate that. I see that the discussion went back and forth above. And perhaps it's too late to do anything. But, if not, I would encourage reopening an OSSA on this issue. Thanks!

Jeremy Stanley (fungi) wrote :

I think the counterargument is that you shouldn't be able to "reboot" an instance which is in a down state, and safety checks were added in Icehouse to prevent exactly that. The issue arises if you're running Havana or earlier and don't realize you shouldn't reboot a down instance, in which case it gets brought up with no filtering (because reboot assumes it was already running and doesn't reapply the rules). So essentially if you do something you're not supposed to do, you can leave instances vulnerable. This requires a mistake on the part of an inexperienced operator, or a fairly significant amount of social engineering on the part of an attacker to convince the operator to make such an error, and has since been hardened in subsequent Nova releases anyway.

Jeremy Stanley (fungi) wrote :

Also, to issue an OSSA, we'd need a stable backport of the enforcement which was added in Icehouse (a behavioral change so far deemed unsafe for introduction into a stable branch) or some other hotfix which is less impacting on existing behaviors in Havana.

Change abandoned by Abhishek Chanda (<email address hidden>) on branch: master
Review: https://review.openstack.org/101021

Sean Dague (sdague) on 2014-09-12
Changed in nova:
status: New → Won't Fix
Nathan Kinder (nkinder) wrote :

This was published as OSSN-0022:

  https://wiki.openstack.org/wiki/OSSN/OSSN-0022

Changed in ossn:
status: In Progress → Fix Released