nova-compute consuming 100% of cpu after rebuilding with invalid data parameters

Bug #1801733 reported by Wallace Cardoso
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
==============
The 'conductor' API method 'rebuild_instance' does not validate the parameter 'rebuild_instance/args/instance/nova_object.data/flavor/nova_object.data/vcpus'. When it is set to an invalid number of vcpus in the flavor of the instance, the compute component consumes 100% of the cpu indefinitely and the instance never changes state from rebuild to active (or error). In addition, new requests to the compute component are not processed, so the node is effectively out of service until it is restarted. This bug could possibly be used to mount a denial-of-service attack.

Steps to reproduce
=====================
1) create an instance with the flavor (VCPUS: 1, MEM: 64MB, STORAGE: 0GB) and the cirros image 0.3.4;
2) rebuild the instance with an alternative cirros image 0.4.0;
2.1) intercept the message to 'conductor' api (ComputeTaskAPI) for the method 'rebuild_instance', and change the parameter 'rebuild_instance/args/instance/nova_object.data/flavor/nova_object.data/vcpus' to 10000000000000000000001;
3) rebuild the instance again with its original image (cirros 0.3.4);
4) shelve the instance;
5) delete the instance;
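
To make the path notation in step 2.1 concrete, here is a minimal sketch of how it maps onto a nested message body. The `message` dict is a hypothetical, heavily trimmed stand-in for the real RPC payload, and `resolve` is an illustrative helper, not part of nova or oslo.messaging.

```python
# Hypothetical, trimmed stand-in for the RPC message body (assumption:
# the real payload carries many more fields than shown here).
message = {
    "method": "rebuild_instance",
    "args": {
        "instance": {
            "nova_object.data": {
                "flavor": {
                    "nova_object.data": {"vcpus": 1, "memory_mb": 64, "root_gb": 0},
                },
            },
        },
    },
}

PATH = "rebuild_instance/args/instance/nova_object.data/flavor/nova_object.data/vcpus"


def resolve(msg, path):
    """Return the value addressed by a '/'-separated path.

    The first segment is treated as the RPC method name; the remaining
    segments are dictionary keys walked from the top of the message.
    """
    method, *keys = path.split("/")
    if msg["method"] != method:
        raise KeyError(f"message is not a {method!r} call")
    node = msg
    for key in keys:
        node = node[key]
    return node


vcpus = resolve(message, PATH)
```

Splitting on '/' rather than '.' matters here because the key 'nova_object.data' itself contains a dot.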

Expected result
================
Even though rebuild is not an action that takes the flavor into account, something should ensure the correctness of the other parameters. The compute node should not stop working because of an invalid parameter.
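
As an illustration of the kind of check meant here, a minimal sketch of a bounds check on the vcpus value before it is used. The function name `validate_vcpus` and the `MAX_VCPUS` ceiling are assumptions made for this example; they are not taken from nova.

```python
# Assumed upper bound for illustration only; a real deployment would
# pick a limit that matches its hypervisors.
MAX_VCPUS = 65536


def validate_vcpus(value):
    """Return value unchanged if it is a plausible vcpu count, else raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"vcpus must be an integer, got {type(value).__name__}")
    if not 1 <= value <= MAX_VCPUS:
        raise ValueError(f"vcpus must be between 1 and {MAX_VCPUS}, got {value}")
    return value
```

With such a check, the value from step 2.1 would be rejected up front and the request could fail into the error state instead of hanging.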

Actual result
================
The instance never changes from rebuild to active and stays in the rebuilding state forever, and the compute node is inoperable until the services are restarted. 'nova-compute' consumes 100% of the cpu.

Environment
==============
I used devstack/stable/queens on a fresh Ubuntu environment.

Logs & Configs
=================
Logs attached.
The fault is injected after 11:24:16.
If you search for '10000000000000000000001', you will see the line below:
Nov 5 11:24:21 localhost nova-compute[14517]: DEBUG nova.virt.hardware [None req-f97def42-9630-4165-81e5-abc0cab5c02f admin admin] Build topologies for 10000000000000000000001 vcpu(s) 65536:65536:65536 {{(pid=14517) _get_possible_cpu_topologies /opt/stack/queens/dest/nova/nova/virt/hardware.py:418}}
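
The log line suggests why the cpu stays busy. The following is a rough sketch, assuming the topology search is a brute-force enumeration over sockets, cores and threads, each capped at the smaller of the vcpu count and the 65536 limit shown in the log. That is an assumption about hardware.py, not something confirmed in this report, and both function names are made up for the example.

```python
def possible_topologies(vcpus, max_sockets, max_cores, max_threads):
    """Brute-force every (sockets, cores, threads) whose product equals vcpus."""
    found = []
    for s in range(1, min(vcpus, max_sockets) + 1):
        for c in range(1, min(vcpus, max_cores) + 1):
            for t in range(1, min(vcpus, max_threads) + 1):
                if s * c * t == vcpus:
                    found.append((s, c, t))
    return found


def search_space(vcpus, max_sockets=65536, max_cores=65536, max_threads=65536):
    """Number of combinations the brute-force loop above would visit."""
    return (min(vcpus, max_sockets)
            * min(vcpus, max_cores)
            * min(vcpus, max_threads))


small = possible_topologies(4, 4, 4, 4)          # finishes immediately
huge = search_space(10000000000000000000001)     # 65536 ** 3 combinations
```

Under that assumption the injected value forces 65536 ** 3 (about 2.8e14) iterations, and none of them can match, because the largest possible product is far smaller than the requested vcpu count.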

summary:
- compute consuming 100% of cpu after rebuilding with invalid data parameters
+ nova-compute consuming 100% of cpu after rebuilding with invalid data parameters
tags: added: fault-injection
Artom Lifshitz (notartom) wrote :

This is definitely nifty and impressive, but realistically will never be addressed. We recommend folks deploy with TLS on all internal services, including the message queue used for RPC. TLS makes this kind of in-flight RPC hacking even less of a concern in practice - because let's face it, if someone has gained enough access to modify in-flight RPC this way, setting VCPUs to a wrong value is the least of your concerns.

Changed in nova:
status: New → Won't Fix