Bug #1021340 “Race condition in network/deallocate_for_instance(...” : Bugs : OpenStack Compute (nova)

Revision history for this message

Vish Ishaya (vishvananda) wrote on 2012-07-10:

#1

Agreed, lets convert everything to call. We may be sacrificing a little performance, but It will prevent a lot of annoying race conditions like this one.

Changed in nova:
status:	New → Triaged
importance:	Undecided → High

OpenStack Infra (hudson-openstack) on 2012-07-11

Changed in nova:
assignee:	nobody → Phil Day (philip-day)
status:	Triaged → In Progress

Revision history for this message

Russell Bryant (russellb) wrote on 2012-07-11:

#2

In the future, when a bug has security implications, please mark it as a security issue so that the vulnerability management team can evaluate it and determine whether the patch should go through embargo and a coordinated release.

security vulnerability:

no → yes

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2012-07-13: Fix merged to nova (master)

#3

Reviewed: https://review.openstack.org/9630
Committed: http://github.com/openstack/nova/commit/34f9d7e974d0e09c723e0a04ed6eecd1b482e77d
Submitter: Jenkins
Branch: master

commit 34f9d7e974d0e09c723e0a04ed6eecd1b482e77d
Author: Phil Day <email address hidden>
Date: Wed Jul 11 09:26:11 2012 +0100

Convert remaining network API casts to calls

Fix for Bug 1021340

    During compute/terminate_instance networking is de-allocated by a call to
    network/api/deallocate_for_instance(), which is implemented as an rpc.cast to
    the network manager. This in turn will eventually call
    network/manager/deallocate_fixed_ip(), which in turn call the compute API to
    trigger a security group refresh, which will get the instance from the database

    However because original call to the network manager is a cast there is a chance
    that the compute manager will delete the instance record in the DB before the
    compute API (in the network manager) tries to retrieve the instance. At this
    point the security group refresh fails (leaving rules in place which are a
    security risk when the IP is reused), and potentially stopping otheraspects
    of the network deallocation from completing.

Changing this from rpc.call to rpc.cast will fix this issue.

    Aside from this specific use of a cast there are 4 other casts in the
    network API:
      add_fixed_ip_to_instance
      remove_fixed_ip_from_instance
      add_network_to_project
      release_floating_ip

and to avoid other timing issues these will also be converted to calls.

Change-Id: I5cdcc628293d3e7cf165c5ffe4883f138783f73f

Changed in nova:
status:	In Progress → Fix Committed

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2012-07-26: Fix proposed to nova (stable/essex)

#4

Fix proposed to branch: stable/essex
Review: https://review.openstack.org/10387

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2012-07-30: Fix merged to nova (stable/essex)

#5

Reviewed: https://review.openstack.org/10387
Committed: http://github.com/openstack/nova/commit/219c5ca3bed950b7f36e57e942c91d177ec788d5
Submitter: Jenkins
Branch: stable/essex

commit 219c5ca3bed950b7f36e57e942c91d177ec788d5
Author: Phil Day <email address hidden>
Date: Wed Jul 11 09:26:11 2012 +0100

Convert remaining network API casts to calls

Fix for Bug 1021340

    During compute/terminate_instance networking is de-allocated by a call to
    network/api/deallocate_for_instance(), which is implemented as an rpc.cast to
    the network manager. This in turn will eventually call
    network/manager/deallocate_fixed_ip(), which in turn call the compute API to
    trigger a security group refresh, which will get the instance from the database

    However because original call to the network manager is a cast there is a chance
    that the compute manager will delete the instance record in the DB before the
    compute API (in the network manager) tries to retrieve the instance. At this
    point the security group refresh fails (leaving rules in place which are a
    security risk when the IP is reused), and potentially stopping otheraspects
    of the network deallocation from completing.

Changing this from rpc.call to rpc.cast will fix this issue.

    Aside from this specific use of a cast there are 4 other casts in the
    network API:
      add_fixed_ip_to_instance
      remove_fixed_ip_from_instance
      add_network_to_project
      release_floating_ip

and to avoid other timing issues these will also be converted to calls.

Change-Id: I5cdcc628293d3e7cf165c5ffe4883f138783f73f
(cherry picked from commit 34f9d7e974d0e09c723e0a04ed6eecd1b482e77d)

Thierry Carrez (ttx) on 2012-08-16

Changed in nova:
milestone:	none → folsom-3
status:	Fix Committed → Fix Released

Dave Walker (davewalker) on 2012-08-24

Changed in nova (Ubuntu):
status:	New → Fix Released
Changed in nova (Ubuntu Precise):
status:	New → Confirmed

Revision history for this message

Adam Gandelman (gandelman-a) wrote on 2012-08-30: Verification report.

#6

Please find the attached test log from the Ubuntu Server Team's CI infrastructure. As part of the verification process for this bug, Nova has been deployed and configured across multiple nodes using precise-proposed as an installation source. After successful bring-up and configuration of the cluster, a number of exercises and smoke tests have be invoked to ensure the updated package did not introduce any regressions. A number of test iterations were carried out to catch any possible transient errors.

Please Note the list of installed packages at the top and bottom of the report.

For records of upstream test coverage of this update, please see the Jenkins links in the comments of the relevant upstream code-review(s):

Trunk review: https://review.openstack.org/9630
Stable review: https://review.openstack.org/10387

As per the provisional Micro Release Exception granted to this package by the Technical Board, we hope this contributes toward verification of this update.

Revision history for this message

Adam Gandelman (gandelman-a) wrote on 2012-08-30:

#7

2012.1.3+stable-20120827-4d2a4afe-0ubuntu1.log Edit (82.0 KiB, text/plain)

Test coverage log.

tags:

added: verification-done

Revision history for this message

Launchpad Janitor (janitor) wrote on 2012-09-03:

#8

Download full text (5.4 KiB)

This bug was fixed in the package nova - 2012.1.3+stable-20120827-4d2a4afe-0ubuntu1

---------------
nova (2012.1.3+stable-20120827-4d2a4afe-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot, fixes FTBFS in -proposed. (LP: #1041120)
  * Resynchronize with stable/essex (4d2a4afe):
    - [5d63601] Inappropriate exception handling on kvm live/block migration
      (LP: #917615)
    - [ae280ca] Deleted floating ips can cause instance delete to fail
      (LP: #1038266)

nova (2012.1.3+stable-20120824-86fb7362-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot. (LP: #1041120)
  * Dropped, superseded by new snapshot:
    - debian/patches/CVE-2012-3447.patch: [d9577ce]
    - debian/patches/CVE-2012-3371.patch: [25f5bd3]
    - debian/patches/CVE-2012-3360+3361.patch: [b0feaff]
  * Resynchronize with stable/essex (86fb7362):
    - [86fb736] Libvirt driver reports incorrect error when volume-detach fails
      (LP: #1029463)
    - [272b98d] nova delete lxc-instance umounts the wrong rootfs (LP: #971621)
    - [09217ab] Block storage connections are NOT restored on system reboot
      (LP: #1036902)
    - [d9577ce] CVE-2012-3361 not fully addressed (LP: #1031311)
    - [e8ef050] pycrypto is unused and the existing code is potentially insecure
      to use (LP: #1033178)
    - [3b4ac31] cannot umount guestfs (LP: #1013689)
    - [f8255f3] qpid_heartbeat setting in ineffective (LP: #1030430)
    - [413c641] Deallocation of fixed IP occurs before security group refresh
      leading to potential security issue in error / race conditions
      (LP: #1021352)
    - [219c5ca] Race condition in network/deallocate_for_instance() leads to
      security issue (LP: #1021340)
    - [f2bc403] cleanup_file_locks does not remove stale sentinel files
      (LP: #1018586)
    - [4c7d671] Deleting Flavor currently in use by instance creates error
      (LP: #994935)
    - [7e88e39] nova testsuite errors on newer versions of python-boto (e.g.
      2.5.2) (LP: #1027984)
    - [80d3026] NoMoreFloatingIps: Zero floating ips available after repeatedly
      creating and destroying instances over time (LP: #1017418)
    - [4d74631] Launching with source groups under load produces lazy load error
      (LP: #1018721)
    - [08e5128] API 'v1.1/{tenant_id}/os-hosts' does not return a list of hosts
      (LP: #1014925)
    - [801b94a] Restarting nova-compute removes ip packet filters (LP: #1027105)
    - [f6d1f55] instance live migration should create virtual_size disk image
      (LP: #977007)
    - [4b89b4f] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [6e873bc] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [7b215ed] Use default qemu-img cluster size in libvirt connection driver
    - [d3a87a2] Listing flavors with marker set returns 400 (LP: #956096)
    - [cf6a85a] nova-rootwrap hardcodes paths instead of using
      /sbin:/usr/sbin:/usr/bin:/bin (LP: #1013147)
    - [2efc87c] affinity filters don't work if scheduler_hints is None
      (LP: #1007573)
  ...

This bug was fixed in the package nova - 2012.1.3+stable-20120827-4d2a4afe-0ubuntu1

---------------
nova (2012.1.3+stable-20120827-4d2a4afe-0ubuntu1) precise-proposed; urgency=low

* New upstream snapshot, fixes FTBFS in -proposed. (LP: #1041120)
  * Resynchronize with stable/essex (4d2a4afe):
    - [5d63601] Inappropriate exception handling on kvm live/block migration
      (LP: #917615)
    - [ae280ca] Deleted floating ips can cause instance delete to fail
      (LP: #1038266)

nova (2012.1.3+stable-20120824-86fb7362-0ubuntu1) precise-proposed; urgency=low

* New upstream snapshot. (LP: #1041120)
  * Dropped, superseded by new snapshot:
    - debian/patches/CVE-2012-3447.patch: [d9577ce]
    - debian/patches/CVE-2012-3371.patch: [25f5bd3]
    - debian/patches/CVE-2012-3360+3361.patch: [b0feaff]
  * Resynchronize with stable/essex (86fb7362):
    - [86fb736] Libvirt driver reports incorrect error when volume-detach fails
      (LP: #1029463)
    - [272b98d] nova delete lxc-instance umounts the wrong rootfs (LP: #971621)
    - [09217ab] Block storage connections are NOT restored on system reboot
      (LP: #1036902)
    - [d9577ce] CVE-2012-3361 not fully addressed (LP: #1031311)
    - [e8ef050] pycrypto is unused and the existing code is potentially insecure
      to use (LP: #1033178)
    - [3b4ac31] cannot umount guestfs  (LP: #1013689)
    - [f8255f3] qpid_heartbeat setting in ineffective (LP: #1030430)
    - [413c641] Deallocation of fixed IP occurs before security group refresh
      leading to potential security issue in error / race conditions
      (LP: #1021352)
    - [219c5ca] Race condition in network/deallocate_for_instance() leads to
      security issue (LP: #1021340)
    - [f2bc403] cleanup_file_locks does not remove stale sentinel files
      (LP: #1018586)
    - [4c7d671] Deleting Flavor currently in use by instance creates error
      (LP: #994935)
    - [7e88e39] nova testsuite errors on newer versions of python-boto (e.g.
      2.5.2) (LP: #1027984)
    - [80d3026] NoMoreFloatingIps: Zero floating ips available after repeatedly
      creating and destroying instances over time (LP: #1017418)
    - [4d74631] Launching with source groups under load produces lazy load error
      (LP: #1018721)
    - [08e5128] API 'v1.1/{tenant_id}/os-hosts' does not return a list of hosts
      (LP: #1014925)
    - [801b94a] Restarting nova-compute removes ip packet filters (LP: #1027105)
    - [f6d1f55] instance live migration should create virtual_size disk image
      (LP: #977007)
    - [4b89b4f] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [6e873bc] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [7b215ed] Use default qemu-img cluster size in libvirt connection driver
    - [d3a87a2] Listing flavors with marker set returns 400 (LP: #956096)
    - [cf6a85a] nova-rootwrap hardcodes paths instead of using
      /sbin:/usr/sbin:/usr/bin:/bin (LP: #1013147)
    - [2efc87c] affinity filters don't work if scheduler_hints is None
      (LP: #1007573)
    - [48e5f46] metadata injection is broken in xen (LP: #1022036)
    - [25f5bd3] scheduler hang (DOS) possible with
      DifferentHostFilter/SameHostFilter  (LP: #1017795)
    - [1c1b858] cannot umount guestfs  (LP: #1013689)
    - [835ba4f] not able to get host total memory in xen with libvirt
      (LP: #1004298)
    - [00e5104] Call to network_get_all_by_uuids missing 'db' (LP: #986922)
    - [4c49df7] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [19631f3] [nova][volumes] Exceeding volumes quotas logs
      "VolumeSizeTooLarge" instead of "VolumeLimitExceeded"  (LP: #1020634)
    - [b0feaff] Remote arbitrary file corruption / creation flaw via injected
      files (LP: #1015531)
    - [3cb6e57] NoMoreFixedIps: Zero fixed ips available. Nova seems leaking
      them. (LP: #1014769)
    - [5d8431b] ram_allocation_ratio does not work (LP: #1016273)
    - [410060f] test_get_console_output_file requires sudo NOPASSWD
      (LP: #992805)
    - [33c2575] Stop/start a KVM instance with volumes attached produces an
      error state (LP: #1013782)
    - [6c01c01] Backport tox settings to unbreak jenkins jobs.
    - [344125f] Set defaultbranch in .gitreview to stable/essex
    - [9b789be] floating ips are not disassociated from instances on deletion
      (LP: #997763)
    - [d89c2f3] qpid timeout causing compute service to crash (LP: #999698)
    - [caae0e9] floating ips do not display in 'nova list' after association to
      instance (LP: #939122)
    - [1dc9f19] impl_qpid doesn't ACK messages (LP: #1012374)
    - [bc621bc] Restarting nova-network removes ip packet filters
      (LP: #1000853)
    - [7870157] Add caching to openstack.common.cfg
    - [27133ee] Firewall rules from nova-compute are not refreshed after host
      reboot (LP: #985162)
    - [3ee026e] Source group based security group rule without protocol and port
      causes failures (LP: #1010514)
    - [f0a9f47] [SRU] dns_domains table mysql charset is 'latin1'. Should be
      'utf8' (LP: #993663)
    - [cc8fd97] euca-describe-keypair NonExistent returns 200 (LP: #1006664)
    - [9f9e9da] Security groups fail to be set correctly if incorrect case is
      used for protocol specification (LP: #985184)
 -- Adam Gandelman <adamg@canonical.com>   Mon, 27 Aug 2012 14:50:40 -0700

Changed in nova (Ubuntu Precise):
status:	Confirmed → Fix Released

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2012-09-03: Update Released

#9

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regresssions.

Thierry Carrez (ttx) on 2012-09-27

Changed in nova:
milestone:	folsom-3 → 2012.2

OpenStack Compute (nova)

Race condition in network/deallocate_for_instance() leads to security issue

Bug Description

Related branches

CVE References

Other bug subscribers

Bug attachments

Remote bug watches

	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	High	Phil Day	OpenStack Compute (nova) 2012.2 "folsom"
Essex	Fix Released	High	Unassigned	OpenStack Compute (nova) 2012.1.2
nova (Ubuntu)	Fix Released	Undecided	Unassigned
Precise	Fix Released	Undecided	Unassigned