server_group_members quota check failure with multi-create

Bug #1780373 reported by Chen on 2018-07-06
Affects                    Importance   Assigned to
OpenStack Compute (nova)   High         Matt Riedemann
Pike                       High         Matt Riedemann
Queens                     High         Matt Riedemann

Bug Description

Circumstance:
When a single multi-create request asks for more instances in a server group than the quota allows, the request still passes the server_group_members quota check.

Actual result:
Servers successfully created.

Expected result:
A QuotaExceeded API exception is raised.

Steps to reproduce (Queens):
1. nova server-group-create sg affinity (the policy shouldn't matter)
2. Set server_group_members=2 in nova.conf (so we don't need to create many servers to test)
3. nova boot --flavor flavor --net net-name=netname --image image --max-count 3 server-name
All 3 servers are then created successfully, violating the server_group_members quota.

Chen (chenn2) on 2018-07-06
Changed in nova:
assignee: nobody → Chen (chenn2)

Fix proposed to branch: master
Review: https://review.openstack.org/580684

Changed in nova:
status: New → In Progress
melanie witt (melwitt) wrote :

First, I assume your boot request included the instance group, for example:

  nova boot --flavor flavor --net net-name=netname --image image --max-count 3 --hint group=<sg uuid> server-name

I was able to reproduce this with devstack by setting the quota to 2 and using the above command. The reason for the bug is that the resource count for server_group_members during the quota check counts instance records for a user, and we do not create instance records until much later, in conductor. So on a fresh install with a multi-create request, the quota check repeatedly counts 0 group members for the user while we add members to the instance_group_members table in the API database.
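
The effect described above can be illustrated with a toy model. This is a simplified sketch, not nova's code: `count_members_from_cells`, `api_db_group_members`, and `cell_db_instances` are hypothetical names standing in for the quota count, the instance_group_members table, and the cell instance records.

```python
def count_members_from_cells(group_members, cell_instances):
    """Count only group members that already exist as instance records."""
    return sum(1 for uuid in group_members if uuid in cell_instances)


def multi_create(requested, limit):
    """Toy model of the per-instance quota check in a multi-create request."""
    api_db_group_members = []   # stands in for instance_group_members (API DB)
    cell_db_instances = set()   # instance records; still empty at API time
    counts_seen = []
    for i in range(requested):
        # The member is recorded in the API database right away...
        api_db_group_members.append(f"uuid-{i}")
        # ...but the count only looks at instance records, which do not exist yet.
        count = count_members_from_cells(api_db_group_members, cell_db_instances)
        counts_seen.append(count)
        if count + 1 > limit:
            raise RuntimeError("QuotaExceeded")
    return counts_seen, len(api_db_group_members)


counts, members = multi_create(requested=3, limit=2)
print(counts, members)
```

In this model every check sees a count of 0, so three members are added even though the limit is 2.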

I added several comments on the patch review, but to summarize, it seems like we need to consider counting build requests in addition to instance records for multi-create scenarios (and maybe more) while de-duping instance uuids for the small window where a build request and instance record can co-exist for the same instance uuid, to avoid over-counting.
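
The de-duplication idea can be sketched as a set union. This is only an illustration under assumed inputs: `count_group_members` and its three parameters are hypothetical and do not correspond to functions in nova.

```python
def count_group_members(group_member_uuids, build_request_uuids, instance_uuids):
    """Count group members that exist as a build request, an instance, or both.

    A uuid present as both a build request and an instance record is
    counted once, because the union removes the duplicate.
    """
    known = set(build_request_uuids) | set(instance_uuids)
    return len(known & set(group_member_uuids))


# "b" exists as both a build request and an instance record in this example.
total = count_group_members(
    group_member_uuids=["a", "b", "c"],
    build_request_uuids=["b", "c"],
    instance_uuids=["a", "b"],
)
print(total)
```

Summing the two lists instead of taking their union would count "b" twice and report 4 members for a group of 3.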

Changed in nova:
importance: Undecided → High
tags: added: quotas

Related fix proposed to branch: master
Review: https://review.openstack.org/580755

Changed in nova:
assignee: Chen (chenn2) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem) on 2018-07-06
tags: added: api

Reviewed: https://review.openstack.org/580755
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f9874e059df50dc81803fcfdfd1045cc09624894
Submitter: Zuul
Branch: master

commit f9874e059df50dc81803fcfdfd1045cc09624894
Author: Matt Riedemann <email address hidden>
Date: Fri Jul 6 16:10:48 2018 -0400

    Add functional regressions tests for server_group_members OverQuota

    Since we started counting quotas in Pike, it is possible to bypass
    the server_group_members quota check when either creating multiple
    servers in a single request or creating one server each in multiple
    concurrent requests. This is because the server_group_members
    count is based on existing server group members in the cell database
    and those group members (instances) don't get created in a cell until
    we get to conductor and after the scheduler picks a host. In other
    words, the server_group_members quota check in the API does not account
    for build requests.

    Change-Id: Icb268ca2f792bfcefd152ba4c6aa13270d9a7900
    Related-Bug: #1780373

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/580684
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bbee9a26a5c64a1463bd9a9f82d735ec17c62d52
Submitter: Zuul
Branch: master

commit bbee9a26a5c64a1463bd9a9f82d735ec17c62d52
Author: Chen <email address hidden>
Date: Fri Jul 6 22:47:12 2018 +0800

    Fix server_group_members quota check

    For example, suppose there are 3 instances in a server group and
    the quota is 5. A multi-create of 3 more instances in this group
    (which would give it 6 members) is not rejected by the current
    quota check, which is not the expected behavior.

    This is due to the server_group_members quota check previously
    only counting group members that existed as instance records in
    cell databases and not accounting for build requests which are
    the temporary representation of the instance in the API database
    before the instance is scheduled to a cell.

    Co-Authored-By: Matt Riedemann <email address hidden>

    Change-Id: If439f4486b8fe157c436c47aa408608e639a3e15
    Closes-Bug: #1780373
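
The arithmetic in the commit message's example can be worked through with a minimal sketch. It assumes the existing count already includes build requests; `OverQuota` here is a local stand-in class and `check_server_group_members` is a hypothetical helper, not the code from the patch.

```python
class OverQuota(Exception):
    """Local stand-in for a quota error; not nova's exception class."""


def check_server_group_members(existing, requested, limit):
    """Reject a request that would push the group past its member limit."""
    total = existing + requested
    if total > limit:
        raise OverQuota(f"{total} members would exceed the limit of {limit}")
    return total


# Numbers from the commit message: 3 existing members, 3 requested, quota 5.
try:
    check_server_group_members(existing=3, requested=3, limit=5)
    rejected = False
except OverQuota:
    rejected = True
print(rejected)
```

With 3 existing members and 3 requested, the total of 6 exceeds the limit of 5, so the request is rejected; a request for 2 more would be accepted.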
