[Ocata] resource tracker does not validate placement allocation

Bug #1861067 reported by Yang Youseok
This bug affects 1 person
Affects                   Status     Importance  Assigned to  Milestone
OpenStack Compute (nova)  Invalid    Undecided   Unassigned
Ocata                     Confirmed  Low         Unassigned

Bug Description

On stable/ocata we hit a serious scheduler problem that forced us to upgrade to a newer release. I could not find an existing report for it, so I am leaving this here for anyone who runs into the same issue later.

The problem we encountered goes like this:
- The conductor tries to schedule 2 instances onto one compute node.
- At that time the nova-compute has enough resources in compute_nodes, so the scheduler chooses that nova-compute.
- The resource tracker in nova-compute claims the resources from placement.
- Placement answers one of the requests with 409, since there were several concurrent requests.
- [BUG here] The resource tracker in nova-compute does not check the return code from placement, so the 'allocation' is increased only by the share of one instance.
- After that, compute_nodes in the scheduler was full, but the allocations in placement still had a free slot.
- [User meets weirdness here] Because a slot still appeared to be available on the scheduler side, another instance could be created on a compute node that was actually full. The result is an over-provisioned compute node.
- OOM occurs. (We run with tight memory; admins with a different resource policy would see a different side effect.)
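
The sequence above can be reproduced with a toy model. This is only a sketch and not nova code: FakePlacement, claim_ignoring_status, the 8192 MB capacity and the instance names are all hypothetical, and the conflict is forced with a flag instead of real concurrency. It shows how discarding the status code lets the locally tracked usage and the placement allocations drift apart.

```python
from dataclasses import dataclass, field


@dataclass
class FakePlacement:
    """Toy stand-in for placement (hypothetical, not the real API)."""
    capacity_mb: int
    allocations: dict = field(default_factory=dict)
    fail_next: bool = False  # set to True to simulate a concurrent-update conflict

    def put_allocation(self, consumer, memory_mb):
        if self.fail_next:
            self.fail_next = False
            return 409  # conflict: nothing is recorded
        self.allocations[consumer] = memory_mb
        return 204

    def free_mb(self):
        return self.capacity_mb - sum(self.allocations.values())


def claim_ignoring_status(placement, local_used, consumer, memory_mb):
    """Mimics the reported behaviour: the status code is thrown away."""
    placement.put_allocation(consumer, memory_mb)  # return value ignored
    local_used[consumer] = memory_mb  # local usage is recorded regardless


placement = FakePlacement(capacity_mb=8192)
local_used = {}  # stands in for the usage tracked in compute_nodes

claim_ignoring_status(placement, local_used, "inst-1", 4096)
placement.fail_next = True  # the second request hits the 409
claim_ignoring_status(placement, local_used, "inst-2", 4096)

local_free = 8192 - sum(local_used.values())
placement_free = placement.free_mb()
print(local_free, placement_free)
```

In this model the local view reports no free memory while placement still reports room for one more instance, which is the mismatch described in the steps above.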

I found that this is already fixed from Pike onward, where the scheduler makes the allocation first and nova-compute just checks compute_nodes. For me, though, the root cause was very hard to find and required a lot of digging through the scheduler's history, so I hope this report helps anyone who meets the same problem.

I am not sure it should be fixed, since Ocata is quite old. If it is, we could change the function _allocate_for_instance() in nova/scheduler/client/report.py to catch the 409 conflict, similar to the function that was added later (put_allocations()).
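
As a rough illustration of that idea, the status code could be checked and the request retried on conflict. This is a minimal sketch under assumptions, not the actual nova implementation: put_with_retry, AllocationFailed and the put callable are hypothetical names, and the real put_allocations() may behave differently.

```python
import time


class AllocationFailed(Exception):
    """Raised when the allocation could not be written (hypothetical)."""


def put_with_retry(put, attempts=3, delay=0.0):
    """Call put() until it succeeds; retry only on 409 conflicts.

    put is any callable that performs the request and returns the HTTP
    status code. Returns the number of attempts that were needed.
    """
    for attempt in range(1, attempts + 1):
        status = put()
        if status in (200, 204):
            return attempt
        if status != 409:
            # Anything other than a conflict is not retried.
            raise AllocationFailed(f"unexpected status {status}")
        time.sleep(delay)  # give the concurrent writer time to finish
    raise AllocationFailed(f"still conflicting after {attempts} attempts")


# Usage: the first call conflicts, the second one succeeds.
responses = iter([409, 204])
attempts_used = put_with_retry(lambda: next(responses))
print(attempts_used)
```

The important part is that a failure surfaces as an exception, so the caller can abort the claim instead of recording local usage that placement never accepted.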

Thanks.

Yang Youseok (ileixe)
description: updated
Balazs Gibizer (balazs-gibizer) wrote :

I checked, and on stable/ocata nova does ignore the error from placement in the reported case, so I marked this as Confirmed for Ocata. The same issue does not apply to newer branches. Ocata is in extended maintenance, so the official project does not focus on fixing issues there, but you can still persuade your OpenStack vendor to fix the problem upstream.

Changed in nova:
status: New → Confirmed
status: Confirmed → Invalid
tags: added: placement scheduler