v3 extensions API inherently racy with respect to instances
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Compute (nova) | Fix Released | Critical | Russell Bryant | |
Bug Description
The pci extension for the v3 API does a second instance lookup against the database to fetch instance objects. When you are doing something like a list_* operation on instances, this means we make a second trip to the database that is distinct from the first lookup in the request handling. If an instance is deleted between the initial lookup and the extension hook running, the second lookup raises a database exception, which turns into an InstanceNotFound, and the whole list operation returns a 404 *if any instance was deleted during the request*.
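The race can be reproduced in miniature. The following is only a sketch under stated assumptions, not nova code: the in-memory `db` dict, `db_instance_get`, `list_servers`, `pci_extension_hook`, and the `pci_devices` key are all hypothetical stand-ins for the real handler, extension hook, and database layer.

```python
class InstanceNotFound(Exception):
    """Stand-in for the not-found error raised by the database layer."""


# Hypothetical in-memory "database" keyed by instance id.
db = {
    "a": {"id": "a", "pci_devices": []},
    "b": {"id": "b", "pci_devices": []},
}


def db_instance_get(instance_id):
    """Look up one instance, raising InstanceNotFound if it is gone."""
    try:
        return db[instance_id]
    except KeyError:
        raise InstanceNotFound(instance_id)


def list_servers():
    # First lookup: the core handler keeps only a small reference per instance.
    return [{"id": instance_id} for instance_id in sorted(db)]


def pci_extension_hook(servers):
    # Second lookup: the extension goes back to the database for each server.
    for server in servers:
        instance = db_instance_get(server["id"])
        server["pci_devices"] = instance["pci_devices"]


servers = list_servers()
del db["b"]  # an instance is deleted between the two lookups

try:
    pci_extension_hook(servers)
    outcome = "200"
except InstanceNotFound:
    # One missing instance fails the entire list operation.
    outcome = "404"
```

In this sketch the listing fails as a whole even though instance `a` is still present, which matches the behaviour described above.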
We are managing to hit this quite frequently in tempest with our test_list_
The explosion looks like this - http://
Logstash picks up these tracebacks really easily. This kind of explosion doesn't always trigger a Tempest failure, because sometimes it happens in cleanup code, where we protect against 404s (though it probably means we are leaking a lot of resources on a normal run).
Changed in nova: | |
status: | New → Confirmed |
importance: | Undecided → Critical |
Changed in nova: | |
assignee: | nobody → Christopher Yeoh (cyeoh-0) |
Changed in nova: | |
milestone: | none → icehouse-2 |
status: | Fix Committed → Fix Released |
Changed in nova: | |
assignee: | Christopher Yeoh (cyeoh-0) → Russell Bryant (russellb) |
Changed in nova: | |
milestone: | icehouse-2 → 2014.1 |
So this affects both V2 and V3. In many cases we cache only a small reference when getting information about an instance in the server code; the extension then takes the server id and uses it to query the db for more info when the hook runs. I guess we've just been really lucky or haven't noticed it previously (perhaps some timing has changed in the tests).
I can see two ways of fixing this: cache a whole lot more information, or alternatively just fail gracefully in extensions. I prefer the latter as it's much simpler, and I don't think we need to be too concerned about omitting information provided by extensions when reporting on an instance which has just been deleted.
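A minimal sketch of the "fail gracefully" option, assuming the extension hook receives the already-built server list and a lookup callable. This is an illustration only, not the patch that was merged; `extend_servers`, `lookup`, the `db` dict, and the `pci_devices` key are hypothetical names.

```python
class InstanceNotFound(Exception):
    """Stand-in for the not-found error raised by the database layer."""


def extend_servers(servers, lookup):
    """Add extension data to each server, skipping instances that vanished."""
    for server in servers:
        try:
            instance = lookup(server["id"])
        except InstanceNotFound:
            # The instance was deleted after the first lookup. Leave the
            # entry without extension data instead of failing the request.
            continue
        server["pci_devices"] = instance["pci_devices"]
    return servers


# Hypothetical data: instance "b" was deleted after the list was built.
db = {"a": {"pci_devices": ["0000:00:01.0"]}}


def lookup(instance_id):
    try:
        return db[instance_id]
    except KeyError:
        raise InstanceNotFound(instance_id)


result = extend_servers([{"id": "a"}, {"id": "b"}], lookup)
```

Here the deleted instance still appears in the listing but without the extension's fields. That is the trade-off accepted above: extension information is omitted for an instance that has just been deleted.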