Fix race condition on configuration-detach
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack DBaaS (Trove) | Fix Released | High | Petr Malik |
Bug Description
Configuration detach (and attach) are async calls from the task manager.
A Trove instance is loaded from the database [in instance/
an async call is made to detach the configuration
group [in instance/
new configuration id (None) [in instance/
The current detach implementation [in taskmanager/
assumes the instance has a configuration id throughout the execution of
the async call.
This causes a failure in the task manager if the database update
happens before the async call finishes (update_db removes the
configuration id and re-saves the instance in the Trove database).
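The ordering problem can be illustrated with a small, deterministic model. This is only a sketch: the dict-based `db`, the `configurations` lookup and both function names are hypothetical stand-ins, not Trove code, and the two orderings are selected by a flag instead of real concurrency.

```python
# Hypothetical in-memory stand-ins for the Trove database (not Trove code).
db = {"inst-1": {"configuration_id": "cfg-42"}}
configurations = {"cfg-42": {"max_connections": 500}}


def detach_task(instance_id):
    """Stand-in for the async detach task.

    It reads the instance record and assumes the configuration id is
    still set when it runs.
    """
    config_id = db[instance_id]["configuration_id"]
    # Raises KeyError when config_id is None, i.e. when the record was
    # already updated before the task got to run.
    return configurations[config_id]


def api_detach(instance_id, run_task_first):
    """Stand-in for the API path: start the task and clear the id."""
    if run_task_first:
        overrides = detach_task(instance_id)        # task wins the race
        db[instance_id]["configuration_id"] = None
        return overrides
    db[instance_id]["configuration_id"] = None      # database update wins
    return detach_task(instance_id)                 # task runs too late
```

With `run_task_first=True` the task still sees the configuration id and succeeds; with `run_task_first=False` the id is already `None` and the lookup fails, which mirrors the task manager failure described above.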
There is another related issue. If the detach call fails anywhere in
the taskmanager or the guestagent, the Trove database still gets updated.
The instance then appears to have (or not to have) a configuration group
attached even though that is not the case.
Notes:
The configuration id is needed on detach because the code tries to
reset the changed values to their defaults dynamically and to do that it
needs to load the configuration group.
If it cannot reset all configuration properties dynamically it puts
the instance into the RESTART_REQUIRED state. This is however overridden
in instance/
configuration detach.
Another problem is in update_overrides [in taskmanager/
where we currently *always* apply the changed values even if some
of them require an instance restart or their default values are not
available (on detach). This potentially leaves the instance in a state
with only some changes applied and others waiting for a restart.
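The check that the paragraph above says is missing could look roughly like the following. It is a sketch under assumptions: the function name, its parameters and the example property names are hypothetical, and the set of dynamically changeable properties is treated as a given input.

```python
def can_apply_dynamically(changed, dynamic_keys, defaults=None):
    """Return True only if every changed value can be applied at once.

    changed      -- names of the properties being changed
    dynamic_keys -- names that can be changed without a restart (assumed input)
    defaults     -- on detach, the available default values; None on attach
    """
    # Any property that needs a restart rules out a dynamic update.
    if not all(name in dynamic_keys for name in changed):
        return False
    # On detach, every property also needs a known default to reset to.
    if defaults is not None and not all(name in defaults for name in changed):
        return False
    return True


# Illustrative values only; not a statement about any real datastore.
dynamic = {"prop_a", "prop_b"}
attach_ok = can_apply_dynamically(["prop_a"], dynamic)
attach_mixed = can_apply_dynamically(["prop_a", "prop_c"], dynamic)
detach_no_default = can_apply_dynamically(["prop_a"], dynamic, defaults={})
```

When the function returns False the caller would apply nothing dynamically and mark the instance RESTART_REQUIRED, so the instance never ends up with only part of the changes applied.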
Proposed fixes:
- The preferred solution is to move the current configuration casts
from the task manager to the API and make them into blocking calls
(configuration change is a quick operation).
That way we could wait for the guestagent operation to succeed
before we start updating the Trove record. If the call fails we
leave the configuration record intact so that it reflects the real
state of the instance and the user can retry the detach.
This would also allow us to emit a meaningful error message to the
client.
- An alternative solution (quick fix) would be to remove the
code responsible for resetting the configuration values on detach
(we currently always put the instance into the RESTART_REQUIRED state
anyway). That way the configuration id would no longer be required.
This should mitigate the race condition but it does not solve the
other issue - the record in the Trove database gets updated even if
the guestagent call fails.
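The preferred solution amounts to "call the guest first, update the record only on success". A minimal sketch of that ordering follows; `GuestError`, `detach_configuration` and the dict used as the instance record are hypothetical names chosen for illustration, not the Trove API.

```python
class GuestError(Exception):
    """Hypothetical error raised when the guestagent call fails."""


def detach_configuration(record, guest_call):
    """Clear the configuration id only after the guest call succeeds.

    record     -- dict standing in for the instance row in the Trove database
    guest_call -- blocking callable that resets the overrides on the guest
    """
    config_id = record["configuration_id"]
    if config_id is None:
        return                          # nothing attached, nothing to do
    guest_call(config_id)               # blocks; raises on failure
    record["configuration_id"] = None   # reached only on success


def failing_guest(_config_id):
    raise GuestError("guest unreachable")


record = {"configuration_id": "cfg-42"}
try:
    detach_configuration(record, failing_guest)
except GuestError as exc:
    error_message = str(exc)            # could be reported to the client
```

After the failed call the record still holds `cfg-42`, so it keeps reflecting the real state of the instance and the detach can simply be retried.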
Changed in trove:
status: New → In Progress
Changed in trove:
milestone: liberty-2 → liberty-3
Changed in trove:
status: Fix Committed → Fix Released
Changed in trove:
milestone: liberty-3 → 4.0.0
Reviewed: https://review.openstack.org/198891
Committed: https://git.openstack.org/cgit/openstack/trove/commit/?id=6a6b3dae4fcbec068a6c9eafc9eaa21819b8a94e
Submitter: Jenkins
Branch: master
commit 6a6b3dae4fcbec068a6c9eafc9eaa21819b8a94e
Author: Petr Malik <email address hidden>
Date: Sun Jul 5 14:08:42 2015 -0400
Fix race conditions in config overrides tasks
* Convert configuration casts into blocking calls and move them
  from the task manager to the API. Update the Trove records
  only after a successful update on the guestagent.
* Change the way how we apply the configuration changes.
  - On attach: Apply the values dynamically only if all of them
    can be applied at once. Put the instance into
    the 'RESTART_REQUIRED' state otherwise.
  - On detach: Apply the values dynamically only if:
    a) Default values for all of them are
       available in the configuration template.
    b) All values can be applied at once.
    Put the instance into the 'RESTART_REQUIRED'
    state otherwise.
* Remove override template resolution. Pass override values in a
  Python dict (like it is for 'apply_overrides').
  The user-provided configuration values do not get resolved and
  are applied as supplied by the user - hence no need for a template.
  It also avoids the need to double-parse the overrides in
  guestagents that do not support configuration imports.
* Moved MySQL-specific value conversions (K, M, G suffixes)
  down to the MySQL guestagent.
* Update the MySQL 'update_overrides' methods to accept both
  Python dicts and rendered strings. This is for backwards
  compatibility with older task managers that send overrides as
  a string.
* Remove deprecated methods from the taskmanager.
Closes-Bug: 1468488
Change-Id: Ie125131945ad82afe6663e6c5a5d70c6e949c60b
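The two MySQL items in the commit message (suffix conversion in the guestagent, and accepting overrides as either a dict or a rendered string) might be pictured as follows. This is a sketch, not the committed code: the function names are hypothetical, the rendered string is assumed to consist of `key = value` lines with optional section headers, and the direction of the conversion (suffix to bytes, using the MySQL convention of powers of 1024) is an assumption.

```python
import re

# MySQL option values use K, M and G as multiples of 1024.
_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
_SIZE = re.compile(r"^(\d+)([KMG])$", re.IGNORECASE)


def convert_size(value):
    """Turn a value such as '16M' into an integer; leave others unchanged."""
    if isinstance(value, str):
        match = _SIZE.match(value.strip())
        if match:
            return int(match.group(1)) * _SUFFIXES[match.group(2).upper()]
    return value


def normalize_overrides(overrides):
    """Accept either a dict or a rendered 'key = value' string."""
    if isinstance(overrides, str):
        parsed = {}
        for line in overrides.splitlines():
            line = line.strip()
            # Skip blanks, comments and section headers such as [mysqld].
            if not line or line.startswith(("#", "[")):
                continue
            key, _, value = line.partition("=")
            parsed[key.strip()] = value.strip()
        overrides = parsed
    return {key: convert_size(value) for key, value in overrides.items()}


from_dict = normalize_overrides({"key_buffer_size": "16M", "max_connections": 100})
from_string = normalize_overrides("[mysqld]\nkey_buffer_size = 1K\n")
```

Normalizing both input shapes at a single entry point keeps a newer guestagent usable with an older task manager that still sends a string.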