Unable to restart c-vol services when using multi-backend due to set_voldb_empty_at_startup_indicator
Affects: Cinder
Status: New
Importance: Undecided
Assigned to: Michal Dulko
Milestone: (none)
Bug Description
When attempting to restart c-vol services in setups with multi-backend enabled, we encounter an unhandled exception from the DB that causes the service to die:
2015-02-20 23:10:10.924 10176 CRITICAL cinder [req-c867de6c-
2015-02-20 23:10:10.924 10176 TRACE cinder Traceback (most recent call last):
2015-02-20 23:10:10.924 10176 TRACE cinder File "/usr/local/
2015-02-20 23:10:10.924 10176 TRACE cinder load_entry_
2015-02-20 23:10:10.924 10176 TRACE cinder File "/opt/stack/
2015-02-20 23:10:10.924 10176 TRACE cinder binary=
2015-02-20 23:10:10.924 10176 TRACE cinder File "/opt/stack/
2015-02-20 23:10:10.924 10176 TRACE cinder service_
2015-02-20 23:10:10.924 10176 TRACE cinder File "/opt/stack/
2015-02-20 23:10:10.924 10176 TRACE cinder *args, **kwargs)
2015-02-20 23:10:10.924 10176 TRACE cinder File "/opt/stack/
2015-02-20 23:10:10.924 10176 TRACE cinder context.
2015-02-20 23:10:10.924 10176 TRACE cinder File "/opt/stack/
2015-02-20 23:10:10.924 10176 TRACE cinder None, filters=None)
2015-02-20 23:10:10.924 10176 TRACE cinder File "/opt/stack/
2015-02-20 23:10:10.924 10176 TRACE cinder filters=filters)
2015-02-20 23:10:10.924 10176 TRACE cinder File "/opt/stack/
2015-02-20 23:10:10.924 10176 TRACE cinder return f(*args, **kwargs)
2015-02-20 23:10:10.924 10176 TRACE cinder File "/opt/stack/
2015-02-20 23:10:10.924 10176 TRACE cinder return query.all()
2015-02-20 23:10:10.924 10176 TRACE cinder File "/usr/local/
2015-02-20 23:10:10.924 10176 TRACE cinder return list(self)
2015-02-20 23:10:10.924 10176 TRACE cinder File "/usr/local/
2015-02-20 23:10:10.924 10176 TRACE cinder fetch = cursor.fetchall()
2015-02-20 23:10:10.924 10176 TRACE cinder File "/usr/local/
2015-02-20 23:10:10.924 10176 TRACE cinder self.cursor, self.context)
2015-02-20 23:10:10.924 10176 TRACE cinder File "/usr/local/
2015-02-20 23:10:10.924 10176 TRACE cinder util.raise_
2015-02-20 23:10:10.924 10176 TRACE cinder File "/usr/local/
2015-02-20 23:10:10.924 10176 TRACE cinder reraise(
2015-02-20 23:10:10.924 10176 TRACE cinder File "/usr/local/
2015-02-20 23:10:10.924 10176 TRACE cinder l = self.process_
2015-02-20 23:10:10.924 10176 TRACE cinder File "/usr/local/
2015-02-20 23:10:10.924 10176 TRACE cinder self._non_result()
2015-02-20 23:10:10.924 10176 TRACE cinder File "/usr/local/
2015-02-20 23:10:10.924 10176 TRACE cinder "This result object does not return rows. "
2015-02-20 23:10:10.924 10176 TRACE cinder DBError: This result object does not return rows. It has been closed automatically.
2015-02-20 23:10:10.924 10176 TRACE cinder
It appears that this was introduced by the NFS empty-startup check added here:
Commit: 6879bd0720b2c4c
It looks like there's a timing issue where the startup of the second backend makes the call out to the DB? I'm not sure. It also seems to hit some systems but not others, and I'm not clear on the actual cause or how to reliably reproduce it.
We should see if we can figure out what's going on here. If nothing else, we could wrap the call in a try/except and return False on exception (waiting and retrying also seems to work well).
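A minimal sketch of the workaround suggested above. The function name, arguments, and defaults here are illustrative, not Cinder's actual API; the real fix would wrap whatever query set_voldb_empty_at_startup performs:

```python
import time

def is_volume_db_empty(query_fn, retries=3, delay=1.0):
    """Hypothetical wrapper: retry a transient DB failure at startup.

    query_fn is assumed to return the list of volumes. On repeated
    failure (e.g. the DBError in the traceback above), fall back to
    returning False, i.e. treat the DB as non-empty, the safe default.
    """
    for attempt in range(retries):
        try:
            return len(query_fn()) == 0
        except Exception:  # in Cinder this would be the DB error class
            if attempt < retries - 1:
                time.sleep(delay)  # wait and retry, as suggested above
    return False  # give up: assume volumes exist rather than crash
```

Returning False on exhaustion means the empty-DB optimization is simply skipped for that startup, which is harmless compared to the service dying.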
Changed in cinder:
assignee: nobody → Michal Dulko (michal-dulko-f)
To clarify: I suspect this is related to how we do multi-backend. We have two threads, and I believe they are sharing the same DB session, so this becomes a contention issue. I'm not sure, but I think it might be something along those lines.
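If the shared-session theory is right, the usual remedy is to hand each backend thread its own session. A sketch using SQLAlchemy's scoped_session, which keeps a per-thread session registry (the engine URL and worker function here are illustrative, not Cinder code):

```python
import threading

import sqlalchemy
from sqlalchemy import orm

# scoped_session returns a distinct Session per calling thread, so two
# backend threads never share one session and cannot contend on it.
engine = sqlalchemy.create_engine("sqlite://")
Session = orm.scoped_session(orm.sessionmaker(bind=engine))

def backend_worker(session_ids):
    # Each thread asks the registry for "the" session and gets its own.
    session = Session()
    session_ids.append(id(session))
    Session.remove()  # discard this thread's session when done

session_ids = []
threads = [threading.Thread(target=backend_worker, args=(session_ids,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# session_ids now holds two distinct object ids, one per thread.
```

This only demonstrates the isolation property; whether Cinder's service layer can adopt it directly depends on how the multi-backend threads obtain their sessions today.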