mysql db opportunistic unit tests timing out intermittently in the gate (bad thread switch?)
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Cinder |
Incomplete
|
High
|
Unassigned | ||
OpenStack Compute (nova) |
Incomplete
|
High
|
Unassigned | ||
OpenStack Shared File Systems Service (Manila) |
Incomplete
|
Undecided
|
Goutham Pacha Ravi | ||
oslo.db |
Incomplete
|
Undecided
|
Unassigned |
Bug Description
Seen in nova and cinder unit test runs in the gate:
http://
http://
Years ago we updated the timeout scaling factor on those tests in nova:
https:/
For bug 1216851.
Looking at a timeout/failed run for nova.tests.
nova.tests.
nova.tests.
Based on that, it's clear that we're not just missing some timeout - it looks like we're totally breaking somewhere and only being caught by the timeout.
There are a couple of errors in the logs when we see this:
b'sqlalchemy.
2018-09-19 05:35:08.697560 | ubuntu-xenial | b''
2018-09-19 05:35:08.697658 | ubuntu-xenial | b'The above exception was the direct cause of the following exception:'
2018-09-19 05:35:08.697700 | ubuntu-xenial | b''
2018-09-19 05:35:08.697772 | ubuntu-xenial | b'Traceback (most recent call last):'
2018-09-19 05:35:08.697957 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.698031 | ubuntu-xenial | b' self.db_
2018-09-19 05:35:08.698174 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.698285 | ubuntu-xenial | b' sa_migration.
2018-09-19 05:35:08.698451 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.698522 | ubuntu-xenial | b' repository, version)'
2018-09-19 05:35:08.698682 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.698766 | ubuntu-xenial | b' return _migrate(url, repository, version, upgrade=True, err=err, **opts)'
2018-09-19 05:35:08.698823 | ubuntu-xenial | b' File "<decorator-
2018-09-19 05:35:08.698951 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.698991 | ubuntu-xenial | b' return f(*a, **kw)'
2018-09-19 05:35:08.699108 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.699164 | ubuntu-xenial | b' schema.
2018-09-19 05:35:08.699315 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.706788 | ubuntu-xenial | b' change.
2018-09-19 05:35:08.707005 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.707058 | ubuntu-xenial | b' script_
2018-09-19 05:35:08.707248 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.707326 | ubuntu-xenial | b' shadow_
2018-09-19 05:35:08.707475 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.707538 | ubuntu-xenial | b' column.
2018-09-19 05:35:08.707684 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.707769 | ubuntu-xenial | b' engine.
2018-09-19 05:35:08.707922 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.708036 | ubuntu-xenial | b' conn._run_
2018-09-19 05:35:08.708167 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.708217 | ubuntu-xenial | b' **kwargs)
2018-09-19 05:35:08.708343 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.708410 | ubuntu-xenial | b' ret = super(AlterTabl
2018-09-19 05:35:08.708534 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.708588 | ubuntu-xenial | b' return meth(obj, **kw)'
2018-09-19 05:35:08.708716 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.708752 | ubuntu-xenial | b' self.execute()'
2018-09-19 05:35:08.708870 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.708932 | ubuntu-xenial | b' return self.connection
2018-09-19 05:35:08.709049 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.709113 | ubuntu-xenial | b' return self._execute_
2018-09-19 05:35:08.709246 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.709293 | ubuntu-xenial | b' statement, parameters'
2018-09-19 05:35:08.709420 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.709452 | ubuntu-xenial | b' context)'
2018-09-19 05:35:08.709582 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.709637 | ubuntu-xenial | b' util.raise_
2018-09-19 05:35:08.709760 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.709970 | ubuntu-xenial | b' reraise(
2018-09-19 05:35:08.710095 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.710143 | ubuntu-xenial | b' raise value.with_
2018-09-19 05:35:08.710277 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.710347 | ubuntu-xenial | b' context)'
2018-09-19 05:35:08.710523 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.710585 | ubuntu-xenial | b' cursor.
2018-09-19 05:35:08.710699 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.710744 | ubuntu-xenial | b' result = self._query(query)'
2018-09-19 05:35:08.710858 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.710896 | ubuntu-xenial | b' conn.query(q)'
2018-09-19 05:35:08.711046 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.711131 | ubuntu-xenial | b' self._affected_rows = self._read_
2018-09-19 05:35:08.711295 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.711339 | ubuntu-xenial | b' result.read()'
2018-09-19 05:35:08.711454 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.711510 | ubuntu-xenial | b' first_packet = self.connection
2018-09-19 05:35:08.711630 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.711680 | ubuntu-xenial | b' packet_header = self._read_
2018-09-19 05:35:08.711797 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.711845 | ubuntu-xenial | b' data = self._rfile.
2018-09-19 05:35:08.711911 | ubuntu-xenial | b' File "/usr/lib/
2018-09-19 05:35:08.711988 | ubuntu-xenial | b' return self._sock.
2018-09-19 05:35:08.712132 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.712208 | ubuntu-xenial | b' return self._recv_
2018-09-19 05:35:08.712333 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.712376 | ubuntu-xenial | b' self._read_
2018-09-19 05:35:08.712499 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.712553 | ubuntu-xenial | b' timeout_
2018-09-19 05:35:08.712673 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.712723 | ubuntu-xenial | b' mark_as_
2018-09-19 05:35:08.712844 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.712883 | ubuntu-xenial | b' return hub.switch()'
2018-09-19 05:35:08.713007 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.713091 | ubuntu-xenial | b' return self.greenlet.
2018-09-19 05:35:08.713213 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.713254 | ubuntu-xenial | b' self.wait(
2018-09-19 05:35:08.713367 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.713414 | ubuntu-xenial | b' presult = self.do_
2018-09-19 05:35:08.713529 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.713575 | ubuntu-xenial | b' return self.poll.
2018-09-19 05:35:08.713721 | ubuntu-xenial | b' File "/home/
2018-09-19 05:35:08.713771 | ubuntu-xenial | b' raise TimeoutException()'
2018-09-19 05:35:08.713819 | ubuntu-xenial | b'IndexError: tuple index out of range'
2018-09-19 05:35:08.713843 | ubuntu-xenial | b''
It looks like there are a few places things could be broken, since we have oslo.db, sqlalchemy-migrate, sqlalchemy, pymysql, and eventlet all involved here. Since we're appearing to hit a huge timeout, my guess is we're hitting some issue with eventlet and greenthread switches.
Changed in cinder: | |
status: | New → Confirmed |
importance: | Undecided → High |
summary: |
mysql db opportunistic unit tests timing out intermittently in the gate + (bad thread switch?) |
Changed in manila: | |
status: | New → Confirmed |
Changed in manila: | |
status: | In Progress → Incomplete |
Changed in nova: | |
status: | Confirmed → Incomplete |
Changed in oslo.db: | |
status: | New → Incomplete |
Changed in cinder: | |
status: | Confirmed → Incomplete |
The IndexError looks like the killer, and that's probably coming from one of the eventlet parts of the stacktrace where it's reading from a buffer.