Reset deployment task (and maybe some others) sometimes hangs

Bug #1529613 reported by Vitaly Kramskikh
This bug affects 1 person
Affects             Status     Importance  Assigned to               Milestone
Fuel for OpenStack  Confirmed  Medium      Fuel Python (Deprecated)
8.0.x               Won't Fix  Medium      Fuel Python (Deprecated)
Mitaka              Confirmed  Medium      Fuel Python (Deprecated)

Bug Description

Please see the logs and artifacts here:

https://ci.fuel-infra.org/job/verify-fuel-web-ui/6778/

This bug started to occur about 3 weeks ago and led to cluster reset hanging during UI functional tests (and maybe deployment too, but I'm not sure whether that was a random UI test failure, since we didn't have nailgun logs at that time). After we started collecting nailgun logs on UI test failures, it seems that deadlocks are causing this:

Possible deadlock found: Possible deadlock found while attempting to lock table: 'tasks'. Lock transition is not allowed: clusters, nodes, tasks.

Changed in fuel:
status: New → Confirmed
Alexander Kislitsky (akislitsky) wrote :

@Vitaly, there are no DB deadlocks in the console logs, only warnings from the deadlock detector. The warnings are caused by differences between the actual locking order in the code and the order allowed by the detector. This should be fixed, but it is not a High-priority bug. We have disabled the detector in production environments.

When a DB deadlock occurs, we see a ShareLock exception from PostgreSQL in the logs. Do you have an example of failed tests with a ShareLock exception?
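
To make the distinction above concrete, here is a minimal sketch of a lock-order detector of the kind described: it only warns when tables are locked in an order it does not allow, and it never blocks anything. This is not nailgun's detector. The class name `LockOrderDetector`, the `ALLOWED_ORDER` tuple, and the specific allowed order are assumptions made for illustration; only the wording of the warning is taken from the log line quoted in the bug description.

```python
import warnings

# Hypothetical allowed lock order; the real detector's configuration
# is not shown in this report.
ALLOWED_ORDER = ("clusters", "tasks", "nodes")


class LockOrderDetector:
    """Tracks tables locked in one transaction and warns on out-of-order locks."""

    def __init__(self, allowed_order=ALLOWED_ORDER):
        # Lower rank means the table must be locked earlier.
        self._rank = {table: i for i, table in enumerate(allowed_order)}
        self._held = []

    def lock(self, table):
        """Record a lock on a known table; return True if the order is allowed."""
        if table in self._held:
            return True  # already locked in this transaction
        # Every table already held must come before the new one.
        ok = all(self._rank[held] < self._rank[table] for held in self._held)
        self._held.append(table)
        if not ok:
            # A warning only: no exception is raised and nothing blocks.
            warnings.warn(
                "Possible deadlock found while attempting to lock table: "
                "'{0}'. Lock transition is not allowed: {1}.".format(
                    table, ", ".join(self._held)))
        return ok

    def reset(self):
        """Forget held locks, e.g. when the transaction ends."""
        self._held = []
```

Under the assumed order, locking `clusters`, then `nodes`, then `tasks` produces a warning on the third call, while the calling code carries on. A real PostgreSQL deadlock, by contrast, aborts one of the transactions with an error.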

Changed in fuel:
status: Confirmed → Invalid
status: Invalid → Incomplete
Vitaly Kramskikh (vkramskikh) wrote :

Oh, ok. Because of the deadlock trace spam in the logs, I had missed another issue:

Exception in thread GRANULAR_DEPLOY:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/home/jenkins/workspace/verify-fuel-web-ui/nailgun/nailgun/task/fake.py", line 207, in run
    resp_method(**msg)
  File "/home/jenkins/workspace/verify-fuel-web-ui/nailgun/nailgun/rpc/receiver.py", line 242, in deploy_resp
    fail_if_not_found=True
  File "/home/jenkins/workspace/verify-fuel-web-ui/nailgun/nailgun/objects/task.py", line 66, in get_by_uuid
    "Task with UUID={0} is not found in DB".format(uuid)
ObjectNotFound: Task with UUID=56d3d57c-87ba-4680-9042-33f90f76fa23 is not found in DB

So I removed the word "deadlock" from the bug's title.
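
The failure mode in the traceback above can be sketched as follows: a background thread looks up a task by UUID with `fail_if_not_found=True`, the task is missing, and the exception ends the thread before it can update any status. This is a simplified stand-in, not nailgun code. The in-memory `TASKS` dict, the `FakeThread` class, and the `error` attribute are assumptions for illustration, and the sketch uses Python 3 although the traceback comes from Python 2.7.

```python
import threading


class ObjectNotFound(Exception):
    """Stand-in for the exception named in the traceback."""


# Hypothetical in-memory replacement for the tasks table.
TASKS = {}


def get_by_uuid(uuid, fail_if_not_found=False):
    """Return the task for `uuid`, optionally raising if it is missing."""
    task = TASKS.get(uuid)
    if task is None and fail_if_not_found:
        raise ObjectNotFound(
            "Task with UUID={0} is not found in DB".format(uuid))
    return task


def deploy_resp(task_uuid, status):
    """Simplified response handler: update the status of an existing task."""
    task = get_by_uuid(task_uuid, fail_if_not_found=True)
    task["status"] = status


class FakeThread(threading.Thread):
    """Hypothetical fake-deployment thread that reports back once."""

    def __init__(self, task_uuid):
        super().__init__()
        self.task_uuid = task_uuid
        self.error = None

    def run(self):
        try:
            deploy_resp(self.task_uuid, "ready")
        except ObjectNotFound as exc:
            # Recorded here only so the caller can observe the failure;
            # in the traceback the exception is uncaught and the thread dies.
            self.error = exc
```

If the task row disappears while the thread is still running, the thread exits without reporting a final status. That is one plausible reading of why the UI tests see a task that never completes, but the report does not confirm it.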

summary: - Reset deployment task (and maybe some others) sometimes hang due to
- deadlock
+ Reset deployment task (and maybe some others) sometimes hangs
Changed in fuel:
status: Incomplete → Confirmed
Dmitry Pyzhov (dpyzhov)
tags: added: team-bugfix
Dmitry Pyzhov (dpyzhov) wrote :

It looks like this doesn't affect real deployments. Marking it as tech-debt. Please remove the tech-debt tag if you have a broken deployment because of this case.

tags: added: tech-debt
removed: team-bugfix
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Sergey Slipushenko (sslypushenko)
Changed in fuel:
status: Confirmed → In Progress
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 8.0 → 9.0
Changed in fuel:
assignee: Sergey Slipushenko (sslypushenko) → nobody
assignee: nobody → Fuel Python Team (fuel-python)
status: In Progress → Confirmed
Dmitry Pyzhov (dpyzhov)
tags: added: team-bugfix
removed: tech-debt
tags: added: tech-debt
removed: team-bugfix
Ihor Kalnytskyi (ikalnytskyi) wrote :

Since the bug affects only the fake threads and not real deployments, we shouldn't fix it for 8.0. Lowering the priority accordingly.

Vitaly, please raise the priority for 9.0 if it starts to occur more often.
