Workflow execution stuck in 'RUNNING' state if DB error occurs due to large outputs

Bug #1772001 reported by Hardik Jasani
This bug affects 1 person
Affects: Mistral
Status: Won't Fix
Importance: High
Assigned to: ali abdelal
Milestone: (none shown)

Bug Description

Workflow execution gets stuck in the 'RUNNING' state if a DB error occurs while updating the DB after the outputs are evaluated. Mistral retries the update 50 times (as per the DB retry policy) and still fails.

Setup details:
MySQL v5.6 with InnoDB engine

Mistral master
[engine]
execution_field_size_limit_kb = -1

Bug description:
The DB errors seen when saving large outputs (say 16MB in size) can be due to one or more of the following reasons:
1) `max_allowed_packet` is smaller than what is required to handle such a large row update,
2) Large UPDATE queries fail because the redo log is too small (the size of the update data is > 10% of the redo log size), see bug: https://bugs.mysql.com/bug.php?id=69477
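
To make the two limits above concrete, here is a minimal sketch of a pre-check. The function name and parameters are hypothetical, the 10% factor is taken from the linked MySQL bug, and the example limits (4 MiB packet, 48 MiB redo log) are illustrative values only; the real ones should be read from the server with SHOW VARIABLES. The actual statement is larger than the raw payload because of escaping and the other columns, so this is a lower bound.

```python
MIB = 1024 * 1024


def exceeded_limits(payload_bytes, max_allowed_packet, redo_log_size,
                    redo_fraction=0.10):
    """Return the names of the limits a single-row write would exceed.

    All arguments are in bytes. redo_fraction is the share of the redo
    log that one update may use (10% according to the linked MySQL bug).
    """
    exceeded = []
    if payload_bytes > max_allowed_packet:
        exceeded.append("max_allowed_packet")
    if payload_bytes > redo_fraction * redo_log_size:
        exceeded.append("redo log size")
    return exceeded


# Illustrative values only: a 16 MiB output against a 4 MiB packet
# limit and a 48 MiB redo log.
print(exceeded_limits(16 * MIB, 4 * MIB, 48 * MIB))
```

With these example values both limits are exceeded, so raising only one of them would not be enough.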

The bug on the Mistral side is improper error handling: Mistral keeps retrying the operation, assuming the error is recoverable (a deadlock or similar).
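
A minimal sketch of the behaviour this report asks for, not Mistral's actual retry code. The exception classes, `update_with_retry` and the `mark_error` callback are all hypothetical names, and the sketch assumes the caller can tell recoverable errors from permanent ones, which may not always be possible. In both cases the execution ends up in an error state instead of staying in 'RUNNING'.

```python
class RecoverableDBError(Exception):
    """Hypothetical: transient failure such as a deadlock."""


class PermanentDBError(Exception):
    """Hypothetical: failure that retrying cannot fix, e.g. an oversized row."""


def update_with_retry(update, mark_error, max_attempts=50):
    """Run update(); retry transient errors, give up at once on permanent ones.

    If the update never succeeds, mark_error(message) is called so the
    caller can move the workflow execution to 'ERROR'.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return update()
        except RecoverableDBError as exc:
            last_error = exc  # worth another attempt
        except PermanentDBError as exc:
            last_error = exc
            break  # retrying cannot help
    mark_error(str(last_error))
    return None
```

The important part is the final `mark_error` call: even when the error cannot be classified and every attempt is used up, the failure is recorded rather than dropped.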

How to reproduce:
Execute a workflow that has very large outputs.

Workflow:
---
version: "2.0"

bugs_me:
  output:
    big_dict: <% 2097152 * ["ABCD"] %>
  tasks:
    t:
      action: std.noop
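
For a sense of scale, the output of this workflow can be sized with a short sketch. It assumes the output is serialized as JSON with Python's default separators; Mistral's actual storage format may differ, and `serialized_size` is a hypothetical helper.

```python
import json


def serialized_size(obj):
    """Size in bytes of obj serialized as JSON with Python's defaults."""
    return len(json.dumps(obj).encode("utf-8"))


# Same value as the workflow's output expression.
output = {"big_dict": 2097152 * ["ABCD"]}
size = serialized_size(output)
print(size, "bytes, about", round(size / (1024 * 1024), 2), "MiB")
```

Under that assumption the output comes to roughly 16 MiB, in line with the size mentioned in the description.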

Expectation:
Workflow status should be 'ERROR'.

Hardik Jasani (hjasani) wrote :
Dougal Matthews (d0ugal)
Changed in mistral:
status: New → Confirmed
importance: Undecided → Critical
importance: Critical → High
milestone: none → rocky-2
tags: added: low-hanging-fruit
Dougal Matthews (d0ugal)
Changed in mistral:
milestone: rocky-2 → rocky-3
Hardik Jasani (hjasani)
Changed in mistral:
assignee: nobody → Hardik Jasani (hjasani)
Dougal Matthews (d0ugal)
Changed in mistral:
milestone: rocky-3 → stein-1
Dougal Matthews (d0ugal)
Changed in mistral:
milestone: stein-1 → stein-2
Changed in mistral:
milestone: stein-2 → stein-3
Changed in mistral:
assignee: Hardik Jasani (hjasani) → nobody
milestone: stein-3 → train-1
assignee: nobody → Renat Akhmerov (rakhmerov)
Changed in mistral:
assignee: Renat Akhmerov (rakhmerov) → ali (alielal)
Changed in mistral:
status: Confirmed → Won't Fix
ali abdelal (alielal) wrote :

The exception raised in this case is very generic: it is also raised in other cases in which we do want Mistral to retry.

For example, in case of a timeout.

With MySQL, the exception raised is 'MySQL server has gone away', which can happen for many reasons: an incorrect packet, a timeout, or a packet that is too big.
