Can't download images in MAAS 3.5.0 in HA mode with 3 nodes

Bug #2057979 reported by Jacopo Rota
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| MAAS | Fix Committed | High | Alexsander de Souza | |
| 3.5 | Fix Committed | High | Alexsander de Souza | |

Bug Description

In MAAS 3.5.0 running in HA mode with 3 nodes, image sync is broken.

How to reproduce: set up MAAS with 3 nodes and try to download some images.

From my reproducer:
Looking at the `sync-boot-resources` workflow, some child workflows, such as `bootresource-download:497bc88ad4e5`, are started twice, and the first run is then terminated with the error `terminated by new runID: 6cd76c58-7a56-498e-86d3-53364ddacec5`.
When this happens, the top-level `sync-boot-resources` workflow fails with:
```
{
  "type": "workflowExecutionFailedEventAttributes",
  "failure": {
    "message": "Child Workflow execution terminated",
    "cause": {
      "message": "Terminated",
      "terminatedFailureInfo": {}
    },
    "childWorkflowExecutionFailureInfo": {
      "namespace": "default",
      "workflowExecution": {
        "workflowId": "bootresource-download:497bc88ad4e5",
        "runId": "3908106b-b8bc-49ae-ae74-c57512643a7b"
      },
      "workflowType": {
        "name": "download-bootresource"
      },
      "initiatedEventId": "117",
      "startedEventId": "151",
      "retryState": "RETRY_STATE_NON_RETRYABLE_FAILURE"
    }
  },
  "retryState": "RETRY_STATE_RETRY_POLICY_NOT_SET",
  "workflowTaskCompletedEventId": "209"
}
```

In run ID `3908106b-b8bc-49ae-ae74-c57512643a7b` I see:

```
[
  "terminated by new runID: 6cd76c58-7a56-498e-86d3-53364ddacec5"
]
```

And in run ID `6cd76c58-7a56-498e-86d3-53364ddacec5` I see:
```
{
  "type": "workflowExecutionTerminatedEventAttributes",
  "reason": "by parent close policy",
  "identity": "temporal-sys-parent-close-policy-workflow"
}
```

It looks as if the child workflow is started twice; because the first run is terminated, the entire parent workflow fails.

The two child workflows have the same parent, but they are on 2 different task queues (one for each of the 2 regions that need to download the images from the region that started `sync-boot-resources`).
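The sequence seen in the run histories can be illustrated with a toy model. This is a minimal sketch in plain Python, not Temporal SDK code: the `ToyNamespace` class, its methods, and the task queue names `region-a` and `region-b` are hypothetical. It assumes that a second start request with the same workflow ID terminates the run that is still open, which is what the `terminated by new runID` message suggests.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Run:
    workflow_id: str
    task_queue: str
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "running"
    reason: str = ""


class ToyNamespace:
    """Hypothetical stand-in for a namespace: one open run per workflow ID."""

    def __init__(self):
        self.open_runs = {}  # workflow_id -> currently open Run

    def start_child(self, workflow_id, task_queue):
        new_run = Run(workflow_id, task_queue)
        old_run = self.open_runs.get(workflow_id)
        if old_run is not None:
            # Assumed behavior: the new run replaces the open one.
            old_run.status = "terminated"
            old_run.reason = f"terminated by new runID: {new_run.run_id}"
        self.open_runs[workflow_id] = new_run
        return new_run

    def close_parent(self):
        # When the parent closes, its remaining open children are terminated.
        for run in self.open_runs.values():
            run.status = "terminated"
            run.reason = "by parent close policy"
        self.open_runs.clear()


ns = ToyNamespace()
shared_id = "bootresource-download:497bc88ad4e5"
first = ns.start_child(shared_id, "region-a")   # hypothetical task queue
second = ns.start_child(shared_id, "region-b")  # same ID, other task queue

# The parent observes a terminated child and fails, closing the other child.
parent_failed = first.status == "terminated"
if parent_failed:
    ns.close_parent()

print(first.reason)
print(second.reason)
```

In this model the first run ends with `terminated by new runID: …` and the second with `by parent close policy`, matching the two run histories in the report.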

This happens because the 2 child workflows share the same workflow ID, which must be unique within a Temporal namespace unless a Workflow ID Reuse Policy specifies otherwise: https://docs.temporal.io/workflows#workflow-id
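One way to avoid the collision is to make the target region part of the child workflow ID. The sketch below only illustrates the idea and is not necessarily the fix that was committed; the helper `child_workflow_id`, the ID layout, and the region names are assumptions.

```python
def child_workflow_id(resource_key: str, region_id: str) -> str:
    """Build a per-region child workflow ID (hypothetical naming scheme)."""
    return f"bootresource-download:{region_id}:{resource_key}"


# Hypothetical identifiers for the 2 regions that must fetch the image.
regions = ["region-a", "region-b"]
ids = [child_workflow_id("497bc88ad4e5", region) for region in regions]

for workflow_id in ids:
    print(workflow_id)
```

With distinct IDs, each region's download runs as its own workflow execution, so starting the second child no longer replaces the first.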

Changed in maas:
status: Triaged → In Progress
assignee: nobody → Alexsander de Souza (alexsander-souza)
Changed in maas:
milestone: 3.5.0 → 3.6.0
Changed in maas:
status: In Progress → Fix Committed