Activity log for bug #1930293

Date Who What changed Old value New value Message
2021-05-31 15:23:38 Radosław Piliszek bug added bug
2021-05-31 15:23:45 Radosław Piliszek nominated for series kolla-ansible/xena
2021-05-31 15:23:45 Radosław Piliszek bug task added kolla-ansible/xena
2021-06-01 18:34:22 Radosław Piliszek summary multinode rabbitmq unstable kolla ansible actions multinode rabbitmq failing upgrades
2021-06-01 18:34:59 Radosław Piliszek description

Old value:

Multinode rabbitmq kolla ansible actions may fail depending on the order of stops and starts. It can be randomly wrong and cause any changes (updates, upgrades, config change) to multinode rabbitmq fail the run.

Example failure:

ara summary:
(It shows stop on 'secondary1' last, yet first to start is 'secondary2')

Stopping all rabbitmq instances but the first node secondary1 kolla_docker 0:02:38 0:00:00 SKIPPED
Stopping all rabbitmq instances but the first node secondary2 kolla_docker 0:02:38 0:00:07 CHANGED
Stopping all rabbitmq instances but the first node primary kolla_docker 0:02:38 0:00:09 CHANGED
Stopping rabbitmq on the first node secondary2 kolla_docker 0:02:48 0:00:00 SKIPPED
Stopping rabbitmq on the first node primary kolla_docker 0:02:48 0:00:00 SKIPPED
Stopping rabbitmq on the first node secondary1 kolla_docker 0:02:48 0:00:17 CHANGED
Restart rabbitmq container secondary2 include_tasks 0:03:06 0:00:00 OK
Restart rabbitmq container secondary1 include_tasks 0:03:06 0:00:00 OK
Restart rabbitmq container primary include_tasks 0:03:06 0:00:00 OK
Restart rabbitmq container secondary2 kolla_docker 0:03:06 0:00:01 CHANGED
Waiting for rabbitmq to start secondary2 command 0:03:07 0:10:06 FAILED
Restart rabbitmq container secondary1 kolla_docker 0:13:14 0:00:01 CHANGED
Waiting for rabbitmq to start secondary1 command 0:13:15 0:00:05 CHANGED
Restart rabbitmq container primary kolla_docker 0:13:21 0:00:01 CHANGED
Waiting for rabbitmq to start primary command 0:13:23 0:00:07 CHANGED

docker logs for the failing rabbitmq:
(It shows the order is the actual problem)

2021-05-31T13:48:33.608436819Z BOOT FAILED
2021-05-31T13:48:33.608444389Z ===========
2021-05-31T13:48:33.608571562Z Timeout contacting cluster nodes: [rabbit@primary,rabbit@secondary1].
2021-05-31T13:48:33.608687375Z
2021-05-31T13:48:33.608727786Z BACKGROUND
2021-05-31T13:48:33.608930872Z ==========
2021-05-31T13:48:33.608990003Z
2021-05-31T13:48:33.609201178Z This cluster node was shut down while other nodes were still running.
2021-05-31T13:48:33.609556107Z To avoid losing data, you should start the other nodes first, then
2021-05-31T13:48:33.609564438Z start this one. To force this node to start, first invoke
2021-05-31T13:48:33.609612299Z "rabbitmqctl force_boot". If you do so, any changes made on other
2021-05-31T13:48:33.609766853Z cluster nodes after this one was shut down may be lost.
2021-05-31T13:48:33.609805674Z
2021-05-31T13:48:33.609895306Z DIAGNOSTICS
2021-05-31T13:48:33.609953178Z ===========
2021-05-31T13:48:33.609981468Z
2021-05-31T13:48:33.610106611Z attempted to contact: [rabbit@primary,rabbit@secondary1]
2021-05-31T13:48:33.610173433Z
2021-05-31T13:48:33.610252235Z rabbit@primary:
2021-05-31T13:48:33.610450790Z * unable to connect to epmd (port 4369) on primary: address (cannot connect to host/port)
2021-05-31T13:48:33.610635545Z
2021-05-31T13:48:33.610760428Z rabbit@secondary1:
2021-05-31T13:48:33.610963233Z * unable to connect to epmd (port 4369) on secondary1: address (cannot connect to host/port)
2021-05-31T13:48:33.611150918Z
2021-05-31T13:48:33.611189209Z
2021-05-31T13:48:33.611298392Z Current node details:
2021-05-31T13:48:33.611434945Z * node name: rabbit@secondary2

New value:

Multinode rabbitmq upgrade may fail depending on the order of stops and starts. It can be randomly wrong and cause the run to fail.

Example failure:

ara summary:
(It shows stop on 'secondary1' last, yet first to start is 'secondary2')

Stopping all rabbitmq instances but the first node secondary1 kolla_docker 0:02:38 0:00:00 SKIPPED
Stopping all rabbitmq instances but the first node secondary2 kolla_docker 0:02:38 0:00:07 CHANGED
Stopping all rabbitmq instances but the first node primary kolla_docker 0:02:38 0:00:09 CHANGED
Stopping rabbitmq on the first node secondary2 kolla_docker 0:02:48 0:00:00 SKIPPED
Stopping rabbitmq on the first node primary kolla_docker 0:02:48 0:00:00 SKIPPED
Stopping rabbitmq on the first node secondary1 kolla_docker 0:02:48 0:00:17 CHANGED
Restart rabbitmq container secondary2 include_tasks 0:03:06 0:00:00 OK
Restart rabbitmq container secondary1 include_tasks 0:03:06 0:00:00 OK
Restart rabbitmq container primary include_tasks 0:03:06 0:00:00 OK
Restart rabbitmq container secondary2 kolla_docker 0:03:06 0:00:01 CHANGED
Waiting for rabbitmq to start secondary2 command 0:03:07 0:10:06 FAILED
Restart rabbitmq container secondary1 kolla_docker 0:13:14 0:00:01 CHANGED
Waiting for rabbitmq to start secondary1 command 0:13:15 0:00:05 CHANGED
Restart rabbitmq container primary kolla_docker 0:13:21 0:00:01 CHANGED
Waiting for rabbitmq to start primary command 0:13:23 0:00:07 CHANGED

docker logs for the failing rabbitmq:
(It shows the order is the actual problem)

2021-05-31T13:48:33.608436819Z BOOT FAILED
2021-05-31T13:48:33.608444389Z ===========
2021-05-31T13:48:33.608571562Z Timeout contacting cluster nodes: [rabbit@primary,rabbit@secondary1].
2021-05-31T13:48:33.608687375Z
2021-05-31T13:48:33.608727786Z BACKGROUND
2021-05-31T13:48:33.608930872Z ==========
2021-05-31T13:48:33.608990003Z
2021-05-31T13:48:33.609201178Z This cluster node was shut down while other nodes were still running.
2021-05-31T13:48:33.609556107Z To avoid losing data, you should start the other nodes first, then
2021-05-31T13:48:33.609564438Z start this one. To force this node to start, first invoke
2021-05-31T13:48:33.609612299Z "rabbitmqctl force_boot". If you do so, any changes made on other
2021-05-31T13:48:33.609766853Z cluster nodes after this one was shut down may be lost.
2021-05-31T13:48:33.609805674Z
2021-05-31T13:48:33.609895306Z DIAGNOSTICS
2021-05-31T13:48:33.609953178Z ===========
2021-05-31T13:48:33.609981468Z
2021-05-31T13:48:33.610106611Z attempted to contact: [rabbit@primary,rabbit@secondary1]
2021-05-31T13:48:33.610173433Z
2021-05-31T13:48:33.610252235Z rabbit@primary:
2021-05-31T13:48:33.610450790Z * unable to connect to epmd (port 4369) on primary: address (cannot connect to host/port)
2021-05-31T13:48:33.610635545Z
2021-05-31T13:48:33.610760428Z rabbit@secondary1:
2021-05-31T13:48:33.610963233Z * unable to connect to epmd (port 4369) on secondary1: address (cannot connect to host/port)
2021-05-31T13:48:33.611150918Z
2021-05-31T13:48:33.611189209Z
2021-05-31T13:48:33.611298392Z Current node details:
2021-05-31T13:48:33.611434945Z * node name: rabbit@secondary2
2021-06-01 18:35:08 Radosław Piliszek nominated for series kolla-ansible/wallaby
2021-06-01 18:35:08 Radosław Piliszek bug task added kolla-ansible/wallaby
2021-06-01 18:35:08 Radosław Piliszek nominated for series kolla-ansible/ussuri
2021-06-01 18:35:08 Radosław Piliszek bug task added kolla-ansible/ussuri
2021-06-01 18:35:08 Radosław Piliszek nominated for series kolla-ansible/train
2021-06-01 18:35:08 Radosław Piliszek bug task added kolla-ansible/train
2021-06-01 18:35:08 Radosław Piliszek nominated for series kolla-ansible/victoria
2021-06-01 18:35:08 Radosław Piliszek bug task added kolla-ansible/victoria
2021-06-01 18:35:16 Radosław Piliszek kolla-ansible/wallaby: status New Triaged
2021-06-01 18:35:19 Radosław Piliszek kolla-ansible/victoria: status New Triaged
2021-06-01 18:35:21 Radosław Piliszek kolla-ansible/ussuri: status New Triaged
2021-06-01 18:35:23 Radosław Piliszek kolla-ansible/train: status New Triaged
2021-06-01 18:35:26 Radosław Piliszek kolla-ansible/wallaby: importance Undecided High
2021-06-01 18:35:28 Radosław Piliszek kolla-ansible/victoria: importance Undecided High
2021-06-01 18:35:31 Radosław Piliszek kolla-ansible/train: importance Undecided High
2021-06-01 18:35:35 Radosław Piliszek kolla-ansible/ussuri: importance Undecided High
2021-06-01 18:35:42 Radosław Piliszek kolla-ansible/wallaby: assignee Radosław Piliszek (yoctozepto)
2021-06-01 18:35:45 Radosław Piliszek kolla-ansible/ussuri: assignee Radosław Piliszek (yoctozepto)
2021-06-01 18:35:47 Radosław Piliszek kolla-ansible/train: assignee Radosław Piliszek (yoctozepto)
2021-06-01 18:35:52 Radosław Piliszek kolla-ansible/victoria: assignee Radosław Piliszek (yoctozepto)
2021-06-02 16:01:20 Imtiaz Chowdhury bug added subscriber Imtiaz Chowdhury
2021-06-04 17:08:55 OpenStack Infra kolla-ansible: status Triaged In Progress
2021-06-08 17:53:37 OpenStack Infra kolla-ansible: status In Progress Fix Released
2021-06-08 18:08:29 OpenStack Infra kolla-ansible/wallaby: status Triaged In Progress
2021-06-08 18:14:25 OpenStack Infra kolla-ansible/victoria: status Triaged In Progress
2021-06-08 18:15:55 OpenStack Infra kolla-ansible/ussuri: status Triaged In Progress
2021-06-08 18:17:03 OpenStack Infra kolla-ansible/train: status Triaged In Progress
2021-06-09 09:17:43 OpenStack Infra kolla-ansible/ussuri: status In Progress Fix Committed
2021-06-09 09:26:43 OpenStack Infra kolla-ansible/victoria: status In Progress Fix Committed
2021-06-09 09:27:14 OpenStack Infra kolla-ansible/wallaby: status In Progress Fix Committed
2021-06-09 09:28:01 OpenStack Infra kolla-ansible/train: status In Progress Fix Committed
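
Illustration of the ordering problem from the bug description: the RabbitMQ boot message quoted above says a node that was shut down while others were still running should be started after them, so the node stopped last should be started first. The sketch below is not the fix that was merged into kolla-ansible; it only models the rule. The function names are hypothetical, and the stop order is an assumption read off the ara summary (secondary2 and primary were stopped in the same task, secondary1 in the following one).

```python
from typing import List


def safe_start_order(stop_order: List[str]) -> List[str]:
    """Return a start order in which the last-stopped node boots first."""
    return list(reversed(stop_order))


def first_start_is_safe(stop_order: List[str], start_order: List[str]) -> bool:
    """True if the first node started is the one that was stopped last."""
    if not stop_order or not start_order:
        return False
    return start_order[0] == stop_order[-1]


# Assumed stop order, taken from the ara summary in the bug description.
observed_stop = ["secondary2", "primary", "secondary1"]
# Start order from the same summary: secondary2 was restarted first and failed.
observed_start = ["secondary2", "secondary1", "primary"]

if __name__ == "__main__":
    print("observed start safe:", first_start_is_safe(observed_stop, observed_start))
    print("suggested start order:", safe_start_order(observed_stop))
```

With the observed orders, the check reports the first start as unsafe, which matches the FAILED "Waiting for rabbitmq to start" task on secondary2.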