The issue was reproduced on 2020-08-05: subcloud1 and subcloud2 were in-sync, whereas subcloud4 was out-of-sync even though the device image state shows completed.

As per Difu, he was doing 2 rounds of orchestration: the first round all passed, then the second round reported an out-of-sync on subcloud4.

As per Al, subcloud4 is out-of-sync because there are 2 applied images on the system controller, but only one is applied on subcloud4.

On the system controller, there are 2 images applied:

    system device-image-list
    +--------------------------------------+----------------+------------+------------+--------------+---------------+---------------+------+-------------+---------------+---------+---------------------------------------------------------------------+
    | uuid                                 | bitstream_type | pci_vendor | pci_device | bitstream_id | key_signature | revoke_key_id | name | description | image_version | applied | applied_labels                                                      |
    +--------------------------------------+----------------+------------+------------+--------------+---------------+---------------+------+-------------+---------------+---------+---------------------------------------------------------------------+
    | cbe0f5f7-f1d6-4b1b-8153-28951991d4ee | functional     | 8086       | 0b30       | 11           | None          | None          | None | None        | None          | True    | [{u'subcloud': u'abc'}, {u'subcloud3': u'3'}, {u'subcloud4': u'4'}] |
    | ee659fb7-d433-4937-bb5b-f213185b07b5 | functional     | 8086       | 0b30       | 2            | None          | None          | None | None        | None          | True    | [{u'subcloud': u'abc'}, {u'subcloud3': u'3'}, {u'subcloud4': u'4'}] |
    +--------------------------------------+----------------+------------+------------+--------------+---------------+---------------+------+-------------+---------------+---------+---------------------------------------------------------------------+

On subcloud4, there is only 1 image applied:

    system device-image-list
    +--------------------------------------+----------------+------------+------------+--------------+---------------+---------------+------+-------------+---------------+---------+------------------------+
    | uuid                                 | bitstream_type | pci_vendor | pci_device | bitstream_id | key_signature | revoke_key_id | name | description | image_version | applied | applied_labels         |
    +--------------------------------------+----------------+------------+------------+--------------+---------------+---------------+------+-------------+---------------+---------+------------------------+
    | cbe0f5f7-f1d6-4b1b-8153-28951991d4ee | functional     | 8086       | 0b30       | 11           | None          | None          | None | None        | None          | False   | None                   |
    | ee659fb7-d433-4937-bb5b-f213185b07b5 | functional     | 8086       | 0b30       | 2            | None          | None          | None | None        | None          | True    | [{u'subcloud4': u'4'}] |
    +--------------------------------------+----------------+------------+------------+--------------+---------------+---------------+------+-------------+---------------+---------+------------------------+

The detailed sequence of events is as follows:

    subcloud1: system host-device-label-assign controller-0 0000:b2:00.0 subcloud=abc
    subcloud2: system host-device-label-assign controller-0 0000:b2:00.0 subcloud=abc
    subcloud3: system host-device-label-assign controller-0 0000:b4:00.0 subcloud3=3
    subcloud4: system host-device-label-assign controller-0 0000:b4:00.0 subcloud4=4

    system --os-region-name SystemController device-image-upload 5gldpc_1x2x25g_20ww2.3_swap_ddr4_2xrefresh-signed-ssl-csk1.bin functional 8086 0b30 --bitstream-id 11
    system --os-region-name SystemController device-image-apply cbe0f5f7-f1d6-4b1b-8153-28951991d4ee subcloud=abc subcloud3=3 subcloud4=4
    dcmanager fw-update-strategy create --subcloud-apply-type serial
    dcmanager fw-update-strategy apply

The first round finished with all subclouds complete and in-sync.

(Re-assign the subcloud1 and subcloud2 labels, as they were removed due to LP-1890296.)

    subcloud1: system host-device-label-assign controller-0 0000:b2:00.0 subcloud=abc
    subcloud2: system host-device-label-assign controller-0 0000:b2:00.0 subcloud=abc

    system --os-region-name SystemController device-image-upload 5gldpc_1x2x25g_20ww2.3_swap_ddr4_2xrefresh-unsigned.bin functional 8086 0b30 --bitstream-id 2
    system --os-region-name SystemController device-image-apply ee659fb7-d433-4937-bb5b-f213185b07b5 subcloud=abc subcloud3=3 subcloud4=4
    dcmanager fw-update-strategy create --max-parallel-subclouds 4
    dcmanager fw-update-strategy apply

The second round finished with subcloud1 and subcloud2 in-sync, but subcloud3 and subcloud4 out-of-sync. (subcloud3 was out-of-sync due to user error: it had already been flashed with a root key, so the unsigned bin failed.)

In summary, there is a mismatch between the SystemController and the subcloud. We've basically told the SystemController "these images are both applied", but if we had run those same commands in the subcloud, only one of them would actually be applied because it would overwrite the other.

Longer term, we should look at a solution where the system controller doesn't report both images as applied. This needs further investigation, as the system controller does not have device image state records, so it does not know to replace the old image with the new image.

Shorter term, we suggest that the user "removes" the old functional image before "applying" a new one for the same labels. This can be achieved by running a "device-image-remove" before the "device-image-apply" in the second round of orchestration.
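To make the mismatch concrete, here is a small toy model in Python of the behaviour described above. It is only a sketch under stated assumptions, not the actual dcmanager audit or sysinv code: every function and variable name is hypothetical, and the sync check is assumed to be "every image the system controller reports as applied for the device's labels is also applied on the subcloud". The system controller side only accumulates applied images, while the subcloud device holds a single functional image that each new flash overwrites.

```python
# Toy model of the reported mismatch. All names are hypothetical;
# this is NOT the real dcmanager audit logic.

IMAGE_11 = "cbe0f5f7-f1d6-4b1b-8153-28951991d4ee"  # bitstream id 11
IMAGE_2 = "ee659fb7-d433-4937-bb5b-f213185b07b5"   # bitstream id 2


def controller_apply(applied, image_uuid, labels):
    """System controller: record the image as applied for the given labels.

    Assumption: with no device image state records, nothing here
    replaces an older image applied to the same labels.
    """
    applied.setdefault(image_uuid, set()).update(labels)


def controller_remove(applied, image_uuid, labels):
    """System controller: drop the labels from an applied image."""
    remaining = applied.get(image_uuid, set())
    remaining.difference_update(labels)
    if not remaining:
        applied.pop(image_uuid, None)


def subcloud_flash(device, image_uuid):
    """Subcloud: the device holds one functional image; a new one overwrites it."""
    device["functional"] = image_uuid


def in_sync(applied, device, device_labels):
    """Assumed audit rule: every controller-applied image matching the
    device's labels must also be applied on the subcloud device."""
    expected = {uuid for uuid, labels in applied.items() if labels & device_labels}
    actual = {device["functional"]} if device["functional"] else set()
    return expected <= actual


def run_rounds(remove_old_first):
    """Replay the two orchestration rounds for subcloud4; return sync state per round."""
    applied = {}                      # system controller view
    device = {"functional": None}     # subcloud4 device view
    labels = {"subcloud4=4"}
    results = []

    # Round 1: apply the signed image (bitstream id 11).
    controller_apply(applied, IMAGE_11, labels)
    subcloud_flash(device, IMAGE_11)
    results.append(in_sync(applied, device, labels))

    # Round 2: apply the second image (bitstream id 2),
    # optionally removing the old one first (the short-term suggestion).
    if remove_old_first:
        controller_remove(applied, IMAGE_11, labels)
    controller_apply(applied, IMAGE_2, labels)
    subcloud_flash(device, IMAGE_2)
    results.append(in_sync(applied, device, labels))
    return results


print(run_rounds(remove_old_first=False))  # [True, False]
print(run_rounds(remove_old_first=True))   # [True, True]
```

In this model the second round goes out-of-sync only when the old image is left applied on the system controller, which matches the reported behaviour on subcloud4 and the effect of the suggested "device-image-remove" step.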