Looking at the BMC log[1] the instances took very long to shutdown:
Instances power off time:
cf56643a-932d-468a-95f5-e46cacd743a0 :: 71.645662 seconds
3098f5ef-ceec-4b30-b54e-169a999444f7 :: 65.899861 seconds
d9ececd0-50c5-4910-8619-df004e7c68c5 :: 64.710398 seconds
5196ab60-6fb3-433b-a2c9-20ed85578577 :: 46.557931 seconds
62.203463 seconds/avg time to power off
If we compare that to a master job[2]:
32153366-01d4-44fe-af19-b3c8513241bc :: 68.145963 seconds
6301e44a-092c-4d90-956f-cbe6da0120da :: 120.570997 seconds
a2c99a5b-3b8f-427a-bd05-c1c6f7def36f :: 120.436390 seconds
36029b10-d981-4d02-83fc-8f5f8df8a351 :: 70.800235 seconds
94.98839625 seconds/avg time to power off
Now, another difference between train[1] and master[2] is that on master the BMC events occur in a different order.
In the train job we can see it is setting boot device to network, then comes the power off:
[ 3156.139144] openstackbmc[8149]: Set boot device to network
[ 3176.987111] openstackbmc[8192]: Powered off 5196ab60-6fb3-433b-a2c9-20ed85578577
In the master job[2] the "Set boot device to network" happens after the nodes has been powered off:
[ 2322.935851] openstackbmc[8281]: Powered off 32153366-01d4-44fe-af19-b3c8513241bc
[ 2391.081814] openstackbmc[8281]: Reporting power state "False" for instance 32153366-01d4-44fe-af19-b3c8513241bc
[ 2403.399431] openstackbmc[8281]: Set boot device to network
So, why does it work on master despite OVB nodes actually taking even longer to power off?
The workflow is different, the master job uses os_tripleo_baremetal_node_introspection which power off nodes prior to starting inspection in prepare_for_attempt[3] which has a very long timeout[4].
1200 seconds !!!!!! Thats 20 minutes!!
Another interesting observation is that the initial power off in OVB seem to be very slow, but subsequent power offs are much quicker at ~5 seconds!!
INITIAL for node 36029b10-d981-4d02-83fc-8f5f8df8a351: (2397.319949-2326.519714 = 70.800235)
[ 2326.519714] openstackbmc[8186]: Powered off 36029b10-d981-4d02-83fc-8f5f8df8a351
[ 2397.319949] openstackbmc[8186]: Reporting power state "False" for instance 36029b10-d981-4d02-83fc-8f5f8df8a351
SECOND for node 36029b10-d981-4d02-83fc-8f5f8df8a351: (2553.212485−2547.906511 = 5.305974)
[ 2547.906511] openstackbmc[8186]: Powered off 36029b10-d981-4d02-83fc-8f5f8df8a351
[ 2553.212485] openstackbmc[8186]: Reporting power state "False" for instance 36029b10-d981-4d02-83fc-8f5f8df8a351
INITIAL for node a2c99a5b-3b8f-427a-bd05-c1c6f7def36f: (2443.568209-2323.131819 = 120.43639)
[ 2323.131819] openstackbmc[8146]: Powered off a2c99a5b-3b8f-427a-bd05-c1c6f7def36f
[ 2443.568209] openstackbmc[8146]: Reporting power state "False" for instance a2c99a5b-3b8f-427a-bd05-c1c6f7def36f
SECOND for node a2c99a5b-3b8f-427a-bd05-c1c6f7def36f: (2609.103883-2603.517865 = 5.586018)
[ 2603.517865] openstackbmc[8146]: Powered off a2c99a5b-3b8f-427a-bd05-c1c6f7def36f
[ 2609.103883] openstackbmc[8146]: Reporting power state "False" for instance a2c99a5b-3b8f-427a-bd05-c1c6f7def36f
Conclusion:
- bump power_state_change_timeout to 180 seconds in train?
- Since initial/secondary poweroffs have such a big timedifference. It may be an idea to implement a change in openstack-virtual-baremetal that will power off the nodes when openstackbmc is initially started.
Looking at the BMC log[1] the instances took very long to shutdown:
Instances power off time: 932d-468a- 95f5-e46cacd743 a0 :: 71.645662 seconds ceec-4b30- b54e-169a999444 f7 :: 65.899861 seconds 50c5-4910- 8619-df004e7c68 c5 :: 64.710398 seconds 6fb3-433b- a2c9-20ed855785 77 :: 46.557931 seconds
cf56643a-
3098f5ef-
d9ececd0-
5196ab60-
62.203463 seconds/avg time to power off
If we compare that to a master job[2]: 01d4-44fe- af19-b3c8513241 bc :: 68.145963 seconds 092c-4d90- 956f-cbe6da0120 da :: 120.570997 seconds 3b8f-427a- bd05-c1c6f7def3 6f :: 120.436390 seconds d981-4d02- 83fc-8f5f8df8a3 51 :: 70.800235 seconds
32153366-
6301e44a-
a2c99a5b-
36029b10-
94.98839625 seconds/avg time to power off
Now, another difference between train[1] and master[2] is that on master the BMC events occur in a different order.
In the train job we can see it is setting boot device to network, then comes the power off:
[ 3156.139144] openstackbmc[8149]: Set boot device to network 6fb3-433b- a2c9-20ed855785 77
[ 3176.987111] openstackbmc[8192]: Powered off 5196ab60-
In the master job[2] the "Set boot device to network" happens after the nodes has been powered off:
[ 2322.935851] openstackbmc[8281]: Powered off 32153366- 01d4-44fe- af19-b3c8513241 bc 01d4-44fe- af19-b3c8513241 bc
[ 2391.081814] openstackbmc[8281]: Reporting power state "False" for instance 32153366-
[ 2403.399431] openstackbmc[8281]: Set boot device to network
So, why does it work on master despite OVB nodes actually taking even longer to power off?
The workflow is different, the master job uses os_tripleo_ baremetal_ node_introspect ion which power off nodes prior to starting inspection in prepare_ for_attempt[ 3] which has a very long timeout[4].
1200 seconds !!!!!! Thats 20 minutes!!
Another interesting observation is that the initial power off in OVB seem to be very slow, but subsequent power offs are much quicker at ~5 seconds!!
INITIAL for node 36029b10- d981-4d02- 83fc-8f5f8df8a3 51: (2397.319949- 2326.519714 = 70.800235) d981-4d02- 83fc-8f5f8df8a3 51 d981-4d02- 83fc-8f5f8df8a3 51 d981-4d02- 83fc-8f5f8df8a3 51: (2553.212485− 2547.906511 = 5.305974) d981-4d02- 83fc-8f5f8df8a3 51 d981-4d02- 83fc-8f5f8df8a3 51
[ 2326.519714] openstackbmc[8186]: Powered off 36029b10-
[ 2397.319949] openstackbmc[8186]: Reporting power state "False" for instance 36029b10-
SECOND for node 36029b10-
[ 2547.906511] openstackbmc[8186]: Powered off 36029b10-
[ 2553.212485] openstackbmc[8186]: Reporting power state "False" for instance 36029b10-
INITIAL for node a2c99a5b- 3b8f-427a- bd05-c1c6f7def3 6f: (2443.568209- 2323.131819 = 120.43639) 3b8f-427a- bd05-c1c6f7def3 6f 3b8f-427a- bd05-c1c6f7def3 6f 3b8f-427a- bd05-c1c6f7def3 6f: (2609.103883- 2603.517865 = 5.586018) 3b8f-427a- bd05-c1c6f7def3 6f 3b8f-427a- bd05-c1c6f7def3 6f
[ 2323.131819] openstackbmc[8146]: Powered off a2c99a5b-
[ 2443.568209] openstackbmc[8146]: Reporting power state "False" for instance a2c99a5b-
SECOND for node a2c99a5b-
[ 2603.517865] openstackbmc[8146]: Powered off a2c99a5b-
[ 2609.103883] openstackbmc[8146]: Reporting power state "False" for instance a2c99a5b-
Conclusion: change_ timeout to 180 seconds in train? virtual- baremetal that will power off the nodes when openstackbmc is initially started.
- bump power_state_
- Since initial/secondary poweroffs have such a big timedifference. It may be an idea to implement a change in openstack-
[1] https:/ /logserver. rdoproject. org/56/ 36356/3/ check/periodic- tripleo- ci-centos- 8-ovb-3ctlr_ 1comp-featurese t035-train/ 13d5d98/ logs/bmc_ 3_25330- console. log /logserver. rdoproject. org/50/ 815650/ 3/openstack- check/tripleo- ci-centos- 8-ovb-3ctlr_ 1comp-featurese t001/08e0d06/ logs/bmc_ 3_37470- console. log /opendev. org/openstack/ tripleo- ansible/ src/branch/ master/ tripleo_ ansible/ ansible_ plugins/ modules/ os_tripleo_ baremetal_ node_introspect ion.py# L255-L277 /opendev. org/openstack/ tripleo- ansible/ src/branch/ master/ tripleo_ ansible/ ansible_ plugins/ modules/ os_tripleo_ baremetal_ node_introspect ion.py# L60-L64
[2] https:/
[3] https:/
[4] https:/