Brief Description
ml350_g10 installation (linux kernel 5.10) failed because Deployment Manager (DM) failed to find disk
The issue is not seen on WRCP_Dev_Build 2021-10-06_00-00-08 (linux kernel 3.10).
Severity
Major
Steps to Reproduce
install controller-0
ansible bootstrap on controller-0
configure and unlock controller-0 via DM
Expected Behavior
controller-0 is unlocked and in available status
Actual Behavior
installation failed because DM failed to find disk
Reproducibility
reproducible (happened 2/2)
System Configuration
ml350_g10
Branch/Pull Time/Commit
private load from Davlet for linux kernel 5.10 yow-cgts4-lx:/localdisk/loadbuild/dpanech/wrcp_master_5.10_kernel_2021_10_01/
Last Pass
ml350_g10 installed successfully with WRCP_Dev_Build 2021-10-06_00-00-08 (linux kernel 3.10)
Timestamp/Logs
[sysadmin@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | locked         | disabled    | online       |
+----+--------------+-------------+----------------+-------------+--------------+
[sysadmin@controller-0 ~(keystone_admin)]$ system host-unlock controller-0
Expecting number of interface sriov_numvfs=7. Please wait a few minutes for inventory update and retry host-unlock.
[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get pods -A
NAMESPACE                     NAME                                              READY   STATUS             RESTARTS   AGE
armada                        armada-api-778fc65fd6-2qz78                       2/2     Running            0          59m
cert-manager                  cm-cert-manager-785cb658cd-ldldr                  1/1     Running            0          51m
cert-manager                  cm-cert-manager-cainjector-544d67bcb8-djggd       1/1     Running            0          51m
cert-manager                  cm-cert-manager-webhook-d47d89c8-dlz6j            1/1     Running            0          51m
flux-helm                     helm-controller-74667bfd95-wfht4                  1/1     Running            1          59m
flux-helm                     source-controller-7d448db5b4-s8n82                1/1     Running            1          59m
kube-system                   calico-kube-controllers-5cd4695574-hz279          1/1     Running            1          59m
kube-system                   calico-node-xf2tr                                 1/1     Running            0          59m
kube-system                   coredns-666cb94996-d86hv                          1/1     Running            0          59m
kube-system                   ic-nginx-ingress-ingress-nginx-controller-ffsbj   1/1     Running            0          54m
kube-system                   kube-apiserver-controller-0                       1/1     Running            0          59m
kube-system                   kube-controller-manager-controller-0              1/1     Running            0          59m
kube-system                   kube-multus-ds-amd64-w5xzl                        1/1     Running            0          59m
kube-system                   kube-proxy-qdx9w                                  1/1     Running            0          59m
kube-system                   kube-scheduler-controller-0                       1/1     Running            0          59m
kube-system                   kube-sriov-cni-ds-amd64-2xklc                     1/1     Running            0          59m
kube-system                   kube-sriov-device-plugin-amd64-klkk6              0/1     CrashLoopBackOff   13         45m
platform-deployment-manager   platform-deployment-manager-0                     2/2     Running            2          51m
Test Activity
Upgrade Testing
Workaround
Describe workaround if available
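No workaround is identified here. As a triage aid only, the pod listing under Timestamp/Logs can be filtered for pods that are not fully ready. The sketch below assumes the default `kubectl get pods -A` column order (NAMESPACE, NAME, READY, STATUS, ...). The function name `unhealthy_pods` and the `SAMPLE` text are hypothetical and not part of any WRCP tooling; the sample rows are copied from the log above.

```python
def unhealthy_pods(listing):
    """Return (namespace, name, ready, status) for pods that are not fully ready.

    Assumes the default `kubectl get pods -A` layout, where the first four
    whitespace-separated columns are NAMESPACE, NAME, READY and STATUS.
    Pods in any status other than Running (including Completed) are flagged.
    """
    flagged = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or truncated rows
        namespace, name, ready, status = fields[:4]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            flagged.append((namespace, name, ready, status))
    return flagged


# Two rows taken from the kubectl output in this report.
SAMPLE = """\
NAMESPACE     NAME                                   READY   STATUS             RESTARTS   AGE
kube-system   kube-sriov-cni-ds-amd64-2xklc          1/1     Running            0          59m
kube-system   kube-sriov-device-plugin-amd64-klkk6   0/1     CrashLoopBackOff   13         45m
"""

if __name__ == "__main__":
    for pod in unhealthy_pods(SAMPLE):
        print(*pod)
```

On the sample rows this flags only kube-sriov-device-plugin-amd64-klkk6, the pod in CrashLoopBackOff.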