ovn pods stuck in CrashLoopBackOff with "Illegal instruction (core dumped)"

Bug #1989363 reported by Bas de Bruijne
This bug affects 1 person

Affects          Status    Importance  Assigned to  Milestone
Kube OVN Charm   Invalid   High        Unassigned

Bug Description

In testrun https://solutions.qa.canonical.com/testruns/testRun/9b189314-b023-4944-a598-4b5905b1039a, a 1.25 release test on jammy/AWS with kube_ovn, all k8s control-plane nodes and one kube-ovn unit are stuck waiting:

```
kubernetes-control-plane/0* waiting idle 12 3.88.150.106 6443/tcp Waiting for 3 kube-system pods to start
  containerd/3 active idle 3.88.150.106 Container runtime available
  filebeat/16 active idle 3.88.150.106 Filebeat ready.
  kube-ovn/3 waiting idle 3.88.150.106 Waiting to retry configuring Kube-OVN
  ntp/16 active idle 3.88.150.106 123/udp chrony: Ready
  telegraf/16 active idle 3.88.150.106 9103/tcp Monitoring kubernetes-control-plane/0 (source version/commit 76901fd)
kubernetes-control-plane/1 waiting idle 13 54.167.245.12 6443/tcp Waiting for 3 kube-system pods to start
  containerd/4 active idle 54.167.245.12 Container runtime available
  filebeat/19 active idle 54.167.245.12 Filebeat ready.
  kube-ovn/4 active idle 54.167.245.12
  ntp/19 active idle 54.167.245.12 123/udp chrony: Ready
  telegraf/19 active idle 54.167.245.12 9103/tcp Monitoring kubernetes-control-plane/1 (source version/commit 76901fd)
```

In the kube-system-ovs-ovn-gc6kk-openvswitch.log I see some errors:
```
kube-system-ovs-ovn-gc6kk-openvswitch.log:parm: log_ecn_error:Log packets received with corrupted ECN (bool)
kube-system-ovs-ovn-gc6kk-openvswitch.log-filename: /lib/modules/5.15.0-1019-aws/kernel/net/ipv4/netfilter/ip_tables.ko
kube-system-ovs-ovn-gc6kk-openvswitch.log-alias: ipt_icmp
kube-system-ovs-ovn-gc6kk-openvswitch.log-description: IPv4 packet filter
kube-system-ovs-ovn-gc6kk-openvswitch.log-author: Netfilter Core Team <email address hidden>
kube-system-ovs-ovn-gc6kk-openvswitch.log-license: GPL
kube-system-ovs-ovn-gc6kk-openvswitch.log-srcversion: 766BC3192DA4B163CBE393B
kube-system-ovs-ovn-gc6kk-openvswitch.log-depends: x_tables
kube-system-ovs-ovn-gc6kk-openvswitch.log-retpoline: Y
kube-system-ovs-ovn-gc6kk-openvswitch.log-intree: Y
kube-system-ovs-ovn-gc6kk-openvswitch.log-name: ip_tables
kube-system-ovs-ovn-gc6kk-openvswitch.log-vermagic: 5.15.0-1019-aws SMP mod_unload modversions
kube-system-ovs-ovn-gc6kk-openvswitch.log-sig_id: PKCS#7
kube-system-ovs-ovn-gc6kk-openvswitch.log-signer: Build time autogenerated kernel key
kube-system-ovs-ovn-gc6kk-openvswitch.log-sig_key: 4E:2B:42:19:F5:56:93:8B:20:A1:FC:3B:04:F8:B7:5A:8C:4C:32:A0
kube-system-ovs-ovn-gc6kk-openvswitch.log-sig_hashalgo: sha512
kube-system-ovs-ovn-gc6kk-openvswitch.log-signature: ...
```

But otherwise nothing stands out. Looking at the SKU (https://solutions.qa.canonical.com/skus/sku/fkb-undertest-kubernetes-jammy-aws-kube_ovn-kubernetes_release), we have already had some passing runs with the same configs.

Link to crashdumps:
https://oil-jenkins.canonical.com/artifacts/9b189314-b023-4944-a598-4b5905b1039a/index.html

Revision history for this message
George Kraft (cynerva) wrote :

It looks like kube-ovn/3 is stuck waiting for ovn-central pods to come up, and those pods are failing with a rather cryptic "Illegal instruction (core dumped)" message. From the ovn-central-5b6c868f-pvz4w pod:

 * ovn-northd is not running
 * ovnnb_db is not running
 * ovnsb_db is not running
timeout: the monitored command dumped core
timeout: the monitored command dumped core
Illegal instruction (core dumped)
Illegal instruction (core dumped)
 * Joining /etc/ovn/ovnnb_db.db to cluster
 * Starting ovsdb-nb
Illegal instruction (core dumped)
Illegal instruction (core dumped)
Illegal instruction (core dumped)
 * Waiting for to come up
Illegal instruction (core dumped)
Illegal instruction (core dumped)
 * Joining /etc/ovn/ovnsb_db.db to cluster
Illegal instruction (core dumped)
 * Starting ovsdb-sb
Illegal instruction (core dumped)
Illegal instruction (core dumped)
 * Waiting for to come up
 * OVN Northbound DB is not running
/kube-ovn/start-db.sh: line 268: 174 Illegal instruction (core dumped) ovs-appctl -t /var/run/ovn/ovnnb_db.ctl ovsdb-server/memory-trim-on-compaction on
 * ovn-northd is not running
 * ovnnb_db is not running
 * ovnsb_db is not running
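
For reference, these container logs can be pulled directly with kubectl; a minimal sketch (the pod name is taken from the message above and will differ per deployment):

```
# List the kube-system pods to spot the ones in CrashLoopBackOff:
kubectl -n kube-system get pods -o wide
# Dump logs from the failing ovn-central pod (--previous shows the crashed container):
kubectl -n kube-system logs ovn-central-5b6c868f-pvz4w --previous
```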

Revision history for this message
George Kraft (cynerva) wrote :

It looks like upstream fixed this by building and releasing separate kube-ovn amd64 images with AVX-512 instructions disabled. These images are published under a separate tag, such as kubeovn/kube-ovn:v1.10.4-no-avx512.

We might need to sync the -no-avx512 images to rocks.canonical.com and expose a config option to override the kube-ovn image URI.
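
A rough sketch of what mirroring the upstream tag might look like; the rocks.canonical.com destination path below is purely illustrative, not the real repository layout:

```
# Pull the upstream no-AVX-512 amd64 image and retag it for an internal registry.
docker pull kubeovn/kube-ovn:v1.10.4-no-avx512
docker tag kubeovn/kube-ovn:v1.10.4-no-avx512 rocks.canonical.com/cdk/kubeovn/kube-ovn:v1.10.4-no-avx512
docker push rocks.canonical.com/cdk/kubeovn/kube-ovn:v1.10.4-no-avx512
```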

However, I'm unable to confirm whether this will fix the issue, as I cannot reproduce it. I tried deploying on AWS, in region us-east-1, with series jammy and machine constraints matching those from the crashdump, but the kube-ovn pods came up fine. The crashdump is missing the EC2 instance type and /proc/cpuinfo, so I can't verify the availability of AVX-512 instructions in your deployment.
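
For anyone hitting this, AVX-512 support can be checked directly from the CPU flags on the affected machine, for example:

```
# Any avx512* flag in /proc/cpuinfo means the CPU supports AVX-512;
# no output means the instructions are unavailable.
grep -o 'avx512[a-z]*' /proc/cpuinfo | sort -u
```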

Revision history for this message
George Kraft (cynerva) wrote :

Actually, looking deeper into the crashdump logs, I found that the kubernetes-control-plane machines had instance-type=m3.large. If I add instance-type=m3.large to my deploy constraints, I can reproduce the issue.

When I deploy Charmed Kubernetes with default constraints, both kubernetes-control-plane and kubernetes-worker come up with instance-type=r5dn.large, which does support AVX-512.

For now, I recommend working around this by constraining your kubernetes-control-plane and kubernetes-worker units to AWS instance types like r5dn.large that support AVX-512 instructions.
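
For example, the workaround can be applied per application with juju constraints (note that constraints only affect machines provisioned after they are set):

```
# Pin both applications to an AVX-512-capable instance type:
juju set-constraints kubernetes-control-plane instance-type=r5dn.large
juju set-constraints kubernetes-worker instance-type=r5dn.large
```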

no longer affects: charm-kubernetes-master
Changed in charm-kube-ovn:
importance: Undecided → High
status: New → Triaged
summary: - [1.25/beta] kubernetes-control-plane stuck waiting: "Waiting for 3 kube-system pods to start"
         + ovn pods stuck in CrashLoopBackOff with "Illegal instruction (core dumped)"
Revision history for this message
George Kraft (cynerva) wrote :

Per team meeting, we're interested in having the kube-ovn charm "just work" on older AMD64 machines that don't support AVX-512. A couple options come to mind:

1. Check CPU capabilities at deploy time and select the -no-avx512 image if needed
2. Publish our own kube-ovn multi-arch image that uses the -no-avx512 image for amd64

I think option 1 could get complicated as we have to check CPU capabilities across both kubernetes-control-plane and kubernetes-worker applications. I would recommend option 2 as a simpler, safer alternative.
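
For illustration only, the kind of runtime check option 1 implies might look like the shell fragment below (not actual charm code; the tags are those of the upstream releases mentioned above):

```
# Select the -no-avx512 image tag when the CPU lacks the avx512f flag.
if grep -q avx512f /proc/cpuinfo; then
    KUBE_OVN_TAG="v1.10.4"
else
    KUBE_OVN_TAG="v1.10.4-no-avx512"
fi
echo "selected kube-ovn image tag: ${KUBE_OVN_TAG}"
```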

Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

Thanks for looking into this. It is weird that we have had some passing runs on this SKU, though I suppose the instance hardware that AWS provides could be inconsistent.

Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

Marking as invalid due to inactivity

Changed in charm-kube-ovn:
status: Triaged → Invalid