[sig-network] tests failing

Bug #2058246 reported by Peter Jose De Sousa
This bug affects 1 person
Affects: Cilium Charm
Status: Triaged
Importance: High
Assigned to: Mateo Florido
Milestone: (not shown)

Bug Description

Hello

Currently, when running the sonobuoy e2e suite (v0.57.1, from: https://github.com/vmware-tanzu/sonobuoy/releases/tag/v0.57.1), the following sig-network tests fail:

Failed tests:
 [sig-network] Services should be able to switch session affinity for service with type clusterIP [LinuxOnly] [Conformance]
 [sig-network] HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol [LinuxOnly] [Conformance]
 [sig-network] Services should have session affinity work for service with type clusterIP [LinuxOnly] [Conformance]
 [sig-network] Services should serve endpoints on same port and different protocols [Conformance]

It appears that cilium may be making some breaking changes upstream:

[1] https://github.com/cilium/cilium/issues/24481
[2] https://github.com/cilium/cilium/issues/14287

[Steps to reproduce]

1. Deploy bundle: https://pastebin.canonical.com/p/Kg94BZ7zhw/
2. In my case, for VMs on MAAS, I had to configure a Geneve tunnel instead of VXLAN, but this may not be required
3. Run sonobuoy tests: ./sonobuoy run --mode=certified-conformance --plugin-env=e2e.E2E_EXTRA_ARGS="--non-blocking-taints=node-role.kubernetes.io/control-plane" --wait --plugin-env='e2e.E2E_PARALLEL=true'

Observe the above failures
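
For anyone scripting the run, step 3 can be expressed as an argument list so that no shell quoting is involved. This is only a minimal sketch: the helper name `build_sonobuoy_command` is hypothetical, and it assumes the sonobuoy binary sits in the current directory as in the command above. The flags are copied from step 3 unchanged.

```python
import shlex


def build_sonobuoy_command(binary="./sonobuoy"):
    """Return the step 3 invocation as an argument list (hypothetical helper)."""
    return [
        binary,
        "run",
        "--mode=certified-conformance",
        # The whole key=value pair is one argument, so no inner quotes are needed.
        "--plugin-env=e2e.E2E_EXTRA_ARGS=--non-blocking-taints=node-role.kubernetes.io/control-plane",
        "--wait",
        "--plugin-env=e2e.E2E_PARALLEL=true",
    ]


if __name__ == "__main__":
    # Print a shell-safe rendering; pass the list to subprocess.run(...) to execute it.
    print(shlex.join(build_sonobuoy_command()))
```

Building the list first and printing it makes it easy to check the exact arguments before starting a long conformance run.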

Thank you,

Peter

Peter Jose De Sousa (pjds) wrote :

Subscribing field critical as control plane components are down/not working [conformance tests failing]

Changed in charm-cilium:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Mateo Florido (mateoflorido)
milestone: none → 1.29+ck1

Kevin W Monroe (kwmonroe) wrote :

@peter could you confirm if the conformance tests are putting the control plane in an unrecoverable state, or if the control plane returns to a previously good state in spite of the failed tests?

Peter Jose De Sousa (pjds) wrote :

@kwmonroe the control plane is working; I just cannot confirm whether the test failures are a false negative. Do these failures mean the control plane is actually broken, i.e. that the CNI is not working as expected?

Mateo Florido (mateoflorido) wrote :

We have tested both versions 1.12.5 and 1.12.13 but encountered the same issues during the conformance tests. We are updating the Cilium version to the latest from upstream (1.15.2) and exploring other workarounds identified in the Cilium repository, specifically those addressing the integration of kube-proxy with Cilium.

Peter Jose De Sousa (pjds) wrote :

Working with Mateo, we have confirmed the conformance tests are a false negative. Removing field critical.

Changed in charm-cilium:
importance: Critical → Medium
importance: Medium → High
milestone: 1.29+ck1 → 1.30

Kevin W Monroe (kwmonroe) wrote :

Additional findings:

- cilium < 1.13, 2 session affinity failures:
https://github.com/SovereignCloudStack/k8s-cluster-api-provider/issues/144

- upstream skips 2 protocol tests due to failures with k8s 1.29; pending upstream release:
https://github.com/cilium/cilium/pull/29524

- Current cilium self- and connectivity-tests are passing and confirm expected CNI functionality:
...
[=] Skipping Test [outside-to-ingress-service-deny-all-ingress] [60/62] (Feature ingress-controller is disabled)
[=] Test [dns-only] [61/62]
..........
[=] Test [to-fqdns] [62/62]
........

✅ All 45 tests (361 actions) successful, 17 tests skipped, 1 scenarios skipped.

As such, we'll keep 1.12.5 as the default charm manifest version, provide the latest 1.12.x in subsequent charmed k8s maintenance releases, and use this bug to bump up to 1.15.x (or later) for charmed k8s 1.30 due in May.
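
The findings above can be lined up against the four failed tests from the bug description. The following is a rough triage sketch, not part of sonobuoy or Cilium: the function names, category labels, and keyword rules are assumptions made for illustration, and matching the two "protocol" tests to the upstream skips by keyword is a guess based on their names.

```python
from collections import defaultdict

# Failed tests as listed in the bug description.
FAILED_TESTS = [
    "[sig-network] Services should be able to switch session affinity for service with type clusterIP [LinuxOnly] [Conformance]",
    "[sig-network] HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol [LinuxOnly] [Conformance]",
    "[sig-network] Services should have session affinity work for service with type clusterIP [LinuxOnly] [Conformance]",
    "[sig-network] Services should serve endpoints on same port and different protocols [Conformance]",
]


def classify(test_name):
    """Map a test name to a finding category using simple keyword rules (assumed)."""
    name = test_name.lower()
    if "session affinity" in name:
        return "session-affinity"  # reported against cilium < 1.13
    if "protocol" in name:
        return "protocol"  # assumed to be the tests skipped upstream on k8s 1.29
    return "unclassified"


def group_failures(test_names):
    """Group test names by category, preserving input order within each group."""
    groups = defaultdict(list)
    for test_name in test_names:
        groups[classify(test_name)].append(test_name)
    return dict(groups)


if __name__ == "__main__":
    for category, tests in group_failures(FAILED_TESTS).items():
        print(f"{category}: {len(tests)}")
```

Anything that lands in "unclassified" on a future run would be a failure not covered by the two findings and worth a separate look.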

Peter Jose De Sousa (pjds) wrote :

Thank you, @kwmonroe! I am trying some workarounds for 1.28 to confirm whether the config values referenced in the issue can be applied as a workaround.
