Status Code 418 (I'm a teapot) thrown by the Pebble readiness check
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Expired | Undecided | Unassigned |
Bug Description
Hello Team,
While working on integration tests for my projects (Charmed 5G), I've noticed that around 20%-25% of the runs fail because Pebble is not able to start.
The charm goes to error state (i.e. hook failed: "start"), and when I look into the pod's logs it turns out that Pebble is a teapot:
[pebble] Check "readiness" failure 190 (threshold 3): received non-20x status code 418
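For anyone triaging similar runs from the crashdump or K8s logs, here is a minimal sketch that pulls the check name, failure count, threshold and status code out of lines like the one above. The regular expression is inferred only from this single log line, so treat the exact wording as an assumption; `parse_check_failure` and `CheckFailure` are hypothetical helper names, not part of Pebble or Juju.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the one log line quoted in this report (assumption):
# other Pebble versions may word the message differently.
CHECK_FAILURE = re.compile(
    r'Check "(?P<name>[^"]+)" failure (?P<count>\d+) '
    r'\(threshold (?P<threshold>\d+)\): '
    r'received non-20x status code (?P<status>\d+)'
)


class CheckFailure(NamedTuple):
    """Fields extracted from one Pebble check-failure log line."""
    name: str
    count: int
    threshold: int
    status: int


def parse_check_failure(line: str) -> Optional[CheckFailure]:
    """Return the parsed failure, or None if the line does not match."""
    m = CHECK_FAILURE.search(line)
    if m is None:
        return None
    return CheckFailure(
        m["name"], int(m["count"]), int(m["threshold"]), int(m["status"])
    )
```

Running this over the pod logs and filtering on `status == 418` should separate these probe failures from checks that fail for other reasons.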
My env is Juju 3.4 + MicroK8s 1.29-strict running on Canonical's self-hosted GH runner.
From my observations, the problem becomes more frequent when infrastructure performance is constrained. Charmed 5G includes around 20 charms. On Canonical's self-hosted runners, if I try to deploy it on the `large` runner, there's an almost 100% chance of failure. If I use `xlarge`, the failure rate goes down to maybe 10-15%.
It would be great if the status code 418 could be replaced with something meaningful.
Latest failed run is available at https:/
At the bottom of the page, there's a Juju crashdump and K8s logs available for your reference.
BR,
Bartek
It looks like this is coming from the Juju "caasprober" worker here: https://github.com/juju/juju/pull/12048/files#diff-17cd0462495cd82a91e96cdd4070e2e3a39e1e51db5d0d05e9d2df114657da64R103 ... it's not Pebble that's unable to start, but the Juju probe returning a not-good return value from the "suppler" (not sure what that is, haven't followed it through).
Thomas Miller added this code in the above PR, so he may be able to help here.
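To make the chain described in the comment above concrete, here is a rough sketch of how a failed probe can surface as a 418 in the Pebble log. It is not the Juju or Pebble code: it assumes the probe endpoint answers 200 when the probe is good and 418 otherwise, and that the HTTP check treats anything outside 200-299 as a failure, which is what the log message suggests. Both function names are hypothetical.

```python
OK = 200
IM_A_TEAPOT = 418


def probe_response_status(probe_ok: bool) -> int:
    """Hypothetical probe endpoint: 200 if the probe is good, 418 otherwise.

    Assumption: mirrors the behaviour implied by the log line in this
    report, not the actual caasprober implementation.
    """
    return OK if probe_ok else IM_A_TEAPOT


def http_check_passes(status: int) -> bool:
    """Hypothetical HTTP check: only 2xx responses count as success."""
    return 200 <= status <= 299


# A probe that is not (yet) good yields 418, which the check counts as a
# failure even though Pebble itself is up and serving.
status = probe_response_status(probe_ok=False)
check_ok = http_check_passes(status)
```

Under these assumptions, the 418 says nothing about Pebble's own health. It only reports that whatever the probe inspects was not ready when it was polled, which would fit the failures getting more frequent on slower runners.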