Status Code 418 (I'm a teapot) thrown by the Pebble readiness check

Bug #2059105 reported by Bartlomiej Gmerek
This bug affects 1 person
Affects: Canonical Juju
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hello Team,

While working on integration tests for my projects (Charmed 5G), I've noticed that around 20-25% of the runs fail because Pebble is not able to start.
The charm goes into an error state (hook failed: "start"), and when I look into the pod's logs, it turns out that Pebble is a teapot:
[pebble] Check "readiness" failure 190 (threshold 3): received non-20x status code 418

My environment is Juju 3.4 + MicroK8s 1.29-strict running on Canonical's self-hosted GitHub runner.

From my observations, the problem becomes more frequent as infrastructure performance degrades. Charmed 5G includes around 20 charms. On Canonical's self-hosted runners, deploying it on the `large` runner fails almost every time. On `xlarge`, the failure rate drops to roughly 10-15%.

It would be great if the status code 418 could be replaced with a more meaningful status code or error message.

The latest failed run is available at https://github.com/canonical/sdcore-tests/actions/runs/8434000175.
At the bottom of the page, you'll find a Juju crashdump and K8s logs for your reference.

BR,
Bartek

Ben Hoyt (benhoyt) wrote:

It looks like this is coming from the Juju "caasprober" worker here: https://github.com/juju/juju/pull/12048/files#diff-17cd0462495cd82a91e96cdd4070e2e3a39e1e51db5d0d05e9d2df114657da64R103. It's not Pebble that's unable to start; rather, the Juju probe is getting a failing result from the "supplier" (I'm not sure what that is; I haven't followed it through).

Thomas Miller added this code in the above PR, so he may be able to help here.
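To make the flow Ben describes easier to follow, here is a minimal Go sketch. It is not Juju's or Pebble's actual code. The hypothetical `probeHandler` stands in for a readiness endpoint that answers 200 when its probe succeeds and 418 (I'm a teapot) when it does not. The hypothetical `checkPasses` mirrors the rule in the Pebble log line, where any non-20x status counts as a check failure. The handler name, path, and response body are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// probeHandler is a hypothetical stand-in for a readiness endpoint: it
// replies 200 when probe() succeeds and 418 (I'm a teapot) when it does not.
func probeHandler(probe func() bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if probe() {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.Error(w, http.StatusText(http.StatusTeapot), http.StatusTeapot)
	})
}

// checkPasses mirrors the rule in the Pebble log line: any status outside
// the 2xx range is reported as a failure.
func checkPasses(status int) bool {
	return status >= 200 && status <= 299
}

// statusFor sends an in-process GET to the handler and returns the status code.
func statusFor(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readiness", nil))
	return rec.Code
}

func main() {
	notReady := probeHandler(func() bool { return false })
	code := statusFor(notReady)
	fmt.Println(code, checkPasses(code)) // 418 false
}
```

In this model the 418 carries no information about why the probe failed; it only signals "not ready", which is why the Pebble log alone does not explain the underlying problem.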

Harry Pidcock (hpidcock) wrote:

Are we able to see why the start hook failed? The unit is not considered ready until the start hook has completed successfully.

Changed in juju:
status: New → Incomplete
Launchpad Janitor (janitor) wrote:

[Expired for Canonical Juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired