Postgresql charm fails to initialize (rev 429)

Bug #2074017 reported by Ethan Myers
This bug affects 2 people
Affects           Status      Importance  Assigned to  Milestone
Canonical Juju    Incomplete  Undecided   Unassigned
PostgreSQL Charm  Invalid     Undecided   Unassigned

Bug Description

Hello,

I am unable to get the PostgreSQL charm (rev 429) to initialize when deploying it with Juju. The issue shows up as postgresql units never finishing the "agent initializing" step in juju. In the debug-log, I see a complaint about the charm not being found on disk. Sometimes I also get a message about an archiveSha256 of length 0 not being valid. Please see the attached debug.log for one example.

This does not happen with older revisions of the charm, and the environment has wide-open network connectivity. Other charms initialize fine.

I am using PostgreSQL charm as part of a Landscape deployment. The bundle is attached. I did not have this issue with a previous charm revision (345).

Happy to provide more details as needed.
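
For anyone trying to reproduce this, a minimal sketch (the unit name is a placeholder) of how to capture the same debug output and check the unit state:

$ juju debug-log --replay --include unit-postgresql-0 > debug.log
$ juju status postgresql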

Revision history for this message
Ethan Myers (ethanmyers) wrote :
description: updated
Revision history for this message
Ethan Myers (ethanmyers) wrote :

Whoops, wrong bundle. This is the correct version.

Revision history for this message
Billy Olsen (billy-olsen) wrote :

From the logs provided, it's not clear that this is an actual issue with the postgresql charm itself. Subscribing Juju, as it appears that there are issues actually pulling the charm. I do wonder about environment settings (proxies, etc.) that may be getting in the way here.
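
One quick way to rule the proxy theory in or out (a sketch; this just lists whatever proxy-related keys are set on the model):

$ juju model-config | grep -i proxy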

Revision history for this message
Ethan Myers (ethanmyers) wrote (last edit ):

This is the older bundle that worked in the same environment. As for the environment-settings theory, the problem is that none of the other charms hit this issue -- rabbit, landscape, and haproxy all download and deploy fine. They are all going onto the same machines.

@jeff hillman ran into the same issue on a totally separate environment as well. I suspect it is some issue with the charm artifact/packaging, not necessarily with PostgreSQL the application.

Revision history for this message
Jeff Hillman (jhillman) wrote :

This also happened with charm rev 444 from latest/edge. The behavior is inconsistent: sometimes it gives the archiveSha256 error, and sometimes it just says the charm is not on disk.

I also tried downloading the charm (429 and 444) and deploying it locally, and the local charm still didn't load.

In all scenarios, the unit sits at allocating and initializing.
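
For the record, a minimal sketch (the file name and base are assumptions) of the download-and-deploy-locally attempt:

$ juju download postgresql --channel 14/stable
# produces a .charm file in the current directory, e.g. postgresql_r429.charm
$ juju deploy ./postgresql_r429.charm --base ubuntu@22.04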

Revision history for this message
Peter Jose De Sousa (pjds) wrote :

I think this should be field critical; it's severely degrading and blocking a deployment.

Revision history for this message
Joseph Phillips (manadart) wrote :

Please confirm:
1) Whether bouncing the agent on the machine causes correct resumption.
2) The Juju version being used here.
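
A minimal sketch of both checks (machine number 3 is a placeholder for the stuck machine):

# 2) client and model versions
$ juju version
$ juju status | head -n 3
# 1) bounce the machine agent on the stuck machine
$ juju ssh 3 -- sudo systemctl restart jujud-machine-3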

Changed in juju:
status: New → Incomplete
Revision history for this message
Nobuto Murata (nobuto) wrote :

As per the data platform team's request, I opened an issue on GitHub to initiate the investigation from the data platform side too.

https://github.com/canonical/postgresql-operator/issues/548

Revision history for this message
Alex Lutay (taurus) wrote (last edit ):

Thank you Nobuto!

We have tried to reproduce the issue with PostgreSQL on a MAAS setup using this manual:
https://charmhub.io/postgresql/docs/h-deploy-maas

We have successfully deployed rev429 to MAAS: https://pastebin.canonical.com/p/PjJM7fJYCZ/

From our experience, the Juju error message:
> ... failed to download charm "ch:amd64/jammy/postgresql-429" from API server: download request with archiveSha256 length 0 not valid
is just noise and should be ignored.

JFYI, the difference between the legacy and modern postgresql charms is well described here:
https://charmhub.io/postgresql/docs/e-legacy-charm

TL;DR: Reactive vs. Ops frameworks + Juju Secrets + Juju Storage in use for the modern charm.
I suspect Juju storage is the source of the difference in experience between latest/stable (rev 345) and 14/stable (rev 429).

I recall the Juju limitations around nested virtualization and Juju storage: https://bugs.launchpad.net/juju/+bug/2060098
and the requirements for storage class definition at the LXD level: https://github.com/canonical/postgresql-operator/issues/354

In fact, from the debug-log, Juju never started charm.py to pass execution to our code;
Juju got stuck at the LXD/VM machine start.

Let's start with comparing the versions (see the bottom of https://pastebin.canonical.com/p/PjJM7fJYCZ/)
and trying to deploy one unit of PostgreSQL 14/stable (rev 429) to your MAAS.
What do you think?
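
A minimal sketch of that comparison and test deploy (the application name postgresql-test is a placeholder):

# compare client/controller versions
$ juju version
$ juju controllers
# deploy a single 14/stable unit and watch how storage gets provisioned
$ juju deploy postgresql --channel 14/stable postgresql-test
$ juju storage-pools
$ juju storage --format yaml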

Revision history for this message
Alex Lutay (taurus) wrote :

From the discussion in Matrix:

https://matrix.to/#/!BukWfnyOTgQSKAxdtT:ubuntu.com/$_Nj8qJjfup15dO4Fo2cXMJLZoymzEnZCScePypT_r1g?via=ubuntu.com&via=matrix.org&via=laquadrature.net

$ juju deploy postgresql --channel 14/stable --to 12 --debug --storage pgdata=rootfs,15G
$
$ juju storage
Unit          Storage ID  Type        Pool    Size    Status     Message
postgresql/2  pgdata/2    filesystem  rootfs  16 GiB  attaching  "/var/lib/juju/storage/rootfs/12/2" ("") and "/var/snap/charmed-postgresql/common" ("/dev/mapper/vg_root-lv_var") are on different filesystems

The juju storage Message:

> "/var/lib/juju/storage/rootfs/12/2" ("") and "/var/snap/charmed-postgresql/common" ("/dev/mapper/vg_root-lv_var") are on different filesystems

looks suspicious to me. We should invite the Juju team here.
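
One way to confirm what each path is actually mounted on (a sketch; machine 12 and the paths are taken from the output above):

$ juju ssh 12 -- findmnt -T /var/lib/juju/storage/rootfs/12/2
$ juju ssh 12 -- findmnt -T /var/snap/charmed-postgresql/common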

Revision history for this message
Jeff Hillman (jhillman) wrote :

Juju 3.5.2

Revision history for this message
Jeff Hillman (jhillman) wrote :

It should be clarified that while VMware is the substrate, Juju is using the manual provider. Same with Ethan's scenario, except he is on Azure, also with the manual provider. I'm not sure how that affects storage pools, but please be aware of how these machines are provisioned. Also of note, they are NOT (currently) being reprovisioned; we're manually cleaning them up.
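
For context, this is roughly how machines end up in the model under the manual provider (a sketch; user, address, and machine number are placeholders):

# enlist an existing, pre-provisioned VM into the model
$ juju add-machine ssh:ubuntu@203.0.113.10
# then target it explicitly at deploy time
$ juju deploy postgresql --channel 14/stable --to 0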

Revision history for this message
Alex Lutay (taurus) wrote :

As traced by Anna Savchenko and Jeff Hillman in Matrix:

https://matrix.to/#/!BukWfnyOTgQSKAxdtT:ubuntu.com/$PZW8sileim3sNFg7q1-lycwKWgWK3pYsOIkCArGWqpQ?via=ubuntu.com&via=matrix.org&via=laquadrature.net

The initially deployed model was destroyed with --force, which left snap mounts in place, and all subsequent snap installations failed.

A similar story happened with the MAAS Anvil project: https://github.com/canonical/maas-anvil/issues/9#issue-2322906346

TL;DR: clean up snap mount points if you --force model/app removal.
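
A rough sketch of the cleanup on an affected machine (run with care; the exact mount points and snap revision will vary):

# list leftover mounts from the previous charmed-postgresql install
$ mount | grep charmed-postgresql
# unmount each leftover mount point reported above, e.g.:
$ sudo umount /snap/charmed-postgresql/<rev>
# then purge any half-installed snap before redeploying
$ sudo snap remove --purge charmed-postgresql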

Ethan Myers (ethanmyers)
summary: - Postgresql charm fails to initialize (rev 429)_
+ Postgresql charm fails to initialize (rev 429)
Changed in postgresql-charm:
status: New → Invalid