Postgresql charm fails to initialize (rev 429)

Bug #2074017 reported by Ethan Myers
This bug affects 2 people
Affects           Status      Importance  Assigned to  Milestone
Canonical Juju    Incomplete  Undecided   Unassigned
PostgreSQL Charm  Invalid     Undecided   Unassigned

Bug Description

Hello,

I am unable to get the PostgreSQL charm (rev 429) to initialize when deploying it with Juju. The issue shows up as postgresql units never finishing the "agent initializing" step in juju. In the debug-log, I see a complaint about the charm not being found on disk. Sometimes I also get a message about an archiveSha256 of length 0 not being valid. Please see the attached debug.log for one example.

This does not happen with older revisions of the charm, and the environment has wide-open network connectivity. Other charms initialize fine.

I am using PostgreSQL charm as part of a Landscape deployment. The bundle is attached. I did not have this issue with a previous charm revision (345).

Happy to provide more details as needed.
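
For anyone trying to reproduce this, a minimal sketch (the unit name is a placeholder) of how to capture the same debug output and check the unit state:

$ juju debug-log --replay --include unit-postgresql-0 > debug.log
$ juju status postgresql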

Revision history for this message
Ethan Myers (ethanmyers) wrote :
description: updated
Revision history for this message
Ethan Myers (ethanmyers) wrote :

Whoops, wrong bundle. This is the correct version.

Revision history for this message
Billy Olsen (billy-olsen) wrote :

From the logs provided, it's not clear that this is an actual issue with the postgresql charm itself. Subscribing Juju, as it appears that there are issues actually pulling the charm. I do wonder about environment settings (proxies, etc.) that may be getting in the way here.
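
One quick way to rule the proxy theory in or out (a sketch; this just lists whatever proxy-related keys are set on the model):

$ juju model-config | grep -i proxy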

Revision history for this message
Ethan Myers (ethanmyers) wrote (last edit ):

This is the older bundle that worked in the same environment. As for the environment-settings theory, the problem is that none of the other charms hit this issue -- rabbit, landscape, and haproxy all download and deploy fine. They are all going onto the same machines.

@jeff hillman ran into the same issue on a totally separate environment as well. I suspect it is some issue with the charm artifact/packaging, not necessarily with PostgreSQL the application.

Revision history for this message
Jeff Hillman (jhillman) wrote :

This also happened with charm rev 444 from latest/edge. The behavior is inconsistent: sometimes it gives the archiveSha256 error, and sometimes it just says the charm is not on disk.

I also tried downloading the charm (429 and 444) and deploying it locally, and the local charm still didn't load.

In all scenarios, the unit sits at allocating and initializing.
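
For the record, a minimal sketch (the file name and base are assumptions) of the download-and-deploy-locally attempt:

$ juju download postgresql --channel 14/stable
# produces a .charm file in the current directory, e.g. postgresql_r429.charm
$ juju deploy ./postgresql_r429.charm --base ubuntu@22.04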

Revision history for this message
Peter Jose De Sousa (pjds) wrote :

I think this should be field critical; it's severely degrading and blocking a deployment.

Revision history for this message
Joseph Phillips (manadart) wrote :

Please confirm:
1) Whether bouncing the agent on the machine causes correct resumption.
2) The Juju version being used here.
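
A minimal sketch of both checks (machine number 3 is a placeholder for the stuck machine):

# 2) client and model versions
$ juju version
$ juju status | head -n 3
# 1) bounce the machine agent on the stuck machine
$ juju ssh 3 -- sudo systemctl restart jujud-machine-3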

Changed in juju:
status: New → Incomplete
Revision history for this message
Nobuto Murata (nobuto) wrote :

As per the data platform team's request, I opened an issue on GitHub to initiate the investigation from the data platform side too.

https://github.com/canonical/postgresql-operator/issues/548

Revision history for this message
Alex Lutay (taurus) wrote (last edit ):

Thank you Nobuto!

We have tried to reproduce the issue with PostgreSQL on a MAAS setup using this manual:
https://charmhub.io/postgresql/docs/h-deploy-maas

We have successfully deployed rev429 to MAAS: https://pastebin.canonical.com/p/PjJM7fJYCZ/

From our experience, the Juju error message:
> ... failed to download charm "ch:amd64/jammy/postgresql-429" from API server: download request with archiveSha256 length 0 not valid
is just noise and should be ignored.

JFYI, the difference between the legacy and modern postgresql charms is well described here:
https://charmhub.io/postgresql/docs/e-legacy-charm

TL;DR: Reactive vs. Ops frameworks + Juju Secrets + Juju Storage in use for the modern charm.
I suspect Juju storage is the source of the difference in experience between latest/stable (rev 345) and 14/stable (rev 429).

I recall the Juju limitations around nested virtualization and Juju storage: https://bugs.launchpad.net/juju/+bug/2060098
and the requirements for storage class definition at the LXD level: https://github.com/canonical/postgresql-operator/issues/354

In fact, from the debug-log, Juju never started charm.py to pass execution to our code;
Juju got stuck at the LXD/VM machine start.

Let's start with comparing the versions (see the bottom of https://pastebin.canonical.com/p/PjJM7fJYCZ/)
and trying to deploy one unit of PostgreSQL 14/stable (rev 429) to your MAAS.
What do you think?
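
A minimal sketch of that comparison and test deploy (the application name postgresql-test is a placeholder):

# compare client/controller versions
$ juju version
$ juju controllers
# deploy a single 14/stable unit and watch how storage gets provisioned
$ juju deploy postgresql --channel 14/stable postgresql-test
$ juju storage-pools
$ juju storage --format yaml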

Revision history for this message
Alex Lutay (taurus) wrote :

From the discussion in Matrix:

https://matrix.to/#/!BukWfnyOTgQSKAxdtT:ubuntu.com/$_Nj8qJjfup15dO4Fo2cXMJLZoymzEnZCScePypT_r1g?via=ubuntu.com&via=matrix.org&via=laquadrature.net

$ juju deploy postgresql --channel 14/stable --to 12 --debug --storage pgdata=rootfs,15G
$
$ juju storage
Unit          Storage ID  Type        Pool    Size    Status     Message
postgresql/2  pgdata/2    filesystem  rootfs  16 GiB  attaching  "/var/lib/juju/storage/rootfs/12/2" ("") and "/var/snap/charmed-postgresql/common" ("/dev/mapper/vg_root-lv_var") are on different filesystems

The juju storage Message:

> "/var/lib/juju/storage/rootfs/12/2" ("") and "/var/snap/charmed-postgresql/common" ("/dev/mapper/vg_root-lv_var") are on different filesystems

looks suspicious to me. We should invite the Juju team here.
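
One way to confirm what each path is actually mounted on (a sketch; machine 12 and the paths are taken from the output above):

$ juju ssh 12 -- findmnt -T /var/lib/juju/storage/rootfs/12/2
$ juju ssh 12 -- findmnt -T /var/snap/charmed-postgresql/common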

Revision history for this message
Jeff Hillman (jhillman) wrote :

Juju 3.5.2

Revision history for this message
Jeff Hillman (jhillman) wrote :

It should be clarified that while VMware is the substrate, Juju is using the manual provider. Same with Ethan's scenario, except he is on Azure, also with the manual provider. I'm not sure how that affects storage pools, but please be aware of how these machines are provisioned. Also of note, they are NOT (currently) being reprovisioned; we're manually cleaning them up.
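
For context, this is roughly how machines end up in the model under the manual provider (a sketch; user, address, and machine number are placeholders):

# enlist an existing, pre-provisioned VM into the model
$ juju add-machine ssh:ubuntu@203.0.113.10
# then target it explicitly at deploy time
$ juju deploy postgresql --channel 14/stable --to 0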

Revision history for this message
Alex Lutay (taurus) wrote :

As traced by Anna Savchenko and Jeff Hillman in Matrix:

https://matrix.to/#/!BukWfnyOTgQSKAxdtT:ubuntu.com/$PZW8sileim3sNFg7q1-lycwKWgWK3pYsOIkCArGWqpQ?via=ubuntu.com&via=matrix.org&via=laquadrature.net

The initially deployed model was destroyed with --force, which left snap mounts in place, and all subsequent snap installations failed.

A similar story happened with the MAAS Anvil project: https://github.com/canonical/maas-anvil/issues/9#issue-2322906346

TL;DR: clean up snap mount points if you --force model/app removal.
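
A rough sketch of the cleanup on an affected machine (run with care; the exact mount points and snap revision will vary):

# list leftover mounts from the previous charmed-postgresql install
$ mount | grep charmed-postgresql
# unmount each leftover mount point reported above, e.g.:
$ sudo umount /snap/charmed-postgresql/<rev>
# then purge any half-installed snap before redeploying
$ sudo snap remove --purge charmed-postgresql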

Ethan Myers (ethanmyers)
summary: - Postgresql charm fails to initialize (rev 429)_
+ Postgresql charm fails to initialize (rev 429)
Changed in postgresql-charm:
status: New → Invalid