charm stays in allocating state indefinitely

Bug #1835770 reported by Marian Gasparovic on 2019-07-08
This bug affects 1 person
Affects    Importance  Assigned to
juju       High        Joseph Phillips
juju 2.6   High        Joseph Phillips

Bug Description

2019-07-08 06:37:15 INFO juju.worker.uniter uniter.go:224 resuming charm install
2019-07-08 06:37:15 INFO juju.worker.uniter.charm bundles.go:77 downloading cs:telegraf-29 from API server
2019-07-08 06:37:15 INFO juju.downloader download.go:111 downloading from cs:telegraf-29
2019-07-08 06:37:16 INFO juju.downloader download.go:94 download complete ("cs:telegraf-29")
2019-07-08 06:37:16 INFO juju.downloader download.go:174 download verified ("cs:telegraf-29")

Nothing appears after these messages until juju deploy eventually times out.
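The stall has a recognizable signature: the uniter downloads and verifies the charm, then logs nothing further. As an illustrative triage sketch (not from the bug thread; the sample file path and its contents are hypothetical, reusing the log lines above), a script can flag a unit log whose last line is the "download verified" message:

```shell
# Illustrative sketch: flag a unit log whose last line is the
# "download verified" message, i.e. the stall signature described above.
# /tmp/unit-telegraf-29.log is a hypothetical sample file, not a real juju path.
cat > /tmp/unit-telegraf-29.log <<'EOF'
2019-07-08 06:37:15 INFO juju.worker.uniter uniter.go:224 resuming charm install
2019-07-08 06:37:16 INFO juju.downloader download.go:94 download complete ("cs:telegraf-29")
2019-07-08 06:37:16 INFO juju.downloader download.go:174 download verified ("cs:telegraf-29")
EOF

# If the final log line is the "download verified" message, the unit may have
# hit this bug (stuck in allocating after the charm download).
last=$(tail -n 1 /tmp/unit-telegraf-29.log)
case "$last" in
  *"download verified"*) echo "possible lp:1835770 stall" ;;
  *) echo "log continues normally" ;;
esac
```

With the sample log above, the script reports a possible stall; on a healthy unit the log would continue past the download messages.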

Chris Gregan (cgregan) wrote:

Saw this same issue with hacluster-nova

Similar result in another test run, but with hacluster-designate and hacluster-heat.

Chris Gregan (cgregan) wrote:
tags: added: cdo-qa
tags: added: foundations-engine
Richard Harding (rharding) wrote:

This might be related to #1835374. We'll double-check, but a fix for that is underway, and we can help validate whether it is the same issue.

Changed in juju:
status: New → Triaged
importance: Undecided → High
Heather Lanigan (hmlanigan) wrote:

@asbalderson, @cgregan, it looks like you're both hitting the same issue. It's not yet clear whether it's the same bug that @rharding mentioned. It would be helpful to get juju-crashdumps with the /var/lib/juju/agents data included.

@marosg, do you have a different config from @asbalderson or @cgregan? If so, can we please get a crashdump from you as well?

Pedro Guimarães (pguimaraes) wrote:

Marian, Chris, I am facing a similar issue with manual providers. Can you share which environment (xenial/bionic/etc.) and which juju provider (openstack/vmware/etc.) you are using?

Also, in my case, setting the model-defaults logging-config to "<root>=debug" reveals the following at the end of the failing charms' logs:
2019-07-08 11:36:11 DEBUG juju.worker.uniter runlistener.go:120 juju-run listener stopping
2019-07-08 11:36:11 DEBUG juju.worker.uniter runlistener.go:139 juju-run listener stopped
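Pedro's debug-logging step can be sketched as juju CLI invocations. This is a hedged example (verify the syntax against your juju version); the unit log file name below is a hypothetical illustration:

```shell
# Raise the default logging level for new models on the controller
# (juju 2.x "model-defaults" syntax, as referenced in the comment above).
juju model-defaults logging-config="<root>=DEBUG"

# Or raise it on the current model only:
juju model-config logging-config="<root>=DEBUG"

# Then follow the failing unit's log on its machine, e.g.:
tail -f /var/log/juju/unit-telegraf-60.log
```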

Marian Gasparovic (marosg) wrote:

@pguimaraes we are getting this on bionic, with MAAS providing bare-metal machines.

@hmlanigan I just encountered another failed build for the same reason; I will upload a crashdump. This one has three units across two machines with this issue:

filebeat/61, nrpe-container/44, telegraf/60

Heather Lanigan (hmlanigan) wrote:

The crash dump from #7 mimics the signature of the crash dumps from bug #1835374.

The crash dumps from #2 and #3 have units in the same workload status of waiting/"agent initializing", but the unit log is different.

The changes being made for bug #1835374 may resolve the issues seen in #2 and #3 as well; we will have to test once it is fixed.

Jason Hobbs (jason-hobbs) wrote:

We can't include /var/lib/juju/agents in crashdumps; it makes them too large. We tried it, and crashdumps went from ~200 MB to ~4 GB.

Joseph Phillips (manadart) wrote:

Looks like https://github.com/juju/juju/pull/10431 is a verified fix for the 2.6 branch.

Changed in juju:
milestone: none → 2.7-beta1
status: Triaged → In Progress
assignee: nobody → Joseph Phillips (manadart)
Jason Hobbs (jason-hobbs) wrote:

Sub'd as field-high, as it's causing many failures in our CI and we don't have a workaround.

Changed in juju:
status: In Progress → Fix Committed