Canonical Juju

Multiple jujud services stuck with "uniter" manifold worker returned unexpected error: failed to initialize uniter cannot create storage hook source

Bug #1827838 reported by Pedro Guimarães on 2019-05-06

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
Canonical Juju	Fix Released	High	Joseph Phillips	Canonical Juju 2.7-beta1
2.5	Fix Released	High	Joseph Phillips	Canonical Juju 2.5.7
2.6	Fix Released	High	Joseph Phillips	Canonical Juju 2.6.2

Bug Description

Juju 2.5.4
MAAS 2.5.2

OpenStack Bionic/Queens

Multiple jujud services seem to be getting stuck with the following error message:
ERROR juju.worker.dependency engine.go:636 "uniter" manifold worker returned unexpected error: failed to initialize uniter for "UNIT_NAME": cannot create storage hook source: getting unit attachments: runtime error: invalid memory address or nil pointer dereference

jujud keeps pushing the same information to the logs, such as:
https://pastebin.ubuntu.com/p/5QN6HmtHxz/

The workaround at the moment is to monitor the deployment and restart any services that seem to be stuck.
I am doing so with a script that restarts any service that keeps the same status for too long.

The issue is that I do not know what consequences so many restarts may have for the overall deployment, or whether any services will be disrupted. It worked on my last trial.
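
The watchdog approach described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual script: it assumes unit workload statuses are read from the parsed output of `juju status --format=json`, that only non-active units are restart candidates, and that unit agents run as systemd services named `jujud-unit-<application>-<number>`. The function names (`unit_statuses`, `find_stuck`, `service_name`) and the threshold are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: a unit that stays "active" is healthy and is never restarted.
HEALTHY = ("active",)


def unit_statuses(status):
    """Map unit name -> workload status from parsed `juju status --format=json` output.

    Subordinate units nested under their principal are ignored in this sketch.
    """
    result = {}
    for app in status.get("applications", {}).values():
        for name, unit in app.get("units", {}).items():
            result[name] = unit.get("workload-status", {}).get("current", "unknown")
    return result


def find_stuck(seen, statuses, now, threshold):
    """Update `seen` in place and return units stuck in one non-healthy status.

    seen: dict of unit name -> (status, time that status was first observed)
    """
    stuck = []
    for unit, status in statuses.items():
        previous = seen.get(unit)
        if previous is None or previous[0] != status:
            seen[unit] = (status, now)  # status changed: restart the clock
            continue
        if status not in HEALTHY and now - previous[1] >= threshold:
            stuck.append(unit)
    return sorted(stuck)


def service_name(unit):
    """'keystone/0' -> 'jujud-unit-keystone-0' (systemd unit naming is assumed)."""
    return "jujud-unit-" + unit.replace("/", "-")
```

Calling `find_stuck` once per polling interval with the same `seen` dictionary yields the units whose status has not changed for at least the threshold. How to restart them, and whether that is safe for the deployment, is left to the operator.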

Tags: cpe-onsite

Pedro Guimarães (pguimaraes) on 2019-05-06

tags:

added: cpe-onsite

Pedro Guimarães (pguimaraes) wrote on 2019-05-06:

This is an ongoing customer deployment:

Bundle: https://drive.google.com/open?id=1I6WtKVJtM0YzpgSNvJinBh0r9jf-2q-t
Juju crashdump download link: https://drive.google.com/open?id=1JGuQhQ359PJMcbN8PnJBlk27HbH76_O7

Canonical Juju QA Bot (juju-qa-bot) on 2019-05-06

Changed in juju:
status:	New → Triaged
importance:	Undecided → Critical

Canonical Juju QA Bot (juju-qa-bot) on 2019-05-06

Changed in juju:
assignee:	nobody → Canonical Juju QA Bot (juju-qa-bot)

Joseph Phillips (manadart) on 2019-05-06

Changed in juju:
assignee:	Canonical Juju QA Bot (juju-qa-bot) → Joseph Phillips (manadart)

Joseph Phillips (manadart) wrote on 2019-05-06:

Proposed a patch against the 2.5 branch that fixes a missing error check on this code path:
https://github.com/juju/juju/pull/10142

I got pulled off on to other things today, but I will dig into the crash dump first thing and see if there is more that can be gleaned there.

Pedro Guimarães (pguimaraes) wrote on 2019-05-06:

Hi, just tested your repo on my environment and it failed during juju deploy:
https://pastebin.ubuntu.com/p/FdgVjbr6n2/

Also, juju enable-ha returns empty: https://pastebin.ubuntu.com/p/YyvwsXsCKC/

Joseph Phillips (manadart) wrote on 2019-05-07:

Thanks for giving it a test.

This does indeed look like the error that was being obfuscated, so now we can set about addressing it.

Pedro Guimarães (pguimaraes) wrote on 2019-05-07:

Hi, an update: I was running the controller model behind a proxy, which explains the issues I faced in #4. Once the proxy configs were set correctly, controller deployment worked.

I am currently running your snap version; here is the full crashdump of the latest deployment: https://drive.google.com/open?id=1S0fhIt-BvqX1I_Hau0t2XcRhJr8MQpDL

I am seeing some units failing, like:
https://pastebin.canonical.com/p/zcTwFJrJcS/
https://pastebin.canonical.com/p/Zyqqyc5XFR/

I can see that both units cited above were on the same node. Both had this in their logs:
2019-05-07 12:53:43 DEBUG juju.api apiclient.go:888 successfully dialed "wss://REDACTED:17070/model/d389f110-5ed1-46d3-8526-796a9ddf67ca/api"
2019-05-07 12:53:43 INFO juju.api apiclient.go:608 connection established to "wss://REDACTED:17070/model/d389f110-5ed1-46d3-8526-796a9ddf67ca/api"
2019-05-07 12:54:08 DEBUG juju.worker.apicaller connect.go:155 [d389f1] failed to connect
2019-05-07 12:54:08 ERROR juju.worker.apicaller connect.go:204 Failed to connect to controller: Closed explicitly (unauthorized access)

The jujud services were marked as "exited".
This may be related to proxy configurations as well, but after restarting the jujud service on both units, they moved to "active".
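
To check whether other units hit the same failure, the agent log lines quoted above can be filtered mechanically. The sketch below assumes every line follows the `date time LEVEL module file:line message` layout seen in these excerpts; the names `parse_line` and `error_summary` are illustrative and not part of Juju.

```python
import re
from collections import Counter

# Layout assumed from the excerpts: "2019-05-07 12:54:08 ERROR module file.go:204 message"
LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<module>\S+) "
    r"(?P<file>\S+):(?P<line>\d+) (?P<message>.*)$"
)


def parse_line(text):
    """Return the fields of one agent log line as a dict, or None if it does not match."""
    match = LINE.match(text.strip())
    return match.groupdict() if match else None


def error_summary(lines):
    """Count ERROR entries, keyed by (module, message)."""
    counts = Counter()
    for text in lines:
        entry = parse_line(text)
        if entry and entry["level"] == "ERROR":
            counts[(entry["module"], entry["message"])] += 1
    return counts
```

Running `error_summary` over each unit's log and comparing the keys shows quickly whether the "Failed to connect to controller" error is confined to one node or spread across the deployment.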

Joseph Phillips (manadart) on 2019-05-07

Changed in juju:
milestone:	none → 2.5.6

Joseph Phillips (manadart) wrote on 2019-05-08:

Based on the log output in https://pastebin.canonical.com/p/qfxgQpZSqc/, I am a bit concerned that we are not getting consistent results.

My 10142 patch will be in the latest 2.5 edge channel (2.5.5+2.5-a42c953). Can you run with this version? Then we will be sure that the client/controller/agent versions are all congruent.

If we see these errors again, can you get me a crash-dump of the controller model in addition to the one you are deploying into?

Changed in juju:
status:	Triaged → In Progress

Joseph Phillips (manadart) on 2019-05-09

Changed in juju:
status:	In Progress → Incomplete
importance:	Critical → High

Joseph Phillips (manadart) wrote on 2019-05-09:

Are we able to resolve this bug based on the resolution of prior proxy issues?

It will be easier to track further issues in new tickets.

Pedro Guimarães (pguimaraes) wrote on 2019-05-09:

Hi manadart, yes thanks for all the help.
I do not see the issue or any other issues on the last few deployments.
We can mark as Fix Released.

Joseph Phillips (manadart) on 2019-05-10

Changed in juju:
status:	Incomplete → Fix Committed

Andrey Grebennikov (agrebennikov) wrote on 2019-05-10:

#10

I just came across the exact same problem yesterday.
Unfortunately I don't have direct access to the environment, but here is my summary:

1. No proxy is set up, direct internet access.
2. The issue happens on different physical nodes.
3. The issue happens with the agents of different charms.

Hope this helps.

Joseph Phillips (manadart) on 2019-05-14

Changed in juju:
status:	Fix Committed → Fix Released

Anastasia (anastasia-macmood) on 2019-05-14

Changed in juju:
status:	Fix Released → Fix Committed
milestone:	2.5.6 → 2.7-beta1
no longer affects:	juju/2.7

Anastasia (anastasia-macmood) on 2019-12-09

Changed in juju:
status:	Fix Committed → Fix Released
