[2.9.23 candidate] connection is shut down

Bug #1957824 reported by Marian Gasparovic
Affects: Canonical Juju
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: none

Bug Description

While testing the 2.9.23 candidate we encountered two failures during a charm's install hook which we had not seen before. Both failed with "ERROR connection is shut down".

Both failures occurred when calling Juju hook tools:

subprocess.CalledProcessError: Command '['config-get', '--all', '--format=json']' returned non-zero exit status 1.

and

subprocess.CalledProcessError: Command '['leader-get', '--format=json', 'rndc_key']' returned non-zero exit status 1.
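
Both commands are standard Juju hook tools that charm code invokes as subprocesses from inside a hook. A minimal sketch of that call pattern, for readers unfamiliar with it (the run_hook_tool wrapper is illustrative, not the failing charm's code; config-get and leader-get are the real tools from the errors above):

import json
import subprocess

def run_hook_tool(*args):
    # Invoke a Juju hook tool and parse its JSON output. When the unit
    # agent's API connection is shut down, the tool exits non-zero and
    # subprocess raises CalledProcessError -- the failure seen above.
    output = subprocess.check_output(list(args)).decode("UTF-8")
    return json.loads(output)

# Only works inside a running hook, where the hook tools are on PATH.
config = run_hook_tool("config-get", "--all", "--format=json")
rndc_key = run_hook_tool("leader-get", "--format=json", "rndc_key")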

Links to artifacts

https://oil-jenkins.canonical.com/artifacts/2e001c7d-94fb-44b3-9716-505e98f11178/index.html

and

https://oil-jenkins.canonical.com/artifacts/5fc83154-f905-4065-8c8c-5ae2fccbcb23/index.html

Revision history for this message
John A Meinel (jameinel) wrote :

This is happening for charms, but https://github.com/juju/python-libjuju/issues/615 is something similar seen for clients (running pylibjuju). It is plausible that it isn't the same thing at all; it is just interesting to see similar behavior.

Revision history for this message
Simon Richardson (simonrichardson) wrote :

Fixes a panic: https://github.com/juju/juju/pull/13622. This might not fix the connection shut-down, but it is a panic in the logs nevertheless.

Do we know why we're performing an engine report during the tests that might cause a restart of units?

Revision history for this message
Alexander Balderson (asbalderson) wrote :

Simon, we run the juju engine report as part of crashdump collection after the run has failed, and it shouldn't be run until after we've detected a failure (usually when juju-wait detects an error and stops).

Also all the instances of this bug can be found at:
https://solutions.qa.canonical.com/bugs/bugs/bug/1957824

Revision history for this message
Alexander Balderson (asbalderson) wrote :

SQA hit this another half dozen times over the last day of testing; adding the release-blocker tag.

tags: added: cdo-release-blocker
Revision history for this message
Ian Booth (wallyworld) wrote :

Is this still happening in the 2.9.24 candidate?

Changed in juju:
status: New → Incomplete
Changed in juju:
status: Incomplete → Triaged
importance: Undecided → High
assignee: nobody → Yang Kelvin Liu (kelvin.liu)
milestone: none → 2.9.25
Revision history for this message
Yang Kelvin Liu (kelvin.liu) wrote :

I found this happens after a mongo SYNC. After discussing with Ian, we think this issue might be the same as (or similar to) one we got in prodstack (a timeout when mongo switches primary).
These failures mostly happen on non-public clouds (probably low-IOPS disks).
I would suggest another run with the logging level set to WARNING.
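
As a concrete sketch of that suggestion (logging-config is the real Juju model setting; the model name below is a placeholder, and driving the CLI from Python just matches the rest of this report):

import subprocess

# Raise the root logging level to WARNING for the next run.
# "my-model" is a placeholder; point this at the model under test.
subprocess.check_call([
    "juju", "model-config", "-m", "my-model",
    "logging-config=<root>=WARNING",
])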

Revision history for this message
Alexander Balderson (asbalderson) wrote :

I can provide some info about what we're running the Juju controllers on where we see this.

The controllers are in KVMs with 24G memory, 4 cores, and 50G disk. The machines hosting the KVMs (3 per machine) are also running 3 Vault KVMs (4G memory, 2 cores, 40G disk).

The hosts have 2 240G SSDs, 64G memory, and a 4-core processor (3.4GHz). We over-commit memory by 2x and CPU by 5x. I don't feel like we're pushing the machines to their limits with the over-commit.

Changed in juju:
milestone: 2.9.25 → 2.9.26
Changed in juju:
milestone: 2.9.26 → 2.9.27
Changed in juju:
milestone: 2.9.27 → 2.9.28
Changed in juju:
milestone: 2.9.28 → 2.9.29
Changed in juju:
milestone: 2.9.29 → 2.9.30
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9.30 → 2.9-next
assignee: Yang Kelvin Liu (kelvin.liu) → nobody
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9-next → none
Revision history for this message
Marian Gasparovic (marosg) wrote :

We hit this again with 3.3 stable:

  File "/var/lib/juju/agents/unit-kubeapi-load-balancer-0/charm/reactive/nginx.py", line 11, in <module>
    config = hookenv.config()
  File "/var/lib/juju/agents/unit-kubeapi-load-balancer-0/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 444, in config
    subprocess.check_output(config_cmd_line).decode('UTF-8'))
  File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['config-get', '--all', '--format=json']' returned non-zero exit status 1.

Artifacts - https://oil-jenkins.canonical.com/artifacts/76af2731-6203-427a-9cd0-3db3071af831/index.html
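
Since the failure looks transient (the unit agent's API connection is shut down briefly, e.g. while mongo switches primary), a hedged workaround sketch is to retry hookenv.config() before giving up. config_with_retry, the attempt count, and the delay are invented for illustration; charmhelpers.core.hookenv.config() is the real call from the traceback above:

import time
from subprocess import CalledProcessError

from charmhelpers.core import hookenv

def config_with_retry(attempts=3, delay=5):
    # Return the charm config, retrying transient hook-tool failures.
    for attempt in range(1, attempts + 1):
        try:
            return hookenv.config()
        except CalledProcessError:
            if attempt == attempts:
                raise  # still failing after the last attempt
            time.sleep(delay)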
