vault: hook failed: "start" (after unit reboot) due to mysql8 cluster being down - ergonomics need to be better

Bug #1931481 reported by Mario Chirinos
Affects Status Importance Assigned to Milestone
OpenStack Bundles
Invalid
Undecided
Unassigned
vault-charm
Triaged
High
Unassigned

Bug Description

After a server shutdown, vault is not starting again, with the error:
vault 1.5.4 error 1 vault charmstore 44 ubuntu hook failed: "start"

The tail of /var/log/juju/unit-vault-0.log is:
2021-06-09 19:04:10 WARNING start File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
2021-06-09 19:04:10 WARNING start resp = self.send(prep, **send_kwargs)
2021-06-09 19:04:10 WARNING start File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
2021-06-09 19:04:10 WARNING start r = adapter.send(request, **kwargs)
2021-06-09 19:04:10 WARNING start File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
2021-06-09 19:04:10 WARNING start raise ConnectionError(e, request=request)
2021-06-09 19:04:10 WARNING start requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8220): Max retries exceeded with url: /v1/auth/approle/login (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe353439160>: Failed to establish a new connection: [Errno 111] Connection refused'))
2021-06-09 19:04:10 ERROR juju.worker.uniter.operation runhook.go:136 hook "start" (via explicit, bespoke hook script) failed: exit status 1
2021-06-09 19:04:10 INFO juju.worker.uniter resolver.go:143 awaiting error resolution for "start" hook
2021-06-09 19:06:11 INFO juju.worker.uniter resolver.go:143 awaiting error resolution for "start" hook

My settings are:
OS: Focal
versions:
   openstack-dashboard 18.6.1
   keystone 18.0.0
   nova-cloud-controller 22.0.1
   nova-compute 22.0.1
   nova-mysql-router 8.0.23

charm openstack base #73

openstack 4.0.0

description: updated
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Hi Mario

In order to understand why vault isn't starting, we need the logs from the vault unit(s), please.

Thanks

Changed in openstack-bundles:
status: New → Incomplete
Revision history for this message
Mario Chirinos (mario-chirinos) wrote (last edit ):

Alex, thanks for your reply. I am attaching /var/log/juju/unit-vault-0.log (ZIP file) from the vault/0 unit. Which other log should I attach?

Regards

Changed in openstack-bundles:
status: Incomplete → New
affects: openstack-bundles → ubuntu
affects: ubuntu → openstack-bundles
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Okay, so the vault unit was functioning okay, but after a reboot, it failed. It looks like vault itself is failing to start.

2021-06-10 13:12:03 WARNING start requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8220): Max retries exceeded with url: /v1/auth/approle/login (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3a1804a160>: Failed to establish a new connection: [Errno 111] Connection refused'))

Is there anything in the vault logs that indicates why it might not be starting. e.g. "systemctl status vault" on the unit?

Revision history for this message
Mario Chirinos (mario-chirinos) wrote :

Not that I can see. What I attached is the entire vault log; which other log should I check?
I am struggling to solve this problem and I hope you can help me.
I am attaching the dmesg log.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

What does "sudo systemctl status vault" on the unit reveal?
Also "sudo journalctl -u vault.service"

These are the logs for the actual vault service on the unit.

Revision history for this message
Mario Chirinos (mario-chirinos) wrote :

error connecting to /tmp/tmux-0/default (No such file or directory)

sudo systemctl status vault
This is a Juju debug-hooks tmux session. Remember:
1. You need to execute hooks/actions manually if you want them to run for trapped events.
2. When you are finished with an event, you can run 'exit' to close the current window and allow Juju to continue processing
new events for this unit without exiting a current debug-session.
3. To run an action or hook and end the debugging session avoiding processing any more events manually, use:

./hooks/start
tmux kill-session -t vault/0 # or, equivalently, CTRL+a d

4. CTRL+a is tmux prefix.

More help and info is available in the online documentation:
https://discourse.jujucharms.com/t/debugging-charm-hooks

----------------------------------------
sudo journalctl -u vault.service

geoint@maas:~$ juju debug-hook vault/0
no server running on /tmp/tmux-0/default

- (press RETURN)-- Logs begin at Sun 2021-02-21 06:50:14 UTC, end at Mon 2021-06-14 19:05:20 UTC. --
Feb 21 07:12:13 juju-191a8d-0-lxd-6 systemd[1]: Started HashiCorp Vault.
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: ==> Vault server configuration:
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: Cgo: enabled
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: Go Version: go1.13.15
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: Listener 2: tcp (addr: "127.0.0.1:8220", cluster address: "127.0.0.1:8221", max_request_duration: "1m30s", max_request_size: "33554432", tls: "dis>
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: Log Level: info
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: Mlock: supported: true, enabled: false
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: Recovery Mode: false
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: Storage: mysql (HA disabled)
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: Version: Vault v1.5.4
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: ==> Vault server started! Log data will stream in below:
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: 2021-02-21T07:12:15.502Z [INFO] proxy environment: http_proxy= https_proxy= no_proxy=
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: 2021-02-21T07:12:15.509Z [WARN] storage.mysql: No TLS specified, credentials will be sent in plaintext. To mute this warning add 'plaintext_connection_allowed'>
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: 2021-02-21T07:12:16.322Z [WARN] no `api_addr` value specified in config or in VAULT_API_ADDR; falling back to detection if possible, but this value should be m>
Feb 21 07:12:16 juju-191a8d-0-lxd-6 vault[42952]: 2021-02-21T07:12:16.503Z [INFO] core: security barrier not initialized
Feb 21 07:15:24 juju-191a8d-0-lxd-6 vault[42952]: 2021-02-21T07:15:24.575Z [INFO] core: security barrier not initialized
Feb 21 07:19:41 juju-191a8d-0-lxd-6 vault[42952]: 2021-02-21T07:19:41....


Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Hi Mario

That looks really confusing; it looks like the copy and paste has mixed up a debug-hooks session with the log listing.

Just use "juju ssh vault/0" to get onto the unit; you don't need to do debug-hooks (yet). Thanks.
It looks like vault has started. Can you access it (when on the unit using ssh)? e.g. "vault status".

Revision history for this message
Mario Chirinos (mario-chirinos) wrote (last edit ):

Sorry my fault, here it is:
vault.service result is attached

ubuntu@juju-191a8d-0-lxd-6:~$ sudo systemctl status vault
● vault.service - HashiCorp Vault
     Loaded: loaded (/etc/systemd/system/vault.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2021-06-14 19:57:01 UTC; 5min ago
   Main PID: 1767 (code=exited, status=1/FAILURE)

Jun 14 19:57:01 juju-191a8d-0-lxd-6 systemd[1]: vault.service: Scheduled restart job, restart counter is at 5.
Jun 14 19:57:01 juju-191a8d-0-lxd-6 systemd[1]: Stopped HashiCorp Vault.
Jun 14 19:57:01 juju-191a8d-0-lxd-6 systemd[1]: vault.service: Start request repeated too quickly.
Jun 14 19:57:01 juju-191a8d-0-lxd-6 systemd[1]: vault.service: Failed with result 'exit-code'.
Jun 14 19:57:01 juju-191a8d-0-lxd-6 systemd[1]: Failed to start HashiCorp Vault.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Okay, so now we know the vault service is failing to start. That's what has to be resolved. Also, if you are rebooting the unit to fix this, please don't.

What does the rest of your model look like? (juju status). Are any other units in error? What does the vault config look like (in /var/snap/vault/common/vault.hcl)?

Is the mysql database available?
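
A quick way to answer that from the vault unit is a plain TCP probe of vault's configured mysql storage address. This is an illustrative sketch, not part of the charm; the helper name is an assumption:

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable hosts.
        return False

# On the vault unit, the mysql storage backend is addressed as
# 127.0.0.1:3306 (per vault.hcl); a False here matches the
# "Connection refused" seen in the hook traceback.
print(tcp_reachable("127.0.0.1", 3306))
```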

Revision history for this message
Mario Chirinos (mario-chirinos) wrote (last edit ):

MySQL is not available. I thought it was because of vault, but maybe it is the other way around.

Juju status is attached and here is the vault.hcl content

sudo cat /var/snap/vault/common/vault.hcl

disable_mlock = true
storage "mysql" {
  username = "vault"
  password = "qWhGtkJT3kdJMx5zqMyFq69jG9BJ5y8J"
  database = "vault"
  address = "127.0.0.1:3306"
  max_connection_lifetime = "3600"
}
listener "tcp" {
  address = "[::]:8200"
  tls_disable = 1
}

# Localhost only listener for charm access to vault.
listener "tcp" {
  address = "127.0.0.1:8220"
  tls_disable = 1
}

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Ah, okay, the key issue is this:

mysql-innodb-cluster/0* blocked executing 0/lxd/3 192.168.221.99 MySQL InnoDB Cluster not healthy: None
mysql-innodb-cluster/1 blocked idle 1/lxd/2 192.168.221.5 MySQL InnoDB Cluster not healthy: None
mysql-innodb-cluster/2 blocked idle 2/lxd/2 192.168.221.8 MySQL InnoDB Cluster not healthy: None

The whole cloud is bust because mysql-innodb-cluster is not up. The vault charm *has* a bug in that it is erroring because it can't talk to mysql, but that will resolve when the mysql cluster comes up. That's what you need to concentrate on. Once mysql is back, everything should sort itself out.

Please reach out on IRC (OFTC #openstack-charms) or the discourse for charmhub (https://discourse.charmhub.io/) if you need help getting the mysql cluster back up. The bug here is that the ergonomics of the vault charm aren't handling the absent mysql cluster properly.

summary: - vault: hook failed: "start"
+ vault: hook failed: "start" due to mysql8 cluster being down -
+ ergonomics need to be better
summary: - vault: hook failed: "start" due to mysql8 cluster being down -
- ergonomics need to be better
+ vault: hook failed: "start" (after unit reboot) due to mysql8 cluster
+ being down - ergonomics need to be better
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Based on the resolution found at https://discourse.charmhub.io/t/vault-hook-failed-start/4729/13, and that it was due to mysql not restarting properly after a power outage, I'm closing this as invalid on OpenStack bundles. However, on the vault charm, this is a bug in the start hook when the mysql server is not available at reboot during a power outage.

Changed in openstack-bundles:
status: New → Invalid
Changed in vault-charm:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Added for the vault charm:

Issue: the vault charm is installed and has been working in a system. The system suffers a power outage; mysql is not available when the vault unit comes back from a cold start, and the start hook fails.

What should happen: the charm should enter a blocked state until the mysql server has returned. This may mean that the mysql charm needs to signal to the vault charm that it is 'back' (i.e. generate a hook execution).
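
The blocked-state behaviour described above could be sketched roughly as follows. This is an illustrative outline, not the actual vault charm code; `probe_vault`, `start_hook`, and the status tuples are assumptions:

```python
import urllib.request
import urllib.error

# Local charm-access listener, as configured in vault.hcl.
VAULT_LOCAL_URL = "http://127.0.0.1:8220/v1/sys/health"

def probe_vault(url=VAULT_LOCAL_URL, timeout=5):
    """Return True if the local vault listener answers at all."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # vault answered, just with a non-2xx status (e.g. sealed or
        # uninitialized) -- the service itself is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused/timed out: vault (or its backend) is down.
        return False

def start_hook(probe=probe_vault):
    """Sketch of a start hook that blocks instead of erroring.

    If vault is unreachable (e.g. its mysql storage backend is down
    after a power outage), report 'blocked' and return cleanly so the
    hook exits 0, rather than raising ConnectionError and putting the
    unit into the error state.
    """
    if not probe():
        return ("blocked", "vault not reachable; waiting for mysql backend")
    return ("active", "vault running")
```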
