Install Failed: Couldn't acquire DPKG lock.

Bug #1773084 reported by Brandan Schmitz
20
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack RabbitMQ Server Charm
Expired
Medium
Unassigned

Bug Description

This is being deployed on Ubuntu 18.04 LTS and it is getting the following error according to the juju debug-log command:

unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install Job for rabbitmq-server.service failed because the control process exited with error code.
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install invoke-rc.d: initscript rabbitmq-server, action "start" failed.
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install ● rabbitmq-server.service - RabbitMQ Messaging Server
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: enabled)
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install Active: failed (Result: exit-code) since Thu 2018-05-24 05:52:00 UTC; 11ms ago
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install Process: 9623 ExecStartPost=/usr/lib/rabbitmq/bin/rabbitmq-server-wait (code=exited, status=70)
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install Process: 9622 ExecStart=/usr/sbin/rabbitmq-server (code=exited, status=1/FAILURE)
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install Main PID: 9622 (code=exited, status=1/FAILURE)
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install May 24 05:51:27 juju-8f71e3-2-lxd-3 systemd[1]: Starting RabbitMQ Messaging Server...
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install May 24 05:51:28 juju-8f71e3-2-lxd-3 rabbitmq[9623]: Waiting for 'rabbit@juju-8f71e3-2-lxd-3'
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install May 24 05:51:28 juju-8f71e3-2-lxd-3 rabbitmq[9623]: pid is 9631
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install May 24 05:51:57 juju-8f71e3-2-lxd-3 systemd[1]: rabbitmq-server.service: Main process exited, code=exited, status=1/FAILURE
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install May 24 05:52:00 juju-8f71e3-2-lxd-3 rabbitmq[9623]: Error: process_not_running
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install May 24 05:52:00 juju-8f71e3-2-lxd-3 systemd[1]: rabbitmq-server.service: Control process exited, code=exited status=70
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install May 24 05:52:00 juju-8f71e3-2-lxd-3 systemd[1]: rabbitmq-server.service: Failed with result 'exit-code'.
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install May 24 05:52:00 juju-8f71e3-2-lxd-3 systemd[1]: Failed to start RabbitMQ Messaging Server.
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install dpkg: error processing package rabbitmq-server (--configure):
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install installed rabbitmq-server package post-installation script subprocess returned error exit status 1
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install Errors were encountered while processing:
unit-rabbitmq-server-1: 05:52:00 DEBUG unit.rabbitmq-server/1.install rabbitmq-server
unit-rabbitmq-server-1: 05:52:02 DEBUG unit.rabbitmq-server/1.install E: Sub-process /usr/bin/dpkg returned an error code (1)
unit-rabbitmq-server-1: 05:52:02 INFO unit.rabbitmq-server/1.juju-log Couldn't acquire DPKG lock. Will retry in 10 seconds

It just continues in this loop. It will cause juju to say hook failed: "install" and then it goes back to maintenance mode and then back to error again.

Juju version: 2.3.7-bionic-amd64
Charm version: #73

All settings for the charm are default to when it is added.

Revision history for this message
Brandan Schmitz (celestialdeath99) wrote :
Revision history for this message
Frode Nordahl (fnordahl) wrote :

Thank you for reporting this issue and for linking and referencing related bugs.

How often would you say this issue presents itself in your environment?

Changed in charm-rabbitmq-server:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Brandan Schmitz (celestialdeath99) wrote :

Every time that I have tried so far. Right now I have all the other OpenStack services up and running, except this one.

Revision history for this message
Frode Nordahl (fnordahl) wrote :

Would it be possible for you to collect the following information as soon as possible after you see the hook error on the rabbitmq-server unit:
    sudo fuser /var/lib/dpkg/lock
    ps faxuw

Revision history for this message
Brandan Schmitz (celestialdeath99) wrote :

ubuntu@juju-8f71e3-2-lxd-7:~$ sudo fuser /var/lib/dpkg/lock
ubuntu@juju-8f71e3-2-lxd-7:~$ ps faxuw
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 37592 5856 ? Ss 08:08 0:02 /sbin/init
root 54 0.0 0.0 35272 7340 ? Ss 08:08 0:00 /lib/systemd/systemd-journald
root 62 0.0 0.0 41720 3228 ? Ss 08:08 0:00 /lib/systemd/systemd-udevd
root 375 0.0 0.0 65508 6196 ? Ss 08:08 0:00 /usr/sbin/sshd -D
root 9842 0.0 0.0 92796 6836 ? Ss 08:11 0:00 \_ sshd: ubuntu [priv]
ubuntu 9876 0.0 0.0 92796 3400 ? S 08:11 0:00 \_ sshd: ubuntu@pts/1
ubuntu 9877 0.0 0.0 21328 5160 pts/1 Ss 08:11 0:00 \_ -bash
ubuntu 10888 0.0 0.0 36084 3408 pts/1 R+ 08:14 0:00 \_ ps faxuw
root 376 0.0 0.0 272868 5976 ? Ssl 08:08 0:00 /usr/lib/accountsservice/accounts-daemon
root 377 0.0 0.0 289108 25008 ? Ssl 08:08 0:00 /usr/lib/snapd/snapd
root 378 0.0 0.0 26068 2624 ? Ss 08:08 0:00 /usr/sbin/cron -f
syslog 384 0.0 0.0 186896 3300 ? Ssl 08:08 0:00 /usr/sbin/rsyslogd -n
message+ 386 0.0 0.0 42944 4112 ? Ss 08:08 0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root 396 0.0 0.0 28544 3104 ? Ss 08:08 0:00 /lib/systemd/systemd-logind
daemon 397 0.0 0.0 26044 2284 ? Ss 08:08 0:00 /usr/sbin/atd -f
root 399 0.0 0.0 5220 148 ? Ss 08:08 0:00 /sbin/iscsid
root 400 0.0 0.0 5720 3536 ? SLs 08:08 0:00 /sbin/iscsid
root 403 0.0 0.0 277176 6080 ? Ssl 08:08 0:00 /usr/lib/policykit-1/polkitd --no-debug
root 429 0.0 0.0 12840 1896 ? Ss+ 08:08 0:00 /sbin/agetty --noclear --keep-baud console 115200 38400 9600 linux
root 619 0.0 0.0 18032 2928 ? Ss 08:08 0:00 bash /var/lib/juju/init/jujud-machine-2-lxd-7/exec-start.sh
root 623 0.0 0.0 848564 53116 ? Sl 08:08 0:00 \_ /var/lib/juju/tools/machine-2-lxd-7/jujud machine --data-dir /var/lib/juju --machine-id 2/lxd/7 --debug
root 727 0.0 0.0 722524 21960 ? Ssl 08:08 0:09 /usr/bin/lxd --group lxd --logfile=/var/log/lxd/lxd.log
root 741 0.0 0.0 18028 2916 ? Ss 08:08 0:00 bash /var/lib/juju/init/jujud-unit-rabbitmq-server-7/exec-start.sh
root 745 0.0 0.0 711412 55608 ? Sl 08:08 0:01 \_ /var/lib/juju/tools/unit-rabbitmq-server-7/jujud unit --data-dir /var/lib/juju --unit-name rabbitmq-server/7 --debug
ubuntu 9844 0.0 0.0 45148 4892 ? Ss 08:11 0:00 /lib/systemd/systemd --user
ubuntu 9845 0.0 0.0 61044 1684 ? S 08:11 0:00 \_ (sd-pam)
ubuntu@juju-8f71e3-2-lxd-7:~$

Revision history for this message
Frode Nordahl (fnordahl) wrote :

Thank you for providing the requested information. Unfortunately the process holding the DPKG lock is already gone in the above output.

FWIW; you will probably be able to work around this issue by issuing the following command from where you issue your juju commands:
    juju resolved <unit-name/number-of-rabbitmq-server-unit>

In the mean time we will investigate how the install hook of this charm can be made more resilient to this situation.

Revision history for this message
Brandan Schmitz (celestialdeath99) wrote :

Unfortunately running juju resolved does work, it tries to reinstall the rabbitmq-server package and after 3-4 attempts it once again fails. I am not sure what to do because this is holding back the entire deployment, as it’s the last portion to be installed. I leave tomorrow and will be away fronm the servers for 12 weeks and it will be hard to work on it.

Revision history for this message
Brandan Schmitz (celestialdeath99) wrote :

In my previous comment I meant to say it DOES NOT work. I have attempted removing and reconfiguring the charm to use xenial and it appears the have the same issue. I am not sure why, but even when manually deploying and using version 72 it also encounters the issue. Could it be something on the RabbitMQ-server packages end?

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Setting this back to incomplete; is this still a problem, or has it been resolved?

Changed in charm-rabbitmq-server:
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack rabbitmq-server charm because there has been no activity for 60 days.]

Changed in charm-rabbitmq-server:
status: Incomplete → Expired
Revision history for this message
Joshua Genet (genet022) wrote :
Download full text (6.9 KiB)

I'm here to resurrect a year old bug! Or to at least figure out what is causing this.
We (Solutions-qa) ran into this while trying to enable lvl2 CIS Hardened images.

I can reliably (at least 2/2 fresh deploys so far) reproduce this (just takes 3hrs to get everything up).

Let me know if there is anything else I can provide to help.

---

Here's a link to one of the runs on our dashboard:
https://solutions.qa.canonical.com/testruns/testRun/e614dd36-3c26-4abf-b976-9b0c9c6bd8f3

Artifacts/logs/bundles here:
https://oil-jenkins.canonical.com/artifacts/e614dd36-3c26-4abf-b976-9b0c9c6bd8f3/index.html

Juju model crashdump here:
https://oil-jenkins.canonical.com/artifacts/e614dd36-3c26-4abf-b976-9b0c9c6bd8f3/generated/generated/openstack/juju-crashdump-openstack-2021-03-16-19.01.01.tar.gz

---

And here's the output you previously requested:
ubuntu@juju-533a8f-1-lxd-10:~$ sudo fuser /var/lib/dpkg/lock
/var/lib/dpkg/lock: 44677
ubuntu@juju-533a8f-1-lxd-10:~$ ps faxuw
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.7 0.0 77992 9392 ? Ss 18:40 0:15 /sbin/init
root 58 0.1 0.0 103300 24700 ? Ss 18:40 0:02 /lib/systemd/systemd-journald
root 62 0.6 0.0 33356 3508 ? Ss 18:40 0:12 /lib/systemd/systemd-udevd
systemd+ 156 0.0 0.0 70672 4984 ? Ss 18:40 0:00 /lib/systemd/systemd-resolved
systemd+ 185 0.0 0.0 71896 5292 ? Ss 18:40 0:00 /lib/systemd/systemd-networkd
daemon 218 0.0 0.0 28340 2472 ? Ss 18:40 0:00 /usr/sbin/atd -f
root 219 0.0 0.0 31756 3164 ? Ss 18:40 0:00 /usr/sbin/cron -f
root 220 0.0 0.0 70676 6316 ? Ss 18:40 0:00 /lib/systemd/systemd-logind
root 223 0.0 0.0 170872 17432 ? Ssl 18:40 0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 228 0.0 0.0 288004 6836 ? Ssl 18:40 0:00 /usr/lib/accountsservice/accounts-daemon
message+ 229 0.1 0.0 50212 4680 ? Ss 18:40 0:02 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-onl
root 238 0.0 0.0 72308 6468 ? Ss 18:40 0:00 /usr/sbin/sshd -D
root 34692 0.0 0.0 118456 7400 ? Ss 19:23 0:00 \_ sshd: ubuntu [priv]
ubuntu 34771 0.0 0.0 118456 4016 ? S 19:23 0:00 | \_ sshd: ubuntu@pts/0
root 34772 0.0 0.0 78672 4540 pts/0 Ss+ 19:23 0:00 | \_ sudo /bin/bash -c F=$(mktemp); echo IyEvYmluL2Jhc2gKKApjbGVhbnVwX29uX2V4aXQoKSAKeyAKCWVjaG8
root 34773 0.0 0.0 13324 3368 pts/0 S+ 19:23 0:00 | \_ /bin/bash -c F=$(mktemp); echo IyEvYmluL2Jhc2gKKApjbGVhbnVwX29uX2V4aXQoKSAKeyAKCWVjaG8g
root 34778 0.0 0.0 13324 2100 pts/0 S+ 19:23 0:00 | \_ /bin/bash -c F=$(mktemp); echo IyEvYmluL2Jhc2gKKApjbGVhbnVwX29uX2V4aXQoKSAKeyAKCWVj
root 34780 0.0 0.0 13324 2100 pts/0 S+ 19:23 0:00 | \_ /bin/bash -c F=$(mktemp); echo IyEvYmluL2Jhc2gKKApjbGVhbnVwX29uX2V4aXQoKSAKeyAK
root 34784 0.0 0.0 19924 3432 pts/0 S+ ...

Read more...

Revision history for this message
Joshua Genet (genet022) wrote :

Subbed field-crit, a customer asked Field for CIS and this is preventing us from testing the CIS SKU.

Revision history for this message
Billy Olsen (billy-olsen) wrote :

Looking at the apt logs, we can see that the post install job for starting the rabbit service is causing the issues.

Adding group `rabbitmq' (GID 120) ...
Done.
Adding system user `rabbitmq' (UID 114) ...
Adding new user `rabbitmq' (UID 114) with group `rabbitmq' ...
Not creating home directory `/var/lib/rabbitmq'.
Created symlink /etc/systemd/system/multi-user.target.wants/rabbitmq-server.service → /lib/systemd/system/rabbitmq-server.service.
Job for rabbitmq-server.service failed because the control process exited with error code.
See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.
invoke-rc.d: initscript rabbitmq-server, action "start" failed.
● rabbitmq-server.service - RabbitMQ Messaging Server
   Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Tue 2021-03-16 18:53:35 UTC; 12ms ago
  Process: 32301 ExecStart=/usr/sbin/rabbitmq-server (code=exited, status=2)
 Main PID: 32301 (code=exited, status=2)

Looking at the syslog, we can see it fails due to a permission denied:

Mar 16 18:53:35 juju-533a8f-3-lxd-10 systemd[1]: rabbitmq-server.service: Failed to reset devices.list: Operation not permitted
Mar 16 18:53:35 juju-533a8f-3-lxd-10 systemd[1]: Starting RabbitMQ Messaging Server...
Mar 16 18:53:35 juju-533a8f-3-lxd-10 rabbitmq[32301]: /usr/sbin/rabbitmq-server: 33: /usr/sbin/rabbitmq-server: cannot create /var/log/rabbitmq/startup_log: Permission denied
Mar 16 18:53:35 juju-533a8f-3-lxd-10 systemd[1]: rabbitmq-server.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 16 18:53:35 juju-533a8f-3-lxd-10 systemd[1]: rabbitmq-server.service: Failed with result 'exit-code'.
Mar 16 18:53:35 juju-533a8f-3-lxd-10 systemd[1]: Failed to start RabbitMQ Messaging Server.

Revision history for this message
Billy Olsen (billy-olsen) wrote :

revisiting this, the install fails b/c the package attempts to start the rabbitmq-server which gets access denied for creating the necessary logs for startup.

In my recreate attempt, there was a reboot pending - which upon rebooting the server everything worked as expected. I'm validating some update cloudinit-userdata yaml that might help

Revision history for this message
Billy Olsen (billy-olsen) wrote :

I've added the following power_state config to the cloudinit-userdata snippet which appears to resolve the issues:

  power_state:
    delay: "+0"
    mode: reboot
    timeout: 5
    condition: True

Revision history for this message
Joshua Genet (genet022) wrote :

That did the trick! All my rabbitmq-server units are active idle now.

I'm having other deployment issues, but those are almost certainly unrelated to this.

Thanks Billy!

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.