unable to restart some services on ubuntu 16.04 containers

Bug #1691536 reported by Vedamurthy Joshi
This bug affects 1 person
Affects            Status                   Importance  Assigned to                    Milestone
Juniper Openstack  Status tracked in Trunk
  R4.0             Fix Committed            High        Ignatious Johnson Christopher
  Trunk            Fix Committed            High        Ignatious Johnson Christopher

Bug Description

R4.0 Build 6 Ubuntu 16.04.2 container setup

Restarting contrail-vrouter-agent through systemd does not work; the command hangs.
Example nodes: 10.204.217.197 and 10.204.217.198

root@testbed-1-vm2(agent):/# service contrail-vrouter-agent restart

^C
root@testbed-1-vm2(agent):/# service contrail-vrouter-agent restart
^C
root@testbed-1-vm2(agent):/#

root@testbed-1-vm2(agent):/# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 33.9 0.0 37104 3628 ? Ss 06:49 248:17 /lib/systemd/systemd systemd.unit=multi-user.target
root 21 0.0 0.4 92444 35296 ? Ss 06:49 0:21 /lib/systemd/systemd-journald
root 688 6.6 2.6 1018444 211648 ? Ssl 06:49 48:58 /usr/bin/contrail-vrouter-agent
contrail 718 0.0 0.7 358596 60552 ? Ssl 06:49 0:00 /usr/bin/python /usr/bin/contrail-nodemgr --nodetype=contrail-vrouter
root 1051 0.0 0.0 18240 2016 ? Ss 18:58 0:00 bash
root 1072 0.0 0.0 18240 2012 ? Ss+ 19:00 0:00 bash
root 1102 0.0 0.0 34416 1468 ? R+ 19:01 0:00 ps aux
root@testbed-1-vm2(agent):/#

root@testbed-1-vm2(agent):/# service contrail-vrouter-agent status
● contrail-vrouter-agent.service - Contrail vrouter agent service
   Loaded: loaded (/lib/systemd/system/contrail-vrouter-agent.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2017-05-17 06:49:41 UTC; 12h ago
 Main PID: 688 (contrail-vroute)
   CGroup: /system.slice/docker-a5b44795d9f78964a5ab28998a143d6f7a017804dcec9d1b14260c6d4957d33c.scope/system.slice/contrail-vrouter-agent.service
           └─688 /usr/bin/contrail-vrouter-agent

May 17 06:49:41 testbed-1-vm2 systemd[1]: Started Contrail vrouter agent service.
May 17 06:49:41 testbed-1-vm2 contrail-vrouter-agent[688]: Config file </etc/contrail/contrail-vrouter-agent.conf> parsing completed.
May 17 06:49:41 testbed-1-vm2 contrail-vrouter-agent[688]: vmware_mode is
May 17 06:49:41 testbed-1-vm2 contrail-vrouter-agent[688]: Error in parsing arguement for HYPERVISOR.vmware_mode <
May 17 06:49:42 testbed-1-vm2 contrail-vrouter-agent[688]: log4cplus:ERROR No appenders could be found for logger (SANDESH).
May 17 06:49:42 testbed-1-vm2 contrail-vrouter-agent[688]: log4cplus:ERROR Please initialize the log4cplus system properly.
May 17 06:49:42 testbed-1-vm2 contrail-vrouter-agent[688]: log4cplus:WARN RollingFileAppender: MaxFileSize property value is too small. Resetting to 204800.
root@testbed-1-vm2(agent):/#

Jeba Paulaiyan (jebap)
tags: added: blocker
Revision history for this message
Hari Prasad Killi (haripk) wrote :

Tried this on 4.0.0.0-7 and service stop / start / restart are working fine. Please recheck.

Vedamurthy Joshi (vedujoshi) wrote :

I am seeing the same on another setup.

Today, on build 8, in a controller Docker container, I am seeing a similar issue with the contrail-control and rabbitmq-server services.

root@testbed-1-vm1(controller):/# ps aux |grep rabbit
root 932 0.2 0.0 23248 1192 ? S 11:41 0:00 /bin/systemctl start rabbitmq-server <<< stuck here
root 935 0.0 0.0 11276 720 ? S+ 11:42 0:00 grep --color=auto rabbit
root@testbed-1-vm1(controller):/#

Node 10.204.217.194 is in the same state.

This happened after initial provisioning had failed due to bug 1692119 (cassandra service issue) and I had followed the 16.04 workaround steps in that bug.

Stopping and starting the container did not help either.

dmesg showed only this:
[58388.471550] systemd-journald[21]: File /var/log/journal/c11c805d86e60e07bfe721debca6d245/system.journal corrupted or uncleanly shut down, renaming and replacing.
[58388.529248] systemd[1]: Started Journal Service.
root@testbed-1-vm1(controller):/#
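
When a service restart or `systemctl start` call hangs as in the transcripts above, wrapping it in a time limit at least returns the prompt without Ctrl-C. This is a minimal sketch, assuming GNU coreutils `timeout` is available in the container; `run_with_limit` is a hypothetical helper name, not part of Contrail or systemd:

```shell
# Hypothetical helper: run a command with a time limit so that a hung
# service call gives the prompt back instead of needing Ctrl-C.
# Assumes GNU coreutils `timeout` is installed.
run_with_limit() {
    limit="$1"
    shift
    rc=0
    timeout "$limit" "$@" || rc=$?
    # `timeout` exits with status 124 when the limit was reached.
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}
```

For example, `run_with_limit 30 systemctl restart contrail-vrouter-agent` would give up after 30 seconds. If it times out, `systemctl list-jobs` may show which jobs are still queued.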

summary: - unable to restart contrail-vrouter-agent on ubuntu 16.04 containers
+ unable to restart some services on ubuntu 16.04 containers
Revision history for this message
Ignatious Johnson Christopher (ijohnson-x) wrote :

In the failed containers, "/etc/selinux/" was not accessible, so dbus.service did not come up.
This causes an issue similar to https://github.com/systemd/systemd/issues/719

Failed container:
------------------------
root@testbed-1-vm1(controller):/# systemctl status dbus
● dbus.service - D-Bus System Message Bus
   Loaded: loaded (/lib/systemd/system/dbus.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2017-05-25 05:33:26 UTC; 9min ago
     Docs: man:dbus-daemon(1)
  Process: 763 ExecStop=/bin/true (code=exited, status=0/SUCCESS)
  Process: 146 ExecStart=/usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation (code=exited, status=1/FAILURE)
 Main PID: 146 (code=exited, status=1/FAILURE)

May 25 05:33:01 testbed-1-vm1 systemd[1]: Started D-Bus System Message Bus.
May 25 05:33:01 testbed-1-vm1 dbus-daemon[146]: Failed to start message bus: Failed to open "/etc/selinux/targeted/contexts/dbus_contexts": No such file or directory
May 25 05:33:26 testbed-1-vm1 systemd[1]: dbus.service: Main process exited, code=exited, status=1/FAILURE
May 25 05:33:26 testbed-1-vm1 systemd[1]: dbus.service: Unit entered failed state.
May 25 05:33:26 testbed-1-vm1 systemd[1]: dbus.service: Failed with result 'exit-code'.
root@testbed-1-vm1(controller):/# ls /etc/selinux/targeted/contexts/dbus_contexts
ls: cannot access '/etc/selinux/targeted/contexts/dbus_contexts': No such file or directory
root@testbed-1-vm1(controller):/# ls /etc/selinux/t^C
root@testbed-1-vm1(controller):/# exit
exit
[root@testbed-1-vm1 ~]# ls /etc/selinux/targeted/
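
The dbus-daemon log above names one specific file, so a quick check inside a container can tell whether it is in the same state. This is a minimal sketch; `check_dbus_contexts` is a hypothetical helper name, and the default path is the one from the error message:

```shell
# Hypothetical helper: report whether the SELinux dbus contexts file
# named in the dbus-daemon error is readable from inside the container.
check_dbus_contexts() {
    ctx="${1:-/etc/selinux/targeted/contexts/dbus_contexts}"
    if [ -r "$ctx" ]; then
        echo "ok: $ctx is readable"
        return 0
    fi
    echo "missing: $ctx" >&2
    return 1
}
```

Running `check_dbus_contexts` with no argument checks the default path; a non-zero exit status matches the failed-container case shown above.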

So I added /etc/selinux as a volume to the container. The following is the diff in your external contrail-ansible code.

[root@ansible-runner playbooks]# diff roles/node/defaults/main.yml.old roles/node/defaults/main.yml
85a86
> - "/etc/selinux:/etc/selinux"
[root@ansible-runner playbooks]#
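
To confirm that a checkout of the playbooks already carries this change, the defaults file can be searched for the new volume entry. This is a minimal sketch; `has_selinux_volume` is a hypothetical helper name, and the default file path is the one used in the diff command above:

```shell
# Hypothetical helper: succeed if the defaults file already lists the
# /etc/selinux volume mapping added by the diff.
has_selinux_volume() {
    defaults="${1:-roles/node/defaults/main.yml}"
    # -F: match a fixed string, -q: quiet; exit status 0 means present.
    grep -Fq '"/etc/selinux:/etc/selinux"' "$defaults"
}
```

Run from the playbooks directory, `has_selinux_volume && echo present` prints `present` only when the entry is there.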

Provisioned container:
------------------------------
root@testbed-1-vm1(controller):/# service dbus status
● dbus.service - D-Bus System Message Bus
   Loaded: loaded (/lib/systemd/system/dbus.service; static; vendor preset: enabled)
   Active: active (running) since Thu 2017-05-25 08:15:22 UTC; 19min ago
     Docs: man:dbus-daemon(1)
 Main PID: 4097 (dbus-daemon)
   CGroup: /system.slice/docker-777eb98bf92e7209141c42ee6c466e2bb459f605f0466523fcc2ac58e68c2428.scope/system.slice/dbus.service
           └─4097 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

May 25 08:15:22 testbed-1-vm1 systemd[1]: Started D-Bus System Message Bus.
May 25 08:15:23 testbed-1-vm1 dbus[4097]: [system] Successfully activated service 'org.freedesktop.systemd1'
root@testbed-1-vm1(controller):/#
root@testbed-1-vm1(controller):/# ls /etc/selinux/targeted/contexts/dbus_contexts
/etc/selinux/targeted/contexts/dbus_contexts
root@testbed-1-vm1(controller):/#

With the above diff, I re-provisioned the cluster after removing the controller container.
The container now came up fine; I tried multiple times and did not hit the issue.

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/32147
Submitter: Ignatious Johnson Christopher (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/32148
Submitter: Ignatious Johnson Christopher (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/32148
Committed: http://github.com/Juniper/contrail-ansible/commit/0e15c64f3c6283ea1ad21fd6cd84aaa3f1fe937d
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit 0e15c64f3c6283ea1ad21fd6cd84aaa3f1fe937d
Author: Ignatious Johnson Christopher <email address hidden>
Date: Thu May 25 02:54:44 2017 -0700

"/etc/selinux/“ was not accessible and so the

dbus service didn’t come up. Thus causing an issue similar to this
https://github.com/systemd/systemd/issues/719

adding /etc/selinux as volumes to the contianers.

Change-Id: I13f2fab2cc0d0cabe9ab5d1f7ce03c281be0e669
Closes-Bug: 1691536

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/32147
Committed: http://github.com/Juniper/contrail-ansible/commit/5bfe48e2399ff531677c99fb44f473215a04d94a
Submitter: Zuul (<email address hidden>)
Branch: master

commit 5bfe48e2399ff531677c99fb44f473215a04d94a
Author: Ignatious Johnson Christopher <email address hidden>
Date: Thu May 25 02:54:44 2017 -0700

"/etc/selinux/“ was not accessible and so the

dbus service didn’t come up. Thus causing an issue similar to this
https://github.com/systemd/systemd/issues/719

adding /etc/selinux as volumes to the contianers.

Change-Id: I7ff6666e5cd9595deb950e6c6963489ce151e601
Closes-Bug: 1691536
