[2.21 86] Config processes didn't come up after rebooting the node

Bug #1488783 reported by Pavana
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Juniper Openstack | New | Undecided | Unassigned |
R2.20 | New | High | Unassigned |

Bug Description

Build : 2.21-86
cores : {}
CoreLocation : /cs-shared/test_runs/nodeb8/2015_08_26_03_29_50
LogsLocation : http://10.204.216.50/Docs/logs/2.21-86_2015_08_26_03_29_50/logs/
Report : http://10.204.216.50/Docs/logs/2.21-86_2015_08_26_03_29_50/junit-noframes.html
Topology :
Config Nodes : [u'nodeb8']
Control Nodes : [u'nodeb8']
Compute Nodes : [u'nodeb8']
Openstack Node : nodeb8
WebUI Node : nodeb8
Analytics Nodes : [u'nodeb8']

Required logs copied to : /cs-shared/test_runs/nodeb8/2015_08_26_03_29_50

Seen particularly on CentOS 6.5. Sundar is aware of the issue.
During provisioning, when supervisor/API/IFMAP are restarted, the rest of the processes are triggered for restart but they do not come up.

[root@nodeb8 contrail-test]# contrail-status
== Contrail vRouter ==
supervisor-vrouter: active
contrail-vrouter-agent initializing (Collector, Discovery:Collector, Discovery:dns-server, Discovery:xmpp-server connection down)
contrail-vrouter-nodemgr active

== Contrail Control ==
supervisor-control: active
contrail-control initializing (Number of connections:4, Expected:5)
contrail-control-nodemgr active
contrail-dns active
contrail-named active

== Contrail Analytics ==
supervisor-analytics: active
contrail-analytics-api initializing (Discovery:OpServer, Discovery:Collector connection down)
contrail-analytics-nodemgr active
contrail-collector initializing (Discovery:Collector connection down)
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active

== Contrail Config ==
supervisor-config: inactive
unix:///tmp/supervisord_config.sockno

== Contrail Web UI ==
supervisor-webui: active
contrail-webui active
contrail-webui-middleware active

== Contrail Database ==
supervisor-database: active
contrail-database active
contrail-database-nodemgr active

== Contrail Support Services ==
supervisor-support-service: active
rabbitmq-server active
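
A status dump like the one above can be scanned for anything that is not `active`. The following is a minimal sketch, assuming the text format shown in this report (`== Section ==` headers, then `name: state` or `name state (detail)` lines). The function name `parse_contrail_status` and the `SAMPLE` excerpt are illustrative only and are not part of any Contrail tool.

```python
import re

def parse_contrail_status(text):
    """Return {section: [(service, state), ...]} for entries that are not 'active'."""
    problems = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"^==\s*(.+?)\s*==$", line)
        if header:
            section = header.group(1)
            continue
        if not line or section is None:
            continue
        # Service lines look like "name: state" or "name state (detail)".
        # Lines that do not fit (e.g. the socket path) are skipped.
        m = re.match(r"^([\w-]+):?\s+(\w+)", line)
        if m and m.group(2) != "active":
            problems.setdefault(section, []).append((m.group(1), m.group(2)))
    return problems

# Excerpt copied from the contrail-status output in this report.
SAMPLE = """\
== Contrail Control ==
supervisor-control: active
contrail-control initializing (Number of connections:4, Expected:5)
contrail-control-nodemgr active

== Contrail Config ==
supervisor-config: inactive
unix:///tmp/supervisord_config.sockno
"""

if __name__ == "__main__":
    for section, entries in parse_contrail_status(SAMPLE).items():
        for service, state in entries:
            print(f"{section}: {service} is {state}")
```

On the full output above, this would flag `supervisor-config` as inactive, along with the services stuck in `initializing` because they depend on the config node.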

[root@nodeb8 contrail]# openstack-status
== Nova services ==
openstack-nova-api: active
openstack-nova-cert: dead
openstack-nova-compute: active
openstack-nova-network: inactive (disabled on boot)
openstack-nova-scheduler: active
openstack-nova-volume: dead (disabled on boot)
openstack-nova-conductor: active
== Glance services ==
openstack-glance-api: active
openstack-glance-registry: active
== Keystone service ==
openstack-keystone: active
== Horizon service ==
openstack-dashboard: active
== Cinder services ==
openstack-cinder-api: active
openstack-cinder-scheduler: active
openstack-cinder-volume: inactive (disabled on boot)
== Support services ==
mysqld: active
httpd: active
libvirtd: active
rabbitmq-server: active
memcached: active
== Keystone users ==
Warning keystonerc not sourced

contrail-config-nodemgr-stderr.log :

08/25/2015 03:09:44 PM [nodeb8:contrail-config-nodemgr:Config:0]: Discarding event[EvSandeshUVESend] in state[Disconnect]
wokeup and found a line
process:0,groupname:contrail-api,eventname:PROCESS_STATE_EXITED
contrail-api:0 with pid:4702 exited abnormally

supervisord-config.log :

2015-08-25 15:09:17,521 INFO success: contrail-device-manager entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-08-25 15:09:17,521 INFO success: contrail-schema entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-08-25 15:09:17,521 INFO success: contrail-svc-monitor entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-08-25 15:09:25,483 INFO exited: 0 (exit status 1; not expected)
2015-08-25 15:09:25,801 INFO spawned: '0' with pid 6658
2015-08-25 15:09:26,800 INFO success: 0 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-08-25 15:09:47,081 INFO stopped: ifmap (terminated by SIGTERM)
2015-08-25 15:09:47,083 INFO stopped: contrail-config-nodemgr (terminated by SIGTERM)
2015-08-25 15:09:47,383 INFO stopped: 0 (terminated by SIGKILL)
2015-08-25 15:09:47,664 INFO stopped: 0 (terminated by SIGKILL)
2015-08-25 15:09:48,669 INFO stopped: contrail-device-manager (terminated by SIGKILL)
2015-08-25 15:09:49,674 INFO stopped: contrail-schema (terminated by SIGKILL)
2015-08-25 15:09:50,680 INFO stopped: contrail-svc-monitor (terminated by SIGKILL)
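
To confirm from a log like the one above that the processes were stopped and never spawned again, the last event per process can be extracted. This is a rough sketch, assuming the supervisord line format seen in this excerpt (`<timestamp> INFO <event>: <name> ...`). The helper names `last_events` and `never_respawned` are hypothetical.

```python
import re

# Matches supervisord lines such as:
#   2015-08-25 15:09:47,081 INFO stopped: ifmap (terminated by SIGTERM)
#   2015-08-25 15:09:25,801 INFO spawned: '0' with pid 6658
EVENT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"(?P<event>spawned|success|exited|stopped): '?(?P<name>[^\s']+)'?"
)

def last_events(log_text):
    """Map each process name to its most recent (event, timestamp)."""
    last = {}
    for line in log_text.splitlines():
        m = EVENT.match(line.strip())
        if m:
            last[m.group("name")] = (m.group("event"), m.group("ts"))
    return last

def never_respawned(log_text):
    """Names whose final logged event is 'stopped' or 'exited'."""
    return sorted(
        name
        for name, (event, _ts) in last_events(log_text).items()
        if event in ("stopped", "exited")
    )

# Lines copied from the supervisord-config.log excerpt in this report.
SAMPLE = """\
2015-08-25 15:09:17,521 INFO success: contrail-schema entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-08-25 15:09:25,483 INFO exited: 0 (exit status 1; not expected)
2015-08-25 15:09:25,801 INFO spawned: '0' with pid 6658
2015-08-25 15:09:47,081 INFO stopped: ifmap (terminated by SIGTERM)
2015-08-25 15:09:47,383 INFO stopped: 0 (terminated by SIGKILL)
2015-08-25 15:09:49,674 INFO stopped: contrail-schema (terminated by SIGKILL)
"""

if __name__ == "__main__":
    print(never_respawned(SAMPLE))
```

If every managed process shows up in this list and the log ends there, that is consistent with the supervisor having been stopped without a matching start.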

Raj Reddy (rajreddy) wrote : Re: [Bug 1488783] [2.21 86] Config processes didn't come up after rebooting the node

Pavana, can you check what the syslog says about the supervisor-config service?
It seems someone is issuing 'service supervisor-config stop'.

thanks,

> On Aug 25, 2015, at 11:59 PM, Pavana <email address hidden> wrote:
>
> ** Changed in: juniperopenstack/r2.20
> Assignee: (unassigned) => Raj Reddy (rajreddy)

Raj Reddy (rajreddy)
tags: added: config
Changed in juniperopenstack:
assignee: Raj Reddy (rajreddy) → nobody
Pavana (pavanap) wrote :

Not happening consistently so removing the 'blocker' tag

tags: removed: blocker
Raj Reddy (rajreddy) wrote :

The issue is that contrail-api exits, which in turn causes nodemgr to issue 'supervisor-config restart'. That restart is probably unreliable, because the nodemgr process itself gets killed while it is in the middle of performing the restart.

Easy to reproduce with 'kill -9 <contrail-api pid>'.

This seems to be an issue only on CentOS; it may be related to how service restarts are done.

I believe at some point there was a discussion about removing 'supervisor-config restart' on contrail-api exit. The config team should take a look and see how best to fix this.
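
The suspected failure mode in the comment above can be illustrated with a toy model. This is a simulation only, not Contrail or supervisord code. It assumes that a restart is a stop followed by a start, and that stopping the supervisor terminates all of its children, including the one that asked for the restart. The names `ToySupervisor`, `Killed` and `restart` are hypothetical.

```python
class Killed(Exception):
    """Raised when the simulated caller is terminated mid-operation."""


class ToySupervisor:
    def __init__(self, children):
        self.children = set(children)
        self.alive = set(children)
        self.running = True

    def stop(self, caller):
        # Stopping the supervisor terminates every child it manages.
        self.alive.clear()
        self.running = False
        if caller in self.children:
            # The caller was one of those children, so it dies here too.
            raise Killed(caller)

    def start(self):
        self.running = True
        self.alive = set(self.children)


def restart(supervisor, caller):
    """Model a restart as stop followed by start, both run by `caller`."""
    try:
        supervisor.stop(caller)
        supervisor.start()  # never reached if stop() killed the caller
    except Killed:
        pass
    return supervisor.running


if __name__ == "__main__":
    children = {"contrail-api", "contrail-config-nodemgr", "ifmap"}
    # Restart requested by a managed child: the supervisor stays down.
    print(restart(ToySupervisor(children), "contrail-config-nodemgr"))
    # Restart requested from outside (e.g. an operator shell): it comes back.
    print(restart(ToySupervisor(children), "operator-shell"))
```

Under these assumptions, a restart requested by a managed child leaves the supervisor stopped, which matches the 'supervisor-config: inactive' state in the report. Whether this is what actually happens on CentOS would still need to be confirmed from syslog.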

Ganesha HV (ganeshahv)
tags: added: releasenote
information type: Proprietary → Public