Bug #1801474 “Contrail Ansible Deployer: schema remains in initi...” : Bugs : Juniper Openstack

Prachi Yadav (pryadav) on 2018-11-03

tags:

added: contrail-networking sanity

Jeba Paulaiyan (jebap) on 2018-11-05

tags:

added: blocker

Nagendra Prasath (npchandran) wrote on 2018-11-05:

#1

Restart contrail-schema to proceed. This shouldn't be a blocker.

Prachi Yadav (pryadav) wrote on 2018-11-05:

#2

Build: 5.0-333
sku: queens

The provisioning looks good on the latest build.

[root@controller-0-de42d58b7729422095b955676d449053 logs]# contrail-status
Pod              Service         Original Name                          State    Status       
                 redis           contrail-external-redis                running  Up 13 hours  
analytics        alarm-gen       contrail-analytics-alarm-gen           running  Up 13 hours  
analytics        api             contrail-analytics-api                 running  Up 13 hours  
analytics        collector       contrail-analytics-collector           running  Up 13 hours  
analytics        nodemgr         contrail-nodemgr                       running  Up 13 hours  
analytics        query-engine    contrail-analytics-query-engine        running  Up 13 hours  
analytics        snmp-collector  contrail-analytics-snmp-collector      running  Up 13 hours  
analytics        topology        contrail-analytics-topology            running  Up 13 hours  
config           api             contrail-controller-config-api         running  Up 13 hours  
config           device-manager  contrail-controller-config-devicemgr   running  Up 13 hours  
config           nodemgr         contrail-nodemgr                       running  Up 13 hours  
config           schema          contrail-controller-config-schema      running  Up 13 hours  
config           svc-monitor     contrail-controller-config-svcmonitor  running  Up 13 hours  
config-database  cassandra       contrail-external-cassandra            running  Up 13 hours  
config-database  nodemgr         contrail-nodemgr                       running  Up 13 hours  
config-database  rabbitmq        contrail-external-rabbitmq             running  Up 13 hours  
config-database  zookeeper       contrail-external-zookeeper            running  Up 13 hours  
control          control         contrail-controller-control-control    running  Up 13 hours  
control          dns             contrail-controller-control-dns        running  Up 13 hours  
control          named           contrail-controller-control-named      running  Up 13 hours  
control          nodemgr         contrail-nodemgr                       running  Up 13 hours  
database         cassandra       contrail-external-cassandra            running  Up 13 hours  
database         kafka           contrail-external-kafka                running  Up 13 hours  
database         nodemgr         contrail-nodemgr                       running  Up 13 hours  
database         zookeeper       contrail-external-zookeeper            running  Up 13 hours  
webui            job             contrail-controller-webui-job          running  Up 13 hours  
webui            web             contrail-controller-webui-web          running  Up 13 hours

WARNING: container with original name 'contrail-external-redis' have Pod or Service empty. Pod: '' / Service: 'redis'. Please pass NODE_TYPE with pod name to container's env

== Contrail control ==
control: active
nodemgr: active
named: active
dns: active

== Contrail config-database ==
nodemgr: active
zookeeper: active
rabbitmq: active
cassandra: active

== Contrail database ==
kafka: active
nodemgr: active
zookeeper: active
cassandra: active

== Contrail analytics ==
snmp-collector: active
query-engine: active
api: active
alarm-gen: active
nodemgr: active
collector: active
topology: active

== Contrail webui ==
web: active
job: active

== Contrail config ==
svc-monitor: backup
nodemgr: active
device-manager: backup
api: active
schema: backup

Jeba Paulaiyan (jebap) on 2018-11-06

tags:	removed: blocker
tags:	added: releasenote

manishkn (manishkn) wrote on 2018-11-06: Hi, which is the latest good build to use for fabric?

#3

Thanks
Manish Krishnan

OpenContrail Admin (ci-admin-f) wrote on 2018-11-06: [Review update] master

#4

Review in progress for https://review.opencontrail.org/47491
Submitter: Shivayogi Ugaji (<email address hidden>)

alok kumar (kalok) wrote on 2018-11-08:

#6

This is also seen on an RHOSP13 setup with 5.0.2 build 341; restarting schema fixed the issue.

Jeba Paulaiyan (jebap) wrote on 2018-11-20:

#7

Notes:

After provisioning, if contrail-status shows schema in the initializing state with a "connection to Cassandra down" error, restarting the schema service recovers from the error.
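
The check described in the note above can be scripted. The following is a minimal sketch, assuming the per-section summary format printed by contrail-status earlier in this thread (`== Contrail config ==` followed by `service: state` lines). The function name `services_not_ready` and the sample state text are hypothetical; the exact wording of the initializing state is not shown in this bug.

```python
import re


def services_not_ready(status_text, ok_states=("active", "backup")):
    """Return {(section, service): state} for services outside ok_states.

    Only the "== Contrail <section> ==" summary blocks are parsed; the
    container table and the WARNING line before the first header are skipped.
    """
    section = None
    pending = {}
    for raw in status_text.splitlines():
        line = raw.strip()
        header = re.match(r"^== Contrail (.+) ==$", line)
        if header:
            section = header.group(1)
            continue
        if section and ":" in line:
            service, state = line.split(":", 1)
            state = state.strip()
            # Compare only the first word, so extra detail after it is tolerated.
            if state.partition(" ")[0] not in ok_states:
                pending[(section, service.strip())] = state
    return pending


# Hypothetical sample: the state text for schema is an assumption.
sample = """== Contrail config ==
svc-monitor: backup
nodemgr: active
api: active
schema: initializing (hypothetical detail text)
"""
print(services_not_ready(sample))
```

If `("config", "schema")` appears in the result, restarting the schema service is the workaround given in this bug. How the service is restarted depends on the deployment and is not covered by this sketch.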

Jeba Paulaiyan (jebap) on 2018-11-20

information type:

Proprietary → Public

Sudheendra Rao (sudheendra-k) on 2018-12-10

tags:

added: sanityblocker
removed: sanity

OpenContrail Admin (ci-admin-f) wrote on 2019-01-25: A change has been merged

#8

Reviewed: https://review.opencontrail.org/47491
Committed: http://github.com/Juniper/contrail-controller/commit/6a29730679db2375c0eee7006de2665b53cf979d
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit 6a29730679db2375c0eee7006de2665b53cf979d
Author: Shivayogi Ugaji <email address hidden>
Date: Mon Nov 5 22:07:18 2018 -0800

The db_resync_done lock tells the amqp thread to wait for resync to
complete. When SchemaTransformer.destroy_instance() is called because of a
Cassandra connection failure, this lock remains held and blocks
destroy_instance: destroy_instance calls _vnc_subscribe_callback to drain the
amqp queue, and that callback waits indefinitely for db_resync_done to be
released. This fix releases the db_resync_done lock so that destroy_instance
doesn't get blocked.

Change-Id: Ic81fd0acda0fd4d43b3bfd061b47527b150b9096
Closes-Bug: #1801474
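
The locking problem described in the commit message above can be illustrated with a toy model. This is a sketch under stated assumptions, not the contrail-controller code: the class `SchemaTransformerSketch`, the `release_lock_first` flag, and the use of `threading.Lock` with a timeout are all stand-ins chosen so the example terminates instead of hanging. Only the names `db_resync_done`, `destroy_instance` and `_vnc_subscribe_callback` come from the commit message.

```python
import threading


class SchemaTransformerSketch:
    """Toy model of the lock interaction; not the real SchemaTransformer."""

    def __init__(self):
        # Held while resync is in progress. Here resync never completes,
        # which models the Cassandra connection failure.
        self.db_resync_done = threading.Lock()
        self.db_resync_done.acquire()
        self.queue = ["msg-1", "msg-2"]
        self.drained = []

    def _vnc_subscribe_callback(self, msg, timeout=0.1):
        # The commit message says the real callback waits indefinitely;
        # the timeout is an assumption that keeps this sketch from hanging.
        if not self.db_resync_done.acquire(timeout=timeout):
            return False
        try:
            self.drained.append(msg)
        finally:
            self.db_resync_done.release()
        return True

    def destroy_instance(self, release_lock_first):
        # release_lock_first=True models the fix: free the lock before draining.
        if release_lock_first and self.db_resync_done.locked():
            self.db_resync_done.release()
        while self.queue:
            if not self._vnc_subscribe_callback(self.queue[0]):
                return False  # without the timeout this would block forever
            self.queue.pop(0)
        return True


before_fix = SchemaTransformerSketch()
after_fix = SchemaTransformerSketch()
print(before_fix.destroy_instance(release_lock_first=False))
print(after_fix.destroy_instance(release_lock_first=True))
```

With the lock still held, the drain loop cannot make progress; releasing it first lets every queued message be processed and the teardown complete.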

OpenContrail Admin (ci-admin-f) wrote on 2019-01-25: [Review update] R5.0

#9

Review in progress for https://review.opencontrail.org/48899
Submitter: Nagendra Prasath (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote on 2019-01-31: A change has been merged

#11

Reviewed: https://review.opencontrail.org/48899
Committed: http://github.com/Juniper/contrail-controller/commit/f4fdbf871d088907475c88069d46a3e23aadbf3d
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit f4fdbf871d088907475c88069d46a3e23aadbf3d
Author: Shivayogi Ugaji <email address hidden>
Date: Mon Nov 5 22:07:18 2018 -0800

The db_resync_done lock tells the amqp thread to wait for resync to
complete. When SchemaTransformer.destroy_instance() is called because of a
Cassandra connection failure, this lock remains held and blocks
destroy_instance: destroy_instance calls _vnc_subscribe_callback to drain the
amqp queue, and that callback waits indefinitely for db_resync_done to be
released. This fix releases the db_resync_done lock so that destroy_instance
doesn't get blocked.

Change-Id: Ic81fd0acda0fd4d43b3bfd061b47527b150b9096
Closes-Bug: #1801474
(cherry picked from commit 6a29730679db2375c0eee7006de2665b53cf979d)

OpenContrail Admin (ci-admin-f) wrote on 2019-02-14: [Review update] R6.0-WIP

#13

Review in progress for https://review.opencontrail.org/49393
Submitter: Mateusz Neumann (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote on 2019-02-20: A change has been merged

#15

Reviewed: https://review.opencontrail.org/49393
Committed: http://github.com/Juniper/contrail-controller/commit/77df3b58265b3fab414dfbc00e1ff39d19f0a99c
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R6.0-WIP

commit 77df3b58265b3fab414dfbc00e1ff39d19f0a99c
Author: Shivayogi Ugaji <email address hidden>
Date: Mon Nov 5 22:07:18 2018 -0800

Apply commits from master onto R6.0-WIP

The db_resync_done lock tells the amqp thread to wait for resync to
complete. When SchemaTransformer.destroy_instance() is called because of a
Cassandra connection failure, this lock remains held and blocks
destroy_instance: destroy_instance calls _vnc_subscribe_callback to drain the
amqp queue, and that callback waits indefinitely for db_resync_done to be
released. This fix releases the db_resync_done lock so that destroy_instance
doesn't get blocked.
Closes-Bug: #1801474

[DM] Hitless image upgrade implementation
Closes-Bug: #1799322

Provisioner for the devicemanager node.
usage:
from /opt/contrail/utils
python provision_devicemgr_node.py --host_name aio --host_ip 10.87.82.2
--oper add --admin_user admin --admin_password contrail123 --admin_tenant_name
admin --openstack_ip 10.87.82.2 --api_server_ip 10.87.82.2
Closes-Bug: #1805303

CFM: Changes for onboarding L3PNF
- Add new platform SRX240
- Add L3PNF subnet in schema
- Add new namespace, VN and IPAM for L3PNF during brownfield onboarding
Closes-Bug: 1800701

Add entrypoint to vrouter-agent service on Windows
Introduce entrypoint for agent similar in design to that from
microservice deployment. For now it will only start agent,
actual features will be added in following changes.
Partial-Bug: #1806677

Check build dependencies for tbb, SimpleAmqpClient and rabbitmq
Closes-Bug: #1806719

Make agent's entrypoint update agent's config on Windows
In future we will generate the whole config from scratch
as on Linux, but for now we only update the vhost's ifname.
It's the only field that can change upon restart.
Partial-Bug: #1806677

bgp-peer selection support for bgpaas
1. Listener BgpRouterConfig is added for BgpRouter and ControlNodeZone
2. BgpRouterConfig builds BgpRouterTree and ControlNodeZoneTree
from IFMapNode
3. BGPaaS gets BgpRouter for configured ControlNodeZone from
BgpRouterConfig and Updates bgp-peer-ip and bgp-peer-port in
the flow.
4. Step 3 is followed for xmpp based peer-selection also.
5. BGPaaS sandesh is updated with primary_control_node_zone,
secondary_control_node_zone, bgp_peer_ip and bgp_peer_port
Partial-bug: #1775872

[DM] Inside-outside workflow - lag/mH
1. Change the existing business logic to adhere to the new data model for lag/mH workflow
2. Multi-vlan support
Partial-Bug: #1799329

Rework nodemgr before fixing ntp issue
- move windows/linux code to separate classes instead of same condition through the code
- simplify main.py
- remove copy duplication
Closes-Bug: 1800704

[fabric] Added playbook retry support to job manager
1) When playbook return retry_devices in the output, job manager will retry the playbooks against those devices
2) remove obsolete playbooks from 5.0
3) remove obsolete ansible roles from 5.0
4) added a warning log on missing loopback interface when ...

Reviewed:  https://review.opencontrail.org/49393
Committed: http://github.com/Juniper/contrail-controller/commit/77df3b58265b3fab414dfbc00e1ff39d19f0a99c
Submitter: Zuul v3 CI (zuulv3@zuul.opencontrail.org)
Branch:    R6.0-WIP

commit 77df3b58265b3fab414dfbc00e1ff39d19f0a99c
Author: Shivayogi Ugaji <yogi@juniper.net>
Date:   Mon Nov 5 22:07:18 2018 -0800