Juniper Openstack

containers: WebUI shows two servers as down

Bug #1694048 reported by Andrey Pavlov on 2017-05-27

This bug affects 1 person

	Status	Importance	Assigned to	Milestone
Juniper Openstack	Status tracked in Trunk
R4.0	Won't Fix	Low	Unassigned
Trunk	Invalid	Low	Unassigned	Juniper Openstack r5.0.0

Bug Description

build 14, mitaka + trusty, deployed by Juju

analytics and analyticsdb have an alarm that says 'ContrailConfig is missing or incorrect'.

A screenshot is attached.

Andrey Pavlov (apavlov-e) wrote on 2017-05-27:

webui.png (106.0 KiB, image/png)

Jeba Paulaiyan (jebap) on 2017-05-29

tags:

added: provisioning

Andrey Pavlov (apavlov-e) wrote on 2017-05-31:

There are no warnings or errors when the deployment has 3 controller/analytics/analyticsdb nodes.

Andrey Pavlov (apavlov-e) wrote on 2017-09-06:

Comment from Gokul:

1. Analytics and analyticsdb: the nodes are not registered with contrail and the status shows “Missing Contrail Config”. I tried registering them manually and they are properly displayed on the UI now:

curl -u admin:password http://localhost:8082/analytics-nodes

{"analytics-nodes": []}

curl -u admin:password http://localhost:8082/database-nodes

{"database-nodes": []}

python /usr/share/contrail-utils/provision_analytics_node.py --api_server_ip 172.31.15.186 --host_name ip-172-31-15-186 --host_ip 172.31.15.186 --oper add --admin_tenant_name admin --admin_password password --openstack_ip 18.220.180.11

python /usr/share/contrail-utils/provision_analytics_database.py --api_server_ip 172.31.15.186 --host_name ip-172-31-15-186 --host_ip 172.31.15.186 --oper add --admin_tenant_name admin --admin_password password --openstack_ip 18.220.180.11
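The two curl checks above can be condensed into one small helper. This is only a sketch: it assumes the config API returns bodies shaped like the ones shown (a single key holding a list), and the function name `empty_collections` is hypothetical, not part of any Contrail tool.

```python
import json


def empty_collections(responses):
    """Return the collection names whose list is empty.

    `responses` maps a collection name (e.g. "analytics-nodes") to the raw
    JSON body returned for that collection.
    """
    empty = []
    for name, body in responses.items():
        items = json.loads(body).get(name, [])
        if not items:
            empty.append(name)
    return empty


# Bodies copied from the curl output in this comment.
sample = {
    "analytics-nodes": '{"analytics-nodes": []}',
    "database-nodes": '{"database-nodes": []}',
}
print(empty_collections(sample))
```

An empty list for a collection means no node of that type is registered with the config API, which matches the state before the provision scripts were run.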

Santosh Gupta (sangupta) wrote on 2017-10-03:

Could you check whether it's provisioned correctly? It looks similar to bug 1716799.
Please check if ContrailConfig is present in the raw data; see comment#1 in bug 1716799.

Andrey Pavlov (apavlov-e) wrote on 2017-10-04:

Hi Santosh,

As posted in comment #3, analytics and analyticsdb are not provisioned during deployment. I think that this should be done by ansible-internal in all cases (not only for non-openstack deployments).

So I think that it's provisioned correctly and doesn't look similar to bug 1716799.

Santosh Gupta (sangupta) wrote on 2017-10-05:

Hi Andrey,
Which branch/image is this? I couldn't tell from build#14.
Also, when you try next, please try the latest build on that branch.
In comment#3, it's mentioned that the analytics and analyticsdb nodes are not provisioned.
Did the provisioning reach that step and bail out on an error? Please provide this info and the error message.

I need some more info to debug this.
combined json file
List of juju installation steps followed
Debug logs of the juju installation scripts
docker ps -a
docker logs controller
docker logs analytics
docker logs analyticsdb

Could you please leave the box in that state and send me the login details?
Thanks

Andrey Pavlov (apavlov-e) wrote on 2017-10-05:

Hi Santosh,

Tried this on latest release build 32 (R4.0) for ubuntu 14.04 - the same behaviour.

I mentioned that analytics/analyticsdb are not provisioned. I meant that the ansible-internal code doesn't even try to provision them:
https://github.com/Juniper/contrail-ansible-internal/blob/R4.0/playbooks/roles/contrail/analytics/tasks/provision.yml#L10
So there is no error because there is no attempt.

Do you still need the environment?

Santosh Gupta (sangupta) wrote on 2017-10-09:

https://github.com/Juniper/contrail-ansible-internal/blob/R4.0/playbooks/roles/contrail/analytics/tasks/main.yml#L28

Hi Andrey,
Since you have the openstack orchestrator, we don't need to run it. You can see in "docker logs" that it's evaluated and skipped. You don't need to run this command manually.

ubuntu@ip-172-31-8-29:/etc/contrailctl$ sudo docker logs contrail-analytics | grep -A5 -B5 "register analytics"
ok: [localhost]

TASK [contrail/analytics : Wait till config api server answers] ****************
skipping: [localhost]

TASK [contrail/analytics : register analytics to config api server (non-openstack)] ***
skipping: [localhost]

PLAY RECAP *********************************************************************
localhost : ok=29 changed=3 unreachable=0 failed=0

ubuntu@ip-172-31-8-29:/etc/contrailctl$ sudo docker logs contrail-analyticsdb | grep -A5 -B5 "register analyticsdb"
ok: [localhost]

TASK [contrail/analyticsdb : Wait till config api server answers] **************
skipping: [localhost]

TASK [contrail/analyticsdb : register analyticsdb to config api server (non-openstack)] ***
skipping: [localhost]

PLAY RECAP *********************************************************************
localhost : ok=31 changed=2 unreachable=0 failed=0
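The same check can be done without grep by scanning the playbook output for tasks whose status line is "skipping:". A minimal sketch, assuming the log has the TASK/status layout shown above; the helper name `skipped_tasks` is hypothetical.

```python
import re

# A task header looks like: TASK [role : task name] ********
TASK_RE = re.compile(r"^TASK \[(.*)\] \*+\s*$")


def skipped_tasks(log_text):
    """Return the names of tasks that are followed by a 'skipping:' line."""
    skipped = []
    current = None
    for line in log_text.splitlines():
        match = TASK_RE.match(line)
        if match:
            current = match.group(1)
        elif current and line.startswith("skipping:"):
            skipped.append(current)
            current = None
    return skipped


# Excerpt of the contrail-analytics output quoted in this comment.
sample_log = """ok: [localhost]

TASK [contrail/analytics : Wait till config api server answers] ****************
skipping: [localhost]

TASK [contrail/analytics : register analytics to config api server (non-openstack)] ***
skipping: [localhost]

PLAY RECAP *********************************************************************
localhost : ok=29 changed=3 unreachable=0 failed=0
"""

for name in skipped_tasks(sample_log):
    print(name)
```

Both tasks are reported as skipped, so the registration step was evaluated but never executed in this deployment.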

The alarm that you see is another manifestation of bug#1716799. I see the same on an ubuntu14 setup using ansible/openstack. We suspect there is a race condition where alarmgen gets notified of a UVE via kafka, but the UVE is not yet present in redis when alarmgen tries to read it. Restarting alarmgen should resolve it.
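The suspected ordering problem can be illustrated with a toy model. This is not alarmgen's code: a plain dict stands in for redis, the key name is hypothetical, and `evaluate` is an illustrative helper that only mimics the "ContrailConfig missing" check.

```python
ALARM = "ContrailConfig is missing or incorrect"


def evaluate(store, key):
    """Return the alarm text if the UVE has no ContrailConfig, else None."""
    uve = store.get(key)
    if uve is None or "ContrailConfig" not in uve:
        return ALARM
    return None


store = {}                              # stands in for redis
key = "analytics-node:host1"            # hypothetical UVE key

# The notification arrives first, so the read happens before the write lands.
first = evaluate(store, key)

# The write lands afterwards; a later re-read (for example after a restart)
# sees the data and the alarm condition no longer holds.
store[key] = {"ContrailConfig": {"elements": {}}}
second = evaluate(store, key)

print(first)
print(second)
```

In this model the alarm stays raised only if nothing triggers a second evaluation, which is consistent with a restart clearing it.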

Andrey Pavlov (apavlov-e) wrote on 2017-10-10:

Hi Santosh,

I disagree that the bug is a duplicate. I don't see timing issues here: the servers are always down, in all 4.X builds.

I don't see the containers restarting alarm-gen.

Andrey Pavlov (apavlov-e) wrote on 2017-10-10:

#10

webui3.png (121.6 KiB, image/png)

The same thing happens on an HA setup with the latest release build (4.0.2 33). A screenshot is attached.

Another point: TripleO and OSPd do not have this problem because they call provision_analytics/provision_analyticsdb after deployment.

Santosh Gupta (sangupta) wrote on 2017-10-11:

#11

Hi Andrey,
The setup that you provided me for debugging had all servers up.
The only issue it had was the “ContrailConfig missing or incorrect” alarm on 2 nodes. Restarting alarmgen removed alarms on your box. Hence I marked it duplicate.
That is the only issue stated in your initial bug report.
If you see other issues, please open a separate bug so that we don’t chase different issues in one bug.

>>> I don't see timing issues here - Servers are down always. In all 4.X builds.
In my setup, SM and ansible based all-in-one systems come up fine for 4.0.2.0-33.
Please bring up systems using ansible and juju on ubuntu14.04 and then we can debug your system.

>>> Another point - TripleO and OSPd do not have such problem cause it calls provision_analytics/provision_analyticsdb after deployment.
Openstack orchestrator based handling is different from non-openstack orchestrators, as pointed out in comment#8.
But I use openstack based provisioning using SM/ansible and it comes up fine. I don’t see the servers down.
Please have the systems up and we can look into it. Thanks.

Andrey Pavlov (apavlov-e) wrote on 2017-10-12:

#12

Hi Santosh,
setup is ready. WebUI - https://13.58.222.139:8143/
build 4.0.2.0-34
nodes are present but down.

Some time ago Gokul (if I remember the person correctly) told me that this issue happens because no one calls provision_analytics and provision_analyticsdb.

P.S. Juju can't restart alarm-gen inside container.

Santosh Gupta (sangupta) wrote on 2017-10-13:

#13

As discussed, the only issue we have is the 'ContrailConfig is missing or incorrect' alarm. We shall wait and verify it once bug#1716799 is resolved.

Santosh Gupta (sangupta) wrote on 2017-10-20:

#14

bug#1716799 has some debugging code added. Please see if you can reproduce the issue on your setup.

Andrey Pavlov (apavlov-e) wrote on 2017-10-20:

#15

@Santosh - which build can I use for it?

Santosh Gupta (sangupta) wrote on 2017-10-20:

#16

vi /auto/cs-build/jenkins-jobs/CB-mainline-ubuntu14-mitaka/builds/83/archive/sandbox/controller/src/opserver/alarmgen.py
You can pick this build, the file has changes done for bug#1716799

Andrey Pavlov (apavlov-e) wrote on 2017-10-22:

#17

tried build 84 from mainline.
analyticsdb container doesn't work.

TASK [cassandra : Configure Datastax-agent] ************************************
fatal: [localhost]: FAILED! => {"failed": true, "msg": "{{ opscenter_ip }}: 'opscenter_ip' is undefined"}
to retry, use: --limit @/contrail-ansible-internal/playbooks/contrail_analyticsdb.retry

PLAY RECAP *********************************************************************
localhost : ok=34 changed=9 unreachable=0 failed=1
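The PLAY RECAP line carries the counters that show whether the container playbook failed. A small sketch of pulling them out, tolerating colour escape markers in case the log was captured with them (either real ANSI sequences or a literal "ESC[...m" as printed by some viewers); the helper name `recap_counters` is hypothetical.

```python
import re

# Real ANSI colour sequences, or the literal "ESC[...m" text some viewers show.
COLOUR_RE = re.compile(r"(?:\x1b|ESC)\[[0-9;]*m")


def recap_counters(line):
    """Parse 'ok=34 changed=9 ...' pairs from a PLAY RECAP host line."""
    clean = COLOUR_RE.sub("", line)
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", clean)}


# Host line as captured with colour markers in the original log.
raw = ("ESC[0;31mlocalhostESC[0m : ESC[0;32mok=34 ESC[0m "
       "ESC[0;33mchanged=9 ESC[0m unreachable=0 ESC[0;31mfailed=1 ESC[0m")

counters = recap_counters(raw)
print(counters)
if counters.get("failed", 0) > 0:
    print("playbook reported failed tasks")
```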

waiting for a new build.

Andrey Pavlov (apavlov-e) wrote on 2017-10-25:

#18

looks like it's blocked by https://bugs.launchpad.net/juniperopenstack/+bug/1720447

Andrey Pavlov (apavlov-e) on 2018-03-08

information type:

Proprietary → Public

Andrey Pavlov (apavlov-e) wrote on 2018-03-21:

#19

It's not applicable to the master branch (trunk).
