Provision fails for Contrail 5.0 (contrail-ansible-deployer) for orch: K8s

Bug #1760217 reported by Madhava Jayamani on 2018-03-30
This bug affects 2 people

Affects                                      Status         Importance  Assigned to       Milestone
Juniper Openstack (status tracked in Trunk)
  R5.0                                       Fix Committed  Critical    Madhava Jayamani
  Trunk                                      Fix Committed  Critical    Bartosz Kupidura

Bug Description

Provisioning fails for Contrail 5.0 (contrail-ansible-deployer) with orchestrator K8s. Build used: 46.

Setup: 10.87.118.145

2018-03-30 18:40:01,907 p=10403 u=root | TASK [install_contrail : show master list] *************************************
2018-03-30 18:40:01,986 p=10403 u=root | ok: [10.0.0.7] => {}
2018-03-30 18:40:01,987 p=10403 u=root | ok: [10.0.0.8] => {}
2018-03-30 18:40:02,005 p=10403 u=root | ok: [10.0.0.9] => {}
2018-03-30 18:40:02,029 p=10403 u=root | ok: [10.0.0.6] => {}
2018-03-30 18:40:02,045 p=10403 u=root | TASK [install_contrail : set master] *******************************************
2018-03-30 18:40:02,100 p=10403 u=root | fatal: [10.0.0.7]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: list object has no element 0

The error appears to have been in '/tmp/ansible.Gh3nwO_contrail/contrail-ansible-deployer/playbooks/roles/install_contrail/tasks/set_master.yml': line 13, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: set master
  ^ here

exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: list object has no element 0

2018-03-30 18:40:02,102 p=10403 u=root | fatal: [10.0.0.8]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: list object has no element 0
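
For anyone triaging a similar failure: "list object has no element 0" is the message Jinja2 produces when a template indexes an empty list, which Ansible then reports as an undefined variable. Below is a minimal hypothetical reproduction, assuming (not confirmed from set_master.yml itself) that the "set master" task reads master_list[0] after the "fill master list" step matched no host; k8s_master_ip is an illustrative name only.

```yaml
# Hypothetical reproduction; not taken from contrail-ansible-deployer.
# Assumption: the "set master" task indexes master_list[0].
- hosts: localhost
  gather_facts: false
  vars:
    master_list: []        # nothing was appended by the "fill master list" step
  tasks:
    - name: set master
      set_fact:
        # Indexing an empty list yields a Jinja2 Undefined, so the task fails
        # with AnsibleUndefinedVariable instead of setting the fact.
        k8s_master_ip: "{{ master_list[0] }}"
```

Running it with ansible-playbook should fail at the "set master" task with an error of the same form as the log above.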

instances.yaml used for the deployment:

[root@server3 config]# cat instances.yaml
REGISTRY_PRIVATE_INSECURE: True
CONTAINER_REGISTRY: ci-repo.englab.juniper.net:5000
provider_config:
  bms:
    domainsuffix: local
    ntpserver: 10.84.5.100
    ssh_pwd: c0ntrail123
    ssh_user: root

instances:
  server1:
      ip: 10.0.0.4
      provider: bms
      roles:
          analytics: null
          analytics_database: null
          config: null
          config_database: null
          control: null
          webui: null
  server2:
      ip: 10.0.0.5
      provider: bms
      roles:
          analytics: null
          analytics_database: null
          config: null
          config_database: null
          control: null
          webui: null
  server3:
      ip: 10.0.0.6
      provider: bms
      roles:
          analytics: null
          analytics_database: null
          config: null
          config_database: null
          control: null
          k8s_master: null
          kubemanager: null
          webui: null
  server4:
      ip: 10.0.0.7
      provider: bms
      roles:
          k8s_node: null
          vrouter: null
  server5:
      ip: 10.0.0.8
      provider: bms
      roles:
          k8s_node: null
          vrouter: null
  server6:
      ip: 10.0.0.9
      provider: bms
      roles:
          k8s_node: null
          vrouter: null

contrail_configuration:
  CONTAINER_REGISTRY: ci-repo.englab.juniper.net:5000
  CONTRAIL_VERSION: latest
  CLOUD_ORCHESTRATOR: kubernetes
  METADATA_PROXY_SECRET: c0ntrail123
  CONTROLLER_NODES: 10.10.0.4,10.10.0.5,10.10.0.6
  AAA_MODE: no-auth
  CONTROLLER_NODES: 10.10.0.4,10.10.0.5,10.10.0.6
  CONTROL_DATA_NET_LIST: 10.10.0.0/24
  PHYSICAL_INTERFACE: eth1
  VROUTER_GATEWAY: 10.10.0.1
  two_interface: true

[root@server3 config]#

Logs can be found at the location below:
madhavaj@ubuntu-build02:/cs-shared/bugs/1760217$ ls -ltr
total 3588K
-rwxrwxrwx 1 madhavaj jrs 1695 Mar 30 13:53 instances.yaml
-rwxrwxrwx 1 madhavaj jrs 3653514 Mar 30 13:53 ansible.log
madhavaj@ubuntu-build02:/cs-shared/bugs/1760217$

Ramprakash R (ramprakash) wrote :

Reassigning to Bartosz based on this email thread:

%<-------------------------------------------------
+Bartosz

Hi Ram,

the referenced yaml is executed by this loop:

- name: set master
  include: set_master.yml
  with_items:
    - "{{ controller_list }}"
  loop_control:
    loop_var: controller_item
  when: roles[instance_name].k8s_master is defined or roles[instance_name].k8s_node is defined
  tags:
    - k8s

from main.yaml.

The problem is that we are working with a single controller_list, which is wrong.
Bartosz is working on creating a controller_list per role.
@Bartosz, we also need to consider k8s. Can you cross-check all references where controller_list is used?
We need to replace it with the role-specific controller_list.

Regards,
Michael

On 01.04.2018 at 00:47, Ramprakash Ram Mohan <email address hidden> wrote:

Hi Michael,

https://bugs.launchpad.net/juniperopenstack/+bug/1760217

This bug was hit in a sanity setup where the k8s master and contrail controller roles were separate.
I could not understand the logic of this code (in set_master.yml):

- name: fill master list
  set_fact:
    master_list: "{{ master_list + [ hostvars[item]['private_ip'] ] }}"
  when: roles[hostvars[item]['instance_name']].k8s_master is defined and controller_item in hostvars[item].ansible_all_ipv4_addresses
  with_items:
    - "{{ groups['container_hosts'] }}"

In this scenario, it looks like master_list is expected never to be updated at all. What is the reason for adding the “and controller_item in hostvars[item].ansible_all_ipv4_addresses” conditional?
Is it not sufficient to just check roles[hostvars[item]['instance_name']].k8s_master?
Please let me know.

Thanks,
Ram
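
A minimal sketch of the role-based approach discussed in this thread: it fills the master list purely from the k8s_master role, as Ram suggests, and fails early with a readable message when no host carries that role. This is an illustration under assumptions, not the change merged in review 41105; k8s_master_list is a hypothetical name, while roles, instance_name, private_ip, and the container_hosts group are taken from the snippets above.

```yaml
# Hypothetical sketch; not the code from commit 9bba8f877dbe.
# Collect the private IP of every host that has the k8s_master role,
# independent of which controller_item is currently being iterated.
- name: fill k8s master list from roles only
  set_fact:
    k8s_master_list: "{{ k8s_master_list | default([]) + [ hostvars[item]['private_ip'] ] }}"
  when: roles[hostvars[item]['instance_name']].k8s_master is defined
  with_items:
    - "{{ groups['container_hosts'] }}"

# Fail with an explicit message instead of "list object has no element 0".
- name: fail early if no host carries the k8s_master role
  assert:
    that:
      - k8s_master_list | default([]) | length > 0
    msg: "No host in container_hosts has the k8s_master role"
```

Because the list no longer depends on the controller IP being iterated, k8s nodes should resolve a master even when the k8s master and the Contrail controllers are on separate hosts, as in the instances.yaml above.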

Sachin Bansal (sbansal) on 2018-04-09
information type: Proprietary → Public


Bartosz Kupidura (zynzel) wrote :

@madhavajayamani, can you please test whether https://review.opencontrail.org/41105 solves the issue?
Before merging, I need to check as many scenarios as possible.

OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/41105
Submitter: Bartosz Kupidura (<email address hidden>)

Reviewed: https://review.opencontrail.org/41105
Committed: http://github.com/Juniper/contrail-ansible-deployer/commit/9bba8f877dbe2486ec2c0878767ce861f851763d
Submitter: Zuul v3 CI (<email address hidden>)
Branch: master

commit 9bba8f877dbe2486ec2c0878767ce861f851763d
Author: Bartosz Kupidura <email address hidden>
Date: Tue Mar 27 14:55:57 2018 +0200

Build node list based on roles

Closes-Bug: 1760217
Change-Id: I1fd8bc4e64b814eb9614913ab0bf416091c094cf

Review in progress for https://review.opencontrail.org/42072
Submitter: Bartosz Kupidura (<email address hidden>)

Reviewed: https://review.opencontrail.org/42072
Committed: http://github.com/Juniper/contrail-ansible-deployer/commit/fa179acc9c18688f7707a8cdcabe3cf70d1b7564
Submitter: Zuul v3 CI (<email address hidden>)
Branch: R5.0

commit fa179acc9c18688f7707a8cdcabe3cf70d1b7564
Author: Bartosz Kupidura <email address hidden>
Date: Tue Mar 27 14:55:57 2018 +0200

Build node list based on roles

Closes-Bug: 1760217
Change-Id: I1fd8bc4e64b814eb9614913ab0bf416091c094cf
(cherry picked from commit 9bba8f877dbe2486ec2c0878767ce861f851763d)

Pulkit Tandon (pulkitt) wrote :

Workaround suggested 15 days ago:
Hi Pulkit,

This is because of this bug: https://bugs.launchpad.net/juniperopenstack/+bug/1760217

I believe you could work around this by moving your k8s roles onto the same node as the controller. (I have not tried this, so I am not very sure.)

Thanks,
Ram

Pulkit Tandon (pulkitt) wrote :

As per the above comment by Ram, this bug was supposed to take care of K8s HA provisioning in cases where I intend to spawn contrail-kube-manager containers on multiple nodes:
```
instances:
  server1:
      ip: 10.0.0.4
      provider: bms
      roles:
          analytics: null
          analytics_database: null
          config: null
          config_database: null
          control: null
          kubemanager: null
          webui: null
  server2:
      ip: 10.0.0.5
      provider: bms
      roles:
          analytics: null
          analytics_database: null
          config: null
          config_database: null
          control: null
          kubemanager: null
          webui: null
  server3:
      ip: 10.0.0.6
      provider: bms
      roles:
          analytics: null
          analytics_database: null
          config: null
          config_database: null
          control: null
          k8s_master: null
          kubemanager: null
          webui: null
```

Right now, I am using a workaround by listing "k8s_master" on all the servers as well:
```
instances:
  server1:
      ip: 10.0.0.4
      provider: bms
      roles:
          analytics: null
          analytics_database: null
          config: null
          config_database: null
          control: null
          k8s_master: null
          kubemanager: null
          webui: null
  server2:
      ip: 10.0.0.5
      provider: bms
      roles:
          analytics: null
          analytics_database: null
          config: null
          config_database: null
          control: null
          k8s_master: null
          kubemanager: null
          webui: null
  server3:
      ip: 10.0.0.6
      provider: bms
      roles:
          analytics: null
          analytics_database: null
          config: null
          config_database: null
          control: null
          k8s_master: null
          kubemanager: null
          webui: null
```

I still face the same issue if I don't use the workaround.
Please let me know whether this bug covers the above scenario or whether I need to raise a new one.

Bartosz Kupidura (zynzel) wrote :

Can you please paste the Ansible output from a run where kubemanager is hosted on a node without k8s_master? IMHO this is a different issue, but let's check.

Pulkit Tandon (pulkitt) wrote :

Following are the failure logs:
fatal: [77.77.1.20]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0\n\nThe error appears to have been in '/root/contrail-ansible-deployer/playbooks/roles/install_contrail/tasks/set_master.yml': line 13, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: set master\n ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: list object has no element 0"}
fatal: [77.77.1.21]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0\n\nThe error appears to have been in '/root/contrail-ansible-deployer/playbooks/roles/install_contrail/tasks/set_master.yml': line 13, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: set master\n ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: list object has no element 0"}
fatal: [77.77.1.31]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 0\n\nThe error appears to have been in '/root/contrail-ansible-deployer/playbooks/roles/install_contrail/tasks/set_master.yml': line 13, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: set master\n ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: list object has no element 0"}

Pulkit Tandon (pulkitt) wrote :

The logs shared in the previous comment are the ones I encountered when I verified the issue on 21/04.
I gave it another attempt today with master-octata-138 and I no longer see the issue.
Hence, marking it Fix Committed for the original bug owner to verify.
