provision of 16.04.2 storage cluster failed with error in sm_ansible_callback.py

Bug #1695183 reported by wenqing liang
This bug affects 1 person
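Juniper Openstack (status tracked in Trunk)

+---------+------------+------------+---------------+-----------+
| Affects | Status     | Importance | Assigned to   | Milestone |
+---------+------------+------------+---------------+-----------+
| R4.0    | Incomplete | High       | wenqing liang |           |
| R4.1    | Incomplete | High       | wenqing liang |           |
| Trunk   | Incomplete | High       | wenqing liang |           |
+---------+------------+------------+---------------+-----------+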

Bug Description

Provisioning a 16.04.2 storage cluster with r4.0-17 newton, driven from an r4.0-17 mitaka Server Manager (SM), failed on cmbu-ceph-perf1.

+-----------------+---------------------+---------------+-------------------+
| id              | status              | ip_address    | mac_address       |
+-----------------+---------------------+---------------+-------------------+
| cmbu-ceph-perf1 | provision_failed | 10.87.140.197 | 00:25:90:AB:9C:88 |
| cmbu-ceph-perf2 | provision_completed | 10.87.140.198 | 00:25:90:35:8A:1F |
| cmbu-ceph-perf3 | provision_completed | 10.87.140.199 | 00:25:90:92:0E:6C |
| cmbu-ceph-perf4 | provision_completed | 10.87.140.200 | 00:25:90:92:0D:F2 |
+-----------------+---------------------+---------------+-------------------+

"2017-06-01 23:49:28,210-INFO-sm_ansible_callback.py:43-append(): TASK [node : Create compute_list - step 2 (when ctrl_data_network is defined)]"
"2017-06-01 23:49:28,253-INFO-sm_ansible_callback.py:43-append(): fatal: [10.87.140.197]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'ansible_hostname'\n\nThe error appears to have been in '/opt/contrail/server_manager/ansible/playbooks/combo_4_0_17newton/playbooks/roles/node/tasks/main.yml': line 194, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Create compute_list - step 2 (when ctrl_data_network is defined)\n ^ here\n"}"
"2017-06-01 23:49:28,257-DEBUG-server_mgr_status.py:112-put_ansible_status(): Server status Data cmbu-ceph-perf1 provision_failed 2017_06_01__23_49_28"
"2017-06-01 23:49:28,267-INFO-sm_ansible_callback.py:43-append(): PLAY RECAP [10.87.140.197] : ok: 43 changed: 17 unreachable: 0 skipped: 30 failed: 1"
"2017-06-01 23:49:28,270-DEBUG-server_mgr_status.py:112-put_ansible_status(): Server status Data cmbu-ceph-perf2 provision_completed 2017_06_01__23_49_28"
"2017-06-01 23:49:28,282-INFO-sm_ansible_callback.py:43-append(): PLAY RECAP [10.87.140.198] : ok: 0 changed: 0 unreachable: 1 skipped: 0 failed: 0"
"2017-06-01 23:49:28,284-DEBUG-server_mgr_status.py:112-put_ansible_status(): Server status Data cmbu-ceph-perf3 provision_completed 2017_06_01__23_49_28"
"2017-06-01 23:49:28,297-INFO-sm_ansible_callback.py:43-append(): PLAY RECAP [10.87.140.199] : ok: 0 changed: 0 unreachable: 1 skipped: 0 failed: 0"
"2017-06-01 23:49:28,300-DEBUG-server_mgr_status.py:112-put_ansible_status(): Server status Data cmbu-ceph-perf4 provision_completed 2017_06_01__23_49_28"
"2017-06-01 23:49:28,315-INFO-sm_ansible_callback.py:43-append(): PLAY RECAP [10.87.140.200] : ok: 0 changed: 0 unreachable: 1 skipped: 0 failed: 0"
"2017-06-01 23:49:28,321-INFO-sm_ansible_server.py:42-run(): Process Done"
"2017-06-01 23:49:36,695-INFO-server_mgr_ssh_client.py:57-connect(): CONNECT FAILED: Host => 10.87.140.200, option => key, ERROR => [Errno None] Unable to connect to port 22 on 10.87.140.200"
"2017-06-01 23:50:48,694-DEBUG-server_mgr_mon_base_plugin.py:706-create_server_dict(): Created server dictionary."
"2017-06-01 23:50:51,681-DEBUG-server_mgr_main.py:1273-validate_smgr_request(): validate_smgr_request"
"2017-06-01 23:50:51,694-INFO-server_mgr_ssh_client.py:57-connect(): CONNECT FAILED: Host => 10.87.140.199, option => key, ERROR => [Errno None] Unable to connect to port 22 on 10.87.140.199"
"2017-06-01 23:51:18,624-DEBUG-server_mgr_main.py:1273-validate_smgr_request(): validate_smgr_request"
"2017-06-01 23:52:06,695-INFO-server_mgr_ssh_client.py:57-connect(): CONNECT FAILED: Host => 10.87.140.198, option => key, ERROR => [Errno None] Unable to connect to port 22 on 10.87.140.198"

Jun 1 23:49:47 cmbu-ceph-perf1 barbican-worker[19718]: 2017-06-01 23:49:47.200 19718 ERROR oslo.messaging._drivers.impl_rabbit [-] [2d7bd482-c075-4d37-9082-45e2e8201064] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 53846
Jun 1 23:49:47 cmbu-ceph-perf1 nova-console[26105]: 2017-06-01 23:49:47.617 26105 ERROR oslo.messaging._drivers.impl_rabbit [-] [897bb6dc-b97f-48d2-a4dd-7c36bac31351] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: 54810
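
The PLAY RECAP lines in the Server Manager log above can be compared against the status table mechanically. The following is a minimal sketch, assuming the recap lines keep the exact format shown in this report; the function names `parse_recaps` and `problem_hosts` are hypothetical helpers, not part of Server Manager. It highlights that three hosts were marked provision_completed even though their recap shows unreachable: 1.

```python
import re

# Matches the PLAY RECAP lines as they appear in the log pasted in this bug.
# The format is taken from the report; other SM versions may differ.
RECAP_RE = re.compile(
    r"PLAY RECAP \[(?P<host>[^\]]+)\] : "
    r"ok: (?P<ok>\d+) changed: (?P<changed>\d+) "
    r"unreachable: (?P<unreachable>\d+) skipped: (?P<skipped>\d+) "
    r"failed: (?P<failed>\d+)"
)


def parse_recaps(log_text):
    """Return {host: {counter_name: int}} for every PLAY RECAP line."""
    recaps = {}
    for match in RECAP_RE.finditer(log_text):
        fields = match.groupdict()
        host = fields.pop("host")
        recaps[host] = {name: int(value) for name, value in fields.items()}
    return recaps


def problem_hosts(recaps):
    """Hosts whose recap reports at least one failed or unreachable count."""
    return sorted(
        host for host, counts in recaps.items()
        if counts["failed"] or counts["unreachable"]
    )


# Recap lines copied from the log in this report.
SAMPLE_LOG = '''
2017-06-01 23:49:28,267-INFO-sm_ansible_callback.py:43-append(): PLAY RECAP [10.87.140.197] : ok: 43 changed: 17 unreachable: 0 skipped: 30 failed: 1
2017-06-01 23:49:28,282-INFO-sm_ansible_callback.py:43-append(): PLAY RECAP [10.87.140.198] : ok: 0 changed: 0 unreachable: 1 skipped: 0 failed: 0
2017-06-01 23:49:28,297-INFO-sm_ansible_callback.py:43-append(): PLAY RECAP [10.87.140.199] : ok: 0 changed: 0 unreachable: 1 skipped: 0 failed: 0
2017-06-01 23:49:28,315-INFO-sm_ansible_callback.py:43-append(): PLAY RECAP [10.87.140.200] : ok: 0 changed: 0 unreachable: 1 skipped: 0 failed: 0
'''

if __name__ == "__main__":
    for host in problem_hosts(parse_recaps(SAMPLE_LOG)):
        print(host)
```

Run against the full SM log, this would list every host that the play could not finish on, regardless of what status was recorded for it.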

            "provision_role_sequence": "{'completed': [(u'cmbu-ceph-perf1', 'openstack', '2017_06_01__17_58_57'), (u'cmbu-ceph-perf1', 'post_provision', '2017_06_01__23_48_45')], 'steps': []}",

Tags: provisioning
wenqing liang (wliang)
information type: Proprietary → Public
Jeya ganesh babu J (jjeya) wrote :

The issue is not related to storage. I see communication errors to the compute nodes: the management interface is not up (enp4s0 has no IPv4 address).
root@cmbu-ceph-perf2:~# ifconfig
enp4s0 Link encap:Ethernet HWaddr 00:25:90:35:8a:1f
          inet6 addr: fe80::225:90ff:fe35:8a1f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:680469 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16276 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:67705163 (67.7 MB) TX bytes:3074974 (3.0 MB)
          Interrupt:17 Memory:faee0000-faf00000

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

p6p2 Link encap:Ethernet HWaddr 90:e2:ba:1c:be:69
          inet addr:5.0.0.2 Bcast:5.0.0.255 Mask:255.255.255.0
          inet6 addr: fe80::92e2:baff:fe1c:be69/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:188 errors:0 dropped:0 overruns:0 frame:0
          TX packets:132 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:17015 (17.0 KB) TX bytes:42513 (42.5 KB)

The /etc/network/interfaces file seems to have the correct config, though.
# The primary network interface
auto enp4s0
iface enp4s0 inet static
    address 10.87.140.198
    netmask 255.255.224.0
    gateway 10.87.159.254
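
The mismatch described above (a static address in the config that is missing from the live interface) can be checked with a short script. This is only a sketch, assuming the legacy `ifconfig` output format and the simple `iface ... inet static` stanza shown in this comment; the helper names `static_addresses`, `live_addresses` and `unapplied` are hypothetical, and the sample text is abbreviated from the output above.

```python
import re


def static_addresses(interfaces_text):
    """Map interface name -> static address from interfaces-style text."""
    configured = {}
    current = None
    for raw in interfaces_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "iface":
            # e.g. "iface enp4s0 inet static"; ignore non-static stanzas.
            is_static = len(words) >= 4 and words[3] == "static"
            current = words[1] if is_static else None
        elif words[0] == "address" and current is not None and len(words) >= 2:
            configured[current] = words[1]
    return configured


def live_addresses(ifconfig_text):
    """Map interface name -> IPv4 address (or None) from legacy ifconfig output."""
    live = {}
    current = None
    for line in ifconfig_text.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new interface block.
            current = line.split()[0]
            live[current] = None
        elif current is not None:
            match = re.search(r"inet addr:(\S+)", line)
            if match:
                live[current] = match.group(1)
    return live


def unapplied(configured, live):
    """Interfaces whose configured static address is not the live address."""
    return {
        iface: (address, live.get(iface))
        for iface, address in configured.items()
        if live.get(iface) != address
    }


INTERFACES = """
# The primary network interface
auto enp4s0
iface enp4s0 inet static
    address 10.87.140.198
    netmask 255.255.224.0
    gateway 10.87.159.254
"""

IFCONFIG = """
enp4s0 Link encap:Ethernet HWaddr 00:25:90:35:8a:1f
          inet6 addr: fe80::225:90ff:fe35:8a1f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0

p6p2 Link encap:Ethernet HWaddr 90:e2:ba:1c:be:69
          inet addr:5.0.0.2 Bcast:5.0.0.255 Mask:255.255.255.0
"""

if __name__ == "__main__":
    print(unapplied(static_addresses(INTERFACES), live_addresses(IFCONFIG)))
```

On the data from cmbu-ceph-perf2, enp4s0 is reported as configured with 10.87.140.198 but carrying no IPv4 address.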

tags: removed: storage
Jeya ganesh babu J (jjeya) wrote :

It is possible that the DHCP server running on the network is down.
I am not sure why the system would use DHCP when a static IP is configured. One possibility is that the network service was not restarted after the setting was changed.
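
To confirm whether the management addresses come back once networking is restarted, the same check that fails in the SM log (a connection to port 22) can be repeated by hand. A minimal sketch follows, using only the standard `socket` module; `port_open` is a hypothetical helper name and the 3 second timeout is an arbitrary choice.

```python
import socket


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network, and timeouts.
        return False
```

For example, `port_open("10.87.140.198")` would be expected to return False while enp4s0 has no IPv4 address, and True once the static address is applied and sshd is listening.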

Jeya ganesh babu J (jjeya) wrote :

Assigning to wenqing to see if this happens with a fresh install.

Jeba Paulaiyan (jebap)
tags: removed: blocker