possible timing issues in windows contrail ansible deployer

Bug #1784938 reported by sagarkchitnis
Affects: OpenContrail
Status: New
Importance: Medium
Assigned to: Michał Kostrzewa

Bug Description

The first time ansible-playbook configure_instances.yml -i inventory was run, it failed; the second run succeeded without any changes. Maybe we need to update the timeout settings, since Windows can take varying amounts of time to start up (especially if automatic updates are pending).
Please search for "Reboot the system" in the logs below to see the differences.

[root@a5s38node4 ansible]# ansible-playbook configure_instances.yml -i inventory

PLAY [windows_host] ************************************************************

TASK [Gathering Facts] *********************************************************
ok: [10.84.14.226]
ok: [10.84.14.234]

TASK [configure_instances : Install feature Windows-Containers] ****************
ok: [10.84.14.226]
changed: [10.84.14.234]

TASK [configure_instances : Install feature NET-Framework-Features] ************
ok: [10.84.14.234]
ok: [10.84.14.226]

TASK [configure_instances : Install feature Hyper-V] ***************************
ok: [10.84.14.226]
ok: [10.84.14.234]

TASK [configure_instances : Enable testsigning] ********************************
changed: [10.84.14.234]
changed: [10.84.14.226]

TASK [configure_instances : Reboot the system] *********************************

fatal: [10.84.14.234]: FAILED! => {"changed": false, "elapsed": 621, "msg": "timed out waiting for reboot uptime check success: ('Connection aborted.', error(111, 'Connection refused'))", "rebooted": true}
changed: [10.84.14.226]

TASK [configure_instances : Wait for reconnection] *****************************
ok: [10.84.14.226]

TASK [configure_instances : Install Chocolatey] ********************************
changed: [10.84.14.226]

TASK [configure_instances : Install python 2.7.13] *****************************
changed: [10.84.14.226]

TASK [configure_instances : Install NSSM] **************************************
changed: [10.84.14.226]

TASK [configure_instances : Install DockerProvider] ****************************
changed: [10.84.14.226]

TASK [configure_instances : Install Docker-EE] *********************************
changed: [10.84.14.226]

TASK [configure_instances : Install MS Visuall C++ Redist 14] ******************
changed: [10.84.14.226]

TASK [configure_instances : Disable Windows Firewall] **************************
changed: [10.84.14.226]
        to retry, use: --limit @/root/contrail-windows-deployer/ansible/configure_instances.retry

PLAY RECAP *********************************************************************
10.84.14.226 : ok=14 changed=9 unreachable=0 failed=0
10.84.14.234 : ok=5 changed=2 unreachable=0 failed=1

[root@a5s38node4 ansible]#
[root@a5s38node4 ansible]# ansible-playbook configure_instances.yml -i inventory

PLAY [windows_host] ************************************************************

TASK [Gathering Facts] *********************************************************
ok: [10.84.14.226]
ok: [10.84.14.234]

TASK [configure_instances : Install feature Windows-Containers] ****************
ok: [10.84.14.226]
changed: [10.84.14.234]

TASK [configure_instances : Install feature NET-Framework-Features] ************
ok: [10.84.14.226]
changed: [10.84.14.234]

TASK [configure_instances : Install feature Hyper-V] ***************************
ok: [10.84.14.226]
changed: [10.84.14.234]

TASK [configure_instances : Enable testsigning] ********************************
changed: [10.84.14.226]
changed: [10.84.14.234]

TASK [configure_instances : Reboot the system] *********************************
changed: [10.84.14.226]
changed: [10.84.14.234]

TASK [configure_instances : Wait for reconnection] *****************************
ok: [10.84.14.226]
ok: [10.84.14.234]

TASK [configure_instances : Install Chocolatey] ********************************
changed: [10.84.14.234]
changed: [10.84.14.226]

TASK [configure_instances : Install python 2.7.13] *****************************
ok: [10.84.14.226]
changed: [10.84.14.234]

TASK [configure_instances : Install NSSM] **************************************
ok: [10.84.14.226]
changed: [10.84.14.234]

TASK [configure_instances : Install DockerProvider] ****************************
ok: [10.84.14.226]
changed: [10.84.14.234]

TASK [configure_instances : Install Docker-EE] *********************************
changed: [10.84.14.226]
changed: [10.84.14.234]

TASK [configure_instances : Install MS Visuall C++ Redist 14] ******************
ok: [10.84.14.226]
changed: [10.84.14.234]

TASK [configure_instances : Disable Windows Firewall] **************************
ok: [10.84.14.226]
changed: [10.84.14.234]

PLAY RECAP *********************************************************************
10.84.14.226 : ok=14 changed=4 unreachable=0 failed=0
10.84.14.234 : ok=14 changed=12 unreachable=0 failed=0

[root@a5s38node4 ansible]# ansible-playbook install_contrail.yml -i inventory

PLAY [windows_host] ************************************************************

TASK [Gathering Facts] *********************************************************
ok: [10.84.14.234]
ok: [10.84.14.226]

TASK [install_contrail : Check if OpenStack Keystone configuration is present] ***
skipping: [10.84.14.226]
skipping: [10.84.14.234]

TASK [install_contrail : Create artifacts directory] ***************************
changed: [10.84.14.226]
changed: [10.84.14.234]

TASK [install_contrail : Run contrail-vrouter-windows] *************************
changed: [10.84.14.234]
changed: [10.84.14.226]

TASK [install_contrail : Copy dlls to testbed] *********************************
changed: [10.84.14.226]
changed: [10.84.14.234]

TASK [install_contrail : Import vRouter certificate] ***************************
changed: [10.84.14.234]
changed: [10.84.14.226]

TASK [install_contrail : Install vRouter Extension] ****************************
changed: [10.84.14.234]
changed: [10.84.14.226]

TASK [install_contrail : Install vRouter Agent] ********************************
changed: [10.84.14.234]
changed: [10.84.14.226]

TASK [install_contrail : Install Docker Driver] ********************************
changed: [10.84.14.234]
changed: [10.84.14.226]

TASK [install_contrail : Get auth token from Keystone] *************************
ok: [10.84.14.234 -> localhost]
ok: [10.84.14.226 -> localhost]

TASK [install_contrail : set_fact] *********************************************
ok: [10.84.14.226]
ok: [10.84.14.234]

TASK [install_contrail : Create virtual router in Contrail] ********************
ok: [10.84.14.226 -> localhost]
ok: [10.84.14.234 -> localhost]

TASK [install_contrail : Install contrail docker driver] ***********************
changed: [10.84.14.234]
changed: [10.84.14.226]

TASK [install_contrail : Start contrail docker driver] *************************
fatal: [10.84.14.234]: FAILED! => {"can_pause_and_continue": false, "changed": false, "depended_by": [], "dependencies": [], "description": "", "desktop_interact": false, "display_name": "DockerDriver", "exists": true, "msg": "Failed to start service 'DockerDriver (DockerDriver)'.", "name": "DockerDriver", "path": "C:\\ProgramData\\chocolatey\\lib\\NSSM\\tools\\nssm.exe", "start_mode": "auto", "state": "stopped", "username": "LocalSystem"}
fatal: [10.84.14.226]: FAILED! => {"can_pause_and_continue": false, "changed": false, "depended_by": [], "dependencies": [], "description": "", "desktop_interact": false, "display_name": "DockerDriver", "exists": true, "msg": "Failed to start service 'DockerDriver (DockerDriver)'.", "name": "DockerDriver", "path": "C:\\ProgramData\\chocolatey\\lib\\NSSM\\tools\\nssm.exe", "start_mode": "auto", "state": "stopped", "username": "LocalSystem"}
        to retry, use: --limit @/root/contrail-windows-deployer/ansible/install_contrail.retry

PLAY RECAP *********************************************************************
10.84.14.226 : ok=12 changed=8 unreachable=0 failed=1
10.84.14.234 : ok=12 changed=

sagarkchitnis (sagarc)
Changed in opencontrail:
importance: Undecided → Medium
assignee: nobody → Michał Kostrzewa (mkostrzewa)
Jacek Iżykowski (j.i.) wrote :

The timeout for reboot is 10 minutes.
It seems to be a reasonable compromise between two needs:
1) Detecting that a compute node couldn't be restarted.
2) Allowing a compute node to handle a reasonable amount of updates.

There is no guarantee at all about reboot time, so whatever we pick is a compromise.
Is there a specific timeout that you would suggest?
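
One way to avoid committing to a single value might be to make the reboot timeout overridable per deployment. The task below is only a sketch, not the deployer's actual task: it assumes the "Reboot the system" step uses Ansible's win_reboot module, and the variable name windows_reboot_timeout is hypothetical. The fallback of 600 seconds matches the 10 minutes mentioned above.

```yaml
# Sketch only. Assumes the role reboots hosts with win_reboot;
# windows_reboot_timeout is a hypothetical variable name.
- name: Reboot the system
  win_reboot:
    # Maximum number of seconds to wait for the host to come back.
    # Falls back to 600 (10 minutes) when the variable is not set.
    reboot_timeout: "{{ windows_reboot_timeout | default(600) }}"
```

With a task like this, a slow testbed could be given more time from the command line, for example: ansible-playbook configure_instances.yml -i inventory -e windows_reboot_timeout=1800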
