Environment deployment failed with Fuel (Too many nodes failed to provision)

Bug #1520088 reported by Alexander Koryagin
This bug affects 1 person
Affects            | Status  | Importance | Assigned to | Milestone
Fuel for OpenStack | Invalid | Medium     | Fuel DevOps |

Bug Description

ERROR FROM FUEL:
    Deployment Failed
    Error
    Too many nodes failed to provision
    Some nodes have an error status after deployment. Redeployment is needed.

After OS installation during deployment, the nodes went offline.
**The main point is that the same actions, performed with the same versions on another server, finished successfully.**
So something is probably wrong with the host server configuration.

ENVIRONMENT:
Host: srv94-bud.infra.mirantis.net
OS: Ubuntu 14.04.3 LTS
Version: MirantisOpenStack-7.0.iso (official)
Nodes: 3x controller, 1x compute, 1x cinder

Snapshots:
Before deploy:
# source /home/akoryagin/venv/fuel-devops-venv/bin/activate
# dos.py revert-resume fuelweb_test_system_test ready_3

The deployment is now in the failed state on the server, and the machine has not been touched since.

MORE INFORMATION:

In Fuel Master, Astute log:
    2015-11-25 17:40:47 ERR [655] Timeout of provisioning is exceeded. Nodes not booted: ["1", "2", "3", "4", "5"]
    2015-11-25 17:40:47 DEBUG [655] Aborting provision. To many nodes failed: ["1", "2", "3", "4", "5"]
    2015-11-25 17:40:47 INFO [655] Node timed out to provision: 5
    2015-11-25 17:40:47 INFO [655] Node timed out to provision: 4
    2015-11-25 17:40:47 INFO [655] Node timed out to provision: 3
    2015-11-25 17:40:47 INFO [655] Node timed out to provision: 2
    2015-11-25 17:40:47 INFO [655] Node timed out to provision: 1
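To pull the affected node IDs out of a longer Astute log, a small script can help. The following is only a sketch: it assumes the log lines have exactly the wording quoted above, and the names `ASTUTE_LOG`, `TIMEOUT_RE`, and `timed_out_nodes` are hypothetical, not part of Fuel or Astute.

```python
import re

# Sample lines copied from the Astute log quoted in this report.
ASTUTE_LOG = """\
2015-11-25 17:40:47 ERR [655] Timeout of provisioning is exceeded. Nodes not booted: ["1", "2", "3", "4", "5"]
2015-11-25 17:40:47 DEBUG [655] Aborting provision. To many nodes failed: ["1", "2", "3", "4", "5"]
2015-11-25 17:40:47 INFO [655] Node timed out to provision: 5
2015-11-25 17:40:47 INFO [655] Node timed out to provision: 4
2015-11-25 17:40:47 INFO [655] Node timed out to provision: 3
2015-11-25 17:40:47 INFO [655] Node timed out to provision: 2
2015-11-25 17:40:47 INFO [655] Node timed out to provision: 1
"""

# Assumption: the message wording matches the lines above exactly.
TIMEOUT_RE = re.compile(r"Node timed out to provision: (\d+)")


def timed_out_nodes(log_text):
    """Return the sorted, de-duplicated IDs of nodes that timed out."""
    return sorted({int(m.group(1)) for m in TIMEOUT_RE.finditer(log_text)})


if __name__ == "__main__":
    print(timed_out_nodes(ASTUTE_LOG))
```

Here every node in the environment is reported as timed out, which points at the host rather than at any single node.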

From the host machine:
$ virsh list --all
     Id Name State
    ----------------------------------------------------
     2 fuelweb_test_system_test_slave-05 running
     3 fuelweb_test_system_test_slave-04 running
     4 fuelweb_test_system_test_slave-03 running
     5 fuelweb_test_system_test_slave-02 running
     6 fuelweb_test_system_test_slave-01 running
     7 fuelweb_test_system_test_admin running

From Admin Node:
[root@nailgun ~]# fuel node
    id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
    ---|--------|------------------|---------|------------|-------------------|------------|---------------|--------|---------
    1 | error | Untitled (9a:fc) | 1 | 10.109.0.4 | 64:d0:df:47:9a:fc | controller | | False | 1
    5 | error | Untitled (9c:f6) | 1 | 10.109.0.6 | 64:1d:35:2b:9c:f6 | compute | | False | 1
    3 | error | Untitled (7f:33) | 1 | 10.109.0.7 | 64:b9:3f:2d:7f:33 | controller | | False | 1
    4 | error | Untitled (e4:09) | 1 | 10.109.0.3 | 64:de:b9:e5:e4:09 | cinder | | False | 1
    2 | error | Untitled (35:6e) | 1 | 10.109.0.5 | 64:b2:c4:c9:35:6e | controller | | False | 1
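The table above can also be checked programmatically. This is a minimal sketch that assumes the pipe-separated layout exactly as pasted here (one header line, one separator line, then one row per node); `parse_fuel_nodes` and `offline_nodes` are hypothetical helper names, not part of the Fuel CLI.

```python
# Output of `fuel node` as pasted in this report (indentation removed).
FUEL_NODE_OUTPUT = """\
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---|--------|------------------|---------|------------|-------------------|------------|---------------|--------|---------
1 | error | Untitled (9a:fc) | 1 | 10.109.0.4 | 64:d0:df:47:9a:fc | controller | | False | 1
5 | error | Untitled (9c:f6) | 1 | 10.109.0.6 | 64:1d:35:2b:9c:f6 | compute | | False | 1
3 | error | Untitled (7f:33) | 1 | 10.109.0.7 | 64:b9:3f:2d:7f:33 | controller | | False | 1
4 | error | Untitled (e4:09) | 1 | 10.109.0.3 | 64:de:b9:e5:e4:09 | cinder | | False | 1
2 | error | Untitled (35:6e) | 1 | 10.109.0.5 | 64:b2:c4:c9:35:6e | controller | | False | 1
"""


def parse_fuel_nodes(text):
    """Parse the pipe-separated table into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [col.strip() for col in lines[0].split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and the dashed separator
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def offline_nodes(rows):
    """Return (id, ip, roles) for every node whose 'online' column is 'False'."""
    return [(r["id"], r["ip"], r["roles"]) for r in rows if r["online"] == "False"]


if __name__ == "__main__":
    for node_id, ip, roles in offline_nodes(parse_fuel_nodes(FUEL_NODE_OUTPUT)):
        print(node_id, ip, roles)
```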

Nodes became offline:
    [root@nailgun ~]# for num in 3 4 5 6 7; do ping -q -w 5 10.109.0.${num}; done
    PING 10.109.0.3 (10.109.0.3) 56(84) bytes of data.

    --- 10.109.0.3 ping statistics ---
    4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms
    pipe 3
    PING 10.109.0.4 (10.109.0.4) 56(84) bytes of data.

    --- 10.109.0.4 ping statistics ---
    3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms
    pipe 3
    PING 10.109.0.5 (10.109.0.5) 56(84) bytes of data.

    --- 10.109.0.5 ping statistics ---
    4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms
    pipe 3
    PING 10.109.0.6 (10.109.0.6) 56(84) bytes of data.

    --- 10.109.0.6 ping statistics ---
    3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms
    pipe 3
    PING 10.109.0.7 (10.109.0.7) 56(84) bytes of data.

    --- 10.109.0.7 ping statistics ---
    4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3001ms
    pipe 3
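The ping summaries above can be reduced to a list of unreachable hosts. The sketch below assumes the summary format shown in this report (a `--- HOST ping statistics ---` line followed by a `N packets transmitted, M received` line); `PING_OUTPUT`, `ping_summary`, and `unreachable_hosts` are hypothetical names used only for illustration.

```python
import re

# Summary lines copied from the ping output in this report.
PING_OUTPUT = """\
--- 10.109.0.3 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms
pipe 3
--- 10.109.0.4 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms
pipe 3
--- 10.109.0.5 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms
pipe 3
--- 10.109.0.6 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms
pipe 3
--- 10.109.0.7 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3001ms
pipe 3
"""

# Assumption: the statistics line directly follows the "--- host ---" line.
SUMMARY_RE = re.compile(
    r"--- (\S+) ping statistics ---\s*\n\s*"
    r"(\d+) packets transmitted, (\d+) received"
)


def ping_summary(text):
    """Map each host to a (transmitted, received) tuple of packet counts."""
    return {
        host: (int(sent), int(received))
        for host, sent, received in SUMMARY_RE.findall(text)
    }


def unreachable_hosts(text):
    """Return hosts that answered none of the pings, in sorted order."""
    return sorted(h for h, (_, received) in ping_summary(text).items() if received == 0)


if __name__ == "__main__":
    print(unreachable_hosts(PING_OUTPUT))
```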

Tags: area-devops
Timur Nurlygayanov (tnurlygayanov) wrote :

This issue was marked as a duplicate following a conversation with Georgy Duldin on Skype. He has seen this issue before and apparently knows its root cause.

Timur Nurlygayanov (tnurlygayanov) wrote :

Hi infra team, it looks like we need to redeploy the server srv94-bud.infra.mirantis.net, because the original issue reproduces only on this server, and we hope that reinstalling the operating system will fix it.

Thank you!

Changed in fuel:
assignee: nobody → Fuel DevOps (fuel-devops)
milestone: none → 8.0
importance: Undecided → Medium
status: New → Confirmed
Igor Shishkin (teran) wrote :

@Timur, please create a separate bug, since this one in its main form doesn't describe exactly what you want.
Marking this as invalid, since it's not relevant according to your first comment.

Changed in fuel:
status: Confirmed → Invalid
Timur Nurlygayanov (tnurlygayanov) wrote :

@Igor, OK, I described this request in a separate bug: https://bugs.launchpad.net/fuel/+bug/1521101

Dmitry Pyzhov (dpyzhov)
tags: added: area-devops