Fuel for OpenStack

FUEL 9. Deployment has failed. All nodes are finished. Failed tasks: Task[sync_time/1], Task[sync_time/3], Task[sync_time/2] Stopping the deployment process!

Bug #1622518 reported by Samer Machara on 2016-09-12

This bug affects 3 people

Affects             Status   Importance  Assigned to      Milestone
Fuel for OpenStack  Invalid  High        Fuel Sustaining  Fuel for OpenStack 9.2

Bug Description

Detailed bug description:
I'm testing Fuel 9. After configuring my environment, I started the deployment and got the following error:

Deployment has failed. All nodes are finished. Failed tasks: Task[sync_time/1],
Task[sync_time/3], Task[sync_time/2] Stopping the deployment process!

Steps to reproduce:
    1. Add 3 nodes, 1 Compute, 1 Storage and 1 Controller.
       Each node has the following characteristics:
         Manufacturer: Supermicro
           CPU 8 x 2.53 GHz
           Disks 1 drive, 233.8 GB total
           Interfaces 1 x 0.1 Gbps, 1 x 1.0 Gbps
           Memory 6 x 4.0 GB, 24.0 GB total
           System Supermicro X8DA3
           NUMA topology 2 NUMA nodes
Expected results:
Operational OpenStack
Actual result:
    Error Message: Deployment has failed. All nodes are finished. Failed tasks:
                     Task[sync_time/1],Task[sync_time/3], Task[sync_time/2]
                     Stopping the deployment process!

Description of the environment:
Network model: see the attached figure
Diagnostic Snapshot: see attachment

Tags:

Samer Machara (samer-machara) wrote on 2016-09-12:

Network architecture (751.7 KiB, image/svg+xml)

Maksim Malchuk (mmalchuk) wrote on 2016-09-12:

Please provide the diagnostic snapshot.
I believe the issue is that the Fuel master node doesn't have an internet connection.

Changed in fuel:
status:	New → Incomplete
importance:	Undecided → High
assignee:	nobody → Fuel Sustaining (fuel-sustaining-team)
milestone:	none → 9.1
tags:	added: area-library

Aleksei Stepanov (penguinolog) wrote on 2016-09-12:

Please provide:
Fuel-devops version
Fuel-qa version

Samer Machara (samer-machara) wrote on 2016-09-13:

fuel --fuel-version
api: '1'
auth_required: true
feature_groups: []
openstack_version: mitaka-9.0
release: '9.0'

Samer Machara (samer-machara) wrote on 2016-09-13:

The diagnostic snapshot is 398.3 MB. Each time I try to upload it, I get this message:

Timeout error
Sorry, something just went wrong in Launchpad.
We’ve recorded what happened, and we’ll fix it as soon as possible. Apologies for the inconvenience.
Trying again in a couple of minutes might work.
(Error ID: OOPS-319800cff376752e9644d64f7a91b05d)

Is there another way to send you the snapshot?

Thanks

Samer Machara (samer-machara) wrote on 2016-09-13:

I don't think it is an internet problem:

[root@fuel ~]# ping www.google.com
PING www.google.com (172.217.19.132) 56(84) bytes of data.
64 bytes from par03s12-in-f132.1e100.net (172.217.19.132): icmp_seq=1 ttl=52 time=2.52 ms
64 bytes from par03s12-in-f4.1e100.net (172.217.19.132): icmp_seq=2 ttl=52 time=2.55 ms
64 bytes from par03s12-in-f4.1e100.net (172.217.19.132): icmp_seq=3 ttl=52 time=2.53 ms
64 bytes from par03s12-in-f4.1e100.net (172.217.19.132): icmp_seq=4 ttl=52 time=2.60 ms
^C
--- www.google.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 2.528/2.553/2.603/0.046 ms

And I can access the Fuel Dashboard from home.

Maksim Malchuk (mmalchuk) wrote on 2016-09-13:

Please run ping from node-1 and show the routing table on that node.
In any case, we need a diagnostic snapshot to solve the issue.
Please upload it to Google Drive or Dropbox and provide the link.

Dmitry Pyzhov (dpyzhov) on 2016-09-13

Changed in fuel:
milestone:	9.1 → 9.2

Samer Machara (samer-machara) wrote on 2016-09-19:

Here is the Diagnostic Snapshot:

https://drive.google.com/file/d/0B-pmfloa6c8FcXdja28xOWNlck0/view?usp=sharing

Samer Machara (samer-machara) wrote on 2016-09-19:

* Documentation: https://help.ubuntu.com/
Last login: Wed Sep 14 13:19:57 2016 from 10.30.0.2

root@node-7:~# ping google.com
PING google.com (216.58.211.78) 56(84) bytes of data.
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=1 ttl=51 time=2.85 ms
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=2 ttl=51 time=2.83 ms
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=3 ttl=51 time=2.80 ms
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=4 ttl=51 time=2.81 ms
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=5 ttl=51 time=2.76 ms

64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=6 ttl=51 time=2.79 ms
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=7 ttl=51 time=2.81 ms
^C
--- google.com ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6011ms
rtt min/avg/max/mdev = 2.766/2.811/2.850/0.037 ms
root@node-7:~#

Maksim Malchuk (mmalchuk) wrote on 2016-09-19:

#10

Could you please also test with ntpdate instead of ping?

Samer Machara (samer-machara) wrote on 2016-09-20:

#11

root@node-7:~# ntpdate 0.fuel.pool.ntp.org
20 Sep 08:17:12 ntpdate[17367]: the NTP socket is in use, exiting
root@node-7:~# ntpdate 1.fuel.pool.ntp.org
20 Sep 08:18:25 ntpdate[17633]: the NTP socket is in use, exiting
root@node-7:~# ntpdate -u 0.fuel.pool.ntp.org
20 Sep 07:19:19 ntpdate[18948]: step time server 195.154.71.176 offset -3815.359765 sec
root@node-7:~# ntpdate -u 1.fuel.pool.ntp.org
20 Sep 07:19:55 ntpdate[19089]: adjust time server 37.59.119.229 offset -0.001740 sec
root@node-7:~# ntpdate -u 2.fuel.pool.ntp.org
20 Sep 07:20:08 ntpdate[19215]: adjust time server 62.4.12.66 offset 0.000963 sec
root@node-7:~#

Samer Machara (samer-machara) wrote on 2016-09-20:

#12

I'm still having the same problem.

After running the previous commands, I started the deployment again and got the same result.

Error
Deployment has failed. All nodes are finished. Failed tasks: Task[sync_time/7] Stopping the deployment process!

I must be missing something, because I'm trying to deploy a similar architecture on different servers and I have the same problem.

Maksim Malchuk (mmalchuk) wrote on 2016-09-20:

#13

Could you please run ntpdate with exactly the same parameters that are used during deployment:

ntpdate -u -v $(awk '/^server/ { if ($2 !~ /127\.127\.[0-9]+\.[0-9]+/) {ORS=" "; print $2}}' /etc/ntp.conf)
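For readers unfamiliar with the awk expression in the command above: it appears to collect the second field of every `server` line in /etc/ntp.conf, skipping 127.127.x.x entries (the pseudo-addresses ntpd uses for local reference clocks), and hands the resulting list to ntpdate. A minimal sketch that isolates just that extraction step follows. The function name `ntp_servers` and the optional path argument are hypothetical, added here for illustration; they are not part of Fuel.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fuel): list the non-local NTP servers
# named in an ntp.conf-style file, one per line.
# Usage: ntp_servers [path]   (path defaults to /etc/ntp.conf)
ntp_servers() {
    _ntp_conf="${1:-/etc/ntp.conf}"
    # Keep lines that start with "server", drop 127.127.x.x reference-clock
    # pseudo-addresses, and print the second field (host name or IP).
    awk '/^server/ && $2 !~ /^127\.127\.[0-9]+\.[0-9]+$/ { print $2 }' "$_ntp_conf"
}

# Example, commented out so that sourcing this file has no side effects:
# ntpdate -u -v $(ntp_servers)
```

Running `ntp_servers` on its own first shows which servers the node would actually query, which may help tell a configuration problem apart from a connectivity problem.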

Marouen (mechtri-marwen) wrote on 2016-09-30:

#14

The ntpd instances running on the slave nodes are not able to synchronize their clocks with the NTP server running on the Fuel master.

The solution I found is to change the server name in the ntp.conf file on the slave nodes.

Instead of:
server 10.20.0.2 burst iburst

I changed it to:
server 0.fuel.pool.ntp.org burst iburst

I recommend running the node provisioning first, then making this modification, and finally running the node deployment.
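The edit described in this comment could be scripted roughly as follows. This is only a sketch of the commenter's workaround, not an official Fuel procedure: the helper name `replace_ntp_server` is hypothetical, and the sketch assumes the file uses plain `server <address> ...` lines as quoted above. ntpd on the node would then need to be restarted to pick up the change.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fuel): point every "server <old>" line
# in an ntp.conf-style file at <new>, leaving all other lines untouched.
# Usage: replace_ntp_server <conf> <old-server> <new-server>
replace_ntp_server() {
    _conf="$1"; _old="$2"; _new="$3"
    _out=$(mktemp) || return 1
    # Rewrite only lines whose first field is "server" and whose second
    # field equals the old address; options such as "burst iburst" are kept.
    awk -v old="$_old" -v new="$_new" \
        '$1 == "server" && $2 == old { $2 = new } { print }' \
        "$_conf" > "$_out" || { rm -f "$_out"; return 1; }
    # Copy the result back in place so the original file keeps its
    # ownership and permissions.
    cat "$_out" > "$_conf" && rm -f "$_out"
}

# Example, commented out so that sourcing this file changes nothing:
# replace_ntp_server /etc/ntp.conf 10.20.0.2 0.fuel.pool.ntp.org
```

Note that this makes the node bypass the Fuel master as its time source, so it works around the symptom rather than explaining why the master's NTP server was unreachable.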

Stanislaw Bogatkin (sbogatkin) wrote on 2016-10-18:

#15

I haven't seen this problem for a very long time. The last time I saw it, the cause was that the env was run on VBox without the guest additions, which set up proper clock timings. Marouen, could you tell us a bit more about your environment? Is it bare metal, VirtualBox, KVM, or something else?

Alexander Kurenyshev (akurenyshev) wrote on 2016-11-07:

#16

Moved to Invalid due to very low reproducibility. Feel free to reopen if it appears again.

Changed in fuel:
status:	Incomplete → Invalid

Udayendu Kar (udayendu-kar) wrote on 2016-11-19:

#17

Currently I am facing the same issue in our setup. Let me describe the current state:

    - My setup was running well with 2 controllers, 2 computes, 3 ceph nodes & 1 baremetal server.
    - Then, to scale it, I added 1 more ceph node and a controller. The deployment went well.
    - But from this point onwards, I was unable to open the console of the instances reliably and got the following error:
    "Failed to connect to server (code: 1006)"

If I manually reloaded the noVNC console of the instance, it worked 6-7 times out of 10.
That indicated some issue in the HA, and in order to isolate it, I removed the most recently added controller node and started applying the changes.

Now I am stuck with the error message below:

"Deployment has failed. All nodes are finished. Failed tasks: Task[sync_time/15], Task[sync_time/14], Task[sync_time/3], Task[sync_time/2], Task[sync_time/7], Task[sync_time/6] Stopping the deployment process!"

Any suggestions?

Changed in fuel:
status:	Invalid → New

Oleksiy Molchanov (omolchanov) wrote on 2016-11-21:

#18

@Udayendu,

Please run the following command on one of the slave nodes and provide the output:

ntpdate -u -v $(awk '/^server/ { if ($2 !~ /127\.127\.[0-9]+\.[0-9]+/) {ORS=" "; print $2}}' /etc/ntp.conf)

Changed in fuel:
status:	New → Incomplete

Stanislaw Bogatkin (sbogatkin) wrote on 2016-12-06:

#19

It is not reproducible; moved to Invalid.

Changed in fuel:
status:	Incomplete → Invalid

Jeffrey Gong (jeffreygong) wrote on 2017-03-15:

#20

ntpdate -u -v $(awk '/^server/ { if ($2 !~ /127\.127\.[0-9]+\.[0-9]+/) {ORS=" "; print $2}}' /etc/ntp.conf)

ntpdate -u -v 10.20.0.2 # the default Fuel server
# fails...

Restarting ntpd on the Fuel server fixes the issue.

# on fuel server
service ntpd restart

# on node
root@node-5:~# ntpdate -u -v $(awk '/^server/ { if ($2 !~ /127\.127\.[0-9]+\.[0-9]+/) {ORS=" "; print $2}}' /etc/ntp.conf)
15 Mar 20:16:45 ntpdate[30313]: ntpdate 4.2.8p4@1.3265-o Wed Oct 5 12:34:47 UTC 2016 (1)
15 Mar 20:16:52 ntpdate[30313]: adjust time server 10.20.0.2 offset 0.074014 sec

Restarting the install now works.

Venkateshwarlu Vangala (vvenkat) wrote on 2017-06-29:

#21

I got the same issue on OPNFV release Danube 2.0 with Fuel 10.0 on ARM servers.
When I checked the ntpd service on the Fuel server, it was not running. Starting the ntpd service resolved the issue.

[root@fuel ~]# ps -ax | grep ntp
18583 pts/1 S+ 0:00 grep --color=auto ntp
[root@fuel ~]# service ntpd restart
Redirecting to /bin/systemctl restart ntpd.service
[root@fuel ~]# ps -ax | grep ntp
18608 ? Ss 0:00 /usr/sbin/ntpd -u ntp:ntp -g
18612 pts/1 S+ 0:00 grep --color=auto ntp

So the question is: why was the ntpd service not running, or did it get killed after starting?

-Venkat
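As a side note on the check used in this comment: `ps -ax | grep ntp` also lists the grep process itself, which is why a line appears even when ntpd is down. A minimal sketch of a stricter check is shown below. It assumes `pgrep` (from procps) is available on the Fuel master, and the helper name `is_running` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only if a process whose
# name is exactly "$1" is currently running.
is_running() {
    # -x requires an exact match on the process name, so neither the
    # checking command itself nor similarly named processes are counted.
    pgrep -x "$1" > /dev/null 2>&1
}

# Example, commented out so that sourcing this file has no side effects.
# On the Fuel master, restart ntpd only when it is not running:
# is_running ntpd || service ntpd restart
```

This only detects and restarts a stopped daemon; it does not answer why ntpd stopped in the first place, which would need the ntpd logs on the master.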
