Fuel 9: Deployment has failed. All nodes are finished. Failed tasks: Task[sync_time/1], Task[sync_time/3], Task[sync_time/2] Stopping the deployment process!

Bug #1622518 reported by Samer Machara
This bug affects 3 people
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Fuel Sustaining

Bug Description

Detailed bug description:
  I'm testing Fuel 9. After configuring my environment, I deployed it and got the following error:

      Deployment has failed. All nodes are finished. Failed tasks: Task[sync_time/1],
      Task[sync_time/3], Task[sync_time/2] Stopping the deployment process!

Steps to reproduce:
    1. Add 3 nodes: 1 Compute, 1 Storage and 1 Controller.
       Each node has the following characteristics:
         Manufacturer: Supermicro
           CPU: 8 x 2.53 GHz
           Disks: 1 drive, 233.8 GB total
           Interfaces: 1 x 0.1 Gbps, 1 x 1.0 Gbps
           Memory: 6 x 4.0 GB, 24.0 GB total
           System: Supermicro X8DA3
           NUMA topology: 2 NUMA nodes
    2. Deploy the environment.
Expected results:
 An operational OpenStack environment.
Actual result:
    Error Message: Deployment has failed. All nodes are finished. Failed tasks:
                     Task[sync_time/1],Task[sync_time/3], Task[sync_time/2]
                     Stopping the deployment process!

Description of the environment:
 Network model: see the attached figure.
 Diagnostic Snapshot: see attachment

Tags: area-library
Samer Machara (samer-machara) wrote :
Maksim Malchuk (mmalchuk) wrote :

Please provide the diagnostic snapshot.
I believe the issue occurs because the Fuel master node doesn't have an Internet connection.

Changed in fuel:
status: New → Incomplete
importance: Undecided → High
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
milestone: none → 9.1
tags: added: area-library
Aleksei Stepanov (penguinolog) wrote :

Please provide:
 Fuel-devops version
 Fuel-qa version

Samer Machara (samer-machara) wrote :

fuel --fuel-version
api: '1'
auth_required: true
feature_groups: []
openstack_version: mitaka-9.0
release: '9.0'

Samer Machara (samer-machara) wrote :

The diagnostic snapshot is 398.3 MB. Each time I try to upload it, I get this message:

Timeout error
Sorry, something just went wrong in Launchpad.
We’ve recorded what happened, and we’ll fix it as soon as possible. Apologies for the inconvenience.
Trying again in a couple of minutes might work.
(Error ID: OOPS-319800cff376752e9644d64f7a91b05d)

Is there another way to send you the snapshot?

Thanks

Samer Machara (samer-machara) wrote :

I don't think it is an Internet problem:

[root@fuel ~]# ping www.google.com
PING www.google.com (172.217.19.132) 56(84) bytes of data.
64 bytes from par03s12-in-f132.1e100.net (172.217.19.132): icmp_seq=1 ttl=52 time=2.52 ms
64 bytes from par03s12-in-f4.1e100.net (172.217.19.132): icmp_seq=2 ttl=52 time=2.55 ms
64 bytes from par03s12-in-f4.1e100.net (172.217.19.132): icmp_seq=3 ttl=52 time=2.53 ms
64 bytes from par03s12-in-f4.1e100.net (172.217.19.132): icmp_seq=4 ttl=52 time=2.60 ms
^C
--- www.google.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 2.528/2.553/2.603/0.046 ms

Also, I can access the Fuel dashboard from home.

Maksim Malchuk (mmalchuk) wrote :

Please run ping from node-1 and show the routing table on that node.
Anyway, we need a diagnostic snapshot to solve the issue.
Please upload it to Google Drive or Dropbox and provide the link.

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 9.1 → 9.2
Samer Machara (samer-machara) wrote :
Samer Machara (samer-machara) wrote :

[root@fuel ~]# fuel node list
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---+--------+------------------+---------+-----------+-------------------+------------+---------------+--------+---------
 7 | error | Untitled (50:f4) | 3 | 10.30.0.3 | 00:30:48:b9:50:f4 | controller | | 1 | 3
[root@fuel ~]# ssh 10.30.0.3
Warning: Permanently added '10.30.0.3' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-95-generic x86_64)

 * Documentation: https://help.ubuntu.com/
Last login: Wed Sep 14 13:19:57 2016 from 10.30.0.2

root@node-7:~# ping google.com
PING google.com (216.58.211.78) 56(84) bytes of data.
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=1 ttl=51 time=2.85 ms
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=2 ttl=51 time=2.83 ms
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=3 ttl=51 time=2.80 ms
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=4 ttl=51 time=2.81 ms
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=5 ttl=51 time=2.76 ms
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=6 ttl=51 time=2.79 ms
64 bytes from par03s14-in-f14.1e100.net (216.58.211.78): icmp_seq=7 ttl=51 time=2.81 ms
^C
--- google.com ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6011ms
rtt min/avg/max/mdev = 2.766/2.811/2.850/0.037 ms
root@node-7:~#

Maksim Malchuk (mmalchuk) wrote :

Could you please also test with ntpdate instead of ping?

Samer Machara (samer-machara) wrote :

root@node-7:~# ntpdate 0.fuel.pool.ntp.org
20 Sep 08:17:12 ntpdate[17367]: the NTP socket is in use, exiting
root@node-7:~# ntpdate 1.fuel.pool.ntp.org
20 Sep 08:18:25 ntpdate[17633]: the NTP socket is in use, exiting
root@node-7:~# ntpdate -u 0.fuel.pool.ntp.org
20 Sep 07:19:19 ntpdate[18948]: step time server 195.154.71.176 offset -3815.359765 sec
root@node-7:~# ntpdate -u 1.fuel.pool.ntp.org
20 Sep 07:19:55 ntpdate[19089]: adjust time server 37.59.119.229 offset -0.001740 sec
root@node-7:~# ntpdate -u 2.fuel.pool.ntp.org
20 Sep 07:20:08 ntpdate[19215]: adjust time server 62.4.12.66 offset 0.000963 sec
root@node-7:~#
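How to read the `ntpdate` results above: "the NTP socket is in use" means a local ntpd already holds UDP port 123, which is why the retries add `-u` (send from an unprivileged port). A result line says either `step` (the clock was set in one jump) or `adjust` (it was slewed gradually). The first `-u` run stepped node-7's clock by about -3815 s, i.e. the node was more than an hour ahead, which matches the timestamps jumping back from 08:18 to 07:19. A minimal sketch for pulling those fields out of one output line; the helper name `ntpdate_summary` is hypothetical and it only parses text, it does not run ntpdate.

```shell
#!/bin/sh
# Summarise one line of ntpdate output, e.g.
#   20 Sep 07:19:19 ntpdate[18948]: step time server 195.154.71.176 offset -3815.359765 sec
# Prints "<step|adjust> <server> <offset>" and returns 0,
# or returns 1 if the line is not a time-result line.
ntpdate_summary() {
    printf '%s\n' "$1" | awk '
        # Result lines contain "<step|adjust> time server <addr> offset <sec> sec"
        /(step|adjust) time server .* offset / {
            for (i = 1; i <= NF; i++) {
                if ($i == "step" || $i == "adjust") action = $i
                if ($i == "server") server = $(i + 1)
                if ($i == "offset") offset = $(i + 1)
            }
            print action, server, offset
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, `ntpdate_summary "$(ntpdate -u 0.fuel.pool.ntp.org 2>&1)"` would print the action, server and offset of a live run, or return non-zero if ntpdate reported an error instead.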

Samer Machara (samer-machara) wrote :

I still have the same problem.

After running the previous commands, I ran the deployment again and got the same result.

Error
Deployment has failed. All nodes are finished. Failed tasks: Task[sync_time/7] Stopping the deployment process!

Surely I'm missing something, because I'm trying to deploy a similar architecture on different servers and I have the same problem.

Maksim Malchuk (mmalchuk) wrote :

Could you please run ntpdate with exactly the same parameters used during deployment:

ntpdate -u -v $(awk '/^server/ { if ($2 !~ /127\.127\.[0-9]+\.[0-9]+/) {ORS=" "; print $2}}' /etc/ntp.conf)
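The awk expression in that command takes the second field of every `server` line in /etc/ntp.conf and skips 127.127.t.u addresses, which ntpd reserves for reference clock drivers (127.127.1.0 is the undisciplined local clock), so only real upstream servers, normally the Fuel master, are queried. The same check split into reusable functions is sketched below; the names `ntp_servers` and `check_ntp_sources` are hypothetical, and this version passes `-q` (query only, do not set the clock) so it can be run safely before a deployment.

```shell
#!/bin/sh
# Print the upstream NTP servers configured in an ntp.conf (one per line),
# using the same filter as the deployment check: lines starting with
# "server", excluding 127.127.x.y reference-clock pseudo-addresses.
ntp_servers() {
    conf=${1:-/etc/ntp.conf}
    awk '/^server/ && $2 !~ /127\.127\.[0-9]+\.[0-9]+/ { print $2 }' "$conf"
}

# Query (without setting the clock) every configured server from an
# unprivileged port, so it works even while the local ntpd holds port 123.
check_ntp_sources() {
    servers=$(ntp_servers "$@") || return 1
    if [ -z "$servers" ]; then
        echo "no upstream servers found in ${1:-/etc/ntp.conf}" >&2
        return 1
    fi
    # Word-splitting the server list into separate arguments is intended.
    # shellcheck disable=SC2086
    ntpdate -q -u $servers
}
```

If `check_ntp_sources` reports no usable reply from the Fuel master's address while public pool servers answer, the problem is on the master's side rather than in the node's connectivity.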

Marouen (mechtri-marwen) wrote :

The ntpd instances running on the slave nodes are not able to synchronize their clocks with the NTP server running on the Fuel master.

The solution I found is to change the server name in the ntp.conf file on the slave nodes.

Instead of:
server 10.20.0.2 burst iburst

I used:
server 0.fuel.pool.ntp.org burst iburst

I recommend provisioning the nodes first, then making this modification, and finally deploying the nodes.
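Marouen's workaround can be scripted roughly as follows. This is a sketch under stated assumptions, not a Fuel-supported procedure: it assumes GNU sed (as on the Ubuntu 14.04 slave nodes), that the nodes can reach the public pool directly, and the function name `repoint_ntp_server` is hypothetical. The default old and new servers are the ones from the comment above, and a copy of the original file is kept as ntp.conf.bak.

```shell
#!/bin/sh
# Rewrite "server <old> ..." lines in an ntp.conf to "server <new> ...",
# keeping any options such as "burst iburst". Arguments (all optional):
#   $1 config file, $2 old server, $3 new server
repoint_ntp_server() {
    conf=${1:-/etc/ntp.conf}
    old=${2:-10.20.0.2}
    new=${3:-0.fuel.pool.ntp.org}
    # Escape dots so the old address is matched literally, not as "any char".
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    # Match the old server followed by whitespace or end of line,
    # so e.g. 10.20.0.25 is not rewritten by mistake.
    sed -E -i.bak "s/^server[[:space:]]+${old_re}([[:space:]]|\$)/server ${new}\1/" "$conf"
}
```

On the Ubuntu nodes, ntpd would then need restarting (for example `service ntp restart`; the service is named `ntp` there, not `ntpd`). Note that this bypasses the Fuel master entirely; later comments in this report found that restarting ntpd on the master fixed the same failure.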

Stanislaw Bogatkin (sbogatkin) wrote :

I haven't seen this problem for a very long time. The last time I saw it, it was because the environment was running on VirtualBox without the Guest Additions, which set up proper clock timing. Marouen, could you tell us a bit more about your environment? Is it bare metal, VirtualBox, KVM or something else?

Alexander Kurenyshev (akurenyshev) wrote :

Moved to Invalid due to very low reproducibility. Feel free to reopen if it appears again.

Changed in fuel:
status: Incomplete → Invalid
Udayendu Kar (udayendu-kar) wrote :

I am currently facing the same issue in our setup. Here is the current state:

    - My setup was running well with 2 controllers, 2 computes, 3 Ceph nodes and 1 bare-metal server.
    - Then, to scale it, I added 1 more Ceph node and a controller. The deployment went well.
    - But from that point onwards, I was unable to open the instances' consoles reliably and got the following error:
    "Failed to connect to server (code: 1006)"

If I manually reload the noVNC console of an instance, it works 6-7 times out of 10.
That suggests an HA issue, so to isolate it I removed the most recently added controller node and started applying the changes.

Now I am stuck with the following error message:

  "Deployment has failed. All nodes are finished. Failed tasks: Task[sync_time/15], Task[sync_time/14], Task[sync_time/3], Task[sync_time/2], Task[sync_time/7], Task[sync_time/6] Stopping the deployment process!"

Any suggestions?

Changed in fuel:
status: Invalid → New
Oleksiy Molchanov (omolchanov) wrote :

@Udayendu,

Please run the following command on one of the slave nodes and provide the output:

ntpdate -u -v $(awk '/^server/ { if ($2 !~ /127\.127\.[0-9]+\.[0-9]+/) {ORS=" "; print $2}}' /etc/ntp.conf)

Changed in fuel:
status: New → Incomplete
Stanislaw Bogatkin (sbogatkin) wrote :

It is not reproducible; moved to Invalid.

Changed in fuel:
status: Incomplete → Invalid
Jeffrey Gong (jeffreygong) wrote :

ntpdate -u -v $(awk '/^server/ { if ($2 !~ /127\.127\.[0-9]+\.[0-9]+/) {ORS=" "; print $2}}' /etc/ntp.conf)

or

ntpdate -u -v 10.20.0.2 # query the default Fuel server directly
# both fail...

Restarting ntpd on the Fuel server fixes the issue.

# on the Fuel server
service ntpd restart

# on the node
root@node-5:~# ntpdate -u -v $(awk '/^server/ { if ($2 !~ /127\.127\.[0-9]+\.[0-9]+/) {ORS=" "; print $2}}' /etc/ntp.conf)
15 Mar 20:16:45 ntpdate[30313]: ntpdate 4.2.8p4@1.3265-o Wed Oct 5 12:34:47 UTC 2016 (1)
15 Mar 20:16:52 ntpdate[30313]: adjust time server 10.20.0.2 offset 0.074014 sec

Re-running the deployment now works.

Venkateshwarlu Vangala (vvenkat) wrote :

I hit the same issue on OPNFV Danube 2.0 with Fuel 10.0 on ARM servers.
When I checked the ntpd service on the Fuel server, it was not running. Starting the ntpd service resolved the issue.

[root@fuel ~]# ps -ax | grep ntp
 18583 pts/1 S+ 0:00 grep --color=auto ntp
[root@fuel ~]# service ntpd restart
Redirecting to /bin/systemctl restart ntpd.service
[root@fuel ~]# ps -ax | grep ntp
 18608 ? Ss 0:00 /usr/sbin/ntpd -u ntp:ntp -g
 18612 pts/1 S+ 0:00 grep --color=auto ntp

So the question is why the ntpd service was not running, or whether it got killed after starting.

-Venkat
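The `ps -ax | grep ntp` check above always lists at least the grep process itself, which makes it easy to misread. A slightly more robust pre-deployment check for the Fuel master is sketched below. It assumes `pgrep` (from procps) and systemd are available, as the "Redirecting to /bin/systemctl" line suggests; the function names `is_running` and `ensure_ntpd` are hypothetical.

```shell
#!/bin/sh
# Return 0 if a process with exactly this name is running.
# pgrep -x matches the whole process name, so it never matches itself
# the way "ps -ax | grep ntp" matches the grep command.
is_running() {
    pgrep -x "${1:-ntpd}" >/dev/null
}

# Start ntpd on the Fuel master if it is not running, and enable it at
# boot so it survives a reboot of the master node.
ensure_ntpd() {
    if is_running ntpd; then
        echo "ntpd is running"
    else
        echo "ntpd is not running; starting it" >&2
        systemctl start ntpd && systemctl enable ntpd
    fi
}
```

If ntpd keeps stopping after being started, `journalctl -u ntpd` on the master may show why it exited, which would help answer the question above.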
