deployment fails with ntp_check.pp but should have been caught by network verification

Bug #1477884 reported by Anar
This bug affects 3 people
Affects               Status      Importance  Assigned to                Milestone
Fuel for OpenStack    Triaged     Medium      Fuel Sustaining
  Mitaka              Won't Fix   Medium      Fuel Python (Deprecated)
  Newton              Triaged     Medium      Fuel Sustaining

Bug Description

Problem:
We do not validate the upstream NTP settings until after we have applied them to the environment at the end of a deployment. If the upstream NTP servers are unavailable after network configuration, the controllers will not have a proper NTP configuration, which may lead to clustering issues. We should validate the upstream NTP servers as part of the network verification phase, much like we do with the configured repositories.

How to test:
If the deployed environment does not have internet connectivity, leave the default public NTP servers configured and run a deployment.

Expected result:
Network verification should validate NTP server accessibility as part of the network checks. It should raise a warning so that the end user can replace the NTP servers with locally accessible ones.
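A minimal sketch of the requested behaviour, assuming a pre-deployment step that probes each upstream NTP server and reports warnings instead of failing the deployment. `verify_ntp_servers` and the `NtpProbe` callable are hypothetical names, not part of Fuel's network checker. A real probe would send an NTP request over UDP port 123; it is passed in as a parameter here so the reporting logic can be exercised offline.

```python
from typing import Callable, Iterable, List

# Hypothetical probe signature: (server, timeout_seconds) -> True if the server
# answered with a usable NTP response within the timeout.
NtpProbe = Callable[[str, float], bool]


def verify_ntp_servers(servers: Iterable[str], probe: NtpProbe,
                       timeout: float = 5.0) -> List[str]:
    """Probe each configured upstream NTP server and collect warnings.

    Returns an empty list when every server answered; otherwise one warning
    per failing server, plus a summary line if none of them answered.
    """
    servers = list(servers)
    warnings: List[str] = []
    reachable = 0
    for server in servers:
        try:
            ok = probe(server, timeout)
        except OSError as exc:  # DNS failure, network unreachable, ...
            warnings.append(f"NTP server {server}: probe error ({exc})")
            continue
        if ok:
            reachable += 1
        else:
            warnings.append(f"NTP server {server}: no valid response within {timeout}s")
    if servers and reachable == 0:
        warnings.append("No configured NTP server is reachable; "
                        "consider using locally accessible NTP servers")
    return warnings
```

Injecting the probe keeps the warning logic independent of where the network test actually runs (from the master or from each node).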

Actual results:
In the very last phase of the deployment, it fails with the following error:

Deployment has failed. Method granular_deploy. Failed to execute hook 'puppet' Puppet run failed. Check puppet logs for details
---
priority: 200
fail_on_error: true
type: puppet
uids:
- '1'
parameters:
  puppet_modules: "/etc/puppet/modules"
  puppet_manifest: "/etc/puppet/modules/osnailyfacter/modular/ntp/ntp-check.pp"
  timeout: 600
  cwd: "/"
.
Inspect Astute logs for the details

The Astute logs show the following:

http://paste.openstack.org/show/405082/

The NTP servers are pingable from both the Fuel master and the slave nodes.

The environment runs on VirtualBox.

Version information:
[root@fuel ~]# fuel --fuel-version
DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
api: '1.0'
astute_sha: 1ea8017fe8889413706d543a5b9f557f5414beae
auth_required: true
build_id: 2015-06-19_13-02-31
build_number: '525'
feature_groups:
- mirantis
fuel-library_sha: 2e7a08ad9792c700ebf08ce87f4867df36aa9fab
fuel-ostf_sha: 8fefcf7c4649370f00847cc309c24f0b62de718d
fuelmain_sha: a3998372183468f56019c8ce21aa8bb81fee0c2f
nailgun_sha: dbd54158812033dd8cfd7e60c3f6650f18013a37
openstack_version: 2014.2.2-6.1
production: docker
python-fuelclient_sha: 4fc55db0265bbf39c369df398b9dc7d6469ba13b
release: '6.1'
release_versions:
  2014.2.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: 1ea8017fe8889413706d543a5b9f557f5414beae
      build_id: 2015-06-19_13-02-31
      build_number: '525'
      feature_groups:
      - mirantis
      fuel-library_sha: 2e7a08ad9792c700ebf08ce87f4867df36aa9fab
      fuel-ostf_sha: 8fefcf7c4649370f00847cc309c24f0b62de718d
      fuelmain_sha: a3998372183468f56019c8ce21aa8bb81fee0c2f
      nailgun_sha: dbd54158812033dd8cfd7e60c3f6650f18013a37
      openstack_version: 2014.2.2-6.1
      production: docker
      python-fuelclient_sha: 4fc55db0265bbf39c369df398b9dc7d6469ba13b
      release: '6.1'

Anar (anar-babayev) wrote :
Vladimir Kozhukalov (kozhukalov) wrote :

A working ping to the NTP servers may not be enough. Please make sure that ntpdate X.pool.ntp.org works on your servers.
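ping only proves ICMP reachability; ntpdate sends an actual NTP request over UDP port 123, which a gateway or firewall may drop even when ping works. A minimal SNTP query in Python, assuming IPv4 and a single request with a timeout (`build_request` and `query` are illustrative names, not ntpdate internals):

```python
import socket
import struct

NTP_PORT = 123
# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX = 2208988800


def build_request() -> bytes:
    """Return a 48-byte SNTP client request: LI=0, VN=4, Mode=3 (client)."""
    first_byte = (0 << 6) | (4 << 3) | 3  # 0x23
    return bytes([first_byte]) + bytes(47)


def query(server: str, timeout: float = 5.0) -> float:
    """Send one NTP request over UDP/123 and return the server's transmit time
    as Unix seconds.

    Raises socket.timeout when no reply arrives (e.g. UDP/123 dropped by a
    gateway even though ping works) and socket.gaierror when the name
    cannot be resolved.
    """
    family, socktype, proto, _, addr = socket.getaddrinfo(
        server, NTP_PORT, socket.AF_INET, socket.SOCK_DGRAM)[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        # No explicit bind(): the OS picks an unprivileged ephemeral source port.
        sock.sendto(build_request(), addr)
        data, _ = sock.recvfrom(512)
    if len(data) < 48:
        raise ValueError("short NTP reply")
    secs, frac = struct.unpack("!II", data[40:48])  # transmit timestamp
    return secs - NTP_TO_UNIX + frac / 2**32
```

For example, query("0.pool.ntp.org") either returns a timestamp or raises an OSError subclass, which is exactly the distinction a plain ping cannot make.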

Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Matthew Mosesohn (raytrac3r) wrote :

Anar, you probably have issues with your VirtualBox configuration and the default gateway for your public network. Make sure your Fuel environment settings match your vboxnet{0,1,2} network settings. The nodes either can't resolve external domain names or can't reach those hosts because of a problem with their default gateway.

Changed in fuel:
milestone: none → 7.0
status: New → Confirmed
importance: Undecided → High
Anar (anar-babayev) wrote :

Hello,

I ran ntpdate on both the Fuel master and the controller node. In both cases I get the following:

[root@fuel ~]# ntpdate 0.pool.ntp.org
22 Jul 21:27:48 ntpdate[597]: the NTP socket is in use, exiting

root@node-1:~# ntpdate 0.pool.ntp.org
22 Jul 21:28:51 ntpdate[29978]: the NTP socket is in use, exiting

Regards,
Anar

Alex Schultz (alex-schultz) wrote :

Hey Anar,

You can use ntpdate -u 0.pool.ntp.org so that you don't have to stop NTP. This check makes sure that the controllers will be able to communicate with the NTP servers specified in the settings. If the controllers do not have internet access, you'll need to use internal NTP servers.
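The "the NTP socket is in use" error above appears because ntpdate, by default, sends from the privileged NTP port 123, which the running ntpd already holds; -u makes ntpdate use an unprivileged source port instead. The sketch below reproduces the same conflict on an arbitrary local UDP port (binding port 123 itself would need root), assuming default bind semantics without SO_REUSEADDR/SO_REUSEPORT. `try_bind` is a hypothetical helper.

```python
import socket
from typing import Optional, Tuple


def try_bind(port: int, host: str = "127.0.0.1") -> Tuple[Optional[socket.socket], str]:
    """Try to bind a UDP socket; return (socket or None, status message)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        return None, f"port {port} in use ({exc.strerror})"
    return sock, f"bound to port {sock.getsockname()[1]}"


# Stand-in for ntpd: hold one UDP port (the kernel picks a free one here).
daemon, _ = try_bind(0)
held_port = daemon.getsockname()[1]

# Like plain ntpdate: insist on the held port -> bind fails.
fixed, fixed_msg = try_bind(held_port)

# Like ntpdate -u: let the kernel choose an ephemeral port -> bind succeeds.
ephemeral, ephemeral_msg = try_bind(0)

print(fixed_msg)
print(ephemeral_msg)
for s in (daemon, fixed, ephemeral):
    if s is not None:
        s.close()
```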

Changed in fuel:
status: Confirmed → Incomplete
Anar (anar-babayev) wrote :

Hello,

Here's what I get:

root@node-1:~# ntpdate -u 0.pool.ntp.org
22 Jul 22:11:06 ntpdate[13912]: no server suitable for synchronization found

The other issue is that my /etc/ntp.conf lists only the Fuel master as a server. Why do I need external NTP servers on the controllers?

The command ntpdate -u x.pool.ntp.org fails on the Fuel master as well, even though the x.pool.ntp.org servers are listed in its /etc/ntp.conf:

[root@fuel ~]# cat /etc/ntp.conf
# ntp.conf: Managed by puppet.
#
# Keep ntpd from panicking in the event of a large clock skew
# when a VM guest is suspended and resumed.
tinker panic 0 stepout 5

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict -6 ::1

server 0.pool.ntp.org iburst minpoll 3 maxpoll 9
server 1.pool.ntp.org iburst minpoll 3 maxpoll 9
server 2.pool.ntp.org iburst minpoll 3 maxpoll 9

# Undisciplined Local Clock. This is a fake driver intended for backup
# and when no outside source of synchronized time is available.
server 127.127.1.0
fudge 127.127.1.0 stratum 10
restrict 127.127.1.0

# Driftfile.
driftfile /var/lib/ntp/drift

However, the peers appear to be rejected:

[root@fuel ~]# ntpq
ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.          10 l    8   64   377   0.000    0.000   0.000
 hubbard.kohina. .INIT.          16 u    -  512     0   0.000    0.000   0.000
 h160n5-vrr-a31. .INIT.          16 u    -  512     0   0.000    0.000   0.000
 public-timehost .INIT.          16 u    -  512     0   0.000    0.000   0.000
ntpq> assoc

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 25019  963a   yes   yes  none  sys.peer    sys_peer  3
  2 25020  8011   yes    no  none    reject    mobilize  1
  3 25021  8011   yes    no  none    reject    mobilize  1
  4 25022  8011   yes    no  none    reject    mobilize  1
ntpq>
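The peers table is the key evidence here. The three pool servers show refid .INIT., stratum 16 and reach 0, meaning ntpd has not received a reply from them in any of the last eight polls, so only the local clock (LOCAL(0)) is selected. A small parser that flags such peers, assuming the default ntpq -p column layout (`parse_peers`, `Peer` and `unreachable` are hypothetical names):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    tally: str    # '*' = current system peer, ' ' = not selected (among others)
    remote: str
    refid: str
    stratum: int
    reach: int    # 8-bit register of the last 8 polls; ntpq prints it in octal


def parse_peers(output: str) -> List[Peer]:
    """Parse the table printed by `ntpq -p` (or `peers` at the ntpq> prompt).

    Assumes the default layout: a header line, a '=====' rule, then one row
    per association whose first character is the tally code.
    """
    lines = output.splitlines()
    rule = next((i for i, line in enumerate(lines) if line.startswith("=")), None)
    if rule is None:
        raise ValueError("no '=====' separator found in ntpq output")
    peers = []
    for line in lines[rule + 1:]:
        if not line.strip():
            continue
        fields = line[1:].split()  # remote refid st t when poll reach delay offset jitter
        peers.append(Peer(tally=line[0], remote=fields[0], refid=fields[1],
                          stratum=int(fields[2]), reach=int(fields[6], 8)))
    return peers


def unreachable(peers: List[Peer]) -> List[str]:
    """Peers with no reply in the last 8 polls or that report stratum 16."""
    return [p.remote for p in peers if p.reach == 0 or p.stratum >= 16]
```

Applied to the output above, only LOCAL(0) would be considered usable; all three pool.ntp.org peers would be flagged.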

Regards,
Anar

Alex Schultz (alex-schultz) wrote :

Time synchronization is critical for the controller cluster, so you must have valid NTP servers. In addition, all of the other nodes use the controllers as their NTP source, so the controllers' NTP setup is very important for the whole environment. The Fuel master's NTP configuration matters less, but you can change it if you need to. The 'no server suitable for synchronization found' message is the problem: ntp_check verifies that we can actually get a valid NTP response from the provided servers.
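Getting some packet back is not the same as getting a valid NTP response. As an illustration only (the thread does not show what ntp_check actually tests), a client can reject a reply whose mode is not "server", whose leap indicator is 3 (clock unsynchronized), or whose stratum is 0 (kiss-o'-death) or 16 (unsynchronized), following the field meanings in RFC 5905. `validate_reply` is a hypothetical helper and covers only a subset of the checks a full client performs.

```python
import struct
from typing import Optional

MODE_SERVER = 4  # RFC 5905 association mode for a server reply


def validate_reply(packet: bytes) -> Optional[str]:
    """Return None if `packet` looks like a usable NTP server reply, else a reason."""
    if len(packet) < 48:
        return "reply shorter than 48 bytes"
    first, stratum = packet[0], packet[1]
    leap = first >> 6        # top 2 bits: leap indicator
    mode = first & 0x07      # low 3 bits: mode
    if mode != MODE_SERVER:
        return f"unexpected mode {mode}"
    if leap == 3:
        return "server clock unsynchronized (leap indicator = 3)"
    if stratum == 0:
        return "kiss-o'-death packet (stratum 0)"
    if stratum >= 16:
        return f"server unsynchronized (stratum {stratum})"
    transmit_secs = struct.unpack("!I", packet[40:44])[0]
    if transmit_secs == 0:
        return "empty transmit timestamp"
    return None
```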

Stanislaw Bogatkin (sbogatkin) wrote :

Hi Anar,

I suspect that you have (or had) network connectivity problems at deployment time. ntpd can end up broken if it is configured with external servers but cannot reach them at startup; that is why the nodes need connectivity to them.
Could you please post the output of 'ntpdate -u <put_your_fuel_master_ip_here>' run on the controllers, and the contents of /etc/ntp.conf from a controller node?

Anar (anar-babayev) wrote :

Hello,

Here's the ntpdate output from the controller:

root@node-1:~# ntpdate -u 10.20.0.2
22 Jul 22:25:42 ntpdate[32388]: adjust time server 10.20.0.2 offset 0.000182 sec
root@node-1:~#

Here's the ntp.conf file from the controller:

http://paste.openstack.org/show/405887/

I don't really understand why the controller needs to contact external NTP servers when, as you can see from the paste, only the Fuel master is configured as its NTP server.

Second, I think this should be checked during network verification, before deployment. I ran network verification and it passed without errors.

BR,
Anar

Alex Schultz (alex-schultz) wrote :

Anar,

You don't have to set the controllers to use an external NTP server. You can configure the environment to use the Fuel master as its NTP server, but you must configure that on the environment's settings page. By default, the environment's NTP servers are set to internet-based servers. You are free to set them to whatever you'd like, but you'll need to configure the Fuel master's NTP server so that the controllers can query it. Also, network verification does not check NTP connectivity, only repository connectivity.

Anar (anar-babayev) wrote :

Hello Alex,

So the ntp.conf in the controller's host OS is not relevant? Will the deployment process test the external NTP servers listed in the environment settings regardless? In that case, I believe this check should also be added to network verification, alongside the repository connectivity check, since both kinds of failure lead to a failed deployment. Is there a way to resume the deployment after changing the environment settings, or do I need to redeploy the environment?

Regards,
Anar

Alex Schultz (alex-schultz) wrote :

I'm not sure what you mean by the ntp.conf on the controller host OS not being relevant. After the provided NTP servers have been configured, the deployment process checks that we are able to query them, so that you don't end up with a deployment that later has time-related issues. It's better to fail during deployment than to let you proceed and run into problems later. I believe that to change the NTP servers you may have to reset your environment and redeploy.

I agree that we should probably also perform NTP verification as part of the network verification steps. If you want to create a bug for that, we could address it in the network verification code.

Anar (anar-babayev) wrote :

Hello Alex,

Thanks for the reply. I will change the environment settings and try to redeploy. Considering how long a deployment takes on VirtualBox, I would highly recommend adding this check to network verification.

Kind regards,
Anar

summary: - deployment fails with ntp_check.pp
+ deployment fails with ntp_check.pp but should have been caught by
+ network verification
description: updated
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Fuel Python Team (fuel-python)
status: Incomplete → Confirmed
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Maciej Kwiek (maciej-iai)
status: Confirmed → In Progress
Maciej Kwiek (maciej-iai) wrote :

I consulted with Dimitry Shulyak; I will add the check to the network checker in the same way that the repository access check was added.

Maciej Kwiek (maciej-iai) wrote :

I was asked to leave this bug for someone more experienced.

Changed in fuel:
assignee: Maciej Kwiek (maciej-iai) → Fuel Python Team (fuel-python)
Changed in fuel:
status: In Progress → Confirmed
Dima Shulyak (dshulyak) wrote :

This is a feature, and we have already reached SCF (soft code freeze). In my opinion, it is better to postpone this task until the next release.

tags: added: feature
Changed in fuel:
milestone: 7.0 → 7.0-updates
tags: added: qa-agree-8.0
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 7.0-updates → 8.0
no longer affects: fuel/8.0.x
Dmitry Pyzhov (dpyzhov)
tags: added: area-python
Alexander Kislitsky (akislitsky) wrote :

We passed SCF in 8.0. Moving the bug to 9.0.

Changed in fuel:
milestone: 8.0 → 9.0
Udayendu Kar (udayendu-kar) wrote :

I am also facing the same issue with Fuel 8.0 on a VirtualBox setup:

Error
Deployment has failed. Method granular_deploy. Failed to execute hook 'ntp-check' Puppet run failed. Check puppet logs for details

---
uids:
- '1'
parameters:
  puppet_modules: /etc/puppet/modules
  puppet_manifest: /etc/puppet/modules/osnailyfacter/modular/ntp/ntp-check.pp
  timeout: 600
  cwd: /
priority: 200
fail_on_error: true
type: puppet
id: ntp-check
.
Inspect Astute logs for the details

Here is my setup:

   - In fuelmenu, I provided 3 NTP server IPs.
   - The network check reports success.
   - However, the deployment fails every time with the error "Deployment has failed. Method granular_deploy. Failed to execute hook 'ntp-check' Puppet run failed. Check puppet logs for details". All nodes have internet access.

Here are the details:

   1. I entered all 3 NTP server names in fuelmenu and saved them without errors.
   2. I can see the correct NTP server entries in the "/etc/fuel/astute.yaml" file.
   3. Network connectivity is also correct, and all nodes have internet access.

Despite this, the deployment still failed with the errors shown in the log:

      Logs: http://paste.fedoraproject.org/356913/14609644/

I need some assistance so that I can get the setup ready. Fuel 8.0 has some good features, but NTP has remained a major issue since Fuel 7.

Volodymyr (core-hor) wrote :

I also have the same issue with Fuel and MOS 8.0. Network verification passes, and I can run ntpdate against the servers specified via fuelmenu and exported to astute.yaml, but the deployment still fails on ntp_check.

John T Johnson (jt.johnson) wrote :

I am having the same issue on Fuel 11. The network check passes, but the deployment reaches 98% and then fails. I manually configured the Fuel 11 master to use an internal NTP server, but did not change anything on the controller or compute nodes (physical machines, Dell FX2).

I do not see any NTP settings in the GUI. I will try to set the NTP settings on the controller manually from the bootstrap.

Astute log:

2017-09-15 18:37:52 ERROR [92988] Puppet agent took too long to run puppet task. Mark task as failed. Node 1, task ntp-server, manifest /etc/puppet/modules/osnailyfacter/modular/ntp/ntp-server.pp
