sync_time_command.sh fails when cluster is deployed via cli

Bug #1565759 reported by Vladimir Khlyunev
This bug affects 3 people
Affects             Status         Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Committed  High        Fuel QA Team
7.0.x               Invalid        High        MOS Maintenance
8.0.x               Invalid        High        MOS Maintenance
Mitaka              Fix Released   High        Fuel QA Team

Bug Description

Swarm test failed: https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.command_line/67/testReport/(root)/cli_selected_nodes_deploy/
ISO: 9.0-152
Snapshot: https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.command_line/67/artifact/logs/fail_error_cli_selected_nodes_deploy-fuel-snapshot-2016-04-03_22-30-01.tar.xz

Steps:
1. Revert snapshot "ready_with_3_slaves"
2. Create a cluster using Fuel CLI
3. Provision a controller node using Fuel CLI
4. Provision two compute+cinder nodes using Fuel CLI
5. Deploy the controller node using Fuel CLI
6. Deploy the compute+cinder nodes using Fuel CLI

Result:
node-1.test.domain.local 2016-04-03T22:27:36.782136 err: /bin/bash "/etc/puppet/shell_manifests/sync_time_command.sh" returned 1 instead of one of [0]
Reason:
root@node-1:~# awk '/^server/ { if ($2 !~ /127\.127\.[0-9]+\.[0-9]+/) {ORS=" "; print $2}}' /etc/ntp.conf
10.109.1.2
But this address is not assigned to any host:
[root@nailgun ~]# for i in `seq 1 3` ; do ssh node-$i 'ifconfig | grep 109.1'; done
Warning: Permanently added 'node-1' (ECDSA) to the list of known hosts.
          inet addr:10.109.1.4 Bcast:10.109.1.255 Mask:255.255.255.0
Warning: Permanently added 'node-2' (ECDSA) to the list of known hosts.
          inet addr:10.109.1.6 Bcast:10.109.1.255 Mask:255.255.255.0
Warning: Permanently added 'node-3' (ECDSA) to the list of known hosts.
          inet addr:10.109.1.5 Bcast:10.109.1.255 Mask:255.255.255.0
[root@nailgun ~]# ifconfig | grep 109.1
[root@nailgun ~]#
After I replaced 10.109.1.2 (mgmt) with 10.109.0.2 (admin), everything works fine:
root@node-1:~# bash /etc/puppet/shell_manifests/sync_time_command.sh ; echo $?
0
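
The manual check above can be scripted. The following is only a rough sketch, not the contents of sync_time_command.sh: it reuses the awk filter from this report to list the non-local NTP servers and then queries each one with `ntpdate -q`, which does not set the clock. The function names `list_ntp_servers` and `check_ntp_servers` are made up for illustration.

```shell
# Hypothetical helpers; the names are illustrative and not part of Fuel.

# Print every "server" entry from an ntp.conf-style file, one per line,
# skipping local reference clocks (127.127.x.x) like the awk filter above.
list_ntp_servers() {
    awk '/^server/ { if ($2 !~ /127\.127\.[0-9]+\.[0-9]+/) print $2 }' "$1"
}

# Query each server without setting the clock (ntpdate -q) and report
# the ones that do not answer. Returns non-zero if any query failed.
check_ntp_servers() {
    local server rc
    rc=0
    for server in $(list_ntp_servers "$1"); do
        if ! ntpdate -q "$server" >/dev/null 2>&1; then
            echo "unreachable NTP server: $server" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

On a node this would be run as `check_ntp_servers /etc/ntp.conf`.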

Bug Checker Bot (bug-checker) wrote : Autochecker

(This check was performed automatically.)
Please make sure that the bug description contains the following sections, filled in with the appropriate data related to the bug you are describing:

actual result

expected result

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Changed in fuel:
status: New → Confirmed
Matthew Mosesohn (raytrac3r) wrote :

That is the IP of the Fuel master node. It's a fine IP to use. The actual failure needs more investigation.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
tags: added: swarm-blocker
Changed in fuel:
status: Confirmed → In Progress
Kyrylo Galanov (kgalanov) wrote :

What I've discovered is an asymmetrical network bridge configuration on the nodes:

controller:
root@node-2:~# cat /etc/astute.yaml | grep enp0s5
    enp0s5:
  - {action: add-port, bridge: br-mgmt, name: enp0s5}

compute:
root@node-1:~# cat /etc/astute.yaml | grep enp0s5
    enp0s5:
  - {action: add-port, bridge: br-storage, name: enp0s5}

If I manually assign enp0s5 to br-mgmt, ntpdate starts to work.

The test provisions nodes one by one using separate Fuel CLI commands. It might be that astute.yaml was mutated between these steps.
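
The comparison above can be repeated for any port with a small helper. This is a sketch under one assumption: the add-port action sits on a single line in the flow-mapping layout shown in the grep output. `bridge_for_port` is a hypothetical name, not an existing Fuel tool.

```shell
# Hypothetical helper: print the bridge(s) that an add-port action
# attaches the given port to in an astute.yaml-style file.
# Assumes lines of the form:
#   - {action: add-port, bridge: br-mgmt, name: enp0s5}
bridge_for_port() {
    local file port
    file=$1
    port=$2
    sed -n "s/.*{action: add-port, bridge: \([^,]*\), name: ${port}}.*/\1/p" "$file"
}
```

For example, `bridge_for_port /etc/astute.yaml enp0s5` on each node shows at a glance whether the nodes agree on the bridge for that port.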

Matthew Mosesohn (raytrac3r) wrote :

This seems to be broken because of incorrect network configuration, not because of puppet manifests.

Changed in fuel:
status: In Progress → Incomplete
Changed in fuel:
status: Incomplete → Invalid
Vadim Rovachev (vrovachev) wrote :

The bug is periodically reproduced on versions 8.0 and 7.0.
NTP info from the wrongly configured env:
https://paste.mirantis.net/show/2107/

For 8.0 and 7.0 info, please see bug:
https://bugs.launchpad.net/fuel/+bug/1563824

Changed in fuel:
status: Invalid → Confirmed
assignee: Kyrylo Galanov (kgalanov) → Fuel Library Team (fuel-library)
Vadim Rovachev (vrovachev) wrote :

ntp.conf from the wrongly configured node:
https://paste.mirantis.net/show/2108/

astute.yaml from the wrongly configured node:
https://paste.mirantis.net/show/2109/

ntp.conf from the correctly configured node:
https://paste.mirantis.net/show/2110/

astute.yaml from the correctly configured node:
https://paste.mirantis.net/show/2111/

A diagnostic snapshot is attached to this comment.

Vladimir Khlyunev (vkhlyunev) wrote :

I also analyzed the test: it changes cluster settings only once, before any node is deployed, so it looks like the data in astute.yaml was corrupted after one of the nodes was deployed.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
Changed in fuel:
assignee: Kyrylo Galanov (kgalanov) → Fuel Python Team (fuel-python)
tags: added: area-python
removed: area-library
Dmitry Pyzhov (dpyzhov)
tags: added: team-network
Aleksandr Didenko (adidenko) wrote :

> ntp.conf from the wrongly configured node:
> https://paste.mirantis.net/show/2108/

It's not wrong, it's 100% correct. It's a controller, and ntp.conf on controller nodes is configured by the osnailyfacter/modular/ntp/ntp-server.pp manifest, which uses the `external_ntp` parameter from astute.yaml (in your case 10.109.15.1) as the NTP server.

> ntp.conf from the correctly configured node:
> https://paste.mirantis.net/show/2110/

It's a non-controller node, so ntp.conf is configured by the osnailyfacter/modular/ntp/ntp-client.pp manifest, which uses $management_vrouter_vip as the NTP server (10.109.16.2 in your case).
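
To make the two cases concrete, here is a minimal sketch of the rule described in this comment, useful when checking whether a node's ntp.conf points at the address it should. `expected_ntp_server` is a hypothetical helper, and treating any role name that contains "controller" as a controller is an assumption, not something taken from the manifests.

```shell
# Hypothetical helper: print the NTP server a node is expected to use.
#   $1  node role (any role containing "controller" is treated as a
#       controller; this matching rule is an assumption)
#   $2  value of external_ntp from astute.yaml
#   $3  management vrouter VIP
expected_ntp_server() {
    case "$1" in
        *controller*) echo "$2" ;;  # ntp-server.pp case: external_ntp
        *)            echo "$3" ;;  # ntp-client.pp case: vrouter VIP
    esac
}
```

With the values quoted in this comment, `expected_ntp_server controller 10.109.15.1 10.109.16.2` gives the external server, while any non-controller role gives the VIP.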

> so it looks like the data in astute.yaml was corrupted after one of the nodes was deployed

I've gone through the logs in your snapshot step by step and analyzed the astute.yaml parameters uploaded to the nodes, and I see no data corruption (https://paste.mirantis.net/show/2124/). So please elaborate on exactly which data is corrupted.

Aleksandr Didenko (adidenko) wrote :

As for the failure on 9.0:
> Snapshot: https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.command_line/67/artifact/logs/fail_error_cli_selected_nodes_deploy-fuel-snapshot-2016-04-03_22-30-01.tar.xz

I've analyzed this snapshot; here's the result: http://paste.openstack.org/show/493936/

So again, I see no "mutation" or "corruption" of astute.yaml in this deployment. If you check the Nailgun API log, you'll see that this test did not configure interfaces on the nodes (see http://paste.openstack.org/show/493936/). So Fuel assigned networks to interfaces as it saw fit. That is perfectly fine, but the assignment may not match your real/virtual network setup (which is exactly the case here).

To avoid such errors, I suggest adding interface configuration to this test case. Forwarding this to fuel-qa for review.

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Fuel QA Team (fuel-qa)
tags: added: area-qa
removed: area-python team-network
tags: removed: need-info
Changed in fuel:
milestone: 9.0 → 10.0
Kyrylo Galanov (kgalanov) wrote :

As far as I can see, the problem is caused by Nailgun. Nailgun generates a default network interface configuration, which may differ between nodes. For example, the Management network is assigned to enp0s5 on controller nodes but to enp0s6 on compute nodes.
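
A mismatch like this can be detected before deployment by comparing the generated configuration of all nodes. The sketch below assumes each node's astute.yaml has been copied to a local file and that add-port actions use the one-line `{action: add-port, bridge: ..., name: ...}` layout seen earlier in this report. Both function names are hypothetical.

```shell
# Hypothetical helper: print the port(s) attached to a bridge by
# add-port actions in an astute.yaml-style file.
port_for_bridge() {
    sed -n "s/.*{action: add-port, bridge: $2, name: \([^}]*\)}.*/\1/p" "$1"
}

# Return 0 if every listed file attaches the same port to the bridge,
# 1 otherwise. Prints "file: port" per file so a mismatch is visible.
same_port_everywhere() {
    local bridge file port first seen rc
    bridge=$1
    shift
    seen=0
    rc=0
    for file in "$@"; do
        port=$(port_for_bridge "$file" "$bridge")
        echo "$file: ${port:-<none>}"
        if [ "$seen" -eq 0 ]; then
            first=$port
            seen=1
        elif [ "$port" != "$first" ]; then
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `same_port_everywhere br-mgmt node-1.yaml node-2.yaml node-3.yaml` exits non-zero when the nodes disagree on the port behind br-mgmt.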

Nastya Urlapova (aurlapova) wrote :

Moved to area-python.

tags: added: area-python
removed: area-qa
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → nobody
Peter Zhurba (pzhurba)
Changed in fuel:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Networking (l23-network)
Dmitry Pyzhov (dpyzhov) wrote :

Please read comment #9, especially the last part: "To avoid such errors, I suggest adding interface configuration to this test case. Forwarding this to fuel-qa for review."

Changed in fuel:
assignee: Networking (l23-network) → Fuel QA Team (fuel-qa)
Vladimir Khlyunev (vkhlyunev) wrote :
Changed in fuel:
status: Confirmed → Fix Committed
tags: added: non-release
Nastya Urlapova (aurlapova) wrote :
Vitaly Sedelnik (vsedelnik) wrote :

Invalid for 7.0-updates and 8.0-updates, as it has stayed in Incomplete for more than a month.
