Task 'spawn_vms': ntpdate -u ...: shell timeout error: execution expired

Bug #1504493 reported by Vladimir Khlyunev
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Released
High
Kyrylo Galanov
7.0.x
Fix Released
High
Sergii Golovatiuk

Bug Description

Reproduced by CI:

Failed to execute hook 'shell' command: cd / && ntpdate -u $(awk '/^server/ { if ($2 !~ /127\.127\.[0-9]+\.[0-9]+/) {ORS=" "; print $2}}' /etc/ntp.conf)

Task: e6fc9ca4-d7ec-41a4-a1d4-68a4cb43b035: shell timeout error: execution expired
Task timeout: 180, Retries: 10

ISO 134
https://product-ci.infra.mirantis.net/job/8.0.system_test.ubuntu.reduced_footprint/13/testReport/%28root%29/spawn_one_vm_on_one_virt_node/spawn_one_vm_on_one_virt_node/

ISO 130:
https://product-ci.infra.mirantis.net/job/8.0.system_test.ubuntu.ceph_ha_one_controller/13/testReport/%28root%29/check_ceph_partitions_after_reboot/check_ceph_partitions_after_reboot/
https://product-ci.infra.mirantis.net/job/8.0.system_test.ubuntu.thread_usb/13/testReport/%28root%29/deploy_ha_one_controller_neutron_usb/deploy_ha_one_controller_neutron_usb/

This bug appeared in several jobs, so there is no clear steps to reproduce right now - please add if something will be localized or ask QA for additional checking.

Diagnostic snapshots are available in jobs, I will place link to latest found: https://product-ci.infra.mirantis.net/job/8.0.system_test.ubuntu.reduced_footprint/13/artifact/logs/fail_error_spawn_one_vm_on_one_virt_node-fuel-snapshot-2015-10-08_22-09-10.tar.xz

Dmitry Klenov (dklenov)
Changed in fuel:
status: New → Confirmed
assignee: Fuel Library Team (fuel-library) → Ivan Ponomarev (ivanzipfer)
Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :

RCA:
cat /etc/ntp.conf
server 10.109.10.1 iburst minpoll 3 maxpoll 9
server undef iburst minpoll 3 maxpoll 9
server undef iburst minpoll 3 maxpoll 9
=======
Oct 8 22:10:41 nailgun ntpd_intres[24191]: host name not found: undef
Oct 8 22:10:41 nailgun ntpd_intres[24191]: host name not found: undef

The main issue is 2 last lines in ntp.conf which added incorrectly. NTP service tries to connect to host 'undef' before starting and it's unavailable until timeout is reached. At the same time deployment was started and spawn_vms task got timeout error because NTP service is not responding; Sergii Golovatiuk is working on fix for that problem.

Changed in fuel:
assignee: Ivan Ponomarev (ivanzipfer) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
importance: High → Medium
tags: added: tricky
Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

Small correction... system tries to resolv 'undef' to FQDN which adds extra delay. This creates additional delay which is a problem on fast environments.

Changed in fuel:
status: Confirmed → In Progress
Changed in fuel:
importance: Medium → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/7.0)

Reviewed: https://review.openstack.org/233735
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=ad2d6b9c7b7fa891991aa4de258415d4406039e3
Submitter: Jenkins
Branch: stable/7.0

commit ad2d6b9c7b7fa891991aa4de258415d4406039e3
Author: Sergii Golovatiuk <email address hidden>
Date: Fri Oct 9 15:01:13 2015 +0200

    Fix ntp.conf generation on master node

    When NTP2 and NTP3 are not set, ERB generates a template with 'undef'
    values. This creates a delay start up on 'master' node. For system
    tests that create environment immediately after master node creation it
    might create sporadic ntp failures.

    - Remove 'undef' elements from ntp_servers Array

    Change-Id: I00d4c9ed64f2cd0af2b5d513e9a868dfb51fea3a
    Closes-Bug: 1504493

Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :
Changed in fuel:
status: In Progress → Fix Committed
Dmitry Pyzhov (dpyzhov)
tags: added: area-library
Revision history for this message
Egor Kotko (ykotko) wrote :

Reproduced on:
{"fuel-createmirror_sha": "0315aa30aee56e10f142683a25340c3c9d2f1e85", "api": "1.0", "fuelmain_sha": "21b84eb3d09883a7da526ebc4bd21458d2e9844a", "astute_sha": "eebbb2470cb800e532de19c29673558aeb86aae4", "fuel-ostf_sha": "9f500668555292add5d87c942e0cd804aefa6df2", "openstack_version": "2015.1.0-8.0", "build_id": "75", "build_number": "75", "auth_required": true, "fuel-library_sha": "bc044a0562cda204245b2a9136fa4bd6d7ef723e", "fuel-nailgun-agent_sha": "e377e83268abd406f22b656b76014656077a6a74", "fuel-nailgun_sha": "2476325f95f3bbdc0ff5dbd827868f2ab243e1b4", "fuel-agent_sha": "e4056a7923dd607521d97763d5dfb6de8a33ab5d", "production": "docker", "python-fuelclient_sha": "8ea3b64d21c4d729d1069f3aa5528ede3c76b412", "feature_groups": ["mirantis"], "release": "8.0", "release_versions": {"2015.1.0-8.0": {"VERSION": {"fuel-createmirror_sha": "0315aa30aee56e10f142683a25340c3c9d2f1e85", "api": "1.0", "fuelmain_sha": "21b84eb3d09883a7da526ebc4bd21458d2e9844a", "astute_sha": "eebbb2470cb800e532de19c29673558aeb86aae4", "fuel-ostf_sha": "9f500668555292add5d87c942e0cd804aefa6df2", "openstack_version": "2015.1.0-8.0", "build_id": "75", "build_number": "75", "fuel-library_sha": "bc044a0562cda204245b2a9136fa4bd6d7ef723e", "fuel-nailgun-agent_sha": "e377e83268abd406f22b656b76014656077a6a74", "fuel-nailgun_sha": "2476325f95f3bbdc0ff5dbd827868f2ab243e1b4", "fuel-agent_sha": "e4056a7923dd607521d97763d5dfb6de8a33ab5d", "production": "docker", "python-fuelclient_sha": "8ea3b64d21c4d729d1069f3aa5528ede3c76b412", "feature_groups": ["mirantis"], "release": "8.0"}}}}

https://product-ci.infra.mirantis.net/job/8.0.system_test.ubuntu.ha_destructive_ceph_neutron/28/testReport/%28root%29/prepare_ha_ceph_neutron/prepare_ha_ceph_neutron/

Revision history for this message
Egor Kotko (ykotko) wrote :
Changed in fuel:
status: Fix Committed → New
status: New → Confirmed
tags: added: on-verification
Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :
Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

verified on 7.0-301 release iso with MU1 updates

tags: added: verified
removed: on-verification
Revision history for this message
Nastya Urlapova (aurlapova) wrote :

Moved to Fix Released according to a comment above.

tags: removed: verified
Changed in fuel:
status: Confirmed → Fix Committed
Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :

Why it was moved to 'fix commited' for 8.0 if CI jobs continuing to fail and there is no fixes for it?

Changed in fuel:
status: Fix Committed → Confirmed
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

Back to fuel library so it can be retriaged

Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Fuel Library Team (fuel-library)
Revision history for this message
Alexandr Kostrikov (akostrikov-mirantis) wrote :
Revision history for this message
Alexandr Kostrikov (akostrikov-mirantis) wrote :

Need further investigation.
Is it the only step which throws 'shell timeout'?
Find common tricky problems with ntpdate, not just one with 'undef'.
Create tracer/wrapper to get more info on that command.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Aleksei Chekunov (achekunov)
Changed in fuel:
status: Confirmed → In Progress
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Aleksei Chekunov (achekunov) → Fuel Library Team (fuel-library)
status: In Progress → Confirmed
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
Revision history for this message
Kyrylo Galanov (kgalanov) wrote :

Hi all,

I have analyzed the log files. Apparently it is environment issue:

1. Nodes 1,2,3 bootstrapped at ~ 21:22
    Node 4 bootstrapped at ~ 23:08

2. ntpdate command was issued at 23:12

3. Node 4 finally was up at 23:13

It could not execute a command which was issued while the node was offline. http://paste.openstack.org/show/479457/

Best regards,
Kyrylo

Changed in fuel:
assignee: Kyrylo Galanov (kgalanov) → Fuel CI (fuel-ci)
tags: added: area-ci
removed: area-library
Revision history for this message
Kyrylo Galanov (kgalanov) wrote :
Changed in fuel:
assignee: Fuel CI (fuel-ci) → Kyrylo Galanov (kgalanov)
tags: added: area-python
removed: area-ci
Revision history for this message
Kyrylo Galanov (kgalanov) wrote :
Changed in fuel:
assignee: Kyrylo Galanov (kgalanov) → Fuel Library Team (fuel-library)
Dmitry Pyzhov (dpyzhov)
tags: added: team-bugfix
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

We can't fix this until https://bugs.launchpad.net/fuel/+bug/1518306 is fixed

tags: added: blocked
Revision history for this message
Lope Beltran II (beltranlope) wrote :

Deployed Fuel 7.0, and all my deployments have this error. Is there a manual way to fix this issue?

Revision history for this message
Lope Beltran II (beltranlope) wrote :

After a few tries, made mine to work. It had to do something with the route to the fuel node.

My fuel node has 2 interfaces (one for fuel network, and 2nd for public ip since my local network is under proxy - makes the the creation of local mirror easier). It had routes for 0.0.0.0/0 to the public gateway and 10.0.0.0/8 to my local gateway. My fuel network is under 10.158.55.x, and its ntp server is under 10.16.3.x.

The command "ntpdate -u $(awk '/^server/ { if ($2 !~ /127\.127\.[0-9]+\.[0-9]+/) {ORS=" "; print $2}}' /etc/ntp.conf)" works just fine, but always returns "ntpdate[31090]: no server suitable for synchronization found".

I tried performing "ntpdate -u fuel-node-address" from a slave node and returns the output above. I tried performing ntpdate to our central ntp server and it succeeded. So i guessed the problem lies between the fuel node and the slave nodes. Since the iptables only accepts ntp packets from eth0 through 0.0.0.0/0, the slaves fails to perform ntpdate since i moved the route gateway of eth0 to 10.0.0.0/8 and 0.0.0.0/0 routes to the public.

I removed my public interface eth1 and restarted the network, and redeployed. Deployment was successful.

Dmitry Pyzhov (dpyzhov)
tags: added: area-library
removed: area-python
Dmitry Pyzhov (dpyzhov)
tags: removed: blocked
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
Revision history for this message
Kyrylo Galanov (kgalanov) wrote :

The bug should have been fixed. Please reopen the issue if it is found in 8.0 and later release.

Changed in fuel:
status: Confirmed → Fix Released
Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :

Please leave "Fix released" status for verified by QA team bugs

Changed in fuel:
status: Fix Released → Fix Committed
Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

verified on 8.0-506

Changed in fuel:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.