[example] reboot task cause fatal error

Bug #1463868 reported by Daniele Pizzolli
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Released
High
Łukasz Oleś
6.1.x
Fix Released
High
Łukasz Oleś
7.0.x
Fix Released
High
Łukasz Oleś

Bug Description

The last build of fuel_plugin_example-2.0-2.0.2-1.noarch.rpm on
fuel-community-6.1-267-2015-06-10_05-40-59 fails with:

2015-06-10T14:21:17 err: [568] Unexpected error Failed to execute hook 'fuel_plugin_example-2.0.2' Reboot command failed for nodes ["3", "4"]. Check debug output for details

---
uids:
- '1'
- '3'
- '2'
- '4'
parameters:
  timeout: 600
priority: 3400
fail_on_error: true
type: reboot
diagnostic_name: fuel_plugin_example-2.0.2
 traceback
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/nailgun_hooks.rb:58:in `block in process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/nailgun_hooks.rb:26:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/nailgun_hooks.rb:26:in `process'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/deployment_engine/granular_deployment.rb:203:in `pre_deployment_actions'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/deployment_engine.rb:36:in `deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/orchestrator.rb:171:in `deploy_cluster'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/orchestrator.rb:56:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:111:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

The plugin rpm was dowloaded from

https://ci.fuel-infra.org/job/verify-fuel-plugins/92/
https://ci.fuel-infra.org/job/verify-fuel-plugins/92/artifact/built_plugins/fuel_plugin_example-2.0-2.0.2-1.noarch.rpm

Revision history for this message
Daniele Pizzolli (daniele-pizzolli) wrote :

Add diagnostic snapshot

Evgeniy L (rustyrobot)
Changed in fuel:
milestone: none → 6.1
assignee: nobody → Fuel Astute Team (fuel-astute)
importance: Undecided → High
status: New → Confirmed
Changed in fuel:
assignee: Fuel Astute Team (fuel-astute) → Vladimir Sharshov (vsharshov)
Revision history for this message
Vladimir Sharshov (vsharshov) wrote :

Problem in time on nodes.

Two nodes: 3 and 4 after reboot has less time for /proc/1 that before:

* 3 -> before 1433945756, after 1433945508
* 4 -> before 1433945849, after 1433945507

But for node 1 and 2 all works fine. Before deploy we run sync time hooks.

We have 2 ways:

* change Astute reboot logic to mark node as rebooted if it return different time after reboot command for /proc/1, not especially more;
* try to find and fix source of problem why time is rollback to early after reboot.

Sync time hooks details:

2015-06-10T14:10:11 info: [568] Run hook ---
priority: 300
fail_on_error: true
type: shell
uids:
- '1'
- '3'
- '2'
- '4'
parameters:
  retries: 10
  cmd: ntpdate -u $(egrep '^server' /etc/ntp.conf | egrep -v '127\.127\.[0-9]+\.[0-9]+'
    | sed '/^#/d' | awk '{print $2}' | head -1)
  timeout: 180
  interval: 30

2015-06-10T14:10:17 debug: [568] 43dfd2b7-bae1-40ee-b24d-02da19378f1f: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"1", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>"10 Jun 14:10:17 ntpdate[8044]: adjust time server 10.20.0.2 offset -0.000414 sec\n", :stderr=>"", :exit_code=>0}}
2015-06-10T14:10:17 debug: [568] 43dfd2b7-bae1-40ee-b24d-02da19378f1f: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>"10 Jun 14:10:17 ntpdate[7904]: adjust time server 10.20.0.2 offset 0.000974 sec\n", :stderr=>"", :exit_code=>0}}
2015-06-10T14:10:17 debug: [568] 43dfd2b7-bae1-40ee-b24d-02da19378f1f: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"2", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>"10 Jun 14:10:17 ntpdate[7902]: adjust time server 10.20.0.2 offset -0.000861 sec\n", :stderr=>"", :exit_code=>0}}
2015-06-10T14:10:17 debug: [568] 43dfd2b7-bae1-40ee-b24d-02da19378f1f: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"4", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>"10 Jun 14:10:17 ntpdate[7867]: adjust time server 10.20.0.2 offset -0.000690 sec\n", :stderr=>"", :exit_code=>0}}

Revision history for this message
Vladimir Sharshov (vsharshov) wrote :

I think we should try to resolve original problem with time moving. Guys from fuel-library, can you look?

Changed in fuel:
assignee: Vladimir Sharshov (vsharshov) → Fuel Library Team (fuel-library)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Oleksiy Molchanov (omolchanov)
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

I am not sure we (library team) can help you with this, as the time was synced before the deployment. From my point of view, we should change astute logic.

Please contact me if you have any questions, I am passing the ball to python team.

Changed in fuel:
assignee: Oleksiy Molchanov (omolchanov) → Fuel Python Team (fuel-python)
Łukasz Oleś (loles)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Łukasz Oleś (loles)
Revision history for this message
Evgeniy L (rustyrobot) wrote :

We decided to use uptime instead of timestamp of specific file to figure out if not was rebooted or not.

Łukasz Oleś (loles)
Changed in fuel:
status: Confirmed → In Progress
Łukasz Oleś (loles)
Changed in fuel-plugins:
status: New → Invalid
Revision history for this message
Łukasz Oleś (loles) wrote :
no longer affects: fuel
affects: fuel-plugins → fuel
Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (stable/6.1)

Fix proposed to branch: stable/6.1
Review: https://review.openstack.org/198316

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (stable/6.1)

Reviewed: https://review.openstack.org/198316
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=ba384f448b3d667f243a6cf7b1c3d403666535ac
Submitter: Jenkins
Branch: stable/6.1

commit ba384f448b3d667f243a6cf7b1c3d403666535ac
Author: Łukasz Oleś <email address hidden>
Date: Wed Jun 17 13:30:37 2015 +0200

    Fail only if the time is the same

    Closes-bug: #1463868

    Change-Id: I5bf7c2f3865b68b9ec5582e32ba21e71198c4039
    (cherry picked from commit be85f221dadfb3f483d64927f95a10b67413c30a)

tags: added: on-verification
tags: removed: on-verification
tags: added: on-verification
Revision history for this message
Alexander Kurenyshev (akurenyshev) wrote :

Verified on the Fuel RC4 (#301) build.

Plugin has been downloaded from here [1]

[root@nailgun ~]# fuel plugins
id | name | version | package_version
---|---------------------|---------|----------------
1 | fuel_plugin_example | 2.0.2 | 2.0.0

[root@nailgun ~]# fuel task
id | status | name | cluster | progress | uuid
---|--------|------------------------------------|---------|----------|-------------------------------------
1 | ready | dump | None | 100 | 17cd4262-a3d5-470d-833b-1b62741d7b25
36 | ready | check_repo_availability | 1 | 100 | 22e372f1-3b2e-41cc-8957-9fcebefee5bd
37 | ready | check_repo_availability_with_setup | 1 | 100 | 7d1695ac-222a-400a-bc21-0c59d857157d
35 | ready | check_dhcp | 1 | 100 | 6732aa71-d848-472b-a9e7-33d5e4bd629a
38 | ready | deploy | 1 | 100 | 97bf6fb7-f342-4305-88fc-63122e752532
34 | ready | verify_networks | 1 | 100 | a3886551-f1dd-41bc-a60d-4065593c4397
39 | ready | check_networks | 1 | 100 | 8c18ac77-b00f-4058-9e81-d0a5fbea2f2e
43 | ready | deployment | 1 | 100 | 80981a33-2e45-4684-a48e-c0f324f0b370
42 | ready | provision | 1 | 100 | 6daa1264-0521-4aa7-bc02-d67c8cfa7f44

Deploy was successful without any errors.

[1] https://ci.fuel-infra.org/job/verify-fuel-plugins/lastSuccessfulBuild/artifact/built_plugins/fuel_plugin_example-2.0-2.0.2-1.noarch.rpm

tags: removed: on-verification
Changed in fuel:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.