netconfig fails first time because default gateway not available within 60 seconds

Bug #1501459 reported by Alex Schultz
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Stanislav Makar

Bug Description

With change Ic2e7616526bb578c65e942b3ceddc91482d83de6 we added a default gateway pinger to the netconfig tasks. I noticed in our CI logs[0][1] that the netconfig task seems to fail on the first try but succeed on the second. We should investigate why this happens consistently and fix it.

[0] https://ci.fuel-infra.org/job/fuellib_review_pkgs_master_node/4730/artifact/logs/4730/fail_error_deploy_neutron_tun-fuel-snapshot-2015-09-30_15-40-58.tar.xz.filtered.log
[1] https://ci.fuel-infra.org/job/fuellib_review_pkgs_master_node/4733/artifact/logs/4733/pass_deploy_neutron_tun-fuel-snapshot-2015-09-30_18-34-43.tar.xz

Expected Results:
netconfig succeeds on the first run.

Actual Results:
netconfig fails on the first run with this error:
  node-1 2015-09-30T15:35:19.164270 err: (/Stage[main]/Main/Ping_host[10.109.6.1]/ensure) change from down to up failed: Timeout waiting for host '10.109.6.1' status to become 'up' after 60 seconds!
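
For readers unfamiliar with this kind of check, the behaviour behind the error above can be sketched roughly as a poll-until-up loop with a deadline. This is only an illustration, not the fuel-library `ping_host` implementation: `wait_for_host`, `probe`, `clock` and `sleep` are hypothetical names, and the 60-second default simply mirrors the timeout quoted in the log.

```python
import time
from typing import Callable


def wait_for_host(
    probe: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it reports the host as up or `timeout` seconds pass.

    `probe` is a hypothetical callable that returns True when the host
    answers (for example, a wrapper around a single ping). `clock` and
    `sleep` are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True   # host became 'up' in time
        if clock() >= deadline:
            return False  # the caller would report the timeout error
        sleep(interval)
```

If the gateway cannot be reached for reasons unrelated to the gateway itself (for example, the interface has no usable path yet), a loop like this keeps failing until the deadline regardless of how long the timeout is.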

Alex Schultz (alex-schultz) wrote :

I don't think this is a duplicate, because it is still occurring after the fix for bug 1496307 was committed. Additionally, there is no gateway error.

Stanislav Makar (smakar)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Stanislav Makar (smakar)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/231017

Changed in fuel:
status: New → In Progress
Stanislav Makar (smakar)
tags: added: feature
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/231017
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=f2c6195daeb2d3786d166bba4bc9416b870d3b3d
Submitter: Jenkins
Branch: master

commit f2c6195daeb2d3786d166bba4bc9416b870d3b3d
Author: Stanislav Makar <email address hidden>
Date: Mon Oct 5 14:16:56 2015 +0000

    Fix wrong ordering for ping_host default_gateway

    * Ping_host should be run after all l3_* resources.
    * Clean up unneeded l23network class calls.

    Change-Id: I0a59fef64158f4df23dc66dc0f8de2c7c23c6175
    Closes-bug: #1501459
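
The ordering constraint described in this commit ("Ping_host should be run after all l3_* resources") can be pictured as a dependency graph. The sketch below uses Python's standard `graphlib` rather than Puppet, and the resource names are illustrative assumptions, not taken from the actual manifest.

```python
from graphlib import TopologicalSorter

# Illustrative resource names only; the real manifest may declare others.
l3_resources = [
    "l3_ifconfig[br-fw-admin]",
    "l3_ifconfig[br-mgmt]",
    "l3_route[default]",
]

# Map each resource to the set of resources that must be applied before it.
deps = {name: set() for name in l3_resources}
deps["ping_host[default_gateway]"] = set(l3_resources)

# Any valid apply order places the gateway ping after every l3_* resource.
apply_order = list(TopologicalSorter(deps).static_order())
```

With the dependency declared this way, the ping can never be scheduled before an address or route it relies on, which is the class of failure the commit message describes.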

Changed in fuel:
status: In Progress → Fix Committed
tags: added: on-verification
Ksenia Svechnikova (kdemina) wrote :

ISO # 130
https://product-ci.infra.mirantis.net/job/8.0-kilo.ubuntu.smoke_neutron/117/

netconfig still fails on the first run with this error:

node-1:

2015-10-07 18:18:19 +0000 /Stage[main]/Main/Ping_host[10.109.16.1]/ensure (err): change from down to up failed: Timeout waiting for host '10.109.16.1' status to become 'up' after 60 seconds!

https://paste.mirantis.net/show/1230/

tags: removed: on-verification
Changed in fuel:
status: Fix Committed → Confirmed
Changed in fuel:
status: Confirmed → Triaged
Stanislav Makar (smakar) wrote :

This is the same error, but under a different condition:
- the original problem was caused by bad ordering
- the current one is caused by the merged patch https://review.openstack.org/#/c/195765/

If you take any ISO built after the patch https://review.openstack.org/231017 was merged and before https://review.openstack.org/#/c/195765/, you will not hit this error.

So the root cause of the current error is that we now disable/enable hotplug during network configuration. The bridge is created and its IP address is configured, but the bridge DOES NOT have any ports when we ping the default gateway (the ports are added to the bridge later).

Here is the order:

Notice: add-br(br-fw-admin) -> endpoint(br-fw-admin) -> add-br(br-mgmt) -> endpoint(br-mgmt) -> add-br(br-storage) -> endpoint(br-storage) -> add-br(br-ex) -> endpoint(br-ex) -> add-br(br-floating) -> endpoint(br-floating) -> add-patch(patch__br-ex--br-floating) -> add-br(br-prv) -> endpoint(br-prv) -> add-patch(patch__br-fw-admin--br-prv) -> add-port(eth0) -> add-port(eth0.101) -> add-port(eth0.102) -> add-port(eth1)
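
To make the problem in that sequence explicit, here is a small sketch that parses the chain and compares the positions of the `endpoint` and `add-port` steps. The chain is copied from the Notice line above; `parse_steps` and the variable names are hypothetical helpers written for this illustration only.

```python
import re

# Step chain copied from the Notice line above.
ORDER = (
    "add-br(br-fw-admin) -> endpoint(br-fw-admin) -> add-br(br-mgmt) -> "
    "endpoint(br-mgmt) -> add-br(br-storage) -> endpoint(br-storage) -> "
    "add-br(br-ex) -> endpoint(br-ex) -> add-br(br-floating) -> "
    "endpoint(br-floating) -> add-patch(patch__br-ex--br-floating) -> "
    "add-br(br-prv) -> endpoint(br-prv) -> "
    "add-patch(patch__br-fw-admin--br-prv) -> add-port(eth0) -> "
    "add-port(eth0.101) -> add-port(eth0.102) -> add-port(eth1)"
)


def parse_steps(chain):
    """Split 'action(name) -> action(name)' into (action, name) tuples."""
    steps = []
    for item in chain.split("->"):
        match = re.fullmatch(r"\s*([\w-]+)\((.+)\)\s*", item)
        if match is None:
            raise ValueError(f"unrecognised step: {item!r}")
        steps.append((match.group(1), match.group(2)))
    return steps


steps = parse_steps(ORDER)
last_endpoint = max(i for i, (action, _) in enumerate(steps) if action == "endpoint")
first_port = min(i for i, (action, _) in enumerate(steps) if action == "add-port")

# True when every bridge gets its IP endpoint before any port is attached.
endpoints_before_ports = last_endpoint < first_port
```

For the chain above `endpoints_before_ports` is true: every bridge already has its address while no physical port has been attached yet, which matches the explanation that a gateway ping issued at that point has no path out of the bridge.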

Stanislav Makar (smakar)
Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/237998

Dmitry Pyzhov (dpyzhov)
tags: added: area-library
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/237998
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=c365d1763124325e71863550fd07b939f3dc6b45
Submitter: Jenkins
Branch: master

commit c365d1763124325e71863550fd07b939f3dc6b45
Author: Stanislav Makar <email address hidden>
Date: Wed Oct 21 13:46:12 2015 +0000

    Fix ping default gateway error

    Add noop tests

    Change-Id: I902215c21dd512f1229c28a5a410d704755c27b8
    Closes-bug: #1501459

Changed in fuel:
status: In Progress → Fix Committed
Ksenia Svechnikova (kdemina) wrote :

Verified the fix with ISO MOS 8.0 kilo 176 on the HW lab (version: https://paste.mirantis.net/show/1328/).

No netconfig errors were found in the remote puppet logs.

Changed in fuel:
status: Fix Committed → Fix Released