Public gateway IP inaccessible should not result in deployment error

Bug #1524640 reported by Sam Stoelinga
This bug affects 3 people
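Affects            | Status        | Importance | Assigned to     | Milestone
-------------------+---------------+------------+-----------------+----------
Fuel for OpenStack | Fix Committed | Medium     | Alex Schultz    |
Mitaka             | Fix Released  | Medium     | Fuel Sustaining |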

Bug Description

public_vip_ping will time out if the public gateway is not accessible. In many environments people first try to deploy without the public gateway being accessible. Fuel should show a big warning instead of failing the whole deployment.

Steps to reproduce:
Deploy Fuel and configure Fuel to use mirrors from Fuel itself. This way we don't require a public gateway to access the mirrors when deploying or when the network verification is done.

Actual result:
If the public gateway is inaccessible, the network verification passes, but after the deployment starts it fails at the public_vip ping check. An error is shown in the Puppet log, but users often have no clue why their deployment failed.

Expected result:
There are 2 different kinds of expectations:
1. If we can finish the deployment without a public gateway being up then we should just make sure that the deployment finishes with an error.
2. If the deployment would never be able to finish without a public gateway being up, then the Network Checker should catch this issue before the deployment starts and show an easy-to-understand message that the gateway should be up.
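
The two expectations above amount to a small decision rule. The following is only a sketch of that rule, not Fuel code: the names `Outcome` and `precheck_outcome` are hypothetical, and whether the gateway is actually required for a given environment is taken as an input rather than computed.

```python
from enum import Enum


class Outcome(Enum):
    PROCEED = "proceed"  # gateway answers, nothing to report
    WARN = "warn"        # deployment can finish, but the user is told
    BLOCK = "block"      # pre-deployment check stops the deployment


def precheck_outcome(gateway_reachable: bool, gateway_required: bool) -> Outcome:
    """Decide what a pre-deployment check should do (hypothetical helper)."""
    if gateway_reachable:
        return Outcome.PROCEED
    # Case 2: the deployment cannot finish without the gateway, so stop
    # early with a clear message. Case 1: it can finish, so only report.
    return Outcome.BLOCK if gateway_required else Outcome.WARN


print(precheck_outcome(gateway_reachable=False, gateway_required=False).value)
```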

Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
milestone: none → 8.0
importance: Undecided → Medium
status: New → Confirmed
tags: added: area-python module-netcheck
Alexander Kislitsky (akislitsky) wrote :

We passed SCF in 8.0. Moving the bug to 9.0.

Changed in fuel:
milestone: 8.0 → 9.0
Dmitry Pyzhov (dpyzhov)
tags: added: team-network
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check was performed automatically.)
Please make sure that the bug description contains the following sections, filled in with the appropriate data related to the bug you are describing:

actual result

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Aleksandr Didenko (adidenko) wrote :

Unfortunately, it's impossible to change the failure to a warning in the deployment procedure if the gateway is invalid, because we need it to work properly for correct VIP checks/migration. Also, if you run the network checker before the deployment, it should catch the problem with the gateway, since it runs mirror accessibility checks that should detect an invalid gateway.

Changed in fuel:
status: Confirmed → Invalid
Sam Stoelinga (sammiestoel) wrote :

@Aleksandr: Many people run Fuel in datacenters without internet access, so the network checker will not be able to catch the problem with the gateway if we are using Fuel as the mirror. I don't think this should be marked as Invalid.
Instead, we should at least make sure that the network checker fails if the gateway is not pingable, if that would result in a failed deployment. Just checking whether mirrors are accessible is not enough if we use Fuel as the mirror.
Anyway, I still think that the deployment shouldn't fail and that the correct fix would be to continue the deployment. Where exactly do we need the public gateway working for correct VIPs?

I've changed it to Confirmed again. This is one of the most common issues that partners / customers contact me about when they are doing an offline fuel installation / deployment and it just fails. They often won't get deep enough inside the Puppet logs to understand that ping public_vip failing means that their public gateway wasn't ping-able. It's unacceptable from a usability perspective.

description: updated
Changed in fuel:
status: Invalid → Confirmed
tags: removed: need-info
Yury Konov (yukonoff) wrote :

I see the same issue on MOS 8.0 when the gateway IP address is alive but responds to pings with "Packet filtered" (kind of a "stop pinging my hosts" message from the network admins).

root@fuel ~# ping 172.16.49.161
PING 172.16.49.161 (172.16.49.161) 56(84) bytes of data.
From 172.16.49.161 icmp_seq=1 Packet filtered
From 172.16.49.161 icmp_seq=2 Packet filtered
From 172.16.49.161 icmp_seq=3 Packet filtered
From 172.16.49.161 icmp_seq=4 Packet filtered
^C
--- 172.16.49.161 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3004ms

Puppet log at Controller node being deployed:
2016-04-08 12:13:53 ERR (/Stage[main]/Main/Ping_host[172.16.49.161]/ensure) change from down to up failed: Timeout waiting for host '172.16.49.161' status to become 'up' after 60 seconds!
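
The transcript above shows a case that a plain up/down check cannot tell apart from a dead host: the address answers, but with ICMP errors instead of echo replies. As a minimal sketch (not part of Fuel; `classify_ping` is a hypothetical helper and the regular expression assumes the iputils summary format shown above), captured ping output could be classified like this:

```python
import re

# Matches the summary line printed by iputils ping, e.g.
# "4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3004ms"
SUMMARY_RE = re.compile(
    r"(?P<sent>\d+) packets transmitted, (?P<received>\d+) received"
)


def classify_ping(output: str) -> str:
    """Return 'up', 'filtered', or 'down' for a captured ping transcript."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    if int(match.group("received")) > 0:
        return "up"
    # No echo replies, but something answered with an ICMP error.
    if "Packet filtered" in output:
        return "filtered"
    return "down"


sample = """\
PING 172.16.49.161 (172.16.49.161) 56(84) bytes of data.
From 172.16.49.161 icmp_seq=1 Packet filtered
From 172.16.49.161 icmp_seq=2 Packet filtered
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3004ms
"""
print(classify_ping(sample))  # prints "filtered"
```

A "filtered" result suggests the gateway exists but ICMP is blocked by policy, which is a different problem from a wrong or missing gateway.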

Alex Schultz (alex-schultz) wrote :

The public gateway ping requirement can be bypassed via hiera.

http://git.openstack.org/cgit/openstack/fuel-library/tree/deployment/puppet/osnailyfacter/manifests/virtual_ips/public_vip_ping.pp#n7

The network checker will report that the gateway is unavailable, but you can still deploy if you set run_ping_checker to false for your environment.
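
As a rough model of this bypass: assuming the manifest reads the `run_ping_checker` key and treats a missing key as true (an assumption here; the linked manifest is the authority), the decision reduces to a lookup with a default. The function name `should_ping_gateway` is hypothetical, and the hiera data is modeled as a plain dict.

```python
def should_ping_gateway(hiera: dict) -> bool:
    """Model a lookup of run_ping_checker that defaults to True when unset.

    The value is assumed to be a real boolean, as a YAML `false` would be
    after parsing; string values such as "false" are not handled here.
    """
    return bool(hiera.get("run_ping_checker", True))


print(should_ping_gateway({}))                           # True: check runs by default
print(should_ping_gateway({"run_ping_checker": False}))  # False: check is skipped
```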

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/337891

no longer affects: fuel/newton
Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/337891
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=36515bbbc13a582c92abd46d67843c0939d8a738
Submitter: Jenkins
Branch: master

commit 36515bbbc13a582c92abd46d67843c0939d8a738
Author: Alex Schultz <email address hidden>
Date: Tue Jul 5 15:35:30 2016 -0600

    Expose configuration to skip gateway ping check

    We currently allow for someone to bypass the public gateway ping check
    on the controllers for the public_vip monitor via hiera. But this is
    hidden and not well documented. This change aims to expose this to the
    deployer in a more obvious way.

    Change-Id: Ic244e907fb9db2c020cf590e96626b015cbe0773
    Closes-Bug: #1524640

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/339547

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/339547
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=c3120b546371feba26d8e9f36c1feffe6957412c
Submitter: Jenkins
Branch: stable/mitaka

commit c3120b546371feba26d8e9f36c1feffe6957412c
Author: Alex Schultz <email address hidden>
Date: Tue Jul 5 15:35:30 2016 -0600

    Expose configuration to skip gateway ping check

    We currently allow for someone to bypass the public gateway ping check
    on the controllers for the public_vip monitor via hiera. But this is
    hidden and not well documented. This change aims to expose this to the
    deployer in a more obvious way.

    Change-Id: Ic244e907fb9db2c020cf590e96626b015cbe0773
    Closes-Bug: #1524640
    (cherry picked from commit 36515bbbc13a582c92abd46d67843c0939d8a738)

Oleksandr (oivashchenko)
tags: added: on-verification
Oleksandr (oivashchenko) wrote :

I tried using a wrong gateway and cleared the "Public Gateway is Available" check box in Network Settings. When the environment starts to deploy, we don't get a warning message and the deployment fails.
We do not even reach public_vip_ping.

Actual result: Deployment has failed. All nodes are finished. Failed tasks: Task[netconfig/1], Task[netconfig/3], Task[netconfig/2] Stopping the deployment process!

[root@nailgun ~]# shotgun2 short-report
cat /etc/fuel_build_id:
 495
cat /etc/fuel_build_number:
 495
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 python-fuelclient-9.0.0-1.mos356.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8861.noarch
 rubygem-astute-9.0.0-1.mos774.noarch
 fuelmenu-9.0.0-1.mos275.noarch
 python-packetary-9.0.0-1.mos151.noarch
 fuel-bootstrap-cli-9.0.0-1.mos291.noarch
 fuel-setup-9.0.0-1.mos6357.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-nailgun-9.0.0-1.mos8861.noarch
 fuel-agent-9.0.0-1.mos291.noarch
 fuel-library9.0-9.0.0-1.mos8605.noarch
 fuel-mirror-9.0.0-1.mos151.noarch
 fuel-ostf-9.0.0-1.mos946.noarch
 nailgun-mcagents-9.0.0-1.mos774.noarch
 fuel-ui-9.0.0-1.mos2814.noarch
 fuel-utils-9.0.0-1.mos8605.noarch
 network-checker-9.0.0-1.mos77.x86_64
 fuel-migrate-9.0.0-1.mos8605.noarch
 fuel-notify-9.0.0-1.mos8605.noarch
 fuel-release-9.0.0-1.mos6357.noarch
 fuel-misc-9.0.0-1.mos8605.noarch
 fuel-9.0.0-1.mos6357.noarch
 fuel-openstack-metadata-9.0.0-1.mos8861.noarch

[root@nailgun ~]# fuel task | tr -s '[:space:]'
id | status | name | cluster | progress | uuid
---+--------+------------------------------------+---------+----------+-------------------------------------
1 | ready | dump | | 100 | e54706a3-dac2-4bc9-b860-61a45f28bc4a
9 | ready | check_repo_availability | 1 | 100 | 92dc0b2a-c1c9-4153-8140-9ea22b2d7897
7 | ready | verify_networks | 1 | 100 | 18484e0e-190b-4aec-a464-fb59e7db58ca
8 | ready | check_dhcp | 1 | 100 | 1f772db9-65a9-4622-8c32-4f92cda9840a
10 | ready | check_repo_availability_with_setup | 1 | 100 | 58d9ac77-7ab5-4734-90a2-fe9209a66947
20 | ready | node_deletion | 1 | 100 | c92bcbff-7bc6-4c68-86f7-fd44fcc53440
11 | ready | deploy | 1 | 100 | 93fe7dc2-51ae-47c9-8a99-a7f2483569b2
15 | ready | deployment | 1 | 100 | 7e5babcb-278d-4409-a531-27a96712b7f9
14 | ready | provision | 1 | 100 | 3f2c8e8f-f981-46f6-8409-84f0580f518c
17 | error | deploy | 1 | 100 | f904b6fa-9d9c-4225-a3c8-5c38dd167c2d
21 | error | deployment | 1 | 100 | ad9636b8-f70e-47b6-b3b7-a9a97e7e5352
22 | ready | check_networks | 1 | 100 | 3482f03f-1d33-4c60-833a-ceefdb297374
26 | error | deployment | 1 | 100 | b3402736-937c-4476-906e-8f7e3bb4a0cc
23 | error | deploy | 1 | 100 | c4fc9ad5-a2ea-41a8-bc62-158fbd48c2e5

[root@nailgun ~]# fuel2 task history show 21 | grep error | tr -s '[:space:]'
| netconfig | 2 | error | 2016-10-06T15:01:09.768859 | 2016-10-06T15:07:50.984157 |
| netconfig | 1 | error | 2016-10-06T15:01:12.200603 | 2016-10-06T15:08:01.563714 |
| netconfig | 3 | error | 2016-10-06T15:01...
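
The failed entries in the `fuel task` listing above can also be picked out programmatically instead of by eye. This is only a sketch that parses the pipe-separated text as it appears in this comment; `failed_tasks` is a hypothetical helper, not a Fuel client API.

```python
from typing import List, Tuple


def failed_tasks(table: str) -> List[Tuple[int, str]]:
    """Return (id, name) for every row whose status column is 'error'."""
    failed = []
    for line in table.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip the header, the ---+--- separator, and malformed rows.
        if len(cells) < 3 or not cells[0].isdigit():
            continue
        if cells[1] == "error":
            failed.append((int(cells[0]), cells[2]))
    return failed


sample = """\
id | status | name | cluster | progress | uuid
---+--------+------------------------------------+---------+----------+-------------------------------------
15 | ready | deployment | 1 | 100 | 7e5babcb-278d-4409-a531-27a96712b7f9
17 | error | deploy | 1 | 100 | f904b6fa-9d9c-4225-a3c8-5c38dd167c2d
21 | error | deployment | 1 | 100 | ad9636b8-f70e-47b6-b3b7-a9a97e7e5352
"""
print(failed_tasks(sample))  # [(17, 'deploy'), (21, 'deployment')]
```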


Oleksandr (oivashchenko) wrote :

UI logs:

2016-10-06 12:41:20 ERROR [7fdded724880] (base) Unexpected exception occured
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nailgun/api/v1/handlers/base.py", line 297, in handle_errors
    return func(cls, *args, **kwargs)
  File "<string>", line 2, in GET
  File "/usr/lib/python2.7/site-packages/nailgun/api/v1/handlers/base.py", line 355, in validate
    return func(cls, *args, **kwargs)
  File "<string>", line 2, in GET
  File "/usr/lib/python2.7/site-packages/nailgun/api/v1/handlers/base.py", line 381, in serialize
    resp = func(cls, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nailgun/api/v1/handlers/base.py", line 462, in GET
    return self.collection.to_list(q)
  File "/usr/lib/python2.7/site-packages/nailgun/objects/base.py", line 424, in to_list
    use_iterable
TypeError: argument 2 to map() must support iteration
[pid: 20677|app: 0|req: 8/35] 10.109.0.1 () {50 vars in 850 bytes} [Thu Oct 6 12:41:20 2016] GET /api/notifications?_=1475756327171 => generated 39 bytes in 3 msecs (HTTP/1.1 500) 5 headers in 223 bytes (2 switches on core 0)
[pid: 20680|app: 0|req: 15/36] 10.109.0.1 () {50 vars in 868 bytes} [Thu Oct 6 12:41:20 2016] GET /api/nodes/allocation/stats?_=1475756327170 => generated 30 bytes in 18 msecs (HTTP/1.1 200) 4 headers in 185 bytes (2 switches on core 0)
SIGINT/SIGQUIT received...killing workers...
...brutally killing workers...
mule 1 (pid: 20681) annihilated
worker 1 buried after 1 seconds
worker 2 buried after 1 seconds
worker 3 buried after 1 seconds
worker 4 buried after 1 seconds
goodbye to uWSGI.
*** Starting uWSGI 2.0.12 (64bit) on [Thu Oct 6 12:41:29 2016] ***
compiled with version: 4.8.3 20140911 (Red Hat 4.8.3-9) on 29 April 2016 14:51:44
os: Linux-3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016
nodename: nailgun.test.domain.local
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 2
current working directory: /
writing pidfile to /var/run/nailgun.pid
detected binary path: /usr/sbin/uwsgi
your processes number limit is 11302
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to TCP address :8001 fd 3
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
Python version: 2.7.5 (default, Nov 20 2015, 02:00:19) [GCC 4.8.5 20150623 (Red Hat 4.8.5-4)]
Set PythonHome to /usr
Python main interpreter initialized at 0xe824d0
python threads support enabled
your server socket listen backlog is limited to 4096 connections
your mercy for graceful operations on workers is 60 seconds
mapped 589260 bytes (575 KB) for 4 cores
*** Operational MODE: preforking ***
added /usr/lib/python2.7/site-packages/nailgun/ to pythonpath.
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 29547)
spawned uWSGI worker 1 (pid: 29555, cores: 1)
spawned uWSGI worker 2 (pid: 29556, cores: 1)
spawned uWSGI worker 3 (pid: 29557, cores: 1)
spawned uWSGI worker 4 (pid: 29558, cores: 1)...


tags: removed: on-verification
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-web 10.0.0rc1

This issue was fixed in the openstack/fuel-web 10.0.0rc1 release candidate.

Dmitry Pyzhov (dpyzhov) wrote :

Marking as customer-found because of this comment: https://bugs.launchpad.net/fuel/+bug/1396126/comments/31

tags: added: customer-found
Anton Chevychalov (achevychalov) wrote :

Moved to Fix Committed for stable/mitaka because the commit is already merged.

Nastya Urlapova (aurlapova) wrote :

Moved to Confirmed because of comment #12; it looks like the issue wasn't fixed in stable/mitaka.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-web 10.0.0

This issue was fixed in the openstack/fuel-web 10.0.0 release.

Vladimir Kuklin (vkuklin) wrote :

Hi, Oleksandr

It seems that your deployment fails because the repositories are inaccessible. Please provide the diagnostic snapshot so that we can check whether the public VIP ping check was disabled properly.

tags: added: on-verification
Vladimir Khlyunev (vkhlyunev) wrote :

Verified on 822
Steps:
- modify /usr/share/fuel-mirror/ubuntu.yaml with correct snapshot and proposed repo
- run fuel-mirror
- modify public network settings: use a genuinely unavailable gateway (10.109.0.40 for me); don't forget to point the DNS and NTP servers to the Fuel master node
- check "public gateway not available" flag
- deploy cluster
- run ostf (network connectivity test should fail due to unavailable external network)

Ekaterina Shutova (eshutova) wrote :

Fix verified on 9.2 snapshot #798. The step with the gateway ping check passed.

tags: removed: on-verification