tricky network configurations require user intervention

Bug #1477980 reported by Sergey Kolekonov
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Invalid | High | Alexander Evseev |

Bug Description

fuelmenu is shown even if the showmenu option was set to 'no' on first boot via kernel options. The issue has been present since ISO 83.
However, according to the tests, at least the master node was deployed successfully: http://jenkins-product.srt.mirantis.net:8080/job/7.0.ubuntu.smoke_neutron/75/console

Steps to reproduce:

1. Create a VM with 2 NICs, the 2nd NIC should be plugged into the default libvirt network.
2. Boot the VM from Fuel ISO.
3. Wait until anaconda completes, and create the following file in the VM:

cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<-EOF
DEVICE=eth1
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=dhcp
PEERDNS=no
EOF

4. Continue the deployment (reboot to the newly installed CentOS)

Actual result: Fuel menu pops up
Expected result: deployment proceeds without user intervention

Changed in fuel:
status: New → Confirmed
importance: Undecided → High
milestone: none → 7.0
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

This test has showmenu=no, and fuelmenu was run in non-interactive mode:
fuelmenu --save-only --iface eth0

The log already shows that this bug is invalid.

Changed in fuel:
status: Confirmed → Invalid
assignee: Fuel Library Team (fuel-library) → Matthew Mosesohn (raytrac3r)
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

This was introduced in https://review.openstack.org/#/c/204661/ with the Ubuntu bootstrap feature. Assigning to Alexey Sheplyakov.

Changed in fuel:
assignee: Matthew Mosesohn (raytrac3r) → Alexei Sheplyakov (asheplyakov)
Revision history for this message
Sergey Kolekonov (skolekonov) wrote :

Based on the previous comment, I think we should move this bug back to the Confirmed state.

Changed in fuel:
status: Invalid → Confirmed
Revision history for this message
Sergey Kolekonov (skolekonov) wrote :

Additional information: I initially tried to deploy the ISO on a KVM VM. It has two interfaces (eth0 and eth1); eth1 has Internet access (its address is assigned via DHCP) and is configured after the CentOS installation. So by the time the menu appeared, the VM had Internet access.

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

Works as designed. If the Ubuntu and MOS mirrors are not accessible during the master node
deployment, fuelmenu is supposed to pop up no matter what the showmenu value is.
See the `Other end user impact' section of the specification [1]

[1] https://github.com/stackforge/fuel-specs/blob/master/specs/7.0/fuel-bootstrap-on-ubuntu.rst
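
The rule described in this comment can be illustrated with a short sketch. It is not the actual fuelmenu or bootstrap_admin_node.sh code; the function name `should_show_menu` and its parameters are hypothetical and only model the behavior as stated above.

```python
def should_show_menu(showmenu, mirrors_reachable):
    """Return True if the menu would be displayed.

    Hypothetical helper: models the rule quoted above, where unreachable
    mirrors force the menu regardless of the showmenu kernel option.
    """
    if not mirrors_reachable:
        # Mirrors are unreachable: the user gets a chance to fix the setup.
        return True
    # Otherwise the kernel option decides (assumed values: 'yes' / 'no').
    return showmenu.strip().lower() == "yes"
```

Under this model, showmenu=no only suppresses the menu when the mirror check succeeds, which matches the behavior reported in this bug.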

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

The ISO[1] with Ubuntu bootstrap patch [2] (which introduced this change in the behavior) has successfully passed BVTs:

http://jenkins-product.srt.mirantis.net:8080/view/7.0/job/7.0.ubuntu.bvt_2/79

[1] http://jenkins-product.srt.mirantis.net:8080/view/custom_iso/job/custom_7.0_iso/622
[2] https://review.openstack.org/204661

Perhaps there's something wrong with OP's network settings. Marking as Incomplete.

Changed in fuel:
status: Confirmed → Incomplete
assignee: Alexei Sheplyakov (asheplyakov) → Sergey Kolekonov (skolekonov)
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

@Sergey,

could you please log in to the master node when fuelmenu pops up, run the

url_access_check check http://archive.ubuntu.com/ubuntu/dists/trusty/Release

command and post the output here?

Revision history for this message
Sergey Kolekonov (skolekonov) wrote :

There's no such command, but if you're asking about access to this URL, it's ok

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

Apparently there's a race between the network configuration and deployment itself.
The deployment script (/usr/local/sbin/bootstrap_admin_node.sh) gets started very early in the boot sequence.
In particular, it does not depend on "network is up" (which is kind of ill-defined on its own).
Thus we might be checking for mirror availability before the network interface providing Internet connectivity
gets fully configured (i.e. before the DHCP server has had a chance to assign the IP).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-main (master)

Fix proposed to branch: master
Review: https://review.openstack.org/205921

Changed in fuel:
assignee: Sergey Kolekonov (skolekonov) → Alexei Sheplyakov (asheplyakov)
status: Incomplete → In Progress
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote : Re: fuelmenu is not skipped even if showmenu is set to 'no'

@Sergey,

Could you please check if the problem reproduces with this custom ISO:

http://jenkins-product.srt.mirantis.net:8080/view/custom_iso/job/custom_7.0_iso/660

Changed in fuel:
status: In Progress → Incomplete
assignee: Alexei Sheplyakov (asheplyakov) → Sergey Kolekonov (skolekonov)
Revision history for this message
Alexander Evseev (aevseev) wrote :

I just tested custom ISO #660: same issue. There were several attempts to access archive.ubuntu.com (see screenshot), and then the same menu appeared again.

The url access checker log contains a success message:

[root@fuel ~]# tail /var/log/url_access_checker.log
2015-07-27 11:35:59 INFO (connectionpool) Starting new HTTP connection (1): archive.ubuntu.com
2015-07-27 11:35:59 DEBUG (connectionpool) Setting read timeout to None
2015-07-27 11:35:59 DEBUG (connectionpool) "GET /ubuntu/dists/trusty/Release HTTP/1.1" 200 58512
2015-07-27 11:36:14 INFO (commands) Starting url access check for ['http://zarchive.ubuntu.com/ubuntu/dists/trusty/Release']
2015-07-27 11:36:14 INFO (connectionpool) Starting new HTTP connection (1): zarchive.ubuntu.com
2015-07-27 11:36:14 ERROR (app) {"failed_urls": ["http://zarchive.ubuntu.com/ubuntu/dists/trusty/Release"]}
2015-07-27 11:36:26 INFO (commands) Starting url access check for ['http://archive.ubuntu.com/ubuntu/dists/trusty/Release']
2015-07-27 11:36:26 INFO (connectionpool) Starting new HTTP connection (1): archive.ubuntu.com
2015-07-27 11:36:26 DEBUG (connectionpool) Setting read timeout to None
2015-07-27 11:36:26 DEBUG (connectionpool) "GET /ubuntu/dists/trusty/Release HTTP/1.1" 200 58512

Running urlaccesscheck manually also succeeds:

# urlaccesscheck -v check http://archive.ubuntu.com/ubuntu/dists/trusty/Release; echo $?
Starting url access check for ['http://archive.ubuntu.com/ubuntu/dists/trusty/Release']
Starting new HTTP connection (1): archive.ubuntu.com
0
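
When reading a log like the one above, it can help to extract only the failed checks together with their timestamps. The following is a minimal sketch that assumes the log line layout shown in this comment (timestamp, level, logger name in parentheses, then a JSON payload with a `failed_urls` key); the function name `failed_checks` is made up for illustration.

```python
import json
import re

# Matches lines such as:
# 2015-07-27 11:36:14 ERROR (app) {"failed_urls": ["http://..."]}
LINE_RE = re.compile(r"^(\S+ \S+) ERROR \(app\) (\{.*\})\s*$")


def failed_checks(log_text):
    """Return (timestamp, url) pairs for every failed URL found in the log text."""
    results = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        timestamp, payload = match.groups()
        try:
            data = json.loads(payload)
        except ValueError:
            continue  # payload is not valid JSON, skip the line
        for url in data.get("failed_urls", []):
            results.append((timestamp, url))
    return results
```

Applied to the excerpt above, this would list only the deliberately misspelled zarchive.ubuntu.com check, which makes it easier to see that the requests to archive.ubuntu.com in this excerpt succeeded.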

description: updated
summary: - fuelmenu is not skipped even if showmenu is set to 'no'
+ tricky network configurations require user intervention
description: updated
Changed in fuel:
assignee: Sergey Kolekonov (skolekonov) → Alexei Sheplyakov (asheplyakov)
status: Incomplete → In Progress
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> Apparently there's a race between the network configuration and deployment itself.

Only if the DHCP server is really slow and unable to reply within 2 minutes.

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=dhcp
PEERDNS=no

The root cause of the problem is that the network configuration scripts are told to ignore the DNS server settings
supplied by the DHCP server. As a result, DNS does not work properly and both the Ubuntu and MOS APT repositories
are inaccessible, so fuel menu pops up to give the user a chance to fix the configuration (specify a DNS server,
a different Ubuntu mirror, a proxy server, etc).
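
A configuration like this can be detected before rebooting. The sketch below parses an ifcfg-style file and flags an interface that uses DHCP while PEERDNS=no is set, which means /etc/resolv.conf has to be populated some other way. The function names `parse_ifcfg` and `ignores_dhcp_dns` are hypothetical, and the parser handles only plain KEY=VALUE lines, not full shell syntax.

```python
def parse_ifcfg(text):
    """Parse simple KEY=VALUE lines of an ifcfg-style file into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def ignores_dhcp_dns(settings):
    """True if the interface uses DHCP but discards the DNS servers it offers."""
    uses_dhcp = settings.get("BOOTPROTO", "").lower() == "dhcp"
    peerdns_off = settings.get("PEERDNS", "yes").lower() == "no"
    return uses_dhcp and peerdns_off
```

For the ifcfg-eth1 file from the steps to reproduce, `ignores_dhcp_dns(parse_ifcfg(text))` returns True, so a working resolver must already be configured elsewhere for the mirror check to pass.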

Changed in fuel:
status: In Progress → Invalid
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

As explained in comment #14 this is a configuration issue. Marking as Invalid.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-main (master)

Change abandoned by Alexei Sheplyakov (<email address hidden>) on branch: master
Review: https://review.openstack.org/205921
Reason: The patch is bogus: network setup scripts already get started before fuel menu, even without the patch. Bug #1477980 is caused by a configuration problem (invalid DNS configuration).

Revision history for this message
Alexander Evseev (aevseev) wrote :

I tested with static network configuration:

# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
NM_CONTROLLED=no
HWADDR=52:54:00:7F:57:72
USERCTL=no
PEERDNS=yes
BOOTPROTO=static
IPADDR=10.20.0.2
NETMASK=255.255.255.0

# cat /etc/resolv.conf
nameserver 8.8.8.8

# ip ro
10.20.0.0/24 dev eth0 proto kernel scope link src 10.20.0.2
169.254.0.0/16 dev eth0 scope link metric 1002
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.42.1
default via 10.20.0.1 dev eth0

Default gateway 10.20.0.1 - is the host where Fuel VM resides.

Changed in fuel:
status: Invalid → Confirmed
Revision history for this message
Alexander Evseev (aevseev) wrote :

In addition to the previous comment:

# cat /var/log/url_access_checker.log
2015-07-28 08:30:29 INFO (commands) Starting url access check for ['http://archive.ubuntu.com/ubuntu/dists/trusty/Release']
2015-07-28 08:30:29 INFO (connectionpool) Starting new HTTP connection (1): archive.ubuntu.com
2015-07-28 08:30:29 ERROR (app) {"failed_urls": ["http://archive.ubuntu.com/ubuntu/dists/trusty/Release"]}
2015-07-28 08:31:07 INFO (commands) Starting url access check for ['http://archive.ubuntu.com/']
2015-07-28 08:31:07 INFO (connectionpool) Starting new HTTP connection (1): archive.ubuntu.com
2015-07-28 08:31:07 DEBUG (connectionpool) Setting read timeout to None
2015-07-28 08:31:07 DEBUG (connectionpool) "GET / HTTP/1.1" 200 383

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> 2015-07-28 08:30:29 ERROR (app) {"failed_urls": ["http://archive.ubuntu.com/ubuntu/dists/trusty/Release"]}

Works as designed. That is, if the default Ubuntu mirror is not reachable, the user is prompted to configure the network.

> Default gateway 10.20.0.1 - is the host where Fuel VM resides.

This doesn't mean the Internet is reachable, DNS is working, etc.

Please redeploy the master node from scratch and run the following commands while fuel
menu is active (i.e. choose 'Shell Login' in the menu, or log in via ssh, etc)

host archive.ubuntu.com
tracepath archive.ubuntu.com

and post the output here.
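
The point that a reachable gateway does not imply working DNS or HTTP access can be made concrete by testing the two stages separately. This is a rough sketch using only the Python standard library, not the url_access_checker implementation; the function name `diagnose` and its `resolve`/`fetch` parameters are assumptions introduced so that each stage can be replaced in a test.

```python
import socket
import urllib.request
from urllib.parse import urlparse


def diagnose(url, resolve=socket.getaddrinfo, fetch=urllib.request.urlopen, timeout=10):
    """Report which stage fails first: name resolution or the HTTP request."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        # Stage 1: DNS. socket.gaierror is a subclass of OSError.
        resolve(parsed.hostname, port)
    except OSError as exc:
        return "dns_failed: {}".format(exc)
    try:
        # Stage 2: HTTP. urllib's URLError and HTTPError are OSError subclasses.
        fetch(url, timeout=timeout).close()
    except OSError as exc:
        return "http_failed: {}".format(exc)
    return "ok"
```

A result starting with `dns_failed` points at resolv.conf or the name server, while `http_failed` points at routing, a proxy, or the mirror itself.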

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

The problem might be caused by a configuration issue. Marking as Incomplete.

Changed in fuel:
status: Confirmed → Incomplete
assignee: Alexei Sheplyakov (asheplyakov) → Alexander Evseev (aevseev-h)
Revision history for this message
Alexander Evseev (aevseev) wrote :

Which info do you need? Right now I have a Fuel installation that has stopped at this menu.

Revision history for this message
Alexander Evseev (aevseev) wrote :

Redeployed from scratch. The installation stopped at the fuel menu.

To be clear, all info in one comment:

# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
NM_CONTROLLED=no
HWADDR=52:54:00:79:B4:69
USERCTL=no
PEERDNS=yes
BOOTPROTO=static
IPADDR=10.20.0.2
NETMASK=255.255.255.0

# cat /etc/resolv.conf
nameserver 8.8.8.8

# ip ro
10.20.0.0/24 dev eth0 proto kernel scope link src 10.20.0.2
169.254.0.0/16 dev eth0 scope link metric 1002
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.42.1
default via 10.20.0.1 dev eth0

# host archive.ubuntu.com
archive.ubuntu.com has address 91.189.91.23
archive.ubuntu.com has address 91.189.91.24
archive.ubuntu.com has address 91.189.91.14
archive.ubuntu.com has address 91.189.92.200
archive.ubuntu.com has address 91.189.91.15
archive.ubuntu.com has address 91.189.91.13
archive.ubuntu.com has address 91.189.92.201
archive.ubuntu.com has IPv6 address 2001:67c:1360:8c01::18
archive.ubuntu.com has IPv6 address 2001:67c:1360:8c01::19

# tracepath archive.ubuntu.com
 1?: [LOCALHOST] pmtu 1500
 1: 10.20.0.1 (10.20.0.1) 0.299ms
 1: 10.20.0.1 (10.20.0.1) 0.212ms
 2: 172.16.48.253 (172.16.48.253) 0.394ms
 3: cz-eth0205-gw-86.host-telecom.com (193.161.86.1) 0.844ms
 4: cz-sitel-ves.host-telecom.com (185.8.57.6) 188.394ms
 5: gi0-0-0-4.agr11.prg01.atlas.cogentco.com (149.6.24.13) 4.767ms
 6: be2649.ccr21.prg01.atlas.cogentco.com (154.54.38.137) 4.985ms
 7: be2078.ccr42.ham01.atlas.cogentco.com (130.117.0.165) 16.581ms
 8: be2187.ccr42.ams03.atlas.cogentco.com (154.54.74.125) 23.232ms
 9: be2183.ccr22.lpl01.atlas.cogentco.com (154.54.58.69) 31.692ms
10: be2387.ccr22.bos01.atlas.cogentco.com (154.54.44.165) 97.724ms
11: 38.104.186.42 (38.104.186.42) 110.082ms
12: economy.canonical.com (91.189.91.23) 99.603ms reached
     Resume: pmtu 1500 hops 12 back 50

# curl http://archive.ubuntu.com/ubuntu/dists/trusty/Release
Origin: Ubuntu
Label: Ubuntu
Suite: trusty
Version: 14.04
Codename: trusty
Date: Thu, 08 May 2014 14:19:09 UTC
Architectures: amd64 arm64 armhf i386 powerpc ppc64el
Components: main restricted universe multiverse
Description: Ubuntu Trusty 14.04
MD5Sum:
 ead1cbf42ed119c50bf3aab28b5b6351 8234934 main/binary-amd64/Packages
 52d605b4217be64f461751f233dd9a8f 96 main/binary-amd64/Release
 4c2ecc07c5b3859ee08bd41f788a5a79 1743009 main/binary-amd64/Packages.gz

More info?

Revision history for this message
Alexander Evseev (aevseev) wrote :

At system boot, resolv.conf contains nameserver 10.20.0.1, and at some point (when fuel menu starts?) it is changed to 8.8.8.8. The host 10.20.0.1 doesn't have a DNS server.

So the DNS/network configuration really is incorrect, but the problem is very difficult to diagnose because resolv.conf gets changed.
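
One way to catch this kind of silent change is to save a copy of resolv.conf early in boot and compare the nameserver entries later. The sketch below works on the file contents as strings; the function names `nameservers` and `describe_change` are made up for illustration, and where the snapshots come from is left to the caller.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf text, in order."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        if line.startswith(";"):
            continue  # ';' starts a comment line in resolv.conf
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def describe_change(before_text, after_text):
    """Return a short message if the nameserver list changed, else None."""
    before, after = nameservers(before_text), nameservers(after_text)
    if before == after:
        return None
    return "nameservers changed: {} -> {}".format(before, after)
```

With the values reported in this comment, comparing a boot-time snapshot containing 10.20.0.1 against the later file containing 8.8.8.8 would report the change, which explains why manual checks succeeded after the automatic one had failed.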

Changed in fuel:
status: Incomplete → Invalid