dhcrelay cannot start after master node reboot

Bug #1324152 reported by Aleksey Kasatkin
This bug affects 2 people
Affects              Status          Importance   Assigned to                 Milestone
Fuel for OpenStack   Fix Released    High         Matthew Mosesohn
5.0.x                Fix Committed   High         Fuel Library (Deprecated)

Bug Description

ISO 5.0-26.

After a master node reboot, dhcrelay sometimes fails to start.

/var/log/messages:

May 28 13:09:36 fuel dhcrelay: Error getting hardware address for "docker0": No such device

This prevents slave nodes from booting via PXE.

Workaround:

1. Start dhcrelay:

service dhcrelay start

2. Restart the cobbler container:

docker ps | grep cobbler
docker restart <Cobbler ID>
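
The two workaround steps can be combined into one small script. This is a minimal sketch, assuming the cobbler container is the first `docker ps` row that contains the word "cobbler"; the function names are hypothetical and not part of Fuel.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Fuel.

# Read `docker ps` output on stdin and print the ID (first column)
# of the first row that mentions cobbler.
cobbler_id() {
    awk '/cobbler/ { print $1; exit }'
}

# Apply the workaround: start dhcrelay, then restart the cobbler container.
apply_workaround() {
    service dhcrelay start || return 1
    id=$(docker ps | cobbler_id)
    [ -n "$id" ] || { echo "cobbler container not found" >&2; return 1; }
    docker restart "$id"
}
```

Run `apply_workaround` as root on the master node. Nothing is executed until the function is called.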

I saw this bug on earlier ISOs as well.

Aleksey Kasatkin (alekseyk-ru) wrote :
Matthew Mosesohn (raytrac3r) wrote :

I was unable to reproduce. How are you deploying Fuel? KVM or VBox? Scripts?
Was it a host with a huge load on the server outside of Fuel? (like a shared lab)

Changed in fuel:
status: New → Incomplete
Matthew Mosesohn (raytrac3r) wrote :

Also run dockerctl logs cobbler and pastebin that for me, if you could.

Aleksey Kasatkin (alekseyk-ru) wrote :

It was VBox. I'll save logs when I see it again.

Andrey Danin (gcon-monolake) wrote :

Matthew, can we move dhcrelay to the supervisord management?
In our case, when we run into this issue, we just did service dhcrelay start and all became fine. Do you still need docker logs?

Igor Zinovik (izinovik) wrote :

It seems that I hit this problem today:
[root@fuel ~]# grep docker0 /var/log/messages | grep Error
Jun 3 14:13:27 fuel dhcrelay: Error getting hardware address for "docker0": No such device

FUEL ISO 5.0 made on May 22.

Matthew Mosesohn (raytrac3r) wrote :

Andrey, it is doable. It will conflict slightly with Evgeniy's work, but I will do the necessary steps to ensure dhcrelay stays running.

Igor Zinovik (izinovik) wrote :

Also output of docker logs:
http://paste.openstack.org/show/82758/

Andrew Woodward (xarses) wrote :

This was reproduced by multiple IRC users. It appears to be more common if you change your host IP data with kernel args and fuelmenu.

Changed in fuel:
status: Incomplete → Confirmed
tags: added: customer-found
Andrew (box857+launchpad) wrote :

I am also seeing the same problem.

We replaced our 4.0 test env. with 5.0. I mention this because I know the networking was working fine.

As you can see the pxe client makes the request but does not get a response.

http://pastebin.com/eWMqDkmJ

I rebooted the master and tried again, and xarses pointed me here. I tried the workaround, but it doesn't seem to fix it for me.

http://pastebin.com/d2Kgq1Ja

Attached is the output of dockerctl logs cobbler > /tmp/dockerlogs

Thanks

Andrew (box857+launchpad) wrote :

In addition:

We changed the DHCP interface in /etc/sysconfig/dhcrelay (using vi), but something changes it back.

We also see a lot of
P 10.0.0.11.36490 > fuelpxe03.somecompany.com.amqp:

in the tcpdumps

But I don't know where 10.0.0.11 is.

[root@fuelpxe03 init.d]# dockerctl shell cobbler ip a
40: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 96:c2:ed:6e:1e:c2 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.2/16 scope global eth0
    inet6 fe80::94c2:edff:fe6e:1ec2/64 scope link
       valid_lft forever preferred_lft forever
42: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
[root@fuelpxe03 init.d]#

[root@fuelpxe03 init.d]# ip add | grep 10.0.0.11

Not sure if it's related.
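
Two of the checks above can be made a little more robust. The sketch below assumes that /etc/sysconfig/dhcrelay is a shell-style file with an `INTERFACES=` line (the layout used by the CentOS package, as far as I know); both function names are hypothetical.

```shell
# Print the INTERFACES value from a dhcrelay sysconfig file.
# The path is a parameter so the function can be pointed at a copy.
# The file is sourced in a subshell, so it must be trusted shell syntax.
relay_interfaces() {
    ( INTERFACES=""; . "$1" && printf '%s\n' "$INTERFACES" )
}

# Succeed if the exact IPv4 address $1 appears in the text on stdin.
# -F treats the dots literally; -w rejects longer addresses such as
# 10.0.0.110 when searching for 10.0.0.11.
has_address() {
    grep -qwF -- "$1"
}
```

For example, `relay_interfaces /etc/sysconfig/dhcrelay` shows which interfaces are currently configured, and `ip -o addr | has_address 10.0.0.11 && echo local` reports whether the address belongs to the host. A plain `grep 10.0.0.11` treats each dot as "any character" and also matches longer addresses.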

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/98074

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Matthew Mosesohn (raytrac3r)
status: Triaged → In Progress
Matthew Mosesohn (raytrac3r) wrote :

dhcrelay needs to listen on both eth0 and docker0 (replace eth0 with your PXE network interface) to relay packets from both sides. I submitted a patch that addresses dhcrelay crashing if docker0 does not exist.
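
As a rough illustration of the point above, the sketch below waits for docker0 to appear and then prints a dhcrelay command line with one `-i` flag per interface. It is not the submitted patch. The function names, the `SYSFS_NET` override and the 10.0.0.2 server address (the cobbler container's eth0 address shown in an earlier comment) are assumptions made for illustration.

```shell
# Directory where Linux exposes network interfaces; overridable for testing.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# Succeed if interface $1 exists.
iface_exists() {
    [ -e "$SYSFS_NET/$1" ]
}

# Poll once per second, at most $2 times (default 30), until
# interface $1 appears. Fails if it never shows up.
wait_for_iface() {
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        iface_exists "$1" && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Print a dhcrelay command line: the last argument is the DHCP server,
# every earlier argument is an interface to listen on.
dhcrelay_cmd() {
    cmd="dhcrelay"
    while [ "$#" -gt 1 ]; do
        cmd="$cmd -i $1"
        shift
    done
    printf '%s %s\n' "$cmd" "$1"
}
```

A possible use is `wait_for_iface docker0 && $(dhcrelay_cmd eth0 docker0 10.0.0.2)`, which only starts the relay once the bridge is present.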

summary: - dchrelay cannot start after master node reboot
+ dhcrelay cannot start after master node reboot
Andrew (box857+launchpad) wrote :

Hi,

I applied the patch, rebooted. From what I can gather there is no change.

Please advise what else I need to do.

Thanks,

Matthew Mosesohn (raytrac3r) wrote :

Andrew, do you see dhcrelay_monitor in the supervisorctl status output, and is it running? It calls a bash script that restarts dhcrelay if it is stopped. The patch is here: https://review.openstack.org/#/c/98074/

You can try running the script in debug mode to see what it's doing: bash -x /usr/local/bin/dhcrelay_monitor
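
For readers without the patch at hand, the general idea of such a monitor can be sketched as follows. This is not the contents of /usr/local/bin/dhcrelay_monitor. The `CHECK_CMD` and `START_CMD` variables and the function name are hypothetical, and `pgrep -x dhcrelay` is just one possible liveness check.

```shell
# Hypothetical watchdog sketch; NOT the script from the patch.
# Both commands can be overridden, which also makes the logic testable.
CHECK_CMD=${CHECK_CMD:-"pgrep -x dhcrelay"}
START_CMD=${START_CMD:-"service dhcrelay start"}

# Start dhcrelay if the liveness check fails and report what was done.
ensure_dhcrelay() {
    if $CHECK_CMD >/dev/null 2>&1; then
        echo "running"
    elif $START_CMD >/dev/null 2>&1; then
        echo "restarted"
    else
        echo "failed"
        return 1
    fi
}
```

Under supervisord the function would be called from a foreground loop such as `while :; do ensure_dhcrelay; sleep 5; done`, so that the supervised process itself does not exit.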

Andrew (box857+launchpad) wrote :

Hi,

We applied the patch last week and have had no issues with Fuel since. Everything pxe's great now. Thanks.

[root@fuelpxe02 ~]# supervisorctl status
dhcrelay_monitor BACKOFF Exited too quickly (process log may have details)
docker-astute RUNNING pid 10854, uptime 1 day, 0:48:10
docker-cobbler RUNNING pid 2074, uptime 1 day, 0:49:43
docker-mcollective RUNNING pid 5355, uptime 1 day, 0:48:59
docker-nailgun RUNNING pid 10291, uptime 1 day, 0:48:18
docker-nginx RUNNING pid 1594, uptime 1 day, 0:49:45
docker-ostf RUNNING pid 10500, uptime 1 day, 0:48:12
docker-postgres RUNNING pid 8716, uptime 1 day, 0:48:44
docker-rabbitmq RUNNING pid 1588, uptime 1 day, 0:49:45
docker-rsync RUNNING pid 7908, uptime 1 day, 0:48:48
docker-rsyslog RUNNING pid 1595, uptime 1 day, 0:49:45
[root@fuelpxe02 ~]#
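
In the listing above, dhcrelay_monitor is in the BACKOFF state while every container is RUNNING. A small filter makes such entries easier to spot. This is a sketch that assumes the `name STATE ...` column layout shown here; the function name is hypothetical.

```shell
# Read `supervisorctl status` output on stdin and print the name and
# state of every program whose state is not RUNNING.
not_running() {
    awk 'NF >= 2 && $2 != "RUNNING" { print $1, $2 }'
}
```

Usage: `supervisorctl status | not_running`. No output means every program is RUNNING.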

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/98074
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=acf69bd4c1441458843a59858038d11ad2bbe9cc
Submitter: Jenkins
Branch: master

commit acf69bd4c1441458843a59858038d11ad2bbe9cc
Author: Matthew Mosesohn <email address hidden>
Date: Thu Jun 5 13:53:02 2014 +0400

    Add dhcrelay monitor via supervisor to restart

    Dockerctl can start dhcrelay, but not ensure it
    stays running. Supervisord can handle this task,
    so it should watch the pid of dhcrelay and trigger
    restart if it exits.

    Change-Id: I75909d25418e625a2b154125e9a76e22c7443e47
    Closes-Bug: #1324152

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/5.0)

Fix proposed to branch: stable/5.0
Review: https://review.openstack.org/100739

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/5.0)

Reviewed: https://review.openstack.org/100739
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=e105c1fe424d1461a23ed3a3c0d313ee7df2b0e9
Submitter: Jenkins
Branch: stable/5.0

commit e105c1fe424d1461a23ed3a3c0d313ee7df2b0e9
Author: Matthew Mosesohn <email address hidden>
Date: Thu Jun 5 13:53:02 2014 +0400

    Add dhcrelay monitor via supervisor to restart

    Dockerctl can start dhcrelay, but not ensure it
    stays running. Supervisord can handle this task,
    so it should watch the pid of dhcrelay and trigger
    restart if it exits.

    Change-Id: I75909d25418e625a2b154125e9a76e22c7443e47
    Closes-Bug: #1324152
    (cherry picked from commit acf69bd4c1441458843a59858038d11ad2bbe9cc)

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/5.1.x
tags: added: in progress
Andrey Sledzinskiy (asledzinskiy) wrote :

verified on {

    "build_id": "2014-09-17_21-40-34",
    "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
    "build_number": "11",
    "auth_required": true,
    "api": "1.0",
    "nailgun_sha": "eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d",
    "production": "docker",
    "fuelmain_sha": "8ef433e939425eabd1034c0b70e90bdf888b69fd",
    "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13",
    "feature_groups": [
        "mirantis"
    ],
    "release": "5.1",
    "release_versions": {
        "2014.1.1-5.1": {
            "VERSION": {
                "build_id": "2014-09-17_21-40-34",
                "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
                "build_number": "11",
                "api": "1.0",
                "nailgun_sha": "eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d",
                "production": "docker",
                "fuelmain_sha": "8ef433e939425eabd1034c0b70e90bdf888b69fd",
                "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13",
                "feature_groups": [
                    "mirantis"
                ],
                "release": "5.1",
                "fuellib_sha": "d9b16846e54f76c8ebe7764d2b5b8231d6b25079"
            }
        }
    },
    "fuellib_sha": "d9b16846e54f76c8ebe7764d2b5b8231d6b25079"

}

tags: removed: in progress
Changed in fuel:
status: Fix Committed → Fix Released