Dhc relay check failed in system tests

Bug #1320834 reported by Andrey Sledzinskiy
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Critical
Assigned to: Aleksandr Didenko

Bug Description

http://jenkins-product.srt.mirantis.net:8080/view/0_0_swarm/job/master_fuelmain.system_test.ubuntu.services/57/testReport/junit/%28root%29/prepare_slaves_3/prepare_slaves_3/

The test failed while bootstrapping nodes, at the dhcrelay_check step: "dhcpcheck discover --ifaces eth0 --repeat 3 --timeout 10" returns nothing.

Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 332, in run
    testMethod()
  File "/usr/lib/python2.7/unittest/case.py", line 1044, in runTest
    self._testFunc()
  File "/home/jenkins/venv-nailgun-tests/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/master_fuelmain.system_test.ubuntu.services/fuelweb_test/tests/base_test_case.py", line 96, in prepare_slaves_3
    self.env.bootstrap_nodes(self.env.nodes().slaves[:3])
  File "/home/jenkins/workspace/master_fuelmain.system_test.ubuntu.services/fuelweb_test/models/environment.py", line 115, in bootstrap_nodes
    self.dhcrelay_check()
  File "/home/jenkins/workspace/master_fuelmain.system_test.ubuntu.services/fuelweb_test/models/environment.py", line 406, in dhcrelay_check
    assert_equal(len(master_ip), 1)
  File "/home/jenkins/venv-nailgun-tests/local/lib/python2.7/site-packages/proboscis/asserts.py", line 55, in assert_equal
    raise ASSERTION_ERROR(message)
AssertionError: 0 != 1
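
The assertion at the bottom of the traceback can be illustrated with a minimal sketch. It assumes, without confirmation from the test source, that dhcrelay_check filters the dhcpcheck output for lines mentioning the master node IP and expects exactly one such line. The helper names, the IP 10.20.0.2, and the sample output line are all hypothetical.

```python
def lines_with_ip(output_lines, master_ip):
    """Return the output lines that mention master_ip (plain substring match)."""
    return [line for line in output_lines if master_ip in line]


def check_single_offer(output_lines, master_ip):
    """Raise AssertionError unless exactly one line mentions master_ip.

    Hypothetical stand-in for the assert_equal(len(master_ip), 1) call
    seen in the traceback; the real check may differ.
    """
    matches = lines_with_ip(output_lines, master_ip)
    if len(matches) != 1:
        raise AssertionError("%d != 1" % len(matches))
    return matches[0]


if __name__ == "__main__":
    # Empty output, as reported in this bug, yields zero matches.
    try:
        check_single_offer([], "10.20.0.2")
    except AssertionError as exc:
        print("AssertionError: %s" % exc)
```

Under these assumptions, empty dhcpcheck output produces the same "0 != 1" message as in the report: no DHCP offer from the master node was seen at all.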

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Alexander Didenko (adidenko)
status: New → Triaged
Aleksandr Didenko (adidenko) wrote :

I was able to reproduce it. For some reason puppet sometimes does not try to bring the "httpd" service up before running "cobbler_sync":

Notice: /Stage[main]/Cobbler::Server/Service[httpd]/enable: enable changed 'false' to 'true'
Notice: /Stage[main]/Cobbler::Server/Service[cobblerd]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Cobbler::Server/Service[cobblerd]: Scheduling refresh of Exec[cobbler_sync]
Info: /Stage[main]/Cobbler::Server/Service[cobblerd]: Unscheduling refresh on Service[cobblerd]
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: httpd does not appear to be running and proxying cobbler, or SELinux is in the way. Original traceback:
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: Traceback (most recent call last):
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: File "/usr/lib/python2.6/site-packages/cobbler/cli.py", line 184, in check_setup
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: s.ping()
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: return self.__send(self.__name, args)
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: verbose=self.__verbose
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: File "/usr/lib64/python2.6/xmlrpclib.py", line 1235, in request
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: self.send_content(h, request_body)
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: File "/usr/lib64/python2.6/xmlrpclib.py", line 1349, in send_content
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: connection.endheaders()
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: self._send_output()
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: self.send(msg)
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: File "/usr/lib64/python2.6/httplib.py", line 739, in send
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: self.connect()
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: File "/usr/lib64/python2.6/httplib.py", line 720, in connect
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: self.timeout)
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: raise error, msg
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: error: [Errno 111] Connection refused
Error: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]: Failed to call refresh: cob...


OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-main (master)

Fix proposed to branch: master
Review: https://review.openstack.org/94169

Changed in fuel:
status: Triaged → In Progress
Dmitry Ilyin (idv1985) wrote :

I don't see how these services could be related to dnsmasq+dhcprelay.

Also, for me dhcpcheck did not work when given --repeat 3 --timeout 10; try running it without these options.

Puppet should restart services if it has modified their configuration, provided there is a notification, and I hope there is one. And I don't see any reason why there are these start and stop commands inside this script.

Aleksandr Didenko (adidenko) wrote :

> I don't see how these services could be related to dnsmasq+dhcprelay.

This is how they are related:

Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: httpd does not appear to be running and proxying cobbler, or SELinux is in the way.

This breaks the cobbler container configuration.

> Also, for me dhcpcheck did not work when given --repeat 3 --timeout 10; try running it without these options.

It works fine with these options when the cobbler container is configured correctly.

> Puppet should restart services if it has modified their configuration, provided there is a notification, and I hope there is one.

This is an example of correct behavior when puppet brings httpd up (ok.log in the attachment):

Notice: /Stage[main]/Cobbler::Server/Service[httpd]/ensure: ensure changed 'stopped' to 'running'

But on the problem Fuel VM the docker-cobbler logs contain no such notice, only one about enabling the service (fail.log in the attachment):

Notice: /Stage[main]/Cobbler::Server/Service[httpd]/enable: enable changed 'false' to 'true'

This means puppet did not start the httpd service, although it is explicitly declared in the manifests (requirements and "ensure => running").

Please check the logs in the attachment. I agree that the bug is pretty odd and intermittent; I managed to reproduce it only a few times.

The only scenario that comes to my mind is the following:

1) /var/run/httpd/httpd.pid contains some pid (for example 11)
2) During puppet run we have some process with the same PID (11) currently running.
3) Then "/etc/init.d/httpd status" (and "service httpd status") would return:
httpd (pid 11) is running...
Although it's not httpd but some other process with the same pid as in /var/run/httpd/httpd.pid
4) Puppet will not try to bring httpd up, because it thinks httpd is already running.

In this scenario https://review.openstack.org/#/c/94169/ will help, as it makes sure no "dead" pids are left behind.
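
The stale pid scenario above can be sketched in Python. This is only an illustration under stated assumptions, not the actual init script logic: naive_is_running mimics a status check that merely tests whether the recorded PID exists, while strict_is_running additionally compares the process name from /proc/<pid>/comm (Linux only). Both function names are hypothetical.

```python
import errno
import os
import tempfile


def pid_from_file(pidfile):
    """Return the integer PID stored in pidfile, or None if missing or invalid."""
    try:
        with open(pidfile) as f:
            return int(f.read().strip())
    except (IOError, OSError, ValueError):
        return None


def naive_is_running(pidfile):
    """Report 'running' if ANY process currently has the recorded PID."""
    pid = pid_from_file(pidfile)
    if pid is None:
        return False
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the PID
    except OSError as exc:
        # EPERM means the process exists but belongs to another user.
        return exc.errno == errno.EPERM
    return True


def strict_is_running(pidfile, expected_name):
    """Report 'running' only if the PID belongs to a process named expected_name."""
    pid = pid_from_file(pidfile)
    if pid is None:
        return False
    try:
        with open("/proc/%d/comm" % pid) as f:
            return f.read().strip() == expected_name
    except (IOError, OSError):
        return False


if __name__ == "__main__":
    # Simulate a stale pid file that points at an unrelated live process (this one).
    with tempfile.NamedTemporaryFile("w", suffix=".pid", delete=False) as tmp:
        tmp.write(str(os.getpid()))
    print("naive: ", naive_is_running(tmp.name))
    print("strict:", strict_is_running(tmp.name, "httpd"))
    os.unlink(tmp.name)
```

With a pid file pointing at an unrelated live process, the naive check reports the service as running, which matches step 4 of the scenario. Removing leftover pid files before the puppet run, as the proposed fix does, avoids the problem without changing the status check itself.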

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-main (master)

Reviewed: https://review.openstack.org/94169
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=6da775927f97c9d01ac466fd6cf7a44c9ada8ca2
Submitter: Jenkins
Branch: master

commit 6da775927f97c9d01ac466fd6cf7a44c9ada8ca2
Author: Aleksandr Didenko <email address hidden>
Date: Mon May 19 14:04:44 2014 +0300

    Update cobbler container start script

    Stop httpd and xinetd services before puppet run in order to make
    sure no pids left. Puppet will configure and start needed services
    itself according to manifests.

    Closes-bug: #1320834
    Change-Id: I3a338a13a94512676828b006521672a815e7b922

Changed in fuel:
status: In Progress → Fix Committed
Igor Shishkin (teran) wrote :
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/94232
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=7f0cd165e780149351e41324ce844d4f4a4fefa2
Submitter: Jenkins
Branch: master

commit 7f0cd165e780149351e41324ce844d4f4a4fefa2
Author: Igor Shishkin <email address hidden>
Date: Mon May 19 20:51:26 2014 +0400

    Fix for dhcp missed inside admin net (plus a little refactoring)

    Closes-Bug: #1320834

    Change-Id: I3137377c7e4386e03d5803d60966ddf59ed05f07

Changed in fuel:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/94162
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=5b713f01229ccb78d68aa614d68535f5b4ddf7d8
Submitter: Jenkins
Branch: master

commit 5b713f01229ccb78d68aa614d68535f5b4ddf7d8
Author: Dima Shulyak <email address hidden>
Date: Mon May 19 13:21:54 2014 +0300

    Add repeat option to dhcp check

    repeat option was disabled when there was no tests for dhcp checker

    Repeat logic:

    send discover -> wait -> send -> wait

    Keep in mind that it does not stop when first dhcp offer is received,
    so for example if 3 repeat with timeout 10 will be set - it will run atleast for 30 seconds

    It does not affect current deployment because of usage repeat=1
    in net_probe.rb

    Change-Id: I9deec4d9bf9ef648518c678218e2460d3170bec3
    Related-Bug: #1320834
    Related-Bug: #1317525
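
The repeat logic described in this commit message can be sketched as follows. This is a minimal illustration, not the dhcp checker's real code: run_discover and its send/wait callbacks are hypothetical names, and the callbacks are injected so the loop can be exercised without touching the network.

```python
def run_discover(send, wait, repeat=3, timeout=10):
    """Send `repeat` discover packets, waiting `timeout` seconds after each.

    send    -- callable that emits one DHCP discover (hypothetical)
    wait    -- callable taking a timeout and returning the offers seen
               during that window (hypothetical)
    Returns (offers, waited_seconds). The loop does not stop on the first
    offer, so the total wait is always repeat * timeout.
    """
    offers = []
    waited = 0
    for _ in range(repeat):
        send()
        offers.extend(wait(timeout))
        waited += timeout
    return offers, waited


if __name__ == "__main__":
    sent = []
    offers, waited = run_discover(
        send=lambda: sent.append("discover"),
        wait=lambda timeout: ["offer"],  # pretend one offer arrives per window
        repeat=3,
        timeout=10,
    )
    print(len(sent), len(offers), waited)
```

Because every window is waited out in full, three repeats with a timeout of 10 add up to 30 seconds of waiting, and the same server may be reported once per window, so a caller that expects a single answer would need to deduplicate.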
