Dnsmasq can't start during the defined timeout

Bug #1624245 reported by Alexander Rubtsov
Affects             Status   Importance  Assigned to    Milestone
Fuel for OpenStack  Invalid  High        Denis Puchkin

Bug Description

Sometimes the p_dns resource fails to start on a Controller node within the defined timeout

--- Environment ---
MOS: 7.0
Mode: HA with 3 Controllers

--- Steps to reproduce ---
The issue occurs only under certain circumstances, which I was not able to isolate:
pcs resource disable p_dns
pcs resource enable p_dns

--- Actual result ---
# pcs status
.....
 Clone Set: clone_p_dns [p_dns]
     Started: [ cic-1.domain.tld cic-2.domain.tld ]
     Stopped: [ cic-3.domain.tld ]
.....
Failed actions:
    p_dns_start_0 on cic-3.domain.tld 'unknown error' (1): call=7168, status=Timed Out, last-rc-change='Mon Sep 12 15:26:34 2016', queued=0ms, exec=30001ms

# less daemon.log (on a Controller where p_dns was unable to start)
.....
2016-09-12T15:33:05.983929+02:00 cic-3 pacemaker_remoted[4879]: warning: child_timeout_callback: p_dns_start_0 process (PID 23221) timed out
2016-09-12T15:33:05.984243+02:00 cic-3 pacemaker_remoted[4879]: warning: operation_finished: p_dns_start_0:23221 - timed out after 30000ms
2016-09-12T15:33:06.166705+02:00 cic-3 ocf-ns_dns: INFO: dnsmasq daemon is not running
2016-09-12T15:33:06.170053+02:00 cic-3 ocf-ns_dns: INFO: Stopped dnsmasq daemon.
.....
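
The timeout can be picked out of a long daemon.log with a small filter. The following is only a sketch: the function name timed_out_ops is hypothetical, and it assumes the pacemaker_remoted "operation_finished ... timed out after Nms" line format shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on standard input and print
# "<operation> <timeout in ms>" for every operation that timed out.
# Assumes the "operation_finished: <op>:<pid> - timed out after <N>ms"
# format seen in the daemon.log excerpt of this report.
timed_out_ops() {
    sed -n 's/.*operation_finished: \([^:]*\):[0-9]* - timed out after \([0-9]*\)ms.*/\1 \2/p'
}
```

Usage: `timed_out_ops < daemon.log`. For the excerpt above it prints `p_dns_start_0 30000`, which matches the exec=30001ms reported under "Failed actions".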

--- Expected result ---
With the timeout value increased, p_dns starts successfully:
# crm configure edit
.....
primitive p_dns ocf:fuel:ns_dns \
        op monitor interval=20 timeout=90 \
        op start interval=0 timeout=90 \
        op stop interval=0 timeout=90 \
.....
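
The same start timeout could probably also be set without opening an editor. The following is a sketch, not what was run in this report: set_start_timeout is a hypothetical helper name, and it assumes that `pcs resource update ... op start ...` is available in the installed pcs version. It only changes the start operation and only works around a slow start, without explaining it.

```shell
#!/bin/sh
# Hypothetical helper: raise the start timeout of a Pacemaker resource.
# $1 = resource name, $2 = timeout in seconds.
# PCS can be overridden (for example PCS=echo) to preview the command
# instead of running it against the cluster.
set_start_timeout() {
    resource=$1
    timeout=$2
    ${PCS:-pcs} resource update "$resource" op start interval=0 timeout="$timeout"
}
```

Usage: `PCS=echo set_start_timeout p_dns 90` prints the command that would be run; dropping `PCS=echo` applies it.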

# pcs status
.....
 Clone Set: clone_p_dns [p_dns]
     Started: [ cic-1.domain.tld cic-2.domain.tld cic-3.domain.tld ]
.....

# less daemon.log
.....
2016-09-15T14:43:37.961782+02:00 cic-3 dnsmasq[5900]: started, version 2.68 cachesize 150
2016-09-15T14:43:37.961799+02:00 cic-3 dnsmasq[5900]: compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua TFTP conntrack ipset auth
2016-09-15T14:43:37.961823+02:00 cic-3 dnsmasq[5900]: using nameserver 192.168.0.11#53 for domain domain.tld
2016-09-15T14:43:37.961871+02:00 cic-3 dnsmasq[5900]: no servers found in /etc/resolv.dnsmasq.conf, will retry
2016-09-15T14:43:37.962024+02:00 cic-3 dnsmasq[5900]: read /etc/hosts - 16 addresses
2016-09-15T14:43:37.964368+02:00 cic-3 ocf-ns_dns: INFO: Started dnsmasq daemon.
2016-09-15T14:43:38.075824+02:00 cic-3 ocf-ns_dns: INFO: dnsmasq daemon running
.....

Alexander Rubtsov (arubtsov) wrote :

sla1 for 7.0-updates

description: updated
Changed in fuel:
importance: Undecided → High
Denis Puchkin (dpuchkin)
Changed in fuel:
assignee: nobody → Denis Puchkin (dpuchkin)
Denis Puchkin (dpuchkin)
Changed in fuel:
status: New → Confirmed
Denis Puchkin (dpuchkin) wrote :

I cannot reproduce this in my environment. Increasing the timeout is not a solution; we need to find the cause of the delay in starting the p_dns resource.

Alexander Rubtsov (arubtsov) wrote :

The issue was caused by settings specific to that environment, which introduced a delay in starting dnsmasq. As a result, the p_dns resource was unable to start in time.

Changed in fuel:
status: Confirmed → Invalid