Scheduled events (e.g. LDAP connection retries) are affected by clock adjustments

Bug #1641875 reported by Victor Tapia on 2016-11-15
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
sssd (Ubuntu)
Medium
Victor Tapia
Trusty
Medium
Victor Tapia
Xenial
Medium
Victor Tapia
Yakkety
Medium
Victor Tapia
Zesty
Medium
Victor Tapia

Bug Description

[Impact]

 * SSSD is affected by clock shifts because it's built on libtevent, that doesn't use monotonic clocks.
 * After an event has been scheduled, if the time drifts to the past SSSD won't recover and the event will have to wait the shifted time to be executed.

[Test Case]

 * Modify the /etc/hosts file to force a failed resolution and restart sssd.
 * Within the first ~1 minute retry window, change the date (date --set) to a previous time (e.g. 2h).
 * "Tail" the log file. The event will be executed at the original schedule, past those 2h.

[Regression Potential]

 * None

[Other Info]

 * Version 1.15 implements a watchdog that detects time shifts and resets itselft but doesn't reschedule the scheduled events.
 * There's a network monitor based on netlink that detects interface changes and restarts the providers, rescheduling the scheduled events.
 * Such restart can be triggered with SIGUSR2. Sending that signal after the watchdog restarts fixes this issue.
 * Upstream bug: https://fedorahosted.org/sssd/ticket/3285

[Original Description]

When SSSD fails to connect to a provider (LDAP, for instance) it creates a timed event with tevent_add_timer() in order to retry in ~1 min. Tevent relies on CLOCK_REALTIME, using absolute epoch time, so when the time changes (e.g. NTP sync) the scheduled event is affected.

Reproducer:

1. Modify the /etc/hosts file to force a failed resolution and restart sssd

(Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'ldap' in DNS
(Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [resolv_gethostbyname_done] (0x0040): querying hosts database failed [5]: Input/output error
(Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not contact DNS servers
...
(Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [check_online_callback] (0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

2. Within the ~1 minute window, change the date (date --set) to a previous time (2 hour in my example). Note: if /etc/resolv.conf or a network interface is modified, SSSD providers will restart, but the scheduled retry will be kept.

(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [recreate_ares_channel] (0x0100): Initializing new c-ares channel
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [recreate_ares_channel] (0x0100): Destroying the old c-ares channel
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'name not resolved'
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ldap' as 'neutral'
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
...
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not contact DNS servers
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'not working'
...
(Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [check_online_callback] (0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

3. Once it gets to the programmed date (2 hours later), it retries:

(Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'name not resolved'
(Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ldap' as 'neutral'
(Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
...
(Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not contact DNS servers
(Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'not working'
...
(Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [check_online_callback] (0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

Victor Tapia (vtapia) on 2016-11-22
Changed in sssd (Ubuntu Trusty):
assignee: nobody → Victor Tapia (vtapia)
Changed in sssd (Ubuntu Xenial):
assignee: nobody → Victor Tapia (vtapia)
Changed in sssd (Ubuntu Yakkety):
assignee: nobody → Victor Tapia (vtapia)
Changed in sssd (Ubuntu Zesty):
assignee: nobody → Victor Tapia (vtapia)
Victor Tapia (vtapia) on 2017-02-14
description: updated
Victor Tapia (vtapia) on 2017-02-15
Changed in sssd (Ubuntu Trusty):
importance: Undecided → Medium
Changed in sssd (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in sssd (Ubuntu Zesty):
importance: Undecided → Medium
Changed in sssd (Ubuntu Xenial):
importance: Undecided → Medium
tags: added: sts-sponsor sts-sru
Victor Tapia (vtapia) wrote :
Louis Bouchard (louis) on 2017-02-21
Changed in sssd (Ubuntu Zesty):
status: New → In Progress
Changed in sssd (Ubuntu Yakkety):
status: New → Triaged
Changed in sssd (Ubuntu Xenial):
status: New → Triaged
Changed in sssd (Ubuntu Trusty):
status: New → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sssd - 1.15.0-3ubuntu3

---------------
sssd (1.15.0-3ubuntu3) zesty; urgency=medium

  * d/p/restart_providers_on_timeshift.patch: Use SIGUSR2 after watchdog
    detects time shift to execute pending scheduled tasks that could
    be stuck (LP: #1641875)

 -- Victor Tapia <email address hidden> Wed, 15 Feb 2017 17:05:05 +0100

Changed in sssd (Ubuntu Zesty):
status: In Progress → Fix Released
Victor Tapia (vtapia) wrote :
Victor Tapia (vtapia) wrote :
Louis Bouchard (louis) wrote :

Fixed debdiff used for sponsorship (fixed versionning to stay coherent with other releases)

tags: removed: sts-sponsor

Hello Victor, or anyone else affected,

Accepted sssd into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/sssd/1.13.4-3ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in sssd (Ubuntu Yakkety):
status: Triaged → Fix Committed
tags: added: verification-needed
Changed in sssd (Ubuntu Xenial):
status: Triaged → Fix Committed
Brian Murray (brian-murray) wrote :

Hello Victor, or anyone else affected,

Accepted sssd into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/sssd/1.13.4-1ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in sssd (Ubuntu Trusty):
status: Triaged → Fix Committed
Brian Murray (brian-murray) wrote :

Hello Victor, or anyone else affected,

Accepted sssd into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/sssd/1.11.8-0ubuntu0.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Victor Tapia (vtapia) wrote :
Victor Tapia (vtapia) wrote :

# VERIFICATION FOR X&Y

After running the following script (repro.sh), the log shows the LDAP resolution is rescheduled:

#!/bin/bash

change_time() {
        sudo systemctl stop systemd-timesyncd
        HOUR=$(date +%H)
        MIN=$(date +%M)
        NEW=$((10#$HOUR - 1))

        sudo date --set "${NEW}:${MIN}"
        echo "Setting time to ${NEW}:${MIN}"
}

sudo systemctl restart sssd
sleep 20
change_time

Excerpts from /var/log/sssd/sssd_openstacklocal.log:

(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'not working'
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'not working'
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0020): No available servers for service 'LDAP'
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_resolve_server_done] (0x1000): Server resolution failed: 5
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_mark_offline] (0x2000): Going offline!
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 63 seconds from now [1488462456]
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [sdap_id_op_connect_done] (0x4000): notify offline to op #1
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [sdap_dom_enum_ex_connected] (0x0400): Backend is marked offline, retry later!
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_ptask_done] (0x0400): Task [enumeration]: finished successfully
(Thu Mar 2 13:46:33 2017) [sssd[be[openstacklocal]]] [be_ptask_schedule] (0x0400): Task [enumeration]: scheduling task 300 seconds from last execution time [1488462691]

(Thu Mar 2 12:46:00 2017) [sssd[be[openstacklocal]]] [watchdog_fd_read_handler] (0x0040): Time shift detected, restarting watchdog!
* Some initialization log entries
(Thu Mar 2 12:46:40 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
(Thu Mar 2 12:46:40 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'name not resolved'
(Thu Mar 2 12:46:40 2017) [sssd[be[openstacklocal]]] [get_port_status] (0x1000): Port status of port 389 for server 'ldap' is 'neutral'
(Thu Mar 2 12:46:40 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds

tags: added: verification-done-xenial verification-done-yakkety
Brian Murray (brian-murray) wrote :

Hello Victor, or anyone else affected,

Accepted sssd into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/sssd/1.11.8-0ubuntu0.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Victor Tapia (vtapia) wrote :

# VERIFICATION FOR T

Same script as for X&Y, works as expected:

(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'not working'
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'not working'
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0020): No available servers for service 'LDAP'
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [be_resolve_server_done] (0x1000): Server resolution failed: 5
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [be_mark_offline] (0x2000): Going offline!
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
(Mon Mar 6 04:31:25 2017) [sssd[be[openstacklocal]]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 69 seconds from now [1488774754]
...
(Mon Mar 6 03:31:00 2017) [sssd[be[openstacklocal]]] [watchdog_fd_read_handler] (0x0040): Time shift detected, restarting watchdog!
...
(Mon Mar 6 03:31:41 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
(Mon Mar 6 03:31:41 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'name not resolved'
(Mon Mar 6 03:31:41 2017) [sssd[be[openstacklocal]]] [get_port_status] (0x1000): Port status of port 389 for server 'ldap' is 'neutral'
(Mon Mar 6 03:31:41 2017) [sssd[be[openstacklocal]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds
(Mon Mar 6 03:31:41 2017) [sssd[be[openstacklocal]]] [get_server_status] (0x1000): Status of server 'ldap' is 'name not resolved'
(Mon Mar 6 03:31:41 2017) [sssd[be[openstacklocal]]] [resolv_is_address] (0x4000): [ldap] does not look like an IP address

tags: added: verification-done
removed: verification-done-xenial verification-done-yakkety verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sssd - 1.13.4-3ubuntu0.1

---------------
sssd (1.13.4-3ubuntu0.1) yakkety; urgency=medium

  * d/p/restart_providers_on_timeshift.patch: Implement watchdog and
    use SIGUSR2 after watchdog detects time shift to execute pending
    scheduled tasks that could be stuck (LP: #1641875)

 -- Victor Tapia <email address hidden> Thu, 23 Feb 2017 11:20:43 +0100

Changed in sssd (Ubuntu Yakkety):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for sssd has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sssd - 1.13.4-1ubuntu1.2

---------------
sssd (1.13.4-1ubuntu1.2) xenial; urgency=medium

  * d/p/restart_providers_on_timeshift.patch: Implement watchdog and
    use SIGUSR2 after watchdog detects time shift to execute pending
    scheduled tasks that could be stuck (LP: #1641875)

 -- Victor Tapia <email address hidden> Thu, 23 Feb 2017 11:46:31 +0100

Changed in sssd (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sssd - 1.11.8-0ubuntu0.5

---------------
sssd (1.11.8-0ubuntu0.5) trusty; urgency=medium

  * d/p/BUILD-Fix-linking-with-librt.patch: Upstream fix for FTBS on
    ppc64el/arm64 after the implementation of timer functions in watchdog
    (LP: #1641875).

 -- Victor Tapia <email address hidden> Tue, 28 Feb 2017 10:20:19 +0100

Changed in sssd (Ubuntu Trusty):
status: Fix Committed → Fix Released
Louis Bouchard (louis) on 2017-03-22
tags: added: sts-sru-done
removed: sts-sru
Timo Aaltonen (tjaalton) wrote :

the patch has not been sent upstream yet?

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers