Security changes to chrony causing client to fail

Bug #1656086 reported by Shannon Mitchell
This bug affects 1 person
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Undecided
Assigned to: Major Hayden

Bug Description

It looks like bindaddress is being set by the security playbooks for chrony. When
this happens, chrony doesn't make an attempt to reach out to the NTP servers, so
the clock eventually skews and causes cinder volumes to appear down. A tcpdump
shows communication via port 323 to the chrony service over localhost and nothing
NTP-related afterwards on the default interface. I think it uses bindaddress for
outgoing connections as well, but documentation is lacking. I did find where you
can disable the server functionality altogether and limit the chronyc->chronyd
communications to localhost.

https://chrony.tuxfamily.org/faq.html#_how_can_i_make_code_chronyd_code_more_secure

It looks like setting 'port 0' closes the server ports and opens a couple of random
unprivileged ports for the client side to communicate out to NTP servers externally.

#####################################
# Findings with bindaddress in place
#####################################

root@infra01:~# grep ^bind /etc/chrony/chrony.conf
bindaddress 127.0.0.1
bindaddress ::1

root@infra01:~# chronyc sources
210 Number of sources = 4
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^? 104.156.99.226 2 10 0 73m -813us[ -813us] +/- 75ms
^? ntp.wdc1.us.leaseweb.net 2 10 0 67m +3204us[+3373us] +/- 289ms
^? pool-173-71-80-235.cmdnnj 0 10 0 10y +0ns[ +0ns] +/- 0ns
^? blue.1e400.net 0 10 0 10y +0ns[ +0ns] +/- 0ns

root@infra01:~# grep chrony /var/log/syslog | tail -n 1
Jan 12 19:51:40 infra01 chronyd[31788]: Could not send to 108.59.2.24:123 : Invalid argument
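
The 'Invalid argument' error in the log line above is consistent with how Linux typically treats a UDP socket that is bound to a loopback address and then asked to send to a non-loopback host. The following is a minimal sketch of that situation, not chronyd's actual code. The function name is hypothetical, the destination 192.0.2.1 is a reserved documentation address chosen for illustration, and the exact error may differ on other kernels or operating systems.

```python
import errno
import socket


def try_send_from_loopback(dest=("192.0.2.1", 123)):
    """Bind a UDP socket to 127.0.0.1, then try to send to a non-loopback host.

    Returns the errno name of the failure (for example "EINVAL"), or None
    if the send did not raise an error.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Loopback source address with an ephemeral port, similar in effect
        # to what 'bindaddress 127.0.0.1' asks chronyd to use.
        sock.bind(("127.0.0.1", 0))
        # 48 zero bytes: the size of a basic NTP packet, content irrelevant here.
        sock.sendto(b"\x00" * 48, dest)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    print(try_send_from_loopback())
```

If this reports EINVAL on the affected host, it supports the theory that bindaddress is applied to outgoing NTP traffic as well.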

root@infra01:~# ss -ntpul | grep chron
tcp UNCONN 0 0 127.0.0.1:123 *:* users:(("chronyd",31788,1))
tcp UNCONN 0 0 127.0.0.1:323 *:* users:(("chronyd",31788,3))
tcp UNCONN 0 0 ::1:123 :::* users:(("chronyd",31788,2))
tcp UNCONN 0 0 ::1:323 :::* users:(("chronyd",31788,5))

###########################################################################
# Findings after changing bindaddress to bindcmdaddress and adding 'port 0'
###########################################################################

root@infra01:~# awk '/^(port|bind)/' /etc/chrony/chrony.conf
port 0
bindcmdaddress 127.0.0.1
bindcmdaddress ::1

root@infra01:~# chronyc sources
210 Number of sources = 4
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^+ host-74-205-141-94.VABOLT 2 8 37 53 +2885us[+3086us] +/- 63ms
^* ntp1.wiktel.com 1 8 37 52 -1605us[-1404us] +/- 39ms
^+ zero.gotroot.ca 2 9 37 52 +2298us[+2298us] +/- 25ms
^- ntp1.torix.ca 2 8 37 53 +643us[ +844us] +/- 534ms
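
The difference between the two 'chronyc sources' listings is easiest to see in the Reach column, which chronyc prints in octal: 0 means no recent replies, while 37 means the last five polls were answered. A minimal sketch for checking this automatically is shown here; the function names are hypothetical and the parser assumes the column order shown in the listings.

```python
def parse_sources(output):
    """Return a list of (state, name, reach) tuples from 'chronyc sources' text."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Source rows start with a mode character (^, = or #) followed by a state.
        if len(fields) < 5 or fields[0][0] not in "^=#":
            continue
        state = fields[0][1:]      # for example "?", "*", "+", "-"
        name = fields[1]
        reach = int(fields[4], 8)  # Reach is printed in octal
        rows.append((state, name, reach))
    return rows


def unreachable(rows):
    """Names of sources whose reach register is 0 (no recent replies)."""
    return [name for _state, name, reach in rows if reach == 0]
```

Feeding it the first listing would flag every source as unreachable; feeding it the second would flag none.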

root@infra01:~# ss -ntpul | grep chron
tcp UNCONN 0 0 127.0.0.1:323 *:* users:(("chronyd",884,3))
tcp UNCONN 0 0 *:50996 *:* users:(("chronyd",884,1))
tcp UNCONN 0 0 :::35886 :::* users:(("chronyd",884,2))
tcp UNCONN 0 0 ::1:323 :::* users:(("chronyd",884,5))
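
The two 'ss' listings can be compared in the same way: with bindaddress in place chronyd holds port 123 on localhost, while with 'port 0' it only holds the command port 323 plus ephemeral client ports. The sketch below extracts the local address and port of each chronyd socket. The function names are hypothetical, and the parser assumes the column order produced by 'ss -ntpul' as shown above.

```python
def chronyd_listeners(ss_output):
    """Return (address, port) pairs for sockets owned by chronyd."""
    pairs = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or "chronyd" not in line:
            continue
        local = fields[4]  # Local Address:Port column
        # Split on the last colon so IPv6 addresses such as ::1 stay intact.
        address, _, port = local.rpartition(":")
        pairs.append((address, int(port)))
    return pairs


def serves_ntp(pairs):
    """True if any chronyd socket is bound to the NTP server port (123)."""
    return any(port == 123 for _address, port in pairs)
```

A monitoring check could call serves_ntp() to confirm that the server side stays closed after the change.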

I think the following template is setting the bindaddress entries causing the issue.

https://github.com/openstack/openstack-ansible-security/blob/master/templates/chrony.conf.j2

Major Hayden (rackerhacker) wrote :

Thanks for the bug, Shannon! By default, the role configures chronyd to listen for NTP requests only on localhost, but you can disable that feature by setting an Ansible variable:

security_ntp_bind_local_interfaces_only: False

That will ensure that chronyd listens on all interfaces. Does that help?

Changed in openstack-ansible:
status: New → Incomplete
Shannon Mitchell (shannon-mitchell) wrote :

Maybe not. It looks like tying it to localhost breaks the client-side capabilities, making it useless (unless I'm missing something). At that point, you might as well uninstall ntp and chronyd. As connections to cinder need the time to be synced, I'm guessing this isn't what we want.

Setting the port to 0 would disable the server side of things, but still allow the client to receive the query results on the external interface on unprivileged ports. Setting bindcmdaddress to localhost will also limit the chronyc management connections. As it is right now, I have seen people just install ntp to resolve the issue after running OSA with the security plays. I have also seen people skip the security plays altogether because of little paper cuts like this, which isn't good.

There may be a better way to do this, but the docs on chronyd are lacking. Some recent versions of chronyd have added a bindacqaddress setting just for binding the client side of things.

Shannon Mitchell (shannon-mitchell) wrote :

Hello,

I just wanted to post an update, as I think this has the potential to break an environment if someone decides to use chronyd with the current security playbooks. The current settings prevent a system from acting as an NTP client. It can gather a list of sources, but it can't actually complete the NTP request without having the return ports listening externally. The following settings put it in a "client only" mode while preventing it from acting as a server.

# "client only settings"
port 0
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
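
As an illustration of the change described above, the following sketch rewrites chrony.conf text by turning each 'bindaddress' line into 'bindcmdaddress' and making sure 'port 0' is set. It is not the code from the proposed fix, which changes the role's template; the function name and the sample server line are hypothetical.

```python
def to_client_only(conf_text):
    """Rewrite chrony.conf text: bindaddress -> bindcmdaddress, ensure 'port 0'."""
    out = []
    has_port = False
    for line in conf_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "bindaddress":
            # Keep the address, but bind only the chronyc command socket to it.
            line = "bindcmdaddress " + " ".join(parts[1:])
        elif parts and parts[0] == "port":
            # Port 0 disables the NTP server side.
            line = "port 0"
            has_port = True
        out.append(line)
    if not has_port:
        out.append("port 0")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # 'ntp.example.org' is a placeholder, not a server from the bug report.
    sample = "server ntp.example.org\nbindaddress 127.0.0.1\nbindaddress ::1\n"
    print(to_client_only(sample), end="")
```

The sketch appends 'port 0' at the end when no port directive exists; the order of these directives should not matter to chronyd.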

# Tracking status with current 'bindaddress' settings in the chrony.conf.j2 template
root@infra01:~# chronyc tracking
Reference ID : 127.127.1.1 ()
Stratum : 10
Ref time (UTC) : Thu Feb 23 16:24:09 2017
System time : 0.000000000 seconds fast of NTP time
Last offset : 0.000000000 seconds
RMS offset : 0.000000000 seconds
Frequency : 38.708 ppm slow
Residual freq : 0.000 ppm
Skew : 0.000 ppm
Root delay : 0.000000 seconds
Root dispersion : 0.000001 seconds
Update interval : 0.0 seconds
Leap status : Not synchronised

# Tracking status using 'bindcmdaddress' and 'port 0'
root@infra01:~# chronyc tracking
Reference ID : 45.33.43.25 (eqo3xkmpo4hgof7.com)
Stratum : 3
Ref time (UTC) : Thu Feb 23 16:25:05 2017
System time : 0.000023752 seconds fast of NTP time
Last offset : 0.001212362 seconds
RMS offset : 0.001212362 seconds
Frequency : 38.708 ppm slow
Residual freq : 0.228 ppm
Skew : 0.006 ppm
Root delay : 0.004643 seconds
Root dispersion : 0.048120 seconds
Update interval : 2.0 seconds
Leap status : Normal
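
The two tracking reports differ most clearly in the 'Leap status' and 'Stratum' fields. A minimal sketch for turning 'chronyc tracking' output into a dictionary and checking whether the clock is synchronised follows; the function names are hypothetical.

```python
def parse_tracking(output):
    """Parse 'chronyc tracking' text into a dict of field name -> value."""
    fields = {}
    for line in output.splitlines():
        # The first colon separates the field name from its value; later
        # colons (as in the reference time) stay in the value.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_synchronised(fields):
    """True unless the leap status is missing or 'Not synchronised'."""
    status = fields.get("Leap status")
    return status is not None and status != "Not synchronised"
```

Applied to the first report this returns False, and applied to the second it returns True.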

I have seen this lead to lost communications with cinder after a while due to the system's time not being updated across the environment.

Keith Fralick (keith-fralick) wrote :

This is a confirmed issue. The default installation of OpenStack via OSA leaves Chrony in an inoperable state. After I altered the time on a device, Chrony refused to update the time, and 'chronyc tracking' and 'chronyc sources' demonstrate the failure. The default OSA install should not leave chrony broken.

Changed in openstack-ansible:
status: Incomplete → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-security (master)

Fix proposed to branch: master
Review: https://review.openstack.org/437670

Changed in openstack-ansible:
assignee: nobody → Shannon Mitchell (shannon-mitchell)
status: Confirmed → In Progress
Changed in openstack-ansible:
assignee: Shannon Mitchell (shannon-mitchell) → Major Hayden (rackerhacker)
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-security (master)

Reviewed: https://review.openstack.org/437670
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-security/commit/?id=4cb2fa4eaa5a2b2d282f9292b1a19f79fcb5ebbb
Submitter: Jenkins
Branch: master

commit 4cb2fa4eaa5a2b2d282f9292b1a19f79fcb5ebbb
Author: Shannon Mitchell <email address hidden>
Date: Thu Feb 23 15:00:02 2017 -0600

    Enable ntp client functionality with chronyd

    Using 'bindaddress' in the /etc/chrony/chrony.conf disables both
    client and server ntp functionality as it cannot get the ntp
    responses from peer servers. The default install will leave the
    servers unsynced with an ntp source causing them to skew over
    time and eventually break services that rely on synced time.
    Setting 'port 0' will disable the server functionality. Using
    'bindcmdaddress' will still chronc<->chronyd communictions over
    localhost only. This should allow client functionality and
    disable server functionality.

    Change-Id: Ie9b6e73333d9469a17e4cee06f21aa99b2b3df7e
    Closes-Bug: #1656086

Changed in openstack-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-security (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/438033

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-security (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/438034

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-security (stable/newton)

Reviewed: https://review.openstack.org/438034
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-security/commit/?id=470232058f6ec904c6a5e72b240e186313539953
Submitter: Jenkins
Branch: stable/newton

commit 470232058f6ec904c6a5e72b240e186313539953
Author: Shannon Mitchell <email address hidden>
Date: Thu Feb 23 15:00:02 2017 -0600

    Enable ntp client functionality with chronyd

    Using 'bindaddress' in the /etc/chrony/chrony.conf disables both
    client and server ntp functionality as it cannot get the ntp
    responses from peer servers. The default install will leave the
    servers unsynced with an ntp source causing them to skew over
    time and eventually break services that rely on synced time.
    Setting 'port 0' will disable the server functionality. Using
    'bindcmdaddress' will still chronc<->chronyd communictions over
    localhost only. This should allow client functionality and
    disable server functionality.

    Change-Id: Ie9b6e73333d9469a17e4cee06f21aa99b2b3df7e
    Closes-Bug: #1656086
    (cherry picked from commit 4cb2fa4eaa5a2b2d282f9292b1a19f79fcb5ebbb)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-security (stable/ocata)

Reviewed: https://review.openstack.org/438033
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-security/commit/?id=e7dc4eec61816ae1e20c8e6cdabffe7f03cc238c
Submitter: Jenkins
Branch: stable/ocata

commit e7dc4eec61816ae1e20c8e6cdabffe7f03cc238c
Author: Shannon Mitchell <email address hidden>
Date: Thu Feb 23 15:00:02 2017 -0600

    Enable ntp client functionality with chronyd

    Using 'bindaddress' in the /etc/chrony/chrony.conf disables both
    client and server ntp functionality as it cannot get the ntp
    responses from peer servers. The default install will leave the
    servers unsynced with an ntp source causing them to skew over
    time and eventually break services that rely on synced time.
    Setting 'port 0' will disable the server functionality. Using
    'bindcmdaddress' will still chronc<->chronyd communictions over
    localhost only. This should allow client functionality and
    disable server functionality.

    Change-Id: Ie9b6e73333d9469a17e4cee06f21aa99b2b3df7e
    Closes-Bug: #1656086
    (cherry picked from commit 4cb2fa4eaa5a2b2d282f9292b1a19f79fcb5ebbb)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-security 14.1.1

This issue was fixed in the openstack/openstack-ansible-security 14.1.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-security 15.1.0

This issue was fixed in the openstack/openstack-ansible-security 15.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-security 16.0.0.0b1

This issue was fixed in the openstack/openstack-ansible-security 16.0.0.0b1 development milestone.
