[Hyper-V] Implement Hyper-V PTP Source

Bug #1676635 reported by Joshua R. Poulson on 2017-03-27
This bug affects 6 people
Affects          Importance  Assigned to
linux (Ubuntu)   Medium      Unassigned
Xenial           High        Marcelo Cerri
Yakkety          Medium      Unassigned
Zesty            Medium      Unassigned

Bug Description

Please include the following upstream commit in lts-xenial, 16.04 HWE, Yakkety, and Zesty. It improves the behavior of TimeSync on Hyper-V hosts when network time sync protocols such as NTP are used at the same time.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/hv/hv_util.c?id=3716a49a81ba19dda7202633a68b28564ba95eb5

commit 3716a49a81ba19dda7202633a68b28564ba95eb5
Author: Vitaly Kuznetsov <email address hidden>
Date: Sat Feb 4 09:57:14 2017 -0700

    hv_utils: implement Hyper-V PTP source

    With TimeSync version 4 protocol support we started updating system time
    continuously through the whole lifetime of Hyper-V guests. Every 5 seconds
    there is a time sample from the host which triggers do_settimeofday[64]().
    While the time from the host is very accurate such adjustments may cause
    issues:
    - Time is jumping forward and backward, some applications may misbehave.
    - In case an NTP server runs in parallel and uses something else for time
      sync (network, PTP,...) system time will never converge.
    - Systemd starts annoying you by printing "Time has been changed" every 5
      seconds to the system log.

    Instead of doing in-kernel time adjustments offload the work to an
    NTP client by exposing TimeSync messages as a PTP device. Users may now
    decide what they want to use as a source.

    I tested the solution with chrony, the config was:

     refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0

    The result I'm seeing is accurate enough, the time delta between the guest
    and the host is almost always within [-10us, +10us], the in-kernel solution
    was giving us comparable results.

    I also tried implementing PPS device instead of PTP by using not currently
    used Hyper-V synthetic timers (we use only one of four for clockevent) but
    with PPS source only chrony wasn't able to give me the required accuracy,
    the delta often more that 100us.

    Signed-off-by: Vitaly Kuznetsov <email address hidden>
    Signed-off-by: K. Y. Srinivasan <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>
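
For anyone reproducing the chrony setup from the commit message, here is a minimal sketch. It looks up which /dev/ptpN node belongs to the Hyper-V clock through the sysfs clock_name attribute and prints the matching refclock line. The clock name "hyperv", the function names, and the /sys/class/ptp default are assumptions and are not taken from this report; the refclock options are the ones quoted in the commit message.

```shell
#!/bin/sh
# Print the /dev/ptpN node whose sysfs clock_name matches the given name.
# $1: clock name to look for (assumed to be "hyperv" for the hv_utils clock)
# $2: sysfs PTP class directory (defaults to /sys/class/ptp)
find_ptp_by_name() {
    want=$1
    class_dir=${2:-/sys/class/ptp}
    for dev in "$class_dir"/ptp*; do
        # Skip entries without a readable clock_name (also covers an unmatched glob).
        [ -r "$dev/clock_name" ] || continue
        if [ "$(cat "$dev/clock_name")" = "$want" ]; then
            echo "/dev/${dev##*/}"
            return 0
        fi
    done
    return 1
}

# Emit the refclock line from the commit message for a given device node.
chrony_refclock_line() {
    echo "refclock PHC $1 poll 3 dpoll -2 offset 0"
}
```

Running `dev=$(find_ptp_by_name hyperv) && chrony_refclock_line "$dev"` on a guest with the patch applied should print a line suitable for the chrony configuration. Hard-coding /dev/ptp0 as in the commit message is simpler, but the index may differ when other PTP clocks are present.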

Joshua R. Poulson (jrp) on 2017-03-27
Changed in linux (Ubuntu):
status: New → Confirmed
Joshua R. Poulson (jrp) on 2017-03-27
description: updated
tags: added: kernel-da-key kernel-hyper-v xenial yakkety zesty
Changed in linux (Ubuntu Xenial):
status: New → Triaged
Changed in linux (Ubuntu Yakkety):
status: New → Triaged
Changed in linux (Ubuntu Zesty):
status: Confirmed → Triaged
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
Changed in linux (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in linux (Ubuntu Zesty):
importance: Undecided → Medium
Joshua R. Poulson (jrp) wrote :

This patch needs the CONFIG_PTP_1588_CLOCK flag set in the kernel config.
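
A quick way to check whether a given kernel was built with that option is to read its config file. The sketch below assumes the usual Ubuntu location /boot/config-$(uname -r); the function name is made up for illustration.

```shell
#!/bin/sh
# Print the value (y, m, ...) of a kernel config option; fail if it is not set.
# $1: option name, e.g. CONFIG_PTP_1588_CLOCK
# $2: kernel config file (defaults to the running kernel's config under /boot)
kconfig_value() {
    opt=$1
    cfg=${2:-/boot/config-$(uname -r)}
    # Enabled options appear as OPTION=value; disabled ones as "# OPTION is not set".
    val=$(sed -n "s/^${opt}=//p" "$cfg")
    [ -n "$val" ] && echo "$val"
}
```

For example, `kconfig_value CONFIG_PTP_1588_CLOCK` prints `y` or `m` when the option is enabled and returns a non-zero status otherwise.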

Tim Gardner (timg-tpi) on 2017-03-28
Changed in linux (Ubuntu Zesty):
status: Triaged → Fix Committed
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Tim Gardner (timg-tpi)
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-19.21

---------------
linux (4.10.0-19.21) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1680535

  * ADT regressions caused by "audit: fix auditd/kernel connection state
    tracking" (LP: #1680532)
    - SAUCE: Revert "audit: fix auditd/kernel connection state tracking"

  * Miscellaneous Ubuntu changes
    - [Config] updateconfigs to update CONFIG_GENERIC_CSUM for ppc64el
      This cleans up behind a Kconfig change that went undetected.

linux (4.10.0-18.20) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1680168

  * smartpqi driver needed in initram disk and installer (LP: #1680156)
    - UBUNU: [Config] Add smartpqi to d-i

linux (4.10.0-17.19) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1679718

  * Fix CVE-2017-7308 (LP: #1678009)
    - net/packet: fix overflow in check for priv area size
    - net/packet: fix overflow in check for tp_frame_nr
    - net/packet: fix overflow in check for tp_reserve

  * apparmor: oops on boot if parameters set on grub command line (LP: #1678048)
    - SAUCE: apparmor: fix parameters so that the permission test is bypassed at boot

  * apparmor: does not provide a way to detect policy updataes (LP: #1678032)
    - SAUCE: apparmor: add policy revision file interface

  * apparmor does not make support of query data visible (LP: #1678023)
    - SAUCE: apparmor: add label data availability to the feature set

  * apparmor query interface does not make supported query info available
    (LP: #1678030)
    - SAUCE: apparmor: add information about the query inteface to the feature set

  * change_profile incorrect when using namespaces with a compound stack
    (LP: #1677959)
    - SAUCE: apparmor: fix label parse for stacked labels

  * Zesty update to v4.10.8 stable release (LP: #1678930)
    - xfrm: policy: init locks early
    - xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
    - xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
    - KVM: nVMX: Fix nested VPID vmx exec control
    - KVM: x86: cleanup the page tracking SRCU instance
    - virtio_balloon: init 1st buffer in stats vq
    - pinctrl: qcom: Don't clear status bit on irq_unmask
    - c6x/ptrace: Remove useless PTRACE_SETREGSET implementation
    - h8300/ptrace: Fix incorrect register transfer count
    - mips/ptrace: Preserve previous registers for short regset write
    - sparc/ptrace: Preserve previous registers for short regset write
    - metag/ptrace: Preserve previous registers for short regset write
    - metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
    - metag/ptrace: Reject partial NT_METAG_RPIPE writes
    - qla2xxx: Allow vref count to timeout on vport delete.
    - sched/rt: Add a missing rescheduling point
    - usb: musb: fix possible spinlock deadlock
    - Linux 4.10.8

  * [Hyper-V] pci-hyperv: Use device serial number as PCI domain (LP: #1667527)
    - net/mlx4_core: Use cq quota in SRIOV when creating completion EQs
    - PCI: hv: Use device serial number as PCI domain

  * Miscellaneous Ubuntu changes
    - [Config] flash-kernel should be a...

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Daniel (d-maml-o) wrote :

What's the status with Xenial?

Brad Figg (brad-figg) on 2017-05-18
Changed in linux (Ubuntu Xenial):
status: Triaged → Won't Fix
status: Won't Fix → Triaged
Changed in linux (Ubuntu Xenial):
status: Triaged → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)

Is there any workaround until the patch is released for Xenial?

This issue is causing our MongoDB cluster to misbehave: the oscillating clock triggers a bug (not yet fixed and released) that leads the MongoDB process to accumulate journal files and fill the disk until it stops working.

Do you have an ETA for Xenial?

Thank you.

Alex Ng (alexng-v) wrote :

@Alberto Ornaghi.

Are you running in Azure or in your own Hyper-V host? If you have access to the Hyper-V host, you can disable the TimeSync integration service from the VM settings.

Otherwise, you can create a script that runs "echo 2dd1ce17-079e-403c-b352-a1921ee207ee > /sys/bus/vmbus/drivers/hv_util/unbind" upon startup. This disables TimeSync from within the guest VM.

Hope this helps,
Alex
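
The startup script suggested in the comment above could look roughly like the following sketch. The GUID and sysfs path are the ones Alex gives; the function name and the check for whether the device is currently bound are assumptions added for illustration.

```shell
#!/bin/sh
# GUID of the TimeSync device, as given in the comment above.
TIMESYNC_GUID=2dd1ce17-079e-403c-b352-a1921ee207ee

# Unbind the TimeSync device from hv_util if it is currently bound.
# $1: hv_util driver directory (defaults to the sysfs path from the comment)
unbind_timesync() {
    drv=${1:-/sys/bus/vmbus/drivers/hv_util}
    # A bound device shows up as an entry named after its GUID in the driver directory.
    if [ -e "$drv/$TIMESYNC_GUID" ]; then
        echo "$TIMESYNC_GUID" > "$drv/unbind"
    else
        echo "TimeSync device not bound to hv_util; nothing to do" >&2
    fi
}
```

Calling `unbind_timesync` with no arguments acts on the real sysfs path and needs root. Since the script is meant to run upon startup, it would have to be hooked into whatever boot mechanism the system uses.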

@alex

I'm running on Azure :(

I will try the script you suggested, thank you.

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-yakkety
tags: added: verification-done-yakkety
removed: verification-needed-yakkety
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-54.57

---------------
linux (4.8.0-54.57) yakkety; urgency=low

  * linux: 4.8.0-54.57 -proposed tracker (LP: #1692589)

  * CVE-2017-0605
    - tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

  * Populating Hyper-V MSR for Ubuntu 13.10 (LP: #1193172)
    - SAUCE: (no-up) hv: Supply vendor ID and package ABI

  * [Hyper-V] Implement Hyper-V PTP Source (LP: #1676635)
    - hv: allocate synic pages for all present CPUs
    - hv: init percpu_list in hv_synic_alloc()
    - Drivers: hv: vmbus: Prevent sending data on a rescinded channel
    - hv: switch to cpuhp state machine for synic init/cleanup
    - hv: make CPU offlining prevention fine-grained
    - Drivers: hv: vmbus: Fix a rescind handling bug
    - Drivers: hv: util: kvp: Fix a rescind processing issue
    - Drivers: hv: util: Fcopy: Fix a rescind processing issue
    - Drivers: hv: util: Backup: Fix a rescind processing issue
    - Drivers: hv: vmbus: Move the definition of hv_x64_msr_hypercall_contents
    - Drivers: hv: vmbus: Move the definition of generate_guest_id()
    - Revert "UBUNTU: SAUCE: (no-up) hv: Supply vendor ID and package ABI"
    - Drivers: hv vmbus: Move Hypercall page setup out of common code
    - Drivers: hv: vmbus: Move Hypercall invocation code out of common code
    - Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code
    - Drivers: hv: vmbus: Move the extracting of Hypervisor version information
    - Drivers: hv: vmbus: Move the crash notification function
    - Drivers: hv: vmbus: Move the check for hypercall page setup
    - Drivers: hv: vmbus: Move the code to signal end of message
    - Drivers: hv: vmbus: Restructure the clockevents code
    - Drivers: hv: util: Use hv_get_current_tick() to get current tick
    - Drivers: hv: vmbus: Get rid of an unsused variable
    - Drivers: hv: vmbus: Define APIs to manipulate the message page
    - Drivers: hv: vmbus: Define APIs to manipulate the event page
    - Drivers: hv: vmbus: Define APIs to manipulate the synthetic interrupt
      controller
    - Drivers: hv: vmbus: Define an API to retrieve virtual processor index
    - Drivers: hv: vmbus: Define an APIs to manage interrupt state
    - Drivers: hv: vmbus: Cleanup hyperv_vmbus.h
    - hv_util: switch to using timespec64
    - Drivers: hv: restore hypervcall page cleanup before kexec
    - Drivers: hv: restore TSC page cleanup before kexec
    - Drivers: hv: balloon: add a fall through comment to hv_memory_notifier()
    - Drivers: hv: vmbus: Use all supported IC versions to negotiate
    - Drivers: hv: Log the negotiated IC versions.
    - Drivers: hv: Fix the bug in generating the guest ID
    - hv: export current Hyper-V clocksource
    - hv_utils: implement Hyper-V PTP source
    - SAUCE: (no-up) hv: Supply vendor ID and package ABI

  * CIFS: Enable encryption for SMB3 (LP: #1670508)
    - SMB3: Add mount parameter to allow user to override max credits
    - SMB2: Separate Kerberos authentication from SMB2_sess_setup
    - SMB2: Separate RawNTLMSSP authentication from SMB2_sess_setup
    - SMB3: parsing for new snapshot timestamp mount parm
    - cifs: Simplify SMB...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Paul Gear (paulgear) wrote :

Azure timesync is preventing a number of our production services from getting accurate timesync; when is the xenial SRU likely to be available?

Joshua R. Poulson (jrp) wrote :

The interaction of systemd, timesync, and unattended-upgrades is preventing security updates. The problem is alleviated with the PTP clock source.

Changed in linux (Ubuntu Xenial):
importance: Medium → High
Joseph Salisbury (jsalisbury) wrote :

I built a Xenial test kernel with commit 3716a49a81. It required the same 39 prereq commits as the Yakkety kernel. However, Xenial also required an additional 14 prereq commits.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1676635/

Can this kernel be tested to see if it resolves this bug?

Thanks in advance!

Marcelo Cerri (mhcerri) on 2017-06-26
Changed in linux (Ubuntu Yakkety):
assignee: Tim Gardner (timg-tpi) → nobody
Alex Ng (alexng-v) wrote :

Please ignore my last comment #13, I intended it for another bug.

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Marcelo Cerri (mhcerri) on 2017-07-04
Changed in linux (Ubuntu Xenial):
assignee: Joseph Salisbury (jsalisbury) → Marcelo Cerri (mhcerri)
Marcelo Cerri (mhcerri) wrote :

Alex and Joshua,

We now have this bug fixed in the xenial kernel in -proposed. Can you test it and confirm that the issue is fixed?

You can follow the instructions at this link [1] to enable -proposed on your system and install the new kernel from there.

[1] https://wiki.ubuntu.com/Testing/EnableProposed

Thank you.

Alex Ng (alexng-v) wrote :

Hi Marcelo,

I kicked the tires on the 4.4.0-85-generic kernel from xenial-proposed. The fixes look good. I see the PTP device, and TimeSync is no longer causing "Time has been changed" messages in systemd. I also see that the apt-daily timer is no longer being randomly delayed by the clock changes.
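
A check along these lines can be scripted. The sketch below only counts the systemd message in log text fed to it on stdin; the function name is hypothetical, and piping `journalctl -b` into it is one assumed way to supply the log.

```shell
#!/bin/sh
# Count "Time has been changed" lines in log text read from stdin.
count_time_changes() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
    grep -c "Time has been changed" || true
}
```

With the old behavior, `journalctl -b | count_time_changes` would be expected to keep growing by roughly one every 5 seconds; with the PTP source in place the count should stay flat.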

Brad Figg (brad-figg) on 2017-07-07
tags: added: verification-done-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-87.110

---------------
linux (4.4.0-87.110) xenial; urgency=low

  * linux: 4.4.0-87.110 -proposed tracker (LP: #1704982)

  * CVE-2017-1000364
    - mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
    - mm/mmap.c: expand_downwards: don't require the gap if !vm_prev

  * CIFS causes oops (LP: #1704857)
    - CIFS: Fix null pointer deref during read resp processing
    - CIFS: Fix some return values in case of error in 'crypt_message'

 -- Kleber Sacilotto de Souza <email address hidden> Tue, 18 Jul 2017 13:58:43 +0200

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released