[Hyper-V] Docker failures with linux-azure 4.11.0-1011

Bug #1719045 reported by Joshua R. Poulson on 2017-09-23
This bug affects 6 people
Affects: linux-azure (Ubuntu), Importance: High, Assigned to: Marcelo Cerri
Affects: linux-azure (Ubuntu Xenial), Importance: High, Assigned to: Marcelo Cerri

Bug Description

Users of docker on the linux-azure kernel are seeing errors similar to "fatal error: workbuf is empty" or "fatal error: unaligned sysUnused". Hugepage configuration is the same for the 4.4 and 4.11 kernels, so there may be a page alignment bug in the Go runtime, the linux-azure kernel, or a dependent library.

Multiple users have reported this issue and we are investigating.

Joshua R. Poulson (jrp) wrote :

https://github.com/golang/go/blob/964639cc338db650ccadeafb7424bc8ebb2c0f6c/src/runtime/mem_linux.go has the message around line 138:

  if uintptr(v)&(physPageSize-1) != 0 || n&(physPageSize-1) != 0 {
      // madvise will round this to any physical page
      // *covered* by this range, so an unaligned madvise
      // will release more memory than intended.
      throw("unaligned sysUnused")
  }

Changed in linux-azure (Ubuntu):
status: New → Confirmed
Joshua R. Poulson (jrp) wrote :

https://golang.org/src/runtime/mgcwork.go around line 306

  // workbuf factory routines. These funcs are used to manage the
  // workbufs.
  // If the GC asks for some work these are the only routines that
  // make wbufs available to the GC.

  func (b *workbuf) checknonempty() {
      if b.nobj == 0 {
          throw("workbuf is empty")
      }
  }

  func (b *workbuf) checkempty() {
      if b.nobj != 0 {
          throw("workbuf is not empty")
      }
  }

Joshua R. Poulson (jrp) wrote :

Trying to build the Go tests, but I am having trouble building it:
$ GOOS=linux GOARCH=amd64 ./bootstrap.bash
#### Copying to ../../go-linux-amd64-bootstrap

#### Cleaning ../../go-linux-amd64-bootstrap

#### Building ../../go-linux-amd64-bootstrap

##### Building Go bootstrap tool.
cmd/dist

##### Building Go toolchain using /usr/lib/go-1.6.
bootstrap/cmd/internal/src
bootstrap/cmd/internal/objabi
bootstrap/cmd/internal/sys
bootstrap/cmd/asm/internal/flags
bootstrap/cmd/internal/dwarf
bootstrap/cmd/internal/bio
bootstrap/math/bits
bootstrap/math/big
bootstrap/cmd/compile/internal/syntax
bootstrap/cmd/internal/gcprog
bootstrap/debug/pe
bootstrap/cmd/asm/internal/lex
bootstrap/cmd/internal/obj
bootstrap/cmd/link/internal/ld
bootstrap/cmd/internal/obj/arm
bootstrap/cmd/internal/obj/arm64
bootstrap/cmd/internal/obj/mips
bootstrap/cmd/internal/obj/ppc64
bootstrap/cmd/internal/obj/s390x
bootstrap/cmd/internal/obj/x86
bootstrap/cmd/compile/internal/types
bootstrap/cmd/asm/internal/arch
bootstrap/cmd/compile/internal/ssa
bootstrap/cmd/asm/internal/asm
bootstrap/cmd/asm
bootstrap/cmd/link/internal/amd64
bootstrap/cmd/link/internal/arm
bootstrap/cmd/link/internal/arm64
bootstrap/cmd/link/internal/mips
bootstrap/cmd/link/internal/mips64
bootstrap/cmd/link/internal/ppc64
bootstrap/cmd/link/internal/s390x
bootstrap/cmd/link/internal/x86
bootstrap/cmd/link
bootstrap/cmd/compile/internal/gc
bootstrap/cmd/compile/internal/amd64
bootstrap/cmd/compile/internal/arm
bootstrap/cmd/compile/internal/arm64
bootstrap/cmd/compile/internal/mips
bootstrap/cmd/compile/internal/mips64
bootstrap/cmd/compile/internal/ppc64
bootstrap/cmd/compile/internal/s390x
bootstrap/cmd/compile/internal/x86
bootstrap/cmd/compile
go tool dist: FAILED: /usr/lib/go-1.6/bin/go install -gcflags=-l -tags=math_big_pure_go -v bootstrap/cmd/...: signal: segmentation fault (core dumped)

Dexuan Cui (decui) wrote :

Let me add 2 more symptoms:

1) dockerd can exit due to SIGSEGV. The issue does NOT always happen: after such a failure, sometimes a retry can succeed.

# systemctl start docker; systemctl status docker
Job for docker.service failed because a fatal signal was delivered to the control process. See "systemctl status docker.service" and "journalctl -xe" for details.
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Mon 2017-09-25 23:22:04 UTC; 4ms ago
     Docs: https://docs.docker.com
  Process: 24853 ExecStart=/usr/bin/dockerd -H fd:// $DOCKER_OPTS (code=killed, signal=SEGV)
 Main PID: 24853 (code=killed, signal=SEGV)

Sep 25 23:22:02 decui-u-nested systemd[1]: Starting Docker Application Container Engine...
Sep 25 23:22:02 decui-u-nested dockerd[24853]: time="2017-09-25T23:22:02.027039570Z" level=info msg="libcontainerd: new containerd process, pid: 24867"
Sep 25 23:22:04 decui-u-nested systemd[1]: docker.service: Main process exited, code=killed, status=11/SEGV
Sep 25 23:22:04 decui-u-nested systemd[1]: Failed to start Docker Application Container Engine.
Sep 25 23:22:04 decui-u-nested systemd[1]: docker.service: Unit entered failed state.
Sep 25 23:22:04 decui-u-nested systemd[1]: docker.service: Failed with result 'signal'.

2) "code=exited, status=2"

# systemctl start docker; systemctl status docker

Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2017-09-25 23:32:37 UTC; 5ms ago
     Docs: https://docs.docker.com
  Process: 26237 ExecStart=/usr/bin/dockerd -H fd:// $DOCKER_OPTS (code=exited, status=2)
 Main PID: 26237 (code=exited, status=2)

Sep 25 23:32:37 decui-u-nested dockerd[26237]: runtime.gcBgMarkWorker(0xc82001e000)
Sep 25 23:32:37 decui-u-nested dockerd[26237]: /usr/lib/go-1.6/src/runtime/mgc.go:1433 +0x25f fp=0xc820483fb8 sp=0xc820483f40
Sep 25 23:32:37 decui-u-nested dockerd[26237]: runtime.goexit()
Sep 25 23:32:37 decui-u-nested dockerd[26237]: /usr/lib/go-1.6/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc820483fc0 sp=0xc820483fb8
Sep 25 23:32:37 decui-u-nested dockerd[26237]: created by runtime.gcBgMarkStartWorkers
Sep 25 23:32:37 decui-u-nested dockerd[26237]: /usr/lib/go-1.6/src/runtime/mgc.go:1330 +0x92
Sep 25 23:32:37 decui-u-nested systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 25 23:32:37 decui-u-nested systemd[1]: Failed to start Docker Application Container Engine.
Sep 25 23:32:37 decui-u-nested systemd[1]: docker.service: Unit entered failed state.
Sep 25 23:32:37 decui-u-nested systemd[1]: docker.service: Failed with result 'exit-code'.

It looks like the issues are more likely caused by dockerd itself.

Dexuan Cui (decui) wrote :

BTW, issue #2 in the last comment doesn't always happen either. Sometimes a retry would succeed.

Joshua R. Poulson (jrp) wrote :

I'm also getting a report that this does not reproduce if everything is recompiled with golang-1.9.

Joshua R. Poulson (jrp) wrote :

If I install golang-1.4 and golang-1.9 and remove golang-1.6 I can build go from source (1.4 is needed for the bootstrapping) and everything runs without error. If I have golang-1.6 installed, I get segfaults.

Marcelo Cerri (mhcerri) on 2017-09-26
Changed in linux-azure (Ubuntu):
assignee: nobody → Marcelo Cerri (mhcerri)
importance: Undecided → High
Marcelo Cerri (mhcerri) wrote :

Does it happen on some specific instance type? I restarted docker more than 50 times without any problem on a Standard DS15 v2 instance (20 vcpus, 140 GB memory) with the linux-azure kernel.

Dexuan Cui (decui) wrote :

I got the issue (comment #4) with Standard_D16_v3 (16 vcpus, 64 GB memory).

I can easily reproduce the issue on an NC6 (GPU) VM: roughly 25% of docker installations fail.

Joshua R. Poulson (jrp) wrote :

It certainly seems to happen much more easily on GPU-based instances.

Joshua R. Poulson (jrp) wrote :

I would test with SR-IOV as well; I'm wondering whether this is related to PCI passthrough.

Marcelo Cerri (mhcerri) wrote :

I was able to reproduce it on a Standard_D16_v3 instance when I create some containers that are configured to restart with dockerd. It doesn't happen very often though, and sometimes I can't reproduce it for some time. I will try to test it on a GPU-based instance.

Jason Greathouse (jgreat) wrote :

I'm seeing this pretty often on Standard_D2s_v3 instances. About 1/3 of my new VM instances are failing. Please let me know if I can provide logs or testing.

Michal Koziorowski (mkozioro) wrote :

On all of my Azure machines using Standard_DS2_v2 (North Europe), telegraf crashed with that error. I'm not using docker.

root@frontvm-fas427000000:/# uname -a
Linux frontvm-fas427000000 4.11.0-1011-azure #11-Ubuntu SMP Tue Sep 19 19:03:54 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Telegraf: 1.4.0-1

Telegraf works properly on instances with a non-azure kernel.

I'm attaching a syslog with all the information.

Marcelo Cerri (mhcerri) wrote :

The 4.11 base kernel for linux-azure doesn't show the problem, and the same is true for linux-azure-edge (linux-azure-edge causes the failure while its 4.13 base kernel doesn't).

I'm bisecting the delta between them to identify the commit that introduced the problem.

Changed in linux-azure (Ubuntu):
importance: High → Critical
Joshua R. Poulson (jrp) wrote :

It's interesting that telegraf is also made with golang.

Marcelo Cerri (mhcerri) wrote :

Yes. Probably something in the golang runtime is triggering the bug.

The last good version is 4.11.0-1004.4 and the problem still happens with 4.11.0-1005.5. 1005 comprises only Bug #1707285, which introduced several changes. I'm restricting the bisect to those.

Marcelo Cerri (mhcerri) wrote :

Reverting the following commits seems to avoid the problem:

47755dae5ca9 UBUNTU: SAUCE: tracing/hyper-v: trace hyperv_mmu_flush_tlb_others()
0aa1395238f7 UBUNTU: SAUCE: x86/hyper-v: support extended CPU ranges for TLB flush hypercalls
5e61aa7ca654 UBUNTU: SAUCE: x86/hyper-v: use hypercall for remote TLB flush

Josh, can you confirm if that works? I built the latest linux-azure with those patches reverted and placed the deb packages at:

http://kernel.ubuntu.com/~mhcerri/azure/golang/002/

The upstream versions of those commits are significantly different. I will backport them and check if the problem still happens.

Joshua R. Poulson (jrp) wrote :

We will check out the kernel as soon as possible. Thanks!

Marcelo Cerri (mhcerri) wrote :

Backporting the changes from upstream still causes the problem to happen. I'm building the mainline kernel to test it.

Changed in linux-azure (Ubuntu):
status: Confirmed → In Progress
Marcelo Cerri (mhcerri) on 2017-09-28
Changed in linux-azure (Ubuntu):
importance: Critical → High
Marcelo Cerri (mhcerri) wrote :

Mainline doesn't cause docker to fail. I will start checking whether linux-azure has an issue in the TLB code or in the helpers for Hyper-V hypercalls.

Marcelo Cerri (mhcerri) wrote :

Also, the problem only happens when using HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX/LIST_EX hypercalls.

Joshua R. Poulson (jrp) wrote :

The test kernel seems to be pretty good so far.

Here are the tests we've done so far in Azure:
1. InfiniBand tests: PASSING
2. GPU tests: PASSING
3. GPU tests on an RDMA-loaded VM (NC24r): PASSING
4. Docker tests: PASSING
   a. Note that we're using the official script to install and verify docker:
      https://get.docker.com/
   b. Verified Docker version: 17.09.0-ce
   c. GoLang version: go1.8.3

Docker tests for the following scenarios are also done:
1. SR-IOV enabled VMs
2. InfiniBand enabled VMs
3. Nvidia GPU driver loaded VMs
No issues found so far.

Also a Docker loop test across 100 Standard_D2s_v3 VMs.
Current progress: 10%. No issues so far.

Our basic validation tests also all pass.

Marcelo Cerri (mhcerri) wrote :

Thanks for all the testing, Josh. I reverted the paravirtualized TLB flushing patches in the test kernel. Do you think we should spin a new kernel without them while we try to find the root cause of the problem?

I enabled the tracepoints available in arch/x86/hyperv/mmu.c for both the mainline and linux-azure kernels and got some interesting information. The mainline kernel never calls flush_tlb_others with TLB_FLUSH_ALL, while the 4.11 and 4.13 linux-azure kernels do so very often.

I'm attaching the tracing files for both kernels. You can check that TLB_FLUSH_ALL is given to flush_tlb_others when `end` is equal to "ffffffffffffffff" (-1ULL).

Also, if I force hyperv_flush_tlb_others_ex() to do a native flush when `end` is equal to TLB_FLUSH_ALL, the problem does not occur. That is an alternative for a temporary fix.

I believe the mainline kernel is carrying the same bug as linux-azure but the problematic path (end == TLB_FLUSH_ALL) is not being executed.

Marcelo Cerri (mhcerri) wrote :

Josh, is there any documentation available for the Hyper-V hypercalls?

Joshua R. Poulson (jrp) wrote :

Yes, let's spin one into proposed with KVM enabled and these patches reverted for SRU.

You might be interested in Vitaly's upstream submission of tracing today; we'll work that into a future PR. As far as hypercalls go, I can always take questions back to the team for detailed answers.

Joshua R. Poulson (jrp) wrote :

You may also not be surprised to see this in a run I did with upstream 4.14-rc2:

[ 27.274499] general protection fault: 0000 [#1] SMP
[ 27.274499] Modules linked in: mlx4_en mlx4_core devlink sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel hv_utils pcbc hyperv_keyboard ptp snd_pcm aesni_intel hyperv_fb pps_core snd_timer sg aes_x86_64 pci_hyperv serio_raw hv_balloon crypto_simd glue_helper cryptd snd soundcore pcspkr evdev joydev button intel_rapl_perf ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto sd_mod hid_generic sr_mod cdrom ata_generic ata_piix libata hv_storvsc hid_hyperv scsi_transport_fc hv_netvsc hid crc32c_intel psmouse scsi_mod i2c_piix4 hv_vmbus floppy
[ 27.274499] CPU: 8 PID: 1030 Comm: ntpd Not tainted 4.14.0-rc2 #1
[ 27.274499] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 01/06/2017
[ 27.274499] task: ffff9bf72f3aa0c0 task.stack: ffffa99322e78000
[ 27.274499] RIP: 0010:0xffffa9930c489000
[ 27.274499] RSP: 0000:ffffa99322e7bc60 EFLAGS: 00000046
[ 27.274499] RAX: 0000000000000000 RBX: ffffc9817f803000 RCX: 0000000100060014
[ 27.274499] RDX: 00002dadff803000 RSI: 0000000100060014 RDI: 0000000000000002
[ 27.274499] RBP: ffffa99322e7bca8 R08: 0000000000000000 R09: 00007f60f0f47000
[ 27.274499] R10: ffffc9817f803020 R11: 0000000000000000 R12: 0000000100060014
[ 27.274499] R13: ffffa99322e7bcb8 R14: 0000000000000001 R15: ffff9be5ae658540
[ 27.274499] FS: 00007f60f2900700(0000) GS:ffff9be5b9000000(0000) knlGS:0000000000000000
[ 27.274499] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 27.274499] CR2: 00007f60f0f46e28 CR3: 000000121c34c000 CR4: 00000000001406e0
[ 27.274499] Call Trace:
[ 27.274499] ? hyperv_flush_tlb_others_ex+0x1b0/0x440
[ 27.274499] flush_tlb_mm_range+0xad/0x110
[ 27.274499] ptep_clear_flush+0x39/0x40
[ 27.274499] wp_page_copy+0x27f/0x630
[ 27.274499] do_wp_page+0x151/0x4f0
[ 27.274499] __handle_mm_fault+0x80c/0x1140
[ 27.274499] ? do_mmap+0x445/0x510
[ 27.274499] handle_mm_fault+0xce/0x1c0
[ 27.274499] __do_page_fault+0x23e/0x4e0
[ 27.274499] do_page_fault+0x2e/0xd0
[ 27.274499] page_fault+0x22/0x30
[ 27.274499] RIP: 0033:0x7f60f26f4f1b
[ 27.274499] RSP: 002b:00007f60f28fed50 EFLAGS: 00010202
[ 27.274499] RAX: 00007f60f0f46e20 RBX: 00007f60ec000ac0 RCX: 0000000000000028
[ 27.274499] RDX: 00007f60f0f46f60 RSI: 00007f60ec000b00 RDI: 000000006fffffff
[ 27.274499] RBP: 00007f60f28fee90 R08: 00007f60f0d42000 R09: 0000000070000021
[ 27.274499] R10: 0000000000000031 R11: 000000006ffffdff R12: 00007f60f28fef78
[ 27.274499] R13: 000000037ffff1a0 R14: 0000000000000801 R15: 00007f60f28fef30
[ 27.274499] Code: <0f> 01 c1 c3 8b c8 b8 11 00 00 00 0f 01 c1 c3 48 8b c1 48 c7 c1 11
[ 27.274499] RIP: 0xffffa9930c489000 RSP: ffffa99322e7bc60
[ 27.274499] ---[ end trace f506aa4e20966c27 ]---

Dexuan Cui (decui) wrote :

@mhcerri: I believe the hypercalls are documented in "Hypervisor Top Level Functional Specification v5.0b.pdf" (https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs).

Marcelo Cerri (mhcerri) wrote :

Including the rationale in the commit messages:

https://lists.ubuntu.com/archives/kernel-team/2017-October/087342.html

Marcelo Cerri (mhcerri) on 2017-10-02
Changed in linux-azure (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Marcelo Cerri (mhcerri)
status: In Progress → Fix Committed
Changed in linux-azure (Ubuntu):
status: In Progress → Fix Committed
Marcelo Cerri (mhcerri) wrote :

Thanks, Dexuan.

Reading the specification, I believe I found the problem. The original implementation assumes the entire hv_vp_set structure is variable and passes `nr_bank + 2` as the variable header size to hv_do_rep_hypercall().

However the specification states:

"Since the fixed header size is implicit, instead of supplying the total header size, only the variable portion is supplied in the input controls."

Based on that, the first two fields of hv_vp_set should not be counted toward the variable header size, and `nr_bank` should be used instead of `nr_bank + 2`.

I'm attaching the backports for the 3 upstream commits plus a fourth patch with that fix. I also built a kernel with those patches for testing purposes; it can be downloaded from:

http://kernel.ubuntu.com/~mhcerri/azure/lp1719045-fix/

I was able to restart dockerd with this kernel hundreds of times without any problem.

Let me know if the solution seems reasonable to you. Once it is tested, I will submit the fix upstream.

Marcelo Cerri (mhcerri) wrote :

The kernel with the reverted changes is already in the pipeline to be published to the archive. So we can test this fix at a normal pace.

tags: added: patch
Marcelo Cerri (mhcerri) wrote :

linux-azure 4.11.0.1013.13 (with the reverted changes) just landed in -proposed. Can you please confirm the issue does not happen with this version?

Joshua R. Poulson (jrp) wrote :

Yes, I've already kicked off the call for testing with the local teams and people experiencing the issue.

Joshua R. Poulson (jrp) wrote :

Early results from @Chris

The proposed kernel looks good from our BVT testing on Hyper-V.
Besides the BVT, @Ionut has also done the following scenarios:
- Compiled go v1.6 and telegraf v1.2 from sources
- Compiled go v1.8.3 and telegraf v1.5 from sources

Both series worked fine, with no segfault errors.

BVT covers:
- Core tests and time sync
- Networking tests
- Memory: DM & runtime memory results

Joshua R. Poulson (jrp) wrote :

Later results also look good. IB, BVT, Deployment, Perf, GPU, LTP, kdump, Docker, and Telegraf tests all passed. Please publish to updates.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 4.11.0-1013.13

---------------
linux-azure (4.11.0-1013.13) xenial; urgency=low

  * linux-azure: 4.11.0-1013.13 -proposed tracker (LP: #1720818)

  * [Hyper-V] Docker failures with linux-azure 4.11.0-1011 (LP: #1719045)
    - Revert "UBUNTU: SAUCE: tracing/hyper-v: trace hyperv_mmu_flush_tlb_others()"
    - Revert "UBUNTU: SAUCE: x86/hyper-v: support extended CPU ranges for TLB
      flush hypercalls"
    - Revert "UBUNTU: SAUCE: x86/hyper-v: use hypercall for remote TLB flush"

linux-azure (4.11.0-1012.12) xenial; urgency=low

  * linux-azure: 4.11.0-1012.12 -proposed tracker (LP: #1718763)

  * linux-azure: KVM nested virtualization is disabled (LP: #1718740)
    - [Config] azure: enable KVM support

 -- Marcelo Henrique Cerri <email address hidden> Mon, 02 Oct 2017 12:20:26 -0300

Changed in linux-azure (Ubuntu Xenial):
status: Fix Committed → Fix Released
Marcelo Cerri (mhcerri) wrote :

@jrp, should we include the following upstream fixes into the azure kernel?

ab7ff471aa5d x86/hyperv: Fix hypercalls with extended CPU ranges for TLB flushing
60d73a7c9660 x86/hyperv: Don't use percpu areas for pcpu_flush/pcpu_flush_ex structures
a3b742439292 x86/hyperv: Clear vCPU banks between calls to avoid flushing unneeded vCPUs

I can build a test kernel with those changes if you prefer.

Joshua R. Poulson (jrp) wrote :

@mhcerri Yes, we should queue these up, but we'd like to switch our focus to 4.13 soon.

Thanks, --jrp
