Activity log for bug #1792195

Date Who What changed Old value New value Message
2018-09-12 17:49:13 bugproxy bug added bug
2018-09-12 17:59:31 bugproxy tags architecture-ppc64le bugnameltc-171273 severity-high targetmilestone-inin1804
2018-09-13 00:18:57 Ubuntu Foundations Team Bug Bot tags architecture-ppc64le bugnameltc-171273 severity-high targetmilestone-inin1804 architecture-ppc64le bot-comment bugnameltc-171273 severity-high targetmilestone-inin1804
2018-09-13 15:08:08 Brian Murray affects ubuntu linux (Ubuntu)
2018-09-17 08:16:48 Joseph Salisbury linux (Ubuntu): importance Undecided Medium
2018-09-17 08:16:52 Joseph Salisbury linux (Ubuntu): status New Incomplete
2018-09-26 17:06:42 Manoj Iyer bug task added ubuntu-power-systems
2018-09-26 17:07:02 Manoj Iyer ubuntu-power-systems: assignee Canonical Kernel Team (canonical-kernel-team)
2018-09-26 17:07:17 Manoj Iyer linux (Ubuntu): assignee Canonical Kernel Team (canonical-kernel-team)
2018-09-26 17:07:21 Manoj Iyer ubuntu-power-systems: importance Undecided High
2018-09-26 18:21:01 Joseph Salisbury linux (Ubuntu): status Incomplete Triaged
2018-09-26 18:22:15 Joseph Salisbury linux (Ubuntu): assignee Canonical Kernel Team (canonical-kernel-team) Joseph Salisbury (jsalisbury)
2018-10-01 13:38:38 Manoj Iyer ubuntu-power-systems: status New Triaged
2018-10-01 18:13:23 Joseph Salisbury linux (Ubuntu): status Triaged In Progress
2018-10-01 18:15:47 Joseph Salisbury nominated for series Ubuntu Bionic
2018-10-01 18:15:47 Joseph Salisbury bug task added linux (Ubuntu Bionic)
2018-10-01 18:15:55 Joseph Salisbury linux (Ubuntu Bionic): status New In Progress
2018-10-01 18:15:59 Joseph Salisbury linux (Ubuntu Bionic): importance Undecided Medium
2018-10-01 18:16:03 Joseph Salisbury linux (Ubuntu Bionic): assignee Joseph Salisbury (jsalisbury)
2018-10-08 14:32:39 Frank Heimes ubuntu-power-systems: status Triaged Incomplete
2018-10-10 07:39:48 bugproxy attachment added backported dd1 plus patch https://bugs.launchpad.net/bugs/1792195/+attachment/5199401/+files/lp1792195-removedd1-fix-dd2.2.patch
2018-10-10 09:26:45 Andrew Cloke ubuntu-power-systems: status Incomplete In Progress
2018-10-10 13:38:55 Andrew Cloke ubuntu-power-systems: importance High Critical
2018-10-10 13:38:57 Manoj Iyer linux (Ubuntu): importance Medium Critical
2018-10-10 13:38:58 Manoj Iyer linux (Ubuntu Bionic): importance Medium Critical
2018-10-11 16:21:20 Joseph Salisbury nominated for series Ubuntu Cosmic
2018-10-11 16:21:20 Joseph Salisbury bug task added linux (Ubuntu Cosmic)
2018-10-11 16:21:56 Joseph Salisbury description
Old value:
-- Problem Description --
The GPFS mmfsd daemon maps a shared tracing buffer (allocated by a kernel driver using vmalloc) and then writes trace records into it from user-space threads in parallel. When the SIGBUS happens, the accessed virtual address is within the mapped range; there is no out-of-bounds access.

We worked with Benjamin Herrenschmidt on the GPFS tracing kernel driver code, and he suggested a workaround in the driver that bypasses the problem. It works. The workaround code change is:

  - rc = remap_pfn_range(vma, start, pfn, PAGE_SIZE, PAGE_SHARED);
  + rc = remap_pfn_range(vma, start, pfn, PAGE_SIZE, __pgprot(pgprot_val(PAGE_SHARED) | _PAGE_DIRTY));

As Benjamin mentioned, this is a Linux kernel bug and this change is just a workaround. He will give the details of the kernel bug and explain why the workaround works.

The root cause is that for PTEs created by a driver at mmap time (i.e., PTEs that are not created dynamically at fault time), it is not legitimate for ptep_set_access_flags() to make them invalid, even temporarily. A concurrent access while they are invalid cannot be serviced by the page fault handler and causes a SIGBUS. Thankfully, such PTEs should not normally be the subject of an RO->RW privilege escalation.

What happens is that the GPFS driver creates the PTEs using remap_pfn_range(..., PAGE_SHARED). PAGE_SHARED has _PAGE_ACCESSED (R) set but not _PAGE_DIRTY (C). Thus, on the first write, we try to set C and, while doing so, hit the Nest MMU workaround in ptep_set_access_flags(), which causes the problem described earlier. The proposed patch ensures that we only apply the Nest MMU hack when changing _PAGE_RW, and not for normal R/C updates.

The workaround tested by the GPFS team consists of adding _PAGE_DIRTY to the mapping created by remap_pfn_range() to avoid the R/C update fault completely.

This is fixed by these commits:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bd0dbb73e01306a1060e56f81e5fe287be936477
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f08d08f3db55452d31ba4a37c702da6245876b96
Since DD1 support is still present (i.e., 2bf1071a8d50928a4ae366bb3108833166c2b70c is not applied), the second one does not apply cleanly. Did you want that attached?
New value:
== SRU Justification ==
IBM is requesting these commits in bionic and cosmic. These commits also rely on commit 7acf50e4efa6, which was SRU'd in bug 1792102. The first patch, commit 2bf1071a8d50, was backported by IBM themselves.

Description of bug:
The GPFS mmfsd daemon maps a shared tracing buffer (allocated by a kernel driver using vmalloc) and then writes trace records into it from user-space threads in parallel. When the SIGBUS happens, the accessed virtual address is within the mapped range; there is no out-of-bounds access.

The root cause is that for PTEs created by a driver at mmap time (i.e., PTEs that are not created dynamically at fault time), it is not legitimate for ptep_set_access_flags() to make them invalid, even temporarily. A concurrent access while they are invalid cannot be serviced by the page fault handler and causes a SIGBUS.

== Fixes ==
2bf1071a8d50 ("powerpc/64s: Remove POWER9 DD1 support")
bd0dbb73e013 ("powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.")
f08d08f3db55 ("powerpc/mm/radix: Only need the Nest MMU workaround for R -> RW transition")

== Regression Potential ==
Low. Limited to powerpc.

== Test Case ==
A test kernel was built with these patches and tested by IBM. IBM states the test kernel resolved the bug.

-- Problem Description --
The GPFS mmfsd daemon maps a shared tracing buffer (allocated by a kernel driver using vmalloc) and then writes trace records into it from user-space threads in parallel. When the SIGBUS happens, the accessed virtual address is within the mapped range; there is no out-of-bounds access.

We worked with Benjamin Herrenschmidt on the GPFS tracing kernel driver code, and he suggested a workaround in the driver that bypasses the problem. It works. The workaround code change is:

  - rc = remap_pfn_range(vma, start, pfn, PAGE_SIZE, PAGE_SHARED);
  + rc = remap_pfn_range(vma, start, pfn, PAGE_SIZE, __pgprot(pgprot_val(PAGE_SHARED) | _PAGE_DIRTY));

As Benjamin mentioned, this is a Linux kernel bug and this change is just a workaround. He will give the details of the kernel bug and explain why the workaround works.

The root cause is that for PTEs created by a driver at mmap time (i.e., PTEs that are not created dynamically at fault time), it is not legitimate for ptep_set_access_flags() to make them invalid, even temporarily. A concurrent access while they are invalid cannot be serviced by the page fault handler and causes a SIGBUS. Thankfully, such PTEs should not normally be the subject of an RO->RW privilege escalation.

What happens is that the GPFS driver creates the PTEs using remap_pfn_range(..., PAGE_SHARED). PAGE_SHARED has _PAGE_ACCESSED (R) set but not _PAGE_DIRTY (C). Thus, on the first write, we try to set C and, while doing so, hit the Nest MMU workaround in ptep_set_access_flags(), which causes the problem described earlier. The proposed patch ensures that we only apply the Nest MMU hack when changing _PAGE_RW, and not for normal R/C updates.

The workaround tested by the GPFS team consists of adding _PAGE_DIRTY to the mapping created by remap_pfn_range() to avoid the R/C update fault completely.

This is fixed by these commits:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bd0dbb73e01306a1060e56f81e5fe287be936477
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f08d08f3db55452d31ba4a37c702da6245876b96
Since DD1 support is still present (i.e., 2bf1071a8d50928a4ae366bb3108833166c2b70c is not applied), the second one does not apply cleanly. Did you want that attached?
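The difference between the old and the fixed Nest MMU workaround, as the description states it, can be sketched as a small user-space C model. This is not kernel code: the PTE_R, PTE_C and PTE_RW bits, their values, and all function names are hypothetical stand-ins for _PAGE_ACCESSED, _PAGE_DIRTY and _PAGE_RW, and the nest_mmu_workaround flag simply means "the workaround path is active".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for a simplified PTE model; the real
 * powerpc radix bit positions are different. */
enum {
    PTE_R  = 1u << 0, /* referenced, stands in for _PAGE_ACCESSED */
    PTE_C  = 1u << 1, /* changed, stands in for _PAGE_DIRTY */
    PTE_RW = 1u << 2, /* writable, stands in for _PAGE_RW */
};

/* Old rule as described in this bug: with the workaround active,
 * every access-flag update (any bit changing) invalidates the PTE
 * first, including a plain R/C update. */
static bool invalidates_before_fix(uint32_t old_pte, uint32_t new_pte,
                                   bool nest_mmu_workaround)
{
    return nest_mmu_workaround && (old_pte != new_pte);
}

/* Fixed rule as described in this bug: invalidate only when the
 * update changes the write permission bit. */
static bool invalidates_after_fix(uint32_t old_pte, uint32_t new_pte,
                                  bool nest_mmu_workaround)
{
    uint32_t change = old_pte ^ new_pte;
    return nest_mmu_workaround && (change & PTE_RW) != 0;
}

/* Bits the first write to an already-writable PTE needs to add. */
static uint32_t first_write_update(uint32_t pte)
{
    return pte | PTE_R | PTE_C;
}
```

With a PAGE_SHARED-like PTE (R and RW set, C clear), the first write has to add C: the old rule invalidates the PTE for that update, while the fixed rule does not. Pre-setting C, as the GPFS driver workaround does, leaves nothing to update on the first write, so neither rule invalidates.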
2018-10-16 14:47:40 Joseph Salisbury description
Old value:
== SRU Justification ==
IBM is requesting these commits in bionic and cosmic. These commits also rely on commit 7acf50e4efa6, which was SRU'd in bug 1792102. The first patch, commit 2bf1071a8d50, was backported by IBM themselves.

Description of bug:
The GPFS mmfsd daemon maps a shared tracing buffer (allocated by a kernel driver using vmalloc) and then writes trace records into it from user-space threads in parallel. When the SIGBUS happens, the accessed virtual address is within the mapped range; there is no out-of-bounds access.

The root cause is that for PTEs created by a driver at mmap time (i.e., PTEs that are not created dynamically at fault time), it is not legitimate for ptep_set_access_flags() to make them invalid, even temporarily. A concurrent access while they are invalid cannot be serviced by the page fault handler and causes a SIGBUS.

== Fixes ==
2bf1071a8d50 ("powerpc/64s: Remove POWER9 DD1 support")
bd0dbb73e013 ("powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.")
f08d08f3db55 ("powerpc/mm/radix: Only need the Nest MMU workaround for R -> RW transition")

== Regression Potential ==
Low. Limited to powerpc.

== Test Case ==
A test kernel was built with these patches and tested by IBM. IBM states the test kernel resolved the bug.

-- Problem Description --
The GPFS mmfsd daemon maps a shared tracing buffer (allocated by a kernel driver using vmalloc) and then writes trace records into it from user-space threads in parallel. When the SIGBUS happens, the accessed virtual address is within the mapped range; there is no out-of-bounds access.

We worked with Benjamin Herrenschmidt on the GPFS tracing kernel driver code, and he suggested a workaround in the driver that bypasses the problem. It works. The workaround code change is:

  - rc = remap_pfn_range(vma, start, pfn, PAGE_SIZE, PAGE_SHARED);
  + rc = remap_pfn_range(vma, start, pfn, PAGE_SIZE, __pgprot(pgprot_val(PAGE_SHARED) | _PAGE_DIRTY));

As Benjamin mentioned, this is a Linux kernel bug and this change is just a workaround. He will give the details of the kernel bug and explain why the workaround works.

The root cause is that for PTEs created by a driver at mmap time (i.e., PTEs that are not created dynamically at fault time), it is not legitimate for ptep_set_access_flags() to make them invalid, even temporarily. A concurrent access while they are invalid cannot be serviced by the page fault handler and causes a SIGBUS. Thankfully, such PTEs should not normally be the subject of an RO->RW privilege escalation.

What happens is that the GPFS driver creates the PTEs using remap_pfn_range(..., PAGE_SHARED). PAGE_SHARED has _PAGE_ACCESSED (R) set but not _PAGE_DIRTY (C). Thus, on the first write, we try to set C and, while doing so, hit the Nest MMU workaround in ptep_set_access_flags(), which causes the problem described earlier. The proposed patch ensures that we only apply the Nest MMU hack when changing _PAGE_RW, and not for normal R/C updates.

The workaround tested by the GPFS team consists of adding _PAGE_DIRTY to the mapping created by remap_pfn_range() to avoid the R/C update fault completely.

This is fixed by these commits:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bd0dbb73e01306a1060e56f81e5fe287be936477
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f08d08f3db55452d31ba4a37c702da6245876b96
Since DD1 support is still present (i.e., 2bf1071a8d50928a4ae366bb3108833166c2b70c is not applied), the second one does not apply cleanly. Did you want that attached?
New value:
== SRU Justification ==
IBM is requesting these commits in bionic and cosmic. These commits also rely on commit 7acf50e4efa6, which was SRU'd in bug 1792102.

Description of bug:
The GPFS mmfsd daemon maps a shared tracing buffer (allocated by a kernel driver using vmalloc) and then writes trace records into it from user-space threads in parallel. When the SIGBUS happens, the accessed virtual address is within the mapped range; there is no out-of-bounds access.

The root cause is that for PTEs created by a driver at mmap time (i.e., PTEs that are not created dynamically at fault time), it is not legitimate for ptep_set_access_flags() to make them invalid, even temporarily. A concurrent access while they are invalid cannot be serviced by the page fault handler and causes a SIGBUS.

== Fixes ==
bd0dbb73e013 ("powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.")
f08d08f3db55 ("powerpc/mm/radix: Only need the Nest MMU workaround for R -> RW transition")

== Regression Potential ==
Low. Limited to powerpc.

== Test Case ==
A test kernel was built with these patches and tested by IBM. IBM states the test kernel resolved the bug.

-- Problem Description --
The GPFS mmfsd daemon maps a shared tracing buffer (allocated by a kernel driver using vmalloc) and then writes trace records into it from user-space threads in parallel. When the SIGBUS happens, the accessed virtual address is within the mapped range; there is no out-of-bounds access.

We worked with Benjamin Herrenschmidt on the GPFS tracing kernel driver code, and he suggested a workaround in the driver that bypasses the problem. It works. The workaround code change is:

  - rc = remap_pfn_range(vma, start, pfn, PAGE_SIZE, PAGE_SHARED);
  + rc = remap_pfn_range(vma, start, pfn, PAGE_SIZE, __pgprot(pgprot_val(PAGE_SHARED) | _PAGE_DIRTY));

As Benjamin mentioned, this is a Linux kernel bug and this change is just a workaround. He will give the details of the kernel bug and explain why the workaround works.

The root cause is that for PTEs created by a driver at mmap time (i.e., PTEs that are not created dynamically at fault time), it is not legitimate for ptep_set_access_flags() to make them invalid, even temporarily. A concurrent access while they are invalid cannot be serviced by the page fault handler and causes a SIGBUS. Thankfully, such PTEs should not normally be the subject of an RO->RW privilege escalation.

What happens is that the GPFS driver creates the PTEs using remap_pfn_range(..., PAGE_SHARED). PAGE_SHARED has _PAGE_ACCESSED (R) set but not _PAGE_DIRTY (C). Thus, on the first write, we try to set C and, while doing so, hit the Nest MMU workaround in ptep_set_access_flags(), which causes the problem described earlier. The proposed patch ensures that we only apply the Nest MMU hack when changing _PAGE_RW, and not for normal R/C updates.

The workaround tested by the GPFS team consists of adding _PAGE_DIRTY to the mapping created by remap_pfn_range() to avoid the R/C update fault completely.

This is fixed by these commits:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bd0dbb73e01306a1060e56f81e5fe287be936477
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f08d08f3db55452d31ba4a37c702da6245876b96
Since DD1 support is still present (i.e., 2bf1071a8d50928a4ae366bb3108833166c2b70c is not applied), the second one does not apply cleanly. Did you want that attached?
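Why a temporarily invalid PTE is fatal only for driver mappings set up at mmap time can be illustrated with a deterministic model of one interleaving: an access-flag update on one thread, and an access from another thread that lands inside that update. This is a hypothetical sketch that ignores locking, TLB flushing and the real fault path; the type, field and function names are illustrative only and are not kernel APIs.

```c
#include <stdbool.h>

/* Simplified, hypothetical model of the race described in this bug. */
typedef enum { ACCESS_OK, ACCESS_FAULT_SERVICED, ACCESS_SIGBUS } access_result;

typedef struct {
    bool valid;           /* PTE currently usable by the MMU */
    bool created_at_mmap; /* true for driver PTEs set up once, e.g. by remap_pfn_range() */
} model_pte;

/* What a concurrent access sees for a given PTE state. */
static access_result concurrent_access(const model_pte *pte)
{
    if (pte->valid)
        return ACCESS_OK;
    /* An invalid PTE causes a page fault. A mapping populated at fault
     * time can be refilled by its fault handler; a driver mapping that
     * was populated only at mmap time has nothing to refill it. */
    return pte->created_at_mmap ? ACCESS_SIGBUS : ACCESS_FAULT_SERVICED;
}

/* Interleave one flag update with one access that lands inside it.
 * transient_invalidate selects whether the update clears the PTE
 * before writing the new value (the Nest MMU workaround path). */
static access_result race(bool created_at_mmap, bool transient_invalidate)
{
    model_pte pte = { .valid = true, .created_at_mmap = created_at_mmap };

    if (transient_invalidate)
        pte.valid = false;                     /* step 1: PTE made invalid */
    access_result r = concurrent_access(&pte); /* step 2: other thread accesses */
    pte.valid = true;                          /* step 3: new PTE value written */
    return r;
}
```

In this model, race(true, true) corresponds to the reported SIGBUS. Skipping the transient invalidation, which is what the fix does for plain R/C updates and what the _PAGE_DIRTY driver workaround achieves by avoiding the update altogether, yields ACCESS_OK.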
2018-10-23 14:46:32 Kleber Sacilotto de Souza linux (Ubuntu Bionic): status In Progress Fix Committed
2018-10-24 13:35:28 Brad Figg tags architecture-ppc64le bot-comment bugnameltc-171273 severity-high targetmilestone-inin1804 architecture-ppc64le bot-comment bugnameltc-171273 severity-high targetmilestone-inin1804 verification-needed-bionic
2018-10-24 14:50:01 Brad Figg tags architecture-ppc64le bot-comment bugnameltc-171273 severity-high targetmilestone-inin1804 verification-needed-bionic architecture-ppc64le bot-comment bugnameltc-171273 severity-high targetmilestone-inin1804 verification-needed-bionic verification-needed-cosmic
2018-10-25 19:28:40 Mike Ranweiler tags architecture-ppc64le bot-comment bugnameltc-171273 severity-high targetmilestone-inin1804 verification-needed-bionic verification-needed-cosmic architecture-ppc64le bot-comment bugnameltc-171273 severity-high targetmilestone-inin1804 verification-done-bionic verification-needed-cosmic
2018-11-08 13:43:23 Kleber Sacilotto de Souza linux (Ubuntu Cosmic): status In Progress Fix Committed
2018-11-12 14:43:10 Andrew Cloke ubuntu-power-systems: status In Progress Fix Committed
2018-11-12 19:26:24 Terry Rudd bug added subscriber Terry Rudd
2018-11-13 18:51:26 Launchpad Janitor linux (Ubuntu Bionic): status Fix Committed Fix Released
2018-11-13 18:51:26 Launchpad Janitor cve linked 2017-13168
2018-11-13 18:51:26 Launchpad Janitor cve linked 2018-15471
2018-11-13 18:51:26 Launchpad Janitor cve linked 2018-16658
2018-11-13 18:51:26 Launchpad Janitor cve linked 2018-9363
2018-11-13 19:09:36 Launchpad Janitor linux (Ubuntu Cosmic): status Fix Committed Fix Released
2018-11-13 19:09:37 Launchpad Janitor linux (Ubuntu Cosmic): status Fix Committed Fix Released
2018-11-13 19:15:48 Andrew Cloke ubuntu-power-systems: status Fix Committed In Progress
2018-11-14 16:01:57 Joseph Salisbury linux (Ubuntu): status In Progress Fix Released
2018-11-14 16:33:36 Andrew Cloke ubuntu-power-systems: status In Progress Fix Released
2019-07-24 20:53:58 Brad Figg tags architecture-ppc64le bot-comment bugnameltc-171273 severity-high targetmilestone-inin1804 verification-done-bionic verification-needed-cosmic architecture-ppc64le bot-comment bugnameltc-171273 cscc severity-high targetmilestone-inin1804 verification-done-bionic verification-needed-cosmic