Disable ECKD Thin Provisioning to prevent data loss

Bug #1860535 reported by bugproxy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Ubuntu on IBM z Systems | Fix Released | High | Unassigned |
linux (Ubuntu) | Invalid | Undecided | Skipper Bug Screeners |
Eoan | Fix Released | Undecided | Unassigned |

Bug Description

SRU Justification:
------------------

[Impact]

* A severe problem with thin provisioning of ECKD volumes, introduced with 19.10's kernel 5.3, was identified.

* For enhanced space efficient (ese) volumes, errors may occur when accessing unformatted tracks.

* In such a case the driver either formats the track on the fly for write requests or returns zero data for read requests.

* But if a request spans multiple tracks, the indication of an unformatted track presented for the first track is incorrectly applied to all tracks covered by the request.

* Hence tracks containing data will be handled as empty tracks, resulting in zero data being returned on read, or existing data being overwritten with zeros on write.
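
The failure mode described above can be illustrated with a small model. This is a hedged sketch in Python, not the DASD driver code: the `Volume` class, the `read_per_track` and `read_buggy` methods, and the tiny track size are all hypothetical, chosen only to show how applying the first track's "unformatted" indication to a whole multi-track request returns zeros for a track that actually holds data.

```python
TRACK_SIZE = 8  # hypothetical tiny track size, for illustration only


class Volume:
    """Toy model of a thin-provisioned volume; None marks an unformatted track."""

    def __init__(self, num_tracks):
        self.tracks = [None] * num_tracks

    def format_and_write(self, track, data):
        """Format one track and store exactly one track's worth of data."""
        assert len(data) == TRACK_SIZE
        self.tracks[track] = bytes(data)

    def read_per_track(self, first, count):
        """Intended behaviour: decide formatted/unformatted for each track."""
        out = b""
        for t in range(first, first + count):
            data = self.tracks[t]
            out += data if data is not None else bytes(TRACK_SIZE)
        return out

    def read_buggy(self, first, count):
        """Faulty behaviour: the state of the first track is applied to all."""
        if self.tracks[first] is None:
            return bytes(TRACK_SIZE * count)
        return self.read_per_track(first, count)


vol = Volume(4)
vol.format_and_write(1, b"1" * TRACK_SIZE)  # track 0 stays unformatted

correct = vol.read_per_track(0, 2)  # zeros for track 0, data for track 1
buggy = vol.read_buggy(0, 2)        # zeros for both tracks: data is hidden
print(correct)
print(buggy)
```

In this model the request starts on an unformatted track and continues onto a formatted one, which is the combination the bullets above describe as dangerous.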

[Fix]

* Backport: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860535/+attachment/5323313/+files/0001-s390-dasd-disable-ese-support-due-to-possible-data-c.patch

[Test Case]

* An s390x LPAR with Eoan / kernel 5.3 and at least one 3390 DASD (ECKD) disk is needed ('discard' enabled, which is the default).

* Write arbitrary files (but with known content, e.g. all '1's) to fill the disk up to a certain level.

* Since all 3390 DASDs (mod-3, 9, 27 or 54 ...) have 56,664 bytes per track, writing a file (again with simple but known content) with a size of a multiple of 56,664 bytes on a thin provisioned ECKD DASD device should provoke the error situation.

* Check the files for any modifications (partially filled with '0', cut/truncated, deleted/zero length).
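
The write-and-check steps above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names `write_pattern_file` and `check_pattern_file` are made up for illustration, and the demo runs in a temporary directory and simulates the damage itself. On a real system the file would have to live on a filesystem backed by the thin provisioned DASD, and a read-back may be served from the page cache unless the filesystem is remounted first.

```python
import os
import tempfile

BYTES_PER_TRACK = 56_664  # 3390 track size stated in the test case


def write_pattern_file(path, tracks, fill=b"1"):
    """Write tracks * BYTES_PER_TRACK bytes of a single known byte value."""
    with open(path, "wb") as f:
        for _ in range(tracks):
            f.write(fill * BYTES_PER_TRACK)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device


def check_pattern_file(path, tracks, fill=b"1"):
    """Return a list of problems found; an empty list means the file is intact."""
    if not os.path.exists(path):
        return ["file is missing"]
    problems = []
    expected_size = tracks * BYTES_PER_TRACK
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        problems.append(f"size {actual_size}, expected {expected_size}")
    with open(path, "rb") as f:
        data = f.read()
    zeros = data.count(b"\x00")
    if zeros:
        problems.append(f"{zeros} zero bytes found")
    if data.count(fill) + zeros != len(data):
        problems.append("unexpected content")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pattern.bin")
    write_pattern_file(path, tracks=3)
    clean = check_pattern_file(path, tracks=3)
    # Simulate the corruption by zeroing the second track in place.
    with open(path, "r+b") as f:
        f.seek(BYTES_PER_TRACK)
        f.write(bytes(BYTES_PER_TRACK))
    damaged = check_pattern_file(path, tracks=3)

print(clean)
print(damaged)
```

The checker reports the three symptoms listed in the test case: zero-filled regions, a wrong size (truncation), and a missing file.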

[Regression Potential]

* The regression potential is moderate since this is purely s390x specific,

* limited to thin provisioned DASD disks on Eoan / kernel 5.3

* and just disables the broken feature and reverts things back to a DASD functionality that is known to work.

[Other Info]

* For 19.10 / Eoan no real fix will be provided, but a patch for disabling this feature completely (this bug/patch).

* The broken functionality that was introduced in Eoan had already been partially removed due to problems on z/VM.

* For 20.04 / Focal a proper fix is in the works that will be made available as a backport to Focal's kernel 5.4.

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-183399 severity-high targetmilestone-inin1910
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
importance: Undecided → High
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-01-22 08:25 EDT-------
IBM has discovered a problem with a new feature in Ubuntu 19.10, delivered with the following feature request, which might result in data loss:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1830731

For 19.10 we will not provide a fix for it, but a patch for disabling this feature.
For 20.04 a fix will be made available via backport from kernel 5.6, which is the release the patch is targeted for.

Info will follow

Frank Heimes (fheimes) wrote : Re: [UBUNTU] - Disable Thin Provisioning to prevent data loss

After the work on LP 1830731 got completed, problems occurred prior to the release of Eoan / kernel 5.3 and it got partially reverted based on LP 1846219:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1846219

Is the revert in LP 1846219 already sufficient to complete this ticket for Eoan? (I assume yes) Or do all the patches from LP 1830731 need to be reverted?

Changed in ubuntu-z-systems:
status: New → Incomplete
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-01-23 02:45 EDT-------
Comment: Is the revert in LP 1846219 already sufficient to complete this ticket for Eoan? (I assume yes) Or do all the patches from LP 1830731 need to be reverted?

Answer:
No, this is not sufficient. The revert in LP 1846219 only reverts a part of the feature.
Unfortunately we have to disable the whole feature. But we would like to disable the feature instead of reverting all patches.

The patch to disable the feature is attached here

Created attachment 140187 [details]
s390/dasd: disable ese support due to possible data corruption

Devices are formatted in multiples of tracks.
For an enhanced space efficient (ese) volume we get errors when accessing
unformatted tracks. In this case the driver either formats the track
on the fly for write requests or returns zero data for read requests.

In case a request spans multiple tracks, the indication of an unformatted
track presented for the first track is incorrectly applied to all tracks
covered by the request. As a result, tracks containing data will be handled
as if empty, resulting in zero data being returned on read, or overwriting
existing data with zero on write.

While working on a proper fix, disable the feature by always returning zero
for the ese check. This disables all ese special handling and prevents the
possible data corruption.
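
The disabling approach described in the patch text can be pictured as a feature check that always answers "no". The following is a hypothetical Python model, not the kernel code: `is_ese`, `handle_track_error`, the `device` dictionary, and the returned strings are invented for illustration. It only shows why a check that always returns zero makes the ese-specific branch unreachable.

```python
def is_ese(device):
    """Feature check; in this model it reports 'not ese' for every device."""
    return 0  # always zero: ese special handling is switched off


def handle_track_error(device, request_kind):
    """Pick the handling path for an 'unformatted track' indication."""
    if is_ese(device):
        # Unreachable while is_ese() returns 0.
        if request_kind == "write":
            return "format on the fly"
        return "return zeros"
    return "no ese special handling"


# Even a device that is thin provisioned takes the plain path.
dev = {"thin_provisioned": True}
read_path = handle_track_error(dev, "read")
write_path = handle_track_error(dev, "write")
print(read_path, write_path)
```

Gating at a single check keeps the change small compared with reverting every patch of the feature, which matches the preference stated in the comment above.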

bugproxy (bugproxy) wrote : s390/dasd: disable ese support due to possible data corruption

------- Comment on attachment From <email address hidden> 2020-01-27 07:55 EDT-------

Attachment relaunched external

Frank Heimes (fheimes)
summary: - [UBUNTU] - Disable Thin Provisioning to prevent data loss
+ Disable ECKD Thin Provisioning to prevent data loss
Frank Heimes (fheimes) wrote :

Kernel SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2020-January/thread.html#107106
Changing status to In Progress.

description: updated
Changed in linux (Ubuntu):
status: New → Triaged
status: Triaged → Invalid
Changed in linux (Ubuntu Eoan):
status: New → Triaged
Changed in ubuntu-z-systems:
status: Incomplete → In Progress
Frank Heimes (fheimes)
Changed in linux (Ubuntu Eoan):
status: Triaged → In Progress
Changed in linux (Ubuntu Eoan):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
Frank Heimes (fheimes) wrote :

As per comment #3 and bug description, the decision for focal is still open.
Focal is now tracked in a separate ticket:
[UBUNTU 20.04] Thin Provisioning support (disablement or final solution)
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1862749

Khaled El Mously (kmously) wrote :

@fheimes - is it possible to have this bug verified on eoan?

Frank Heimes (fheimes) wrote :

@kmously sorry for not coming back to this earlier.
I actually already did a verification some days ago when I set up a fresh LPAR with Eoan and proposed enabled.
Since the patch/backport disables thin provisioning, what's needed is a regression test to see whether anything else was accidentally harmed.
For a different ticket I had to recompile a patched kernel (incl. headers, etc.) and everything worked fine and I didn't face any issue.
This morning I also set up a z/VM guest with Eoan, enabled proposed, updated it to kernel 5.3.0-40 as well, and explicitly wrote files crossing track boundaries. Everything was fine (as expected, since ECKD thin provisioning was not active and can no longer be activated with that kernel, since the code was disabled).
Hence I successfully verified the Eoan proposed kernel (and am adjusting the tags accordingly).

tags: added: verification-done-eoan
removed: verification-needed-eoan
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.3.0-40.32

---------------
linux (5.3.0-40.32) eoan; urgency=medium

  * eoan/linux: 5.3.0-40.32 -proposed tracker (LP: #1861214)

  * No sof soundcard for 'ASoC: CODEC DAI intel-hdmi-hifi1 not registered' after
    modprobe sof (LP: #1860248)
    - ASoC: SOF: Intel: fix HDA codec driver probe with multiple controllers

  * ocfs2-tools is causing kernel panics in Ubuntu Focal (Ubuntu-5.4.0-9.12)
    (LP: #1852122)
    - ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less

  * QAT drivers for C3XXX and C62X not included as modules (LP: #1845959)
    - [Config] CRYPTO_DEV_QAT_C3XXX=m, CRYPTO_DEV_QAT_C62X=m and
      CRYPTO_DEV_QAT_DH895xCC=m

  * Eoan update: upstream stable patchset 2020-01-24 (LP: #1860816)
    - scsi: lpfc: Fix discovery failures when target device connectivity bounces
    - scsi: mpt3sas: Fix clear pending bit in ioctl status
    - scsi: lpfc: Fix locking on mailbox command completion
    - Input: atmel_mxt_ts - disable IRQ across suspend
    - f2fs: fix to update time in lazytime mode
    - iommu: rockchip: Free domain on .domain_free
    - iommu/tegra-smmu: Fix page tables in > 4 GiB memory
    - dmaengine: xilinx_dma: Clear desc_pendingcount in xilinx_dma_reset
    - scsi: target: compare full CHAP_A Algorithm strings
    - scsi: lpfc: Fix SLI3 hba in loop mode not discovering devices
    - scsi: csiostor: Don't enable IRQs too early
    - scsi: hisi_sas: Replace in_softirq() check in hisi_sas_task_exec()
    - powerpc/pseries: Mark accumulate_stolen_time() as notrace
    - powerpc/pseries: Don't fail hash page table insert for bolted mapping
    - powerpc/tools: Don't quote $objdump in scripts
    - dma-debug: add a schedule point in debug_dma_dump_mappings()
    - leds: lm3692x: Handle failure to probe the regulator
    - clocksource/drivers/asm9260: Add a check for of_clk_get
    - clocksource/drivers/timer-of: Use unique device name instead of timer
    - powerpc/security/book3s64: Report L1TF status in sysfs
    - powerpc/book3s64/hash: Add cond_resched to avoid soft lockup warning
    - ext4: update direct I/O read lock pattern for IOCB_NOWAIT
    - ext4: iomap that extends beyond EOF should be marked dirty
    - jbd2: Fix statistics for the number of logged blocks
    - scsi: tracing: Fix handling of TRANSFER LENGTH == 0 for READ(6) and WRITE(6)
    - scsi: lpfc: Fix duplicate unreg_rpi error in port offline flow
    - f2fs: fix to update dir's i_pino during cross_rename
    - clk: qcom: Allow constant ratio freq tables for rcg
    - clk: clk-gpio: propagate rate change to parent
    - irqchip/irq-bcm7038-l1: Enable parent IRQ if necessary
    - irqchip: ingenic: Error out if IRQ domain creation failed
    - fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
    - scsi: lpfc: fix: Coverity: lpfc_cmpl_els_rsp(): Null pointer dereferences
    - PCI: rpaphp: Fix up pointer to first drc-info entry
    - scsi: ufs: fix potential bug which ends in system hang
    - powerpc/pseries/cmm: Implement release() function for sysfs device
    - PCI: rpaphp: Don't rely on firmware feature to imply drc-info support
    - PCI: rpaphp: Annotate and corr...

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-02-17 10:14 EDT-------
IBM Bugzilla status -> closed, Fix Released by requested distro
