Cannot build kernel 4.15.0-48.51 due to an in-source-tree ZFS module.

Bug #1828763 reported by Hyeonho Seo
This bug affects 1 person
Affects             Status        Importance  Assigned to   Milestone
linux (Ubuntu)      Invalid       Low         Andrea Righi
  Bionic            Fix Released  Low         Unassigned
zfs-linux (Ubuntu)  Invalid       Undecided   Unassigned
  Bionic            Fix Released  Undecided   Unassigned

Bug Description

[Impact]

* In b49151d684f44 tx_waited has been renamed to tx_dirty_delayed, but
only in the tracepoint definition (in trace_dmu.h) and not in the rest
of the code, causing build errors if zfs tracepoints are enabled.

* Fix by reverting tx_dirty_delayed back to the original name tx_waited

NOTE: this bug doesn't show up in regular kernel builds, because zfs tracepoints are not enabled by default. It appears only when tracepoints are enabled and zfs is recompiled outside our regular build process. It's a good cleanup anyway, in case we need to enable zfs tracepoints in the future.
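
Why the default build stays green can be sketched in a few lines of C. This is only an illustration, not the ZFS code: `struct dmu_tx_sketch`, `sketch_trace_delayed()` and the `SKETCH_TRACEPOINTS` switch are hypothetical names standing in for `struct dmu_tx`, the tracepoint body and the real tracepoint configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct dmu_tx; only two members are kept. */
struct dmu_tx_sketch {
	uint64_t tx_txg;
	bool     tx_waited;	/* the name the rest of the code uses */
};

/*
 * SKETCH_TRACEPOINTS is a hypothetical switch (an assumption of this
 * sketch). While it is undefined, the stale reference below is never
 * seen by the compiler, so the build succeeds.
 */
#ifdef SKETCH_TRACEPOINTS
static bool sketch_trace_delayed(const struct dmu_tx_sketch *tx)
{
	/* Stale member name: this line is a compile error when enabled. */
	return tx->tx_dirty_delayed;
}
#else
static bool sketch_trace_delayed(const struct dmu_tx_sketch *tx)
{
	(void)tx;	/* tracing compiled out: nothing is read */
	return false;
}
#endif
```

Compiling this as-is works; adding `-DSKETCH_TRACEPOINTS` to the compiler flags should produce a "has no member named" error of the same kind as the one reported below.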

[Test Case]

 * enable zfs tracepoints in config and build zfs

[Fix]

 * Restore the old struct member name (tx_waited) in the tracepoint definition to fix the build error

[Regression Potential]

 * It is a very small fix (just a rename of a struct member), so regression potential is minimal

[Original bug report]

I tried to build kernel 4.15.0-48.51, cloned from the Ubuntu kernel git (x86_64-generic flavour).

commit c50532b9d7b623ff98aeaf0b848e58adae54ca75 (HEAD -> master, tag: Ubuntu-4.15.0-48.51, origin/master, origin/HEAD)
Author: Andrea Righi <email address hidden>
Date: Tue Apr 2 18:31:55 2019 +0200

    UBUNTU: Ubuntu-4.15.0-48.51

    Signed-off-by: Andrea Righi <email address hidden>

1. Build a kernel image with the 'zfs_enable' flag set to false.
2. Build the SPL/ZFS modules independently with the 'make' command (for debugging; DECLARE_EVENT_CLASS() enabled).

But 'make' always failed at the same point, 'zfs/module/zfs/trace.c' (as shown below).

In file included from /home/np/linux-4.15/include/trace/define_trace.h:96:0,
                 from /home/np/linux-4.15/zfs/include/sys/trace_dmu.h:127,
                 from /home/np/linux-4.15/zfs/module/zfs/trace.c:45:
/home/np/linux-4.15/zfs/include/sys/trace_dmu.h: In function ‘trace_event_raw_event_zfs_delay_mintime_class’:
/home/np/linux-4.15/zfs/include/sys/trace_dmu.h:65:37: error: ‘dmu_tx_t {aka struct dmu_tx}’ has no member named ‘tx_dirty_delayed’
      __entry->tx_dirty_delayed = tx->tx_dirty_delayed;
                                     ^
/home/np/linux-4.15/include/trace/trace_events.h:719:4: note: in definition of macro ‘DECLARE_EVENT_CLASS’
  { assign; } \
    ^~~~~~
/home/np/linux-4.15/zfs/include/sys/trace_dmu.h:60:2: note: in expansion of macro ‘TP_fast_assign’
  TP_fast_assign(
  ^~~~~~~~~~~~~~
In file included from /home/np/linux-4.15/include/trace/define_trace.h:97:0,
                 from /home/np/linux-4.15/zfs/include/sys/trace_dmu.h:127,
                 from /home/np/linux-4.15/zfs/module/zfs/trace.c:45:
/home/np/linux-4.15/zfs/include/sys/trace_dmu.h: In function ‘perf_trace_zfs_delay_mintime_class’:
/home/np/linux-4.15/zfs/include/sys/trace_dmu.h:65:37: error: ‘dmu_tx_t {aka struct dmu_tx}’ has no member named ‘tx_dirty_delayed’
      __entry->tx_dirty_delayed = tx->tx_dirty_delayed;
                                     ^
/home/np/linux-4.15/include/trace/perf.h:66:4: note: in definition of macro ‘DECLARE_EVENT_CLASS’
  { assign; } \
    ^~~~~~
/home/np/linux-4.15/zfs/include/sys/trace_dmu.h:60:2: note: in expansion of macro ‘TP_fast_assign’
  TP_fast_assign(
  ^~~~~~~~~~~~~~
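
The "in definition of macro" notes in the output above come from the assignment block being pasted into a generated function body, so the member lookup only happens once the macro is expanded. A much-reduced sketch of that mechanism follows; `SKETCH_EVENT_CLASS`, `struct sketch_tx` and `struct sketch_entry` are hypothetical names for illustration, not the kernel's actual macros or types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; the real dmu_tx_t has many more members. */
struct sketch_tx {
	uint64_t tx_txg;
	bool     tx_waited;
};

struct sketch_entry {
	uint64_t tx_txg;
	bool     tx_waited;
};

/*
 * Hypothetical, reduced analogue of DECLARE_EVENT_CLASS: the "assign"
 * argument is pasted verbatim into a generated function body, so any
 * misspelled member name is reported at the point of expansion.
 */
#define SKETCH_EVENT_CLASS(name, assign)				\
	static void sketch_event_##name(struct sketch_entry *entry,	\
	    const struct sketch_tx *tx)					\
	{ assign; }

/* Expands to a function named sketch_event_delay_mintime(). */
SKETCH_EVENT_CLASS(delay_mintime,
	entry->tx_txg = tx->tx_txg;
	entry->tx_waited = tx->tx_waited
)
```

Replacing `tx->tx_waited` with a name that `struct sketch_tx` does not have makes the compiler complain inside the macro expansion, which is the pattern visible in the log.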

I tried to debug this issue myself and found something intriguing.
In 'zfs/include/sys/trace_dmu.h', the failing code accesses 'tx_dirty_delayed' from 'dmu_tx_t' (which is the same as 'struct dmu_tx').

DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
        TP_PROTO(dmu_tx_t *tx, uint64_t dirty, uint64_t min_tx_time),
        TP_ARGS(tx, dirty, min_tx_time),
        TP_STRUCT__entry(
            __field(uint64_t, tx_txg)
            __field(uint64_t, tx_lastsnap_txg)
            __field(uint64_t, tx_lasttried_txg)
            __field(boolean_t, tx_anyobj)
            __field(boolean_t, tx_dirty_delayed)
            __field(hrtime_t, tx_start)
            __field(boolean_t, tx_wait_dirty)
            __field(int, tx_err)
            __field(uint64_t, min_tx_time)
            __field(uint64_t, dirty)
        ),
        TP_fast_assign(
            __entry->tx_txg = tx->tx_txg;
            __entry->tx_lastsnap_txg = tx->tx_lastsnap_txg;
            __entry->tx_lasttried_txg = tx->tx_lasttried_txg;
            __entry->tx_anyobj = tx->tx_anyobj;
            __entry->tx_dirty_delayed = tx->tx_dirty_delayed;
            __entry->tx_start = tx->tx_start;
            __entry->tx_wait_dirty = tx->tx_wait_dirty;
            __entry->tx_err = tx->tx_err;
            __entry->dirty = dirty;
            __entry->min_tx_time = min_tx_time;
        ),
        TP_printk("tx { txg %llu lastsnap_txg %llu tx_lasttried_txg %llu "
            "anyobj %d dirty_delayed %d start %llu wait_dirty %d err %i "
            "} dirty %llu min_tx_time %llu",
            __entry->tx_txg, __entry->tx_lastsnap_txg,
            __entry->tx_lasttried_txg, __entry->tx_anyobj,
            __entry->tx_dirty_delayed, __entry->tx_start,
            __entry->tx_wait_dirty, __entry->tx_err,
            __entry->dirty, __entry->min_tx_time)
);

But the definition of 'struct dmu_tx' doesn't contain 'tx_dirty_delayed' (in zfs/include/sys/dmu_tx.h):

struct dmu_tx {
        /*
         * No synchronization is needed because a tx can only be handled
         * by one thread.
         */
        list_t tx_holds; /* list of dmu_tx_hold_t */
        objset_t *tx_objset;
        struct dsl_dir *tx_dir;
        struct dsl_pool *tx_pool;
        uint64_t tx_txg;
        uint64_t tx_lastsnap_txg;
        uint64_t tx_lasttried_txg;
        txg_handle_t tx_txgh;
        void *tx_tempreserve_cookie;
        struct dmu_tx_hold *tx_needassign_txh;

        /* list of dmu_tx_callback_t on this dmu_tx */
        list_t tx_callbacks;

        /* placeholder for syncing context, doesn't need specific holds */
        boolean_t tx_anyobj;

        /* has this transaction already been delayed? */
        boolean_t tx_waited;

        /* transaction is marked as being a "net free" of space */
        boolean_t tx_netfree;

        /* time this transaction was created */
        hrtime_t tx_start;

        /* need to wait for sufficient dirty space */
        boolean_t tx_wait_dirty;

        int tx_err;
};

As shown above, the current in-source-tree ZFS module does not seem to be valid... :(
Are there any workarounds for this problem?
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw----+ 1 root audio 116, 1 May 13 15:35 seq
 crw-rw----+ 1 root audio 116, 33 May 13 15:35 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.6
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=/dev/mapper/ubuntu--vg-swap_1
InstallationDate: Installed on 2019-05-13 (0 days ago)
InstallationMedia: Ubuntu-Server 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb:
 Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd
 Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: Xen HVM domU
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 cirrusdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-48-generic root=/dev/mapper/hostname--vg-root ro
ProcVersionSignature: Ubuntu 4.15.0-48.51-generic 4.15.18
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-48-generic N/A
 linux-backports-modules-4.15.0-48-generic N/A
 linux-firmware 1.173.5
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic
Uname: Linux 4.15.0-48-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 10/26/2018
dmi.bios.vendor: Xen
dmi.bios.version: 4.7.6-6.2.1.xcp
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:bvr4.7.6-6.2.1.xcp:bd10/26/2018:svnXen:pnHVMdomU:pvr4.7.6-6.2.1.xcp:cvnXen:ct1:cvr:
dmi.product.name: HVM domU
dmi.product.version: 4.7.6-6.2.1.xcp
dmi.sys.vendor: Xen

Hyeonho Seo (seohho) wrote :
Hyeonho Seo (seohho)
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1828763

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Hyeonho Seo (seohho) wrote : CRDA.txt

apport information

tags: added: apport-collected bionic
description: updated
Hyeonho Seo (seohho) wrote : CurrentDmesg.txt

apport information

Hyeonho Seo (seohho) wrote : Lspci.txt

apport information

Hyeonho Seo (seohho) wrote : ProcCpuinfo.txt

apport information

Hyeonho Seo (seohho) wrote : ProcCpuinfoMinimal.txt

apport information

Hyeonho Seo (seohho) wrote : ProcInterrupts.txt

apport information

Hyeonho Seo (seohho) wrote : ProcModules.txt

apport information

Hyeonho Seo (seohho) wrote : UdevDb.txt

apport information

Hyeonho Seo (seohho) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Andrea Righi (arighi) wrote :

Hi Hyeonho, good catch. It looks like b49151d684f44 renamed tx_waited to tx_dirty_delayed, but only in the tracepoint definition in trace_dmu.h and not in the rest of the code. I think we should simply restore the original name.

Can you check if this patch solves the problem? Thanks.

Andrea Righi (arighi)
tags: added: patch
Changed in linux (Ubuntu):
importance: Undecided → Low
assignee: nobody → Andrea Righi (arighi)
Hyeonho Seo (seohho) wrote :

Hi, Andrea, thanks for your prompt response.
I tried to build the kernel sources with your patch applied, and it failed with an error.

/home/np/linux-4.15/zfs/include/sys/trace_dmu.h:65:33: error: ‘dmu_tx_t {aka struct dmu_tx}’ has no member named ‘waited’; did you mean ‘tx_waited’?
      __entry->tx_waited = tx->waited;

According to the error message, the following diff caused the problem.

@@ -62,7 +62,7 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
      __entry->tx_lastsnap_txg = tx->tx_lastsnap_txg;
      __entry->tx_lasttried_txg = tx->tx_lasttried_txg;
      __entry->tx_anyobj = tx->tx_anyobj;
- __entry->tx_dirty_delayed = tx->tx_dirty_delayed;
+ __entry->tx_waited = tx->waited;

I think we should assign 'tx->tx_waited' to '__entry->tx_waited', not 'tx->waited'.

Andrea Righi (arighi) wrote :

Hyeonho, you're absolutely right, there's a typo in my previous patch. Can you double check if this one is correct? Thanks again!

Hyeonho Seo (seohho) wrote :

The second patch works perfectly on my machine.
There were no errors while compiling, and the build succeeded.
I appreciate you taking the trouble to help me, Andrea! :)

Andrea Righi (arighi)
description: updated
Colin Ian King (colin-king) wrote :

FYI I've sponsored this package and it's waiting now for the SRU team to handle.

Terry Rudd (terrykrudd) wrote :

I've added this bug for SRU in an upcoming SRU Cycle

Stefan Bader (smb)
Changed in linux (Ubuntu Bionic):
importance: Undecided → Low
status: New → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu Bionic):
status: Confirmed → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Andrea Righi (arighi)
tags: added: verification-done-bionic
removed: verification-needed-bionic
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Hyeonho, or anyone else affected,

Accepted zfs-linux into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.7.5-1ubuntu16.6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in zfs-linux (Ubuntu):
status: New → Invalid
Changed in zfs-linux (Ubuntu Bionic):
status: New → Fix Committed
Hyeonho Seo (seohho) wrote :

Hello Łukasz.
Sorry for the delayed response.
I've tested the proposed package, zfs-linux 0.7.5-1ubuntu16.6; it works well.
Thank you for your attention to this bug. :)

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-55.60

---------------
linux (4.15.0-55.60) bionic; urgency=medium

  * linux: 4.15.0-55.60 -proposed tracker (LP: #1834954)

  * Request backport of ceph commits into bionic (LP: #1834235)
    - ceph: use atomic_t for ceph_inode_info::i_shared_gen
    - ceph: define argument structure for handle_cap_grant
    - ceph: flush pending works before shutdown super
    - ceph: send cap releases more aggressively
    - ceph: single workqueue for inode related works
    - ceph: avoid dereferencing invalid pointer during cached readdir
    - ceph: quota: add initial infrastructure to support cephfs quotas
    - ceph: quota: support for ceph.quota.max_files
    - ceph: quota: don't allow cross-quota renames
    - ceph: fix root quota realm check
    - ceph: quota: support for ceph.quota.max_bytes
    - ceph: quota: update MDS when max_bytes is approaching
    - ceph: quota: add counter for snaprealms with quota
    - ceph: avoid iput_final() while holding mutex or in dispatch thread

  * QCA9377 isn't being recognized sometimes (LP: #1757218)
    - SAUCE: USB: Disable USB2 LPM at shutdown

  * hns: fix ICMP6 neighbor solicitation messages discard problem (LP: #1833140)
    - net: hns: fix ICMP6 neighbor solicitation messages discard problem
    - net: hns: fix unsigned comparison to less than zero

  * Fix occasional boot time crash in hns driver (LP: #1833138)
    - net: hns: Fix probabilistic memory overwrite when HNS driver initialized

  * use-after-free in hns_nic_net_xmit_hw (LP: #1833136)
    - net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()

  * hns: attempt to restart autoneg when disabled should report error
    (LP: #1833147)
    - net: hns: Restart autoneg need return failed when autoneg off

  * systemd 237-3ubuntu10.14 ADT test failure on Bionic ppc64el (test-seccomp)
    (LP: #1821625)
    - powerpc: sys_pkey_alloc() and sys_pkey_free() system calls
    - powerpc: sys_pkey_mprotect() system call

  * [UBUNTU] pkey: Indicate old mkvp only if old and curr. mkvp are different
    (LP: #1832625)
    - pkey: Indicate old mkvp only if old and current mkvp are different

  * [UBUNTU] kernel: Fix gcm-aes-s390 wrong scatter-gather list processing
    (LP: #1832623)
    - s390/crypto: fix gcm-aes-s390 selftest failures

  * System crashes on hot adding a core with drmgr command (4.15.0-48-generic)
    (LP: #1833716)
    - powerpc/numa: improve control of topology updates
    - powerpc/numa: document topology_updates_enabled, disable by default

  * Kernel modules generated incorrectly when system is localized to a non-
    English language (LP: #1828084)
    - scripts: override locale from environment when running recordmcount.pl

  * [UBUNTU] kernel: Fix wrong dispatching for control domain CPRBs
    (LP: #1832624)
    - s390/zcrypt: Fix wrong dispatching for control domain CPRBs

  * CVE-2019-11815
    - net: rds: force to destroy connection if t_sock is NULL in
      rds_tcp_kill_sock().

  * Sound device not detected after resume from hibernate (LP: #1826868)
    - drm/i915: Force 2*96 MHz cdclk on glk/cnl when audio power is enabled
    - drm/i915: Save the old CDCLK atomic state
...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.7.5-1ubuntu16.6

---------------
zfs-linux (0.7.5-1ubuntu16.6) bionic; urgency=medium

  * Fix hung z_zvol tasks during 'zfs receive' (LP: #1772412)
    - Adds a dedicated, per-pool, prefetch taskq to prevent the traverse
      code from monopolizing the global (and limited) system_taskq by
      inappropriately scheduling long running tasks on it. This fixes
      z_zvol hung tasks.

zfs-linux (0.7.5-1ubuntu16.5) bionic; urgency=medium

  * Fix build error with tracepoints enabled (LP: #1828763)
    - In b49151d684f44 (bionic kernel master-next branch) tx_waited has been
      renamed to tx_dirty_delayed, but only in the tracepoint definition (in
      trace_dmu.h) and not in the rest of the code, causing build errors if
      zfs tracepoints are enabled; fix by reverting tx_dirty_delayed back to
      the original name tx_waited.

 -- Colin Ian King <email address hidden> Wed, 29 May 2019 17:24:22 +0100

Changed in zfs-linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Robie Basak (racb) wrote : Update Released

The verification of the Stable Release Update for zfs-linux has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial