Ubuntu 16.04: kdump fails with error "kdump-tools[1532]: /etc/init.d/kdump-tools: 26: [: -ne: unexpected operator" when / file system is xfs.

Bug #1714485 reported by bugproxy on 2017-09-01
This bug affects 1 person
Affects                            Importance  Assigned to
The Ubuntu-power-systems project   High        Canonical Kernel Team
makedumpfile (Ubuntu)              High        Canonical Kernel Team
  Xenial                           High        Canonical Kernel Team
  Zesty                            Undecided   Unassigned
  Artful                           High        Canonical Kernel Team
  Bionic                           High        Canonical Kernel Team

Bug Description

== Comment: #0 - PAVITHRA R. PRAKASH <> - 2017-08-31 00:33:37 ==
---Problem Description---

Ubuntu 16.04.03: kdump fails with error "kdump-tools[1532]: /etc/init.d/kdump-tools: 26: [: -ne: unexpected operator" when / file system is xfs.

---Steps to Reproduce---

1. Install Ubuntu 16.04.03 with / as xfs.
2. Configure kdump.
3. Trigger a crash.

The machine hangs after the log below. Console log attached.

[ OK ] Reached target Network is Online.
         Starting Kernel crash dump capture service...
         Starting iSCSI initiator daemon (iscsid)...
[ 12.263089] kdump-tools[1205]: /etc/init.d/kdump-tools: 26: [: -ne: unexpected operator
[ OK ] Started Kernel crash dump capture service.
[ OK ] Started iSCSI initiator daemon (iscsid).
         Starting Login to default iSCSI targets...
[ OK ] Started Login to default iSCSI targets.
[ OK ] Reached target Remote File Systems (Pre).

4. After a manual reboot, /etc/default/kdump-tools is empty.

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-158164 severity-high targetmilestone-inin---
bugproxy (bugproxy) wrote : sosreport

Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → makedumpfile (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: kernel-da-key

What is the content of /etc/default/kdump-tools? It seems USE_KDUMP is not set in it at all.

Cascardo.
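
The error text is consistent with that diagnosis: when USE_KDUMP is empty or unset, an unquoted numeric test loses its left operand and the shell sees only `[ -ne 1 ]`. A minimal sketch, assuming the init script compares USE_KDUMP roughly like this. Line 26 of /etc/init.d/kdump-tools is not quoted in this report, so both function names and the exact test are hypothetical:

```shell
#!/bin/sh
# Hypothetical reconstruction of the failing check; not the real kdump-tools code.
unquoted_check() {
    # With USE_KDUMP empty this expands to `[ -ne 1 ]`, a malformed test.
    # dash reports "[: -ne: unexpected operator"; stderr is silenced here.
    [ $USE_KDUMP -ne 1 ] 2>/dev/null
}

# Same comparison with quoting and a default value, so an empty
# config file is treated as USE_KDUMP=0 instead of a syntax error.
defaulted_check() {
    [ "${USE_KDUMP:-0}" -ne 1 ]
}

USE_KDUMP=    # what sourcing an empty /etc/default/kdump-tools leaves behind

rc=0
unquoted_check || rc=$?
echo "unquoted check status: $rc"     # greater than 1 means test itself failed

rc=0
defaulted_check || rc=$?
echo "defaulted check status: $rc"
```

A status above 1 from `[` signals an error in the test expression rather than a true or false result, which is why the service can still be reported as started even though the check never ran properly.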

------- Comment From <email address hidden> 2017-09-01 11:37 EDT-------
(In reply to comment #8)
> == Comment: #0 - PAVITHRA R. PRAKASH <> - 2017-08-31 00:33:37 ==
> ---Problem Description---
>
> Ubuntu 16.04.03: kdump fails with error "kdump-tools[1532]:
> /etc/init.d/kdump-tools: 26: [: -ne: unexpected operator" when / file system
> is xfs.
> 4. After manual reboot /etc/default/kdump-tools is empty.
>

Hello Cascardo,

The file ends up empty.

Thanks
Hari

Manoj Iyer (manjo) on 2017-09-11
Changed in makedumpfile (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g

Well, it shouldn't be empty, unless there is a bug in XFS. Can you check the content of the file right before rebooting?

Thank you very much.
Cascardo.

Reproduced here on an x86_64 virtual machine. Will investigate further.

Thanks.
Cascardo.

Changed in ubuntu-power-systems:
status: New → Triaged
Manoj Iyer (manjo) on 2017-11-06
Changed in makedumpfile (Ubuntu):
status: New → Triaged
importance: Undecided → High

The investigation led me to find out that the config file is written by the maintainer script, which does not sync it to disk. So, when the crash is triggered right after the installation of kdump-tools, the file will be empty and the dump won't happen.

I have fixed this upstream by changing the maintainer script to use ucf and syncing it. Downstream, I may simply add the sync.

The other thing I might fix is the behavior when we are already running in the dump kernel: it should dump and reboot regardless, because USE_KDUMP should only control whether the kdump kernel is loaded automatically, not whether a dump is taken once the kdump kernel is running.

Cascardo.
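
The downstream idea of "simply add the sync" can be sketched as below. This is an illustration under stated assumptions, not the actual maintainer script: the helper name `write_config_synced`, the temporary-file naming, and the single-line file content are all hypothetical, and the upstream fix uses ucf rather than a hand-written helper.

```shell
#!/bin/sh
# Hypothetical helper; the real kdump-tools postinst is not shown in this report.
write_config_synced() {
    target=$1
    tmp="$target.new.$$"

    # Illustrative content only; the real config file carries more settings.
    printf 'USE_KDUMP=1\n' > "$tmp" || return 1

    # Flush the new file's data before it replaces the old one.
    sync

    # Rename into place, then flush again so the rename itself reaches disk.
    mv "$tmp" "$target" || return 1
    sync
}
```

Without the flush, the file's data may still sit only in the page cache when the crash is triggered, which matches the empty /etc/default/kdump-tools seen after the manual reboot.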

Changed in makedumpfile (Ubuntu):
status: Triaged → In Progress

------- Comment From <email address hidden> 2017-11-17 01:42 EDT-------
(In reply to comment #15)
> The investigation led me to find out that the config file is written by the
> maintainer script, which does not sync it to disk. So, when the crash is
> triggered right after the installation of kdump-tools, the file will be
> empty and the dump won't happen.

Good that it is not a crazy file system bug.
Thanks!

Hello,

I would like to add that triggering a kernel dump within seconds of installing the package is a corner case that shows up in test setups, not in normal use.

I have seen that too in my test scripts and syncing right after the installation fixed the issue as well.

Bringing in the upstream change is a good solution.

...Louis

tags: added: ppc64el-kdump
tags: added: triage-a
removed: triage-g
Manoj Iyer (manjo) wrote :

Cascardo, did you get a chance to make those changes you mentioned in comment #8? Please update this bug if this fix was already applied to x/a/z releases.

Changed in makedumpfile (Ubuntu Zesty):
status: New → Won't Fix

Should already be fixed in bionic. If we backport the bionic package to xenial, we will get this working as well.

Cascardo.

Changed in makedumpfile (Ubuntu Bionic):
status: In Progress → Fix Released
summary: - Ubuntu 16.04.03: kdump fails with error "kdump-tools[1532]: /etc/init.d
+ Ubuntu 16.04: kdump fails with error "kdump-tools[1532]: /etc/init.d
/kdump-tools: 26: [: -ne: unexpected operator" when / file system is
xfs.
Manoj Iyer (manjo) on 2018-03-05
Changed in makedumpfile (Ubuntu Xenial):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in makedumpfile (Ubuntu Artful):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in makedumpfile (Ubuntu Xenial):
importance: Undecided → High
Changed in makedumpfile (Ubuntu Artful):
importance: Undecided → High
Andrew Cloke (andrew-cloke) wrote :

We believe this issue is resolved in Bionic. Could IBM verify that this bug is actually fixed in bionic?
The backport to Xenial is complex, and so we are investigating other routes.

When testing kdump-tools, keep in mind that a reboot is needed after installing it. Otherwise, there is a chance that the crashkernel memory has not been reserved by the kernel yet. Rebooting also means this bug won't trigger, so it is a workaround for this test as well. That is: reboot after installing kdump-tools, then trigger a crash.

Thank you.
Cascardo.
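
A small pre-flight check along these lines can confirm that the reboot did its job before a test crash is triggered. It is a sketch, not part of kdump-tools: the function name `kdump_ready` and its optional path arguments are hypothetical (they exist so the check can be exercised against sample files), while /proc/cmdline and /sys/kernel/kexec_crash_loaded are the standard kernel interfaces.

```shell
#!/bin/sh
# Hypothetical pre-flight check; arguments default to the real kernel files.
kdump_ready() {
    cmdline_file=${1:-/proc/cmdline}
    loaded_file=${2:-/sys/kernel/kexec_crash_loaded}

    # The running kernel must have been booted with a crashkernel= reservation.
    if ! grep -q 'crashkernel=' "$cmdline_file"; then
        echo "no crashkernel= on the kernel command line; reboot first"
        return 1
    fi

    # The crash kernel must actually be loaded (file contains 1).
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "crash kernel is not loaded"
        return 1
    fi

    echo "kdump looks ready"
}
```

On a real system, calling `kdump_ready` with no arguments after the post-install reboot should print "kdump looks ready". One common way to trigger the test crash afterwards, as root and with sysrq enabled, is `echo c > /proc/sysrq-trigger`.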

tags: removed: kernel-da-key
bugproxy (bugproxy) on 2018-03-22
tags: added: targetmilestone-inin1804
removed: targetmilestone-inin---

------- Comment From <email address hidden> 2018-03-22 11:43 EDT-------
Issue is not observed with 18.04.

# df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs  211G     0  211G   0% /dev
tmpfs          tmpfs      45G   13M   45G   1% /run
/dev/sda2      xfs       932G  5.3G  926G   1% /
tmpfs          tmpfs     222G     0  222G   0% /dev/shm
tmpfs          tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs          tmpfs     222G     0  222G   0% /sys/fs/cgroup
tmpfs          tmpfs      45G     0   45G   0% /run/user/1000

Starting Kernel crash dump capture service...
[ 16.986777] kdump-tools[1459]: Starting kdump-tools: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/201803221033/dump-incomplete
Copying data : [100.0 %] - eta: 0s
[ 27.935305] kdump-tools[1459]: The kernel version is not supported.
[ 27.935383] kdump-tools[1459]: The makedumpfile operation may be incomplete.
[ 27.935456] kdump-tools[1459]: The dumpfile is saved to /var/crash/201803221033/dump-incomplete.
[ 27.935525] kdump-tools[1459]: makedumpfile Completed.
[ 27.962918] kdump-tools[1459]: * kdump-tools: saved vmcore in /var/crash/201803221033
[ 28.466833] kdump-tools[1459]: * running makedumpfile --dump-dmesg /proc/vmcore /var/crash/201803221033/dmesg.201803221033
[ 28.474788] kdump-tools[1459]: The kernel version is not supported.
[ 28.474895] kdump-tools[1459]: The makedumpfile operation may be incomplete.
[ 28.474967] kdump-tools[1459]: The dmesg log is saved to /var/crash/201803221033/dmesg.201803221033.
[ 28.475051] kdump-tools[1459]: makedumpfile Completed.
[ 28.475126] kdump-tools[1459]: * kdump-tools: saved dmesg content in /var/crash/201803221033
[ 28.550660] kdump-tools[1459]: Thu, 22 Mar 2018 10:33:27 -0500
[ 28.600047] kdump-tools[1459]: Rebooting.
[ 29.003219] reboot: Restarting system

/var/crash# ls
201803220915 201803221038 linux-image-4.15.0-12-generic-201803220915.crash
201803221033 kexec_cmd

Thanks,
Pavithra

Manoj Iyer (manjo) on 2018-03-26
tags: added: triage-g
removed: triage-a

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.3-2~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!
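
For testers, enabling -proposed comes down to adding one apt source line for the release's proposed pocket. The sketch below only builds that line; the function name `proposed_sources_line` is hypothetical, and the default mirror URL is an assumption (other architectures may use a different mirror). The wiki page linked above remains the authoritative procedure.

```shell
#!/bin/sh
# Hypothetical helper: prints an apt source line for the <release>-proposed pocket.
proposed_sources_line() {
    release=$1
    # Assumed default mirror; pass a second argument to override it.
    mirror=${2:-http://archive.ubuntu.com/ubuntu/}
    printf 'deb %s %s-proposed main restricted universe multiverse\n' \
        "$mirror" "$release"
}

proposed_sources_line xenial
```

The printed line would go into a file under /etc/apt/sources.list.d/, followed by `apt-get update` and an install that selects the proposed pocket explicitly, for example `apt-get install kdump-tools/xenial-proposed makedumpfile/xenial-proposed`.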

Changed in makedumpfile (Ubuntu Xenial):
status: New → Fix Committed
Changed in makedumpfile (Ubuntu Artful):
status: New → Fix Committed

Also available on artful-proposed.

tags: added: verification-needed-artful verification-needed-xenial
Manoj Iyer (manjo) on 2018-06-11
Changed in ubuntu-power-systems:
status: Triaged → Fix Committed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-06-22 05:24 EDT-------
(In reply to comment #24)

Tried with proposed package, dump capture is successful.

root@ltc-garri3:~# uname -a
Linux ltc-garri3 4.4.0-128-generic #154-Ubuntu SMP Fri May 25 14:13:59 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-garri3:~# df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs  211G     0  211G   0% /dev
tmpfs          tmpfs      45G   20M   45G   1% /run
/dev/sda2      xfs       929G  1.6G  927G   1% /
tmpfs          tmpfs     222G     0  222G   0% /dev/shm
tmpfs          tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs          tmpfs     222G     0  222G   0% /sys/fs/cgroup
tmpfs          tmpfs      45G     0   45G   0% /run/user/1000

root@ltc-garri3:/var/crash# cd 201806220415
root@ltc-garri3:/var/crash/201806220415# ls
dmesg.201806220415 dump.201806220415
root@ltc-garri3:/var/crash/201806220415# tail dmesg.201806220415
[ 207.574328] Instruction dump:
[ 207.574359] 3842e720 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d220019 39490664
[ 207.574612] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6
[ 207.574870] ---[ end trace 11a9e3d935ec528f ]---
[ 207.734198]
[ 207.734265] Sending IPI to other CPUs
[ 207.735462] IPI complete
[ 207.736681] kexec: waiting for cpu 8 (physical 32) to enter OPAL
[ 207.737915] kexec: waiting for cpu 9 (physical 33) to enter OPAL
[ 207.739214] kexec: waiting for cpu 43 (physical 99) to enter OPAL

Thanks.
Pavithra

tags: added: verification-done-xenial
removed: verification-needed-xenial
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-06-24 10:16 EDT-------
Moving this to FIX-AVAILABLE state.

Changed in makedumpfile (Ubuntu Artful):
status: Fix Committed → Invalid
Changed in makedumpfile (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released