ubuntu_blktrace_smoke_test failed on Bionic P9

Bug #1827318 reported by Po-Hsu Lin
This bug affects 1 person
Affects                           Status        Importance  Assigned to            Milestone
The Ubuntu-power-systems project  Fix Released  Medium      Canonical Kernel Team
ubuntu-kernel-tests               Fix Released  Medium      Colin Ian King
blktrace (Ubuntu)                 Fix Released  Medium      Unassigned
  Xenial                          Fix Released  Undecided   Unassigned
  Bionic                          Fix Released  Undecided   Unassigned

Bug Description

== SRU justification [Xenial, Bionic] ==

Running blktrace on systems where CPUs are offlined such that:

1. sysconf(_SC_NPROCESSORS_ONLN) < sysconf(_SC_NPROCESSORS_CONF) and/or
2. CPU 0..sysconf(_SC_NPROCESSORS_ONLN) - 1 is offline

..causes blktrace to fail with an error such as:

FAILED to start thread on CPU 156: 22/Invalid argument
FAILED to start thread on CPU 157: 22/Invalid argument
FAILED to start thread on CPU 158: 22/Invalid argument
..
etc
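
A minimal sketch of how one might check for condition 1 from a shell, assuming a glibc `getconf` that accepts the `_NPROCESSORS_ONLN` and `_NPROCESSORS_CONF` names (they mirror the sysconf values above). The helper name `has_offline_cpus` is hypothetical and not part of blktrace or the test suite.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when fewer CPUs are online
# than configured. $1 = online count, $2 = configured count.
has_offline_cpus() {
    _on=${1:-0}
    _cf=${2:-0}
    [ "$_on" -lt "$_cf" ]
}

# Query the counts; empty output (getconf missing) is treated as 0.
online=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
conf=$(getconf _NPROCESSORS_CONF 2>/dev/null)

if has_offline_cpus "$online" "$conf"; then
    echo "affected: ${online:-0} of ${conf:-0} CPUs online"
else
    echo "all configured CPUs are online"
fi
```

This only compares the two counts; it does not say which CPU numbers are offline.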

== Test case ==

Run blktrace on a system that has sysconf(_SC_NPROCESSORS_ONLN) < sysconf(_SC_NPROCESSORS_CONF). It will break with the invalid argument error for sysconf(_SC_NPROCESSORS_ONLN) <= CPU < sysconf(_SC_NPROCESSORS_CONF).

With the fix, blktrace on these systems will work fine and the ubuntu blktrace regression test will also pass.
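
On a machine where all CPUs are online, the failing condition can be created by taking a CPU offline through sysfs. The following is a rough sketch, assuming the standard `/sys/devices/system/cpu/cpuN/online` control file; the helper `set_cpu_state` and the `CPU_SYSFS_ROOT` variable are hypothetical names introduced for illustration. Some systems do not allow CPU 0 to be offlined, so a higher-numbered CPU is used here.

```shell
#!/bin/sh
# Root of the CPU sysfs tree; overridable so the helper can be tried
# against a scratch directory instead of the real system.
CPU_SYSFS_ROOT=${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}

# Hypothetical helper: write 0 (offline) or 1 (online) to a CPU's
# "online" control file. $1 = CPU number, $2 = desired state.
set_cpu_state() {
    _cpu=$1
    _state=$2
    _file="$CPU_SYSFS_ROOT/cpu$_cpu/online"
    if [ ! -e "$_file" ]; then
        echo "cpu$_cpu has no online control file" >&2
        return 1
    fi
    echo "$_state" > "$_file"
}

# Example session (needs root on a real system):
#   set_cpu_state 3 0        # take CPU 3 offline
#   blktrace -d /dev/loop0   # unfixed versions are expected to fail here
#   set_cpu_state 3 1        # bring CPU 3 back online
```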

== The Fix(es) ==

commit 80c4041b2e7a7d5afb75df563bf51bb27773c095
Author: Abutalib Aghayev <email address hidden>
Date: Tue Feb 9 08:17:50 2016 -0700

    blktrace: Use number of online CPUs

The above fix addresses the case where the number of online CPUs is less than the number of CPUs configured.

commit d045a704a378b9041ebe3d60c497a5656a79d439
Author: Jan Kara <email address hidden>
Date: Thu Jan 26 11:23:53 2017 +0100

    blktrace: Add support for sparse CPU numbers

The above commit fixes the case where one or more of CPUs 0..sysconf(_SC_NPROCESSORS_ONLN) - 1 are offline, causing sparse CPU numbering.

== Regression Potential ==

Small. The fixes have been in blktrace since 2017 and have needed no subsequent fixes, so they look stable. They are designed to fix the specific issue seen in this bug. The fixes adjust the number of CPUs accounted for by blktrace; at worst they could break blktrace.

----------------

This issue can be seen on Bionic with 4.15 / 4.18 / 5.0.
Investigation shows this was caused by a CPU offline issue on the node "baltar" (bug 1827335), and it can be reproduced on another P9 node, "bobone".

Combined with an issue in blktrace itself (upstream bug for blktrace attempting to operate on offline CPUs: https://bugzilla.redhat.com/show_bug.cgi?id=1321875), this caused the test to fail.

$ sudo ./ubuntu_blktrace_smoke_test.sh
PASSED (CONFIG_BLK_DEV_IO_TRACE=y in /boot/config-4.15.0-48-generic)

Using block device /dev/loop0 for path /home/ubuntu/autotest-client-tests/ubuntu_blktrace_smoke_test/mnt

Test regime:
  dd performing 65536 1K block writes
  looking for at least 1024 blktrace events

Thu May 2 03:45:41 UTC 2019: blktrace starting
Thu May 2 03:45:41 UTC 2019: dd starting
FAILED to start thread on CPU 156: 22/Invalid argument
FAILED to start thread on CPU 157: 22/Invalid argument
FAILED to start thread on CPU 158: 22/Invalid argument
FAILED to start thread on CPU 159: 22/Invalid argument
Thu May 2 03:45:45 UTC 2019: dd stopped
Thu May 2 03:45:45 UTC 2019: waiting for 10 seconds
Thu May 2 03:45:55 UTC 2019: blktrace being terminated
Thu May 2 03:45:55 UTC 2019: blktrace terminated
Thu May 2 03:45:56 UTC 2019: blktrace data parsed

./ubuntu_blktrace_smoke_test.sh: line 237: [: -eq: unary operator expected
./ubuntu_blktrace_smoke_test.sh: line 240: [: -eq: unary operator expected
FAILED (expecting at least 1024 block traces events from the dd process, got 0)
FAILED (expecting at least 1024 block read traces events, got 0)
FAILED (expecting at least 1024 block write traces events, got 0)

Summary: 1 passed, 3 failed
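
The "unary operator expected" messages in the log above are what bash typically prints when an unquoted, empty variable is used in a numeric test, which would fit blktrace having produced no data to count. The script's actual lines 237 and 240 are not shown in this report, so the following is only an illustrative sketch; the variable `events` and the helper `count_or_zero` are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for a parsed event count; the empty value
# models a parser that produced no matching lines.
events=""

# Fragile form: with $events empty and unquoted, the test collapses
# to `[ -eq 0 ]`, which bash rejects as "unary operator expected".
#   [ $events -eq 0 ]

# More robust: fall back to 0 when the value is empty, and quote it.
count_or_zero() {
    echo "${1:-0}"
}

if [ "$(count_or_zero "$events")" -eq 0 ]; then
    echo "no events captured"
fi
```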

Po-Hsu Lin (cypressyew)
description: updated
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

It seems that the blktrace command is not working at all.

# blktrace -d $DEV
FAILED to start thread on CPU 156: 22/Invalid argument
FAILED to start thread on CPU 157: 22/Invalid argument
FAILED to start thread on CPU 158: 22/Invalid argument
FAILED to start thread on CPU 159: 22/Invalid argument
# echo $DEV
/dev/loop0

So the output is empty.

tags: added: bionic
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

This Invalid argument is caused by offline CPUs:
# cat /sys/devices/system/cpu/offline
156-159
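
The offline file uses the kernel's CPU list notation (comma-separated numbers and ranges). A small sketch for expanding such a list into individual CPU numbers, which can help when matching it against the "FAILED to start thread on CPU N" lines; `expand_cpu_list` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: expand a CPU list such as "0-2,5" into
# space-separated CPU numbers ("0 1 2 5").
expand_cpu_list() {
    _list=$1
    _out=""
    _old_ifs=$IFS
    IFS=,
    for _part in $_list; do
        case $_part in
            *-*) _first=${_part%-*}; _last=${_part#*-} ;;  # a range "a-b"
            *)   _first=$_part; _last=$_part ;;            # a single CPU
        esac
        _i=$_first
        while [ "$_i" -le "$_last" ]; do
            _out="$_out${_out:+ }$_i"
            _i=$((_i + 1))
        done
    done
    IFS=$_old_ifs
    echo "$_out"
}

expand_cpu_list "156-159"   # prints: 156 157 158 159
```

On a live system the argument could come from `$(cat /sys/devices/system/cpu/offline)`.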

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Upstream bug for blktrace attempting to operate on offline CPUs:
https://bugzilla.redhat.com/show_bug.cgi?id=1321875

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Bug for P9 CPUs in offline state (this corrects the wrong link in comment #4, which has been hidden to prevent confusion):
https://bugs.launchpad.net/bugs/1827335

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Tried this on two other Bionic P9 nodes:

1. Node: "bobone"
This node has the same CPU offline issue (bug 1827335)
And the test will fail with the same issue.

2. Node "dradis"
This node does not have the CPU offline issue, all its 160 CPUs are in online state, and it can pass the test without issue.

However if you turn some CPU off, the test will fail with the same issue.

Also,
this applies to Bionic AMD64 nodes as well: if you turn off some CPUs, this test will fail. The issue is gone in Cosmic; it looks like the blktrace tool was fixed there.

description: updated
Manoj Iyer (manjo)
Changed in blktrace (Ubuntu):
milestone: none → bionic-updates
Changed in ubuntu-power-systems:
importance: Undecided → Medium
Changed in ubuntu-kernel-tests:
importance: Undecided → Medium
Changed in blktrace (Ubuntu):
importance: Undecided → Medium
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Triaged
tags: added: universe
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Manoj Iyer (manjo)
Changed in blktrace (Ubuntu):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Changed in ubuntu-power-systems:
assignee: Canonical Kernel Team (canonical-kernel-team) → Canonical Foundations Team (canonical-foundations)
Changed in ubuntu-power-systems:
assignee: Canonical Foundations Team (canonical-foundations) → Canonical Kernel Team (canonical-kernel-team)
Changed in blktrace (Ubuntu):
assignee: Canonical Foundations Team (canonical-foundations) → nobody
Revision history for this message
Steve Langasek (vorlon) wrote :

This bug has been marked 'medium' for the blktrace package and is in universe, which means it would not normally warrant an SRU. If the Kernel Team needs Foundations support to resolve this for bionic in order to fix the test suite, the Kernel Team should ask Foundations directly.

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Re-assigning to kernel-team, as blktrace is a Universe package.

Po-Hsu Lin, does the kernel-test suite rely on blktrace for its correct operation, or is this an issue with the kernel-test suite?

Foundations have asked that you reach out directly to them if you require blktrace fixed.

Changed in ubuntu-kernel-tests:
assignee: nobody → Colin Ian King (colin-king)
status: New → In Progress
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
Revision history for this message
Colin Ian King (colin-king) wrote :

Two fixes required:

commit 80c4041b2e7a7d5afb75df563bf51bb27773c095
Author: Abutalib Aghayev <email address hidden>
Date: Tue Feb 9 08:17:50 2016 -0700

    blktrace: Use number of online CPUs

(partial solution)

And also a more complete solution:

commit d045a704a378b9041ebe3d60c497a5656a79d439
Author: Jan Kara <email address hidden>
Date: Thu Jan 26 11:23:53 2017 +0100

    blktrace: Add support for sparse CPU numbers

description: updated
Revision history for this message
Colin Ian King (colin-king) wrote :

I've uploaded fixed packages in https://launchpad.net/~colin-king/+archive/ubuntu/ppa - tested on a power box, works fine.

Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Status update: blktrace package with fix is in SRU queue.

Revision history for this message
Robie Basak (racb) wrote :

> The issue is gone in Cosmic; it looks like the blktrace tool was fixed there.

Setting Fix Released for the development task accordingly.

Changed in blktrace (Ubuntu):
status: New → Fix Released
Revision history for this message
Robie Basak (racb) wrote : Please test proposed package

Hello Po-Hsu, or anyone else affected,

Accepted blktrace into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/blktrace/1.1.0-2+deb9u1ubuntu0.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in blktrace (Ubuntu Xenial):
status: New → Fix Committed
tags: added: verification-needed verification-needed-xenial
Changed in blktrace (Ubuntu Bionic):
status: New → Fix Committed
tags: added: verification-needed-bionic
Revision history for this message
Robie Basak (racb) wrote :

Hello Po-Hsu, or anyone else affected,

Accepted blktrace into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/blktrace/1.1.0-2+deb9u1ubuntu0.18.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

Tested on Bionic P9, test passed without any issue:
 PASSED (CONFIG_BLK_DEV_IO_TRACE=y in /boot/config-4.15.0-55-generic)

 Using block device /dev/loop0 for path /home/ubuntu/autotest/client/results/default/ubuntu_blktrace_smoke_test.blktrace-smoke-test/mnt

 Test regime:
   dd performing 65536 1K block writes
   looking for at least 1024 blktrace events

 Wed Jul 24 06:25:27 UTC 2019: blktrace starting
 Wed Jul 24 06:25:27 UTC 2019: dd starting
 Wed Jul 24 06:25:32 UTC 2019: dd stopped
 Wed Jul 24 06:25:32 UTC 2019: waiting for 10 seconds
 Wed Jul 24 06:25:42 UTC 2019: blktrace being terminated
 Wed Jul 24 06:25:42 UTC 2019: blktrace terminated
 Wed Jul 24 06:25:48 UTC 2019: blktrace data parsed

 PASSED (got 774204 block trace events)
 PASSED (got 65536 block read trace events)
 PASSED (got 63546 block write trace events)

 Summary: 4 passed, 0 failed

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Po-Hsu Lin (cypressyew) wrote :

For Xenial, I can't deploy a P9 node with it.
There is an 18.04 minimum requirement limit on power8-maas.

Revision history for this message
Frank Heimes (fheimes) wrote :

Many thanks for taking the time and doing the verification on bionic.
Indeed 16.04 does not support P9.
However, in an attempt to satisfy the verification request for xenial, I just did a quick and basic test on a P8 system with xenial (running blktrace and blkparse) and didn't face any regressions.
I hope this satisfies the needed verification for xenial.

Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Updated tags to reflect "verification-done-xenial" based on Frank's posting #17.

tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial
Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
status: In Progress → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for blktrace has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package blktrace - 1.1.0-2+deb9u1ubuntu0.16.04.1

---------------
blktrace (1.1.0-2+deb9u1ubuntu0.16.04.1) xenial; urgency=medium

  * Fix failure when CPUs are offline (LP: #1827318)
    If one or more CPUs are offline then currently blktrace breaks
    because it detects the number of CPUs based on the number of
    CPUs rather than the number of online CPUs. Requires two upstream
    blktrace fixes to fully address this issue
    - 80c4041b2e7a7d5 ("blktrace: Use number of online CPUs")
    - d045a704a378b90 ("blktrace: Add support for sparse CPU numbers")

 -- Colin Ian King <email address hidden> Wed, 22 May 2019 13:22:31 +0100

Changed in blktrace (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package blktrace - 1.1.0-2+deb9u1ubuntu0.18.04.1

---------------
blktrace (1.1.0-2+deb9u1ubuntu0.18.04.1) bionic; urgency=medium

  * Fix failure when CPUs are offline (LP: #1827318)
    If one or more CPUs are offline then currently blktrace breaks
    because it detects the number of CPUs based on the number of
    CPUs rather than the number of online CPUs. Requires two upstream
    blktrace fixes to fully address this issue
    - 80c4041b2e7a7d5 ("blktrace: Use number of online CPUs")
    - d045a704a378b90 ("blktrace: Add support for sparse CPU numbers")

 -- Colin Ian King <email address hidden> Wed, 22 May 2019 13:22:31 +0100

Changed in blktrace (Ubuntu Bionic):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released