Tight timeout for bcache removal causes spurious failures
- Cosmic (18.10)
- Bug #1796292
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
curtin | Fix Released | High | Andrea Righi |
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Bionic | Fix Released | Undecided | Unassigned |
Cosmic | Confirmed | Undecided | Unassigned |
Disco | Confirmed | Undecided | Unassigned |
Eoan | Fix Released | Undecided | Unassigned |
Bug Description
I've had a number of deployment faults where curtin would report Timeout exceeded for removal of /sys/fs/bcache/xxx when doing a mass-deployment of 30+ nodes. Upon retrying the node would usually deploy fine. Experimentally I've set the timeout ridiculously high, and it seems I'm getting no faults with this. I'm wondering if the timeout for removal is set too tight, or might need to be made configurable.
```diff
--- curtin/util.py~ 2018-05-18 18:40:48.000000000 +0000
+++ curtin/util.py 2018-10-05 09:40:06.807390367 +0000
@@ -263,7 +263,7 @@
     return _subp(*args, **kwargs)

-def wait_for_
+def wait_for_
     if not path:
         raise ValueError(
```
Related branches
- Ryan Harper (community): Approve
- Server Team CI bot: Approve (continuous-integration)
- Dan Watkins (community): Approve
Diff: 1778 lines (+1112/-438), 9 files modified:
curtin/block/bcache.py (+204/-25)
curtin/block/clear_holders.py (+42/-109)
examples/tests/bcache-ceph-nvme.yaml (+227/-0)
examples/tests/dirty_disks_config.yaml (+18/-1)
tests/data/bcache-super-show-backing (+14/-0)
tests/data/bcache-super-show-caching (+18/-0)
tests/unittests/test_block_bcache.py (+448/-0)
tests/unittests/test_clear_holders.py (+48/-303)
tests/vmtests/test_bcache_ceph.py (+93/-0)
Scott Moser (smoser) wrote : | #1 |
Ryan Harper (raharper) wrote : | #2 |
Yeah, I think it makes sense for shutdown_bcache to adjust. Currently it does this:

```python
# bcache device removal should be fast but in an extreme
# case, might require the cache device to flush large
# amounts of data to a backing device. The strategy here
# is to wait for approximately 30 seconds but to check
# frequently since curtin cannot proceed until devices
# cleared.
removal_retries = [0.2] * 150  # 30 seconds total
```

1200 seems extreme to me. Did you try anything lower than that? I'd be more inclined to try 5 minutes:

```python
remove_retries = [0.2] * 1500  # 300 seconds/5 min total
```

Or we could scale it up like:

```python
remove_retries = ([0.2] * 150 + [20] * 15)  # 30 seconds + 5 minutes
```
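A retry schedule like the ones above is just a list of sleep intervals driving a poll loop, so the total wait is simply the sum of the list. A minimal sketch of such a loop (illustrative only; not curtin's actual `wait_for_removal`):

```python
import os
import time


def wait_for_removal(path, retries):
    """Poll until `path` disappears, sleeping per the retry schedule.

    `retries` is a list of sleep intervals in seconds; a scaled
    schedule front-loads the frequent checks and backs off for the
    long tail, so a fast removal is still noticed quickly.
    """
    for naplen in retries:
        if not os.path.exists(path):
            return
        time.sleep(naplen)
    if os.path.exists(path):
        raise OSError('Timeout exceeded for removal of %s' % path)
```

With the scaled schedule, `[0.2] * 150 + [20] * 15` waits up to 30 + 300 = 330 seconds in total, while still detecting a fast removal within 0.2 seconds.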
Vladimir Grevtsev (vlgrevtsev) wrote : | #3 |
- ln-sv-ostk02-failed-deployment-bcache-timeout.log.gz (1.1 MiB, application/octet-stream)
It looks like we have this reproduced; see the attached log.
tags: | added: cpe-onsite |
Ryan Harper (raharper) wrote : Re: [Bug 1796292] Re: Tight timeout for bcache removal causes spurious failures | #4 |
Thanks for the log. If it's reproducible, can you re-run the deployment with curtin verbosity set?
And can you attach the configuration that you're deploying? I'm especially
interested in the number of bcache backing devices and how many cache
devices.
% maas {profile} maas set-config name=curtin_verbose value=true
https:/
Changed in curtin: | |
importance: | Undecided → High |
status: | New → Incomplete |
Vladimir Grevtsev (vlgrevtsev) wrote : | #5 |
Hi Ryan,
curtin config: https:/
Curtin logs are empty:
ubuntu@
Success.
Machine-readable output follows:
[]
However, they are available in MAAS UI, and I've exported them: https:/
This part looks like the problematic one:

```
Error rescanning devices, possibly known issue LP: #1489521
cmd: ['blockdev', '--rereadpt', '/dev/sda', '/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf', '/dev/sdg', '/dev/sdh']
stdout: ''
stderr: blockdev: ioctl error on BLKRRPART: Device or resource busy
```
Thoughts?
Changed in curtin: | |
status: | Incomplete → New |
Ryan Harper (raharper) wrote : | #6 |
Thank you for the config and an install log.
You don't happen to have a verbose curtin install log where it times out (the failure path)?
The error on rescanning partitions is a side effect of one or more bcache devices not getting shut down properly: a lingering bcache device keeps a hold on the underlying disk, which prevents exclusive access and produces the error. However, the failure occurs before this point.
Getting the verbose logging from curtin (like you did for this successful install) will help me understand where the bug occurred.
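The "Device or resource busy" symptom can be confirmed from sysfs: every block device lists the devices stacked on top of it under `holders/`, and a leftover entry there (e.g. a bcache device that never shut down) is exactly what makes BLKRRPART fail. A small sketch of such a check (for illustration; curtin's own helpers may differ):

```python
import os


def holders(devname, sysfs_root='/sys/class/block'):
    """List the devices currently holding `devname`.

    The kernel exposes stacking relationships under
    /sys/class/block/<dev>/holders; a non-empty result explains why
    the kernel refuses to re-read the partition table of a disk that
    another device is stacked on.
    """
    holders_dir = os.path.join(sysfs_root, devname, 'holders')
    if not os.path.isdir(holders_dir):
        return []
    return sorted(os.listdir(holders_dir))
```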
Changed in curtin: | |
status: | New → Incomplete |
John George (jog) wrote : | #7 |
@Ryan we're hitting this issue during solutions QA test runs. The latest recreate is here:
https:/
There is a link at the bottom of that page to all the test artefacts, which includes the maas logs at this link:
https:/
Ryan Harper (raharper) wrote : | #8 |
Can you narrow down which systems I should be looking at?
Jason Hobbs (jason-hobbs) wrote : | #9 |
Changed in curtin: | |
status: | Incomplete → New |
Jason Hobbs (jason-hobbs) wrote : | #10 |
sub'd to field high as we've hit this 4 times in CI, and it seems related to bug 1808231 and bug 1811117, and we don't have a workaround.
Ryan Harper (raharper) wrote : | #11 |
Do you turn on curtin_verbose during the QA runs? If not, can we ensure we get a reproduce with the curtin_verbose enabled?
Jason Hobbs (jason-hobbs) wrote : | #12 |
Yes, we do turn on curtin_verbose
Ryan Harper (raharper) wrote : | #13 |
@jhobbs
I see that now; I missed the curtin section since the log had more than just the single curtin install boot.
I think I see something; but since I've never been able to reproduce this in our integration harness it's been hard to diagnose.
The shutdown plan for cleaning devices shows that it has detected bcache0p1 as a 'bcache' device when it's really a 'partition', which should only trigger a wipe of the partition; the actual bcache device is the parent (bcache0).
Shutdown Plan:

```
{'level': 3, 'dev_type': 'bcache', 'device': '/sys/class/
{'level': 2, 'dev_type': 'bcache', 'device': '/sys/class/
{'level': 2, 'dev_type': 'bcache', 'device': '/sys/class/
{'level': 1, 'dev_type': 'partition', 'device': '/sys/class/
{'level': 1, 'dev_type': 'partition', 'device': '/sys/class/
{'level': 1, 'dev_type': 'partition', 'device': '/sys/class/
{'level': 1, 'dev_type': 'partition', 'device': '/sys/class/
{'level': 1, 'dev_type': 'partition', 'device': '/sys/class/
{'level': 0, 'dev_type': 'disk', 'device': '/sys/class/
{'level': 0, 'dev_type': 'disk', 'device': '/sys/class/
{'level': 0, 'dev_type': 'disk', 'device': '/sys/class/
```
Then this leads us to attempt to write to bcache0/

```
Error writing to bcache stop file /sys/class/
waiting for /sys/class/
sleeping 0.2
```

We repeat and fail each time.
With that in mind, I've updated the fix for bcache-partitions with a change that prevents partitions on top of bcache devices from being classified as 'bcache' devices. This will fix the issue that you produced, but it's not the original issue (which didn't involve bcache devices with partitions).
So, with the updated curtin (18.2-0ubuntu10
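The misclassification is visible in sysfs: a partition directory carries a `partition` attribute file even when its parent is a bcache device, so checking for that attribute before looking at the device name keeps bcache0p1 out of the 'bcache' bucket. An illustrative sketch (not the actual curtin patch):

```python
import os


def classify(sysfs_path):
    """Classify a /sys/class/block entry for a shutdown plan.

    The 'partition' attribute file must be checked before the device
    name: bcache0p1 starts with 'bcache' but is a partition, and
    treating it as a bcache device leads to writes against a
    bcache/stop file it does not have.
    """
    name = os.path.basename(sysfs_path.rstrip('/'))
    if os.path.exists(os.path.join(sysfs_path, 'partition')):
        return 'partition'
    if name.startswith('bcache'):
        return 'bcache'
    return 'disk'
```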
Changed in curtin: | |
status: | New → Incomplete |
Jason Hobbs (jason-hobbs) wrote : | #14 |
We've had a number of successful test runs with this now, and no more failures, so I think it's fixed.
Changed in curtin: | |
status: | Incomplete → New |
Jason Hobbs (jason-hobbs) wrote : | #15 |
We hit another failure that could possibly be this. http://
Ryan Harper (raharper) wrote : | #16 |
Hi,
Can you provide the curtin-logs.tar from one of these failures?
maas {profile} node-results read system_
name=/tmp/
https:/
And if possible, can you test with curtin trunk (daily ppa)?
https:/
Changed in curtin: | |
status: | New → Incomplete |
Jason Hobbs (jason-hobbs) wrote : | #17 |
- curtin-logs.tar (140.0 KiB, application/x-tar)
I've attached curtin-logs from a failure. This was with curtin from proposed - 18.2-10-
Changed in curtin: | |
status: | Incomplete → New |
Ryan Harper (raharper) wrote : | #18 |
Thanks.
Do we have logs of the ceph charm which created those ceph logical volumes?
I'd like to recreate those LVs on top of the instance so I can have an instance with the same setup.
Changed in curtin: | |
status: | New → Incomplete |
Jason Hobbs (jason-hobbs) wrote : | #19 |
There are logs of that creation here:
The logs for unit ceph-osd/0 are in ceph-osd_
ceph-osd_ has stuff about LVM creation.
Changed in curtin: | |
status: | Incomplete → New |
Ryan Harper (raharper) wrote : | #20 |
I've seen this failure in another scenario, tracked in this private bug (LP: #1815018); so I'm copying in the portion that's relevant here.
--
The failure you saw in the 1-in-30 case I now believe is related to the time it takes to flush the cache device. Curtin currently finds a bcache's cacheset device and stops that first. In your ceph deployment scenario, each cache device has 3 backing devices being cached, which may contain a large amount of dirty data that needs to be flushed; this is where the longer timeout initially mentioned in LP: #1796292 helped.
Curtin will now do the following to stop bcache devices:
1. wipe the bcache device contents
2. extract the cacheset uuid
3. extract the backing device
4. detach the bcache from the cacheset
5. stop the bcacheN device
6. wait for the sysfs paths bcacheN, bcacheN/bcache and backing/bcache to go away
7. check how many other backing devices are attached to the cset_uuid; if zero, stop the cset

Notably, at step 4 we monitor the bcache device's state and wait until the cacheset is no longer attached. And at step 7, we only remove a cache device once all of the devices it was caching have been stopped.
[ 83.467348] cloud-init[1081]: shutdown running on holder type: 'bcache' syspath: '/sys/class/
[ 83.470549] cloud-init[1081]: Wiping superblock on bcache device: /sys/class/
[ 83.472919] cloud-init[1081]: wiping superblock on /dev/bcache0
[ 83.474801] cloud-init[1081]: wiping /dev/bcache0 attempt 1/4
[ 83.477077] cloud-init[1081]: wiping 1M on /dev/bcache0 at offsets [0, -1048576]
[ 83.479757] cloud-init[1081]: successfully wiped device /dev/bcache0 on attempt 1/4
[ 83.481915] cloud-init[1081]: os.path.exists on blockdevs:
[ 83.484922] cloud-init[1081]: [('/sys/
[ 83.489019] cloud-init[1081]: bcache: detaching /sys/class/
[ 83.492648] cloud-init[1081]: /sys/class/
[ 83.496051] cloud-init[1081]: /sys/class/
[ 85.372738] cloud-init[1081]: /sys/class/
[ 85.373121] cloud-init[1081]: /sys/class/
[ 85.377747] cloud-init[1081]: stopping bcache backing device at: /sys/class/
[ 85.379303] cloud-init[1081]: waiting for /sys/class/
[ 85.380359] cloud-init[1081]: sleeping 0.2
[ 85.569818] cloud-init[1081]: /sys/class/
[ 85.571160] cloud-init[1081]: waiting for /sys/class/
[ 85.571727] cloud-init[1081]: /sys/class/
[ 85.576889] cloud-init[1081]: waiting for /sys/class/
[ 85.580122] cloud-init[1081]: /sys/class/
[ 85.584239] cloud-init[1081]: Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
[ ...
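The stop sequence above maps onto bcache's standard sysfs control files (`detach` and `stop` under the device, `stop` under the cacheset). A simplified sketch of steps 4-7, assuming those knobs and omitting the wait/retry loop (this is not curtin's actual implementation):

```python
import os


def stop_bcache(bcache_sysfs, cset_sysfs):
    """Detach and stop one bcacheN device, then its cacheset if idle.

    `bcache_sysfs` is e.g. /sys/class/block/bcache0/bcache and
    `cset_sysfs` is e.g. /sys/fs/bcache/<cset-uuid>.
    """
    def poke(path):
        # writing '1' to these sysfs attribute files triggers the action
        with open(path, 'w') as fp:
            fp.write('1')

    # step 4: detach from the cacheset (may flush dirty data first)
    poke(os.path.join(bcache_sysfs, 'detach'))
    # step 5: stop the bcacheN device
    poke(os.path.join(bcache_sysfs, 'stop'))
    # step 6 (omitted here): wait for the sysfs paths to disappear
    # step 7: stop the cacheset only when no backing devices remain
    still_attached = [d for d in os.listdir(cset_sysfs)
                      if d.startswith('bdev')]
    if not still_attached:
        poke(os.path.join(cset_sysfs, 'stop'))
```

The `bdev*` entries under the cacheset directory are the links the kernel keeps for each attached backing device, which is what the "how many other backing devices" check in step 7 relies on.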
Changed in curtin: | |
status: | New → In Progress |
Server Team CI bot (server-team-bot) wrote : | #21 |
This bug is fixed with commit 36351dea to curtin on branch master.
To view that commit see the following URL:
https:/
Changed in curtin: | |
status: | In Progress → Fix Committed |
Ryan Harper (raharper) wrote : | #22 |
Jason,
Would you be able to give the current curtin-daily ppa build a test? I believe we've got this issue fixed but would like some feedback/results on this fix before we start an SRU.
ppa:curtin-
Ryan Harper (raharper) wrote : | #23 |
I've copied what's in disco into this ppa; so it won't change on you while testing.
Jason Hobbs (jason-hobbs) wrote : | #24 |
- logs-2019-04-25-12.04.12.tar (163.9 MiB, application/x-tar)
we hit this issue with the version from the curtin-dev ppa - so it is apparently not yet resolved:
18.2-19-
Changed in curtin: | |
status: | Fix Committed → New |
Jason Hobbs (jason-hobbs) wrote : | #25 |
In comment 24, the failure was on spearow; unpack the logs for 10.244.40.32 and it's at:
./10.244.
Jason Hobbs (jason-hobbs) wrote : | #26 |
err, ./10.244.
Ryan Harper (raharper) wrote : | #27 |
After looking at the logs, I believe curtin is doing all that it can to shut down the bcache device.
Waiting 1200 seconds is more than reasonable, so I believe something else is going on with the bcache device in the kernel at this point. I would like to either open a task against the kernel or start a new bug with these details.
Some conversation from IRC for context.
<rharper> that's a lot of seconds to wait
<rharper> I wonder if there's a kernel bug in there; is it reasonable to wait 2400 seconds? 3600 seconds?
<jhobbs> no
<rharper> jhobbs: do you know if we capture dmesg from the host?
<jhobbs> that's all ridiculous
<jhobbs> for nvme...
<rharper> right; so it smells like a bug, or deadlock in bcache itself; I found *many* of those while working on a more reliable way to stop them
<rharper> jhobbs: so I'm generally happy with the curtin code; I think we're doing a reliable job of shutting them down and waiting a more than reasonable amount of time for the device to stop at this point. We may need to open up a different/new issue against the kernel to see if we can get to the bottom of why it isn't shutting down;
<jhobbs> rharper: that log is from the node logging syslog to maas, we don't have any hooks in there to get dmesg during install
<rharper> jhobbs: ok
James Troup (elmo) wrote : | #28 |
https:/
May or may not be relevant.
Joshua Powers (powersj) wrote : | #29 |
Adding affects linux package
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs. | #30 |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1796292
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
Jason Hobbs (jason-hobbs) wrote : | #31 |
This occurs on a target machine during MAAS install. Apport is not collected in this case.
Changed in linux (Ubuntu): | |
status: | Incomplete → Confirmed |
Trent Lloyd (lathiat) wrote : | #32 |
I have been running into this (curtin 18.1-17-
I think this commit basically agrees with my thoughts, but I just wanted to share them explicitly in case they are interesting.
(1) If you *unregister* the cache device from the backing device, it first has to purge all the dirty data back to the backing device. This may obviously take a while.
(2) When doing that, I managed to deadlock bcache at least once on xenial-hwe 4.15, where it was trying to reclaim memory from XFS, which I assume was trying to write to the bcache. Traceback: https:/
(3) However, generally I had good luck simply "stop"-ing the cache devices (it seems perhaps that is what this bug is designed to do: switch to stop, instead of unregister?). Specifically, though, I was stopping the backing devices, and then later the cache device. It seems like the current commit is the other way around?
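The flush cost in point (1) is observable before detaching: bcache exports its writeback state, including the amount of dirty data, under the device's sysfs directory. A hypothetical pre-check (the `dirty_data` attribute is standard bcache sysfs, but the human-readable parsing here is an assumption about its format):

```python
import os


def dirty_bytes(devname, sysfs_root='/sys/block'):
    """Best-effort read of a bcache device's dirty writeback data.

    bcache prints dirty_data with a unit suffix (e.g. '1.2G'), so the
    suffix is parsed here; a large value warns that a detach will need
    a long flush before the device can be unregistered.
    """
    path = os.path.join(sysfs_root, devname, 'bcache', 'dirty_data')
    if not os.path.exists(path):
        return 0
    with open(path) as fp:
        raw = fp.read().strip()
    units = {'k': 2 ** 10, 'M': 2 ** 20, 'G': 2 ** 30, 'T': 2 ** 40}
    if raw and raw[-1] in units:
        return int(float(raw[:-1]) * units[raw[-1]])
    return int(float(raw))
```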
tags: | added: sts |
Ryan Harper (raharper) wrote : Re: [Bug 1796292] Re: Tight timeout for bcache removal causes spurious failures | #33 |
Thanks for capturing those; I've quite a few of my own, as the unregister path _should_ work but doesn't, due to various bugs in bcache. I need to attach those oopses to this bug as well.
Unregister is just not stable, so stopping is what is being done now. I did attempt stopping the bcache devices first and, only once all bcache devices were stopped, stopping and removing the cacheset; this proved unreliable under our integration testing of various bcache scenarios.
Ryan Harper (raharper) wrote : | #34 |
Xenial GA kernel bcache unregister oops:
Andrey Grebennikov (agrebennikov) wrote : | #35 |
@jhobbs
Here is the script that cleans up bcache devices on recommission:
https:/
tags: | added: cdo-qa foundations-engine |
Pedro Guimarães (pguimaraes) wrote : | #36 |
I was looking into kernel commits and I came across this: https:/
So, as far as I understand, it deals with the issue of a manual device detach during a writeback clean-up causing a deadlock. The timeline makes sense when we look at the Bionic GA kernel as well: Bionic GA should not include this fix, but HWE should.
Could we run the tests again, but focusing on Bionic HWE? AFAIR, HWE runs 4.18, which should include this commit.
Pedro Guimarães (pguimaraes) wrote : | #37 |
This script *should* trigger the issue on Bionic GA: https:/
Try it with both GA and HWE Bionic; with the commit in HWE, it should not trigger.
Dan Watkins (oddbloke) wrote : Fixed in curtin version 19.1. | #38 |
This bug is believed to be fixed in curtin version 19.1. If this is still a problem for you, please make a comment and set the state back to New.
Thank you.
Changed in curtin: | |
status: | New → Fix Released |
Andrey Grebennikov (agrebennikov) wrote : | #39 |
Is there an estimate on getting this package in bionic-updates please?
Ryan Harper (raharper) wrote : Re: [Bug 1796292] Re: Tight timeout for bcache removal causes spurious failures | #40 |
We are starting an SRU of curtin this week. SRUs take at least 7 days from when they hit -proposed, possibly longer depending on test results. I should have something up in -proposed this week and we'll go from there on testing.
Terry Rudd (terrykrudd) wrote : | #41 |
The Canonical kernel team has this item queued in the hotlist to work on. I am assigning it to myself to accelerate the work.
Changed in curtin: | |
assignee: | nobody → Terry Rudd (terrykrudd) |
Andrea Righi (arighi) wrote : | #42 |
From a kernel perspective, this big slowness shutting down a bcache volume might be caused by a locking / race condition issue. If I read correctly, this problem has been reproduced on bionic (and on xenial we even got a kernel oops, which looks like it was caused by a NULL pointer dereference). I would try to address these issues separately.
About bionic it would be nice to test this commit (also mentioned by @elmo in comment #28):
Moreover, even if we didn't get an explicit NULL pointer dereference on bionic, I think it would be interesting to also test the following fixes:
https:/
https:/
https:/
I've already backported all of them and applied them to the latest bionic kernel. A test kernel is available here:
https:/
If it doesn't cost too much it would be great to do a test with it. In the meantime I'll try to reproduce the problem locally. Thanks in advance!
Jason Hobbs (jason-hobbs) wrote : | #43 |
This is difficult for us to test in our lab because we are using MAAS, and we hit this during MAAS deployments of nodes, so we would need MAAS images built with these kernels. Additionally, this doesn't reproduce every time; it's maybe 1 in 4 test runs. It may be best to find a way to reproduce this outside of MAAS.
Ryan Harper (raharper) wrote : | #44 |
I've set up our integration test that runs the CDO-QA bcache/ceph setup.
On the updated kernel I got through 10 loops of the deployment before it stack-traced:
http://
[ 3939.846908] bcache: bch_cached_
[ 3939.878388] bcache: register_bcache() error /dev/vdd: device already registered (emitting change event)
[ 3939.904984] bcache: register_bcache() error /dev/vdd: device already registered (emitting change event)
[ 3939.972715] bcache: register_bcache() error /dev/vdd: device already registered (emitting change event)
[ 3940.059415] bcache: register_bcache() error /dev/vdd: device already registered (emitting change event)
[ 3940.129854] bcache: register_bcache() error /dev/vdd: device already registered (emitting change event)
[ 3949.257051] md: md0: resync done.
[ 4109.273558] INFO: task python3:19635 blocked for more than 120 seconds.
[ 4109.279331] Tainted: P O 4.15.0-55-generic #60+lp796292
[ 4109.284767] "echo 0 > /proc/sys/
[ 4109.288771] python3 D 0 19635 16361 0x00000000
[ 4109.288774] Call Trace:
[ 4109.288818] __schedule+
[ 4109.288822] ? __switch_
[ 4109.288824] ? __switch_
[ 4109.288826] schedule+0x2c/0x80
[ 4109.288852] bch_bucket_
[ 4109.288866] ? wait_woken+
[ 4109.288872] __bch_bucket_
[ 4109.288876] bch_bucket_
[ 4109.288882] __uuid_
[ 4109.288895] ? submit_
[ 4109.288900] ? __write_
[ 4109.288905] bch_uuid_
[ 4109.288911] __cached_
[ 4109.288916] bch_cached_
[ 4109.288992] sysfs_kf_
[ 4109.288998] kernfs_
[ 4109.289001] __vfs_write+
[ 4109.289003] vfs_write+
[ 4109.289004] SyS_write+0x55/0xc0
[ 4109.289010] do_syscall_
[ 4109.289014] entry_SYSCALL_
[ 4109.289016] RIP: 0033:0x7f8d2833e154
[ 4109.289018] RSP: 002b:00007ffcda
[ 4109.289020] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f8d2833e154
[ 4109.289021] RDX: 0000000000000008 RSI: 00000000022b7360 RDI: 0000000000000003
[ 4109.289022] RBP: 00007f8d288396c0 R08: 0000000000000000 R09: 0000000000000000
[ 4109.289022] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
[ 4109.289026] R13: 0000000000000000 R14: 00000000022b7360 R15: 0000000001fe8db0
[ 4109.289033] INFO: task bcache_
[ 4109.292172] Tainted: P O 4.15.0-55-generic #60+lp796292
[ 4109.295345] "echo 0 > /proc/sys/
[ 4109.298208] bcache_allocato D 0 22317 2 0x80000000
[ 4109.298212] Call Trace:
[ 4109.298217] __schedule+
[ 4109.298225] schedule+0x2c/0x80
[ 4109.298232] bch_bucket_
[ 4109.2982...
Ryan Harper (raharper) wrote : | #45 |
Without the patch, I can reproduce the hang fairly frequently, in one or two loops, which fails in this way:
[ 1069.711956] bcache: cancel_
[ 1088.583986] INFO: task kworker/0:2:436 blocked for more than 120 seconds.
[ 1088.590330] Tainted: P O 4.15.0-54-generic #58-Ubuntu
[ 1088.595831] "echo 0 > /proc/sys/
[ 1088.598210] kworker/0:2 D 0 436 2 0x80000000
[ 1088.598244] Workqueue: events update_
[ 1088.598246] Call Trace:
[ 1088.598255] __schedule+
[ 1088.598258] ? __switch_
[ 1088.598260] schedule+0x2c/0x80
[ 1088.598262] schedule_
[ 1088.598264] __mutex_
[ 1088.598266] ? __switch_
[ 1088.598268] ? __switch_
[ 1088.598270] ? __switch_
[ 1088.598272] __mutex_
[ 1088.598274] ? __mutex_
[ 1088.598276] mutex_lock+
[ 1088.598281] update_
[ 1088.598285] process_
[ 1088.598287] worker_
[ 1088.598289] kthread+0x121/0x140
[ 1088.598291] ? process_
[ 1088.598293] ? kthread_
[ 1088.598295] ret_from_
Changed in curtin: | |
assignee: | Terry Rudd (terrykrudd) → Andrea Righi (arighi) |
Andrea Righi (arighi) wrote : | #46 |
Thanks tons for the tests, Ryan! Well, at least the hung-task-timeout trace is different, so we're making some progress.
With the new kernel it seems that we're stuck in bch_bucket_alloc(). I've identified other upstream fixes that could help prevent this problem.
If you're willing to do a few more tests, here's a new test kernel (based on 4.15.0-54-generic + a set of bcache upstream fixes):
https:/
And, just in case, I've applied the same set of fixes to the latest bionic master-next as well:
https:/
Testing these two kernels should give us more information about the nature of the problem.
Andrea Righi (arighi) wrote : | #47 |
Good news! I've been able to reproduce the hung task in bch_bucket_alloc() issue locally, using the test case from bug 1784665. I think we're hitting the same problem now. I'll do more tests and will keep you updated.
Launchpad Janitor (janitor) wrote : | #48 |
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in linux (Ubuntu Bionic): | |
status: | New → Confirmed |
Changed in linux (Ubuntu Cosmic): | |
status: | New → Confirmed |
Changed in linux (Ubuntu Disco): | |
status: | New → Confirmed |
Andrea Righi (arighi) wrote : | #51 |
Hi Ryan, I've uploaded a new test kernel:
https:/
This one is based on 4.15.0-54.58 and it specifically addresses the bch_bucket_alloc() problem (with this patch applied: https:/
With this kernel I wasn't able to reproduce the hung task timeout issue in bch_bucket_alloc() anymore.
It would be great if you could repeat your test also with this kernel. Thanks in advance!
Andrea Righi (arighi) wrote : | #52 |
... and, just in case, I've uploaded also a test kernel based on the latest bionic's master-next + a bunch of extra bcache fixes:
https:/
If the previous kernel is still buggy, it'd be nice to also try this one.
Ryan Harper (raharper) wrote : | #53 |
Andrea, thanks for the updated kernels.
On the first one, I got 23 installs before I ran into an issue; I'll test the newer kernel next.
https:/
[ 5436.870482] BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
[ 5436.873374] IP: cache_set_
[ 5436.875208] PGD 0 P4D 0
[ 5436.876488] Oops: 0000 [#1] SMP PTI
[ 5436.877842] Modules linked in: zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) nls_utf8 isofs ppdev nls_iso8859_1 kvm_intel kvm irqbypass joydev input_leds serio_raw parport_pc parport qemu_fw_cfg mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_
[ 5436.896104] CPU: 0 PID: 11216 Comm: kworker/0:1 Tainted: P O 4.15.0-54-generic #58+lp1796292+1
[ 5436.899985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 5436.902645] Workqueue: events cache_set_flush [bcache]
[ 5436.904374] RIP: 0010:cache_
[ 5436.906183] RSP: 0018:ffffab5282
[ 5436.909050] RAX: 0000000000000000 RBX: ffff94d104aa0cc0 RCX: 0000000000000000
[ 5436.911939] RDX: 0000000000000000 RSI: 0000000020000001 RDI: 0000000000000000
[ 5436.914448] RBP: ffffab52826bbe78 R08: ffff94d13f61ac30 R09: ffff94d13f342b98
[ 5436.917113] R10: ffffab52803b3d10 R11: 00000000000002c6 R12: 0000000000000001
[ 5436.919210] R13: ffff94d13f622140 R14: ffff94d104aa0db8 R15: 0000000000000000
[ 5436.921401] FS: 000000000000000
[ 5436.923743] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5436.926299] CR2: 00000000000009b8 CR3: 0000000038252000 CR4: 00000000000006f0
[ 5436.929093] Call Trace:
[ 5436.930818] process_
[ 5436.932818] worker_
[ 5436.935332] kthread+0x121/0x140
[ 5436.937309] ? process_
[ 5436.939393] ? kthread_
[ 5436.941263] ret_from_
[ 5436.943060] Code: b8 00 00 00 a8 02 74 c8 31 f6 4c 89 e7 e8 43 0e ff ff eb bc 66 83 bb 4c f7 ff ff 00 48 8b 83 58 ff ff ff 74 31 41 bc 01 00 00 00 <48> 8b b8 b8 09 00 00 48 85 ff 74 05 e8 f9 9d 0d d3 0f b7 8b 4c
[ 5436.950188] RIP: cache_set_
[ 5436.952796] CR2: 00000000000009b8
[ 5436.954567] ---[ end trace b771415397e98c3d ]---
Ryan Harper (raharper) wrote : | #54 |
The newer kernel went about 16 runs and then popped this:
[ 2137.810559] md: md0: resync done.
[ 2296.795633] INFO: task python3:11639 blocked for more than 120 seconds.
[ 2296.800320] Tainted: P O 4.15.0-55-generic #60+lp1796292+1
[ 2296.805097] "echo 0 > /proc/sys/
[ 2296.810071] python3 D 0 11639 8301 0x00000000
[ 2296.810075] Call Trace:
[ 2296.810100] __schedule+
[ 2296.810102] ? __switch_
[ 2296.810103] ? __switch_
[ 2296.810105] schedule+0x2c/0x80
[ 2296.810118] bch_bucket_
[ 2296.810128] ? wait_woken+
[ 2296.810132] __bch_bucket_
[ 2296.810137] bch_bucket_
[ 2296.810143] __uuid_
[ 2296.810148] ? __write_
[ 2296.810153] bch_uuid_
[ 2296.810159] __cached_
[ 2296.810160] ? __switch_
[ 2296.810161] ? __switch_
[ 2296.810163] ? __switch_
[ 2296.810167] bch_cached_
[ 2296.810181] sysfs_kf_
[ 2296.810182] kernfs_
[ 2296.810185] __vfs_write+
[ 2296.810187] vfs_write+
[ 2296.810189] SyS_write+0x55/0xc0
[ 2296.810193] do_syscall_
[ 2296.810194] entry_SYSCALL_
[ 2296.810196] RIP: 0033:0x7f8e80077154
[ 2296.810197] RSP: 002b:00007fffc2
[ 2296.810199] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f8e80077154
[ 2296.810200] RDX: 0000000000000008 RSI: 0000000000d734b0 RDI: 0000000000000003
[ 2296.810201] RBP: 00007f8e805726c0 R08: 0000000000000000 R09: 0000000000000000
[ 2296.810202] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
[ 2296.810203] R13: 0000000000000000 R14: 0000000000d734b0 R15: 0000000000aa4db0
[ 2417.627259] INFO: task python3:11639 blocked for more than 120 seconds.
[ 2417.632687] Tainted: P O 4.15.0-55-generic #60+lp1796292+1
[ 2417.638276] "echo 0 > /proc/sys/
tags: | added: cscc |
Chris Gregan (cgregan) wrote : | #55 |
Escalated to Field Critical as it now happens often enough to block our ability to test proposed product releases. We are unable to test openstack-next at the moment because our test runs fail behind this bug.
Andrea Righi (arighi) wrote : | #56 |
I've uploaded a new test kernel based on the latest bionic kernel from master-next:
https:/
In addition to that I've backported all the recent upstream bcache fixes and applied my proposed fix for the potential deadlock in bch_allocator_
I've tested this kernel both on a VM and on a bare metal box, running the test case from bug 1784665 (https:/
The tests have been running for more than 1h without triggering any problem (and they're still going).
Ryan / Chris: it would be really nice if you could do one more test with this new kernel... and if you're still hitting issues we can try to work on a better reproducer. Thanks again!
Ryan Harper (raharper) wrote : | #57 |
ubuntu@ubuntu:~$ uname -r
4.15.0-56-generic
ubuntu@ubuntu:~$ cat /proc/version
Linux version 4.15.0-56-generic (arighi@kathleen) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~
This failed on the second install while running bcache-super-show /dev/vdg
Hung task.
[ 259.150347] bcache: run_cache_set() invalidating existing data
[ 259.158038] bcache: register_cache() registered cache device nvme1n1p2
[ 259.251093] bcache: register_bdev() registered backing device vdg
[ 259.379809] bcache: bch_cached_
[ 259.411486] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
[ 259.537070] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
[ 259.797830] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
[ 259.900392] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
[ 271.682662] md: md0: resync done.
[ 484.525529] INFO: task python3:11257 blocked for more than 120 seconds.
[ 484.528933] Tainted: P O 4.15.0-56-generic #62~lp1796292
[ 484.532221] "echo 0 > /proc/sys/
[ 484.535936] python3 D 0 11257 7974 0x00000000
[ 484.535941] Call Trace:
[ 484.535952] __schedule+
[ 484.535957] schedule+0x2c/0x80
[ 484.535977] bch_bucket_
[ 484.535984] ? wait_woken+
[ 484.535993] __bch_bucket_
[ 484.536002] bch_bucket_
[ 484.536014] __uuid_
[ 484.536025] ? __write_
[ 484.536035] bch_uuid_
[ 484.536046] __cached_
[ 484.536057] bch_cached_
[ 484.536061] sysfs_kf_
[ 484.536064] kernfs_
[ 484.536069] __vfs_write+
[ 484.536071] vfs_write+
[ 484.536075] SyS_write+0x5c/0xe0
[ 484.536081] do_syscall_
[ 484.536085] entry_SYSCALL_
[ 484.536088] RIP: 0033:0x7f2ac7aa7154
[ 484.536090] RSP: 002b:00007ffff1
[ 484.536093] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f2ac7aa7154
[ 484.536095] RDX: 0000000000000008 RSI: 0000000001c96340 RDI: 0000000000000003
[ 484.536096] RBP: 00007f2ac7fa26c0 R08: 0000000000000000 R09: 0000000000000000
[ 484.536098] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
[ 484.536099] R13: 0000000000000000 R14: 0000000001c96340 R15: 00000000019c7db0
I'll note that I could run bcache-super-show on the device while it was hung:
$ sudo bash
root@ubuntu:~# bcache-super-show /dev/vdg
sb.magic ok
sb.first_sector 8 [match]
sb.csum C71A896B52F1C486 [match]
sb.version 1 [backing device]
dev.label osddata5
dev.uuid 11df8370-
dev.sec...
Andrea Righi (arighi) wrote : | #58 |
Thanks Ryan, this is very interesting:
[ 259.411486] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
[ 259.537070] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
[ 259.797830] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
[ 259.900392] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
It looks like we're trying to register /dev/vdg multiple times as a backing device (make-bcache -B). I'm not getting this message during my tests, so that might be what's required to reproduce that particular deadlock.
I'll modify my test case to trigger these errors and see if I can reproduce the hung task timeout issue.
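As an aside, when sifting a large number of console logs for this pattern, a throwaway scanner helps. The helper below is hypothetical (not part of curtin or the kernel); it only matches the message format quoted in the traces above:

```python
import re

# Count "device already registered" events per device in dmesg-style text.
# The message format is taken from the traces quoted in this bug report.
PATTERN = re.compile(
    r"register_bcache\(\) error (?P<dev>\S+): device already registered")

def count_reregistrations(dmesg_text):
    """Map device path -> number of duplicate registration attempts."""
    counts = {}
    for match in PATTERN.finditer(dmesg_text):
        dev = match.group("dev")
        counts[dev] = counts.get(dev, 0) + 1
    return counts

sample = (
    "[  259.411486] bcache: register_bcache() error /dev/vdg: "
    "device already registered (emitting change event)\n"
    "[  259.537070] bcache: register_bcache() error /dev/vdg: "
    "device already registered (emitting change event)\n"
)
print(count_reregistrations(sample))  # -> {'/dev/vdg': 2}
```

A device that shows up more than once in the result is a candidate for the re-registration scenario discussed here.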
Ryan Harper (raharper) wrote : | #59 |
On Thu, Aug 1, 2019 at 10:15 AM Andrea Righi <email address hidden>
wrote:
> Thanks Ryan, this is very interesting:
>
> [ 259.411486] bcache: register_bcache() error /dev/vdg: device already
> registered (emitting change event)
> [ 259.537070] bcache: register_bcache() error /dev/vdg: device already
> registered (emitting change event)
> [ 259.797830] bcache: register_bcache() error /dev/vdg: device already
> registered (emitting change event)
> [ 259.900392] bcache: register_bcache() error /dev/vdg: device already
> registered (emitting change event)
>
> It looks that we're trying to register /dev/vdg multiple times as a
> backing device (make-bcache -B). I'm not getting this message during my
> tests, so that might be required to reproduce that particular deadlock.
>
We carry a specific sauce patch to ensure that if the cacheset is already online and a backing device shows up later, the kernel emits the change event to trigger the udev rules to generate the symlink for /dev/bcache/... I don't believe this causes an issue, since we are just detecting the re-register scenario and emitting a change uevent:
https:/
We may want to resubmit that now to see if they'll take it, or deal with the scenario in a cleaner way.
>
> I'll modify my test case to trigger these errors and see if I can
> reproduce the hung task timeout issue.
>
I can provide you a setup to reproduce this. I'll put together a doc.
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https:/
>
> Title:
> Tight timeout for bcache removal causes spurious failures
>
> To manage notifications about this bug go to:
> https:/
>
Ryan Harper (raharper) wrote : | #60 |
Andrea Righi (arighi) wrote : | #61 |
Ryan, unfortunately the last reproducer script is giving me a lot of errors and I'm still trying to figure out how to make it run to the end (or at least to a point where it starts running some bcache commands).
In the meantime (as anticipated on IRC) I've uploaded a test kernel reverting the patch "UBUNTU: SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent":
https:/
As we know, this would re-introduce the problem discussed in bug 1729145, but it'd be interesting to test it anyway, just to see if this patch is somehow related to the bch_bucket_alloc() deadlock.
In addition to that I've spent some time looking at the last kernel trace and the code. It looks like bch_bucket_alloc() is always releasing the mutex &ca->set-
https:/
Sorry for asking all these tests... if I can't find a way to reproduce the bug on my side, asking you to test is the only way that I have to debug this issue. :)
Ryan Harper (raharper) wrote : | #62 |
I tried the +3 kernel first, and I got 3 installs and then this hang:
[ 549.828710] bcache: run_cache_set() invalidating existing data
[ 549.836485] bcache: register_cache() registered cache device nvme1n1p2
[ 549.937486] bcache: register_bdev() registered backing device vdg
[ 550.018855] bcache: bch_cached_
[ 550.074760] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
[ 550.316246] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
[ 550.545840] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
[ 550.565928] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
[ 550.724285] bcache: register_bcache() error /dev/vdg: device already registered (emitting change event)
[ 562.352520] md: md0: resync done.
[ 725.980380] INFO: task python3:27303 blocked for more than 120 seconds.
[ 725.982364] Tainted: P O 4.15.0-56-generic #62~lp1796292+3
[ 725.984228] "echo 0 > /proc/sys/
[ 725.986284] python3 D 0 27303 27293 0x00000000
[ 725.986287] Call Trace:
[ 725.986319] __schedule+
[ 725.986322] schedule+0x2c/0x80
[ 725.986337] bch_bucket_
[ 725.986359] ? wait_woken+
[ 725.986363] __bch_bucket_
[ 725.986367] bch_bucket_
[ 725.986372] __uuid_
[ 725.986377] ? __write_
[ 725.986382] bch_uuid_
[ 725.986386] __cached_
[ 725.986391] bch_cached_
[ 725.986399] sysfs_kf_
[ 725.986401] kernfs_
[ 725.986406] __vfs_write+
[ 725.986407] vfs_write+
[ 725.986409] SyS_write+0x5c/0xe0
[ 725.986416] do_syscall_
[ 725.986419] entry_SYSCALL_
[ 725.986421] RIP: 0033:0x7f2d7c39a154
[ 725.986422] RSP: 002b:00007fff80
[ 725.986426] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f2d7c39a154
[ 725.986426] RDX: 0000000000000008 RSI: 000000000247e7d0 RDI: 0000000000000003
[ 725.986427] RBP: 00007f2d7c8956c0 R08: 0000000000000000 R09: 0000000000000000
[ 725.986428] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
[ 725.986428] R13: 0000000000000000 R14: 000000000247e7d0 R15: 00000000021b6e60
Ryan Harper (raharper) wrote : | #63 |
Trying the first kernel without the change event sauce also fails:
[ 532.823594] bcache: run_cache_set() invalidating existing data
[ 532.828876] bcache: register_cache() registered cache device nvme0n1p2
[ 532.869716] bcache: register_bdev() registered backing device vda1
[ 532.994355] bcache: bch_cached_
[ 533.051588] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.094717] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.120063] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.142517] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.191069] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.249877] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.282653] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.301225] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.310505] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.318959] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.374121] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.536920] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.581468] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.589270] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.595986] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.602638] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.651848] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.677836] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.712074] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.717682] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.723354] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.728951] bcache: register_bcache() error /dev/vda1: device already registered
[ 533.777602] bcache: register_bcache() error /dev/vda1: device already registered
[ 553.784393] md: md0: resync done.
[ 725.983387] INFO: task python3:413 blocked for more than 120 seconds.
[ 725.985099] Tainted: P O 4.15.0-56-generic #62~lp1796292+1
[ 725.986820] "echo 0 > /proc/sys/
[ 725.988649] python3 D 0 413 405 0x00000000
[ 725.988652] Call Trace:
[ 725.988684] __schedule+
[ 725.988687] schedule+0x2c/0x80
[ 725.988710] bch_bucket_
[ 725.988722] ? wait_woken+
[ 725.988726] __bch_bucket_
[ 725.988729] bch_bucket_
[ 725.988734] __uuid_
[ 725.988738] ? __write_
[ 725.988742] bch_uuid_
[ 725.988746] __cached_
[ 725.988750] ...
Andrea Righi (arighi) wrote : | #64 |
After some help from Ryan (on IRC) I've been able to run the last reproducer script and trigger the same trace. Now I should be able to collect all the information that I need and hopefully post a new test kernel (fixed for real...) soon.
Andrea Righi (arighi) wrote : | #65 |
Some additional info about the deadlock:
crash> bt 16588
PID: 16588 TASK: ffff9ffd7f332b00 CPU: 1 COMMAND: "bcache_allocato"
[exception RIP: bch_crc64+57]
RIP: ffffffffc093b2c9 RSP: ffffab9585767e28 RFLAGS: 00000286
RAX: f1f51403756de2bd RBX: 0000000000000000 RCX: 0000000000000065
RDX: 0000000000000065 RSI: ffff9ffd63980000 RDI: ffff9ffd63925346
RBP: ffffab9585767e28 R8: ffffffffc093db60 R9: ffffab9585739000
R10: 000000000000007f R11: 000000001ffef001 R12: 0000000000000000
R13: 0000000000000008 R14: ffff9ffd63900000 R15: ffff9ffd683d0000
CS: 0010 SS: 0018
#0 [ffffab9585767e30] bch_prio_write at ffffffffc09325c0 [bcache]
#1 [ffffab9585767eb0] bch_allocator_
#2 [ffffab9585767f08] kthread at ffffffffa80b2481
#3 [ffffab9585767f50] ret_from_fork at ffffffffa8a00205
crash> bt 14658
PID: 14658 TASK: ffff9ffd7a9f0000 CPU: 0 COMMAND: "python3"
#0 [ffffab958380bb48] __schedule at ffffffffa89ae441
#1 [ffffab958380bbe8] schedule at ffffffffa89aea7c
#2 [ffffab958380bbf8] bch_bucket_alloc at ffffffffc091c370 [bcache]
#3 [ffffab958380bc68] __bch_bucket_
#4 [ffffab958380bcb8] bch_bucket_
#5 [ffffab958380bcf8] __uuid_write at ffffffffc0931b69 [bcache]
#6 [ffffab958380bda0] bch_uuid_write at ffffffffc0931f76 [bcache]
#7 [ffffab958380bdc0] __cached_dev_store at ffffffffc0937c08 [bcache]
#8 [ffffab958380be20] bch_cached_
#9 [ffffab958380be50] sysfs_kf_write at ffffffffa830c97c
#10 [ffffab958380be60] kernfs_fop_write at ffffffffa830c3e5
#11 [ffffab958380bea0] __vfs_write at ffffffffa827e5bb
#12 [ffffab958380beb0] vfs_write at ffffffffa827e781
#13 [ffffab958380bee8] sys_write at ffffffffa827e9fc
#14 [ffffab958380bf30] do_syscall_64 at ffffffffa8003b03
#15 [ffffab958380bf50] entry_SYSCALL_
RIP: 00007faffc7bd154 RSP: 00007ffe307cbc88 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007faffc7bd154
RDX: 0000000000000008 RSI: 00000000011ce7f0 RDI: 0000000000000003
RBP: 00007faffccb86c0 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000000000 R14: 00000000011ce7f0 R15: 0000000000f33e60
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
In this case the task "python3" (pid 14658) gets stuck in bch_bucket_alloc(), in a wait that never completes. The task that should wake "python3" from this wait is "bcache_allocator" (pid 16588), but the wakeup never happens, because bcache_allocator is stuck in this "retry_invalidate" busy loop:
static int bch_allocator_thread(void *arg)
{
...
retry_invalidate:
/*
* Now, we write their new gens to disk so we can start writing
* new stuff to them:
*/
...
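The circular wait described above can be sketched with a toy two-thread model. This is purely illustrative Python, not kernel code; the thread and event names are stand-ins for the tasks in the crash output:

```python
import threading

# Toy model of the deadlock shape: "consumer" stands in for the python3
# task blocked in bch_bucket_alloc(), "allocator" for bcache_allocator
# spinning in its retry_invalidate loop.
free_bucket = threading.Event()   # set when the allocator frees a bucket
progress = threading.Event()      # set when the consumer makes progress

def consumer():
    # Like bch_bucket_alloc(): sleep until a bucket becomes available.
    free_bucket.wait(timeout=0.2)

def allocator():
    # Like retry_invalidate: before freeing a bucket, wait for metadata
    # writes that (in the buggy scenario) depend on the consumer's progress.
    if progress.wait(timeout=0.2):
        free_bucket.set()  # never reached: each side waits on the other

threads = [threading.Thread(target=consumer),
           threading.Thread(target=allocator)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("bucket freed:", free_bucket.is_set())  # -> bucket freed: False
```

In the kernel neither wait has a timeout, so instead of both sides giving up after 0.2 seconds, the hung-task watchdog fires after 120 seconds with the traces shown above.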
Andrea Righi (arighi) wrote : | #66 |
Ryan, I've uploaded a new test kernel with the fix mentioned in the comment before:
https:/
I've performed over 100 installations using curtin-nvme.sh (install_count = 100), no hung task timeout. I'll run other stress tests to make sure we're not breaking anything else with this fix, but results look promising so far.
It'd be great if you could also do a test on your side. Thanks!
Ryan Harper (raharper) wrote : | #67 |
That's excellent news. I'm starting my tests on this kernel now.
Ryan Harper (raharper) wrote : | #68 |
I've got 233 consecutive installs successful.
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : | #69 |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-bionic |
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : | #70 |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-xenial |
Launchpad Janitor (janitor) wrote : | #71 |
This bug was fixed in the package linux - 5.2.0-13.14
---------------
linux (5.2.0-13.14) eoan; urgency=medium
* eoan/linux: 5.2.0-13.14 -proposed tracker (LP: #1840261)
* NULL pointer dereference when Inserting the VIMC module (LP: #1840028)
- media: vimc: fix component match compare
* Miscellaneous upstream changes
- selftests/bpf: remove bpf_util.h from BPF C progs
linux (5.2.0-12.13) eoan; urgency=medium
* eoan/linux: 5.2.0-12.13 -proposed tracker (LP: #1840184)
* Eoan update: v5.2.8 upstream stable release (LP: #1840178)
- scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure
- libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant
- libnvdimm/bus: Fix wait_nvdimm_
- ALSA: usb-audio: Sanity checks for each pipe and EP types
- ALSA: usb-audio: Fix gpf in snd_usb_
- HID: wacom: fix bit shift for Cintiq Companion 2
- HID: Add quirk for HP X1200 PIXART OEM mouse
- atm: iphase: Fix Spectre v1 vulnerability
- bnx2x: Disable multi-cos feature.
- drivers/
- ife: error out when nla attributes are empty
- ip6_gre: reload ipv6h in prepare_
- ip6_tunnel: fix possible use-after-free on xmit
- ipip: validate header length in ipip_tunnel_xmit
- mlxsw: spectrum: Fix error path in mlxsw_sp_
- mvpp2: fix panic on module removal
- mvpp2: refactor MTU change code
- net: bridge: delete local fdb on device init failure
- net: bridge: mcast: don't delete permanent entries when fast leave is
enabled
- net: bridge: move default pvid init/deinit to NETDEV_
- net: fix ifindex collision during namespace removal
- net/mlx5e: always initialize frag->last_in_page
- net/mlx5: Use reversed order when unregister devices
- net: phy: fixed_phy: print gpio error only if gpio node is present
- net: phylink: don't start and stop SGMII PHYs in SFP modules twice
- net: phylink: Fix flow control for fixed-link
- net: phy: mscc: initialize stats array
- net: qualcomm: rmnet: Fix incorrect UL checksum offload logic
- net: sched: Fix a possible null-pointer dereference in dequeue_func()
- net sched: update vlan action for batched events operations
- net: sched: use temporary variable for actions indexes
- net/smc: do not schedule tx_work in SMC_CLOSED state
- net: stmmac: Use netif_tx_napi_add() for TX polling function
- NFC: nfcmrvl: fix gpio-handling regression
- ocelot: Cancel delayed work before wq destruction
- tipc: compat: allow tipc commands without arguments
- tipc: fix unitilized skb list crash
- tun: mark small packets as owned by the tap sock
- net/mlx5: Fix modify_cq_in alignment
- net/mlx5e: Prevent encap flow counter update async to user query
- r8169: don't use MSI before RTL8168d
- bpf: fix XDP vlan selftests test_xdp_vlan.sh
- selftests/bpf: add wrapper scripts for test_xdp_vlan.sh
- selftests/bpf: reduce time to execute test_xdp_vlan.sh
- net: fix bpf_xdp_adjust_head regression for generic-XDP
- hv_sock: Fi...
Changed in linux (Ubuntu Eoan): | |
status: | Confirmed → Fix Released |
Launchpad Janitor (janitor) wrote : | #72 |
This bug was fixed in the package linux - 4.15.0-60.67
---------------
linux (4.15.0-60.67) bionic; urgency=medium
* bionic/linux: 4.15.0-60.67 -proposed tracker (LP: #1841086)
* [Regression] net test from ubuntu_
compilation issue (LP: #1840935)
- SAUCE: Fix "bpf: relax verifier restriction on BPF_MOV | BPF_ALU"
* [Regression] failed to compile seccomp test from ubuntu_
(LP: #1840932)
- Revert "selftests: skip seccomp get_metadata test if not real root"
* Packaging resync (LP: #1786013)
- [Packaging] resync getabis
linux (4.15.0-59.66) bionic; urgency=medium
* bionic/linux: 4.15.0-59.66 -proposed tracker (LP: #1840006)
* zfs not completely removed from bionic tree (LP: #1840051)
- SAUCE: (noup) remove completely the zfs code
* Packaging resync (LP: #1786013)
- [Packaging] update helper scripts
* [18.04 FEAT] Enhanced hardware support (LP: #1836857)
- s390: report new CPU capabilities
- s390: add alignment hints to vector load and store
* [18.04 FEAT] Enhanced CPU-MF hardware counters - kernel part (LP: #1836860)
- s390/cpum_cf: Add support for CPU-MF SVN 6
- s390/cpumf: Add extended counter set definitions for model 8561 and 8562
* ideapad_laptop disables WiFi/BT radios on Lenovo Y540 (LP: #1837136)
- platform/x86: ideapad-laptop: Remove no_hw_rfkill_list
* Stacked onexec transitions fail when under NO NEW PRIVS restrictions
(LP: #1839037)
- SAUCE: apparmor: fix nnp subset check failure when, stacking
* bcache: bch_allocator_
timeout for bcache removal causes spurious failures (LP: #1796292)
- SAUCE: bcache: fix deadlock in bcache_allocator
* bcache: bch_allocator_
- bcache: never writeback a discard operation
- bcache: improve bcache_reboot()
- bcache: fix writeback target calc on large devices
- bcache: add journal statistic
- bcache: fix high CPU occupancy during journal
- bcache: use pr_info() to inform duplicated CACHE_SET_
- bcache: fix incorrect sysfs output value of strip size
- bcache: fix error return value in memory shrink
- bcache: fix using of loop variable in memory shrink
- bcache: Fix indentation
- bcache: Add __printf annotation to __bch_check_keys()
- bcache: Annotate switch fall-through
- bcache: Fix kernel-doc warnings
- bcache: Remove an unused variable
- bcache: Suppress more warnings about set-but-not-used variables
- bcache: Reduce the number of sparse complaints about lock imbalances
- bcache: Fix a compiler warning in bcache_
- bcache: Move couple of string arrays to sysfs.c
- bcache: Move couple of functions to sysfs.c
- bcache: Replace bch_read_
* linux hwe i386 kernel 5.0.0-21.22~18.04.1 crashes on Lenovo x220
(LP: #1838115)
- x86/mm: Check for pfn instead of page in vmalloc_sync_one()
- x86/mm: Sync also unmappings in vmalloc_sync_all()
- mm/vmalloc.c: add priority threshold to __purge_
Changed in linux (Ubuntu Bionic): | |
status: | Confirmed → Fix Released |
Can we get a console output of a failure, or a curtin log?
Also, I think the fix here would be better placed in shutdown_bcache, in its call to 'wait_for_removal'.
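Coming back to the curtin side: the diff in the description patches curtin's removal-wait helper. A rough Python sketch of that retry-with-backoff shape follows; the function name comes from the diff, but the retry values and body are illustrative, not curtin's actual code:

```python
import os
import time

def wait_for_removal(path, retries=(1, 3, 5, 7)):
    """Wait for a sysfs path to disappear, sleeping longer on each retry.

    Illustrative sketch: the total timeout is the sum of `retries`, which
    is the knob this bug argues was too tight for busy bcache devices.
    """
    if not path:
        raise ValueError("wait_for_removal requires a path")
    for wait in retries:
        if not os.path.exists(path):
            return
        time.sleep(wait)
    if not os.path.exists(path):
        return
    raise OSError("timeout exceeded for removal of %s" % path)
```

Making `retries` a parameter (or configuration value) is what allows callers like shutdown_bcache to grant slow cache flushes a longer total budget without changing every other caller.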