Comment 52 for bug 1328727

Revision history for this message
In , akpm (akpm-linux-kernel-bugs) wrote :

I cc'ed a bunch of people from bugzilla.

Folks, please please please remember to reply via emailed
reply-to-all. Don't use the bugzilla interface!

On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" <email address hidden> wrote:

> On 6/13/2014 6:55 AM, Johannes Weiner wrote:
> > On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
> >> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
> >>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
> >>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote:
> >>>>> Hi Oliver,
> >>>>>
> >>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1]
> >>>>>> attached.
> >>>>>>
> >>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail
> >>>>>> Probably more efficient when one of you guys looks directly.
> >>>>> Thanks, this looks interesting. balance_dirty_pages() wakes up the
> >>>>> bdi_wq workqueue as it should:
> >>>>>
> >>>>> [ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits
> <-balance_dirty_pages_ratelimited
> >>>>> [ 249.148009] s2disk-3327 2.... 48550414us :
> global_dirtyable_memory <-global_dirty_limits
> >>>>> [ 249.148009] s2disk-3327 2.... 48550414us :
> writeback_in_progress <-balance_dirty_pages_ratelimited
> >>>>> [ 249.148009] s2disk-3327 2.... 48550414us :
> bdi_start_background_writeback <-balance_dirty_pages_ratelimited
> >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on
> <-balance_dirty_pages_ratelimited
> >>>>> but the worker wakeup doesn't actually do anything:
> >>>>> [ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch
> <-__schedule
> >>>>> [ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq
> <-worker_thread
> >>>>> [ 249.148009] kworker/-3466 2d... 48550431us :
> need_to_create_worker <-worker_thread
> >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle
> <-worker_thread
> >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers
> <-worker_enter_idle
> >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : schedule
> <-worker_thread
> >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : __schedule
> <-worker_thread
> >>>>>
> >>>>> My suspicion is that this fails because the bdi_wq is frozen at this
> >>>>> point and so the flush work never runs until resume, whereas before my
> >>>>> patch the effective dirty limit was high enough so that image could be
> >>>>> written in one go without being throttled; followed by an fsync() that
> >>>>> then writes the pages in the context of the unfrozen s2disk.
> >>>>>
> >>>>> Does this make sense? Rafael? Tejun?
> >>>> Well, it does seem to make sense to me.
> >>> From what I see, this is a deadlock in the userspace suspend model and
> >>> just happened to work by chance in the past.
> >> Well, it had been working for quite a while, so it was a rather large
> >> opportunity
> >> window it seems. :-)
> > No doubt about that, and I feel bad that it broke. But it's still a
> > deadlock that can't reasonably be accommodated from dirty throttling.
> >
> > It can't just put the flushers to sleep and then issue a large amount
> > of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot
> > the messenger, this bug needs to be addressed, not get papered over.
> >
> >>> Can we patch suspend-utils as follows?
> >> Perhaps we can. Let's ask the new maintainer.
> >>
> >> Rodolfo, do you think you can apply the patch below to suspend-utils?
> >>
> >>> Alternatively, suspend-utils
> >>> could clear the dirty limits before it starts writing and restore them
> >>> post-resume.
> >> That (and the patch too) doesn't seem to address the problem with existing
> >> suspend-utils
> >> binaries, however.
> > It's userspace that freezes the system before issuing buffered IO, so
> > my conclusion was that the bug is in there. This is arguable. I also
> > wouldn't be opposed to a patch that sets the dirty limits to infinity
> > from the ioctl that freezes the system or creates the image.
>
> OK, that sounds like a workable plan.
>
> How do I set those limits to infinity?

Five years have passed and people are still hitting this.

Killian described the workaround in comment 14 at
https://bugzilla.kernel.org/show_bug.cgi?id=75101.

People can use this workaround manually by hand or in scripts. But we
really should find a proper solution. Maybe special-case the freezing
of the flusher threads until all the writeout has completed. Or
something else.