Comment 39 for bug 1328727

Revision history for this message
In , hannes (hannes-linux-kernel-bugs) wrote :

On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
> >On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
> >>On 5/6/2014 1:33 AM, Johannes Weiner wrote:
> >>>Hi Oliver,
> >>>
> >>>On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
> >>>>Hello,
> >>>>
> >>>>1) Attached a full function-trace log + other SysRq outputs, see [1]
> >>>>attached.
> >>>>
> >>>>I saw bdi_...() calls in the s2disk paths, but didn't check in detail
> >>>>Probably more efficient when one of you guys looks directly.
> >>>Thanks, this looks interesting. balance_dirty_pages() wakes up the
> >>>bdi_wq workqueue as it should:
> >>>
> >>>[ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits
> <-balance_dirty_pages_ratelimited
> >>>[ 249.148009] s2disk-3327 2.... 48550414us : global_dirtyable_memory
> <-global_dirty_limits
> >>>[ 249.148009] s2disk-3327 2.... 48550414us : writeback_in_progress
> <-balance_dirty_pages_ratelimited
> >>>[ 249.148009] s2disk-3327 2.... 48550414us :
> bdi_start_background_writeback <-balance_dirty_pages_ratelimited
> >>>[ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on
> <-balance_dirty_pages_ratelimited
> >>>but the worker wakeup doesn't actually do anything:
> >>>[ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch
> <-__schedule
> >>>[ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq
> <-worker_thread
> >>>[ 249.148009] kworker/-3466 2d... 48550431us : need_to_create_worker
> <-worker_thread
> >>>[ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle
> <-worker_thread
> >>>[ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers
> <-worker_enter_idle
> >>>[ 249.148009] kworker/-3466 2.... 48550432us : schedule
> <-worker_thread
> >>>[ 249.148009] kworker/-3466 2.... 48550432us : __schedule
> <-worker_thread
> >>>
> >>>My suspicion is that this fails because the bdi_wq is frozen at this
> >>>point and so the flush work never runs until resume, whereas before my
> >>>patch the effective dirty limit was high enough so that image could be
> >>>written in one go without being throttled; followed by an fsync() that
> >>>then writes the pages in the context of the unfrozen s2disk.
> >>>
> >>>Does this make sense? Rafael? Tejun?
> >>Well, it does seem to make sense to me.
> > From what I see, this is a deadlock in the userspace suspend model and
> >just happened to work by chance in the past.
>
> Well, it had been working for quite a while, so it was a rather large
> opportunity
> window it seems. :-)

No doubt about that, and I feel bad that it broke. But it's still a
deadlock that can't reasonably be accommodated from dirty throttling.

It can't just put the flushers to sleep and then issue a large amount
of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot
the messenger, this bug needs to be addressed, not get papered over.

> >Can we patch suspend-utils as follows?
>
> Perhaps we can. Let's ask the new maintainer.
>
> Rodolfo, do you think you can apply the patch below to suspend-utils?
>
> >Alternatively, suspend-utils
> >could clear the dirty limits before it starts writing and restore them
> >post-resume.
>
> That (and the patch too) doesn't seem to address the problem with existing
> suspend-utils
> binaries, however.

It's userspace that freezes the system before issuing buffered IO, so
my conclusion was that the bug is in there. This is arguable. I also
wouldn't be opposed to a patch that sets the dirty limits to infinity
from the ioctl that freezes the system or creates the image.