request_firmware() fails on resume from suspend

Bug #331415 reported by TJ
18
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Won't Fix
Medium
Unassigned

Bug Description

On resume from suspend a driver fails to re-load the USB video-camera firmware. This cause a very long pause (60 seconds according to the log time-stamps) during suspend before the screen has been restored and makes it appear the system has locked up.

The driver makes a call to the kernel's function:

drivers/base/firmware_class.c:request_firmware()

but it returns non-zero (error condition).

uname -a
Linux hephaestion 2.6.28-7-generic #20-Ubuntu SMP Mon Feb 9 15:42:34 UTC 2009 x86_64 GNU/Linux

Note the time stamps (60294 to 60354):

kernel: [60294.780270] r5u870-0: Detected Sony VGP-VCC4
kernel: [60294.780398] usb 1-8: firmware: requesting r5u870_1836.fw
kernel: [60354.780066] r5u870-0: Microcode file "r5u870_1836.fw" is missing
kernel: [60354.780069] r5u870-0: Please see http://wiki.mediati.org/r5u870/Microcode
kernel: [60354.780072] r5u870-0: initialization failed: -2
kernel: [60354.780079] r5u870: probe of 1-8:1.0 failed with error -2

kernel: [60354.780965] ------------[ cut here ]------------
kernel: [60354.780968] WARNING: at /build/buildd/linux-2.6.28/kernel/power/main.c:177 suspend_test_finish+0x7c/0x80()
kernel: [60354.780971] Component: resume devices
kernel: [60354.780973] Modules linked in: g_file_storage dummy_hcd tun hidp binfmt_misc bridge stp bnep kvm_intel kvm acpi_cpufreq input_polldev ppdev parport_pc lp parport snd_hda_intel snd_pcm_oss arc4 snd_mixer_oss ecb snd_pcm snd_seq_dummy uvcvideo iwl3945 joydev snd_seq_oss pcmcia r5u870 snd_seq_midi mac80211 usbcam snd_rawmidi snd_seq_midi_event led_class serio_raw compat_ioctl32 videodev v4l1_compat videobuf_dma_sg snd_seq snd_timer snd_seq_device tifm_7xx1 yenta_socket rsrc_nonstatic pcmcia_core video pcspkr psmouse tifm_core sony_laptop videobuf_core iTCO_wdt iTCO_vendor_support output intel_agp btusb snd soundcore snd_page_alloc cfg80211 nvidia(P) cbc nls_iso8859_1 nls_cp437 vfat fat usb_storage ohci1394 e100 mii ieee1394 fbcon tileblit font bitblit softcursor fuse sha256_generic dm_crypt aes_x86_64 aes_generic uhci_hcd ehci_hcd [last unloaded: dummy_hcd]
kernel: [60354.781047] Pid: 17123, comm: pm-suspend Tainted: P 2.6.28-7-generic #20-Ubuntu
kernel: [60354.781050] Call Trace:
kernel: [60354.781058] [<ffffffff8024d857>] warn_slowpath+0xb7/0xf0
kernel: [60354.781063] [<ffffffff8026a368>] ? down_trylock+0x38/0x50
kernel: [60354.781068] [<ffffffff8024df70>] ? try_acquire_console_sem+0x10/0x40
kernel: [60354.781074] [<ffffffff802595b6>] ? lock_timer_base+0x36/0x70
kernel: [60354.781081] [<ffffffff8067f320>] ? printk+0x67/0x6f
kernel: [60354.781087] [<ffffffff8040baa7>] ? kobject_put+0x27/0x60
kernel: [60354.781093] [<ffffffff804a96b5>] ? put_device+0x15/0x20
kernel: [60354.781099] [<ffffffff804b169a>] ? dpm_complete+0x18a/0x1a0
kernel: [60354.781104] [<ffffffff8027cf1c>] suspend_test_finish+0x7c/0x80
kernel: [60354.781110] [<ffffffff8027d004>] suspend_devices_and_enter+0xe4/0x180
kernel: [60354.781114] [<ffffffff8027d2b9>] enter_state+0xe9/0x120
kernel: [60354.781119] [<ffffffff8027d3aa>] state_store+0xba/0x100
kernel: [60354.781123] [<ffffffff8040b947>] kobj_attr_store+0x17/0x20
kernel: [60354.781130] [<ffffffff80344185>] sysfs_write_file+0xc5/0x140
kernel: [60354.781135] [<ffffffff802e4bfb>] vfs_write+0xcb/0x130
kernel: [60354.781140] [<ffffffff802e4d50>] sys_write+0x50/0x90
kernel: [60354.781145] [<ffffffff8021253a>] system_call_fastpath+0x16/0x1b
kernel: [60354.781148] ---[ end trace ead16c576f6b94cf ]---
kernel: [60354.781783] PM: Finishing wakeup.
kernel: [60354.781784] Restarting tasks ...

Revision history for this message
TJ (tj) wrote : Re: [Bug 331415] [NEW] request_firmware() fails on resume from suspend

Digging into the source the 60-second pause is the default firmware
loading-timeout in drivers/base/firmware_class.c:

static int loading_timeout = 60; /* In seconds */

which is applied in _request_firmware():

 fw_priv = dev_get_drvdata(f_dev);

 if (uevent) {
  if (loading_timeout > 0) {
   fw_priv->timeout.expires = jiffies + loading_timeout * HZ;
   add_timer(&fw_priv->timeout);
  }
  kobject_uevent(&f_dev->kobj, KOBJ_ADD);
  wait_for_completion(&fw_priv->completion);
  set_bit(FW_STATUS_DONE, &fw_priv->status);
  del_timer_sync(&fw_priv->timeout);
 } else
  wait_for_completion(&fw_priv->completion);

 mutex_lock(&fw_lock);

Revision history for this message
TJ (tj) wrote :

The cause of this issue is that request_firmware() blocks until the
firmware image is available. It makes a request via userspace, but at
the time of resume all userspace tasks are frozen and therefore it waits
and times out.

The solution is for drivers to use the request_firmware_nowait()
function instead and provide a call-back function that is notified when
the firmware request has completed that performs the firmware upload to
the device.

I'm working on a patch for r5u870 to confirm the solution.

This could prove to be an issue for many other drivers. A quick check
shows over a hundred instances of the use of request_firmware() and only
nine uses of request_firmware_nowait():

grep --exclude-dir=debian -rn 'request_firmware(&' * | wc -l
126

grep --exclude-dir=debian -rn 'request_firmware_nowait(' * | wc -l
9

TJ (tj)
Changed in linux:
assignee: nobody → intuitivenipple
status: New → Confirmed
Revision history for this message
Tormod Volden (tormodvolden) wrote :

Interesting. I might have seen this with ipw2200. Does it also block alt-sysrq during the 60 seconds? Would this be perfectly reproducible (my troubles are not)?

Revision history for this message
TJ (tj) wrote :

Tormod: I'm not sure that it blocks SysRq but I think so. When I first observed it I tried to sync the disks (Alt+SysRq+S) and then power off (Alt+SysRq+O) and if I recall correctly it didn't respond.

Changed in linux:
importance: Undecided → Medium
status: Confirmed → Triaged
Revision history for this message
TJ (tj) wrote :

This is a small shell-script that can help identify whether a system is affected by this bug.

for log in /var/log/kern.log*; do CMD="cat "; echo $log; [ ! ${log##*.gz} ] && CMD="zcat "; $CMD $log | grep -n 'PM: \(suspend\|resume\) devices took'; done

Example output:

var/log/kern.log
/var/log/kern.log.0
3678:Feb 19 06:06:56 hephaestion kernel: [60285.296215] PM: suspend devices took 3.752 seconds
3830:Feb 19 06:06:57 hephaestion kernel: [60354.780958] PM: resume devices took 69.072 seconds
3948:Feb 19 10:19:23 hephaestion kernel: [75478.752211] PM: suspend devices took 3.752 seconds
4102:Feb 19 10:19:23 hephaestion kernel: [75488.172580] PM: resume devices took 9.024 seconds
/var/log/kern.log.1.gz
/var/log/kern.log.2.gz
/var/log/kern.log.3.gz

The resume of 69 seconds shows this system is affected.

Revision history for this message
TJ (tj) wrote :

Progress update:

I've finished a manual line-by-line analysis of every occurrence of:

a) request_firmware() and relations
b) each reference to a firmware file-name either explicitly or via a sprintf() format-string

I've built an sqlite3 database based on the analysis that tracks every occurrence and adds references to:

 * kernel version
 * module
 * source-code file
 * line number
 * legal license

I'm currently completing the translation from the original flat-file to full 3rd normal form.

To Do:

 * complete the 3NF translation and verify its integrity against the flat-file
 * analyse each firmware_request() occurrence to determine if it is in the resume call-path
 * add licence references/confirmations for each firmware

Wishlist:

 * Collect all firmware files
 * store md5sum of each firmware file
 * reference licenses on a per-md5sum basis rather than the current per-firmware-filename

TJ (tj)
tags: removed: regression-potential
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi TJ,

Just wanted to check in and see if there has been any additional progress on this bug. I noticed it's been a while since the last comment. Thanks.

Revision history for this message
Xavier Aragon (xarax-lp) wrote :

After upgrading from Ubuntu 9.04 to 9.10 (kernel 2.6.28 to 2.6.31), I have similar problem with the e100 network driver. It seems that the firmware used to be within the driver in the older kernels, but is now available as a separate binary /lib/firmware/2.6.31-14-generic/e100/d101m_ucode.bin. At boot-up the firmware is loaded correctly, but when resuming from s2ram, loading fails after a 60 second timeout.

The Intel 82557/8/9 chips used to be very popular and I assume a lot of people still have NICs based on it. I'd really like to see the e100 driver fixed to support suspend/resume.

Revision history for this message
Xavier Aragon (xarax-lp) wrote :

Just to add to my comment about e100 driver yesterday, I found discussion about this problem in the 'Intel Wired Ethernet' project in SourceForge (http://sourceforge.net/tracker/?func=detail&aid=2876241&group_id=42302&atid=447449), and also a fix to the problem in kernel.org git repositories (commit 7e15b0c9991dfe0bf05a2f8fab9154bb7827622e on October 28th).

The fix does not use request_firmware_nowait(), but instead it stores a copy of the firmware when it is first loaded from userspace (during boot typically), and then uses this copy when resuming from suspend or hibernate later. This approach seems feasible since the e100 firmware is very small, less than 200 bytes. With bigger firmwares it might not be feasible.

Hoping to see a kernel for Ubuntu Karmic with this fix included.

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Triaged a while ago but has not had any updated comments for quite some time. Please let us know if this issue remains in the current Ubuntu release, http://www.ubuntu.com/getubuntu/download . If the issue remains, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

Changed in linux (Ubuntu):
assignee: TJ (intuitivenipple) → nobody
Revision history for this message
Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Triaged → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.