Wifi does down "crash" in Surface Pro 4
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linux | Fix Released | Medium | |
linux (Ubuntu) | Incomplete | Medium | Unassigned |
Bug Description
I have a Surface Pro 4. The wifi works well in principle, but unfortunately it drops every x minutes. The only way to fix it I've found is to reboot the computer.
lsb_release -rd
Description: Ubuntu 17.10
Release: 17.10
ProblemType: Bug
DistroRelease: Ubuntu 17.10
Package: linux-image-
ProcVersionSign
Uname: Linux 4.13.0-16-generic x86_64
ApportVersion: 2.20.7-0ubuntu3.1
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/
CurrentDesktop: KDE
Date: Wed Nov 8 10:41:26 2017
HibernationDevice: RESUME=
Lsusb:
Bus 002 Device 002: ID 045e:090c Microsoft Corp.
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 045e:07e8 Microsoft Corp.
Bus 001 Device 003: ID 1286:204c Marvell Semiconductor, Inc.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Microsoft Corporation Surface Pro 4
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=
RelatedPackageV
linux-
linux-
linux-firmware 1.169
SourcePackage: linux
UpgradeStatus: Upgraded to artful on 2017-10-22 (16 days ago)
dmi.bios.date: 02/24/2017
dmi.bios.vendor: Microsoft Corporation
dmi.bios.version: 106.1624.768
dmi.board.name: Surface Pro 4
dmi.board.vendor: Microsoft Corporation
dmi.chassis.type: 9
dmi.chassis.vendor: Microsoft Corporation
dmi.modalias: dmi:bvnMicrosof
dmi.product.family: Surface
dmi.product.name: Surface Pro 4
dmi.product.
dmi.sys.vendor: Microsoft Corporation
In Linux Kernel Bug Tracker #109681, wengxt (wengxt-linux-kernel-bugs) wrote : | #17 |
In Linux Kernel Bug Tracker #109681, wengxt (wengxt-linux-kernel-bugs) wrote : | #18 |
Created attachment 197921
a more complete log when problem happens.
Attach a more complete log when problem happens.
In Linux Kernel Bug Tracker #109681, akarwar (akarwar-linux-kernel-bugs) wrote : | #19 |
Thanks for reporting the problem. This is a command timeout problem, mostly caused by a firmware bug.
Which firmware is being used here?
In Linux Kernel Bug Tracker #109681, wengxt (wengxt-linux-kernel-bugs) wrote : | #20 |
It's commit bbe4917 from linux-firmware git repository, not quite sure about the actual version.
In Linux Kernel Bug Tracker #109681, akarwar (akarwar-linux-kernel-bugs) wrote : | #21 |
Could you try the following latest firmware?
http://
In Linux Kernel Bug Tracker #109681, wengxt (wengxt-linux-kernel-bugs) wrote : | #22 |
Created attachment 197961
log with firmware 15.68.7.p53
At first I thought it was fixed, but I just ran into a similar situation. Log attached.
BTW, the connection also seems unstable even while wifi is working. E.g. I open google.com in Firefox with Google's instant search feature enabled. While typing, the results are usually not displayed and I have to refresh the page several times to get the web page displayed. The other laptop I have doesn't show the same problem, so it's not a network issue.
In Linux Kernel Bug Tracker #109681, anton (anton-linux-kernel-bugs) wrote : | #23 |
I have the same problem with mwifiex_pcie (latest git firmware) on Surface Pro 3
In Linux Kernel Bug Tracker #109681, wengxt (wengxt-linux-kernel-bugs) wrote : | #24 |
Created attachment 201031
A more complete log with debug flag set to 0xffffff
4.3.3 with firmware 15.68.7.p53
In Linux Kernel Bug Tracker #109681, mikael (mikael-linux-kernel-bugs) wrote : | #25 |
I have the same problem. In addition, using mwifiex_pcie on Surface Pro 4 seems to cause intermittent system crashes.
Using kernel 4.3.3-5 compiled from linux-source.
In Linux Kernel Bug Tracker #109681, andrea (andrea-linux-kernel-bugs) wrote : | #26 |
I have the same problem on my Microsoft Surface 3 (not Pro model), same issues (even though the rest of system is stable) and same dmesg printout.
Lspci lists the card on my tablet as: Marvell Technology Group Ltd. 88W8897 [AVASTAR] 802.11ac Wireless.
In Linux Kernel Bug Tracker #109681, andrea (andrea-linux-kernel-bugs) wrote : | #27 |
I have also noticed that the issues appear as soon as I start using the wireless card heavily (downloading files, apt update, etc.). As long as I just use ping, everything is fine.
In Linux Kernel Bug Tracker #109681, wengxt (wengxt-linux-kernel-bugs) wrote : | #28 |
It has become much worse in Linux 4.5 compared with 4.4.2.
Is there any update on this issue?
In Linux Kernel Bug Tracker #109681, anton (anton-linux-kernel-bugs) wrote : | #29 |
Created attachment 209311
attachment-
The driver was updated recently:
From git://git.
e92f8b3..a96efa0 master -> origin/master
Updating e92f8b3..a96efa0
Fast-forward
mrvl/pcie8897_
mrvl/sd8897_
In Linux Kernel Bug Tracker #109681, wengxt (wengxt-linux-kernel-bugs) wrote : | #30 |
Actually I'm aware of that.
However, with the latest firmware + linux 4.5, my system simply freezes within a few minutes, so I can't get any useful information for this bug.
With 15.68.7.p53 my kernel can live longer, but with kernel 4.5, wifi also enters the bad state quite fast compared with 4.4.2 or 4.3.3.
In Linux Kernel Bug Tracker #109681, wengxt (wengxt-linux-kernel-bugs) wrote : | #31 |
The only thing I notice so far is that:
With latest firmware + linux 4.3.3, I notice a lot of "mwifiex_pcie 0000:02:00.0: mwifiex_
In Linux Kernel Bug Tracker #109681, akarwar (akarwar-linux-kernel-bugs) wrote : | #32 |
Hi Weng,
Can you provide complete log for the issue mentioned in comment#14?
In Linux Kernel Bug Tracker #109681, anton (anton-linux-kernel-bugs) wrote : | #33 |
Created attachment 209441
attachment-
My system freezes too with 4.5 kernel and latest and previous versions of
firmware.
In Linux Kernel Bug Tracker #109681, wengxt (wengxt-linux-kernel-bugs) wrote : | #34 |
Created attachment 210651
kernel 4.5 log with debug mask 0xfffffff
The complete log is too large (286 MB), so I cut it down to the last minute before "PREP_CMD: FW is in bad state" happens.
mwifiex_pcie 0000:02:00.0: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.7.p66)
In Linux Kernel Bug Tracker #109681, jwhite (jwhite-linux-kernel-bugs) wrote : | #35 |
I'm experiencing a variety of problems with this driver as well.
Using Linux v4.6-rc1-
The first failure is an inability to connect. This is sporadic; it seems to happen about 1 time in 4 boots, but one time I had it three or four times in a row. I will attach a dmesg log with debug increased part way through. It seems as though this may be similar to the problem reported here:
https://<email address hidden>
The second failure is horrific performance, accompanied by constantly repeated mwifiex_
Finally, as I use this device, I find one consistent and persistent oddity. That is, even when everything is 'working', I get poor latency connecting to the device. That is, my latency when I ping 'out' is fine (< 10 ms, not much variation). But when I ping the device from a different system, I get highly variable latency. (Ranging from 10 ms to 4000 ms). No dropped packets, but horrible latency. You really notice it when you're ssh'd in :-/.
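For anyone who wants to put numbers on the inbound latency described above, here is a minimal sketch that summarizes round-trip times from `ping` output. The function name `ping_latency_summary` is made up for illustration, and it assumes reply lines carry a `time=<ms>` field, as the usual Linux `ping` output does.

```shell
# Summarize round-trip times from `ping` output read on stdin.
# Prints one line: "count=<n> min=<ms> max=<ms>" (or "count=0" if no replies).
# Assumption: reply lines contain a field such as "time=12.3".
ping_latency_summary() {
    awk '
        match($0, /time=[0-9.]+/) {
            # Skip the 5 characters of "time=" to get the numeric value
            t = substr($0, RSTART + 5, RLENGTH - 5) + 0
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            n++
        }
        END {
            if (n == 0) { print "count=0"; exit }
            printf "count=%d min=%g max=%g\n", n, min, max
        }
    '
}
```

Run from a second machine, something like `ping -c 100 <device address> | ping_latency_summary` should make a 10 ms to 4000 ms spread easy to spot.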
In Linux Kernel Bug Tracker #109681, jwhite (jwhite-linux-kernel-bugs) wrote : | #36 |
Created attachment 211711
Log of the failure to connect.
In Linux Kernel Bug Tracker #109681, jwhite (jwhite-linux-kernel-bugs) wrote : | #37 |
Created attachment 211721
Log of the cmd size 0 failure mode.
In Linux Kernel Bug Tracker #109681, jwhite (jwhite-linux-kernel-bugs) wrote : | #38 |
I've spent some time looking at the source code and trying to debug what is happening. I'm using the kvalo wireless-
It seems as though in the 'normal' case we get an unusual number of 'max count reached while accessing sleep cookie' messages. The usleep calls in that case make me suspicious, and being slow to wake up would seem to be a good explanation of the performance we see. (That is, sending a packet is fast, but responding is slow). It also seems like we're in a rather startling loop of awake/sleep events; doing multiple spins within a millisecond, as far as I can tell.
In the disconnect case, I see an 'invalid cmd resp' error followed by 'There is no command but got a command response'.
I have now started booting with debug_mask=
For reference, here are related areas of the code:
There is no command:
https:/
(although intriguingly to my naive mind, that code block is related to the sleep check as well).
In Linux Kernel Bug Tracker #109681, jwhite (jwhite-linux-kernel-bugs) wrote : | #39 |
Created attachment 212611
Log of a failure to connect, with debug mask set from boot.
In Linux Kernel Bug Tracker #109681, jwhite (jwhite-linux-kernel-bugs) wrote : | #40 |
Created attachment 212621
Log of a 'working' session, which has poor latency. debug_mask ffffffff from start.
Hicks (hicks1gb) wrote : | #1 |
- version.log (35 bytes, text/plain)
- AlsaInfo.txt (40.0 KiB, text/plain; charset="utf-8")
- CRDA.txt (505 bytes, text/plain; charset="utf-8")
- CurrentDmesg.txt (56.5 KiB, text/plain; charset="utf-8")
- Dependencies.txt (2.7 KiB, text/plain; charset="utf-8")
- IwConfig.txt (477 bytes, text/plain; charset="utf-8")
- JournalErrors.txt (6.3 KiB, text/plain; charset="utf-8")
- Lspci.txt (11.8 KiB, text/plain; charset="utf-8")
- ProcCpuinfo.txt (4.6 KiB, text/plain; charset="utf-8")
- ProcCpuinfoMinimal.txt (1.1 KiB, text/plain; charset="utf-8")
- ProcEnviron.txt (109 bytes, text/plain; charset="utf-8")
- ProcInterrupts.txt (3.3 KiB, text/plain; charset="utf-8")
- ProcModules.txt (6.8 KiB, text/plain; charset="utf-8")
- PulseList.txt (21.0 KiB, text/plain; charset="utf-8")
- RfKill.txt (112 bytes, text/plain; charset="utf-8")
- UdevDb.txt (178.2 KiB, text/plain; charset="utf-8")
- WifiSyslog.txt (93.7 KiB, text/plain; charset="utf-8")
Hicks (hicks1gb) wrote : | #2 |
Tested with "wicd" too.
Wicd crashes when wifi goes down
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed | #3 |
This change was made by a bot.
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
Hicks (hicks1gb) wrote : | #4 |
Joseph Salisbury (jsalisbury) wrote : | #5 |
Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?
Would it be possible for you to test the latest upstream kernel? Refer to https:/
If this bug is fixed in the mainline kernel, please add the following tag 'kernel-
If the mainline kernel does not fix this bug, please add the tag: 'kernel-
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".
Thanks in advance.
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
status: | Confirmed → Incomplete |
Hicks (hicks1gb) wrote : Re: [Bug 1730924] Re: Wifi does down "crash" in Surface Pro 4 | #6 |
On 14/11/17 at 17:54, Joseph Salisbury wrote:
> Did this issue start happening after an update/upgrade? Was there a
> prior kernel version where you were not having this particular problem?
No, always
>
> Would it be possible for you to test the latest upstream kernel? Refer
> to https:/
> v4.14 kernel[0].
Testing...
>
> If this bug is fixed in the mainline kernel, please add the following
> tag 'kernel-
>
> If the mainline kernel does not fix this bug, please add the tag:
> 'kernel-
>
> Once testing of the upstream kernel is complete, please mark this bug as
> "Confirmed".
>
>
> Thanks in advance.
>
> [0] http://
>
>
> ** Changed in: linux (Ubuntu)
> Importance: Undecided => Medium
>
> ** Changed in: linux (Ubuntu)
> Status: Confirmed => Incomplete
>
Hicks (hicks1gb) wrote : | #7 |
At the moment wifi is still working... No crashes.
Hicks (hicks1gb) wrote : | #8 |
More crashes! Problem still here.
Hicks (hicks1gb) wrote : | #9 |
bug-exists-upstream
Hicks (hicks1gb) wrote : | #10 |
Let's see, I just tested this patch, which supposedly solves the wifi controller instability, but it doesn't work for me. I'll leave the link for what it's worth.
https:/
Another detail worth mentioning is that every time the wifi drops, the Surface becomes partly unresponsive. The terminal stops responding and I can't run any commands. Ctrl+C does not work either. Wicd also stops responding.
If I try to restart the terminal, it remains "waiting" for processes to close, which never finish closing.
In the end I have to force a power-off.
Hicks (hicks1gb) wrote : | #11 |
Hicks (hicks1gb) wrote : | #12 |
Kai-Heng Feng (kaihengfeng) wrote : | #13 |
Try the daily mainline kernel, http://
There's a new fix for random MAC scan, worth a try.
Hicks (hicks1gb) wrote : | #14 |
Testing...
Hicks (hicks1gb) wrote : | #15 |
- wifierr.jpg (99.8 KiB, image/jpeg)
So far the wifi has not dropped, but after a reboot the system was quite slow to start and gave me this error.
I'll keep testing to see whether it holds up.
Hicks (hicks1gb) wrote : | #16 |
Still doesn't work. The wifi is down again, taking the rest of the system with it.
Changed in linux: | |
importance: | Unknown → Medium |
status: | Unknown → Confirmed |
tags: | added: ubuntu-certified |
tags: | removed: ubuntu-certified |
tags: | added: cscc |
In Linux Kernel Bug Tracker #109681, kitakar (kitakar-linux-kernel-bugs) wrote : | #121 |
Hi Verdre and Ganapathi,
I own Surface Book 1 (Skylake, same as Surface Pro 4) and Surface 3 (non-pro, Intel Atom Cherry Trail) for daily usage
and recently I got Surface Pro 4 (with broken sensor) for debugging purposes.
env:
- freshly installed Fedora 31 Beta on an external USB drive and use it on SP4
- run `sudo dnf upgrade` and now `uname -r` shows `5.3.2-
@Ganapathi
> I am doing different try outs, I will update as soon I could recreate these;
You mean you're having trouble recreating issues?
On my SP4:
- "cmd_timeout"
This issue will occur after a second suspend like the dmesg log on Comment#91 regardless of 2.4GHz/5GHz AP or power_save on/off. So, I'm not sure why you cannot recreate this issue.
- "Firmware wakeup failed"
This issue will randomly occur after a while when using with 5GHz AP+power_save on
- "WLAN FW already running! Skip FW dnld" after card reset
As you can see on the dmesg log (Comment#91), after card reset, the firmware will not be re-downloaded on Surface devices. I think that re-downloading the firmware is the expected behavior.
You may recreate this issue by this command:
echo 1 | sudo tee /sys/kernel/
After resetting, it shows these messages repeatedly on dmesg and wifi is not usable anymore:
kern :info : [ 175.478500] mwifiex_pcie 0000:02:00.0: cmd_wait_q terminated: -110
kern :info : [ 175.478510] mwifiex_pcie 0000:02:00.0: deleting the crypto keys
Maybe this is the simplest one to fix? and at least if resetting works, we can continue to use wifi.
To reset the card with firmware re-downloading, currently I have to call an ACPI method using acpi_call and remove/rescan the parent bridge of wifi (need to be installed) like this:
WIFI_PARENT=
ACPI_WIFI_
acall(){ echo "$1" | tee /proc/acpi/call >/dev/null && cat /proc/acpi/
modprobe acpi_call
acall $ACPI_WIFI_RST_PATH
echo 1 | tee "/sys/bus/
sleep 1
echo 1 | tee /sys/bus/pci/rescan
---
By the way, I find 2.4GHz AP is much more stable with even power_save on. 5GHz AP with power_save on causes wifi stop without any message on dmesg.
It seems that showing APs (e.g. `nmcli d wifi list`) will temporarily fix the wifi stop.
This is another issue from what we're discussing now, though.
Best regards,
Tsuchiya, Yuto (kitakar5525)
In Linux Kernel Bug Tracker #109681, kitakar (kitakar-linux-kernel-bugs) wrote : | #122 |
> acpi_call and remove/rescan the parent bridge of wifi (need to be installed)
Sorry, I made a mistake
acpi_call (need to be installed) and remove/rescan the parent bridge of wifi
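To compare logs from different machines, it may help to count how often each failure message quoted in this thread shows up in a kernel log. The following is only a sketch: the function name `mwifiex_failure_counts` and the output labels are invented here, and the match strings are copied from the messages quoted in the comments ("cmd_wait_q terminated", "Firmware wakeup failed", "WLAN FW already running! Skip FW dnld", "PREP_CMD: FW is in bad state").

```shell
# Count the mwifiex failure messages quoted in this thread in a kernel log
# read from stdin. Output labels are illustrative, not driver terminology.
mwifiex_failure_counts() {
    awk '
        /cmd_wait_q terminated/   { timeout++ }
        /Firmware wakeup failed/  { wakeup++ }
        /WLAN FW already running/ { skipdl++ }
        /FW is in bad state/      { bad++ }
        END {
            # Unset counters print as 0 with %d
            printf "cmd_wait_q_terminated=%d firmware_wakeup_failed=%d fw_already_running=%d fw_bad_state=%d\n", timeout, wakeup, skipdl, bad
        }
    '
}
```

For example, `sudo dmesg | mwifiex_failure_counts` gives a one-line summary that can be pasted into a report alongside the full log.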
In Linux Kernel Bug Tracker #109681, gbhat (gbhat-linux-kernel-bugs) wrote : | #123 |
Hi Tsuchiya/Verdre,
>- "cmd_timeout"
> This issue will occur after a second suspend like the dmesg log on
> >Comment#91 regardless of 2.4GHz/5GHz AP or power_save on/off
1. Command timeout after second suspend could be recreated in my Surface 4;
2. Logs were similar to the one in #91 & #92 by Verdre;
3. It is found that the command, which timed out did not reach the firmware at all and am trying to understand more;
4. Once a command times out, driver dumps few scratch registers and also it triggers the firmware dump; But I fail to get these in my device(I don't know why, it is a different problem); So, I still fail to confirm we are hitting same issue;
Note that, when the dump is uploaded to user-space, dmesg will say:
"== mwifiex dump information to /sys/class/
Could you please share the dump(along with dmesg), which is saved in below file(it is automatically deleted after 5 minutes):
*******
/sys/devices/
*******
>- "Firmware wakeup failed"
This could also be recreated for the first time now; I kept Surface 4 connected to AP and PING overnight; But I do not have much detail yet;
I will come back soon on the remaining two concerns you raised in comment#95;
Regards,
Ganapathi
In Linux Kernel Bug Tracker #109681, kitakar (kitakar-linux-kernel-bugs) wrote : | #124 |
> But I fail to get these in my device(I don't know why, it is a different
> problem); So, I still fail to confirm we are hitting same issue;
Yeah, the device_dump is also a problematic one on Surface devices.
It's reported that if_ops.
I also confirmed that by
1. adding debugfs entry for calling that function
2. and called the function via debugfs (echo 1 | sudo tee /sys/kernel/
Sometimes I can get the device_dump log, sometimes I can't and it prevents system power off. I will post the patch which adds the debugfs entry to the next comment.
I will post the dump log (along with the dmesg log) as soon as I manage to get the logs after wifi malfunctioning.
By the way, the user manual on drivers/
References:
[1] https:/
Comment #99 : Bug #1730924 “Wifi does down “crash” in Surface Pro 4” : Bugs : linux package : Ubuntu
In Linux Kernel Bug Tracker #109681, kitakar (kitakar-linux-kernel-bugs) wrote : | #125 |
Created attachment 285391
[PATCH] mwifiex: debugfs: add entry for device_dump
This is a patch to add debugfs entry for device_dump.
Just to test device_dump functionality on Surface devices.
Usage:
echo 1 | sudo tee /sys/kernel/
In Linux Kernel Bug Tracker #109681, verdre (verdre-linux-kernel-bugs) wrote : | #126 |
Hi Ganapathi and Tsuchiya,
here's a dmesg and a firmware dump of the command timeout issue. It wasn't hard to get it, the devcoredump entry appeared as soon as "mwifiex dump information to /sys/class/
Regards,
Jonas
In Linux Kernel Bug Tracker #109681, verdre (verdre-linux-kernel-bugs) wrote : | #127 |
Created attachment 285445
dmesg of command timeout issue
In Linux Kernel Bug Tracker #109681, verdre (verdre-linux-kernel-bugs) wrote : | #128 |
Created attachment 285447
firmware dump of command timeout issue
In Linux Kernel Bug Tracker #109681, gbhat (gbhat-linux-kernel-bugs) wrote : | #129 |
>> 1. Command timeout after second suspend could be recreated in my Surface 4;
more specific observation on this(you could cross check, if possible):
1. timeout happens after a resume, if we see below message in dmesg:
"mwifiex_pci 0000:02:00.0: Refused to change power state, currently in D0"
2. if device is connected before the 1st suspend, no issue is seen after 1st resume;
3. if device is connected before 2nd suspend(i.e after 1st suspend_resume), then issue is seen after 2nd resume
4. if device is not at all connected during suspend_resume stress test, no issue is seen
Regards,
Ganapathi
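To cross-check the first observation, one could pull the reported power state out of each "Refused to change power state" line in a saved log. This is a rough sketch; the function name `refused_power_states` is hypothetical, and the pattern is based only on the message text quoted in this thread.

```shell
# Print the D-state named in every "Refused to change power state" line
# read from stdin, one per line (for example D0 or D3).
refused_power_states() {
    sed -n 's/.*Refused to change power state, currently in \(D[0-9a-z]*\).*/\1/p'
}
```

Piping the result through `sort | uniq -c` would show at a glance whether a given machine reports D0, D3, or a mix across resumes.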
In Linux Kernel Bug Tracker #109681, gbhat (gbhat-linux-kernel-bugs) wrote : | #130 |
Hi verdre,
>> went back to a clean kernel and tried disabling L1.2 to fix the command
>> timeout. While that indeed fixed the command timeout issue, the "Firmware
>> wakeup failed" issue wasn't fixed.
could you share your changes, with which command timeout is not seen;
Regards,
Ganapathi
In Linux Kernel Bug Tracker #109681, gbhat (gbhat-linux-kernel-bugs) wrote : | #131 |
Created attachment 285537
0001-mwifiex-
Hi Tsuchiya/Verdre,
Could you please give the attached change a try;
Regards,
Ganapathi
In Linux Kernel Bug Tracker #109681, andy.shevchenko (andy.shevchenko-linux-kernel-bugs) wrote : | #132 |
(In reply to Ganapathi Bhat from comment #105)
> Created attachment 285537 [details]
> 0001-mwifiex-
>
> Hi Tsuchiya/Verdre,
>
> Could please give a try with attached change;
>
> Regards,
> Ganapathi
I'm wondering if it has any effect on any system. PCI core *does* the same (*) for you already. Does it mean we have some race there?
*) See implementation of pci_pm_resume().
In Linux Kernel Bug Tracker #109681, gbhat (gbhat-linux-kernel-bugs) wrote : | #133 |
Hi Andy,
>PCI core *does* the same (*) for you already.
OK; I see, with this change, command timeout is not recreated; without this change, just after second iteration I see the issue(as given in comment#103);
>Does it mean we have some race there?
I need to understand more on this; I was just trying this change because, different open source wireless drivers have this in their resume handler;
Regards,
Ganapathi
In Linux Kernel Bug Tracker #109681, kitakar (kitakar-linux-kernel-bugs) wrote : | #134 |
> 1. timeout happens after a resume, if we see below message in dmesg:
> "mwifiex_pci 0000:02:00.0: Refused to change power state, currently in D0"
For me, it says instead:
`mwifiex_pcie 0000:02:00.0: Refused to change power state, currently in D3`
> 2. if device is connected before the 1st suspend, no issue is seen after 1st
> resume;
> 3. if device is connected before 2nd suspend(i.e after 1st suspend_resume),
> then issue is seen after 2nd resume
Yes. However, the "cmd_timeout" will sometimes also occur after 1st resume for me. Also in this case, the message from mwifiex is the same as No. 1.
> 4. if device is not at all connected during suspend_resume stress test, no
> issue is seen
For me, the "cmd_timeout" will occur regardless of AP connection status/rfkill status/power_save status.
What changed by No. 4 is that, it will not cause the "lockup" (e.g. not preventing system shutdown)
Also, I tried the patch 0001-mwifiex-
- So far, the wifi is still working after several suspend.
- I remember I've done exactly the same thing before and it did not
work before. Hmm…?
- The same message as No. 1 will still be printed on dmesg after
the patch.
- It may still not connect to AP automatically because it fails to find APs
(`nmcli d wifi list` shows empty). This is easily fixed by reloading "mwifiex_pcie" module
In Linux Kernel Bug Tracker #109681, verdre (verdre-linux-kernel-bugs) wrote : | #135 |
Hi Ganapathi,
since you can now reproduce the issues, are there any updates on a proper fix in the firmware or the kernel driver?
Regards,
Jonas
In Linux Kernel Bug Tracker #109681, gbhat (gbhat-linux-kernel-bugs) wrote : | #136 |
Hi Verdre,
I have yet to investigate on my side why the issue does not occur with the change in comment#105;
Please let know your inputs on the change shared in comment#105; We will improve the change considering different observations(
Regards,
Ganapathi
In Linux Kernel Bug Tracker #109681, verdre (verdre-linux-kernel-bugs) wrote : | #137 |
Hi Ganapathi,
so I've been running a 5.4 kernel with your changes from https:/
In Linux Kernel Bug Tracker #109681, gbhat (gbhat-linux-kernel-bugs) wrote : | #138 |
Created attachment 286459
attachment-
As of 6-DEC-2019, NXP has acquired Marvell’s Wireless business unit. You can now reach me at <email address hidden>
In Linux Kernel Bug Tracker #109681, verdre (verdre-linux-kernel-bugs) wrote : | #140 |
Sorry, last comment was sent accidentally...
Hi Ganapathi,
so I've been running a 5.4 kernel with your changes from comment#105 for some time now, and I'm still seeing the "Refused to change power state, currently in D3" messages after resuming from the second suspend.
It seems like the "Firmware wakeup failed" issue is not happening anymore with this change, but the "cmd timeout" issue is definitely still happening, also without suspending at all, so I'm seeing all the same things as Tsuchiya in comment#108.
Thank you,
Jonas
In Linux Kernel Bug Tracker #109681, ganapathi.bhat (ganapathi.bhat-linux-kernel-bugs) wrote : | #141 |
Hi Verdre,
>>but the "cmd timeout" issue is definitely still happening
you could share the dmesg + firmware dump;
also, we find below discussions; could you give a try with the script below:
https:/
Regards,
Ganapathi
In Linux Kernel Bug Tracker #109681, kitakar (kitakar-linux-kernel-bugs) wrote : | #143 |
Thank you for looking into this issue. Reporting current status.
== "mwifiex crash after suspend"
This issue might be the Surface devices specific problem? Not sure.
I know some ways to prevent at least mwifiex crashing:
1. Disabling d3cold for wifi device (even doable on mainline kernels):
echo 0 | sudo tee /sys/devices/
echo 0 | sudo tee /sys/devices/
2. Your patch "0001-mwifiex-
3. Disable bridge_d3 on mwifiex_
but with those approaches, as I mentioned before (Comment 108), it sometimes does not connect to the AP automatically because it fails to find APs (`nmcli d wifi list` shows empty). This is easily fixed by reloading the "mwifiex_pcie" module.
The patch for this issue that we (linux-surface community) are currently using is based on this patch from sebanc:
- https:/
Actual commit is here:
- https:/
The patch from sebanc does the following things:
- modify mwifiex_
It seems that the function mwifiex_
and the function mwifiex_
So, I think that it behaves in the same way as we unload modules before
suspend and re-load modules after suspend.
- disable bridge_d3 to fix crash after suspend
- disable "auto deep sleep" (I think not needed for this issue, but introduced
hoping to fix other issues)
I think the patch is too aggressive to be upstreamed as they are, but please use it as a reference.
The patch also fixes S0ix on KBL/KBL-R devices (such as Surface Book 2, Surface Pro 5, Surface Laptop 1 or later). SKL devices such as Surface Pro 4 or Surface Book 1 cannot achieve S0ix anyway because of other reasons outside of mwifiex.
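The sysfs paths in workaround 1 are cut off in this copy of the thread, so the following is only a sketch of the general idea, not the exact commands from the comment. It assumes the attribute being written is the standard PCI `d3cold_allowed` file, and it takes the device's sysfs directory as an argument; the helper name `disable_d3cold` and the example directory in the comment are illustrative.

```shell
# Disallow D3cold for one PCI device by writing 0 to its d3cold_allowed file.
# $1: the device's sysfs directory. A hypothetical example would be
#     /sys/bus/pci/devices/0000:02:00.0 (assumed, not taken from the thread).
disable_d3cold() {
    dev_dir=$1
    attr="$dev_dir/d3cold_allowed"
    if [ ! -w "$attr" ]; then
        echo "cannot write $attr" >&2
        return 1
    fi
    echo 0 > "$attr"
}
```

It has to run as root, and the setting does not persist across reboots, so it would need to be reapplied at boot (for instance from a udev rule or a systemd unit) if it turns out to help.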
In Linux Kernel Bug Tracker #109681, kitakar (kitakar-linux-kernel-bugs) wrote : | #144 |
== "mwifiex crash on idle/using"
I heard this issue only from Surface 3 and SP5 owners.
It can be fixed by disabling ASPM L1 state. I think this is the hardest issue to fix. So, I hope at least we could fix the mwifiex reset feature (see below).
== "mwifiex reset feature broken"
I think this issue is common on all devices that use the 88W8897 chip.
You can reproduce this issue by the following command:
echo 1 | sudo tee /sys/kernel/
Expected: mwifiex is usable after this reset.
Actual: mwifiex is broken after this reset, with the following dmesg output
$ echo 1 | sudo tee /sys/kernel/
reason code 3
reason code 15
[...]
Also, does anyone know what is the expected behavior on function level reset? Should firmware be re-downloaded? As you can see on the dmesg log, fw is still active after the reset. ("WLAN FW already running! Skip FW dnld")
== "power_save causes connection instability at least on 5GHz APs"
"connection instability" here means not crash, but networking stop, like ping not responding.
I think this issue is c...
In Linux Kernel Bug Tracker #109681, kitakar (kitakar-linux-kernel-bugs) wrote : | #145 |
Also, ff someone knows devices (other than Surface) that use the same 88W8897 chip (or at least similar PCIe chip like 88W8997), let me know. I want to get the device as a reference.
Thank you.
In Linux Kernel Bug Tracker #109681, kitakar (kitakar-linux-kernel-bugs) wrote : | #146 |
*if someone knows devices
In Linux Kernel Bug Tracker #109681, ganapathi.bhat (ganapathi.bhat-linux-kernel-bugs) wrote : | #147 |
Hi Tsuchiya,
>>comment#118:
>>Also, does anyone know what is the expected behavior on function level reset?
note that FLR does require support in the device; 88W8897(PCIE) does not support this;
[we will respond to remaining comments soon]
Regards,
Ganapathi
In Linux Kernel Bug Tracker #109681, verdre (verdre-linux-kernel-bugs) wrote : | #148 |
Hi everyone,
a few months have passed, any news about this bug?
Regards,
Jonas
tags: | added: hwe-networking-wifi |
In Linux Kernel Bug Tracker #109681, verdre (verdre-linux-kernel-bugs) wrote : | #151 |
Hi everyone,
so Tsuchiya and I have been looking into this quite a lot for the last few weeks and we've identified several issues and tried to find the underlying reasons as far as possible.
We hope this information will be helpful for you in identifying issues with the firmware and the driver so you don't have to take a shot in the dark. Because most issues and their reproducers are already extensively explained in this bugreport, we've listed what we suspect to be some of the root causes for those issues here:
1) The LTR (Latency Tolerance Reporting) messages sent by the firmware might be incorrect. With the newest firmware, the card always reports the maximum latency, which prevents the System from entering the lowest C-States (pc10 and slp_s0). I'm not sure whether that actually prevents ASPM L1.2 from working, because the command timeouts I see when ASPM L1.2 is enabled (other devices see them only with (Surface 3) L1 or (Surface Pro 6) L0s) are still happening.
2) Performing a PCI function level reset of the card with firmware redownloading only works when power-cycling the card by switching it to D3cold and then to D0 again, see [1], which implements this. As mentioned in that commit, this is a quirk that's also done by Windows and seems to be specific to only Surface devices.
3) To avoid a crash of the firmware when resuming from suspend, it's important to disable D3 for the PCI bridge, see [2]. This is another quirk that also seems to be applied by Windows.
4) There's also a bluetooth powersaving issue, which makes the bluetooth device never enter the USB suspend state as long as no LE device is connected: That's because (assuming the linux bluetooth stack works correctly) the firmware never stops sending interrupts with LE advertising reports, which can easily be confirmed by using the btmon utility and by disabling Bluetooth LE functionality in /etc/bluetooth/
[1] https:/
[2] https:/
Thank you!
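For anyone who wants to check the power-management state that points 1) to 3) refer to, here is a minimal Python sketch. It only reads standard sysfs files (the global ASPM policy, plus `d3cold_allowed` and `power/control` for one PCI device); it does not implement the quirks from [1] or [2]. The device address 0000:02:00.0 is an assumption taken from the dmesg output in this report and may differ on other machines, and the helper names are made up for illustration.

```python
import re
from pathlib import Path

# Global ASPM policy file; the active policy is shown in square brackets.
ASPM_POLICY = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Assumed PCI address of the wifi card; adjust for your system.
    device = Path("/sys/bus/pci/devices/0000:02:00.0")
    policy = read_sysfs(ASPM_POLICY)
    print("ASPM policy:", active_aspm_policy(policy) if policy else "unavailable")
    print("d3cold_allowed:", read_sysfs(device / "d3cold_allowed"))
    print("runtime PM control:", read_sysfs(device / "power/control"))
```

Files that do not exist (for example on a kernel built without ASPM support) are reported as unavailable or None instead of raising an error.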
In Linux Kernel Bug Tracker #109681, andy.shevchenko (andy.shevchenko-linux-kernel-bugs) wrote : | #152 |
(In reply to verdre from comment #125)
> Hi everyone,
>
> so Tsuchiya and I have been looking into this quite a lot for the last few
> weeks and we've identified several issues and tried to find the underlying
> reasons as far as possible.
I think, especially after the acquisition, Marvell is not capable of fixing this. Perhaps we simply have to move on, i.e. promote your excellent work to upstream. I can help assure the wireless maintainer that this can be accepted even without Marvell's involvement.
Let's simply proceed to fix these very annoying issues.
In Linux Kernel Bug Tracker #109681, pali (pali-linux-kernel-bugs) wrote : | #153 |
Hello!
Tsuchiya & Verdre: Do you have any fixes for the instability issues when the card is in 5GHz AP mode? This is an issue we are observing on 88W8997 SDIO wifi chips.
I described some problems related to power save mode, the nl80211/cfg80211 layer and the mwifiex driver in an email which I sent to the linux-wireless mailing list: https:/
But I have not received any response to it yet, and I think that without fixing those issues at the cfg80211 layer we cannot fix the mwifiex driver for power save mode.
Andy, are you still able to help us with the (generic/cfg80211) wireless part?
In Linux Kernel Bug Tracker #109681, akarwar (akarwar-linux-kernel-bugs) wrote : | #154 |
Created attachment 290043
attachment-
As of 6-DEC-2019, NXP has acquired Marvell’s Wireless business unit. You can now reach me at <email address hidden>
In Linux Kernel Bug Tracker #109681, kitakar (kitakar-linux-kernel-bugs) wrote : | #155 |
(In reply to Pali Rohár from comment #127)
> Hello!
Hello!
> Tsuchiya & Verdre: Do you have some fixes for instability issues when card
> is in 5GHz AP mode? This is issue which we are observing on 88W8997 SDIO
> wifi chips.
Unfortunately, not yet. I'm using a 5GHz AP with just power_save off. Maybe
all we can do about power_save and the firmware crashing is to submit
a journal log and devcoredump. I'll attach them in the next comment.
> I described some problems related to power save mode, nl80211/cfg80211 layer
> and mwifiex driver in email which I sent to linux-wireless mailing list:
> https:/
Thank you for letting us know!
In Linux Kernel Bug Tracker #109681, kitakar (kitakar-linux-kernel-bugs) wrote : | #156 |
Created attachment 290067
mwifiex crash log on SB1 with ps_on, 5GHzAP
- power_save on
- connected to 5GHz AP
- connection instability (ping not responding)
- firmware crashing eventually
The journal log and devcoredump are included in the archive. I kept pinging 192.168.1.1 and redirected the output to kmsg to show the connection stability state.
$ ping 192.168.1.1 | sudo tee /dev/kmsg
[...]
64 bytes from 192.168.1.1: icmp_seq=5798 ttl=255 time=3.81 ms
64 bytes from 192.168.1.1: icmp_seq=5799 ttl=255 time=3.50 ms
64 bytes from 192.168.1.1: icmp_seq=5800 ttl=255 time=7.16 ms
#
# connection stopped suddenly
#
From 192.168.1.59 icmp_seq=5830 Destination Host Unreachable
From 192.168.1.59 icmp_seq=5831 Destination Host Unreachable
From 192.168.1.59 icmp_seq=5832 Destination Host Unreachable
[...]
From 192.168.1.59 icmp_seq=6019 Destination Host Unreachable
From 192.168.1.59 icmp_seq=6020 Destination Host Unreachable
From 192.168.1.59 icmp_seq=6025 Destination Host Unreachable
#
# connection started working again
#
64 bytes from 192.168.1.1: icmp_seq=5801 ttl=255 time=230664 ms
64 bytes from 192.168.1.1: icmp_seq=5802 ttl=255 time=229660 ms
64 bytes from 192.168.1.1: icmp_seq=5803 ttl=255 time=228647 ms
[...]
64 bytes from 192.168.1.1: icmp_seq=6082 ttl=255 time=7.24 ms
64 bytes from 192.168.1.1: icmp_seq=6083 ttl=255 time=80.0 ms
64 bytes from 192.168.1.1: icmp_seq=6084 ttl=255 time=6.73 ms
#
# firmware crashed suddenly
#
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
[...]
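The phases visible in the ping output above (normal replies, host unreachable, heavily delayed replies, network unreachable) can be counted automatically. The following Python sketch is one way to do that, assuming plain ping output lines like the ones shown here. The function name `summarize` and the 1000 ms threshold for a "slow" reply are arbitrary choices for illustration, not part of any tool.

```python
import re

# Matches a normal ping reply and captures the sequence number and the RTT.
REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def summarize(lines, slow_ms=1000.0):
    """Count reply and error categories in an iterable of ping output lines."""
    counts = {
        "reply": 0,
        "slow_reply": 0,
        "host_unreachable": 0,
        "network_unreachable": 0,
    }
    for line in lines:
        match = REPLY.search(line)
        if match:
            rtt = float(match.group(2))
            # Replies delayed past the threshold are counted separately.
            counts["slow_reply" if rtt >= slow_ms else "reply"] += 1
        elif "Destination Host Unreachable" in line:
            counts["host_unreachable"] += 1
        elif "Network is unreachable" in line:
            counts["network_unreachable"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: ping 192.168.1.1 | python3 this_script.py (stop ping with Ctrl-C)
    print(summarize(sys.stdin))
```

Lines that match none of the patterns, such as the `[...]` markers or comment lines, are ignored.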
In Linux Kernel Bug Tracker #109681, kitakar (kitakar-linux-kernel-bugs) wrote : | #157 |
Hi Ganapathi,
We can't do anything about power_save and the firmware crashing. So, I
attached a journal log and devcoredump captured with power_save on and a
5GHz AP, showing 1) connection instability and 2) the firmware eventually
crashing.
I hope it helps in debugging.
Thank you.
In Linux Kernel Bug Tracker #109681, andy.shevchenko (andy.shevchenko-linux-kernel-bugs) wrote : | #158 |
*** Bug 195183 has been marked as a duplicate of this bug. ***
In Linux Kernel Bug Tracker #109681, andy.shevchenko (andy.shevchenko-linux-kernel-bugs) wrote : | #159 |
I have reassigned this to Jonas because it seems he knows the state of affairs with this crappy hardware/firmware much better than the vendor does. Jonas, if you are not comfortable with that, tell me what to do with this bug report (and AFAIU v5.15+ behaves much more stably nowadays, correct?).
Changed in linux: | |
status: | Confirmed → Incomplete |
In Linux Kernel Bug Tracker #109681, verdre (verdre-linux-kernel-bugs) wrote : | #160 |
Thanks Andy, indeed with new kernels all the stability issues should be gone.
What remains unsolved are all the firmware bugs without workarounds (see the list I put together in this email: https:/
Anyway, given that this bug is specifically about stability, I'd say we can finally close it.
Changed in linux: | |
status: | Incomplete → Fix Released |
Wifi looks fine at first after boot, but after hours of use, or even while idle, it goes into an unrecoverable state (unable to send or receive) until the next reboot. dmesg shows many messages like the following:
[ 6822.586704] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6822.586731] mwifiex_pcie 0000:02:00.0: failed to get signal information
[ 6822.586899] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6822.586904] mwifiex_pcie 0000:02:00.0: failed to get signal information
[ 6827.925487] mwifiex_pcie 0000:02:00.0: 4296924600 : Tx timeout(#40), bss_type-num = 0-0
[ 6828.584671] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6828.584687] mwifiex_pcie 0000:02:00.0: failed to get signal information
[ 6828.584755] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6828.584763] mwifiex_pcie 0000:02:00.0: failed to get signal information
[ 6834.587608] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6834.587623] mwifiex_pcie 0000:02:00.0: failed to get signal information
[ 6834.587698] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6834.587706] mwifiex_pcie 0000:02:00.0: failed to get signal information
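To see how often these messages occur and when the firmware first entered the reset state, the log can be summarized with a short script. This is a minimal Python sketch that assumes dmesg lines in the format shown above; the function name `summarize_dmesg` and the category names are made up for illustration.

```python
import re
from collections import Counter

# Matches "[ 6822.586704] mwifiex_pcie 0000:02:00.0: <message>".
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+mwifiex_pcie\s+(\S+):\s+(.*)$")


def summarize_dmesg(lines):
    """Return (counts per message type, timestamp of first 'FW in reset state')."""
    counts = Counter()
    first_reset = None
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # not an mwifiex_pcie line
        timestamp = float(match.group(1))
        message = match.group(3)
        if "FW in reset state" in message:
            counts["fw_in_reset"] += 1
            if first_reset is None:
                first_reset = timestamp
        elif "Tx timeout" in message:
            counts["tx_timeout"] += 1
        elif "failed to get signal information" in message:
            counts["signal_info_failed"] += 1
    return counts, first_reset


if __name__ == "__main__":
    import sys

    # Usage: sudo dmesg | python3 this_script.py
    counts, first_reset = summarize_dmesg(sys.stdin)
    print(dict(counts), "first reset at", first_reset)
```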