[upstream] Firefox 85 hangs at startup

Bug #1914147 reported by Sergio Callegari on 2021-02-02
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Mozilla Firefox
Fix Released
Unknown
firefox (Ubuntu)
Critical
Olivier Tilloy

Bug Description

Today I received a security upgrade for Kubuntu 20.04 that includes Firefox 85. Unfortunately, this version of Firefox appears to hang at startup unless it is started with a URL. Namely:

firefox

hangs, while

firefox lwn.net

does not hang.

I have quickly tried working with a clean profile and firefox 85 seems to start fine with that. Yet, the profile is not broken: if I downgrade to firefox 84 and restore my profile from a backup, then firefox 84 works just fine.

So the issue seems to be that firefox 85 has issues in working with profiles that are totally fine with firefox 84. This breaks migration from firefox 84 to firefox 85.

Issue is totally reproducible:

downgrade to firefox 84
restore profile from firefox 84 from backup
work with firefox 84 (OK)
upgrade to firefox 85
try to work with firefox 85 (KO)

Please consider that having to reset a profile is a form of data loss. In a modern browser, the profile includes extensions, passwords, data from web apps and more.

Because this bug makes it impossible to apply a security update (firefox 85 is marked as such and shipped via the security channel), this bug is particularly serious.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: firefox 84.0.2+build1-0ubuntu0.20.04.1
ProcVersionSignature: Ubuntu 5.8.0-41.46~20.04.1-generic 5.8.18
Uname: Linux 5.8.0-41-generic x86_64
AddonCompatCheckDisabled: False
ApportVersion: 2.20.11-0ubuntu27.14
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: callegar 1454 F.... pulseaudio
BuildID: 20210105180113
CasperMD5CheckResult: skip
Channel: Unavailable
CurrentDesktop: KDE
Date: Tue Feb 2 00:52:23 2021
EcryptfsInUse: Yes
Extensions: extensions.sqlite corrupt or missing
ForcedLayersAccel: False
IncompatibleExtensions: Unavailable (corrupt or non-existant compatibility.ini or extensions.sqlite)
InstallationDate: Installed on 2020-02-16 (351 days ago)
InstallationMedia: Kubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017)
IpRoute:
 default via 192.168.10.1 dev wlan0 proto dhcp metric 600
 169.254.0.0/16 dev wlan0 scope link metric 1000
 192.168.10.0/24 dev wlan0 proto kernel scope link src 192.168.10.116 metric 600
Locales: extensions.sqlite corrupt or missing
PrefErrors: Unexpected character ',' before close parenthesis @ /usr/lib/firefox/omni.ja:greprefs.js:354
PrefSources: prefs.js
Profiles: Profile0 (Default) - LastVersion=84.0.2/20210105180113
RunningIncompatibleAddons: False
SourcePackage: firefox
Themes: extensions.sqlite corrupt or missing
UpgradeStatus: Upgraded to focal on 2020-05-23 (254 days ago)
dmi.bios.date: 10/02/2019
dmi.bios.release: 7.4
dmi.bios.vendor: INSYDE Corp.
dmi.bios.version: 1.07.04RTR1
dmi.board.asset.tag: Tag 12345
dmi.board.name: N141CU
dmi.board.vendor: SCHENKER
dmi.board.version: Not Applicable
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: Notebook
dmi.chassis.version: N/A
dmi.ec.firmware.release: 7.2
dmi.modalias: dmi:bvnINSYDECorp.:bvr1.07.04RTR1:bd10/02/2019:br7.4:efr7.2:svnSCHENKER:pnSCHENKER_SLIM14_SSL14L19:pvrNotApplicable:rvnSCHENKER:rnN141CU:rvrNotApplicable:cvnNotebook:ct10:cvrN/A:
dmi.product.family: Not Applicable
dmi.product.name: SCHENKER_SLIM14_SSL14L19
dmi.product.sku: Not Applicable
dmi.product.version: Not Applicable
dmi.sys.vendor: SCHENKER

## Environment

- macOS Big Sur 11.0.1
   - I reproduced this bug on macOS Catalina 10.15.7 too.
- https://hg.mozilla.org/mozilla-central/rev/b0865ea584621ce9e7f68833565e3d8ae117ce32

## Steps to reproduce

1. Launch Firefox.
2. Try to open a new window from the context menu on the icon in the Dock.

## Actual Result

- Firefox is not responsive and shows the rainbow cursor.
- Firefox does not come back and I need to force-quit Firefox.
- I reproduced this bug a few days ago.
- I does not face this bug rarely on a new profile

(In reply to Tetsuharu OHZEKI [:tetsuharu] (UTC+9) from comment #0)
> - Firefox is not responsive and shows the rainbow cursor.
> - Firefox does not come back and I need to force-quit Firefox.
> - I reproduced this bug a few days ago.
> - I does not face this bug rarely on a new profile

Can you clarify whether this means you do not see this with a new profile?

Do you know which Nightly this started happening with?

> Can you clarify whether this means you do not see this with a new profile?

Yes. I face this bug rarely on a new profile.

> Do you know which Nightly this started happening with?

Sorry, I don't remember exactly which Nightly this started with.
I feel that I hit this bug frequently from 11/20 (Fri) to 11/23 (Mon). As far as I can tell, that was around the time the U.S. entered the Thanksgiving holiday.

---------

Additional information:

* With [yesterday's Nightly](https://hg.mozilla.org/mozilla-central/rev/abafe6c923eb566ffb94fd6afe0e06766d0c27a6), I have not hit this bug. But I'm not sure whether it has actually been fixed, because this bug sometimes does not happen. The steps to reproduce seem to depend on some timing issue.
* My main profile enables fission.autostart=true.

Created attachment 9191329
stack information

This is the information I got from the macOS crash-report dialog shown after force-quitting Firefox.

Based on the stack info it looks like maybe Firefox was stuck in a deadlock somewhere in the HTTP code so moving this over there for more investigation. Is this still occurring?

> Is this still occurring?

Yes.
It seems the behavior has changed somewhat. Before, this bug was reproducible every time I launched Firefox.

But now,
- I hit this bug the first time I launch Firefox after the daily update, and it does not always happen.
- I feel this bug happens if I open the context menu on the Dock and open a window.
- But the timing is a bit scattered.
- If this bug does not happen within 10 seconds of launching Firefox, then I never hit it while using Firefox until I close it.

CacheIO Thread holds mRCWNLock and tries to dispatch a SyncRunnable to the main thread:
                              45 mozilla::net::nsHttpChannel::OnCacheEntryCheck(nsICacheEntry*, nsIApplicationCache*, unsigned int*) + 2485 (XUL + 7218629) [0x107e7d5c5] 1-45
                                45 mozilla::net::nsHttpChannel::OpenCacheInputStream(nsICacheEntry*, bool, bool) + 1539 (XUL + 7225683) [0x107e7f153] 1-45
                                  45 mozilla::net::CacheEntry::GetSecurityInfo(nsISupports**) + 197 (XUL + 41189653) [0x109ee3115] 1-45
                                    45 NS_DeserializeObject(nsTSubstring<char> const&, nsISupports**) + 131 (XUL + 6224003) [0x107d8a883] 1-45
                                      45 nsBinaryInputStream::ReadObject(bool, nsISupports**) + 274 (XUL + 5336850) [0x107cb1f12] 1-45
                                        45 nsCOMPtr_base::assign_from_helper(nsCOMPtr_helper const&, nsID const&) + 44 (XUL + 5114444) [0x107c7ba4c] 1-45
                                          45 nsCreateInstanceByCID::operator()(nsID const&, void**) const + 42 (XUL + 5481626) [0x107cd549a] 1-45
                                            45 nsComponentManagerImpl::CreateInstance(nsID const&, nsISupports*, nsID const&, void**) + 183 (XUL + 5470679) [0x107cd29d7] 1-45
                                              45 nsresult mozilla::psm::NSSConstructor<mozilla::psm::TransportSecurityInfo>(nsISupports*, nsID const&, void**) + 56 (XUL + 84060120) [0x10c7c57d8] 1-45
                                                45 EnsureNSSInitializedChromeOrContent() + 583 (XUL + 21727959) [0x108c53ad7] 1-45
                                                  45 mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, bool) + 156 (XUL + 5904060) [0x107d3c6bc] 1-45

The main Thread is waiting on the mRCWNLock.

Kershaw, do I recall correctly that we have this SyncRunnable only because of some test? EnsureNSSInitializedChromeOrContent should always be called on the main thread.

Dana, do you know what has changed recently?
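
The lock-ordering problem described in this comment can be reproduced in miniature. The following is a minimal Python sketch, not Gecko code: `rcwn_lock` stands in for `mRCWNLock`, a `queue.Queue` stands in for the main thread's event queue, and every function name is hypothetical. The timeouts are an assumption added only so the script terminates; the stack above suggests the real code waits without one, which would explain a hang rather than an error.

```python
import queue
import threading


def sync_dispatch_to_main(main_queue, task, timeout):
    """Queue `task` for the main thread and block until it has run.

    Returns True if the task ran, False if we gave up after `timeout` seconds.
    (The timeout is a demo-only assumption so that the script can finish.)
    """
    done = threading.Event()
    main_queue.put((task, done))
    return done.wait(timeout)


def run_deadlock_model():
    rcwn_lock = threading.Lock()   # stand-in for mRCWNLock
    main_queue = queue.Queue()     # stand-in for the main thread's event queue
    lock_held = threading.Event()
    result = {}

    def cache_io_thread():
        # Take the lock, then synchronously dispatch to the main thread
        # while still holding it.
        with rcwn_lock:
            lock_held.set()
            result["dispatch_completed"] = sync_dispatch_to_main(
                main_queue, lambda: None, timeout=1.0
            )

    worker = threading.Thread(target=cache_io_thread)
    worker.start()

    # "Main thread": wait until the worker owns the lock, then ask for the
    # same lock before returning to the event loop that would run the task.
    lock_held.wait()
    result["main_got_lock"] = rcwn_lock.acquire(timeout=0.2)
    if result["main_got_lock"]:
        rcwn_lock.release()

    worker.join()
    return result


if __name__ == "__main__":
    print(run_deadlock_model())
```

In this model neither side makes progress: the main thread cannot get the lock, and the queued task never runs because the main thread never returns to its queue. Without the timeouts both threads would wait forever.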

(In reply to Dragana Damjanovic [:dragana] from comment #6)
> CacheIO Thread holds mRCWNLock and tries to dispatch a SyncRunnable to the main thread:
> 45 mozilla::net::nsHttpChannel::OnCacheEntryCheck(nsICacheEntry*, nsIApplicationCache*, unsigned int*) + 2485 (XUL + 7218629) [0x107e7d5c5] 1-45
> 45 mozilla::net::nsHttpChannel::OpenCacheInputStream(nsICacheEntry*, bool, bool) + 1539 (XUL + 7225683) [0x107e7f153] 1-45
> 45 mozilla::net::CacheEntry::GetSecurityInfo(nsISupports**) + 197 (XUL + 41189653) [0x109ee3115] 1-45
> 45 NS_DeserializeObject(nsTSubstring<char> const&, nsISupports**) + 131 (XUL + 6224003) [0x107d8a883] 1-45
> 45 nsBinaryInputStream::ReadObject(bool, nsISupports**) + 274 (XUL + 5336850) [0x107cb1f12] 1-45
> 45 nsCOMPtr_base::assign_from_helper(nsCOMPtr_helper const&, nsID const&) + 44 (XUL + 5114444) [0x107c7ba4c] 1-45
> 45 nsCreateInstanceByCID::operator()(nsID const&, void**) const + 42 (XUL + 5481626) [0x107cd549a] 1-45
> 45 nsComponentManagerImpl::CreateInstance(nsID const&, nsISupports*, nsID const&, void**) + 183 (XUL + 5470679) [0x107cd29d7] 1-45
> 45 nsresult mozilla::psm::NSSConstructor<mozilla::psm::TransportSecurityInfo>(nsISupports*, nsID const&, void**) + 56 (XUL + 84060120) [0x10c7c57d8] 1-45
> 45 EnsureNSSInitializedChromeOrContent() + 583 (XUL + 21727959) [0x108c53ad7] 1-45
> 45 mozilla::SyncRunnable::DispatchToThread(nsIEventTarget*, bool) + 156 (XUL + 5904060) [0x107d3c6bc] 1-45
>
> The main Thread is waiting on the mRCWNLock.
>
> Kershaw, do I recall correctly that we have this SyncRunnable only because of some test? EnsureNSSInitializedChromeOrContent should always be called on the main thread.
>
No, I think this is a different problem, and this bug could be a regression from bug 1634065.

I'm not sure bug 1634065 would have changed this one way or another. It seems like we have a preexisting issue where if `EnsureNSSInitializedChromeOrContent()` has never been called, it could think it needs to dispatch to the main thread, even if NSS has already been initialized (which causes a problem if the currently running code is holding a lock that the main thread is waiting on). My guess is if we replaced `nsCOMPtr<nsISupports> psm = do_GetService(PSM_COMPONENT_CONTRACTID, &rv);` with `EnsureNSSInitializedChromeOrContent()` at [0], this wouldn't happen.

[0] https://searchfox.org/mozilla-central/rev/6bb59b783b193f06d6744c5ccaac69a992e9ee7b/netwerk/base/nsNetUtil.cpp#2718

*** Bug 1684966 has been marked as a duplicate of this bug. ***

ni myself to take a look.

I agree with Dana. Calling `EnsureNSSInitializedChromeOrContent()` in `net_EnsurePSMInit` seems to be the best way to fix this.
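
One way to picture why an eager call helps, as a toy Python model rather than the actual patch: assume the "ensure" function keeps its own already-done flag, and that a first call from any other thread has to hop to the main thread synchronously. The class, method, and function names below are hypothetical, and the counter only records how often that hazardous hop would be taken.

```python
class NssInitModel:
    """Toy model of a lazily set 'already ensured' flag (hypothetical names)."""

    def __init__(self):
        self._ensured = False       # set only once ensure_initialized() has run
        self.sync_dispatches = 0    # how many times a blocking hop was needed

    def ensure_initialized(self, on_main_thread):
        if self._ensured:
            return                  # fast path: no thread hop at all
        if not on_main_thread:
            # Slow path: the caller would block on the main thread here,
            # which deadlocks if the main thread wants a lock the caller holds.
            self.sync_dispatches += 1
        self._ensured = True


def dispatches_from_cache_thread(eager_init):
    nss = NssInitModel()
    if eager_init:
        # Modeled fix: ensure on the main thread during HTTP handler startup.
        nss.ensure_initialized(on_main_thread=True)
    # Later, a cache I/O thread needs NSS while holding a lock.
    nss.ensure_initialized(on_main_thread=False)
    return nss.sync_dispatches


if __name__ == "__main__":
    print(dispatches_from_cache_thread(eager_init=False))
    print(dispatches_from_cache_thread(eager_init=True))
```

With the eager call the off-main-thread caller always takes the fast path, so the blocking hop, and with it the chance of the lock-order deadlock, disappears in this model.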

Created attachment 9197123
Bug 1679933 - Call EnsureNSSInitializedChromeOrContent() during nsHttpHandler initialization

Pushed by <email address hidden>:
https://hg.mozilla.org/integration/autoland/rev/5f0a8b3326e7
Call EnsureNSSInitializedChromeOrContent() during nsHttpHandler initialization r=necko-reviewers,dragana

(In reply to Kershaw Chang [:kershaw] from comment #14)
> Could you try to use this [build](https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Vz--6YI-TTmIs-l0Tvulvw/runs/0/artifacts/public/build/target.dmg) to verify if this issue is fixed?

Thank you for your effort!
It seems this build fixes the bug. But I cannot confirm with confidence that it was fixed by your patch, because this bug is most reproducible when updating Firefox (in other words, with recent builds it is hard to reproduce at any time other than an update).

I'll try to check it again in the next nightly build which will come in the next morning of UTC+9.

(In reply to Tetsuharu OHZEKI [:tetsuharu] (UTC+9) from comment #16)
> I'll try to check it again in the next nightly build which will come in the next morning of UTC+9.

I think this has been fixed.
Thanks!

I see this on Linux too: Fedora 33 / Firefox 85.

Sergio Callegari (callegar) wrote :

Issue seems to be associated with firefox 85 saying:

###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost

Olivier Tilloy (osomon) wrote :

Thanks for the report Sergio.
Could you please do the following (in a terminal) to help us diagnose the problem?

    $ sudo apt install firefox-dbg
    $ firefox -g # to launch firefox in gdb
    # at the gdb prompt, type "run" (without double quotes) then enter
    # when firefox hangs, get a complete backtrace by issuing:
    (gdb) t a a bt

Save that backtrace to a file and attach it here. Thanks!

Changed in firefox (Ubuntu):
status: New → Incomplete
Changed in firefox:
status: Unknown → Fix Released

@Olivier Tilloy

- Behavior is exactly as in https://bugzilla.mozilla.org/show_bug.cgi?id=1689809. When the bug occurred I could only see the frame.

- I have managed to work around the bug. This definitely seems to be caused by some race. By disabling all the extensions and re-enabling them one by one, I managed to "fix" the problem without having to reset my profile.

Olivier Tilloy (osomon) wrote :

Thanks for following up Sergio. A backtrace would have been useful, but the issue was confirmed by someone else and the backtrace was similar to that in the upstream bug report, so I am pushing an update with the corresponding upstream patch.

Changed in firefox (Ubuntu):
status: Incomplete → In Progress
importance: Undecided → Critical
assignee: nobody → Olivier Tilloy (osomon)
summary: - Firefox hangs after upgrade to version 85
+ [upstream] Firefox 85 hangs at startup

Sergio Callegari (callegar) wrote :

Thank you Olivier for the super quick response. At the moment I am a bit overwhelmed by work. However, in the next few days I'll try to retrieve the profile that was causing the issue from the backup. If a point release comes out, I'll be able to at least confirm that everything works even with that profile.

Olivier Tilloy (osomon) wrote :

That would be super helpful, thanks Sergio!

I was unable to reproduce the issue by using Firefox 85.0a1 (2020-11-30) after and before updating it, under macOS 10.15.7.

It seems that the reporter says the fix is working (see Comment 16 and Comment 17), so I will remove the qa+ flag based on that. If further investigation is needed, please don't hesitate to ni me.

Mary S (marystern) wrote :

I have the same problem: after a security update last night on 20.04 LTS, firefox is now completely broken (displays a window frame with no content inside, ie it just copies the background image that was underneath on startup in the framebuffer?).

firefox does work if I pass it a URL at startup.

$ firefox --full-version
Mozilla Firefox 85.0 20210118153634 20210118153634

cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"

This is a critical bug for my main browser on the Ubuntu LTS channel!! Please fix ASAP!!!

Comment on attachment 9197123
Bug 1679933 - Call EnsureNSSInitializedChromeOrContent() during nsHttpHandler initialization

### Beta/Release Uplift Approval Request
* **User impact if declined**: Firefox could freeze on start up.
See bug 1689032. It seems some users have reported the same issue.
* **Is this code covered by automated tests?**: Yes
* **Has the fix been verified in Nightly?**: Yes
* **Needs manual test from QE?**: No
* **If yes, steps to reproduce**:
* **List of other uplifts needed**: None
* **Risk to taking this patch**: Low
* **Why is the change risky/not risky? (and alternatives if risky)**: This patch is quite straightforward and has already been on beta and nightly for a while.
* **String changes made/needed**: N/A

Do we know when/why this started? Any idea how many users may have been affected, and why?

(In reply to Julien Cristau [:jcristau] from comment #21)
> Do we know when/why this started? Any idea how many users may have been affected, and why?

I think the code that triggers this deadlock was in bug 1325341 (since the lock is `mRCWNLock`), so it has been there for years. I think it's probably some recent changes that change the thread timings and make this deadlock happen more often.
Unfortunately, I can't tell how many users are affected, since it's all about timing to hit this.

(In reply to Kershaw Chang [:kershaw] from comment #22)
> (In reply to Julien Cristau [:jcristau] from comment #21)
> > Do we know when/why this started? Any idea how many users may have been affected, and why?
>
> I think the code that triggers this deadlock was in bug 1325341 (since the lock is `mRCWNLock`), so it has been there for years. I think it's probably some recent changes that affect the thread timings and make this deadlock happen more often.
> Unfortunately, I can't tell how many users are affected, since it's all about timing to hit this.

In reply to "when", I never experienced this problem up to and including 84.0.2. I first got this problem in 85.0, and still get it in 85.0.1.

Comment on attachment 9197123
Bug 1679933 - Call EnsureNSSInitializedChromeOrContent() during nsHttpHandler initialization

fixing a deadlock on startup, approved for 85.0.2

Is this worth taking on ESR78 to be safe?

(In reply to Ryan VanderMeulen [:RyanVM] from comment #27)
> Is this worth taking on ESR78 to be safe?

OK, let's uplift this to ESR78 to be safe.

Comment on attachment 9197123
Bug 1679933 - Call EnsureNSSInitializedChromeOrContent() during nsHttpHandler initialization

### ESR Uplift Approval Request
* **If this is not a sec:{high,crit} bug, please state case for ESR consideration**: Firefox could freeze on start up.
* **User impact if declined**: Firefox could freeze on start up.
* **Fix Landed on Version**: 86
* **Risk to taking this patch**: Low
* **Why is the change risky/not risky? (and alternatives if risky)**: The patch is already verified on 86.
* **String or UUID changes made by this patch**: N/A

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package firefox - 85.0.1+build1-0ubuntu0.20.10.1

---------------
firefox (85.0.1+build1-0ubuntu0.20.10.1) groovy; urgency=medium

  * New upstream release (85.0.1+build1)

 -- Olivier Tilloy <email address hidden> Fri, 05 Feb 2021 12:56:13 +0100

Changed in firefox (Ubuntu):
status: In Progress → Fix Released

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package firefox - 85.0.1+build1-0ubuntu0.20.04.1

---------------
firefox (85.0.1+build1-0ubuntu0.20.04.1) focal; urgency=medium

  * New upstream release (85.0.1+build1)

 -- Olivier Tilloy <email address hidden> Fri, 05 Feb 2021 12:54:22 +0100

Changed in firefox (Ubuntu):
status: In Progress → Fix Released

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package firefox - 85.0.1+build1-0ubuntu0.18.04.1

---------------
firefox (85.0.1+build1-0ubuntu0.18.04.1) bionic; urgency=medium

  * New upstream release (85.0.1+build1)

 -- Olivier Tilloy <email address hidden> Fri, 05 Feb 2021 12:50:03 +0100

Changed in firefox (Ubuntu):
status: In Progress → Fix Released

Tried again to reproduce this issue (following some leads from Bug 1689032 as well) in order to verify this on Firefox 85.0.2 and on ESR, but had no luck.

Tried creating dirty profiles (adding addons, cache from websites, etc), installing and uninstalling the browser, updating it over and over from 84.0.2 to 85.0.1, and we couldn't encounter any freezes. Tests were performed on macOS 11.2, Ubuntu 20.04 and Windows 10.

Mary S (marystern) wrote :

FYI: this fix seems to work for me.

Also, re "when": like the other reporter, this only started for me with the previous update (I think 85.0).

Comment on attachment 9197123
Bug 1679933 - Call EnsureNSSInitializedChromeOrContent() during nsHttpHandler initialization

Low-risk fix for a longstanding issue which can cause startup freezes in some situations. Approved for 78.8esr.

*** Bug 1691118 has been marked as a duplicate of this bug. ***
