Getting SIGSEGV and SIGILL in many programs

Bug #2058191 reported by Eduardo P. Gomez
This bug affects 1 person
Affects         Status  Importance  Assigned to                Milestone
Ubuntu          New     Undecided   Unassigned
linux (Ubuntu)  New     Undecided   Kleber Sacilotto de Souza

Bug Description

Okay, I recently upgraded to 24.04. I'm getting SIGSEGVs and SIGILLs from time to time. Sometimes the entire computer freezes and I can't even shut it down unless I hold the power button for 5 seconds.

I thought it could be the kernel version, so I upgraded from Ubuntu's 6.8.0-11.11+1 to mainline 6.8.1. However, that didn't fix it.

Here are some programs in which I got SIGSEGV or SIGILL:
 - code-insiders (vscode)
 - brave (Brave browser)
 - bun (node.js alternative)
 - node.js

I know I should upload more logs, but I didn't find the errors in syslog or journalctl.

$ lsb_release -rd
-----------------
No LSB modules are available.
Description: Ubuntu Noble Numbat (development branch)
Release: 24.04

Tags: noble
tags: added: noble
tags: removed: 24.04
Revision history for this message
Eduardo P. Gomez (eduapps) wrote :

Found something in dmesg while I was running bun:

[ 1383.592336] traps: bun[7952] trap invalid opcode ip:5fdfeaea2fee sp:7ffeeeeb0fc0 error:0 in bun[5fdfe8296000+2c0f000]

When the computer freezes, dmesg shows this (and some AppArmor denials):

mar 18 03:42:13 eduapps kernel: mce: [Hardware Error]: Machine check events logged
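
The `traps:` line above can be decoded mechanically. The following is a minimal sketch, assuming the bracketed pair after `in bun` is the start address and size of the mapping that contains the faulting instruction pointer; the regular expression and the helper name `decode_trap` are illustrative, not part of any kernel tooling.

```python
import re

# Kernel "traps:" line copied from the dmesg output above.
LINE = ("[ 1383.592336] traps: bun[7952] trap invalid opcode "
        "ip:5fdfeaea2fee sp:7ffeeeeb0fc0 error:0 "
        "in bun[5fdfe8296000+2c0f000]")

# Illustrative pattern written for this one line format (an assumption).
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] trap (?P<trap>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
    r" in (?P<module>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_trap(line):
    """Extract process, trap type and mapping-relative offset of the fault."""
    m = TRAP_RE.search(line)
    if m is None:
        raise ValueError("not a traps: line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "trap": m["trap"],
        "module": m["module"],
        "offset": ip - base,                       # offset inside the mapping
        "inside_mapping": base <= ip < base + size,
    }


info = decode_trap(LINE)
print(f"{info['trap']} in {info['module']} at mapping offset {info['offset']:#x}")
```

For this line the instruction pointer lies inside the mapped bun region, at offset 0x2c0cfee from its start. Turning that into a file offset would also need the mapping's own file offset, which dmesg does not show.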

Eduardo P. Gomez (eduapps) wrote :

Since this could now be something hardware-related, let me log some CPU info:

$ sudo lshw -c cpu
------------------
  *-cpu
       description: CPU
       product: Intel(R) Core(TM) i9-14900K
       vendor: Intel Corp.
       physical id: 4f
       bus info: cpu@0
       version: 6.183.1
       serial: To Be Filled By O.E.M.
       slot: U3E1
       size: 5700MHz
       capacity: 5700MHz
       width: 64 bits
       clock: 100MHz
       capabilities: lm fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp x86-64 constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities cpufreq
       configuration: cores=24 enabledcores=24 microcode=290 threads=32

Changed in linux (Ubuntu):
assignee: nobody → Kleber Sacilotto de Souza (kleber-souza)
Andrea Righi (arighi) wrote :

The message `mce: [Hardware Error]: Machine check events logged` really seems to indicate a potential hardware malfunction.

Can you double check if this is happening only with the latest 6.8? Do you see anything similar in dmesg with other kernels?

Eduardo P. Gomez (eduapps) wrote :

I can try as many kernels as you want. Just give me some time to get back home.

23.10 was working alright, but I don't remember which kernel I was using in this version.

Andrea Righi (arighi) wrote :

Can you also give it a try with the latest upstream 6.8 (available here: https://kernel.ubuntu.com/mainline/v6.8.1/)? This should help to verify whether it's an upstream issue or an issue specific to the Ubuntu kernel.

Thanks!

Eduardo P. Gomez (eduapps) wrote : Re: [Bug 2058191] Re: Getting SIGSEGV and SIGILL in many programs

That's the one I'm currently using. Ubuntu's 6.8.0-11 also has
the same problem.

I will check other versions. I think 6.5 might be working.

On Tue, Mar 19, 2024, 11:15 AM Andrea Righi <email address hidden>
wrote:

> Can you give it a try also with the latest upstream 6.8 (available here
> https://kernel.ubuntu.com/mainline/v6.8.1/). This should help to verify
> if it's an upstream issue or a specific issue with the Ubuntu kernel.
>
> Thanks!

Eduardo P. Gomez (eduapps) wrote :

Finally found something! On the 6.8.1 kernel, this time I got a SIGSEGV from the Brave browser (log attached):

SourcePackage: brave-browser
Stacktrace:
 #0 0x000058e16cb3b49e in ??? ()
 #1 0x000036d80357c469 in ??? ()
 #2 0x00007ffe5ec94230 in ??? ()
 #3 0x000058e17798e4d8 in ??? ()
 #4 0x000010880058c000 in ??? ()
 #5 0x000058e177aa3880 in ??? ()
 #6 0x00007ffe5ec938f8 in ??? ()
 #7 0x000010880058c000 in ??? ()
 #8 0x0000000000000000 in ??? ()
StacktraceAddressSignature: /opt/brave.com/brave/brave:11:/opt/brave.com/brave/brave+ff049e:[stack]+24230:/opt/brave.com/brave/brave+e14d8:[anon..partition_alloc]+20000:/opt/brave.com/brave/brave+1f6880:[stack]+238f8:[anon..partition_alloc]+20000
StacktraceTop:
 ??? ()
 ??? ()
 ??? ()
 ??? ()
 ??? ()
ThreadStacktrace:
 .
 Thread 39 (Thread 0x7d6e830006c0 (LWP 23)):
 #0 0x00007d6f1f498d61 in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7d6e82fff190, op=137, expected=0, futex_word=0x7d6e82fff288) at ./nptl/futex-internal.c:57
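
The `StacktraceAddressSignature` field above is easier to read once split into its parts. This is a minimal sketch, assuming the layout is executable path, signal number, then one `region+hexoffset` entry per frame; that layout is inferred from how the field reads in this report, not taken from apport's documentation, and `parse_signature` is a hypothetical helper name.

```python
from collections import Counter

# Field value copied from the crash report above.
SIGNATURE = ("/opt/brave.com/brave/brave:11:/opt/brave.com/brave/brave+ff049e:"
             "[stack]+24230:/opt/brave.com/brave/brave+e14d8:"
             "[anon..partition_alloc]+20000:/opt/brave.com/brave/brave+1f6880:"
             "[stack]+238f8:[anon..partition_alloc]+20000")


def parse_signature(sig):
    """Split the signature into (executable, signal number, frame list).

    Assumes no ':' inside the paths, which holds for this report.
    """
    executable, signo, *frames = sig.split(":")
    parsed = []
    for frame in frames:
        region, _, offset = frame.rpartition("+")
        parsed.append((region, int(offset, 16)))
    return executable, int(signo), parsed


executable, signo, frames = parse_signature(SIGNATURE)
print(f"{executable} died with signal {signo}")
for region, count in Counter(region for region, _ in frames).items():
    print(f"{count} frame(s) in {region}")
```

Signal 11 is SIGSEGV on Linux. Of the seven entries, only three point into the brave binary; the other four point into the stack or an allocator region, so those "return addresses" are probably data that a symbol-less unwinder mistook for code.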

Eduardo P. Gomez (eduapps) wrote :

In the dmesg log, I saw a line starting with "mar 20 02:04:10 eduapps whoopsie-upload-all[8574]", written by whoopsie, which gave me a Brave crash log.

The Brave crash log is big, but it gives good information about what's going on. If you search for the word "SIGSEGV", you will find where the error occurs.

Eduardo P. Gomez (eduapps) wrote :

Reading my crash dump, I noticed an interesting thing. Here is the backtrace:

Downloading separate debug info for system-supplied DSO at 0x7ffe5ecc5000
Core was generated by `/opt/brave.com/brave/brave --type=renderer --crashpad-handler-pid=5837 --enable'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000058e16cb3b49e in ?? ()
[Current thread is 1 (LWP 1)]
(gdb) bt
#0 0x000058e16cb3b49e in ?? ()
#1 0x00007ffe5ec93780 in ?? ()
#2 0x0000000000000000 in ?? ()

The first frame of the backtrace is located inside the brave binary, as (gdb) info proc mappings shows:
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
               [...]
      0x58e16bb4b000     0x58e1778ad000  0xbd62000  0x2e1c000 /opt/brave.com/brave/brave
               [...]

The second frame doesn't show up in the mapped address list. But, as the earlier output says, it's related to a "system-supplied DSO". I Googled it, and a user named fche on Stack Overflow said "system-supplied-DSO means a shared library provided directly by the linux kernel such as VDSO". Is this right?

If that's true, does this mean we have a big kernel issue?
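
The addresses in the gdb session above can be checked against each other with plain arithmetic. This is a minimal sketch, using only the values gdb printed and assuming the "system-supplied DSO" is the vDSO, as the quoted answer says; the vDSO's size is not shown anywhere, so the code only compares against its base address. `describe` is a hypothetical helper name.

```python
# Values copied from the gdb output above.
BRAVE_START = 0x58e16bb4b000
BRAVE_END = 0x58e1778ad000
BRAVE_FILE_OFFSET = 0x2e1c000
VDSO_BASE = 0x7ffe5ecc5000           # "system-supplied DSO at 0x7ffe5ecc5000"
FRAMES = [0x000058e16cb3b49e, 0x00007ffe5ec93780]


def describe(addr):
    """Say where an address falls relative to the known mappings."""
    if BRAVE_START <= addr < BRAVE_END:
        rel = addr - BRAVE_START
        return (f"brave mapping +{rel:#x} "
                f"(file offset {rel + BRAVE_FILE_OFFSET:#x})")
    if addr < VDSO_BASE:
        return f"below the vDSO base by {VDSO_BASE - addr:#x}; not inside the vDSO"
    # The vDSO size is unknown here, so nothing more can be concluded.
    return f"at or above the vDSO base (+{addr - VDSO_BASE:#x}); vDSO size unknown"


for index, addr in enumerate(FRAMES):
    print(f"frame #{index}: {addr:#x} -> {describe(addr)}")
```

Frame #0 lands at +0xff049e in the brave mapping, which matches the `brave+ff049e` entry in the StacktraceAddressSignature of the earlier crash report. Frame #1 is lower than the DSO's load address, so it cannot lie inside it; given the `[stack]+24230` entry for 0x7ffe5ec94230 in that same report, it looks like a stack address instead. The "system-supplied DSO" message is something gdb prints for ordinary processes, so by itself it does not point to a kernel problem.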

Andrea Righi (arighi) wrote :

Hm... honestly this looks more like a user-space / brave issue than a kernel issue. Do you get similar SIGSEGV with other apps?

Eduardo P. Gomez (eduapps) wrote :

Yes, I have. The most frequent ones are bun (node.js alternative) and brave.

On Wed, Mar 20, 2024, 4:50 AM Andrea Righi <email address hidden>
wrote:

> Hm... honestly this looks more like a user-space / brave issue than a
> kernel issue. Do you get similar SIGSEGV with other apps?

Andrea Righi (arighi) wrote :

Unfortunately those traces don't say much without the debugging symbols. If it also happens with the mainline kernel, we should see similar bugs reported upstream; that's why I'm not very convinced that this is a kernel issue. It's more likely a library issue (or an interaction between the kernel and a particular library), considering that it happens with different applications.

You mention that the last kernel that was working was like a 6.5? There's a huge delta between 6.5 and 6.8. Maybe we could try to restrict this range a bit more...

At https://kernel.ubuntu.com/mainline/ you can find the debs of pretty much all the mainline kernel versions, maybe you could also test some kernels between 6.5 and 6.8 (assuming you can easily reproduce the problem), in order to restrict the range of changes and have a better idea where to look.

Thanks!
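
Narrowing the range between 6.5 and 6.8 amounts to a bisection over kernel versions. This is a minimal sketch, assuming the oldest candidate is known good, the newest is known bad, and the problem persists once introduced; the candidate list and the demo predicate are hypothetical placeholders, not test results from this report.

```python
def first_bad(versions, is_bad):
    """Return the earliest version for which is_bad(version) is true.

    Assumes versions are ordered oldest to newest, versions[0] is good,
    versions[-1] is bad, and every version after the first bad one is bad.
    """
    lo, hi = 0, len(versions) - 1    # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Hypothetical candidate list; check the mainline page for what is published.
CANDIDATES = ["v6.5", "v6.6", "v6.7", "v6.8"]

# Made-up outcome for demonstration only, NOT a result from this bug report.
# In practice is_bad means: boot that kernel, run the workload, watch for crashes.
pretend_bad = {"v6.7", "v6.8"}
print("first bad kernel:", first_bad(CANDIDATES, lambda v: v in pretend_bad))
```

Because the crashes here only happen from time to time, a kernel should be run long enough before it is judged good; a false "good" verdict sends the search in the wrong direction.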
