Hi Daniel,
Yes, I am sure this is the same issue that they are experiencing there, and I now believe the issue lies in glib, and not mutter.
When we install mutter-common, it calls the libglib2.0-0 hook to recompile the gsettings schemas.
The customer provided me with a tarball of their /usr/share/glib-2.0/schemas directory, and I have spent the day looking at it.
I deleted all the schemas from a test 20.04 VM, and extracted the tarball of their schemas in place, and rebooted the VM.
From there, the same exact problems occurred. Each program could not load the compiled gschema file, and hit a breakpoint in the glib library.
Jul 2 13:41:04 ubuntu tracker-miner-f[1235]: No GSettings schemas are installed on the system
Jul 2 13:41:04 ubuntu tracker-extract[1234]: No GSettings schemas are installed on the system
Jul 2 13:41:04 ubuntu kernel: [ 13.280095] show_signal: 7 callbacks suppressed
Jul 2 13:41:04 ubuntu kernel: [ 13.280097] traps: tracker-miner-f[1235] trap int3 ip:7fb6202ac295 sp:7fff0d5c7cd0 error:0 in libglib-2.0.so.0.6400.6[7fb620270000+84000]
Jul 2 13:41:04 ubuntu kernel: [ 13.281163] traps: tracker-extract[1234] trap int3 ip:7f8718ac3295 sp:7ffe774d1c40 error:0 in libglib-2.0.so.0.6400.6[7f8718a87000+84000]
Jul 2 13:41:00 ubuntu gnome-session[1175]: gnome-session-binary[1175]: GLib-GIO-ERROR: No GSettings schemas are installed on the system
Jul 2 13:41:00 ubuntu gnome-session[1175]: aborting...
Jul 2 13:41:00 ubuntu gnome-session-binary[1175]: GLib-GIO-ERROR: No GSettings schemas are installed on the system#012aborting...
Jul 2 13:41:00 ubuntu gdm3: GdmDisplay: Session never registered, failing
Jul 2 13:41:00 ubuntu gdm3: GdmLocalDisplayFactory: maximum number of X display failures reached: check X server log for errors
Jul 2 13:41:00 ubuntu gdm3: Child process -1157 was already dead.
Looking closer, their gschemas.compiled file does exist. This means we aren't dealing with a missing file that was never re-created, but with a corrupted gschemas.compiled file.
I rebuilt the file with:
$ sudo glib-compile-schemas /usr/share/glib-2.0/schemas/
and rebooted, and the system came up normally. Very interesting.
From there, I rebuilt the file several times, each time checking the sha256 value. Each time it was exactly the same, so the compile process appears to be deterministic.
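For reference, a minimal Python sketch of that reproducibility check. The helper names `sha256_of` and `all_identical` are made up for illustration; the idea is simply to hash the output after each rebuild and compare the digests.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def all_identical(digests):
    """True when every digest is the same, i.e. each rebuild
    produced byte-for-byte identical output."""
    return len(set(digests)) <= 1
```

Calling `sha256_of()` on gschemas.compiled after each run of glib-compile-schemas and passing the collected digests to `all_identical()` would confirm whether the rebuilds match.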
I then did a binary diff of the corrupted gschemas.compiled file against a freshly rebuilt one.
I found two bytes were different:
$ cmp -l ~/schemas/gschemas.compiled /usr/share/glib-2.0/schemas/gschemas.compiled | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'
0000376F E3 25
00003771 A4 65
$ xxd ~/schemas/gschemas.compiled > ~/corrupt.bin
$ xxd /usr/share/glib-2.0/schemas/gschemas.compiled > ~/working.bin
$ diff ~/corrupt.bin ~/working.bin
887,888c887,888
< 00003760: 0515 0000 ffff ffff 7837 0000 0000 e300 ........x7......
< 00003770: a455 0000 0000 0000 6f72 672e 676e 6f6d .U......org.gnom
---
> 00003760: 0515 0000 ffff ffff 7837 0000 0000 2500 ........x7....%.
> 00003770: 6555 0000 0000 0000 6f72 672e 676e 6f6d eU......org.gnom
I need to determine exactly how these two bytes ended up different.
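As a small step in that direction, here is a hedged Python sketch that repeats the comparison and also shows which bits differ in each byte. The function name `diff_bytes` is hypothetical, and the two 32-byte samples are copied from the xxd rows above rather than read from the real files. Note that offsets here are 0-based, whereas `cmp -l` counts from 1.

```python
def diff_bytes(a, b, base=0):
    """Return (offset, byte_a, byte_b, xor) for each position where the
    equal-length byte strings `a` and `b` differ. `base` is added to the
    offsets so a slice of a file can be reported with file offsets."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    return [(base + i, x, y, x ^ y)
            for i, (x, y) in enumerate(zip(a, b)) if x != y]


# The two rows from the xxd diff, starting at file offset 0x3760.
corrupt = bytes.fromhex("05150000ffffffff783700000000e300"
                        "a4550000000000006f72672e676e6f6d")
working = bytes.fromhex("05150000ffffffff7837000000002500"
                        "6555000000000000" "6f72672e676e6f6d")

for offset, bad, good, xor in diff_bytes(corrupt, working, base=0x3760):
    print(f"{offset:08X} {bad:02X} {good:02X} xor={xor:08b}")
# prints:
# 0000376E E3 25 xor=11000110
# 00003770 A4 65 xor=11000001
```

In both bytes more than one bit differs (four bits in the first, three in the second), so neither looks like a simple single-bit flip.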
I think we are chasing two bugs here:
1) A bug which generates a corrupted gschemas.compiled file.
2) A bug where we cannot parse a corrupted gschemas.compiled file gracefully.
Since my VM was generating a lot of coredumps for each process, I took a look. I downloaded the debug symbols of glib2.0 for 20.04 and opened a crashdump in gdb.
(gdb) bt
#0 _g_log_abort (breakpoint=1) at ../../../glib/gmessages.c:554
#1 0x00007f635e381579 in g_logv (log_domain=0x7f635e6006ff "GLib-GIO", log_level=G_LOG_LEVEL_ERROR, format=<optimized out>, args=args@entry=0x7ffe83d1e730) at ../../../glib/gmessages.c:1373
#2 0x00007f635e381743 in g_log (log_domain=log_domain@entry=0x7f635e6006ff "GLib-GIO", log_level=log_level@entry=G_LOG_LEVEL_ERROR,
    format=format@entry=0x7f635e6217b8 "No GSettings schemas are installed on the system") at ../../../glib/gmessages.c:1415
#3 0x00007f635e5ad1fa in g_settings_set_property (object=<optimized out>, prop_id=2, value=<optimized out>, pspec=<optimized out>) at ../../../gio/gsettings.c:591
#4 0x00007f635e46b681 in object_set_property (nqueue=0x55a285fd8e20, value=0x7ffe83d1e910, pspec=0x55a285fd4570, object=0x55a285fe3570) at ../../../gobject/gobject.c:1565
#5 g_object_new_internal (class=class@entry=0x55a285fee870, params=params@entry=0x7ffe83d1e9b0, n_params=n_params@entry=1) at ../../../gobject/gobject.c:1971
#6 0x00007f635e46d378 in g_object_new_valist (object_type=<optimized out>, first_property_name=<optimized out>, var_args=var_args@entry=0x7ffe83d1eb00) at ../../../gobject/gobject.c:2262
#7 0x00007f635e46d6cd in g_object_new (object_type=<optimized out>, first_property_name=<optimized out>) at ../../../gobject/gobject.c:1780
#8 0x000055a285196a5c in ?? ()
#9 0x000055a28517cfe6 in ?? ()
#10 0x00007f635e0180b3 in __libc_start_main (main=0x55a28517c8d0, argc=4, argv=0x7ffe83d1edc8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe83d1edb8)
    at ../csu/libc-start.c:308
#11 0x000055a28517d21e in ?? ()
Okay, so gnome-session, gdm, nautilus and all the other programs crash for the exact same reason: glib2.0 tries to parse the binary gschemas.compiled file, fails for some reason, and g_settings_schema_source_get_default() returns NULL to its caller, g_settings_set_property():
544 static void
545 g_settings_set_property (GObject      *object,
546                          guint         prop_id,
547                          const GValue *value,
548                          GParamSpec   *pspec)
549 {
...
588     default_source = g_settings_schema_source_get_default ();
589
590     if (default_source == NULL)
591       g_error ("No GSettings schemas are installed on the system");
...
g_error() then logs the message and eventually hits a breakpoint in _g_log_abort(), called from g_logv(). The kernel finds that there is no debugger waiting for this breakpoint, so it collects a coredump and terminates the process.
I followed the logic in g_settings_schema_source_get_default(). It allocates a buffer for the binary file, reads the file in, and then attempts to build a table by parsing it. Interestingly, it explicitly marks the input as "trusted", and the documentation comment even warns that problems can occur if we parse a trusted binary file that happens to be corrupted:
/**
 * g_settings_schema_source_new_from_directory:
...
 * If @trusted is %TRUE then `gschemas.compiled` is trusted not to be
 * corrupted. This assumption has a performance advantage, but can result
 * in crashes or inconsistent behaviour in the case of a corrupted file.
 * Generally, you should set @trusted to %TRUE for files installed by the
 * system and to %FALSE for files in the home directory.
 *
 * In either case, an empty file or some types of corruption in the file will
 * result in %G_FILE_ERROR_INVAL being returned.
...
I did some quick tests. If I changed each of the two differing bytes individually, things worked without issue. So both of these bytes need to be changed to cause the problem.
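A test like this could be scripted along the following lines. This is only a sketch: `patched_variants` is a hypothetical helper, it assumes the good and corrupted files are the same length, and it only builds the candidate file contents. Writing each variant out and checking whether a session still starts is left to the tester.

```python
from itertools import combinations


def patched_variants(good, bad):
    """Map each non-empty subset of differing offsets to a copy of `good`
    with the `bad` byte values applied at exactly those offsets."""
    if len(good) != len(bad):
        raise ValueError("inputs must be the same length")
    diffs = [i for i, (g, b) in enumerate(zip(good, bad)) if g != b]
    variants = {}
    for n in range(1, len(diffs) + 1):
        for subset in combinations(diffs, n):
            buf = bytearray(good)
            for offset in subset:
                buf[offset] = bad[offset]
            variants[subset] = bytes(buf)
    return variants
```

With two differing bytes this yields three variants: each byte changed on its own, and both changed together (which is identical to the corrupted file).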
At the moment, I am reading up about the compiled gschema binary format, and how the glib library parses the binary file, and why we error out on corruption.
I tried the same corrupted gschemas.compiled file on a fresh Impish install, and the latest glib version there crashes as well.