glusterd and glusterfs crash after dist-upgrade from focal to jammy

Bug #1991441 reported by Patrick Winnertz
This bug affects 1 person
Affects             Status     Importance  Assigned to  Milestone
glusterfs (Ubuntu)  Confirmed  Wishlist    Unassigned

Bug Description

glusterd crashes upon start with the following error message:

[2022-10-01 09:56:27.339326 +0000] I [MSGID: 100030] [glusterfsd.c:2767:main] 0-glusterd: Started running version [{arg=glusterd}, {version=10.1}, {cmdlinestr=glusterd}]
[2022-10-01 09:56:27.343033 +0000] I [glusterfsd.c:2447:daemonize] 0-glusterfs: Pid of current running process is 4399
pending frames:
patchset: git://git.gluster.org/glusterfs.git
signal received: 4
time of crash:
2022-10-01 09:56:27 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 10.1
---------

Mounting a remote filesystem also crashes with this error:

[2022-10-01 10:18:11.509645 +0000] I [MSGID: 100030] [glusterfsd.c:2767:main] 0-/usr/sbin/glusterfs: Started running version [{arg=/usr/sbin/glusterfs}, {version=10.1}, {cmdlinestr=/usr/sbin/glusterfs --process-name fuse --volfile-server=192.168.68.12 --volfile-id=dockervolume /home/docker}]
[2022-10-01 10:18:11.516342 +0000] I [glusterfsd.c:2447:daemonize] 0-glusterfs: Pid of current running process is 6594
pending frames:
patchset: git://git.gluster.org/glusterfs.git
signal received: 4
time of crash:
2022-10-01 10:18:11 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 10.1
---------

These are the installed versions:
$ dpkg -l | grep gluster
ii glusterfs-cli 10.1-1 armhf clustered file-system (cli package)
ii glusterfs-client 10.1-1 armhf clustered file-system (client package)
ii glusterfs-common 10.1-1 armhf GlusterFS common libraries and translator modules
ii glusterfs-server 10.1-1 armhf clustered file-system (server package)
ii libglusterd0:armhf 10.1-1 armhf GlusterFS glusterd shared library
ii libglusterfs0:armhf 10.1-1 armhf GlusterFS shared library

Please note: the system architecture is armhf, running on an ODROID-XU4.
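Because the problem turns out to be specific to the 32-bit armhf build, it can help to confirm what the installed binaries actually are. The following is a minimal Python sketch (not part of any GlusterFS or Debian tooling) that reads the ELF header of a file; the helper names `describe_elf` and `describe_elf_file` are hypothetical, and only a few common `e_machine` values are mapped. The `file` command reports the same information.

```python
import struct

# ELF magic number at the start of every ELF file.
ELF_MAGIC = b"\x7fELF"
# e_ident[EI_CLASS] (byte 4): 1 = ELFCLASS32, 2 = ELFCLASS64.
EI_CLASS_NAMES = {1: "32-bit", 2: "64-bit"}
# A few e_machine values: EM_386 = 3, EM_ARM = 40, EM_X86_64 = 62, EM_AARCH64 = 183.
MACHINE_NAMES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def describe_elf(header: bytes) -> str:
    """Return e.g. '32-bit ARM' given at least the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = header[4]
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_ident is 16 bytes, e_type follows at offset 16 and e_machine at
    # offset 18; this layout is the same for 32- and 64-bit ELF files.
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    bits = EI_CLASS_NAMES.get(ei_class, f"class {ei_class}")
    machine = MACHINE_NAMES.get(e_machine, f"machine {e_machine}")
    return f"{bits} {machine}"


def describe_elf_file(path: str) -> str:
    """Read the ELF header of the file at `path` (symlinks are followed)."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

For example, `describe_elf_file("/usr/sbin/glusterd")` should report `32-bit ARM` on an armhf installation like this one, and `64-bit AArch64` on an arm64 installation.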

Patrick Winnertz (pwinnertz) wrote :

# glusterd --debug
[2022-10-01 11:31:49.495260 +0000] I [MSGID: 100030] [glusterfsd.c:2767:main] 0-glusterd: Started running version [{arg=glusterd}, {version=10.1}, {cmdlinestr=glusterd --debug}]
[2022-10-01 11:31:49.495390 +0000] I [glusterfsd.c:2447:daemonize] 0-glusterfs: Pid of current running process is 1491
[2022-10-01 11:31:49.495419 +0000] D [logging.c:1705:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2022-10-01 11:31:49.501716 +0000] D [MSGID: 0] [gf-io.c:513:gf_io_run] 0-io: Trying I/O engine 'legacy'
[2022-10-01 11:31:49.501764 +0000] D [MSGID: 0] [gf-io.c:517:gf_io_run] 0-io: I/O engine 'legacy' is ready
[2022-10-01 11:31:49.501962 +0000] D [logging.c:1675:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 3 extra log messages
[2022-10-01 11:31:49.501997 +0000] D [logging.c:1681:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 3 extra log messages
pending frames:
patchset: git://git.gluster.org/glusterfs.git
signal received: 4
time of crash:
2022-10-01 11:31:49 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 10.1
---------
Illegal instruction (core dumped)

That's the output of glusterd --debug.

In the meantime, I've built the package on the same system using debuild; however, it still crashes.

Paride Legovini (paride) wrote :

Thanks Patrick for this bug report, I've been able to reproduce it on a Raspberry Pi 4.

In these cases it is worth checking whether the bug is caused by LTO (link-time optimization) by recompiling the package with LTO disabled, i.e. with

  DEB_BUILD_MAINT_OPTIONS=optimize=-lto

As you recompiled the package already, would you be able to try recompiling it without LTO (i.e. by setting that flag) and report your findings?

If you do so, please check that the build log doesn't show:

  Building with LTO : yes

and that the compiler is not called with the "-flto" flag. For reference, see the build log of the package currently in Jammy [1]: you'll find plenty of "-flto" occurrences.
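The two checks above can be scripted. Below is a minimal Python sketch, assuming the build log is either plain text or gzip-compressed like the Launchpad log in [1]; the function names are hypothetical. The `-flto` pattern counts the flag itself, including forms such as `-flto=auto`, but not related options like `-ffat-lto-objects` or `-fno-lto`.

```python
import gzip
import re

# Line printed by the configure summary when LTO is enabled.
LTO_YES = re.compile(r"Building with LTO\s*:\s*yes")
# "-flto" as a standalone option, optionally with a value ("-flto=auto").
FLTO_FLAG = re.compile(r"(?<!\S)-flto(?:=\S*)?(?!\S)")


def lto_report(lines):
    """Summarise LTO indicators found in an iterable of build-log lines."""
    building_with_lto = False
    flto_flags = 0
    for line in lines:
        if LTO_YES.search(line):
            building_with_lto = True
        flto_flags += len(FLTO_FLAG.findall(line))
    return {"building_with_lto": building_with_lto, "flto_flags": flto_flags}


def lto_report_file(path):
    """Open a (possibly .gz) build log as text and summarise it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as f:
        return lto_report(f)
```

A build with LTO really disabled should yield `{'building_with_lto': False, 'flto_flags': 0}`.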

See https://wiki.ubuntu.com/ToolChain/LTO for more information on LTO and on the ways to disable it.

Thanks!

[1] https://launchpadlibrarian.net/582144161/buildlog_ubuntu-jammy-armhf.glusterfs_10.1-1_BUILDING.txt.gz

Paride Legovini (paride)
Changed in glusterfs (Ubuntu):
status: New → Confirmed
Patrick Winnertz (pwinnertz) wrote :

I still get the same error with LTO disabled.

By the way, DEB_BUILD_MAINT_OPTIONS didn't work as expected, so I added "--disable-lto" to DEB_CONFIGURE_EXTRA_FLAGS instead:

root@odroid:/tmp# cat glusterfs_10.1-1_armhf.build |grep lto
dh_auto_configure -- --disable-linux-io_uring --enable-firewalld --libexecdir=/usr/lib/arm-linux-gnueabihf --without-tcmalloc --disable-lto
        ./configure --build=arm-linux-gnueabihf --prefix=/usr --includedir=\${prefix}/include --mandir=\${prefix}/share/man --infodir=\${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=\${prefix}/lib/arm-linux-gnueabihf --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --disable-linux-io_uring --enable-firewalld --libexecdir=/usr/lib/arm-linux-gnueabihf --without-tcmalloc --disable-lto
checking for dlltool... no
root@odroid:/tmp#

The error/crash message is still the same; see above.

David Muench (davemuench) wrote :

I don't have anything of value to contribute, but I also ran into this exact same issue on an ODROID-HC2 (basically an XU4 in a different form factor) after upgrading from focal to jammy.

# /usr/sbin/glusterd --debug
[2023-06-24 04:18:36.664100 +0000] I [MSGID: 100030] [glusterfsd.c:2767:main] 0-/usr/sbin/glusterd: Started running version [{arg=/usr/sbin/glusterd}, {version=10.1}, {cmdlinestr=/usr/sbin/glusterd --debug}]
[2023-06-24 04:18:36.664220 +0000] I [glusterfsd.c:2447:daemonize] 0-glusterfs: Pid of current running process is 3913
[2023-06-24 04:18:36.664256 +0000] D [logging.c:1705:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2023-06-24 04:18:36.670335 +0000] D [MSGID: 0] [gf-io.c:513:gf_io_run] 0-io: Trying I/O engine 'legacy'
[2023-06-24 04:18:36.670377 +0000] D [MSGID: 0] [gf-io.c:517:gf_io_run] 0-io: I/O engine 'legacy' is ready
[2023-06-24 04:18:36.670603 +0000] D [logging.c:1675:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 3 extra log messages
[2023-06-24 04:18:36.670634 +0000] D [logging.c:1681:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 3 extra log messages
pending frames:
patchset: git://git.gluster.org/glusterfs.git
signal received: 4
time of crash:
2023-06-24 04:18:36 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 10.1
---------
Illegal instruction (core dumped)
#

Out of curiosity, I installed the focal gluster 7.2 packages on jammy (I know, not the right thing to do, but I was interested as a test) and they work fine.

Sergio Durigan Junior (sergiodj) wrote :

Thanks for taking the time to report this bug.

I did some investigation and found a few interesting things:

- The crash happens because of the following instruction:

(gdb) x/i $pc
=> 0xf7f41f86 <gf_io_run+898>: udf #255 ; 0xff

udf is an undefined instruction, so this crash is indeed meant to happen here. Now, why is that?

- Some internet research took me to the following link:

https://github.com/gluster/glusterfs/issues/3911

It seems that upstream has semi-officially abandoned support for 32-bit architectures.

- Looking specifically at the code in question, it requires support for 64-bit atomic operations, which armhf lacks. The illegal instruction is therefore generated on purpose.
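The `udf #255 ; 0xff` line from the gdb session can be decoded by hand. The sketch below uses hypothetical helper names and covers only two UDF encodings: 16-bit Thumb (T1, `0xDE00 | imm8`) and A32 (A1, `0xE7F000F0` with a split 16-bit immediate). It assumes the faulting instruction is Thumb: A32 instructions are always word-aligned, and the faulting PC 0xf7f41f86 is only halfword-aligned. UDF is permanently undefined by the architecture, so executing it always raises SIGILL (signal 4, as in the logs). This is consistent with the instruction having been placed there deliberately.

```python
def decode_thumb_udf(halfword: int):
    """Return imm8 of a 16-bit Thumb UDF (encoding T1), else None.

    T1 encoding: 1101 1110 iiii iiii, i.e. 0xDE00 | imm8.
    """
    if (halfword & 0xFF00) == 0xDE00:
        return halfword & 0x00FF
    return None


def decode_a32_udf(word: int):
    """Return imm16 of an A32 UDF (encoding A1), else None.

    A1 encoding: 1110 0111 1111 iiii iiii iiii 1111 iiii,
    where imm16 = imm12:imm4.
    """
    if (word & 0xFFF000F0) == 0xE7F000F0:
        imm12 = (word >> 8) & 0xFFF
        imm4 = word & 0xF
        return (imm12 << 4) | imm4
    return None


def pc_implies_thumb(pc: int) -> bool:
    """True if the PC alignment proves Thumb state.

    A32 instructions are 4-byte aligned, so a PC that is only 2-byte
    aligned must be executing Thumb code. False means "undetermined".
    """
    return pc % 4 != 0
```

With the values from the gdb output, `pc_implies_thumb(0xF7F41F86)` is `True` and `decode_thumb_udf(0xDEFF)` returns `255`, matching `udf #255`.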

I honestly don't know if there's much we can do here as a distribution. For now, I am marking this bug as Wishlist (because it can be considered a "feature request"), and I'll link the upstream issue I mentioned above. The last comment there is from someone who says they will try to work on this problem, but they can't guarantee that everything will work.

Thanks.

Changed in glusterfs (Ubuntu):
importance: Undecided → Wishlist
David Muench (davemuench) wrote :

Thank you for finding that, I appreciate the detective work.
