prom-env-test test aborted and core dumped

Bug #1713434 reported by R.Nageswara Sastry
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Thomas Huth

Bug Description

On ppc64le architecture machine the following test case Aborted and Core dumped.

# tests/prom-env-test --quiet --keep-going -m=quick --GTestLogFD=6
**
ERROR:tests/libqtest.c:628:qtest_get_arch: assertion failed: (qemu != NULL)
Aborted (core dumped)

Steps to re-produce:
clone from the git
configure & compile
run the unit tests by 'make check'

(gdb) bt
#0 0x00003fff9d60eff0 in raise () from /lib64/libc.so.6
#1 0x00003fff9d61136c in abort () from /lib64/libc.so.6
#2 0x00003fff9de1aa04 in g_assertion_message () from /lib64/libglib-2.0.so.0
#3 0x00003fff9de1ab0c in g_assertion_message_expr () from /lib64/libglib-2.0.so.0
#4 0x000000001000cc30 in qtest_get_arch () at tests/libqtest.c:628
#5 0x00000000100048f0 in main (argc=5, argv=0x3ffff2145538) at tests/prom-env-test.c:82
(gdb) i r
r0 0xfa 250
r1 0x3ffff2144d30 70368510627120
r2 0x3fff9d7b9900 70367091333376
r3 0x0 0
r4 0x12a7 4775
r5 0x6 6
r6 0x8 8
r7 0x1 1
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x0 0
r13 0x3fff9dfa1950 70367099623760
r14 0x0 0
r15 0x0 0
r16 0x0 0
r17 0x0 0
r18 0x0 0
r19 0x0 0
r20 0x0 0
r21 0x0 0
r22 0x0 0
r23 0x0 0
r24 0x0 0
r25 0x0 0
r26 0x0 0
r27 0x100287f8 268601336
r28 0x16841b40 377756480
r29 0x4c 76
r30 0x3ffff2144de8 70368510627304
r31 0x6 6
pc 0x3fff9d60eff0 0x3fff9d60eff0 <raise+96>
msr 0x900000000280f033 10376293541503627315
cr 0x42000842 1107298370
lr 0x3fff9d61136c 0x3fff9d61136c <abort+396>
ctr 0x0 0
xer 0x0 0
orig_r3 0x12a7 4775
trap 0xc00 3072

Revision history for this message
Thomas Huth (th-huth) wrote :

When running tests directly, you've got to specify the QEMU binary like this:

QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 tests/prom-env-test --quiet --keep-going -m=quick

But the abort() is indeed ugly here and I think we should print out a user-friendly error message instead.

Changed in qemu:
assignee: nobody → Thomas Huth (th-huth)
status: New → In Progress
Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

The actual failure was the following

  LINK tests/test-hmp
  GTESTER check-qtest-ppc64
**
ERROR:tests/prom-env-test.c:42:check_guest_memory: assertion failed (signature == MAGIC): (0x7c7f1b78 == 0xcafec0de)
GTester: last random seed: R02Sfb567618f7c703a032934c0c11e263c6
make: *** [check-qtest-ppc64] Error 1

But I was going through found the above was aborting. so reported here.

Revision history for this message
Thomas Huth (th-huth) wrote :

The "ERROR:tests/prom-env-test.c:42:check_guest_memory" error is a timeout error... is it reproducible? Was the host you're testing on very loaded at that point in time?

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

Host was not loaded at that time. And can be re-producable all the times

  GTESTER check-qtest-ppc64
**
ERROR:tests/prom-env-test.c:42:check_guest_memory: assertion failed (signature == MAGIC): (0x7c7f1b78 == 0xcafec0de)
GTester: last random seed: R02S5625099e4ad7700238a4e83dbd6576e0

this is with new seed.

Revision history for this message
Thomas Huth (th-huth) wrote :

That works for me - no problems with tests/prom-env-test on a POWER8 little endian system here. What host system are you using? Could you also check what happens if you run QEMU directly like this, and post the console output here:

ppc64-softmmu/qemu-system-ppc64 -nographic -M pseries,accel=tcg -nodefaults -serial stdio -prom-env 'use-nvramrc?=true' -prom-env 'nvramrc=." Hello World!" cr power-off'

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

I am using a Power9 machine.

# ppc64-softmmu/qemu-system-ppc64 -nographic -M pseries,accel=tcg -nodefaults -serial stdio -prom-env 'use-nvramrc?=true' -prom-env 'nvramrc=." Hello World!" cr power-off'

SLOF **********************************************************************
QEMU Starting
 Build Date = Jul 24 2017 15:15:46
 FW Version = git-89f519f09bf85091
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
(!) Executing code specified in nvramrc
 SLOF Setup = Hello World!

Revision history for this message
Thomas Huth (th-huth) wrote :

Hmm, that looks like -prom-env is working correctly with the pseries machine. I wonder whether the problem is somewhere else... Could you please run "make check-qtest-ppc64 V=1" to see whether it rather fails for the mac99 or g3beige machines?

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

TEST: tests/prom-env-test... (pid=9915)
  /ppc64/prom-env/mac99: OK
  /ppc64/prom-env/g3beige: OK
  /ppc64/prom-env/pseries: **
ERROR:tests/prom-env-test.c:42:check_guest_memory: assertion failed (signature == MAGIC): (0x7c7f1b78 == 0xcafec0de)
FAIL
GTester: last random seed: R02S765f0e192be9c96e793728ecc28d6395
(pid=10152)
FAIL: tests/prom-env-test

Revision history for this message
Thomas Huth (th-huth) wrote :

Weird. I managed to run the test on a POWER9 box today, too, and it works for me:

TEST: tests/prom-env-test... (pid=18912)
  /ppc64/prom-env/mac99: OK
  /ppc64/prom-env/g3beige: OK
  /ppc64/prom-env/pseries: OK
PASS: tests/prom-env-test

Which OS and C compiler are you using?

Also, could you please try to add this patch (to add "-serial /dev/stderr") and then post the console output of the prom-env-test:

diff --git a/tests/prom-env-test.c b/tests/prom-env-test.c
--- a/tests/prom-env-test.c
+++ b/tests/prom-env-test.c
@@ -51,7 +51,7 @@ static void test_machine(const void *machine)
     extra_args = strcmp(machine, "pseries") == 0 ? "-nodefaults" : "";

     args = g_strdup_printf("-M %s,accel=tcg %s -prom-env 'use-nvramrc?=true' "
- "-prom-env 'nvramrc=%x %x l!' ",
+ "-prom-env 'nvramrc=%x %x l!' -serial /dev/stderr",
                            (const char *)machine, extra_args, MAGIC, ADDRESS);

     qtest_start(args);

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

Strange,

# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

OS is RHEL based.
# uname -a
Linux host 4.11.0-26.el7a.ppc64le #1 SMP Wed Aug 23 17:27:28 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux

TEST: tests/prom-env-test... (pid=6726)
  /ppc64/prom-env/mac99:
>> =============================================================
>> OpenBIOS 1.1 [Jul 13 2017 18:28]
>> Configuration device id QEMU version 1 machine id 3
>> CPUs: 1
>> Memory: 128M
>> UUID: 00000000-0000-0000-0000-000000000000
>> CPU type PowerPC,970FX
milliseconds isn't unique.
OK
  /ppc64/prom-env/g3beige:
>> =============================================================
>> OpenBIOS 1.1 [Jul 13 2017 18:28]
>> Configuration device id QEMU version 1 machine id 2
>> CPUs: 1
>> Memory: 128M
>> UUID: 00000000-0000-0000-0000-000000000000
>> CPU type PowerPC,750
milliseconds isn't unique.
OK
  /ppc64/prom-env/pseries:

SLOF **********************************************************************
QEMU Starting
 Build Date = Jul 24 2017 15:15:46
 FW Version = git-89f519f09bf85091
 Press "s" to enter Open Firmware.

**500
ERROR:tests/prom-env-test.c:42:check_guest_memory: assertion failed (signature == MAGIC): (0x7c7f1b78 == 0xcafec0de)
FAIL
GTester: last random seed: R02Sf6897626ac2ebaf5edd7aa24cc1999df
(pid=6951)
FAIL: tests/prom-env-test

Revision history for this message
Thomas Huth (th-huth) wrote :

Very weird, looks like SLOF crashed at a very early stage here. I've got no further clue how to debug this ... could you maybe try it on another POWER9 host if possible? Or check whether slof.bin accidentially got corrupted (md5sum pc-bios/slof.bin should give you db83598b28052e9c12972d86c37b0c69)? Or maybe you could also ask Nikunj to have a look at this?

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

# md5sum ./pc-bios/slof.bin
db83598b28052e9c12972d86c37b0c69 ./pc-bios/slof.bin

Same as what you mentioned.

Will try to get a different machine and try. If the problem still persists, I will check with Nikunj.

Thanks a lot for your time. I have learned many things while interacting with you.

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

On other machine with same OS and gcc level, it's working fine. Not getting what went wrong in the machine where I can re-produce this issue.
I guess this bug can be closed. Thank you.

  /ppc64/prom-env/pseries:

SLOF **********************************************************************
QEMU Starting
 Build Date = Jul 24 2017 15:15:46
 FW Version = git-89f519f09bf85091
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
(!) Executing code specified in nvramrc
 SLOF Setup = OK
PASS: tests/prom-env-test

Revision history for this message
Thomas Huth (th-huth) wrote :

OK, thanks for testing! Anyway, I'll keep the bug opened to track the original issue that you've mentioned in the description ("ERROR:tests/libqtest.c:628:qtest_get_arch: assertion failed") here.

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :
Download full text (7.2 KiB)

Similar failure seen with the following test too.

# make check-qtest-sparc64 V=1
(cd /home/nasastry/qemu; printf '#define QEMU_PKGVERSION '; if test -n ""; then printf '""\n'; else if test -d .git; then printf '" ('; git describe --match 'v*' 2>/dev/null | tr -d '\n'; if ! git diff-index --quiet HEAD &>/dev/null; then printf -- '-dirty'; fi; printf ')"\n'; else printf '""\n'; fi; fi) > qemu-version.h.tmp
if ! cmp -s qemu-version.h qemu-version.h.tmp; then mv qemu-version.h.tmp qemu-version.h; else rm qemu-version.h.tmp; fi
make -C /home/nasastry/qemu/capstone CAPSTONE_SHARED=no BUILDDIR="/home/nasastry/qemu/capstone" CC="cc" AR="ar" LD="ld" CFLAGS="-fprofile-arcs -ftest-coverage -g -g -I/usr/include/pixman-1 -DHAS_LIBSSH2_SFTP_FSYNC -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -DNCURSES_WIDECHAR -fPIE -DPIE -m64 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fno-common -fwrapv -fstack-protector-strong -I/usr/include/p11-kit-1 -I/usr/include/libpng15 -I/home/nasastry/qemu/capstone/include -I/home/nasastry/qemu/tests -DCAPSTONE_USE_SYS_DYN_MEM -DCAPSTONE_HAS_ARM -DCAPSTONE_HAS_ARM64 -DCAPSTONE_HAS_POWERPC -DCAPSTONE_HAS_X86" BUILD_DIR=/home/nasastry/qemu /home/nasastry/qemu/capstone/libcapstone.a
make[1]: Entering directory `/home/nasastry/qemu/capstone'
make[1]: `/home/nasastry/qemu/capstone/libcapstone.a' is up to date.
make[1]: Leaving directory `/home/nasastry/qemu/capstone'
make BUILD_DIR=/home/nasastry/qemu -C sparc64-softmmu V="1" TARGET_DIR="sparc64-softmmu/" all
make[1]: Entering directory `/home/nasastry/qemu/sparc64-softmmu'
make[1]: Leaving directory `/home/nasastry/qemu/sparc64-softmmu'
QTEST_QEMU_BINARY=sparc64-softmmu/qemu-system-sparc64 QTEST_QEMU_IMG=qemu-img MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} gtester -k --verbose -m=quick tests/endianness-test tests/prom-env-test tests/qmp-test tests/device-introspect-test tests/qom-test tests/test-hmp
TEST: tests/endianness-test... (pid=72117)
  /sparc64/endianness/sun4u: OK
  /sparc64/endianness/split/sun4u: OK
  /sparc64/endianness/combine/sun4u: OK
PASS: tests/endianness-test
TEST: tests/prom-env-test... (pid=72128)
  /sparc64/prom-env/sun4u: **
ERROR:tests/prom-env-test.c:42:check_guest_memory: assertion failed (signature == MAGIC): (0x00000000 == 0xcafec0de)
FAIL
GTester: last random seed: R02S54a8565ad2895d49d8d1b0dc99da7044
(pid=74068)
FAIL: tests/prom-env-test
TEST: tests/qmp-test... (pid=74069)
  /sparc64/qmp/protocol: OK
  /sparc64/qmp/qom-list-types: OK
  /sparc64/qmp/query-acpi-ospm-status: OK
  /sparc64/qmp/query-balloon: OK
  /sparc64/qmp/query-block: OK
  /sparc64/qmp/query-block-jobs: OK
  /sparc64/qmp/query-blockstats: ...

Read more...

Revision history for this message
Thomas Huth (th-huth) wrote :

Is that 100% reproducible? Which version of QEMU did you use here? And which host are you using, POWER9 again? The very latest git master branch is using a timeout of 10 minutes now, so that should be sufficient for all cases...

Could you please try to run this manually:

qemu-system-sparc64 -nographic -M sun4u -prom-env 'use-nvramrc?=true' -prom-env 'nvramrc=." Hello World!" cr'

... and then paste the output here?

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

Git head is at 299d1ea9bb56bd9f45f905125489bdd7d543a1aa
latest today

100% re-producible. This is different & working Power9 machine than the other day.

# ./sparc64-softmmu/qemu-system-sparc64 -nographic -M sun4u -prom-env 'use-nvramrc?=true' -prom-env 'nvramrc=." Hello World!" cr'
OpenBIOS for Sparc64
Unhandled Exception 0x0000000000000034
PC = 0x00000000ffd0f704 NPC = 0x00000000ffd0f708
Stopping execution

and hung here. Not responding to ctrl+c or ctrl+z.

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

Here is the md5sum of openbios-sparc64

# md5sum ./pc-bios/openbios-sparc64
15418a4c9429d9ee9c637701b94c7ffb ./pc-bios/openbios-sparc64

> Could you please check with the QEMU 2.10 release to see whether this is a regression or whether it was already failing there?
Sure, I will update here. Most probably tomorrow.

> I don't have access to POWER9 anymore, so I can't check this right now
No issues, I can check.

> To quit QEMU, you've got to press "CTRL-a c" and then type "quit" at the monitor prompt.
Thank you. I am learning new things along with raising bugs :-)

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

This test case was working till 2.10.0 and got broken in 2.10.1

I checked with 2.9.1, 2.10.0-rc2, 2.10.0-rc3, 2.10.0-rc4, 2.10.0

Working scenario:
# ./sparc64-softmmu/qemu-system-sparc64 -nographic -M sun4u -prom-env 'use-nvramrc?=true' -prom-env 'nvramrc=." Hello World!" cr'
OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
kernel cmdline
CPUs: 1 x SUNW,UltraSPARC-IIi
UUID: 00000000-0000-0000-0000-000000000000
Hello World!
Welcome to OpenBIOS v1.1 built on Jul 13 2017 18:27
  Type 'help' for detailed information
Trying disk:a...
No valid state has been set by load or init-program

0 > QEMU 2.10.0 monitor - type 'help' for more information
(qemu) quit

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

The above problem is getting re-producible only with configure option "--enable-crypto-afalg" This got introduced in between 2.9.1 and 2.10.0. I will bisect it and update.

When I tried earlier with 2.9.1 it complained saying "--enable-crypto-afalg" option is not available so I did with out it and continued that's the reason why it worked with 2.10.0 and till rc4.
Tested 2.10.1 with configure option "--enable-crypto-afalg" so it failed.

So this information made me to say it broke in 2.10.1.

When I disable this option "crypto-afalg" in 2.10.1 it's working fine and same with latest qemu (git head is at a4f0537db0cd68fa2da097995f6ec00747ca453c)

# ./sparc64-softmmu/qemu-system-sparc64 -nographic -M sun4u -prom-env 'use-nvramrc?=true' -prom-env 'nvramrc=." Hello World!" cr'
OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
kernel cmdline
CPUs: 1 x SUNW,UltraSPARC-IIi
UUID: 00000000-0000-0000-0000-000000000000
Hello World!
Welcome to OpenBIOS v1.1 built on Oct 19 2017 06:59
  Type 'help' for detailed information
Trying disk:a...
No valid state has been set by load or init-program

0 > QEMU 2.10.50 monitor - type 'help' for more information
(qemu) quit

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

After talking to Mark Cave-Ayland over e-mail, to make sure I have the proper versions of OpenBIOS binaries - removed the existing git tree and with a fresh clone not seeing the 'qemu-system-sparc64' related crash.
Before cleanup I was seeing the crash all the times.
Thanks!!

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

Some times it's still puzzling, when I got the clean git tree still seeing crash with correct OpenBIOS file.

[root@zzfp365-lp1 test]# git clone git://git.qemu.org/qemu.git
Cloning into 'qemu'...
remote: Counting objects: 349636, done.
remote: Compressing objects: 100% (66763/66763), done.
remote: Total 349636 (delta 284434), reused 346628 (delta 282016)
Receiving objects: 100% (349636/349636), 112.12 MiB | 1.28 MiB/s, done.
Resolving deltas: 100% (284434/284434), done.

[root@zzfp365-lp1 qemu]# md5sum ./pc-bios/openbios-sparc64
15418a4c9429d9ee9c637701b94c7ffb ./pc-bios/openbios-sparc64

[root@zzfp365-lp1 qemu]# git pull
Already up-to-date.

did configure:
./configure --enable-attr --enable-bsd-user --enable-cap-ng --enable-coroutine-pool --enable-crypto-afalg --enable-curl --enable-curses --enable-debug --enable-debug-info --enable-debug-tcg --enable-fdt --enable-gcrypt --enable-gnutls --enable-gprof --enable-gtk --enable-guest-agent --enable-kvm --enable-libiscsi --enable-libssh2 --enable-linux-aio --enable-linux-user --enable-live-block-migration --enable-modules --enable-numa --enable-pie --enable-profiler --enable-qom-cast-debug --enable-rbd --enable-replication --enable-seccomp --enable-smartcard --enable-stack-protector --enable-system --enable-tcg --enable-tcg-interpreter --enable-tools --enable-tpm --enable-trace-backend=ftrace --enable-user --enable-vhost-net --enable-vhost-scsi --enable-vhost-user --enable-vhost-vsock --enable-virtfs --enable-vnc --enable-tpm --enable-vnc-png --enable-vnc-sasl --enable-werror --enable-xfsctl --enable-gcov --enable-debug-stack-usage

make -j 32

# ./sparc64-softmmu/qemu-system-sparc64 -nographic -M sun4u -prom-env 'use-nvramrc?=true' -prom-env 'nvramrc=." Hello World!" cr'
OpenBIOS for Sparc64
Unhandled Exception 0x0000000000000034
PC = 0x00000000ffd0f704 NPC = 0x00000000ffd0f708
Stopping execution
QEMU 2.10.90 monitor - type 'help' for more information
(qemu) quit

# md5sum ./pc-bios/openbios-sparc64
15418a4c9429d9ee9c637701b94c7ffb ./pc-bios/openbios-sparc64

git head is at b0fbe46ad82982b289a44ee2495b59b0bad8a842

Revision history for this message
R.Nageswara Sastry (nasastry) wrote :

Thomas thanks for your hint about the configuration option named "--enable-tcg-interpreter". By removing it the test case started working fine.

[root@zzfp365-lp1 qemu]# ./sparc64-softmmu/qemu-system-sparc64 -nographic -M sun4u -prom-env 'use-nvramrc?=true' -prom-env 'nvramrc=." Hello World!" cr'
OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
kernel cmdline
CPUs: 1 x SUNW,UltraSPARC-IIi
UUID: 00000000-0000-0000-0000-000000000000
Hello World!
Welcome to OpenBIOS v1.1 built on Oct 19 2017 06:59
  Type 'help' for detailed information
Trying disk:a...
No valid state has been set by load or init-program

0 > QEMU 2.10.90 monitor - type 'help' for more information
(qemu) quit

Revision history for this message
Thomas Huth (th-huth) wrote :

OK, so this was "only" the TCG-interpreter that was causing the sparc64 problem here. Since there are known issues with the TCG-interpreter on certain architectures, this is not really related to the prom-env-test. And since the fix for the original "ERROR:tests/libqtest.c:628:qtest_get_arch: assertion failed" problem has been committed already ("commit db221e66d8117f8"), I'm marking the status of this bug now accordingly.

Changed in qemu:
status: In Progress → Fix Committed
Revision history for this message
Thomas Huth (th-huth) wrote :
Changed in qemu:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.