Tempest silently hides crash due to OOM

Bug #1886954 reported by Pierre Riteau
Affects: Rally
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I was running the full Tempest suite via Rally. It looked like the suite passed successfully:

[...]
2020-07-09 10:48:32.672 16426 INFO default [-] {0} tempest.api.identity.admin.v3.test_list_users.UsersV3TestJSON.test_list_users_with_not_enabled ... success [0.142s]
2020-07-09 10:49:22.303 16426 INFO rally.task.context [-] Verification a73cc1fa-aa39-4427-8e3a-33ea223c2a4f | Context testr@default cleanup() started
2020-07-09 10:49:22.303 16426 INFO rally.task.context [-] Verification a73cc1fa-aa39-4427-8e3a-33ea223c2a4f | Context testr@default cleanup() finished in 0.82 msec
2020-07-09 10:49:22.304 16426 INFO rally.task.context [-] Verification a73cc1fa-aa39-4427-8e3a-33ea223c2a4f | Context tempest@default cleanup() started
2020-07-09 10:49:32.420 16426 INFO rally.task.context [-] Verification a73cc1fa-aa39-4427-8e3a-33ea223c2a4f | Context tempest@default cleanup() finished in 10.12 sec
2020-07-09 10:49:32.467 16426 INFO rally.api [-] Verification (UUID=6160b3bb-7d54-4e6f-a360-c185ae578135) has been successfully finished for deployment 'production' (UUID=206669da-6147-429e-b92c-c3ec049974e5)!

======
Totals
======

Ran: 1558 tests in 3693.127 sec.
 - Success: 579
 - Skipped: 138
 - Expected failures: 0
 - Unexpected success: 0
 - Failures: 0

Using verification (UUID=6160b3bb-7d54-4e6f-a360-c185ae578135) as the default verification for the future operations.

The HTML report matches these numbers: 579 tests in "success" status and 138 in "skipped" status.

However, if you look more closely at the test numbers, they don't add up at all: 1558 tests are reported as run, but the success and skipped counts together account for only 717 (579 + 138), so 841 tests are missing! With `rally verify show 6160b3bb-7d54-4e6f-a360-c185ae578135` I was able to see that the 841 missing tests were in `init` status.
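The mismatch can be checked mechanically. The following is a minimal sketch, not part of Rally or Tempest; the function name `missing_tests` and the dictionary keys are hypothetical, and the numbers are copied from the totals above.

```python
def missing_tests(total_ran, counts_by_status):
    """Return how many tests are not covered by any listed status."""
    return total_ran - sum(counts_by_status.values())


# Numbers taken from the "Totals" section of the report above.
totals = {
    "success": 579,
    "skipped": 138,
    "expected_failures": 0,
    "unexpected_success": 0,
    "failures": 0,
}

missing = missing_tests(1558, totals)
print(missing)  # 841
```

A non-zero result here would be a reasonable signal that the run should not be treated as a clean pass.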

I found the reason tests stopped running at that point: the kernel's OOM killer had killed a python process:

[Thu Jul 9 10:49:20 2020] Out of memory: Killed process 16431 (python) total-vm:2074092kB, anon-rss:1509788kB, file-rss:0kB, shmem-rss:0kB, UID:1000
[Thu Jul 9 10:49:20 2020] oom_reaper: reaped process 16431 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
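To spot this kind of event after a run, the kernel log can be scanned for OOM kills. This is a rough sketch that assumes the exact message wording shown above; other kernel versions may word the message differently, and the helper name `find_oom_kills` is hypothetical.

```python
import re

# Matches the "Out of memory: Killed process <pid> (<name>)" wording
# seen in this report. Other kernels may use a different phrasing.
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
)


def find_oom_kills(kernel_log_text):
    """Return a list of (pid, process_name) for each OOM kill found."""
    return [
        (int(m.group("pid")), m.group("name"))
        for m in OOM_RE.finditer(kernel_log_text)
    ]


sample = (
    "[Thu Jul 9 10:49:20 2020] Out of memory: Killed process 16431 (python) "
    "total-vm:2074092kB, anon-rss:1509788kB, file-rss:0kB, shmem-rss:0kB, UID:1000"
)
print(find_oom_kills(sample))  # [(16431, 'python')]
```

Note that the killed PID (16431) is not the PID that appears in the Rally log lines (16426).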

Unfortunately, Tempest hides this failure and makes it look like the test suite completed successfully.

Martin Kopec (mkopec) wrote :

It looks like the framework running tempest (rally) should have dealt with the out-of-memory error. I'd say that the whole tempest process (not just a single tempest test) got killed, and that is why tempest didn't show any traceback: it couldn't, because it was killed before it had the chance.
I'm going to change the project to Rally.
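As an illustration of how a parent process can notice that a child was killed, here is a minimal sketch using Python's standard `subprocess` module on a POSIX system. It is not Rally's actual code; `run_and_check` is a hypothetical helper, and the child command only simulates an OOM kill by sending SIGKILL to itself. On POSIX, `subprocess` reports a child terminated by signal N as return code -N.

```python
import signal
import subprocess
import sys


def run_and_check(cmd):
    """Run cmd and return (returncode, signal_name or None).

    On POSIX, a negative return code -N means the child was
    terminated by signal N rather than exiting on its own.
    """
    proc = subprocess.run(cmd)
    if proc.returncode < 0:
        return proc.returncode, signal.Signals(-proc.returncode).name
    return proc.returncode, None


# Simulated victim: the child sends SIGKILL to itself, which is the
# signal the kernel OOM killer uses.
child = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
]

code, sig = run_and_check(child)
if sig is not None:
    print(f"child was killed by {sig}; results are incomplete")
```

A runner that checks for this condition could mark the verification as failed instead of reporting it as successfully finished.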

affects: tempest → rally