nova-ceph-multistore job fails with mysqld got oom-killed

Bug #1961068 reported by Elod Illes
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

Searching through the jobs showed that the nova-ceph-multistore job fails from time to time with a DB crash caused by an out-of-memory error.

In the tempest errors the following message can be seen:

tempest.lib.exceptions.ServerFault: Got server fault
Details: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_db.exception.DBConnectionError'>

In the mysqld error log (controller/logs/mysql/error_log.txt), the crash recovery is visible:

2022-02-15T19:26:40.245179Z 0 [System] [MY-010229] [Server] Starting XA crash recovery...
2022-02-15T19:26:40.268204Z 0 [System] [MY-010232] [Server] XA crash recovery finished.

Around the same time, the syslog (controller/logs/syslog.txt) shows the out-of-memory kill:

Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mysql.service,task=mysqld,pid=67959,uid=116
Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: Out of memory: Killed process 67959 (mysqld) total-vm:5127600kB, anon-rss:756064kB, file-rss:0kB, shmem-rss:0kB, UID:116 pgtables:2388kB oom_score_adj:0
Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: oom_reaper: reaped process 67959 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
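
To check other job runs for the same symptom, a small script can scan a syslog file for the kernel's "Out of memory: Killed process" line shown above. This is only a minimal sketch: the function name `find_oom_kills` and the regular expression are illustrative assumptions derived from the log format in this report, not part of nova, devstack, or any existing tool.

```python
import re

# Matches the kernel's OOM-kill summary line as it appears in the syslog
# excerpt above. The pattern is an assumption based on that one sample.
OOM_KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_oom_kills(lines, process=None):
    """Return one dict per OOM kill found in `lines`.

    If `process` is given, only kills of that process name are returned.
    (Hypothetical helper, for illustration only.)
    """
    kills = []
    for line in lines:
        match = OOM_KILL_RE.search(line)
        if match is None:
            continue
        if process is not None and match.group("name") != process:
            continue
        kills.append({
            "pid": int(match.group("pid")),
            "name": match.group("name"),
            "total_vm_kb": int(match.group("total_vm_kb")),
            "anon_rss_kb": int(match.group("anon_rss_kb")),
        })
    return kills
```

For example, `find_oom_kills(open("controller/logs/syslog.txt"), process="mysqld")` would list every mysqld kill recorded in a downloaded job log, assuming the file uses the same line format as the excerpt.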

The error occurs only in the nova-ceph-multistore job (see recent occurrences via logsearch: https://paste.opendev.org/show/bQNKfoaMafUyNFCyQ0kN/ ). It mostly happens on the current master branch (yoga), but an example was found on wallaby as well: https://zuul.opendev.org/t/openstack/build/d8a6a9c1496346dda6986db00c06a616

Tags: gate-failure
tags: added: gate-failure
Changed in nova:
status: New → Confirmed
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/874664

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/874664
Committed: https://opendev.org/openstack/nova/commit/84d1f25446731e4e51beb83a017cdf7bfda8c5d5
Submitter: "Zuul (22348)"
Branch: master

commit 84d1f25446731e4e51beb83a017cdf7bfda8c5d5
Author: Dan Smith <email address hidden>
Date: Tue Feb 21 08:43:13 2023 -0800

    Use mysql memory reduction flags for ceph job

    This makes the ceph-multistore job use the MYSQL_REDUCE_MEMORY
    flag in devstack to try to address the frequent OOMs we see in that
    job.

    Change-Id: Ibc203bd10dcb530027c2c9f58eb840ccc088280d
    Closes-Bug: #1961068

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 27.0.0.0rc1

This issue was fixed in the openstack/nova 27.0.0.0rc1 release candidate.
