[SRU] ceph_osd crash in _committed_osd_maps when failed to encode first inc map
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Ubuntu Cloud Archive | | Critical | Unassigned | |
Ussuri | | Critical | Unassigned | |
Victoria | | Critical | Unassigned | |
ceph (Ubuntu) | | Critical | Unassigned | |
Focal | | Critical | Unassigned | |
Groovy | | Critical | Unassigned | |
Bug Description
[Impact]
Upstream tracker: issue#46443 [0].
The ceph-osd service can crash when processing osd map updates.
When an OSD encounters a CRC error while processing an incremental map update, it requests a full map update from its peers. A recent change introduced an uninitialized variable in this code path, and dereferencing it crashes the OSD.
The uninitialized variable was introduced in Nautilus 14.2.10 and Octopus 15.2.1.
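The failure mode described above can be illustrated with a toy model. This is a simplified Python sketch, not Ceph's C++ code: `handle_maps_unpatched`, `commit`, and `SimulatedCrash` are hypothetical names, `zlib.crc32` is only a stand-in for the checksum the OSD actually uses, and a `None` value models the variable that is never set.

```python
import zlib
from typing import Dict, Optional, Tuple


class SimulatedCrash(Exception):
    """Stands in for the process abort a real bad dereference would cause."""


def commit(first: int, last: int, newest_map: Optional[bytes]) -> str:
    # The commit step assumes at least one map was applied.
    if newest_map is None:
        raise SimulatedCrash(f"nothing applied for epochs {first}..{last}")
    return f"committed epochs {first}..{last}"


def handle_maps_unpatched(incrementals: Dict[int, Tuple[bytes, int]]) -> str:
    """Apply incremental maps in epoch order, then commit unconditionally."""
    first, last = min(incrementals), max(incrementals)
    newest_map: Optional[bytes] = None  # never assigned if the first epoch fails
    for epoch in range(first, last + 1):
        payload, expected_crc = incrementals[epoch]
        if zlib.crc32(payload) != expected_crc:
            # CRC mismatch: a full map would be requested from peers here.
            last = epoch - 1
            break
        newest_map = payload
    # The commit still runs, even when no epoch was applied.
    return commit(first, last, newest_map)
```

In this model a CRC failure on a later epoch still commits the earlier ones, while a failure on the very first incremental map reaches `commit` with nothing applied, which is the case named in the bug title.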
[Test Case]
# Inject osd_inject_
sudo ceph daemon osd.{id} config set osd_inject_
# Trigger some osd map updates by restarting a different osd
sudo systemctl restart osd@{diff-id}
[Regression Potential]
The code has been updated to leave handle_osd_maps() early if a CRC error is encountered, therefore preventing the map commit if the failure is encountered while processing an incremental map update. This will make the full map update take longer but should prevent the crash that resulted in this bug. Additionally, _committed_
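The early exit described in this section can be sketched as follows. This is a simplified Python model under stated assumptions, not the actual patch: `handle_maps_patched` and `request_full_map` are hypothetical names, and `zlib.crc32` stands in for the real checksum.

```python
import zlib
from typing import Callable, Dict, Optional, Tuple


def handle_maps_patched(
    incrementals: Dict[int, Tuple[bytes, int]],
    request_full_map: Callable[[int], None],
) -> Optional[Tuple[int, int]]:
    """Return the committed (first, last) epoch range, or None if nothing was committed."""
    if not incrementals:
        return None
    epochs = sorted(incrementals)
    for epoch in epochs:
        payload, expected_crc = incrementals[epoch]
        if zlib.crc32(payload) != expected_crc:
            # Ask peers for the full map and leave early: no commit happens,
            # so the commit step never sees an empty range.
            request_full_map(epoch)
            return None
    return (epochs[0], epochs[-1])
```

The trade-off matches the one stated above: after a CRC error the update is deferred until the full map arrives, which is slower, but the commit step is never reached with nothing to commit.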
[Other Info]
Upstream has released a fix for this issue in Nautilus 14.2.11. The SRU for this point release is being tracked by LP: #1891077
Upstream has merged a fix for this issue in Octopus [1], but there is no current release target. The ceph packages in focal, groovy, and the ussuri cloud archive are exposed to this critical regression.
[0] https:/
[1] https:/
description: | updated |
Changed in ceph (Ubuntu Focal): | |
status: | New → Triaged |
Changed in ceph (Ubuntu Groovy): | |
status: | New → Triaged |
Changed in ceph (Ubuntu Focal): | |
importance: | Undecided → Critical |
Changed in ceph (Ubuntu Groovy): | |
importance: | Undecided → Critical |
Corey Bryant (corey.bryant) wrote : | #1 |
Corey Bryant (corey.bryant) wrote : | #2 |
ceph 15.2.3-0ubuntu2 is uploaded to groovy and ceph 15.2.3-
[1] https:/
Robie Basak (racb) wrote : | #3 |
A fix for this is in groovy-proposed (since 15.2.3-0ubuntu2 like Corey said) but not migrated yet, so I'll adjust that task to Fix Committed so it doesn't look like the SRU is going in ahead of it.
Changed in ceph (Ubuntu Groovy): | |
status: | Triaged → Fix Committed |
Launchpad Janitor (janitor) wrote : | #4 |
This bug was fixed in the package ceph - 15.2.3-0ubuntu3
---------------
ceph (15.2.3-0ubuntu3) groovy; urgency=medium
* d/control: Drop BD on obsolete cython (LP: #1891820).
ceph (15.2.3-0ubuntu2) groovy; urgency=medium
* d/p/fix-
when processing osd map updates (LP: #1891567).
-- Corey Bryant <email address hidden> Mon, 17 Aug 2020 13:46:06 -0400
Changed in ceph (Ubuntu Groovy): | |
status: | Fix Committed → Fix Released |
description: | updated |
description: | updated |
Hello Dan, or anyone else affected,
Accepted ceph into focal-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-
Further information regarding the verification process can be found at https:/
N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
Changed in ceph (Ubuntu Focal): | |
status: | Triaged → Fix Committed |
tags: | added: verification-needed verification-needed-focal |
Corey Bryant (corey.bryant) wrote : | #6 |
Ceph currently isn't in the victoria cloud archive, so I'm marking that task invalid.
Corey Bryant (corey.bryant) wrote : | #7 |
Hello Dan, or anyone else affected,
Accepted ceph into ussuri-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.
Please help us by testing this new package. To enable the -proposed repository:
sudo add-apt-repository cloud-archive:
sudo apt-get update
Your feedback will aid us getting this update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-
Further information regarding the verification process can be found at https:/
tags: | added: verification-ussuri-needed |
Ponnuvel Palaniyappan (pponnuvel) wrote : | #8 |
I have tested the ussuri-proposed packages and they fix the issue.
Set up a Nautilus cluster with the following versions:
# ceph versions
{
"mon": {
"ceph version 14.2.9 (581f22da52345d
},
"mgr": {
"ceph version 14.2.9 (581f22da52345d
},
"osd": {
"ceph version 14.2.9 (581f22da52345d
},
"mds": {},
"overall": {
"ceph version 14.2.9 (581f22da52345d
}
}
# dpkg -l | grep -i ceph
ii ceph 14.2.9-
ii ceph-base 14.2.9-
ii ceph-common 14.2.9-
ii ceph-mgr 14.2.9-
ii ceph-mon 14.2.9-
ii ceph-osd 14.2.9-
ii libcephfs2 14.2.9-
ii python3-
ii python3-cephfs 14.2.9-
ii python3-rados 14.2.9-
Ponnuvel Palaniyappan (pponnuvel) wrote : | #9 |
Also tested the same with Octopus:
# ceph versions
{
"mon": {
"ceph version 15.2.3 (d289bbdec69ed7
},
"mgr": {
"ceph version 15.2.3 (d289bbdec69ed7
},
"osd": {
"ceph version 15.2.3 (d289bbdec69ed7
},
"mds": {},
"overall": {
"ceph version 15.2.3 (d289bbdec69ed7
}
}
# ceph report | grep ceph_version
report 2214250888
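When verifying a package, it can help to confirm mechanically that every daemon reports the expected version. The sketch below parses JSON shaped like the `ceph versions` output above. `daemons_not_on` is a hypothetical helper, and `SAMPLE` is made-up illustrative data (the hash is elided and the counts are invented), not output from this bug.

```python
import json
from typing import Dict, List

# Illustrative sample only: hash elided, daemon counts invented.
SAMPLE = """
{
  "mon": {"ceph version 15.2.3 (hash elided) octopus (stable)": 3},
  "mgr": {"ceph version 15.2.3 (hash elided) octopus (stable)": 3},
  "osd": {"ceph version 15.2.3 (hash elided) octopus (stable)": 3},
  "mds": {},
  "overall": {"ceph version 15.2.3 (hash elided) octopus (stable)": 9}
}
"""


def daemons_not_on(versions_json: str, expected: str) -> Dict[str, List[str]]:
    """Map each daemon type to the version strings that do not match `expected`."""
    report = json.loads(versions_json)
    prefix = f"ceph version {expected} "  # trailing space avoids matching 15.2.30
    mismatches: Dict[str, List[str]] = {}
    for daemon_type, versions in report.items():
        if daemon_type == "overall":
            continue  # summary entry, already covered by the per-type entries
        stale = [v for v in versions if not v.startswith(prefix)]
        if stale:
            mismatches[daemon_type] = stale
    return mismatches
```

An empty result means every listed daemon type reports the expected version. This only checks the running versions; it does not replace running the Test Case from the bug description.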
tags: |
added: verification-ussuri-done removed: verification-ussuri-needed |
Ponnuvel Palaniyappan (pponnuvel) wrote : | #10 |
Tests for Focal:
$ for osd in {0..2}; do juju ssh ceph-osd/$osd 'sudo dpkg -l | grep ceph'; done
ii ceph 15.2.3-
ii ceph-base 15.2.3-
ii ceph-common 15.2.3-
ii ceph-mds 15.2.3-
ii ceph-mgr 15.2.3-
ii ceph-mgr-
ii ceph-mon 15.2.3-
ii ceph-osd 15.2.3-
ii libcephfs2 15.2.3-
ii python3-
ii python3-ceph-common 15.2.3-
ii python3-cephfs 15.2.3-
Connection to 10.5.2.78 closed.
ii ceph 15.2.3-
ii ceph-base 15.2.3-
ii ceph-common 15.2.3-
ii ceph-mds 15.2.3-
ii ceph-mgr 15.2.3-
ii ceph-mgr-
ii ceph-mon 15.2.3-
ii ceph-osd 15.2.3-
ii libcephfs2 15.2.3-
ii python3-
ii python3-ceph-common ...
tags: |
added: verification-needed-done removed: verification-needed-focal |
tags: |
added: verification-done verification-done-focal removed: verification-needed verification-needed-done |
Brian Murray (brian-murray) wrote : | #11 |
I don't see the Test Case from the bug description having been run in the comment that marked this as verified for Ubuntu 20.04, so I'm flipping the tags back to verification-needed.
tags: |
added: verification-needed verification-needed-focal removed: verification-done verification-done-focal |
Ponnuvel Palaniyappan (pponnuvel) wrote : | #12 |
@Brian, I have repeated the steps for focal and attached the text file with relevant logs/output. Can you please check again?
tags: |
added: verification-done verification-done-focal removed: verification-needed verification-needed-focal |
The verification of the Stable Release Update for ceph has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Launchpad Janitor (janitor) wrote : | #14 |
This bug was fixed in the package ceph - 15.2.3-
---------------
ceph (15.2.3-
* d/p/fix-
when processing osd map updates (LP: #1891567).
-- Corey Bryant <email address hidden> Fri, 14 Aug 2020 11:46:05 -0400
Changed in ceph (Ubuntu Focal): | |
status: | Fix Committed → Fix Released |
Corey Bryant (corey.bryant) wrote : | #15 |
The verification of the Stable Release Update for ceph has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Corey Bryant (corey.bryant) wrote : | #16 |
This bug was fixed in the package ceph - 15.2.3-
---------------
ceph (15.2.3-
.
* New update for the Ubuntu Cloud Archive.
.
ceph (15.2.3-
.
* d/p/fix-
when processing osd map updates (LP: #1891567).
I'm checking in the #ceph IRC channel on OFTC to see if there is a 15.2.5 release coming soon for Octopus.