[SRU] ceph 10.2.9

Bug #1706566 reported by James Page on 2017-07-26
Affects               Importance  Assigned to
Ubuntu Cloud Archive  Undecided   Unassigned
  Mitaka              Medium      Unassigned
ceph (Ubuntu)         Undecided   Unassigned
  Xenial              Medium      Unassigned
  Zesty               Medium      Unassigned

Bug Description

[Impact]
This release consists mostly of bug fixes, and we would like to make sure that all of our supported customers have access to these improvements.

The update contains the following package updates:

   * ceph 10.2.9

[Test Case]
The following SRU process was followed:

https://wiki.ubuntu.com/OpenStackUpdates

In order to avoid regression of existing consumers, the OpenStack team will run their continuous integration test against the packages that are in -proposed. A successful run of all available tests will be required before the proposed packages can be let into -updates.

The OpenStack team will be in charge of attaching the output summary of the executed tests. The OpenStack team members will not mark ‘verification-done’ until this has happened.

[Regression Potential]
In order to mitigate the regression potential, the results of the
aforementioned tests are attached to this bug.

[Upstream changelog]
V10.2.9 JEWEL

This point release fixes a regression introduced in v10.2.8.

We recommend that all Jewel users upgrade.

For more detailed information, see the complete changelog.

NOTABLE CHANGES

cephfs: Damaged MDS with 10.2.8 (issue#20599, pr#16282, Nathan Cutler)

V10.2.8 JEWEL

This point release brought a number of important bugfixes in all major components of Ceph. However, it also introduced a regression that could cause MDS damage, and a new release, v10.2.9, was published to address this. Therefore, Jewel users should not upgrade to this version - instead, we recommend upgrading directly to v10.2.9.

For more detailed information, see the complete changelog.

OSD REMOVAL CAVEAT

There was a bug introduced in Jewel (#19119) that broke the mapping behavior when an “out” OSD that still existed in the CRUSH map was removed with ‘osd rm’. This could result in ‘misdirected op’ and other errors. The bug is now fixed, but the fix itself introduces the same risk because the behavior may vary between clients and OSDs. To avoid problems, please ensure that all OSDs are removed from the CRUSH map before deleting them. That is, be sure to do:

  ceph osd crush rm osd.123

before:

  ceph osd rm osd.123
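The ordering above can be wrapped in a small shell helper so that the two steps are never run out of sequence. This is only a sketch: the function name `remove_osd` and the `CEPH` override variable are hypothetical and not part of Ceph, and the helper covers only the two commands named in the release notes.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ceph.
# CEPH defaults to the real "ceph" CLI; set CEPH="echo ceph" for a dry run.
CEPH="${CEPH:-ceph}"

remove_osd() {
    id="$1"
    # Step 1: remove the OSD from the CRUSH map first; stop if that fails.
    $CEPH osd crush rm "osd.${id}" || return 1
    # Step 2: only then delete the OSD itself.
    $CEPH osd rm "osd.${id}"
}

# Example dry run (prints the commands instead of executing them):
#   CEPH="echo ceph" remove_osd 123
```

Stopping after a failed first step is a design choice of this sketch: it avoids running 'osd rm' while the OSD is still present in the CRUSH map, which is the situation the caveat warns about.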

SNAP TRIMMER IMPROVEMENTS

This release greatly improves control and throttling of the snap trimmer. It introduces the “osd max trimming pgs” option (defaulting to 2), which limits how many PGs on an OSD can be trimming snapshots at a time. It also restores the safe use of the “osd snap trim sleep” option, which defaults to 0 but otherwise adds a delay of the given number of seconds between every dispatch of trim operations to the underlying system.
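As a rough illustration of how these two options might be set, the sketch below shows a ceph.conf fragment in comments and a hypothetical helper that pushes a sleep value to running OSDs with 'injectargs'. The value 0.1 is an arbitrary example and not a recommendation, the `set_snap_trim_sleep` name and the `CEPH` override are assumptions, and whether a runtime change takes full effect without an OSD restart is not stated in the release notes.

```shell
#!/bin/sh
# Persistent settings would go in ceph.conf, for example:
#   [osd]
#   osd max trimming pgs = 2      # default according to the release notes
#   osd snap trim sleep = 0.1     # arbitrary example value, in seconds

# CEPH defaults to the real "ceph" CLI; set CEPH="echo ceph" for a dry run.
CEPH="${CEPH:-ceph}"

# Hypothetical helper: ask all running OSDs to use a new snap trim sleep.
set_snap_trim_sleep() {
    seconds="$1"
    $CEPH tell 'osd.*' injectargs "--osd_snap_trim_sleep ${seconds}"
}

# Example dry run:
#   CEPH="echo ceph" set_snap_trim_sleep 0.1
```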

OTHER NOTABLE CHANGES

build/ops: “osd marked itself down” will not be recognised if host runs mon + osd on shutdown/reboot (issue#18516, pr#13492, Boris Ranto)
build/ops: ceph-base package missing dependency for psmisc (issue#19129, pr#13786, Nathan Cutler)
build/ops: enable build of ceph-resource-agents package on rpm-based os (issue#17613, issue#19546, pr#13606, Nathan Cutler)
build/ops: rbdmap.service not included in debian packaging (jewel-only) (issue#19547, pr#14383, Ken Dreyer)
cephfs: Journaler may execute on_safe contexts prematurely (issue#20055, pr#15468, “Yan, Zheng”)
cephfs: MDS assert failed when shutting down (issue#19204, pr#14683, John Spray)
cephfs: MDS goes readonly writing backtrace for a file whose data pool has been removed (issue#19401, pr#14682, John Spray)
cephfs: MDS server crashes due to inconsistent metadata (issue#19406, pr#14676, John Spray)
cephfs: No output for ceph mds rmfailed 0 --yes-i-really-mean-it command (issue#16709, pr#14674, John Spray)
cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) (issue#18914, pr#14685, “Yan, Zheng”)
cephfs: Test failure: test_open_inode (issue#18661, pr#14669, John Spray)
cephfs: The mount point break off when mds switch happened (issue#19437, pr#14679, Guan yunfei)
cephfs: ceph-fuse does not recover after lost connection to MDS (issue#16743, issue#18757, pr#14698, Kefu Chai, Henrik Korkuc, Patrick Donnelly)
cephfs: client: fix the cross-quota rename boundary check conditions (issue#18699, pr#14667, Greg Farnum)
cephfs: mds is crushed, after I set about 400 64KB xattr kv pairs to a file (issue#19033, pr#14684, Yang Honggang)
cephfs: non-local quota changes not visible until some IO is done (issue#17939, pr#15466, John Spray, Nathan Cutler)
cephfs: normalize file open flags internally used by cephfs (issue#18872, issue#19890, pr#15000, Jan Fajerski, “Yan, Zheng”)
common: monitor creation with IPv6 public network segfaults (issue#19371, pr#14324, Fabian Grünbichler)
common: radosstriper: protect aio_write API from calls with 0 bytes (issue#14609, pr#13254, Sebastien Ponce)
core: Objecter::epoch_barrier isn’t respected in _op_submit() (issue#19396, pr#14332, Ilya Dryomov)
core: clear divergent_priors set off disk (issue#17916, pr#14596, Greg Farnum)
core: improve snap trimming, enable restriction of parallelism (issue#19241, pr#14492, Samuel Just, Greg Farnum)
core: os/filestore/HashIndex: be loud about splits (issue#18235, pr#13788, Dan van der Ster)
core: os/filestore: fix clang static check warn use-after-free (issue#19311, pr#14044, liuchang0812, yaoning)
core: transient jerasure unit test failures (issue#18070, issue#17762, issue#18128, issue#17951, pr#14701, Kefu Chai, Pan Liu, Loic Dachary, Jason Dillaman)
core: two instances of omap_digest mismatch (issue#18533, pr#14204, Samuel Just, David Zafman)
doc: Improvements to crushtool manpage (issue#19649, pr#14635, Loic Dachary, Nathan Cutler)
doc: PendingReleaseNotes: note about 19119 (issue#19119, pr#13732, Sage Weil)
doc: admin ops: fix the quota section (issue#19397, pr#14654, Chu, Hua-Rong)
doc: radosgw-admin: add the ‘object stat’ command to usage (issue#19013, pr#13872, Pavan Rallabhandi)
doc: rgw S3 create bucket should not do response in json (issue#18889, pr#13874, Abhishek Lekshmanan)
fs: Invalid error code returned by MDS is causing a kernel client WARNING (issue#19205, pr#13831, Jan Fajerski, xie xingguo)
librbd: Incomplete declaration for ContextWQ in librbd/Journal.h (issue#18862, pr#14152, Boris Ranto)
librbd: Issues with C API image metadata retrieval functions (issue#19588, pr#14666, Mykola Golub)
librbd: Possible deadlock performing a synchronous API action while refresh in-progress (issue#18419, pr#13154, Jason Dillaman)
librbd: is_exclusive_lock_owner API should ping OSD (issue#19287, pr#14481, Jason Dillaman)
librbd: remove image header lock assertions (issue#18244, pr#13809, Jason Dillaman)
mds: C_MDSInternalNoop::complete doesn’t free itself (issue#19501, pr#14677, “Yan, Zheng”)
mds: Too many stat ops when trying to probe a large file (issue#19955, pr#15472, “Yan, Zheng”)
mds: avoid reusing deleted inode in StrayManager::_purge_stray_logged (issue#18877, pr#14670, Zhi Zhang)
mds: enable start when session ino info is corrupt (issue#19708, issue#16842, pr#14700, John Spray)
mds: fragment space check can cause replayed request fail (issue#18660, pr#14668, “Yan, Zheng”)
mds: heartbeat timeout during rejoin, when working with large amount of caps/inodes (issue#19118, pr#14672, John Spray)
mds: issue new caps when sending reply to client (issue#19635, pr#15438, “Yan, Zheng”)
mon: OSDMonitor: make ‘osd crush move ...’ work on osds (issue#18587, pr#13261, Sage Weil)
mon: fix ‘sortbitwise’ warning on jewel (issue#20578, pr#15208, huanwen ren, Sage Weil)
mon: make get_mon_log_message() atomic (issue#19427, pr#14587, Kefu Chai)
mon: remove bad rocksdb option (issue#19392, pr#14236, Sage Weil)
msg: IPv6 Heartbeat packets are not marked with DSCP QoS - simple messenger (issue#18887, pr#13450, Yan Jun, Robin H. Johnson)
msg: set close on exec flag (issue#16390, pr#13585, Kefu Chai)
osd: --flush-journal: sporadic segfaults on exit (issue#18820, pr#13477, Alexey Sheplyakov)
osd: Give requested scrubs a higher priority (issue#15789, pr#14686, David Zafman)
osd: Implement asynchronous scrub sleep (issue#19986, issue#19497, pr#15529, Brad Hubbard)
osd: Object level shard errors are tracked and used if no auth available (issue#20089, pr#15416, David Zafman)
osd: ReplicatedPG: try with pool’s use-gmt setting if hitset archive not found (issue#19185, pr#13827, Kefu Chai)
osd: allow client throttler to be adjusted on-fly, without restart (issue#18791, pr#13214, Piotr Dałek)
osd: bypass readonly ops when osd full (issue#19394, pr#14181, Jianpeng Ma, yaoning)
osd: degraded and misplaced status output inaccurate (issue#18619, pr#14325, David Zafman)
osd: new added OSD always down when full flag is set (issue#15025, pr#14326, Mingxin Liu)
osd: pg_pool_t::encode(): be compatible with Hammer <= 0.94.6 (issue#19508, pr#14392, Alexey Sheplyakov)
osd: pre-jewel “osd rm” incrementals are misinterpreted (issue#19119, pr#13884, Ilya Dryomov)
osd: preserve allocation hint attribute during recovery (issue#19083, pr#13647, yaoning)
osd: promote throttle parameters are reversed (issue#19773, pr#14791, Mark Nelson)
osd: reindex properly on pg log split (issue#18975, pr#14047, Alexey Sheplyakov)
osd: restrict want_acting to up+acting on recovery completion (issue#18929, pr#13541, Sage Weil)
rbd-nbd: check /sys/block/nbdX/size to ensure kernel mapped correctly (issue#18335, pr#13932, Mykola Golub, Alexey Sheplyakov)
rbd: [api] temporarily restrict (rbd_)mirror_peer_add from adding multiple peers (issue#19256, pr#14664, Jason Dillaman)
rbd: qemu crash triggered by network issues (issue#18436, pr#13244, Jason Dillaman)
rbd: rbd --pool=x rename y z does not work (issue#18326, pr#14148, Gaurav Kumar Garg)
rbd: systemctl stop rbdmap unmaps all rbds and not just the ones in /etc/ceph/rbdmap (issue#18884, issue#18262, pr#14083, David Disseldorp, Nathan Cutler)
rgw: “cluster [WRN] bad locator @X on object @X....” in cluster log (issue#18980, pr#14064, Casey Bodley)
rgw: ‘radosgw-admin sync status’ on master zone of non-master zonegroup (issue#18091, pr#13779, Jing Wenjun)
rgw: Change loglevel to 20 for ‘System already converted’ message (issue#18919, pr#13834, Vikhyat Umrao)
rgw: Use decoded URI when verifying TempURL (issue#18590, pr#13724, Alexey Sheplyakov)
rgw: a few cases where rgw_obj is incorrectly initialized (issue#19096, pr#13842, Yehuda Sadeh)
rgw: add apis to support ragweed suite (issue#19804, pr#14851, Yehuda Sadeh)
rgw: add bucket size limit check to radosgw-admin (issue#17925, pr#14787, Matt Benjamin)
rgw: allow system users to read SLO parts (issue#19027, pr#14752, Casey Bodley)
rgw: don’t return skew time in pre-signed url (issue#18828, issue#18829, pr#14605, liuchang0812)
rgw: failure to create s3 type subuser from admin rest api (issue#16682, pr#14815, snakeAngel2015)
rgw: fix break inside of yield in RGWFetchAllMetaCR (issue#17655, pr#14066, Casey Bodley)
rgw: fix failed to create bucket if a non-master zonegroup has a single zone (issue#19756, pr#14766, weiqiaomiao)
rgw: health check errors out incorrectly (issue#19025, pr#13865, Pavan Rallabhandi)
rgw: list_plain_entries() stops before bi_log entries (issue#19876, pr#15383, Casey Bodley)
rgw: multisite: fetch_remote_obj() gets wrong version when copying from remote (issue#19599, pr#14607, Zhang Shaowen, Casey Bodley)
rgw: multisite: some yields in RGWMetaSyncShardCR::full_sync() resume in incremental_sync() (issue#18076, pr#13837, Casey Bodley, Abhishek Lekshmanan)
rgw: only append zonegroups to rest params if not empty (issue#20078, pr#15312, Yehuda Sadeh, Karol Mroz)
rgw: pullup civet chunked (issue#19736, pr#14776, Matt Benjamin)
rgw: rgw_file: fix event expire check, don’t expire directories being read (issue#19623, issue#19270, issue#19625, issue#19624, issue#19634, issue#19435, pr#14653, Gui Hecheng, Matt Benjamin)
rgw: swift: disable revocation thread under certain circumstances (issue#19499, issue#9493, pr#14789, Marcus Watts)
rgw: the swift container acl does not support field .ref (issue#18484, pr#13833, Jing Wenjun)
rgw: typo in rgw_admin.cc (issue#19026, pr#13863, Ronak Jain)
rgw: unsafe access in RGWListBucket_ObjStore_SWIFT::send_response() (issue#19249, pr#14661, Yehuda Sadeh)
rgw: upgrade to multisite v2 fails if there is a zone without zone info (issue#19231, pr#14136, Danny Al-Gaaf, Orit Wasserman)
rgw: use separate http_manager for read_sync_status (issue#19236, pr#14195, Casey Bodley, Shasha Lu)
rgw: when converting region_map we need to use rgw_zone_root_pool (issue#19195, pr#14143, Orit Wasserman)
rgw: zonegroupmap set does not work (issue#19498, issue#18725, pr#14660, Orit Wasserman, Casey Bodley)
rgw: fix memory leaks in data/md sync (issue#20088, pr#15382, weiqiaomiao)
tests: ‘ceph auth import -i’ overwrites caps, should alert user before overwrite (issue#18932, pr#13544, Vikhyat Umrao)
tests: New upgrade test for #19508 (issue#19829, issue#19508, pr#14930, Nathan Cutler)
tests: [ FAILED ] TestLibRBD.ImagePollIO in upgrade:client-upgrade-kraken-distro-basic-smithi (issue#18617, pr#13107, Jason Dillaman)
tests: [librados_test_stub] cls_cxx_map_get_XYZ methods don’t return correct value (issue#19597, pr#14665, Jason Dillaman)
tests: additional rbd-mirror test stability improvements (issue#18935, pr#14154, Jason Dillaman)
tests: api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure (issue#15368, pr#14763, Sage Weil)
tests: buffer overflow in test LibCephFS.DirLs (issue#18941, pr#14671, “Yan, Zheng”)
tests: clone workunit using the branch specified by task (issue#19429, pr#14371, Kefu Chai, Dan Mick)
tests: drop upgrade/hammer-jewel-x (issue#20574, pr#15933, Nathan Cutler)
tests: dummy suite fails in OpenStack (issue#18259, pr#14070, Nathan Cutler)
tests: eliminate race condition in Thrasher constructor (issue#18799, pr#13608, Nathan Cutler)
tests: enable quotas for pre-luminous quota tests (issue#20412, pr#15936, Patrick Donnelly)
tests: fix oversight in yaml comment (issue#20581, pr#14449, Nathan Cutler)
tests: move swift.py task from teuthology to ceph, phase one (jewel) (issue#20392, pr#15870, Nathan Cutler, Sage Weil, Warren Usui, Greg Farnum, Ali Maredia, Tommi Virtanen, Zack Cerza, Sam Lang, Yehuda Sadeh, Joe Buck, Josh Durgin)
tests: qa/Fixed upgrade sequence to 10.2.0 -> 10.2.7 -> latest -x (10.2.8) (issue#20572, pr#16089, Yuri Weinstein)
tests: qa/suites/upgrade/hammer-x: set “sortbitwise” for jewel clusters (issue#20342, pr#15842, Nathan Cutler)
tests: qa/workunits/rados/test-upgrade-*: whitelist tests for master (part 1) (issue#20577, pr#15360, Sage Weil)
tests: qa/workunits/rados/test-upgrade-*: whitelist tests for master (part 2) (issue#20576, pr#15778, Kefu Chai)
tests: qa/workunits/rados/test-upgrade-*: whitelist tests the right way (issue#20575, pr#15824, Kefu Chai)
tests: rados: sleep before ceph tell osd.0 flush_pg_stats after restart (issue#16239, issue#20489, pr#14710, Kefu Chai, Nathan Cutler)
tests: run upgrade/client-upgrade on latest CentOS 7.3 (issue#20573, pr#16088, Nathan Cutler)
tests: run-rbd-unit-tests.sh assert in lockdep_will_lock, TestLibRBD.ObjectMapConsistentSnap (issue#17447, pr#14150, Jason Dillaman)
tests: systemd test backport to jewel (issue#19717, pr#14694, Vasu Kulkarni)
tests: test/librados/tmap_migrate: g_ceph_context->put() upon return (issue#20579, pr#14809, Kefu Chai)
tests: test_notify.py: rbd.InvalidArgument: error updating features for image test_notify_clone2 (issue#19692, pr#14680, Jason Dillaman)
tests: upgrade/hammer-x failing with OSD has the store locked when Thrasher runs ceph-objectstore-tool on down PG (issue#19556, pr#14416, Nathan Cutler)
tests: upgrade:hammer-x/stress-split-erasure-code-x86_64 fails in 10.2.8 integration testing (issue#20413, pr#15904, Nathan Cutler)
tools: brag fails to count “in” mds (issue#19192, pr#14112, Oleh Prypin, Peng Zhang)
tools: ceph-disk does not support cluster names different than ‘ceph’ (issue#17821, pr#14765, Loic Dachary)
tools: ceph-disk: Racing between partition creation and device node creation (issue#19428, pr#14329, Erwan Velu)
tools: ceph-disk: bluestore --setgroup incorrectly set with user (issue#18955, pr#13489, craigchi)
tools: ceph-disk: ceph-disk list reports mount error for OSD having mount options with SELinux context (issue#17331, pr#14402, Brad Hubbard)
tools: ceph-disk: do not setup_statedir on trigger (issue#19941, pr#15504, Loic Dachary)
tools: ceph-disk: enable directory backed OSD at boot time (issue#19628, pr#14602, Loic Dachary)
tools: rados: RadosImport::import should return an error if Rados::connect fails (issue#19319, pr#14113, Brad Hubbard)

James Page (james-page) on 2017-07-26
Changed in cloud-archive:
status: New → Invalid
Changed in ceph (Ubuntu):
status: New → Invalid
Changed in ceph (Ubuntu Xenial):
status: New → Triaged
Changed in ceph (Ubuntu Zesty):
status: New → Triaged
Changed in ceph (Ubuntu Xenial):
importance: Undecided → Medium
Changed in ceph (Ubuntu Zesty):
importance: Undecided → Medium
James Page (james-page) on 2017-09-26
description: updated

Hello James, or anyone else affected,

Accepted ceph into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/10.2.9-0ubuntu0.17.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!
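For illustration only, enabling -proposed usually comes down to adding one extra APT source line for the series. The sketch below prints such a line; the function name `proposed_sources_line`, the mirror URL, the component list and the file name in the example are assumptions, and the EnableProposed wiki page linked above remains the authoritative description.

```shell
#!/bin/sh
# Hypothetical helper: print an APT source line for <series>-proposed.
# The mirror URL and component list are assumptions; adjust to your setup.
proposed_sources_line() {
    series="$1"
    printf 'deb http://archive.ubuntu.com/ubuntu/ %s-proposed restricted main multiverse universe\n' "$series"
}

# Example (not executed here):
#   proposed_sources_line zesty | sudo tee /etc/apt/sources.list.d/proposed.list
#   sudo apt-get update
```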

Changed in ceph (Ubuntu Zesty):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-zesty
Changed in ceph (Ubuntu Xenial):
status: Triaged → Fix Committed
tags: added: verification-needed-xenial
Brian Murray (brian-murray) wrote :

Hello James, or anyone else affected,

Accepted ceph into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/10.2.9-0ubuntu0.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!
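Since the verification comment should mention the version that was tested, a quick check along the following lines may help confirm that the -proposed build is the one actually installed. This is a sketch: the helper names `installed_version` and `matches_expected` are hypothetical, and the expected version string is taken from the package URL given above.

```shell
#!/bin/sh
# Hypothetical helpers for confirming which package version is installed.

# Print the installed version of a package (empty if it is not installed).
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null
}

# Succeed only if the installed version ($1) equals the expected one ($2).
matches_expected() {
    [ "$1" = "$2" ]
}

# Example (not executed here):
#   if matches_expected "$(installed_version ceph)" "10.2.9-0ubuntu0.16.04.1"; then
#       echo "xenial-proposed build of ceph is installed"
#   fi
```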

James Page (james-page) wrote :

Hello James, or anyone else affected,

Accepted ceph into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed