Activity log for bug #1978913

Date Who What changed Old value New value Message
2022-06-16 06:05:12 nikhil kshirsagar bug added bug
2022-08-16 16:20:17 Dan Hill nominated for series Ubuntu Bionic
2022-08-16 16:20:17 Dan Hill bug task added ceph (Ubuntu Bionic)
2022-08-16 16:20:17 Dan Hill nominated for series Ubuntu Kinetic
2022-08-16 16:20:17 Dan Hill bug task added ceph (Ubuntu Kinetic)
2022-08-16 16:20:17 Dan Hill nominated for series Ubuntu Jammy
2022-08-16 16:20:17 Dan Hill bug task added ceph (Ubuntu Jammy)
2022-08-16 16:20:17 Dan Hill nominated for series Ubuntu Focal
2022-08-16 16:20:17 Dan Hill bug task added ceph (Ubuntu Focal)
2022-08-16 16:26:35 Dan Hill bug task added cloud-archive
2022-08-16 16:30:48 Dan Hill nominated for series cloud-archive/xena
2022-08-16 16:30:48 Dan Hill bug task added cloud-archive/xena
2022-08-16 16:30:48 Dan Hill nominated for series cloud-archive/wallaby
2022-08-16 16:30:48 Dan Hill bug task added cloud-archive/wallaby
2022-08-16 16:30:48 Dan Hill nominated for series cloud-archive/queens
2022-08-16 16:30:48 Dan Hill bug task added cloud-archive/queens
2022-08-16 16:30:48 Dan Hill nominated for series cloud-archive/yoga
2022-08-16 16:30:48 Dan Hill bug task added cloud-archive/yoga
2022-08-16 16:30:48 Dan Hill nominated for series cloud-archive/ussuri
2022-08-16 16:30:48 Dan Hill bug task added cloud-archive/ussuri
2022-08-16 16:31:40 Dan Hill tags seg
2022-09-30 11:16:14 nikhil kshirsagar attachment added 1978913.patch https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1978913/+attachment/5620269/+files/1978913.patch
2022-09-30 12:50:58 Ubuntu Foundations Team Bug Bot tags seg patch seg
2022-09-30 12:51:10 Ubuntu Foundations Team Bug Bot bug added subscriber Ubuntu Review Team
2022-10-31 06:14:02 nikhil kshirsagar attachment removed 1978913.patch https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1978913/+attachment/5620269/+files/1978913.patch
2022-10-31 06:15:54 nikhil kshirsagar attachment added LP1978913.patch https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1978913/+attachment/5627950/+files/LP1978913.patch
2022-10-31 06:26:54 nikhil kshirsagar description [Impact] ceph-osd takes all memory at boot [Test Plan] https://tracker.ceph.com/issues/53729 [Where problems could occur] Trimming large clusters could be time-consuming. [Other Info] The fix is for the PGLog to trim duplicates by the number of entries rather than by version, which prevents unbounded duplicate growth. Reported upstream at https://tracker.ceph.com/issues/53729 and fixed on master through https://github.com/ceph/ceph/pull/45529 and on octopus through https://github.com/ceph/ceph/pull/46253 [Impact] ceph-osd takes all memory at boot [Test Plan] https://tracker.ceph.com/issues/53729 [Where problems could occur] Trimming large clusters could be time-consuming. [Other Info] The fix is for the PGLog to trim duplicates by the number of entries rather than by version, which prevents unbounded duplicate growth. Reported upstream at https://tracker.ceph.com/issues/53729 and fixed on master through https://github.com/ceph/ceph/pull/47046
2022-11-01 11:15:54 Dan Hill tags patch seg patch seg sts
2022-11-02 08:02:33 nikhil kshirsagar description [Impact] ceph-osd takes all memory at boot [Test Plan] https://tracker.ceph.com/issues/53729 [Where problems could occur] Trimming large clusters could be time consuming. [Other Info] The way this is fixed is that the PGLog needs to trim duplicates by the number of entries rather than the versions. That way, we prevent unbounded duplicate growth. Reported upstream at https://tracker.ceph.com/issues/53729 and fixed on master through https://github.com/ceph/ceph/pull/47046 [Impact] ceph-osd takes all memory at boot [Test Plan] To see the problem, follow this approach for a test cluster, with for eg. 3 OSDs, #ps -eaf | grep osd root 334891 1 0 Sep21 ? 00:42:03 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf root 335541 1 0 Sep21 ? 00:40:20 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf kill all OSDs, so they're down, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ceph -s 2022-09-22T08:26:15.120+0000 7fa9694fe700 -1 WARNING: all dangerous and experimental features are enabled. 2022-09-22T08:26:15.140+0000 7fa963fff700 -1 WARNING: all dangerous and experimental features are enabled. cluster: id: 9e7c0a82-8072-4c48-b697-1e6399b4fc9e health: HEALTH_WARN 2 osds down 1 host (3 osds) down 1 root (3 osds) down Reduced data availability: 169 pgs stale Degraded data redundancy: 255/765 objects degraded (33.333%), 64 pgs degraded, 169 pgs undersized services: mon: 3 daemons, quorum a,b,c (age 3s) mgr: x(active, since 28h) mds: a:1 {0=a=up:active} osd: 3 osds: 0 up (since 83m), 2 in (since 91m) rgw: 1 daemon active (8000) task status: data: pools: 7 pools, 169 pgs objects: 255 objects, 9.5 KiB usage: 4.1 GiB used, 198 GiB / 202 GiB avail pgs: 255/765 objects degraded (33.333%) 105 stale+active+undersized 64 stale+active+undersized+degraded Then inject dups using this json for all OSDs, root@nikhil-Lenovo-Legion-Y540-15IRH-PG0:/home/nikhil/HDD_MOUNT/Downloads/ceph_build_oct/ceph/build# cat bin/dups.json [ {"reqid": "client.4177.0:0", "version": "3'0", "user_version": "0", "generate": "50000000", "return_code": "0"} ] Use the ceph-objectstore-tool with the --pg-log-inject-dups parameter, to inject dups for all OSDs. 
root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd0/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd2/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e Then set osd debug level to 20 (since here is the log that actually doing the trim: https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138, so need debug_osd = 20) set debug osd=20 in global in ceph.conf, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# cat ceph.conf | grep "debug osd" debug osd=20 Then bring up the OSDs /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 1 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf Run some IO on the OSDs. Wait at least a few hours. Then take the OSDs down (so the command below can be run), and run, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1e --op log > op.log You will see at the end of that output in the file op.log, the number of dups is still as it was when they were injected, (no trimming has taken place) { "reqid": "client.4177.0:0", "version": "3'499999", "user_version": "0", "return_code": "0" }, { "reqid": "client.4177.0:0", <-- note the id (4177) "version": "3'500000", <--- "user_version": "0", "return_code": "0" } ] }, "pg_missing_t": { "missing": [], "may_include_deletes": true } To verify the patch: With the patch in place, once the dups are injected, output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log will again show the dups (this command should be run with the OSDs down, like before). Then bring up the OSDs and start IO using rbd bench-write, leave the IO running a few hours, till these logs (https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138) are seen as below, in the osd logs, with the same client ID (4177 in my example) as the one that the client that injected the dups had used, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build/out# cat osd.1.log | grep -i "trim dup " | grep 4177 | more 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'5 uv=0 rc=0) ... ... 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'52 uv=0 rc=0) # grep -ri "trim dup " *.log | grep 4177 | wc -l 390001 <-- total of all OSDs, should be ~ 3x what is seen in the below output (dups trimmed till 130001) if you have 3 OSDs for eg. Basically this number of trimmed dup logs are from all OSDs combined. 
And the output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log (you would need to take the particular OSD down for verifying this) will show that the first bunch of (130k for eg. here) dups have been trimmed already, see the "version", "dups": [ { "reqid": "client.4177.0:0", "version": "3'130001", <---- "user_version": "0", "return_code": "0" }, { "reqid": "client.4177.0:0", "version": "3'130002", "user_version": "0", "return_code": "0" }, This will verify that the dups are being trimmed by the patch, and it is working correctly. And of course, OSDs should not go OOM at boot time! [Where problems could occur] This is not a clean cherry-pick due to some differences in the octopus and master codebases, related to RocksDBStore and Objectstore. (see https://github.com/ceph/ceph/pull/47046#issuecomment-1243252126). Also, an earlier attempt to fix this issue upstream was reverted, as discussed at https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1978913/comments/1 While this fix has been tested and validated after building it into the upstream 15.2.17 release (please see the [Test Plan] section), we would still need to proceed with extreme caution by allowing some time for problems (if any) to surface before going ahead with this SRU, and running our QA tests on the packages that build this fix into the 15.2.17 release before releasing it to the customer who await this fix on octopus. [Other Info] The way this is fixed is that the PGLog needs to trim duplicates by the number of entries rather than the versions. That way, we prevent unbounded duplicate growth. Reported upstream at https://tracker.ceph.com/issues/53729 and fixed on master through https://github.com/ceph/ceph/pull/47046
2022-11-02 08:03:35 nikhil kshirsagar description [Impact] ceph-osd takes all memory at boot [Test Plan] To see the problem, follow this approach for a test cluster, with for eg. 3 OSDs, #ps -eaf | grep osd root 334891 1 0 Sep21 ? 00:42:03 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf root 335541 1 0 Sep21 ? 00:40:20 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf kill all OSDs, so they're down, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ceph -s 2022-09-22T08:26:15.120+0000 7fa9694fe700 -1 WARNING: all dangerous and experimental features are enabled. 2022-09-22T08:26:15.140+0000 7fa963fff700 -1 WARNING: all dangerous and experimental features are enabled. cluster: id: 9e7c0a82-8072-4c48-b697-1e6399b4fc9e health: HEALTH_WARN 2 osds down 1 host (3 osds) down 1 root (3 osds) down Reduced data availability: 169 pgs stale Degraded data redundancy: 255/765 objects degraded (33.333%), 64 pgs degraded, 169 pgs undersized services: mon: 3 daemons, quorum a,b,c (age 3s) mgr: x(active, since 28h) mds: a:1 {0=a=up:active} osd: 3 osds: 0 up (since 83m), 2 in (since 91m) rgw: 1 daemon active (8000) task status: data: pools: 7 pools, 169 pgs objects: 255 objects, 9.5 KiB usage: 4.1 GiB used, 198 GiB / 202 GiB avail pgs: 255/765 objects degraded (33.333%) 105 stale+active+undersized 64 stale+active+undersized+degraded Then inject dups using this json for all OSDs, root@nikhil-Lenovo-Legion-Y540-15IRH-PG0:/home/nikhil/HDD_MOUNT/Downloads/ceph_build_oct/ceph/build# cat bin/dups.json [ {"reqid": "client.4177.0:0", "version": "3'0", "user_version": "0", "generate": "50000000", "return_code": "0"} ] Use the ceph-objectstore-tool with the --pg-log-inject-dups parameter, to inject dups for all OSDs. root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd0/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd2/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e Then set osd debug level to 20 (since here is the log that actually doing the trim: https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138, so need debug_osd = 20) set debug osd=20 in global in ceph.conf, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# cat ceph.conf | grep "debug osd" debug osd=20 Then bring up the OSDs /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 1 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf Run some IO on the OSDs. Wait at least a few hours. 
Then take the OSDs down (so the command below can be run), and run, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1e --op log > op.log You will see at the end of that output in the file op.log, the number of dups is still as it was when they were injected, (no trimming has taken place) { "reqid": "client.4177.0:0", "version": "3'499999", "user_version": "0", "return_code": "0" }, { "reqid": "client.4177.0:0", <-- note the id (4177) "version": "3'500000", <--- "user_version": "0", "return_code": "0" } ] }, "pg_missing_t": { "missing": [], "may_include_deletes": true } To verify the patch: With the patch in place, once the dups are injected, output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log will again show the dups (this command should be run with the OSDs down, like before). Then bring up the OSDs and start IO using rbd bench-write, leave the IO running a few hours, till these logs (https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138) are seen as below, in the osd logs, with the same client ID (4177 in my example) as the one that the client that injected the dups had used, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build/out# cat osd.1.log | grep -i "trim dup " | grep 4177 | more 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'5 uv=0 rc=0) ... ... 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'52 uv=0 rc=0) # grep -ri "trim dup " *.log | grep 4177 | wc -l 390001 <-- total of all OSDs, should be ~ 3x what is seen in the below output (dups trimmed till 130001) if you have 3 OSDs for eg. Basically this number of trimmed dup logs are from all OSDs combined. And the output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log (you would need to take the particular OSD down for verifying this) will show that the first bunch of (130k for eg. here) dups have been trimmed already, see the "version", "dups": [ { "reqid": "client.4177.0:0", "version": "3'130001", <---- "user_version": "0", "return_code": "0" }, { "reqid": "client.4177.0:0", "version": "3'130002", "user_version": "0", "return_code": "0" }, This will verify that the dups are being trimmed by the patch, and it is working correctly. And of course, OSDs should not go OOM at boot time! [Where problems could occur] This is not a clean cherry-pick due to some differences in the octopus and master codebases, related to RocksDBStore and Objectstore. (see https://github.com/ceph/ceph/pull/47046#issuecomment-1243252126). Also, an earlier attempt to fix this issue upstream was reverted, as discussed at https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1978913/comments/1 While this fix has been tested and validated after building it into the upstream 15.2.17 release (please see the [Test Plan] section), we would still need to proceed with extreme caution by allowing some time for problems (if any) to surface before going ahead with this SRU, and running our QA tests on the packages that build this fix into the 15.2.17 release before releasing it to the customer who await this fix on octopus. [Other Info] The way this is fixed is that the PGLog needs to trim duplicates by the number of entries rather than the versions. That way, we prevent unbounded duplicate growth. 
Reported upstream at https://tracker.ceph.com/issues/53729 and fixed on master through https://github.com/ceph/ceph/pull/47046 [Impact] ceph-osd takes all memory at boot [Test Plan] To see the problem, follow this approach for a test cluster, with for eg. 3 OSDs, #ps -eaf | grep osd root 334891 1 0 Sep21 ? 00:42:03 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf root 335541 1 0 Sep21 ? 00:40:20 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf kill all OSDs, so they're down, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ceph -s 2022-09-22T08:26:15.120+0000 7fa9694fe700 -1 WARNING: all dangerous and experimental features are enabled. 2022-09-22T08:26:15.140+0000 7fa963fff700 -1 WARNING: all dangerous and experimental features are enabled.   cluster:     id: 9e7c0a82-8072-4c48-b697-1e6399b4fc9e     health: HEALTH_WARN             2 osds down             1 host (3 osds) down             1 root (3 osds) down             Reduced data availability: 169 pgs stale             Degraded data redundancy: 255/765 objects degraded (33.333%), 64 pgs degraded, 169 pgs undersized   services:     mon: 3 daemons, quorum a,b,c (age 3s)     mgr: x(active, since 28h)     mds: a:1 {0=a=up:active}     osd: 3 osds: 0 up (since 83m), 2 in (since 91m)     rgw: 1 daemon active (8000)   task status:   data:     pools: 7 pools, 169 pgs     objects: 255 objects, 9.5 KiB     usage: 4.1 GiB used, 198 GiB / 202 GiB avail     pgs: 255/765 objects degraded (33.333%)              105 stale+active+undersized              64 stale+active+undersized+degraded Then inject dups using this json for all OSDs, root@nikhil-Lenovo-Legion-Y540-15IRH-PG0:/home/nikhil/HDD_MOUNT/Downloads/ceph_build_oct/ceph/build# cat bin/dups.json [  {"reqid": "client.4177.0:0",  "version": "3'0",  "user_version": "0",  "generate": "500000",  "return_code": "0"} ] Use the ceph-objectstore-tool with the --pg-log-inject-dups parameter, to inject dups for all OSDs. root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd0/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd2/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e Then set osd debug level to 20 (since here is the log that actually doing the trim: https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138, so need debug_osd = 20) set debug osd=20 in global in ceph.conf, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# cat ceph.conf | grep "debug osd"         debug osd=20 Then bring up the OSDs /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 1 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf Run some IO on the OSDs. 
Wait at least a few hours. Then take the OSDs down (so the command below can be run), and run, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1e --op log > op.log You will see at the end of that output in the file op.log, the number of dups is still as it was when they were injected, (no trimming has taken place)             {                 "reqid": "client.4177.0:0",                 "version": "3'499999",                 "user_version": "0",                 "return_code": "0"             },             {                 "reqid": "client.4177.0:0", <-- note the id (4177)                 "version": "3'500000", <---                 "user_version": "0",                 "return_code": "0"             }         ]     },     "pg_missing_t": {         "missing": [],         "may_include_deletes": true     } To verify the patch: With the patch in place, once the dups are injected, output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log will again show the dups (this command should be run with the OSDs down, like before). Then bring up the OSDs and start IO using rbd bench-write, leave the IO running a few hours, till these logs (https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138) are seen as below, in the osd logs, with the same client ID (4177 in my example) as the one that the client that injected the dups had used, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build/out# cat osd.1.log | grep -i "trim dup " | grep 4177 | more 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'5 uv=0 rc=0) ... ... 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'52 uv=0 rc=0) # grep -ri "trim dup " *.log | grep 4177 | wc -l 390001 <-- total of all OSDs, should be ~ 3x what is seen in the below output (dups trimmed till 130001) if you have 3 OSDs for eg. Basically this number of trimmed dup logs are from all OSDs combined. And the output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log (you would need to take the particular OSD down for verifying this) will show that the first bunch of (130k for eg. here) dups have been trimmed already, see the "version",  "dups": [             {                 "reqid": "client.4177.0:0",                 "version": "3'130001", <----                 "user_version": "0",                 "return_code": "0"             },             {                 "reqid": "client.4177.0:0",                 "version": "3'130002",                 "user_version": "0",                 "return_code": "0"             }, This will verify that the dups are being trimmed by the patch, and it is working correctly. And of course, OSDs should not go OOM at boot time! [Where problems could occur] This is not a clean cherry-pick due to some differences in the octopus and master codebases, related to RocksDBStore and Objectstore. (see https://github.com/ceph/ceph/pull/47046#issuecomment-1243252126). 
Also, an earlier attempt to fix this issue upstream was reverted, as discussed at https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1978913/comments/1 While this fix has been tested and validated after building it into the upstream 15.2.17 release (please see the [Test Plan] section), we would still need to proceed with extreme caution by allowing some time for problems (if any) to surface before going ahead with this SRU, and running our QA tests on the packages that build this fix into the 15.2.17 release before releasing it to the customer who await this fix on octopus. [Other Info] The way this is fixed is that the PGLog needs to trim duplicates by the number of entries rather than the versions. That way, we prevent unbounded duplicate growth. Reported upstream at https://tracker.ceph.com/issues/53729 and fixed on master through https://github.com/ceph/ceph/pull/47046
2022-11-02 08:04:56 nikhil kshirsagar description [Impact] ceph-osd takes all memory at boot [Test Plan] To see the problem, follow this approach for a test cluster, with for eg. 3 OSDs, #ps -eaf | grep osd root 334891 1 0 Sep21 ? 00:42:03 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf root 335541 1 0 Sep21 ? 00:40:20 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf kill all OSDs, so they're down, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ceph -s 2022-09-22T08:26:15.120+0000 7fa9694fe700 -1 WARNING: all dangerous and experimental features are enabled. 2022-09-22T08:26:15.140+0000 7fa963fff700 -1 WARNING: all dangerous and experimental features are enabled.   cluster:     id: 9e7c0a82-8072-4c48-b697-1e6399b4fc9e     health: HEALTH_WARN             2 osds down             1 host (3 osds) down             1 root (3 osds) down             Reduced data availability: 169 pgs stale             Degraded data redundancy: 255/765 objects degraded (33.333%), 64 pgs degraded, 169 pgs undersized   services:     mon: 3 daemons, quorum a,b,c (age 3s)     mgr: x(active, since 28h)     mds: a:1 {0=a=up:active}     osd: 3 osds: 0 up (since 83m), 2 in (since 91m)     rgw: 1 daemon active (8000)   task status:   data:     pools: 7 pools, 169 pgs     objects: 255 objects, 9.5 KiB     usage: 4.1 GiB used, 198 GiB / 202 GiB avail     pgs: 255/765 objects degraded (33.333%)              105 stale+active+undersized              64 stale+active+undersized+degraded Then inject dups using this json for all OSDs, root@nikhil-Lenovo-Legion-Y540-15IRH-PG0:/home/nikhil/HDD_MOUNT/Downloads/ceph_build_oct/ceph/build# cat bin/dups.json [  {"reqid": "client.4177.0:0",  "version": "3'0",  "user_version": "0",  "generate": "500000",  "return_code": "0"} ] Use the ceph-objectstore-tool with the --pg-log-inject-dups parameter, to inject dups for all OSDs. root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd0/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd2/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e Then set osd debug level to 20 (since here is the log that actually doing the trim: https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138, so need debug_osd = 20) set debug osd=20 in global in ceph.conf, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# cat ceph.conf | grep "debug osd"         debug osd=20 Then bring up the OSDs /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 1 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf Run some IO on the OSDs. Wait at least a few hours. 
Then take the OSDs down (so the command below can be run), and run, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1e --op log > op.log You will see at the end of that output in the file op.log, the number of dups is still as it was when they were injected, (no trimming has taken place)             {                 "reqid": "client.4177.0:0",                 "version": "3'499999",                 "user_version": "0",                 "return_code": "0"             },             {                 "reqid": "client.4177.0:0", <-- note the id (4177)                 "version": "3'500000", <---                 "user_version": "0",                 "return_code": "0"             }         ]     },     "pg_missing_t": {         "missing": [],         "may_include_deletes": true     } To verify the patch: With the patch in place, once the dups are injected, output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log will again show the dups (this command should be run with the OSDs down, like before). Then bring up the OSDs and start IO using rbd bench-write, leave the IO running a few hours, till these logs (https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138) are seen as below, in the osd logs, with the same client ID (4177 in my example) as the one that the client that injected the dups had used, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build/out# cat osd.1.log | grep -i "trim dup " | grep 4177 | more 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'5 uv=0 rc=0) ... ... 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'52 uv=0 rc=0) # grep -ri "trim dup " *.log | grep 4177 | wc -l 390001 <-- total of all OSDs, should be ~ 3x what is seen in the below output (dups trimmed till 130001) if you have 3 OSDs for eg. Basically this number of trimmed dup logs are from all OSDs combined. And the output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log (you would need to take the particular OSD down for verifying this) will show that the first bunch of (130k for eg. here) dups have been trimmed already, see the "version",  "dups": [             {                 "reqid": "client.4177.0:0",                 "version": "3'130001", <----                 "user_version": "0",                 "return_code": "0"             },             {                 "reqid": "client.4177.0:0",                 "version": "3'130002",                 "user_version": "0",                 "return_code": "0"             }, This will verify that the dups are being trimmed by the patch, and it is working correctly. And of course, OSDs should not go OOM at boot time! [Where problems could occur] This is not a clean cherry-pick due to some differences in the octopus and master codebases, related to RocksDBStore and Objectstore. (see https://github.com/ceph/ceph/pull/47046#issuecomment-1243252126). 
Also, an earlier attempt to fix this issue upstream was reverted, as discussed at https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1978913/comments/1 While this fix has been tested and validated after building it into the upstream 15.2.17 release (please see the [Test Plan] section), we would still need to proceed with extreme caution by allowing some time for problems (if any) to surface before going ahead with this SRU, and running our QA tests on the packages that build this fix into the 15.2.17 release before releasing it to the customer who await this fix on octopus. [Other Info] The way this is fixed is that the PGLog needs to trim duplicates by the number of entries rather than the versions. That way, we prevent unbounded duplicate growth. Reported upstream at https://tracker.ceph.com/issues/53729 and fixed on master through https://github.com/ceph/ceph/pull/47046 [Impact] ceph-osd takes all memory at boot [Test Plan] To see the problem, follow this approach for a test cluster, with for eg. 3 OSDs, #ps -eaf | grep osd root 334891 1 0 Sep21 ? 00:42:03 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf root 335541 1 0 Sep21 ? 00:40:20 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf kill all OSDs, so they're down, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ceph -s 2022-09-22T08:26:15.120+0000 7fa9694fe700 -1 WARNING: all dangerous and experimental features are enabled. 2022-09-22T08:26:15.140+0000 7fa963fff700 -1 WARNING: all dangerous and experimental features are enabled.   cluster:     id: 9e7c0a82-8072-4c48-b697-1e6399b4fc9e     health: HEALTH_WARN             2 osds down             1 host (3 osds) down             1 root (3 osds) down             Reduced data availability: 169 pgs stale             Degraded data redundancy: 255/765 objects degraded (33.333%), 64 pgs degraded, 169 pgs undersized   services:     mon: 3 daemons, quorum a,b,c (age 3s)     mgr: x(active, since 28h)     mds: a:1 {0=a=up:active}     osd: 3 osds: 0 up (since 83m), 2 in (since 91m)     rgw: 1 daemon active (8000)   task status:   data:     pools: 7 pools, 169 pgs     objects: 255 objects, 9.5 KiB     usage: 4.1 GiB used, 198 GiB / 202 GiB avail     pgs: 255/765 objects degraded (33.333%)              105 stale+active+undersized              64 stale+active+undersized+degraded Then inject dups using this json for all OSDs, root@nikhil-Lenovo-Legion-Y540-15IRH-PG0:/home/nikhil/HDD_MOUNT/Downloads/ceph_build_oct/ceph/build# cat bin/dups.json [  {"reqid": "client.4177.0:0",  "version": "3'0",  "user_version": "0",  "generate": "500000",  "return_code": "0"} ] Use the ceph-objectstore-tool with the --pg-log-inject-dups parameter, to inject dups for all OSDs. 
root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd0/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd2/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e Then set osd debug level to 20 (since here is the log that actually doing the trim: https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138, so need debug_osd = 20) set debug osd=20 in global in ceph.conf, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# cat ceph.conf | grep "debug osd"         debug osd=20 Then bring up the OSDs /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 1 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf Run some IO on the OSDs. Wait at least a few hours. Then take the OSDs down (so the command below can be run), and run, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1e --op log > op.log You will see at the end of that output in the file op.log, the number of dups is still as it was when they were injected, (no trimming has taken place)             {                 "reqid": "client.4177.0:0",                 "version": "3'499999",                 "user_version": "0",                 "return_code": "0"             },             {                 "reqid": "client.4177.0:0", <-- note the id (4177)                 "version": "3'500000", <---                 "user_version": "0",                 "return_code": "0"             }         ]     },     "pg_missing_t": {         "missing": [],         "may_include_deletes": true     } To verify the patch: With the patch in place, once the dups are injected, output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log will again show the dups (this command should be run with the OSDs down, like before). Then bring up the OSDs and start IO using rbd bench-write, leave the IO running a few hours, till these logs (https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138) are seen as below, in the osd logs, with the same client ID (4177 in my example) as the one that the client that injected the dups had used, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build/out# cat osd.1.log | grep -i "trim dup " | grep 4177 | more 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'5 uv=0 rc=0) ... ... 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'52 uv=0 rc=0) # grep -ri "trim dup " *.log | grep 4177 | wc -l 390001 <-- total of all OSDs, should be ~ 3x what is seen in the below output (dups trimmed till 130001) if you have 3 OSDs for eg. 
Basically this number of trimmed dup logs are from all OSDs combined. And the output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log (you would need to take the particular OSD down for verifying this) will show that the first bunch of (130k for eg. here) dups have been trimmed already, see the "version", which starts with the figure 3'130001 instead of 0 now,  "dups": [             {                 "reqid": "client.4177.0:0",                 "version": "3'130001", <----                 "user_version": "0",                 "return_code": "0"             },             {                 "reqid": "client.4177.0:0",                 "version": "3'130002",                 "user_version": "0",                 "return_code": "0"             }, This will verify that the dups are being trimmed by the patch, and it is working correctly. And of course, OSDs should not go OOM at boot time! [Where problems could occur] This is not a clean cherry-pick due to some differences in the octopus and master codebases, related to RocksDBStore and Objectstore. (see https://github.com/ceph/ceph/pull/47046#issuecomment-1243252126). Also, an earlier attempt to fix this issue upstream was reverted, as discussed at https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1978913/comments/1 While this fix has been tested and validated after building it into the upstream 15.2.17 release (please see the [Test Plan] section), we would still need to proceed with extreme caution by allowing some time for problems (if any) to surface before going ahead with this SRU, and running our QA tests on the packages that build this fix into the 15.2.17 release before releasing it to the customer who await this fix on octopus. [Other Info] The way this is fixed is that the PGLog needs to trim duplicates by the number of entries rather than the versions. That way, we prevent unbounded duplicate growth. Reported upstream at https://tracker.ceph.com/issues/53729 and fixed on master through https://github.com/ceph/ceph/pull/47046
2022-11-02 08:14:47 nikhil kshirsagar description [Impact] ceph-osd takes all memory at boot [Test Plan] To see the problem, follow this approach for a test cluster, with for eg. 3 OSDs, #ps -eaf | grep osd root 334891 1 0 Sep21 ? 00:42:03 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf root 335541 1 0 Sep21 ? 00:40:20 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf kill all OSDs, so they're down, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ceph -s 2022-09-22T08:26:15.120+0000 7fa9694fe700 -1 WARNING: all dangerous and experimental features are enabled. 2022-09-22T08:26:15.140+0000 7fa963fff700 -1 WARNING: all dangerous and experimental features are enabled.   cluster:     id: 9e7c0a82-8072-4c48-b697-1e6399b4fc9e     health: HEALTH_WARN             2 osds down             1 host (3 osds) down             1 root (3 osds) down             Reduced data availability: 169 pgs stale             Degraded data redundancy: 255/765 objects degraded (33.333%), 64 pgs degraded, 169 pgs undersized   services:     mon: 3 daemons, quorum a,b,c (age 3s)     mgr: x(active, since 28h)     mds: a:1 {0=a=up:active}     osd: 3 osds: 0 up (since 83m), 2 in (since 91m)     rgw: 1 daemon active (8000)   task status:   data:     pools: 7 pools, 169 pgs     objects: 255 objects, 9.5 KiB     usage: 4.1 GiB used, 198 GiB / 202 GiB avail     pgs: 255/765 objects degraded (33.333%)              105 stale+active+undersized              64 stale+active+undersized+degraded Then inject dups using this json for all OSDs, root@nikhil-Lenovo-Legion-Y540-15IRH-PG0:/home/nikhil/HDD_MOUNT/Downloads/ceph_build_oct/ceph/build# cat bin/dups.json [  {"reqid": "client.4177.0:0",  "version": "3'0",  "user_version": "0",  "generate": "500000",  "return_code": "0"} ] Use the ceph-objectstore-tool with the --pg-log-inject-dups parameter, to inject dups for all OSDs. root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd0/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd2/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e Then set osd debug level to 20 (since here is the log that actually doing the trim: https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138, so need debug_osd = 20) set debug osd=20 in global in ceph.conf, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# cat ceph.conf | grep "debug osd"         debug osd=20 Then bring up the OSDs /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 1 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf Run some IO on the OSDs. Wait at least a few hours. 
Then take the OSDs down (so the command below can be run), and run, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1e --op log > op.log You will see at the end of that output in the file op.log, the number of dups is still as it was when they were injected, (no trimming has taken place)             {                 "reqid": "client.4177.0:0",                 "version": "3'499999",                 "user_version": "0",                 "return_code": "0"             },             {                 "reqid": "client.4177.0:0", <-- note the id (4177)                 "version": "3'500000", <---                 "user_version": "0",                 "return_code": "0"             }         ]     },     "pg_missing_t": {         "missing": [],         "may_include_deletes": true     } To verify the patch: With the patch in place, once the dups are injected, output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log will again show the dups (this command should be run with the OSDs down, like before). Then bring up the OSDs and start IO using rbd bench-write, leave the IO running a few hours, till these logs (https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138) are seen as below, in the osd logs, with the same client ID (4177 in my example) as the one that the client that injected the dups had used, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build/out# cat osd.1.log | grep -i "trim dup " | grep 4177 | more 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'5 uv=0 rc=0) ... ... 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'52 uv=0 rc=0) # grep -ri "trim dup " *.log | grep 4177 | wc -l 390001 <-- total of all OSDs, should be ~ 3x what is seen in the below output (dups trimmed till 130001) if you have 3 OSDs for eg. Basically this number of trimmed dup logs are from all OSDs combined. And the output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log (you would need to take the particular OSD down for verifying this) will show that the first bunch of (130k for eg. here) dups have been trimmed already, see the "version", which starts with the figure 3'130001 instead of 0 now,  "dups": [             {                 "reqid": "client.4177.0:0",                 "version": "3'130001", <----                 "user_version": "0",                 "return_code": "0"             },             {                 "reqid": "client.4177.0:0",                 "version": "3'130002",                 "user_version": "0",                 "return_code": "0"             }, This will verify that the dups are being trimmed by the patch, and it is working correctly. And of course, OSDs should not go OOM at boot time! [Where problems could occur] This is not a clean cherry-pick due to some differences in the octopus and master codebases, related to RocksDBStore and Objectstore. (see https://github.com/ceph/ceph/pull/47046#issuecomment-1243252126). 
Also, an earlier attempt to fix this issue upstream was reverted, as discussed at https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1978913/comments/1. While this fix has been tested and validated after building it into the upstream 15.2.17 release (please see the [Test Plan] section), we would still need to proceed with extreme caution by allowing some time for problems (if any) to surface before going ahead with this SRU, and by running our QA tests on the packages that build this fix into the 15.2.17 release before releasing it to the customers who await this fix on octopus. [Other Info] The fix is for the PGLog to trim duplicates by the number of entries rather than by version, which prevents unbounded duplicate growth. Reported upstream at https://tracker.ceph.com/issues/53729 and fixed on master through https://github.com/ceph/ceph/pull/47046 [Impact] The OSD fails to trim pg log dup entries, which can result in millions of dup entries for a PG when there should be at most 3000 (controlled by the option osd_pg_log_dups_tracked). This can cause the OSD to run out of memory and crash, and it might not be able to start up again because it has to load millions of dup entries. This can happen to multiple OSDs at the same time (as also reported by many community users), so we may end up with a completely unusable cluster if we hit this issue. The currently known trigger for this problem is pg split, as the whole dup list is copied during the split. The reason this was not observed as often before is that pg autoscaling wasn't turned on by default; it is on by default since Octopus. Note that there is also no way to check the number of dups in a PG online. [Test Plan] To see the problem, follow this approach on a test cluster with, for example, 3 OSDs: #ps -eaf | grep osd root 334891 1 0 Sep21 ? 00:42:03 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf root 335541 1 0 Sep21 ? 00:40:20 /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf Kill all OSDs, so they're down: root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ceph -s 2022-09-22T08:26:15.120+0000 7fa9694fe700 -1 WARNING: all dangerous and experimental features are enabled. 2022-09-22T08:26:15.140+0000 7fa963fff700 -1 WARNING: all dangerous and experimental features are enabled.
cluster:     id: 9e7c0a82-8072-4c48-b697-1e6399b4fc9e     health: HEALTH_WARN             2 osds down             1 host (3 osds) down             1 root (3 osds) down             Reduced data availability: 169 pgs stale             Degraded data redundancy: 255/765 objects degraded (33.333%), 64 pgs degraded, 169 pgs undersized   services:     mon: 3 daemons, quorum a,b,c (age 3s)     mgr: x(active, since 28h)     mds: a:1 {0=a=up:active}     osd: 3 osds: 0 up (since 83m), 2 in (since 91m)     rgw: 1 daemon active (8000)   task status:   data:     pools: 7 pools, 169 pgs     objects: 255 objects, 9.5 KiB     usage: 4.1 GiB used, 198 GiB / 202 GiB avail     pgs: 255/765 objects degraded (33.333%)              105 stale+active+undersized              64 stale+active+undersized+degraded Then inject dups using this json for all OSDs, root@nikhil-Lenovo-Legion-Y540-15IRH-PG0:/home/nikhil/HDD_MOUNT/Downloads/ceph_build_oct/ceph/build# cat bin/dups.json [  {"reqid": "client.4177.0:0",  "version": "3'0",  "user_version": "0",  "generate": "500000",  "return_code": "0"} ] Use the ceph-objectstore-tool with the --pg-log-inject-dups parameter, to inject dups for all OSDs. root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd0/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd2/ --op pg-log-inject-dups --file bin/dups.json --no-mon-config --pgid 2.1e Then set osd debug level to 20 (since here is the log that actually doing the trim: https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138, so need debug_osd = 20) set debug osd=20 in global in ceph.conf, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# cat ceph.conf | grep "debug osd"         debug osd=20 Then bring up the OSDs /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 0 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 1 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf /home/nikhil/Downloads/ceph_build_oct/ceph/build/bin/ceph-osd -i 2 -c /home/nikhil/Downloads/ceph_build_oct/ceph/build/ceph.conf Run some IO on the OSDs. Wait at least a few hours. 
Then take the OSDs down (so the command below can be run), and run, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build# ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1e --op log > op.log You will see at the end of that output in the file op.log, the number of dups is still as it was when they were injected, (no trimming has taken place)             {                 "reqid": "client.4177.0:0",                 "version": "3'499999",                 "user_version": "0",                 "return_code": "0"             },             {                 "reqid": "client.4177.0:0", <-- note the id (4177)                 "version": "3'500000", <---                 "user_version": "0",                 "return_code": "0"             }         ]     },     "pg_missing_t": {         "missing": [],         "may_include_deletes": true     } To verify the patch: With the patch in place, once the dups are injected, output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log will again show the dups (this command should be run with the OSDs down, like before). Then bring up the OSDs and start IO using rbd bench-write, leave the IO running a few hours, till these logs (https://github.com/ceph/ceph/pull/47046/commits/aada08acde7a05ad769bb7a886ebcece628d522c#diff-b293fb673637ea53b5874bbb04f8f0638ca39cab009610e2cbc40a867bca4906L138) are seen as below, in the osd logs, with the same client ID (4177 in my example) as the one that the client that injected the dups had used, root@focal-new:/home/nikhil/Downloads/ceph_build_oct/ceph/build/out# cat osd.1.log | grep -i "trim dup " | grep 4177 | more 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'5 uv=0 rc=0) ... ... 2022-09-26T10:30:53.125+0000 7fdb72741700 1 trim dup log_dup(reqid=client.4177.0:0 v=3'52 uv=0 rc=0) # grep -ri "trim dup " *.log | grep 4177 | wc -l 390001 <-- total of all OSDs, should be ~ 3x what is seen in the below output (dups trimmed till 130001) if you have 3 OSDs for eg. Basically this number of trimmed dup logs are from all OSDs combined. And the output of ./bin/ceph-objectstore-tool --data-path dev/osd1/ --no-mon-config --pgid 2.1f --op log (you would need to take the particular OSD down for verifying this) will show that the first bunch of (130k for eg. here) dups have been trimmed already, see the "version", which starts with the figure 3'130001 instead of 0 now,  "dups": [             {                 "reqid": "client.4177.0:0",                 "version": "3'130001", <----                 "user_version": "0",                 "return_code": "0"             },             {                 "reqid": "client.4177.0:0",                 "version": "3'130002",                 "user_version": "0",                 "return_code": "0"             }, This will verify that the dups are being trimmed by the patch, and it is working correctly. And of course, OSDs should not go OOM at boot time! [Where problems could occur] This is not a clean cherry-pick due to some differences in the octopus and master codebases, related to RocksDBStore and Objectstore. (see https://github.com/ceph/ceph/pull/47046#issuecomment-1243252126). 
Also, an earlier attempt to fix this issue upstream was reverted, as discussed at https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1978913/comments/1. While this fix has been tested and validated after building it into the upstream 15.2.17 release (please see the [Test Plan] section), we would still need to proceed with extreme caution by allowing some time for problems (if any) to surface before going ahead with this SRU, and by running our QA tests on the packages that build this fix into the 15.2.17 release before releasing it to the customers who await this fix on octopus. [Other Info] The fix is for the PGLog to trim duplicates by the number of entries rather than by version, which prevents unbounded duplicate growth. Reported upstream at https://tracker.ceph.com/issues/53729 and fixed on master through https://github.com/ceph/ceph/pull/47046
2023-01-27 16:36:54 Ponnuvel Palaniyappan bug watch added http://tracker.ceph.com/issues/53729
2023-03-06 17:21:40 James Page ceph (Ubuntu): status New Fix Released
2023-03-06 17:21:51 James Page ceph (Ubuntu Jammy): status New Invalid
2023-03-06 17:22:00 James Page ceph (Ubuntu Kinetic): status New Invalid
2023-03-06 18:00:53 James Page cloud-archive/yoga: status New Invalid
2023-03-06 18:01:03 James Page cloud-archive/wallaby: status New Invalid
2023-03-06 18:01:13 James Page cloud-archive/xena: status New Invalid
2023-03-06 18:01:37 James Page cloud-archive: status New Invalid
2023-03-06 18:02:53 James Page bug added subscriber Ubuntu Stable Release Updates Team
2023-03-06 18:03:53 James Page ceph (Ubuntu Focal): importance Undecided High
2023-03-06 18:04:01 James Page ceph (Ubuntu Focal): status New In Progress
2023-03-06 18:04:12 James Page cloud-archive/ussuri: status New In Progress
2023-03-06 18:04:22 James Page cloud-archive/ussuri: importance Undecided High
2023-03-17 19:50:34 Steve Langasek ceph (Ubuntu Focal): status In Progress Fix Committed
2023-03-17 19:50:38 Steve Langasek bug added subscriber SRU Verification
2023-03-17 19:50:41 Steve Langasek tags patch seg sts patch seg sts verification-needed verification-needed-focal
2023-03-22 13:11:36 Corey Bryant cloud-archive/ussuri: status In Progress Fix Committed
2023-03-22 13:11:37 Corey Bryant tags patch seg sts verification-needed verification-needed-focal patch seg sts verification-needed verification-needed-focal verification-ussuri-needed
2023-03-24 08:37:09 nikhil kshirsagar tags patch seg sts verification-needed verification-needed-focal verification-ussuri-needed patch seg sts verification-failed-focal verification-needed verification-ussuri-needed
2023-04-12 00:04:20 Steve Langasek ceph (Ubuntu Focal): status Fix Committed Confirmed
2023-04-12 00:04:28 Steve Langasek removed subscriber Ubuntu Stable Release Updates Team
2023-04-12 00:04:30 Steve Langasek removed subscriber SRU Verification
2023-04-12 00:04:31 Steve Langasek tags patch seg sts verification-failed-focal verification-needed verification-ussuri-needed patch seg sts verification-failed-focal verification-ussuri-needed
2023-04-25 09:43:43 nikhil kshirsagar tags patch seg sts verification-failed-focal verification-ussuri-needed patch seg sts verification-done-focal verification-ussuri-needed
2023-04-25 09:47:56 nikhil kshirsagar tags patch seg sts verification-done-focal verification-ussuri-needed patch seg sts verification-done-focal verification-ussuri-done
2023-04-25 12:09:40 nikhil kshirsagar bug added subscriber Ubuntu Sponsors Team
2023-04-26 22:53:22 Dan Hill bug added subscriber Dan Hill
2023-05-10 13:04:21 Mauricio Faria de Oliveira tags patch seg sts verification-done-focal verification-ussuri-done patch seg sts
2023-05-10 13:04:34 Mauricio Faria de Oliveira tags patch seg sts patch se-sponsor-mfo seg sts
2023-05-10 13:05:31 Mauricio Faria de Oliveira attachment added lp1978913-focal-ceph-v4.debdiff https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1978913/+attachment/5672182/+files/lp1978913-focal-ceph-v4.debdiff
2023-05-10 13:10:50 Mauricio Faria de Oliveira ceph (Ubuntu Focal): status Confirmed In Progress
2023-05-10 13:10:50 Mauricio Faria de Oliveira ceph (Ubuntu Focal): assignee nikhil kshirsagar (nkshirsagar)
2023-05-10 13:11:04 Mauricio Faria de Oliveira bug added subscriber Mauricio Faria de Oliveira
2023-05-10 13:11:18 Mauricio Faria de Oliveira bug added subscriber SE ("STS") Sponsors
2023-05-10 13:35:59 Robie Basak ceph (Ubuntu Focal): status In Progress Fix Committed
2023-05-10 13:36:00 Robie Basak bug added subscriber Ubuntu Stable Release Updates Team
2023-05-10 13:36:07 Robie Basak bug added subscriber SRU Verification
2023-05-10 13:36:10 Robie Basak tags patch se-sponsor-mfo seg sts patch se-sponsor-mfo seg sts verification-needed verification-needed-focal
2023-05-11 09:10:50 nikhil kshirsagar tags patch se-sponsor-mfo seg sts verification-needed verification-needed-focal patch se-sponsor-mfo seg sts verification-done-focal
2023-05-11 10:07:32 Robie Basak removed subscriber Ubuntu Sponsors Team
2023-05-16 13:32:26 Corey Bryant tags patch se-sponsor-mfo seg sts verification-done-focal patch se-sponsor-mfo seg sts verification-done-focal verification-ussuri-needed
2023-05-17 11:56:15 nikhil kshirsagar tags patch se-sponsor-mfo seg sts verification-done-focal verification-ussuri-needed patch se-sponsor-mfo seg sts verification-done-focal verification-ussuri-done
2023-05-18 18:39:09 Launchpad Janitor ceph (Ubuntu Focal): status Fix Committed Fix Released
2023-05-18 18:39:19 Andreas Hasenack removed subscriber Ubuntu Stable Release Updates Team
2023-09-14 20:24:57 Dan Hill ceph (Ubuntu Bionic): status New Invalid
2023-09-14 20:25:08 Dan Hill cloud-archive/queens: status New Invalid
2023-10-07 19:54:10 Mauricio Faria de Oliveira cloud-archive/ussuri: status Fix Committed Fix Released
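
As context for the [Other Info] fix description recorded in the bug descriptions above (trim dup entries by count rather than by version), the following is a minimal illustrative sketch of that idea. It is not Ceph's actual PGLog code: the data layout and helper name are hypothetical, and only the osd_pg_log_dups_tracked default of 3000 and the 500000-entry injection figure come from the bug description.

# Illustrative sketch only (assumption): not Ceph code. It shows why trimming
# by entry count bounds the dup list even when versions are far apart.

OSD_PG_LOG_DUPS_TRACKED = 3000  # Ceph option cited in the [Impact] section

def trim_dups_by_entry_count(dups, tracked=OSD_PG_LOG_DUPS_TRACKED):
    # Keep only the newest `tracked` dup entries, regardless of their versions.
    # Trimming by version instead can leave millions of entries behind when a
    # pg split copies a huge dup list, which is the failure mode in this bug.
    excess = len(dups) - tracked
    if excess > 0:
        del dups[:excess]  # drop the oldest entries
    return max(excess, 0)

# Example: 500000 injected dups (as in the test plan) are cut back to 3000.
dups = [{"reqid": "client.4177.0:0", "version": f"3'{v}"} for v in range(1, 500001)]
print(trim_dups_by_entry_count(dups))  # -> 497000 entries trimmed
print(dups[0]["version"])              # -> "3'497001"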