2021-02-07 08:48:48 |
dongdong tao |
bug |
|
|
added bug |
2021-02-07 16:04:53 |
Ponnuvel Palaniyappan |
bug |
|
|
added subscriber Ponnuvel Palaniyappan |
2021-02-07 21:38:20 |
Dominique Poulain |
bug |
|
|
added subscriber Dominique Poulain |
2021-02-08 10:34:42 |
Ponnuvel Palaniyappan |
ceph (Ubuntu): assignee |
|
Ponnuvel Palaniyappan (pponnuvel) |
|
2021-02-10 19:30:48 |
Ponnuvel Palaniyappan |
nominated for series |
|
Ubuntu Bionic |
|
2021-02-10 19:30:48 |
Ponnuvel Palaniyappan |
bug task added |
|
ceph (Ubuntu Bionic) |
|
2021-02-10 19:38:55 |
Ponnuvel Palaniyappan |
bug task added |
|
cloud-archive |
|
2021-02-10 19:39:17 |
Ponnuvel Palaniyappan |
nominated for series |
|
cloud-archive/queens |
|
2021-02-10 19:39:17 |
Ponnuvel Palaniyappan |
bug task added |
|
cloud-archive/queens |
|
2021-02-10 22:14:16 |
Ponnuvel Palaniyappan |
ceph (Ubuntu Bionic): status |
New |
In Progress |
|
2021-02-10 23:09:15 |
Ponnuvel Palaniyappan |
ceph (Ubuntu Bionic): assignee |
|
Ponnuvel Palaniyappan (pponnuvel) |
|
2021-02-11 08:46:38 |
Ponnuvel Palaniyappan |
attachment added |
|
bionic.debdiff https://bugs.launchpad.net/cloud-archive/+bug/1914911/+attachment/5462589/+files/bionic.debdiff |
|
2021-02-11 08:47:55 |
Ponnuvel Palaniyappan |
tags |
|
sts-sru-needed |
|
2021-02-11 09:15:17 |
Ponnuvel Palaniyappan |
summary |
bluefs doesn't compact log file |
[SRU] bluefs doesn't compact log file |
|
2021-02-11 15:12:41 |
Ponnuvel Palaniyappan |
description |
For a certain type of workload, bluefs might never compact its log file,
which causes the bluefs log file to slowly grow to a huge size
(sometimes bigger than 1 TB on a 1.5 TB device).
This bug can eventually cause the OSD to crash and fail to restart, as it cannot get through the bluefs replay phase at boot time.
The following message may appear when trying to restart the OSD:
bluefs mount failed to replay log: (5) Input/output error
The bluefs perf counters show more detail when this issue occurs,
e.g.
"bluefs": {
"gift_bytes": 811748818944,
"reclaim_bytes": 0,
"db_total_bytes": 888564350976,
"db_used_bytes": 867311747072,
"wal_total_bytes": 0,
"wal_used_bytes": 0,
"slow_total_bytes": 0,
"slow_used_bytes": 0,
"num_files": 11,
"log_bytes": 866545131520,
"log_compactions": 0,
"logged_bytes": 866542977024,
"files_written_wal": 2,
"files_written_sst": 3,
"bytes_written_wal": 32424281934,
"bytes_written_sst": 25382201
}
As we can see, log_compactions is 0, which means the log has never been compacted, and the log file size (log_bytes) is already over 800 GB. After compaction, the log file size would be reduced to around 1 GB.
Here is the PR [1] that addressed this bug; it needs to be backported to Ceph 12.2.13 in Ubuntu.
[1] https://github.com/ceph/ceph/pull/17354 |
[Impact]
For a certain type of workload, bluefs might never compact its log file, which causes the bluefs log file to slowly grow to a huge size (sometimes bigger than 1 TB on a 1.5 TB device).
The bluefs perf counters show more detail when this issue occurs,
e.g.
"bluefs": {
"gift_bytes": 811748818944,
"reclaim_bytes": 0,
"db_total_bytes": 888564350976,
"db_used_bytes": 867311747072,
"wal_total_bytes": 0,
"wal_used_bytes": 0,
"slow_total_bytes": 0,
"slow_used_bytes": 0,
"num_files": 11,
"log_bytes": 866545131520,
"log_compactions": 0,
"logged_bytes": 866542977024,
"files_written_wal": 2,
"files_written_sst": 3,
"bytes_written_wal": 32424281934,
"bytes_written_sst": 25382201
}
This bug can eventually cause the OSD to crash and fail to restart, as it cannot get through the bluefs replay phase at boot time.
The following message may appear when trying to restart the OSD:
bluefs mount failed to replay log: (5) Input/output error
As we can see, log_compactions is 0, which means the log has never been compacted, and the log file size (log_bytes) is already over 800 GB. After compaction, the log file size would be reduced to around 1 GB.
[Test Case]
Deploy a test Ceph cluster (Luminous 12.2.13, which has the bug) and drive I/O. Compaction is rarely triggered when most of the I/O is reads, so fill up the cluster initially with lots of writes and then switch to heavy reads (no writes); the problem should then occur. Small OSDs are fine, as we are only interested in filling up the OSD and growing the bluefs log.
[Where problems could occur]
This fix has been part of all upstream releases since Mimic, so it has had quite a good "runtime".
The changes ensure that compaction happens more often, which is not expected to cause any problems.
I can't see any real problems.
[Other Info]
- It is only needed for Luminous (Bionic); all newer releases already include this fix.
- Upstream PR: https://github.com/ceph/ceph/pull/17354 |
|
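The bluefs counters quoted in the description above can be read from the OSD's admin socket. Below is a minimal sketch (not part of this SRU) for spotting an affected OSD; it assumes it is run on the OSD host with admin-socket access, and the default OSD id and the 10 GiB threshold are arbitrary examples.

#!/usr/bin/env python3
# Minimal sketch (not part of the SRU): read the bluefs counters from an
# OSD's admin socket and flag the symptom described in this bug. Assumes
# admin-socket access on the OSD host; the default OSD id and the 10 GiB
# threshold below are arbitrary examples.
import json
import subprocess
import sys

osd_id = sys.argv[1] if len(sys.argv) > 1 else "0"  # example id only

# "ceph daemon osd.<id> perf dump" prints JSON; the "bluefs" section holds
# the counters quoted in the description (log_bytes, log_compactions, ...).
out = subprocess.check_output(["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"])
bluefs = json.loads(out)["bluefs"]

log_gib = bluefs["log_bytes"] / 2**30
compactions = bluefs["log_compactions"]
print(f"osd.{osd_id}: log_bytes={log_gib:.1f} GiB, log_compactions={compactions}")

# Symptom in this bug: the bluefs log keeps growing but is never compacted.
if compactions == 0 and log_gib > 10:
    print(f"osd.{osd_id} may be affected by LP#1914911: "
          "bluefs log is large and has never been compacted")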
2021-02-11 15:43:46 |
Ponnuvel Palaniyappan |
attachment added |
|
ceph12.2.13-lp1914911.gitdiff https://bugs.launchpad.net/cloud-archive/+bug/1914911/+attachment/5462713/+files/ceph12.2.13-lp1914911.gitdiff |
|
2021-02-21 17:12:37 |
Ponnuvel Palaniyappan |
tags |
sts-sru-needed |
seg sts-sru-needed |
|
2021-03-16 18:30:27 |
Ponnuvel Palaniyappan |
description |
[Impact]
For a certain type of workload, bluefs might never compact its log file, which causes the bluefs log file to slowly grow to a huge size (sometimes bigger than 1 TB on a 1.5 TB device).
The bluefs perf counters show more detail when this issue occurs,
e.g.
"bluefs": {
"gift_bytes": 811748818944,
"reclaim_bytes": 0,
"db_total_bytes": 888564350976,
"db_used_bytes": 867311747072,
"wal_total_bytes": 0,
"wal_used_bytes": 0,
"slow_total_bytes": 0,
"slow_used_bytes": 0,
"num_files": 11,
"log_bytes": 866545131520,
"log_compactions": 0,
"logged_bytes": 866542977024,
"files_written_wal": 2,
"files_written_sst": 3,
"bytes_written_wal": 32424281934,
"bytes_written_sst": 25382201
}
This bug can eventually cause the OSD to crash and fail to restart, as it cannot get through the bluefs replay phase at boot time.
The following message may appear when trying to restart the OSD:
bluefs mount failed to replay log: (5) Input/output error
As we can see, log_compactions is 0, which means the log has never been compacted, and the log file size (log_bytes) is already over 800 GB. After compaction, the log file size would be reduced to around 1 GB.
[Test Case]
Deploy a test Ceph cluster (Luminous 12.2.13, which has the bug) and drive I/O. Compaction is rarely triggered when most of the I/O is reads, so fill up the cluster initially with lots of writes and then switch to heavy reads (no writes); the problem should then occur. Small OSDs are fine, as we are only interested in filling up the OSD and growing the bluefs log.
[Where problems could occur]
This fix has been part of all upstream releases since Mimic, so it has had quite a good "runtime".
The changes ensure that compaction happens more often, which is not expected to cause any problems.
I can't see any real problems.
[Other Info]
- It is only needed for Luminous (Bionic); all newer releases already include this fix.
- Upstream PR: https://github.com/ceph/ceph/pull/17354 |
[Impact]
For a certain type of workload, bluefs might never compact its log file, which causes the bluefs log file to slowly grow to a huge size (sometimes bigger than 1 TB on a 1.5 TB device).
The bluefs perf counters show more detail when this issue occurs,
e.g.
"bluefs": {
"gift_bytes": 811748818944,
"reclaim_bytes": 0,
"db_total_bytes": 888564350976,
"db_used_bytes": 867311747072,
"wal_total_bytes": 0,
"wal_used_bytes": 0,
"slow_total_bytes": 0,
"slow_used_bytes": 0,
"num_files": 11,
"log_bytes": 866545131520,
"log_compactions": 0,
"logged_bytes": 866542977024,
"files_written_wal": 2,
"files_written_sst": 3,
"bytes_written_wal": 32424281934,
"bytes_written_sst": 25382201
}
This bug can eventually cause the OSD to crash and fail to restart, as it cannot get through the bluefs replay phase at boot time.
The following message may appear when trying to restart the OSD:
bluefs mount failed to replay log: (5) Input/output error
As we can see, log_compactions is 0, which means the log has never been compacted, and the log file size (log_bytes) is already over 800 GB. After compaction, the log file size would be reduced to around 1 GB.
[Test Case]
Deploy a test Ceph cluster (Luminous 12.2.13, which has the bug) and drive I/O. Compaction is rarely triggered when most of the I/O is reads, so fill up the cluster initially with lots of writes and then switch to heavy reads (no writes); the problem should then occur. Small OSDs are fine, as we are only interested in filling up the OSD and growing the bluefs log.
[Where problems could occur]
This fix has been part of all upstream releases since Mimic, so it has had quite a good "runtime".
The changes ensure that compaction happens more often, which is not expected to cause any problems.
I can't see any real problems.
[Other Info]
- It is only needed for Luminous (Bionic); all newer releases already include this fix.
- Upstream master PR: https://github.com/ceph/ceph/pull/17354
- Upstream Luminous PR: https://github.com/ceph/ceph/pull/34876/files |
|
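The [Test Case] in the description above boils down to a write-heavy fill followed by a read-only load. Below is a rough sketch (not part of this SRU) of one way to drive that pattern; the pool name, durations, and the choice of rados bench are illustrative assumptions, and any write-then-read workload against the test cluster should work equally well.

#!/usr/bin/env python3
# Minimal sketch (not part of the SRU) of the write-then-read workload from
# the [Test Case]: pool name and durations are made-up examples, and it
# assumes "rados bench" is available on a client node with access to the
# test cluster (create the pool before running).
import subprocess

POOL = "lp1914911-test"   # hypothetical pool name
WRITE_SECS = "3600"       # long enough to fill the (small) test OSDs
READ_SECS = "3600"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

# Phase 1: write-heavy fill; keep the objects so they can be read back.
run(["rados", "bench", "-p", POOL, WRITE_SECS, "write", "--no-cleanup"])

# Phase 2: read-only load (no writes). On an affected Luminous OSD this is
# where the bluefs log keeps growing without ever being compacted; watch
# log_bytes / log_compactions in "ceph daemon osd.<id> perf dump".
run(["rados", "bench", "-p", POOL, READ_SECS, "rand"])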
2021-03-16 18:37:01 |
Ponnuvel Palaniyappan |
attachment added |
|
bionic.debdiff.fixed https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1914911/+attachment/5477289/+files/bionic.debdiff.fixed |
|
2021-03-25 08:44:33 |
James Page |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2021-03-25 08:44:40 |
James Page |
ceph (Ubuntu Bionic): importance |
Undecided |
High |
|
2021-03-25 08:44:43 |
James Page |
ceph (Ubuntu): status |
New |
Invalid |
|
2021-03-25 08:44:46 |
James Page |
cloud-archive: status |
New |
Invalid |
|
2021-03-25 08:44:48 |
James Page |
cloud-archive/queens: status |
New |
Triaged |
|
2021-03-25 08:44:50 |
James Page |
cloud-archive/queens: importance |
Undecided |
High |
|
2021-03-25 10:45:55 |
Ponnuvel Palaniyappan |
attachment added |
|
bluefs-bionic.debdiff https://bugs.launchpad.net/cloud-archive/+bug/1914911/+attachment/5480862/+files/bluefs-bionic.debdiff |
|
2021-04-26 15:03:53 |
Timo Aaltonen |
ceph (Ubuntu Bionic): status |
In Progress |
Fix Committed |
|
2021-04-26 15:03:55 |
Timo Aaltonen |
bug |
|
|
added subscriber SRU Verification |
2021-04-26 15:03:59 |
Timo Aaltonen |
tags |
seg sts-sru-needed |
seg sts-sru-needed verification-needed verification-needed-bionic |
|
2021-04-29 05:34:46 |
dongdong tao |
tags |
seg sts-sru-needed verification-needed verification-needed-bionic |
seg sts-sru-needed verification-done verification-done-bionic |
|
2021-05-05 12:00:27 |
Corey Bryant |
cloud-archive/queens: status |
Triaged |
Fix Committed |
|
2021-05-05 12:00:28 |
Corey Bryant |
tags |
seg sts-sru-needed verification-done verification-done-bionic |
seg sts-sru-needed verification-done verification-done-bionic verification-queens-needed |
|
2021-05-06 09:00:38 |
Łukasz Zemczak |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2021-05-06 09:06:01 |
Launchpad Janitor |
ceph (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2021-05-28 10:22:53 |
Ponnuvel Palaniyappan |
attachment added |
|
queens_sru_1914911 https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1914911/+attachment/5500895/+files/queens_sru_1914911 |
|
2021-05-28 10:23:23 |
Ponnuvel Palaniyappan |
tags |
seg sts-sru-needed verification-done verification-done-bionic verification-queens-needed |
seg sts-sru-needed verification-done verification-done-bionic verification-queens-done |
|
2021-07-12 13:38:08 |
James Page |
cloud-archive/queens: status |
Fix Committed |
Fix Released |
|
2021-09-30 18:47:07 |
Mathew Hodson |
ceph (Ubuntu): status |
Invalid |
Fix Released |
|
2021-09-30 18:47:29 |
Mathew Hodson |
cloud-archive: status |
Invalid |
Fix Released |
|