pg repair action

Bug #1923218 reported by Andrea Ieri
This bug affects 2 people
Affects: Ceph Monitor Charm
Status: Fix Committed
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Whenever PG inconsistencies occur, operators have to intervene manually. Many inconsistencies can be resolved by running `ceph pg repair <pg ID>`, but this is only safe in certain situations, for example (the list is not necessarily exhaustive):

* read errors
* 0 size shards
* wrong data digest on non-primary shard
* inconsistency in an erasure coded pool

As all of the above can be easily verified programmatically, it would be very useful to have a "safe-pg-repair pgid=<num>" action that runs `ceph pg repair <pg ID>` *only* if the inconsistency falls into one of the known safe situations, and returns a warning otherwise. This would drastically simplify managing a charmed ceph cluster since operators could simply run the action as a first attempt, and spend time digging into the specifics of the inconsistency only in cases that truly require human intervention.
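
The check proposed above could look roughly like the following Python sketch. It assumes the inconsistency report comes from `rados list-inconsistent-obj <pgid> --format=json` (an `inconsistents` list whose entries carry per-shard `errors`, `size` and `primary` fields), and it maps the bug's list onto the shard error names `read_error` and `data_digest_mismatch_info`; that mapping is an assumption, not something this report specifies. The function names, the `erasure_coded` flag and the injectable `run` callback are hypothetical, and the charm's actual action may work differently.

```python
import json
import subprocess
from typing import Callable, Dict, List

# ASSUMPTION: shard-level error names treated as safe, mapped from the bug's
# list ("read errors", "wrong data digest on non-primary shard").
SAFE_SHARD_ERRORS = {"read_error", "data_digest_mismatch_info"}


def _run(cmd: List[str]) -> str:
    """Run a CLI command and return its stdout; raise if it exits non-zero."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def is_safe(report: Dict, erasure_coded: bool = False) -> bool:
    """Return True if every shard error in `report` falls in a known-safe case.

    `report` is assumed to be the parsed JSON from
    `rados list-inconsistent-obj <pgid> --format=json`. Only shard-level
    errors are inspected in this sketch.
    """
    if erasure_coded:
        # The bug lists any inconsistency in an erasure coded pool as safe.
        # Hypothetical flag: the report itself does not say the pool type.
        return True
    for obj in report.get("inconsistents", []):
        for shard in obj.get("shards", []):
            errors = set(shard.get("errors", []))
            if not errors or shard.get("size") == 0:
                continue  # healthy shard, or a zero-size shard (listed as safe)
            if shard.get("primary") and "data_digest_mismatch_info" in errors:
                return False  # a digest mismatch on the primary is not listed as safe
            if not errors <= SAFE_SHARD_ERRORS:
                return False  # at least one error outside the known-safe set
    return True


def safe_pg_repair(pgid: str, erasure_coded: bool = False,
                   run: Callable[[List[str]], str] = _run) -> str:
    """Run `ceph pg repair <pgid>` only if all inconsistencies are known-safe."""
    report = json.loads(run(["rados", "list-inconsistent-obj", pgid, "--format=json"]))
    if not is_safe(report, erasure_coded):
        return f"WARNING: PG {pgid} needs manual investigation; not repairing"
    run(["ceph", "pg", "repair", pgid])
    return f"Repair requested for PG {pgid}"
```

Injecting `run` keeps the decision logic testable without a live cluster; by default it shells out to the `rados` and `ceph` CLIs.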

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (master)
Changed in charm-ceph-mon:
status: New → In Progress
Changed in charm-ceph-mon:
importance: Undecided → Wishlist
Alvaro Uria (aluria)
tags: added: bseng-38
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-mon/+/831001
Committed: https://opendev.org/openstack/charm-ceph-mon/commit/a1cffc669322a2fe1c709e09a717ff9f78ab5680
Submitter: "Zuul (22348)"
Branch: master

commit a1cffc669322a2fe1c709e09a717ff9f78ab5680
Author: Connor Chamberlain <email address hidden>
Date: Fri Feb 25 08:33:10 2022 -0700

    Added safe-pg-repair action

    This action automatically repairs inconsistent placement groups
    which are caused by read errors.

    PGs are repaired using `ceph pg repair <pgid>`.

    Action is only taken if one of a PG's shards has a "read_error",
    and no action will be taken if any additional errors are found.
    No action will be taken if multiple "read_errors" are found.

    This action is intended to be safe to run in all contexts.

    Closes-Bug: #1923218
    Change-Id: I903dfe02aa3b7c67414e3d0d9b57f4042d301830
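
The merged rule is narrower than the list in the original bug description. One reading of it is sketched below in Python against the assumed `rados list-inconsistent-obj` JSON shape: repair only when exactly one shard in the whole PG reports an error and that error is `read_error`. Treating any object-level `errors` entry as an "additional error" is an assumption of this sketch, and `only_one_read_error` is a hypothetical name, not the charm's own function.

```python
from typing import Dict


def only_one_read_error(report: Dict) -> bool:
    """Apply one reading of the commit's rule to a single PG's report.

    `report` is assumed to be parsed `rados list-inconsistent-obj` JSON.
    Returns True only if exactly one shard error exists across the PG and
    it is "read_error".
    """
    read_errors = 0
    for obj in report.get("inconsistents", []):
        # ASSUMPTION: any object-level error counts as an additional error.
        if obj.get("errors"):
            return False
        for shard in obj.get("shards", []):
            for err in shard.get("errors", []):
                if err != "read_error":
                    return False  # any other error type blocks the repair
                read_errors += 1
    # Zero read errors means nothing to repair; more than one is ambiguous.
    return read_errors == 1
```

Counting read errors across every object in the PG, rather than per object, is what makes "multiple read_errors" block the repair even when they are spread over different objects.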

Changed in charm-ceph-mon:
status: In Progress → Fix Committed
Andrea Ieri (aieri) wrote :

Hi, could we please have a backport of this action to focal/ussuri (Ceph Octopus)? Most of our environments run that version and would benefit from this action.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (stable/octopus)

Fix proposed to branch: stable/octopus
Review: https://review.opendev.org/c/openstack/charm-ceph-mon/+/865342

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (stable/pacific)

Fix proposed to branch: stable/pacific
Review: https://review.opendev.org/c/openstack/charm-ceph-mon/+/865343

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (stable/quincy)

Fix proposed to branch: stable/quincy
Review: https://review.opendev.org/c/openstack/charm-ceph-mon/+/865346

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-ceph-mon (stable/quincy)

Change abandoned by "Luciano Lo Giudice <email address hidden>" on branch: stable/quincy
Review: https://review.opendev.org/c/openstack/charm-ceph-mon/+/865346
Reason: should go in quincy.2 (but it's already there :) )

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (stable/octopus)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-mon/+/865342
Committed: https://opendev.org/openstack/charm-ceph-mon/commit/f4396f426e0c76770b5ac900fe934061d07ad62e
Submitter: "Zuul (22348)"
Branch: stable/octopus

commit f4396f426e0c76770b5ac900fe934061d07ad62e
Author: Connor Chamberlain <email address hidden>
Date: Fri Feb 25 08:33:10 2022 -0700

    Backport pg-repair action (octopus)

    This action automatically repairs inconsistent placement groups
    which are caused by read errors.

    PGs are repaired using `ceph pg repair <pgid>`.

    Action is only taken if one of a PG's shards has a "read_error",
    and no action will be taken if any additional errors are found.
    No action will be taken if multiple "read_errors" are found.

    This action is intended to be safe to run in all contexts.

    Closes-Bug: #1923218
    (cherry-picked from commit a1cffc669322a2fe1c709e09a717ff9f78ab5680)

    Change-Id: I32ed3b674211ee3e98a2d1b07bb9f5c7d29df5ca

tags: added: in-stable-octopus
tags: added: in-stable-pacific
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (stable/pacific)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-mon/+/865343
Committed: https://opendev.org/openstack/charm-ceph-mon/commit/d2afe9b8e7279cc70cc8b74308c45699adfe14b8
Submitter: "Zuul (22348)"
Branch: stable/pacific

commit d2afe9b8e7279cc70cc8b74308c45699adfe14b8
Author: Connor Chamberlain <email address hidden>
Date: Fri Feb 25 08:33:10 2022 -0700

    Backport pg-repair action (pacific)

    This action automatically repairs inconsistent placement groups
    which are caused by read errors.

    PGs are repaired using `ceph pg repair <pgid>`.

    Action is only taken if one of a PG's shards has a "read_error",
    and no action will be taken if any additional errors are found.
    No action will be taken if multiple "read_errors" are found.

    This action is intended to be safe to run in all contexts.

    Closes-Bug: #1923218
    (cherry-picked from commit a1cffc669322a2fe1c709e09a717ff9f78ab5680)

    Change-Id: I367f5e9569ed688ba712a4ff8cacd0b05208366a
