Race condition in collect_ceph_status.sh

Bug #1755207 reported by Tamas Erdei
This bug affects 1 person
Affects: Ceph Monitor Charm
Status: Fix Released
Importance: Medium
Assigned to: Tamas Erdei

Bug Description

There is a race condition between collect_ceph_status.sh writing the status file and check_ceph_status.py reading that file.
Although one might expect this to happen only rarely, we are hitting it very often, which results in OK->UNKNOWN and UNKNOWN->OK state changes and notifications in Nagios.

I have already created a fix for this and will submit it for review.
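
For readers unfamiliar with this kind of race, here is a minimal sketch of how it can arise. It assumes the collector redirects command output straight into the status file, which the report does not show. The temp path and the slow_writer function are hypothetical stand-ins, not the charm's code. The point is that the shell truncates the target as soon as the redirection is set up, so a reader that arrives before the command finishes sees an empty or partial file.

```shell
#!/bin/sh
# Hypothetical demo, not the charm's code: shows the window in which a reader
# sees an empty file when a writer redirects straight into the status file.
status_file=$(mktemp)
printf '%s\n' '{"overall_status":"HEALTH_OK"}' > "$status_file"

# slow_writer stands in for a slow status command. The redirection truncates
# "$status_file" before the command group produces any output.
slow_writer() {
    { sleep 2; printf '%s\n' '{"overall_status":"HEALTH_OK"}'; } > "$status_file"
}

slow_writer &
sleep 1   # the "reader" arrives while the writer is still running

if [ -s "$status_file" ]; then during="non-empty"; else during="empty"; fi
wait
if [ -s "$status_file" ]; then after="non-empty"; else after="empty"; fi

echo "during write: $during, after write: $after"
rm -f "$status_file"
```

A check that parses the file during that window has nothing valid to parse, which is consistent with the flapping between OK and UNKNOWN described above.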

Frode Nordahl (fnordahl) wrote :

Fix proposed on branch master: https://review.openstack.org/#/c/552513/

Changed in charm-ceph-mon:
status: New → In Progress
importance: Undecided → Medium
milestone: none → 18.05
assignee: nobody → Tamas Erdei (terdei)
Frode Nordahl (fnordahl) wrote :

Manual validation that proposed fix still produces expected output:
$ juju ssh ceph-mon/0 sudo cat /var/lib/nagios/cat-ceph-status.txt

{"health":{"health":{"health_services":[{"mons":[{"name":"juju-662f8c-2","kb_total":928355808,"kb_used":556823588,"kb_avail":324351420,"avail_percent":34,"last_updated":"2018-03-13 15:59:48.383350","store_stats":{"bytes_total":12782119,"bytes_sst":0,"bytes_log":3594421,"bytes_misc":9187698,"last_updated":"0.000000"},"health":"HEALTH_OK"},{"name":"juju-662f8c-1","kb_total":928355808,"kb_used":556823240,"kb_avail":324351768,"avail_percent":34,"last_updated":"2018-03-13 15:59:43.791875","store_stats":{"bytes_total":14452386,"bytes_sst":0,"bytes_log":5264675,"bytes_misc":9187711,"last_updated":"0.000000"},"health":"HEALTH_OK"},{"name":"juju-662f8c-0","kb_total":928355808,"kb_used":556823588,"kb_avail":324351420,"avail_percent":34,"last_updated":"2018-03-13 15:59:49.153830","store_stats":{"bytes_total":14583095,"bytes_sst":0,"bytes_log":5395351,"bytes_misc":9187744,"last_updated":"0.000000"},"health":"HEALTH_OK"}]}]},"timechecks":{"epoch":12,"round":6,"round_status":"finished","mons":[{"name":"juju-662f8c-2","skew":0.000000,"latency":0.000000,"health":"HEALTH_OK"},{"name":"juju-662f8c-1","skew":0.000000,"latency":0.012322,"health":"HEALTH_OK"},{"name":"juju-662f8c-0","skew":0.000000,"latency":0.006653,"health":"HEALTH_OK"}]},"summary":[],"overall_status":"HEALTH_OK","detail":[]},"fsid":"41fb4b0a-26d5-11e8-8228-00163e1c968f","election_epoch":12,"quorum":[0,1,2],"quorum_names":["juju-662f8c-2","juju-662f8c-1","juju-662f8c-0"],"monmap":{"epoch":1,"fsid":"41fb4b0a-26d5-11e8-8228-00163e1c968f","modified":"2018-03-13 15:45:25.659088","created":"2018-03-13 15:45:25.659088","mons":[{"rank":0,"name":"juju-662f8c-2","addr":"10.130.236.111:6789\/0"},{"rank":1,"name":"juju-662f8c-1","addr":"10.130.236.204:6789\/0"},{"rank":2,"name":"juju-662f8c-0","addr":"10.130.236.230:6789\/0"}]},"osdmap":{"osdmap":{"epoch":11,"num_osds":3,"num_up_osds":3,"num_in_osds":3,"full":false,"nearfull":false,"num_remapped_pgs":0}},"pgmap":{"pgs_by_state":[{"state_name":"active+clean","count":64}],"version":471,"num_pgs":64,"data_bytes":0,"bytes_used":1710536056832,"bytes_avail":996433567744,"bytes_total":2851909042176},"fsmap":{"epoch":1,"by_rank":[]}}

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.openstack.org/552513
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-mon/commit/?id=5dbafb0b2fc64cd63af0cdd351df79a4c33d9d21
Submitter: Zuul
Branch: master

commit 5dbafb0b2fc64cd63af0cdd351df79a4c33d9d21
Author: Tamas Erdei <email address hidden>
Date: Tue Mar 13 14:24:25 2018 +0100

    Fix race condition in collect_ceph_status.sh

    There is a race condition between collect_ceph_status.sh writing
    the status file and check_ceph_status.py reading that file.

    This patch fixes that by directing ceph output into a temp file,
    and then replacing the old state file with the new temp file using
    an atomic mv operation.

    Change-Id: If332d187f8dcb9f7fcd8b4a47f791beb8e27eaaa
    Closes-Bug: 1755207
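
The approach described in the commit message (write to a temp file, then replace the state file with mv) can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged patch. The directory, the produce_status helper and the write_status_atomically function are hypothetical names, and produce_status stands in for the real ceph invocation.

```shell
#!/bin/sh
# Sketch of the write-to-temp-then-rename pattern. Paths and helper names are
# hypothetical stand-ins, not the charm's code.
data_dir=$(mktemp -d)
status_file="$data_dir/cat-ceph-status.txt"

produce_status() {
    # Stand-in for the real status command.
    printf '%s\n' '{"overall_status":"HEALTH_OK"}'
}

write_status_atomically() {
    # Create the temp file in the same directory as the target so that mv is
    # a rename within one filesystem, which is what makes the swap atomic.
    tmp_file=$(mktemp "$data_dir/status.XXXXXX") || return 1
    if produce_status > "$tmp_file"; then
        mv -f "$tmp_file" "$status_file"
    else
        # Leave the previous status file untouched if the command failed.
        rm -f "$tmp_file"
        return 1
    fi
}

write_status_atomically
cat "$status_file"
```

With this pattern a reader sees either the previous complete file or the new complete file, never a truncated one. One design point to keep in mind: mktemp creates files readable only by their owner, so a real script would likely need to adjust ownership or permissions on the temp file before the mv if the check runs as a different user.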

Changed in charm-ceph-mon:
status: In Progress → Fix Committed
David Ames (thedac)
Changed in charm-ceph-mon:
status: Fix Committed → Fix Released