EDAC support for slave nodes

Bug #1483629 reported by Adam Heczko on 2015-08-11
This bug affects 1 person
Affects                Status   Importance   Assigned to        Milestone
Fuel for OpenStack              Medium       Bulat Gaifullin
  Mitaka                        Medium       Bulat Gaifullin
  Newton                        Medium       Bulat Gaifullin

Bug Description

EDAC provides hardware health checking, including ECC and PCI error checking and reporting. Ensuring that the hardware is healthy is critically important for the health of the cloud: undetected and uncorrected memory errors usually lead to Ceph cluster failure and, ultimately, to the collapse of the whole cloud.
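As an illustration of the reporting side: on Linux, EDAC exposes per-memory-controller error counters in sysfs, ce_count for corrected errors and ue_count for uncorrected ones, under /sys/devices/system/edac/mc/mcN. The Python sketch below reads them. It assumes that standard sysfs layout, and the function name read_edac_counts is hypothetical, not part of the proposed Fuel change.

```python
"""Sketch: read EDAC memory-controller error counters from sysfs.

Assumes the standard Linux EDAC sysfs layout, where each memory
controller appears as /sys/devices/system/edac/mc/mcN and exposes
ce_count (corrected errors) and ue_count (uncorrected errors).
read_edac_counts is a hypothetical helper, not Fuel code.
"""
import os
import re

EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller_name: (ce_count, ue_count)} for every mcN directory.

    Returns an empty dict if the EDAC sysfs tree is absent, e.g. because
    no EDAC driver is loaded for this chipset.
    """
    counts = {}
    if not os.path.isdir(root):
        return counts
    for name in sorted(os.listdir(root)):
        if not re.fullmatch(r"mc\d+", name):
            continue  # skip non-controller entries such as power/
        mc_dir = os.path.join(root, name)
        with open(os.path.join(mc_dir, "ce_count")) as f:
            ce = int(f.read().strip())
        with open(os.path.join(mc_dir, "ue_count")) as f:
            ue = int(f.read().strip())
        counts[name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for mc, (ce, ue) in read_edac_counts().items():
        print(f"{mc}: corrected={ce} uncorrected={ue}")
```

A non-zero ue_count is the case the description worries about: those errors were detected but could not be corrected.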
We should ensure that:
- EDAC kernel module is loaded on CentOS slaves
- EDAC kernel module is loaded on Ubuntu slaves
- EDAC errors get reported to syslog/kern.log
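A minimal sketch of a node-side check covering these points, assuming a Python helper run on each slave. The names loaded_edac_modules, edac_status and report, and the "edac-check" syslog ident, are hypothetical; this is not the code from review 225672. It treats EDAC as active only when at least one mcN memory controller is registered in sysfs, since drivers built into the kernel do not show up in /proc/modules; matching "edac" in module names is just a heuristic used for the log message.

```python
"""Sketch: check that EDAC is active on a node and report the result to syslog.

Heuristic assumptions (not taken from the Fuel review):
- EDAC counts as active if /sys/devices/system/edac/mc contains at least
  one mcN memory controller;
- loaded modules whose names contain "edac" are listed for information;
- the result goes to syslog with the LOG_DAEMON facility.
"""
import os
import syslog


def loaded_edac_modules(proc_modules="/proc/modules"):
    """Return names of loaded kernel modules that look like EDAC drivers."""
    try:
        with open(proc_modules) as f:
            # Each line starts with the module name, e.g. "sb_edac 32768 0 - Live ..."
            names = [line.split()[0] for line in f if line.strip()]
    except OSError:
        return []
    return [n for n in names if "edac" in n]


def edac_status(proc_modules="/proc/modules",
                sysfs_root="/sys/devices/system/edac/mc"):
    """Return (ok, message) describing whether EDAC appears to be active."""
    modules = loaded_edac_modules(proc_modules)
    try:
        controllers = sorted(n for n in os.listdir(sysfs_root)
                             if n.startswith("mc") and n[2:].isdigit())
    except OSError:
        controllers = []
    if controllers:
        mods = ", ".join(modules) or "none matched (possibly built-in)"
        return True, (f"EDAC active: {len(controllers)} memory controller(s); "
                      f"modules: {mods}")
    return False, "no EDAC memory controller registered; memory errors may go unreported"


def report(ok, message):
    """Send the result to syslog: warning if EDAC is missing, info otherwise."""
    syslog.openlog("edac-check", 0, syslog.LOG_DAEMON)
    syslog.syslog(syslog.LOG_INFO if ok else syslog.LOG_WARNING, message)
    syslog.closelog()


if __name__ == "__main__":
    report(*edac_status())
```

Checking for registered controllers rather than module names alone avoids a false positive when only the EDAC core is loaded without a chipset-specific driver.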

tags: added: feature
Changed in fuel:
milestone: none → 8.0
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Fuel Library Team (fuel-library)
Changed in fuel:
status: Confirmed → Triaged
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Bulat Gaifullin (bgaifullin)

Fix proposed to branch: master
Review: https://review.openstack.org/225672

Changed in fuel:
status: Triaged → In Progress
Alexei Sheplyakov (asheplyakov) wrote:

> EDAC provides hardware health checking, including ECC and PCI errors checking and reporting.

It also provides "funny" kernel lockups due to hardware/driver bugs.

Dmitry Pyzhov (dpyzhov) on 2015-10-22
tags: added: area-python
Changed in fuel:
milestone: 8.0 → 9.0

Change abandoned by Fuel DevOps Robot (<email address hidden>) on branch: master
Review: https://review.openstack.org/225672
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
