EDAC support for slave nodes

Bug #1483629 reported by Adam Heczko
This bug affects 1 person
Affects             Status       Importance  Assigned to      Milestone
Fuel for OpenStack  In Progress  Medium      Bulat Gaifullin
  Mitaka            Won't Fix    Medium      Bulat Gaifullin
  Newton            In Progress  Medium      Bulat Gaifullin

Bug Description

EDAC provides hardware health checking, including ECC and PCI errors checking and reporting. Ensuring that the hardware is healthy is critically important for cloud health: uncorrected and undetected memory errors usually lead to Ceph cluster failure and cloud collapse.
We should ensure that:
- EDAC kernel module is loaded on CentOS slaves
- EDAC kernel module is loaded on Ubuntu slaves
- EDAC errors get reported to syslog/kernlog
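
A minimal sketch of how the checks above could be verified on a slave node, assuming the standard Linux EDAC sysfs layout (/sys/devices/system/edac/mc/mc*/ce_count and ue_count). The function names and the health policy are hypothetical illustrations, not part of Fuel or of the proposed fix:

```python
from pathlib import Path

# Standard sysfs location for EDAC memory controllers on Linux.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller: {"ce": int, "ue": int}} for each mc* directory.

    An empty dict means no memory controller is registered, which usually
    indicates that no EDAC driver is loaded for this hardware.
    """
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = {"ce": ce, "ue": ue}
    return counts


def edac_is_healthy(counts):
    """Hypothetical policy: at least one controller and no uncorrected errors."""
    return bool(counts) and all(c["ue"] == 0 for c in counts.values())


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("no EDAC memory controllers found; is the EDAC module loaded?")
    for name, c in counts.items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

Reading the counters from sysfs avoids depending on a specific driver name, since the EDAC module that gets loaded differs between hardware platforms.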

tags: added: feature
Changed in fuel:
milestone: none → 8.0
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Fuel Library Team (fuel-library)
Changed in fuel:
status: Confirmed → Triaged
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Bulat Gaifullin (bgaifullin)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/225672

Changed in fuel:
status: Triaged → In Progress
Alexei Sheplyakov (asheplyakov) wrote :

> EDAC provides hardware health checking, including ECC and PCI errors checking and reporting.

It also provides "funny" kernel lockups due to hardware/driver bugs.

Dmitry Pyzhov (dpyzhov)
tags: added: area-python
Changed in fuel:
milestone: 8.0 → 9.0
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Fuel DevOps Robot (<email address hidden>) on branch: master
Review: https://review.openstack.org/225672
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
