Need driver support for IPMI device on OCPv2 Windmill

Bug #1156667 reported by Samantha Jian-Pielak on 2013-03-18
Affects                   Importance  Assigned to
MAAS                      Undecided   Unassigned
The Open Compute Project  Critical    Unassigned
linux (Ubuntu)            Undecided   Unassigned
Nominated for Precise by Jeff Marcom

Bug Description

The IPMI device is a custom module that adds support for the modified Intel Management Engine IPMI stack that Intel includes for this platform.

$ ipmitool channel info
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

# modprobe ipmi_si
FATAL: Error inserting ipmi_si (/lib/modules/3.2.0-38-generic/kernel/drivers/char/ipmi/ipmi_si.ko): No such device
**12.04**
# modprobe ipmi_si
FATAL: Error inserting ipmi_si (/lib/modules/3.5.0-25-generic/kernel/drivers/char/ipmi/ipmi_si.ko): No such device
**12.04.2**
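
A quick way to see whether any of the device nodes that ipmitool looks for exist is sketched below. This is only an illustration: the function name find_ipmi_dev and its optional root-prefix argument are hypothetical, added so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: print the first IPMI device node found, if any.
# The three paths are the ones ipmitool reports trying in the error above.
# $1 is an optional root prefix (an assumption, useful for testing only).
find_ipmi_dev() {
    root="${1:-}"
    for dev in /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0; do
        if [ -e "${root}${dev}" ]; then
            echo "${dev}"
            return 0
        fi
    done
    return 1
}
```

For example, `find_ipmi_dev || echo "no IPMI device node present"` would print the message on the systems described here, since ipmi_si fails to load.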

Changed in opencompute:
importance: Undecided → Critical
David Duffey (dduffey) wrote :

We also came across this during our evaluation of the Windmill board. It seems there is a kernel module called DCMI that talks over a HECI interface (possible documentation here) to access the DCMI capabilities of the Intel Platform Controller Hub (C602) with Management Engine (ME).

In addition to the DCMI module, you need a specialized version of ipmitool (or dcmitool) in order to interact over the HECI interface, because current versions of ipmitool and dcmitool do not support it.

Attached are both kernel modules (for a 2.6-based kernel on an RPM distribution, i.e. CentOS).

This is all for the OCP V2.0 Windmill boards. For dcmitool to work you need to compile the dcmi and mei drivers, and you *may* then need to install these compat libraries:

yum install compat-readline5
yum install openssl098e-0.9.8e-17.el6_2.2.x86_64
Module madness -- remove ipmi and install mei and dcmi

modprobe -r ipmi_si
modprobe -r ipmi_devintf
modprobe -r ipmi_msghandler
put the mei and dcmi drivers in the "correct structure" i.e.

/lib/modules/2.6.32-220.13.1.el6.x86_64/kernel/drivers/char/<module_name>
run depmod and then install the stuff

depmod
modprobe mei
modprobe dcmi
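
The module steps above can be sketched as a single function. This is a hedged sketch, not a tested procedure: the function name swap_ipmi_for_dcmi and the RUN variable are hypothetical. RUN defaults to "echo", so the commands are only printed; it assumes the mei and dcmi modules have already been copied under /lib/modules as described.

```shell
#!/bin/sh
# Sketch of the steps above. RUN is a hypothetical knob: it defaults to
# "echo" so the commands are only printed. Set RUN= (empty) and run as
# root to actually execute them.
swap_ipmi_for_dcmi() {
    run="${RUN-echo}"
    # Unload the stock IPMI stack, dependants before ipmi_msghandler.
    for mod in ipmi_si ipmi_devintf ipmi_msghandler; do
        $run modprobe -r "$mod" || return 1
    done
    # Rebuild module dependency data, then load the out-of-tree drivers.
    $run depmod || return 1
    $run modprobe mei || return 1
    $run modprobe dcmi || return 1
}
```

Calling swap_ipmi_for_dcmi with RUN unset prints the six commands in order, which makes it easy to review them before running anything as root.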

David Duffey (dduffey) wrote :

Some hardware specification:

The Intel windmill spec can be found here: http://opencompute.org/wp/wp-content/uploads/2012/05/Open_Compute_Project_Intel_Motherboard_v2.0.pdf

Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz

This is one of them:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
stepping : 7
microcode : 0x70a
cpu MHz : 1200.000
cache size : 15360 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 6
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 4000.11
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

David Duffey (dduffey) wrote :

I tried to compile the dcmi module on Ubuntu 12.04.2 on OCP v2 hardware. During the compile it became clear that the source targets a 2.6 kernel. I commented out some lines in the Makefile so that it compiles against the 3.5 kernel in 12.04.2, but the resulting module won't load.
$ sudo modprobe dcmi
[sudo] password for rorke:
FATAL: Error inserting dcmi (/lib/modules/3.5.0-26-generic/kernel/drivers/char/dcmi.ko): Invalid module format
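
One common cause of "Invalid module format" is a module built for a different kernel than the one running. A minimal check, assuming the vermagic string comes from `modinfo -F vermagic` and the release from `uname -r`, could look like this. The function name vermagic_matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: does a module's vermagic string name the given
# kernel release?
# $1 = vermagic string (e.g. from `modinfo -F vermagic dcmi`)
# $2 = kernel release  (e.g. from `uname -r`)
vermagic_matches() {
    # The first space-separated field of vermagic is the kernel release.
    built_for=${1%% *}
    [ "$built_for" = "$2" ]
}
```

Usage would be along the lines of `vermagic_matches "$(modinfo -F vermagic dcmi)" "$(uname -r)" || echo "rebuild against running kernel"`. A matching vermagic does not rule out symbol version (modversions) mismatches, which the kernel reports separately in dmesg.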

The mei module, on the other hand, compiles and loads fine on Ubuntu 12.04.2.
$ modinfo mei
filename: /lib/modules/3.5.0-26-generic/kernel/drivers/char/mei.ko
version: 7.1.21.4.S
license: GPL v2
description: Intel(R) Management Engine Interface for Server
author: Intel Corporation
srcversion: A1B16BB6787946C63832E7A

description: updated

With NO ipmi_si or dcmi module loaded, this is the dcmitool command output on Ubuntu 12.04.2 (3.5 kernel) when managing a remote system.

# Compiled and ran dcmitool downloaded from Intel http://www.intel.com/content/www/us/en/data-center/dcmi/dcmitool-source.html

rorke@ocpr0s11r:~/Dcmitool/src$ ./dcmitool -H 192.168.145.222 -U USERID chassis power status
Password:
Error loading interface lanplus
rorke@ocpr0s11r:~/Dcmitool/src$ ./dcmitool -I imb -H 192.168.145.222 -U USERID chassis power status
Password:
Error sending IMB request, status=1 ccode=0
Error sending IMB request, status=1 ccode=0
Error sending IMB request, status=1 ccode=0
Chassis Power is off
rorke@ocpr0s11r:~/Dcmitool/src$ ./dcmitool -I open -H 192.168.145.222 -U USERID chassis power status
Password:
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
Unable to get Chassis Power Status
rorke@ocpr0s11r:~/Dcmitool/src$

The changes to be made to the src/Makefile in order to compile and load dcmi successfully are attached. Thanks Leann.

rorke@ocpr0s11l:~/dcmi-from-david/DCMI/dcmi$ modinfo dcmi
filename: /lib/modules/3.5.0-26-generic/kernel/drivers/char/dcmi.ko
version: 2.1.6.28.MEI
license: Dual BSD/GPL
description: Intel(R) Data Center Host Interface
author: Intel Corporation
srcversion: 370A4CA27DE6D4AC35F7A81
depends:
vermagic: 3.5.0-26-generic SMP mod_unload modversions
parm: dcmi_debug:Debug enabled or not (int)
rorke@ocpr0s11l:~/dcmi-from-david/DCMI/dcmi$ sudo modprobe dcmi
rorke@ocpr0s11l:~/dcmi-from-david/DCMI/dcmi$ dmesg |grep "dcmi"
[428597.474992] dcmi: disagrees about version of symbol module_layout
[705108.380665] dcmi: Intel(R) Data Center Host Interface (DCMI-HI) - version 2.1.6.28.MEI
[705108.380672] dcmi: Copyright (c) 2003 - 2011 Intel Corporation.
rorke@ocpr0s11l:~/dcmi-from-david/DCMI/dcmi$

I compiled the dcmitool on 12.04.2 but it wasn't successful. I am going to try CentOS just to make sure I get the compile part right.

Using the dcmitool source attached in comment #1, I get the same compile error on CentOS 6.4 (Linux livecd.centos 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux) as on Ubuntu.

make[1]: Entering directory `/home/centoslive/Downloads/DCMI/dcmitool/Dcmitool/src'
if gcc -DHAVE_CONFIG_H -I. -I. -I.. -I../include -fstack-protector -DNDEBUG -fno-strict-aliasing -MT ipmitool.o -MD -MP -MF ".deps/ipmitool.Tpo" -c -o ipmitool.o ipmitool.c; \
 then mv -f ".deps/ipmitool.Tpo" ".deps/ipmitool.Po"; else rm -f ".deps/ipmitool.Tpo"; exit 1; fi
if gcc -DHAVE_CONFIG_H -I. -I. -I.. -I../include -fstack-protector -DNDEBUG -fno-strict-aliasing -MT ipmishell.o -MD -MP -MF ".deps/ipmishell.Tpo" -c -o ipmishell.o ipmishell.c; \
 then mv -f ".deps/ipmishell.Tpo" ".deps/ipmishell.Po"; else rm -f ".deps/ipmishell.Tpo"; exit 1; fi
/bin/sh ../libtool --silent --tag=CC --mode=link gcc -fstack-protector -DNDEBUG -fno-strict-aliasing -o dcmitool ipmitool.o ipmishell.o ../lib/libipmitool.la plugins/libintf.la
../lib/.libs/libipmitool.a(dcmi.o): In function `ipmi_dcmi_prnt_oobDiscover':
dcmi.c:(.text+0xa0e): undefined reference to `ipmiv2_lan_ping'
collect2: ld returned 1 exit status
make[1]: *** [dcmitool] Error 1
make[1]: Leaving directory `/home/centoslive/Downloads/DCMI/dcmitool/Dcmitool/src'
make: *** [all-recursive] Error 1
[centoslive@livecd Dcmitool]$

More dcmitool compile log on CentOS: http://pastebin.ubuntu.com/5672730/
dcmitool compile log on Ubuntu: http://pastebin.ubuntu.com/5672749/
Command 'sudo bash build_all.sh run' was used.

Checking if I can get in-band info with dcmitool from Intel (see comment #6) with different modules.

1. With mei and dcmi modules
./dcmitool dcmi discover and ./dcmitool dcmi power reading both return
"Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

    Unable to get DCMI information
"
2. With mei, dcmi, ipmi_msghandler, and ipmi_devintf modules
The same result as above.
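
To record which of the relevant modules are loaded for each of these runs, something like the following could be used. It is a sketch only: the function name loaded_mgmt_modules is hypothetical, and the optional file argument exists only so the parsing can be tried on a saved copy of /proc/modules.

```shell
#!/bin/sh
# Hypothetical helper: list loaded mei, dcmi and ipmi_* modules.
# $1 = file in /proc/modules format (defaults to /proc/modules);
# the module name is the first field of each line.
loaded_mgmt_modules() {
    file="${1:-/proc/modules}"
    awk '$1 ~ /^(mei|dcmi|ipmi_.*)$/ { print $1 }' "$file" | sort
}
```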

The version of dcmitool downloaded from Intel is 1.8.10.
rorke@ocpr0s11l:~/dcmitool-from-intel/Dcmitool/src$ ./dcmitool -V
dcmitool version 1.8.10

Attached mei/dcmi modules and dcmitool (binary) are provided by Quanta. Use password 'quanta' for unzip.

Remove ipmi* modules, compile and load mei and dcmi:
1. Make sure all ipmi modules are not loaded. 'sudo modprobe -r'
2. Run 'sudo make install' to compile mei.
3. For dcmi, edit Makefile as in comment #7 then 'sudo make install'.
4. Run 'sudo depmod', 'sudo modprobe dcmi', 'sudo modprobe mei'.
*The version of mei and dcmi shown in modinfo is the same as those from comment #1.

Dcmitool:
5. Make the dcmitool file executable, 'sudo chmod 755 dcmitool'.
6. Running dcmitool gives 'error while loading shared libraries: libreadline.so.5'. Install libreadline5 'sudo apt-get install libreadline5'.
7. Running dcmitool again gives 'error while loading shared libraries: libcrypto.so.6'. Install libssl-dev 'sudo apt-get install libssl-dev'.
8. Locate libcrypto.so and rename it to libcrypto.so.6
rorke@ocpr0s11l:/$ sudo find | grep libcrypto.so
./usr/lib/x86_64-linux-gnu/libcrypto.so
./lib/x86_64-linux-gnu/libcrypto.so.1.0.0
rorke@ocpr0s11l:/$ ls -al /usr/lib/x86_64-linux-gnu/ |grep libcrypto
-rw-r--r-- 1 root root 4050576 Mar 19 14:25 libcrypto.a
lrwxrwxrwx 1 root root 40 Mar 19 14:25 libcrypto.so -> /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
rorke@ocpr0s11l:/$
rorke@ocpr0s11l:/$ sudo mv /usr/lib/x86_64-linux-gnu/libcrypto.so /usr/lib/x86_64-linux-gnu/libcrypto.so.6
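
Renaming the unversioned symlink removes libcrypto.so itself, which development builds link against. A less invasive variant, offered as an assumption rather than a tested fix, leaves the original in place and adds a second symlink. The function name add_compat_link and its arguments are hypothetical. Either way, libcrypto.so.1.0.0 is not guaranteed to be ABI-compatible with the library the binary was built against.

```shell
#!/bin/sh
# Hypothetical helper: create $2 as a symlink to $1 without touching $1.
# Refuses to overwrite an existing file or link.
add_compat_link() {
    target="$1"
    link="$2"
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "refusing to overwrite $link" >&2
        return 1
    fi
    ln -s "$target" "$link"
}
```

Run as root, the call would be `add_compat_link /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 /usr/lib/x86_64-linux-gnu/libcrypto.so.6`.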

9. Run dcmitool
rorke@ocpr0s11l:~/dcmi-mei-from-quanta$ ./dcmitool -V
dcmitool version 2.00.05.000
rorke@ocpr0s11l:~/dcmi-mei-from-quanta$ ./dcmitool power status -vvvvv
IDC-HI rsSa : 20
IDC-HI netFn : 0
IDC-HI cmdType : 1
IDC-HI dataLength : 0
Cannot open dcmi driver
Error sending IDC-HI request, status=1 ccode=d7
Cannot open dcmi driver
Error sending IDC-HI request, status=1 ccode=d7
Cannot open dcmi driver
Error sending IDC-HI request, status=1 ccode=d7
Get Chassis Power Status failed: Cannot open driver
rorke@ocpr0s11l:~/dcmi-mei-from-quanta$

Note that I didn't reboot the system during the last exercise. I issued the reboot command but the system didn't come back. Using ipmitool remotely with the power command (off then on, or cycle) didn't help. Will check again after the system has been hard power cycled.

I have to use sudo for the dcmitool command
rorke@ocpr0s11l:~/dcmi-mei-from-quanta$ sudo ./dcmitool power status
Chassis Power is on
rorke@ocpr0s11l:~/dcmi-mei-from-quanta$ cd ../dcmitool-from-intel/Dcmitool/src
rorke@ocpr0s11l:~/dcmitool-from-intel/Dcmitool/src$ sudo ./dcmitool power status
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
Unable to get Chassis Power Status
rorke@ocpr0s11l:~/dcmitool-from-intel/Dcmitool/src$

Note the dcmi driver is not loaded automatically after the reboot.
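
On Ubuntu 12.04, modules listed one per line in /etc/modules are loaded at boot, so one possible way to make the drivers persistent is to add them there. The helper below is a sketch under that assumption; the function name ensure_module_listed is hypothetical and it has not been tried on this hardware.

```shell
#!/bin/sh
# Hypothetical helper: append a module name to a modules list file
# (one name per line, as in /etc/modules) unless it is already listed.
# $1 = module name, $2 = file (defaults to /etc/modules).
ensure_module_listed() {
    file="${2:-/etc/modules}"
    if ! grep -qxF "$1" "$file" 2>/dev/null; then
        echo "$1" >> "$file"
    fi
}
```

Run as root, `ensure_module_listed mei` followed by `ensure_module_listed dcmi` would add both names once, however many times the calls are repeated.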

Using ipmitool 1.8.12 from https://launchpad.net/~dell-poweredge-team/+archive/poweredge-tools
Could not get local ME info.

The following three commands I tried
sudo ipmitool dcmi power reading
sudo ipmitool dcmi discover
sudo ipmitool dcmi asset_tag
all returned:
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

    Unable to get DCMI information

The dcmitool binary attached in comment #1 also works.

rorke@ocpr0s11l:~/dcmi-from-david/DCMI/dcmitool/bin$ ls -al
total 8
drwxr-xr-x 2 rorke rorke 4096 Nov 28 01:47 .
drwxr-xr-x 5 rorke rorke 4096 Nov 28 01:40 ..
lrwxrwxrwx 1 rorke rorke 37 Jan 11 18:00 dcmitool -> /opt/dcmi/Dcmitool/Linux/x64/dcmitool

rorke@ocpr0s11l:~/dcmi-from-david/DCMI/dcmitool/Dcmitool/Linux/x64$ sudo ./dcmitool dcmi power reading

    Instantaneous power reading: 36 Watts
    Minimum Sampling period: 0 Seconds
    Maximum Sampling period: 105 Seconds
    Average power reading over sample period: 36 Watts
    IPMI Timestamp: Wed Jun 27 12:12:30 802187764
    Sampling period: 5894000 Milliseconds
    Power reading state is: activated

rorke@ocpr0s11l:~/dcmi-from-david/DCMI/dcmitool/Dcmitool/Linux/x64$

Forgot to provide the dcmitool version in comment 16
rorke@ocpr0s11l:~/dcmi-from-david/DCMI/dcmitool/Dcmitool/Linux/x64$ sudo ./dcmitool -V
dcmitool version 1.8.10

Changed in opencompute:
status: New → Confirmed
importance: Critical → High
Andres Rodriguez (andreserl) wrote :

Hi Samantha, David,

We have looked into this issue and it seems that it is not related to MAAS, but rather to the kernel.

For this reason I'm marking this bug report as Invalid. If you feel this should continue to be a bug related to MAAS, please feel free to reopen the bug.

Best regards.

Changed in maas:
status: New → Invalid
Jeff Marcom (jeffmarcom) wrote :

Just so that we have some data:

During the enlistment and commissioning phase of MAAS the ipmi-inband command on the node fails with the following error:

"bmc-config: 2039 map pfn expected mapping type uncached-minus for:
3e9000-7f3ea00, got write back."

Jeff Marcom (jeffmarcom) wrote :

I agree this is a kernel issue, and that we probably should be filing and tracking against linux.

no longer affects: linux

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1156667

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: precise

AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.25.
ApportVersion: 2.0.1-0ubuntu17.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 1625 F.... pulseaudio
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'PCH'/'HDA Intel PCH at 0xc2620000 irq 52'
   Mixer name : 'Intel CougarPoint HDMI'
   Components : 'HDA:10ec0888,10ec0888,00100202 HDA:80862805,80860101,00100000'
   Controls : 67
   Simple ctrls : 25
DistroRelease: Ubuntu 12.04
HibernationDevice: RESUME=UUID=3c978014-bfb9-43e5-b497-193f13fd53e1
InstallationMedia: Ubuntu 12.04.2 LTS "Precise Pangolin" - Release amd64 (20130213)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
MachineType: INSYDE StrawberryMountain
MarkForUpload: True
Package: linux (not installed)
ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.5.0-23-generic root=UUID=32cb7b84-bce6-432a-8f2d-2adc2dff3c81 ro quiet splash initcall_debug vt.handoff=7
ProcVersionSignature: Ubuntu 3.5.0-23.35~precise1-generic 3.5.7.2
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/ubuntu not ours.
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-23-generic N/A
 linux-backports-modules-3.5.0-23-generic N/A
 linux-firmware 1.79.1
RfKill:

Tags: precise
Uname: Linux 3.5.0-23-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 06/18/2012
dmi.bios.vendor: Insyde Corp.
dmi.bios.version: SBM. 03.72.24
dmi.board.asset.tag: Type2 - Board Asset Tag
dmi.board.name: Type2 - Board Product Name1
dmi.board.vendor: Type2 - Board Vendor Name1
dmi.board.version: Type2 - Board Version
dmi.chassis.asset.tag: Chassis Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: Chassis Manufacturer
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnInsydeCorp.:bvrSBM.03.72.24:bd06/18/2012:svnINSYDE:pnStrawberryMountain:pvrSBM.03.72.24:rvnType2-BoardVendorName1:rnType2-BoardProductName1:rvrType2-BoardVersion:cvnChassisManufacturer:ct10:cvrChassisVersion:
dmi.product.name: StrawberryMountain
dmi.product.version: SBM. 03.72.24
dmi.sys.vendor: INSYDE

tags: added: apport-collected

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Keve Gabbert (keve-a-gabbert) wrote :

has anyone on Ubuntu kernel team looked into this issue?
have you tried Ubuntu 13.10?

David Duffey (dduffey) wrote :

Hi Keve,

We've tried on 14.04 LTS (daily). In-band management does not work and the system locks up when rebooting.

We did have the Ubuntu kernel team take a look. We did get a module packaged and working in dkms format (and dcmitool compiled).

The question from the kernel team is why this has not been upstreamed or, if it has, whether you can let us know how to enable the upstream functionality.

David

David Duffey (dduffey) on 2014-02-26
Changed in opencompute:
importance: High → Critical
Keve Gabbert (keve-a-gabbert) wrote :

sorry for not following this closely, but what is it that has not been upstreamed?

David Duffey (dduffey) wrote :

The dcmi kernel module has not been upstreamed (which the dcmitool uses, which is necessary for inband management and setting username/passwords and BMC information).

The dcmi kernel module also relies on the mei driver. The upstream mei driver causes the OCP v2 Intel Windmill machine to hang on reboot, so the mei issue appears to be more of a bug. But the dcmi module is not upstreamed.
