MAAS ipmi fails on OCPv3 Roadrunner

Bug #1210393 reported by David Duffey
This bug affects 1 person
Affects                    Status         Importance   Assigned to   Milestone
MAAS                       Fix Released   High         Jason Hobbs
The Open Compute Project   Fix Released   High         Unassigned
linux (Ubuntu)             Fix Released   Medium       Unassigned
  Saucy                    Won't Fix      Medium       Unassigned

Bug Description

The OCPv3 Roadrunner machine has been fully enabled and passes certification testing. When testing ipmitool locally, I'm able to set up the BMC, users, etc.

When using MAAS, MAAS is able to set up the BMC network information (I can see that it changes), but it appears to fail to set a username and password. If I try to use the username and password as defined in the MAAS GUI, it fails. Commissioning and juju bootstrapping the node therefore have to be done manually (by physically pushing the power button).

If I use the username/password I've set on the BMC myself, I can see that MAAS failed to set the username 'maas' and the password defined in the MAAS GUI.

Since the commissioning/enlisting process is temporary and I'm not sure how to log in during this phase to gather data, troubleshooting tips are welcome.

David Duffey (dduffey) wrote :

I should add that using ipmitool manually from a remote host with the 'admin' user I created also works fine. It seems that MAAS thinks it has set a user 'maas' when it has not.

Julian Edwards (julian-edwards) wrote :

Andres, you know more about this than me, any idea?

Changed in maas:
status: New → Triaged
importance: Undecided → High
Jeff Marcom (jeffmarcom) wrote :

This error is present regardless of whether MAAS is in use. Even using freeipmi locally fails to set IPMI parameters or get SDR readings, with an error similar to:

map pfn expected mapping type uncached-minus for d7fa0000-d7fa1000, got write-back

This is present in saucy using openipmi and freeipmi.

Jeff Marcom (jeffmarcom) wrote :

BMC info:

<lshw:node id="serial" claimed="true" class="bus" handle="PCI:0000:01:00.6">
         <lshw:description>IPMI SMIC interface</lshw:description>
         <lshw:product>Broadcom Corporation</lshw:product>
         <lshw:vendor>Broadcom Corporation</lshw:vendor>
         <lshw:physid>0.6</lshw:physid>
         <lshw:businfo>pci@0000:01:00.6</lshw:businfo>
         <lshw:version>00</lshw:version>
         <lshw:width units="bits">64</lshw:width>
         <lshw:clock units="Hz">33000000</lshw:clock>
         <lshw:configuration>
          <lshw:setting id="driver" value="ipmi_si"/>
          <lshw:setting id="latency" value="0"/>
         </lshw:configuration>
         <lshw:capabilities>
          <lshw:capability id="pm">Power Management</lshw:capability>
          <lshw:capability id="pciexpress">PCI Express</lshw:capability>
          <lshw:capability id="bus_master">bus mastering</lshw:capability>
          <lshw:capability id="cap_list">PCI capabilities listing</lshw:capability>
         </lshw:capabilities>
         <lshw:resources>
          <lshw:resource type="irq" value="0"/>
          <lshw:resource type="memory" value="d0010000-d001ffff"/>
          <lshw:resource type="memory" value="d0000000-d000ffff"/>

Jeff Marcom (jeffmarcom)
affects: linux → ubuntu
affects: ubuntu → linux (Ubuntu)
no longer affects: freeipmi (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1210393

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Jeff Marcom (jeffmarcom) wrote : apport information

ApportVersion: 2.12.1-0ubuntu4
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 2918 F.... pulseaudio
CasperVersion: 1.336
DistroRelease: Ubuntu 13.10
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
LiveMediaBuild: Ubuntu 13.10 "Saucy Salamander" - Alpha amd64 (20130916)
MachineType: empty empty
MarkForUpload: True
Package: linux (not installed)
ProcEnviron:
 TERM=linux
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: noprompt cdrom-detect/try-usb=true file=/cdrom/preseed/hostname.seed boot=casper initrd=/casper/initrd.lz quiet splash -- maybe-ubiquity
ProcVersionSignature: Ubuntu 3.11.0-7.13-generic 3.11.0
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.11.0-7-generic N/A
 linux-backports-modules-3.11.0-7-generic N/A
 linux-firmware 1.114
RfKill:

Tags: saucy
Uname: Linux 3.11.0-7-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 03/04/2013
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 4.6.5
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: S8237
dmi.board.vendor: TYAN
dmi.board.version: empty
dmi.chassis.asset.tag: empty
dmi.chassis.type: 28
dmi.chassis.vendor: empty
dmi.chassis.version: empty
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr4.6.5:bd03/04/2013:svnempty:pnempty:pvrempty:rvnTYAN:rnS8237:rvrempty:cvnempty:ct28:cvrempty:
dmi.product.name: empty
dmi.product.version: empty
dmi.sys.vendor: empty

tags: added: apport-collected saucy
Jeff Marcom (jeffmarcom) wrote : apport information, one comment per attached file:

AlsaInfo.txt
BootDmesg.txt
CRDA.txt
CurrentDmesg.txt
Lspci.txt
Lsusb.txt
ProcCpuinfo.txt
ProcInterrupts.txt
ProcModules.txt
UdevDb.txt
UdevLog.txt
WifiSyslog.txt

Changed in linux (Ubuntu):
status: Incomplete → New
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1210393

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Jeff Marcom (jeffmarcom) wrote :

Troubleshooting steps:

modprobe ipmi_msghandler
modprobe ipmi_devintf
modprobe ipmi_si

Loading these modules allows the BMC device parameters to be changed locally and eliminates the write-back errors. I suspect the Saucy ephemeral images are not loading them.

Secondly, trying to enable link authentication fails to set:
 ipmitool lan set 1 auth ADMIN MD5,PASSWORD

Channel link access still states:
User ID : 4
User Name : admin
Fixed Name : No
Access Available : call-in / callback
Link Authentication : disabled
IPMI Messaging : enabled

Also, setting link access for channel 1 fails:

sudo ipmitool lan set 1 access on
Set Channel Access for channel 1 failed: Unknown (0x83)
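
Regarding the modprobe steps at the top of this comment: one way to confirm whether an ephemeral image has loaded the three modules is to compare them against /proc/modules. The following is only a sketch. The helper name is made up for illustration, and it assumes the usual /proc/modules layout in which the module name is the first whitespace-separated field on each line.

```python
# Modules named in the troubleshooting steps above.
REQUIRED_IPMI_MODULES = ("ipmi_msghandler", "ipmi_devintf", "ipmi_si")


def missing_ipmi_modules(proc_modules_text, required=REQUIRED_IPMI_MODULES):
    """Return the required modules that do not appear in /proc/modules text.

    Hypothetical helper: assumes every non-empty line starts with the
    module name, as in the standard /proc/modules format.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    # Preserve the order of `required` so the result reads like the
    # modprobe sequence above.
    return [name for name in required if name not in loaded]
```

Feeding it the contents of /proc/modules returns the names that would still need a modprobe. Note that a driver compiled into the kernel does not show up in /proc/modules, so a name in the result is a hint rather than proof that the driver is absent.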

Jeff Marcom (jeffmarcom) wrote :

The BMC is detected on the network; I just cannot establish a connection via the link:

ipmiping 10.193.37.122
ipmiping 10.193.37.122 (10.193.37.122)
response received from 10.193.37.122: rq_seq=32
response received from 10.193.37.122: rq_seq=33
response received from 10.193.37.122: rq_seq=34
response received from 10.193.37.122: rq_seq=35
response received from 10.193.37.122: rq_seq=36

Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Saucy):
importance: Undecided → Medium
status: New → Incomplete
Jeff Marcom (jeffmarcom) wrote :

I don't understand. Why was this set to incomplete? What specific information do you need?

Changed in linux (Ubuntu):
status: Incomplete → New
Changed in linux (Ubuntu Saucy):
status: Incomplete → New
Brad Figg (brad-figg) wrote :

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1210393

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Saucy):
status: New → Incomplete
Jeff Marcom (jeffmarcom)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Saucy):
status: Incomplete → Confirmed
Jeff Marcom (jeffmarcom) wrote :

Confirmed that the ipmi-locate command fails during the MAAS enlistment process on the OCPv3 BMC.

ipmi-locate failure:
Probing KCS device using DMIDECODE... done
IPMI Version: 2.0
IPMI locate driver: DMIDECODE
IPMI interface: KCS
BMC driver device:
BMC memory base address: 0x0
Register spacing: 1

Probing SMIC device using DMIDECODE... FAILED

Probing BT device using DMIDECODE... FAILED

Probing SSIF device using DMIDECODE... FAILED

Probing KCS device using SMBIOS... done
IPMI Version: 2.0
IPMI locate driver: SMBIOS
IPMI interface: KCS
BMC driver device:
BMC memory base address: 0x0
Register spacing: 1

Probing SMIC device using SMBIOS... FAILED

Probing BT device using SMBIOS... FAILED

Probing SSIF device using SMBIOS... FAILED

Probing KCS device using ACPI... FAILED

Probing SMIC device using ACPI... FAILED

Probing BT device using ACPI... FAILED

Probing SSIF device using ACPI... FAILED

Probing KCS device using PCI... FAILED

Probing SMIC device using PCI... FAILED

Probing BT device using PCI... FAILED

Probing SSIF device using PCI... FAILED

Jeff Marcom (jeffmarcom) wrote :

ipmi-chassis --get-capabilities command success:

Intrusion sensor : not provided
Front Panel Lockout : provided
Diagnostic Interrupt : provided
Power interlock : not provided
FRU Info Device Address : 20h
SDR Device Address : 20h
SEL Device Address : 20h
Sys Mgmt Device Address : 0h
Bridge Device Address : 20h

IPMI modules are loaded

Jeff Marcom (jeffmarcom) wrote :

User list after cloud-init/ipmitool has tried to set maas username/password:

ID  Name      Callin  Link Auth  IPMI Msg  Channel Priv Limit
1             true    false      true      USER
2   admin     true    false      true      ADMINISTRATOR
3   Operator  true    false      true      OPERATOR
4   admin     true    false      true      ADMINISTRATOR

Changed in maas:
status: Triaged → New
Jeff Marcom (jeffmarcom) wrote :

Just to clarify, I'm able to successfully update the username and password in the ephemeral Saucy image by hand using ipmitool.

Jeff Marcom (jeffmarcom) wrote :

I was able to get this to work in MAAS.

You need to change the following line in:
/etc/maas/templates/commissioning-user-data/snippets/maas_ipmi_autodetect.py

    def commit_ipmi_user_settings(user, password):
        ipmi_user_number = get_ipmi_user_number(user)
        if ipmi_user_number is None:
            (status, output) = commands.getstatusoutput('bmc-config --commit --key-pair="User10:Username=%s"' % user)
            ipmi_user_number = get_ipmi_user_number(user)

You have to change User10 to User4, because this BMC's configuration has at most 4 User sections.
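
Rather than editing the literal slot name by hand, the slot could be passed in as a parameter. The following is only a sketch and not the MAAS code: the function name and the max_slots argument are hypothetical, and it does nothing more than build the bmc-config argument list from the --commit and --key-pair options quoted above.

```python
def username_commit_args(slot, user, max_slots=None):
    """Build the bmc-config arguments that set the username of one user slot.

    Hypothetical helper. `max_slots` is the number of User sections the
    BMC exposes (4 on the board discussed here), if it is known.
    """
    if slot < 1:
        raise ValueError("user slots are numbered from 1")
    if max_slots is not None and slot > max_slots:
        raise ValueError("BMC has only %d user slots" % max_slots)
    # Returning a list avoids shell quoting issues around the key pair.
    return [
        "bmc-config",
        "--commit",
        "--key-pair=User%d:Username=%s" % (slot, user),
    ]
```

For this board, username_commit_args(4, "maas", max_slots=4) yields the User4 form of the command, while asking for slot 10 raises an error instead of silently targeting a section that does not exist.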

Jeff Marcom (jeffmarcom) wrote :

Also, it seems you have to set the power management option to ipmi_2.0 in the MAAS UI for the power command to work.
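
To check by hand whether the BMC answers IPMI 2.0 sessions, a power status query can be built as below. This is a sketch under assumptions: it uses ipmitool rather than whatever MAAS's power template runs, it assumes that the ipmi_2.0 option corresponds to ipmitool's "lanplus" interface (and IPMI 1.5 to "lan"), and the function name is made up for illustration.

```python
def ipmitool_power_status_args(host, user, password, ipmi_2_0=True):
    """Build an ipmitool command line that queries chassis power status.

    Assumption: IPMI 2.0 sessions use ipmitool's "lanplus" interface and
    IPMI 1.5 sessions use "lan".
    """
    interface = "lanplus" if ipmi_2_0 else "lan"
    return [
        "ipmitool",
        "-I", interface,
        "-H", host,
        "-U", user,
        "-P", password,  # visible in the process list; fine for a lab test only
        "chassis", "power", "status",
    ]
```

Running the resulting command once with ipmi_2_0=True and once with ipmi_2_0=False against the BMC address shows which session type it accepts.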

Changed in maas:
status: New → Triaged
Changed in maas:
assignee: nobody → Andres Rodriguez (andreserl)
Julian Edwards (julian-edwards) wrote :

Ok so here's the bug:

maas_ipmi_autodetect.py looks for a user slot called "maas", and reprograms that first. If it does not exist then it defaults to User10. User10 does not exist on this hardware.

What it should be doing:
 * Look for a user called maas and re-use or re-program
 * Failing that, look for an empty slot
 * Failing that, go for User4 (or perhaps make this depend on the type of BMC detected?)

Jeff Marcom (jeffmarcom)
Changed in opencompute:
status: New → Triaged
status: Triaged → Confirmed
importance: Undecided → High
Jeff Lane  (bladernr) wrote :

Why not have it look for an existing user called maas and, if none is present, walk the slots starting at 0 until it finds an empty one or runs out of slots?

Or is that one of those things that seem simple on paper but are actually difficult in practice because of some IPMI issue or design?

Anyway, just curious.

Changed in maas:
assignee: Andres Rodriguez (andreserl) → Jason Hobbs (jason-hobbs)
Jason Hobbs (jason-hobbs) wrote :

I've started work on a patch to fix this. It will find either an existing maas user, or will find the first disabled user with an empty username. If it can't find either it will bail and give up on automatic IPMI config.
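
The selection rule described in this comment can be sketched as follows. This is an illustration, not the code in the branch: the function name and the (slot_number, username, enabled) tuple shape are assumptions made for the example.

```python
def pick_ipmi_user_slot(users, wanted="maas"):
    """Return the slot number to program, or None to give up.

    `users` is a list of (slot_number, username, enabled) tuples, an
    assumed shape for this sketch rather than anything MAAS defines.
    """
    # Prefer a slot that already holds the wanted username.
    for number, name, _enabled in users:
        if name == wanted:
            return number
    # Otherwise take the first disabled slot with an empty username.
    for number, name, enabled in users:
        if not enabled and not name:
            return number
    # Nothing suitable: the caller should skip automatic IPMI config.
    return None
```

Because the function never falls back to a fixed slot number, it does not depend on how many user slots a particular BMC provides.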

David Duffey (dduffey) wrote :

@ Jason,

Is this targeted for 14.04 LTS?

Andres Rodriguez (andreserl) wrote :

Hi David!

Yes, this is targeted for 14.04.

Cheers.

Changed in maas:
milestone: none → 14.04
Jason Hobbs (jason-hobbs) wrote :

I've posted a branch with a fix to lp:~jason-hobbs/maas/lp-1210393

I've manually tested this, but for lack of access, not on OCPv3 Roadrunner.

David Duffey (dduffey) wrote :

Thanks Jason, we will get this tested (Rod/Samantha)

Jason Hobbs (jason-hobbs) wrote :

Cool David - let me know how it works out. The branch is otherwise complete/reviewed and ready to land.

Dustin Kirkland  (kirkland) wrote :

Hey Jason! Thanks!

Could you please re-assign this bug, then, to Samantha or Rod? And Samantha/Rod, could you please update the status (in-progress) when you're working on it?

That helps those of us here watching/waiting for this work to complete :-)

Thanks!
Dustin

Jason Hobbs (jason-hobbs) wrote :

Hey Dustin - I reassigned to David since I'm not sure who will be testing it. David/Samantha/Rod - please reassign to whoever is doing the test!

Changed in maas:
status: Triaged → In Progress
assignee: Jason Hobbs (jason-hobbs) → David Duffey (david-duffey)
Changed in maas:
status: In Progress → Fix Committed
tags: added: server-hwe
Changed in maas:
assignee: David Duffey (david-duffey) → Jason Hobbs (jason-hobbs)
Changed in maas:
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Changed in opencompute:
status: Confirmed → Fix Released
Jason Hobbs (jason-hobbs) wrote :

I don't believe there are plans to fix this against Saucy.

Changed in linux (Ubuntu Saucy):
status: Confirmed → Incomplete
Joseph Salisbury (jsalisbury) wrote : Closing unsupported series nomination.

This bug was nominated against a series that is no longer supported, i.e. saucy. The bug task representing the saucy nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Saucy):
status: Incomplete → Won't Fix