ipmi template tries to apply ipmi to *hosts* address rather than ipmi address

Bug #1293791 reported by Scott Moser
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
MAAS | Fix Released | Critical | Julian Edwards |

Bug Description

The maas 'ipmi.template' has the following code:

| # Parameters.
| power_change={{power_change}}
| power_address={{power_address}}
| ...
| # If ip_address was supplied, use it in preference to the power_address,
| # because it gets discovered on-the-fly based on mac_address.
| {{if ip_address}}
| power_address={{ip_address}}
| {{endif}}
| ...
|
| issue_ipmi_command() {
| echo workaround |\
| ${ipmipower} ${workarounds} ${driver_option} -h ${power_address} -u ${power_user} -p ${power_pass} "$@"
| ...

The "if ip_address" section is simply wrong.

If maas is aware of an ip address that it would use for this *HOST*, then it will try to contact ipmi at that address.

The following is some output of my debug statements in ipmi.template, with stdout and stderr in line:

| Mon, 17 Mar 2014 19:44:03 +0000: [4741] power_address=192.168.1.202 [ip_address=192.168.9.7] power_change=off user=maas pass=MYPASSWORD driver=LAN_2_0 config=/etc/maas/templates/power/ipmi.conf
| echo workaround | /usr/sbin/ipmi-chassis-config -W opensesspriv --driver-type=LAN_2_0 -h 192.168.9.7 -u maas -p MYPASSWORD --commit --filename /etc/maas/templates/power/ipmi.conf
| /usr/sbin/ipmi-chassis-config: connection timeout
| echo workaround | /usr/sbin/ipmipower -W opensesspriv --driver-type=LAN_2_0 -h 192.168.9.7 -u maas -p MYPASSWORD --off
| 192.168.9.7: connection timeout
| Mon, 17 Mar 2014 19:44:44 +0000: ret=0
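To make the override concrete, here is a minimal Python sketch of what the quoted template lines do to the address. The function names are hypothetical and this is not MAAS code; it only mirrors the `{{if ip_address}}` branch and plugs in the two addresses from the debug log above.

```python
def pre_fix_power_address(power_address, ip_address=None):
    """Mirror the quoted template: any non-empty ip_address replaces power_address."""
    # Equivalent of:
    #   {{if ip_address}}
    #   power_address={{ip_address}}
    #   {{endif}}
    if ip_address:
        return ip_address
    return power_address


def ipmipower_args(power_address, ip_address, user, password, power_change):
    """Build ipmipower arguments like issue_ipmi_command() (simplified: no workaround/driver flags)."""
    host = pre_fix_power_address(power_address, ip_address)
    return ["ipmipower", "-h", host, "-u", user, "-p", password, "--" + power_change]


# Values from the debug log: the BMC is at 192.168.1.202, but ip_address is 192.168.9.7.
args = ipmipower_args("192.168.1.202", "192.168.9.7", "maas", "MYPASSWORD", "off")
print(args[2])  # 192.168.9.7, so ipmipower targets the wrong address
```

Because the override is unconditional, a configured BMC address is discarded whenever any ip_address reaches the template.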

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Scott, I have previously tested this quite thoroughly on my local servers and I can't find anything wrong.

The ip_address is only set in the template if mac_address is defined in the power parameters. That mac_address is supposed to be the MAC of the BMC, *not* the host.

The intention of this logic is to get around the fact that BMCs that use DHCP can change their IP address at any time. If you want to revert to the old behaviour, don't set the MAC address.

Changed in maas:
status: New → Incomplete
Revision history for this message
Julian Edwards (julian-edwards) wrote :

(I should add, the ip_address is looked up from the MAC with ARP)
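The lookup direction matters: an ARP table maps whatever MAC it is given to an IP, with no notion of whether that MAC belongs to a BMC or to a host NIC. The sketch below is a hypothetical stand-in for that lookup (MAAS's own `find_ip_via_arp` may be implemented differently); it parses text in the Linux `/proc/net/arp` layout, and the sample table and addresses are made up.

```python
# Made-up sample in the /proc/net/arp layout.
SAMPLE_ARP_TABLE = """\
IP address       HW type     Flags       HW address            Mask     Device
10.0.0.5         0x1         0x2         52:54:00:12:34:56     *        eth0
10.0.0.9         0x1         0x2         52:54:00:ab:cd:ef     *        eth0
"""


def find_ip_for_mac(arp_table_text, mac):
    """Return the IPv4 address mapped to `mac` in /proc/net/arp-style text, or None."""
    wanted = mac.lower()
    for line in arp_table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: IP address, HW type, Flags, HW address, Mask, Device.
        if len(fields) >= 4 and fields[3].lower() == wanted:
            return fields[0]
    return None


print(find_ip_for_mac(SAMPLE_ARP_TABLE, "52:54:00:AB:CD:EF"))  # 10.0.0.9
```

On a live Linux system the same function could be fed `open("/proc/net/arp").read()`; it will resolve a host NIC's MAC just as readily as a BMC's.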

Revision history for this message
Scott Moser (smoser) wrote :

I've never set a mac address on the ipmi.
There are 2 MACs set for the systems in question, both are registered.

This is run on the system:
ubuntu@maas-1-03:~$ for i in /sys/class/net/*; do echo "$i: $(cat $i/address)"; done
/sys/class/net/eth0: 00:25:90:4c:f0:04
/sys/class/net/eth1: 00:25:90:4c:f0:05
/sys/class/net/lo: 00:00:00:00:00:00

And then this from maas cli:
$ maas maaslocal node read node-2eaf26be-ab9d-11e3-befe-d4ae527ac129
{
    "status": 6,
    "macaddress_set": [
        {
            "resource_uri": "/MAAS/api/1.0/nodes/node-2eaf26be-ab9d-11e3-befe-d4ae527ac129/macs/00%3A25%3A90%3A4c%3Af0%3A04/",
            "mac_address": "00:25:90:4c:f0:04"
        },
        {
            "resource_uri": "/MAAS/api/1.0/nodes/node-2eaf26be-ab9d-11e3-befe-d4ae527ac129/macs/00%3A25%3A90%3A4c%3Af0%3A05/",
            "mac_address": "00:25:90:4c:f0:05"
        }
    ],
    "hostname": "maas-1-03.maas",
    "zone": {
        "resource_uri": "/MAAS/api/1.0/zones/default/",
        "name": "default",
        "description": ""
    },
    "routers": [
        "20:4e:7f:94:2e:10"
    ],
    "netboot": false,
    "cpu_count": 24,
    "storage": 6489018,
    "owner": "admin",
    "system_id": "node-2eaf26be-ab9d-11e3-befe-d4ae527ac129",
    "architecture": "amd64/generic",
    "memory": 49152,
    "power_type": "ipmi",
    "tag_names": [
        "use-fastpath-installer"
    ],
    "ip_addresses": [
        "192.168.9.8"
    ],
    "resource_uri": "/MAAS/api/1.0/nodes/node-2eaf26be-ab9d-11e3-befe-d4ae527ac129/"
}

You can look at garage maas and see. The MAC address box in the ipmi settings is empty when I look at
http://localhost:8001/MAAS/nodes/node-2eaf26be-ab9d-11e3-befe-d4ae527ac129/edit/

(ssh forwarded 8001 -> maas system)
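The output above can be checked mechanically: take the node's first registered MAC and see which local interface owns it. A minimal sketch, assuming the "primary" MAC is the first entry in `macaddress_set` (the API may not guarantee that ordering); the JSON is trimmed from the output above and the interface map comes from the `/sys/class/net` listing. Helper names are hypothetical.

```python
import json

# Trimmed from the `maas maaslocal node read` output above.
NODE_JSON = """
{
    "macaddress_set": [
        {"mac_address": "00:25:90:4c:f0:04"},
        {"mac_address": "00:25:90:4c:f0:05"}
    ],
    "power_type": "ipmi"
}
"""

# From the /sys/class/net listing on maas-1-03.
HOST_NICS = {"eth0": "00:25:90:4c:f0:04", "eth1": "00:25:90:4c:f0:05"}


def first_registered_mac(node):
    """Return the first MAC in macaddress_set, or None if the node has none."""
    macs = node.get("macaddress_set") or []
    return macs[0]["mac_address"] if macs else None


def interface_for_mac(nics, mac):
    """Return the local interface name that owns `mac`, or None."""
    for name, address in nics.items():
        if address.lower() == mac.lower():
            return name
    return None


node = json.loads(NODE_JSON)
mac = first_registered_mac(node)
print(mac, interface_for_mac(HOST_NICS, mac))  # 00:25:90:4c:f0:04 eth0
```

Both registered MACs belong to host NICs, so any MAC defaulted from this list points an ARP lookup at the host rather than the BMC.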

Changed in maas:
status: Incomplete → New
Revision history for this message
Raphaël Badin (rvb) wrote :

I might be missing something, but I think Scott is right.

Here is get_effective_power_parameters() from node.py:

    def get_effective_power_parameters(self):
        """Return effective power parameters, including any defaults."""
        [...]
        # The "mac" parameter defaults to the node's primary MAC
        # address, but only if not already set.
        if 'mac_address' not in power_params:
            primary_mac = self.get_primary_mac()
            if primary_mac is not None:
                mac = primary_mac.mac_address.get_raw()
                power_params['mac_address'] = mac
        return power_params

We can see that the primary MAC (i.e. the first MAC from the *host*) is put in the power_params if it doesn't already contain a value for the key 'mac_address'.

Later on, in src/provisioningserver/tasks.py, we use this MAC address to find the corresponding IP using ARP:

def issue_power_action(power_type, power_change, **kwargs):
    """Issue a power action to a node.

    :param power_type: The node's power type. Must have a corresponding
        power template.
    :param power_change: The change to request: 'on' or 'off'.
    :param **kwargs: Keyword arguments are passed on to :class:`PowerAction`.
    """
    assert power_change in ('on', 'off'), (
        "Unknown power change keyword: %s" % power_change)
    kwargs['power_change'] = power_change
    if 'mac_address' in kwargs:
        kwargs['ip_address'] = find_ip_via_arp(kwargs['mac_address'])
    [...]
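Putting the two excerpts together shows how the host's MAC ends up overriding the BMC address. The functions below are simplified, standalone re-creations of the quoted logic (hypothetical names, not MAAS's API); the ARP lookup is injected as a plain dict, and 192.0.2.10 is a made-up documentation address standing in for the host's IP.

```python
def effective_params(power_params, primary_mac):
    """Like the quoted node.py code: default mac_address to the host's primary MAC."""
    params = dict(power_params)
    if "mac_address" not in params and primary_mac is not None:
        params["mac_address"] = primary_mac
    return params


def power_kwargs(power_change, find_ip_via_arp, **kwargs):
    """Like the quoted tasks.py code: resolve mac_address to ip_address via ARP."""
    assert power_change in ("on", "off"), (
        "Unknown power change keyword: %s" % power_change)
    kwargs["power_change"] = power_change
    if "mac_address" in kwargs:
        kwargs["ip_address"] = find_ip_via_arp(kwargs["mac_address"])
    return kwargs


# IPMI node with no MAC in its power parameters (as in Scott's setup).
params = effective_params({"power_address": "192.168.1.202"}, "00:25:90:4c:f0:04")
arp_table = {"00:25:90:4c:f0:04": "192.0.2.10"}  # made-up host entry
kwargs = power_kwargs("off", arp_table.get, **params)
print(kwargs["ip_address"])  # 192.0.2.10: the host's address, which the template then prefers
```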

Revision history for this message
Raphaël Badin (rvb) wrote :

I just built a package with the following diff: http://paste.ubuntu.com/7113619/. Testing that package in the lab right now.

Revision history for this message
Raphaël Badin (rvb) wrote :

I got 4 consecutive successes in the lab using the package I just built.

Revision history for this message
Gavin Panella (allenap) wrote :

For AMT, at least as far as we've seen it so far, it makes sense to use the MAC address of the *host* when no MAC is specified in the power parameters; the AMT controller lives like a lamprey on the same NIC as the host. However, this makes sense pretty much *only* for AMT right now.
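A minimal sketch of the narrower rule described here (and which Julian later says the older code followed): default the MAC only for AMT. The function name is hypothetical, and the power-type string `"amt"` is an assumption about how MAAS names that type.

```python
def effective_params_amt_only(power_type, power_params, primary_mac):
    """Default mac_address to the host's primary MAC only for AMT nodes."""
    params = dict(power_params)
    # AMT shares the host NIC, so the host MAC also reaches the controller.
    # IPMI BMCs may have their own NIC and MAC, so never guess for them.
    # Assumption: "amt" is the power_type string for AMT nodes.
    if (power_type == "amt" and "mac_address" not in params
            and primary_mac is not None):
        params["mac_address"] = primary_mac
    return params


host_mac = "00:25:90:4c:f0:04"
print(effective_params_amt_only("ipmi", {"power_address": "192.168.1.202"}, host_mac))
# {'power_address': '192.168.1.202'}: no mac_address, so no ARP override
```

An explicitly configured `mac_address` is still respected for every power type; only the implicit default is restricted.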

Revision history for this message
Jeff Lane  (bladernr) wrote :

I ran into this earlier today. After I filed what turned out to be a duplicate bug, jhobbs pointed me to Raphael's patch in comment 5, and I can confirm that it resolves the issue on my IPMI-based systems.

Revision history for this message
Julian Edwards (julian-edwards) wrote : Re: [Bug 1293791] Re: ipmi template tries to apply ipmi to *hosts* address rather than ipmi address

On Tuesday 18 Mar 2014 11:47:44 you wrote:
> I just built a package with the following diff:
> http://paste.ubuntu.com/7113619/. Testing that package in the lab right
> now.

This is the wrong fix.

Something has happened to the code in node.py, I am sure it's not the same as
what I tested. The old code used to only do the MAC addition if it was a
power_type of WoL.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

On 03/18/2014 06:22 PM, Julian Edwards wrote:
> On Tuesday 18 Mar 2014 11:47:44 you wrote:
>> I just built a package with the following diff:
>> http://paste.ubuntu.com/7113619/. Testing that package in the lab right
>> now.
>
> This is the wrong fix.
>
> Something has happened to the code in node.py, I am sure it's not the same as
> what I tested. The old code used to only do the MAC addition if it was a
> power_type of WoL.

Is it possible the bugs being reported against the trusty package are
due to only part of the implementation being pulled into the patch for
trusty?

Jason

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Tuesday 18 Mar 2014 23:34:23 Jason Hobbs wrote:
> On 03/18/2014 06:22 PM, Julian Edwards wrote:
> > On Tuesday 18 Mar 2014 11:47:44 you wrote:
> >> I just built a package with the following diff:
> >> http://paste.ubuntu.com/7113619/. Testing that package in the lab right
> >> now.
> >
> > This is the wrong fix.
> >
> > Something has happened to the code in node.py, I am sure it's not the same
> > as what I tested. The old code used to only do the MAC addition if it
> > was a power_type of WoL.

I meant AMT here of course... :/

> Is it possible the bugs being reported against the trusty package are
> due to only part of the implementation being pulled into the patch for
> trusty?

I don't think so, but never say never... I'm going to do a fix for this ASAP
and then we can forget about it.

Changed in maas:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Julian Edwards (julian-edwards)
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → In Progress
Changed in maas:
status: In Progress → Fix Committed
milestone: none → 14.04
Revision history for this message
Raphaël Badin (rvb) wrote :

On 03/19/2014 12:22 AM, Julian Edwards wrote:
> On Tuesday 18 Mar 2014 11:47:44 you wrote:
>> I just built a package with the following diff:
>> http://paste.ubuntu.com/7113619/. Testing that package in the lab right
>> now.
>
> This is the wrong fix.
>

It wasn't intended to be a fix. Just a way to prove that the problem
was right there.

Revision history for this message
Mike Rushton (leftyfb) wrote :

I tried the patch, which just removes the following 3 lines from /etc/maas/templates/power/ipmi.template:

#{{if ip_address}}
#power_address={{ip_address}}
#{{endif}}

The MAAS server is still not able to power on or off the server via ipmi when trying to commission or start/stop a node. I am able to query and control the BMC via ipmi manually just fine.

This is on Ubuntu 14.04 amd64 with all updates including maas 1.5+bzr1977-0ubuntu5

Revision history for this message
Mike Rushton (leftyfb) wrote :

The servers we're testing at the moment are Cisco C220 and C240

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Mike, commenting the lines out doesn't work - you have to remove them all the way.

Revision history for this message
Kent Baxley (kentb) wrote :

Mike,

DON'T comment it out. I tried that earlier (see above comments).
With the format of this file you have to either get rid of them
completely or maybe put a space between the pound sign and the {{.

--
Kent Baxley
Field Engineer, Canonical
<email address hidden>

Revision history for this message
Mike Rushton (leftyfb) wrote :

Sorry about that. Deleting them completely does resolve the issue as you said.

Revision history for this message
Kent Baxley (kentb) wrote :

Hi Julian,

I tried adding your fix manually to /etc/maas/templates/power/ipmi.template. Unfortunately, I still cannot automatically power on my Dell PowerEdge blades over IPMI by pressing "Commission Node".

This particular blade has a static IP assigned to it and I can power the blade up by hand using ipmipower.

The version of MAAS that I'm running is 1.5+bzr1977-0ubuntu5

The only way I can restore the power-on functionality via MAAS is to completely remove those three lines in ipmi.template like we did earlier.

Are there other fixes that need to go in along with the one that was merged for this issue? Thanks!

Revision history for this message
Kent Baxley (kentb) wrote :

The same also goes for the tower and rack systems in the lab. Some of these share the BMC with the host NIC (with, of course, a separate MAC address for each), while other machines have their own dedicated port for the BMC.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

I think Kent and others are seeing this bug on 1.5+bzr1977-0ubuntu5 because the applied patch picked up part of r2029 (the changes to ipmi.template) but not the rest of r2029, in particular "kwargs.setdefault('ip_address', None)"; without it, rendering the template fails.

Here's the bad diff:
http://launchpadlibrarian.net/169928836/maas_1.5%2Bbzr1977-0ubuntu4_1.5%2Bbzr1977-0ubuntu5.diff.gz

Here's r2029:
http://bazaar.launchpad.net/~maas-maintainers/maas/trunk/revision/2029
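Why a missing `setdefault` breaks rendering, and why commenting the lines out with `#` did not help, can be illustrated with a toy renderer. MAAS power templates use `{{if}}`/`{{endif}}` directives (Tempita-style syntax; we assume a Tempita-like engine that raises `NameError` for undefined names). The engine processes those directives before the shell ever sees the file, so a leading shell `#` does not disable them. The renderer below is a hypothetical stand-in, not Tempita, and handles only `{{if name}}...{{endif}}` and `{{name}}`.

```python
import re

# Toy renderer handling only {{if name}}...{{endif}} and {{name}}; not Tempita itself.
IF_BLOCK = re.compile(r"\{\{if (\w+)\}\}\n?(.*?)\{\{endif\}\}\n?", re.S)
VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def render(template, namespace):
    def lookup(name):
        # Referencing an undefined name is an error, as assumed for the real engine.
        if name not in namespace:
            raise NameError("name %r is not defined" % name)
        return namespace[name]

    text = IF_BLOCK.sub(lambda m: m.group(2) if lookup(m.group(1)) else "", template)
    return VARIABLE.sub(lambda m: str(lookup(m.group(1))), text)


def try_render(template, namespace):
    """Return the rendered text, or a short error message if rendering fails."""
    try:
        return render(template, namespace)
    except NameError as exc:
        return "render failed: %s" % exc


# The three lines "commented out" with a shell '#': the directives are still live.
TEMPLATE = (
    "power_address={{power_address}}\n"
    "#{{if ip_address}}\n"
    "#power_address={{ip_address}}\n"
    "#{{endif}}\n"
)

params = {"power_address": "192.168.1.202"}  # no mac_address, so no ip_address
print(try_render(TEMPLATE, params))  # render failed: name 'ip_address' is not defined

params.setdefault("ip_address", None)  # mirrors kwargs.setdefault('ip_address', None)
print(try_render(TEMPLATE, params))
```

The second call succeeds because `ip_address` now exists (as `None`), so the `{{if}}` branch is simply skipped.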

Revision history for this message
Kent Baxley (kentb) wrote :

I added the relevant pieces from r2029 to my maas server and there's still something about the template rendering that maas doesn't like, either with or without Julian's most recent changes to the ipmi.template. I get the traceback noted in the bug that's duplicated to this one if I have those three lines present in my ipmi template.

Is there anything else I might need?

Revision history for this message
Kent Baxley (kentb) wrote :

Alternatively, would you guys consider backing out the 'bad' diff for now in an update to maas in the main trusty archive until the rest of this is sorted out? It'd really help us move forward with certification work on at least the Dell and HP server lines.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Please don't cherry pick diffs, instead please try the package in the daily builds PPA. When you cherry pick fixes, it means we have no idea if the template is consistent with the code that drives its values. If the latest package is still broken then I will re-open the bug.

Thanks.

Revision history for this message
Kent Baxley (kentb) wrote :

Ok. I thought that'd be your answer :) I'll try the daily build and see what happens.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Actually don't use daily builds, use https://launchpad.net/~maas-maintainers/+archive/daily-qa-ok

The daily builds PPA has experimental stuff in it!

Kent Baxley (kentb)
tags: added: blocks-hwcert-server
Revision history for this message
Kent Baxley (kentb) wrote :

I installed the latest qa-checked daily build and still can't get the nodes to power up automatically at commissioning time.

No tracebacks in the celery log, though. This build does not, of course, include any of the latest commits to the ipmi.template file that Julian made. In other words I still have this in it:

{{if ip_address}}
power_address={{ip_address}}
{{endif}}

Let me know what else I can get for you on this. I can currently work around it by issuing ipmipower 'on' commands to my BMC. I'll leave my ipmi.template file alone for now.

Revision history for this message
Kent Baxley (kentb) wrote :

Ok. One node actually powered on this time, so, I'm going to check a few more and see how it goes. This same node was consistently not powering on before the update.

Revision history for this message
Kent Baxley (kentb) wrote :

The same node that I just said powered up at commissioning time, failed to power up when I clicked 'start node', so, it looks like things aren't very consistent at best.

Revision history for this message
Kent Baxley (kentb) wrote :

I tried another machine and got the same behavior. I can enlist, and then automatically power up the machine when pressing "commission node". However, it will not power up automatically when I run 'start node' after commissioning... so, two systems in a row are doing this.

Revision history for this message
Kent Baxley (kentb) wrote :

Three machines in a row now. So, the trend seems to mostly be:

System will power on automatically after enlisting and clicking 'commission node'

After the node commissions, it will not power on automatically when clicking 'start node'.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Thursday 27 Mar 2014 18:18:43 you wrote:
> The same node that I just said powered up at commissioning time, failed
> to power up when I clicked 'start node', so, it looks like things aren't
> very consistent at best.

Right it's because it's missing that IPMI template change that you noticed.
Can you do a manual hack for now please?

Revision history for this message
Kent Baxley (kentb) wrote :

Thanks, Julian. I added the "and not power_address" back into the template.

This seems to fix it once and for all. I can now power up the nodes via the web UI again.

Thanks, again, and looking forward to something landing in the main archive soon.
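For reference, the condition Kent restored, assuming it reads `{{if ip_address and not power_address}}` in the template, turns the ARP-discovered address into a fallback rather than an override. A minimal Python sketch of that decision (hypothetical function name; addresses from the original debug log):

```python
def patched_power_address(power_address, ip_address=None):
    """Use the ARP-discovered ip_address only when no power_address is configured."""
    # Template equivalent (assumed):
    #   {{if ip_address and not power_address}}
    #   power_address={{ip_address}}
    #   {{endif}}
    if ip_address and not power_address:
        return ip_address
    return power_address


print(patched_power_address("192.168.1.202", "192.168.9.7"))  # 192.168.1.202
print(patched_power_address("", "192.168.9.7"))  # 192.168.9.7
```

With a static BMC address configured, as on Kent's blades, the configured address now always wins; the ARP lookup only matters when the power address is left empty.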

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On 29/03/14 07:30, Kent Baxley wrote:
> Thanks, Julian. I added the "and not power_address" back into the
> template.
>
> This seems to fix it once and for all. I can now power up the nodes via
> the web UI again.
>
> Thanks, again, and looking forward to something landing in the main
> archive soon.
>

An upload was made over the weekend, it will appear as soon as it's
reviewed.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On 29/03/14 07:30, Kent Baxley wrote:
> Thanks, Julian. I added the "and not power_address" back into the
> template.
>
> This seems to fix it once and for all. I can now power up the nodes via
> the web UI again.
>
> Thanks, again, and looking forward to something landing in the main
> archive soon.
>

An upload was made over the weekend, it will get promoted from the
upload queue as soon as it's reviewed by someone in the release team.

Revision history for this message
Kent Baxley (kentb) wrote :

Looking good so far with the maas from the trusty main archive. Power on / off occurs without any manual workarounds.

Thanks!

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Thursday 03 Apr 2014 20:55:46 you wrote:
> Looking good so far with the maas from the trusty main archive. Power
> on / off occurs without any manual workarounds.
>
> Thanks!

\o/

Thanks for confirming.

Changed in maas:
status: Fix Committed → Fix Released