mellanox driver losing the plot regularly in HP region

Bug #1245852 reported by Derek Higgins
This bug affects 1 person
Affects   Status   Importance  Assigned to  Milestone
tripleo   Invalid  High        Unassigned

Bug Description

The heat event-list for the overcloud ends in:

| NovaCompute0 | 37619 | ClientException: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-f4603762-9da9-4f79-b7d8-8f53fc1c5875) | CREATE_FAILED | 2013-10-29T09:58:46Z |
| notcompute | 37620 | CREATE aborted | CREATE_FAILED | 2013-10-29T09:58:46Z |

In the nova logs there is a traceback from connecting to neutron (attached). The neutron-server log shows "Fail scheduling network", and the neutron-openvswitch-agent log shows a number of errors running ovs-vsctl list-ports br-int, which is probably the root of the neutron problem (attached).

Derek Higgins (derekh) wrote :

We also have an ovsdb-server error in the syslog:
Oct 29 05:58:28 undercloud-notcompute-jws3awlsb2kh ovsdb-server: 00136|reconnect|ERR|unix:/tmp/stream-unix.18185.44: no response to inactivity probe after 5 seconds, disconnecting

Derek Higgins (derekh) wrote :

Whoops, I pasted the wrong line above. There was one around the same time as the ovs-vsctl failure:

Oct 29 09:58:05 undercloud-notcompute-jws3awlsb2kh ovsdb-server: 00139|reconnect|ERR|unix:/tmp/stream-unix.18185.45: no response to inactivity probe after 5 seconds, disconnecting
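To line up these ovsdb-server disconnects with a failure time, the syslog could be filtered for inactivity-probe errors near a reference timestamp. The following is a minimal Python sketch under stated assumptions: the regex, the function names probe_timeouts and near, and the two-minute window are hypothetical; syslog omits the year, so 2013 is assumed; and because the ovs-vsctl failure time is only in the attached agent log, the heat CREATE_FAILED time (09:58:46Z) stands in as the reference, assuming syslog on this host is also in UTC.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern for ovsdb-server reconnect errors as they appear in syslog, e.g.
# "Oct 29 09:58:05 <host> ovsdb-server: 00139|reconnect|ERR|unix:/tmp/...: no response ..."
RECONNECT_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) ovsdb-server: "
    r"(?P<seq>\d+)\|reconnect\|(?P<level>\w+)\|(?P<target>[^:]+:[^:]+): (?P<msg>.*)$"
)


def probe_timeouts(lines, year=2013):
    """Yield (timestamp, host, target) for each inactivity-probe disconnect."""
    for line in lines:
        m = RECONNECT_RE.match(line.strip())
        if m and "inactivity probe" in m.group("msg"):
            # Syslog omits the year, so the caller has to supply it (assumption).
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            yield ts, m.group("host"), m.group("target")


def near(events, reference, window=timedelta(minutes=2)):
    """Keep events within `window` of `reference`; both must use the same clock."""
    return [e for e in events if abs(e[0] - reference) <= window]


# The two syslog lines quoted in this thread.
SYSLOG = [
    "Oct 29 05:58:28 undercloud-notcompute-jws3awlsb2kh ovsdb-server: 00136|reconnect|ERR|"
    "unix:/tmp/stream-unix.18185.44: no response to inactivity probe after 5 seconds, disconnecting",
    "Oct 29 09:58:05 undercloud-notcompute-jws3awlsb2kh ovsdb-server: 00139|reconnect|ERR|"
    "unix:/tmp/stream-unix.18185.45: no response to inactivity probe after 5 seconds, disconnecting",
]

events = list(probe_timeouts(SYSLOG))
# Stand-in reference: the heat CREATE_FAILED time (09:58:46Z), assuming syslog is also UTC.
hits = near(events, datetime(2013, 10, 29, 9, 58, 46))
for ts, host, target in hits:
    print(ts, host, target)
```

Run against the two lines above, only the 09:58:05 disconnect (unix:/tmp/stream-unix.18185.45) falls inside the window; the 05:58:28 one is about four hours earlier.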

Derek Higgins (derekh) wrote :

Hmmm, we seem to be getting a lot of error messages about a non-existent interface.

We have lots of these:
Oct 29 06:36:45|388952|netdev|WARN|Dropped 41 log messages in last 12 seconds (most recently, 1 seconds ago) due to excessive rate
Oct 29 06:36:45|388953|netdev|WARN|failed to get flags for network device tap8e1c39ef-b8: No such device

Output of ovs-vsctl show:

fabd9861-5527-4902-998c-26d94aba0564
    Bridge br-ctlplane
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
        Port "eth2"
            Interface "eth2"
    Bridge br-int
        Port "tap8e1c39ef-b8"
            tag: 1
            Interface "tap8e1c39ef-b8"
                type: internal
        Port "tap6d0e0158-f8"
            tag: 4095
            Interface "tap6d0e0158-f8"
                type: internal
        Port int-br-ctlplane
            Interface int-br-ctlplane
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "1.4.3"

Output of ip link:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 78:e7:d1:23:9d:78 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 78:e7:d1:23:9d:79 brd ff:ff:ff:ff:ff:ff
6: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT
    link/ether 78:e7:d1:23:9d:7d brd ff:ff:ff:ff:ff:ff
7: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether 76:05:99:b8:dd:4b brd ff:ff:ff:ff:ff:ff
8: tap6d0e0158-f8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether ea:b7:d3:db:45:e9 brd ff:ff:ff:ff:ff:ff
29: phy-br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 36:05:6d:c6:84:42 brd ff:ff:ff:ff:ff:ff
30: int-br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 4a:ee:34:e1:68:23 brd ff:ff:ff:ff:ff:ff
45: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 78:e7:d1:23:9d:7d brd ff:ff:ff:ff:ff:ff
46: vlan25@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
    link/ether 78:e7:d1:23:9d:7d brd ff:ff:ff:ff:ff:ff

I'm not sure if this is related, but tap8e1c39ef-b8 is still attached to br-int even though it does not appear in the ip link output. Should we remove it to see if that helps?
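The check behind that question is a set difference: interfaces the OVS database lists on a bridge, minus devices the kernel actually has. That mismatch would also explain the netdev "No such device" warnings for tap8e1c39ef-b8. A minimal Python sketch, assuming the ovs-vsctl show and ip link outputs have been saved as text; the regexes and the function names ovs_interfaces, kernel_interfaces and stale_ports are hypothetical, and the sample data is an excerpt of the outputs pasted above.

```python
import re

# `Interface br-int` or `Interface "tap8e1c39ef-b8"` lines from `ovs-vsctl show`.
OVS_IFACE_RE = re.compile(r'^\s*Interface\s+"?([^"\s]+)"?\s*$')
# `8: tap6d0e0158-f8: <...>` or `46: vlan25@eth2: <...>` header lines from `ip link`.
IP_LINK_RE = re.compile(r"^\d+:\s+([^:@\s]+)(?:@\S+)?:\s")


def ovs_interfaces(show_output):
    """Names of all interfaces OVS believes are attached to some bridge."""
    matches = (OVS_IFACE_RE.match(line) for line in show_output.splitlines())
    return {m.group(1) for m in matches if m}


def kernel_interfaces(ip_link_output):
    """Names of all network devices the kernel reports (the @parent suffix is dropped)."""
    matches = (IP_LINK_RE.match(line) for line in ip_link_output.splitlines())
    return {m.group(1) for m in matches if m}


def stale_ports(show_output, ip_link_output):
    """Interfaces present in the OVS database but missing from the kernel."""
    return sorted(ovs_interfaces(show_output) - kernel_interfaces(ip_link_output))


SHOW = """\
    Bridge br-int
        Port "tap8e1c39ef-b8"
            tag: 1
            Interface "tap8e1c39ef-b8"
                type: internal
        Port "tap6d0e0158-f8"
            tag: 4095
            Interface "tap6d0e0158-f8"
                type: internal
        Port br-int
            Interface br-int
                type: internal
"""

IP_LINK = """\
7: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether 76:05:99:b8:dd:4b brd ff:ff:ff:ff:ff:ff
8: tap6d0e0158-f8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
    link/ether ea:b7:d3:db:45:e9 brd ff:ff:ff:ff:ff:ff
46: vlan25@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
    link/ether 78:e7:d1:23:9d:7d brd ff:ff:ff:ff:ff:ff
"""

print(stale_ports(SHOW, IP_LINK))  # ['tap8e1c39ef-b8']
```

Running it on the full outputs above should give the same single result, since every other interface listed under br-int and br-ctlplane also appears in the ip link output.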

Chris Jones (cmsj)
Changed in tripleo:
status: New → Triaged
Changed in tripleo:
importance: Critical → High
summary: - Errors running ovs-vsctl list-ports on undercloud
+ mellanox driver losing the plot regularly in HP region
Derek Higgins (derekh) wrote :

Closing this as it is no longer an issue; an upgrade to Trusty in hp1 seems to have sorted out the problem.

Changed in tripleo:
status: Triaged → Invalid