Change in IGMP behavior between 2.6.35-28.49 and 2.6.38-13.53

Bug #954357 reported by George Bonser
Affects: linux (Ubuntu)
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

I have an interface with a /32 IP address. Its only job is to send multicast traffic for server clustering. Everything works fine up to and including 2.6.35-28-server #49.

Starting with 2.6.38-13-server #53, IGMP behavior changed in a way that breaks our setup.

A router interface on this segment is configured as the IGMP querier. When the older kernel hears an IGMP query, it sends the report out the same interface on which it heard the query:

12:15:41.822639 IP 10.1.112.5 > 224.0.0.1: igmp query v3
12:15:48.274824 IP 10.1.112.41 > 224.0.0.22: igmp v3 report, 2 group record(s)

Note that the querier is 10.1.112.5, but the IP address on that interface (10.1.112.41) is a /32 used only as an address for an application to bind to.

In 2.6.38-13-server #53, the unit instead sends the IGMP report out the interface that has a route back to 10.1.112.5 (the default gateway interface) rather than the interface on which it heard the query. That breaks IGMP: the default gateway interface takes a different path through different switches, so the switches along the path to the interface that heard the query never see the report and stop forwarding multicast traffic to that interface. In short, the topology over which the query is heard is not the topology over which the report is sent, and IGMP shuts off the multicast traffic. The unit should not pay attention to the querier's IP address; it should send the report out the interface on which the query was heard, as has been standard practice for as long as we have been using Linux.

If I run "ip route add 10.1.112.5/32 dev <interface>", the reports go out the right interface and IGMP works again. The kernel seems to want to send the report on whichever interface has a route back to the querier, but I am less concerned with the querier hearing the report than with every switch in the path that has IGMP snooping enabled.

The current behavior breaks IGMP.
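Assuming, as the report describes, that the newer kernel picks the report's output interface with an ordinary route lookup toward the querier, the workaround can be modeled as a longest-prefix match. The Python sketch below is a simplified illustration, not the kernel's actual code; the device names eth0 (default gateway) and eth1 (the interface holding the /32 address) are hypothetical. Because a /32 address gives eth1 a route that covers only 10.1.112.41 itself, a lookup for 10.1.112.5 falls through to the default route on eth0 until a /32 host route for the querier is added.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    dev: str  # hypothetical device name


def lookup(table: List[Route], dst: str) -> Optional[str]:
    """Return the device of the longest-prefix route matching dst, or None."""
    addr = ipaddress.IPv4Address(dst)
    matches = [r for r in table if addr in r.prefix]
    if not matches:
        return None
    # Longest prefix wins, as in normal IPv4 unicast route selection.
    return max(matches, key=lambda r: r.prefix.prefixlen).dev


# Hypothetical table: default route on eth0; eth1 carries only the /32
# address 10.1.112.41, so its connected route covers that single host.
table = [
    Route(ipaddress.IPv4Network("0.0.0.0/0"), "eth0"),
    Route(ipaddress.IPv4Network("10.1.112.41/32"), "eth1"),
]

querier = "10.1.112.5"
print(lookup(table, querier))  # eth0: only the default route matches

# Model of: ip route add 10.1.112.5/32 dev eth1
table.append(Route(ipaddress.IPv4Network("10.1.112.5/32"), "eth1"))
print(lookup(table, querier))  # eth1: the /32 host route is the longest match
```

This only explains why the host route changes the chosen interface; it does not model multicast-specific routing or IGMP snooping.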

Tags: igmp maverick
George Bonser (georgeb) wrote:

OK, kill this one. Apparently something got hosed in the kernel, and rebooting the box made this behavior go away. It had been running for about two months and began exhibiting this behavior yesterday.

Rebooting the server restored the expected behavior.

Chalk this one up to gremlins, I suppose.

Brad Figg (brad-figg) wrote: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 954357

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: maverick
George Bonser (georgeb) wrote:

The "bad" unit had been up for about two months without the application running; we started the application yesterday and noticed this unexpected behavior. Two units in its cluster that run the older kernel have been running the application, one for 48 consecutive days and the other for 94 days, without an issue. The "odd" box had been booted but had not run the application software until yesterday.

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Invalid