On running network and I/O traffic on SM15000-XE, "BUG: soft lockup -- CPU#0 stuck for 22s!" seen on dmesg

Bug #1183135 reported by vikrant soman
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
irqbalance (Ubuntu)
Fix Released
High
Unassigned
Precise
Won't Fix
High
Unassigned
Quantal
Fix Released
Medium
Unassigned

Bug Description

Ubuntu 12.04 LTS was installed on 2 servers on a SM15000-XE system. Bidirectional iperf traffic was started on 7 nics between the 2 servers. After a few hours, following stack trace was observed on the dmesg. Sluggish network performance was observed after that

[15640.958682] sched: RT throttling activated [16924.585541] ------------[ cut here ]------------

[16924.585549] WARNING: at /build/buildd/linux-3.2.0/net/sched/sch_generic.c:255 dev_watchdog+0x25a/0x270() [16924.585551] Hardware name: Sabre2 [16924.585552] NETDEV WATCHDOG: eth6 (e1000): transmit queue 0 timed out [16924.585553] Modules linked in: 8021q garp stp nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc msr mac_hid lp parport e1000 [16924.585564] Pid: 0, comm: swapper/7 Not tainted 3.2.0-23-generic #36-Ubuntu [16924.585566] Call Trace:
> [16924.585567] <IRQ> [<ffffffff8106712f>] warn_slowpath_common+0x7f/0xc0 [16924.585575] [<ffffffff81067226>] warn_slowpath_fmt+0x46/0x50 [16924.585578] [<ffffffff8155e22a>] dev_watchdog+0x25a/0x270 [16924.585581] [<ffffffff81110b80>] ? perf_rotate_context+0x110/0x220 [16924.585585] [<ffffffff81083030>] ? __queue_work+0x320/0x320 [16924.585587] [<ffffffff8155dfd0>] ? qdisc_reset+0x50/0x50 [16924.585589] [<ffffffff8155dfd0>] ? qdisc_reset+0x50/0x50 [16924.585592] [<ffffffff81076086>] call_timer_fn+0x46/0x160 [16924.585594] [<ffffffff8155dfd0>] ? qdisc_reset+0x50/0x50 [16924.585597] [<ffffffff810779d2>] run_timer_softirq+0x132/0x2a0 [16924.585600] [<ffffffff810950c5>] ? ktime_get+0x65/0xe0 [16924.585604] [<ffffffff8106e928>] __do_softirq+0xa8/0x210 [16924.585607] [<ffffffff8101a779>] ? read_tsc+0x9/0x20 [16924.585609] [<ffffffff8109c054>] ? tick_program_event+0x24/0x30 [16924.585613] [<ffffffff81666cec>] call_softirq+0x1c/0x30 [16924.585616] [<ffffffff81015305>] do_softirq+0x65/0xa0 [16924.585619] [<ffffffff8106ed0e>] irq_exit+0x8e/0xb0 [16924.585622] [<ffffffff8166768e>] smp_apic_timer_interrupt+0x6e/0x99
> [16924.585625] [<ffffffff8166555e>] apic_timer_interrupt+0x6e/0x80 [16924.585626] <EOI> [<ffffffff8136b42d>] ? intel_idle+0xed/0x150 [16924.585631] [<ffffffff8136b40f>] ? intel_idle+0xcf/0x150 [16924.585634] [<ffffffff81504a01>] cpuidle_idle_call+0xc1/0x280 [16924.585637] [<ffffffff8101222a>] cpu_idle+0xca/0x120 [16924.585640] [<ffffffff8163a7fa>] start_secondary+0xd9/0xdb [16924.585642]

---[ end trace 3c74a3d373267b03 ]---

[16941.890574] BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]

[16941.963686] Modules linked in: 8021q garp stp nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc msr mac_hid lp parport e1000
[16941.963697] CPU 0
[16941.963698] Modules linked in: 8021q garp stp nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc msr mac_hid lp parport e1000
[16941.963704]
[16941.963706] Pid: 0, comm: swapper/0 Tainted: G W 3.2.0-23-generic #36-Ubuntu SeaMicro Sabre2/Type2 - Board Product Name1
[16941.963710] RIP: 0010:[<ffffffff81531e2f>] [<ffffffff81531e2f>] __alloc_skb+0xcf/0x240
[16941.963718] RSP: 0018:ffff88043fc03c70 EFLAGS: 00010202
[16941.963719] RAX: 0000000000000000 RBX: ffff88043fc03c60 RCX: 000000000000000a
[16941.963721] RDX: ffff88040f876680 RSI: 00000000000000cc RDI: ffff880420dab478
[16941.963722] RBP: ffff88043fc03cb0 R08: 0000000000000680 R09: 0000000000000800
[16941.963724] R10: ffff880420dab500 R11: 0000000000000001 R12: ffff88043fc03be8
[16941.963725] R13: ffffffff8166555e R14: ffff88043fc03cb0 R15: ffff88040f876000
[16941.963727] FS: 0000000000000000(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000
[16941.963729] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[16941.963730] CR2: 00007f19afb0ad50 CR3: 0000000210999000 CR4: 00000000000406f0
[16941.963732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[16941.963734] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[16941.963735] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
[16941.963737] Stack:
[16941.987830] ffff88043fc03ca0 00000632813265a4 ffff880424796090 ffff880421200000
[16941.987834] 0000000000000000 00000000000000ff ffff88040f875840 ffff8804218a7a80
[16941.987837] ffff88043fc03cd0 ffffffff81532424 00000000000000fe ffffc90001877fd0
[16941.987840] Call Trace:
[16942.017153] <IRQ>
[16942.042416] [<ffffffff81532424>] __netdev_alloc_skb+0x24/0x50
[16942.042423] [<ffffffffa0003861>] e1000_alloc_rx_buffers+0x291/0x4f0 [e1000]
[16942.042427] [<ffffffff813265c0>] ? map_single+0x60/0x60
[16942.042430] [<ffffffffa000287a>] e1000_clean_rx_irq+0x3ba/0x4d0 [e1000]
[16942.042434] [<ffffffffa0003561>] e1000_clean+0x51/0xc0 [e1000]
[16942.042438] [<ffffffff81540ad4>] net_rx_action+0x134/0x290
[16942.042442] [<ffffffff8106e928>] __do_softirq+0xa8/0x210
[16942.042446] [<ffffffff810334ee>] ? eoi_ioapic_irq.isra.23+0x5e/0x70
[16942.042450] [<ffffffff81666cec>] call_softirq+0x1c/0x30
[16942.042453] [<ffffffff81015305>] do_softirq+0x65/0xa0
[16942.042456] [<ffffffff8106ed0e>] irq_exit+0x8e/0xb0
[16942.042458] [<ffffffff816675a3>] do_IRQ+0x63/0xe0
[16942.042462] [<ffffffff8165c92e>] common_interrupt+0x6e/0x6e
[16942.042463] <EOI>
[16942.067708] [<ffffffff8108ed70>] ? hrtimer_try_to_cancel+0x50/0xc0
[16942.067712] [<ffffffff8109cdb6>] ? tick_nohz_restart_sched_tick+0x106/0x130
[16942.067715] [<ffffffff8109cdb2>] ? tick_nohz_restart_sched_tick+0x102/0x130
[16942.067719] [<ffffffff81012263>] cpu_idle+0x103/0x120
[16942.067723] [<ffffffff816235ae>] rest_init+0x72/0x74
[16942.067728] [<ffffffff81cfbc0d>] start_kernel+0x3ba/0x3c7
[16942.067731] [<ffffffff81cfb388>] x86_64_start_reservations+0x132/0x136
[16942.067734] [<ffffffff81cfb140>] ? early_idt_handlers+0x140/0x140
[16942.067737] [<ffffffff81cfb459>] x86_64_start_kernel+0xcd/0xdc
[16942.067739] Code: df be cc 00 00 00 0f 85 48 01 00 00 40 f6 c7 02 0f 85 56 01 00 00 40 f6 c7 04 0f 85 1c 01 00 00 89 f1 31 c0 c1 e9 03 40 f6 c6 04 <f3> 48 ab 0f 85 f0 00 00 00 40 f6 c6 02 0f 85 ce 00 00 00 83 e6
[16942.301153] Call Trace:
[16942.301156] <IRQ> [<ffffffff81532424>] __netdev_alloc_skb+0x24/0x50
[16942.301168] [<ffffffffa0003861>] e1000_alloc_rx_buffers+0x291/0x4f0 [e1000]
[16942.301172] [<ffffffff813265c0>] ? map_single+0x60/0x60
[16942.301175] [<ffffffffa000287a>] e1000_clean_rx_irq+0x3ba/0x4d0 [e1000]
[16942.301178] [<ffffffffa0003561>] e1000_clean+0x51/0xc0 [e1000]
[16942.301181] [<ffffffff81540ad4>] net_rx_action+0x134/0x290
[16942.301186] [<ffffffff8106e928>] __do_softirq+0xa8/0x210
[16942.301189] [<ffffffff810334ee>] ? eoi_ioapic_irq.isra.23+0x5e/0x70
[16942.301193] [<ffffffff81666cec>] call_softirq+0x1c/0x30
[16942.301196] [<ffffffff81015305>] do_softirq+0x65/0xa0
[16942.301199] [<ffffffff8106ed0e>] irq_exit+0x8e/0xb0
[16942.301201] [<ffffffff816675a3>] do_IRQ+0x63/0xe0
[16942.301204] [<ffffffff8165c92e>] common_interrupt+0x6e/0x6e
[16942.301205] <EOI> [<ffffffff8108ed70>] ? hrtimer_try_to_cancel+0x50/0xc0
[16942.301211] [<ffffffff8109cdb6>] ? tick_nohz_restart_sched_tick+0x106/0x130
[16942.301214] [<ffffffff8109cdb2>] ? tick_nohz_restart_sched_tick+0x102/0x130
[16942.301216] [<ffffffff81012263>] cpu_idle+0x103/0x120
[16942.301220] [<ffffffff816235ae>] rest_init+0x72/0x74
[16942.301233] [<ffffffff81cfbc0d>] start_kernel+0x3ba/0x3c7
[16942.301236] [<ffffffff81cfb388>] x86_64_start_reservations+0x132/0x136
[16942.301239] [<ffffffff81cfb140>] ? early_idt_handlers+0x140/0x140
[16942.301242] [<ffffffff81cfb459>] x86_64_start_kernel+0xcd/0xdc
[17102.391716] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17102.391717] Tx Queue <0>
[17102.391718] TDH <eb>
[17102.391719] TDT <d3>
[17102.391719] next_to_use <d3>
[17102.391720] next_to_clean <e5>
[17102.391720] buffer_info[next_to_clean]
[17102.391721] time_stamp <100402f9f>
[17102.391722] next_to_watch <f8>
[17102.391722] jiffies <100403264>
[17102.391723] next_to_watch.status <0>
[17134.426481] INFO: task kworker/7:2:5025 blocked for more than 120 seconds.
[17134.509027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17134.603042] kworker/7:2 D 0000000000000007 0 5025 2 0x00000000
[17134.603047] ffff880413843af0 0000000000000046 0000000000000000 0000000000000000
[17134.603061] ffff880413843fd8 ffff880413843fd8 ffff880413843fd8 0000000000013780
[17134.603066] ffff880420e75bc0 ffff880416ecdbc0 0000000000000000 7fffffffffffffff
[17134.603070] Call Trace:
[17134.603080] [<ffffffff8165a55f>] schedule+0x3f/0x60
[17134.603084] [<ffffffff8165aba5>] schedule_timeout+0x2a5/0x320
[17134.603088] [<ffffffff8165a39f>] wait_for_common+0xdf/0x180
[17134.603094] [<ffffffff81055fdd>] ? set_next_entity+0xad/0xd0
[17134.603099] [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
[17134.603102] [<ffffffff8165a51d>] wait_for_completion+0x1d/0x20
[17134.603108] [<ffffffff81084b53>] wait_on_cpu_work+0xd3/0xe0
[17134.603112] [<ffffffff810822d0>] ? do_work_for_cpu+0x30/0x30
[17134.603119] [<ffffffff81084c14>] wait_on_work+0xb4/0x120
[17134.603123] [<ffffffff81085643>] __cancel_work_timer+0x73/0x80
[17134.603127] [<ffffffff81085680>] cancel_work_sync+0x10/0x20
[17134.603136] [<ffffffffa0000875>] e1000_down_and_stop+0x25/0x50 [e1000]
[17134.603150] [<ffffffffa0004e4f>] e1000_down+0x14f/0x200 [e1000]
[17134.603155] [<ffffffffa0006ef0>] ? e1000_change_mtu+0x1c0/0x1c0 [e1000]
[17134.603160] [<ffffffffa0006f5e>] e1000_reset_task+0x6e/0x90 [e1000]
[17134.603162] [<ffffffff81084e2a>] process_one_work+0x11a/0x480
[17134.603165] [<ffffffff81085bd4>] worker_thread+0x164/0x370
[17134.603168] [<ffffffff81085a70>] ? manage_workers.isra.29+0x130/0x130
[17134.603170] [<ffffffff8108a42c>] kthread+0x8c/0xa0
[17134.603173] [<ffffffff81666bf4>] kernel_thread_helper+0x4/0x10
[17134.603176] [<ffffffff8108a3a0>] ? flush_kthread_worker+0xa0/0xa0
[17134.603178] [<ffffffff81666bf0>] ? gs_change+0x13/0x13
[17134.603184] INFO: task kworker/7:1:5691 blocked for more than 120 seconds.
[17134.685722] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17134.779732] kworker/7:1 D 0000000000000007 0 5691 2 0x00000000
[17134.779738] ffff8801d0e29d10 0000000000000046 0000000000000000 0000000000000000
[17134.779745] ffff8801d0e29fd8 ffff8801d0e29fd8 ffff8801d0e29fd8 0000000000013780
[17134.779750] ffff880425775bc0 ffff88020f8d96f0 ffff880425794600 ffff880421205040
[17134.779754] Call Trace:
[17134.779762] [<ffffffff8165a55f>] schedule+0x3f/0x60
[17134.779764] [<ffffffff8165b367>] __mutex_lock_slowpath+0xd7/0x150
[17134.779767] [<ffffffff8165af7a>] mutex_lock+0x2a/0x50
[17134.779786] [<ffffffffa0005d31>] e1000_watchdog+0x71/0x3d0 [e1000]
[17134.779792] [<ffffffffa0005cc0>] ? e1000_update_stats+0x970/0x970 [e1000]
[17134.779798] [<ffffffff81084e2a>] process_one_work+0x11a/0x480
[17134.779802] [<ffffffff81085bd4>] worker_thread+0x164/0x370
[17134.779808] [<ffffffff81085a70>] ? manage_workers.isra.29+0x130/0x130
[17134.779812] [<ffffffff8108a42c>] kthread+0x8c/0xa0
[17134.779815] [<ffffffff81666bf4>] kernel_thread_helper+0x4/0x10
[17134.779817] [<ffffffff8108a3a0>] ? flush_kthread_worker+0xa0/0xa0
[17134.779822] [<ffffffff81666bf0>] ? gs_change+0x13/0x13
[17146.285107] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17146.285108] Tx Queue <0>
[17146.285109] TDH <ed>
[17146.285109] TDT <37>
[17146.285110] next_to_use <37>
[17146.285111] next_to_clean <eb>
[17146.285111] buffer_info[next_to_clean]
[17146.285112] time_stamp <100405c55>
[17146.285112] next_to_watch <ee>
[17146.285113] jiffies <100405d53>
[17146.285113] next_to_watch.status <0>
[17148.215140] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17148.215141] Tx Queue <0>
[17148.215142] TDH <a>
[17148.215142] TDT <f2>
[17148.215143] next_to_use <f2>
[17148.215143] next_to_clean <5>
[17148.215144] buffer_info[next_to_clean]
[17148.215144] time_stamp <100405cb2>
[17148.215145] next_to_watch <10>
[17148.215145] jiffies <100405f36>
[17148.215146] next_to_watch.status <0>
[17223.041709] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17223.041710] Tx Queue <0>
[17223.041711] TDH <bd>
[17223.041711] TDT <20>
[17223.041712] next_to_use <20>
[17223.041712] next_to_clean <b4>
[17223.041713] buffer_info[next_to_clean]
[17223.041713] time_stamp <10040a73e>
[17223.041714] next_to_watch <be>
[17223.041714] jiffies <10040a866>
[17223.041715] next_to_watch.status <0>
[17231.053339] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17231.053340] Tx Queue <0>
[17231.053341] TDH <f5>
[17231.053341] TDT <75>
[17231.053342] next_to_use <75>
[17231.053342] next_to_clean <f1>
[17231.053343] buffer_info[next_to_clean]
[17231.053343] time_stamp <10040af0d>
[17231.053344] next_to_watch <f9>
[17231.053344] jiffies <10040b03b>
[17231.053345] next_to_watch.status <0>
[17254.591185] INFO: task kworker/7:2:5025 blocked for more than 120 seconds.
[17254.673724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17254.767714] kworker/7:2 D 0000000000000007 0 5025 2 0x00000000

INTERRUPT

root@ubuntu ~# cat /proc/interrupts 100% 243KB 242.8KB/s 00:00
> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
> 0: 87 0 0 0 0 0 0 0 IR-IO-APIC-edge timer
> 4: 17735 0 14263 0 30126 0 0 0 IR-IO-APIC-edge serial
> 8: 1 0 0 0 0 0 0 0 IR-IO-APIC-edge rtc0
> 9: 0 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi acpi
> 16: 16604523 0 0 12120410 0 0 0 0 IR-IO-APIC-fasteoi ahci, ahci, ahci, eth1, eth4
> 17: 1678725 0 0 8996499 0 0 0 0 IR-IO-APIC-fasteoi ahci, ahci, ahci, eth0, eth6
> 18: 45881173 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi ahci, ahci, eth3, eth7
> 19: 59172483 0 0 20836 0 0 0 0 IR-IO-APIC-fasteoi eth2, eth5
> 40: 0 0 0 0 0 0 0 0 DMAR_MSI-edge dmar0
> 41: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge PCIe PME
> 42: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge PCIe PME
> 43: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge PCIe PME
> 44: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge PCIe PME
> 45: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge PCIe PME
> 46: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge PCIe PME
> 47: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge PCIe PME
> 48: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge PCIe PME
> NMI: 5647 157 1546 1659 1984 356 997 572 Non-maskable interrupts
> LOC: 964914 979936 1409782 582208 1756677 1311067 2095033 1656838 Local timer interrupts
> SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
> PMI: 5647 157 1546 1659 1984 356 997 572 Performance monitoring interrupts
> IWI: 0 0 0 0 0 0 0 0 IRQ work interrupts
> RES: 147237 712096 22528993 144356 10341760 488304 3966861 554994 Rescheduling interrupts
> CAL: 174 221 212 149 223 212 205 218 Function call interrupts
> TLB: 153 210 252 1558 299 517 361 682 TLB shootdowns
> TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
> THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
> MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
> MCP: 45 45 45 45 45 45 45 45 Machine check polls

Tags: bot-comment
Revision history for this message
vikrant soman (vikrant-soman) wrote :
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1183135/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu:
status: New → Confirmed
information type: Public → Private
information type: Private → Public
Changed in ubuntu:
assignee: nobody → Samantha Jian-Pielak (samantha-jian)
Changed in ubuntu:
assignee: Samantha Jian-Pielak (samantha-jian) → nobody
Revision history for this message
Samantha Jian-Pielak (samantha-jian) wrote :

Comment from Vikrant:
With Ubuntu 12.04, we see TX hang and exceptions on running network traffic.

This is seen particularly on Intel platforms.

The issue seems to arise because of the fact that irqbalance does not balance the interrupts evenly across all available cores. Ubuntu 12.04 uses 0.56-1ubuntu4 irqbalance package.

As a workaround, we tried to use the irqbalance package from an older version of Ubuntu, namely 0.55+20091017-3ubuntu2, and it worked very well. We see the full expected performance and there are no TX hangs or exceptions. The interrupts seem to be spread evenly across all cores.

Changed in ubuntu:
assignee: nobody → Samantha Jian-Pielak (samantha-jian)
affects: ubuntu → irqbalance (Ubuntu)
Changed in irqbalance (Ubuntu):
assignee: Samantha Jian-Pielak (samantha-jian) → nobody
Revision history for this message
Samantha Jian-Pielak (samantha-jian) wrote :

NIC: Intel 82540EM Gigabit Ethernet Controller

e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI (from attached dmesg)

Revision history for this message
Samantha Jian-Pielak (samantha-jian) wrote :

Hi Vikrant,

Please let us know if you can reproduce the issue with irqbalance 1.0.3-1ubuntu2 in the 12.10 (quantal) repo.

Thanks,
Samantha

Revision history for this message
vikrant soman (vikrant-soman) wrote :

Ran the network test for over 72 hours with the latest Ubuntu irqbalance package v 1.0.3 (quantal)and no tx unit hang was seen. Line rate was obtained on all interfaces.

Revision history for this message
Scott Moser (smoser) wrote :

Thanks vikrant.
  So it appears that the bug is only in 12.04 released version of irqbalance (mentioned fixed in 0.55+20091017-3ubuntu2 of oneiric and 1.0.3-1ubuntu2 of quantal).

Changed in irqbalance (Ubuntu Quantal):
status: New → Fix Released
importance: Undecided → Medium
Changed in irqbalance (Ubuntu Precise):
status: New → Confirmed
importance: Undecided → High
Changed in irqbalance (Ubuntu):
status: Confirmed → Fix Released
importance: Undecided → High
Revision history for this message
Samantha Jian-Pielak (samantha-jian) wrote :

@Scott,

What would be the next step to fix the irqbalance in 12.04?

Samantha

Revision history for this message
David Duffey (dduffey) wrote :

I'm assuming this bug is in the irqbalance, and not the kernel?

But have we tried one of the newer point releases (like 12.04.2 LTS) which is based on the Q HWE kernel?

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Hi,

In order to figure out what broke or fixed this, it would be great
if we could bisect. Unfortunately the irqbalance svn and git repos
don't seem to have any version numbers prior to 1.0.0, so it's a
bit tough to tell which versions we should be checking. Fortunately,
the package seems to build quite easily from git. Unfortunately
(again), I don't have any hardware I would be able to reproduce this
on. So I will place here the basic recipe to build a package from
a particular git version.

I would recommend trying the following to versions as baselines:
commit 0b917bd98c684878b2730f3a57e15d7c6149daff (Sep 24 16:26:44 2010)
commit d7ef8fca9edf24a543f88073a01edb476c295256 (Fri Dec 2 11:06:14 2011 -0500)

Hopefully the first will reproduce the bug, and the second not. If
that is the case, then you can proceed with a bisect between those
versions. (Please ask here if you need more instructions on that)
There are only 85 commits so that should take about 8 steps to
bisect.

You will need to grab a irqbalance git tree, and grab the packaging
tree.

 git clone https://code.google.com/p/irqbalance
 pull-lp-source irqbalance quantal

Now, clone a new git tree in which you'll do the build:

 git clone irqbalance build
 cd build

Go into the build directory, and checkout the version to test:

 git checkout -b test
 git reset --hard 0b917bd98c684878b2730f3a57e15d7c6149daff

Run autoconf:
 sh autogen.sh

Copy the packaging,

 cp -a ../irqbalance-1.0.3/debian .

This is optional, but you may want to edit the changelog to give yourself an
easily recognizable version. Run

 dch -i

which pulls the changelog up in your $EDITOR, with the first lines
looking like:

> irqbalance (1.0.3-1ubuntu3) UNRELEASED; urgency=low
>
> *
>
> -- Serge Hallyn <email address hidden> Tue, 24 Sep 2013 15:00:23 -0500

Edit the version number and release, and add a message, so that it looks
like:

> irqbalance (1.0.3-1ubuntu3bisect1) precise; urgency=low
>
> * Build from commit 0b917bd98c684878b2730f3a57e15d7c6149daff
>
> -- Serge Hallyn <email address hidden> Tue, 24 Sep 2013 15:00:23 -0500

Save that and exit the editor.

Finally build the package:

 debian/rules build && fakeroot debian/rules binary

Now you can install the packages and test:

 cd ..
 dpkg -i irqbalance_1.0.3-1ubuntu3bisect1_amd64.deb

Revision history for this message
Andreas Otto (andreas-otto) wrote :

Will a fixed version for Ubuntu 12.04 LTS be available?

Revision history for this message
Andreas Otto (andreas-otto) wrote :

Looks like that Ubuntu 10.04 LTS is also affected.

Revision history for this message
Steve Langasek (vorlon) wrote :

The Precise Pangolin has reached end of life, so this bug will not be fixed for that release

Changed in irqbalance (Ubuntu Precise):
status: Confirmed → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.