page allocation failure on machines under heavy network load
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-ec2 (Ubuntu) | Invalid | Low | Unassigned |
Bug Description
Tuning option:
The error messages below indicate a failure to allocate a 128K contiguous chunk of memory (an order 5 allocation is 2^5 pages of 4K each); the valid function calls in the stack trace lead to the network stack. The main problem is that the virtual network devices allocate and free buffers faster than the VM subsystem can keep up with. Increasing the amount of memory the kernel tries to keep free for allocations (/proc/
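For reference, the size arithmetic behind the "order:N" value in these messages can be sketched as below. This is only an illustration, assuming 4K pages (the usual page size on i386); the helper name `order_to_bytes` is made up for this example and is not a kernel interface.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on i386


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of a buddy-allocator block of the given order."""
    # An order-N request asks for 2**N physically contiguous pages.
    return (1 << order) * page_size


if __name__ == "__main__":
    # "order:5" is the value reported in the failure message.
    print(order_to_bytes(5) // 1024, "kB")
```

The higher the order, the more contiguous free pages are needed, which is why such requests can fail even when plenty of memory is free in smaller fragments.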
---
Today this error occurred in many processes running on our web servers: apache2, glusterfsd, kswapd0, rsyslogd, swapper and zabbix_agentd.
Sep 27 07:39:19 web3 kernel: [2909159.608929] __ratelimit: 35 callbacks suppressed
Sep 27 07:39:19 web3 kernel: [2909159.608933] swapper: page allocation failure. order:5, mode:0x20
Sep 27 07:39:19 web3 kernel: [2909159.608937] Pid: 0, comm: swapper Not tainted 2.6.32-305-ec2 #9-Ubuntu
Sep 27 07:39:19 web3 kernel: [2909159.608939] Call Trace:
Sep 27 07:39:19 web3 kernel: [2909159.608948] [<c052840d>] ? printk+0x18/0x1b
Sep 27 07:39:19 web3 kernel: [2909159.608960] [<c019be6e>] __alloc_
Sep 27 07:39:19 web3 kernel: [2909159.608963] [<c019bfed>] __alloc_
Sep 27 07:39:19 web3 kernel: [2909159.608968] [<c01c4e65>] T.685+0xe5/0x3a0
Sep 27 07:39:19 web3 kernel: [2909159.608971] [<c01c531c>] cache_alloc_
Sep 27 07:39:19 web3 kernel: [2909159.608975] [<c01c54c6>] __kmalloc+
Sep 27 07:39:19 web3 kernel: [2909159.608979] [<c048d9e3>] pskb_expand_
Sep 27 07:39:19 web3 kernel: [2909159.608982] [<c048f50c>] __pskb_
Sep 27 07:39:19 web3 kernel: [2909159.608987] [<c04b66d1>] ? nf_iterate+
Sep 27 07:39:19 web3 kernel: [2909159.608990] [<c04bffd0>] ? dst_output+0x0/0x10
Sep 27 07:39:19 web3 kernel: [2909159.608993] [<c049a743>] dev_queue_
Sep 27 07:39:19 web3 kernel: [2909159.608996] [<c04b6829>] ? nf_hook_
Sep 27 07:39:19 web3 kernel: [2909159.609000] [<c04c27df>] ip_finish_
Sep 27 07:39:19 web3 kernel: [2909159.609003] [<c04c2a0f>] ip_output+0x9f/0xb0
Sep 27 07:39:19 web3 kernel: [2909159.609006] [<c04bffd0>] ? dst_output+0x0/0x10
Sep 27 07:39:19 web3 kernel: [2909159.609008] [<c04c1bc8>] ip_local_
Sep 27 07:39:19 web3 kernel: [2909159.609011] [<c04c22e7>] ip_queue_
Sep 27 07:39:19 web3 kernel: [2909159.609016] [<c012a0dc>] ? enqueue_
Sep 27 07:39:19 web3 kernel: [2909159.609022] [<c01400d7>] ? lock_timer_
Sep 27 07:39:19 web3 kernel: [2909159.609027] [<c04d4935>] tcp_transmit_
Sep 27 07:39:19 web3 kernel: [2909159.609030] [<c04d252f>] ? tcp_clean_
Sep 27 07:39:19 web3 kernel: [2909159.609033] [<c04d6d47>] tcp_write_
Sep 27 07:39:19 web3 kernel: [2909159.609037] [<c04d7421>] __tcp_push_
Sep 27 07:39:19 web3 kernel: [2909159.609040] [<c04d0166>] tcp_data_
Sep 27 07:39:19 web3 kernel: [2909159.609043] [<c04d354b>] tcp_rcv_
Sep 27 07:39:19 web3 kernel: [2909159.609046] [<c04da44d>] tcp_v4_
Sep 27 07:39:19 web3 kernel: [2909159.609049] [<c04db1fe>] tcp_v4_
Sep 27 07:39:19 web3 kernel: [2909159.609051] [<c04b66d1>] ? nf_iterate+
Sep 27 07:39:19 web3 kernel: [2909159.609054] [<c04b6829>] ? nf_hook_
Sep 27 07:39:19 web3 kernel: [2909159.609058] [<c04bda87>] ip_local_
Sep 27 07:39:19 web3 kernel: [2909159.609061] [<c04bdcd7>] ip_local_
Sep 27 07:39:19 web3 kernel: [2909159.609064] [<c04bd9e0>] ? ip_local_
Sep 27 07:39:19 web3 kernel: [2909159.609067] [<c04bd33b>] ip_rcv_
Sep 27 07:39:19 web3 kernel: [2909159.609071] [<c050ddb8>] ? packet_
Sep 27 07:39:19 web3 kernel: [2909159.609074] [<c04bd7ee>] ip_rcv+0x21e/0x2e0
Sep 27 07:39:19 web3 kernel: [2909159.609077] [<c050ddb8>] ? packet_
Sep 27 07:39:19 web3 kernel: [2909159.609080] [<c049901f>] netif_receive_
Sep 27 07:39:19 web3 kernel: [2909159.609084] [<c0453fae>] netif_poll+
Sep 27 07:39:19 web3 kernel: [2909159.609088] [<c0499a5a>] net_rx_
Sep 27 07:39:19 web3 kernel: [2909159.609091] [<c04d82a0>] ? tcp_write_
Sep 27 07:39:19 web3 kernel: [2909159.609095] [<c01387fb>] __do_softirq+
Sep 27 07:39:19 web3 kernel: [2909159.609099] [<c01775f7>] ? handle_
Sep 27 07:39:19 web3 kernel: [2909159.609102] [<c01775c4>] ? handle_
Sep 27 07:39:19 web3 kernel: [2909159.609105] [<c01389b5>] do_softirq+
Sep 27 07:39:19 web3 kernel: [2909159.609108] [<c0138acd>] irq_exit+0x2d/0x40
Sep 27 07:39:19 web3 kernel: [2909159.609111] [<c0443f55>] evtchn_
Sep 27 07:39:19 web3 kernel: [2909159.609114] [<c0136ec8>] ? ns_to_timespec+
Sep 27 07:39:19 web3 kernel: [2909159.609118] [<c0104a06>] hypervisor_
Sep 27 07:39:19 web3 kernel: [2909159.609121] [<c0106735>] ? xen_safe_
Sep 27 07:39:19 web3 kernel: [2909159.609125] [<c0109e09>] xen_idle+0x29/0x80
Sep 27 07:39:19 web3 kernel: [2909159.609127] [<c01034af>] cpu_idle+0x8f/0xc0
Sep 27 07:39:19 web3 kernel: [2909159.609131] [<c051c793>] rest_init+0x53/0x60
Sep 27 07:39:19 web3 kernel: [2909159.609136] [<c06b5c28>] start_kernel+
Sep 27 07:39:19 web3 kernel: [2909159.609139] [<c06b56d9>] ? unknown_
Sep 27 07:39:19 web3 kernel: [2909159.609144] [<c06b5067>] i386_start_
Sep 27 07:39:19 web3 kernel: [2909159.609150] Mem-Info:
Sep 27 07:39:19 web3 kernel: [2909159.609152] DMA per-cpu:
Sep 27 07:39:19 web3 kernel: [2909159.609154] CPU 0: hi: 0, btch: 1 usd: 0
Sep 27 07:39:19 web3 kernel: [2909159.609156] CPU 1: hi: 0, btch: 1 usd: 0
Sep 27 07:39:19 web3 kernel: [2909159.609157] Normal per-cpu:
Sep 27 07:39:19 web3 kernel: [2909159.609159] CPU 0: hi: 155, btch: 38 usd: 34
Sep 27 07:39:19 web3 kernel: [2909159.609161] CPU 1: hi: 155, btch: 38 usd: 125
Sep 27 07:39:19 web3 kernel: [2909159.609162] HighMem per-cpu:
Sep 27 07:39:19 web3 kernel: [2909159.609164] CPU 0: hi: 155, btch: 38 usd: 27
Sep 27 07:39:19 web3 kernel: [2909159.609166] CPU 1: hi: 155, btch: 38 usd: 11
Sep 27 07:39:19 web3 kernel: [2909159.609170] active_anon:130673 inactive_anon:43799 isolated_anon:48
Sep 27 07:39:19 web3 kernel: [2909159.609171] active_file:69994 inactive_
Sep 27 07:39:19 web3 kernel: [2909159.609172] unevictable:8 dirty:36 writeback:0 unstable:0
Sep 27 07:39:19 web3 kernel: [2909159.609173] free:46836 slab_reclaimabl
Sep 27 07:39:19 web3 kernel: [2909159.609174] mapped:12605 shmem:7770 pagetables:0 bounce:0
Sep 27 07:39:19 web3 kernel: [2909159.609180] DMA free:2880kB min:76kB low:92kB high:112kB active_anon:0kB inactive_anon:0kB active_file:5064kB inactive_file:112kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16256kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimabl
Sep 27 07:39:19 web3 kernel: [2909159.609184] lowmem_reserve[]: 0 696 1710 1710
Sep 27 07:39:19 web3 kernel: [2909159.609191] Normal free:144196kB min:3336kB low:4168kB high:5004kB active_anon:45808kB inactive_
Sep 27 07:39:19 web3 kernel: [2909159.609196] lowmem_reserve[]: 0 0 8111 8111
Sep 27 07:39:19 web3 kernel: [2909159.609203] HighMem free:40268kB min:512kB low:1724kB high:2940kB active_
Sep 27 07:39:19 web3 kernel: [2909159.609207] lowmem_reserve[]: 0 0 0 0
Sep 27 07:39:19 web3 kernel: [2909159.609211] DMA: 458*4kB 1*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2880kB
Sep 27 07:39:19 web3 kernel: [2909159.609219] Normal: 16083*4kB 5465*8kB 1009*16kB 457*32kB 82*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 144196kB
Sep 27 07:39:19 web3 kernel: [2909159.609227] HighMem: 211*4kB 88*8kB 1554*16kB 407*32kB 7*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 40268kB
Sep 27 07:39:19 web3 kernel: [2909159.609236] 194330 total pagecache pages
Sep 27 07:39:19 web3 kernel: [2909159.609238] 2520 pages in swap cache
Sep 27 07:39:19 web3 kernel: [2909159.609240] Swap cache stats: add 2222885, delete 2220365, find 14893866/14933489
Sep 27 07:39:19 web3 kernel: [2909159.609242] Free swap = 856824kB
Sep 27 07:39:19 web3 kernel: [2909159.609243] Total swap = 917496kB
Sep 27 07:39:19 web3 kernel: [2909159.615308] 447488 pages RAM
Sep 27 07:39:19 web3 kernel: [2909159.615311] 263682 pages HighMem
Sep 27 07:39:19 web3 kernel: [2909159.615312] 10446 pages reserved
Sep 27 07:39:19 web3 kernel: [2909159.615313] 244645 pages shared
Sep 27 07:39:19 web3 kernel: [2909159.615315] 279978 pages non-shared
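To make the per-zone free lists in the dump above easier to read, here is a rough sketch that parses the three "DMA:", "Normal:" and "HighMem:" lines and counts how many free blocks are at least 128kB (the size of an order 5 request with 4kB pages). The function names `parse_free_list` and `blocks_at_least` are hypothetical, and the lines are pasted in with the syslog prefix stripped by hand.

```python
import re

# Free-list lines copied from the Mem-Info dump (syslog prefix removed).
BUDDY_LINES = [
    "DMA: 458*4kB 1*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2880kB",
    "Normal: 16083*4kB 5465*8kB 1009*16kB 457*32kB 82*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 144196kB",
    "HighMem: 211*4kB 88*8kB 1554*16kB 407*32kB 7*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 40268kB",
]


def parse_free_list(line):
    """Return (zone, {block_size_kb: free_block_count}) for one line."""
    zone, rest = line.split(":", 1)
    pairs = re.findall(r"(\d+)\*(\d+)kB", rest)
    return zone.strip(), {int(size): int(count) for count, size in pairs}


def blocks_at_least(blocks, min_kb):
    """Count free blocks whose size is at least min_kb."""
    return sum(count for size, count in blocks.items() if size >= min_kb)


if __name__ == "__main__":
    for line in BUDDY_LINES:
        zone, blocks = parse_free_list(line)
        print(zone, blocks_at_least(blocks, 128))
```

Very few blocks of that size are free in any zone, even though the zones report a lot of free memory overall, which fits a fragmentation problem rather than the machine being out of memory.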
I can't tell for certain if these errors cause any problem, but we are seeing performance issues on these machines and I want to clear this up before digging too much further.
Similar errors have been reported online in connection with network drivers (iwlan/madwifi) and with connection tracking; judging by my backtraces, neither should be relevant here.
This has also been reported in relation to EC2: http://
According to this thread, similar issues have been fixed in 2.6.35: http://
Ubuntu 10.04
AMI: ami-a94d67dd
Manifest: 099720109477/
AKI: aki-c34d67b7
This particular instance is a c1.medium.
crb@web3:~$ uname -a
Linux web3 2.6.32-305-ec2 #9-Ubuntu SMP Thu Apr 15 04:14:01 UTC 2010 i686 GNU/Linux
summary:
- page allocaiton failure on machines under heavy network load
+ page allocation failure on machines under heavy network load
tags: added: ec2-images
Changed in linux-ec2 (Ubuntu):
importance: Undecided → Low
status: New → Triaged
@Craig, are you able to reproduce this easily?
I've also copied John, a kernel engineer who handles Ubuntu kernels on EC2.