Comment 8 for bug 1234718

Kim Phillips (kim-phillips) wrote :

Note: I'm not seeing the udhcpc problem on my local board, but I do see it in the LAVA lab.

I traced the kernel at the point where qemu was hanging at 99% CPU, and found that get_user_pages_fast was being called:

##### CPU 0 buffer started ####
 qemu-system-arm-1743 [000] ....21. 99.008416: unpin_current_cpu <-migrate_enable
 qemu-system-arm-1743 [000] ....1.. 99.008417: handle_exit <-kvm_arch_vcpu_ioctl_run
 qemu-system-arm-1743 [000] ....1.. 99.008417: kvm_condition_valid <-handle_exit
 qemu-system-arm-1743 [000] ....1.. 99.008418: kvm_handle_guest_abort <-handle_exit
 qemu-system-arm-1743 [000] ....1.. 99.008418: __srcu_read_lock <-kvm_handle_guest_abort
 qemu-system-arm-1743 [000] ....1.. 99.008419: kvm_is_visible_gfn <-kvm_handle_guest_abort
 qemu-system-arm-1743 [000] ....1.. 99.008420: gfn_to_memslot <-kvm_handle_guest_abort
 qemu-system-arm-1743 [000] ....1.. 99.008421: gfn_to_hva <-kvm_handle_guest_abort
 qemu-system-arm-1743 [000] ....1.. 99.008421: __gfn_to_hva_many <-gfn_to_hva
 qemu-system-arm-1743 [000] ....1.. 99.008422: rt_down_read <-kvm_handle_guest_abort
 qemu-system-arm-1743 [000] ....1.. 99.008422: __rt_down_read <-kvm_handle_guest_abort
 qemu-system-arm-1743 [000] ....1.. 99.008423: rt_mutex_lock <-__rt_down_read
 qemu-system-arm-1743 [000] ....1.. 99.008424: rt_mutex_slowlock <-__rt_down_read
 qemu-system-arm-1743 [000] ....1.. 99.008424: _raw_spin_lock <-rt_mutex_slowlock
 qemu-system-arm-1743 [000] ....2.. 99.008425: __try_to_take_rt_mutex <-rt_mutex_slowlock
 qemu-system-arm-1743 [000] ....2.. 99.008426: _raw_spin_unlock <-rt_mutex_slowlock
 qemu-system-arm-1743 [000] ....1.. 99.008427: find_vma <-kvm_handle_guest_abort
 qemu-system-arm-1743 [000] ....1.. 99.008428: rt_up_read <-kvm_handle_guest_abort
 qemu-system-arm-1743 [000] ....1.. 99.008428: rt_mutex_unlock <-kvm_handle_guest_abort
 qemu-system-arm-1743 [000] ....1.. 99.008429: _raw_spin_lock <-rt_mutex_unlock
 qemu-system-arm-1743 [000] ....2.. 99.008430: _raw_spin_unlock <-kvm_handle_guest_abort
 qemu-system-arm-1743 [000] ....1.. 99.008431: gfn_to_pfn_prot <-kvm_handle_guest_abort
 qemu-system-arm-1743 [000] ....1.. 99.008431: __gfn_to_pfn <-gfn_to_pfn_prot
 qemu-system-arm-1743 [000] ....1.. 99.008432: __gfn_to_pfn_memslot <-gfn_to_pfn_prot
 qemu-system-arm-1743 [000] ....1.. 99.008432: __gfn_to_hva_many <-__gfn_to_pfn_memslot
 qemu-system-arm-1743 [000] ....1.. 99.008433: get_user_pages_fast <-__gfn_to_pfn_memslot
 qemu-system-arm-1743 [000] ....1.. 99.008434: rt_down_read <-get_user_pages_fast
 qemu-system-arm-1743 [000] ....1.. 99.008434: __rt_down_read <-get_user_pages_fast
 qemu-system-arm-1743 [000] ....1.. 99.008435: rt_mutex_lock <-__rt_down_read
 qemu-system-arm-1743 [000] ....1.. 99.008436: rt_mutex_slowlock <-__rt_down_read
 qemu-system-arm-1743 [000] ....1.. 99.008436: _raw_spin_lock <-rt_mutex_slowlock
 qemu-system-arm-1743 [000] ....2.. 99.008437: __try_to_take_rt_mutex <-rt_mutex_slowlock
 qemu-system-arm-1743 [000] ....2.. 99.008438: _raw_spin_unlock <-rt_mutex_slowlock
 qemu-system-arm-1743 [000] ....1.. 99.008439: get_user_pages <-get_user_pages_fast
 qemu-system-arm-1743 [000] ....1.. 99.008439: __get_user_pages <-get_user_pages_fast
 qemu-system-arm-1743 [000] ....1.. 99.008440: find_extend_vma <-__get_user_pages
 qemu-system-arm-1743 [000] ....1.. 99.008441: find_vma <-find_extend_vma
 qemu-system-arm-1743 [000] ....1.. 99.008441: _cond_resched <-__get_user_pages
 qemu-system-arm-1743 [000] ....1.. 99.008442: follow_page_mask <-__get_user_pages
 qemu-system-arm-1743 [000] ....1.. 99.008442: follow_huge_addr <-follow_page_mask
 qemu-system-arm-1743 [000] ....1.. 99.008443: pud_huge <-follow_page_mask
 qemu-system-arm-1743 [000] ....1.. 99.008444: pmd_huge <-follow_page_mask
 qemu-system-arm-1743 [000] ....1.. 99.008444: migrate_disable <-follow_page_mask
 qemu-system-arm-1743 [000] ....21. 99.008445: pin_current_cpu <-migrate_disable
 qemu-system-arm-1743 [000] ....111 99.008445: rt_spin_lock <-follow_page_mask
 qemu-system-arm-1743 [000] ....111 99.008446: rt_spin_lock_slowlock <-rt_spin_lock
 qemu-system-arm-1743 [000] ....111 99.008447: _raw_spin_lock <-rt_spin_lock_slowlock
 qemu-system-arm-1743 [000] ....211 99.008448: __try_to_take_rt_mutex <-rt_spin_lock_slowlock
 qemu-system-arm-1743 [000] ....211 99.008448: _raw_spin_unlock <-rt_spin_lock_slowlock
 qemu-system-arm-1743 [000] ....111 99.008449: follow_trans_huge_pmd <-follow_page_mask
 qemu-system-arm-1743 [000] ....111 99.008450: rt_spin_unlock <-follow_page_mask
 qemu-system-arm-1743 [000] ....111 99.008451: rt_spin_lock_slowunlock <-follow_page_mask
 qemu-system-arm-1743 [000] ....111 99.008451: _raw_spin_lock <-rt_spin_lock_slowunlock
 qemu-system-arm-1743 [000] ....211 99.008452: _raw_spin_unlock <-follow_page_mask
 qemu-system-arm-1743 [000] ....111 99.008453: migrate_enable <-follow_page_mask
 qemu-system-arm-1743 [000] ....21. 99.008454: unpin_current_cpu <-migrate_enable
 qemu-system-arm-1743 [000] ....1.. 99.008454: flush_dcache_page <-__get_user_pages
 qemu-system-arm-1743 [000] ....1.. 99.008455: page_mapping <-flush_dcache_page
 qemu-system-arm-1743 [000] ....1.. 99.008455: __flush_dcache_page <-flush_dcache_page
 qemu-system-arm-1743 [000] ....1.. 99.008456: kmap_atomic <-__flush_dcache_page
 qemu-system-arm-1743 [000] ....1.. 99.008457: pagefault_disable <-kmap_atomic
 qemu-system-arm-1743 [000] ....1.. 99.008457: migrate_disable <-pagefault_disable
 qemu-system-arm-1743 [000] ....21. 99.008458: pin_current_cpu <-migrate_disable
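
To check whether a longer capture keeps cycling through this same path, the trace lines can be tallied per caller/callee pair. The following is only a minimal sketch in Python, assuming the function-tracer line layout shown above; `TRACE_LINE` and `count_calls` are hypothetical helper names, not part of ftrace or any existing tool.

```python
import re
from collections import Counter

# Assumed layout: "<task>-<pid> [cpu] <flags> <timestamp>: <func> <-<parent>"
TRACE_LINE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<func>\S+)\s+<-(?P<parent>\S+)\s*$"
)


def count_calls(lines):
    """Count (parent, func) pairs; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = TRACE_LINE.match(line)
        if m:
            counts[(m.group("parent"), m.group("func"))] += 1
    return counts


# A few lines copied from the trace above, used as sample input.
sample = [
    "##### CPU 0 buffer started ####",
    " qemu-system-arm-1743 [000] ....1.. 99.008433: get_user_pages_fast <-__gfn_to_pfn_memslot",
    " qemu-system-arm-1743 [000] ....1.. 99.008434: rt_down_read <-get_user_pages_fast",
    " qemu-system-arm-1743 [000] ....1.. 99.008439: get_user_pages <-get_user_pages_fast",
]

counts = count_calls(sample)
for (parent, func), n in counts.most_common():
    print(f"{n:6d}  {parent} -> {func}")
```

On a full capture, pairs with very high counts (for example kvm_handle_guest_abort into gfn_to_pfn_prot) would suggest the guest abort is being retried rather than resolved.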

I recalled that Zi had posted a related patch: [RFC] ARM: lockless get_user_pages_fast() [1]. I tested it, and with it applied I was able to boot a guest OS.

So the LNG kernel *might* be able to use Zi's RFC as a workaround for now (I don't have a good sense of what the responses to the RFC mean in terms of stability).

Anyway, Zi has since added bug 1236949 to track the proper, upstream-friendly version of the lockless GUP implementation for ARM.

[1] https://groups.google.com/a/linaro.org/forum/#!topic/linaro-networking/qr1-D6EaTKw