The garbage collector becomes very slow as the number of threads grows. The problem is in scavenge_newspace_generation (in gencgc.c): it invokes gc_alloc_update_all_page_tables in a loop over sets of new areas, and gc_alloc_update_all_page_tables in turn iterates over all threads. The work therefore grows with the number of scavenging rounds multiplied by the number of threads.
There seems to be an easy fix: GC is single-threaded and all memory allocation during a collection happens in the GC thread, so it is sufficient to update the page tables of all threads once at the beginning of a collection and then update only the GC thread's page tables during the collection.
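A toy model may make the cost difference concrete. This is a sketch under stated assumptions, not code from gencgc.c. Every name in it (struct thread_model, update_thread_page_tables, update_all_page_tables, the two scavenge_* functions) is hypothetical, and thread index 0 stands in for the GC thread. It only counts per-thread page-table updates and checks the invariant the fix relies on: if only the GC thread allocates, then flushing that thread alone leaves every thread clean.

```c
#include <assert.h>

/* Hypothetical model, NOT SBCL's gencgc.c: each thread owns an allocation
   region whose page-table bookkeeping must be flushed by an update call. */
#define NTHREADS 64

struct thread_model {
    int region_dirty; /* nonzero if the region has unflushed allocations */
};

static long updates = 0; /* number of per-thread page-table updates */

static void update_thread_page_tables(struct thread_model *t) {
    updates++;
    t->region_dirty = 0;
}

/* Analogue of updating every thread's tables: O(number of threads). */
static void update_all_page_tables(struct thread_model *threads, int n) {
    for (int i = 0; i < n; i++)
        update_thread_page_tables(&threads[i]);
}

static int count_dirty(const struct thread_model *threads, int n) {
    int dirty = 0;
    for (int i = 0; i < n; i++)
        dirty += threads[i].region_dirty != 0;
    return dirty;
}

/* Current behaviour: every scavenging round walks all threads,
   costing rounds * n updates. */
static long scavenge_all_threads_each_round(struct thread_model *threads,
                                            int n, int rounds) {
    updates = 0;
    for (int r = 0; r < rounds; r++) {
        threads[0].region_dirty = 1; /* only the GC thread (index 0) allocates */
        update_all_page_tables(threads, n);
    }
    return updates;
}

/* Proposed: flush all threads once, then only the GC thread per round,
   costing n + rounds updates. */
static long scavenge_gc_thread_only(struct thread_model *threads,
                                    int n, int rounds) {
    updates = 0;
    update_all_page_tables(threads, n);
    for (int r = 0; r < rounds; r++) {
        threads[0].region_dirty = 1; /* only the GC thread (index 0) allocates */
        update_thread_page_tables(&threads[0]);
        /* Invariant: since no other thread allocates, nothing is left dirty. */
        assert(count_dirty(threads, n) == 0);
    }
    return updates;
}
```

With 64 threads and 10 scavenging rounds, the first variant performs 640 per-thread updates and the second performs 74.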