Comment 0 for bug 1744173

Change the flush from congruence-first with dependencies to linear
with no dependencies. This increases flush performance by 8x on P8
and 3x on P9 (as measured with a null syscall loop, which will have
the flush area in the L2).

The flush also becomes simpler and more adaptable to different cache
geometries.
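
As a rough illustration of what "linear with no dependencies" means,
here is a minimal C sketch. It is not the actual kernel code: the
function name linear_flush, the buffer, and the size and line_size
parameters are assumptions made for this example. It issues one load
per cache line in ascending address order, and no load address depends
on the result of an earlier load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Touch one byte in every cache line of a displacement buffer, walking
 * it in ascending address order. Each address is derived from the loop
 * counter alone, never from loaded data, so the loads are independent
 * of one another.
 *
 * Returns the number of loads issued.
 */
static size_t linear_flush(const volatile uint8_t *area, size_t size,
                           size_t line_size)
{
    size_t loads = 0;

    /* A zero line size would make the loop below spin forever. */
    assert(line_size != 0);

    for (size_t off = 0; off < size; off += line_size) {
        (void)area[off];    /* volatile read: the load is really issued */
        loads++;
    }

    return loads;
}
```

Under these assumptions, adapting to a different cache geometry only
means passing a different size and line_size; the loop itself does not
need to know about sets or ways.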