I've upgraded to the latest development kernel, from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15-rc1/, and re-tested. The details of the problem have changed (they were never 100% consistent), but the problem definitely still exists. I'm attaching dmesg output from three runs:
* run1.txt -- In this run, the cpu_offlining script successfully shut
down all CPU nodes (except node 0, of course), but when bringing
them up again, the system segfaulted after bringing up several
nodes. Thereafter, any remotely substantive command (top or
shutdown, for instance) hung, although bash remained responsive
and I could take file listings with ls.
* run2.txt -- In this run, the cpu_offlining script segfaulted
when taking CPU nodes offline. The system then became unreliable
in the same way as with run 1.
* run3.txt -- In this run, the script seemed to complete successfully,
but the dmesg output includes errors associated with bringing up
several nodes. The system SEEMED TO operate normally thereafter,
but my testing was limited.
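
For readers without access to the cpu_offlining script, here is a rough sketch of the kind of offline/online cycle described in the runs above. It is not the actual script. It assumes the test toggles individual CPUs through the standard Linux sysfs files /sys/devices/system/cpu/cpuN/online, and that CPU 0 is skipped because it exposes no "online" file (typical on x86). The function names and the "root" parameter are made up for illustration.

```python
import os
import re

CPU_ROOT = "/sys/devices/system/cpu"  # standard sysfs location on Linux


def hotpluggable_cpus(root=CPU_ROOT):
    """Return the sorted CPU numbers that expose an 'online' control file."""
    cpus = []
    for name in os.listdir(root):
        match = re.fullmatch(r"cpu(\d+)", name)
        # Entries such as 'cpufreq' or 'cpuidle' do not match the pattern.
        # A CPU without an 'online' file (often cpu0) cannot be offlined.
        if match and os.path.exists(os.path.join(root, name, "online")):
            cpus.append(int(match.group(1)))
    return sorted(cpus)


def set_online(cpu, online, root=CPU_ROOT):
    """Write 1 (online) or 0 (offline) to the CPU's sysfs control file."""
    with open(os.path.join(root, f"cpu{cpu}", "online"), "w") as f:
        f.write("1" if online else "0")


def cycle(root=CPU_ROOT):
    """Take every hot-pluggable CPU offline, then bring each back online."""
    cpus = hotpluggable_cpus(root)
    for cpu in cpus:
        set_online(cpu, False, root)
    for cpu in cpus:
        set_online(cpu, True, root)
    return cpus
```

Calling cycle() as root against the real sysfs tree would run one offline/online pass; capturing dmesg afterwards gives output comparable to the attached run files. The "root" parameter exists only so the logic can be exercised against a scratch directory instead of the live system.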