numad sched_setaffinity bug

Bug #1839071 reported by Ryo Hayashi on 2019-08-06
This bug affects 1 person
Affects: numad (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description of problem:
When the index of an entry in node (the node_data_p array inside numad.c) differs from the actual node ID, numad sets the wrong core affinity for the process.

For example, here are a few lines from /var/log/numad.log:
Wed Jul 24 14:22:46 2019: Nodes: 4
Min CPUs free: 888, Max CPUs: 958, Avg CPUs: 932, StdDev: 28.0624
Min MBs free: 0, Max MBs: 14045, Avg MBs: 6901, StdDev: 6903.86
Node 0: MBs_total 16098, MBs_free 14045, CPUs_total 960, CPUs_free 958, Distance: 16 16 10 16 CPUs: 8-15,40-47
Node 1: MBs_total 15991, MBs_free 13562, CPUs_total 960, CPUs_free 929, Distance: 10 16 16 16 CPUs: 0-7,32-39
Node 2: MBs_total 0, MBs_free 0, CPUs_total 960, CPUs_free 888, Distance: 16 16 16 10 CPUs: 24-31,56-63
Node 3: MBs_total 0, MBs_free 0, CPUs_total 960, CPUs_free 955, Distance: 16 10 16 16 CPUs: 16-23,48-55
Wed Jul 24 14:22:46 2019: Processes: 857
Wed Jul 24 14:22:46 2019: Candidates: 1
310038: PID 38334: (canneal), Threads 2, MBs_size 934, MBs_used 850, CPUs_used 99, Magnitude 84150, Nodes: 0,2
Wed Jul 24 14:22:46 2019: PICK NODES FOR: PID: 38334, CPUs 116, MBs 1000
Wed Jul 24 14:22:46 2019: PROCESS_MBs[0]: 847
Wed Jul 24 14:22:46 2019: PROCESS_MBs[2]: 3
Wed Jul 24 14:22:46 2019: Node[0]: mem: 66412 cpu: 1540
Wed Jul 24 14:22:46 2019: Node[1]: mem: 60248 cpu: 1509
Wed Jul 24 14:22:46 2019: Node[2]: mem: 0 cpu: 1540
Wed Jul 24 14:22:46 2019: Node[3]: mem: 0 cpu: 1535
Wed Jul 24 14:22:46 2019: Totmag[0]: 909142
Wed Jul 24 14:22:46 2019: Totmag[1]: 0
Wed Jul 24 14:22:46 2019: Totmag[2]: 1022744
Wed Jul 24 14:22:46 2019: Totmag[3]: 0
Wed Jul 24 14:22:46 2019: best_node_ix: 2
Wed Jul 24 14:22:46 2019: Node: 2 Dist: 10 Magnitude: 102274480
Wed Jul 24 14:22:46 2019: Node: 0 Dist: 16 Magnitude: 90914232
Wed Jul 24 14:22:46 2019: Node: 3 Dist: 16 Magnitude: 0
Wed Jul 24 14:22:46 2019: Node: 1 Dist: 16 Magnitude: 0
Wed Jul 24 14:22:46 2019: MBs: 1000, CPUs: 116
Wed Jul 24 14:22:46 2019: Assigning resources from node 0
Wed Jul 24 14:22:46 2019: Node[0]: mem: 56412 cpu: 844
Wed Jul 24 14:22:46 2019: Advising pid 38334 (canneal) move from nodes (0,2) to nodes (2)
Wed Jul 24 14:22:46 2019: Moving memory from node: 0 to node 2
Wed Jul 24 14:22:47 2019: PID 38334 moved to node(s) 2 in 0.91 seconds

According to the log, the core affinity of PID 38334 should be set to the cores of node 2. However, taskset -c -p 38334 showed that its affinity was set to the cores of node 3.
The Threadripper 2990WX on which I am running the application has an unusual core topology. Memory is attached only to nodes 0 and 2. Nodes 1 and 3 are not directly connected to memory, so their memory access is slower than that of nodes 0 and 2.
Because of this bug, numad actually hurts the performance of memory-intensive applications on my machine.

Version-Release number of selected component (if applicable):
Ubuntu 18.04.2 LTS
numad 0.5+20150602-5

How reproducible:
The bug can be reproduced manually by modifying two lines in numad.c so that the nodes are indexed in descending alphabetical order.

Steps to Reproduce:
1. Replace line 1213 with: int num_files = scandir ("/sys/devices/system/node", &namelist, node_and_digits, alphasort);
2. Reverse the for loop on line 1249 as follows: for (int node_ix = num_nodes-1; (node_ix >= 0); node_ix--) {
3. make clean && make install
4. Start numad as root: /usr/bin/numad -d
5. Run an application that triggers numad.
6. Compare the destination node in the log file with the actual core affinity of the process (e.g. taskset -cp <pid>).
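
The comparison in step 6 can be done by eye, but a small helper makes it less error-prone. The following is a minimal sketch, not part of numad: it assumes both CPU lists are given in the range format seen in the log and in taskset -c output (e.g. "8-15,40-47"), and that CPU numbers stay below 64. The function names parse_cpulist and same_cpus are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a CPU list such as "8-15,40-47" into a bitmask (CPU c -> bit c).
 * Returns 0 on success, -1 on malformed input or CPU numbers above 63
 * (the 64-CPU limit is an assumption of this sketch). */
static int parse_cpulist(const char *s, uint64_t *mask_out) {
    uint64_t mask = 0;
    while (*s != '\0') {
        char *end;
        long lo = strtol(s, &end, 10);
        if (end == s || lo < 0 || lo > 63) return -1;
        long hi = lo;
        if (*end == '-') {              /* a range "lo-hi" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo || hi > 63) return -1;
        }
        for (long c = lo; c <= hi; c++) mask |= UINT64_C(1) << c;
        if (*end == ',') end++;         /* another entry follows */
        else if (*end != '\0') return -1;
        s = end;
    }
    *mask_out = mask;
    return 0;
}

/* Returns 1 if both lists name the same CPUs, 0 if they differ,
 * and -1 if either list cannot be parsed. */
static int same_cpus(const char *a, const char *b) {
    uint64_t ma, mb;
    if (parse_cpulist(a, &ma) != 0 || parse_cpulist(b, &mb) != 0) return -1;
    return ma == mb;
}
```

Calling same_cpus with the "CPUs:" list of the destination node from the log and the list printed by taskset -cp <pid> returns 0 when the bug is present.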

Actual results:
The process's core affinity is bound to a node different from the destination node reported in the numad log on the line "Advising pid *** move from nodes (*) to nodes (*)".

Expected results:
The process's core affinity matches the destination node reported in the numad log on the line "Advising pid *** move from nodes (*) to nodes (*)".

Additional info:
This bug can be fixed by modifying numad.c as follows (from line 995):
+    int index[num_nodes];
+    for (int n = 0; (n < num_nodes); n++) {
+        index[node[n].node_id] = n;
+    }
     while (nodes) {
         if (ID_IS_IN_LIST(node_id, p->node_list_p)) {
-            OR_LISTS(cpu_bind_list_p, cpu_bind_list_p, node[node_id].cpu_list_p);
+            OR_LISTS(cpu_bind_list_p, cpu_bind_list_p, node[index[node_id]].cpu_list_p);
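
The idea behind the change can be shown in isolation. The sketch below is a simplified illustration, not numad's code: struct node_info, cpu_range, build_index_by_id, cpus_by_position and cpus_by_id are hypothetical names, CPU lists are reduced to a 64-bit mask, and MAX_NODE_ID is an assumed bound. It contrasts looking up a node's CPUs by array position (the current behaviour) with looking them up through a node ID to index table (the proposed behaviour).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODE_ID 64  /* assumed upper bound on node IDs for this sketch */

/* Simplified stand-in for numad's per-node record. */
struct node_info {
    int node_id;        /* kernel node ID, e.g. 2 for .../node/node2 */
    uint64_t cpu_mask;  /* bit c set => CPU c belongs to this node */
};

/* Mask with bits lo..hi (inclusive) set; requires 0 <= lo <= hi <= 63. */
static uint64_t cpu_range(int lo, int hi) {
    uint64_t width = (uint64_t)(hi - lo + 1);
    uint64_t bits = (width == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);
    return bits << lo;
}

/* Fill index_by_id (MAX_NODE_ID entries) so that index_by_id[id] is the
 * array position of the node with that ID, or -1 if no such node exists. */
static void build_index_by_id(const struct node_info *node, int num_nodes,
                              int *index_by_id) {
    for (int id = 0; id < MAX_NODE_ID; id++) index_by_id[id] = -1;
    for (int n = 0; n < num_nodes; n++) {
        assert(node[n].node_id >= 0 && node[n].node_id < MAX_NODE_ID);
        index_by_id[node[n].node_id] = n;
    }
}

/* Buggy lookup: treats the node ID as an array position. */
static uint64_t cpus_by_position(const struct node_info *node, int node_id) {
    return node[node_id].cpu_mask;
}

/* Fixed lookup: translates the node ID to an array position first.
 * Returns 0 (no CPUs) for an unknown node ID. */
static uint64_t cpus_by_id(const struct node_info *node,
                           const int *index_by_id, int node_id) {
    if (node_id < 0 || node_id >= MAX_NODE_ID) return 0;
    int ix = index_by_id[node_id];
    return (ix < 0) ? 0 : node[ix].cpu_mask;
}
```

With an array whose entries are stored in the order node 2, node 0, node 3, node 1 (an assumed ordering, chosen to resemble the log above), cpus_by_position(node, 2) returns the CPUs of node 3, while cpus_by_id(node, index_by_id, 2) returns the CPUs of node 2. Unlike the patch, the sketch sizes the table by a fixed bound rather than num_nodes, so that non-contiguous node IDs cannot index past the end of the table.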

P.S.:
If anyone knows the original development repository of numad, please let me know the URL. I would like to send a pull request myself.

Thanks for taking the time to report this bug and helping to make Ubuntu better. We appreciate the difficulties you are facing, but this appears to be a "regular" (non-security) bug. I have unmarked it as a security issue since this bug does not show evidence of allowing attackers to cross privilege boundaries nor directly cause loss of data/privacy. Please feel free to report any other bugs you may find.

information type: Private Security → Public
Ryo Hayashi (napppoli) wrote:

I did not look at that checkbox closely enough to notice that it was the security vulnerability checkbox rather than an "Accept our terms and policy" checkbox. My apologies for taking up your time.
