Add thread ID to THREAD structure on Linux

Bug #1751562 reported by Michał "phoe" Herda
Affects: SBCL
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

I was running SBCL on Linux with multiple worker threads, and one of them was taking 100% of my CPU. I wanted to retrieve the thread object of the offending thread, but this turned out to be non-trivial.

From htop, I knew the ID of the offending thread (htop lists it in its PID column), but there is currently no easy way to map an operating-system thread ID to the SBCL thread object.

On Linux, it is possible to issue the sys_gettid syscall to get the current thread ID. The syscall number for sys_gettid is 186 on x86-64. This, along with some help from smokeink on #lisp, allowed me to extract thread IDs and therefore map OS threads to SBCL thread objects.

```
* (sb-alien:alien-funcall
   (sb-alien:extern-alien "syscall" (function sb-alien:unsigned sb-alien:int)) 186)
3288 ;; this is the correct ID of the main SBCL thread (equal to the process PID)

* (bt:make-thread (lambda ()
                    (sb-alien:alien-funcall
                     (sb-alien:extern-alien "syscall" (function sb-alien:unsigned sb-alien:int)) 186)))
#<SB-THREAD:THREAD "Anonymous thread" FINISHED values: 3412 {100364B363}>

* (bt:join-thread *)
3412 ;; 3412 is the correct ID of the SBCL thread that was created to serve that request
```

However, to find out the thread ID, the syscall needs to be executed from inside the offending thread. I want to avoid interrupting the thread, because that is dangerous and might be impossible if the thread is stuck inside a WITHOUT-INTERRUPTS context.

Therefore, I propose adding a slot called THREAD-ID to the THREAD structure. The slot will be populated at thread start-up: before calling the user-provided function, the thread will make the aforementioned syscall and store the result in its THREAD-ID slot. The ID will be visible to users, who will be able to find their thread objects with a simple `(find 12345 (sb-thread:list-all-threads) :key #'thread-id)`, where THREAD-ID is the proposed accessor. This gives us a mapping from the thread ID (as reported by top, htop or any other reporting utility) to the thread object inside SBCL.
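Until such a slot exists, the same idea can be approximated in user code. The following is a minimal sketch, not the proposed SBCL change: it assumes Linux on x86-64 (gettid is syscall 186 there), and the names `current-tid`, `*tid-table*`, `make-tracked-thread` and `find-thread-by-tid` are hypothetical helpers made up for illustration. Each thread records its own ID at start-up, so no thread ever has to be interrupted to be identified.

```lisp
;; Assumption: Linux on x86-64, where gettid is syscall number 186.
(defun current-tid ()
  "Return the OS thread ID of the calling thread."
  (sb-alien:alien-funcall
   (sb-alien:extern-alien "syscall" (function sb-alien:long sb-alien:long))
   186))

;; Hypothetical registry mapping OS thread IDs to SBCL thread objects.
;; :SYNCHRONIZED is an SBCL extension that makes the table safe to
;; modify from several threads.
(defvar *tid-table* (make-hash-table :synchronized t))

(defun make-tracked-thread (function &key (name "tracked thread"))
  "Like SB-THREAD:MAKE-THREAD, but the new thread registers its own
thread ID before calling FUNCTION and unregisters it on exit."
  (sb-thread:make-thread
   (lambda ()
     (let ((tid (current-tid)))
       (setf (gethash tid *tid-table*) sb-thread:*current-thread*)
       (unwind-protect (funcall function)
         (remhash tid *tid-table*))))
   :name name))

(defun find-thread-by-tid (tid)
  "Return the tracked thread whose OS thread ID is TID, or NIL."
  (values (gethash tid *tid-table*)))
```

With this in place, `(find-thread-by-tid 12345)` returns the thread object for the ID seen in htop, but only for threads created through `make-tracked-thread`. A slot filled in by SBCL itself would cover every thread, which is the point of the proposal.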

The downside of this method is that it is non-portable: the snippet above relies on a Linux-specific syscall, so it will work on Linux only. Other operating systems will need to be supported via portability code, such as https://source.winehq.org/git/wine.git/blob?f=dlls/ntdll/server.c#l1338 - and I am still not sure how thread IDs work on Windows.
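One way to contain the platform dependence is to select the syscall number with reader conditionals and return NIL everywhere else. This is only a sketch under stated assumptions: the names `*gettid-syscall-number*` and `os-thread-id` are hypothetical, only Linux on x86-64 (186) and 32-bit x86 (224) are covered, and other platforms would each need their own branch.

```lisp
;; Hypothetical per-platform table of the gettid syscall number.
;; Only two Linux targets are filled in; everything else maps to NIL.
(defparameter *gettid-syscall-number*
  #+(and linux x86-64) 186
  #+(and linux x86) 224
  #-(and linux (or x86-64 x86)) nil)

(defun os-thread-id ()
  "Return the calling thread's OS thread ID, or NIL on platforms
this sketch does not know about."
  (when *gettid-syscall-number*
    (sb-alien:alien-funcall
     (sb-alien:extern-alien "syscall" (function sb-alien:long sb-alien:long))
     *gettid-syscall-number*)))
```

Returning NIL instead of signalling an error lets callers treat the thread ID as optional information, which seems appropriate for a debugging aid.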

Nonetheless, I am willing to make a pull request to SBCL that implements the above, as long as you see no severe issues in the above technique.

What do you think?

Michał "phoe" Herda (phoe-krk) wrote :

https://stackoverflow.com/questions/28948415/get-thread-id-in-sbcl/48972801#48972801 has some insight on how to make it work for Windows as well; in that case it retrieves the thread handle.

Tomas Hlavaty (q-tom-o) wrote :

you could find the thread by name
see https://bugs.launchpad.net/sbcl/+bug/1399727/comments/5

Tomas Hlavaty (q-tom-o) wrote :

this works:

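(defun gettid (thread)
  ;; Z receives the result; W signals that the interrupt has finished.
  (let (z
        (w (sb-thread:make-semaphore)))
    (sb-thread:interrupt-thread
     thread
     (lambda ()
       ;; Runs inside THREAD: the first field of
       ;; /proc/thread-self/stat is that thread's ID.
       (setq z
             (with-open-file (s "/proc/thread-self/stat")
               (read s)))
       (sb-thread:signal-semaphore w)))
    ;; Block the caller until the interrupted thread has stored its ID.
    (sb-thread:wait-on-semaphore w)
    z))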

(mapcar (lambda (x) (cons (gettid x) (sb-thread:thread-name x)))
        (sb-thread:list-all-threads))

or instead of reading from /proc, use syscall:

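;; replaces the WITH-OPEN-FILE form above
(sb-alien:with-alien ((fn (function sb-alien:int sb-alien:int)
                          :extern "syscall"))
  ;; 224 is gettid on 32-bit x86; if that call fails,
  ;; fall back to 186, which is gettid on x86-64.
  (let ((a (sb-alien:alien-funcall fn 224)))
    (if (minusp a)
        (sb-alien:alien-funcall fn 186)
        a)))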

this might be ok for development but not sure about production.
how dangerous is interrupt-thread used like this?

Stas Boukarev (stassats) changed in sbcl:
status: New → Fix Committed

Stas Boukarev (stassats) changed in sbcl:
status: Fix Committed → Fix Released