StarlingX

Bug #1927772
Comment #3

OpenStack Infra (hudson-openstack) wrote on 2021-12-02: Fix merged to monitoring (master)

Reviewed: https://review.opendev.org/c/starlingx/monitoring/+/819498
Committed: https://opendev.org/starlingx/monitoring/commit/34c2ef786555d56e26d3c09b63d9ed464cdcd0ea
Submitter: "Zuul (22348)"
Branch: master

commit 34c2ef786555d56e26d3c09b63d9ed464cdcd0ea
Author: Jim Gauld <email address hidden>
Date: Fri Nov 26 15:44:51 2021 -0500

    Enhance schedtop with blocked_max, disk waiters, and watch commands

    The 'schedtop' monitoring tool is used for engineering
    analysis of process scheduling, disk IO, and latency.

    This enhances the schedtop monitoring tool with:
    - additional fields "bmax" latency and "D" disk-sleep tasks
    - command-line options to watch specific tasks and mechanism to
      trigger sysrq

    The following new fields are reported:
    - "bmax", in milliseconds, corresponds to the Linux scheduler
      statistic "blocked_max". It represents the maximum involuntary
      wait, covering scheduling delay and IO wait.
    - "D:<n>", the current number of disk-sleep "D" tasks.
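
As a rough illustration of where these two fields can come from, the sketch below parses text in the format of /proc/<pid>/stat and /proc/<pid>/sched. It is not schedtop's implementation. The function names are hypothetical, and it assumes the statistic appears in the sched text on a line whose key ends in "block_max" with a value in milliseconds, as on kernels with scheduler statistics enabled.

```python
def task_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line.

    The comm field is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' rather than on whitespace.
    """
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def count_disk_sleep(stat_lines):
    """Count tasks in uninterruptible disk sleep (state 'D')."""
    return sum(1 for line in stat_lines if task_state(line) == "D")


def block_max_ms(sched_text):
    """Return the block_max value from /proc/<pid>/sched text, or None.

    Assumption: the key ends in 'block_max' and the value is printed
    in milliseconds; the exact key prefix varies by kernel version.
    """
    for line in sched_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().endswith("block_max"):
            return float(value)
    return None
```

On a live system the inputs would be read from /proc/<tid>/stat and /proc/<tid>/sched for each task; the parsing is kept separate from file access so it can be checked against captured samples.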

    The following command-line options are added to watch specific
    processes and, optionally, to trigger a sysrq (i.e., force a
    crash dump) when the trigger delay threshold, in milliseconds,
    is reached:
    [--watch-cmd=tid1,cmd1,cmd2,...] [--watch-only] [--watch-quiet]
    [--trig-delay=time]

    The --watch-cmd option matches each entry, as a pattern, against
    the process name ('comm' field).

    The --watch-only option watches and displays only the subset of
    tasks discovered at tool startup. This dramatically reduces the
    tool cpu overhead.
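
The selection step can be pictured with the following sketch. The matching rules here are assumptions made for illustration, not taken from the schedtop source: numeric entries are treated as tids, and every other entry is treated as a regular expression searched within the comm field. The function name and the tasks mapping are hypothetical.

```python
import re


def select_watched(tasks, watch_cmd):
    """Return {tid: comm} for tasks matching a --watch-cmd style list.

    tasks:     mapping of tid (int) -> comm (str), sampled once at startup
    watch_cmd: comma-separated string such as "1234,jbd2,etcd"
    """
    tids, patterns = set(), []
    for entry in (e.strip() for e in watch_cmd.split(",")):
        if not entry:
            continue
        if entry.isdigit():
            tids.add(int(entry))           # assumed: digits mean a tid
        else:
            patterns.append(re.compile(entry))  # assumed: regex on comm
    return {
        tid: comm
        for tid, comm in tasks.items()
        if tid in tids or any(p.search(comm) for p in patterns)
    }
```

Because the watched set is computed once, later samples only need to visit those tids, which is consistent with the reduced overhead described for --watch-only.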

    The --watch-quiet option displays no sample output after tool
    startup; the only output occurs when the --trig-delay threshold
    is exceeded.

    The --trig-delay=time option will trigger a sysrq to force a crash
    dump when any watched process's "bmax" delay exceeds the trigger
    delay time, in milliseconds.
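
A minimal sketch of the threshold check follows. It assumes the crash is requested by writing the sysrq 'c' command to the kernel's sysrq trigger file; which sysrq command schedtop actually issues is not stated here. The function names are hypothetical, and the trigger path is a required argument so that the example cannot crash a machine by accident.

```python
def exceeded(bmax_ms_by_tid, trig_delay_ms):
    """Return the sorted tids whose bmax (ms) exceeds the trigger delay."""
    return sorted(t for t, b in bmax_ms_by_tid.items() if b > trig_delay_ms)


def maybe_trigger(bmax_ms_by_tid, trig_delay_ms, trigger_path):
    """Write the sysrq 'c' command if any watched task is over the limit.

    WARNING: passing /proc/sysrq-trigger as root crashes the system
    immediately (and produces a crash dump only if kdump is set up).
    Returns the list of offending tids, empty if nothing was written.
    """
    over = exceeded(bmax_ms_by_tid, trig_delay_ms)
    if over:
        with open(trigger_path, "w") as f:
            f.write("c")
    return over
```

For a dry run, point trigger_path at a scratch file and inspect the returned tids.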

    Example: collect 1 minute of data, monitor all tasks,
             reset scheduler hiwatermark statistics

    schedtop \
    --period=60 --reset-hwm

    Example: collect 1 minute of data, watch specific tasks

    schedtop \
    --period=60 --reset-hwm \
    --watch-cmd=jbd2,kube-apiserver,etcd,forward-journald,containerd \
    --watch-only

    Example: watch specific tasks and trigger sysrq when any of the
             watched commands exceeds a 10000 ms delay (10 seconds)

    schedtop \
    --period=36000 --reset-hwm \
    --watch-cmd=jbd2,kube-apiserver,etcd,forward-journald,containerd \
    --watch-only \
    --trig-delay=10000

    Testcases:
    PASS: Collect standard tool output, verified new bmax and D fields
    PASS: Verify --watch-cmd will detect the specified commands or tids
    PASS: Verify --watch-only will only display watched commands
    PASS: Verify --trig-delay will generate a sysrq
    PASS: Verify comm field is limited to 15 characters wide

    Closes-Bug: 1927772

    Signed-off-by: Jim Gauld <email address hidden>
    Change-Id: I5368aac66b24608f5eab366cd929be4c0d4a1f76
