pcp installation is unreliable

Bug #1943184 reported by Clark Boylan
This bug affects 1 person
Affects    Status      Importance   Assigned to   Milestone
devstack   Won't Fix   Medium       Unassigned

Bug Description

pcp (Performance Co-Pilot) is being used to replace dstat, which is no longer maintained. Unfortunately, pcp package installations on Ubuntu are unreliable and often fail because starting pmlogger.service times out. Fedora has had a bug for this [0]; that bug notes that a fix was upstreamed, but the fix does not seem to be sufficient.

You can see the occurrence of these failures by querying logstash for:

  message:"pmlogger.service failed because a timeout was exceeded." AND filename:"job-output.txt"

Based on this logstash data, it seems to hit us several times a day. I noticed it because it has reset the gate queue for OpenStack projects twice in as many days.
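
For anyone without logstash access, the same check can be approximated against job logs that have been downloaded locally. The following is only a rough sketch: it assumes the logs sit somewhere under a directory of your choosing and are named job-output.txt, as in the query above. The function names count_failures and scan_logs are made up for illustration and are not part of devstack or any OpenStack tooling.

```python
import sys
from pathlib import Path

# The failure text quoted in the logstash query.
FAILURE_TEXT = "pmlogger.service failed because a timeout was exceeded."


def count_failures(lines):
    """Return how many of the given lines contain the pmlogger timeout message."""
    return sum(1 for line in lines if FAILURE_TEXT in line)


def scan_logs(root):
    """Map each job-output.txt found under root to its count of matching lines.

    Files with no matching lines are left out of the result.
    """
    hits = {}
    for path in Path(root).rglob("job-output.txt"):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with path.open(errors="replace") as handle:
            matches = count_failures(handle)
        if matches:
            hits[str(path)] = matches
    return hits


if __name__ == "__main__":
    # Assumption: the first argument is the directory holding downloaded logs.
    log_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, matches in sorted(scan_logs(log_root).items()):
        print(f"{matches}\t{name}")
```

Run it with the log directory as the only argument. It prints one line per affected job log, which gives a rough local equivalent of the logstash hit count.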

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1721223

Revision history for this message
Clark Boylan (cboylan) wrote :

Digging into this, I've discovered that collectl is another similar tool, which also has OpenStack support and might be useful for devstack. I'll propose a change that starts to use collectl as a stand-in for dstat (though note that the two are not directly compatible, so any switch should properly move things around).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/devstack/+/808134

Changed in devstack:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on devstack (master)

Change abandoned by "Clark Boylan <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/devstack/+/808134
Reason: Abandoned as there doesn't seem to be a desire to use this tool instead of pcp.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to devstack (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/devstack/+/822824

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to devstack (master)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/822824
Committed: https://opendev.org/openstack/devstack/commit/134205c1388ac69169698ff2fe36cba23044ff62
Submitter: "Zuul (22348)"
Branch: master

commit 134205c1388ac69169698ff2fe36cba23044ff62
Author: Dr. Jens Harbott <email address hidden>
Date: Thu Dec 23 12:26:36 2021 +0100

    Don't enable the dstat service in CI jobs

    We still are seeing regular job failures because the pcp package fails
    to install. Assume that we can still enable it on demand when someone
    needs to debug specific job issues, let us just disable it by default.

    Related-Bug: 1943184
    Signed-off-by: Dr. Jens Harbott <email address hidden>
    Change-Id: I32ef8038e21c818623db9389588b3c6d3f98dcad

Changed in devstack:
status: In Progress → Triaged
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to devstack (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/devstack/+/823755

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to devstack (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/devstack/+/823756

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to devstack (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/823755
Committed: https://opendev.org/openstack/devstack/commit/09c83cf07eac157c2481bf8e4836b87cd8c2f981
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 09c83cf07eac157c2481bf8e4836b87cd8c2f981
Author: Dr. Jens Harbott <email address hidden>
Date: Thu Dec 23 12:26:36 2021 +0100

    Don't enable the dstat service in CI jobs

    We still are seeing regular job failures because the pcp package fails
    to install. Assume that we can still enable it on demand when someone
    needs to debug specific job issues, let us just disable it by default.

    Related-Bug: 1943184
    Signed-off-by: Dr. Jens Harbott <email address hidden>
    Change-Id: I32ef8038e21c818623db9389588b3c6d3f98dcad
    (cherry picked from commit 134205c1388ac69169698ff2fe36cba23044ff62)

tags: added: in-stable-xena
tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to devstack (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/823756
Committed: https://opendev.org/openstack/devstack/commit/8b7ef9f4482d0d00e19b4a60c897019d951ee8e1
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 8b7ef9f4482d0d00e19b4a60c897019d951ee8e1
Author: Dr. Jens Harbott <email address hidden>
Date: Thu Dec 23 12:26:36 2021 +0100

    Don't enable the dstat service in CI jobs

    We still are seeing regular job failures because the pcp package fails
    to install. Assume that we can still enable it on demand when someone
    needs to debug specific job issues, let us just disable it by default.

    Related-Bug: 1943184
    Signed-off-by: Dr. Jens Harbott <email address hidden>
    Change-Id: I32ef8038e21c818623db9389588b3c6d3f98dcad
    (cherry picked from commit 134205c1388ac69169698ff2fe36cba23044ff62)

Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

It doesn't look like there's anything else we can do here unless the package gets better maintenance.

Changed in devstack:
status: Triaged → Won't Fix