package pcp 4.3.4-1build1 failed to install/upgrade: installed pcp package post-installation script subprocess returned error exit status 1

Bug #1850281 reported by Robin T. Bye
38
This bug affects 8 people
Affects Status Importance Assigned to Milestone
pcp (Fedora)
Fix Released
Undecided
pcp (Ubuntu)
Confirmed
Undecided
Unassigned
Eoan
Confirmed
Undecided
Unassigned
Focal
Confirmed
Undecided
Unassigned

Bug Description

Tried to install pcp package but got error.

ProblemType: Package
DistroRelease: Ubuntu 19.10
Package: pcp 4.3.4-1build1
ProcVersionSignature: Ubuntu 5.3.0-20.21-generic 5.3.7
Uname: Linux 5.3.0-20-generic x86_64
ApportVersion: 2.20.11-0ubuntu8
Architecture: amd64
Date: Tue Oct 29 11:04:58 2019
ErrorMessage: installed pcp package post-installation script subprocess returned error exit status 1
InstallationDate: Installed on 2018-10-15 (378 days ago)
InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
Python3Details: /usr/bin/python3.7, Python 3.7.5, python3-minimal, 3.7.5-1
PythonDetails: /usr/bin/python2.7, Python 2.7.17rc1, python-minimal, 2.7.17-1
RelatedPackageVersions:
 dpkg 1.19.7ubuntu2
 apt 1.9.4
SourcePackage: pcp
Title: package pcp 4.3.4-1build1 failed to install/upgrade: installed pcp package post-installation script subprocess returned error exit status 1
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
In , paul.destefano-redhat2 (paul.destefano-redhat2-redhat-bugs) wrote :

Description of problem:
Everyday, I get these messages:

Jun 17 00:10:13 <host> systemd[1]: pmlogger.service: New main PID 4059499 does not
belong to service, and PID file is not owned by root. Refusing.
Jun 17 00:10:13 <host> systemd[1]: pmlogger.service: New main PID 4059499 does not
belong to service, and PID file is not owned by root. Refusing.
Jun 17 00:11:38 <host> systemd[1]: pmlogger.service: Start operation timed out.
Terminating.
Jun 17 00:11:38 <host> systemd[1]: pmlogger.service: Failed with result 'timeout'.
Jun 17 00:11:38 <host> systemd[1]: Failed to start Performance Metrics Archive Logger.
Jun 17 00:11:46 <host> systemd[1]: pmlogger.service: Supervising process 4066656 which
is not our child. We'll most likely not notice when it exits.

How can I fix the service so that it doesn't give errors every day?

Version-Release number of selected component (if applicable):
pcp-4.3.2-1.fc30.x86_64

How reproducible:
Happens every day.

Steps to Reproduce:
1. No action is required to produce this error. I guess you need to have PCP services enabled and running, though.

Actual results:
errors and warnings

Expected results:
no errors or warnings

Additional info:
PCP data is being collected. I don't really know if there is anything wrong with the data, but I assume it is fine.

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :

I can report the same issue, it was happening when my old box was running Fedora 29, then when upgraded to Fedora 30 it continued. Now with my brand new Fedora 30 machine the issue continues.

SELinux is not reporting any blocks, and then I had this disabled with F29 so it wasn't that.

What it does effect is the server is unable to store any more than a day of performance data. Same version of pcp.

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

This issue seems to be similar to BZ #1583159 - "Sample service file vncserver@.service is out of date with systemd changes", which was caused by an update to systemd in response to an upstream issue stating that PID run files should be owned by root, not an unprivileged user (such as 'pcp' in our case). The upstream systemd RFE requesting this is https://github.com/systemd/systemd/issues/6632 and the change that introduced the PID ownership check is https://github.com/systemd/systemd/commit/db256aab13d8a89d583ecd2bacf0aca87c66effc. The change subsequently resulted in a spate of issues with various service daemons unable to start correctly because their PID file was not owned by root (e.g. xinetd and other services, see for example https://github.com/systemd/systemd/issues/8085). As a result, systemd was subsequently updated to not follow symlinks, which apparently circumvented the original issue.

The end result is that /run/PID files should be owned by root and symlinks should not be used. An example of an updated service file to comply with the new systemd is BZ #1583159. So I guess for PCP we're going to need to update the service unit files for all our services to be compatible with the newer versions of systemd. Exactly what changes to make is yet to be determined.

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

(In reply to Fernando Viñan-Cano from comment #1)
> I can report the same issue, it was happening when my old box was running
> Fedora 29, then when upgraded to Fedora 30 it continued. Now with my brand
> new Fedora 30 machine the issue continues.
>
> SELinux is not reporting any blocks, and then I had this disabled with F29
> so it wasn't that.
>
> What it does effect is the server is unable to store any more than a day of
> performance data. Same version of pcp.

Fernando, can you provide any more details about your system not being able to store more than one day of data? e.g. compressed tarball of your logs below /var/log/pcp/pmlogger and journalctl -b -u pmlogger. If you don't want to attach them to this BZ, could you mail them to me? This might be a separate issue not related to the systemd service issue reported in this BZ because it looks like the pmlogger service is actually being started on subsequent attempts by the PCP scripts, despite all the systemd complaints about the PID files etc.

Thanks

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :

(In reply to Mark Goodwin from comment #3)
> (In reply to Fernando Viñan-Cano from comment #1)
> > I can report the same issue, it was happening when my old box was running
> > Fedora 29, then when upgraded to Fedora 30 it continued. Now with my brand
> > new Fedora 30 machine the issue continues.
> >
> > SELinux is not reporting any blocks, and then I had this disabled with F29
> > so it wasn't that.
> >
> > What it does effect is the server is unable to store any more than a day of
> > performance data. Same version of pcp.
>
> Fernando, can you provide any more details about your system not being able
> to store more than one day of data? e.g. compressed tarball of your logs
> below /var/log/pcp/pmlogger and journalctl -b -u pmlogger. If you don't want
> to attach them to this BZ, could you mail them to me? This might be a
> separate issue not related to the systemd service issue reported in this BZ
> because it looks like the pmlogger service is actually being started on
> subsequent attempts by the PCP scripts, despite all the systemd complaints
> about the PID files etc.
>
> Thanks

Here's the journalctl output, I will email the log folder separately:

-- Logs begin at Fri 2019-07-05 11:27:51 CEST, end at Mon 2019-07-15 15:12:12 CEST. --
Jul 15 13:10:51 MAGGIE.star-one.co.uk systemd[1]: Starting Performance Metrics Archive Logger...
Jul 15 13:10:52 MAGGIE.star-one.co.uk pmlogger[1304]: Starting pmlogger ...
Jul 15 13:10:52 MAGGIE.star-one.co.uk systemd[1]: pmlogger.service: Can't open PID file /run/pcp/pmlogger.pid (yet?) after start: No such file or directory
Jul 15 13:11:08 MAGGIE.star-one.co.uk systemd[1]: pmlogger.service: Supervising process 8652 which is not our child. We'll most likely not notice when it exits.
Jul 15 13:11:08 MAGGIE.star-one.co.uk systemd[1]: Started Performance Metrics Archive Logger.

Revision history for this message
In , updates (updates-redhat-bugs) wrote :

FEDORA-2019-97183bed56 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-97183bed56

Revision history for this message
In , updates (updates-redhat-bugs) wrote :

FEDORA-2019-44b383ec91 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-44b383ec91

Revision history for this message
In , updates (updates-redhat-bugs) wrote :

pcp-4.3.4-1.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-97183bed56

Revision history for this message
In , updates (updates-redhat-bugs) wrote :

pcp-4.3.4-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-44b383ec91

Revision history for this message
In , updates (updates-redhat-bugs) wrote :

pcp-4.3.4-1.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Revision history for this message
In , updates (updates-redhat-bugs) wrote :

pcp-4.3.4-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

Marking this as a DUP of BZ#1737091 since it's essentially the same issue and should now be fixedupstream in pcp-4.3.4-1 and later.

*** This bug has been marked as a duplicate of bug 1737091 ***

Revision history for this message
In , bjwyman (bjwyman-redhat-bugs) wrote :
Download full text (13.7 KiB)

I am assuming the even newer version that got installed from updates-testing should also carry the fix. It does not appear to fix the problem.

[<username>@<host> ~]$ sudo dnf update --enablerepo=updates-testing pcp
...
Last metadata expiration check: 0:00:37 ago on Thu 24 Oct 2019 12:46:43 PM CDT.
Dependencies resolved.
====================================================================================================================================
 Package Architecture Version Repository Size
====================================================================================================================================
Upgrading:
 pcp x86_64 5.0.0-1.fc30 updates-testing 1.1 M
 pcp-conf x86_64 5.0.0-1.fc30 updates-testing 30 k
 pcp-libs x86_64 5.0.0-1.fc30 updates-testing 506 k
 pcp-selinux x86_64 5.0.0-1.fc30 updates-testing 26 k
 pcp-system-tools x86_64 5.0.0-1.fc30 updates-testing 167 k
 python3-pcp x86_64 5.0.0-1.fc30 updates-testing 142 k

Transaction Summary
====================================================================================================================================
Upgrade 6 Packages

Total download size: 2.0 M
Is this ok [y/N]: y
Downloading Packages:
(1/6): pcp-conf-5.0.0-1.fc30.x86_64.rpm 73 kB/s | 30 kB 00:00
(2/6): pcp-selinux-5.0.0-1.fc30.x86_64.rpm 239 kB/s | 26 kB 00:00
(3/6): pcp-5.0.0-1.fc30.x86_64.rpm 1.7 MB/s | 1.1 MB 00:00
(4/6): pcp-system-tools-5.0.0-1.fc30.x86_64.rpm 1.2 MB/s | 167 kB 00:00
(5/6): python3-pcp-5.0.0-1.fc30.x86_64.rpm 1.9 MB/s | 142 kB 00:00
(6/6): pcp-libs-5.0.0-1.fc30.x86_64.rpm 337 kB/s | 506 kB 00:01
------------------------------------------------------------------------------------------------------------------------------------
Total 1.0 MB/s | 2.0 MB 00:01
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing : 1/1
  Running scriptlet: pcp-selinux-5.0.0-1.fc30.x86_64 ...

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :

I've given up waiting for this fix, plus I'm not sure how an older reported bug can be called a duplicate of a newer one surely it's the other way around, not that it really matters, plus isn't fixed in 4.3.4-1 as stated.

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

(In reply to Brandon J. Wyman from comment #12)
> I am assuming the even newer version that got installed from updates-testing
> should also carry the fix. It does not appear to fix the problem.
>
> [<username>@<host> ~]$ sudo dnf update --enablerepo=updates-testing pcp
> ...
> Oct 24 12:53:46 <host> pmlogger[19176]: Starting pmlogger ...
> Oct 24 12:53:46 <host> systemd[1]: pmlogger.service: Can't open PID file
> /run/pcp/pmlogger.pid (yet?) after start: No such file or directory
...

Brandon, the more serious bug involving incorrect systemd dependencies for the pmlogger service unit is fixed in the pcp-5.0.0 version that is in updates-testing. The less serious (and believed to be benign) issue of systemd complaining about missing pid files and so forth is not fixed, as is explained in https://bugzilla.redhat.com/show_bug.cgi?id=1761962#c2

The systemd complaints also occur for other (non-pcp) services that use type=forking, not just pcp / pmlogger. We will be changing the pcp pmlogger.service config over to type=notify to silence those systemd messages in the near future. I'm currently travelling and unable to commit time until next week at the earliest.

In the mean-time - unless your pmlogger service is actually failing to start, the issues are benign and just systemd being verbose.

Regards

Revision history for this message
Robin T. Bye (robin-bye) wrote :
Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :

My pmlogger service is constantly failing to start now, so it's worse than before:

pcp.x86_64 5.0.0-1.fc30

pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
PRIORITY 3
SYSLOG_FACILITY 3
SYSLOG_IDENTIFIER pcp
SYSLOG_TIMESTAMP Oct 28 11:30:53
_BOOT_ID f1022eb69e6b460fabe067d7fe080578
_CAP_EFFECTIVE 0
_COMM logger
_GID 985
_HOSTNAME MAGGIE
_MACHINE_ID ab73b29407d94f1187cf13f29453ffc3
_PID 2695
_SELINUX_CONTEXT system_u:system_r:unconfined_service_t:s0
_SOURCE_REALTIME_TIMESTAMP 1572258653005928
_SYSTEMD_CGROUP /system.slice/pmlogger_check.service
_SYSTEMD_INVOCATION_ID 5cc774ad29fb44c494c6668f189d5e16
_SYSTEMD_SLICE system.slice
_SYSTEMD_UNIT pmlogger_check.service
_TRANSPORT syslog
_UID 985
__CURSOR s=08bcabd5ee904ef2b1e9e69dd6740a20;i=51d;b=f1022eb69e6b460fabe067d7fe080578;m=3640fd6;t=595f5fe81ea96;x=9e47b1cf04c52206
__MONOTONIC_TIMESTAMP 56889302
__REALTIME_TIMESTAMP 1572258653006486

pmlogger_check failed - see /var/log/pcp/pmlogger/pmlogger_check.log
PRIORITY 3
SYSLOG_FACILITY 3
SYSLOG_IDENTIFIER pcp
SYSLOG_TIMESTAMP Oct 28 11:30:53
_BOOT_ID f1022eb69e6b460fabe067d7fe080578
_GID 985
_HOSTNAME MAGGIE
_MACHINE_ID ab73b29407d94f1187cf13f29453ffc3
_PID 2711
_SELINUX_CONTEXT system_u:system_r:unconfined_service_t:s0
_SOURCE_REALTIME_TIMESTAMP 1572258653018970
_TRANSPORT syslog
_UID 985
__CURSOR s=08bcabd5ee904ef2b1e9e69dd6740a20;i=51e;b=f1022eb69e6b460fabe067d7fe080578;m=3644208;t=595f5fe821cc8;x=a5f9ecce071f0799
__MONOTONIC_TIMESTAMP 56902152
__REALTIME_TIMESTAMP 1572258653019336

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

(In reply to Fernando Viñan-Cano from comment #15)
> My pmlogger service is constantly failing to start now, so it's worse than
> before:
>
> pcp.x86_64
> 5.0.0-1.fc30
>
> pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
...
>
> pmlogger_check failed - see /var/log/pcp/pmlogger/pmlogger_check.log
...

Fernando, what version of systemd, which distro/platform? Please also attach the above two logs and /var/log/pcp/pmlogger/*/pmlogger.log and also "journalctl -xe -u pmlogger" and "systemctl status pmlogger.service". If you'd rather email them to me directly rather than attach to a public BZ, then please do.

Does it only fail during boot? Or even after a systemctl restart pmlogger.service?

Thanks

Revision history for this message
In , myllynen (myllynen-redhat-bugs) wrote :

FWIW, I probably saw something similar on Fedora 31 but after completely reinstalling all pcp packages I can't reproduce the issue any more.

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :
Download full text (4.2 KiB)

(In reply to Mark Goodwin from comment #16)
> (In reply to Fernando Viñan-Cano from comment #15)
> > My pmlogger service is constantly failing to start now, so it's worse than
> > before:
> >
> > pcp.x86_64
> > 5.0.0-1.fc30
> >
> > pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
> ...
> >
> > pmlogger_check failed - see /var/log/pcp/pmlogger/pmlogger_check.log
> ...
>
> Fernando, what version of systemd, which distro/platform? Please also attach
> the above two logs and /var/log/pcp/pmlogger/*/pmlogger.log and also
> "journalctl -xe -u pmlogger" and "systemctl status pmlogger.service". If
> you'd rather email them to me directly rather than attach to a public BZ,
> then please do.
>
> Does it only fail during boot? Or even after a systemctl restart
> pmlogger.service?
>
> Thanks

It's Fedora 30, it fails after a short while if I try to restart it manually, and I'd be happy to reinstall PCP if it could help, but I don't know the best way to do that

[root@MAGGIE ~]# cat /var/log/pcp/pmlogger/pmlogger_daily-K.log

=== compressing PCP archives for host local: ===

pmlogger_daily: [control.d/local:26]
Warning: removing lock file older than 30 minutes
----------. 1 pcp pcp 0 Oct 31 07:54 /var/log/pcp/pmlogger/MAGGIE/lock
pmlogger_daily: [control.d/local:26]
Warning: cannot get archive basename pmlogger PID=180099

pmcd.pmlogger.archive
No value(s) available!
pmlogger_daily: [control.d/local:26]
Warning: current volume of current pmlogger not known, compression skipped

[root@MAGGIE ~]# cat /var/log/pcp/pmlogger/pmlogger_check.log
cat: /var/log/pcp/pmlogger/pmlogger_check.log: No such file or directory

Oct 31 07:56:00 MAGGIE systemd[1]: pmlogger.service: Service RestartSec=100ms expired, scheduling restart.
Oct 31 07:56:00 MAGGIE systemd[1]: pmlogger.service: Scheduled restart job, restart counter is at 392.
Oct 31 07:56:00 MAGGIE systemd[1]: Stopped Performance Metrics Archive Logger.
Oct 31 07:56:00 MAGGIE systemd[1]: Starting Performance Metrics Archive Logger...
Oct 31 07:56:00 MAGGIE pmlogger[186695]: Starting pmlogger ...
Oct 31 07:56:00 MAGGIE systemd[1]: pmlogger.service: Can't open PID file /run/pcp/pmlogger.pid (yet?) after start: No such file or directory
Oct 31 07:56:05 MAGGIE systemd[1]: pmlogger.service: New main PID 192384 does not belong to service, and PID file is not owned by root. Refusing.
Oct 31 07:56:05 MAGGIE systemd[1]: pmlogger.service: New main PID 192384 does not belong to service, and PID file is not owned by root. Refusing.

[root@MAGGIE ~]# journalctl -xe -u pmlogger
Oct 31 07:58:00 MAGGIE systemd[1]: pmlogger.service: Service RestartSec=100ms expired, scheduling restart.
Oct 31 07:58:00 MAGGIE systemd[1]: pmlogger.service: Scheduled restart job, restart counter is at 393.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Automatic restarting of the unit pmlogger.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Oct 31 07:58:00 MAGGIE systemd[1]: Stopped Performance Metrics Ar...

Read more...

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

(In reply to Fernando Viñan-Cano from comment #18)

> It's Fedora 30, it fails after a short while if I try to restart it
> manually, and I'd be happy to reinstall PCP if it could help, but I don't
> know the best way to do that
>
...
> -- The job identifier is 153999.
> Oct 31 07:58:00 MAGGIE pmlogger[193427]: Starting pmlogger ...
> Oct 31 07:58:00 MAGGIE systemd[1]: pmlogger.service: Can't open PID file
> /run/pcp/pmlogger.pid (yet?) after start: No such file or directory
> Oct 31 07:58:05 MAGGIE systemd[1]: pmlogger.service: New main PID 199132
> does not belong to service, and PID file is not owned by root. Refusing.
> Oct 31 07:58:05 MAGGIE systemd[1]: pmlogger.service: New main PID 199132
> does not belong to service, and PID file is not owned by root. Refusing.

f30 originally shipped with systemd-241-7.gita2eaa1c.fc30, but that has now been updated to systemd-241-12.git323cdf4.fc30. WHat is your current version Fernando? Also, Marko, was your systemd updated prior to the problem going away? I ask because I'm running the latest -12 version and do not see this problem of pmlogger failing to start or restart, though I do get some of the messages (believed to be benign). In particular, I do not get the "PID file is not owned by root. Refusing." message.

Regards

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :

I have:

systemd.i686 241-12.git323cdf4.fc30

I also tried a "dnf reinstall pcp" but that's made no difference even after a reboot.

Revision history for this message
In , myllynen (myllynen-redhat-bugs) wrote :

On an up-to-date F30 and F31 VMs (4 vCPU / 4 GB) with systemd-241-12.git323cdf4.fc30.x86_64 / systemd-243-4.gitef67743.fc31.x86_64 this simple script is enough to reproduce the issue:

# cat r
#!/bin/sh
systemctl stop pmcd pmie pmlogger pmproxy redis > /dev/null 2>&1
dnf remove --noautoremove -y pcp\*
rm -rf /etc/pcp* /var/lib/pcp/ /var/log/pcp/ /var/lib/dnf/yumdb/p /var/cache/dnf/pcp*
systemctl daemon-reload
dnf install -y pcp
systemctl daemon-reload
systemctl stop pmcd pmie pmlogger pmproxy
systemctl start pmcd pmie pmlogger pmproxy
# rpm -q pcp
pcp-5.0.0-1.fc30.x86_64

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :
Download full text (5.7 KiB)

Thanks Marko - I ran your script on an up-to-date f31 VM and after it finished, I do see the "pmlogger.service failed because a timeout was exceeded" messages, however after no further action the pmlogger service is up and running:

....
Job for pmlogger.service failed because a timeout was exceeded.
See "systemctl status pmlogger.service" and "journalctl -xe" for details.

Running systemctl status pmlogger.service now shows:

[root@f31-systemd mgoodwin]# systemctl status pmlogger.service
● pmlogger.service - Performance Metrics Archive Logger
   Loaded: loaded (/usr/lib/systemd/system/pmlogger.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-11-04 08:04:57 AEDT; 6s ago
     Docs: man:pmlogger(1)
  Process: 89860 ExecStart=/usr/share/pcp/lib/pmlogger start (code=exited, status=0/SUCCESS)
 Main PID: 95257 (pmlogger)
    Tasks: 1 (limit: 9506)
   Memory: 3.4M
      CPU: 6.362s
   CGroup: /system.slice/pmlogger.service
           └─95257 /usr/libexec/pcp/bin/pmlogger -P -r -T24h10m -c config.default -v 100mb -m pmlogger_check 20191104.08.04

Nov 04 08:04:52 f31-systemd systemd[1]: Starting Performance Metrics Archive Logger...
Nov 04 08:04:52 f31-systemd pmlogger[89860]: Starting pmlogger ...
Nov 04 08:04:52 f31-systemd systemd[1]: pmlogger.service: Can't open PID file /run/pcp/pmlogger.pid (yet?) after start: No such file o>
Nov 04 08:04:57 f31-systemd systemd[1]: pmlogger.service: Supervising process 95257 which is not our child. We'll most likely not noti>
Nov 04 08:04:57 f31-systemd systemd[1]: Started Performance Metrics Archive Logger.
[root@f31-systemd mgoodwin]# pcp
Performance Co-Pilot configuration on f31-systemd:

 platform: Linux f31-systemd 5.3.7-301.fc31.x86_64 #1 SMP Mon Oct 21 19:18:58 UTC 2019 x86_64
 hardware: 2 cpus, 1 disk, 1 node, 7961MB RAM
 timezone: AEDT-11
 services: pmcd pmproxy
     pmcd: Version 5.0.0-1, 9 agents, 4 clients
     pmda: root pmcd proc pmproxy xfs linux mmv kvm jbd2
 pmlogger: primary logger: /var/log/pcp/pmlogger/f31-systemd/20191104.08.04
     pmie: primary engine: /var/log/pcp/pmie/f31-systemd/pmie.log

And the journalctl -xe -u pmlogger.service log shows :

-- A start job for unit pmlogger.service has begun execution.
--
-- The job identifier is 51208.
Nov 04 08:02:51 f31-systemd pmlogger[82555]: Starting pmlogger ...
Nov 04 08:02:51 f31-systemd systemd[1]: pmlogger.service: Can't open PID file /run/pcp/pmlogger.pid (yet?) after start: No such file or directory
Nov 04 08:02:57 f31-systemd systemd[1]: pmlogger.service: New main PID 89164 does not belong to service, and PID file is not owned by root. Refusing.
Nov 04 08:04:51 f31-systemd systemd[1]: pmlogger.service: start operation timed out. Terminating.
Nov 04 08:04:51 f31-systemd systemd[1]: pmlogger.service: Failed with result 'timeout'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit pmlogger.service has entered the 'failed' state with result 'timeout'.
Nov 04 08:04:51 f31-systemd systemd[1]: Failed to start Performance Metrics Archive Logger.
-- Subject: A start job for unit pmlogger.service has failed
-- Defined-By: ...

Read more...

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :
Download full text (6.8 KiB)

This morning this was logged for pmlogger_daily:

[root@MAGGIE ~]# systemctl status pmlogger_daily
● pmlogger_daily.service - Process archive logs
   Loaded: loaded (/usr/lib/systemd/system/pmlogger_daily.service; static; vendor preset: disabled)
   Active: failed (Result: timeout) since Mon 2019-11-04 00:12:01 CET; 7h ago
     Docs: man:pmlogger_daily(1)
  Process: 216795 ExecStart=/usr/libexec/pcp/bin/pmlogger_daily $PMLOGGER_DAILY_PARAMS (code=exited, status=0/SUCCESS)
 Main PID: 216795 (code=exited, status=0/SUCCESS)

Nov 04 00:10:01 MAGGIE systemd[1]: Starting Process archive logs...
Nov 04 00:12:01 MAGGIE systemd[1]: pmlogger_daily.service: Start operation timed out. Terminating.
Nov 04 00:12:01 MAGGIE systemd[1]: pmlogger_daily.service: Failed with result 'timeout'.
Nov 04 00:12:01 MAGGIE systemd[1]: Failed to start Process archive logs.

[root@MAGGIE ~]# journalctl -xe -u pmlogger_daily-- Defined-By: systemd-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel-- -- The unit pmlogger_daily.service has successfully entered the 'dead' state.
Nov 03 00:11:48 MAGGIE systemd[1]: Started Process archive logs.
-- Subject: A start job for unit pmlogger_daily.service has finished successfully
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A start job for unit pmlogger_daily.service has finished successfully.
--
-- The job identifier is 366735.
Nov 04 00:10:01 MAGGIE systemd[1]: Starting Process archive logs...
-- Subject: A start job for unit pmlogger_daily.service has begun execution
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A start job for unit pmlogger_daily.service has begun execution.
--
-- The job identifier is 579462.
Nov 04 00:12:01 MAGGIE systemd[1]: pmlogger_daily.service: Start operation timed out. Terminating.
Nov 04 00:12:01 MAGGIE systemd[1]: pmlogger_daily.service: Failed with result 'timeout'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit pmlogger_daily.service has entered the 'failed' state with result 'timeout'.
Nov 04 00:12:01 MAGGIE systemd[1]: Failed to start Process archive logs.
-- Subject: A start job for unit pmlogger_daily.service has failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A start job for unit pmlogger_daily.service has finished with a failure.
--
-- The job identifier is 579462 and the job result is failed.

The service monitor in Cockpit shows pmlogger_daily as failed, and starting it manually causes the same error.

November 4, 2019
8:10 AM
Failed to start Process archive logs.
systemd
8:10 AM
pmlogger_daily.service: Failed with result 'exit-code'.
systemd
8:10 AM
pmlogger_daily.service: Main process exited, code=exited, status=1/FAILURE
systemd
8:10 AM
Starting Process archive logs...
systemd
8:10 AM
Failed to start Process archive logs.
systemd
8:10 AM
pmlogger_daily.service: Failed with result 'exit-code'.
systemd
8:10 AM
pmlogger_daily.service: Main process exited, code=exited, status=1/FAILURE
systemd
8:10 AM
pmlogg...

Read more...

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :
Download full text (10.6 KiB)

I think I've made things worse since running that script, this is just a sample:

November 5, 2019
5:25 AM
Failed to start Performance Metrics Archive Logger.
systemd
5:25 AM
pmlogger.service: New main PID 242536 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:23 AM
pmlogger.service: New main PID 236432 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:23 AM
Failed to start Performance Metrics Archive Logger.
systemd
5:21 AM
pmlogger.service: New main PID 229862 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:21 AM
Failed to start Performance Metrics Archive Logger.
systemd
5:19 AM
pmlogger.service: New main PID 222366 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:19 AM
Failed to start Performance Metrics Archive Logger.
systemd
5:17 AM
pmlogger.service: New main PID 214518 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:17 AM
Failed to start Performance Metrics Archive Logger.
systemd
5:15 AM
pmlogger.service: New main PID 206115 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:15 AM
Failed to start Performance Metrics Archive Logger.
systemd
5:13 AM
pmlogger.service: New main PID 198474 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:13 AM
Failed to start Performance Metrics Archive Logger.
systemd
5:11 AM
pmlogger.service: New main PID 191237 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:11 AM
Failed to start Performance Metrics Archive Logger.
systemd
5:09 AM
pmlogger.service: New main PID 184405 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:09 AM
Failed to start Performance Metrics Archive Logger.
systemd
5:07 AM
pmlogger.service: New main PID 177929 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:07 AM
Failed to start Performance Metrics Archive Logger.
systemd
5:05 AM
pmlogger.service: New main PID 171365 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:05 AM
Failed to start Performance Metrics Archive Logger.
systemd
5:03 AM
pmlogger.service: New main PID 164854 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:03 AM
Failed to start Performance Metrics Archive Logger.
systemd
5:01 AM
pmlogger.service: New main PID 158367 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
5:01 AM
Failed to start Performance Metrics Archive Logger.
systemd
4:59 AM
pmlogger.service: New main PID 151718 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
4:59 AM
Failed to start Performance Metrics Archive Logger.
systemd
4:57 AM
pmlogger.service: New main PID 145079 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
4:57 AM
Failed to start Performance Metrics Archive Logger.
systemd
4:55 AM
pmlogger.service: New main PID 138537 does not belong to service, and PID file is not owned by root. Refusing.
systemd
2
4...

Revision history for this message
In , myllynen (myllynen-redhat-bugs) wrote :

> i.e. do you end up with the service actually running correctly, despite the initial timeout? Or is the pmlogger service still NOT running at the completion of the script?

It should be mentioned what on latest Ubuntu PCP packages fail to install properly due to this issue.

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

re-opening this one, despite it being DUP'd elsewhere since people seem to be using this BZ rather than any of the others.

After a fair bit of poking around with difficult-to-debug systemd configs and PCP log management scripts, it looks like we basically have what's known as a "readyness protocol mismatch" with what systemd is expecting. Given the PCP rc scripts pre-date systemd by about 2 decades, it's not surprising.

Basically, systemd runs the PCP pmlogger.service rc script (/usr/share/pcp/lib/pmlogger), which then runs pmlogger_check to start the pmlogger service (to start at least the primary logger, but also any other loggers configured in the control file or control.d directory in a logging farm configuration).

pmlogger_check then forks off a background shell function which then forks off each pmlogger process with appropriate options (as per the control file), and then busy waits in a loop checking with pmlc for the new pmlogger process. Once each pmlogger is started, pmlogger_check writes the pid files and does various other things and then exits - and then the rc script exits. With a Type=forking config, this is supposed to signal to systemd that the service has started.

Unfortunately however, systemd is impatient and doesn't cope very well with the double forking and usually ends up killing the entire process tree (witness signal 15 messages in pmlogger.log), and reporting the failure as a service start timeout. Since we have restart=always in the systemd config, yet another rc script is then re-launched, which often succeeds because the initial pmlogconf work was previously completed for a
new installation, and we end up with one or more pmlogger daemons active. To complicate things, some PCP QA tests that have been interrupted may have left the pmlogger systemd config with Restart=no, but this only affects systems running QA - i.e. mostly PCP developers.

The fix will involve converting to the modern Type=notify readyness protocol - so each pmlogger sends an sd_notify message to systemd that it has started and completed initialization. This should function regardless of how much forking goes on with the rc and log management scripts. We'll also need to split out each pmlogger in a farm configuration so they're individually managed by systemd as distinct service units (there is a templating facility for this). Similar changes will be needed for other PCP services.

References:
https://unix.stackexchange.com/questions/401590/systemd-timeout-because-it-doesnt-detect-daemon-forking
https://unix.stackexchange.com/questions/200280/systemd-kills-service-immediately-after-start/200365#200365
https://unix.stackexchange.com/questions/336031/systemd-service-restarts-every-90-seconds/336067#336067
.. and many others - this is not an uncommon problem with legacy service daemons being adapted to the systemd ecosystem.

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :

Thanks Mark, let us know if we can help with getting any more info or for testing.

Also, what's the best thing I can do for my system? Before I ran the script it would log problems once per day, now since the script it's spamming the log every few minutes. Should I simply uninstall PCP or can I reconfigure it somehow to the way it was before?

Revision history for this message
Martin Pitt (pitti) wrote :

We see this in our Cockpit test image creation as well.

Changed in pcp (Ubuntu Eoan):
status: New → Confirmed
Changed in pcp (Ubuntu Focal):
status: New → Confirmed
Changed in pcp (Fedora):
importance: Unknown → Undecided
status: Unknown → In Progress
Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

Hi Fernando, I can repro this issue on a virgin install (beginning with no PCP installed at all). This fails to start the pmlogger service with a timeout - as reported. If I then: systemctl restart pmlogger.service it seems to work OK and we end up with the pmlogger service running. Are you seeing that too?

Also, can you please try and start the pmie service after a virgin install? The pmie systemd config is very similar to the pmlogger service and the rc scripts are also basically the same (just different service name and paths etc). This works for both a virgin install and for a restart. Can you please check that too? systemctl start pmie.service

Thanks
-- Mark

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :
Download full text (6.0 KiB)

Doesn't look good. I used the commands from the earlier script to remove the current installation then reinstalled - I omitted to do the service commands. Then I did the tasks you requested:

[root@MAGGIE ~]# systemctl status pmlogger.service
● pmlogger.service - Performance Metrics Archive Logger Loaded: loaded (/usr/lib/systemd/system/pmlogger.service; enabled; vendor preset: enabled) Active: inactive (dead) Docs: man:pmlogger(1)

Nov 08 07:49:44 MAGGIE systemd[1]: pmlogger.service: Service RestartSec=100ms expired, scheduling restart.
Nov 08 07:49:44 MAGGIE systemd[1]: pmlogger.service: Scheduled restart job, restart counter is at 1109.
Nov 08 07:49:44 MAGGIE systemd[1]: Stopped Performance Metrics Archive Logger.
Nov 08 07:49:44 MAGGIE systemd[1]: Starting Performance Metrics Archive Logger...
Nov 08 07:49:44 MAGGIE pmlogger[82993]: Starting pmlogger ...
Nov 08 07:49:44 MAGGIE systemd[1]: pmlogger.service: Can't open PID file /run/pcp/pmlogger.pid (yet?) after start: No such file >
Nov 08 07:49:49 MAGGIE systemd[1]: pmlogger.service: New main PID 88678 does not belong to service, and PID file is not owned by>
Nov 08 07:49:49 MAGGIE systemd[1]: pmlogger.service: New main PID 88678 does not belong to service, and PID file is not owned by>
Nov 08 07:49:51 MAGGIE systemd[1]: pmlogger.service: Succeeded.
Nov 08 07:49:51 MAGGIE systemd[1]: Stopped Performance Metrics Archive Logger.
[root@MAGGIE ~]# systemctl restart pmlogger.service
A dependency job for pmlogger.service failed. See 'journalctl -xe' for details.
[root@MAGGIE ~]# journalctl -xe
--
-- The process' exit code is 'exited' and its exit status is 1.
Nov 08 07:59:30 MAGGIE systemd[1]: pmlogger_check.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit pmlogger_check.service has entered the 'failed' state with result 'exit-code'.
Nov 08 07:59:30 MAGGIE systemd[1]: Failed to start Check pmlogger instances are running.
-- Subject: A start job for unit pmlogger_check.service has failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A start job for unit pmlogger_check.service has finished with a failure.
--
-- The job identifier is 392873 and the job result is failed.
Nov 08 07:59:30 MAGGIE systemd[1]: Dependency failed for Performance Metrics Archive Logger.
-- Subject: A start job for unit pmlogger.service has failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A start job for unit pmlogger.service has finished with a failure.
--
-- The job identifier is 392778 and the job result is dependency.
Nov 08 07:59:30 MAGGIE systemd[1]: pmlogger.service: Job pmlogger.service/start failed with result 'dependency'.
Nov 08 07:59:30 MAGGIE audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='>
Nov 08 07:59:41 MAGGIE root[94420]: pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
Nov 08 07:59:41 MAGGIE root[94425]: pmlogger_check failed - see /var/log/pcp/pmlogger/pmlogger_check.log

[r...

Read more...

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

Committed an upstream patch that fixes the service timeout reproducer test posted by Marko in Comment#21, along with a bunch of QA updates.

commit 0f3a1fa7d0d7a898acf2ee4a6f23b626d4fb7362
Author: Mark Goodwin <email address hidden>
Date: Wed Nov 13 09:50:57 2019 +1100

    logutil: fix pmlogger service timeout on virgin install

    RHBZ#1721223 - pmlogger.service: Failed with result 'timeout'
    PCP GH#792

    Do not invoke pmlogger_daily -K to compress archives from pmlogger_check
    on a virgin install - it confuses systemd's Type=forking exit status
    checks, resulting in the pmlogger service timing out. There is nothing
    to compress anyway!

    This fixes the installation issues reported by Marko in BZ#1721223
    but does not update the service configs - they are still Type=forking
    and so there will still be some (benign) complaints from systemd as
    reported in the BZ.

    Related QA updates are following.

The above has been pushed to the current master branch (5.0.2 devel), but has not been built or pushed to any Fedora or Deb/Ubuntu repos.

@Fernando and others, after building and installing latest master branch with this patch if you're still seeing issues with the pmlogger service or pmlogger_daily and pmlogger_check scripts failing and/or the pmie service failing to start then we need to examine your system and installation. It Works For Me (and Marko tested it too).

Thanks

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :

@Mark, thanks, once I see it pushed to my machine when I check for updates I'll check what happens on my system and let you know.

Revision history for this message
In , myllynen (myllynen-redhat-bugs) wrote :

I would also support a new PCP upstream release to address this or at least a new updated build for the affected distributions with selected fixes included (instead of waiting over a month for the next scheduled PCP release). Thanks.

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

(In reply to Marko Myllynen from comment #32)
> I would also support a new PCP upstream release to address this or at least
> a new updated build for the affected distributions with selected fixes
> included (instead of waiting over a month for the next scheduled PCP
> release). Thanks.

agree we should ship an early update to fix the install issue - a month is a long time. However, I think it would first be a good idea to ask Fernando to test a build from current master branch to if the issues he's seeing still persist.

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :

@Mark happy to test, just tell me how. I'm not a Linux admin, this server is just a hobby thing I use to run a few things, so no idea how to "test a build from current master branch"

Revision history for this message
In , bcotton (bcotton-redhat-bugs) wrote :

This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle.
Changing version to 32.

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

Closed as per the fix from Comment #30 and verified with Marko's recipe from Comment #21.
The additional work for changing the pmlogger service to use type=notify is a separate issue being tracked in upstream issue https://github.com/performancecopilot/pcp/issues/798

Changed in pcp (Fedora):
status: In Progress → Fix Released
Revision history for this message
In , kashcheyeu (kashcheyeu-redhat-bugs) wrote :

In Fedora 31 for ARMv7 this error still exists.

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :

I'm now on 5.1.0-1.fc31 and this problem still exists and just fills up the log with these messages constantly.

Is this ever going to be fixed?

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

(In reply to Fernando Viñan-Cano from comment #40)
> I'm now on 5.1.0-1.fc31 and this problem still exists and just fills up the
> log with these messages constantly.
>
> Is this ever going to be fixed?

Hi Fernando, it should be fixed now. Perhaps after your upgrade you need to run systemctl daemon-reload and then systemctl restart pmlogger ? If that doesn't work, please send your logs and a tarball of /etc/pcp.

Thanks

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :

Hi Mark,

I've rebooted at least 3 times since I did the update and it's still the same, but I tried those commands and got this:

[root@MAGGIE ~]# systemctl restart pmlogger
Job for pmlogger.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status pmlogger.service" and "journalctl -xe" for details.

The details are:

May 04 13:14:26 MAGGIE.mydomain.com systemd[1]: Failed to start Performance Metrics Archive Logger.
-- Subject: A start job for unit pmlogger.service has failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A start job for unit pmlogger.service has finished with a failure.
--
-- The job identifier is 13372 and the job result is failed.
May 04 13:14:26 MAGGIE.mydomain.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:>
May 04 13:14:26 MAGGIE.mydomain.com systemd[1]: Starting Check pmlogger instances are running...
-- Subject: A start job for unit pmlogger_check.service has begun execution
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A start job for unit pmlogger_check.service has begun execution.
--
-- The job identifier is 13479.
May 04 13:14:27 MAGGIE.mydomain.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:>
May 04 13:14:27 MAGGIE.mydomain.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s>
May 04 13:14:27 MAGGIE.mydomain.com systemd[1]: pmlogger.service: Scheduled restart job, restart counter is at 4.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Automatic restarting of the unit pmlogger.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
May 04 13:14:27 MAGGIE.mydomain.com systemd[1]: Stopped Performance Metrics Archive Logger.
-- Subject: A stop job for unit pmlogger.service has finished
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A stop job for unit pmlogger.service has finished.
--
-- The job identifier is 13582 and the job result is done.
May 04 13:14:27 MAGGIE.mydomain.com systemd[1]: Starting Performance Metrics Archive Logger...
-- Subject: A start job for unit pmlogger.service has begun execution
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- A start job for unit pmlogger.service has begun execution.
--
-- The job identifier is 13582.

I have gathered all the files into archive and will email you them separately - hope that's ok.

Cheers,
Fernando

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

Fernando,

the logs you sent me indicate pmlogger is starting but exiting very soon after (and then being repeatedly restarted by systemd). I don't see any evidence of a systemd unit configuration problem. The PCP archives are being created, but have a corrupted label record and I can't read any of them. e.g. here's what I get with some debugging enabled trying to examine one of your recent faulty archives :

$ pminfo -Dlog,logmeta -a 20200504.11.16-00
__pmLogOpen: inspect file "./20200504.11.16-00.0"
__pmLogOpen: inspect file "./20200504.11.16-00.index"
__pmFopen("./20200504.11.16-00.index", "r"): no decompress
__pmLogOpen: inspect file "./20200504.11.16-00.meta"
__pmFopen("./20200504.11.16-00.meta", "r"): no decompress
__pmFopen("./20200504.11.16-00.0", "r"): no decompress
__pmLogChkLabel: fd=3 vol=0 trailer read -> 4 (expect 4) or bad trailer len=8650752 (expected 132)
pminfo: Cannot open archive "20200504.11.16-00": Illegal label record at start of a PCP archive log file

This may indicate a bug in pmlogger / libpcp. Can you please try running pmlogger manually, e.g.
$ pmlogger -c config.default -h local: -l- -s 5 -t 1 /tmp/testlog

Please send me the output of the above command (-l- will print messages to stdout) and also /var/lib/pcp/config/pmlogger/config.default and also /tmp/testlog.*

If the above command fails, please: rm /var/lib/pcp/config/pmlogger/config.default and then systemctl restart pmlogger (this will regenerate a new config.default since yours may be corrupted somehow). After restarting the pmlogger service, please also send me the output of: systemctl status pmlogger; journalctl -xe -u pmlogger.service

Also, back in Comment #20, you said you have systemd.i686 241-12.git323cdf4.fc30 installed. Are you running a 32bit machine?

Regards
--

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :

Hi Mark,

Ran the command

Last login: Mon May 4 13:10:44 2020 from 192.168.1.19
[root@MAGGIE ~]# pmlogger -c config.default -h local: -l- -s 5 -t 1 /tmp/testlog
Log for pmlogger on MAGGIE.star-one.co.uk started Tue May 5 07:32:37 2020

preprocessor cmd: /usr/libexec/pcp/bin/pmcpp -rs /var/lib/pcp/config/pmlogger/config.default -I /var/lib/pcp/config/pmlogger
Config parsed
Starting logger for host "MAGGIE.mydomain.com" via "local:"
Archive basename: /tmp/testlog
pmlogger: Sample limit reached, exiting

Log finished Tue May 5 07:32:40 2020

Will send you the rest via email in due course.

As for #20, no 64bit even at the time of the comment, but it looks right now.

[root@MAGGIE ~]# dnf list systemd*
Last metadata expiration check: 2:09:43 ago on Tue 05 May 2020 05:26:48 CEST.
Installed Packages
systemd.x86_64 243.8-1.fc31 @updates
systemd-bootchart.x86_64 233-5.fc31 @fedora
systemd-libs.x86_64 243.8-1.fc31 @updates
systemd-pam.x86_64 243.8-1.fc31 @updates
systemd-rpm-macros.noarch 243.8-1.fc31 @updates
systemd-udev.x86_64 243.8-1.fc31 @updates 243.8-1.fc31

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :
Download full text (3.8 KiB)

I am still getting these problems even though I am now upgraded to Fedora 32, and I used the uninstall/cleanup script from once before to try and sort out the PCP package. What is going on?

June 29, 2020
8:19 AM
Failed to start pmie_check.service.
systemd
8:19 AM
pmie_check.service: Failed at step EXEC spawning /usr/libexec/pcp/bin/pmie_check: No such file or directory
systemd
8:19 AM
pmie_check.service: Failed to execute command: No such file or directory
systemd
8:19 AM
Failed to start pmlogger_check.service.
systemd
8:19 AM
pmlogger_check.service: Failed at step EXEC spawning /usr/libexec/pcp/bin/pmlogger_check: No such file or directory
systemd
8:19 AM
pmlogger_check.service: Failed to execute command: No such file or directory
systemd
8:19 AM
pmlogger_daily-poll.timer: Unit to trigger vanished.
systemd
8:19 AM
pmie_daily.timer: Unit to trigger vanished.
systemd
8:19 AM
pmlogger_daily.timer: Unit to trigger vanished.
systemd
8:19 AM
pmie_check.timer: Unit to trigger vanished.
systemd
8:19 AM
pmlogger_check.timer: Unit to trigger vanished.
systemd
8:15 AM
Failed to start Performance Metrics Archive Logger.
systemd
2
8:15 AM
pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
root
8:15 AM
Failed to start Performance Metrics Archive Logger.
systemd
8:15 AM
pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
root
8:14 AM
Failed to start Performance Metrics Archive Logger.
systemd
8:14 AM
pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
root
8:14 AM
Failed to start Performance Metrics Archive Logger.
systemd
8:14 AM
pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
root
8:14 AM
Failed to start Performance Metrics Archive Logger.
systemd
8:14 AM
pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
root
8:06 AM
Failed to start Check pmlogger instances are running.
systemd
8:06 AM
Failed to start Performance Metrics Archive Logger.
systemd
8:06 AM
Failed to start Check pmlogger instances are running.
systemd
8:06 AM
Failed to start Performance Metrics Archive Logger.
systemd
8:06 AM
pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
root
8:06 AM
Failed to start Performance Metrics Archive Logger.
systemd
8:06 AM
pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
root
8:06 AM
Failed to start Performance Metrics Archive Logger.
systemd
8:06 AM
pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
root
8:06 AM
Failed to start Performance Metrics Archive Logger.
systemd
8:06 AM
pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
root
8:06 AM
Failed to start Performance Metrics Archive Logger.
systemd
8:06 AM
pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
root
8:06 AM
Failed to start Performance Metrics Archive Logger.
systemd
8:06 AM
pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
root
8:05 AM
pam_systemd(crond:session): Failed to get user record: Connection timed out
crond

[root@MAGGIE ~]# cat /var/log/pcp/pmlogger/pmlogger_daily-K.log
cat: /var/log/pcp/pmlogger/pmlogger_daily-K.log: No such file or directory

Installed Packages...

Read more...

Revision history for this message
In , nathans (nathans-redhat-bugs) wrote :

You installation of PCP is broken - this:

pmie_check.service: Failed at step EXEC spawning /usr/libexec/pcp/bin/pmie_check: No such file or directory

says you are missing critical files that are part of PCP:

$ rpm -ql pcp | grep pmlogger_check
/usr/lib/systemd/system/pmlogger_check.path
/usr/lib/systemd/system/pmlogger_check.service
/usr/lib/systemd/system/pmlogger_check.timer
/usr/libexec/pcp/bin/pmlogger_check

... which is unrelated to the issue in this BZ.

Revision history for this message
In , ferd352+000143 (ferd352+000143-redhat-bugs) wrote :

I get this when I run that command:

[root@MAGGIE ~]# rpm -ql pcp | grep pmlogger_check
/usr/lib/systemd/system/pmlogger_check.path
/usr/lib/systemd/system/pmlogger_check.service
/usr/lib/systemd/system/pmlogger_check.timer
/usr/libexec/pcp/bin/pmlogger_check

Are you telling me now that this doesn't completely re-install PCP?

systemctl stop pmcd pmie pmlogger pmproxy redis > /dev/null 2>&1
dnf remove --noautoremove -y pcp\*
rm -rf /etc/pcp* /var/lib/pcp/ /var/log/pcp/ /var/lib/dnf/yumdb/p /var/cache/dnf/pcp*
systemctl daemon-reload
dnf install -y pcp cockpit-pcp
systemctl daemon-reload
systemctl stop pmcd pmie pmlogger pmproxy
systemctl start pmcd pmie pmlogger pmproxy

If that is wrong, how I completely reinstall it?

Revision history for this message
In , kashcheyeu (kashcheyeu-redhat-bugs) wrote :

Fedora 32 armv7l, Fedora 32 x86_64:

Unit pmlogger_check.path not found.

$ rpm -ql pcp | grep pmlogger_check
/usr/lib/systemd/system/pmlogger_check.service
/usr/lib/systemd/system/pmlogger_check.timer
/usr/libexec/pcp/bin/pmlogger_check

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

Hi all,

The current release of PCP for f32 is pcp-5.2.0-1.fc32.x86_64.rpm https://dl.fedoraproject.org/pub/fedora/linux/updates/32/Everything/x86_64/Packages/p/pcp-5.2.0-1.fc32.x86_64.rpm contains all of the PCP service units and none of them reference pmlogger_check.path.

So exactly what version of PCP are you running? Have you done a 'systemctl daemon-reload' after upgrading?

What is still referencing pmlogger_check.path?

find /usr/lib/systemd/system /etc/systemd/system -type f | xargs grep pmlogger_check.path

thanks

Revision history for this message
In , kashcheyeu (kashcheyeu-redhat-bugs) wrote :

pcp version 5.2.0

$ find /usr/lib/systemd/system /etc/systemd/system -type f | xargs grep pmlogger_check.path
grep: /usr/lib/systemd/system/system-systemdx2dcryptsetup.slice: Нет такого файла или каталога

Revision history for this message
In , mgoodwin (mgoodwin-redhat-bugs) wrote :

(In reply to Кощеев from comment #50)
> pcp version 5.2.0
>
> $ find /usr/lib/systemd/system /etc/systemd/system -type f | xargs grep
> pmlogger_check.path
> grep: /usr/lib/systemd/system/system-systemdx2dcryptsetup.slice: Нет такого
> файла или каталога

well that didnt find anything, but we only looked for files, not syslinks.
How about: systemctl list-dependencies | grep pmlogger_check

Maybe we have an upgrade issue where deprecated units or dependencies aren't being cleaned up properly.
To remove stale dependencies: systemctl disable pmlogger_check.path

thanks

Revision history for this message
In , kashcheyeu (kashcheyeu-redhat-bugs) wrote :

> systemctl list-dependencies | grep pmlogger_check
Empty result.
> systemctl disable pmlogger_check.path
Removed /etc/systemd/system/pmlogger.service.requires/pmlogger_check.path.

Revision history for this message
In , kashcheyeu (kashcheyeu-redhat-bugs) wrote :

From the logwatch report:

 Packages Updated:
    cockpit-pcp-226-1.fc32.armv7hl -> cockpit-pcp-227-1.fc32.armv7hl

 **Unmatched Entries**
 Sep 9 04:11:30 ERROR Error in POSTIN scriptlet in rpm package cockpit-pcp

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.