lshw 100%CPU usage every minute

Bug #1613985 reported by Stanislav Kolenkin
This bug affects 3 people
Affects             Status     Importance  Assigned to         Milestone
Mirantis OpenStack  Confirmed  High        Stanislav Kolenkin
8.0.x               Confirmed  High        Stanislav Kolenkin
9.x                 Confirmed  High        Stanislav Kolenkin

Bug Description

I have found that the lshw process is started every minute and uses 100% CPU on every cloud-related node.

date && ps axuwww |grep lshw
Wed Aug 17 09:54:07 AST 2016
root 25790 111 0.0 61924 41372 ? R 09:54 0:01 /usr/bin/lshw -json
root 25794 0.0 0.0 10432 928 pts/8 S+ 09:54 0:00 grep --color=auto lshw

date && ps axuwww |grep lshw
Wed Aug 17 09:54:26 AST 2016
root 25819 0.0 0.0 10432 924 pts/8 S+ 09:54 0:00 grep --color=auto lshw

date && ps axuwww |grep lshw
Wed Aug 17 09:55:27 AST 2016
root 26283 32.0 0.0 23084 8792 ? R 09:55 0:00 /usr/bin/lshw -json
root 26287 0.0 0.0 10432 928 pts/8 S+ 09:55 0:00 grep --color=auto lshw

lshw --version
Hardware Lister (lshw) - B.02.16

Screenshot in the attachment.

Stanislav Kolenkin (skolenkin) wrote :
description: updated
Denis Meltsaykin (dmeltsaykin) wrote :

Stanislav, please add more details on the topic. What is the impact? How does it affect workloads? Are there any failures?

tags: added: area-linux
Stanislav Kolenkin (skolenkin) wrote :

The described issue can affect the response time of services because it causes CPU usage spikes.

Oleksandr Savatieiev (osavatieiev) wrote :

The lshw process runs for 10-15 seconds on one (1) CPU and can affect response time when a service running on the same host is under high load. For example, a networking agent or Contrail's Cassandra DB would definitely be affected.

This is not critical, I agree. But on highly loaded clouds this will become serious.

description: updated
tags: added: customer-found
tags: added: support
Maksym Shalamov (mshalamov) wrote :

Hello,

The described issue affects only 1 CPU core, so it can affect the response time of services only on highly loaded clusters. There is probably a simple workaround for this issue. Could somebody help resolve it?

Maksym Shalamov (mshalamov) wrote :

Please note that this issue has the customer-found tag.

Timur Nurlygayanov (tnurlygayanov) wrote :

Why is the issue in Incomplete status?

Timur Nurlygayanov (tnurlygayanov) wrote :

Hi MOS Linux team, could you please take a look at the issue?

Thank you!

Ivan Suzdal (isuzdal) wrote :

Dear all,
High CPU usage is absolutely normal behavior for lshw.
AFAIU, lshw is called from nailgun-agent. So there are two options:
1) Gather the necessary information using pure Ruby code instead of calling lshw.
2) Lower the nailgun-agent priority with the nice/ionice commands in its cron task.
Yet another option: execute nailgun-agent on a dedicated CPU (taskset can help).
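
As a rough illustration of option 2 and the taskset idea, here is a minimal shell sketch of a wrapper that starts a command at the lowest CPU priority and, where ionice is installed, in the idle I/O class. The function name run_low_priority and the PIN_CPU variable are made up for this example. How nailgun-agent is actually launched from cron is not shown in this report, so the wrapper is deliberately not tied to it.

```shell
# Hypothetical helper: run any command with minimal scheduling priority.
run_low_priority() {
    # Lowest CPU priority (niceness 19).
    set -- nice -n 19 "$@"

    # Idle I/O class, if ionice (util-linux) is installed.
    # -t tells ionice to carry on even if the priority cannot be set.
    if command -v ionice >/dev/null 2>&1; then
        set -- ionice -c 3 -t "$@"
    fi

    # Optional: pin to one core when PIN_CPU is set (assumed variable), e.g. PIN_CPU=0.
    if [ -n "${PIN_CPU:-}" ] && command -v taskset >/dev/null 2>&1; then
        set -- taskset -c "$PIN_CPU" "$@"
    fi

    "$@"
}
```

For example, `run_low_priority /usr/bin/lshw -json` would still use a full core when that core is otherwise idle, but it should yield to other work on a busy host. The same nice/ionice prefix could be placed in front of the agent command in its cron entry.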

Dmitriy Stremkovskiy (dstremkouski) wrote :

I don't think it is normal for production to spend 100% of one CPU for nothing. Why do we need to scan for hardware changes all the time? This information is needed for monitoring use cases only.
You could ask deploy engineers to run nailgun-agent before each deployment change to get proper data for the nodes; there is no need to run this scan every minute.
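
To put rough numbers on this, here is a small sketch that uses the 10-15 second run time reported earlier in this thread. It computes the average share of one core that the scan consumes for a given run time and interval, both in seconds. The function name cpu_share is only illustrative.

```shell
# Average percentage of one core used by a job that runs for
# $1 seconds once every $2 seconds.
cpu_share() {
    awk -v run="$1" -v interval="$2" 'BEGIN { printf "%.1f\n", run * 100 / interval }'
}

cpu_share 15 60     # scan every minute -> 25.0
cpu_share 15 3600   # scan every hour   -> 0.4
```

The averages hide the spikes: each run still occupies a full core for its whole duration, so a longer interval reduces how often a busy service is disturbed, not how hard.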
