Collector heat can get stuck in request with no timeout
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
os-collect-config | New | Undecided | Unassigned |
Bug Description
Hello,
Under certain circumstances the "heat" collector gets stuck while collecting config.
This happens, for example, when the connection to the heat-api has already been established and is then unexpectedly interrupted.
The heatclient then waits indefinitely, without any timeout, for the request to complete.
I noticed that os-collect-config still had a socket open.
By manually closing the socket I was able to get os-collect-config to continue:
[root@toolbox workdir]# ps aux|grep collect
root 2071 0.0 1.6 51048 33888 ? S Apr20 0:16 /usr/bin/python3 /usr/local/
root 1199103 0.0 0.1 10448 2308 pts/0 S+ 12:55 0:00 grep --color=auto collect
[root@toolbox workdir]# strace -p 2071
strace: Process 2071 attached
read(3, ^Cstrace: Process 2071 detached
<detached ...>
[root@toolbox workdir]# netstat -np |grep 2071
tcp 0 0 10.XXX.
[root@toolbox workdir]# ss --kill state established src :44124
Netid Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp 0 0 10.XXX.
[root@toolbox workdir]# netstat -np |grep 2071
[root@toolbox workdir]# strace -p 2071
strace: Process 2071 attached
select(0, NULL, NULL, NULL, {tv_sec=20, tv_usec=598993}) = 0 (Timeout)
stat("/
openat(AT_FDCWD, "/var/lib/
fstat(3, {st_mode=
As a temporary fix I'm passing a hardcoded timeout (timeout=60) to the heatclient in the code of heat.py (https:/
The changed line looks like this:
This prevents the collector from getting stuck during such network interruptions.
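
To illustrate why a timeout helps here, the following is a minimal standard-library sketch, not the heatclient code itself. It assumes a peer that accepts the connection and then never answers, which stands in for the interrupted heat-api connection; the names `silent_server`, `read_with_timeout` and `demo_stalled_peer` are made up for this example. Without a timeout the `recv` call would block indefinitely, just like the `read(3, ...` seen in strace; with one, it raises and the caller can carry on.

```python
import socket
import threading


def silent_server(listener, stop):
    """Accept one connection and hold it open without ever replying."""
    conn, _ = listener.accept()
    stop.wait()  # keep the connection established until told to stop
    conn.close()


def read_with_timeout(address, timeout):
    """Send a request and wait for a reply, giving up after `timeout` seconds."""
    with socket.create_connection(address, timeout=timeout) as sock:
        sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
        try:
            sock.recv(1024)  # would block forever if no timeout were set
        except socket.timeout:
            return "timed out"
        return "got data"


def demo_stalled_peer(timeout=0.5):
    """Run the client against a peer that never responds."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    stop = threading.Event()
    worker = threading.Thread(
        target=silent_server, args=(listener, stop), daemon=True
    )
    worker.start()
    try:
        return read_with_timeout(listener.getsockname(), timeout)
    finally:
        stop.set()
        worker.join(timeout=1)
        listener.close()


if __name__ == "__main__":
    print(demo_stalled_peer())
```

The same idea applies to any blocking client call: once the read is bounded, the collector's normal retry loop gets a chance to run again.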
It would be great to have a proper fix for this issue, however.
My suggestion would be an (optional) config option for the timeout, which can be set within the heat section of /etc/os-
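
A rough sketch of how such an optional setting could behave, using only Python's standard `configparser` rather than os-collect-config's actual option handling. The option name `timeout`, the helper `heat_timeout` and the sample values are assumptions for illustration only; when the option is absent, the helper returns the default so existing configs keep their current behaviour.

```python
import configparser


def heat_timeout(config_text, default=None):
    """Return the [heat] timeout in seconds, or `default` if it is not set.

    `timeout` is a hypothetical option name used for this sketch.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback covers both a missing [heat] section and a missing option
    return parser.getfloat("heat", "timeout", fallback=default)


if __name__ == "__main__":
    # Placeholder config text, not a real deployment file
    sample = "[heat]\nstack_id = example-stack\ntimeout = 60\n"
    print(heat_timeout(sample))
```

The returned value could then be handed to the client in place of the hardcoded 60, with `None` meaning "no timeout", as today.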
Best regards
Gabriel