Collector heat can get stuck in request with no timeout

Bug #1927122 reported by Gabriel Hartmann
This bug affects 2 people

Bug Description


Under certain circumstances the "heat" collector gets stuck while collecting config.
This happens for example when the connection to the heat-api has already been established and the connection is unexpectedly interrupted.
The heatclient will then wait without any timeout for the request to complete.
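To illustrate the failure mode in isolation, here is a minimal sketch using only Python's standard library rather than heatclient itself. A local server accepts the connection and then never answers, which approximates an established connection that has silently gone dead. The helper names (`start_silent_server`, `read_with_timeout`) are made up for this example.

```python
import socket
import threading


def start_silent_server():
    """Listen on a free local port, accept connections, never send a reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    held = []  # keep accepted sockets referenced so the peer never sees a close

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # listening socket was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, held


def read_with_timeout(port, timeout):
    """Send a request and wait for a reply for at most `timeout` seconds."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
        client.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
        try:
            # With timeout=None this call would block indefinitely,
            # which matches the blocked read(3, ...) seen in strace.
            client.recv(1024)
        except socket.timeout:
            return "timed out"
        return "returned"
```

Calling `read_with_timeout(port, 0.3)` against the silent server gives up after the timeout, whereas passing `timeout=None` would leave the caller blocked in `recv` until the socket is closed from outside.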

I noticed that there would still be a socket open by os-collect-config.
By manually closing the socket I was able to get os-collect-config to continue:

[root@toolbox workdir]# ps aux|grep collect
root 2071 0.0 1.6 51048 33888 ? S Apr20 0:16 /usr/bin/python3 /usr/local/bin/os-collect-config --debug
root 1199103 0.0 0.1 10448 2308 pts/0 S+ 12:55 0:00 grep --color=auto collect
[root@toolbox workdir]# strace -p 2071
strace: Process 2071 attached
read(3, ^Cstrace: Process 2071 detached
 <detached ...>
[root@toolbox workdir]# netstat -np |grep 2071
tcp 0 0 10.XXX.XXX.XXX:44124 XXX.XXX.XXX.XXX:8004 ESTABLISHED 2071/python3
[root@toolbox workdir]# ss --kill state established src :44124
Netid Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp 0 0 10.XXX.XXX.XXX:44124 XXX.XXX.XXX.XXX:8004
[root@toolbox workdir]# netstat -np |grep 2071
[root@toolbox workdir]# strace -p 2071
strace: Process 2071 attached
select(0, NULL, NULL, NULL, {tv_sec=20, tv_usec=598993}) = 0 (Timeout)
stat("/var/lib/os-collect-config/ec2.json", {st_mode=S_IFREG|0600, st_size=651, ...}) = 0
openat(AT_FDCWD, "/var/lib/os-collect-config/ec2.json", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0600, st_size=651, ...}) = 0

As a temporary fix I'm passing a hardcoded timeout (timeout=60) to the heatclient in the code of (

The changed line looks like this:
                '1', endpoint, token=ks.auth_token, timeout=60)

This prevents the collector from getting stuck during such network interruptions.

It would be great to have a proper fix for this issue, however.
My suggestion would be an optional config option for the timeout, which could be set in the heat section of /etc/os-collect-config.conf and would be passed to the heatclient.
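A rough sketch of what I have in mind, assuming an option named `timeout` in the `[heat]` section. The option name and the helper `heat_client_kwargs` are hypothetical, and the standard library's `configparser` is used here only as a stand-in for however os-collect-config really loads its options.

```python
import configparser


def heat_client_kwargs(conf_text):
    """Build extra keyword arguments for the heat client from config text.

    Returns {"timeout": <seconds>} if the hypothetical [heat] timeout
    option is set, and an empty dict otherwise, so the current behaviour
    (no timeout) is kept when the option is absent.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback=None covers both a missing [heat] section and a missing option
    timeout = parser.getint("heat", "timeout", fallback=None)
    kwargs = {}
    if timeout is not None:
        kwargs["timeout"] = timeout
    return kwargs
```

The resulting dict could then be passed along with the existing arguments in the call shown above (for example `token=ks.auth_token, **kwargs`), so that the hardcoded `timeout=60` is no longer needed.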

Best regards
