fm event list returns HTTP 504 error on system controller
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | In Progress | Undecided | Agustin Carranza |
Bug Description
Load Info / Patch Line-Up
STX 6.0 + P5
System Config
Distributed Cloud
Description of failure
'fm event-list' returns an HTTP 504 error, while 'fm alarm-list' returns HTTP 200.
Timestamp when failure occurred
2023-01-
Issue intermittent (Frequency of occurrence) or 100% Reproducible?
Issue seen on user site
Impact of Failure
Standard
Time-line based on log analysis
---> 'fm alarm-list' returns HTTP 200, but 'fm event-list' returns HTTP 504.
##haproxy##
2023-01-
2023-01-
2023-01-
---> However, the openstack.log entry for the event-list API request shows HTTP 200, but the request took a very long time: around 178.2 s.
2023-01-19 20:49:02.740 105974 INFO eventlet.
File "/usr/lib/
write(
File "/usr/lib/
wfile.
File "/usr/lib64/
self.flush()
File "/usr/lib64/
self.
File "/usr/lib/
tail = self.send(data, flags)
File "/usr/lib/
return self._send_
File "/usr/lib/
return send_method(data, *args)
error: [Errno 104] Connection reset by peer
2023-01-19 20:49:02.741 105974 INFO eventlet.
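The traceback ends with '[Errno 104] Connection reset by peer' while eventlet is still flushing the response. This suggests the connection was aborted from the client side (presumably haproxy) before fm-api finished writing. That would fit the proxy giving up on the slow request, although the logs shown here do not state it directly. The minimal Python sketch below reproduces the same errno on a loopback socket. It assumes Linux, where closing a socket with SO_LINGER set to a zero timeout sends a TCP RST instead of a normal FIN. The function name is hypothetical and the code does not touch fm-api.

```python
import socket
import struct
import time
from typing import Optional


def send_after_peer_reset() -> Optional[OSError]:
    """Write to a TCP connection whose peer has already reset it.

    Returns the exception raised by sendall(), or None if the write succeeded.
    """
    # Listening socket standing in for the backend (fm-api in this report).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)

    # Client socket standing in for the proxy (haproxy in this report).
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()

    # l_onoff=1, l_linger=0: close() aborts the connection with an RST.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    client.close()
    time.sleep(0.1)  # let the RST reach the server-side socket

    try:
        # The "late" response, written after the peer has gone away.
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
        return None
    except OSError as exc:
        return exc
    finally:
        conn.close()
        server.close()


if __name__ == "__main__":
    err = send_after_peer_reset()
    print(type(err).__name__, getattr(err, "errno", None))
```

On Linux the returned exception is expected to be ConnectionResetError with errno 104 (ECONNRESET), the same error as in the fm-api traceback.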
Key failure logs
##haproxy##, ##openstack.log##
Summary of triage
---> The issue was seen after the 'network resiliency' testing mentioned by the user, but it seems unrelated.
---> The system is alarm-free and returns HTTP 200 when listing alarms.
---> However, 'fm event-list' returns HTTP 504.
[sysadmin@
[sysadmin@
HTTP Server Error (HTTP 504)
[sysadmin@
---> The 'fm-api' process runs on both controllers, so the network resiliency test on controller-1 should not affect the fm-api process on controller-0.
##Controller-0##
backend fm-api-
server s-fm-api-internal ********
##Controller-1##
backend fm-api-
server s-fm-api-internal ***********
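The triage compares the two requests only by status code. Timing both commands from the same controller would show directly whether 'fm event-list' is much slower than 'fm alarm-list', as the 178.2 s figure in openstack.log suggests. A minimal Python sketch of that comparison follows. The helper name time_command is hypothetical, and the script assumes the 'fm' CLI is on PATH with platform credentials already loaded in the shell environment.

```python
import shutil
import subprocess
import sys
import time
from typing import Sequence, Tuple


def time_command(argv: Sequence[str], timeout: float = 600.0) -> Tuple[int, float]:
    """Run a command, discard its output, and return (exit code, elapsed seconds).

    Raises subprocess.TimeoutExpired if the command runs longer than `timeout`.
    """
    start = time.monotonic()
    result = subprocess.run(
        list(argv),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return result.returncode, time.monotonic() - start


if __name__ == "__main__":
    # Assumption: executed on a controller with the fm CLI and credentials set up.
    if shutil.which("fm") is None:
        print("fm CLI not found on PATH; run this on a controller", file=sys.stderr)
    else:
        for cmd in (["fm", "alarm-list"], ["fm", "event-list"]):
            code, seconds = time_command(cmd)
            print(f"{' '.join(cmd)}: exit={code} elapsed={seconds:.1f}s")
```

If the event-list call fails after roughly the same elapsed time on every attempt, that duration is a hint at which timeout is being hit.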
Workaround
1. Restart fm-api
sudo systemctl restart fm-api
2. Restart fm manager
sudo sm-restart service fm-mgr
3. Restart haproxy
sudo systemctl restart haproxy
Ask
Since both the alarm list and the event list are served by 'fm-api', can the cause be provided for 'fm event-list' returning a 504 error while 'fm alarm-list' works fine?
HTTP 504 (Gateway Timeout) means that a gateway or proxy (here haproxy in front of fm-api) did not receive a response from the upstream server in time, so the request timed out. But if that were the case, shouldn't both requests return a similar error?
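The 504 in this report appears to be generated by haproxy rather than by fm-api, since the backend itself logged HTTP 200. A proxy's server timeout is applied to each request separately, based on how long that particular request waits for the backend. The minimal Python model below illustrates this with the one latency the logs provide (about 178.2 s for the event-list request). The haproxy timeout (120 s) and the alarm-list latency (0.5 s) are hypothetical placeholders, not values taken from this system, and the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    backend_seconds: float  # time before the backend starts answering
    backend_status: int = 200


def status_seen_by_client(req: Request, proxy_timeout_seconds: float) -> int:
    """Status the client receives through a proxy with a per-request server timeout.

    If the backend has not answered within the timeout, the proxy gives up and
    returns 504 Gateway Timeout, even if the backend later finishes with 200.
    """
    if req.backend_seconds > proxy_timeout_seconds:
        return 504
    return req.backend_status


if __name__ == "__main__":
    PROXY_TIMEOUT = 120.0  # hypothetical haproxy server timeout, in seconds
    reqs = [
        Request("fm alarm-list", backend_seconds=0.5),    # hypothetical latency
        Request("fm event-list", backend_seconds=178.2),  # latency from openstack.log
    ]
    for r in reqs:
        print(r.name, status_seen_by_client(r, PROXY_TIMEOUT))
```

With these inputs the model prints 200 for alarm-list and 504 for event-list: the same proxy and the same backend produce different status codes because only one request exceeds the timeout. On the real system the inputs would be the configured haproxy timeouts and the measured latency of each command. The event log keeps a history of events while the active alarm list is currently empty, which is one plausible reason for the latency gap.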
Changed in starlingx:
assignee: nobody → Agustin Carranza (acarranz)
Fix proposed to branch: master (review.opendev.org/c/starlingx/stx-puppet/+/898005)
Review: https:/