Make gsa_sync tolerant to Business Center not responding to requests
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
KARL3 | Fix Released | High | Paul Everitt | |
Bug Description
Nat wrote:
When GSA/Business Center has an outage KARL GSA sync frequently goes down and must be manually restarted by the KARL dev team. We need to make KARL more resilient to GSA outages.
Analysis
=================
Our error monitor stays in a constant state of alert these days because of the following traceback:
Traceback (most recent call last):
File "/srv/osfkarl/
return func(args)
File "/srv/osfkarl/
gsa_sync = GsaSync(site, args.url, args.user, args.password)
File "/srv/osfkarl/
resource = urllib2.
File "/usr/lib/
return _opener.open(url, data, timeout)
File "/usr/lib/
response = self._open(req, data)
File "/usr/lib/
'_open', req)
File "/usr/lib/
result = func(*args)
File "/usr/lib/
return self.do_
File "/usr/lib/
raise URLError(err)
URLError: <urlopen error [Errno 104] Connection reset by peer>
However, every time this happens, I immediately go to my browser, open the URL that gets me to the XML, and it works fine. There are several points here:
- We should catch this exception and log it as an INFO, to avoid Nagios getting mad and emailing us constantly
- I'm suspicious about the actual problem. Is gsa_sync unable to connect from gocept but I am able to connect from Virginia? Is there some other, lower level problem (perhaps a certificate issue, DNS issue)?
- Sometimes the cron job gets wedged for days and I have to go in and kill the process. Can we set a Python 2.6 timeout on the socket?
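A minimal sketch of the first and third points, assuming the fetch can be isolated in one helper. The names `fetch_gsa_xml`, `timeout`, and `opener` are hypothetical and are not taken from the gsa_sync code. It is written against Python 3's `urllib.request`; on Python 2.6 the corresponding call would be `urllib2.urlopen(url, timeout=...)`.

```python
import logging
import socket
from urllib.error import URLError
from urllib.request import urlopen

log = logging.getLogger("gsa_sync")


def fetch_gsa_xml(url, timeout=30, opener=urlopen):
    """Return the response body, or None if the remote end cannot be reached.

    `opener` is injectable so the failure path can be exercised without
    a network connection (hypothetical parameter, for illustration only).
    """
    try:
        # The timeout bounds how long a blocked socket can stall the run,
        # so a hung connection fails instead of wedging the cron job.
        response = opener(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except (URLError, socket.timeout) as exc:
        # Logged at INFO rather than ERROR so an outage on the remote
        # side does not trigger the error monitor on every run.
        log.info("GSA fetch failed for %s, skipping this run: %s", url, exc)
        return None
```

A caller would treat `None` as "skip this sync and try again on the next cron run". The `except` clause is deliberately narrow, so errors other than a failed or timed-out connection still surface as tracebacks.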
Note:
I had previously worried that, based on seeing two processes in "ps auwwx | grep gsa", we were running gsa_sync from both cron and supervisord. I confirmed that I was wrong. There is a shell script that is run from cron with gsa in the shell file name, which calls a Python module via karlserve that has gsa in the module name.
Changed in karl3:
assignee: nobody → Paul Everitt (paul-agendaless)
Changed in karl3:
importance: Undecided → Low
milestone: none → m127
summary:
- KARL-GSA Sync Failure
+ Make gsa_sync tolerant to Business Center not responding to requests
Changed in karl3:
status: Fix Committed → Fix Released
Assigning to Tres to try and make gsa_sync more resilient.