HTTPConnectionPool(host='tempest-sendmail.tripleo.org', port=8080): Max retries exceeded with url

Bug #1806699 reported by Sorin Sbarnea
This bug affects 1 person
Affects: tripleo
Status: Incomplete
Importance: Medium
Assigned to: Unassigned

Bug Description

The task below fails because port 8080 on tempest-sendmail.tripleo.org does not accept connections.

TASK [validate-tempest : Send tempest results by mail]
2018-12-04 10:22:28,503 DEBUG requests.packages.urllib3.connectionpool: Starting new HTTP connection (1): tempest-sendmail.tripleo.org
2018-12-04 10:22:28.749498 | primary | Traceback (most recent call last):
2018-12-04 10:22:28.749748 | primary | File "./tempestmail.py", line 407, in <module>
2018-12-04 10:22:28.749849 | primary | sys.exit(main())
2018-12-04 10:22:28.750007 | primary | File "./tempestmail.py", line 403, in main
2018-12-04 10:22:28.750100 | primary | tmc.checkJobs()
2018-12-04 10:22:28.750269 | primary | File "./tempestmail.py", line 354, in checkJobs
2018-12-04 10:22:28.750473 | primary | send_mail.send_mail(self.args.job, last, self.args.output)
2018-12-04 10:22:28.750676 | primary | File "./tempestmail.py", line 187, in send_mail
2018-12-04 10:22:28.750862 | primary | self._send_mail_api(addresses, message, subject)
2018-12-04 10:22:28.751045 | primary | File "./tempestmail.py", line 177, in _send_mail_api
2018-12-04 10:22:28.751224 | primary | requests.post(self.config.api_server, data=data)
2018-12-04 10:22:28.751463 | primary | File "/usr/lib/python2.7/site-packages/requests/api.py", line 112, in post
2018-12-04 10:22:28.751700 | primary | return request('post', url, data=data, json=json, **kwargs)
2018-12-04 10:22:28.751951 | primary | File "/usr/lib/python2.7/site-packages/requests/api.py", line 58, in request
2018-12-04 10:22:28.752150 | primary | return session.request(method=method, url=url, **kwargs)
2018-12-04 10:22:28.752498 | primary | File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 518, in request
2018-12-04 10:22:28.752760 | primary | resp = self.send(prep, **send_kwargs)
2018-12-04 10:22:28.753128 | primary | File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 639, in send
2018-12-04 10:22:28.753305 | primary | r = adapter.send(request, **kwargs)
2018-12-04 10:22:28.753629 | primary | File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 502, in send
2018-12-04 10:22:28.753810 | primary | raise ConnectionError(e, request=request)
2018-12-04 10:22:28.754821 | primary | requests.exceptions.ConnectionError: HTTPConnectionPool(host='tempest-sendmail.tripleo.org', port=8080): Max retries exceeded with url: /api/v1.0/sendmail (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused',))

http://logs.openstack.org/24/567224/141/check/tripleo-ci-centos-7-containers-multinode/8888af1/job-output.txt.gz#_2018-12-04_10_22_26_967416
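
The refused connection can be checked independently of tempestmail.py with a plain TCP connect. The following is a minimal sketch, not part of the original report: the helper name `port_accepts_connections` and the timeout value are assumptions chosen for illustration.

```python
import socket


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration; it is not part of tempestmail.py.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused (errno 111 on Linux), timeouts and
        # name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port are taken from the traceback above.
    print(port_accepts_connections("tempest-sendmail.tripleo.org", 8080))
```

A `False` result only says that no TCP connection could be opened from the machine running the check. It does not say whether the cause is a firewall, a rate-limiting rule, or a stopped service.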

Tags: ci
Arx Cruz (arxcruz) wrote :

The tempest-sendmail API has some rules to avoid spam (since this is an email service). In this case the same IP was used by several VMs in a short amount of time. If we start to see this frequently (now that the standalone job runs faster), I can change the rule on the API.

Sorin Sbarnea (ssbarnea) wrote :

Arx, I am not sure how this is implemented, but when I tried to make an HTTP request from my own machine I found that port 8080 was not open (probably firewalled?).

I can understand the need to protect it from spam, but an HTTP server should answer requests from any location and respond with meaningful HTTP error codes.

HTTP 429 (Too Many Requests) seems like an adequate one. There is also a funny photo about it at https://softwareengineering.stackexchange.com/questions/128512/suggested-http-rest-status-code-for-request-limit-reached

I think the server should accept all requests, regardless of whether it then decides to drop them or send them.

Let's tune it for the moment and see what happens.

I created https://review.openstack.org/622352, which should allow us to track future recurrences of this error.
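
To illustrate the HTTP 429 suggestion, here is a minimal sketch of a per-IP limiter that answers every request with a status code instead of refusing the connection. The real rules of the tempest-sendmail API are not described in this report, so the class name, the sliding-window approach, and the limits used here are all assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client IP within window_seconds.

    Hypothetical sketch; not the actual tempest-sendmail implementation.
    """

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Client IP -> timestamps of the requests accepted so far.
        self._hits = defaultdict(deque)

    def status_for(self, client_ip, now):
        """Return the HTTP status to send: 202 if accepted, 429 if throttled."""
        hits = self._hits[client_ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return 429  # Too Many Requests
        hits.append(now)
        return 202  # Accepted


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
    for second in (0, 1, 2):
        print(second, limiter.status_for("192.0.2.10", now=second))
```

With a limiter like this in front of the mail handler, a throttled CI node would receive an explicit 429 response rather than a refused connection.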

Sorin Sbarnea (ssbarnea) wrote :

The primary issue with the current behaviour is that we cannot distinguish between temporary rate limiting and a total downtime of the sendmail service.
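
If the server did answer throttled requests with HTTP 429, a client could tell the two cases apart along these lines. This is only a sketch under that assumption: it uses the Python 3 standard library rather than the `requests` call shown in the traceback, and the function name and return labels are made up for illustration.

```python
import urllib.error
import urllib.request


def classify_send_result(url, data=b"", timeout=5):
    """POST data to url and describe the outcome.

    Returns 'sent', 'rate_limited', 'http_error' or 'unreachable'.
    Hypothetical helper; not part of tempestmail.py.
    """
    request = urllib.request.Request(url, data=data, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return "sent" if 200 <= response.status < 300 else "http_error"
    except urllib.error.HTTPError as exc:
        # The server answered, so the service is up; 429 means throttling.
        return "rate_limited" if exc.code == 429 else "http_error"
    except (urllib.error.URLError, OSError):
        # No HTTP answer at all: refused, firewalled, DNS failure or timeout.
        return "unreachable"
```

`HTTPError` is caught before `URLError` because it is a subclass of it. With the current server behaviour, both rate limiting and downtime would end up in the `unreachable` branch, which is exactly the ambiguity described above.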

wes hayutin (weshayutin)
tags: added: ci
removed: alert
wes hayutin (weshayutin)
Changed in tripleo:
importance: Undecided → Medium
milestone: none → train-1
Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

Is this still an issue?

Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Incomplete
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3