Pip fails to find distribution for package

Bug #1449136 reported by Sean Dague
This bug affects 5 people
Affects          Status       Importance   Assigned to   Milestone
OpenStack-Gate   New          Undecided    Unassigned
tripleo          Incomplete   Critical     Unassigned

Bug Description

Tracked with the logstash query:

message:"ReadTimeoutError: HTTPConnectionPool" AND tags:console

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiUmVhZFRpbWVvdXRFcnJvcjogSFRUUENvbm5lY3Rpb25Qb29sXCIgQU5EIHRhZ3M6Y29uc29sZSIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiMTcyODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTQzMDE1MjA2Njc3M30=

Example failure - http://logs.openstack.org/59/177759/1/check/check-tempest-dsvm-full/024211f//logs/devstacklog.txt.gz#_2015-04-27_15_34_17_864

2015-04-27 15:33:54.404 | Downloading http://pypi.DFW.openstack.org/packages/source/B/Babel/Babel-1.3.tar.gz (3.4MB)
2015-04-27 15:34:17.863 | Exception:
2015-04-27 15:34:17.863 | Traceback (most recent call last):
2015-04-27 15:34:17.863 | File "/usr/local/lib/python2.7/dist-packages/pip/basecommand.py", line 246, in main
2015-04-27 15:34:17.863 | status = self.run(options, args)
2015-04-27 15:34:17.863 | File "/usr/local/lib/python2.7/dist-packages/pip/commands/install.py", line 342, in run
2015-04-27 15:34:17.863 | requirement_set.prepare_files(finder)
2015-04-27 15:34:17.863 | File "/usr/local/lib/python2.7/dist-packages/pip/req/req_set.py", line 345, in prepare_files
2015-04-27 15:34:17.863 | functools.partial(self._prepare_file, finder))
2015-04-27 15:34:17.863 | File "/usr/local/lib/python2.7/dist-packages/pip/req/req_set.py", line 290, in _walk_req_to_install
2015-04-27 15:34:17.863 | more_reqs = handler(req_to_install)
2015-04-27 15:34:17.863 | File "/usr/local/lib/python2.7/dist-packages/pip/req/req_set.py", line 487, in _prepare_file
2015-04-27 15:34:17.863 | download_dir, do_download, session=self.session,
2015-04-27 15:34:17.863 | File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 827, in unpack_url
2015-04-27 15:34:17.863 | session,
2015-04-27 15:34:17.863 | File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 673, in unpack_http_url
2015-04-27 15:34:17.863 | from_path, content_type = _download_http_url(link, session, temp_dir)
2015-04-27 15:34:17.863 | File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 888, in _download_http_url
2015-04-27 15:34:17.863 | _download_url(resp, link, content_file)
2015-04-27 15:34:17.863 | File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 621, in _download_url
2015-04-27 15:34:17.863 | for chunk in progress_indicator(resp_read(4096), 4096):
2015-04-27 15:34:17.863 | File "/usr/local/lib/python2.7/dist-packages/pip/utils/ui.py", line 133, in iter
2015-04-27 15:34:17.864 | for x in it:
2015-04-27 15:34:17.864 | File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 586, in resp_read
2015-04-27 15:34:17.864 | decode_content=False):
2015-04-27 15:34:17.864 | File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/response.py", line 273, in stream
2015-04-27 15:34:17.864 | data = self.read(amt=amt, decode_content=decode_content)
2015-04-27 15:34:17.864 | File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/response.py", line 218, in read
2015-04-27 15:34:17.864 | raise ReadTimeoutError(self._pool, None, 'Read timed out.')
2015-04-27 15:34:17.864 | ReadTimeoutError: HTTPConnectionPool(host='pypi.DFW.openstack.org', port=80): Read timed out.
2015-04-27 15:34:17.864 |

Revision history for this message
Lenny (lennyb) wrote :
Revision history for this message
Clark Boylan (cboylan) wrote :

I've been digging into this, and these read timeouts are now retried by pip, which means they aren't often the source of a job failure. Instead you hit this once or twice in a job, the GETs get retried, and everything moves on happily.

I've been trying to find a better logstash query that will show us when this actually is a failure, and I think you get something like this when the package (tox in this case) can't be installed at all:

  Could not find a version that satisfies the requirement tox (from versions: )
No matching distribution found for tox

So I will be working to update the query for this bug to track ^.
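
For reference, an illustrative form of such a query (not necessarily the exact one that ended up in elastic-recheck) would match on that error message with the same console tag used above:

  message:"No matching distribution found for" AND tags:console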

Revision history for this message
Clark Boylan (cboylan) wrote : Re: Pip running into ReadTimeoutErrors and retrying.

Pip will attempt to download a package up to 5 times. That means these read timeouts are typically not fatal to our jobs, and not even to the pip command. Additionally, note that ReadTimeoutError is generic enough that querying only on that string will return many errors from jobs that have nothing to do with pip or pypi at all.

So we updated the query to restrict matches to pypi, so that we only see the cases where pip has these problems.
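
An illustrative form of that narrowed query (the exact wording used in elastic-recheck may differ) adds the pypi host to the match, for example:

  message:"ReadTimeoutError: HTTPConnectionPool" AND message:"pypi" AND tags:console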

summary: - OpenStack pypi mirrors disconnecting connections
+ Pip running into ReadTimeoutErrors and retrying.
Revision history for this message
Clark Boylan (cboylan) wrote :

A related bug is https://bugs.launchpad.net/openstack-gate/+bug/1708686; that is where we will track retry attempts. We track them separately because they are often not fatal. We will instead use this bug to track the "No matching distribution found" errors, which should be fatal in most cases (definitely fatal to pip and most likely fatal to the job).

summary: - Pip running into ReadTimeoutErrors and retrying.
+ Pip fails to find distribution for package
Revision history for this message
Matt Riedemann (mriedem) wrote :

(4:18:35 PM) clarkb: mriedem: i think we did solve that one
(4:18:48 PM) clarkb: mriedem: the first blip was limestone mirror crashing the second blip was ovh gra1 mirror crashing
(4:19:03 PM) clarkb: mriedem: limestone fixed their issue and a reboot in gra1 seemed to have fixed that one

Revision history for this message
Clark Boylan (cboylan) wrote :

This continues to be an intermittent but persistent issue. Upon investigation, the latest pip (18.1) sets Cache-Control: max-age=0 on index requests. This means our caching proxies are not caching the indexes that pip is interested in. If you hit the indexes with your browser, the max-age comes back as 600 seconds, so we would be caching things for 10 minutes if not for pip explicitly disabling this.
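
For illustration (my addition, not part of the original comment), a quick way to confirm the server-side caching headers is to fetch a simple index page and print the Cache-Control value pypi returns; any package works, Babel is used here only because it appears in the log above:

    # Illustrative sketch: print the Cache-Control header pypi sends back for a
    # simple index page. A caching proxy could honor this (max-age was 600s at the
    # time of the comment), but pip 18.1 adds "Cache-Control: max-age=0" to its
    # requests, which tells the proxy to revalidate instead of serving from cache.
    import requests

    resp = requests.get("https://pypi.org/simple/babel/")
    print(resp.status_code)
    print(resp.headers.get("Cache-Control"))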

PyPA suggests that pip isn't the right tool for downloading and installing Python packages from pypi in a CI system, but I don't think an alternative has been suggested.

Another suggestion from PyPA is to go back to a bandersnatch mirror. We stopped doing this because PyPI has been growing at an unsustainable rate. Others appear to have noticed this as well, and PyPI is now publishing rough stats on package sizes at https://pypi.org/stats/. Total size as of today is 2.6TB, and excluding the top 100 packages by size would get us down to about 1TB total.

We can't exclude all of the top packages by size as we depend on some of them like grpcio and numpy, but we may be able to run a reasonably sized pypi mirror now if we decide to spin that up with a large blacklist of packages that we know we don't want (like tf-nightly-gpu).
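
As a rough sketch of how one might size such a mirror (my addition; it assumes https://pypi.org/stats/ returns JSON when asked via the Accept header, and the field names used below are guesses that may not match the real payload):

    # Rough, assumption-laden sketch: estimate the mirror size with and without
    # the largest packages, using the stats page mentioned in the comment.
    # The JSON field names here are assumptions and may differ in reality.
    import requests

    resp = requests.get("https://pypi.org/stats/",
                        headers={"Accept": "application/json"})
    stats = resp.json()

    total = stats["total_packages_size"]   # total bytes, assumed field name
    top = stats["top_packages"]            # {name: {"size": bytes}}, assumed

    top_size = sum(entry["size"] for entry in top.values())
    print("total size:             %.2f TB" % (total / 1e12))
    print("excluding top packages: %.2f TB" % ((total - top_size) / 1e12))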

Revision history for this message
Sorin Sbarnea (ssbarnea) wrote :

We have been seeing this for over 4 hours, but without any update from infra regarding the outage.
See http://status.openstack.org/elastic-recheck/

I believe the current outage is likely caused by our pypi proxies (which are caching proxies, not real mirrors).

What makes this even more problematic is that gerrit users only see a POST_FAILURE message and do not get access to any build logs, as no logs are uploaded.

Example: https://zuul.opendev.org/t/openstack/builds?result=POST_FAILURE --- if you click on any of these builds you do not see any usable information, and these are the same links that get reported back to gerrit.

tags: added: alert
tags: added: ci
wes hayutin (weshayutin)
Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
milestone: none → victoria-3
tags: added: promotion-blocker
Revision history for this message
Clark Boylan (cboylan) wrote :

There seems to be a fair bit of misunderstanding going on.

The POST_FAILURES are unrelated to pypi and pip. They were caused by the addition of a new swift log storage backend that had a container quota set too low. We've bumped the quota and the problem seems corrected.

Also, on a thread on discuss.python.org it was stated that we were getting 404s from pypi, which isn't the case. The problem is that we are/were getting stale package indexes from pypi. We then cached these in our mirrors and served them to our mirror users. The problem originated with pypi; our mirrors were just serving what they got. https://github.com/pypa/warehouse/issues/8568 was filed to help coordinate with pypi on debugging and fixing, and I expect this is where we'll track the issue going forward.

From there it was determined that pypi's fallback mirror had run out of disk space in early August, which meant it had gone stale. Whenever fastly fell back to that mirror for content, we would get stale content from pypi. Pypi is working to correct this, and things should get better for us as they do.

A key thing to note in the error output was that pip was getting a list of valid versions that excluded the latest version. Constraints caused us to fail because the version we wanted wasn't listed, but many valid versions were there. There were no 404s and we were getting responses from pypi. They just happened to be stale.
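
As an illustration of how this kind of staleness shows up (my sketch, not from the original comment), one can list what the simple index advertises for a package and check whether the version a constraints file pins is actually present; Babel is just a placeholder name here:

    # Illustrative sketch: list the file names advertised on pypi's (or a mirror's)
    # simple index for a package. If a constraints file pins a version that is not
    # in this list, pip fails with "No matching distribution found" even though
    # the index answered with 200 and listed plenty of (stale) versions.
    import re
    import requests

    resp = requests.get("https://pypi.org/simple/babel/")
    filenames = re.findall(r">([^<]+)</a>", resp.text)
    for name in sorted(filenames):
        print(name)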

Finally, there was also confusion about whether an AFS problem was involved. AFS is not used for serving pypi contents; we proxy cache pypi.

Changed in tripleo:
milestone: victoria-3 → wallaby-1
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Incomplete
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Revision history for this message
Marios Andreou (marios-b) wrote :

Bug status has been set to 'Incomplete' and target milestone has been removed due to inactivity. If you disagree please re-set these values and reach out to us on freenode #tripleo

Changed in tripleo:
milestone: wallaby-3 → none