network connections to http://pypi.openstack.org/openstack/ can cause CI failures with various symptoms

Bug #1292141 reported by Jon-Paul Sullivan
This bug affects 6 people
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   High         Unassigned

Bug Description

Spurious CI failure in tripleo due to pypi mirror problems.

This looks to be specifically case related (there is a prettytable, but no PrettyTable).

This could be solved via links between the different case variations, or by configuring the apache server hosting the pypi mirror to use mod_speling:
$ a2enmod rewrite
$ a2enmod speling

# In /var/www/pypi-mirror/.htaccess
  RewriteEngine On
  CheckSpelling On

2014-03-13 14:13:47.256 | Downloading/unpacking PrettyTable>=0.7,<0.8 (from -r requirements.txt (line 4))
2014-03-13 14:14:02.353 | Cannot fetch index base URL http://pypi.openstack.org/openstack/
2014-03-13 14:14:02.353 | http://pypi.openstack.org/openstack/PrettyTable/ uses an insecure transport scheme (http). Consider using https if pypi.openstack.org has it available
2014-03-13 14:14:02.702 | Could not find any downloads that satisfy the requirement PrettyTable>=0.7,<0.8 (from -r requirements.txt (line 4))
2014-03-13 14:14:02.741 | Cleaning up...
2014-03-13 14:14:02.741 | No distributions at all found for PrettyTable>=0.7,<0.8 (from -r requirements.txt (line 4))
2014-03-13 14:14:02.742 | Storing debug log for failure in /root/.pip/pip.log
2014-03-13 14:14:02.742 | ++ check_break after-error run_in_target bash

Tags: ci
Jon-Paul Sullivan (jonpaul-sullivan) wrote :
Changed in tripleo:
status: New → Confirmed
importance: Undecided → High
Clint Byrum (clint-fewbar) wrote :

Spurious failures are bad, mmmkaaay. :)

Changed in tripleo:
status: Confirmed → Triaged
importance: High → Critical
Clint Byrum (clint-fewbar) wrote :

I think the solution to this is to have our CI testenv boxes maintain a local pypi mirror and use the 'pypi' element from diskimage-builder.

James Polley (tchaypo) wrote :

I'd like to suggest we change requirements.txt to use the correct case - but https://pypi.python.org/simple/ lists PrettyTable and not prettytable, so it seems that PrettyTable *is* the correct case.

https://pypi.python.org/simple/ also apparently does some magic to try to find the right package (https://pypi.python.org/simple/prettytable redirects to https://pypi.python.org/simple/PrettyTable).

I like the idea of using the pypi element (https://git.openstack.org/cgit/openstack/diskimage-builder/tree/elements/pypi/README.md) to have a local cache on the testenv boxes; but I don't think that's a complete solution. The idea that pip is case-insensitive seems to be fairly pervasive - search for "sensitive" in http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2013-09-16.log and http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2013-07-26.log for two examples. This is probably because pypi itself fakes case-insensitivity, even though pip is case-sensitive.

Robert Collins (lifeless) wrote :

This is almost certainly a network glitch; we see them fairly often from the freecloud rack.

I'm with clint - we need to deploy a mirror infrastructure into the ci-overcloud and configure the slaves to use it.

James Polley (tchaypo) wrote :

*dives into pip code*

Actually, pip is somewhat case-insensitive; if a lookup for PrettyTable fails, pip.utils.normalize_name() will lowercase the name and swap -s for _s and try again.
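As a rough illustration of that kind of name matching, here is a minimal sketch. It is not pip's own function: it uses the PEP 503 normalization rule (lowercase, with runs of '-', '_' and '.' collapsed to '-'), and `find_in_index` and the `index_names` listing are hypothetical names invented for the example.

```python
import re


def normalize_name(name):
    """Lowercase a project name and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_in_index(requested, index_names):
    """Return the index entry whose normalized form matches `requested`, or None."""
    wanted = normalize_name(requested)
    for candidate in index_names:
        if normalize_name(candidate) == wanted:
            return candidate
    return None


# Hypothetical listing, standing in for the project names on a mirror's index page.
index_names = ["Jinja2", "MarkupSafe", "PrettyTable"]
print(find_in_index("prettytable", index_names))  # PrettyTable
print(find_in_index("markupsafe", index_names))   # MarkupSafe
```

Note that this kind of matching only works once the index listing has actually been retrieved.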

The error we're seeing above isn't to do with case, the error here is that http://pypi.openstack.org/openstack/ could not be downloaded. As Rob and Clint have hinted at, no amount of case-insensitivity is going to fix this.

Jon-Paul Sullivan (jonpaul-sullivan) wrote :

See http://logs.openstack.org/44/80644/2/check-tripleo/check-tripleo-overcloud-precise/5757128/console.html for another instance of this, again with PrettyTable.

I appreciate all of the comments suggesting that this is a transient network condition, but the failure occurring at the same package retrieval without occurring elsewhere seems at odds with that theory.

Is the OpenStack mirror a collection of servers and not just one? Could one of those mirrors be misconfigured or different in some way?

jan grant (jan-grant) wrote :

FWIW, using PYPI_MIRROR_URL pointing at a local mirror with CheckSpelling On has completely removed this behaviour.

Derek Higgins (derekh) wrote :

I've reproduced this in a manual test while running tcpdump to capture port 80. From what I can see it looks like a network problem (or a server-side problem), possibly combined with pip not handling the problem correctly.

Normal process (all same tcp connection)
GET /openstack/markupsafe/ -- HTTP/1.1 404 Not Found
GET /openstack/ HTTP/1.1 -- HTTP/1.1 200 OK
GET /openstack/MarkupSafe/ -- HTTP/1.1 200 OK
GET /openstack/MarkupSafe/MarkupSafe-0.19.tar.gz -- HTTP/1.1 200 OK

Failed process (two separate tcp connections)
GET /openstack/markupsafe/ -- HTTP/1.1 404 Not Found
GET /openstack -- no response for 15 seconds
    FIN,ACK sent to server (connection is closed)
    4 x RST sent to server 7 seconds later (presumably because it's receiving packets on a closed connection)
- New TCP connection
GET /openstack/markupsafe/ -- HTTP/1.1 404 Not Found
No attempt to get /openstack

Pip gives up and logs
  Getting page http://pypi.openstack.org/openstack/markupsafe/
  Could not fetch URL http://pypi.openstack.org/openstack/markupsafe/: 404 Client Error: Not Found
  Will skip URL http://pypi.openstack.org/openstack/markupsafe/ when looking for download links for markupsafe (from Jinja2->-r requirements.txt (line 8))
  Getting page http://pypi.openstack.org/openstack/
  Could not fetch URL http://pypi.openstack.org/openstack/: timed out
  Will skip URL http://pypi.openstack.org/openstack/ when looking for download links for markupsafe (from Jinja2->-r requirements.txt (line 8))
  Cannot fetch index base URL http://pypi.openstack.org/openstack/
  URLs to search for versions for markupsafe (from Jinja2->-r requirements.txt (line 8)):
  * http://pypi.openstack.org/openstack/markupsafe/
  http://pypi.openstack.org/openstack/markupsafe/ uses an insecure transport scheme (http). Consider using https if pypi.openstack.org has it available
  Getting page http://pypi.openstack.org/openstack/markupsafe/
  Could not fetch URL http://pypi.openstack.org/openstack/markupsafe/: 404 Client Error: Not Found
  Will skip URL http://pypi.openstack.org/openstack/markupsafe/ when looking for download links for markupsafe (from Jinja2->-r requirements.txt (line 8))
  Could not find any downloads that satisfy the requirement markupsafe (from Jinja2->-r requirements.txt (line 8))

From what I can see, the timeout waiting for a response from the server when getting http://pypi.openstack.org/openstack/ is the main issue here. Attempts to fix the issue with case changes in URL paths will probably help, so that we see it less often.
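To make the failure mode concrete, here is a minimal sketch of retrying a fetch that times out instead of giving up after the first attempt. It is an illustration only, not pip's behaviour or the change that was applied to CI: `fetch_with_retry`, `make_flaky_fetch`, and the default of three attempts are hypothetical, and the 15 second default merely mirrors the wait seen in the capture above.

```python
import socket


def fetch_with_retry(fetch, url, attempts=3, timeout=15.0):
    """Call fetch(url, timeout) up to `attempts` times, retrying only on timeouts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(url, timeout)
        except socket.timeout as exc:
            # Remember the timeout and try again; other errors propagate at once.
            last_error = exc
    raise last_error


def make_flaky_fetch(failures):
    """Build a fake fetcher that times out `failures` times, then succeeds."""
    state = {"calls": 0}

    def fetch(url, timeout):
        state["calls"] += 1
        if state["calls"] <= failures:
            raise socket.timeout("timed out")
        return "200 OK " + url

    return fetch, state


# Simulate one timed-out request for the index page followed by a success.
fetch, state = make_flaky_fetch(failures=1)
print(fetch_with_retry(fetch, "http://pypi.openstack.org/openstack/"))
```

The fetcher is passed in as a parameter so the retry logic can be exercised without touching the network; a real fetcher would wrap whatever HTTP client is in use.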

Looking at the pip code next to see if I can spot any problem there.

Derek Higgins (derekh) wrote :

This hasn't occurred since we increased the timeout

https://review.openstack.org/#/c/81815/

for an explanation see comment 4 of
https://bugs.launchpad.net/openstack-ci/+bug/1272417

Will close if all still looks ok in a day or two

Derek Higgins (derekh) wrote :

We should also pursue a local pip cache solution for speedup, but this problem is solved, so we should open another bug for the local cache if we don't have one already.

summary: - Cannot fetch index base URL http://pypi.openstack.org/openstack/
+ network connections to http://pypi.openstack.org/openstack/ can cause CI
+ failures with various symptoms
Derek Higgins (derekh)
Changed in tripleo:
importance: Critical → High
Robert Collins (lifeless) wrote :

For the record this occurs in the RH1 region as well as HP1.

Derek Higgins (derekh)
tags: added: ci
Bob Ball (bob-ball) wrote :

Sorry; moved my recheck to bug #1268725

no longer affects: devstack
Alexis Lee (alexisl) wrote :

Is this the same issue?

    http://logs.openstack.org/00/105400/4/check-tripleo/check-tripleo-overcloud-f20/18e5099/console.html.gz

    2014-07-11 20:26:28.595 | --> Finished Dependency Resolution
    2014-07-11 20:26:28.631 | Error: Package: gpgme-1.3.2-4.fc20.x86_64 (@koji-override-0/$releasever)
    2014-07-11 20:26:28.631 | Requires: libc.so.6(GLIBC_2.8)(64bit)
    2014-07-11 20:26:28.631 | Removing: glibc-2.18-11.fc20.x86_64 (@koji-override-0/$releasever)
    2014-07-11 20:26:28.631 | libc.so.6(GLIBC_2.8)(64bit)
    2014-07-11 20:26:28.631 | Updated By: glibc-2.18-12.fc20.i686 (updates)
    2014-07-11 20:26:28.631 | Not found

plus several pages more packages not found

James Polley (tchaypo) wrote : Re: [Bug 1292141] Re: network connections to http://pypi.openstack.org/openstack/ can cause CI failures with various symptoms

Nope, for a few reasons

* I checked the log for mentions of pypi.openstack.org, all the package
downloads from there that I could see succeeded
* These aren't python packages, they're system packages.

If you scroll back up you'll see that this all starts with an attempt to
remove grub:

2014-07-11 20:25:54.775 | dib-run-parts Fri Jul 11 20:25:54 UTC 2014 Running /tmp/in_target.d/pre-install.d/15-remove-grub
2014-07-11 20:25:54.910 | No Match for argument: grub2
2014-07-11 20:25:54.911 | No Packages marked for removal
2014-07-11 20:25:54.943 | Installing grub2-tools
2014-07-11 20:25:54.943 | gettext
2014-07-11 20:25:54.943 | os-prober
2014-07-11 20:25:54.943 | system-logos
2014-07-11 20:26:16.322 | Resolving Dependencies
2014-07-11 20:26:16.364 | --> Running transaction check
2014-07-11 20:26:16.364 | ---> Package fedora-logos.i686 0:21.0.1-1.fc20 will be installed
2014-07-11 20:26:16.364 | ---> Package gettext.i686 0:0.18.3.2-1.fc20 will be installed
2014-07-11 20:26:16.364 | --> Processing Dependency: libxml2.so.2 for package: gettext-0.18.3.2-1.fc20.i686
2014-07-11 20:26:17.002 | --> Processing Dependency: libunistring.so.0 for package: gettext-0.18.3.2-1.fc20.i686
2014-07-11 20:26:17.043 | --> Processing Dependency: libtinfo.so.5 for package: gettext-0.18.3.2-1.fc20.i686
2014-07-11 20:26:17.044 ...


Alexis Lee (alexisl) wrote :

Oh how embarrassing :x

Thanks for looking.

Derek Higgins (derekh) wrote :

We are now using a bandersnatch mirror local to each region, so closing this.

Changed in tripleo:
status: Triaged → Fix Released