tempest

HTTP errors observed in test_load_balancer_basic

Bug #1309252 reported by Salvatore Orlando on 2014-04-17

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
tempest	Fix Released	High	Unassigned

Bug Description

Recently test_load_balancer_basic started using inetd rather than netcat to emulate a web server.
While this might have fixed something, it probably broke something else.

Indeed commit 4a27b4623fe22be119ea7f4e10f37df6eb3b7186 was submitted on 2014-04-15 21:42 UTC.
Since then we've been seeing "got a bad status line" errors from urllib much more frequently.

Logstash reveals this error was seen only two times before the patch merged.
http://logstash.openstack.org/#eyJmaWVsZHMiOltdLCJzZWFyY2giOiJtZXNzYWdlOlwiSU9FcnJvcjogKCdodHRwIHByb3RvY29sIGVycm9yJywgMCwgJ2dvdCBhIGJhZCBzdGF0dXMgbGluZScsIE5vbmUpXCIgQU5EIHRhZ3M6XCJjb25zb2xlXCIgQU5EIGJ1aWxkX2JyYW5jaDpcIm1hc3RlclwiIiwidGltZWZyYW1lIjoiYWxsIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJvZmZzZXQiOjAsInRpbWUiOnsiZnJvbSI6IjIwMTQtMDQtMTVUMjI6MTE6NDkrMDA6MDAiLCJ0byI6IjIwMTQtMDQtMTdUMjI6MTE6NDkrMDA6MDAiLCJ1c2VyX2ludGVydmFsIjoiMCJ9LCJzdGFtcCI6MTM5Nzc3NTkyMDk0NywibW9kZSI6IiIsImFuYWx5emVfZmllbGQiOiIifQ==

This bug is being marked as High because it is affecting the gate.
The bug targets tempest because, in my opinion, it is about test design and not about neutron functionality. Indeed, no error is observed in the neutron lbaas logs or other neutron logs around the time of the failure.


Matthew Treinish (treinish) on 2014-04-17

Changed in tempest:
status:	New → Triaged
importance:	Undecided → High


Darragh O'Reilly (darragh-oreilly) wrote on 2014-04-22:

Attachment: wget.cap (1012 bytes, application/cap)

This new inetd web server does not work for me locally at all. The test fails at _check_connection(). I did a wget to the floating IP of the instance, and it returns "server1" but then drops the TCP connection abruptly.

$ wget --tries=1 -O - http://172.24.4.25
--2014-04-22 16:58:38-- http://172.24.4.25/
Connecting to 172.24.4.25:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: `STDOUT'

[<=> ] 0 --.-K/s server1
[ <=> ] 8 --.-K/s in 0s

2014-04-22 16:58:38 (504 KB/s) - Read error at byte 8 (Connection reset by peer).Giving up.

Wireshark shows the TCP connection was not closed properly by the server (see the attached tcpdump capture).

Also, the pool members are using the floating IP of the instance:
$ neutron lb-member-list
+--------------------------------------+-------------+---------------+--------+----------------+--------+
| id                                   | address     | protocol_port | weight | admin_state_up | status |
+--------------------------------------+-------------+---------------+--------+----------------+--------+
| 3f355de7-98a4-45f9-b953-ebf1027cdaad | 172.24.4.25 | 88            | 1      | True           | ACTIVE |
| d337a753-c2c4-445a-be78-2c54deea2e81 | 172.24.4.25 | 80            | 1      | True           | ACTIVE |
+--------------------------------------+-------------+---------------+--------+----------------+--------+


Eugene Nikanorov (enikanorov) wrote on 2014-04-23:

We're thinking about moving back to netcat as a backend.
Although tests against inetd alone don't expose the issue, tests for the combination of haproxy+inetd seem to show it.


Darragh O'Reilly (darragh-oreilly) wrote on 2014-04-26:

For some reason I don't get the error in #1 if I put haproxy in between wget (or curl) and the inetd web server. And the reason _check_connection was failing was that I'm using the linux-bridge agent, where the security groups for the vip port work, unlike with the ovs-agent (https://bugs.launchpad.net/neutron/+bug/1163569).

Currently the number of requests is 100, and I don't see the problem locally. But when I increase it to 500, it fails with "IOError: ('http protocol error', 0, 'got a bad status line', None)" every time. Maybe reducing it from 100 to 10 would help.
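
To compare request counts such as 10, 100 and 500, the failures can be counted rather than stopping at the first one. The following is a minimal sketch, assuming Python 3, where the error surfaces as http.client.BadStatusLine instead of the IOError quoted above. The names count_bad_status_lines and fetch_once are hypothetical and are not part of tempest.

```python
import http.client
import urllib.request


def count_bad_status_lines(fetch, num_requests):
    """Call fetch() num_requests times and count bad-status-line failures."""
    failures = 0
    for _ in range(num_requests):
        try:
            fetch()
        except http.client.BadStatusLine:
            # Python 3 counterpart of urllib's 'got a bad status line'.
            failures += 1
    return failures


def fetch_once(url, timeout=2):
    """Hypothetical fetcher: one GET against the given URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

A run against a load balancer would pass a callable such as lambda: fetch_once(vip_url) together with the request count, where vip_url is whatever address the local setup exposes.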


Eugene Nikanorov (enikanorov) wrote on 2014-04-30:

We (mostly Elena) have been testing this behavior, and it seems to happen sometimes with haproxy, regardless of the backend.

The root cause is the RST+ACK packets that the node sends in response to a SYN, which happens much more frequently with inetd than with nc as a backend.

I think we will eventually have to tolerate the 'bad status line' error.
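
Tolerating the error could look roughly like the sketch below, which retries a request a bounded number of times when a bad status line is seen. It assumes Python 3 (http.client.BadStatusLine) and is not the code from any tempest change; fetch_tolerating_bad_status and its parameters are hypothetical names used only for illustration.

```python
import http.client


def fetch_tolerating_bad_status(fetch, max_attempts=3):
    """Return fetch()'s result, retrying when a bad status line is seen."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch()
        except http.client.BadStatusLine as exc:
            # Assumed to be transient (for example a reset connection),
            # so remember the error and try again.
            last_error = exc
    # Every attempt failed the same way, so surface the last error.
    raise last_error
```

Bounding the retries keeps a persistently broken backend visible as a test failure instead of hiding it.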


OpenStack Infra (hudson-openstack) wrote on 2014-06-03: Related fix merged to tempest (master)

Reviewed: https://review.openstack.org/88579
Committed: https://git.openstack.org/cgit/openstack/tempest/commit/?id=6e73f468cf924587b32f94f41a33ab9a7a90fa96
Submitter: Jenkins
Branch: master

commit 6e73f468cf924587b32f94f41a33ab9a7a90fa96
Author: Elena Ezhova <email address hidden>
Date: Fri Apr 18 17:38:13 2014 +0400

Switch back to nc in test_load_balancer_basic

After test_load_balancer_basic started using inetd instead of netcat
we've started getting "BadStatusLine" errors from urllib.

That is why we need to move back to using netcat.
Also handle "BadStatusLine" error.

Related bug: 1309252
Change-Id: Ida919c63e27c6a003be4c249ba2e6f3e2ea7a7b3


Joe Gordon (jogo) wrote on 2014-06-05:

No hits in elastic-recheck in 2 weeks, marking as resolved.

Changed in tempest:
status:	Triaged → Fix Committed


OpenStack Infra (hudson-openstack) wrote on 2014-07-14: Change abandoned on tempest (master)

Change abandoned by Darragh O'Reilly (<email address hidden>) on branch: master
Review: https://review.openstack.org/90539
Reason: this has been superseded

Matthew Treinish (treinish) on 2014-09-09

Changed in tempest:
status:	Fix Committed → Fix Released
