HTTP errors observed in test_load_balancer_basic

Bug #1309252 reported by Salvatore Orlando
This bug affects 2 people
Affects: tempest
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

Recently test_load_balancer_basic started using inetd rather than netcat to emulate a web server.
While this might have fixed something, it probably broke something else.

Indeed commit 4a27b4623fe22be119ea7f4e10f37df6eb3b7186 was submitted on 2014-04-15 21:42 UTC.
Since then we've been seeing "got a bad status line" errors from urllib much more frequently.

Logstash shows this error was seen only twice before the patch merged:
http://logstash.openstack.org/#eyJmaWVsZHMiOltdLCJzZWFyY2giOiJtZXNzYWdlOlwiSU9FcnJvcjogKCdodHRwIHByb3RvY29sIGVycm9yJywgMCwgJ2dvdCBhIGJhZCBzdGF0dXMgbGluZScsIE5vbmUpXCIgQU5EIHRhZ3M6XCJjb25zb2xlXCIgQU5EIGJ1aWxkX2JyYW5jaDpcIm1hc3RlclwiIiwidGltZWZyYW1lIjoiYWxsIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJvZmZzZXQiOjAsInRpbWUiOnsiZnJvbSI6IjIwMTQtMDQtMTVUMjI6MTE6NDkrMDA6MDAiLCJ0byI6IjIwMTQtMDQtMTdUMjI6MTE6NDkrMDA6MDAiLCJ1c2VyX2ludGVydmFsIjoiMCJ9LCJzdGFtcCI6MTM5Nzc3NTkyMDk0NywibW9kZSI6IiIsImFuYWx5emVfZmllbGQiOiIifQ==

This bug is being marked as High because it is affecting the gate.
The bug targets tempest because, in my opinion, it is about test design and not about neutron functionality. Indeed, no error is observed in the neutron lbaas logs or other neutron logs around the time of the failure.

Tags: neutron
Changed in tempest:
status: New → Triaged
importance: Undecided → High
Darragh O'Reilly (darragh-oreilly) wrote :

This new inetd web server does not work for me locally at all. The test fails at _check_connection(). I ran wget against the floating IP of the instance: it returns "server1", but the server then terminates the TCP connection abruptly.

$ wget --tries=1 -O - http://172.24.4.25
--2014-04-22 16:58:38-- http://172.24.4.25/
Connecting to 172.24.4.25:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: `STDOUT'

    [<=> ] 0 --.-K/s server1
    [ <=> ] 8 --.-K/s in 0s

2014-04-22 16:58:38 (504 KB/s) - Read error at byte 8 (Connection reset by peer).Giving up.

Wireshark shows that the server did not close the TCP connection properly (see the attached tcpdump).
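
The same check can be scripted without wget. Below is a minimal sketch, assuming Python 3 and a plain HTTP/1.0 request; the helper name fetch_and_check_close is made up for illustration and is not part of tempest. It reads until the server ends the connection and reports whether that was an orderly close or a reset.

```python
import socket


def fetch_and_check_close(host, port, timeout=5.0):
    """Send a minimal GET and report how the server ended the connection.

    Returns (data, how), where how is "clean" for an orderly close (FIN)
    or "reset" if the peer reset the connection (RST).
    Hypothetical helper, for illustration only.
    """
    request = "GET / HTTP/1.0\r\nHost: {}\r\n\r\n".format(host).encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        try:
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    # recv() returning b"" means the peer closed in an orderly way.
                    return b"".join(chunks), "clean"
                chunks.append(chunk)
        except ConnectionResetError:
            # The peer sent RST instead of closing normally.
            return b"".join(chunks), "reset"
```

Calling fetch_and_check_close("172.24.4.25", 80) against the instance above would be expected to report "reset" if the behaviour matches the wget output, but that is an assumption, not something verified here.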

Also, the pool members are using the floating IP of the instance:
$ neutron lb-member-list
+--------------------------------------+-------------+---------------+--------+----------------+--------+
| id                                   | address     | protocol_port | weight | admin_state_up | status |
+--------------------------------------+-------------+---------------+--------+----------------+--------+
| 3f355de7-98a4-45f9-b953-ebf1027cdaad | 172.24.4.25 | 88            | 1      | True           | ACTIVE |
| d337a753-c2c4-445a-be78-2c54deea2e81 | 172.24.4.25 | 80            | 1      | True           | ACTIVE |
+--------------------------------------+-------------+---------------+--------+----------------+--------+

Eugene Nikanorov (enikanorov) wrote :

We're thinking about moving back to netcat as the backend.
Although tests against inetd alone don't expose the issue, tests against the combination of haproxy+inetd seem to show it.

Darragh O'Reilly (darragh-oreilly) wrote :

For some reason I don't get the error from comment #1 if I put haproxy between wget (or curl) and the inetd web server. And _check_connection was failing because I'm using the linux-bridge agent, where the security groups for the VIP port actually take effect, unlike with the ovs-agent: https://bugs.launchpad.net/neutron/+bug/1163569.

Currently the number of requests is 100, and I don't see the problem locally. But when I increase it to 500, it fails with "IOError: ('http protocol error', 0, 'got a bad status line', None)" every time. Maybe reducing it from 100 to 10 would help.

Eugene Nikanorov (enikanorov) wrote :

We (mostly Elena) have been testing this behavior, and it seems to happen sometimes with haproxy regardless of the backend.

The root cause is the RST+ACK packets that the node sends in response to a SYN, which happens much more frequently with inetd than with nc as a backend.
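
For anyone who wants to see what an RST in response to a SYN looks like from the connecting side: on a direct TCP connect it surfaces as "connection refused". A minimal sketch, assuming Python 3; the probe helper is hypothetical and only classifies a single handshake attempt, it does not reproduce the haproxy setup.

```python
import socket


def probe(host, port, timeout=2.0):
    """Attempt one TCP handshake and classify the outcome.

    Hypothetical helper, for illustration only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "accepted"  # SYN was answered with SYN+ACK
    except ConnectionRefusedError:
        return "refused"       # SYN was answered with RST (ECONNREFUSED)
    except socket.timeout:
        return "timeout"       # no answer within the timeout
```

Running probe() in a loop against a backend port would show how often the backend is refusing connections; how haproxy then reports that to its own client is a separate question.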

I think we will eventually have to tolerate the 'bad status line' error.
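
A rough sketch of what tolerating the error in a request loop could look like. This assumes Python 3, where the failure shows up as http.client.BadStatusLine (the test at the time used Python 2 urllib, which reported it as an IOError). The function name count_responses and the injectable fetch parameter are made up for illustration; this is not the tempest code.

```python
import http.client
import urllib.request
from collections import Counter


def count_responses(url, num_requests, fetch=None):
    """Fetch url num_requests times, tolerating bad-status-line failures.

    Returns (counts, bad_status): a Counter of response bodies and the
    number of requests that failed with BadStatusLine.
    Hypothetical helper, for illustration only.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=5) as resp:
                return resp.read().decode("utf-8", "replace").strip()

    counts = Counter()
    bad_status = 0
    for _ in range(num_requests):
        try:
            counts[fetch(url)] += 1
        except http.client.BadStatusLine:
            # Tolerated: count it and carry on instead of failing the run.
            bad_status += 1
    return counts, bad_status
```

A caller could then assert that every pool member shows up in counts and, if desired, that bad_status stays below some threshold rather than failing on the first occurrence.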

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tempest (master)

Reviewed: https://review.openstack.org/88579
Committed: https://git.openstack.org/cgit/openstack/tempest/commit/?id=6e73f468cf924587b32f94f41a33ab9a7a90fa96
Submitter: Jenkins
Branch: master

commit 6e73f468cf924587b32f94f41a33ab9a7a90fa96
Author: Elena Ezhova <email address hidden>
Date: Fri Apr 18 17:38:13 2014 +0400

    Switch back to nc in test_load_balancer_basic

    After test_load_balancer_basic started using inetd instead of netcat
    we've started getting "BadStatusLine" errors from urllib.

    That is why we need to move back to using netcat.
    Also handle "BadStatusLine" error.

    Related bug: 1309252
    Change-Id: Ida919c63e27c6a003be4c249ba2e6f3e2ea7a7b3

Joe Gordon (jogo) wrote :

No hits in elastic-recheck in 2 weeks, marking as resolved.

Changed in tempest:
status: Triaged → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tempest (master)

Change abandoned by Darragh O'Reilly (<email address hidden>) on branch: master
Review: https://review.openstack.org/90539
Reason: this has been superseded

Changed in tempest:
status: Fix Committed → Fix Released