pingtest failing on OVB jobs to create Cinder volume and Nova server with 504 error

Bug #1638350 reported by Emilien Macchi on 2016-11-01
This bug affects 1 person
Affects: tripleo
Importance: Critical
Assigned to: Gabriele Cerami

Bug Description

Right now, OVB jobs are broken or very unstable because pingtest is failing.

I saw 3 different traces in logs:

1) ClientException: resources.volume1: Gateway Time-out (HTTP 504)
http://logs.openstack.org/24/392124/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/39b76eb/console.html#_2016-11-01_14_08_44_359965

2) ERROR: <html><body><h1>504 Gateway Time-out</h1>
http://logs.openstack.org/24/392124/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/2d428de/console.html#_2016-11-01_14_22_51_274890

3) ConnectFailure: resources.server1: Unable to establish connection to http://10.0.0.14:8774/v2.1/os-volumes_boot: ('Connection aborted.', BadStatusLine("''",))
http://logs.openstack.org/64/391064/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/2d15c11/console.html#_2016-10-31_10_19_54_304885

All 3 errors happen randomly, or at least seem to.
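To check how often each of the three traces shows up in a given console log, something like the following could be used. This is only a rough sketch: the patterns are copied from the traces quoted above, while the label names and the `classify` helper are made up for illustration and are not part of tripleo-ci.

```python
import re
from collections import Counter

# Patterns taken from the three traces quoted in the bug description.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "trace1_volume_504": re.compile(
        r"ClientException: resources\.volume1: Gateway Time-out \(HTTP 504\)"
    ),
    "trace2_html_504": re.compile(
        r"ERROR: <html><body><h1>504 Gateway Time-out</h1>"
    ),
    "trace3_server_connect": re.compile(
        r"ConnectFailure: resources\.server1: Unable to establish connection"
    ),
}


def classify(lines):
    """Count how many lines match each known trace signature."""
    counts = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
                break  # a line is attributed to at most one trace
    return counts
```

Feeding it the lines of a downloaded console.html (for example `classify(open(path, errors="replace"))`, with `path` pointing at a local copy) gives a per-trace count, which makes it easier to compare jobs.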

Gabriele Cerami (gcerami) wrote :

Another related fix proposed but not automatically added here: https://review.openstack.org/#/c/392647

Reviewed: https://review.openstack.org/392288
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=44d3ebe54661df0fcea30969f495f9780ee7c671
Submitter: Jenkins
Branch: master

commit 44d3ebe54661df0fcea30969f495f9780ee7c671
Author: Alex Schultz <email address hidden>
Date: Tue Nov 1 13:43:17 2016 -0600

    Create heat user in keystone profile

    Rather than use the heat::keystone::domain class which also includes the
    configuration options, we should just create the user for heat in
    keystone independently of the configuration.

    Change-Id: I7d42d04ef0c53dc1e62d684d8edacfed9fd28fbe
    Related-Bug: #1638350
    Closes-Bug: #1638626

Changed in tripleo:
assignee: Emilien Macchi (emilienm) → Gabriele Cerami (gcerami)

fwiw, I've just tested an HA deployment with 3 controller nodes on an overcloud built from latest promoted delorean and pingtest succeeded

Reviewed: https://review.openstack.org/393000
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=77213b5f55213e3324f46cfb02b5ec63bbc58c41
Submitter: Jenkins
Branch: stable/newton

commit 77213b5f55213e3324f46cfb02b5ec63bbc58c41
Author: Alex Schultz <email address hidden>
Date: Tue Nov 1 13:43:17 2016 -0600

    Create heat user in keystone profile

    Rather than use the heat::keystone::domain class which also includes the
    configuration options, we should just create the user for heat in
    keystone independently of the configuration.

    Change-Id: I7d42d04ef0c53dc1e62d684d8edacfed9fd28fbe
    Related-Bug: #1638350
    Closes-Bug: #1638626
    (cherry picked from commit 44d3ebe54661df0fcea30969f495f9780ee7c671)

tags: added: in-stable-newton

For traces 2 and 3, suspicion falls on timeouts and delays caused by https://bugs.launchpad.net/tripleo/+bug/1637961, which should be solved by the new redis package.

Gabriele Cerami (gcerami) wrote :

With the new redis package all gates pass again. It's still unclear if trace 3 is a fatal error.

tags: removed: alert
Changed in tripleo:
status: In Progress → Fix Released

Change abandoned by Juan Antonio Osorio Robles (<email address hidden>) on branch: master
Review: https://review.openstack.org/392647

summary: pingtest failing on OVB jobs to create Cinder volume and Nova server
+ with 504 error
Changed in tripleo:
status: Fix Released → Confirmed
status: Confirmed → Triaged
Gabriele Cerami (gcerami) wrote :

All searches are inconclusive: no sign of errors in the logs, no process eating all the CPU. The only thing off seems to be in /var/log/messages on the overcloud controller, where this error is continuously repeated:

Nov 8 17:07:05 localhost Keepalived_vrrp[19924]: bogus VRRP packet received on br-ex !!!
Nov 8 17:07:05 localhost Keepalived_vrrp[19924]: VRRP_Instance(51) Dropping received VRRP packet...
Nov 8 17:07:05 localhost Keepalived_vrrp[19924]: ip address associated with VRID not present in received packet : 192.0.2.6
Nov 8 17:07:05 localhost Keepalived_vrrp[19924]: one or more VIP associated with VRID mismatch actual MASTER advert
Nov 8 17:07:05 localhost Keepalived_vrrp[19924]: bogus VRRP packet received on br-ex !!!
Nov 8 17:07:05 localhost Keepalived_vrrp[19924]: VRRP_Instance(52) Dropping received VRRP packet...

No progress other than this, and it seems it doesn't happen often, so it's hard to reproduce.
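To get a feel for how often the Keepalived messages quoted above repeat, and for which VRRP instances, the lines from /var/log/messages could be tallied roughly like this. It is a sketch only: the regular expressions are based solely on the six log lines pasted here, and the `summarize` helper is a hypothetical name, not an existing tool.

```python
import re
from collections import Counter

# Patterns derived from the log excerpt above; other Keepalived versions
# may word these messages differently.
DROP_RE = re.compile(
    r"Keepalived_vrrp\[\d+\]: VRRP_Instance\((\d+)\) Dropping received VRRP packet"
)
BOGUS_RE = re.compile(
    r"Keepalived_vrrp\[\d+\]: bogus VRRP packet received on (\S+)"
)


def summarize(lines):
    """Return (drops per VRRP instance, bogus packets per interface)."""
    dropped = Counter()
    bogus = Counter()
    for line in lines:
        match = DROP_RE.search(line)
        if match:
            dropped[match.group(1)] += 1
            continue
        match = BOGUS_RE.search(line)
        if match:
            bogus[match.group(1)] += 1
    return dropped, bogus
```

Running it over the whole file on the controller would show whether only instances 51 and 52 are affected and whether br-ex is the only interface involved.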

Changed in tripleo:
milestone: ocata-1 → ocata-2

Seems like it doesn't happen anymore, or we just don't see it because the recent resource upgrade now hides it.

Changed in tripleo:
status: Triaged → Fix Released