pingtest failing on OVB jobs to create Cinder volume and Nova server with 504 error

Bug #1638350 reported by Emilien Macchi
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Gabriele Cerami
Milestone:

Bug Description

Right now, OVB jobs are broken or very unstable because pingtest is failing.

I saw 3 different traces in logs:

1) ClientException: resources.volume1: Gateway Time-out (HTTP 504)
http://logs.openstack.org/24/392124/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/39b76eb/console.html#_2016-11-01_14_08_44_359965

2) ERROR: <html><body><h1>504 Gateway Time-out</h1>
http://logs.openstack.org/24/392124/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/2d428de/console.html#_2016-11-01_14_22_51_274890

3) ConnectFailure: resources.server1: Unable to establish connection to http://10.0.0.14:8774/v2.1/os-volumes_boot: ('Connection aborted.', BadStatusLine("''",))
http://logs.openstack.org/64/391064/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/2d15c11/console.html#_2016-10-31_10_19_54_304885

All 3 errors happen randomly, or seem to.
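
To tell the three traces apart when scanning a console log, something like the following could be used. This is only a minimal sketch: the function names and regular expressions are hypothetical, derived solely from the three error lines quoted above, and are not part of tripleo-ci.

```python
import re

# Hypothetical patterns, based only on the three traces quoted in this report.
TRACE_PATTERNS = [
    (1, re.compile(r"ClientException: .*Gateway Time-out \(HTTP 504\)")),
    (2, re.compile(r"ERROR: <html><body><h1>504 Gateway Time-out</h1>")),
    (3, re.compile(r"ConnectFailure: .*BadStatusLine")),
]


def classify_trace(line):
    """Return the trace number (1, 2 or 3) matching the line, or None."""
    for number, pattern in TRACE_PATTERNS:
        if pattern.search(line):
            return number
    return None


def count_traces(lines):
    """Count how many lines match each of the three traces."""
    counts = {1: 0, 2: 0, 3: 0}
    for line in lines:
        number = classify_trace(line)
        if number is not None:
            counts[number] += 1
    return counts
```

Running count_traces() over the console logs of several failed jobs would show whether one trace dominates or whether the three really are evenly random.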

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/392288

Revision history for this message
wes hayutin (weshayutin) wrote : Re: pingtest failing on OVB jobs to create Cinder volume and Nova server

FYI: I checked the periodic newton job and the pingtest is working there.
http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-ha-newton/5fff306/

Revision history for this message
Gabriele Cerami (gcerami) wrote :

Another related fix proposed but not automatically added here: https://review.openstack.org/#/c/392647

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/392288
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=44d3ebe54661df0fcea30969f495f9780ee7c671
Submitter: Jenkins
Branch: master

commit 44d3ebe54661df0fcea30969f495f9780ee7c671
Author: Alex Schultz <email address hidden>
Date: Tue Nov 1 13:43:17 2016 -0600

    Create heat user in keystone profile

    Rather than use the heat::keystone::domain class which also includes the
    configuration options, we should just create the user for heat in
    keystone independently of the configuration.

    Change-Id: I7d42d04ef0c53dc1e62d684d8edacfed9fd28fbe
    Related-Bug: #1638350
    Closes-Bug: #1638626

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/newton)

Related fix proposed to branch: stable/newton
Review: https://review.openstack.org/393000

Changed in tripleo:
assignee: Emilien Macchi (emilienm) → Gabriele Cerami (gcerami)
Revision history for this message
Giulio Fidente (gfidente) wrote : Re: pingtest failing on OVB jobs to create Cinder volume and Nova server

fwiw, I've just tested an HA deployment with 3 controller nodes on an overcloud built from latest promoted delorean and pingtest succeeded

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/newton)

Reviewed: https://review.openstack.org/393000
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=77213b5f55213e3324f46cfb02b5ec63bbc58c41
Submitter: Jenkins
Branch: stable/newton

commit 77213b5f55213e3324f46cfb02b5ec63bbc58c41
Author: Alex Schultz <email address hidden>
Date: Tue Nov 1 13:43:17 2016 -0600

    Create heat user in keystone profile

    Rather than use the heat::keystone::domain class which also includes the
    configuration options, we should just create the user for heat in
    keystone independently of the configuration.

    Change-Id: I7d42d04ef0c53dc1e62d684d8edacfed9fd28fbe
    Related-Bug: #1638350
    Closes-Bug: #1638626
    (cherry picked from commit 44d3ebe54661df0fcea30969f495f9780ee7c671)

tags: added: in-stable-newton
Revision history for this message
Gabriele Cerami (gcerami) wrote : Re: pingtest failing on OVB jobs to create Cinder volume and Nova server

For traces 2 and 3, suspicion falls on timeouts and delays caused by https://bugs.launchpad.net/tripleo/+bug/1637961, which should be solved by the new redis package.

Revision history for this message
Gabriele Cerami (gcerami) wrote :

With the new redis package all gates pass again. It's still unclear if trace 3 is a fatal error.

tags: removed: alert
Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (master)

Change abandoned by Juan Antonio Osorio Robles (<email address hidden>) on branch: master
Review: https://review.openstack.org/392647

summary: pingtest failing on OVB jobs to create Cinder volume and Nova server
+ with 504 error
Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :
Changed in tripleo:
status: Fix Released → Confirmed
status: Confirmed → Triaged
Revision history for this message
Gabriele Cerami (gcerami) wrote :

All searches are inconclusive: no sign of errors in the logs, no process eating all the CPU. The only thing that seems off is in /var/log/messages on the overcloud controller, where this error is repeated continuously:

Nov 8 17:07:05 localhost Keepalived_vrrp[19924]: bogus VRRP packet received on br-ex !!!
Nov 8 17:07:05 localhost Keepalived_vrrp[19924]: VRRP_Instance(51) Dropping received VRRP packet...
Nov 8 17:07:05 localhost Keepalived_vrrp[19924]: ip address associated with VRID not present in received packet : 192.0.2.6
Nov 8 17:07:05 localhost Keepalived_vrrp[19924]: one or more VIP associated with VRID mismatch actual MASTER advert
Nov 8 17:07:05 localhost Keepalived_vrrp[19924]: bogus VRRP packet received on br-ex !!!
Nov 8 17:07:05 localhost Keepalived_vrrp[19924]: VRRP_Instance(52) Dropping received VRRP packet...

No progress other than this, and it seems it doesn't happen often, so it's hard to reproduce.
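
To gauge how often the Keepalived messages recur, the dropped packets could be tallied per VRRP instance. This is a rough sketch under stated assumptions: the function name is hypothetical and the regular expression is based only on the /var/log/messages lines quoted above, so other Keepalived versions may log a different format.

```python
import re
from collections import Counter

# Assumed format, taken from the log lines quoted in this comment.
DROP_RE = re.compile(
    r"Keepalived_vrrp\[\d+\]: VRRP_Instance\((\w+)\) Dropping received VRRP packet"
)


def count_dropped_by_instance(lines):
    """Count 'Dropping received VRRP packet' messages per VRRP instance."""
    counts = Counter()
    for line in lines:
        match = DROP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of /var/log/messages from a controller would show which instances (51 and 52 in the excerpt above) are affected and how frequently.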

Changed in tripleo:
milestone: ocata-1 → ocata-2
Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

Seems like it doesn't happen anymore, or we just don't see it because of the recent resources upgrade, which hides it now.

Changed in tripleo:
status: Triaged → Fix Released