API endpoints become inaccessible after tempest run

Bug #1667443 reported by Nuno Santos
This bug affects 1 person
Affects: tempest
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

In a simple 4-node deployment, some API endpoints (namely glance) become inaccessible after one pass of the tempest smoke tests. Deployment is via juju charms (juju bundle/deployment file attached).

$ glance image-list
Error finding address for http://10.244.222.221:9292/versions: Unable to establish connection to http://10.244.222.221:9292/versions

$ telnet 10.244.222.221 9292
Trying 10.244.222.221...
Connected to 10.244.222.221.
Escape character is '^]'.
...

$ wget http://10.244.222.221:9292
--2017-02-23 19:10:21-- http://10.244.222.221:9292/
Connecting to 10.244.222.221:9292... connected.
HTTP request sent, awaiting response... No data received.
Retrying.
...
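
The telnet and wget results above suggest that the port still accepts TCP connections while the service no longer answers HTTP requests. A minimal Python sketch of that two-step check follows; the function name `probe_endpoint` and its three return labels are illustrative assumptions, not part of glance or tempest.

```python
import http.client
import socket


def probe_endpoint(host, port, path="/", timeout=5.0):
    """Classify an endpoint as 'unreachable', 'accepts_tcp_only' or 'http_ok'.

    Hypothetical helper: the labels are made up for this sketch.
    """
    # Step 1: plain TCP connect, roughly what the telnet check does.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    sock.close()

    # Step 2: a real HTTP request, roughly what the wget check does.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        conn.getresponse().read()
        # Any HTTP status counts as a response here.
        return "http_ok"
    except (OSError, http.client.HTTPException):
        # Connected, but no usable HTTP response (timeout, reset, empty reply).
        return "accepts_tcp_only"
    finally:
        conn.close()
```

Against the endpoint in this report the call would be `probe_endpoint("10.244.222.221", 9292)`; a result of `accepts_tcp_only` would match the symptoms shown above.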

There are 3 tempest failures (not the subject of this bug), and cleanup is incomplete: 2 images are left behind after tempest finishes (tempest-image-xxxxx and tempest-new-image-xxxxx).

Juju status output attached in a comment.

Tags: tuning uosci
Nuno Santos (nunosantos) wrote :
description: updated
Luz Cazares (luz-cazares) wrote :

Nuno, please provide the tempest logs.

Nuno Santos (nunosantos)
summary: - Deployment becomes inaccessible after tempest run
+ API endpoints become inaccessible after tempest run
description: updated
Nuno Santos (nunosantos) wrote :

Logs/output posted, please let me know if you would like to see other files.

Ryan Beisner (1chb1n) wrote :

Can you please describe the system specs for the machines in this deployment? API endpoints becoming unstable after a cloud is hammered could indicate memory, disk or CPU contention (i.e. a system sizing/tuning issue).

Changed in tempest:
status: New → Incomplete
Ryan Beisner (1chb1n)
tags: added: tuning uosci
Nuno Santos (nunosantos) wrote :

They're all 8-core CPU, 16 GB RAM, 429.5GB on 2 disks, running Xenial.

I'm attaching some machine stats (df -h && free -m && uptime) after a pass of tempest.

Nuno Santos (nunosantos) wrote :

One more data point: the glance endpoint only becomes inaccessible when the smoke tests are run in parallel mode; running the "smoke-serial" tests doesn't cause the issue.
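
To compare serial and concurrent access outside of tempest, a rough Python sketch along these lines could help. It only issues plain GET requests, so it is not equivalent to the smoke tests, and the names `responds` and `count_failures` are hypothetical helpers invented for this illustration.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor


def responds(host, port, path="/", timeout=5.0):
    """Return True if the endpoint answers one HTTP GET within the timeout."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        conn.getresponse().read()
        return True
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def count_failures(host, port, total=20, workers=1, timeout=5.0):
    """Send `total` GET requests using `workers` threads; count the failures."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda _: responds(host, port, timeout=timeout), range(total))
        )
    return results.count(False)
```

Running `count_failures("10.244.222.221", 9292, workers=1)` and then the same call with a larger `workers` value would show whether concurrency alone is enough to make the endpoint stop responding.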

Launchpad Janitor (janitor) wrote :

[Expired for tempest because there has been no activity for 60 days.]

Changed in tempest:
status: Incomplete → Expired