error on virsh destroy of test env seed

Bug #1291323 reported by Derek Higgins
This bug affects 2 people
Affects   Status    Importance   Assigned to   Milestone
tripleo   Invalid   High         Unassigned

Bug Description

Error calling virsh destroy on a test env

From http://logs.openstack.org/54/79654/1/check-tripleo/check-tripleo-overcloud-precise/768a7f0/console.html
2014-03-12 11:04:50.154 | Calling <function virsh_start at 0x7fbfdf32dc80> with: ['start', 'seed_2']
2014-03-12 11:04:50.193 | error: failed to connect to the hypervisor
2014-03-12 11:04:50.194 | error: Cannot recv data: Connection reset by peer

Derek Higgins (derekh) wrote :

One of the test-env nodes (testenv-testenv8-v6ci3v2kefqm) doesn't seem to be responding.

I managed to ssh in, but virsh list didn't return and couldn't be killed; now I'm having trouble sshing in.

While I was logged in, the load was:
top - 12:26:24 up 16 days, 12:28, 2 users, load average: 11.89, 11.52, 10.81
Tasks: 467 total, 1 running, 410 sleeping, 0 stopped, 56 zombie
%Cpu(s): 1.3 us, 0.3 sy, 0.0 ni, 97.5 id, 0.8 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 98996644 total, 89473708 used, 9522936 free, 72776 buffers
KiB Swap: 0 total, 0 used, 0 free, 65427132 cached

Nothing unusual there, but things seemed to be running slowly (possibly just the ssh connection).
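
A load average near 12 while the CPUs are about 97% idle is consistent with tasks sitting in uninterruptible sleep (state D), which Linux counts toward the load average even though they use no CPU. The following is a minimal sketch for listing such processes by reading /proc; the function names are hypothetical and this was not part of the original investigation.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' rather than on spaces.
    """
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx].strip())
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            # The process exited while we were scanning; skip it.
            continue
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling uninterruptible_tasks() on the affected host and printing the result would show whether the hung virsh and kworker tasks account for the load.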

There were also about 20 ci_commands processes stuck doing virsh destroys; the earliest ones I can see are:
root 11406 11163 0 00:41 ? 00:00:00 /usr/bin/python /usr/local/bin/ci_commands
root 11407 1 0 00:41 ? 00:00:00 /usr/bin/python /usr/local/bin/ci_commands
root 11667 11615 0 00:45 ? 00:00:00 /usr/bin/python /usr/local/bin/ci_commands
root 11722 1 0 00:46 ? 00:00:00 /usr/bin/python /usr/local/bin/ci_commands
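
The listing above shows ci_commands processes waiting indefinitely on virsh. One way to keep workers from piling up like this is to bound each external call with a timeout. The sketch below is an assumption about how that could look, not the actual ci_commands code; run_bounded and its parameters are hypothetical names. A process blocked in uninterruptible I/O does not exit on SIGKILL until the I/O completes, so the helper gives up waiting rather than blocking forever.

```python
import subprocess


def run_bounded(argv, timeout=60.0, grace=5.0):
    """Run argv and return (status, returncode).

    status is "exited" if the command finished by itself within `timeout`,
    "killed" if it had to be killed, or "stuck" if it did not exit even
    after SIGKILL (for example, a task blocked in uninterruptible I/O).
    """
    proc = subprocess.Popen(argv)
    try:
        return "exited", proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The command overran its budget; ask the kernel to kill it.
        proc.kill()
    try:
        return "killed", proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # Do not block forever on a process the kernel cannot reap yet.
        return "stuck", None
```

For example, run_bounded(["virsh", "destroy", "seed_2"], timeout=120) would return a "stuck" status instead of hanging the worker, which the caller could use to take the host out of rotation.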

Derek Higgins (derekh) wrote :

I seem to be able to ssh in on some attempts, for about a minute or so.

I tried stopping libvirtd; the service command froze, so I used kill -9. That didn't help.

I've killed all the testenv workers on the host so that it at least stops picking up CI jobs.

Derek Higgins (derekh) wrote :

Logging into the console gave a little info:

[1384369.578286] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1384369.578463] INFO: task kworker/4:1:32389 blocked for more than 120 seconds.
[1384369.578492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1384369.578586] INFO: task kworker/4:2:11654 blocked for more than 120 seconds.
[1384369.578615] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1384369.578720] INFO: task kworker/4:0:8212 blocked for more than 120 seconds.
[1384369.578749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1385499.056595] hpsa 0000:06:00.0: cmd_alloc returned NULL!
[1385499.226146] hpsa 0000:06:00.0: cmd_alloc returned NULL!
[1386201.644683] hpsa 0000:06:00.0: cmd_alloc returned NULL!
[1422832.300304] hpsa 0000:06:00.0: cmd_alloc returned NULL!
[1422840.901916] hpsa 0000:06:00.0: cmd_alloc returned NULL!
[1424176.790538] hpsa 0000:06:00.0: cmd_alloc returned NULL!
[1424183.046912] hpsa 0000:06:00.0: cmd_alloc returned NULL!
[1424885.974928] hpsa 0000:06:00.0: cmd_alloc returned NULL!
[1424914.998170] hpsa 0000:06:00.0: cmd_alloc returned NULL!

Going to bounce (reboot) the server.
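
The console output above mixes two kinds of kernel messages: hung-task warnings and hpsa cmd_alloc failures. As a rough sketch, a small script can summarise them from saved dmesg or console text. The function name and the regular expressions are hypothetical and were matched only against the lines quoted in this report, so other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this report.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
HPSA_ALLOC = re.compile(r"hpsa (?P<dev>\S+): cmd_alloc returned NULL")


def summarise_kernel_log(text):
    """Return (hung_tasks, alloc_failures) found in kernel log text.

    hung_tasks is a list of (task_name, pid) tuples in log order;
    alloc_failures is a Counter keyed by the hpsa device address.
    """
    hung = []
    alloc = Counter()
    for line in text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            hung.append((m.group("name"), int(m.group("pid"))))
            continue
        m = HPSA_ALLOC.search(line)
        if m:
            alloc[m.group("dev")] += 1
    return hung, alloc
```

Feeding it the text captured from the console gives a quick count per device and a list of blocked tasks, which is easier to compare across hosts than the raw log.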

Derek Higgins (derekh) wrote :

Haven't seen this in months, closing.

Changed in tripleo:
status: Triaged → Invalid