Lava Scheduler rtsm_foundation-armv8 devices is idle state since 29 March 2013

Bug #1164853 reported by Soumya Basak
This bug affects 1 person
Affects: LAVA Validation Lab
Status: Fix Released
Importance: Critical
Assigned to: Dave Pigott
Milestone: 2013.04

Bug Description

On our LAVA Scheduler, the rtsm_foundation-armv8 devices have been idle since 29 March 2013:
http://validation.linaro.org/lava-server/scheduler/alljobs

All jobs are in *submitted* status:

http://validation.linaro.org/lava-server/scheduler/job/50838

As discussed with davepigott about this issue:

(03:11:03 IST) soumya: stylesen: as seen from LAVA Scheduler , some of the device are offline/idle state... would you please look into the issue :'(
(03:11:29 IST) davepigott: soumya: Which devices in particular?
(03:13:23 IST) stylesen: hi soumya, davepigott is the right person to answer this, we already have email threads discussing about this
(03:14:29 IST) davepigott: soumya: We have some deliberately offline - can you be more specific?
(03:14:44 IST) soumya: davepigott: rtsm_foundation-armv8,
(03:15:07 IST) davepigott: soumya: ok - on my list to investigate
(03:16:39 IST) asac: whast the problem with rtsm for soumya ?
(03:16:50 IST) asac: why are his jobs not scheduled?
(03:20:02 IST) davepigott: asac: Possibly - it may be to do with the cloud problems we had recently. Iirc doanac found that a reboot meant something wasn't installed properly - I'm looking into it now
(04:19:43 IST) davepigott: soumya: OK - very strange. Reboot the instance, rebooted the cloud node and there's no boot log or anything and I can't ping it. Looking at ways of fixing this.

Would you please look into the issue and fix it?
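
The symptom reported here (jobs left in *submitted* status while devices of the matching type sit idle) could be flagged automatically. The following is only a minimal sketch of such a check, not anything the LAVA scheduler provides: the function name `find_stalled_device_types`, the dictionary keys (`id`, `device_type`, `status`, `submitted_at`) and the 24-hour threshold are all assumptions made for illustration, and how the job and device records are fetched is left out.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_stalled_device_types(jobs, devices, now, max_wait=timedelta(hours=24)):
    """Map each device type to the submitted jobs that look stalled.

    A job counts as stalled when it is still "submitted" after max_wait
    even though at least one device of its type is "idle".
    The record layout is hypothetical, not the LAVA data model.
    """
    # Count idle devices per device type; missing types count as 0.
    idle = Counter(d["device_type"] for d in devices if d["status"] == "idle")

    stalled = {}
    for job in jobs:
        if job["status"] != "submitted":
            continue
        waited = now - job["submitted_at"]
        device_type = job["device_type"]
        if idle[device_type] > 0 and waited > max_wait:
            stalled.setdefault(device_type, []).append(job["id"])
    return stalled
```

A non-empty result for a device type suggests that the scheduler or dispatcher side needs attention rather than the queue itself, which matches what was observed in this report.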

description: updated
Alexander Sack (asac) wrote :

can we get an ETA and a preliminary overview of what the issue is and what is in progress to fix it?

Dave Pigott (dpigott) wrote :

Problem is that the cloud node is in an odd state. As discussed, I am moving it over to a bare metal instance. The problem is that - as luck would have it - the fastmodels instance is running on the cloud controller node, so I've got to shuffle things around to free up another node that I can turn back into bare metal.

Plan is to get this done in the next 48 hours

Riku Voipio (riku-voipio) wrote :

Does this affect the rtsm_ve-a15x1-a7x1 and other ARMv7 Fast Models as well? I see there are 30 jobs waiting on idle devices:

http://validation.linaro.org/lava-server/scheduler/device_type/rtsm_ve-a15x1-a7x1

Soumya Basak (soumya-basak) wrote :

It seems to be the same problem. The device is in the idle state.

http://validation.linaro.org/lava-server/scheduler/

Dave Pigott (dpigott)
Changed in lava-lab:
importance: Undecided → Critical
assignee: nobody → Dave Pigott (dpigott)
milestone: none → 2013.04
status: New → In Progress
Dave Pigott (dpigott) wrote :

Decommissioned lava-server04 and turned it into a bare metal server, deployed as fastmodels01 and updated through salt. There are still some issues with getting the dispatcher running - working on it.

Dave Pigott (dpigott) wrote : Re: [Bug 1164853] Lava Scheduler rtsm_foundation-armv8 devices is idle state since 29 March 2013

Yes - that was also running on the cloud node. We're working it as a priority.

On 9 Apr 2013, at 10:12, Riku Voipio <email address hidden> wrote:

> Does this affect the rtsm_ve-a15x1-a7x1 and other ARMv7 Fast Models as
> well? I see there is 30 Idle jobs awaiting:
>
> http://validation.linaro.org/lava-server/scheduler/device_type/rtsm_ve-a15x1-a7x1

Dave Pigott (dpigott) wrote :

New server deployed and backlog cleared

Changed in lava-lab:
status: In Progress → Fix Released
Antonio Terceiro (terceiro) wrote : Re: [Bug 1164853] Re: Lava Scheduler rtsm_foundation-armv8 devices is idle state since 29 March 2013

On Wed, Apr 10, 2013 at 08:58:46AM -0000, Dave Pigott wrote:
> New server deployed and backlog cleared

fastastic. Look at what happened to our queue (image attached)

\o/

--
Antonio Terceiro
Software Engineer - Linaro
http://www.linaro.org

Dave Pigott (dpigott) wrote : Re: [Bug 1164853] Lava Scheduler rtsm_foundation-armv8 devices is idle state since 29 March 2013

Yeah - I looked at that as well. Now all I need is to fix a5 and we're golden. :)

On 10 Apr 2013, at 14:12, Antonio Terceiro <email address hidden> wrote:

> On Wed, Apr 10, 2013 at 08:58:46AM -0000, Dave Pigott wrote:
>> New server deployed and backlog cleared
>
> fastastic. Look at what happened to our queue (image attached)
>
> \o/
>
> --
> Antonio Terceiro
> Software Engineer - Linaro
> http://www.linaro.org
>
>
> ** Attachment added: "lava_queue-day.png"
> https://bugs.launchpad.net/bugs/1164853/+attachment/3637482/+files/lava_queue-day.png
