TC2 fails to boot with latest MP builds

Bug #1069851 reported by vishal
This bug affects 1 person
Affects: LAVA Validation Lab
Status: Fix Released
Importance: High
Assigned to: Dave Pigott

Bug Description

TC2 fails to boot when we submit jobs with the latest MP builds.
Here is the link to the image:
https://android-build.linaro.org/builds/~linaro-android/vexpress-jb-gcc47-armlt-tracking-open/#build=76
Here is the job:
https://validation.linaro.org/lava-server/scheduler/job/36380

The master image needs to be updated to use the latest kernel and device tree blob (DTB).

vishal (vishalbhoj)
Changed in lava-master-image-scripts:
assignee: nobody → Dave Pigott (dpigott)
Dave Pigott (dpigott)
affects: lava-master-image-scripts → lava-lab
Andy Doan (doanac)
Changed in lava-lab:
milestone: none → 2012.10
Dave Pigott (dpigott) wrote :

In progress. Having linaro-media-create (l-m-c) issues.

Changed in lava-lab:
importance: Undecided → High
status: New → Confirmed
Dave Pigott (dpigott)
Changed in lava-lab:
status: Confirmed → In Progress
Dave Pigott (dpigott) wrote :

After getting a successful build and finally getting the config right, the serial console on that build is very flaky. It drops characters left, right and centre (literally - anywhere in a command line).

Ryan is investigating.

Dave Pigott (dpigott) wrote :

Record of our conversation follows. The upshot is that we may have a problem that leaves us blocked.

davepigott: ryanharkin: I'm starting a binary chop to see if I can narrow it down. I'm starting with 313. Is that as good a place to start as any?
[3:06pm] ryanharkin: davepigott, not sure! I don't quite understand the symptoms, unfortunately. I don't see how it used to work and now doesn't if it's a UEFI problem, but perhaps it is a UEFI *and* kernel combination problem
[3:07pm] davepigott: Hmm. I can't see how it can be (solely) a UEFI problem since it worked with the older image, same UEFI
[3:08pm] davepigott: ryanharkin: OK. Well, I'll start the process with 313, since that's the first available in snapshots, and then binary chop, as I say.
[3:08pm] davepigott: Should be a max of 7 iterations to nail it
[3:09pm] ryanharkin: davepigott, aye, if it fails at 313 it'll be even quicker
[3:12pm] davepigott: ryanharkin:
[3:15pm] davepigott: ryanharkin: Actually, before I go down that route, could it possibly be that sparkly new dtb I loaded?
[3:16pm] ryanharkin: davepigott, i won't say "no" in case it comes back to bite me.... but I'm surprised that it ever worked since you've updated UEFI to run from Boot Monitor
[3:17pm] davepigott: ryanharkin: I got this dtb from yesterday's Android snapshot
[3:19pm] ryanharkin: davepigott, when you said that the 12.06 definitely worked, when did you last verify that it worked?
[3:20pm] davepigott: ryanharkin: Yesterday morning, with the same UEFI we keep discussing, but with the dtb you provided a way back
[3:21pm] ryanharkin: davepigott, ok
[3:22pm] ryanharkin: davepigott, i suppose the problem is that we've changed hwpack (ie. kernel) and DTB at the same time so can't tell which one caused the problem, although I can't see the DTB causing the problem unless there's some weird UART device setting in there
[3:22pm] ryanharkin: davepigott, and it happens for me with 12.08 release, 12.08 DTB and that UEFI
[3:23pm] davepigott: ryanharkin: The serial input problem?
[3:23pm] ryanharkin: davepigott, yes
[3:23pm] davepigott: ryanharkin: Oh crap
[3:23pm] ryanharkin: davepigott, exactly
[3:23pm] davepigott: ryanharkin: Pardon my use of the vernacular.
[3:23pm] ryanharkin: davepigott,
[3:24pm] ryanharkin: davepigott, so, that 12.06 + special kernel and DTB with that UEFI could be a magic combo
[3:24pm] davepigott: ryanharkin: So are we essentially up the proverbial creek, without the proverbial rowing device?
[3:24pm] ryanharkin: davepigott, swimming in it
[3:25pm] davepigott: ryanharkin: Not much point me binary chopping.
[3:26pm] ryanharkin: davepigott, not sure where to go... i'm going to take a 12.06 image, add in my early kernel and DTB (assuming I can find it in my email) and see if it works OK for me
[3:27pm] davepigott: ryanharkin: I have them all safely here, if you want me to ease your pain by sending them to you?
[3:27pm] ryanharkin: davepigott, yes, please!
[3:29pm] davepigott: ryanharkin: On its way
[3:30pm] ryanharkin: ta
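
For reference, the "binary chop" mentioned in the conversation above can be sketched as follows. This is only an illustration, not a script anyone in this thread ran: the function name `first_bad_build`, the `is_bad` callback and the build numbers in the usage note are hypothetical. It assumes one build is known to work, a later one is known to fail, and every build after the first failing one also fails.

```python
from typing import Callable, Tuple


def first_bad_build(good: int, bad: int,
                    is_bad: Callable[[int], bool]) -> Tuple[int, int]:
    """Return (first failing build number, number of builds tested).

    Assumptions (not taken from the bug report):
      * `good` is a build number known to work,
      * `bad` is a later build number known to fail,
      * once a build fails, all later builds fail too.
    """
    tests = 0
    # Keep halving the interval until good and bad are adjacent builds.
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(mid):
            bad = mid    # the regression is at mid or earlier
        else:
            good = mid   # the regression is after mid
    return bad, tests
```

With a hypothetical range of 128 builds, for example `first_bad_build(313, 441, lambda b: b >= 400)`, the function returns `(400, 7)`, which matches the "max of 7 iterations" estimate for a range of that size. In practice `is_bad` would mean flashing the build and checking the serial console by hand.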

Ryan Harkin (ryanharkin) wrote :

To narrow it down, I tested 12.07, 12.08 and 12.09 hwpacks with the 12.06 nano disk image.

Copy/Paste works fine with 12.07 and 12.08, but not 12.09.

Ryan Harkin (ryanharkin) wrote :

Tixy tested the latest image with the latest nano image (12.10 release) when booting from Boot Monitor directly and he has no problem with copy/paste.

Dave Pigott (dpigott) wrote :

OK, so practically speaking, what do I do? A change to config.txt?

Fathi Boudra (fboudra)
Changed in lava-lab:
milestone: 2012.10 → 2012.11
Ryan Harkin (ryanharkin) wrote :

I think I've fixed it! (At least, it works for me ;-))

Basically, the problem was that UEFI was booting on the LITTLE cores.
And it was using the LITTLE cores as the "primary" interrupt
controller. The releases that failed to copy/paste properly were the
recent ones with all the big.LITTLE power management stuff in them,
meaning the LITTLE cores were often powered down or running at lower
clock speed, meaning that the interrupt latency was much higher.

And of course, it works for everyone not using UEFI because they were
always booting from the big cores in the first place.

I worked this all out because my copy/paste tests always started to
drop characters after the 16th. And the "fifo half full" interrupt is
triggered after 16 characters.

I've sent an email to Dave with the latest UEFI binary attached so he can update LAVA and test my proposed fix.

Dave Pigott (dpigott)
Changed in lava-lab:
status: In Progress → Fix Released