snapdragon uc18 image fails to boot (current stable)

Bug #1846397 reported by Kyle Nitzsche on 2019-10-02
This bug affects 2 people
Affects: snapd
Importance: Critical
Assigned to: Ondrej Kubik

Bug Description

The current stable uc18 snapdragon arm64 doesn't boot.

http://us.cdimage.ubuntu.com/ubuntu-core/18/stable/current/

ubuntu-core-18-arm64+snapdragon.img.xz 2019-08-06 07:56

An earlier image boots fine: http://us.cdimage.ubuntu.com/ubuntu-core/18/stable/20190213/ubuntu-core-18-arm64+snapdragon.img.xz

Other than the usual quick flash of a blue LED, there is no apparent boot activity. HDMI shows none of the usual boot sequence. I have no UART so cannot see boot console.

Paul Larson (pwlars) wrote :

This image is used successfully many times a day in our lab, but those have a uart attached for capturing debug information and for control. I tried this without a uart connected and confirmed that it does not boot though. I also noticed that if you boot with the uart connected, then disconnect it, it remains booted and usable.

Changed in snapd:
status: New → Confirmed
Łukasz Zemczak (sil2100) wrote :

This seems like one of those bugs that one would never explicitly test for, because it doesn't make sense to treat it as a separate test case (i.e. testing with and without serial connected). Sadly, there's not much one could have done automated-testing-wise; we just need to use this as a lesson and dedicate one device with serial disconnected to make sure this test case is covered. From what Chris mentioned, this is now done, so we should be covered for the future!

For now I have asked Ondrej if he could take a look, since this might be something that got introduced with his latest dragonboard gadget update.

Ondrej: could you take a look? I'd like to know if this issue can be fixed easily. Otherwise, per Chris Wayne's proposal, we'd probably want to revert the dragonboard images to the previous stable version, since an older working image is better than a newer one that doesn't boot (in some cases).

Changed in snapd:
assignee: nobody → Ondrej Kubik (ondrak)
importance: Undecided → Critical
Łukasz Zemczak (sil2100) wrote :

Ok, for now we have reverted the snapdragon images to the old images. But we'd really need this fixed.

Ondrej Kubik (ondrak) wrote :

After some debugging, this seems to be caused by a u-boot change. By wiring the uart directly, without the 96boards mezzanine, I was able to test the boot sequence with RX and TX disconnected.
I can now see u-boot stopping at:
DRAM: 986 MiB
MMC: sdhci@07824000: 0, sdhci@07864000: 1
Loading Environment from FAT... OK
In: serial@78b0000
Out: serial@78b0000
Err: serial@78b0000
## Error: Can't overwrite "serial#"
## Error inserting "serial#" variable, errno=1
Net: Net Initialization Skipped
No ethernet found.
Hit any key to stop autoboot: 0
dragonboard410c =>

I will see if this is a bug in v2019.04 or whether we need an extra config flag. Stay tuned.

Ondrej Kubik (ondrak) wrote :

OK, a correction and an update.
That was a false lead, though a close one.
It seems u-boot "thinks" someone pressed a key to abort the boot when nothing is connected to the uart.
I am now testing some patches to fix this.
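
To illustrate the behaviour described in this comment, here is a toy model of an autoboot countdown. It is only a sketch and not u-boot code: the function name and the list-of-polls representation are made up for illustration, and the idea that an unconnected RX line can look like pending input is the hypothesis from the comment, not something the model proves.

```python
from typing import Iterable


def autoboot_outcome(rx_pending: Iterable[bool]) -> str:
    """Model an autoboot countdown (hypothetical helper, not u-boot's API).

    Each item in rx_pending is one poll of the serial receiver during the
    countdown: True means the UART reports a pending character.
    """
    for pending in rx_pending:
        if pending:
            # Any character counts as "a key was hit": the countdown is
            # aborted and the loader stays at its interactive prompt.
            return "prompt"
    # No input seen for the whole countdown: continue to the kernel.
    return "boot"


# A driven, idle RX line never reports input during the countdown.
idle_line = [False] * 20
# Assumption: a floating RX line produces one spurious "character".
floating_line = [False, False, True] + [False] * 17

print(autoboot_outcome(idle_line))
print(autoboot_outcome(floating_line))
```

In this model a single spurious character is enough to leave the board sitting at the `dragonboard410c =>` prompt, which would match a board that shows no boot activity on HDMI.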

Ondrej Kubik (ondrak) wrote :

I have narrowed it down to this commit:
https://gitlab.denx.de/u-boot/u-boot/commit/b460b889e28379014a7f951c08d93a151116b1ad
The question now is whether we revert it completely or debug further to find which part of the initialisation is broken there.

Paul Larson (pwlars) wrote :

As I understand it, the 20191008.2 core18 beta image should have this change included. I tried booting it on a system where serial is disconnected, and things are improved at least - It started booting the kernel after a few seconds. But then it died with "No init found".
After the resizing on first boot, I saw quite a few mount errors. I can get a picture of the screen if that helps.

I rebooted it without changing anything though, and it worked on the second boot.

Paul Larson (pwlars) wrote :

Here's a snapshot of the screen when I got the errors

Paul Larson (pwlars) wrote :

Also forgot to mention, the first two times I tested it, it failed for me (one on each of those images), then when I went back to reproduce it by writing a fresh image each time, it failed to reproduce this problem for the next 5 attempts, then I got it to happen again. So it seems to be somewhat random.

Ondrej Kubik (ondrak) wrote :

That's interesting, as resizing and the init run in general should happen well after the bootloader.
I wonder if we can reproduce the same with the previous gadget snap revision.

On Tue, Oct 8, 2019 at 5:31 PM Paul Larson <email address hidden>
wrote:

> I also tried https://people.canonical.com/~okubik/ubuntu-core-dragonboard-20191008-00.img.xz with the same result

Ondrej Kubik (ondrak) wrote :

From the screenshot, this is indeed much further along the boot chain. We should
validate that this is not happening with the previous gadget. As the error happens
inside the initrd, the only relation to u-boot would be a messed-up hw init. I can
revert the u-boot version to the one we have used until now.
Also, is this happening both with and without the uart connected?

Paul Larson (pwlars) wrote :

I can try some more in the morning, but here are some more observations from the rest of my testing today:
1. with both the current beta image, and the one from your people.c.c, I had previously been rewriting the sd card each time to test. I was hitting the errors maybe 10-20% of the time or so. After a successful boot, I then tried rebooting the same sd several more times without rewriting it, and after about 6 more successful boots, I was able to reproduce the error again. It would be interesting to see if someone else with a dragonboard can reproduce this behavior. The SD card I'm using has never given me problems before, but given the nature of them, I don't think I could rule out the possibility of a media problem.

2. I also tried booting the current/stable image that is on cdimage now. This is the image from before this problem was detected. I've rebooted it at least 12 times so far, and have not yet been able to reproduce the problems. I'll still try this some more though, given how randomly I've been able to reproduce this so far.

To be fair, things are definitely *better* than they were before, but there may or may not be a second issue we are seeing.

Ondrej Kubik (ondrak) wrote :

So the image from cdimage does not give us a good reference, as it is running
a different kernel and this problem happens in early boot. Let me build
some test images to compare with.

Ondrej Kubik (ondrak) wrote :

I got the same error when I built the latest UC18 image with the gadget from stable,
so this is not related to the latest change.
It's probably more related to the fact that we do not always test a clean boot, or
do we?
My test image uses dragonboard_48.snap and dragonboard-kernel_114.snap.

Paul Larson (pwlars) wrote :

I just grepped through all the serial output since 9/22 that we have in the lab, and I don't see any occurrences of "No init found" showing up in the logs.

We always provision the system with a fresh image every time on dragonboard. However in my testing at home, I was able to reproduce this even by booting the same image over and over enough times, so I don't think it has to only show up on the first boot.

I'll try this new image also, and also try to see if I can get it to happen with serial unless you already did that.
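
For anyone who wants to run the same kind of check on their own captures, a minimal sketch of the log search could look like this. The directory layout and the `*.log` suffix are assumptions for illustration, not a description of how the lab stores its serial output.

```python
from pathlib import Path
from typing import List


def logs_containing(log_dir: str, needle: str = "No init found") -> List[Path]:
    """Return every *.log file under log_dir whose text contains needle.

    Assumption: serial captures are plain-text files ending in ".log".
    """
    hits = []
    for path in sorted(Path(log_dir).rglob("*.log")):
        # Serial captures often contain stray bytes, so decode leniently.
        text = path.read_text(errors="replace")
        if needle in text:
            hits.append(path)
    return hits
```

Calling `logs_containing("/path/to/serial-logs")` (a placeholder path) returns an empty list when the string never shows up, which is the result reported above.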

Ondrej Kubik (ondrak) wrote :

I was able to reproduce it with serial connected.
To confirm, are you also able to reproduce it with the stable channel image at
home?

Paul Larson (pwlars) wrote :

No, exactly the opposite. I have *not* been able to reproduce the "No init found" error with the current stable image so far. I was about to try the new image from today that you have on people.c.c, but I can switch back to trying the current stable image instead. I've only run through it about 12 times so far, so it's still possible - just haven't seen it so far, and I usually do by that many times on the other images

Ondrej Kubik (ondrak) wrote :

Hmm, strange.
Are you testing an image you built yourself or an image from cdimage?
Let's see if you can reproduce it with the image I built.

Paul Larson (pwlars) wrote :

None of this is with an image I've built myself.
Using your image from 20191009, I was able to reproduce the "No init found" error after 6 attempts.
Using the current/stable image, I made 12 attempts yesterday, and 15 today, and I still have not managed to reproduce the "No init found" error
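
As a rough sanity check on how much weight 27 clean boots carry, the sketch below computes the chance of seeing no failure at all if the stable image failed at the same 10-20% rate seen on the other images. It assumes every boot is an independent trial with a fixed failure rate, which is a simplification and may not hold for this bug.

```python
def prob_no_failures(failure_rate: float, attempts: int) -> float:
    """Probability of zero failures in `attempts` independent boots."""
    return (1.0 - failure_rate) ** attempts


# 12 attempts yesterday plus 15 today on the current/stable image.
ATTEMPTS = 12 + 15

for rate in (0.10, 0.20):
    p = prob_no_failures(rate, ATTEMPTS)
    print(f"failure rate {rate:.0%}: P(no failure in {ATTEMPTS} boots) = {p:.4f}")
```

Under those assumptions the chance of 27 clean boots is about 6% at a 10% failure rate and about 0.2% at 20%, so the stable image most likely does not share the problem, although the lower rate is not fully ruled out.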

Changed in snapd:
status: Confirmed → In Progress
Zygmunt Krynicki (zyga) wrote :

I tried reproducing this, so I took my board and the image referenced above. I flashed it to an 8GB card, ensured that the boot select DIP switches were set up for booting from the SD card (switch #2 on, rest off) and hit the power button.

To my surprise I booted to the eMMC image I happened to have. I tried several times but I was unable to boot the SD card at all.

Not sure if this helps, but these are the early boot loader messages:

Format: Log Type - Time(microsec) - Message - Optional Info
Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
S - QC_IMAGE_VERSION_STRING=BOOT.BF.3.0-00286
S - IMAGE_VARIANT_STRING=HAAAANAZA
S - OEM_IMAGE_VERSION_STRING=CRM
S - Boot Config, 0x000002e3
S - Core 0 Frequency, 0 MHz
B - 1545 - PBL, Start
B - 3490 - bootable_media_detect_entry, Start
B - 521579 - bootable_media_detect_success, Start
B - 521583 - elf_loader_entry, Start
B - 523171 - auth_hash_seg_entry, Start
B - 523382 - auth_hash_seg_exit, Start
B - 539271 - elf_segs_hash_verify_entry, Start
B - 600304 - PBL, End
B - 476867 - SBL1, Start
B - 526155 - pm_device_init, Start
D - 14182 - pm_device_init, Delta
B - 540673 - boot_flash_init, Start
D - 0 - boot_flash_init, Delta
B - 544699 - boot_config_data_table_init, Start
D - 5106706 - boot_config_data_table_init, Delta - (452 Bytes)
B - 5655950 - CDT version:3,Platform ID:24,Major ID:1,Minor ID:0,Subtype:0
B - 5662203 - sbl1_ddr_set_params, Start
B - 5666015 - cpr_init, Start
D - 0 - cpr_init, Delta
B - 5671627 - Pre_DDR_clock_init, Start
D - 213 - Pre_DDR_clock_init, Delta
D - 0 - sbl1_ddr_set_params, Delta
B - 5684132 - pm_driver_init, Start
D - 3477 - pm_driver_init, Delta
B - 5696332 - SBC platform detected: XO_ADJ_FINE = 0x20
B - 5696363 - clock_init, Start
D - 30 - clock_init, Delta
B - 5711277 - Image Load, Start
D - 27999 - QSEE Image Loaded, Delta - (567468 Bytes)
B - 5739307 - Image Load, Start
D - 427 - SEC Image Loaded, Delta - (2048 Bytes)
B - 5746535 - sbl1_efs_handle_cookies, Start
D - 61 - sbl1_efs_handle_cookies, Delta
B - 5754313 - Image Load, Start
D - 13725 - QHEE Image Loaded, Delta - (56048 Bytes)
B - 5768068 - Image Load, Start
D - 13481 - RPM Image Loaded, Delta - (149316 Bytes)
B - 5781580 - Image Load, Start
D - 28822 - APPSBL Image Loaded, Delta - (519844 Bytes)
B - 5810433 - QSEE Execution, Start
D - 61 - QSEE Execution, Delta
B - 5816136 - SBL1, End
D - 5341618 - SBL1, Delta
S - Flash Throughput, 101000 KB/s (1295176 Bytes, 12749 us)
S - DDR Frequency, 400 MHz
Android Bootloader - UART_DM Initialized!!!
[0] [0] BUILD_VERSION=dragonboard410c-LA_BR_1_2_7-03810-8x16_0-linaro2
[0] [0] BUILD_DATE=15:13:18 - Feb 5 2018
[0] [0] welcome to lk

[10] [10] platform_init()
[10] [10] target_init()
[40] [40] SDHC Running in HS200 mode
[90] [90] Done initialization of the card
[90] [90] pm8x41_get_is_cold_boot: cold boot
[100] [100] Neither 'config' nor 'frp' partition found
[100] [100] Not able to search the panel:
[110] [110] Display not enabled for 24 HW type
[110] [110] Target panel init not found!
[120] [120] pm...
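
The first part of the log above follows the stated "Log Type - Time(microsec) - Message" format, so it can be parsed mechanically. The following is a minimal sketch of that; the helper names are made up for illustration, and it only handles the `B` and `D` lines, not the `S` statistics or the later lk output.

```python
import re
from typing import Iterable, NamedTuple, Optional


class LogEntry(NamedTuple):
    log_type: str   # "B" = since boot, "D" = delta
    time_us: int    # time in microseconds
    message: str    # message plus any optional info


# Matches lines such as "D - 14182 - pm_device_init, Delta".
LINE_RE = re.compile(r"^([BD])\s+-\s+(\d+)\s+-\s+(.*)$")


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one B/D line; return None for anything else."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m.group(1), int(m.group(2)), m.group(3))


def slowest_stage(lines: Iterable[str]) -> Optional[LogEntry]:
    """Return the D (delta) entry with the largest duration, if any."""
    deltas = [e for e in map(parse_line, lines) if e and e.log_type == "D"]
    return max(deltas, key=lambda e: e.time_us, default=None)


# A few lines copied from the log above.
SAMPLE = [
    "S - DDR Frequency, 400 MHz",
    "D - 14182 - pm_device_init, Delta",
    "B - 544699 - boot_config_data_table_init, Start",
    "D - 5106706 - boot_config_data_table_init, Delta - (452 Bytes)",
]

print(slowest_stage(SAMPLE))
```

On the full log, enclosing totals such as the SBL1 delta would need to be filtered out before comparing individual stages.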

Chris Wayne (cwayne18) wrote :

This should be marked as fixed, we've verified the fix coinciding with the 18.04.4 release.

Zygmunt Krynicki (zyga) on 2020-02-18
Changed in snapd:
status: In Progress → Fix Released
Zygmunt Krynicki (zyga) wrote :

I've used a different SD card and booted to a working core system. I managed to correctly register and access the device over SSH over wifi. Everything is good.

Łukasz Zemczak (sil2100) wrote :

So the state of this bug is a bit, let's say, complicated. What image exactly did you use for testing?

The current uc18 stable images should not have this issue as they're built with an older, previous stable gadget snap. Paul only experienced the issues when using stable images with the new gadget, so for instance the one that's in 18/candidate.
You could basically use ubuntu-image to build a dragonboard image with --channel=candidate, or actually only `--snap dragonboard=18/candidate`.
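
As a sketch of the second variant, the snippet below only assembles the command line and prints it; it does not run anything. The `--snap dragonboard=18/candidate` option is taken from the comment above. The model assertion filename is a placeholder, and passing it as the final positional argument is an assumption that may differ between ubuntu-image versions.

```python
import shlex
from typing import List


def ubuntu_image_argv(model_assertion: str,
                      gadget_channel: str = "18/candidate") -> List[str]:
    """Build an ubuntu-image command that pins only the gadget snap channel.

    Assumption: the model assertion is the final positional argument.
    """
    return [
        "ubuntu-image",
        "--snap", f"dragonboard={gadget_channel}",
        model_assertion,
    ]


# "dragonboard.model" is a hypothetical filename, not one from this report.
print(shlex.join(ubuntu_image_argv("dragonboard.model")))
```

Pinning only the gadget keeps the rest of the image on its default channel, so any difference in boot behaviour can be attributed to the gadget snap.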

Changed in snapd:
status: Fix Released → Incomplete

Is this bug still reproducible under any conditions?

The last runs we did using stable/candidate/beta images were fine; in all cases the board managed to boot and could run the tests.

Paul Larson (pwlars) wrote :

Perhaps Lukasz can confirm, but my understanding is that this was never really fixed yet in our uboot/gadget. It was worked around in the image by rebuilding the image with an older version of the gadget snap, correct? So the potential risk, which we almost hit on a previous core18 image build, is that if it gets forgotten and an image is rebuilt using the broken version, then it could regress.

If there was actually a fix that landed in the gadget though, then it could be closed.
