after a successful update and reboot, the bootloader snappy_mode is set to 'try'
| Affects | Status | Importance | Assigned to | Milestone | |
|---|---|---|---|---|---|
| | Snappy |
Low
|
Unassigned | ||
Bug Description
After a snappy update, reboot and successful boot, it's possible to run commands before the boot-ok service has run and updated the bootloader config file.
This causes the rollback test to go crazy because snappy rollback updates the bootloader conf, then boot-ok runs and changes the values of the file.
It shouldn't be possible to log in before boot-ok has finished running.
This doesn't affect manual scenarios because for a person it's slow to do something like snappy rollback. It affects the integration suite because after ssh all the commands are executed very fast.
I have proposed a branch with a workaround for the integration suite. Also a branch with an assertion that exposes the problem. Once this is fixed, the second branch should be merged and the workaround removed.
Original report
===============
To reproduce:
$ go run ./_integration-
****** Resuming rollbackSuite.
PASS: <autogenerated>:52: rollbackSuite.
snappy list
Name Date Version Developer
ubuntu-core 2015-09-21 180 ubuntu
generic-amd64 2015-09-22 1.4 canonical
/home/elopio/
...open /home/elopio/
... obtained bool = false
... expected bool = true
The test expects ubuntu core to be 179, the previous version. But it boots on 180.
I think this is something that the test causes, and not a snappy issue.
Related branches
- Federico Gimenez (community): Approve on 2015-10-01
-
Diff: 67 lines (+10/-4)3 files modified_integration-tests/tests/rollback_test.go (+5/-0)
_integration-tests/testutils/partition/bootloader.go (+4/-3)
_integration-tests/testutils/partition/bootloader_test.go (+1/-1)
- Snappy Developers: Pending requested 2015-09-30
-
Diff: 13 lines (+3/-0)1 file modified_integration-tests/tests/update_test.go (+3/-0)
| Federico Gimenez (fgimenez) wrote : | #1 |
| Federico Gimenez (fgimenez) wrote : | #2 |
This is the complete log of a failed execution with calls to show the contents of /boot/grub/grubenv [1]. It seems that after the first update snappy_trial_boot is not properly unset and when trying the rollback the code at [2] switches the boot partition.
[1] http://
[2] http://
| Leo Arias (elopio) wrote : | #3 |
I've ran this like 30 times this evening. I haven't been able to reproduce it.
Take a look at this from your paste:
kpartx -ds error: device-mapper: remove ioctl on loop0p5 failed: Device or resource busy
loop deleted : /dev/loop0
That is https:/
when that happens to me, I can't ssh into the machine. Can you please look if everytime you have the rollback error your udf failed?
On your paste, there's also this problem: https:/
udf fails, but doesn't stop the run.
| Federico Gimenez (fgimenez) wrote : | #4 |
After some executions I've been able to get the error without any reported failures from kpartx [1]. It seems to be more frequent when the kpartx failure is present, but I've also seen successful executions with them [2]
[1] http://
[2] http://
| Leo Arias (elopio) wrote : | #5 |
Many runs today again.
I can't reproduce the error if I call kvm to start the vm manually. I can reproduce it when I let adt-run start the vm.
Here's the grubenv on a successful run:
boot at 187
snappy_mode=regular
snappy_ab=a
snappy_trial_boot=0
fake update
snappy_mode=try
snappy_ab=b
snappy_trial_boot=0
reboot
snappy_mode=regular
snappy_ab=b
snappy_trial_boot=0
rollback
snappy_mode=try
snappy_ab=a
snappy_trial_boot=0
reboot
snappy_mode=regular
snappy_ab=a
snappy_trial_boot=0
Here's on a failed run:
boot at 187
snappy_mode=regular
snappy_ab=a
snappy_trial_boot=0
fake update
snappy_mode=try
snappy_ab=b
snappy_trial_boot=0
reboot
snappy_mode=try
snappy_ab=b
snappy_trial_boot=1
rollback
snappy_mode=try
snappy_ab=a
snappy_trial_boot=1
reboot
snappy_mode=regular
snappy_ab=b
snappy_trial_boot=0
So the problem seems to start after the first reboot. We are in the b partition successfully, but it still says snappy_mode=try and
snappy_
| Leo Arias (elopio) wrote : | #6 |
There's this service that should run on a successful boot:
http://
If I break after the first reboot, systemctl status ubuntu-
| summary: |
- fake rollback integration test fails + after a successful boot, the bootloader snappy_mode is set to 'try' |
| summary: |
- after a successful boot, the bootloader snappy_mode is set to 'try' + after a successful update and reboot, the bootloader snappy_mode is set + to 'try' |
| description: | updated |
| Changed in snappy: | |
| status: | New → Confirmed |
| Leo Arias (elopio) wrote : | #7 |
Thanks to Chipaca and Federico we found that boot-ok is slow, causing havok for this test. Description updated, branches linked.
| Michael Vogt (mvo) wrote : | #8 |
I think we need to also look into this outside of the tests, i.e. a normal user that ssh into the machine and then run stuff (or something like ansible) could hit this scenario as well.
| Changed in snappy: | |
| status: | Confirmed → Triaged |
| importance: | Undecided → Critical |
| Leo Arias (elopio) wrote : | #9 |
I'm confirming now that on bbb is extra slow.
I can log in through ssh or normally through serial, run fw_printenv snappy_mode and see that the value is try. I still have some time to do snappy rollback ubuntu_core and reboot, which is what produces the funny results.
So to answer mvo: yes, scripts are likely to hit this. But it can even happen for a user entering commands manually.
| Michael Vogt (mvo) wrote : | #10 |
We are in a much better position now. As soon as snapd runs (which is relatively early) the boot is considered done. Still, there is a smal window where its not.
| Changed in snappy: | |
| importance: | Critical → Low |


I'm getting this randomly, most of the time it's working. It seems to be related to the value of snappy_trial_boot on /boot/grub/grubenv, when it succeeds the file has:
# GRUB Environment Block
snappy_mode=try
snappy_ab=a
snappy_trial_boot=0
and when it fails:
# GRUB Environment Block
snappy_mode=try
snappy_ab=a
snappy_trial_boot=1
Leo, can you please confirm?