maas serving old image to nodes

Bug #1554636 reported by Andreas Hasenack on 2016-03-08
This bug affects 1 person
Affects   Importance   Assigned to
MAAS      Critical     Lee Trager
1.9       Critical     Unassigned
2.0       Critical     Unassigned

Bug Description

This is a difficult bug to file. I can only describe what was happening and provide logs; I can't pinpoint one or two log lines that show what's wrong.

What was happening is that my MAAS 1.9.1 server (upgraded from 1.9.0) supposedly had only xenial images from March 8th, 2016, but nodes were still deploying the March 5th image. I kicked off another import via both the API and the UI, and I even rm -rf'ed all of the on-disk boot resources, but the nodes were still getting the March 5th image.

Finally, I rebooted the MAAS server, and after that new deployments started using the March 8th image.

Andreas Hasenack (ahasenack) wrote:
Andres Rodriguez (andreserl) wrote:

Andreas, can you please attach the logs from after the reboot (/var/log/maas/*.log)?

Andreas Hasenack (ahasenack) wrote:

Tarball with all the logs.

tags: added: kanban-cross-team landscape
tags: removed: kanban-cross-team
Changed in maas:
milestone: none → 2.0.0
Scott Moser (smoser) wrote:

I suspect the issue here is that MAAS updates the images, rewrites /etc/tgt/conf.d/maas.conf, and then (in provisioningserver/import_images/boot_resources.py) calls:
   tgt-admin --conf=/etc/tgt/conf.d/maas.conf --update all

The man page for tgt-admin says:
 If you want to update targets which are in use, you have to add "--force" flag.
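For reference, the call described above could be reproduced from Python roughly like this. This is a minimal sketch, assuming the command is run through `subprocess`; the helper names `build_tgt_update_argv` and `update_targets` are hypothetical and not MAAS's actual code. Only the `--conf`, `--update` and `--force` flags quoted in this comment are used.

```python
import subprocess

TGT_CONF = "/etc/tgt/conf.d/maas.conf"


def build_tgt_update_argv(conf_path: str = TGT_CONF, force: bool = False) -> list[str]:
    """Build the tgt-admin command line that re-reads every target in conf_path.

    Hypothetical helper mirroring the flags quoted in this report. Per the
    tgt-admin man page, without --force targets that are in use are not updated.
    """
    argv = ["tgt-admin", f"--conf={conf_path}", "--update", "all"]
    if force:
        argv.append("--force")
    return argv


def update_targets(conf_path: str = TGT_CONF) -> None:
    # Needs tgt-admin on PATH and root privileges; raises CalledProcessError on failure.
    subprocess.run(build_tgt_update_argv(conf_path), check=True)
```

Keeping the argv construction separate from the `subprocess` call makes it easy to see (and test) that the default invocation never passes `--force`.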

maas.conf above has entries like:
<target iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-hwe-x-xenial-daily>
    readonly 1
    allow-in-use yes
    backing-store "/var/lib/maas/boot-resources/snapshot-20160309-005347/ubuntu/amd64/hwe-x/xenial/daily/root-image"
    driver iscsi
</target>
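Note that the target name encodes only the OS, architecture, subarchitecture, release and label, while the snapshot directory appears only in `backing-store`. A sketch of rendering such a stanza makes this visible; `render_target` is a hypothetical helper that copies the layout of the entry above, and MAAS's real template may differ.

```python
def render_target(iqn: str, backing_store: str) -> str:
    """Render one tgt <target> stanza in the layout of the maas.conf entry above.

    Hypothetical helper for illustration only.
    """
    lines = [
        f"<target {iqn}>",
        "    readonly 1",
        "    allow-in-use yes",
        f'    backing-store "{backing_store}"',
        "    driver iscsi",
        "</target>",
    ]
    return "\n".join(lines) + "\n"


IQN = "iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-hwe-x-xenial-daily"
STORE = (
    "/var/lib/maas/boot-resources/snapshot-20160309-005347"
    "/ubuntu/amd64/hwe-x/xenial/daily/root-image"
)
print(render_target(IQN, STORE))
```

With this scheme, a new import produces a stanza with the same IQN and only a different `backing-store` path, so the update has to replace a target that may already be in use.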

So if a target is in use (because a node is installing or commissioning), then when MAAS rewrites the entry for 'iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-hwe-x-xenial-daily' and runs '--update all', that target is not actually updated and keeps serving the old backing store.

It doesn't seem like MAAS can simply use '--force': if that tgt target is in use, it's because something is using it, and that something would then fail.
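The trade-off in the two paragraphs above can be illustrated with a toy model. This is not tgt's implementation; `ToyTgtd` and `Target` are hypothetical names, and the model simply encodes the behaviour described in this comment: without `force`, an in-use target is skipped and keeps its old backing store; with `force`, it is replaced under its connected initiators.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    backing_store: str
    sessions: int = 0  # connected initiators, e.g. nodes installing or commissioning


@dataclass
class ToyTgtd:
    """Toy model of the update behaviour described in this comment (not real tgt code)."""

    targets: dict[str, Target] = field(default_factory=dict)
    broken_sessions: int = 0  # sessions whose disk was swapped out from under them

    def update_all(self, wanted: dict[str, str], force: bool = False) -> list[str]:
        """Apply wanted {iqn: backing_store}; return the iqns that were skipped."""
        skipped = []
        for iqn, store in wanted.items():
            current = self.targets.get(iqn)
            if current is None:
                self.targets[iqn] = Target(store)  # new target: just add it
            elif current.backing_store == store:
                continue  # nothing changed
            elif current.sessions and not force:
                skipped.append(iqn)  # in use: silently keeps serving the old image
            else:
                self.broken_sessions += current.sessions  # forced: clients lose their disk
                self.targets[iqn] = Target(store)
        return skipped
```

For example, with `ToyTgtd({"xenial-daily": Target("old-root-image", sessions=1)})`, calling `update_all({"xenial-daily": "new-root-image"})` returns `["xenial-daily"]` and the old image stays in place, which matches the symptom in the bug description; repeating the call with `force=True` installs the new image but counts one broken session.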

So the solution would seem to be to not reuse target names for new content, and then to clean up old in-use targets lazily.
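The proposed approach could look something like the sketch below. It is a hypothetical illustration, not the fix that was actually committed: `versioned_iqn` assumes the snapshot id is appended to the existing target name (digits and hyphens are valid IQN characters), and `sync_targets` adds targets for new content while removing a stale target only once nothing is connected to it.

```python
from dataclasses import dataclass


@dataclass
class Target:
    backing_store: str
    sessions: int = 0  # connected initiators


def versioned_iqn(base_iqn: str, snapshot: str) -> str:
    """Hypothetical naming scheme: new content always gets a new target name."""
    return f"{base_iqn}-{snapshot}"


def sync_targets(live: dict[str, Target], wanted: dict[str, str]) -> None:
    """Add targets for the current import and lazily drop stale ones.

    wanted maps iqn -> backing_store. A stale target (not in wanted) is removed
    only when no initiator is using it; otherwise it is kept and retried on the
    next sync, so nodes still booting from it are not disrupted.
    """
    for iqn, store in wanted.items():
        live.setdefault(iqn, Target(store))
    for iqn in [name for name in live if name not in wanted]:  # copy: we delete below
        if live[iqn].sessions == 0:
            del live[iqn]
```

Under this scheme an in-progress install keeps its old target until it disconnects, while (assuming MAAS points newly booting nodes at the new name) new deployments immediately get the new image, and `--force` is never needed.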

Changed in maas:
importance: Undecided → Critical
status: New → Triaged
milestone: 2.0.0 → 2.0.1
milestone: 2.0.1 → 2.1.0
Changed in maas:
milestone: 2.1.0 → 2.1.1
no longer affects: maas/trunk
Changed in maas:
status: Fix Committed → Fix Released