[2.3.x] Preseed fails to render when bcache backed partition is reformatted.

Bug #1799161 reported by Vladimir Grevtsev on 2018-10-22
This bug affects 1 person
Affects    Importance    Assigned to
MAAS       Critical      Newell Jensen
2.3        Critical      Newell Jensen
2.4        Critical      Newell Jensen

Bug Description

=== Environment ===
OS: Xenial 16.04.4, Linux BLRKECROSINF32 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
MAAS version: 2.3.5
MAAS packages installed: https://pastebin.canonical.com/p/PtwDfKQDP4/

=== Problem summary ===

While deploying an arbitrary Juju bundle (in this case "magpie"), Juju is not able to get a machine from MAAS: https://pastebin.canonical.com/p/sWSfrFyGxk/
At the same time, the MAAS regiond log contains the following records:

2018-10-19 13:34:28 maasserver: [error] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/maasserver/api/machines.py", line 561, in deploy
    get_curtin_merged_config(machine)
  File "/usr/lib/python3/dist-packages/maasserver/preseed.py", line 399, in get_curtin_merged_config
    yaml_config = get_curtin_yaml_config(node)
  File "/usr/lib/python3/dist-packages/maasserver/preseed.py", line 387, in get_curtin_yaml_config
    storage_config = compose_curtin_storage_config(node)
  File "/usr/lib/python3/dist-packages/maasserver/preseed_storage.py", line 640, in compose_curtin_storage_config
    return [generator.generate()]
  File "/usr/lib/python3/dist-packages/maasserver/preseed_storage.py", line 68, in generate
    self._generate_bcache_operations()
  File "/usr/lib/python3/dist-packages/maasserver/preseed_storage.py", line 505, in _generate_bcache_operations
    self._generate_bcache_operation(filesystem_group)
  File "/usr/lib/python3/dist-packages/maasserver/preseed_storage.py", line 514, in _generate_bcache_operation
    "backing_device": filesystem_group.get_bcache_backing_filesystem(
AttributeError: 'NoneType' object has no attribute 'get_parent'

Full log: https://pastebin.canonical.com/p/RTrPCDkQ59/
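
For readers unfamiliar with this failure pattern, here is a minimal sketch of how a chained call produces exactly this message. The classes below are hypothetical stand-ins, not the real MAAS models, and the "bcache-backing" filesystem type is an assumption made for illustration. The only point is that a lookup returning None, followed directly by .get_parent(), raises the AttributeError seen in the traceback.

```python
class Filesystem:
    """Hypothetical stand-in for a filesystem record attached to a device."""

    def __init__(self, fstype, parent):
        self.fstype = fstype
        self.parent = parent

    def get_parent(self):
        return self.parent


class FilesystemGroup:
    """Hypothetical stand-in for a bcache filesystem group."""

    def __init__(self, filesystems):
        self.filesystems = filesystems

    def get_bcache_backing_filesystem(self):
        # Returns None when no filesystem of the assumed backing type exists.
        for fs in self.filesystems:
            if fs.fstype == "bcache-backing":
                return fs
        return None


def backing_device(group):
    # Same chained-call shape as the failing line: no check for None.
    return group.get_bcache_backing_filesystem().get_parent()


def try_render(group):
    # Mimics the error text returned to the CLI when rendering fails.
    try:
        return backing_device(group)
    except AttributeError as exc:
        return "Failed to render preseed: %s" % exc
```

With a group whose only filesystem is ext4, try_render returns the same "Failed to render preseed" text as the CLI output in this report; with a "bcache-backing" filesystem present it returns the parent device.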

Manual deployment via the GUI leads to the same error messages in the logs: "Logs > Installation output" shows only "System is booting", while similar records appear in the regiond log.

CLI:

ubuntu@BLRKECROSINF31:~$ maas maasadmin machine deploy xpywff --debug
400 BAD REQUEST

       Content-Type: text/plain; charset=utf-8
               Date: Mon, 22 Oct 2018 13:20:40 GMT
             Server: TwistedWeb/16.0.0
             Status: 400
  Transfer-Encoding: chunked
               Vary: Cookie
    X-Frame-Options: SAMEORIGIN

Failed to render preseed: 'NoneType' object has no attribute 'get_parent'
ubuntu@BLRKECROSINF31:~$

Output of "maas admin machine read ....": https://pastebin.canonical.com/p/qB8s5m5HSz/

tags: added: field-high
tags: added: field-medium
removed: field-high
Vladimir Grevtsev (vlgrevtsev) wrote :

+ ~field-medium team subscription

Vladimir Grevtsev (vlgrevtsev) wrote :

Changed to field-high again since this issue may become a blocker on the customer environment.

description: updated
tags: added: field-high
removed: field-medium
Andres Rodriguez (andreserl) wrote :

Hi Vladimir,

Can you please describe how your storage has been configured? Please provide a screenshot and the output of: maas <user> machine get-curtin-config <system-id> .

Changed in maas:
status: New → Incomplete
Vladimir Grevtsev (vlgrevtsev) wrote :

Hi Andres,

ubuntu@BLRKECROSINF31:~$ maas maasadmin machine get-curtin-config xpywff
Machine BLRKECRSDCLD301 is not in a deployment state.

Storage configuration - please, see attached document.

In the meantime, I modified /usr/lib/python3/dist-packages/maasserver/preseed_storage.py [1] and tried to call "machine deploy" once again. The output is: [2]

[1] https://pastebin.canonical.com/p/X3wdx8PSts/
[2] https://pastebin.canonical.com/p/zrty7Tk7Kn/

So I believe the problem is in bcache0/bcache1, because all the other devices passed that code successfully.

Vladimir Grevtsev (vlgrevtsev) wrote :

I tried to put the machine into the "Deploying" state via the GUI and ran "get-curtin-config" again; it failed with "list index out of range": https://pastebin.canonical.com/p/4PqJtthWnZ/

Vladimir,

Are you guys configuring anything in /etc/maas/preseeds/curtin_userdata ?

--
Andres Rodriguez (RoAkSoAx)
Ubuntu Server Developer
MSc. Telecom & Networking
Systems Engineer

No, we don't.

But it looks like I found a mistake in the storage configuration on the server.

Before:

- partition from spin drive created
- added format options to partition
- set mountpoint for partition
- create bcache device, set "backing device" to partition, "caching device" to appropriate part of NVMe drive

After:

- partition from spin drive created
- bcache device created, set "backing device" to partition, "caching device" to NVMe
- added format options for bcache device
- set mountpoint for bcache device

The difference is that in the first run (where the bug was originally reported) we created ext4 filesystems on the partitions on top of the spinning drives, while the bcache devices themselves were left untouched. My guess is that the bcache devices ended up without any filesystem, and that this caused the problem while rendering the curtin config.

I will try to share step-by-step MAAS CLI configuration commands to reproduce this case. I believe more diagnostic output (or some kind of validation check) should be added as part of fixing this.
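
As an illustration of the kind of check suggested above, here is a minimal sketch of a validation pass that reports which bcache is misconfigured instead of failing with a bare AttributeError. The dict layout, the check_bcaches helper and the "bcache-backing" marker are all hypothetical and are not taken from the MAAS code base; the "before" and "after" data only approximate the two configuration orders described above.

```python
# Assumed marker for a partition that is correctly registered as a backing device.
BACKING_FSTYPE = "bcache-backing"


def check_bcaches(bcaches):
    """Return one readable message per bcache whose backing partition looks wrong."""
    problems = []
    for bcache in bcaches:
        partition = bcache["backing_partition"]
        fstype = partition.get("fstype")
        if fstype != BACKING_FSTYPE:
            problems.append(
                "%s: backing partition %s has filesystem %r, expected %r"
                % (bcache["name"], partition["name"], fstype, BACKING_FSTYPE)
            )
    return problems


# "Before" order: the partition was formatted and mounted, then used as backing.
before = [
    {"name": "bcache0", "backing_partition": {"name": "sdb-part1", "fstype": "ext4"}},
]

# "After" order: the partition goes straight into the bcache, which is then formatted.
after = [
    {"name": "bcache0", "backing_partition": {"name": "sdb-part1", "fstype": "bcache-backing"}},
]

for message in check_bcaches(before):
    print(message)
```

Run against the "before" layout, the check names bcache0 and its backing partition; against the "after" layout it returns an empty list. A message of this kind would have pointed at bcache0/bcache1 directly.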

Vladimir Grevtsev (vlgrevtsev) wrote :

Reproducer: https://pastebin.canonical.com/p/YVdYQr3tDg/ (after this set of commands is executed, the node cannot be deployed and the error from the bug description is shown; "maas %user% bcaches read %node_id%" also throws the same exception)

Valid configuration (succeeded only after deleting the bcaches manually via the GUI): https://pastebin.canonical.com/p/w8F475Mn5F/

Changed in maas:
milestone: none → 2.5.0rc1
Changed in maas:
status: Incomplete → Confirmed
importance: Undecided → High
Changed in maas:
importance: High → Critical
Changed in maas:
assignee: nobody → Newell Jensen (newell-jensen)
Changed in maas:
status: Confirmed → In Progress
Changed in maas:
status: In Progress → Fix Committed
Newell Jensen (newell-jensen) wrote :

Vladimir,

Can you please test the latest fix that has landed in master and report back whether it clears up the issue for you?

Thanks,

Newell

Changed in maas:
status: Fix Committed → Fix Released
summary: - [2.3.x] "Failed to render preseed: 'NoneType' object has no attribute
- 'get_parent'" on Juju acquisition event
+ [2.3.x] Preseed fails to render when bcache backed partition is
+ reformatted.