[2.0b6] Installation log is not sent back to MAAS

Bug #1590846 reported by Andres Rodriguez
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Critical
Unassigned
curtin
Undecided
Unassigned
curtin (Ubuntu)
Undecided
Unassigned
Trusty
Undecided
Unassigned
Xenial
Undecided
Unassigned

Bug Description

[Impact]

 * MaaS deployments using curtin revno newer than 365 find that curtin
   no longer sends the installation log file back to maas due to a
   change in how the curtin reporter is configured.

   This affects MaaS users looking to examine the install process of a
   node.

 * This SRU resolves the bug by changing the reporter configuration to
   always send the install log independent of the log level
   configured.

[Test Case]

 * On a Xenial 16.04 system
    - % apt-get install curtin
    - In one shell download and run the report-webhook-logger (attached
      to this bug)
      % python3 ./report-webhook-logger 8000
    - In second shell: run
      % sudo curtin -vvv install \
          --set install/logfile=/tmp/foo \
          --set reporting/mypost/type=webhook \
          --set reporting/mypost/endpoint=http://localhost:8000/ \
          file://tmp/non-existing.tar.gz

  PASS: In the reporting shell, the reporter will recieve an event
        like this with base64 encoded content.

    {
        "event_type": "finish",
        "level": "ERROR",
        "description": "failed: curtin command install",
        "files": [
            {
                "content": <base64 content redacted for brevity>
                "encoding": "base64",
                "path": "/var/log/curtin/install.log"
            }
        ],
        "result": "FAIL",
        "name": "cmd-install",
        "origin": "curtin",
        "timestamp": 1468357973.042974
    }

  FAIL: Reporter does not post an event with install log file.

[Regression Potential]

[Original Description]

Installation log is not sent back to back regardless of whether it being a successful or failed install.

Related branches

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Attaching the rsyslog

Changed in maas:
milestone: none → 2.0.0
importance: Undecided → Critical
status: New → Triaged
Revision history for this message
Andres Rodriguez (andreserl) wrote :

attached is the cloud-init log above.

Revision history for this message
Ryan Harper (raharper) wrote :

Can you supply the curtin command and config files maas sent via cloud-init? IN particular the part-0001 script.

I've seen in other bugs related to curtin install that maas does not appear to have the curtin verbose debug output; that also seems to be missing here as well.

Based on the cloud-init log, it appears that curtin made it at least to the extract stage which would have already configured the reporter and sent these events back to maas; so I'd like to see those events from maas attached here.

Lastly, it appears that the dd image was corrupt, tar/gzip complain here:

Jun 9 15:49:07 dexterous-jamison cloud-init[1187]: gzip: stdin: invalid compressed data--crc error
Jun 9 15:50:22 dexterous-jamison cloud-init[1187]: tar: Child returned status 1
Jun 9 15:50:22 dexterous-jamison cloud-init[1187]: tar: Error is not recoverable: exiting now

which likely means the image is corrupt which prevented the in-target hook execution from working:

Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: /usr/bin/env: 'python': No such file or directory
Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: Unexpected error while running command.
Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: Command: ['/tmp/tmpj_9jbwi6/target/curtin/finalize']
Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: Exit code: 127
Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: Reason: -
Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: Stdout: ''
Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: Stderr: ''
Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: Installation failed with exception: Unexpected error while running command.
Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: Command: ['curtin', 'hook']
Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: Exit code: 3
Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: Reason: -
Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: Stdout: b"/usr/bin/env: 'python': No such file or directory\nUnexpected error while running command.\nCommand: ['/tmp/tmpj_9jbwi6/target/curtin/finalize']\nExit code: 127\nReason: -\nStdout: ''\nStderr: ''\n"
Jun 9 15:50:32 dexterous-jamison cloud-init[1187]: Stderr: ''

Changed in curtin:
status: New → Incomplete
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

I'm hitting this too trying to deploy xenial with 2.0.0~beta6+bzr5073-0ubuntu1~xenial1.

I think I have a non working partition config - I have one big partition on /dev/sda and nothing else. I think I need to create a /boot/efi partition too. That may be related?

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Attached /var/log/maas/rsyslog/<machine-name>/<date>/messages from the failed system.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Attached /curtin, /var/log/cloud-init*, and /var/lib/cloud from the failed deployment.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

get-curtin-config output from the node.

http://pastebin.ubuntu.com/17183588/

Changed in curtin:
status: Incomplete → New
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

I think I've attached the requested info for curtin, so I set the bug back to new.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

These are the curtin versions on the MAAS server:

ii curtin-common 0.1.0~bzr389-0ubuntu1~16.04.1 all Library and tools for curtin installer
ii python3-curtin 0.1.0~bzr389-0ubuntu1~16.04.1 all Library and tools for curtin installer

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Attached /tmp/install.log

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Attached maas logs from the MAAS server.

Revision history for this message
Ryan Harper (raharper) wrote :

Thanks for the update. So far here's what I've seen.

the storage config for the node isn't quite right.

First the nvme devices is marked as grub_boot device, but it's not being used at the root; that's typically what we see. That itself isn;'t fatal, in this case the fatal error is the fact that the node is booting under UEFI (curtin is installing grub-efi packages) and then that fails to work as the storage config does not contain a /boot/efi partition and mountpoint. This typically happens if the node hasn't been commissioned after upgrading to a maas with custom storage config; or generally if the node is out of sync w.r.t maas's view of it's uefi status.

For the reporting/event/install log bug, we can see that the curtin config sent includes a reporting configuration:

install:
  log_file: /tmp/install.log
  post_files: [/tmp/install.log]
reporting:
  maas: {consumer_key: XfeDkMX7pqfMWRx8bL, endpoint: 'http://10.246.64.50/MAAS/metadata/status/4y3h7p',
    token_key: w3M3BpvTecqBGxKCS6, token_secret: LYF5FwgGHwkPN79xWnakTM3uzwQCLqKj,
    type: webhook}

After examining the maas regiond log, we can see that curtin does send events back to maas:

016-06-10 19:45:22 [-] 127.0.0.1 - - [10/Jun/2016:19:45:21 +0000] "POST /MAAS/metadata/status/4y3h7p HTTP/1.1" 200 2 "-" "python-requests/2.9.1"
2016-06-10 19:45:22 [-] 127.0.0.1 - - [10/Jun/2016:19:45:21 +0000] "POST /MAAS/metadata/status/4y3h7p HTTP/1.1" 200 2 "-" "python-requests/2.9.1"
2016-06-10 19:45:23 [-] 127.0.0.1 - - [10/Jun/2016:19:45:22 +0000] "POST /MAAS/metadata/status/4y3h7p HTTP/1.1" 200 2 "-" "Curtin/0.1"
2016-06-10 19:45:23 [-] 127.0.0.1 - - [10/Jun/2016:19:45:22 +0000] "POST /MAAS/metadata/status/4y3h7p HTTP/1.1" 200 2 "-" "Curtin/0.1"

The python-requests post is from cloud-init; which IIUC, those messages are being returned
the 'Curtin/0.1' are from curtin but are not appearing in maas UI.

At this point, I don't think there is a curtin bug here.

Changed in curtin:
status: New → Invalid
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Actually, This is not a MAAS issue. It is a curtin issue and a regression in curtin 0.1.0~bzr389-0ubuntu1~16.04.1. This doesn't necessarily affect failed deployments, but also successful deployments.

I downgraded to curtin 0.1.0~bzr365-0ubuntu1 on the *same* MAAS, and after a successful deployment, install.log was present. Upgraded again to curtin 0.1.0~bzr389-0ubuntu1~16.04.1, and now the install.log wasn't present. Tested this behavior with different machines, same MAAS, different curtin.

Changed in maas:
status: Triaged → Invalid
Changed in curtin:
status: Invalid → New
Revision history for this message
Mike Pontillo (mpontillo) wrote :

I was in the middle of doing a packet capture to try to troubleshoot the issue before Andres determined it was "Invalid" for MAAS.

Here's a packet capture that shows Curtin's last words on the network, in case it helps:

https://paste.ubuntu.com/17190232/

This was with curtin bzr389.

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1590846] Re: [2.0b6] Installation log of a failed install not sent back to MAAS

On Fri, Jun 10, 2016 at 5:49 PM, Mike Pontillo <email address hidden>
wrote:

> I was in the middle of doing a packet capture to try to troubleshoot the
> issue before Andres determined it was "Invalid" for MAAS.
>
> Here's a packet capture that shows Curtin's last words on the network,
> in case it helps:
>
> https://paste.ubuntu.com/17190232/
>
> This was with curtin bzr389.
>

I don't understand. The trace clearly shows curtin posting events in
detail back to maas.
Why aren't these shown?

Do we have a trace from the previous curtin which succeeds?

I do know that curtin now sends

level: "INFO"

in the data. where prior versions didn't include the level key.

I just don't see how if curtin is posting this too maas and what appears
from the trace is that maas is *getting* the data, the even, how this is a
curtin issue?

Revision history for this message
Ryan Harper (raharper) wrote : Re: [2.0b6] Installation log of a failed install not sent back to MAAS

Unless I'm misreading the packet dump, the events are clearly posted to maas and being decoded as a curtin event with curtin details. Why maas isn't reading/storing/displaying them is not something that curtin controls.

I don't want to bat this back and forth but the very fact that maas accepts the POSTed data and it returns OK clearly means the data is arriving at the owner of the reporting URL. It *cannot* be curtin at fault; it's sending the data; full-stop.

Changed in curtin:
status: New → Invalid
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Ryan,

As i said, curtin in Xenial (main/released) does send the install log. Curtin in Xenial-updates does NOT Send the install log.

Note that sending *messages* vs the *install.log* is different. This is a regression in curtin because the *new* version of curtin in Xenial-updates results in no install.log being stored.

Changed in curtin:
status: Invalid → New
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Hi Ryan,

Let me explain this better:

When I use MAAS 2.0 with curtin 0.1.0~bzr365-0ubuntu1 installed, the following happens:

1. I deploy a machine
2. MAAS receives status messages from both, curtin/clout-init.
3. The machine installs successfully.
4. After the install, install.log is stored.
5. Full list of events/status messages: http://pastebin.ubuntu.com/17218366/
6. Logs on regiond.log (ignore the difference on the date here, because logs time is UTC): http://paste.ubuntu.com/17218488/

Now, if I use the /same/ MAAS with curtin 0.1.0~bzr389-0ubuntu1~16.04.1, the following happens:

1. I deploy a machine
2. MAAS receives status messages from both, curtin/cloud-init
3. The machine installs successfully.
4. After the install, install.log is NOT stored.
   - Full list of events/status messages: http://pastebin.ubuntu.com/17219039/
   - Logs on regiond.log: http://paste.ubuntu.com/17218981/

While I have not done a packet capture like Mike has, I do see that Mike's packet capture correctly reflects what's currently happening. Curtin is sending status messages/events, which are correctly stored, *however* it does not seem to be sending 'install.log' at all. So this is definitely a regression in curtin's side. The simple fact that with curtin 0.1.0~bzr365-0ubuntu1 install.log is stored, and with curtin 0.1.0~bzr389-0ubuntu1~16.04.1 install.log is not stored, confirms this.

Hope this helps. I can try run a packet capture on Monday to see the differences in both if that helps.

summary: - [2.0b6] Installation log of a failed install not sent back to MAAS
+ [2.0b6] Installation log is not sent back to MAAS
description: updated
Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1590846] Re: [2.0b6] Installation log of a failed install not sent back to MAAS

On Sat, Jun 11, 2016 at 12:13 PM, Andres Rodriguez <email address hidden>
wrote:

> Hi Ryan,
>
> Let me explain this better:
>
> When I use MAAS 2.0 with curtin 0.1.0~bzr365-0ubuntu1 installed, the
> following happens:
>
> 1. I deploy a machine
> 2. MAAS receives status messages from both, curtin/clout-init.
> 3. The machine installs successfully.
> 4. After the install, install.log is stored.
> 5. Full list of events/status messages:
> http://pastebin.ubuntu.com/17218366/
> 6. Logs on regiond.log (ignore the difference on the date here, because
> logs time is UTC): http://paste.ubuntu.com/17218488/
>
> Now, if I use the /same/ MAAS with curtin 0.1.0~bzr389-0ubuntu1~16.04.1,
> the following happens:
>
> 1. I deploy a machine
> 2. MAAS receives status messages from both, curtin/cloud-init
> 3. The machine installs successfully.
> 4. After the install, install.log is NOT stored.
> - Full list of events/status messages:
> http://pastebin.ubuntu.com/17219039/
> - Logs on regiond.log: http://paste.ubuntu.com/17218981/
>
> While I have not done a packet capture like Mike has, I do see that
> Mike's packet capture correctly reflects what's currently happening.
> Curtin is sending status messages/events, which are correctly stored,
> *however* it does not seem to be sending 'install.log' at all. So this
> is definitely a regression in curtin's side. The simple fact that with
> curtin 0.1.0~bzr365-0ubuntu1 install.log is stored, and with curtin
> 0.1.0~bzr389-0ubuntu1~16.04.1 install.log is not stored, confirms this.
>

Thanks for the clarification. That helps me focus on the test-case needed
here, the post_files portion.

I'll setup a test-case and see where we go from there.

Revision history for this message
Ryan Harper (raharper) wrote :

It looks like when we added the level key support, this also applied some implicit event
filtering where the install log was considered DEBUG level with no reporting level set
it defaults to INFO which prevented curtin from publishing the post_files event.

A quick test on the maas side would update the following dictionary:

reporting:
  maas:
     level: DEBUG

A complete install config would look like this.

install:
    log_file: /tmp/install.log
    post_files:
    - /tmp/install.log
reporting:
    maas:
        consumer_key: XfeDkMX7pqfMWRx8bL
        endpoint: http://10.246.64.50/MAAS/metadata/status/4y3h7p
        token_key: w3M3BpvTecqBGxKCS6
        token_secret: LYF5FwgGHwkPN79xWnakTM3uzwQCLqKj
        type: webhook
        level: DEBUG

Revision history for this message
Ryan Harper (raharper) wrote :

I can reproduce this issue by setting the install config with a post_files setting and observe the output not including the requested file. A workaround is to include a level value in the reporting dictionary set to the value of 'DEBUG'.

Curtin will likely default the level to DEBUG to retain sending all information to MAAS, which can use the level value in the events to filter as they like.

Changed in curtin:
status: New → Confirmed
Revision history for this message
Ryan Harper (raharper) wrote :

One-liner to have maas emit a reporting config to curtin with level: DEBUG.

Ryan Harper (raharper)
tags: added: curtin-sru
Changed in curtin:
status: Confirmed → Fix Committed
Revision history for this message
Ryan Harper (raharper) wrote :
description: updated
Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Hello Andres, or anyone else affected,

Accepted curtin into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr399-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in curtin (Ubuntu):
status: New → Fix Released
Changed in curtin (Ubuntu Xenial):
status: New → Fix Committed
tags: added: verification-needed
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

I verified this fixes the bug with package version 0.1.0~bzr399-0ubuntu1~16.04.1

tags: added: verification-done
removed: verification-needed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Andres, or anyone else affected,

Accepted curtin into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr399-0ubuntu1~14.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in curtin (Ubuntu Trusty):
status: New → Fix Committed
tags: removed: verification-done
tags: added: verification-needed
Revision history for this message
Andres Rodriguez (andreserl) wrote :

I've tested this and it works. MArking verification-done.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr399-0ubuntu1~16.04.1

---------------
curtin (0.1.0~bzr399-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * debian/new-upstream-snapshot: fix for specifying revision.
  * SRU current curtin
    - curtin/net: fix inet value for subnets, don't add interface attributes
      to alias (LP: #1588547)
    - improve net-meta network configuration (LP: #1592149)
    - reporting: set webhook handler level to DEBUG, no filtering
      (LP: #1590846)
    - tests/vmtests: add yakkety, remove vivid
    - curtin/net: use post-up for interface alias, resolve 120 second time out
      on Trusty when using interface aliases
    - vmtest: provide info on images used
    - fix multipath configuration and add multipath tests (LP: #1551937)
    - tools/launch and tools/xkvm: whitespace cleanup and bash -x
    - tools/launch: boot by root=LABEL=cloudimg-rootfs
    - Initial vmtest power8 support and TestSimple test.

 -- Ryan Harper <email address hidden> Tue, 12 Jul 2016 11:29:30 -0500

Changed in curtin (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for curtin has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (6.5 KiB)

This bug was fixed in the package curtin - 0.1.0~bzr399-0ubuntu1~14.04.1

---------------
curtin (0.1.0~bzr399-0ubuntu1~14.04.1) trusty-proposed; urgency=medium

  * SRU current curtin
    - curtin/net: fix inet value for subnets, don't add interface attributes
      to alias (LP: #1588547)
    - improve net-meta network configuration (LP: #1592149)
    - reporting: set webhook handler level to DEBUG, no filtering
      (LP: #1590846)
    - tests/vmtests: add yakkety, remove vivid
    - curtin/net: use post-up for interface alias, resolve 120 second time out
      on Trusty when using interface aliases
    - vmtest: provide info on images used
    - fix multipath configuration and add multipath tests (LP: #1551937)
    - tools/launch and tools/xkvm: whitespace cleanup and bash -x
    - tools/launch: boot by root=LABEL=cloudimg-rootfs
    - Initial vmtest power8 support and TestSimple test.

curtin (0.1.0~bzr389-0ubuntu1~14.04.1) trusty-proposed; urgency=medium

  * New upstream snapshot.
    * Detect and remove legacy /etc/network/interfaces.d/eth0.cfg from
      target (LP: #1577872)

curtin (0.1.0~bzr387-0ubuntu1~14.04.1) trusty-proposed; urgency=medium

  * sru current curtin (LP: #1577872)
  * debian/new-upstream-snapshot, debian/README.source: add
    new-upstream-snapshot and mention it in README.source
  * debian/control: drop python from curtin-common Depends.
     remove unnecessary Depends on util-linux as it is essential.
     python3-curtin, python-curtin: drop unnecessary 'curl' from Depends.
     python3-curtin, python-curtin: list oauthlib and yaml Depends
  * debian/control: add bcache-tools to curtin Depends.
  * New upstream snapshot.
    - fix timestamp not being updated in reported events
    - mdadm: resolve mdadm/bcache and trusty+hwe issues
    - fix support for 4k disks
    - emit source /etc/network/interfaces.d/*.cfg in
      rendered /etc/network/interfaces
    - net: introduce 'control' field to network configuration to allow
      for declaring manual controlled interfaces
    - disable cloud-init networking as curtin is the source of network config
    - block: wipe_volume improvements
    - reporter: enhance reporting events to include levels and
      improve usefullness of messages
    - network: add bonding tests and cleanup newline rendering
    - block: fix partition path issue with nvme devices
    - fix logic error in kernel installation
    - block: add debug regarding raid modules being missing on mdadm create
    - add s390x support to curtin and vmtest
    - support build on xenial where python3 pyflakes is split out
    - fix uefi install path on nvme devices
    - numerous unit tests and vmtests improvements. Add running
      of pylint for static checking.
    - Add bond parsing & improved source, source-directory parsing
      of /etc/network/interfaces.
    - move global dns-* options under auto lo in /etc/network/interfaces
    - partitioning: limited support for odd ordering of partition numbers
    - change use of mkfs.fat to mkfs.vfat and add dependency.
    - block-meta: use removable devices if no non-removable devices are
      found [Robert Clark]
    - Improve 'curtin mkfs' and move mkfs...

Read more...

Changed in curtin (Ubuntu Trusty):
status: Fix Committed → Fix Released
Changed in curtin:
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers