[RFE] Collect deployment logs from IPA

Bug #1587143 reported by Lucas Alvares Gomes
This bug affects 3 people
Affects              Status        Importance  Assigned to          Milestone
Ironic               Fix Released  Wishlist    Lucas Alvares Gomes
ironic-python-agent  Fix Released  Wishlist    Lucas Alvares Gomes

Bug Description

Link to the spec: https://review.openstack.org/#/c/323511/

Problem
-------

Currently, there are few ways to access the logs from the IPA ramdisk when a deployment fails, and none of them is easy to use or intended for production use. For example:

One could keep a console session open and watch the logs there. While this works, it's hard to use because we don't know which node will be picked by the scheduler at deployment time. Also, not all drivers support a console.

Another way is to disable powering off a node upon a deployment failure [0]. This method has some problems per se:

0) It does not work in conjunction with nova: nova will call destroy() on the virt driver upon a failure, which will power the node off in Ironic.

1) Leaving the nodes powered on after a failure is not desirable in some deployments.

Proposal
--------

This RFE introduces the work to retrieve the IPA system logs via its API and upload them to Swift.

Changes in IPA
~~~~~~~~~~~~~~

A new extension called "log" would be added to IPA. This extension will introduce a new command called "collect_system_logs", which will collect the logs from the system, gzip them, base64-encode the resulting binary and return it as a string.

The logs will be collected from journald; if it is not present, we should fall back to getting the logs from the /var/log/* folder as well as dmesg and so on.
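
As a rough illustration of the collect, gzip and base64 steps described above, here is a minimal sketch using only the Python standard library. The function names (gather_raw_logs, encode_logs) are hypothetical and are not taken from the IPA code; the /var/log/* fallback is left out for brevity, so only journald and dmesg are covered.

```python
import base64
import gzip
import shutil
import subprocess


def gather_raw_logs():
    """Return raw log bytes from journald if present, otherwise from dmesg.

    Hypothetical helper: the real extension may gather more sources.
    """
    if shutil.which("journalctl"):
        cmd = ["journalctl", "--no-pager", "--full"]
    else:
        cmd = ["dmesg"]
    try:
        return subprocess.run(cmd, capture_output=True, check=False).stdout
    except OSError:
        # Neither tool could be executed; return nothing rather than fail.
        return b""


def encode_logs(raw):
    """Gzip the raw bytes and return them as a base64-encoded ASCII string."""
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


if __name__ == "__main__":
    # The resulting string is what would travel back through the agent API.
    print(len(encode_logs(gather_raw_logs())))
```

Base64 makes the compressed payload safe to embed in a JSON response, at the cost of roughly one third more bytes than the gzip output itself.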

Changes in Ironic
~~~~~~~~~~~~~~~~~

The new IPA method will be invoked upon a node deployment failure. If the command is not supported, a warning message will be logged to alert the operator.

Two new configuration options will be added to Ironic:

0) "agent_retrieve_logs_on_deploy_failure": (Boolean) If True, retrieve the logs from IPA when the deployment fails. Defaults to False.

1) "agent_logs_swift_container": (String) Name of the Swift container to store the deployment logs.
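
The conductor-side flow proposed above could look roughly like the sketch below. Everything here is an assumption made for illustration: the CONF dictionary stands in for the real configuration machinery, the container name is a placeholder (the RFE does not state a default), and run_agent_command, upload and CommandNotSupported are hypothetical stand-ins for the agent client, the Swift client and the "command not supported" error.

```python
import base64
import logging

LOG = logging.getLogger(__name__)

# Stand-in for the two proposed options; the container name is a placeholder.
CONF = {
    "agent_retrieve_logs_on_deploy_failure": False,
    "agent_logs_swift_container": "example_container",
}


class CommandNotSupported(Exception):
    """Hypothetical error raised when the ramdisk lacks the command."""


def collect_logs_on_failure(node_uuid, run_agent_command, upload, conf=CONF):
    """Fetch the logs from the agent and hand them to an uploader.

    Returns the stored object name, or None if nothing was stored.
    """
    if not conf["agent_retrieve_logs_on_deploy_failure"]:
        return None
    try:
        encoded = run_agent_command("log.collect_system_logs")
    except CommandNotSupported:
        LOG.warning("Ramdisk for node %s does not support "
                    "collect_system_logs; no logs retrieved", node_uuid)
        return None
    # The agent returns base64 text; decode it back to the gzipped bytes.
    data = base64.b64decode(encoded)
    object_name = "%s.gz" % node_uuid
    upload(conf["agent_logs_swift_container"], object_name, data)
    return object_name
```

Keeping the upload step behind a callable means the agent never needs Swift credentials, and the storage backend can be swapped without touching the collection logic.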

[0] https://review.openstack.org/#/c/259119/

Tags: rfe-approved
Changed in ironic:
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
importance: Undecided → Wishlist
tags: added: rfe
description: updated
Revision history for this message
Mathieu Mitchell (mat128) wrote :

Would there be a possibility to avoid providing IPA with credentials? Ironic currently gives IPA a tempurl for image download via Swift and uses unauthenticated means of doing the lookup and heartbeat.

Is there a possibility to provide a tempurl to upload a Swift object?
Could Ironic query IPA for its logs and upload them to Swift, or simply log them itself, avoiding Swift? What amount of logs are we looking at?

Revision history for this message
Lucas Alvares Gomes (lucasagomes) wrote :

Hi Mathieu,

Thanks for reading the RFE.

So yes, no credentials are passed to IPA: the logs will be passed from the ramdisk to Ironic via the IPA API, and Ironic is responsible for uploading them to Swift. This way we could even extend this behavior in the future, e.g., we may want to just save the logs locally on the conductor instead of uploading them to Swift.

Revision history for this message
Mathieu Mitchell (mat128) wrote :

Very interesting then :) I was concerned with the security aspect. Thanks for your quick reply.
By the way (I didn't mention it in my first reply), this is a very interesting feature, and it would really save us in production from getting onto the machines to look at the logs :)

Revision history for this message
Lucas Alvares Gomes (lucasagomes) wrote :
description: updated
Dmitry Tantsur (divius)
Changed in ironic:
status: New → Confirmed
Revision history for this message
Lucas Alvares Gomes (lucasagomes) wrote :

setting rfe-approved since the spec is already merged

summary: - [RFE] Collect logs from IPA on deploy failure
+ [RFE] Collect deployment logs from IPA
tags: added: rfe-approved
removed: rfe
Changed in ironic-python-agent:
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
importance: Undecided → Wishlist
Dmitry Tantsur (divius)
Changed in ironic-python-agent:
status: New → Confirmed
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/336102

Changed in ironic:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic-python-agent (master)

Reviewed: https://review.openstack.org/248832
Committed: https://git.openstack.org/cgit/openstack/ironic-python-agent/commit/?id=af81914ce7309b23edcccc030d40f59f98fc258e
Submitter: Jenkins
Branch: master

commit af81914ce7309b23edcccc030d40f59f98fc258e
Author: Lucas Alvares Gomes <email address hidden>
Date: Mon May 30 17:39:13 2016 +0100

    Add a log extension

    The log extension is responsible for retrieving logs from the system.
    If journalctl is present the logs will come from it, otherwise we
    fall back to getting the logs from the /var/log directory + dmesg logs.

    In the coreos ramdisk, we need to bind mount /run/log in the container
    so the IPA service can have access to the journal.

    For the tinyIPA ramdisk, the logs from IPA are now being redirected to
    /var/logs/ironic-python-agent.log instead of only going to the default
    stdout.

    Inspector now shares the same method of collecting logs, extending its
    capabilities for non-systemd systems.

    Partial-Bug: #1587143
    Change-Id: Ie507e2e5c58cffa255bbfb2fa5ffb95cb98ed8c4

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/336102
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=cd7507f04b309383629ef83b7fec478128918ec1
Submitter: Jenkins
Branch: master

commit cd7507f04b309383629ef83b7fec478128918ec1
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed Jun 29 16:47:16 2016 +0100

    Collect deployment logs from IPA

    This patch adds the code to collect the deployment logs from the IPA
    ramdisk. The logs can be collected for every deployment, upon a failure
    or never. By default, logs are collected upon a failure.

    After collection, logs can be stored either in the local filesystem
    (default) or in Swift.

    If an error occurs when the logs are being collected or stored, or if
    the ramdisk does not support the collect_system_logs command, Ironic
    will log an error message, but the deployment will proceed.

    Documentation on how to enable and other configuration will be done in a
    subsequent patch.

    Partial-Bug: #1587143
    Change-Id: I6da1110daa94ea25670f71f9862e51cc9bbc6f93

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/352483

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/352483
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=00df0890f00e4081a598c73b9bebdaf3d4ba6869
Submitter: Jenkins
Branch: master

commit 00df0890f00e4081a598c73b9bebdaf3d4ba6869
Author: Lucas Alvares Gomes <email address hidden>
Date: Mon Aug 8 16:26:59 2016 +0100

    Document retrieving logs from the deploy ramdisk

    This patch is documenting how operators can configure Ironic to be able
    to retrieve the logs from the deploy ramdisk (or disable it).

    Closes-Bug: #1587143
    Change-Id: I233e925f4dd9a1aa04a722eb852a6f95c74603f2

Changed in ironic:
status: In Progress → Fix Released
Dmitry Tantsur (divius)
Changed in ironic-python-agent:
status: In Progress → Fix Released