Wrong Logging level for "WARNING heat.engine.service [-] Stack create failed, status FAILED"

Bug #1346755 reported by Venu Murthy
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Heat
Fix Released
Low
Angus Salkeld

Bug Description

Hello!

The log level for a critical error such as "heat.engine.service [-] Stack create failed, status FAILED" is set to Warning State currently.

But looking at the severity of the issue, i.e. the entire stack fails creation, no matter how many other resources have spun up well, one such failure is as good as the stack not being created and not being available.

Taking the definitions of log level from here, http://stackoverflow.com/questions/2031163/when-to-use-log-level-warn-vs-error

this error should fall in the Error or Fatal category!

Tags: error heat
Revision history for this message
Venu Murthy (venu-murthy) wrote :

a snippet of the error

space/rackspace/resources/meghdoot/remote_utils.py", line 173, in execute_remote_command
2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource mode=700)
2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource File "/opt/stack/heat/contrib/rackspace/rackspace/resources/meghdoot/remote_utils.py", line 53, in wrapper
2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource if (not remote.sftp_connection.sock.closed) and close_on_error:
2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource File "/opt/stack/heat/contrib/rackspace/rackspace/resources/meghdoot/remote_utils.py", line 106, in sftp_connection
2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource self.private_key)
2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource File "/opt/stack/heat/contrib/rackspace/rackspace/resources/meghdoot/remote_utils.py", line 92, in get_sftp_connection
2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource transport = paramiko.Transport((host, 22))
2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource File "/usr/lib/python2.6/site-packages/paramiko/transport.py", line 288, in __init__
2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource for (family, socktype, proto, canonname, sockaddr) in socket.getaddrinfo(hostname, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource gaierror: [Errno -2] Name or service not known
2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource
2014-07-22 11:04:34.499 18696 WARNING heat.engine.service [-] Stack create failed, status FAILED

Revision history for this message
Steven Hardy (shardy) wrote :

From an operator perspective, the error is a non fatal error - the service is still working and available, there's not been any internal error which is unrecoverable - we've just failed to create a stack, which is a normal and expected outcome in certain situations, it can happen for many reasons, many of which are a result of user mistakes in their template.

So personally I think WARNING or even INFO is perfectly acceptable as a log level for this, given that logs are for consumption by operators (e.g the owner of the cloud) not end users (to whom the create failed state may be a fatal failure, but they have to fix whatever the issue is and try again, they never see any of these logs so the log level is irrelevant to them)

Changed in heat:
status: New → Incomplete
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Excerpts from Venu Murthy's message of 2014-07-22 05:54:39 UTC:
> a snippet of the error
>
> space/rackspace/resources/meghdoot/remote_utils.py", line 173, in execute_remote_command
> 2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource mode=700)
> 2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource File "/opt/stack/heat/contrib/rackspace/rackspace/resources/meghdoot/remote_utils.py", line 53, in wrapper
> 2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource if (not remote.sftp_connection.sock.closed) and close_on_error:
> 2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource File "/opt/stack/heat/contrib/rackspace/rackspace/resources/meghdoot/remote_utils.py", line 106, in sftp_connection
> 2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource self.private_key)
> 2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource File "/opt/stack/heat/contrib/rackspace/rackspace/resources/meghdoot/remote_utils.py", line 92, in get_sftp_connection
> 2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource transport = paramiko.Transport((host, 22))
> 2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource File "/usr/lib/python2.6/site-packages/paramiko/transport.py", line 288, in __init__
> 2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource for (family, socktype, proto, canonname, sockaddr) in socket.getaddrinfo(hostname, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
> 2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource gaierror: [Errno -2] Name or service not known
> 2014-07-22 11:04:34.270 18696 TRACE heat.engine.resource
> 2014-07-22 11:04:34.499 18696 WARNING heat.engine.service [-] Stack create failed, status FAILED
>

The error 'Name or service not known' should be the "reason" for the
event which makes this resource's status change to FAILED. The user
would see it there.

This particular error, I suspect, is bubbling too far up and it should be
an INFO level log, not WARNING. For the operator, it is entirely normal,
but we do like to log "a unit of work" in INFO, and failing a unit of
work lands in that I think.

Revision history for this message
Venu Murthy (venu-murthy) wrote : Re: [Bug 1346755] Re: Wrong Logging level for "WARNING heat.engine.service [-] Stack create failed, status FAILED"

Thanks for the mail Steven,
In my understanding a stack is an environment and even in a simple
environment with a LB, 2 DB server and 2 webservers, one of them not being
created is as good as the entire environment being of no use, which much
deserves the level of being a Fatal or at least an Error level in the log,
as not getting a full stack in the example such as above is as good as not
having the entire environment.

Kindly refer to the snippet that I had enclosed, falling back on a Default
value is also treated as a Warning and the entire stack not getting created
has also been treated with the same level?

Best regards,
Venu

[image: ThoughtWorks] <http://www.thoughtworks.com/>
---
*“Excellence is never an accident. It is always the result of high
intention, sincere effort, and intelligent execution.... ― Aristotle*

On Tue, Jul 22, 2014 at 7:32 PM, Steven Hardy <email address hidden> wrote:

> >From an operator perspective, the error is a non fatal error - the
> service is still working and available, there's not been any internal
> error which is unrecoverable - we've just failed to create a stack,
> which is a normal and expected outcome in certain situations, it can
> happen for many reasons, many of which are a result of user mistakes in
> their template.
>
> So personally I think WARNING or even INFO is perfectly acceptable as a
> log level for this, given that logs are for consumption by operators
> (e.g the owner of the cloud) not end users (to whom the create failed
> state may be a fatal failure, but they have to fix whatever the issue is
> and try again, they never see any of these logs so the log level is
> irrelevant to them)
>
> ** Changed in: heat
> Status: New => Incomplete
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1346755
>
> Title:
> Wrong Logging level for "WARNING heat.engine.service [-] Stack create
> failed, status FAILED"
>
> Status in Orchestration API (Heat):
> Incomplete
>
> Bug description:
> Hello!
>
> The log level for a critical error such as "heat.engine.service [-]
> Stack create failed, status FAILED" is set to Warning State currently.
>
> But looking at the severity of the issue, i.e. the entire stack fails
> creation, no matter how many other resources have spun up well, one
> such failure is as good as the stack not being created and not being
> available.
>
> Taking the definitions of log level from here,
> http://stackoverflow.com/questions/2031163/when-to-use-log-level-warn-
> vs-error
>
> this error should fall in the Error or Fatal category!
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/heat/+bug/1346755/+subscriptions
>

Revision history for this message
Steven Hardy (shardy) wrote :

> deserves the level of being a Fatal or at least an Error level

Again, users can't see logs, so as both Clint and I have now said, even a warning level log is too severe, it should probably be reduced to info level.

The user-visible status is that available via heat event-list <stack>, and heat stack-show <stack>, so if that is not providing sufficient information as to the reason for the failure we should fix that - users can't see logs, so increasing the log level of the engine log message doesn't help (non developer) users at all.

Revision history for this message
Venu Murthy (venu-murthy) wrote :

OK, in that case I think what Clint proposed is good to go with, i.e. to
reduce the log level of some informative logs to Info level.
The bottom line is, the current logging level is a little mixed up and it
will need to be relooked into and set appropriately.

Best regards,
Venu

[image: ThoughtWorks] <http://www.thoughtworks.com/>
---
*“Excellence is never an accident. It is always the result of high
intention, sincere effort, and intelligent execution.... ― Aristotle*

On Thu, Jul 24, 2014 at 6:56 PM, Steven Hardy <email address hidden> wrote:

> > deserves the level of being a Fatal or at least an Error level
>
> Again, users can't see logs, so as both Clint and I have now said, even
> a warning level log is too severe, it should probably be reduced to info
> level.
>
> The user-visible status is that available via heat event-list <stack>,
> and heat stack-show <stack>, so if that is not providing sufficient
> information as to the reason for the failure we should fix that - users
> can't see logs, so increasing the log level of the engine log message
> doesn't help (non developer) users at all.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1346755
>
> Title:
> Wrong Logging level for "WARNING heat.engine.service [-] Stack create
> failed, status FAILED"
>
> Status in Orchestration API (Heat):
> Incomplete
>
> Bug description:
> Hello!
>
> The log level for a critical error such as "heat.engine.service [-]
> Stack create failed, status FAILED" is set to Warning State currently.
>
> But looking at the severity of the issue, i.e. the entire stack fails
> creation, no matter how many other resources have spun up well, one
> such failure is as good as the stack not being created and not being
> available.
>
> Taking the definitions of log level from here,
> http://stackoverflow.com/questions/2031163/when-to-use-log-level-warn-
> vs-error
>
> this error should fall in the Error or Fatal category!
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/heat/+bug/1346755/+subscriptions
>

Revision history for this message
Angus Salkeld (asalkeld) wrote :
Download full text (4.0 KiB)

I'll look at bumping a bunch of the warnings down:

current warings below, I'll be agressive and we can go through them on the review:

heat/engine/environment.py: LOG.warn(_('Removing %(item)s from %(path)s') % {
heat/engine/environment.py: LOG.warn(_('Removing %(item)s from %(path)s') % {
heat/engine/environment.py: LOG.warn(_('Changing %(path)s from %(was)s to %(now)s') % details)
heat/engine/event.py: LOG.warning(_('Duplicating event'))
heat/engine/resource.py: LOG.warn(_('db error %s') % ex)
heat/engine/resource.py: LOG.warning(_("Resource %s does not implement metadata update")
heat/engine/resources/eip.py: LOG.warn(_("Floating IPs not found"))
heat/engine/resources/eip.py: LOG.warn(_("Floating IPs not found"))
heat/engine/resources/eip.py: LOG.warn(_('Skipping association, InstanceId not specified'))
heat/engine/resources/eip.py: LOG.warn(_('Skipping association, resource not specified'))
heat/engine/resources/neutron/neutron.py: LOG.warn(_("failed to fetch resource attributes: %s") % ex)
heat/engine/resources/nova_utils.py: LOG.warning(msg % {'name': server.name,
heat/engine/resources/nova_utils.py: LOG.warning(msg % {'name': server.name,
heat/engine/resources/nova_utils.py: LOG.warn(_('Instance (%(server)s) not found: %(ex)s')
heat/engine/resources/os_database.py: LOG.warning(msg % {'name': self.stack.name,
heat/engine/resources/server.py: LOG.warn(_('Instance (%s) not found') % self.resource_id)
heat/engine/resources/swift.py: LOG.warn(_("Head container failed: %s") % ex)
heat/engine/resources/user.py: LOG.warning(_("Ignoring policy %s, must be string "
heat/engine/resources/user.py: LOG.warning(_("Ignoring policy %s, must be string "
heat/engine/resources/user.py: LOG.warning(_('Error deleting %s - user not found') % str(self))
heat/engine/resources/user.py: LOG.warn(_('could not get secret for %(username)s '
heat/engine/resources/user.py: LOG.warn(_('could not get secret for %(username)s '
heat/engine/resources/volume.py: LOG.warn(_('can not delete volume when in-use'))
heat/engine/resources/volume.py: LOG.warning(_('%s - volume not found') % str(self))
heat/engine/resources/volume.py: LOG.warning(_('%(res)s - %(err)s') % {'res': str(self),
heat/engine/resources/volume.py: LOG.warning(_('%s - volume not found') % str(self))
heat/engine/resources/wait_condition.py: LOG.warning(_("Overwriting Metadata item for id %s!")
heat/engine/service.py: LOG.warn(_('periodic_task db error watch rule removed? %(ex)s')
heat/engine/service.py: LOG.warning(_("Stack create failed, status %s") % stack.status)
heat/engine/service.py: LOG.warning(_("Access denied to resource %s") % resource_name)
heat/engine/service.py: LOG.warn(_('show_watch (all) db error %s') % ex)
heat/engine/service.py: LOG.warn(_('s...

Read more...

Changed in heat:
assignee: nobody → Angus Salkeld (asalkeld)
importance: Undecided → Low
status: Incomplete → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/117085

Changed in heat:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/117085
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=f94db70b719ad1b7cd7ea2e453f6c986d3c90bcc
Submitter: Jenkins
Branch: master

commit f94db70b719ad1b7cd7ea2e453f6c986d3c90bcc
Author: Angus Salkeld <email address hidden>
Date: Wed Aug 27 13:37:04 2014 +1000

    Cut down on warning logs

    - don't log on ignore_not_found()
    - drop some logs from warn to info
    - remove the "no resources in template" log
    - remove the "Duplicating event" log

    Change-Id: Ie4d68b43607336c8c054a32126d6a05ba4e5e15c
    Closes-bug: #1346755

Changed in heat:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in heat:
milestone: none → juno-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
milestone: juno-3 → 2014.2
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.