nailgun-api log is too verbose

Bug #1393148 reported by Mike Scherbakov
20
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Won't Fix
High
Anton Chevychalov
8.0.x
Won't Fix
High
MOS Maintenance
Mitaka
Won't Fix
High
Anton Chevychalov
Newton
Won't Fix
High
Fuel Sustaining
Ocata
Won't Fix
High
Anton Chevychalov

Bug Description

See similar issue from nailgun-agent side: https://bugs.launchpad.net/fuel/+bug/1393115

api.log became almost 1Mb in 1.5h, which is way too much. Being archived into diag snapshot then, it makes snapshot too large and not very useful for support team.
Analysis of log file shows that it is mostly polluted by data from nailgun-agent, like:
2014-11-16 00:53:17.716 DEBUG [7fb4b2b95740] (logger) Request PUT /api/nodes/agent/ from 172.17.42.1:39247 {"manufacturer":"QEMU","os_platform":"centos","mac":"64:10:F4:76:B1:6E","is_agent":true,"agent_checksum":"fdcb2f84cfef53e8e7a97479b3e10cc4c6835292","platform_name":"Standard PC (i440FX + PIIX, 1996)","meta":{"disks":[{"model":null,"disk":"disk/by-path/pci-0000:00:0a.0-virtio-pci-virtio7","removable":"0","size":53687091200,"extra":[],"name":"vdc"},{"model":null,"disk":"disk/by-path/pci-0000:00:09.0-virtio-pci-virtio6","removable":"0","size":53687091200,"extra":[],"name":"vdb"},{"model":null,"disk":"disk/by-path/pci-0000:00:08.0-virtio-pci-virtio5","removable":"0","size":53687091200,"extra":[],"name":"vda"}],"memory":{"devices":[{"type":"RAM","size":2147483648}],"maximum_capacity":2147483648,"slots":1,"total":2147483648},"cpu":{"total":1,"real":1,"spec":[{"model":"Intel Xeon E312xx (Sandy Bridge)","frequency":3500}]},"system":{"manufacturer":"QEMU","version":"pc-i440fx-trusty","fqdn":"node-3.test.domain.local","product":"Standard PC (i440FX + PIIX, 1996)"},"interfaces":[{"mac":"64:7B:AA:D8:9F:E2","state":"up","current_speed":null,"name":"eth4"},{"mac":"64:DF:C3:24:6A:A4","state":"up","current_speed":null,"name":"eth3"},{"mac":"64:13:7C:70:EB:73","state":"up","current_speed":null,"name":"eth2"},{"mac":"64:59:E3:6F:3A:F9","state":"up","current_speed":null,"name":"eth1"},{"mac":"64:10:F4:76:B1:6E","netmask":"255.255.255.0","state":"up","current_speed":null,"name":"eth0","ip":"10.108.0.4"}]},"ip":"10.108.0.4"}

If we already log this data on nailgun-agent side, I propose to remove this verbosity and not to log data structure which comes from agent.
Instead, we have a tricky code which works on Nailgun side, and which updates internal data structure only if it changed. I propose to log only that moment: changed data, not everything.

Revision history for this message
Mike Scherbakov (mihgen) wrote :
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Ivan Kliuk (ivankliuk)
Changed in fuel:
importance: Critical → High
Changed in fuel:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/137053

Revision history for this message
Łukasz Oleś (loles) wrote :

This data are not logged on nailgun-agent side. If you remove it from api we will not see what agent is sending.
Yesterday I was helping support team and I used this data to solve the problem. Without this logs I couldn't solve it.

I see three ways to fix this problem:

1) fix it in shotgun. Do not download all logs. We usually just need data from last week
2) fix agent to not send so many data. Do wee need it all?
3) log this data on nodes (this lead to hacks on api side)

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 6.0 → 6.1
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: In Progress → Confirmed
Changed in fuel:
assignee: Ivan Kliuk (ivankliuk) → Fuel Python Team (fuel-python)
Nikolay Markov (nmarkov)
Changed in fuel:
status: Confirmed → Triaged
Dmitry Pyzhov (dpyzhov)
tags: added: feature-logging
removed: nailgun
Changed in fuel:
milestone: 6.1 → 7.0
tags: added: qa-agree-7.0
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Ivan Kliuk (ivankliuk)
status: Triaged → In Progress
Changed in fuel:
assignee: Ivan Kliuk (ivankliuk) → Fuel Python Team (fuel-python)
Changed in fuel:
status: In Progress → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (master)

Change abandoned by Dmitry Pyzhov (<email address hidden>) on branch: master
Review: https://review.openstack.org/137053
Reason: No activity for more than a month

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Ivan Kliuk (ivankliuk)
Changed in fuel:
assignee: Ivan Kliuk (ivankliuk) → Fuel Python Team (fuel-python)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Sergey Vilgelm (sergey.vilgelm)
Revision history for this message
Ihor Kalnytskyi (ikalnytskyi) wrote :

I don't think it's really important for 7.0. Let's move it to 8.0, because we can remove something important by accident here.

Moreover, we shouldn't remove API input. We definitely should log it.

Changed in fuel:
milestone: 7.0 → 8.0
assignee: Sergey Vilgelm (sergey.vilgelm) → Fuel Python Team (fuel-python)
Dmitry Pyzhov (dpyzhov)
tags: added: feature
Changed in fuel:
importance: High → Medium
status: Confirmed → Triaged
Dmitry Pyzhov (dpyzhov)
tags: added: area-python
Changed in fuel:
milestone: 8.0 → 9.0
Mike Scherbakov (mihgen)
tags: added: customer-found
Revision history for this message
Mike Scherbakov (mihgen) wrote :

Just hit this one again. 20 nodes deployment, and default configuration of logrotate for nailgun api.log lasts for a ===few hours=== with 4 archived copies of log file, each 14Mb in size.
For production installation configuration, I'd expect to see nailgun log for at least a few months.

Without this log, I spent a few hours trying to debug an issue, and was essentially unable to do so. As this severely affects operations and any troubleshooting, I raised priority to High.

Revision history for this message
Mike Scherbakov (mihgen) wrote :

Added customer-found tag as well.

Revision history for this message
Roman Rufanov (rrufanov) wrote :

customer found on 8.0, adding milestone

no longer affects: fuel/newton
Revision history for this message
Rodion Tikunov (rtikunov) wrote :

Reopened for 9.0 as it was closed as won't fix by Fuel Devops McRobotson.

Revision history for this message
JohnsonYi (yichengli) wrote :

The nailgun api log size increased two fast in fuel 8.0.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/392528

Revision history for this message
Anton Chevychalov (achevychalov) wrote :

I prepared commit to stable/mitaka based on the following idea.

Problem:

1. api.log rotate to fast due to messages with node information.
2. we can't disable massage storing because it is useful information.

Solution:

1. Create separate log for storing nodes information from nailgun-agent (see commit above).
2. Create logrotate config for new log.
3. Change snapshot generation logic to restrict amount of information from new log to something like 5k lines.

Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Retargeted to 9.3 as more time is needed to get the patch reviewed and merged.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/425239

Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Anton Chevychalov (achevychalov)
status: Triaged → In Progress
Revision history for this message
Denis Meltsaykin (dmeltsaykin) wrote :

Setting as Won't Fix since it would take a lot of effort and doesn't fit the maintenance policy. Moreover, Nailgun has extensive log rotation in the fuel's logrotate settings.

Changed in fuel:
status: In Progress → Won't Fix
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (master)

Change abandoned by Andreas Jaeger (<email address hidden>) on branch: master
Review: https://review.opendev.org/425239
Reason: This repo is retired now, no further work will get merged.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (stable/mitaka)

Change abandoned by Andreas Jaeger (<email address hidden>) on branch: stable/mitaka
Review: https://review.opendev.org/392528
Reason: This repo is retired now, no further work will get merged.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.