Some units not reporting swift usage

Bug #1588404 reported by Andreas Hasenack on 2016-06-02
This bug affects 1 person
Affects                      Importance   Assigned to
Landscape Client             Undecided    Simon Poirier
landscape-client (Ubuntu)    Undecided    Andreas Hasenack

Bug Description

landscape-client 16.04~bzr841-0ubuntu0~ubuntu14.04.1

I had a ceph/swift cloud deployment where, for some reason, only one of the three swift units was reporting swift data. The other two were logging this:

2016-06-02 02:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.

The openstack dashboard in landscape was reporting just 1/3 of the swift storage (see screenshot).

swift recon was showing all the storage (also see attached):
Disk usage: space used: 1996472320 of 199605878784
Disk usage: space free: 197609406464 of 199605878784
Disk usage: lowest: 0.11%, highest: 3.48%, avg: 1.00020717434%
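
As a quick sanity check on the recon figures above, here is a minimal sketch that recomputes the totals. It assumes the reported average is simply used space divided by total space, which is an inference from the numbers and not something confirmed from the swift-recon source:

```python
# Figures copied from the swift-recon output quoted above (bytes).
used = 1996472320
free = 197609406464
total = 199605878784

# Used and free space should add up to the total capacity.
assert used + free == total

# Assumption: the reported "avg" is used / total, expressed as a percentage.
avg_percent = used / total * 100
print(f"{avg_percent:.4f}%")
```

The result agrees with the average reported by swift recon, so the cluster-wide numbers are internally consistent and the discrepancy lies in what the landscape clients reported.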

All of this was observed almost half a day after the deployment had finished.

I then decided to restart landscape-client in the foreground, to see if there were any backtraces (that's the usual trick, because backtraces in the swift plugin are lost, see bug #1563565). To my surprise, the swift plugin started reporting data.

Here is the monitor log covering the period when reporting was broken and the period after my restart. I first ran the client in the foreground, and then in the background with a shorter reporting interval:

# grep Swift monitor.log
2016-06-01 22:08:46,449 INFO [MainThread] Registering plugin landscape.monitor.swiftusage.SwiftUsage.
2016-06-01 23:08:46,451 WARNING [MainThread] 1 of 720 expected Swift device usage snapshot events (0.14%) occurred in the last 3600.00s.
2016-06-02 00:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 01:08:46,451 WARNING [MainThread] 0 of 720 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 02:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 03:08:46,451 WARNING [MainThread] 0 of 720 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 04:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 05:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 06:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 07:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 08:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 09:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 10:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 11:08:46,450 WARNING [MainThread] 0 of 720 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 12:08:46,450 WARNING [MainThread] 0 of 719 expected Swift device usage snapshot events (0.00%) occurred in the last 3600.00s.
2016-06-02 12:54:05,236 WARNING [MainThread] 0 of 543 expected Swift device usage snapshot events (0.00%) occurred in the last 2718.79s.
2016-06-02 12:57:02,272 INFO [MainThread] Registering plugin landscape.monitor.swiftusage.SwiftUsage.
2016-06-02 12:57:29,322 INFO [MainThread] 5 of 5 expected Swift device usage snapshot events (100.00%) occurred in the last 27.05s.
2016-06-02 12:58:10,375 INFO [MainThread] Registering plugin landscape.monitor.swiftusage.SwiftUsage.
2016-06-02 13:02:04,883 INFO [MainThread] 46 of 46 expected Swift device usage snapshot events (100.00%) occurred in the last 234.51s.
2016-06-02 13:02:07,217 INFO [MainThread] Registering plugin landscape.monitor.swiftusage.SwiftUsage.
2016-06-02 13:04:07,218 INFO [MainThread] 23 of 24 expected Swift device usage snapshot events (95.83%) occurred in the last 120.00s.
2016-06-02 13:06:07,218 INFO [MainThread] 24 of 23 expected Swift device usage snapshot events (104.35%) occurred in the last 120.00s.

And indeed, after I restarted the clients on the two broken units, it all worked as it should. You can see the jump in the graph in the attached screenshot.

It's not clear how to debug this should it happen in a live system again.
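
One way to at least spot the condition on a live system is to scan monitor.log for coverage summaries that fall below some threshold. The following is only a sketch: the regular expression is derived from the log lines quoted above, and the function names and the 50% threshold are hypothetical choices, not part of landscape-client.

```python
import re

# Matches the coverage summary lines shown in the monitor.log excerpt.
# The pattern is inferred from those lines, not taken from landscape-client.
COVERAGE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r"(?P<level>\w+) \[\w+\] "
    r"(?P<got>\d+) of (?P<expected>\d+) expected Swift device usage "
    r"snapshot events \((?P<percent>[\d.]+)%\) occurred in the last "
    r"(?P<window>[\d.]+)s\."
)


def parse_coverage(lines):
    """Yield (timestamp, got, expected, window_seconds) per summary line."""
    for line in lines:
        match = COVERAGE_RE.match(line.strip())
        if match:
            yield (
                match.group("timestamp"),
                int(match.group("got")),
                int(match.group("expected")),
                float(match.group("window")),
            )


def stalled_windows(lines, threshold=0.5):
    """Return timestamps of windows whose coverage is below `threshold`."""
    stalled = []
    for timestamp, got, expected, _window in parse_coverage(lines):
        if expected > 0 and got / expected < threshold:
            stalled.append(timestamp)
    return stalled


# Three lines copied from the log excerpt in this report.
SAMPLE = [
    "2016-06-01 22:08:46,449 INFO [MainThread] Registering plugin landscape.monitor.swiftusage.SwiftUsage.",
    "2016-06-01 23:08:46,451 WARNING [MainThread] 1 of 720 expected Swift device usage snapshot events (0.14%) occurred in the last 3600.00s.",
    "2016-06-02 12:57:29,322 INFO [MainThread] 5 of 5 expected Swift device usage snapshot events (100.00%) occurred in the last 27.05s.",
]

if __name__ == "__main__":
    for ts in stalled_windows(SAMPLE):
        print(ts)
```

To run it against a real log, pass an open file object for monitor.log to `stalled_windows` instead of `SAMPLE`. This only detects the stall; it does not explain why the plugin stopped producing snapshots.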

Andreas Hasenack (ahasenack) wrote :

output of swift-recon --all

Andreas Hasenack (ahasenack) wrote :

disk space on all 3 swift-storage units

description: updated
description: updated
Andreas Hasenack (ahasenack) wrote :

juju status --format=tabular of the cloud deployment

description: updated
Andreas Hasenack (ahasenack) wrote :

logs from the swift units. This includes /var/log/*, i.e., also landscape-client logs.

tags: removed: kanban
Simon Poirier (simpoir) on 2016-07-15
Changed in landscape-client:
assignee: nobody → Simon Poirier (simpoir)
Simon Poirier (simpoir) wrote :

Well, according to the juju logs, two landscape-client units failed on some hooks. My guess is that those are the units that were not reporting. I'll try to reproduce this with a new deployment, adding and removing swift units.

Simon Poirier (simpoir) on 2016-07-18
Changed in landscape-client:
status: New → In Progress
Changed in landscape-client:
status: In Progress → Fix Committed
Changed in landscape-client (Ubuntu):
assignee: nobody → Andreas Hasenack (ahasenack)
status: New → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package landscape-client - 18.01-0ubuntu1

---------------
landscape-client (18.01-0ubuntu1) bionic; urgency=medium

  * New upstream release 18.01:
    - Ported to python3 (LP: #1577850)
    - move Replaces/Breaks landscape-client-ui rules to landscape-common
      (LP: #1560424)
    - Add a libpam-systemd Depends if built for xenial (LP: #1590838)
    - Some units not reporting swift usage (LP: #1588404)
    - Fix missing install directories for landscape-common and drop
      usr/share/landscape as its only used and created by landscape-client.
      (LP: #1680842)
    - Fix VM detection for Xen, by returning "xen" only for paravirtualized and
      HVM hosts, not for dom0. (LP: #1601818)
    - Add an indication of truncation to process output that has been truncated
      prior to delivery to the server. (LP: #1629000)
    - add /snap/bin to the PATH when executing scripts. (LP: #1635634)
    - Save the original sources.list file when a repository profile is
      associated with a computer and restore it when the profile is removed.
      (LP: #1607529)
    - Drop the legacy HAService plugin, which is no longer used.
    - Avoid double-decoding package descriptions in build_skeleton_apt, which
      causes an error with Xenial python-apt. (LP: #1655395)
    - Remove dead dbus code and textmessage (confirmed not supported in server
      for ~2 years). (LP: #1657372)
    - Move bzr-builddeb conf file from deprecated location to debian/
      (LP: #1658796)
    - Support for new server error message about there being too many pending
      computers already (LP: #1662530)
    - Add a timestamp to the package reporter result (LP: #1674252)
    - Check if ubuntu-release-upgrader is running before apt-update (LP: #1699179)
    - Implicitly trust file-local sources managed by landscape. On upgrades,
      add the trusted flag to the landscape file-local apt source file if it's
      not there. (LP: #1736576)
    - Use local system tools to change the user's password (LP: #1743558)
  * clean up packaging and getting in sync with the new landscape version:
    - d/rules: drop extra:suggests which is unused since 13.07.1-0ubuntu2
    - Remove antique postinst code. No supported landscape-client version
      installs cronjobs anymore (since a long time).
    - d/landscape-client.docs: the README file is now a markdown file, so
      install that instead.
    - d/landscape-common.postinst: no need to single out
      /var/lib/landscape/.gnupg when fixing ownerships, just do it over
      the entire parent directory.
    - guard user and group removal via an empty .cleanup.* file in post, so we
      only remove the user/group if we were the ones who created them at
      install time.
    - lintian: remove absolute path from update-motd calls in maintainer
      scripts
    - d/rules: drop special handling for dapper, hardy and lucid, which are no
      longer supported.
    - d/rules: make sure we have an "extra:Depends=" in substvars even if it's
      empty
    - d/rules: drop dh_pycentral handling, it's obsolete
  * Dropped (already included in this version):
    - d/p/set-vm-info-to-kvm-for-aws-C5-instances.patch:
  ...

Changed in landscape-client (Ubuntu):
status: In Progress → Fix Released