[Xenial 2.0] tgt fails to start with tgtadm out of memory error

Bug #1559088 reported by Ryan Collis
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
MAAS | Invalid | Undecided | Unassigned |
tgt (Ubuntu) | Fix Released | Medium | Christian Ehrhardt |

Bug Description

Fresh install of the Xenial daily image + MAAS version 2.0.0 (alpha3+bzr4804).
After importing the 14.04, 15.04 and 16.04 images for amd64 and arm64, tgt/tgtadm stops working and fails with an out of memory error. The server is an Opteron 6200 with 24 cores and 16 GB of memory. After the image import, free memory is approx. 200 MB. When we only have the Trusty images, tgt runs just fine. Tried several fresh installs and configurations, and all have the same result.

systemctl status tgt.service - WITH ONLY TRUSTY IMAGES:

● tgt.service - (i)SCSI target daemon
   Loaded: loaded (/lib/systemd/system/tgt.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2016-03-17 16:42:02 EDT; 50min ago
     Docs: man:tgtd(8)
 Main PID: 15321 (tgtd)
   Status: "Starting event loop..."
    Tasks: 145 (limit: 512)
   Memory: 1.6M
      CPU: 545ms
   CGroup: /system.slice/tgt.service
           └─15321 /usr/sbin/tgtd -f

Mar 17 17:03:00 stratar8 tgtd[15321]: tgtd: device_mgmt(246) sz:102 params:path=/var/lib/maas/boot-resources/snapshot-20160317-205952/ubuntu/arm64/hwe-s/trusty/daily/root-image
Mar 17 17:03:00 stratar8 tgtd[15321]: tgtd: bs_thread_open(409) 16
Mar 17 17:03:00 stratar8 tgtd[15321]: tgtd: device_mgmt(246) sz:102 params:path=/var/lib/maas/boot-resources/snapshot-20160317-205952/ubuntu/arm64/hwe-t/trusty/daily/root-image
Mar 17 17:03:00 stratar8 tgtd[15321]: tgtd: bs_thread_open(409) 16
Mar 17 17:03:00 stratar8 tgtd[15321]: tgtd: device_mgmt(246) sz:102 params:path=/var/lib/maas/boot-resources/snapshot-20160317-205952/ubuntu/arm64/hwe-u/trusty/daily/root-image
Mar 17 17:03:00 stratar8 tgtd[15321]: tgtd: bs_thread_open(409) 16
Mar 17 17:03:00 stratar8 tgtd[15321]: tgtd: device_mgmt(246) sz:102 params:path=/var/lib/maas/boot-resources/snapshot-20160317-205952/ubuntu/arm64/hwe-v/trusty/daily/root-image
Mar 17 17:03:00 stratar8 tgtd[15321]: tgtd: bs_thread_open(409) 16
Mar 17 17:03:00 stratar8 tgtd[15321]: tgtd: device_mgmt(246) sz:102 params:path=/var/lib/maas/boot-resources/snapshot-20160317-205952/ubuntu/arm64/hwe-w/trusty/daily/root-image

systemctl status tgt.service with Trusty, Vivid and Xenial amd64 and arm64 (after import but before reboot or service restart):

systemctl status tgt.service
● tgt.service - (i)SCSI target daemon
   Loaded: loaded (/lib/systemd/system/tgt.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2016-03-17 16:42:02 EDT; 52min ago
     Docs: man:tgtd(8)
 Main PID: 15321 (tgtd)
   Status: "Starting event loop..."
    Tasks: 497 (limit: 512)
   Memory: 9.0M
      CPU: 621ms
   CGroup: /system.slice/tgt.service
           └─15321 /usr/sbin/tgtd -f

Mar 17 17:33:13 stratar8 tgtd[15321]: tgtd: bs_thread_open(437) stopped the worker thread 9
Mar 17 17:33:13 stratar8 tgtd[15321]: tgtd: bs_thread_open(437) stopped the worker thread 8
Mar 17 17:33:13 stratar8 tgtd[15321]: tgtd: bs_thread_open(437) stopped the worker thread 7
Mar 17 17:33:13 stratar8 tgtd[15321]: tgtd: bs_thread_open(437) stopped the worker thread 6
Mar 17 17:33:13 stratar8 tgtd[15321]: tgtd: bs_thread_open(437) stopped the worker thread 5
Mar 17 17:33:13 stratar8 tgtd[15321]: tgtd: bs_thread_open(437) stopped the worker thread 4
Mar 17 17:33:13 stratar8 tgtd[15321]: tgtd: bs_thread_open(437) stopped the worker thread 3
Mar 17 17:33:13 stratar8 tgtd[15321]: tgtd: bs_thread_open(437) stopped the worker thread 2
Mar 17 17:33:13 stratar8 tgtd[15321]: tgtd: bs_thread_open(437) stopped the worker thread 1
Mar 17 17:33:13 stratar8 tgtd[15321]: tgtd: bs_thread_open(437) stopped the worker thread 0

systemctl status tgt.service after attempted service restart / system reboot:

● tgt.service - (i)SCSI target daemon
   Loaded: loaded (/lib/systemd/system/tgt.service; enabled; vendor preset: enabled)
   Active: deactivating (stop-sigterm) (Result: exit-code) since Thu 2016-03-17 17:45:46 EDT; 35s ago
     Docs: man:tgtd(8)
  Process: 27874 ExecStop=/usr/sbin/tgtadm --op delete --mode system (code=exited, status=0/SUCCESS)
  Process: 27836 ExecStop=/usr/sbin/tgt-admin --update ALL -c /dev/null -f (code=exited, status=0/SUCCESS)
  Process: 27800 ExecStop=/usr/sbin/tgt-admin --offline ALL (code=exited, status=0/SUCCESS)
  Process: 27797 ExecStop=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
  Process: 27888 ExecStartPost=/usr/sbin/tgt-admin -e -c /etc/tgt/targets.conf (code=exited, status=22)
  Process: 27883 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
 Main PID: 27877 (tgtd)
   Status: "Starting event loop..."
    Tasks: 497 (limit: 512)
   Memory: 9.0M
      CPU: 713ms
   CGroup: /system.slice/tgt.service
           └─27877 /usr/sbin/tgtd -f

Mar 17 17:45:47 stratar8 tgtd[27877]: tgtd: bs_thread_open(437) stopped the worker thread 4
Mar 17 17:45:47 stratar8 tgtd[27877]: tgtd: bs_thread_open(437) stopped the worker thread 3
Mar 17 17:45:47 stratar8 tgtd[27877]: tgtd: bs_thread_open(437) stopped the worker thread 2
Mar 17 17:45:47 stratar8 tgtd[27877]: tgtd: bs_thread_open(437) stopped the worker thread 1
Mar 17 17:45:47 stratar8 tgtd[27877]: tgtd: bs_thread_open(437) stopped the worker thread 0
Mar 17 17:45:47 stratar8 tgt-admin[27888]: tgtadm: out of memory
Mar 17 17:45:47 stratar8 tgt-admin[27888]: Command:
Mar 17 17:45:47 stratar8 tgt-admin[27888]: tgtadm -C 0 --lld iscsi --op new --mode logicalunit --tid 32 --lun 1 -b /var/lib/maas/boot-resources/snapshot-20160317-211902/ubun
Mar 17 17:45:47 stratar8 tgt-admin[27888]: exited with code: 22.
Mar 17 17:45:47 stratar8 systemd[1]: tgt.service: Control process exited, code=exited status=22

 dpkg -l '*maas*'|cat:

dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-=====================================-============-=============================================
ii maas 2.0.0~alpha3+bzr4804-0ubuntu1~xenial1 all MAAS server all-in-one metapackage
ii maas-cli 2.0.0~alpha3+bzr4804-0ubuntu1~xenial1 all MAAS command line API tool
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.0.0~alpha3+bzr4804-0ubuntu1~xenial1 all MAAS server common files
ii maas-dhcp 2.0.0~alpha3+bzr4804-0ubuntu1~xenial1 all MAAS DHCP server
ii maas-dns 2.0.0~alpha3+bzr4804-0ubuntu1~xenial1 all MAAS DNS server
ii maas-proxy 2.0.0~alpha3+bzr4804-0ubuntu1~xenial1 all MAAS Caching Proxy
ii maas-rack-controller 2.0.0~alpha3+bzr4804-0ubuntu1~xenial1 all MAAS server cluster controller
ii maas-region-controller 2.0.0~alpha3+bzr4804-0ubuntu1~xenial1 all MAAS server complete region controller
ii maas-region-controller-min 2.0.0~alpha3+bzr4804-0ubuntu1~xenial1 all MAAS Server minimum region controller
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.0.0~alpha3+bzr4804-0ubuntu1~xenial1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.0.0~alpha3+bzr4804-0ubuntu1~xenial1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.0.0~alpha3+bzr4804-0ubuntu1~xenial1 all MAAS server provisioning libraries (Python 3)

tgt version
 tgt 1:1.0.62-1ubuntu2 amd64 Linux SCSI target user-space daemon and tools

Ryan Collis (vyan) wrote :
Ryan Collis (vyan)
tags: added: arm64
Ryan Collis (vyan)
tags: added: tgtadm
Ryan Collis (vyan) wrote :

Upon further investigation, it turns out that this is a problem on systems running kernel 4.3 or newer. It stems from the systemd limit of 512 tasks. This explains why the problem only occurred when images for multiple architectures (amd64 and arm64) were downloaded. Adding "TasksMax=infinity" to the /lib/systemd/system/tgt.service file (under the [Service] section) resolved the problem and allows tgtadm to start without error.
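The task counts in the status output above line up with this explanation. A minimal sketch of the arithmetic, assuming that the "16" logged by bs_thread_open(409) is the number of worker threads tgtd starts per LUN and that tgtd itself accounts for one more task. Both are inferences from the logs in this report, not something the report states, and the LUN counts (9 and 31) are back-calculated from the Tasks values:

```python
# Rough model of tgtd's task count, inferred from the status output in this report.
THREADS_PER_LUN = 16  # assumption: the value logged by bs_thread_open(409)
BASE_TASKS = 1        # assumption: the tgtd main thread


def tasks_for(luns: int) -> int:
    """Estimated number of tasks in the tgt.service cgroup for a given LUN count."""
    return BASE_TASKS + THREADS_PER_LUN * luns


def max_luns(limit: int) -> int:
    """Largest LUN count whose estimated task total stays within the limit."""
    return (limit - BASE_TASKS) // THREADS_PER_LUN


if __name__ == "__main__":
    print(tasks_for(9))    # 145, matches "Tasks: 145" with only Trusty images
    print(tasks_for(31))   # 497, matches "Tasks: 497" after the full import
    print(tasks_for(32))   # 513, one over the 512 limit
    print(max_luns(512))   # 31
```

Under these assumptions the 32nd LUN cannot get its worker threads, which is consistent with the failing command in the log being the one for "--tid 32". The "out of memory" message would then come from thread creation being refused rather than from a shortage of RAM.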

Changed in maas:
status: New → Invalid
Christian Ehrhardt  (paelzer) wrote :

Hi,
the "workaround" is IMHO a valid fix and is in use by lxd, libvirt, docker and others that are known to spawn many threads.
TGT is just another case - I guess we need no SRU since you closed it, but I'd add it to Zesty and push to Debian if all works out.
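For anyone applying the workaround locally in the meantime, a drop-in override avoids editing the packaged unit file under /lib, which a package upgrade may overwrite. The sketch below only writes the file. The drop-in directory follows the usual systemd convention (/etc/systemd/system/<unit>.d/), while the file name tasksmax.conf and the helper write_dropin are hypothetical names chosen for illustration. This is not necessarily how the packaged fix is implemented.

```python
from pathlib import Path
import tempfile

# Drop-in content: lift the per-unit task limit for tgt.service.
DROPIN_TEXT = "[Service]\nTasksMax=infinity\n"


def write_dropin(root: Path, unit: str = "tgt.service",
                 name: str = "tasksmax.conf") -> Path:
    """Write the drop-in below <root>/etc/systemd/system/<unit>.d/ and return its path.

    The file name is a hypothetical choice; systemd reads any *.conf in that directory.
    """
    dropin_dir = root / "etc" / "systemd" / "system" / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / name
    path.write_text(DROPIN_TEXT)
    return path


if __name__ == "__main__":
    # Dry run into a temporary directory so nothing on the real system is touched.
    with tempfile.TemporaryDirectory() as tmp:
        dropin = write_dropin(Path(tmp))
        print(dropin.read_text(), end="")
```

To use it for real, call write_dropin(Path("/")) as root, then run "systemctl daemon-reload" and restart tgt.service so that systemd picks up the new limit.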

Changed in tgt (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → ChristianEhrhardt (paelzer)
Changed in tgt (Ubuntu):
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package tgt - 1:1.0.67-1ubuntu1

---------------
tgt (1:1.0.67-1ubuntu1) zesty; urgency=medium

  * Merge from Debian. Remaining changes:
   - Drop glusterfs support, package not in main.
     - debian/control
     - debian/rules
     - debian/tests/{control, storage}
     - debian/tgt-glusterfs.install (Deleted)
  * Drop changes:
   - debian/patches/util_strtoull_errno.patch by Stas Sergeev (now upstream)
   - Disable AIO backend support. This was done to reduce risk in a previous
     feature freeze; Now that we are before Zesty's feature freeze, we
     can introduce it this cycle.
   - changing file modes of debian/tests/{admin, daemon} (missing in
     changelog, not needed)
  * Add changes:
   - fix tgt being killed when serving many targets (LP: #1559088)
   - d/t/localtgt, d/t/control: add dep8 test that sets up targets and luns via
     iscsi using rdwr/aio backends and runs fio read/write/verify (LP: #1640785)

 -- Christian Ehrhardt <email address hidden> Thu, 10 Nov 2016 10:35:15 +0100

Changed in tgt (Ubuntu):
status: In Progress → Fix Released
Spyderdyne (spyderdyne) wrote :

Raspberry Pi 3B

MAAS Version 2.1.3+bzr5573-0ubuntu1 (16.04.1)

spyderdyne@juju-rack2:~$ uname -r
4.4.43-v7+
spyderdyne@juju-rack2:~$ cat /etc/issue
Ubuntu 16.04.1 LTS \n \l

spyderdyne@juju-rack2:~$ dpkg -l tgt
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==========================-==================-==================-=========================================================
ii tgt 1:1.0.63-1ubuntu1. armhf Linux SCSI target user-space daemon and tools

spyderdyne@juju-rack2:~$ service tgt status
● tgt.service - (i)SCSI target daemon
   Loaded: loaded (/lib/systemd/system/tgt.service; enabled; vendor preset: enabled)
   Active: deactivating (stop-sigterm) (Result: exit-code)
     Docs: man:tgtd(8)
  Process: 5557 ExecStartPost=/usr/sbin/tgt-admin -e -c /etc/tgt/targets.conf (code=exited, status=22)
  Process: 5554 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
 Main PID: 5552 (tgtd)
   Status: "Starting event loop..."
   CGroup: /system.slice/tgt.service
           └─5552 /usr/sbin/tgtd -f

Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgtd[5552]: tgtd: bs_thread_open(437) stopped the worker thread 4
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgtd[5552]: tgtd: bs_thread_open(437) stopped the worker thread 3
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgtd[5552]: tgtd: bs_thread_open(437) stopped the worker thread 2
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgtd[5552]: tgtd: bs_thread_open(437) stopped the worker thread 1
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgtd[5552]: tgtd: bs_thread_open(437) stopped the worker thread 0
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgt-admin[5557]: tgtadm: out of memory
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgt-admin[5557]: Command:
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgt-admin[5557]: tgtadm -C 0 --lld iscsi --op new --mode logicalunit --ti
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net tgt-admin[5557]: exited with code: 22.
Feb 08 22:25:24 juju-rack2.home.spyderdyne.net systemd[1]: tgt.service: Control process exited, code=exited status=22

spyderdyne@juju-rack2:~$ free -m
              total used free shared buff/cache available
Mem: 925 557 65 24 302 316
Swap: 0 0 0

I cannot actually say this is a bug. I do not think that the iSCSI daemon is able to run with this tiny amount of memory. It would be a lot cooler if it could though. I can see it becoming popular to be able to spend $50 on a MaaS rack controller for each rack with a quad core computer, some sticky velcro, and a patch cable. We are already embedding mini-computers inside our servers for IPMI and management anyway. Unfortunately this might render my RPi MaaS rack controller idea unusable. Will attempt to set up a swap space on these 64GB Class10 SD cards and ...


Spyderdyne (spyderdyne) wrote :

It looks like Python Twisted is actually taking up most of the resources. Wondering if there is a resource cap that I can force Python to stay under to keep from killing off other services. I suspected Postgres at first but it's definitely Python.

Christian Ehrhardt  (paelzer) wrote :

Hi Spyderdyne,
trying to keep things together.
This bug was about the number of files (and thereby threads) that tgt is allowed to serve.

The xenial/oom portion will be continued in "bug 1389811" until we are sure (or not) that it is the same issue where we then could dup it.
I see you are active there as well, so here just a ping as FYI.
