Comment 0 for bug 1651518

bugproxy (bugproxy) wrote :

== Comment: #1 - Application Cdeadmin <email address hidden> - 2016-12-19 04:15:10 ==

Configuration: IBM 8001-22C (S822LC), LSI SAS adapters, SMC 4U90 disk drawers, HDD (180) 7.3TB

Problem: HTX exercisers stopped on error, with HTX log showing "rc 11, errno 11 from main(): pthread_create"

htxubuntu-425

lpar: busybee.aus.stglabs.ibm.com (root/ lab passwd)

root@busybee:~# uname -a
Linux busybee 4.4.0-51-generic #72-Ubuntu SMP Thu Nov 24 18:27:59 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

root@busybee:~# cat /tmp/htxerr

/dev/sdh Dec 12 23:52:42 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

/dev/sdh Dec 12 23:52:42 2016 err=0000000b sev=1 hxestorage
Hardware Exerciser stopped on an error

/dev/sdao Dec 12 23:52:42 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

/dev/sdao Dec 12 23:52:42 2016 err=0000000b sev=1 hxestorage
Hardware Exerciser stopped on an error

/dev/sddx Dec 12 23:52:42 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

/dev/sddx Dec 12 23:52:42 2016 err=0000000b sev=1 hxestorage
Hardware Exerciser stopped on an error

/dev/sdcz Dec 12 23:52:42 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

/dev/sdcz Dec 12 23:52:42 2016 err=0000000b sev=1 hxestorage
Hardware Exerciser stopped on an error

/dev/sddp Dec 12 23:52:42 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

/dev/sddp Dec 12 23:52:42 2016 err=0000000b sev=1 hxestorage
Hardware Exerciser stopped on an error
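
To see at a glance which devices hit the same failure, the /tmp/htxerr entries can be grouped by error code. The following is a minimal sketch and not part of HTX: the function names `parse_htxerr` and `devices_by_errno` are hypothetical, and the header layout is assumed from the entries shown above (device, timestamp, `err=` in hex, `sev=`, exerciser name, followed by a message line).

```python
import re
from collections import defaultdict

# Header layout assumed from the log above, e.g.
# "/dev/sdh Dec 12 23:52:42 2016 err=0000000b sev=1 hxestorage"
HEADER_RE = re.compile(
    r"^(?P<device>/dev/\S+)\s+(?P<when>.+?)\s+err=(?P<err>[0-9a-fA-F]+)\s+"
    r"sev=(?P<sev>\d+)\s+(?P<exerciser>\S+)\s*$"
)


def parse_htxerr(text):
    """Return a list of dicts, one per log entry (header line plus message)."""
    entries = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER_RE.match(line)
        if match:
            current = {
                "device": match["device"],
                "when": match["when"],
                "errno": int(match["err"], 16),  # err= is hexadecimal
                "severity": int(match["sev"]),
                "exerciser": match["exerciser"],
                "message": "",
            }
            entries.append(current)
        elif current is not None:
            # Message lines follow their header; join multi-line messages.
            current["message"] = (current["message"] + " " + line).strip()
    return entries


def devices_by_errno(entries):
    """Map each errno value to the sorted list of devices that reported it."""
    grouped = defaultdict(set)
    for entry in entries:
        grouped[entry["errno"]].add(entry["device"])
    return {code: sorted(devs) for code, devs in grouped.items()}
```

In the log above every entry carries err=0000000b, which is errno 11 (EAGAIN on Linux). pthread_create returns EAGAIN when resources are insufficient or a system-imposed limit on the number of threads has been reached.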

No errors were logged in syslog after starting HTX.

==== State: Open by: asperez on 16 December 2016 10:28:02 ====
This error was recreated on the smaller 1U Open Power system with a scaled-down version of the same configuration (1 adapter, one 4U90 drawer, 90 HDDs). Two cables are connected to the drawer (one to each ESM), which requires multipath to be enabled.

lpar: yellowbee

root@yellowbee:~# cat /tmp/htxerr

/dev/sdao Dec 16 01:14:44 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

/dev/sdak Dec 16 01:14:44 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

/dev/sdt Dec 16 01:14:44 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

/dev/sdaz Dec 16 01:14:44 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

/dev/sdn Dec 16 01:14:44 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

/dev/sdv Dec 16 01:14:44 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

/dev/sdaj Dec 16 01:14:44 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

/dev/sdal Dec 16 01:14:44 2016 err=0000000b sev=1 hxestorage
rc 11, errno 11 from main(): pthread_create

==== State: Open by: cde00 on 19 December 2016 02:48:46 ====

This defect is being routed to the Linux team because errors are still seen even after making the two changes below to the systemd resource limits:

root@yellowbee:/etc/systemd/logind.conf.d# cat htxlogindcustom.conf
[Login]
UserTasksMax=infinity
root@yellowbee:/etc/systemd/logind.conf.d# cat ../system.conf.d/htxsystemdcustom.conf
[Manager]
DefaultTasksAccounting=yes
DefaultTasksMax=infinity

root@yellowbee:/etc/systemd/logind.conf.d#

Both the logind UserTasksMax limit and the systemd DefaultTasksMax limit were set to infinity.
See defect SW363655 for more details and the suggestion given earlier by the Linux team.

Errors are still seen, so we ask the Linux team to take a look and check whether anything else is causing them.

== Comment: #3 - Kevin W. Rudd <email address hidden> - 2016-12-19 17:34:27 ==
It appears that the version of logind on this system does not support the value of "infinity", and is reverting to the default of 12288:

# cat /sys/fs/cgroup/pids/user.slice/user-0.slice/pids.max
12288
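
The effective limit can also be compared against current usage without relying on the configured value. This is a minimal sketch under assumptions: the helper names `read_pids_limit` and `headroom` are hypothetical, and it assumes the cgroup directory exposes `pids.max` (holding either a number or the literal `max` for unlimited) and, where available, `pids.current`.

```python
from pathlib import Path


def read_pids_limit(cgroup_dir):
    """Return (current, maximum) for a pids cgroup directory.

    maximum is None when pids.max contains "max" (unlimited);
    current is None when pids.current is not present.
    """
    base = Path(cgroup_dir)
    raw_max = (base / "pids.max").read_text().strip()
    maximum = None if raw_max == "max" else int(raw_max)
    current_file = base / "pids.current"
    current = int(current_file.read_text().strip()) if current_file.exists() else None
    return current, maximum


def headroom(current, maximum):
    """Tasks that can still be created, or None when unknown or unlimited."""
    if current is None or maximum is None:
        return None
    return max(maximum - current, 0)
```

For example, `read_pids_limit("/sys/fs/cgroup/pids/user.slice/user-0.slice")` on the affected system would be expected to report a maximum of 12288 rather than unlimited, which is consistent with pthread_create failing once the exercisers approach that many tasks.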

As a workaround until this can be resolved, specify an exact value. You can try using the current system threads-max value:

# cat /proc/sys/kernel/threads-max
3974272

/etc/systemd/logind.conf.d/htxlogindcustom.conf:
[Login]
UserTasksMax=3974272
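
The workaround above could be scripted roughly as follows, so the drop-in always carries the value from the running kernel. This is only a sketch: `read_threads_max` and `render_logind_dropin` are hypothetical helper names, and writing the result to /etc/systemd/logind.conf.d/htxlogindcustom.conf is left to the caller.

```python
from pathlib import Path


def read_threads_max(path="/proc/sys/kernel/threads-max"):
    """Read the kernel-wide thread limit as an integer."""
    return int(Path(path).read_text().strip())


def render_logind_dropin(user_tasks_max):
    """Render a logind drop-in that sets UserTasksMax to an exact number."""
    if user_tasks_max <= 0:
        raise ValueError("UserTasksMax must be a positive integer")
    return f"[Login]\nUserTasksMax={user_tasks_max}\n"
```

Calling `render_logind_dropin(read_threads_max())` on the system above would produce the same two lines shown in the workaround. The new limit probably only applies after logind has re-read its configuration and root has logged in again, so it is worth re-checking pids.max afterwards.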

== Comment: #4 - Kevin W. Rudd - 2016-12-20 10:25:09 ==

Canonical,

This issue appears to map to the following systemd bug and patch:

https://github.com/systemd/systemd/issues/3833

https://github.com/systemd/systemd/commit/f50582649f8eee73f59aff95fadd9a963ed4ffea

This patch appears to be included in debian/232-7, but is missing in the xenial and yakkety versions.