RemoteFS is not working properly with privsep in os_brick 1.5.0

Bug #1610395 reported by Jay Bryant
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
Invalid
High
Jay Bryant
os-brick
Invalid
Undecided
Unassigned

Bug Description

It appears that, with privsep enabled, RemoteFS doesn't work properly for NFS. With the release of os-brick 1.5.0 we started seeing the following failure:

016-08-05 18:06:37.180 35383 DEBUG oslo_concurrency.processutils [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Running cmd (subprocess): mkdir -p /var/lib/cinder/mnt/5fd17b88c08c2759966c40234dfd10c0 execute /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/oslo_concurrency/processutils.py:344
2016-08-05 18:06:37.185 35383 DEBUG oslo_concurrency.processutils [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] CMD "mkdir -p /var/lib/cinder/mnt/5fd17b88c08c2759966c40234dfd10c0" returned: 0 in 0.005s execute /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/oslo_concurrency/processutils.py:374
2016-08-05 18:06:37.185 35383 DEBUG os_brick.remotefs.remotefs [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] JSBRYANT: mnt_cmd is ['mount', '-t', 'nfs', '-o', 'rw,vers=4', '10.2.125.7:/SL01SEV316090_1', '/var/lib/cinder/mnt/5fd17b88c08c2759966c40234dfd10c0'] _do_mount /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/os_brick/remotefs/remotefs.py:124
2016-08-05 18:06:37.186 35383 INFO oslo.privsep.daemon [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Running privsep helper: ['sudo', 'cinder-rootwrap', '/etc/cinder/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/cinder/cinder.conf', '--config-dir', ['/etc/cinder'], '--privsep_context', 'os_brick.privileged.default', '--privsep_sock_path', '/tmp/tmpjPxFIy/privsep.sock']
2016-08-05 18:06:37.201 35383 DEBUG os_brick.remotefs.remotefs [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Failed to do nfs mount. _mount_nfs /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/os_brick/remotefs/remotefs.py:145
2016-08-05 18:06:37.202 35383 DEBUG cinder.volume.drivers.nfs [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Mount attempt 1 failed: NFS mount failed for share 10.2.125.7:/SL01SEV316090_1. Error - {'nfs': u'execv() arg 2 must contain only strings'}.
Retrying mount ... _ensure_share_mounted /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/cinder/volume/drivers/nfs.py:164
2016-08-05 18:06:38.203 35383 DEBUG oslo_concurrency.processutils [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Running cmd (subprocess): mount execute /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/oslo_concurrency/processutils.py:344
2016-08-05 18:06:38.209 35383 DEBUG oslo_concurrency.processutils [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] CMD "mount" returned: 0 in 0.006s execute /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/oslo_concurrency/processutils.py:374
2016-08-05 18:06:38.210 35383 DEBUG oslo_concurrency.processutils [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Running cmd (subprocess): mkdir -p /var/lib/cinder/mnt/5fd17b88c08c2759966c40234dfd10c0 execute /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/oslo_concurrency/processutils.py:344
2016-08-05 18:06:38.216 35383 DEBUG oslo_concurrency.processutils [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] CMD "mkdir -p /var/lib/cinder/mnt/5fd17b88c08c2759966c40234dfd10c0" returned: 0 in 0.006s execute /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/oslo_concurrency/processutils.py:374
2016-08-05 18:06:38.217 35383 DEBUG os_brick.remotefs.remotefs [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] JSBRYANT: mnt_cmd is ['mount', '-t', 'nfs', '-o', 'rw,vers=4', '10.2.125.7:/SL01SEV316090_1', '/var/lib/cinder/mnt/5fd17b88c08c2759966c40234dfd10c0'] _do_mount /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/os_brick/remotefs/remotefs.py:124
2016-08-05 18:06:38.218 35383 INFO oslo.privsep.daemon [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Running privsep helper: ['sudo', 'cinder-rootwrap', '/etc/cinder/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/cinder/cinder.conf', '--config-dir', ['/etc/cinder'], '--privsep_context', 'os_brick.privileged.default', '--privsep_sock_path', '/tmp/tmpJU2WgV/privsep.sock']
2016-08-05 18:06:38.233 35383 DEBUG os_brick.remotefs.remotefs [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Failed to do nfs mount. _mount_nfs /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/os_brick/remotefs/remotefs.py:145
2016-08-05 18:06:38.234 35383 ERROR cinder.volume.drivers.nfs [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Mount failure for 10.2.125.7:/SL01SEV316090_1 after 3 attempts.
2016-08-05 18:06:38.235 35383 ERROR cinder.volume.drivers.remotefs [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Exception during mounting NFS mount failed for share 10.2.125.7:/SL01SEV316090_1. Error - {'nfs': u'execv() arg 2 must contain only strings'}
2016-08-05 18:06:38.235 35383 DEBUG cinder.volume.drivers.remotefs [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Available shares [] _ensure_shares_mounted /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/cinder/volume/drivers/remotefs.py:279
2016-08-05 18:06:38.236 35383 ERROR cinder.volume.manager [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Failed to initialize driver.
2016-08-05 18:06:38.236 35383 ERROR cinder.volume.manager Traceback (most recent call last):
2016-08-05 18:06:38.236 35383 ERROR cinder.volume.manager File "/opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/cinder/volume/manager.py", line 388, in init_host
2016-08-05 18:06:38.236 35383 ERROR cinder.volume.manager self.driver.do_setup(ctxt)
2016-08-05 18:06:38.236 35383 ERROR cinder.volume.manager File "/opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/cinder/volume/drivers/nfs.py", line 144, in do_setup
2016-08-05 18:06:38.236 35383 ERROR cinder.volume.manager self.set_nas_security_options(self._is_voldb_empty_at_startup)
2016-08-05 18:06:38.236 35383 ERROR cinder.volume.manager File "/opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/cinder/volume/drivers/nfs.py", line 342, in set_nas_security_options
2016-08-05 18:06:38.236 35383 ERROR cinder.volume.manager raise exception.NfsNoSharesMounted()
2016-08-05 18:06:38.236 35383 ERROR cinder.volume.manager NfsNoSharesMounted: No mounted NFS shares found
2016-08-05 18:06:38.236 35383 ERROR cinder.volume.manager
2016-08-05 18:06:38.238 35383 DEBUG cinder.service [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Creating RPC server for service cinder-volume start /opt/bbc/openstack-11.0-master/cinder/local/lib/python2.7/site-packages/cinder/service.py:232

The root of the problem is here:

2016-08-05 18:06:38.235 35383 ERROR cinder.volume.drivers.remotefs [req-88f40bc4-b88b-4ae8-85a0-735250e0eadd - - - - -] Exception during mounting NFS mount failed for share 10.2.125.7:/SL01SEV316090_1. Error - {'nfs': u'execv() arg 2 must contain only strings'}

For some reason execv thinks we are sending an integer instead of a string. The command that is being sent:
['mount', '-t', 'nfs', '-o', 'rw,vers=4', '10.2.125.7:/SL01SEV316090_1', '/var/lib/cinder/mnt/5fd17b88c08c2759966c40234dfd10c0']

Not sure why one of those arguments is being interpreted as a non-string.

If we force the RemoteFS driver to use rootwrap instead, we don't see this problem.

os-brick will need to be fixed.

Jay Bryant (jsbryant)
Changed in cinder:
importance: Undecided → Critical
tags: added: os-brick
Matt Riedemann (mriedem)
tags: added: nfs privsep
Revision history for this message
Matt Riedemann (mriedem) wrote :

What version of oslo.rootwrap do you have installed? 5.0.0?

Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm running the nfs experimental queue job on this change (unrelated) to see if this can be reproduced in the upstream CI:

https://review.openstack.org/#/c/351890/

Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm not seeing the same failures in the nfs ci results:

http://logs.openstack.org/90/351890/1/experimental/gate-tempest-dsvm-full-devstack-plugin-nfs-nv/da3ebbf/logs/screen-n-cpu.txt.gz?level=TRACE

So what's different about your config from that?

Revision history for this message
Matt Riedemann (mriedem) wrote :

What is the exec_dirs value in your cinder rootwrap.conf?

There was some discussion in https://review.openstack.org/#/c/344450/ about venvs and exec_dirs causing issues with finding privsep-helper.

Revision history for this message
Angus Lees (gus) wrote :

Is the full log file available?

Revision history for this message
Angus Lees (gus) wrote :

and what version of oslo.privsep is this?

Revision history for this message
Jay Bryant (jsbryant) wrote :

The root of the problem was the following was missing from the cinder.conf file:

[privsep_osbrick]
helper_command = sudo cinder-rootwrap $rootwrap_config /opt/bbc/openstack-11.0-master/cinder/bin/privsep-helper --config-file /etc/cinder/cinder.conf

Once that was added to cinder.conf I was able to move forward. It seems to me that the lack of the helper_command should lead to some sort of error message rather than us getting a weird error back from subprocess.Popen early in the privsep initialization.

Revision history for this message
Angus Lees (gus) wrote :

Jay: agreed, which is why I'm asking if more of the log file is available. privsep *should* have logged an error (at critical), including stderr explanation (at warning), when it failed to exec privsep-helper. See oslo_privsep.daemon.RootwrapClientChannel.

A suggestion that came out of one of the sessions at last summit was to make fatal privsep failures fatal to the regular process too (following the "it's really just an in-process thread, only it isn't because unix needs a full process for a different uid" metaphor). If nothing else, this would have made your configuration issue here more immediately noticeable.

Also, the justification in https://review.openstack.org/#/c/344450/ was that your rootwrap config should already configure exec_dirs to include your venv bin/ - and hence the explicit helper_command config should not be required. I would be interested to know if this mechanism is not working as expected.

Revision history for this message
Jay Bryant (jsbryant) wrote :

Let me try to answer all the questions here: First, Angus, here is output of the full log for startup: http://paste.openstack.org/show/552658/ You can see it gets as far as this line: https://github.com/openstack/oslo.privsep/blob/master/oslo_privsep/daemon.py#L284 I put debug output before and after that line and saw that it never got beyond that line. I do not see any other logs from privsep when I don't have the helper_command set for os_brick.

oslo.privsep level: 1.11.0
oslo.rootwrap level: 5.0.0
os.brick level: 1.5.0

We worked with Matt and added a line to exec_dirs in rootwrap.conf that enabled privsep-helper to be found without changing paths in the volume.filters file. We still, however, need to add the helper_command into cinder.conf as indicated in comment #8.

Revision history for this message
Tony Breeds (o-tony) wrote :

When I tested this I assumed the venv would be activated before the binary is run. Something like:

. /opt/bbc/openstack-11.0-master/cinder/bin/activate
/opt/bbc/openstack-11.0-master/cinder/bin/cinder

Is that assumption valid?

Revision history for this message
Angus Lees (gus) wrote :

Jay: Thanks for the additional information. Those logs show the privsep command line that is being attempted:

 2016-08-09 18:30:31.713 36136 INFO oslo.privsep.daemon [req-9f5eb85c-1014-4749-a16d-fd73a6ce9700 - - - - -] Running privsep helper: ['sudo', 'cinder-rootwrap', '/etc/cinder/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/cinder/cinder.conf', '--config-dir', ['/etc/cinder'], '--privsep_context', 'os_brick.privileged.default', '--privsep_sock_path', '/tmp/tmpExaYUA/privsep.sock']

I'm interested also in the next privsep log line, which doesn't appear in your quoted piece. It should be either an INFO level "Spawned new privsep daemon via rootwrap" if everything went well, or an ERROR level "privsep helper command exited non-zero" if it failed (with hopefully some stderr warnings before the exception). From the logs shown so far we don't actually *know* that privsep-helper failed to start (and that it was due to not finding the executable).

Changed in cinder:
milestone: none → newton-rc1
Changed in cinder:
importance: Critical → High
assignee: nobody → Jay Bryant (jsbryant)
status: New → In Progress
Changed in cinder:
milestone: newton-rc1 → ocata-1
Revision history for this message
Jay Bryant (jsbryant) wrote :

So, I think we can close this bug now. This was all around the move to os-brick using privsep and us trying to run off master in IBM's public cloud. We got to a point where we couldn't recreate any issues and I haven't heard of people having further problems. So, I think we can just close this.

Changed in cinder:
status: In Progress → Won't Fix
Changed in os-brick:
status: New → Invalid
Changed in cinder:
status: Won't Fix → Invalid
Changed in cinder:
milestone: ocata-1 → none
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.