VMware: Broken pipe ERROR when boot VM

Bug #1364685 reported by David Geng on 2014-09-03
This bug affects 7 people
Affects                   Importance   Assigned to
OpenStack Compute (nova)  Undecided    Unassigned
VMwareAPI-Team            High         Unassigned
oslo.vmware               Low          Unassigned

Bug Description

This error happens intermittently, but it can always be reproduced in our test environment after a long run with multiple VMware compute nodes connected to the same vCenter:

2014-09-02 09:34:53.489 9439 ERROR nova.virt.vmwareapi.io_util [-] [Errno 32] Broken pipe
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util Traceback (most recent call last):
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util File "/usr/lib/python2.6/site-packages/nova/virt/vmwareapi/io_util.py", line 178, in _inner
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util self.output.write(data)
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util File "/usr/lib/python2.6/site-packages/nova/virt/vmwareapi/read_write_util.py", line 138, in write
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util self.file_handle.send(data)
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util File "/usr/lib64/python2.6/httplib.py", line 759, in send
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util self.sock.sendall(str)
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util File "/usr/lib/python2.6/site-packages/eventlet/green/ssl.py", line 131, in sendall
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util v = self.send(data[count:])
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util File "/usr/lib/python2.6/site-packages/eventlet/green/ssl.py", line 107, in send
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util super(GreenSSLSocket, self).send, data, flags)
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util File "/usr/lib/python2.6/site-packages/eventlet/green/ssl.py", line 77, in _call_trampolining
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util return func(*a, **kw)
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util File "/usr/lib64/python2.6/ssl.py", line 174, in send
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util v = self._sslobj.write(data)
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util error: [Errno 32] Broken pipe
2014-09-02 09:34:53.489 9439 TRACE nova.virt.vmwareapi.io_util
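The traceback shows the failing path: the `_inner` loop in io_util hands each chunk to `read_write_util.write`, which calls `send()` on the HTTPS connection, and the SSL layer raises errno 32 (EPIPE), typically because the remote end has already closed the connection. Below is a minimal Python 3 sketch of such a read/write loop, not the driver's actual code; `copy_stream` and `UploadAbortedError` are hypothetical names. It assumes `src` has a `read(n)` method and `dst` a `write(data)` method, and shows how EPIPE could be turned into an error that reports how far the upload got.

```python
import errno


class UploadAbortedError(Exception):
    """Hypothetical error raised when the server closes the connection mid-upload."""


def copy_stream(src, dst, chunk_size=64 * 1024):
    """Copy src to dst chunk by chunk and return the number of bytes written.

    Hypothetical stand-in for the io_util read/write loop in the traceback:
    src needs read(n), dst needs write(data).
    """
    sent = 0
    while True:
        data = src.read(chunk_size)
        if not data:
            return sent
        try:
            dst.write(data)
        except OSError as exc:  # ssl.SSLError and BrokenPipeError are both OSError subclasses
            if exc.errno != errno.EPIPE:
                raise
            # The remote end already closed the socket: bytes sent so far are
            # lost, so report how far the upload got instead of a bare EPIPE.
            raise UploadAbortedError(
                "peer closed the connection after %d bytes" % sent) from exc
        sent += len(data)
```

Because the data already sent is lost, a caller would have to restart the upload from the beginning; the sketch deliberately does not retry on its own.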

We are using the VMware vCenter Server Appliance, version 5.5.0.

When this error happens, there are typically more than 2000 connections in TIME_WAIT state on port 443, and 80 idle sessions in our test environment.

David Geng (genggjh) wrote :

So I would like to know: does this mean the VMware driver or the VMware API does not release TCP connections correctly?
Or should I do some tuning on the vCenter server, for example changing the ulimit?
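The open-file ulimit only constrains sockets a process still holds (for example ESTABLISHED or CLOSE_WAIT connections); sockets in TIME_WAIT no longer have a file descriptor. As a starting point, the per-process limit can be read with Python's standard `resource` module (Unix only). This is a minimal sketch; `open_file_limits` is a hypothetical helper, and it reports the limit of the process it runs in, not of vpxd itself.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) RLIMIT_NOFILE of the current process.

    None stands for "unlimited" (resource.RLIM_INFINITY).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def norm(value):
        return None if value == resource.RLIM_INFINITY else value

    return norm(soft), norm(hard)
```

On a Linux host, `/proc/<pid>/limits` shows the same values for an already running process such as vpxd.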

David Geng (genggjh) wrote :

I can find the following in vpx log when the error happens:

2014-09-02T17:42:23.730Z [7FB1B1F00700 error 'HTTP server /folder'] NfcFssrvr_FileOpen returned 3, fileIOErr: 0
2014-09-02T17:42:23.730Z [7FB1B2893700 warning 'Proxy Req 51065'] Write to named pipe /var/run/vmware/vpxd-webserver-pipe failed with error N7Vmacore15SystemExceptionE(Broken pipe)

David Geng (genggjh) wrote :

Currently, the only workaround we can find is to restart the vCenter server.

Gary Kotton (garyk) on 2014-09-03
Changed in nova:
importance: Undecided → High
tags: added: vmware
Tracy Jones (tjones-i) on 2014-09-03
Changed in openstack-vmwareapi-team:
importance: Undecided → High
Sean Dague (sdague) on 2014-09-12
Changed in nova:
status: New → Confirmed
importance: High → Medium
David Geng (genggjh) wrote :

Does anyone have an update on this bug?

David Geng (genggjh) wrote :

We hit another broken pipe error when uploading the config drive ISO:
[instance: 1bee1ade-f02a-4c32-8c96-52fb30dcfcea] Creating config drive failed with error: [Errno 32] Broken pipe

There are corresponding errors in the vCenter vpxd log file:
2014-09-25T13:58:36.758Z [22524 error 'Default'] SSLStreamImpl::DoServerHandshake for SSL(TCPStreamWin32(socket=TCP(fd=5480) local=172.16.11.254:443, peer=172.23.5.131:49803)): SSL_accept failed with Unexpected EOF
2014-09-25T13:58:36.758Z [22524 warning 'ProxySvc'] SSL Handshake failed for stream TCPStreamWin32(socket=TCP(fd=5480) local=172.16.11.254:443, peer=172.23.5.131:49803), error: class Vmacore::Ssl::SSLException(SSL Exception: Unexpected EOF)
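The vpxd lines record the client address as `peer=IP:port`, so tallying handshake failures per peer can show whether they all come from the nova-compute hosts. A minimal sketch, assuming the line format shown above (it may differ between vCenter versions); `handshake_failures_by_peer` is a hypothetical helper.

```python
import re
from collections import Counter

# Client address as logged by vpxd, e.g. "peer=172.23.5.131:49803" (IPv4 only).
_PEER_RE = re.compile(r"peer=(\d{1,3}(?:\.\d{1,3}){3}):\d+")


def handshake_failures_by_peer(log_lines):
    """Count vpxd 'SSL Handshake failed' lines per client IP address."""
    counts = Counter()
    for line in log_lines:
        if "SSL Handshake failed" not in line:
            continue
        match = _PEER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

An open log file can be passed directly, since iterating over a file yields its lines.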

The workaround is to reboot the vCenter server's operating system.

zhu zhu (zhuzhubj) wrote :

Increasing the SSL handshake timeout may help. A similar issue is described here:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020210

1. Connect to your ESXi/ESX host via a remote keyboard, video and mouse (KVM) console or a Secure Shell (SSH) session, or log in directly as root.
2. Open the /etc/vmware/hostd/config.xml file in a text editor.
3. Add <handshakeTimeoutMs> 120000 </handshakeTimeoutMs> between the <ssl> tags in the <vmacore> section. The value is in milliseconds; for example, 120000 milliseconds = 2 minutes.

<vmacore>
  <threadPool>
    <MaxFdsPerThread>2048</MaxFdsPerThread>
  </threadPool>
  <ssl>
    <handshakeTimeoutMs> 120000 </handshakeTimeoutMs>
    <useCompression>true</useCompression>
  </ssl>
  <vmdb>
    <maxConnectionCount>8</maxConnectionCount>
  </vmdb>
  <loadPlugins> true </loadPlugins>
</vmacore>
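To apply the change consistently across several hosts, the edit can be scripted. The following is a minimal sketch using Python's `xml.etree.ElementTree`; `set_handshake_timeout` is a hypothetical helper, and it assumes `<vmacore>` is either the document root (as in the fragment above) or a direct child of the root. ElementTree's default parser discards XML comments, so keep a backup of the original config.xml before writing the result back.

```python
import xml.etree.ElementTree as ET


def set_handshake_timeout(xml_text, timeout_ms=120000):
    """Return xml_text with vmacore/ssl/handshakeTimeoutMs set to timeout_ms."""
    root = ET.fromstring(xml_text)
    # Accept either <vmacore> as the root or as a direct child of the root.
    vmacore = root if root.tag == "vmacore" else root.find("vmacore")
    if vmacore is None:
        raise ValueError("no <vmacore> section found")
    ssl = vmacore.find("ssl")
    if ssl is None:
        ssl = ET.SubElement(vmacore, "ssl")
    elem = ssl.find("handshakeTimeoutMs")
    if elem is None:  # create the element only if it is not already present
        elem = ET.SubElement(ssl, "handshakeTimeoutMs")
    elem.text = str(timeout_ms)
    return ET.tostring(root, encoding="unicode")
```

Running it twice updates the existing element instead of adding a duplicate.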

Feng Xi Yan (yanfengxi) wrote :

After sufficient debugging, I can conclude that when the datastore name contains "&", the image or config-drive ISO transfer fails with a broken pipe error.
This issue is not specific to the config drive; it is related to the system module "/usr/lib64/python2.6/lib-dynload/_ssl.so".
The VMware driver uses this module to transfer the VMDK image or ISO file to datastores, but for some reason, if the datastore name contains "&", the transfer never finishes.

I observed the image transfer with an illegal datastore name: only 0.49K of the image is transferred to the datastore, and the amount never increases. When the datastore is renamed to a normal name, the transfer succeeds.

Based on the above, one solution is to add a datastore name filter to the VMware driver to prevent users from using illegal names.

A legal name can contain only letters, numbers, underscores (_), hyphens (-) and whitespace.
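The proposed filter could be as simple as a regular expression. A minimal sketch, assuming ASCII letters and digits and treating any whitespace character as allowed; `is_legal_datastore_name` is a hypothetical name, not an existing nova function.

```python
import re

# ASCII letters, digits, underscore, hyphen and whitespace only.
_LEGAL_DATASTORE_NAME = re.compile(r"[A-Za-z0-9_\s-]+")


def is_legal_datastore_name(name):
    """Return True if every character of `name` is in the allowed set."""
    return _LEGAL_DATASTORE_NAME.fullmatch(name) is not None
```

Using `fullmatch` rather than `match` ensures a name such as `datastore1&` is rejected instead of matching on its valid prefix.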

David Geng (genggjh) wrote :

@yanfengxi, hi FengXi, thanks for your update, but I would say that a datastore name containing special characters is a different issue.

The issue I reported is based on our FVT environment, where there is only one datastore in our vCenter, named 'datastore1'.

Changed in oslo.vmware:
status: New → Confirmed
importance: Undecided → Low

I'm getting the exact error you mentioned, @genggjh:
a "Broken pipe" error in the nova-compute log,

and the vpxd log reads:
"Write to named pipe /var/run/vmware/vpxd-webserver-pipe failed with error N7Vmacore15SystemExceptionE(Broken pipe)"

and I'm not in a position to restart the VC ;-)

Feng Xi Yan (yanfengxi) wrote :

I found another case that results in this error: when the datacenter (not the datastore) is under a subfolder, the broken pipe error happens too.
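One plausible explanation for both this case and the "&" case, not confirmed in this thread, is how the upload URL is built. Datastore file transfers go through vSphere's `/folder` HTTP endpoint (the vpxd log above mentions 'HTTP server /folder'), which takes the datacenter and datastore as the `dcPath` and `dsName` query parameters. If a name containing "&" is inserted unescaped, `dsName=R&D` is parsed as `dsName=R` plus a stray parameter; and if only the datacenter name is passed for a datacenter inside a folder, `dcPath` may not resolve. The sketch below URL-encodes both parameters and assumes `dc_path` is the datacenter's full inventory path (for example `Folder1/DC1`); `datastore_folder_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def datastore_folder_url(host, file_path, dc_path, ds_name):
    """Build an HTTPS URL for the vSphere /folder datastore file-access endpoint.

    urlencode() escapes '&' as %26 and '/' as %2F, so names containing
    special characters cannot split the query string.
    """
    query = urlencode({"dcPath": dc_path, "dsName": ds_name})
    return "https://%s/folder/%s?%s" % (host, quote(file_path), query)
```

For a datacenter `DC1` inside `Folder1` and a datastore named `R&D`, the query string becomes `dcPath=Folder1%2FDC1&dsName=R%26D`.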

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance: Medium → Undecided
status: Confirmed → Expired