nova-novncproxy process gets wedged, requiring kill -HUP
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Nova Cloud Controller Charm | Invalid | Undecided | Unassigned | |
| Ubuntu Cloud Archive | Invalid | Undecided | Unassigned | |
| Kilo | Fix Released | Medium | Seyeong Kim | |
| Mitaka | Fix Released | Medium | Seyeong Kim | |
| websockify (Ubuntu) | Invalid | Undecided | Unassigned | |
| Xenial | Fix Released | Medium | Seyeong Kim | |
Bug Description
[Impact]
Affected:
- UCA Mitaka, Kilo
- Xenial

Not affected:
- UCA Icehouse
- Trusty (the log symptom is different there; there is no zombie-reaping message, etc.)

When many clients are connected, or clients frequently reconnect to the console, the nova-novncproxy daemon gets stuck because websockify hangs.
[Test case]
1. Deploy OpenStack.
2. Create instances.
3. Open a console in a browser with an auto-refresh extension (set to 5 seconds).
4. After several hours, connections are rejected.
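The browser auto-refresh in step 3 can be approximated from a shell. This is a rough sketch: the console URL and the 3-iteration loop are placeholders (a real run would use the token URL obtained from `nova get-vnc-console <instance> novnc` and loop for hours):

```shell
# Hypothetical console URL; substitute the real novnc token URL.
URL="http://controller:6080/vnc_auto.html?token=example-token"
for i in 1 2 3; do    # placeholder count; the report refreshed for hours
    # Each fetch simulates one browser refresh of the console page.
    curl -s -m 5 -o /dev/null "$URL" || true
    sleep 5
done
echo "refresh loop finished"
```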
[Regression Potential]
Components that use websockify, especially nova-novncproxy, will be affected by this patch. However, after upgrading and running the refresh test described above for 2 days without restarting any services, no hang occurred. This was tested in a simple local environment, so the possibility of different behavior in other circumstances should be kept in mind.
[Others]
related commits
- https:/
- https:/
[Original Description]
Users reported they were unable to connect to instance consoles via either Horizon or direct URL. Upon investigation we found errors suggesting the address and port were in use:
2017-08-23 14:51:56.248 1355081 INFO nova.console.
2017-08-23 14:51:56.248 1355081 INFO nova.console.
2017-08-23 14:51:56.248 1355081 INFO nova.console.
2017-08-23 14:51:56.248 1355081 INFO nova.console.
2017-08-23 14:51:56.248 1355081 INFO nova.console.
2017-08-23 14:51:56.249 1355081 CRITICAL nova [-] error: [Errno 98] Address already in use
2017-08-23 14:51:56.249 1355081 ERROR nova Traceback (most recent call last):
2017-08-23 14:51:56.249 1355081 ERROR nova File "/usr/bin/
2017-08-23 14:51:56.249 1355081 ERROR nova sys.exit(main())
2017-08-23 14:51:56.249 1355081 ERROR nova File "/usr/lib/
2017-08-23 14:51:56.249 1355081 ERROR nova port=CONF.
2017-08-23 14:51:56.249 1355081 ERROR nova File "/usr/lib/
2017-08-23 14:51:56.249 1355081 ERROR nova RequestHandlerC
2017-08-23 14:51:56.249 1355081 ERROR nova File "/usr/lib/
2017-08-23 14:51:56.249 1355081 ERROR nova tcp_keepintvl=
2017-08-23 14:51:56.249 1355081 ERROR nova File "/usr/lib/
2017-08-23 14:51:56.249 1355081 ERROR nova sock.bind(
2017-08-23 14:51:56.249 1355081 ERROR nova File "/usr/lib/
2017-08-23 14:51:56.249 1355081 ERROR nova return getattr(
2017-08-23 14:51:56.249 1355081 ERROR nova error: [Errno 98] Address already in use
2017-08-23 14:51:56.249 1355081 ERROR nova
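The `[Errno 98]` bind failure at the bottom of the traceback can be reproduced in isolation. A minimal sketch (the port here is an OS-chosen ephemeral port, not nova's actual proxy port):

```python
import errno
import socket

# Hold a listening socket on an ephemeral port, then try to bind a
# second socket to the same address. This reproduces the
# "[Errno 98] Address already in use" (EADDRINUSE) failure seen when
# nova-novncproxy restarts while the old process still holds the port.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))
holder.listen(1)
port = holder.getsockname()[1]

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    raised = None
except OSError as exc:
    raised = exc.errno
finally:
    second.close()
    holder.close()

print(raised == errno.EADDRINUSE)  # True: the port is still held
```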
This led us to the discovery of a stuck nova-novncproxy process that remained after stopping the service. Once we sent kill -HUP to that process, we were able to start nova-novncproxy and restore service to the users.
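The recovery can be sketched as a shell procedure. Since a genuinely wedged nova-novncproxy cannot be conjured on demand, this sketch uses a placeholder `sleep` process to stand in for the stuck proxy; in production you would locate the real PID with `pgrep -f nova-novncproxy` after stopping the service:

```shell
# Stand-in for the wedged proxy process (hypothetical; in production,
# find the leftover PID with: pgrep -af nova-novncproxy).
sleep 300 &
stuck=$!

# Send SIGHUP, the same signal used to unstick the real process.
kill -HUP "$stuck"

# Reap it; SIGHUP's default disposition terminates the placeholder.
wait "$stuck" 2>/dev/null || true
echo "process $stuck is gone; the service can now be restarted"
```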
This was not the first time we have had to restart nova-novncproxy after users reported being unable to connect with VNC. This time, as well as on at least 2 other occasions, we saw the following errors in nova-novncproxy.log during the time frame of the issue:
gaierror: [Errno -8] Servname not supported for ai_socktype
which seems to correspond to log entries for connection strings with an invalid port ('port': u'-1'). We also saw a number of:
error: [Errno 104] Connection reset by peer
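The gaierror can also be reproduced directly: passing the invalid port string '-1' to `getaddrinfo()` makes the resolver treat it as an unknown service name. This sketch assumes glibc/Linux behavior; the exact errno and message may differ on other platforms:

```python
import socket

# A port string starting with a non-digit ('-1') is looked up as a
# service name; the lookup fails, mirroring the gaierror logged by
# nova-novncproxy when it received 'port': u'-1'.
try:
    socket.getaddrinfo("127.0.0.1", "-1", type=socket.SOCK_STREAM)
    raised = False
except socket.gaierror as exc:
    raised = True
    print(exc)  # e.g. [Errno -8] Servname not supported for ai_socktype

print(raised)
```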
affects: nova (Ubuntu) → charm-nova-cloud-controller
Changed in charm-nova-cloud-controller:
  status: New → Invalid
Changed in nova (Ubuntu):
  importance: Undecided → Medium
Changed in websockify (Ubuntu):
  importance: Undecided → Medium
Changed in websockify (Ubuntu):
  assignee: nobody → Seyeong Kim (xtrusia)
no longer affects: nova (Ubuntu)
description: updated
description: updated
Changed in websockify (Ubuntu Trusty):
  assignee: Seyeong Kim (xtrusia) → nobody
no longer affects: websockify (Ubuntu Trusty)
no longer affects: cloud-archive/icehouse
tags:
  added: sts sts-sru-done verification-done
  removed: verification-needed
Additional information
List of nova packages installed on nova-cloud-controller:
$ dpkg -l | grep nova
ii nova-api-os-compute 2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - OpenStack Compute API frontend
ii nova-cert           2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - certificate management
ii nova-common         2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - common files
ii nova-conductor      2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - conductor service
ii nova-consoleauth    2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - Console Authenticator
ii nova-novncproxy     2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - NoVNC proxy
ii nova-scheduler      2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - virtual machine scheduler
ii python-nova         2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute Python libraries
ii python-novaclient   2:3.3.1-2ubuntu1~cloud0  all client library for OpenStack Compute API - Python 2.7
Keystone is configured for multiple domains, and there are 2 domains, in case that is pertinent; the endpoints are not SSL:
$ openstack endpoint list --format csv -c "Service Name" -c "Service Type" -c "Interface" -c URL | grep keystone
"keystone","identity","internal","http://<ip>:5000/v3"
"keystone","identity","admin","http://<ip>:35357/v3"
"keystone","identity","public","http://<ip>:5000/v3"