nova-novncproxy process gets wedged, requiring kill -HUP
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack nova-cloud-controller charm | | Undecided | Unassigned |
Ubuntu Cloud Archive | | Undecided | Unassigned |
Kilo | | Medium | Seyeong Kim |
Mitaka | | Medium | Seyeong Kim |
websockify (Ubuntu) | | Undecided | Unassigned |
Xenial | | Medium | Seyeong Kim |
Bug Description
[Impact]
affected
- UCA Mitaka, Kilo
- Xenial
not affected
- UCA Icehouse
- Trusty
(The log symptom is different there: no "Reaing zombies" messages appear, etc. "Reaing" is the spelling used in the log message itself.)
When there are many connections, or clients reconnect to the console frequently, the nova-novncproxy daemon gets stuck because websockify hangs.
[Test case]
1. Deploy OpenStack.
2. Create instances.
3. Open the console in a browser with an auto-refresh extension (set to 5 seconds).
4. After several hours, connections are rejected.
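The reconnect pattern in the steps above can be roughly approximated without a browser. The following is only a sketch: it opens and closes plain TCP connections and does not perform the WebSocket/noVNC handshake that a real console refresh does, so it may not trigger the same code path. The function name `hammer_connections` and the host and port passed to it are hypothetical; use the address your nova-novncproxy actually listens on.

```python
import socket
import time


def hammer_connections(host, port, attempts, interval=0.0, timeout=2.0):
    """Open and immediately close `attempts` TCP connections to host:port.

    Returns a (succeeded, failed) tuple so a run can be compared over time.
    """
    succeeded = 0
    failed = 0
    for _ in range(attempts):
        try:
            # The context manager closes the socket right after connecting.
            with socket.create_connection((host, port), timeout=timeout):
                succeeded += 1
        except OSError:
            # Refused, reset, or timed-out connections all count as failures.
            failed += 1
        time.sleep(interval)
    return succeeded, failed
```

A growing `failed` count over a long run with, for example, `interval=5.0` would correspond to step 4 of the test case.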
[Regression Potential]
Components that use websockify, especially nova-novncproxy, will be affected by this patch. However, after upgrading and running the refresh test described above for 2 days without restarting any services, no hang occurred. I ran this test in my simple local environment, so the possibility of different behavior under other circumstances needs to be considered.
[Others]
related commits
- https:/
- https:/
[Original Description]
Users reported they were unable to connect to instance consoles via either Horizon or direct URL. Upon investigation we found errors suggesting the address and port were in use:
2017-08-23 14:51:56.248 1355081 INFO nova.console.
2017-08-23 14:51:56.248 1355081 INFO nova.console.
2017-08-23 14:51:56.248 1355081 INFO nova.console.
2017-08-23 14:51:56.248 1355081 INFO nova.console.
2017-08-23 14:51:56.248 1355081 INFO nova.console.
2017-08-23 14:51:56.249 1355081 CRITICAL nova [-] error: [Errno 98] Address already in use
2017-08-23 14:51:56.249 1355081 ERROR nova Traceback (most recent call last):
2017-08-23 14:51:56.249 1355081 ERROR nova File "/usr/bin/
2017-08-23 14:51:56.249 1355081 ERROR nova sys.exit(main())
2017-08-23 14:51:56.249 1355081 ERROR nova File "/usr/lib/
2017-08-23 14:51:56.249 1355081 ERROR nova port=CONF.
2017-08-23 14:51:56.249 1355081 ERROR nova File "/usr/lib/
2017-08-23 14:51:56.249 1355081 ERROR nova RequestHandlerC
2017-08-23 14:51:56.249 1355081 ERROR nova File "/usr/lib/
2017-08-23 14:51:56.249 1355081 ERROR nova tcp_keepintvl=
2017-08-23 14:51:56.249 1355081 ERROR nova File "/usr/lib/
2017-08-23 14:51:56.249 1355081 ERROR nova sock.bind(
2017-08-23 14:51:56.249 1355081 ERROR nova File "/usr/lib/
2017-08-23 14:51:56.249 1355081 ERROR nova return getattr(
2017-08-23 14:51:56.249 1355081 ERROR nova error: [Errno 98] Address already in use
2017-08-23 14:51:56.249 1355081 ERROR nova
This led us to the discovery of a stuck nova-novncproxy process that remained after stopping the service. Once we sent a kill -HUP to that process, we were able to start nova-novncproxy and restore service to the users.
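The "Address already in use" error is what a new process sees when an older process still holds the listening socket. A minimal sketch that reproduces the same errno locally, using a throwaway port on the loopback interface (the function name is hypothetical, and this is not nova or websockify code):

```python
import errno
import socket


def bind_conflict_errno(host="127.0.0.1"):
    """Bind a listener, then try to bind a second socket to the same port.

    Returns the errno raised by the second bind, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0 lets the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = bind_conflict_errno()
    print(code, errno.errorcode.get(code))
```

On Linux the returned value is `errno.EADDRINUSE`, which is the errno 98 shown in the traceback above.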
This was not the first time we have had to restart nova-novncproxy services after users reported that they were unable to connect with VNC. This time, as well as at least 2 other times, we have seen the following errors in nova-novncproxy.log during the time frame of the issue:
gaierror: [Errno -8] Servname not supported for ai_socktype
which seems to correspond to log entries for connection strings with an invalid port ('port': u'-1'), as well as a bunch of:
error: [Errno 104] Connection reset by peer
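To check whether a given time window shows the same pattern, the three error strings quoted in this report can be counted in the log. This is a sketch only: the signature list is taken from the messages above, the function name is hypothetical, and the log location on your system is an assumption you need to supply.

```python
from collections import Counter

# Error strings copied from the log excerpts in this report.
SIGNATURES = (
    "[Errno 98] Address already in use",
    "[Errno -8] Servname not supported for ai_socktype",
    "[Errno 104] Connection reset by peer",
)


def count_signatures(lines):
    """Count how many lines contain each known error signature."""
    counts = Counter()
    for line in lines:
        for signature in SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts
```

Because the function accepts any iterable of strings, an open log file can be passed in directly, for example `count_signatures(open(path))` with `path` pointing at your nova-novncproxy.log.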
Graham Burgess (stormmore) wrote : | #1 |
affects: | nova (Ubuntu) → charm-nova-cloud-controller |
Jill Rouleau (jillrouleau) wrote : | #2 |
This issue continues to reoccur on this cloud. From nova-novncproxy
It's necessary to kill -HUP all nova-novncproxy pids before init'ing the service again.
Trusty/Mitaka/17.02 charms.
Changed in charm-nova-cloud-controller: | |
status: | New → Invalid |
Launchpad Janitor (janitor) wrote : | #3 |
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in nova (Ubuntu): | |
status: | New → Confirmed |
Changed in websockify (Ubuntu): | |
status: | New → Confirmed |
Changed in nova (Ubuntu): | |
importance: | Undecided → Medium |
Changed in websockify (Ubuntu): | |
importance: | Undecided → Medium |
Changed in websockify (Ubuntu): | |
assignee: | nobody → Seyeong Kim (xtrusia) |
no longer affects: | nova (Ubuntu) |
Seyeong Kim (xtrusia) wrote : | #6 |
The attachment "lp1715254-
[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]
tags: | added: patch |
Corey Bryant (corey.bryant) wrote : | #8 |
Thanks for the patches, Seyeong. Assuming those fix the problem, this only affects websockify < 0.8.0, i.e. releases prior to Yakkety/Newton.
Changed in cloud-archive: | |
status: | New → Invalid |
Changed in websockify (Ubuntu Trusty): | |
status: | New → Triaged |
Changed in websockify (Ubuntu Xenial): | |
status: | New → Triaged |
Changed in websockify (Ubuntu Trusty): | |
importance: | Undecided → Medium |
Changed in websockify (Ubuntu Xenial): | |
importance: | Undecided → Medium |
Changed in websockify (Ubuntu Trusty): | |
assignee: | nobody → Seyeong Kim (xtrusia) |
Changed in websockify (Ubuntu Xenial): | |
assignee: | nobody → Seyeong Kim (xtrusia) |
Changed in websockify (Ubuntu): | |
status: | Confirmed → Invalid |
assignee: | Seyeong Kim (xtrusia) → nobody |
importance: | Medium → Undecided |
Corey Bryant (corey.bryant) wrote : | #9 |
I've uploaded Seyeong's xenial patch to the xenial review queue and it is awaiting SRU team review.
https:/
If you'd like to provide patches for trusty-kilo and trusty-icehouse I'd be happy to sponsor those as well.
Seyeong Kim (xtrusia) wrote : | #10 |
Seyeong Kim (xtrusia) wrote : | #11 |
Hello Corey,
I've uploaded a patch for kilo.
I'm going to upload patches for icehouse and trusty
after testing them.
I'm testing them now, but the log is a little different.
I will keep posting updates.
Thanks
description: | updated |
description: | updated |
Seyeong Kim (xtrusia) wrote : | #12 |
Hello Corey,
I've tested Trusty and UCA Icehouse.
However, I couldn't reproduce this issue there.
The messages in the logs are different from those on kilo, mitaka, and xenial.
There is no 'Reaing zombies, active child count is'.
There are a lot of them on kilo, mitaka, and xenial.
I saw jame's latest commit, which is a patch for multiprocessing,
but it does not seem to work on trusty or UCA icehouse (not 100% sure).
Hello Graham, or anyone else affected,
Accepted websockify into xenial-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-
Further information regarding the verification process can be found at https:/
Changed in websockify (Ubuntu Xenial): | |
status: | Triaged → Fix Committed |
tags: | added: verification-needed verification-needed-xenial |
Seyeong Kim (xtrusia) wrote : | #14 |
Hello,
I tested this proposed pkg and confirmed it is working fine.
For testing, I just followed the steps in the [Test case] section.
1. juju deploy xenial.bundle
2. create network & subnet
3. juju config nova-cloud-
4. create instance
5. refresh every 5 seconds in 2 browsers with the console URL for several hours
Thanks
ii websockify 0.6.1+dfsg1-
tags: |
added: verification-done-xenial removed: verification-needed-xenial |
The verification of the Stable Release Update for websockify has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Launchpad Janitor (janitor) wrote : | #16 |
This bug was fixed in the package websockify - 0.6.1+dfsg1-
---------------
websockify (0.6.1+
* Fix hanging nova-novncproxy and can't be restarted (LP: #1715254)
- [PATCH] Make websockify respect SIGTERM
- [PATCH] Remove additional signal calls in websockify that
causes novnc to hang.
-- Seyeong Kim <email address hidden> Mon, 23 Oct 2017 18:31:40 +0900
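The first changelog item, making the server respect SIGTERM, follows a common pattern: register a handler that asks the serving loop to stop instead of leaving the process running. The sketch below illustrates that general pattern only. It is not the actual websockify patch, and the names `GracefulStop` and `run_until_stopped` are hypothetical.

```python
import os
import signal


class GracefulStop:
    """Records that SIGTERM arrived so a serving loop can exit cleanly."""

    def __init__(self):
        self.stop_requested = False

    def install(self):
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        self.stop_requested = True


def run_until_stopped(stopper, do_work, max_iterations=None):
    """Call do_work() repeatedly until SIGTERM is seen or the cap is hit.

    Returns the number of completed iterations.
    """
    iterations = 0
    while not stopper.stop_requested:
        if max_iterations is not None and iterations >= max_iterations:
            break
        do_work()
        iterations += 1
    return iterations
```

With a loop like this, stopping the service lets the process release its listening socket, which is what avoids the "Address already in use" failure on the next start.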
Changed in websockify (Ubuntu Xenial): | |
status: | Fix Committed → Fix Released |
Hello Graham, or anyone else affected,
Accepted websockify into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.
Please help us by testing this new package. To enable the -proposed repository:
sudo add-apt-repository cloud-archive:
sudo apt-get update
Your feedback will aid us getting this update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-
Further information regarding the verification process can be found at https:/
tags: | added: verification-mitaka-needed |
Seyeong Kim (xtrusia) wrote : | #18 |
it seems not in -proposed yet,
I'll test this when I can upgrade websockify
Seyeong Kim (xtrusia) wrote : | #19 |
hello corey
I checked trusty-
but websockify version is
0.6.1+dfsg1-
but I think that is the current version.
Could you check this?
Thanks
Corey Bryant (corey.bryant) wrote : | #20 |
Hello Seyeong,
This is all set now. We had an issue with the cloud archive sync. Can you try again?
Thanks,
Corey
Seyeong Kim (xtrusia) wrote : | #21 |
upgraded to 0.6.1+dfsg1-
tested same steps as above.
it works fine.
Thanks.
tags: |
added: verification-mitaka-done removed: verification-mitaka-needed |
The verification of the Stable Release Update for websockify has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
James Page (james-page) wrote : | #23 |
This bug was fixed in the package websockify - 0.6.1+dfsg1-
---------------
websockify (0.6.1+
.
* New update for the Ubuntu Cloud Archive.
.
websockify (0.6.1+
.
* Fix hanging nova-novncproxy and can't be restarted (LP: #1715254)
- [PATCH] Make websockify respect SIGTERM
- [PATCH] Remove additional signal calls in websockify that
causes novnc to hang.
Changed in websockify (Ubuntu Trusty): | |
assignee: | Seyeong Kim (xtrusia) → nobody |
no longer affects: | websockify (Ubuntu Trusty) |
no longer affects: | cloud-archive/icehouse |
Hello Graham, or anyone else affected,
Accepted websockify into kilo-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.
Please help us by testing this new package. To enable the -proposed repository:
sudo add-apt-repository cloud-archive:
sudo apt-get update
Your feedback will aid us getting this update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-
Further information regarding the verification process can be found at https:/
tags: | added: verification-kilo-needed |
Seyeong Kim (xtrusia) wrote : | #25 |
ii websockify 0.6.0+dfsg1-
reconnection test for several hours.
Thanks.
tags: |
added: verification-kilo-done removed: verification-kilo-needed |
The verification of the Stable Release Update for websockify has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Corey Bryant (corey.bryant) wrote : | #27 |
This bug was fixed in the package websockify - 0.6.0+dfsg1-
---------------
websockify (0.6.0+
.
* Fix hanging nova-novncproxy and can't be restarted (LP: #1715254)
- [PATCH] Make websockify respect SIGTERM
- [PATCH] Remove additional signal calls in websockify that
causes novnc to hang.
tags: |
added: sts sts-sru-done verification-done removed: verification-needed |
Additional information
List of nova packages installed on nova-cloud-controller:
$ dpkg -l | grep nova
ii nova-api-os-compute 2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - OpenStack Compute API frontend
ii nova-cert 2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - certificate management
ii nova-common 2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - common files
ii nova-conductor 2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - conductor service
ii nova-consoleauth 2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - Console Authenticator
ii nova-novncproxy 2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - NoVNC proxy
ii nova-scheduler 2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute - virtual machine scheduler
ii python-nova 2:13.1.4-0ubuntu2~cloud0 all OpenStack Compute Python libraries
ii python-novaclient 2:3.3.1-2ubuntu1~cloud0 all client library for OpenStack Compute API - Python 2.7
Keystone is configured for multi-domain use, and there are 2 domains, in case that is pertinent. Its endpoints are not SSL:
$ openstack endpoint list --format csv -c "Service Name" -c "Service Type" -c "Interface" -c URL | grep keystone
"keystone","identity","internal","http://<ip>:5000/v3"
"keystone","identity","admin","http://<ip>:35357/v3"
"keystone","identity","public","http://<ip>:5000/v3"