Activity log for bug #1908452

Date Who What changed Old value New value Message
2020-12-16 22:33:26 Patricia Domingues bug added bug
2020-12-16 22:34:51 Patricia Domingues attachment added rackd_and_regiond_logs https://bugs.launchpad.net/maas/+bug/1908452/+attachment/5444320/+files/maas_bug_dez20.tar.gz
2020-12-16 22:37:31 Patricia Domingues attachment added full_history_events https://bugs.launchpad.net/maas/+bug/1908452/+attachment/5444321/+files/maas-lp1908452-full_eventsLog
2020-12-16 22:39:50 Patricia Domingues attachment added server_console_log https://bugs.launchpad.net/maas/+bug/1908452/+attachment/5444322/+files/maas-lp1908452_console_log
2020-12-16 22:40:02 Patricia Domingues bug added subscriber dann frazier
2020-12-16 22:40:16 Patricia Domingues bug added subscriber Andrew Cloke
2020-12-16 22:40:32 Patricia Domingues bug added subscriber Alexandre Erwin Ittner
2021-01-06 00:23:38 dann frazier maas: status New Invalid
2021-01-07 15:35:14 Patricia Domingues maas: status Invalid New
2021-01-07 16:25:32 Andrew Cloke attachment added starmie.log https://bugs.launchpad.net/maas/+bug/1908452/+attachment/5450313/+files/starmie.log
2021-01-13 02:30:10 Gabriel Ramirez bug added subscriber Gabriel Ramirez
2021-01-26 18:59:48 dann frazier bug watch added http://bugs.python.org/issue34438
2021-01-26 19:01:11 dann frazier bug task added simplestreams (Ubuntu)
2021-02-02 11:03:30 Adam Collard bug task added simplestreams
2021-02-02 14:27:02 Adam Collard merge proposal linked https://code.launchpad.net/~adam-collard/simplestreams/+git/simplestreams/+merge/397354
2021-02-02 14:32:59 Adam Collard maas: status New In Progress
2021-02-02 14:33:02 Adam Collard maas: importance Undecided High
2021-02-02 14:33:13 Adam Collard maas: assignee Lee Trager (ltrager)
2021-02-02 17:21:55 Adam Collard simplestreams: status New In Progress
2021-02-02 17:21:57 Adam Collard simplestreams: assignee Adam Collard (adam-collard)
2021-02-03 10:28:52 Adam Collard simplestreams: status In Progress Fix Committed
2021-02-04 12:35:28 Paride Legovini simplestreams (Ubuntu): status New Triaged
2021-03-16 21:25:22 dann frazier description
Old value:
We are having an issue with our production MAAS.
The web UI is available normally, we can start to deploy, but the result is a failure - systems get stuck during the `Loading ephemeral` step:
```
Tue, 15 Dec. 2020 23:08:57 Node - Powered off 'akis'.
Tue, 15 Dec. 2020 23:05:25 Marking node failed - Node operation 'Deploying' timed out after 30 minutes.
Tue, 15 Dec. 2020 22:35:31 Loading ephemeral
Tue, 15 Dec. 2020 22:34:35 Performing PXE boot
Tue, 15 Dec. 2020 22:31:35 Powering node on
Tue, 15 Dec. 2020 22:31:35 Node - Started deploying 'akis'.
Tue, 15 Dec. 2020 22:31:35 Deploying
Tue, 15 Dec. 2020 22:31:09 Node - Acquired 'akis'.
```
It's the 3rd time we are seeing this behavior, which is fixed after a restart.
MAAS version: 2.8.2 (8577-g.a3e674063)
New value:
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem.
2021-03-19 16:08:00 dann frazier description
Old value:
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem.
New value:
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem.
sudo kill -9 of the hung pid will cause it to respawn and recover.
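The diagnostic steps in the description above (list the regiond pids, then run strace and lsof against each one) could be scripted roughly as follows. This is only a sketch: the helper names `regiond_pids`, `diagnostic_commands` and `print_diagnostics` are hypothetical, it assumes the usual eight-column `ps -ef` layout, and it only prints the strace and lsof commands from the description rather than running them.

```python
import subprocess


def regiond_pids(ps_output):
    """Extract PIDs from `ps -ef` output for commands that mention regiond."""
    pids = []
    for line in ps_output.splitlines():
        # Assumed ps -ef columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or a line we cannot parse
        if "regiond" in fields[7]:
            pids.append(int(fields[1]))
    return pids


def diagnostic_commands(pid):
    """Build the strace and lsof invocations from the description for one PID."""
    return [
        ["sudo", "strace", "-p", str(pid)],
        ["sudo", "lsof", "-i", "-a", "-p", str(pid)],
    ]


def print_diagnostics():
    """Print the commands to run for every regiond process on this host."""
    ps = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    for pid in regiond_pids(ps.stdout):
        for command in diagnostic_commands(pid):
            print(" ".join(command))
```

Calling `print_diagnostics()` on the MAAS server prints one strace and one lsof command per regiond process. Interpreting the strace output (stuck in recv() versus a stream of epoll_ctl() calls) is still a manual step.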
2021-05-18 10:00:09 Paride Legovini simplestreams (Ubuntu): assignee Paride Legovini (paride)
2021-05-18 16:34:05 Paride Legovini nominated for series Ubuntu Focal
2021-05-18 16:34:05 Paride Legovini bug task added simplestreams (Ubuntu Focal)
2021-05-18 16:34:14 Paride Legovini simplestreams (Ubuntu Focal): status New Triaged
2021-05-18 16:34:26 Paride Legovini simplestreams (Ubuntu): status Triaged In Progress
2021-05-18 16:54:16 Paride Legovini simplestreams: status Fix Committed Fix Released
2021-05-18 16:54:23 Paride Legovini simplestreams (Ubuntu): status In Progress Fix Released
2021-06-10 16:05:36 Paride Legovini simplestreams (Ubuntu Focal): assignee Paride Legovini (paride)
2021-06-15 14:39:27 Paride Legovini description
Old value:
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem.
sudo kill -9 of the hung pid will cause it to respawn and recover.
New value:
[Impact]
The bug is about simplestreams possibly getting stuck waiting forever for an HTTP response that never comes, e.g. because of networking issues. This can potentially affect any package depending on simplestreams, but specifically it was reported affecting MAAS, where it causes server deployments to time out.
[Test Plan]
Ideally this should be tested by building a MAAS snap with the simplestreams package including the fix, and verifying that it works as expected.
[Regression Potential]
Very little. Scenarios where it takes more than 10s for a remote server to provide simplestreams with the data it requested are unlikely, but can't be fully excluded.
[Original Description]
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem.
sudo kill -9 of the hung pid will cause it to respawn and recover.
2021-06-15 14:56:19 Launchpad Janitor merge proposal linked https://code.launchpad.net/~paride/ubuntu/+source/simplestreams/+git/simplestreams/+merge/404202
2021-06-15 15:14:35 Paride Legovini simplestreams (Ubuntu Focal): status Triaged In Progress
2021-06-16 16:24:01 Launchpad Janitor merge proposal linked https://code.launchpad.net/~paride/ubuntu/+source/simplestreams/+git/simplestreams/+merge/404259
2021-06-16 16:24:57 Paride Legovini nominated for series Ubuntu Hirsute
2021-06-16 16:24:57 Paride Legovini bug task added simplestreams (Ubuntu Hirsute)
2021-06-16 16:24:57 Paride Legovini nominated for series Ubuntu Groovy
2021-06-16 16:24:57 Paride Legovini bug task added simplestreams (Ubuntu Groovy)
2021-06-16 16:25:04 Paride Legovini simplestreams (Ubuntu Groovy): assignee Paride Legovini (paride)
2021-06-16 16:25:07 Paride Legovini simplestreams (Ubuntu Hirsute): assignee Paride Legovini (paride)
2021-06-16 16:25:14 Paride Legovini simplestreams (Ubuntu Groovy): status New In Progress
2021-06-16 16:25:18 Paride Legovini simplestreams (Ubuntu Hirsute): status New In Progress
2021-06-16 16:30:10 Launchpad Janitor merge proposal linked https://code.launchpad.net/~paride/ubuntu/+source/simplestreams/+git/simplestreams/+merge/404261
2021-06-16 16:33:57 Paride Legovini simplestreams (Ubuntu): status Fix Released Confirmed
2021-06-16 16:34:04 Paride Legovini simplestreams (Ubuntu): status Confirmed Fix Released
2021-06-17 12:37:42 Paride Legovini simplestreams (Ubuntu Focal): status In Progress Fix Committed
2021-06-17 12:37:47 Paride Legovini simplestreams (Ubuntu Groovy): status In Progress Fix Committed
2021-06-17 12:37:50 Paride Legovini simplestreams (Ubuntu Hirsute): status In Progress Fix Committed
2021-06-18 09:15:29 Timo Aaltonen bug added subscriber Ubuntu Stable Release Updates Team
2021-06-18 09:15:32 Timo Aaltonen bug added subscriber SRU Verification
2021-06-18 09:15:38 Timo Aaltonen tags verification-needed verification-needed-hirsute
2021-06-18 09:21:20 Timo Aaltonen tags verification-needed verification-needed-hirsute verification-needed verification-needed-groovy verification-needed-hirsute
2021-06-22 16:11:27 dann frazier description
Old value:
[Impact]
The bug is about simplestreams possibly getting stuck waiting forever for an HTTP response that never comes, e.g. because of networking issues. This can potentially affect any package depending on simplestreams, but specifically it was reported affecting MAAS, where it causes server deployments to time out.
[Test Plan]
Ideally this should be tested by building a MAAS snap with the simplestreams package including the fix, and verifying that it works as expected.
[Regression Potential]
Very little. Scenarios where it takes more than 10s for a remote server to provide simplestreams with the data it requested are unlikely, but can't be fully excluded.
[Original Description]
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem.
sudo kill -9 of the hung pid will cause it to respawn and recover.
New value:
[Impact]
The bug is about simplestreams possibly getting stuck waiting forever for an HTTP response that never comes, e.g. because of networking issues. This can potentially affect any package depending on simplestreams, but specifically it was reported affecting MAAS, where it causes server deployments to time out.
[Test Plan]
Install an iptables rule to block SSL handshaking w/ the MAAS simplestreams repo:
-------------------------
$ sudo iptables -A INPUT -p tcp -s 91.189.88.136 -m string --string maas.io --algo bm -j DROP
-------------------------
Run the reproducer described below, and verify that it hangs indefinitely (I recommend waiting 60s):
-------------------------
$ cat repro.py
#!/usr/bin/env python3
from simplestreams.contentsource import RequestsUrlReader
url = "https://images.maas.io/ephemeral-v3/stable/streams/v1/index.sjson"
r = RequestsUrlReader(url)
-------------------------
With the fix applied, verify that it does time out in ~10s.
[Regression Potential]
Scenarios where it takes more than 10s to initiate a connection are unlikely, but possible. Code that does not properly handle a timeout exception in these situations may begin to fail.
[Original Description]
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem.
sudo kill -9 of the hung pid will cause it to respawn and recover.
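The hang and the timeout described in the test plan above can be illustrated without simplestreams or iptables. The sketch below is not the simplestreams fix itself: it uses only the Python standard library, and the names `open_silent_listener`, `read_response` and `demo` are hypothetical. It stands in for the unresponsive images server with a local socket that accepts connections at the kernel level but never replies, and shows that a socket timeout turns the endless recv() wait into an exception.

```python
import socket


def open_silent_listener():
    """Listen on a loopback port but never accept or answer.

    The kernel completes the TCP handshake from the listen backlog, so a
    client connects successfully and then waits for data that never comes.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    return listener


def read_response(host, port, timeout):
    """Send a request and read the reply, giving up after `timeout` seconds.

    With timeout=None the recv() call would block indefinitely, which is
    the kind of hang the bug description reports.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"GET / HTTP/1.0\r\n\r\n")
        return conn.recv(1024)


def demo(timeout=0.5):
    """Return a short status string for a request to the silent listener."""
    listener = open_silent_listener()
    try:
        host, port = listener.getsockname()
        try:
            read_response(host, port, timeout)
        except socket.timeout:
            return "timed out"
        return "got a response"
    finally:
        listener.close()
```

The half-second default in `demo()` is only there to keep the example quick; the description above mentions a timeout of about 10s for the actual fix. As the regression note says, callers need to handle the resulting timeout exception.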
2021-06-22 16:16:52 dann frazier tags verification-needed verification-needed-groovy verification-needed-hirsute verification-done-hirsute verification-needed verification-needed-groovy
2021-06-22 16:25:15 dann frazier tags verification-done-hirsute verification-needed verification-needed-groovy verification-done-groovy verification-done-hirsute verification-needed
2021-06-22 18:42:15 Brian Murray tags verification-done-groovy verification-done-hirsute verification-needed verification-done-groovy verification-done-hirsute verification-needed verification-needed-focal
2021-06-22 19:53:39 dann frazier tags verification-done-groovy verification-done-hirsute verification-needed verification-needed-focal verification-done verification-done-focal verification-done-groovy verification-done-hirsute
2021-06-29 17:21:32 dann frazier tags verification-done verification-done-focal verification-done-groovy verification-done-hirsute verification-done-groovy verification-done-hirsute verification-needed verification-needed-focal
2021-06-29 17:26:06 Launchpad Janitor simplestreams (Ubuntu Hirsute): status Fix Committed Fix Released
2021-06-29 17:26:11 Brian Murray removed subscriber Ubuntu Stable Release Updates Team
2021-06-29 17:29:41 Launchpad Janitor simplestreams (Ubuntu Groovy): status Fix Committed Fix Released
2021-06-29 18:04:00 dann frazier tags verification-done-groovy verification-done-hirsute verification-needed verification-needed-focal verification-done verification-done-focal verification-done-groovy verification-done-hirsute
2021-06-29 21:12:09 Launchpad Janitor simplestreams (Ubuntu Focal): status Fix Committed Fix Released
2022-07-01 08:42:31 Adam Collard maas: assignee Lee Trager (ltrager)
2023-01-13 16:15:20 Taihsiang Ho bug added subscriber Taihsiang Ho
2023-11-23 09:12:42 Adam Collard maas: status In Progress Fix Committed
2023-11-23 09:12:47 Adam Collard maas: milestone 3.4.0
2024-01-04 09:09:36 Adam Collard maas: milestone 3.4.0 3.4.0-rc2
2024-01-04 09:16:39 Adam Collard maas: assignee Adam Collard (adam-collard)
2024-01-05 09:47:14 Alberto Donato maas: status Fix Committed Fix Released