2020-12-16 22:33:26 |
Patricia Domingues |
bug |
|
|
added bug |
2020-12-16 22:34:51 |
Patricia Domingues |
attachment added |
|
rackd_and_regiond_logs https://bugs.launchpad.net/maas/+bug/1908452/+attachment/5444320/+files/maas_bug_dez20.tar.gz |
|
2020-12-16 22:37:31 |
Patricia Domingues |
attachment added |
|
full_history_events https://bugs.launchpad.net/maas/+bug/1908452/+attachment/5444321/+files/maas-lp1908452-full_eventsLog |
|
2020-12-16 22:39:50 |
Patricia Domingues |
attachment added |
|
server_console_log https://bugs.launchpad.net/maas/+bug/1908452/+attachment/5444322/+files/maas-lp1908452_console_log |
|
2020-12-16 22:40:02 |
Patricia Domingues |
bug |
|
|
added subscriber dann frazier |
2020-12-16 22:40:16 |
Patricia Domingues |
bug |
|
|
added subscriber Andrew Cloke |
2020-12-16 22:40:32 |
Patricia Domingues |
bug |
|
|
added subscriber Alexandre Erwin Ittner |
2021-01-06 00:23:38 |
dann frazier |
maas: status |
New |
Invalid |
|
2021-01-07 15:35:14 |
Patricia Domingues |
maas: status |
Invalid |
New |
|
2021-01-07 16:25:32 |
Andrew Cloke |
attachment added |
|
starmie.log https://bugs.launchpad.net/maas/+bug/1908452/+attachment/5450313/+files/starmie.log |
|
2021-01-13 02:30:10 |
Gabriel Ramirez |
bug |
|
|
added subscriber Gabriel Ramirez |
2021-01-26 18:59:48 |
dann frazier |
bug watch added |
|
http://bugs.python.org/issue34438 |
|
2021-01-26 19:01:11 |
dann frazier |
bug task added |
|
simplestreams (Ubuntu) |
|
2021-02-02 11:03:30 |
Adam Collard |
bug task added |
|
simplestreams |
|
2021-02-02 14:27:02 |
Adam Collard |
merge proposal linked |
|
https://code.launchpad.net/~adam-collard/simplestreams/+git/simplestreams/+merge/397354 |
|
2021-02-02 14:32:59 |
Adam Collard |
maas: status |
New |
In Progress |
|
2021-02-02 14:33:02 |
Adam Collard |
maas: importance |
Undecided |
High |
|
2021-02-02 14:33:13 |
Adam Collard |
maas: assignee |
|
Lee Trager (ltrager) |
|
2021-02-02 17:21:55 |
Adam Collard |
simplestreams: status |
New |
In Progress |
|
2021-02-02 17:21:57 |
Adam Collard |
simplestreams: assignee |
|
Adam Collard (adam-collard) |
|
2021-02-03 10:28:52 |
Adam Collard |
simplestreams: status |
In Progress |
Fix Committed |
|
2021-02-04 12:35:28 |
Paride Legovini |
simplestreams (Ubuntu): status |
New |
Triaged |
|
2021-03-16 21:25:22 |
dann frazier |
description |
We are having an issue with our production MAAS
The web UI is available normally, we can start to deploy, but the result is a failure - systems get stuck during `Loading ephemeral` step:
```
Tue, 15 Dec. 2020 23:08:57 Node - Powered off 'akis'.
Tue, 15 Dec. 2020 23:05:25 Marking node failed - Node operation 'Deploying' timed out after 30 minutes.
Tue, 15 Dec. 2020 22:35:31 Loading ephemeral
Tue, 15 Dec. 2020 22:34:35 Performing PXE boot
Tue, 15 Dec. 2020 22:31:35 Powering node on
Tue, 15 Dec. 2020 22:31:35 Node - Started deploying 'akis'.
Tue, 15 Dec. 2020 22:31:35 Deploying
Tue, 15 Dec. 2020 22:31:09 Node - Acquired 'akis'.
```
It's the 3rd time we are seeing this behavior, which is fixed after a restart.
MAAS version: 2.8.2 (8577-g.a3e674063) |
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem. |
|
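The first step above (collecting the regiond pids) can also be scripted. What follows is a minimal sketch, not part of MAAS: it assumes a Linux host with /proc mounted, and the function name find_pids_by_cmdline is made up for illustration. It only stands in for the ps | grep step; strace and lsof still have to be run by hand against each pid it prints.

```python
import os


def find_pids_by_cmdline(needle):
    """Return sorted pids whose command line contains `needle`.

    Hypothetical helper: scans /proc/<pid>/cmdline, so it only works
    on Linux. Returns an empty list if /proc is not available.
    """
    try:
        entries = os.listdir("/proc")
    except OSError:
        return []
    pids = []
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited or is not readable
        # Arguments are NUL-separated; join them with spaces for matching.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace")
        if needle in cmdline:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    for pid in find_pids_by_cmdline("regiond"):
        print(pid)
```

Each printed pid can then be passed to the strace and lsof commands from the description.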
2021-03-19 16:08:00 |
dann frazier |
description |
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem. |
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem. Running sudo kill -9 on the hung pid will cause it to respawn and recover. |
|
2021-05-18 10:00:09 |
Paride Legovini |
simplestreams (Ubuntu): assignee |
|
Paride Legovini (paride) |
|
2021-05-18 16:34:05 |
Paride Legovini |
nominated for series |
|
Ubuntu Focal |
|
2021-05-18 16:34:05 |
Paride Legovini |
bug task added |
|
simplestreams (Ubuntu Focal) |
|
2021-05-18 16:34:14 |
Paride Legovini |
simplestreams (Ubuntu Focal): status |
New |
Triaged |
|
2021-05-18 16:34:26 |
Paride Legovini |
simplestreams (Ubuntu): status |
Triaged |
In Progress |
|
2021-05-18 16:54:16 |
Paride Legovini |
simplestreams: status |
Fix Committed |
Fix Released |
|
2021-05-18 16:54:23 |
Paride Legovini |
simplestreams (Ubuntu): status |
In Progress |
Fix Released |
|
2021-06-10 16:05:36 |
Paride Legovini |
simplestreams (Ubuntu Focal): assignee |
|
Paride Legovini (paride) |
|
2021-06-15 14:39:27 |
Paride Legovini |
description |
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem. Running sudo kill -9 on the hung pid will cause it to respawn and recover. |
[Impact]
The bug is about simplestreams possibly getting stuck waiting forever for an HTTP response that never comes, e.g. because of networking issues. This can potentially affect any package depending on simplestreams, but specifically it was reported affecting MAAS, where it causes server deployments to time out.
[Test Plan]
Ideally this should be tested by building a MAAS snap with the simplestreams package including the fix, and verifying that it works as expected.
[Regression Potential]
Very little. Scenarios where it takes more than 10s for a remote server to provide simplestreams with the data it requested are unlikely, but can't be fully excluded.
[Original Description]
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem. Running sudo kill -9 on the hung pid will cause it to respawn and recover. |
|
2021-06-15 14:56:19 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~paride/ubuntu/+source/simplestreams/+git/simplestreams/+merge/404202 |
|
2021-06-15 15:14:35 |
Paride Legovini |
simplestreams (Ubuntu Focal): status |
Triaged |
In Progress |
|
2021-06-16 16:24:01 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~paride/ubuntu/+source/simplestreams/+git/simplestreams/+merge/404259 |
|
2021-06-16 16:24:57 |
Paride Legovini |
nominated for series |
|
Ubuntu Hirsute |
|
2021-06-16 16:24:57 |
Paride Legovini |
bug task added |
|
simplestreams (Ubuntu Hirsute) |
|
2021-06-16 16:24:57 |
Paride Legovini |
nominated for series |
|
Ubuntu Groovy |
|
2021-06-16 16:24:57 |
Paride Legovini |
bug task added |
|
simplestreams (Ubuntu Groovy) |
|
2021-06-16 16:25:04 |
Paride Legovini |
simplestreams (Ubuntu Groovy): assignee |
|
Paride Legovini (paride) |
|
2021-06-16 16:25:07 |
Paride Legovini |
simplestreams (Ubuntu Hirsute): assignee |
|
Paride Legovini (paride) |
|
2021-06-16 16:25:14 |
Paride Legovini |
simplestreams (Ubuntu Groovy): status |
New |
In Progress |
|
2021-06-16 16:25:18 |
Paride Legovini |
simplestreams (Ubuntu Hirsute): status |
New |
In Progress |
|
2021-06-16 16:30:10 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~paride/ubuntu/+source/simplestreams/+git/simplestreams/+merge/404261 |
|
2021-06-16 16:33:57 |
Paride Legovini |
simplestreams (Ubuntu): status |
Fix Released |
Confirmed |
|
2021-06-16 16:34:04 |
Paride Legovini |
simplestreams (Ubuntu): status |
Confirmed |
Fix Released |
|
2021-06-17 12:37:42 |
Paride Legovini |
simplestreams (Ubuntu Focal): status |
In Progress |
Fix Committed |
|
2021-06-17 12:37:47 |
Paride Legovini |
simplestreams (Ubuntu Groovy): status |
In Progress |
Fix Committed |
|
2021-06-17 12:37:50 |
Paride Legovini |
simplestreams (Ubuntu Hirsute): status |
In Progress |
Fix Committed |
|
2021-06-18 09:15:29 |
Timo Aaltonen |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2021-06-18 09:15:32 |
Timo Aaltonen |
bug |
|
|
added subscriber SRU Verification |
2021-06-18 09:15:38 |
Timo Aaltonen |
tags |
|
verification-needed verification-needed-hirsute |
|
2021-06-18 09:21:20 |
Timo Aaltonen |
tags |
verification-needed verification-needed-hirsute |
verification-needed verification-needed-groovy verification-needed-hirsute |
|
2021-06-22 16:11:27 |
dann frazier |
description |
[Impact]
The bug is about simplestreams possibly getting stuck waiting forever for an HTTP response that never comes, e.g. because of networking issues. This can potentially affect any package depending on simplestreams, but specifically it was reported affecting MAAS, where it causes server deployments to time out.
[Test Plan]
Ideally this should be tested by building a MAAS snap with the simplestreams package including the fix, and verifying that it works as expected.
[Regression Potential]
Very little. Scenarios where it takes more than 10s for a remote server to provide simplestreams with the data it requested are unlikely, but can't be fully excluded.
[Original Description]
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem. Running sudo kill -9 on the hung pid will cause it to respawn and recover. |
[Impact]
The bug is about simplestreams possibly getting stuck waiting forever for an HTTP response that never comes, e.g. because of networking issues. This can potentially affect any package depending on simplestreams, but specifically it was reported affecting MAAS, where it causes server deployments to time out.
[Test Plan]
Install an iptables rule to block SSL handshaking w/ the MAAS simplestreams repo:
-------------------------
$ sudo iptables -A INPUT -p tcp -s 91.189.88.136 -m string --string maas.io --algo bm -j DROP
-------------------------
Run the reproducer described below, and verify that it hangs indefinitely (I recommend waiting 60s):
-------------------------
$ cat repro.py
#!/usr/bin/env python3
from simplestreams.contentsource import RequestsUrlReader
url = "https://images.maas.io/ephemeral-v3/stable/streams/v1/index.sjson"
r = RequestsUrlReader(url)
-------------------------
With the fix applied, verify that it times out in ~10s.
[Regression Potential]
Scenarios where it takes more than 10s to initiate a connection are unlikely, but possible. Code that does not properly handle a timeout exception in these situations may begin to fail.
[Original Description]
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be your problem. Running sudo kill -9 on the hung pid will cause it to respawn and recover. |
|
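The hang described in the test plan, and the effect of a timeout, can also be illustrated locally without iptables or access to images.maas.io. This is a minimal sketch using only the Python standard library, not the simplestreams code or its actual fix: a local socket accepts the connection but never answers, standing in for the blocked images server. The helper names (start_silent_server, probe) and the 1 second timeout are assumptions chosen for illustration; the test plan above expects roughly 10s with the real fix.

```python
import socket
import threading
import time
import urllib.error
import urllib.request


def start_silent_server():
    """Accept TCP connections on a free local port and never reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(5)
    held = []  # keep accepted sockets referenced so they stay open

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # listening socket was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1]


def probe(url, timeout):
    """Fetch `url`; return ("ok" or "timed out", elapsed seconds)."""
    # An empty ProxyHandler ignores any http_proxy setting in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    started = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as resp:
            resp.read()
        status = "ok"
    except socket.timeout:
        # Raised while waiting for the response that never arrives.
        status = "timed out"
    except urllib.error.URLError as exc:
        # A timeout during connect is wrapped in URLError; re-raise the rest.
        if not isinstance(exc.reason, socket.timeout):
            raise
        status = "timed out"
    return status, time.monotonic() - started


if __name__ == "__main__":
    server, port = start_silent_server()
    print(probe(f"http://127.0.0.1:{port}/index.sjson", timeout=1))
    server.close()
```

Without a timeout argument (and with no global socket default set), the same request would block until the peer closes the connection, which matches the stuck recv() call seen under strace.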
2021-06-22 16:16:52 |
dann frazier |
tags |
verification-needed verification-needed-groovy verification-needed-hirsute |
verification-done-hirsute verification-needed verification-needed-groovy |
|
2021-06-22 16:25:15 |
dann frazier |
tags |
verification-done-hirsute verification-needed verification-needed-groovy |
verification-done-groovy verification-done-hirsute verification-needed |
|
2021-06-22 18:42:15 |
Brian Murray |
tags |
verification-done-groovy verification-done-hirsute verification-needed |
verification-done-groovy verification-done-hirsute verification-needed verification-needed-focal |
|
2021-06-22 19:53:39 |
dann frazier |
tags |
verification-done-groovy verification-done-hirsute verification-needed verification-needed-focal |
verification-done verification-done-focal verification-done-groovy verification-done-hirsute |
|
2021-06-29 17:21:32 |
dann frazier |
tags |
verification-done verification-done-focal verification-done-groovy verification-done-hirsute |
verification-done-groovy verification-done-hirsute verification-needed verification-needed-focal |
|
2021-06-29 17:26:06 |
Launchpad Janitor |
simplestreams (Ubuntu Hirsute): status |
Fix Committed |
Fix Released |
|
2021-06-29 17:26:11 |
Brian Murray |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2021-06-29 17:29:41 |
Launchpad Janitor |
simplestreams (Ubuntu Groovy): status |
Fix Committed |
Fix Released |
|
2021-06-29 18:04:00 |
dann frazier |
tags |
verification-done-groovy verification-done-hirsute verification-needed verification-needed-focal |
verification-done verification-done-focal verification-done-groovy verification-done-hirsute |
|
2021-06-29 21:12:09 |
Launchpad Janitor |
simplestreams (Ubuntu Focal): status |
Fix Committed |
Fix Released |
|
2022-07-01 08:42:31 |
Adam Collard |
maas: assignee |
Lee Trager (ltrager) |
|
|
2023-01-13 16:15:20 |
Taihsiang Ho |
bug |
|
|
added subscriber Taihsiang Ho |
2023-11-23 09:12:42 |
Adam Collard |
maas: status |
In Progress |
Fix Committed |
|
2023-11-23 09:12:47 |
Adam Collard |
maas: milestone |
|
3.4.0 |
|
2024-01-04 09:09:36 |
Adam Collard |
maas: milestone |
3.4.0 |
3.4.0-rc2 |
|
2024-01-04 09:16:39 |
Adam Collard |
maas: assignee |
|
Adam Collard (adam-collard) |
|
2024-01-05 09:47:14 |
Alberto Donato |
maas: status |
Fix Committed |
Fix Released |
|