devstack timeout because g-api takes longer than 60 seconds to start

Bug #1806912 reported by Matt Riedemann
22
This bug affects 4 people
Affects Status Importance Assigned to Milestone
OpenStack-Gate
Fix Released
Undecided
Unassigned
devstack
Fix Released
Undecided
Unassigned

Bug Description

For example:

http://logs.openstack.org/60/620660/1/check/heat-functional-orig-mysql-lbaasv2/bc7ef21/logs/devstacklog.txt#_2018-11-30_20_14_13_418

2018-11-30 20:14:13.418 | + lib/glance:start_glance:353 : die 353 'g-api did not start'

The g-api logs show it took 62 seconds to start:

http://logs.openstack.org/60/620660/1/check/heat-functional-orig-mysql-lbaasv2/bc7ef21/logs/screen-g-api.txt.gz#_Nov_30_20_14_12_852280

Nov 30 20:14:12.852280 ubuntu-xenial-ovh-bhs1-0000840057 <email address hidden>[7937]: WSGI app 0 (mountpoint='') ready in 62 seconds on interpreter 0xb23d10 pid: 7942 (default app)

Looks like this primarily happens on ovh-bhs1 nodes.

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22-api%20did%20not%20start%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d

Matt Riedemann (mriedem)
Changed in openstack-gate:
status: New → Confirmed
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :

Now that bug 1807219 and bug 1807044 are closed in nova, it looks like the hits for this now are coming from glance-api.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Looking at one of the failures, g-api starts in ~3 seconds:

http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/screen-g-api.txt.gz

Dec 17 18:37:34.594832 ubuntu-bionic-rax-dfw-0001240665 <email address hidden>[1909]: WSGI app 0 (mountpoint='') ready in 3 seconds on interpreter 0x562a8bbd57c0 pid: 1910 (default app)

Dec 17 18:37:34.620388 ubuntu-bionic-rax-dfw-0001240665 <email address hidden>[1909]: WSGI app 0 (mountpoint='') ready in 3 seconds on interpreter 0x562a8bbd57c0 pid: 1911 (default app)

but it looks like GET /images requests are returning a 503;

2018-12-17 20:34:07.843 | ++ :: : curl -g -k --noproxy '*' -s -o /dev/null -w '%{http_code}' https://10.209.34.117/image
2018-12-17 20:34:07.874 | + :: : [[ 503 == 503 ]]

I don't see anything wrong in the glance logs, but I do see proxy errors in the apache logs:

http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/apache/error_log.txt.gz

[Mon Dec 17 20:33:08.847250 2018] [proxy:error] [pid 8631:tid 140513730557696] (111)Connection refused: AH00957: HTTP: attempt to connect to 127.0.0.1:60998 (127.0.0.1) failed
[Mon Dec 17 20:33:08.847344 2018] [proxy_http:error] [pid 8631:tid 140513730557696] [client 10.209.34.117:51988] AH01114: HTTP: failed to make connection to backend: 127.0.0.1

And in the access log:

http://logs.openstack.org/55/625555/1/check/tempest-full/60bd495/controller/logs/apache/access_log.txt.gz

10.209.34.117 - - [17/Dec/2018:20:33:15 +0000] "GET /image HTTP/1.1" 503 568 "-" "curl/7.58.0"
10.209.34.117 - - [17/Dec/2018:20:33:16 +0000] "GET /image HTTP/1.1" 503 568 "-" "curl/7.58.0"
10.209.34.117 - - [17/Dec/2018:20:33:17 +0000] "GET /image HTTP/1.1" 503 568 "-" "curl/7.58.0"
10.209.34.117 - - [17/Dec/2018:20:33:18 +0000] "GET /image HTTP/1.1" 503 568 "-" "curl/7.58.0"

summary: - devstack timeout because n-api/g-api takes longer than 60 seconds to
- start
+ devstack timeout because g-api takes longer than 60 seconds to start
Revision history for this message
Matt Riedemann (mriedem) wrote :

g-api seems to be dead before devstack starts hitting it, and looking at the rabbit logs there are errors about the time the g-api logs stop:

=ERROR REPORT==== 17-Dec-2018::18:42:10 ===
closing AMQP connection <0.1344.0> (10.209.34.117:41178 -> 10.209.34.117:5672 - uwsgi:9448:b8040f63-7cc7-4025-9957-3626a7000014):
missed heartbeats from client, timeout: 60s

Revision history for this message
Matt Riedemann (mriedem) wrote :

In this case, something very weird is happening because g-api is started twice:

Dec 17 18:37:31 ubuntu-bionic-rax-dfw-0001240665 sudo[1907]: stack : TTY=unknown ; PWD=/opt/stack/devstack ; USER=root ; COMMAND=/bin/systemctl start <email address hidden>

Dec 17 20:33:08 ubuntu-bionic-rax-dfw-0001240665 sudo[9402]: stack : TTY=unknown ; PWD=/opt/stack/devstack ; USER=root ; COMMAND=/bin/systemctl start <email address hidden>

Changed in glance:
status: New → Incomplete
Erno Kuvaja (jokke)
affects: glance → devstack
Revision history for this message
Matt Riedemann (mriedem) wrote :

Looking at logstash today for the past 10 days this hits predominantly in vexxhost-sjc1 nodes, but I'm not sure how that might be related.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Looks like it could be a problem with misconfigured ports:

http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/apache_config/glance-wsgi-api_conf.txt.gz

Shows:

ProxyPass "/image" "http://127.0.0.1:60998" retry=0

but glance-api shows:

http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/screen-g-api.txt.gz#_Apr_23_04_58_17_713760

Apr 23 04:58:11.635308 ubuntu-bionic-vexxhost-sjc1-0005468291 <email address hidden>[23037]: uwsgi socket 0 bound to TCP address 127.0.0.1:60999 fd 4

http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/etc/glance/glance-uwsgi.ini.gz also shows:

http-socket = 127.0.0.1:60998

The wait_for_service function isn't using proxies and just gets 503 back.

It looks like write_local_uwsgi_http_config gets a random port:

https://github.com/openstack/devstack/blob/master/lib/apache#L312

port=$(get_random_port)

echo "ProxyPass \"${url}\" \"http://127.0.0.1:$port\" retry=0 " | sudo tee -a $apache_conf

This is the code that gets the port:

# Get a random port from the local port range
#
# This function returns an available port in the local port range. The search
# order is not truly random, but should be considered a random value by the
# user because it depends on the state of your local system.
function get_random_port {
    read lower_port upper_port < /proc/sys/net/ipv4/ip_local_port_range
    while true; do
        for (( port = upper_port ; port >= lower_port ; port-- )); do
            sudo lsof -i ":$port" &> /dev/null
            if [[ $? > 0 ]] ; then
                break 2
            fi
        done
    done
    echo $port
}

Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Manula Chathurika Thantriwatte (manulachathurika) wrote :

I'm also getting the same issue. I'm installing from stable/stein branch in Ubuntu 18.04. Could you please help me on how to resolve this issue.

Revision history for this message
LIU Yulong (dragon889) wrote :

Hit this issue:

/home/yulong/github/devstack/lib/glance:402:die
[ERROR] /home/yulong/github/devstack/lib/glance:402 g-api did not start

# cat glance-wsgi-api.conf
KeepAlive Off
SetEnv proxy-sendchunked 1
ProxyPass "/image" "http://127.0.0.1:60998" retry=0

$ netstat -natpl|grep -i glance
tcp 0 0 127.0.0.1:60999 0.0.0.0:* LISTEN 18820/glance-apiuWS

Changed in devstack:
status: Incomplete → Confirmed
Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

AFAICT the original issue was resolved. If you still see something similar, please create a new bug report with fresh logs.

Changed in devstack:
status: Confirmed → Fix Released
Changed in openstack-gate:
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.