[master][promotion] standalone job is failing during undercloud install with Server Error for url: https://192.168.24.2:13696/v2.0/network

Bug #1803703 reported by chandan kumar
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Harald Jensås

Bug Description

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-standalone-master/0323c3f/logs/undercloud/home/zuul/undercloud_install.log.txt.gz#_2018-11-16_12_03_35

The standalone promotion job is failing while deploying the undercloud, with the following error:

HttpException: 504: Server Error for url: https://192.168.24.2:13696/v2.0/networks, <html><body><h1>504 Gateway Time-out</h1>",
2018-11-16 12:03:35 | "The server didn't respond in time.",
2018-11-16 12:03:35 | "</body></html>",

"[2018-11-16 12:01:32,148] (heat-config) [DEBUG] Running /var/lib/heat-config/heat-config-script/cef18d72-d11d-4026-9948-64f2fe22972f",
2018-11-16 12:03:35 | "[2018-11-16 12:03:34,621] (heat-config) [INFO] ERROR: Network create/update failed.",
2018-11-16 12:03:35 | "",
2018-11-16 12:03:35 | "[2018-11-16 12:03:34,622] (heat-config) [DEBUG] Traceback (most recent call last):",
2018-11-16 12:03:35 | " File \"/var/lib/heat-config/heat-config-script/cef18d72-d11d-4026-9948-64f2fe22972f\", line 275, in <module>",
2018-11-16 12:03:35 | " network = _ensure_neutron_network(sdk)",
2018-11-16 12:03:35 | " File \"/var/lib/heat-config/heat-config-script/cef18d72-d11d-4026-9948-64f2fe22972f\", line 53, in _ensure_neutron_network",
2018-11-16 12:03:35 | " mtu=CONF['mtu'])",
2018-11-16 12:03:35 | " File \"/usr/lib/python2.7/site-packages/openstack/network/v2/_proxy.py\", line 1087, in create_network",
2018-11-16 12:03:35 | " return self._create(_network.Network, **attrs)",
2018-11-16 12:03:35 | " File \"/usr/lib/python2.7/site-packages/openstack/proxy.py\", line 192, in _create",
2018-11-16 12:03:35 | " return res.create(self)",
2018-11-16 12:03:35 | " File \"/usr/lib/python2.7/site-packages/openstack/resource.py\", line 763, in create",
2018-11-16 12:03:35 | " self._translate_response(response)",
2018-11-16 12:03:35 | " File \"/usr/lib/python2.7/site-packages/openstack/resource.py\", line 695, in _translate_response",
2018-11-16 12:03:35 | " exceptions.raise_from_response(response, error_message=error_message)",
2018-11-16 12:03:35 | " File \"/usr/lib/python2.7/site-packages/openstack/exceptions.py\", line 212, in raise_from_response",
2018-11-16 12:03:35 | " http_status=http_status, request_id=request_id",
2018-11-16 12:03:35 | "openstack.exceptions.HttpException: HttpException: 504: Server Error for url: https://192.168.24.2:13696/v2.0/networks, <html><body><h1>504 Gateway Time-out</h1>",
2018-11-16 12:03:35 | "The server didn't respond in time.",

There might be some issue with HAProxy. This job ran for the first time today.

summary: - [master] master standalone job is failing during undercloud install with
- Server Error for url: https://192.168.24.2:13696/v2.0/network
+ [master][promotion] standalone job is failing during undercloud install
+ with Server Error for url: https://192.168.24.2:13696/v2.0/network
Cédric Jeanneret (cjeanner) wrote :

So the 504 is probably returned by haproxy directly, meaning the service behind it is down or not answering for some reason.

We can see in netstat haproxy is listening on that port (13696) at least:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-standalone-master/0323c3f/logs/undercloud/var/log/extra/netstat.txt.gz

Harald Jensås (harald-jensas) wrote :

Long story below.
Short story: RabbitMQ is not running.

## The first command in UndercloudCtlplaneNetworkDeployment would be [1]; the GET networks?name=ctlplane request can be seen in the neutron log.

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-standalone-master/0323c3f/logs/undercloud/var/log/containers/neutron/server.log.txt.gz#_2018-11-16_12_01_34_564

2018-11-16 12:01:34.564 31 INFO neutron.wsgi [req-452824ab-2c8a-4a91-a627-0204a057c95b 4f57adfa456741acb057996b874bfd3b 1a495f2157094df0aed147f2115de58b - default default] 192.168.24.1 "GET /v2.0/networks?name=ctlplane HTTP/1.1" status: 200 len: 189 time: 1.0968869

Then not much happens in the neutron logs, and the next thing in UndercloudCtlplaneNetworkDeployment, the POST command [2], gets the 504.

## The 504 ::

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-standalone-master/0323c3f/logs/undercloud/var/log/journal.txt.gz#_Nov_16_12_03_34

Nov 16 12:03:34 upstream-centos-7-rdo-cloud-tripleo-0000299802.tripleodomain.exa haproxy[42025]: 192.168.24.2:49982 [16/Nov/2018:12:01:34.564] neutron~ neutron/upstream-centos-7-rdo-cloud-tripleo-0000299802.internalapi.localdomain 3/0/0/-1/120004 504 194 - - sH-- 53/0/0/0/0 0/0 "POST /v2.0/networks HTTP/1.1"
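
To make the journal line above easier to read, here is a minimal sketch that pulls out the timers, status code and termination state. The field order (request/queue/connect/response/total timers, then status, bytes, two cookie fields, termination state) is my reading of HAProxy's HTTP log format, and `parse_haproxy_http_fields` is a hypothetical helper, not part of any tool used in this job.

```python
import re

# Timer names in the order HAProxy's HTTP log format prints them
# (assumed: request / queue / connect / response / total, in milliseconds).
TIMER_NAMES = ("request", "queue", "connect", "response", "total")

# Matches "<5 timers> <status> <bytes> <req cookie> <resp cookie> <termination state>".
_HTTP_FIELDS = re.compile(
    r"(?<!\S)([+-]?\d+(?:/[+-]?\d+){4}) (\d{3}) [+]?\d+ \S+ \S+ (\S{4})(?!\S)"
)


def parse_haproxy_http_fields(line):
    """Return timers, status and termination state, or None if the line does not match."""
    match = _HTTP_FIELDS.search(line)
    if match is None:
        return None
    timers = dict(zip(TIMER_NAMES, (int(t) for t in match.group(1).split("/"))))
    return {
        "timers_ms": timers,
        "status": int(match.group(2)),
        "termination_state": match.group(3),
    }


# The journal line quoted in this comment.
LINE = (
    'Nov 16 12:03:34 upstream-centos-7-rdo-cloud-tripleo-0000299802.tripleodomain.exa '
    'haproxy[42025]: 192.168.24.2:49982 [16/Nov/2018:12:01:34.564] neutron~ '
    'neutron/upstream-centos-7-rdo-cloud-tripleo-0000299802.internalapi.localdomain '
    '3/0/0/-1/120004 504 194 - - sH-- 53/0/0/0/0 0/0 "POST /v2.0/networks HTTP/1.1"'
)

fields = parse_haproxy_http_fields(LINE)
print(fields["timers_ms"], fields["status"], fields["termination_state"])
```

A response timer of -1 together with a termination state starting with `sH` should mean that the server-side timeout expired while HAProxy was still waiting for response headers. The total of 120004 ms matches the two minutes between 12:01:34 and 12:03:34, so the request reached HAProxy but neutron never answered it.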

At the same time, Neutron seems to be up and responding:

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-standalone-master/0323c3f/logs/undercloud/var/log/containers/neutron/server.log.txt.gz#_2018-11-16_12_03_35_231

What I find strange is what is not happening in neutron logs right after it starts:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-standalone-master/0323c3f/logs/undercloud/var/log/containers/neutron/server.log.txt.gz

Usually we get some RPC-related messages such as:

  neutron.services.l3_router.l3_router_plugin [-] neutron.services.l3_router.l3_router_plugin.L3RouterPlugin method start_rpc_listeners
  neutron.api.rpc.handlers.dhcp_rpc [req-49a9edd9-d581-4806-93e2-a6a872d50f4c - - - - -] get_active_networks_info from undercloud.localdomain

So, RPC ... Where are the RabbitMQ logs?
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-standalone-master/0323c3f/logs/undercloud/var/log/containers/

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-standalone-master/0323c3f/logs/undercloud/var/log/extra/docker/containers/rabbitmq/stdout.log.txt.gz

    initial_call: pid: registered_name: error_info: ancestors: messages:
    links: [<0.32.0>,<0.31.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 27
    reductions: 192
2018-11-16 11:57:21 std_info
kernel
{{shutdown,{failed_to_start_child,inet_db,{'EXIT',{function_clause,[{inet_config,set_hostname,[{error,enametoolong}],[{file,"inet_config.erl"},{line,228}...
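
When scanning a long Erlang crash dump like this, the interesting part is usually the reason atom inside an `{error,...}` tuple. A minimal sketch for pulling those atoms out of a log follows; `erlang_error_reasons` is a hypothetical helper, and the regex only covers simple unquoted atoms.

```python
import re


def erlang_error_reasons(log_text):
    """Return the distinct reason atoms found in {error,<atom>} tuples, in first-seen order."""
    seen = []
    for reason in re.findall(r"\{error,\s*([a-z][A-Za-z0-9_@]*)\}", log_text):
        if reason not in seen:
            seen.append(reason)
    return seen


# Start of the RabbitMQ stdout line quoted above (truncated copy).
SAMPLE = (
    "{{shutdown,{failed_to_start_child,inet_db,{'EXIT',{function_clause,"
    "[{inet_config,set_hostname,[{error,enametoolong}],"
)

print(erlang_error_reasons(SAMPLE))
```

Here the only reason found is `enametoolong`, raised from `inet_config:set_hostname`, which points at the hostname rather than at RabbitMQ itself.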

Harald Jensås (harald-jensas) wrote :

Or ... we make the hostname shorter:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-standalone-master/0323c3f/logs/undercloud/etc/hostname.txt.gz

Are we trying to use a hostname longer than what Linux supports here?

GETHOSTNAME(2)

  On Linux, HOST_NAME_MAX is defined with the value 64, which has been the limit since Linux 1.0 (earlier kernels imposed a limit of 8 bytes).
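
As a quick sanity check, the sketch below compares FQDN lengths against that limit. The short hostname is the one from the journal line earlier in this report, and the two domains are the old and new defaults named in the fix's commit message; `fqdn_fits` is a hypothetical helper, and the limit is hard-coded from the man page rather than queried from the system.

```python
HOST_NAME_MAX = 64  # Linux limit, as quoted from GETHOSTNAME(2)


def fqdn_fits(hostname, domain, limit=HOST_NAME_MAX):
    """Return (length of "<hostname>.<domain>", whether it is within the limit)."""
    fqdn = "{}.{}".format(hostname, domain)
    return len(fqdn), len(fqdn) <= limit


SHORT_NAME = "upstream-centos-7-rdo-cloud-tripleo-0000299802"

for domain in ("tripleodomain.example.com", "ooo.test"):
    length, fits = fqdn_fits(SHORT_NAME, domain)
    print(domain, length, "fits" if fits else "too long")
# tripleodomain.example.com 72 too long
# ooo.test 55 fits
```

The 46-character node name leaves at most 17 characters for the dot and domain, so `.tripleodomain.example.com` cannot fit while `.ooo.test` does.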

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart-extras (master)

Fix proposed to branch: master
Review: https://review.openstack.org/618588

Changed in tripleo:
assignee: nobody → Harald Jensås (harald-jensas)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/618773

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart (master)

Reviewed: https://review.openstack.org/618773
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart/commit/?id=4453efa2f8925821fd65dfb45e2e3ac64bbd6665
Submitter: Zuul
Branch: master

commit 4453efa2f8925821fd65dfb45e2e3ac64bbd6665
Author: Harald Jensås <email address hidden>
Date: Mon Nov 19 17:00:17 2018 +0100

    Use a shorter default domain name

    Some of the CI jobs use very long <hostname>'s,
    resulting in the FQDN to be more than 63 characters
    long which is the max for linux, see GETHOSTNAME(2).

    Change the default to use: <hostname>.ooo.test instead.

    Related-Bug: #1803703
    Change-Id: Ia8f758a094d36c3f3bec383f1d3a8e1a5d4bd052

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/618588
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=a31f99546aa512b50b82be646030512db4efa685
Submitter: Zuul
Branch: master

commit a31f99546aa512b50b82be646030512db4efa685
Author: Harald Jensås <email address hidden>
Date: Fri Nov 16 20:38:31 2018 +0100

    Use a shorter default domain name

    Since https://review.openstack.org/615730 we default
    to <hostname>.tripleodomain.example.com if hostname is
    not set.

    Some of the CI jobs use very long <hostname>'s, resulting
    in the FQDN to be more than 63 characters long which is
    the max for linux, see GETHOSTNAME(2).

    Change the default to use: <hostname>.ooo.test instead.

    Also, yamllint complain about indentation.

    Closes-Bug: #1803703
    Change-Id: If041b1c6e1da1d89d66ffafbbc6ab6c33bd80801

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-quickstart-extras 2.1.1

This issue was fixed in the openstack/tripleo-quickstart-extras 2.1.1 release.
