[swarm][horizon][murano] system_test.ubuntu.services_ha failed because murano-dashboard couldn't be installed properly

Bug #1457893 reported by Victor Ryzhenkin
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Critical
Assigned to: Dmitry Ilyin

Bug Description

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  openstack_version: "2014.2.2-6.1"
  api: "1.0"
  build_number: "446"
  build_id: "2015-05-21_04-04-09"
  nailgun_sha: "403c6b7ea3c62bb4fda27eb9cedee37f7144558c"
  python-fuelclient_sha: "e19f1b65792f84c4a18b5a9473f85ef3ba172fce"
  astute_sha: "795f8a045400fe82ccc30ae018e85324b3fa1de5"
  fuel-library_sha: "a03efb582b06bfe8d9776dce244d4a2f2e2ba886"
  fuel-ostf_sha: "3dd25a018f2a5c47ec6c885436b3ba69690ef1b9"
  fuelmain_sha: "5c8ebddf64ea93000af2de3ccdb4aa8bb766ce93"

The problem was encountered in the Ubuntu Services HA system test, but was not reproduced on hardware.

The first error in the Puppet log is:

2015-05-22 02:03:40 ERR Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install murano-dashboard' returned 100: Reading package lists...
2015-05-22 02:03:40 ERR Building dependency tree...
2015-05-22 02:03:40 ERR Reading state information...
2015-05-22 02:03:40 ERR The following packages were automatically installed and are no longer required:
2015-05-22 02:03:40 ERR cloud-guest-utils eatmydata python-configobj python-oauth python-serial
2015-05-22 02:03:40 ERR python3-pycurl python3-software-properties software-properties-common
2015-05-22 02:03:40 ERR unattended-upgrades
2015-05-22 02:03:40 ERR Use 'apt-get autoremove' to remove them.
2015-05-22 02:03:40 ERR The following extra packages will be installed:
2015-05-22 02:03:40 ERR python-django-floppyforms
2015-05-22 02:03:40 ERR The following NEW packages will be installed:
2015-05-22 02:03:40 ERR murano-dashboard python-django-floppyforms
2015-05-22 02:03:40 ERR 0 upgraded, 2 newly installed, 0 to remove and 63 not upgraded.
2015-05-22 02:03:40 ERR Need to get 189 kB of archives.
2015-05-22 02:03:40 ERR After this operation, 852 kB of additional disk space will be used.
2015-05-22 02:03:40 ERR WARNING: The following packages cannot be authenticated!
2015-05-22 02:03:40 ERR murano-dashboard
2015-05-22 02:03:40 ERR Authentication warning overridden.
2015-05-22 02:03:40 ERR Get:1 http://10.109.5.2:8080/2014.2.2-6.1/ubuntu/x86_64/ mos6.1/main murano-dashboard all 2014.2.2-1~u14.04+mos28 [151 kB]
2015-05-22 02:03:40 ERR Get:2 http://mirror.seed-cz1.fuel-infra.org/pkgs/ubuntu-2015-05-20-073127/ trusty/universe python-django-floppyforms all 1.1.1-1 [38.0 kB]
2015-05-22 02:03:40 ERR Fetched 189 kB in 0s (3637 kB/s)
2015-05-22 02:03:40 ERR Selecting previously unselected package python-django-floppyforms.
2015-05-22 02:03:40 ERR (Reading database ... 94266 files and directories currently installed.)
2015-05-22 02:03:40 ERR Preparing to unpack .../python-django-floppyforms_1.1.1-1_all.deb ...
2015-05-22 02:03:40 ERR Unpacking python-django-floppyforms (1.1.1-1) ...
2015-05-22 02:03:40 ERR Selecting previously unselected package murano-dashboard.
2015-05-22 02:03:40 ERR Preparing to unpack .../murano-dashboard_2014.2.2-1~u14.04+mos28_all.deb ...
2015-05-22 02:03:40 ERR Unpacking murano-dashboard (2014.2.2-1~u14.04+mos28) ...
2015-05-22 02:03:40 ERR Setting up python-django-floppyforms (1.1.1-1) ...
2015-05-22 02:03:40 ERR Setting up murano-dashboard (2014.2.2-1~u14.04+mos28) ...
2015-05-22 02:03:40 ERR * Restarting web server apache2
2015-05-22 02:03:40 ERR AH00548: NameVirtualHost has no effect and will be removed in the next release /etc/apache2/ports.conf:10
2015-05-22 02:03:40 ERR AH00316: WARNING: MaxRequestWorkers of 596 is not an integer multiple of
2015-05-22 02:03:40 ERR ThreadsPerChild of 25, decreasing to nearest multiple 575,
2015-05-22 02:03:40 ERR for a maximum of 23 servers.
2015-05-22 02:03:40 ERR (98)Address already in use: AH00072: make_sock: could not bind to address [::]:35357
2015-05-22 02:03:40 ERR (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:35357
2015-05-22 02:03:40 ERR no listening sockets available, shutting down
2015-05-22 02:03:40 ERR AH00015: Unable to open logs
2015-05-22 02:03:40 ERR Action 'start' failed.
2015-05-22 02:03:40 ERR The Apache error log may have more information.
2015-05-22 02:03:40 ERR ...fail!
2015-05-22 02:03:40 ERR * The apache2 instance did not start within 20 seconds. Please read the log files to discover problems
2015-05-22 02:03:40 ERR dpkg: error processing package murano-dashboard (--configure):
2015-05-22 02:03:40 ERR subprocess installed post-installation script returned error exit status 1
2015-05-22 02:03:40 ERR Errors were encountered while processing:
2015-05-22 02:03:40 ERR murano-dashboard
2015-05-22 02:03:40 ERR E: Sub-process /usr/bin/dpkg returned an error code (1)

Diagnostic snapshot attached.

Tags: horizon murano
Victor Ryzhenkin (vryzhenkin) wrote :
Andrey Maximov (maximov)
Changed in fuel:
assignee: nobody → Fuel QA Team (fuel-qa)
Mike Scherbakov (mihgen) wrote :

According to the logs, this looks like a connectivity issue, so packages can't be installed. Please troubleshoot further in order to find the correct assignee.

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → MOS Murano (mos-murano)
Tatyanka (tatyana-leontovich) wrote :

This looks like the same issue as https://bugs.launchpad.net/fuel/+bug/1455787, which was marked as a duplicate of https://bugs.launchpad.net/fuel/+bug/1455389, so it seems we need to check whether that fix was included.

Alexei Sheplyakov (asheplyakov) wrote :

The problem has nothing to do with connectivity. The package installation fails due to invalid apache configuration:

2015-05-22 02:03:40 ERR (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:35357
2015-05-22 02:03:40 ERR (98)Address already in use: AH00072: make_sock: could not bind to address [::]:35357

Alexei Sheplyakov (asheplyakov) wrote :

By the way, why is the Apache configuration not included in the logs?

Timur Nurlygayanov (tnurlygayanov) wrote :
Changed in fuel:
status: Confirmed → Incomplete
status: Incomplete → Fix Released
Changed in fuel:
assignee: MOS Murano (mos-murano) → nobody
Changed in fuel:
status: Fix Released → Confirmed
Victor Ryzhenkin (vryzhenkin) wrote :

Encountered again:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  openstack_version: "2014.2.2-6.1"
  api: "1.0"
  build_number: "462"
  build_id: "2015-05-24_15-51-50"
  nailgun_sha: "76441596e4fe6420cc7819427662fa244e150177"
  python-fuelclient_sha: "e19f1b65792f84c4a18b5a9473f85ef3ba172fce"
  astute_sha: "0bd72c72369e743376864e8e8dabfe873d40450a"
  fuel-library_sha: "889c2534ceadf8afd5d1540c1cadbd913c0c8c14"
  fuel-ostf_sha: "9a5f55602c260d6c840c8333d8f32ec8cfa65c1f"
  fuelmain_sha: "5c8ebddf64ea93000af2de3ccdb4aa8bb766ce93"

Changed in fuel:
assignee: nobody → MOS Murano (mos-murano)
summary: - [swarm][murano] system_test.ubuntu.services_ha failed because murano-
- dashboard couldn't installed properly
+ [swarm][horizon][murano] system_test.ubuntu.services_ha failed because
+ murano-dashboard couldn't installed properly
tags: added: horizon
Changed in fuel:
status: Confirmed → In Progress
Changed in fuel:
assignee: MOS Murano (mos-murano) → Vladimir Kuklin (vkuklin)
Changed in fuel:
status: In Progress → Triaged
Kirill Zaitsev (kzaitsev) wrote :

It looks like this can happen because '/etc/init.d/apache2 restart' sometimes fails: it restarts Apache too quickly for the old process to release the socket.

I've been able to reproduce it with the following script (although it doesn't happen every time):

for i in $(seq 100); do
    /etc/init.d/apache2 restart || break   # stop at the first failed restart
done

Kirill Zaitsev (kzaitsev) wrote :

I suggest we reload Apache instead of restarting it after installing murano-dashboard, if that is possible.

Also, it looks like a generic issue, not a Murano-specific one. =)

Igor Yozhikov (iyozhikov) wrote :

What was found in the logs:
* Apache is configured to serve Keystone.
* murano-dashboard was installed successfully on node-1 and node-2. Installation failed only on node-3, due to network/socket problems with the Apache service. The full error is in the bug description.

node-1:
2015-05-22 01:37:46 +0000 /Stage[main]/Murano::Dashboard/Package[murano_dashboard] (info): Starting to evaluate the resource
2015-05-22 01:37:46 +0000 Puppet (debug): Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n' murano-dashboard'
2015-05-22 01:37:46 +0000 Puppet (debug): Executing '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install murano-dashboard'
2015-05-22 01:37:49 +0000 /Stage[main]/Murano::Dashboard/Package[murano_dashboard]/ensure (notice): ensure changed 'purged' to 'present'
2015-05-22 01:37:49 +0000 /Stage[main]/Murano::Dashboard/Package[murano_dashboard] (debug): The container Class[Murano::Dashboard] will propagate my refresh event
2015-05-22 01:37:49 +0000 /Stage[main]/Murano::Dashboard/Package[murano_dashboard] (info): Evaluated in 2.56 seconds

node-2:
2015-05-22 02:03:34 +0000 /Stage[main]/Murano::Dashboard/Package[murano_dashboard] (info): Starting to evaluate the resource
2015-05-22 02:03:34 +0000 Puppet (debug): Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n' murano-dashboard'
2015-05-22 02:03:34 +0000 Puppet (debug): Executing '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install murano-dashboard'
2015-05-22 02:03:37 +0000 /Stage[main]/Murano::Dashboard/Package[murano_dashboard]/ensure (notice): ensure changed 'purged' to 'present'
2015-05-22 02:03:37 +0000 /Stage[main]/Murano::Dashboard/Package[murano_dashboard] (debug): The container Class[Murano::Dashboard] will propagate my refresh event
2015-05-22 02:03:37 +0000 /Stage[main]/Murano::Dashboard/Package[murano_dashboard] (info): Evaluated in 2.76 seconds

node-3:
2015-05-22 02:03:18 +0000 /Stage[main]/Murano::Dashboard/Package[murano_dashboard] (info): Starting to evaluate the resource
2015-05-22 02:03:18 +0000 Puppet (debug): Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n' murano-dashboard'
2015-05-22 02:03:18 +0000 Puppet (debug): Executing '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install murano-dashboard'
2015-05-22 02:03:40 +0000 Puppet (err): Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install murano-dashboard' returned 100: Reading package lists...

* There are connectivity errors when node-3 attempts to connect to the Keystone service at vip:35357:
2015-05-22 02:04:12 +0000 /Stage[main]/Murano::Keystone/Keystone_user[murano] (err): Could not evaluate: Execution of '/usr/bin/keystone --os-endpoint http://10.109.7.6:35357/v2.0/ user-list' returned 1: Unable to establish connection to http://10.109.7.6:35357/v2.0/users
2015-05-22 02:04:23 +0000 /Stage[main]/Murano::Keystone/Keystone_service[murano] (err): Could not evaluate: Execution of '/usr/bin/keystone --os-endpoint http://10.109.7.6:35357/v2.0/ service-list' returned 1: Unable to establish connection to http://10.109.7.6:35357/v2.0/OS-KSADM/servic...

Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Oleksiy Molchanov (omolchanov)
Oleksiy Molchanov (omolchanov) wrote :

Kirill, we cannot change restart to reload because we are using upstream modules with hard-coded behavior (i.e., restart for Apache).

Perhaps the problem is with the apache package itself, and we should transfer it to the MOS team. Does it happen on CentOS?

Victor Ryzhenkin (vryzhenkin) wrote :

Oleksiy, nope, we have never encountered this issue on CentOS.

Kirill Zaitsev (kzaitsev) wrote :

> Perhaps the problem is with apache package by itself and we should transfer it to MOS team. Does it happen on CentOS?

The problem is with the way Apache is restarted: the process is simply killed and then started again. You can reproduce the behaviour with the script from my previous comment; sooner or later you will hit the error.

I suppose that if you're unlucky, someone is writing to or reading from that socket at the moment Apache is being killed, or the socket is in the CLOSE_WAIT, LAST_ACK or maybe SYN_RCVD state (i.e., any intermediate state). In that case I suppose (it's actually hard to check) that the kernel would not just destroy the socket, but would wait for it to send its final pieces of data. The Apache process would probably also still be alive at that moment.

What can be done:
1) Change restart to reload (not an option, AFAIU).
2) Change the restart script's behaviour: add a sleep, or even a check that Apache has actually died after the stop() part of the restart function.
3) Retry the restart if it fails. This is roughly equivalent to the sleep option.
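The second option could look roughly like the shell sketch below: after stop(), poll until nothing is listening on the Keystone admin port any more, and only then call start(). This is a minimal sketch under stated assumptions, not the fix that was eventually merged: the helper names (port_has_listener, wait_for_port_free, PORT_PROBE), the attempt count and the delay are hypothetical, and it assumes 'ss' from iproute2 is available. Port 35357 comes from the bind error in the bug description.

```shell
# Hypothetical helpers illustrating option 2: after stop(), wait until the
# old Apache process has released its listening socket, then start().

# Exit 0 if some TCP socket is still listening on port $1.
# ss prints a header line first; any further line is a matching listener.
port_has_listener() {
    ss -ltn "sport = :$1" | tail -n +2 | grep -q .
}

# Probe used below; overridable so the loop can be exercised without Apache.
PORT_PROBE="${PORT_PROBE:-port_has_listener}"

# Poll up to $2 times (default 20), sleeping $3 seconds (default 1) between
# polls, until nothing listens on port $1. Returns 1 if it is still busy.
wait_for_port_free() {
    local port="$1" attempts="${2:-20}" delay="${3:-1}" i=1
    while [ "$i" -le "$attempts" ]; do
        "$PORT_PROBE" "$port" || return 0
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example use in a restart sequence (Ubuntu 14.04 sysvinit script assumed):
#   /etc/init.d/apache2 stop
#   wait_for_port_free 35357 20 1 || echo "port 35357 still in use" >&2
#   /etc/init.d/apache2 start
```

Compared with a fixed sleep, polling waits only as long as the old process actually holds the port, and the bounded attempt count keeps a stuck process from hanging the deployment indefinitely.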

Alexei Sheplyakov (asheplyakov) wrote :

> What can be done: 1) Changing restart to reload (not an option afaiu)

What about SO_REUSEADDR?

Vladimir Kuklin (vkuklin) wrote :

Alexei, AFAIK it is already used by Apache httpd, or its use is broken somehow.

Changed in fuel:
assignee: Oleksiy Molchanov (omolchanov) → Vladimir Kuklin (vkuklin)
Andrey Maximov (maximov) wrote :

Vova, did you try 'fuser -k -n tcp 80' before starting Apache?

Andrey Maximov (maximov) wrote :

Regarding SO_REUSEADDR:
"
BSD spec:
If SO_REUSEADDR is enabled on a socket prior to binding it, the socket can be successfully bound unless there is a conflict with another socket bound to exactly the same combination of source address and port. Now you may wonder how is that any different than before? The keyword is "exactly". SO_REUSEADDR mainly changes the way how wildcard addresses ("any IP address") are treated when searching for conflicts.
Without SO_REUSEADDR, binding socketA to 0.0.0.0:21 and then binding socketB to 192.168.0.1:21 will fail (with error EADDRINUSE), since 0.0.0.0 means "any local IP address", thus all local IP addresses are considered in use by this socket and this includes 192.168.0.1, too. With SO_REUSEADDR it will succeed, since 0.0.0.0 and 192.168.0.1 are not exactly the same address, one is a wildcard for all local addresses and the other one is a very specific local address. Note that the statement above is true regardless in which order socketA and socketB are bound; without SO_REUSEADDR it will always fail, with SO_REUSEADDR it will always succeed.
Linux exception:
..SO_REUSEADDR option behaves generally the same as in BSD with two important exceptions. One exception is that if a listening (server) TCP socket is already bound to a wildcard IP address and a specific port, no other TCP socket can be bound to the same port, regardless of whether either one or both sockets have this flag set. Not even if it would use a more specific address (as is allowed in case of BSD)..."
So SO_REUSEADDR won't help us at all.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix proposed to openstack-build/murano-dashboard-build (openstack-ci/fuel-6.1/2014.2)

Related fix proposed to branch: openstack-ci/fuel-6.1/2014.2
Change author: Igor Yozhikov <email address hidden>
Review: https://review.fuel-infra.org/7070

Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Dmitry Ilyin (idv1985)
status: Triaged → In Progress
Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → Vladimir Kuklin (vkuklin)
Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Sergii Golovatiuk (sgolovatiuk)
Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix merged to openstack-build/murano-dashboard-build (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/7070
Submitter: Igor Yozhikov <email address hidden>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: ca672d25fc6a746a418e8ab383653610d44798a1
Author: Igor Yozhikov <email address hidden>
Date: Wed May 27 10:53:45 2015

Add double restart of apache in postinst

Change-Id: Ic2b7b766a9f95485aba1b87b265cbecf371c18c4
Related-Bug: #1457893

Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Dmitry Ilyin (idv1985)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/185923
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=3119253955ac49b0ff9c49b84b56206f621e5524
Submitter: Jenkins
Branch: master

commit 3119253955ac49b0ff9c49b84b56206f621e5524
Author: Vladimir Kuklin <email address hidden>
Date: Wed May 27 21:45:12 2015 +0300

    Create a wrapper to apache start and stop

    Stop/Start apache service, on failure repeat operation after timeout

    Closes-bug: #1457893
    Related-Bug: #1459357
    Change-Id: I906139554c7cf770ffdac29f95d3d97f29b87f43
    Signed-off-by: Sergii Golovatiuk <email address hidden>
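The approach described in this commit message ("Stop/Start apache service, on failure repeat operation after timeout") can be illustrated with a generic retry helper in shell. This is an illustrative sketch only, not the code from the merged fuel-library change: the names retry_cmd and apache_restart_with_retry, the attempt counts and the delays are hypothetical.

```shell
# Hypothetical helper: run "$@" up to $1 times, sleeping $2 seconds between
# failed attempts. Returns 0 on the first success, else the last exit code.
retry_cmd() {
    local max_attempts="$1" delay="$2" attempt=1 rc=1
    shift 2
    while [ "$attempt" -le "$max_attempts" ]; do
        "$@" && return 0
        rc=$?
        echo "attempt $attempt/$max_attempts failed (exit $rc): $*" >&2
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return "$rc"
}

# Hypothetical restart wrapper in the spirit of the commit: stop Apache,
# then retry start a few times, pausing so the old socket can be released.
apache_restart_with_retry() {
    /etc/init.d/apache2 stop
    retry_cmd 5 3 /etc/init.d/apache2 start
}
```

Keeping the retry logic generic means the same helper can wrap both the stop and the start step if either of them turns out to be flaky.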

Changed in fuel:
status: In Progress → Fix Committed
Timur Nurlygayanov (tnurlygayanov) wrote :

Murano SWARM tests are green now, bug marked as verified.

Changed in fuel:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/209924
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=ff2274e44ab732d943976d1b21c8997dc90a7d94
Submitter: Jenkins
Branch: master

commit ff2274e44ab732d943976d1b21c8997dc90a7d94
Author: vsaienko <email address hidden>
Date: Fri Aug 14 11:45:12 2015 +0300

    Tune tweak::apache_wrappers module

    - Sometimes apache fails to start after stop, due to unclosed
      resources. The problem was frequently reproduced with the keystone
      wsgi module, and was not reproduced with horizon or radosgw.
      'apachectl restart' is recommended if doing start/stop rapidly
      https://wiki.apache.org/httpd/CouldNotBindToAddress
    - Redefine restart => 'apachectl graceful' for apache service
    - Remove disabling of GarbageCollector

    Related-Bug: #1472675
    Related-Bug: #1484066
    Related-Bug: #1457893
    Related-Bug: #1459357

    Change-Id: I34843639eacc9bcb6d451d3376440c8bfe9014f7
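This commit redefines the apache service's restart action as 'apachectl graceful'. As far as we understand the httpd documentation, a graceful restart keeps the parent httpd process running and reuses its listening sockets, so ports such as 35357 are not released and re-bound; 'graceful' also starts the daemon if it is not already running. The shell function below is a hypothetical illustration of that idea, not the Puppet change itself. The function name apache_graceful_reload and the APACHECTL override are assumptions; 'configtest' and 'graceful' are standard apachectl subcommands.

```shell
# Hypothetical wrapper: reload Apache gracefully instead of stop/start,
# so the parent process (and its listening sockets) stays up.
APACHECTL="${APACHECTL:-apachectl}"

apache_graceful_reload() {
    # 'graceful' checks the configuration itself as well; the explicit
    # check just reports a clear error before anything is signalled.
    if ! "$APACHECTL" configtest; then
        echo "apache configuration test failed; not reloading" >&2
        return 1
    fi
    "$APACHECTL" graceful
}
```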
