[swarm] Unexpected exception in API method during instance launching

Bug #1612215 reported by Andrey Lavrentyev
This bug report is a duplicate of: Bug #1623394: [gates] OOM failures on CI.
This bug affects 1 person
Affects             Status     Importance  Assigned to   Milestone
Fuel for OpenStack  Confirmed  High        Fuel QA Team
Mitaka              Confirmed  High        Fuel QA Team
Newton              Confirmed  High        Fuel QA Team

Bug Description

Detailed bug description:
An unexpected exception in an API method during instance launch leads to an OSTF failure:
AssertionError: Failed 2 OSTF tests; should fail 0 tests. Names of failed tests:
  - Launch instance, create snapshot, launch instance from snapshot (error)
  - Create user and authenticate with it. (failure) Authorization failure. Please provide the valid credentials for your OpenStack environment, and reattempt.

Various exceptions can be found in logs:
HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.

Log excerpt: http://paste.openstack.org/show/554118/

Swarm failure: https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.multirole/24/testReport/%28root%29/deploy_multiple_services_local_mirror/

Steps to reproduce:
Execute 'deploy_multiple_services_local_mirror' test with the following steps:
1. Revert snapshot 'prepare_slaves_5' with default set of mirrors
2. Run 'fuel-mirror' to create mirror repositories
3. Create a cluster with many components to check that as many packages as possible in the local mirrors have correct dependencies
4. Run 'fuel-mirror' to replace cluster repositories with local mirrors
5. Check that repositories are changed
6. Deploy cluster
7. Check running services with OSTF

Expected results:
OSTF tests pass

Actual result:
OSTF tests fail because an instance cannot be launched

Impact:
Swarm failure

Description of the environment:
9.1 snapshot #119
[root@nailgun remote]# shotgun2 short-report
cat /etc/fuel_build_id:
 495
cat /etc/fuel_build_number:
 495
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6349.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8750.noarch
 fuel-bootstrap-cli-9.0.0-1.mos285.noarch
 fuel-notify-9.0.0-1.mos8498.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-9.0.0-1.mos6349.noarch
 python-fuelclient-9.0.0-1.mos326.noarch
 python-packetary-9.0.0-1.mos142.noarch
 fuel-openstack-metadata-9.0.0-1.mos8750.noarch
 fuel-library9.0-9.0.0-1.mos8498.noarch
 fuelmenu-9.0.0-1.mos275.noarch
 nailgun-mcagents-9.0.0-1.mos754.noarch
 fuel-setup-9.0.0-1.mos6349.noarch
 network-checker-9.0.0-1.mos74.x86_64
 fuel-agent-9.0.0-1.mos285.noarch
 fuel-nailgun-9.0.0-1.mos8750.noarch
 rubygem-astute-9.0.0-1.mos754.noarch
 fuel-ui-9.0.0-1.mos2718.noarch
 fuel-ostf-9.0.0-1.mos939.noarch
 fuel-utils-9.0.0-1.mos8498.noarch
 fuel-misc-9.0.0-1.mos8498.noarch
 fuel-mirror-9.0.0-1.mos142.noarch
 fuel-migrate-9.0.0-1.mos8498.noarch

MOS_CENTOS_OS_MIRROR_ID: os-2016-06-23-135731
MOS_CENTOS_PROPOSED_MIRROR_ID: proposed-2016-08-10-164321
MOS_CENTOS_UPDATES_MIRROR_ID: updates-2016-06-23-135916
MOS_CENTOS_SECURITY_MIRROR_ID: security-2016-06-23-140002
MOS_CENTOS_HOLDBACK_MIRROR_ID: holdback-2016-06-23-140047
MOS_UBUNTU_MIRROR_ID: 9.0-2016-08-10-160322
UBUNTU_MIRROR_ID: ubuntu-2016-08-03-174238
CENTOS_MIRROR_ID: centos-7.2.1511-2016-05-31-083834

Logs: https://drive.google.com/open?id=0B5HPBFb7K7gXbWk0N3g3Y0pQUDQ

Tags: swarm-fail
Oleksiy Molchanov (omolchanov) wrote :

Nova team, please take a look.

Roman Podoliaka (rpodolyaka) wrote :

Oleksiy,

I checked the logs and see that periodic status updates failed with DBConnectionError:

2016-08-11 05:06:00.783 10087 ERROR nova.servicegroup.drivers.db DBConnectionError: (_mysql_exceptions.OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")

which basically means that Galera was not operational at that moment; the OCF script logs confirm this:

<27>Aug 11 05:04:09 node-4 ocf-mysql-wss: ERROR: p_mysqld: mysql_status(): MySQL is not running
<27>Aug 11 05:04:10 node-4 ocf-mysql-wss: ERROR: p_mysqld: proc_kill(): cannot find any processes matching the mysqld.*/var/lib/mysql!
<27>Aug 11 05:04:10 node-4 ocf-mysql-wss: ERROR: p_mysqld: proc_kill(): cannot find any processes matching the mysqld.*/var/lib/mysql!
<27>Aug 11 05:04:10 node-4 ocf-mysql-wss: ERROR: p_mysqld: proc_stop(): ERROR: could not stop mysqld.*/var/lib/mysql
<27>Aug 11 05:04:12 node-4 ocf-mysql-wss: ERROR: p_mysqld: mysql_status(): PIDFile /var/run/resource-agents/mysql-wss/mysql-wss.pid of MySQL server not found. Sleeping for 2 seconds. 0 retries left
<27>Aug 11 05:04:14 node-4 ocf-mysql-wss: ERROR: p_mysqld: mysql_status(): MySQL is not running

You can see that this affects *all* services and there is nothing Nova-specific about it.
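
As a rough illustration of the timing, the gap between the first OCF error and the nova error quoted above can be computed from the two timestamps. This is a minimal sketch, not part of the original analysis: the helper name `syslog_time` and the `YEAR` constant are assumptions (the syslog lines carry no year, so it is taken from the nova log entry), and the nova line is shortened to its timestamp and error prefix.

```python
from datetime import datetime

YEAR = 2016  # assumption: syslog lines omit the year; taken from the nova log entry


def syslog_time(line):
    """Parse the 'Aug 11 05:04:09' timestamp that follows the '<NN>' priority tag."""
    stamp = line.split(">", 1)[1][:15]
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")


# First OCF error quoted above, and the start of the nova error line.
ocf_line = "<27>Aug 11 05:04:09 node-4 ocf-mysql-wss: ERROR: p_mysqld: mysql_status(): MySQL is not running"
nova_line = "2016-08-11 05:06:00.783 10087 ERROR nova.servicegroup.drivers.db DBConnectionError"

mysql_down_at = syslog_time(ocf_line)
nova_error_at = datetime.strptime(nova_line[:23], "%Y-%m-%d %H:%M:%S.%f")

# A positive gap means MySQL was already reported down before nova logged the error.
gap = nova_error_at - mysql_down_at
print(f"MySQL reported down {gap.total_seconds():.3f}s before the nova error")
```

A positive gap is consistent with the database outage preceding the nova failure rather than being caused by it.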

Alex Schultz (alex-schultz) wrote :

mysql was OOM killed.

<3>Aug 11 05:03:46 node-4 kernel: [ 4193.814374] Out of memory: Kill process 22818 (mysqld) score 39 or sacrifice child
<3>Aug 11 05:03:46 node-4 kernel: [ 4193.816421] Killed process 22818 (mysqld) total-vm:2558332kB, anon-rss:114348kB, file-rss:0kB
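
For triaging similar swarm failures, kernel log lines like the two above can be scanned for OOM kills automatically. The following is a minimal sketch, assuming the "Killed process" line format shown here; the function name `find_oom_kills` and the regular expression are illustrative and not part of any Fuel tooling.

```python
import re

# Matches the kernel's "Killed process" line in the format quoted above.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def find_oom_kills(lines):
    """Return one dict per OOM-killed process found in the given log lines."""
    kills = []
    for line in lines:
        m = KILLED_RE.search(line)
        if m:
            kills.append({
                "pid": int(m.group("pid")),
                "name": m.group("name"),
                "total_vm_kb": int(m.group("total_vm")),
                "anon_rss_kb": int(m.group("anon_rss")),
            })
    return kills


# The two kernel lines from this comment, used as sample input.
sample = [
    "<3>Aug 11 05:03:46 node-4 kernel: [ 4193.814374] Out of memory: Kill process 22818 (mysqld) score 39 or sacrifice child",
    "<3>Aug 11 05:03:46 node-4 kernel: [ 4193.816421] Killed process 22818 (mysqld) total-vm:2558332kB, anon-rss:114348kB, file-rss:0kB",
]
kills = find_oom_kills(sample)
for kill in kills:
    print(kill["name"], kill["pid"], kill["anon_rss_kb"])
```

Only the "Killed process" line is matched, so each kill is counted once even though the kernel logs two lines per event.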

Alex Schultz (alex-schultz) wrote :

Please consider increasing memory for the controllers. A similar issue was reported under Bug 1608504.
