Several OSTF test fail: ClientException: Unexpected API Error. <class 'sqlalchemy.exc.ProgrammingError'> (HTTP 500)

Bug #1619740 reported by ElenaRossokhina
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Committed
High
Maksim Malchuk
Mitaka
Fix Released
High
Maksim Malchuk

Bug Description

Detailed bug description:
Found on ci, huge_ha_neutron_vlan_ceph_ceilometer_rados
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.huge_ha_neutron/46/testReport/(root)/huge_ha_neutron_vlan_ceph_ceilometer_rados/

Steps to reproduce:
group=huge_ha_neutron_vlan_ceph_ceilometer_rados

Expected results:
No failures

Actual result:
Several ostf test failed:
  - Create volume and boot instance from it (failure) Instance creation failed. Please refer to OpenStack logs for more details.
  - Create volume and attach it to instance (failure) Instance creation failed. Please refer to OpenStack logs for more details.
  - Check network connectivity from instance via floating IP (failure) Server can not be created. Please refer to OpenStack logs for more details.
  - Launch instance (failure) Creating instance using the new security group has failed. Please refer to OpenStack logs for more details.
  - Launch instance with file injection (failure) Creating instance using the new security group has failed. Please refer to OpenStack logs for more details.
  - Launch instance, create snapshot, launch instance from snapshot (failure) Image can not be booted. Please refer to OpenStack logs for more details.

ostf logs full of ClientException: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'sqlalchemy.exc.ProgrammingError'> (HTTP 500) (Request-ID: req-b7d15c67-0a42-409d-bcb0-1de757fdce32)
ostf logs - http://paste.openstack.org/show/566119/

all logs, including nova: https://drive.google.com/open?id=0B2ag_Bf-ShtTV01xOTE2V0VjM28

Changed in fuel:
assignee: nobody → MOS QA Team (mos-qa)
milestone: none → 9.1
importance: Undecided → High
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :
Download full text (9.0 KiB)

Nova API logs:

2016-09-06 03:15:02.287 11428 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:212
2016-09-06 03:15:02.289 11428 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:225
2016-09-06 03:15:02.235 11858 DEBUG oslo_db.sqlalchemy.engines [req-a55cfeec-cf17-41ba-bfa0-ed8b990b0675 - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py:256
2016-09-06 03:15:02.324 11429 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:212
2016-09-06 03:15:02.342 11429 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:225
2016-09-06 03:15:02.452 12084 INFO oslo_service.service [-] Child 11428 exited with status 1
2016-09-06 03:15:02.490 12084 DEBUG oslo_service.service [-] Started child 11915 _start_child /usr/lib/python2.7/dist-packages/oslo_service/service.py:475
2016-09-06 03:15:02.550 12084 INFO oslo_service.service [-] Child 11429 exited with status 1
2016-09-06 03:15:02.667 12084 DEBUG oslo_service.service [-] Started child 11923 _start_child /usr/lib/python2.7/dist-packages/oslo_service/service.py:475
2016-09-06 03:15:03.306 11915 DEBUG oslo_db.sqlalchemy.engines [req-306febd7-3f59-49ef-ab24-127303c76c48 - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py:256
2016-09-06 03:15:03.325 11923 DEBUG oslo_db.sqlalchemy.engines [req-fc808806-e531-4dc2-9512-6ce6dc34c81d - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py:256
2016-09-06 03:15:04.361 11507 ERROR oslo_service.service [req-c9831327-9d76-46d7-b24e-e572f2bbd0bd - - - - -] Error starting thread.
2016-09-06 03:15:04.361 11507 ERROR oslo_service.service Traceback (most recent call last):
2016-09-06 03:15:04.361 11507 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 680, in run_service
2016-09-06 03:15:04.361 11507 ERROR oslo_service.service service.start()
2016-09-06 03:15:04.361 11507 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/service.py", line 434, in start
2016-09-06 03:15:04.361 11507 ERROR oslo_service.service self.binary)
2016-09-06 03:15:04.361 11507 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 181, in wrapper
2016-09-06 03:15:04.361 11...

Read more...

Changed in fuel:
assignee: MOS QA Team (mos-qa) → MOS Nova (mos-nova)
tags: added: area-nova
Changed in fuel:
status: New → Confirmed
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Timur,

Please note the error message

2016-09-06 03:15:04.361 11507 ERROR oslo_service.service OperationalError: (_mysql_exceptions.OperationalError) (1054, "Unknown column 'services.last_seen_up' in 'field list'") [SQL: u'SELECT services.created_at AS services_created_at, services.updated_at AS services_updated_at, services.deleted_at AS services_deleted_at, services.deleted AS services_deleted, services.id AS services_id, services.host AS services_host, services.`binary` AS services_binary, services.topic AS services_topic, services.report_count AS services_report_count, services.disabled AS services_disabled, services.disabled_reason AS services_disabled_reason, services.last_seen_up AS services_last_seen_up, services.forced_down AS services_forced_down, services.version AS services_version \nFROM services \nWHERE services.deleted = %s AND services.host = %s AND services.`binary` = %s \n LIMIT %s'] [parameters: (0, '10.109.6.9', 'nova-osapi_compute', 1)]

^ you must be running tests *before* migration scripts have been applied.

E.g. this particular column is added in migration #294:

https://github.com/openstack/nova/blob/stable/mitaka/nova/db/sqlalchemy/migrate_repo/versions/294_add_service_heartbeat.py

nova-* services *must not* be started before "nova-manage db sync" has completed successfully.

Please check your job template - looks like you try to use services too early, when they have not been properly configured yet (one might also argue that services must not be even started before Puppet manifests finished applying of the configuration).

Changed in fuel:
assignee: MOS Nova (mos-nova) → Timur Nurlygayanov (tnurlygayanov)
status: Confirmed → Incomplete
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

it is strange because we run these tests only after the fresh deployment.

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Hi MOS Puppet team,

in the traceback, we can see that we don't have the required column in the table. It looks like nova-api started earlier than we successfully execute nova-db-sync command (we think it can be a problem) or because of other reasons.

Could you please take a look and help to investigate and fix the issue?

Thank you!

Changed in fuel:
assignee: Timur Nurlygayanov (tnurlygayanov) → MOS Puppet Team (mos-puppet)
status: Incomplete → Confirmed
Revision history for this message
Ivan Berezovskiy (iberezovskiy) wrote :

2016-09-02T02:45:04.746478+00:00 notice: (/Stage[main]/Nova::Db::Sync_api/Exec[nova-db-sync-api]/returns) error: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')

http://paste.openstack.org/show/567180/

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Ivan, thank you for your help!

It is MySQL Deadlock issue, Fuel Sustaining team, could you please take a look?

Thank you!

Changed in fuel:
assignee: MOS Puppet Team (mos-puppet) → Fuel Sustaining (fuel-sustaining-team)
Revision history for this message
Maksim Malchuk (mmalchuk) wrote :

the fail:

node-1.test.domain.local/mysqld.log:2016-09-02T02:45:04.537997+00:00 err: 2016-09-02 02:45:04 13875 [Warning] WSREP: TO isolation failed for: 3, schema: nova_api, sql: CREATE INDEX flavor_extra_specs_flavor_id_key_idx ON flavor_extra_specs (flavor_id, `key`). Check wsrep connection state and retry the query.

is happens right during the split-brain process:

node-1.test.domain.local/ocf-mysql-wss.log:496:2016-09-02T02:45:04.149886+00:00 err: ERROR: p_mysqld: check_if_galera_pc(): But I'm running a new cluster, PID:13875, this is a split-brain!

20 seconds later the galera cluster recovered:

node-1.test.domain.local/mysqld.log:2016-09-02T02:45:24.605658+00:00 err: 2016-09-02 02:45:24 17094 [Note] WSREP: New cluster view: global state: bf1c6c99-70b4-11e6-ad87-6a2b8c2d2a6a:965, view# 5: Primary, number of nodes: 3, my index: 2, protocol version 3
node-1.test.domain.local/mysqld.log:2016-09-02T02:45:24.606194+00:00 err: 2016-09-02 02:45:24 17094 [Note] WSREP: SST complete, seqno: 965

the cluster installed with fuel-ha-utils package version 9.0.0-1~u14.04+mos8568 which contain the fix for split-brain: https://review.openstack.org/#/c/361943/

all fine with mysql/galera but nova-manage don't handle deadlocks exeptions even the parameter nova/db_max_retries set to 20. nova-manage should retry in case of deadlock error before error is raised.

Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → MOS Nova (mos-nova)
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

We have the same issues for Heat:

https://bugs.launchpad.net/mos/+bug/1620638

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Maksim,

Regardless of the fact whether it does 20 retries or not, if "nova-manage db sync" fails, deployment must be stopped at this point with a failure. Or, alternatively, the whole "nova-manage db sync" call must be retried.

Moreover, the migration is performed by migrate library - Nova just propagates the error up the stack, so that nova-manage exits with a proper error code. It's up to a deployment tool (or operator) to handle the problem at this point (taking into account the fact that MySQL does not support transactional DDL and retries are not guaranteed to be idempotent, because applying of migration scripts might have failed in the middle of the process).

Changed in fuel:
assignee: MOS Nova (mos-nova) → Maksim Malchuk (mmalchuk)
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

https://bugs.launchpad.net/fuel/+bug/1619740/comments/5 clearly states db sync step failed - the corresponding deployment task should have also failed rather than silently continuing, as if everything succeeded. At this point the deployment must be in error state, not in complete state.

Revision history for this message
Maksim Malchuk (mmalchuk) wrote :

Roman, thanks for the feedback, the deployment shouldn't be stopped instead of nova-manage should be executed once more, so this should be fixed in the nova puppet modules.

Changed in fuel:
assignee: Maksim Malchuk (mmalchuk) → MOS Puppet Team (mos-puppet)
Changed in fuel:
assignee: MOS Puppet Team (mos-puppet) → Maksim Malchuk (mmalchuk)
Changed in fuel:
milestone: 9.1 → 10.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/366978

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Maksim Malchuk (mmalchuk) wrote :

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/#/c/366971/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/366978
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=a1b1289d2ce5f1b88b8dbda5bcd84bcb1f4bd633
Submitter: Jenkins
Branch: master

commit a1b1289d2ce5f1b88b8dbda5bcd84bcb1f4bd633
Author: Maksim Malchuk <email address hidden>
Date: Wed Sep 7 23:46:26 2016 +0300

    Tweak 'nova-db-sync' and 'nova-db-sync-api' execs

    This change adds retries to not fail on 'nova-manage' execution

    Closes-Bug: #1619740
    Change-Id: I33637b17a18f8809f0a5f3778c57d5066fd56b8d
    Signed-off-by: Maksim Malchuk <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/366971
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=6ff84695316f07362552883f302afcaaef53b26c
Submitter: Jenkins
Branch: stable/mitaka

commit 6ff84695316f07362552883f302afcaaef53b26c
Author: Maksim Malchuk <email address hidden>
Date: Wed Sep 7 23:46:26 2016 +0300

    Tweak 'nova-db-sync' and 'nova-db-sync-api' execs

    This change adds retries to not fail on 'nova-manage' execution

    Closes-Bug: #1619740
    Change-Id: I33637b17a18f8809f0a5f3778c57d5066fd56b8d
    Signed-off-by: Maksim Malchuk <email address hidden>

tags: added: on-verification
Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :

SWARM runs for 9.1 #50-#58 passed.

tags: removed: on-verification
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-library 10.0.0rc1

This issue was fixed in the openstack/fuel-library 10.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-library 10.0.0

This issue was fixed in the openstack/fuel-library 10.0.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.