Oslo_db DBConnectionError with ironic instance

Bug #1619618 reported by Kyrylo Romanenko
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Invalid
Importance: High
Assigned to: Kyrylo Romanenko
Milestone: (none listed)

Bug Description

Steps to reproduce:
1. Create a cluster with Ironic and Ceph.
2. Deploy the cluster.
3. Upload an image to Glance.
4. Enroll the Ironic nodes.
5. Boot a virtual Ironic Nova instance.

Actual Result:
ClientException: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_db.exception.DBConnectionError'> (HTTP 500) (Request-ID: req-9035ce1a-29ba-40da-bf0a-e740ab960bae)

Expected Result: the instance must reach ACTIVE status.

Traceback:
http://paste.openstack.org/show/566044/
Link to failed job:
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ironic_deploy_ceph/46/

Disregard the snapshot provided in comment #1, as it was taken from a different env without the error. The correct snapshot can be downloaded from comment #5.

Tags: area-oslo
Kyrylo Romanenko (kromanenko) wrote :

Roman Podoliaka (rpodolyaka) wrote :

Kyrylo, this specific error is missing in the logs you attached, but it has nothing to do with Nova as far as I can tell:

http://paste.openstack.org/show/566065/

^ This basically means the OCF script detected a MySQL failure and restarted the Galera clustering process. While Galera was unavailable, you could see HTTP 500 errors in every service that uses a DB.
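
To illustrate why a short Galera outage surfaces as HTTP 500 in API services, here is a minimal Python sketch of a caller that retries a DB operation instead of failing on the first connection error. It is only an illustration under stated assumptions: `TransientDBError` and `call_with_retry` are hypothetical names defined here, not part of oslo.db or Nova, and nothing in this report says the services behave this way.

```python
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a connection-level DB failure."""


def call_with_retry(func, attempts=5, delay=0.5, backoff=2.0, sleep=time.sleep):
    """Call func(), retrying on TransientDBError with exponential backoff.

    `sleep` is injectable so the helper can be exercised without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TransientDBError:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            sleep(delay)
            delay *= backoff  # wait longer before the next attempt
```

A retry like this only hides outages shorter than the total backoff window; a longer Galera restart would still end in an error for the client.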

Changed in mos:
assignee: MOS Bugs (mos-bugs) → MOS Oslo (mos-oslo)
tags: added: area-oslo
Changed in mos:
status: New → Confirmed
Bogdan Dobrelya (bogdando) wrote :

The case with Ceph OSDs on controllers should not be tested: it is not recommended and causes issues exactly like this one. See "Add 1 node with controller+ceph-osd role": this is wrong. Please don't do that in tests.

Dmitry Mescheryakov (dmitrymex) wrote :

Kyrylo (or somebody from the QA team), as per Bogdan's comment, could you please change the test and move the Ceph OSDs off the controller nodes? The snapshot shows that all three controllers have them. Judging by the pacemaker logs, the MySQL OCF script returned an error at some point, which was probably caused by overload. Moving the OSDs off the controllers should help.
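
A test layout could be checked for this role combination before deployment. The Python sketch below is a hedged illustration only: the `nodes_mixing_roles` helper, the dict-of-roles layout format, and the node names are hypothetical and do not come from the test framework; only the role names `controller` and `ceph-osd` are taken from the comments above.

```python
def nodes_mixing_roles(node_roles, first="controller", second="ceph-osd"):
    """Return the sorted names of nodes that carry both given roles.

    node_roles maps a node name to an iterable of role names
    (a hypothetical layout format used only for this sketch).
    """
    return sorted(
        name
        for name, roles in node_roles.items()
        if first in roles and second in roles
    )


# Hypothetical layout: one node wrongly combines controller and ceph-osd.
layout = {
    "node-1": ["controller", "ceph-osd"],
    "node-2": ["controller"],
    "node-3": ["ceph-osd"],
}
offenders = nodes_mixing_roles(layout)
if offenders:
    print("Move ceph-osd off these controllers:", ", ".join(offenders))
```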

Changed in mos:
assignee: MOS Oslo (mos-oslo) → Kyrylo Romanenko (kromanenko)
Dmitry Mescheryakov (dmitrymex) wrote :

For the record, here is the correct snapshot downloaded from the job referenced in the description. The initially provided snapshot was taken from some other env without the error.

description: updated
Kyrylo Romanenko (kromanenko) wrote :

Dmitry, it has already been changed in master, and there is also a cherry-pick to stable/mitaka: https://review.openstack.org/#/c/365573/

Kyrylo Romanenko (kromanenko) wrote :

We will see what further runs of the job show.

Changed in mos:
status: Confirmed → Incomplete
Dmitry Mescheryakov (dmitrymex) wrote :

I have talked with Kyrylo and we agreed to move the bug to the Incomplete state for now. We will move it back to Confirmed if the bug reproduces after the OSD roles have been moved off the controllers.

Timur Nurlygayanov (tnurlygayanov) wrote :

The latest results of the job are green, so I moved the bug to Invalid.
It looks like the fix for another issue fixed this bug as well.

Please feel free to change the status to Confirmed if the issue reproduces again.

Changed in mos:
status: Incomplete → Invalid