Observed StaleDataError in gate-neutron-dsvm-api tests if reference IPAM driver is used

Bug #1494351 reported by Pavel Bondar on 2015-09-10
This bug affects 1 person
Affects: neutron
Importance: High
Assigned to: Pavel Bondar

Bug Description

The error is observed only in review https://review.openstack.org/#/c/181023/, which enables the reference IPAM driver by default, so all tests use the new IPAM interface instead of the built-in IPAM implementation.

The last 4 rechecks show different test failures, but with the same root cause.
Example:
http://logs.openstack.org/23/181023/29/check/gate-neutron-dsvm-api/17493d1/logs/screen-q-svc.txt.gz

The log contains many (~20-30) occurrences of:
UPDATE statement on table 'ipamavailabilityranges' expected to update 1 row(s); 0 were matched.

The errors first appeared sometime between Aug 10 and Sep 2.

Pavel Bondar (pasha117) on 2015-09-10
Changed in neutron:
assignee: nobody → Pavel Bondar (pasha117)
Changed in neutron:
importance: Undecided → High
Pavel Bondar (pasha117) wrote :

More errors seen in logs:

"DBReferenceError: (pymysql.err.IntegrityError) (1452, u'Cannot add or update a child row: a foreign key constraint fails (`neutron`.`ipallocations`, CONSTRAINT `ipallocations_ibfk_3` FOREIGN KEY (`subnet_id`) REFERENCES `subnets` (`id`) ON DELETE CASCADE)') [SQL: u'INSERT INTO ipallocations (port_id, ip_address, subnet_id, network_id) VALUES (%s, %s, %s, %s)'] [parameters: (u'e2750172-ae02-4068-ba8d-91edd42d15e6', u'10.100.0.20', u'e4ce0e25-b471-4a1e-a467-ca95070c5569', u'bb6196e8-581e-44fe-b3d1-e3ac56f081ec')]\n"

Since different tests fail on each run, this looks like a race condition.

Pavel Bondar (pasha117) wrote :

Copied the failures, with the surrounding log lines, to pastebin:
http://pastebin.com/5S5gcUs7

Fix proposed to branch: master
Review: https://review.openstack.org/223123

Changed in neutron:
status: New → In Progress
Pavel Bondar (pasha117) wrote :

Found a possible root cause:
list_ranges_by_allocation_pool had a locking parameter, but it was never used in the code.
So IpamAvailabilityRange rows were not locked during the transaction.

The fix is https://review.openstack.org/223123
It needs to be tested with the IPAM driver enabled, i.e. make https://review.openstack.org/#/c/181023/ depend on this fix and verify that no more failures are observed in gate-neutron-dsvm-api.

Changed in neutron:
milestone: none → liberty-rc1
Kyle Mestery (mestery) on 2015-09-24
Changed in neutron:
milestone: liberty-rc1 → mitaka-1
tags: added: liberty-rc-potential

How close to completion is this? Can we really consider it backport potential?

I mean liberty-rc-potential, rather than simply liberty-backport-potential?

tags: added: l3-ipam-dhcp
Changed in neutron:
assignee: Pavel Bondar (pasha117) → Carl Baldwin (carl-baldwin)

Related fix proposed to branch: master
Review: https://review.openstack.org/237677

Changed in neutron:
assignee: Carl Baldwin (carl-baldwin) → Pavel Bondar (pasha117)
Akihiro Motoki (amotoki) on 2015-10-21
tags: added: liberty-backport-potential
removed: liberty-rc-potential

Reviewed: https://review.openstack.org/237677
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a7b976e6529b744dd01f26f9fa7769518fe7b20b
Submitter: Jenkins
Branch: master

commit a7b976e6529b744dd01f26f9fa7769518fe7b20b
Author: Pavel Bondar <email address hidden>
Date: Tue Oct 20 18:59:16 2015 +0300

    Deepcopy port dict in dhcp rpc handler

    Added deepcopy of port dict in dhcp rpc handler to prevent operating on
    changed dict in case of raising retry request.

    Change-Id: Ie1816fe819ba77061e71bd61de2fd9ebe4574cef
    Related-Bug: #1494351
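
For readers unfamiliar with why the deepcopy matters, here is a minimal sketch of the failure mode the commit message describes. It is not the code from the review: `RetryRequest` is a local stand-in class, and `create_port`, `handle_request` and the dict layout are hypothetical names chosen for illustration. The idea is that if a handler rewrites the port dict in place and then raises a retry request, the next attempt would see the already-changed dict unless each attempt gets its own copy.

```python
import copy


class RetryRequest(Exception):
    """Local stand-in for the retry signal; not the real neutron class."""


calls = {"count": 0}


def create_port(request):
    # Hypothetical plugin call: rewrites the dict in place, then fails once.
    port = request["port"]
    port["fixed_ips"] = [{"subnet_id": port.pop("subnet_id")}]
    calls["count"] += 1
    if calls["count"] == 1:
        raise RetryRequest()
    return port


def handle_request(request, max_retries=3):
    for _ in range(max_retries):
        try:
            # Each attempt works on a private copy, so a failed attempt
            # cannot leave the caller's dict half-modified.
            return create_port(copy.deepcopy(request))
        except RetryRequest:
            continue
    raise RuntimeError("retries exhausted")


request = {"port": {"network_id": "net-1", "subnet_id": "subnet-1"}}
result = handle_request(request)
```

Without the `copy.deepcopy` call, the second attempt in this sketch would fail with a `KeyError`, because the first attempt already popped `subnet_id` from the shared dict.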

Reviewed: https://review.openstack.org/238981
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4b23b68a651d385ab7e99165299e5fcc323b80d3
Submitter: Jenkins
Branch: stable/liberty

commit 4b23b68a651d385ab7e99165299e5fcc323b80d3
Author: Pavel Bondar <email address hidden>
Date: Tue Oct 20 18:59:16 2015 +0300

    Deepcopy port dict in dhcp rpc handler

    Added deepcopy of port dict in dhcp rpc handler to prevent operating on
    changed dict in case of raising retry request.

    Change-Id: Ie1816fe819ba77061e71bd61de2fd9ebe4574cef
    Related-Bug: #1494351
    (cherry picked from commit a7b976e6529b744dd01f26f9fa7769518fe7b20b)

tags: added: in-stable-liberty
tags: removed: liberty-backport-potential

Reviewed: https://review.openstack.org/223123
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d755f7248d324bb4c44b3efc9d200f8eb075066d
Submitter: Jenkins
Branch: master

commit d755f7248d324bb4c44b3efc9d200f8eb075066d
Author: Pavel Bondar <email address hidden>
Date: Tue Oct 20 19:11:30 2015 +0300

    Use compare-and-swap for IpamAvailabilityRange

    Existing locking mechanism 'select for update' causes
    deadlocks with galera multi-writers.
    Replaced locking rows with compare-and-swap approach.

    Compare-and-swap verifies that row is not changed by
    another thread before updating/deleting it.
    Filter-and-update and filter-and-delete are used.
    They return count of affected rows.
    If count of affected row is less than expected,
    then another thread already changed our row
    and RetryRequest is raised.

    Change-Id: I514cae0fa43033433ec2982bcf3726e02e6692bf
    Closes-Bug: #1494351
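
The filter-and-update idea from the commit message can be sketched roughly as follows. This is an illustration only, not the code from the review: it uses the standard-library `sqlite3` module rather than the MySQL/SQLAlchemy stack neutron runs on, and the `availability_ranges` table, the `shrink_range` helper and the local `RetryRequest` class are hypothetical. The UPDATE repeats the previously read values in its WHERE clause, and the affected-row count tells the caller whether another writer got there first.

```python
import sqlite3


class RetryRequest(Exception):
    """Local stand-in for the retry signal; not the real neutron class."""


def shrink_range(conn, range_id, expected_first, expected_last, new_first):
    # Filter-and-update: the WHERE clause repeats the values read earlier,
    # so the UPDATE matches only if no other writer changed the row.
    cur = conn.execute(
        "UPDATE availability_ranges SET first_ip = ? "
        "WHERE id = ? AND first_ip = ? AND last_ip = ?",
        (new_first, range_id, expected_first, expected_last),
    )
    if cur.rowcount != 1:
        # Fewer rows affected than expected: the row was changed underneath us.
        raise RetryRequest("range %s changed concurrently" % range_id)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE availability_ranges "
    "(id INTEGER PRIMARY KEY, first_ip INTEGER, last_ip INTEGER)"
)
conn.execute("INSERT INTO availability_ranges VALUES (1, 10, 20)")

# The first writer read (10, 20) and allocates address 10.
shrink_range(conn, 1, 10, 20, 11)

# A second writer that also read (10, 20) now holds stale values.
try:
    shrink_range(conn, 1, 10, 20, 11)
    stale_detected = False
except RetryRequest:
    stale_detected = True
```

No row lock is held between the read and the write in this sketch; correctness relies on the caller re-reading the row and retrying when `RetryRequest` is raised.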

Changed in neutron:
status: In Progress → Fix Committed

This issue was fixed in the openstack/neutron 8.0.0.0b1 development milestone.

Changed in neutron:
status: Fix Committed → Fix Released

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/284438
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
