tripleo

Bug #1933639
Comment #17

Christian Schwede (cschwede) wrote on 2021-07-07 (last edit on 2021-07-07):

Thanks Chandan & Marios for your help!

What I noticed is that Tempest sometimes runs into issues with eventual consistency caused by slow disk writes on some of the nodes. This usually happens when deleting a container that is not empty yet. Tempest sends object delete requests before deleting the container, but these have not completed on all nodes if some of them are still waiting to finish writing the last update to disk.
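To illustrate the race, here is a minimal Python sketch of a client-side retry around the container delete. It is not what Tempest does today, and every name in it (Conflict, delete_container_with_retry, the delete_container callable) is hypothetical. It assumes the container DELETE fails with an HTTP 409 Conflict while object deletes are still pending on some node, which is how Swift answers a delete of a non-empty container.

```python
import time


class Conflict(Exception):
    """Hypothetical stand-in for an HTTP 409 reply to a container DELETE."""


def delete_container_with_retry(delete_container, attempts=5, delay=0.5,
                                sleep=time.sleep):
    """Call delete_container() until it succeeds or attempts run out.

    delete_container is a caller-supplied callable (an assumption, not a
    Tempest API) that raises Conflict while the container is still seen
    as non-empty. Returns the number of attempts that were needed and
    re-raises Conflict if the last attempt also fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            delete_container()
            return attempt
        except Conflict:
            if attempt == attempts:
                raise
            # Linear backoff: give slow nodes time to persist the deletes.
            sleep(delay * attempt)
```

A retry like this would only hide the symptom; it does not explain why the disk writes became slow in the first place.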

This is expected behavior given eventual consistency. The question is why this worked in the past and now sometimes fails (the last 3 successful runs are from today[1], with no errors).

As mentioned above, a few requests are sometimes quite slow, and these are the ones that eventually fail. In the last failure I looked into, object server deletes took 18 ms on average, but the failed tests occurred when some writes took 1.65, 1.19 and 0.55 seconds and eventual consistency kicked in. So something was slowing down disk writes quite a bit.
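For anyone who wants to check other runs for the same pattern, a rough way to pick out such outliers from a list of request durations could look like the sketch below. The function name and the factor of 10 are assumptions chosen for illustration, and the durations would have to be extracted from the object server logs first (not shown here).

```python
from statistics import median


def slow_requests(durations, factor=10.0):
    """Return the durations (in seconds) above factor times the median.

    The median is used as the "typical" value because, unlike the mean,
    it is not pulled up by the few slow requests we are looking for.
    """
    typical = median(durations)
    return [d for d in durations if d > factor * typical]
```

With made-up input of twenty 0.018 s deletes plus the three slow writes quoted above, `slow_requests([0.018] * 20 + [1.65, 1.19, 0.55])` returns only the three slow values.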

I'm wondering whether anything changed recently in the test environment itself: slower nodes and/or disks, network issues, etc.? Or a newer Tempest version? (I noticed a change that might be an issue, but it doesn't explain all the failed tests[2].)

[1] https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/6fa7419/logs/undercloud/var/log/tempest/stestr_results.html.gz
[2] https://review.opendev.org/c/openstack/tempest/+/774428