lvchange -a causing RPC timeouts between c-api and c-vol on inap hosts in CI

Bug #1873234 reported by Lee Yarwood
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
New
Undecided
Unassigned
devstack
Fix Released
Undecided
Unassigned

Bug Description

https://zuul.opendev.org/t/openstack/build/15d948dfe4184df38bcea92d42378ee0/log/zuul-info/inventory.yaml

  hosts:
    compute1:
      ansible_connection: ssh
      ansible_host: 198.72.124.190
      ansible_port: 22
      ansible_python_interpreter: auto
      ansible_user: zuul
      nodepool:
        az: nova
        cloud: inap
        host_id: 3c9c8f46f2a6c8e8daa2acdcef7613a60c986460cc415e7e7aca5f7a
        interface_ip: 198.72.124.190
        label: ubuntu-bionic
        private_ipv4: 198.72.124.190
        provider: inap-mtl01
        public_ipv4: 198.72.124.190
        public_ipv6: ''
        region: mtl01
    controller:
      ansible_connection: ssh
      ansible_host: 198.72.124.210
      ansible_port: 22
      ansible_python_interpreter: auto
      ansible_user: zuul
      nodepool:
        az: nova
        cloud: inap
        host_id: 3c9c8f46f2a6c8e8daa2acdcef7613a60c986460cc415e7e7aca5f7a
        interface_ip: 198.72.124.210
        label: ubuntu-bionic
        private_ipv4: 198.72.124.210
        provider: inap-mtl01
        public_ipv4: 198.72.124.210
        public_ipv6: ''
        region: mtl01

https://zuul.opendev.org/t/openstack/build/15d948dfe4184df38bcea92d42378ee0/log/controller/logs/screen-c-vol.txt#1463

Apr 15 15:31:28.343392 ubuntu-bionic-inap-mtl01-0015954199 cinder-volume[26482]: DEBUG oslo_concurrency.processutils [req-f0848f9c-e74d-4014-8a3d-b869976fc965 req-2075521f-1981-48c8-b307-d9d8f961e9d6 tempest-UpdateMultiattachVolumeNegativeTest-1535704675 None] Running cmd (subprocess): sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-76ef6bca-3a03-4b38-9cee-f5a68a68e315 {{(pid=26774) execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:372}}
Apr 15 15:31:28.356005 ubuntu-bionic-inap-mtl01-0015954199 sudo[13082]: stack : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/bin/cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-76ef6bca-3a03-4b38-9cee-f5a68a68e315
[..]
Apr 15 15:32:50.100520 ubuntu-bionic-inap-mtl01-0015954199 cinder-volume[26482]: DEBUG oslo_concurrency.processutils [req-f0848f9c-e74d-4014-8a3d-b869976fc965 req-2075521f-1981-48c8-b307-d9d8f961e9d6 tempest-UpdateMultiattachVolumeNegativeTest-1535704675 None] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-76ef6bca-3a03-4b38-9cee-f5a68a68e315" returned: 0 in 81.758s {{(pid=26774) execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:409}}

This has the knock on effect of causing c-api to timeout, delete the attachment that c-vol is still trying to update and return a 500 error back to n-cpu.

Revision history for this message
Lee Yarwood (lyarwood) wrote :

FWIW this might be a duplicate of https://bugs.launchpad.net/cinder/+bug/1814916 but I'm not seeing the premature timeout Matt reported in that bug, the timeout is valid in this case.

Revision history for this message
Lee Yarwood (lyarwood) wrote :
Lee Yarwood (lyarwood)
summary: - lvchange -a timing out RPC calls on inap hosts in CI
+ lvchange -a causing RPC timeouts between c-api and c-vol on inap hosts
+ in CI
Revision history for this message
Lee Yarwood (lyarwood) wrote :

Another example below, I'll try to work out a recheck query for this now.

https://zuul.opendev.org/t/openstack/build/56eea988a7684222a750bf5623134d0e/log/controller/logs/screen-n-cpu.txt

54045 Dec 22 22:05:49.380188 fedora-32-vexxhost-ca-ymq-1-0022317164 nova-compute[99839]: ERROR nova.volume.cinder [None req-6fbe3107-470a-4158-83a9-2d56ed6f4e11 tempest-AttachVolumeMultiAttachTest-2040614977 tempest-AttachVolumeMultiAttachTest-2040614977] Update attachment failed for attachment 4f396696-0740-47a5-870c-0776285f861a. Error: Internal Server Error (HTTP 500) Code: 500: cinderclient.exceptions.ClientException: Internal Server Error (HTTP 500)

https://zuul.opendev.org/t/openstack/build/56eea988a7684222a750bf5623134d0e/log/controller/logs/screen-c-api.txt

23124 Dec 22 22:04:48.586155 fedora-32-vexxhost-ca-ymq-1-0022317164 <email address hidden>[100676]: DEBUG cinder.coordination [req-6fbe3107-470a-4158-83a9-2d56ed6f4e11 req-d62dbd0b-5973-45b7-911e-6c1ba606cdce tempest-AttachVolumeMultiAttachTest-2040614977 tempest-AttachVo lumeMultiAttachTest-2040614977] Lock "b'cinder-attachment_update-87862c54-bdb9-49f6-a04c-fa8ed7e791b8-fedora-32-vexxhost-ca-ymq-1-0022317164'" acquired by "attachment_update" :: waited 0.123s {{(pid=100676) _synchronized /opt/stack/cinder/cinder/coordination.py:146}}
[..]
24165 Dec 22 22:05:49.389223 fedora-32-vexxhost-ca-ymq-1-0022317164 <email address hidden>[100676]: ERROR cinder.api.v3.attachments oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 32395480a3264e83a8367e882b2678d2

https://zuul.opendev.org/t/openstack/build/56eea988a7684222a750bf5623134d0e/log/controller/logs/screen-c-vol.txt

5905 Dec 22 22:04:48.642754 fedora-32-vexxhost-ca-ymq-1-0022317164 cinder-volume[101743]: DEBUG oslo_concurrency.processutils [req-6fbe3107-470a-4158-83a9-2d56ed6f4e11 req-d62dbd0b-5973-45b7-911e-6c1ba606cdce tempest-AttachVolumeMultiAttachTest-2040614977 None] Running cmd (subprocess): sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-87862c54-bdb9-49f6-a04c-fa8ed7e791b8 {{(pid=101743) execute /usr/local/lib/python3.8/site-packages/oslo_concurrency/processutils.py:384}}
[..]
Dec 22 22:05:56.037822 fedora-32-vexxhost-ca-ymq-1-0022317164 cinder-volume[101743]: DEBUG oslo_concurrency.processutils [req-6fbe3107-470a-4158-83a9-2d56ed6f4e11 req-d62dbd0b-5973-45b7-911e-6c1ba606cdce tempest-AttachVolumeMultiAttachTest-2040614977 None] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-87862c54-bdb9-49f6-a04c-fa8ed7e791b8" returned: 0 in 67.396s {{(pid=101743) execute /usr/local/lib/python3.8/site-packages/oslo_concurrency/processutils.py:422}}

Revision history for this message
Lee Yarwood (lyarwood) wrote :
Changed in devstack:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.