VolumeEncryptionTest is failing in tripleo-ci-centos-7-scenario002-multinode-oooq-container with request timeout

Bug #1741850 reported by Arx Cruz
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Martin André
Milestone: queens-3

Bug Description

As of the day I opened this bug, all the tripleo-ci-centos-7-scenario002-multinode-oooq-container gate jobs are failing in barbican_tempest_plugin.tests.scenario.test_volume_encryption.VolumeEncryptionTest with the following error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python2.7/site-packages/barbican_tempest_plugin/tests/scenario/test_volume_encryption.py", line 115, in test_encrypted_cinder_volumes_cryptsetup
    self.attach_detach_volume(server, volume, keypair)
  File "/usr/lib/python2.7/site-packages/barbican_tempest_plugin/tests/scenario/test_volume_encryption.py", line 59, in attach_detach_volume
    attached_volume = self.nova_volume_attach(server, volume)
  File "/usr/lib/python2.7/site-packages/barbican_tempest_plugin/tests/scenario/manager.py", line 332, in nova_volume_attach
    volume['id'], 'in-use')
  File "/usr/lib/python2.7/site-packages/tempest/common/waiters.py", line 211, in wait_for_volume_resource_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: volume 3b069e8f-e978-4f88-8afe-be563e3d6125 failed to reach in-use status (current available) within the required time (500 s).
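
For context, the timeout in the traceback comes from a polling waiter: the test keeps reading the volume status until it becomes in-use or the time budget (500 s here) runs out. The following is a minimal sketch of that pattern, not tempest's actual code; `wait_for_status`, `get_status`, and the injectable `clock` and `sleep` parameters are hypothetical names chosen for illustration.

```python
import time


class TimeoutException(Exception):
    """Raised when the resource never reaches the wanted status."""


def wait_for_status(get_status, wanted, timeout=500, interval=1,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns `wanted` or `timeout` seconds pass."""
    start = clock()
    while True:
        current = get_status()
        if current == wanted:
            return current
        if clock() - start >= timeout:
            # Mirrors the shape of the message seen in the bug report.
            raise TimeoutException(
                "failed to reach %s status (current %s) within the "
                "required time (%s s)" % (wanted, current, timeout))
        sleep(interval)
```

A volume that stays "available" for the whole window, as in this bug, ends in the exception branch: the attach request was accepted, but the attachment never completed on the compute side.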

Example:
http://logs.openstack.org/82/531182/1/gate/tripleo-ci-centos-7-scenario002-multinode-oooq-container/c99bd6b/logs/undercloud/home/zuul/tempest/tempest.html.gz

I believe it's a misconfiguration in Barbican, but I haven't taken a deep look at it yet.

Arx Cruz (arxcruz)
Changed in tripleo:
importance: High → Critical
tags: added: promotion-blocker
wes hayutin (weshayutin)
Changed in tripleo:
milestone: none → queens-3
yatin (yatinkarel) wrote :
Ade Lee (alee-3) wrote :

The Barbican config looks fine. Per Alan Bishop (and as evidenced by the above log snippet), this is probably due to an IQN issue caused by problems with the iscsid container.

I'm trying to reproduce this so that I can check the Barbican logs to be sure.

Alan Bishop (alan-bishop) wrote :

I analyzed another failure over the weekend [1], and it's the same failure I see here. For reasons I don't understand, it looks like the iscsid container is regenerating its IQN on startup, essentially overwriting the IQN initialization done by puppet. This creates a situation where the iSCSI subsystem thinks that Nova is trying to access a volume using the wrong IQN, which results in an iSCSI login failure. This failure can be seen in the log journal [2], where this appears:

Jan 06 11:48:13 centos-7-ovh-gra1-0001714610 kernel: iSCSI Initiator Node: iqn.1994-05.com.redhat:755d7cc2a05d is not authorized to access iSCSI target portal group: 1.

[1] https://review.openstack.org/530559
[2] http://logs.openstack.org/82/531182/1/gate/tripleo-ci-centos-7-scenario002-multinode-oooq-container/c99bd6b/logs/subnode-2/var/log/journal.txt.gz#_Jan_06_11_48_13

The IQN is supposed to be initialized once, during the iscsid container's puppet_config step. The same IQN value is shared by baremetal services [3] and containers [4]. That value is iqn.1994-05.com.redhat:dc122233454b, which differs from the iqn.1994-05.com.redhat:755d7cc2a05d value flagged in the login failure message. This is what causes the tempest tests to fail!

[3] http://logs.openstack.org/82/531182/1/gate/tripleo-ci-centos-7-scenario002-multinode-oooq-container/c99bd6b/logs/subnode-2/etc/iscsi/initiatorname.iscsi.gz
[4] http://logs.openstack.org/82/531182/1/gate/tripleo-ci-centos-7-scenario002-multinode-oooq-container/c99bd6b/logs/subnode-2/var/log/config-data/iscsid/etc/iscsi/initiatorname.iscsi.gz
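
The comparison described above can be sketched as a small script: read the configured IQN from the contents of initiatorname.iscsi, pull the rejected IQN out of the kernel message, and check whether they differ. This is only an illustration; the function names and the regular expression are assumptions, and the inputs are the values quoted in this comment.

```python
import re

# Loose pattern for an IQN such as iqn.1994-05.com.redhat:dc122233454b
IQN_RE = re.compile(r"iqn\.[0-9]{4}-[0-9]{2}\.[A-Za-z0-9.:\-]+")


def configured_iqn(initiatorname_text):
    """Return the IQN from the contents of an initiatorname.iscsi file."""
    for line in initiatorname_text.splitlines():
        line = line.strip()
        if line.startswith("InitiatorName="):
            return line.split("=", 1)[1].strip()
    return None


def rejected_iqns(journal_lines):
    """Return the IQNs that the kernel target reported as not authorized."""
    found = []
    for line in journal_lines:
        if "is not authorized to access iSCSI target" in line:
            match = IQN_RE.search(line)
            if match:
                found.append(match.group(0))
    return found


config = "InitiatorName=iqn.1994-05.com.redhat:dc122233454b\n"
journal = [
    "Jan 06 11:48:13 centos-7-ovh-gra1-0001714610 kernel: iSCSI Initiator "
    "Node: iqn.1994-05.com.redhat:755d7cc2a05d is not authorized to access "
    "iSCSI target portal group: 1.",
]
expected = configured_iqn(config)
mismatched = [iqn for iqn in rejected_iqns(journal) if iqn != expected]
```

With the values from this job, `mismatched` holds the 755d7cc2a05d IQN, which is the one the container generated on its own.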

I tracked the problem further, to the iscsid container's stdout [5]. I don't know if I'm reading this correctly, but it suggests the IQN initialization code is running twice. The logs in [5] for "Copying service configuration files" don't look correct (the source and destination directory of the files look wrong), and there's no mention of the .initiator_reset file that's supposed to ensure the IQN is only reset once [6].

[5] http://logs.openstack.org/82/531182/1/gate/tripleo-ci-centos-7-scenario002-multinode-oooq-container/c99bd6b/logs/subnode-2/var/log/extra/docker/containers/iscsid/stdout.log.txt.gz
[6] https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/iscsid.pp#L36
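
To illustrate why the .initiator_reset file matters, here is a minimal sketch of a run-once guard built on a marker file. It is not the puppet-tripleo code; `reset_initiator_once` and `demo` are hypothetical names, and the IQN strings are placeholders. The point is that the guard only holds if every run sees the same directory: if the container looks at a path where the marker is absent, the reset happens again.

```python
import os
import tempfile


def reset_initiator_once(iscsi_dir, generate_iqn):
    """Write a fresh IQN only if the marker file is absent.

    Returns True when a reset happened, False when it was skipped.
    """
    marker = os.path.join(iscsi_dir, ".initiator_reset")
    if os.path.exists(marker):
        return False
    with open(os.path.join(iscsi_dir, "initiatorname.iscsi"), "w") as f:
        f.write("InitiatorName=%s\n" % generate_iqn())
    # Create the marker last so that a failed write is retried next time.
    open(marker, "w").close()
    return True


def demo():
    """Run the guard twice on one directory, then once on a different one."""
    iqns = iter(["iqn.1994-05.com.redhat:aaaaaaaaaaaa",
                 "iqn.1994-05.com.redhat:bbbbbbbbbbbb"])
    with tempfile.TemporaryDirectory() as host_dir, \
            tempfile.TemporaryDirectory() as other_dir:
        first = reset_initiator_once(host_dir, lambda: next(iqns))
        second = reset_initiator_once(host_dir, lambda: next(iqns))
        # A different directory has no marker, so the IQN is regenerated.
        third = reset_initiator_once(other_dir, lambda: next(iqns))
    return first, second, third
```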

Also, from the log journal [2], you see the iscsid container's puppet_config performing the IQN reset at Jan 06 11:16:21:

Jan 06 11:16:21 centos-7-ovh-gra1-0001714610 puppet-user[35337]: (/Stage[main]/Tripleo::Profile::Base::Iscsid/Exec[reset-iscsi-initiator-name]/returns) executed successfully

But, at Jan 06 11:29:44 (13 minutes later), you see it running the commands that generate the IQN that triggers the login failure:

Jan 06 11:29:44 centos-7-ovh-gra1-0001714610 dockerd-current[25619]: Generating new iSCSI initiator name
Jan 06 11:29:44 centos-7-ovh-gra1-0001714610 dockerd-current[25619]: ++ echo 'Generating new iSCSI initiator name'
Jan 06 11:29:44 centos-7-ovh-gra1-0001714610 dockerd-current[25619]: +++ /sbin/iscsi-iname
Jan 06 11:29:44 centos-7-ovh-gra1-0001714610 dockerd-current[25619]: ++ echo InitiatorName=iqn.1994-05.com.redhat:755d7cc2a05d
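
The timeline above can be checked mechanically. The sketch below scans journal lines for the puppet reset and for any later "Generating new iSCSI initiator name" message, and reports the delay between them. The function names are hypothetical, and the year 2018 is an assumption added because journal timestamps carry no year.

```python
from datetime import datetime

RESET_MARK = "Exec[reset-iscsi-initiator-name]/returns) executed successfully"
REGEN_MARK = "Generating new iSCSI initiator name"


def parse_stamp(line, year=2018):
    """Parse the leading 'Mon DD HH:MM:SS' journal timestamp."""
    return datetime.strptime("%d %s" % (year, line[:15]),
                             "%Y %b %d %H:%M:%S")


def regenerations_after_reset(lines):
    """Return (timestamp, delay) for each regeneration after the reset."""
    reset_at = None
    hits = []
    for line in lines:
        if RESET_MARK in line:
            reset_at = parse_stamp(line)
        # endswith() skips the shell trace line that echoes the same text.
        elif line.rstrip().endswith(REGEN_MARK) and reset_at is not None:
            stamp = parse_stamp(line)
            hits.append((stamp, stamp - reset_at))
    return hits


journal = [
    "Jan 06 11:16:21 centos-7-ovh-gra1-0001714610 puppet-user[35337]: "
    "(/Stage[main]/Tripleo::Profile::Base::Iscsid/"
    "Exec[reset-iscsi-initiator-name]/returns) executed successfully",
    "Jan 06 11:29:44 centos-7-ovh-gra1-0001714610 dockerd-current[25619]: "
    "Generating new iSCSI initiator name",
    "Jan 06 11:29:44 centos-7-ovh-gra1-0001714610 dockerd-current[25619]: "
    "++ echo 'Generating new iSCSI initiator name'",
]
hits = regenerations_after_reset(journal)
```

For these lines there is a single hit, with a delay of 13 minutes 23 seconds after the puppet reset.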

We need to take a closer look at how the is...


Emilien Macchi (emilienm) wrote :

Thanks for the analysis, Alan. In the meantime, I'm looking at what changed since it last worked.

https://review.rdoproject.org/jenkins/job/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset017-master/

It started to fail on January 6th. I'll share my findings here, if any.

Emilien Macchi (emilienm) wrote :

The last successful run of the CI job was here:
https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset017-master/715c3c2/

And you can check the version of THT:
https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset017-master/715c3c2/rpm-qa.txt.gz

https://github.com/openstack/tripleo-heat-templates/commit/d05b39d149e3d1c761ba55dc7db9b60328c3dd25
Which is a commit on December 30th.

Indeed, RDO became inconsistent on January 4th because of tripleo-ui which failed to build, see https://review.rdoproject.org/r/#/c/11169/ - it's fixed now.

My theory: RDO was not consistent on January 4th (which, by the way, is the day we stopped getting promotions), so https://review.openstack.org/#/c/524187/ and https://review.openstack.org/#/c/526390/ were not tested in the gate on January 5th, because scenario002 is non-voting. My guess is that these patches actually broke scenario002, which would explain why the tests were still passing in the gate on January 2nd:
http://logs.openstack.org/17/491317/21/check/tripleo-ci-centos-7-scenario002-multinode-oooq-container/59f68fb/logs/tempest.html.gz

I think we should investigate https://review.openstack.org/#/c/524187/ and https://review.openstack.org/#/c/526390/ again.

Bogdan Dobrelya (bogdando) wrote :

https://review.openstack.org/#/c/532093/ reverts the breaking change. There is an error with the kolla copy command https://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/docker/services/iscsid.yaml#n82

Generated puppet configs do not make it into the expected path:

INFO:__main__:Copying /var/lib/kolla/config_files/src-iscsid/iscsid.conf to /iscsid.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src-iscsid/initiatorname.iscsi to /initiatorname.iscsi

They should be copied to '/etc/iscsi/{iscsid.conf,initiatorname.iscsi}' instead of '/'.
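
The effect of the wrong destination can be shown with a simplified model of a "copy these files into a directory" step. This is not kolla's implementation; `copy_targets` is a hypothetical helper that only computes where each file would land for a given destination directory.

```python
import posixpath


def copy_targets(sources, dest_dir):
    """Map each source file to the path it would get inside dest_dir."""
    return {src: posixpath.join(dest_dir, posixpath.basename(src))
            for src in sources}


sources = [
    "/var/lib/kolla/config_files/src-iscsid/iscsid.conf",
    "/var/lib/kolla/config_files/src-iscsid/initiatorname.iscsi",
]
broken = copy_targets(sources, "/")           # what the logs show
fixed = copy_targets(sources, "/etc/iscsi/")  # where the files belong
```

With "/" as the destination the files land at /iscsid.conf and /initiatorname.iscsi, matching the log lines above, so the iscsid service inside the container never sees the puppet-generated IQN under /etc/iscsi.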

Changed in tripleo:
assignee: Arx Cruz (arxcruz) → Bogdan Dobrelya (bogdando)
status: Triaged → In Progress
tags: added: containers
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/532134

Changed in tripleo:
assignee: Bogdan Dobrelya (bogdando) → Martin André (mandre)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (master)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/532115
Reason: https://review.openstack.org/#/c/532134

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/532093
Reason: https://review.openstack.org/#/c/532134

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by Arx Cruz (<email address hidden>) on branch: master
Review: https://review.openstack.org/531756
Reason: Real fix in https://review.openstack.org/#/c/532134

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/532134
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=8eb351d588539c20caf768c2633832a924f40690
Submitter: Zuul
Branch: master

commit 8eb351d588539c20caf768c2633832a924f40690
Author: Martin André <email address hidden>
Date: Tue Jan 9 10:46:36 2018 +0100

    Fix path for iscsi config file

    We changed the bind mount to be /etc/iscsi in
    I838427ccae06cfe1be72939c4bcc2978f7dc36a8, we need to copy the files to
    /etc/iscsi so that they do not end up at '/' in the container.

    Change-Id: Id5c1f16d08ffd36a35a6669d64460a7b2240d401
    Closes-Bug: #1741850

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/532523

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/pike)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/532523
Reason: https://review.openstack.org/#/c/533373/

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/533373
Reason: https://review.openstack.org/#/c/533380/

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/533380
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=fa02a8f8633b682ba2042ac7fb26d76453cfe65a
Submitter: Zuul
Branch: stable/pike

commit fa02a8f8633b682ba2042ac7fb26d76453cfe65a
Author: John Fulton <email address hidden>
Date: Fri Jan 5 15:26:22 2018 -0500

    Align stars to fix stable/pike gate on scenario001

    1) Fix path for iscsi config file

    We changed the bind mount to be /etc/iscsi in
    I838427ccae06cfe1be72939c4bcc2978f7dc36a8, we need to copy the files to
    /etc/iscsi so that they do not end up at '/' in the container.

    Change-Id: Id5c1f16d08ffd36a35a6669d64460a7b2240d401
    Closes-Bug: #1741850
    (cherry picked from commit 8eb351d588539c20caf768c2633832a924f40690)

    2) Fix puppet config volume for iscsid in containers

    Bind mount the /etc/iscsi host path for iscsi container puppet config.
    Use the real host path /etc/iscsi for containers dependsing on it.

    Closes-bug: #1735425

    Change-Id: I838427ccae06cfe1be72939c4bcc2978f7dc36a8
    Co-authored-by: Alan Bishop <email address hidden>
    Co-authored-by: Martin André <email address hidden>
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 82f128f15b1b1eb7bf6ac7df0c6d01e5619309eb)

    3) Allow to override manage polling param

    Without this, we cannot override the polling yaml metrics
    from puppet template.

    Change-Id: I509dd4932402c458e222c52b5d7d5e370a5466c0
    (cherry picked from commit e870783b2c8f3b7b13459693b17425f5bf0fe53d)

    4) Disable voting on scenario001 - now timeouting to ssh the VM created
       by Tempest.

    Related-Bug: 1742936

    5) Update Ceph container CPU/memory limits in Ceph scenarios

    Ceph containers are started with `docker run --memory`
    and `docker run --cpus` to limit their memory and CPU
    resources. The defaults for OSD and MDS containers were
    recently increased [1] to values better for production
    but this change keeps them at lower values just for
    CI.

    [1] https://github.com/ceph/ceph-ansible/pull/2304

    Change-Id: I5b5cf5cc52907af092bea5e162d4b577ee05c23a
    Related-Bug: 1741499
    (cherry picked from commit d68619a26ec7cbd6176f4bb0d352d2aa91439f5c)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/pike)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/532523
Reason: https://review.openstack.org/#/c/533380/

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.8

This issue was fixed in the openstack/tripleo-heat-templates 7.0.8 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.0.0.0b3

This issue was fixed in the openstack/tripleo-heat-templates 8.0.0.0b3 development milestone.
