StarlingX

manifest.yaml with test “enabled: true” fails to apply

Bug #1819021 reported by Kristine Bujold on 2019-03-07

This bug affects 1 person

Affects      Status    Importance   Assigned to   Milestone
StarlingX    Triaged   Low          Ran An

Bug Description

Brief Description
-----------------

If the manifest.yaml file has test set to "enabled: true", the manifest fails to apply.

2019-03-06T21:38:39.000 controller-0 -bash: info HISTORY: PID=7547 UID=0 system application-apply stx-openstack

[root@controller-0 log(keystone_admin)]# system application-list
+---------------+-----------------+---------------+--------------+------------------------------------------+
| application | manifest name | manifest file | status | progress |
+---------------+-----------------+---------------+--------------+------------------------------------------+
| stx-openstack | armada-manifest | manifest.yaml | apply-failed | operation aborted, check logs for detail |
+---------------+-----------------+---------------+--------------+------------------------------------------+

kubectl -n openstack get pods
osh-openstack-aodh-test 0/1 Error 0 72m

kubectl logs osh-openstack-aodh-test --namespace openstack
+ export HOME=/tmp
+ HOME=/tmp
+ RESOURCE_UUID=f2fae3b8-008a-4690-bf2d-d4756b10c6b2
+ echo 'Test: create an alarm'
+ aodh alarm create --name alarm_test --type gnocchi_resources_threshold --metric ram_util --threshold 10.0 --comparison-operator eq --aggregation-method mean --granularity 300 --evaluation-periods 1 --alarm-action http://localhost:8776/alarm --resource-id f2fae3b8-008a-4690-bf2d-d4756b10c6b2 --resource-type generic
Test: create an alarm
internal endpoint for metric service in RegionOne region not found (HTTP 503) (Request-ID: req-08eaf3dc-0784-4cd9-a885-73d2970f77c9)

From stx-openstack-apply.log, viewed via "docker exec -it armada_service /bin/bash":

2019-03-06 22:27:54.088 40 INFO armada.handlers.tiller [-] FAILED: osh-openstack-aodh-test, run `kubectl logs osh-openstack-aodh-test --namespace openstack` for more info
2019-03-06 22:27:54.113 40 INFO armada.handlers.tiller [-] 1 test(s) failed
<snip>
namespace: "openstack"
get_release_status /usr/local/lib/python3.5/site-packages/armada/handlers/tiller.py:475
2019-03-06 22:27:54.716 40 INFO armada.handlers.armada [-] Test failed for release: osh-openstack-aodh
2019-03-06 22:27:54.716 40 ERROR armada.cli [-] Caught internal exception: armada.exceptions.tiller_exceptions.TestFailedException: Test failed for release: osh-openstack-aodh
2019-03-06 22:27:54.716 40 ERROR armada.cli Traceback (most recent call last):
2019-03-06 22:27:54.716 40 ERROR armada.cli File "/usr/local/lib/python3.5/site-packages/armada/cli/__init__.py", line 39, in safe_invoke
2019-03-06 22:27:54.716 40 ERROR armada.cli self.invoke()
2019-03-06 22:27:54.716 40 ERROR armada.cli File "/usr/local/lib/python3.5/site-packages/armada/cli/apply.py", line 217, in invoke
2019-03-06 22:27:54.716 40 ERROR armada.cli resp = armada.sync()
2019-03-06 22:27:54.716 40 ERROR armada.cli File "/usr/local/lib/python3.5/site-packages/armada/handlers/armada.py", line 494, in sync
2019-03-06 22:27:54.716 40 ERROR armada.cli self._test_chart(*test_chart_args)
2019-03-06 22:27:54.716 40 ERROR armada.cli File "/usr/local/lib/python3.5/site-packages/armada/handlers/armada.py", line 591, in _test_chart
2019-03-06 22:27:54.716 40 ERROR armada.cli raise tiller_exceptions.TestFailedException(release_name)
2019-03-06 22:27:54.716 40 ERROR armada.cli armada.exceptions.tiller_exceptions.TestFailedException: Test failed for release: osh-openstack-aodh
2019-03-06 22:27:54.716 40 ERROR armada.cli
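
When several tests fail in one apply, the failing pods can be pulled out of stx-openstack-apply.log rather than read by eye. The following is a minimal sketch, assuming the log lines keep the "armada.handlers.tiller [-] FAILED: <pod>," shape shown above; the function names are made up for illustration and are not part of Armada.

```python
import re

# Matches the tiller handler line shown in the log above, e.g.
# "... armada.handlers.tiller [-] FAILED: osh-openstack-aodh-test, run `kubectl logs ...`"
FAILED_TEST = re.compile(r"armada\.handlers\.tiller \[-\] FAILED: (?P<pod>[\w.-]+),")


def failed_test_pods(log_text):
    """Return the names of test pods reported as FAILED, in order, without duplicates."""
    seen = []
    for match in FAILED_TEST.finditer(log_text):
        pod = match.group("pod")
        if pod not in seen:
            seen.append(pod)
    return seen


def kubectl_log_commands(pods, namespace="openstack"):
    """Build the follow-up commands that the log line itself suggests running."""
    return ["kubectl logs {} --namespace {}".format(pod, namespace) for pod in pods]


sample = (
    "2019-03-06 22:27:54.088 40 INFO armada.handlers.tiller [-] FAILED: "
    "osh-openstack-aodh-test, run `kubectl logs osh-openstack-aodh-test "
    "--namespace openstack` for more info\n"
    "2019-03-06 22:27:54.113 40 INFO armada.handlers.tiller [-] 1 test(s) failed\n"
)
pods = failed_test_pods(sample)
print(pods)
print(kubectl_log_commands(pods))
```

Note that Armada stops at the first release whose test fails, so a single log may list only one pod even when other tests would fail too.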

Severity
--------
Minor

Steps to Reproduce
------------------

Set test to “enabled: true” in the manifest file.
Upload the manifest “system application-upload stx-openstack helm-charts-manifest.tgz”
Apply the manifest “system application-apply stx-openstack”

Expected Behavior
------------------

The manifest file should apply

Actual Behavior
----------------

The manifest fails to apply

Reproducibility
---------------
Reproducible

System Configuration
--------------------

Any

Branch/Pull Time/Commit
-----------------------
master

Timestamp/Logs
--------------

See description

Tags:

Ghada Khalil (gkhalil) on 2019-03-08

tags:

added: stx.containers

Revision history for this message

Ghada Khalil (gkhalil) wrote on 2019-03-11:

Marking as release gating; medium priority. Right now, this doesn't cause issues in the starlingx deployments as tests are set to disabled. But, ideally, we want to address this issue and enable basic tests to be run on the services when they are launched.

Changed in starlingx:
importance:	Undecided → Medium
status:	New → Triaged
tags:	added: stx.2019.05
Changed in starlingx:
assignee:	nobody → Angie Wang (angiewang)

Ghada Khalil (gkhalil) wrote on 2019-03-11:

Assigning to Angie as the first failure appears to be related to aodh

Ken Young (kenyis) on 2019-04-05

tags:

added: stx.2.0
removed: stx.2019.05

Bill Zvonar (billzvonar) wrote on 2019-06-21:

Assigned to Bruce for re-assignment.

Changed in starlingx:
assignee:	Angie Wang (angiewang) → Bruce Jones (brucej)

Bruce Jones (brucej) on 2019-06-21

Changed in starlingx:
assignee:	Bruce Jones (brucej) → Cindy Xie (xxie1)

Cindy Xie (xxie1) on 2019-06-24

Changed in starlingx:
assignee:	Cindy Xie (xxie1) → Ran An (an.ran)

Ran An (an.ran) wrote on 2019-07-01:

Hi Kristine Bujold, do you know the pull time of the last build on which this passed?

Ran An (an.ran) wrote on 2019-07-02:

This issue can be reproduced on the ISO built by CENGN on June 26th.

With the upstream OpenStack test cases enabled, the manifest application fails whenever any one of the tests does not pass.

In the reproduction run, not only aodh-test failed; the barbican, neutron, panko and ceilometer tests failed as well. Judging from the related pod/container logs, it looks like an authentication issue (internal endpoint ... not found (HTTP 503) or forbidden to ...).

A deeper dig is required.

Ran An (an.ran) wrote on 2019-07-24:

aodh-test.log (651 bytes, text/plain)

Uploading the logs of the failed *-test pods.

Ran An (an.ran) wrote on 2019-07-24:

barbican.log (2.0 KiB, text/plain)

Ran An (an.ran) wrote on 2019-07-24:

ceilometer.log (3.7 KiB, text/plain)

Ran An (an.ran) wrote on 2019-07-24:

neutron-test.log (110.7 KiB, text/plain)

Ran An (an.ran) wrote on 2019-07-24:

#10

panko.log (5.7 KiB, text/plain)

Ran An (an.ran) wrote on 2019-07-24:

#11

glance.log (20.4 KiB, text/plain)

The glance test also failed, with code based on the July 15th master branch.

Angie Wang (angiewang) wrote on 2019-07-24:

#12

aodh:
The Aodh test pod requires the Gnocchi pod to be running. Currently, Aodh is installed before Gnocchi. In the manifest's openstack-telemetry chart_group, reversing the order of aodh and gnocchi should solve the issue.
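
The reordering can be sketched as a small list operation. This assumes the charts in a chart_group are installed in the order they are listed; the chart names below are hypothetical placeholders, and the real entries in the openstack-telemetry chart_group may differ.

```python
def move_before(charts, chart, before):
    """Return a copy of `charts` with `chart` placed immediately before `before`.

    The list is returned unchanged if `chart` already precedes `before`
    or if either name is missing.
    """
    ordered = list(charts)
    if chart not in ordered or before not in ordered:
        return ordered
    if ordered.index(chart) < ordered.index(before):
        return ordered
    ordered.remove(chart)
    ordered.insert(ordered.index(before), chart)
    return ordered


# Hypothetical chart names, used only to illustrate the ordering change.
telemetry_group = [
    "openstack-aodh",
    "openstack-gnocchi",
    "openstack-panko",
    "openstack-ceilometer",
]
print(move_before(telemetry_group, "openstack-gnocchi", "openstack-aodh"))
```

With gnocchi listed first, its release would be in place before the aodh test pod tries to reach the metric service.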

ceilometer:
We cannot enable the ceilometer test; it has been disabled since day one. The upstream ceilometer test matches the default OpenStack Ocata release, which is based on ceilometer-api. But ceilometer-api was removed in OpenStack Queens and we are using Stein.

panko:
The Panko chart test (which also matches Ocata) runs the ceilometer events rally tests, which also require ceilometer-api, so we cannot enable the test.

barbican:
The error shows that the secret get request fails with "Secret retrieval attempt not allowed - please review your user/project privileges". I checked the barbican chart, and I believe all the commands run in _barbican-test.sh.tpl use the "admin" user. In theory, there shouldn't be a user/project policy issue. An, does this problem happen all the time? Have you tried manually running "openstack secret get <secret-ref>" after the barbican API is ready?

glance:
Looks like the image creation failed because the image cannot be downloaded from this URL: http://download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img
Same question: does this problem happen all the time? Have you tried getting into the glance pod and running curl against the URL?

neutron:
test scenario NeutronNetworks.create_and_show_network
test scenario NeutronNetworks.create_and_update_networks

The above create-network test cases failed with the following error:
ServiceUnavailable: Unable to create the network. No tenant network is available for allocation.

Not sure if it's a configuration problem. According to a Google search, we need the configuration vni_ranges=1:1000 in /etc/neutron/plugins/ml2/ml2_conf.ini. I believe Joseph can give more suggestions.
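
For reference, the vni_ranges setting mentioned above is a comma-separated list of min:max ranges in the [ml2_type_vxlan] section of ml2_conf.ini. The sketch below parses such a value and renders the section with Python's standard configparser; the helper names are illustrative only, and how the value is actually delivered to neutron in this deployment is not something the sketch assumes.

```python
import configparser
import io


def parse_vni_ranges(value):
    """Parse a comma-separated list of "min:max" VNI ranges into (min, max) tuples."""
    ranges = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        low, high = (int(part) for part in item.split(":"))
        if low > high:
            raise ValueError("invalid range: " + item)
        ranges.append((low, high))
    return ranges


def render_ml2_vxlan(vni_ranges):
    """Render the [ml2_type_vxlan] section as it would appear in ml2_conf.ini."""
    config = configparser.ConfigParser()
    config["ml2_type_vxlan"] = {"vni_ranges": vni_ranges}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(parse_vni_ranges("1:1000"))
print(render_ml2_vxlan("1:1000"))
```

An empty or missing range list leaves the VXLAN type driver with no VNIs to hand out, which is consistent with the "No tenant network is available for allocation" error.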

Joseph Richard (josephrichard) wrote on 2019-07-29:

#13

Have you looked inside of the neutron logs at all?

The neutron logs show that no tenant network is available for allocation, which would be the case if no network segment ranges are allocated. Can you re-run this test with network_segment_range removed from the list of neutron service_plugins in the manifest? Does it still hit this error?

Ran An (an.ran) wrote on 2019-07-30:

#14

Hi Joseph, I removed "network_segment_range" from the "service_plugins" list in the neutron-etc secret and restarted the neutron-server* pods, but it did not help.

However, setting "vni_ranges = 1:1000" under section "ml2_conf.ini: ml2_type_vxlan" resolves this neutron test issue.

According to https://docs.openstack.org/neutron/stein/configuration/samples/ml2-conf.html, "vni_ranges" enumerates the ranges of VXLAN VNI IDs that are available for tenant network allocation.

Ran An (an.ran) wrote on 2019-07-31:

#15

aodh:
Solved by reversing the order of aodh and gnocchi.

barbican:
WIP

glance:
Only fails on systems behind an HTTP proxy.
Requires a patch to the glance helm chart to set the proxy for the glance test pods.

neutron:
The test pods could create a tenant network after setting "vni_ranges = 1:1000" under section "ml2_conf.ini: ml2_type_vxlan" in stx-openstack.yaml.

But there is an authentication error in the neutron test pod:
"AuthenticationFailed: Failed to authenticate to http://keystone-api.openstack.svc.cluster.local:5000/v3 for user 'test' in project 'test': The account is locked for user: f61dc48fecf349ffb2592e3421f81ef0."
Further investigation is ongoing, and more suggestions are welcome.

Thanks to Angie for the comments, which helped a lot!

Ran An (an.ran) wrote on 2019-08-05:

#16

barbican:
Manually running "openstack secret get <secret-href>" after the barbican API is ready failed with the error "Secret retrieval attempt not allowed - please review your user/project privileges".
This looks like an issue with the barbican policy configuration. Either the policy config should be fixed, or we do not support this case.
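
To make the "secret:get" rule quoted at the end of this comment easier to reason about, here is a rough Python rendering of how its sub-rules combine. It is a sketch under assumptions, not a diagnosis of this failure: the thread does not quote the definitions of "admin", "creator", "all_users" or "secret_acl_read", so the versions below are guesses, and the dictionary keys are made up for the example.

```python
def secret_get_allowed(creds, secret):
    """Evaluate the quoted "secret:get" rule for one request.

    `creds` has "user_id", "project_id" and "roles"; `secret` has "project_id",
    "creator_id", "read_project_access" and "acl_read_users".
    """
    roles = set(creds["roles"])

    # Assumptions: these four rules are not quoted in the thread.
    is_admin = "admin" in roles
    is_creator = "creator" in roles
    all_users = bool(roles & {"admin", "observer", "creator", "audit", "service_admin"})
    acl_read = creds["user_id"] in secret["acl_read_users"]

    # Rules taken from the quoted barbican-api policy file.
    project_match = creds["project_id"] == secret["project_id"]
    creator_user = creds["user_id"] == secret["creator_id"]
    private_read = secret["read_project_access"] is False
    non_private_read = all_users and project_match and not private_read
    project_creator = is_creator and project_match and creator_user
    project_admin = is_admin and project_match

    return non_private_read or project_creator or project_admin or acl_read


secret = {
    "project_id": "p1",
    "creator_id": "u1",
    "read_project_access": True,
    "acl_read_users": [],
}
same_project_admin = {"user_id": "u2", "project_id": "p1", "roles": ["admin"]}
other_project_admin = {"user_id": "u2", "project_id": "p2", "roles": ["admin"]}
print(secret_get_allowed(same_project_admin, secret))   # allowed: project matches
print(secret_get_allowed(other_project_admin, secret))  # denied: no sub-rule matches
```

Under these assumptions every branch except the ACL one depends on secret_project_match, so even an admin token is refused when it is scoped to a different project than the secret. Whether that is what happens in the test pod would need to be checked against the token and the secret actually used.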

from barbican-api log:
{"log":"2019-08-05 03:05:46.074 9 ERROR barbican.api.controllers [req-71a5f70c-ef7e-4a23-ae06-e7acdeedca76 64745dfc3d1a44cbbf9b8592ec950d7e 94b67db79b544a13b8bda20e0612e360 - default default] Secret retrieval attempt not allowed - please review your user/project privileges: PolicyNotAuthorized: secret:get is disallowed by policy\n","stream":"stdout","time":"2019-08-05T03:05:46.074915942Z"}

The "secret:get" rule and the related rules in the barbican-api policy config file are as follows:
"secret:get":"rule:secret_non_private_read or rule:secret_project_creator or rule:secret_project_admin or rule:secret_acl_read",
"secret_creator_user":"user:%(target.secret.creator_id)s",
"secret_decrypt_non_private_read":"rule:all_but_audit and rule:secret_project_match and not rule:secret_private_read",
"secret_non_private_read":"rule:all_users and rule:secret_project_match and not rule:secret_private_read",
"secret_private_read":"'False':%(target.secret.read_project_access)s",
"secret_project_admin":"rule:admin and rule:secret_project_match",
"secret_project_creator":"rule:creator and rule:secret_project_match and rule:secret_creator_user",
"secret_project_match":"project:%(target.secret.project_id)s",
"secrets:get":"rule:all_but_audit",

Ghada Khalil (gkhalil) wrote on 2019-08-23:

#17

As per agreement with the community, moving all unresolved medium priority bugs from stx.2.0 to stx.3.0

tags:

added: stx.3.0
removed: stx.2.0

Frank Miller (sensfan22) wrote on 2019-11-22:

#18

Ran, please update your current status for this issue. Which of these tests do you now have working and which are not working?
aodh-test, barbican, ceilometer, glance, neutron, panko

Ran An (an.ran) wrote on 2019-11-25:

#19

Frank, I will retest this issue on Openstack Train, and update the status later.

Ran An (an.ran) wrote on 2019-11-28:

#20

With OpenStack Train (ISO built by CENGN at 20191115T023000Z):
The test pods named "osh-openstack-*-test" could not be created due to "spec.containers[0].image: Required value".

Further investigation is ongoing.

error logs of "osh-openstack-glance-test":
results {
        name: "osh-openstack-glance-test"
        status: FAILURE
        info: "Pod \"osh-openstack-glance-test\" is invalid: spec.containers[0].image: Required value"
        started_at {
          seconds: 1574847410
          nanos: 624560841
        }
        completed_at {
          seconds: 1574847410
          nanos: 639567037
        }
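
The "Required value" message means the rendered test pod reached Kubernetes with an empty or missing image field. A quick check over a rendered pod manifest, loaded as a plain dict, could look like the sketch below; the container name and the empty image value are hypothetical and only stand in for whatever the chart renders.

```python
def containers_missing_image(pod):
    """Return field paths such as "spec.containers[0].image" for containers with no image."""
    missing = []
    containers = pod.get("spec", {}).get("containers", [])
    for index, container in enumerate(containers):
        if not container.get("image"):
            missing.append("spec.containers[{}].image".format(index))
    return missing


# Hypothetical rendered test pod whose image value came out empty.
pod = {
    "metadata": {"name": "osh-openstack-glance-test"},
    "spec": {"containers": [{"name": "glance-test", "image": ""}]},
}
print(containers_missing_image(pod))
```

Running a check like this against the rendered templates before applying would show which test pods lack an image override.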

Ran An (an.ran) wrote on 2019-12-02:

#21

The issue "Pod \"osh-openstack-glance-test\" is invalid: spec.containers[0].image: Required value" was caused by commit https://review.opendev.org/gitweb?p=starlingx/openstack-armada-app.git;a=commit;h=149dcb306dc831d9646b34da3f072f8aeee16211

Frank Miller (sensfan22) wrote on 2019-12-11:

#22

Re-tagging this LP to stx.4.0 as changes will not complete in time for stx.3.0

tags:

added: stx.4.0
removed: stx.3.0

Ghada Khalil (gkhalil) wrote on 2020-05-24:

#23

Marking as low priority - As previously noted, this doesn't cause issues in the starlingx deployments as tests are set to disabled.

Changed in starlingx:
importance:	Medium → Low
tags:	removed: stx.4.0
