Simplex - application-apply aborts on ceilometer

Bug #1820928 reported by Cristopher Lemus on 2019-03-20
Assigned to: Bart Wensley

Bug Description

Simplex - application-apply fails at 95% - ceilometer pod

Brief Description
On simplex (both virtual and bare metal), system application-apply aborts at 95% progress, during “processing chart: osh-openstack-ceilometer, overall completion: 95.0%”.

Critical on Simplex.
Duplex and Standard controller configurations are not affected.

Steps to Reproduce
Follow up, observed during system application-apply.
This has been tested on two different bare metal servers and a virtual environment.

Expected Behavior
The system application-apply should complete.

Actual Behavior
System application-apply aborts at 95%.

100% reproducible on simplex.

System Configuration
Simplex BareMetal and virtual.

Branch/Pull Time/Commit

[wrsroot@controller-0 ~(keystone_admin)]$ system application-list
| application | manifest name | manifest file | status | progress |
| stx-openstack | armada-manifest | manifest.yaml | apply-failed | operation aborted, check logs for detail |

Status of ceilometer pods:
$ kubectl get pods --all-namespaces -o wide |grep ceil
openstack  ceilometer-central-6c8bf6d7df-rbxdf      0/1  Init:0/1   0  78m  controller-0  <none>
openstack  ceilometer-compute-cpb4w                 0/1  Init:0/1   0  78m  controller-0  <none>
openstack  ceilometer-ks-service-jzp8j              0/1  Completed  0  78m  controller-0  <none>
openstack  ceilometer-ks-user-rzjjn                 0/1  Completed  0  78m  controller-0  <none>
openstack  ceilometer-notification-f5ff4657c-sp5bd  0/1  Init:0/1   0  78m  controller-0  <none>
openstack  ceilometer-rabbit-init-dfvmf             0/1  Completed  0  78m  controller-0  <none>
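
To pick the stuck pods out of a listing like the one above, a minimal Python sketch follows. It assumes whitespace-separated columns in the order shown here (namespace, name, ready, status, ...). The function name pods_stuck_in_init is hypothetical and not part of any StarlingX or Kubernetes tooling.

```python
# Sample rows copied from the listing in this report. Assumed column order:
# namespace, name, ready, status, restarts, age, node, nominated node.
LISTING = """\
openstack ceilometer-central-6c8bf6d7df-rbxdf 0/1 Init:0/1 0 78m controller-0 <none>
openstack ceilometer-compute-cpb4w 0/1 Init:0/1 0 78m controller-0 <none>
openstack ceilometer-ks-service-jzp8j 0/1 Completed 0 78m controller-0 <none>
openstack ceilometer-ks-user-rzjjn 0/1 Completed 0 78m controller-0 <none>
openstack ceilometer-notification-f5ff4657c-sp5bd 0/1 Init:0/1 0 78m controller-0 <none>
openstack ceilometer-rabbit-init-dfvmf 0/1 Completed 0 78m controller-0 <none>
"""


def pods_stuck_in_init(listing):
    """Return names of pods whose status column starts with 'Init:'."""
    stuck = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        name, status = fields[1], fields[3]
        if status.startswith("Init:"):
            stuck.append(name)
    return stuck


if __name__ == "__main__":
    for name in pods_stuck_in_init(LISTING):
        print(name)
```

The three pods it reports (central, compute, notification) are the ones still waiting on their init containers. The jobs marked Completed are not the blockers.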

Full “collect” attached.

Ghada Khalil (gkhalil) wrote :

Marking as release gating; issue affects simplex deployments and was reported from community sanity.

Changed in starlingx:
importance: Undecided → High
tags: added: stx.2019.05 stx.containers
Changed in starlingx:
status: New → Triaged
assignee: nobody → Angie Wang (angiewang)
Angie Wang (angiewang) wrote :

This is not specific to the ceilometer chart.

We observed keystone auth token issues after or during application apply.
All the openstack commands are broken because of the following error:
Failed to discover available identity versions when contacting http://keystone.openstack.svc.cluster.local/v3. Attempting to parse version from URL.
Unable to establish connection to http://keystone.openstack.svc.cluster.local/v3/auth/tokens: HTTPConnectionPool(host='keystone.openstack.svc.cluster.local', port=80): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fc90f088cd0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

In this case, the keystone auth token issue occurred during the ceilometer chart installation. The chart failed to install because the ceilometer-db-sync job failed. That job creates ceilometer resources through the gnocchi REST API, which requires a keystone token.

We see a similar error in internal sanity.
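
The "[Errno -2] Name or service not known" in the error above is a name-resolution failure, so one quick check is whether the keystone service name resolves at all from wherever the openstack commands are run. This is a minimal sketch using only the Python standard library. The helper name can_resolve and its resolver parameter are illustrative, not part of any OpenStack client.

```python
import socket


def can_resolve(host, port=80, resolver=socket.getaddrinfo):
    """Return True if `host` resolves, False on a name-resolution error."""
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        # Covers errors such as "[Errno -2] Name or service not known".
        return False


if __name__ == "__main__":
    host = "keystone.openstack.svc.cluster.local"
    state = "resolves" if can_resolve(host) else "does not resolve"
    print(host, state)
```

A False result only shows that the lookup failed at that moment. It does not say why the service name disappeared from DNS.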

Ghada Khalil (gkhalil) on 2019-03-22
Changed in starlingx:
status: Triaged → In Progress
assignee: Angie Wang (angiewang) → Al Bailey (albailey1974)
Ghada Khalil (gkhalil) on 2019-03-22
Changed in starlingx:
importance: High → Critical
Frank Miller (sensfan22) wrote :

Bart has taken over this investigation, so assigning to Bart.

Changed in starlingx:
assignee: Al Bailey (albailey1974) → Bart Wensley (bartwensley)
Bart Wensley (bartwensley) wrote :

I believe we are hitting the following bug:

The kubelet is hitting a limit of 250 http2 streams in a single connection. I have tested both of the mitigations described in the following comment and they both work:

Submitter: Zuul
Branch: master

commit 0d61ade5b9eff7d4f9c61f43c25cdf9a7043f8c0
Author: Bart Wensley <email address hidden>
Date: Tue Apr 2 06:54:43 2019 -0500

    Fix application-apply of stx-openstack on simplex

    The application-apply of the stx-openstack application on
    simplex configurations has been failing since the barbican
    chart was added to the application. The failure was due
    to lost node status messages from the kubelet to the
    kube-apiserver, which causes the node to be marked
    NotReady and endpoints to be removed.

    The root cause is the kubernetes bug here:

    In short, the addition of the barbican chart added enough
    new secrets/configmaps that the kubelet hit the limit of
    http2-max-streams-per-connection. As done upstream, the
    fix is to change the following kubelet config:
    configMapAndSecretChangeDetectionStrategy (from Watch to

    Change-Id: Ic816a91984c4fb82546e4f43b5c83061222c7d05
    Closes-bug: 1820928
    Signed-off-by: Bart Wensley <email address hidden>
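
To gauge how close a node is to that limit, one rough approach is to count the distinct secrets and configmaps referenced by its pods. The sketch below assumes pod objects in the shape returned by `kubectl get pods --all-namespaces -o json`. It also assumes, as the commit message implies, that each referenced object costs the kubelet one stream while the Watch strategy is in use. The function names are hypothetical, the default of 250 is taken from the comment above, and imagePullSecrets are ignored for brevity.

```python
def referenced_objects(pod):
    """Return the set of (namespace, kind, name) secrets/configmaps a pod uses."""
    ns = pod.get("metadata", {}).get("namespace", "default")
    spec = pod.get("spec", {})
    refs = set()

    # Objects mounted as volumes, including projected volume sources.
    for vol in spec.get("volumes", []):
        if "secret" in vol:
            refs.add((ns, "secret", vol["secret"].get("secretName")))
        if "configMap" in vol:
            refs.add((ns, "configmap", vol["configMap"].get("name")))
        for src in vol.get("projected", {}).get("sources", []):
            if "secret" in src:
                refs.add((ns, "secret", src["secret"].get("name")))
            if "configMap" in src:
                refs.add((ns, "configmap", src["configMap"].get("name")))

    # Objects referenced from environment variables of any container.
    containers = spec.get("initContainers", []) + spec.get("containers", [])
    for container in containers:
        for env in container.get("env", []):
            source = env.get("valueFrom", {})
            if "secretKeyRef" in source:
                refs.add((ns, "secret", source["secretKeyRef"].get("name")))
            if "configMapKeyRef" in source:
                refs.add((ns, "configmap", source["configMapKeyRef"].get("name")))
        for env_from in container.get("envFrom", []):
            if "secretRef" in env_from:
                refs.add((ns, "secret", env_from["secretRef"].get("name")))
            if "configMapRef" in env_from:
                refs.add((ns, "configmap", env_from["configMapRef"].get("name")))

    return refs


def count_referenced(pod_list, limit=250):
    """Return (distinct object count, whether it reaches the assumed limit)."""
    refs = set()
    for pod in pod_list.get("items", []):
        refs |= referenced_objects(pod)
    return len(refs), len(refs) >= limit


if __name__ == "__main__":
    import json
    import sys

    # Example: kubectl get pods --all-namespaces -o json | python this_file.py
    count, at_limit = count_referenced(json.load(sys.stdin))
    print(count, "distinct secrets/configmaps;", "at or over limit" if at_limit else "under limit")
```

On a simplex system every pod runs on the single controller, so the cluster-wide count is also the per-node count. On multi-node systems the pod list would need filtering by node first.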

Changed in starlingx:
status: In Progress → Fix Released
Cristopher Lemus (cjlemusc) wrote :

Downloaded the latest CENGN ISO (20190403T013000Z) and deployed it on virtual and bare metal environments. Confirmed that the issue is fixed on both: application-apply completed without any issues.

Sanity testing passed on the virtual environment and is in progress on bare metal. A detailed report should follow today or tomorrow.

Ken Young (kenyis) on 2019-04-05
tags: added: stx.2.0
removed: stx.2019.05
Ghada Khalil (gkhalil) on 2019-04-09
tags: added: stx.retestneeded
Ghada Khalil (gkhalil) on 2019-04-09
tags: removed: stx.retestneeded
Peng Peng (ppeng) wrote :

Lab: SM_2
Load: 20190410T013000Z
