Haproxy_backend_status[keystone-admin] is being configured after Ceilometer::Keystone::Auth/Keystone_role[ResellerAdmin] and breaks deployment

Bug #1451796 reported by Timur Nurlygayanov
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Committed
High
Stanislaw Bogatkin
6.0.x
Invalid
High
Fuel QA Team

Bug Description

Note: this issue not reproduced on MOS 6.1 ISO #361 and reproduced on MOS 6.1 ISO #376 (this is regression)

Steps To Reproduce:
1. Create OpenStack cloud with 3 controllers, 3 compute nodes, Centos, Neutron GRE, Ceph
2. Start to deploy the env

Expected Result:
Environment will be successfully deployed.

Observed Result:
Deployment failed, in the puppet logs on controllers we can see the following errors (please see the diagnostic snapshot for more detailed information):

Tue May 05 07:41:58 +0000 2015 Puppet (err): Execution of '/usr/bin/keystone --os-endpoint http://192.168.0.4:35357/v2.0/ user-create --name nova --enabled True --email nova@localhost --pass 4lme1e5a --tenant_id e867f662f7a949c0b968feef58057988' returned 1: Request to http://192.168.0.4:35357/v2.0/users timed out (HTTP 408)

/usr/lib/ruby/site_ruby/1.8/puppet/util/execution.rb:188:in `execute'
/usr/lib/ruby/site_ruby/1.8/puppet/provider/command.rb:23:in `execute'
/usr/lib/ruby/site_ruby/1.8/puppet/provider.rb:237:in `keystone'
/etc/puppet/modules/keystone/lib/puppet/provider/keystone.rb:80:in `auth_keystone'
/etc/puppet/modules/keystone/lib/puppet/provider/keystone.rb:68:in `call'
/etc/puppet/modules/keystone/lib/puppet/provider/keystone.rb:68:in `withenv'
/etc/puppet/modules/keystone/lib/puppet/provider/keystone.rb:79:in `auth_keystone'
/etc/puppet/modules/keystone/lib/puppet/provider/keystone.rb:95:in `auth_keystone'
/etc/puppet/modules/keystone/lib/puppet/provider/keystone_user/keystone.rb:56:in `create'
/usr/lib/ruby/site_ruby/1.8/puppet/property/ensure.rb:16:in `set_present'
/usr/lib/ruby/site_ruby/1.8/puppet/property.rb:197:in `send'
/usr/lib/ruby/site_ruby/1.8/puppet/property.rb:197:in `call_valuemethod'
/usr/lib/ruby/site_ruby/1.8/puppet/property.rb:498:in `set'
/usr/lib/ruby/site_ruby/1.8/puppet/property.rb:581:in `sync'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:193:in `sync'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:130:in `sync_if_needed'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:107:in `manage_via_ensure_if_possible'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:77:in `perform_changes'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:18:in `evaluate'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:174:in `apply'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:187:in `eval_resource'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:117:in `call'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:117:in `evaluate'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:327:in `thinmark'
/usr/lib/ruby/1.8/benchmark.rb:308:in `realtime'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:326:in `thinmark'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:117:in `evaluate'
/usr/lib/ruby/site_ruby/1.8/puppet/graph/relationship_graph.rb:118:in `traverse'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:108:in `evaluate'
/usr/lib/ruby/site_ruby/1.8/puppet/resource/catalog.rb:164:in `apply'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:149:in `with_destination'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/report.rb:108:in `as_logging_destination'
/usr/lib/ruby/site_ruby/1.8/puppet/resource/catalog.rb:163:in `apply'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:125:in `apply_catalog'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:161:in `benchmark'
/usr/lib/ruby/1.8/benchmark.rb:308:in `realtime'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:160:in `benchmark'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:124:in `apply_catalog'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:192:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/application/apply.rb:268:in `apply_catalog'
/usr/lib/ruby/site_ruby/1.8/puppet/application/apply.rb:218:in `main'
/usr/lib/ruby/site_ruby/1.8/puppet/application/apply.rb:146:in `run_command'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:364:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:470:in `plugin_hook'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:364:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:478:in `exit_on_fail'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:364:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/util/command_line.rb:137:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/util/command_line.rb:91:in `execute'
/usr/bin/puppet:4
Tue May 05 07:41:58 +0000 2015 /Stage[main]/Nova::Keystone::Auth/Keystone_user[nova]/ensure (err): change from absent to present failed: Execution of '/usr/bin/keystone --os-endpoint http://192.168.0.4:35357/v2.0/ user-create --name nova --enabled True --email nova@localhost --pass 4lme1e5a --tenant_id e867f662f7a949c0b968feef58057988' returned 1: Request to http://192.168.0.4:35357/v2.0/users timed out (HTTP 408)
Tue May 05 07:41:58 +0000 2015 /Stage[main]/Nova::Keystone::Auth/Keystone_user[nova] (debug): Nothing to manage: no ensure and the resource doesn't exist

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
description: updated
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

Reproduced on MOS 6.1 ISO #376 with the following configuration: Ubuntu, Neutron VLAN, Ceph

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

@Timur, according to logs, the deploy started at
        2015-05-05T06:23:23.912 ./nailgun/app INFO [7febeaee7740] (base) Trying to start deployment at environment '1'
        2015-05-05T06:23:23.916 ./nailgun/app INFO [7febeaee7740] (manager) Trying to start deployment at cluster 'centos-neutron-gre-ha'

while there are some AMQP and kernel errors shown in logs *before* of this:

2015-05-05T06:16:38.647567+00:00 node-6 neutron-openvswitch-agent err: 2015-05-05 06:16:35.764 4021 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.5:5673
is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 30 seconds.
...
2015-05-05T06:20:03.025477+00:00 node-6 nova-compute err: AMQP server on 192.168.0.4:5673 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 30 seconds.
2015-05-05T06:20:04.337433+00:00 node-1 messages err: kernel: Failed to access perfctr msr (MSR c1 is 0)
2015-05-05T06:20:05.896204+00:00 node-5 nova-compute err: AMQP server on 192.168.0.4:5673 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 30 seconds.
2015-05-05T06:20:31.840923+00:00 node-2 messages err: kernel: Failed to access perfctr msr (MSR c1 is 0)
2015-05-05T06:20:56.261712+00:00 node-3 messages err: kernel: Failed to access perfctr msr (MSR c1 is 0)
2015-05-05T06:22:22.090908+00:00 node-4 dmesg info: ERST: Error Record Serialization Table (ERST) support is initialized.

Did you re-deploy a "dirty" environment? This looks like an invalid case.

Changed in fuel:
status: Confirmed → Invalid
Changed in fuel:
status: Invalid → Confirmed
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Ok, logs from 06:16 till 06:20 can be ignored.
The issue looks valid as node-1 failed to deploy at
2015-05-05T06:51:24.981157+00:00 err: Execution of '/usr/bin/keystone --os-endpoint http://192.168.0.4:35357/v2.0/ role-create --name ResellerAdmin' returned 1: Request to http://192.168.0.4:35357/v2.0/OS-KSADM/roles timed out (HTTP 408)

more reproduces with manual deploy are wellcome

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

According to the logs we start creating role:

2015-05-05T06:41:24.643606+00:00 debug: Executing '/usr/bin/keystone --os-endpoint http://192.168.0.4:35357/v2.0/ role-create --name ResellerAdmin'

But keystone backend became available 3 seconds later

 2015-05-05T06:41:27.112358+00:00 notice: Server keystone-1/node-1 is UP, reason: Layer7 check passed, code: 300, info: "Multiple Choices", check duration: 1ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.

According to keystone logs everything started but was slowing down a bit. It seems purely an environment's issue.

AMQP issues also seem to show environmental issues.

This bug is not 100% reproducible. We can try to introduce a w/a for it, but clearly downgrading severity to 'High'.

Changed in fuel:
importance: Critical → High
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Exactly, this looks like wrong ordering for Haproxy_backend_status[keystone-admin] and Ceilometer::Keystone::Auth/Keystone_role[ResellerAdmin], see details here
http://paste.openstack.org/show/ZrqPJklUp4n37sip5P63/

Changed in fuel:
status: Confirmed → Triaged
summary: - OpenStack deployment failed with the keystone timeout
+ Haproxy_backend_status[keystone-admin] is being configured after
+ Ceilometer::Keystone::Auth/Keystone_role[ResellerAdmin] and breaks
+ deployment
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Matked as invalid for the 6.0.1 as there is no Haproxy_backend_status provider

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Stanislaw Bogatkin (sbogatkin)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/180554

Changed in fuel:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/180554
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=ad644b425084b8844563a3e0448ef1fdb8982438
Submitter: Jenkins
Branch: master

commit ad644b425084b8844563a3e0448ef1fdb8982438
Author: Stanislaw Bogatkin <email address hidden>
Date: Wed May 6 17:01:23 2015 +0300

    Change ordering of haproxy backend status check and keystone class

    Now all *keystone::auth classes required haproxy_backend_status

    Change-Id: I98aaba8852562c2045c431c2b8e1e7cc6d60f2a6
    Closes-Bug: #1451796

Changed in fuel:
status: In Progress → Fix Committed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.