ERR for nova_floating_range: Could not evaluate: Authentication failed with response code 401

Bug #1348171 reported by Anastasia Palkina on 2014-07-24
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | | Critical | Bogdan Dobrelya |
5.0.x | | Critical | Sergii Golovatiuk |

Bug Description

"build_id": "2014-07-23_02-01-14",
"ostf_sha": "c1b60d4bcee7cd26823079a86e99f3f65414498e",
"build_number": "347",
"auth_required": false,
"api": "1.0",
"nailgun_sha": "f5775d6b7f5a3853b28096e8c502ace566e7041f",
"production": "docker",
"fuelmain_sha": "74b9200955201fe763526ceb51607592274929cd",
"astute_sha": "fd9b8e3b6f59b2727b1b037054f10e0dd7bd37f1",
"feature_groups": ["mirantis"],
"release": "5.1",
"fuellib_sha": "fb0e84c954a33c912584bf35054b60914d2a2360"

1. Create new environment (Ubuntu, simple mode)
2. Choose Ceilometer
3. Add controller+mongo, compute
4. Start deployment. It fails.
5. puppet.log on the controller (node-17) contains this error:

2014-07-24 11:13:07 ERR (/Stage[main]/Osnailyfacter::Cluster_simple/Nova_floating_range[172.16.0.128-172.16.0.254]) Could not evaluate: Authentication failed with response code 401

Logs are here: https://drive.google.com/a/mirantis.com/file/d/0B6SjzarTGFxaRTJ5M0tjcmw5dGs/edit?usp=sharing

Bogdan Dobrelya (bogdando) wrote :

 2014-07-24T11:15:52.017292 node-17 ./node-17.domain.tld/puppet-apply.log:2014-07-24T11:15:52.017292+01:00 err: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install mysql-client=5.5.37-0ubuntu0.12.04.1' returned 100: Reading package lists...
 2014-07-24T11:15:52.020081 node-17 ./node-17.domain.tld/puppet-apply.log:2014-07-24T11:15:52.020081+01:00 err: Some packages could not be installed. This may mean that you have
 2014-07-24T11:15:52.020081 node-17 ./node-17.domain.tld/puppet-apply.log:2014-07-24T11:15:52.020081+01:00 err: The following information may help to resolve the situation:
 2014-07-24T11:15:52.020081 node-17 ./node-17.domain.tld/puppet-apply.log:2014-07-24T11:15:52.020081+01:00 err: The following packages have unmet dependencies:
 2014-07-24T11:15:52.021260 node-17 ./node-17.domain.tld/puppet-apply.log:2014-07-24T11:15:52.021260+01:00 err: E: Unable to correct problems, you have held broken packages.
 2014-07-24T11:15:52.021260 node-17 ./node-17.domain.tld/puppet-apply.log:2014-07-24T11:15:52.021260+01:00 err: mysql-client : Depends: mysql-client-5.5 but it is not going to be installed
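
For triage, the unmet dependency can be pulled out of apt-get output like the above with a few lines of Python. This is only a rough sketch and not part of any proposed fix: the name `unmet_dependencies` is hypothetical, and the regular expression is an assumption based solely on the single "Depends:" line in this log.

```python
import re

# Matches apt's "<package> : Depends: <dependency>" report lines.
# Assumption: the format is inferred from the one line seen in this log.
DEPENDS_RE = re.compile(r"(\S+) : Depends: (\S+)")


def unmet_dependencies(log_text):
    """Return (package, dependency) pairs found in apt-get error output."""
    pairs = []
    for line in log_text.splitlines():
        match = DEPENDS_RE.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs
```

Run against the excerpt above, this should report mysql-client as blocked on mysql-client-5.5.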

Fix proposed to branch: master
Review: https://review.openstack.org/109392

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Sergii Golovatiuk (sgolovatiuk)
status: New → In Progress

Reviewed: https://review.openstack.org/109392
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=28600dad33eadc7b2532a5f90c687abe1235448b
Submitter: Jenkins
Branch: master

commit 28600dad33eadc7b2532a5f90c687abe1235448b
Author: Sergii Golovatiuk <email address hidden>
Date: Thu Jul 24 19:17:04 2014 +0000

    Fix MySQL packages for Simple Deployment

    Change-Id: Id66cbde94b217d80cccdeb91c65b38e76044156d
    Closes-Bug: 1348171

Changed in fuel:
status: In Progress → Fix Committed

Anastasia Palkina (apalkina) wrote :

Verified on ISO #366
"build_id": "2014-07-28_02-01-14",
"ostf_sha": "8c328521b1444f22c50463b9432193e20ed33813",
"build_number": "366",
"auth_required": true,
"api": "1.0",
"nailgun_sha": "83cc9ed44ebc8dd97248483b6d414ebbc4cff3c0",
"production": "docker",
"fuelmain_sha": "9adfbf5a52cedbdd16ec1a74f6c44c5b3419b87c",
"astute_sha": "aa5aed61035a8dc4035ab1619a8bb540a7430a95",
"feature_groups": ["mirantis"],
"release": "5.1",
"fuellib_sha": "d1c7f67b3cf51978d3178c8666ea398f2477dcb5"

Changed in fuel:
status: Fix Committed → Fix Released
Changed in fuel:
status: Fix Released → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/111716

Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Vladimir Kuklin (vkuklin)

Vladimir Kuklin (vkuklin) wrote :

Reopened.

From logs:

Mon Aug 04 12:03:08 +0000 2014 /Stage[main]/Osnailyfacter::Cluster_ha/Exec[wait-for-haproxy-keystone-backend]/returns (debug): Exec try 1/60
...
Mon Aug 04 12:08:08 +0000 2014 Exec[wait-for-haproxy-keystone-backend](provider=posix) (debug): Executing 'echo show stat | socat unix-connect:///var/lib/haproxy/stats stdio | grep -q '^keystone-1,BACKEND,.*,UP,''
Mon Aug 04 12:08:08 +0000 2014 Puppet (debug): Executing 'echo show stat | socat unix-connect:///var/lib/haproxy/stats stdio | grep -q '^keystone-1,BACKEND,.*,UP,''
Mon Aug 04 12:08:08 +0000 2014 /Stage[main]/Osnailyfacter::Cluster_ha/Exec[wait-for-haproxy-keystone-backend]/returns (debug): Sleeping for 5.0 seconds between tries
Mon Aug 04 12:08:13 +0000 2014 /Stage[main]/Osnailyfacter::Cluster_ha/Exec[wait-for-haproxy-keystone-backend]/returns (notice): 2014/08/04 12:08:08 socat[26871] E connect(3, AF=1 "///var/lib/haproxy/stats", 26): No such file or directory
Mon Aug 04 12:08:13 +0000 2014 Puppet (err): echo show stat | socat unix-connect:///var/lib/haproxy/stats stdio | grep -q '^keystone-1,BACKEND,.*,UP,' returned 1 instead of one of [0]
...
Mon Aug 04 12:12:52 +0000 2014 /Stage[main]/Keystone/Package[keystone] (info): Scheduling refresh of Service[keystone]
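
The check that the exec above performs with socat and grep can also be sketched in Python, which makes the failure modes easier to tell apart (socket missing vs. backend not UP). This is an illustration only, not what fuel-library does: the function names `read_show_stat` and `backend_is_up` are hypothetical. The socket path and the "show stat" command are taken from the log above; the CSV column names (pxname, svname, status) are those of HAProxy's stats header line.

```python
import csv
import io
import socket


def read_show_stat(socket_path="/var/lib/haproxy/stats"):
    """Send 'show stat' to the HAProxy stats socket and return the CSV text.

    Raises FileNotFoundError if the socket does not exist yet, which is
    the situation socat reported in the log above.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(b"show stat\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def backend_is_up(show_stat_output, backend):
    """Return True if the BACKEND row of `backend` reports status UP."""
    lines = show_stat_output.strip().splitlines()
    if not lines:
        return False
    # The header line starts with "# pxname,svname,..."
    header = lines[0].lstrip("# ").split(",")
    rows = csv.DictReader(io.StringIO("\n".join(lines[1:])), fieldnames=header)
    for row in rows:
        if row.get("pxname") == backend and row.get("svname") == "BACKEND":
            return (row.get("status") or "").startswith("UP")
    return False
```

With this split, `backend_is_up(read_show_stat(), "keystone-1")` would raise on a missing socket instead of silently counting as "not UP", which is what the grep pipeline does.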

Reviewed: https://review.openstack.org/111716
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=80f7fcf8dbf6791507e8b4b2ca053b845101bf67
Submitter: Jenkins
Branch: master

commit 80f7fcf8dbf6791507e8b4b2ca053b845101bf67
Author: Vladimir Kuklin <email address hidden>
Date: Mon Aug 4 16:59:20 2014 +0400

    Evalute keystone class before waiting for keystone haproxy backend

    Otherwise may start prematurely and timeout.

    Change-Id: Ic3a97d371f11e4e07ebd3ff956f626cf4e056cd3
    Closes-bug: #1348171

Changed in fuel:
status: In Progress → Fix Committed

Artem Panchenko (apanchenko-8) wrote :

api: '1.0'
astute_sha: b52910642d6de941444901b0f20e95ebbcb2b2e9
auth_required: true
build_id: 2014-08-14_02-01-17
build_number: '436'
feature_groups:
- mirantis
fuellib_sha: 4b085bfbf7be973f0aa29d9d5e4f3ebd5bf789a1
fuelmain_sha: 9f327045cdd72d406d89063393a499635be5e3d4
nailgun_sha: b5bdd19c2dbeb26ce3bd88270d09f5e7541a3aea
ostf_sha: d2a894d228c1f3c22595a77f04b1e00d09d8e463
production: docker
release: '5.1'

This issue was reproduced again:

http://jenkins-product.srt.mirantis.net:8080/view/0_master_swarm/job/master_fuelmain.system_test.ubuntu.thread_3/133/testReport/(root)/deploy_stop_reset_on_ha/deploy_stop_reset_on_ha/

Deployment of 1st controller (node-1) failed with this error in puppet log:

http://paste.openstack.org/show/95092/

Changed in fuel:
status: Fix Committed → Confirmed
Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Bogdan Dobrelya (bogdando)

Bogdan Dobrelya (bogdando) wrote :

The event flow at http://pastebin.com/A6jWvdzD shows that the HAProxy backends went badly wrong: many of them were marked DOWN, which was also the root cause of the "504 Gateway Time-out".

Bogdan Dobrelya (bogdando) wrote :

It looks like a network connectivity issue on node-1 isolated the node:
2014-08-14T04:21:05.689938 node-1 ./remote/node-1.test.domain.local/puppet-apply.log:2014-08-14T04:21:05.689938+01:00 notice: (L3_if_downup[eth1](provider=ruby)) Carrier is DOWN, 'eth1' skipping carrier test
2014-08-14T04:21:11.809360 node-1 ./remote/node-1.test.domain.local/puppet-apply.log:2014-08-14T04:21:11.809360+01:00 notice: (L3_if_downup[eth3](provider=ruby)) Carrier is DOWN, 'eth3' skipping carrier test
2014-08-14T04:21:25.132489 node-1 ./remote/node-1.test.domain.local/puppet-apply.log:2014-08-14T04:21:25.132489+01:00 notice: (L3_if_downup[eth4](provider=ruby)) Carrier is DOWN, 'eth4' skipping carrier test
2014-08-14T04:21:31.326668 node-1 ./remote/node-1.test.domain.local/puppet-apply.log:2014-08-14T04:21:31.326668+01:00 notice: (L3_if_downup[eth2](provider=ruby)) Carrier is DOWN, 'eth2' skipping carrier test
2014-08-14T04:21:37.544274 node-1 ./remote/node-1.test.domain.local/puppet-apply.log:2014-08-14T04:21:37.544274+01:00 notice: (L3_if_downup[eth3.103](provider=ruby)) Carrier is DOWN, 'eth3.103' skipping carrier test
2014-08-14T04:29:53.796338 node-1 ./remote/node-1.test.domain.local/haproxy.log:2014-08-14T04:29:53.796338+01:00 alert: Server rabbitmq/node-1 is DOWN, reason: Layer4 connection problem, info: "Network is unreachable", check duration: 0ms. 0 active and 2 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
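
To spot this pattern quickly in other runs, the interfaces reported with carrier DOWN can be listed with a short script. This is a minimal sketch: the function name `interfaces_with_carrier_down` is hypothetical, and the regular expression assumes the L3_if_downup message format exactly as it appears in the lines above.

```python
import re

# Assumption: message format as seen in the puppet-apply.log lines above,
# e.g. "(L3_if_downup[eth1](provider=ruby)) Carrier is DOWN, ..."
CARRIER_DOWN_RE = re.compile(r"L3_if_downup\[([^\]]+)\]\S* Carrier is DOWN")


def interfaces_with_carrier_down(log_text):
    """Return interface names reported with carrier DOWN, in log order."""
    found = []
    for line in log_text.splitlines():
        match = CARRIER_DOWN_RE.search(line)
        if match:
            found.append(match.group(1))
    return found
```

Lines from other sources, such as the haproxy.log alert above, are ignored because they do not match the L3_if_downup pattern.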

Changed in fuel:
status: Confirmed → Fix Committed

Reviewed: https://review.openstack.org/114507
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=44de6e259a1191eb015ead8392f3f08e7559fea2
Submitter: Jenkins
Branch: stable/5.0

commit 44de6e259a1191eb015ead8392f3f08e7559fea2
Author: Vladimir Kuklin <email address hidden>
Date: Fri Aug 1 14:07:51 2014 +0400

    Wait for keystone to become ready for floatingip creation

    Wait for keystone service to become ready otherwise
    haproxy reload can lead to haproxy returning
    empty response and thus nova-floating-range provider
    can not authorize with nova.

    Change-Id: I8f9623411990679560b1694569dbb185883c4733
    Closes-bug: 1351253
    Closes-bug: 1348171
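
The "wait for keystone to become ready" idea in this commit boils down to a bounded retry loop. The sketch below shows the general technique in Python and is not the Puppet code from the review: `wait_until` and its parameters are hypothetical names, and the defaults of 60 tries with a 5 second sleep simply mirror the "Exec try 1/60" and "Sleeping for 5.0 seconds between tries" lines in the log earlier in this report.

```python
import time


def wait_until(check, tries=60, try_sleep=5.0, sleep=time.sleep):
    """Call `check` until it returns True or `tries` attempts are used up.

    Returns True as soon as a check succeeds, False if every attempt
    fails. `sleep` is injectable so the loop can be tested without
    actually waiting.
    """
    for attempt in range(1, tries + 1):
        if check():
            return True
        if attempt < tries:
            # No point in sleeping after the final failed attempt.
            sleep(try_sleep)
    return False
```

The caller supplies the readiness probe, for example a function that checks whether the keystone backend is UP in HAProxy. A loop like this only helps if the thing being waited for has already been started, which is the ordering problem the earlier fix on master addressed.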

Change abandoned by Dmitry Borodaenko (<email address hidden>) on branch: master
Review: https://review.openstack.org/109739
