Fuel for OpenStack

[scale] Remove primary controller and another one after scaling - failed with error on granular_deploy task

Bug #1539003 reported by Dmitry Tyzhnenko on 2016-01-28

This bug affects 1 person

	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Invalid	High	Fuel QA Team	Fuel for OpenStack 9.0
8.0.x	Invalid	High	Fuel QA Team	Fuel for OpenStack 8.0
Mitaka	Invalid	High	Fuel QA Team	Fuel for OpenStack 9.0

Bug Description

Redeploy after deleting controllers failed with the error: "Error Deployment has failed. Method granular_deploy. Deployment failed on nodes 4. Inspect Astute logs for the details"

1 Deploy cluster: 1 controller, Neutron VXLAN, Ceph for volumes and images, Ceph for ephemeral and RADOS
2 Add 2 Ceph nodes, verify networks, set replication factor to 2
3 Deploy. Run OSTF
4 Add 2 controllers
5 Re-deploy cluster
6 Verify networks
7 Run OSTF
8 Add 2 controllers, 1 compute
9 Re-deploy cluster
10 Verify networks
11 Run OSTF
12 Delete primary controller and the last added
13 Re-deploy cluster
14 Verify networks
15 Run OSTF

Expected result:
All steps pass

Actual result:
Failed on step 13

Astute errors - http://paste.openstack.org/show/485240/

Snapshot log - https://drive.google.com/a/mirantis.com/file/d/0B8U7EvTbuAOlbEEwN1Y1M1NsdTA/view?usp=sharing

Fuel 8.0-478
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "478"
  build_id: "478"
  fuel-nailgun_sha: "ae949905142507f2cb446071783731468f34a572"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "481ed135de2cb5060cac3795428625befdd1d814"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "420c6fa5f8cb51f3322d95113f783967bde9836e"
  fuel-ostf_sha: "ab5fd151fc6c1aa0b35bc2023631b1f4836ecd61"
  fuel-mirror_sha: "b62f3cce5321fd570c6589bc2684eab994c3f3f2"
  fuelmenu_sha: "fac143f4dfa75785758e72afbdc029693e94ff2b"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "9f0ba4577915ce1e77f5dc9c639a5ef66ca45896"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "6c6b088a3d52dd0eaf43d59f3a3a149c93a07e7e"

Tags:

Dmitry Tyzhnenko (dtyzhnenko) on 2016-01-28

description:

updated

Dmitry Tyzhnenko (dtyzhnenko) on 2016-01-28

description:

updated

Dmitry Klenov (dklenov) on 2016-01-28

tags:	added: area-library
Changed in fuel:
status:	New → Confirmed

Bogdan Dobrelya (bogdando) on 2016-01-28

tags:

added: life-cycle-management

Ivan Ponomarev (ivanzipfer) on 2016-01-28

tags:	added: area-python removed: area-library
tags:	added: area-astute

Alexander Kislitsky (akislitsky) on 2016-01-28

tags:

added: team-bugfix

Vladimir Kuklin (vkuklin) wrote on 2016-01-28:

Deployment fails due to setup_repositories task timeout. This is a pretty straightforward task that generates a set of simple files from erb templates. It seems it could be an issue with the amount of RAM and CPU you have for these VMs. Could you please share your environment details?

Dmitry Tyzhnenko (dtyzhnenko) wrote on 2016-01-29:

@vkuklin: 2 vCPUs and 6 GB RAM per VM

root@node-4:~# free
total used free shared buffers cached
Mem: 6112660 5950516 162144 173836 130768 774288
-/+ buffers/cache: 5045460 1067200
Swap: 6291452 67864 6223588

root@node-4:~# cat /proc/cpuinfo | grep processor
processor : 0
processor : 1

Vladimir Kuklin (vkuklin) wrote on 2016-01-29:

Dima, sorry, my question was ambiguous. I actually asked for info about host node machine parameters.

Dmitry Tyzhnenko (dtyzhnenko) wrote on 2016-01-29:

@vkuklin:

$ free
total used free shared buffers cached
Mem: 65941356 65494040 447316 2820 202336 35545060
-/+ buffers/cache: 29746644 36194712
Swap: 67072328 820284 66252044

Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz

$ cat /proc/cpuinfo | grep processor | wc -l
24
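
For reading the two `free` outputs in this thread: in this older output format, the "free" value on the `-/+ buffers/cache` line is the sum of the free, buffers, and cached columns from the `Mem:` line. The small check below reproduces that arithmetic with the numbers pasted above (values in KiB). The function name is only illustrative.

```python
def available_kib(free: int, buffers: int, cached: int) -> int:
    """Free memory plus buffers and page cache, in KiB.

    This matches the "free" column of the `-/+ buffers/cache` line
    printed by the older `free` output format shown in this thread.
    """
    return free + buffers + cached


# Values copied from the `Mem:` lines pasted in the comments above.
node_4 = available_kib(free=162144, buffers=130768, cached=774288)
host = available_kib(free=447316, buffers=202336, cached=35545060)

print(node_4)  # 1067200  (about 1 GiB on node-4)
print(host)    # 36194712 (about 34.5 GiB on the host)
```

Both results agree with the `-/+ buffers/cache` lines in the pasted output.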

Vladimir Sharshov (vsharshov) wrote on 2016-02-02:

I've checked this env with Vladimir Kuklin. For some reason the deployment task was sent twice, about 10 minutes apart.

Investigating...

Bulat Gaifullin (bulat.gaifullin) wrote on 2016-02-02:

The setup_repositories task calls the function 'generate_apt_pins'. This function tries to download Release information for each repository that does not have a priority field [1], so it can be the cause of the timeout error.

[1] https://github.com/openstack/fuel-library/blob/2f446c986a76ba48104a5a1bda88f481244e8157/deployment/puppet/osnailyfacter/lib/puppet/parser/functions/generate_apt_pins.rb#L23-L28
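
To illustrate why this could add up to a timeout, here is a minimal Python sketch of the selection logic as described in the comment above. It is not the Ruby implementation from fuel-library: the repository dictionaries, the `name` and `priority` keys, the example URIs, and the per-fetch timeout are all assumptions made for illustration, and it assumes the downloads happen one after another.

```python
from typing import Dict, List, Optional


def repos_needing_release(repos: List[Dict[str, Optional[object]]]) -> List[str]:
    """Names of repos with no priority set, i.e. the ones for which a
    Release file would have to be downloaded (per the description above)."""
    return [str(r["name"]) for r in repos if r.get("priority") is None]


def worst_case_seconds(repos: List[Dict[str, Optional[object]]],
                       per_fetch_timeout: int) -> int:
    """Upper bound on time spent downloading if every fetch hangs until
    its timeout and the fetches run sequentially (an assumption)."""
    return len(repos_needing_release(repos)) * per_fetch_timeout


# Hypothetical repository list, not taken from the failing environment.
example_repos = [
    {"name": "ubuntu", "uri": "http://example.invalid/ubuntu", "priority": None},
    {"name": "ubuntu-updates", "uri": "http://example.invalid/ubuntu"},
    {"name": "mos", "uri": "http://example.invalid/mos", "priority": 1050},
]

print(repos_needing_release(example_repos))   # ['ubuntu', 'ubuntu-updates']
print(worst_case_seconds(example_repos, 60))  # 120
```

With slow or unreachable mirrors, each repository without a priority contributes one potentially slow download, which is consistent with the task exceeding its time limit.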

Timur Nurlygayanov (tnurlygayanov) on 2016-02-02

summary:

- [scale] Remove primary contorller and another one after scaling - failed
+ [scale] Remove primary controller and another one after scaling - failed
with error on granular_deploy task

Bartłomiej Piotrowski (bpiotrowski) wrote on 2016-02-03:

setup_repositories runs for tasks that do have the priority field. It doesn't do any heavy processing except downloading the Release file for parsing, so if it times out, I'd rather say there might be some connectivity issue.

Ihor Kalnytskyi (ikalnytskyi) wrote on 2016-02-03:

@Bulat, thanks for the investigation.

@Bartłomiej, yeah, we have encountered some connectivity issues recently.

Since the issue is not in the test case itself but an accidental failure caused by connectivity issues with Fuel Infra, I am moving this bug to Fuel QA and setting it to Incomplete so it can be rechecked to confirm that the problem no longer occurs.

Vladimir Sharshov (vsharshov) wrote on 2016-02-03:

Guys, we have 2 tasks running in parallel that work on the same nodes. This looks like the same problem as https://bugs.launchpad.net/fuel/+bug/1496411; in short, the problem is in the Fuel tests.

[776] 5feb2d7e-8c40-4bec-81b2-fa8e85936119
10:56:54 INFO [776] 'granular_deploy' method called with data:
[\"5\", \"8\", \"6\", \"3\", \"4\", \"2\"]}

[782] 'granular_deploy' method called with data
2016-01-27 11:06:04 DEBUG [782] Process message from worker queue:
[\"3\", \"4\", \"2\", \"8\", \"6\", \"5\"]

We also have another task, as in the bug about controller removal:
2016-01-27 11:08:48 INFO [800] Casting message to Nailgun:
{"method"=>"remove_nodes_resp",
"args"=>
  {"task_uuid"=>"fad3e6e4-9e3c-4feb-a31b-3490ca55f87b",
   "status"=>"ready",
   "progress"=>100,
   "nodes"=>[{"uid"=>"1"}, {"uid"=>"7"}]}}

Short summary of node actions:
2016-01-27 09:44:22 INFO [793] Processing RPC call 'verify_networks'
2016-01-27 10:56:52 INFO [780] Processing RPC call 'remove_nodes'
2016-01-27 10:56:53 INFO [776] Processing RPC call 'granular_deploy'
2016-01-27 11:06:04 INFO [800] Processing RPC call 'remove_nodes'
2016-01-27 11:06:05 INFO [782] Processing RPC call 'granular_deploy'
2016-01-27 11:36:16 INFO [795] Processing RPC call 'dump_environment'
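
To check the timing in the RPC summary above, the following Python sketch parses those six log lines and reports which RPC methods were processed more than once and how far apart. The regular expression only targets the line shape shown here; it is an assumption that other Astute log lines follow the same layout, and the helper names are illustrative.

```python
import re
from collections import defaultdict
from datetime import datetime

# The six summary lines quoted in the comment above.
LOG = """\
2016-01-27 09:44:22 INFO [793] Processing RPC call 'verify_networks'
2016-01-27 10:56:52 INFO [780] Processing RPC call 'remove_nodes'
2016-01-27 10:56:53 INFO [776] Processing RPC call 'granular_deploy'
2016-01-27 11:06:04 INFO [800] Processing RPC call 'remove_nodes'
2016-01-27 11:06:05 INFO [782] Processing RPC call 'granular_deploy'
2016-01-27 11:36:16 INFO [795] Processing RPC call 'dump_environment'
"""

LINE = re.compile(r"^(\S+ \S+) INFO \[(\d+)\] Processing RPC call '(\w+)'$")


def calls_by_method(log: str) -> dict:
    """Group (timestamp, worker id) pairs by RPC method name."""
    calls = defaultdict(list)
    for line in log.splitlines():
        match = LINE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            calls[match.group(3)].append((stamp, int(match.group(2))))
    return calls


def repeated_calls(log: str) -> dict:
    """Map each method seen more than once to the gaps, in seconds,
    between its consecutive calls."""
    gaps = {}
    for method, entries in calls_by_method(log).items():
        if len(entries) > 1:
            times = sorted(stamp for stamp, _ in entries)
            gaps[method] = [int((b - a).total_seconds())
                            for a, b in zip(times, times[1:])]
    return gaps


print(repeated_calls(LOG))
# {'remove_nodes': [552], 'granular_deploy': [552]}
```

Both `remove_nodes` and `granular_deploy` appear twice, 552 seconds (9 minutes 12 seconds) apart, and each pair was handled by different workers.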

Vladimir Sharshov (vsharshov) wrote on 2016-02-03:

#10

This is also similar to bug https://bugs.launchpad.net/fuel/+bug/1539693, so I will mark it as a duplicate.

Ihor Kalnytskyi (ikalnytskyi) wrote on 2016-02-04:

#11

Vladimir S, you're wrong here. It's not about the DELETE request. The flow described in this issue goes through the Deploy Changes button, so it has nothing to do with parallel tasks.

Ihor Kalnytskyi (ikalnytskyi) wrote on 2016-02-04:

#12

And Bulat discovered that the problem is in setup_repos, which makes sense.

Dmitry Tyzhnenko (dtyzhnenko) wrote on 2016-02-04:

#13

The bug was not reproduced on 8.0-506. Moving it to Invalid.
