Deploying a new compute node caused the whole cluster to fail with Err status in the Fuel web UI

Bug #1555932 reported by JohnsonYi
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: JohnsonYi
Milestone: 8.0-updates

Bug Description

Deployed environment:
MOS 8.0
3 controllers
2 compute + Cinder with LVM
1 Ironic
This environment had been running fine for weeks.

What I did was deploy a new compute node (compute + cinder). That caused the whole cluster to be marked as Error in the Fuel web UI, so I tried the deploy again, and the whole cluster began redeploying!

Logs:
2016-03-11 02:22:20 +0000 Puppet (debug): Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n' python-nova'
2016-03-11 02:22:20 +0000 Puppet (debug): Executing '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install python-nova'
2016-03-11 02:22:20 +0000 Puppet (err): Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install python-nova' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 python-nova : Depends: websockify (>= 0.6.1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
/usr/lib/ruby/vendor_ruby/puppet/util/execution.rb:219:in `execute'
/usr/lib/ruby/vendor_ruby/puppet/provider/command.rb:23:in `execute'
/usr/lib/ruby/vendor_ruby/puppet/provider.rb:237:in `block in has_command'
/usr/lib/ruby/vendor_ruby/puppet/provider.rb:463:in `block in create_class_and_instance_method'
/usr/lib/ruby/vendor_ruby/puppet/provider/package/apt.rb:73:in `install'
/etc/puppet/modules/osnailyfacter/lib/puppet/provider/package/apt_fuel.rb:49:in `install'

root@node-9:~# apt-get -q -y -o DPkg::Options::=--force-confold install python-nova
Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 python-nova : Depends: websockify (>= 0.6.1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
root@node-9:~# apt-cache policy websockify
websockify:
  Installed: (none)
  Candidate: 0.6.1+dfsg1-1~u14.04+mos1
  Version table:
     0.6.1+dfsg1-1~u14.04+mos1 0
       1000 http://10.0.20.2:8080/mirrors/mos-repos/ubuntu/8.0/ mos8.0/main amd64 Packages
     0.5.1+dfsg1-3ubuntu0.14.04.1 0
        500 http://10.0.64.242/ubuntu/ trusty-updates/universe amd64 Packages
     0.5.1+dfsg1-3 0
        500 http://10.0.64.242/ubuntu/ trusty/universe amd64 Packages
root@node-9:~# dpkg -l | grep websockify
root@node-9:~#

I use a local Ubuntu & MOS repo; there is no connectivity issue.
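
To see which dependency of websockify was actually blocking, one could have run the following (hypothetical diagnostic commands, not part of the session above):

apt-get install websockify      # apt prints the unmet dependency chain
apt-cache depends websockify    # lists the direct dependencies (e.g. python-numpy)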

Updates:
2016-05-17
Today I emulated this issue by disabling python-numpy, a dependency of websockify, in the local Ubuntu repo:
root@node-3:~# apt-cache policy python-numpy
python-numpy:
  Installed: 1:1.8.2-0ubuntu0.1
  Candidate: 1:1.8.2-0ubuntu0.1
  Version table:
 *** 1:1.8.2-0ubuntu0.1 0
        500 http://10.14.64.242/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:1.8.1-1ubuntu1 0
        500 http://10.14.64.242/ubuntu/ trusty/main amd64 Packages

In the local mirror's pool directory, rename the .deb files so apt's fetch will return 404:
mv python-numpy_1.8.1-1ubuntu1_amd64.deb python-numpy_1.8.1-1ubuntu1_amd64.deb.bak
mv python-numpy_1.8.2-0ubuntu0.1_amd64.deb python-numpy_1.8.2-0ubuntu0.1_amd64.deb.bak

Then I deployed a new compute node.

Installing python-nova failed as expected:
Get:90 http://10.14.20.2:8080/liberty-8.0/ubuntu/x86_64/ mos8.0/main python-sqlalchemy-ext amd64 1.0.8~u14.04+mos1 [18.4 kB]
Get:91 http://10.14.20.2:8080/liberty-8.0/ubuntu/x86_64/ mos8.0/main websockify amd64 0.6.1+dfsg1-1~u14.04+mos1 [37.6 kB]
Fetched 12.3 MB in 0s (26.7 MB/s)
E: Failed to fetch http://10.14.64.242/ubuntu/pool/main/p/python-numpy/python-numpy_1.8.2-0ubuntu0.1_amd64.deb 404 Not Found

E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
/usr/lib/ruby/vendor_ruby/puppet/util/execution.rb:219:in `execute'
/usr/lib/ruby/vendor_ruby/puppet/provider/command.rb:23:in `execute'
/usr/lib/ruby/vendor_ruby/puppet/provider.rb:237:in `block in has_command'
/usr/lib/ruby/vendor_ruby/puppet/provider.rb:463:in `block in create_class_and_instance_method'

The new compute node was marked as "Err". All controllers and other nodes then went through their puppet jobs and finished smoothly, but a bit later all nodes were marked as "Err", exactly as reported in this bug.

Then I restored the python-numpy packages and deployed again. The controllers went through the puppet scripts again and again, stuck at 11% (OpenStack was not actually redeployed). About an hour and a half later, all nodes returned to the Ready status.

So the OpenStack environment was not actually redeployed, and VM creation still worked. I may have misunderstood why the controllers went into the "Deploy" status. But robustness may still be a problem: deploying a new node can impact the whole environment, so a potential risk remains.

Changed in fuel:
assignee: nobody → Sachin Yede (yede-sachin45)
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

Please attach a diagnostic snapshot.

Changed in fuel:
milestone: none → 8.0-updates
status: New → Incomplete
Changed in fuel:
importance: Undecided → High
Revision history for this message
JohnsonYi (yichengli) wrote :

@Oleksiy,

I forgot to create a snapshot for this. Judging from the logs, it seems to be a problem with the local repo, which was synced with the command "fuel-createmirror -M":

Candidate: 0.6.1+dfsg1-1~u14.04+mos1
  Version table:
     0.6.1+dfsg1-1~u14.04+mos1 0
       1000 http://10.0.20.2:8080/mirrors/mos-repos/ubuntu/8.0/ mos8.0/main amd64

Even if the local repo on the Fuel master is incorrect, it should not affect the already deployed environment, and there is no way to specify a single node for re-deploy in Fuel.
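
To rule the mirror out, one could re-sync it and re-check the candidate from the node (a sketch based only on the commands already mentioned in this report):

fuel-createmirror -M                             # on the Fuel master: re-sync the local MOS mirror
apt-get update && apt-cache policy websockify    # on the node: confirm the candidate resolves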

Changed in fuel:
status: Incomplete → Confirmed
Changed in fuel:
assignee: Sachin Yede (yede-sachin45) → nobody
Changed in fuel:
assignee: nobody → MOS Linux (mos-linux)
Revision history for this message
JohnsonYi (yichengli) wrote :

The case mentioned in the link below is very similar to this ticket:
http://askubuntu.com/questions/529217/not-able-to-fix-the-error-unable-to-correct-problems-you-have-held-broken-pack

There must be unmet dependencies for the websockify package. I should have made one more attempt with the command "apt-get install websockify" to find out which package had an unmet version requirement for websockify.

So you can check the dependencies of websockify, install a version of a dependency that does not satisfy them (a higher version is better), and then deploy a compute node to simulate this case.
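
For example (a hypothetical sketch only; the pinned version and the apt-mark hold are my assumption, not the exact procedure used):

apt-cache depends websockify                     # find a direct dependency, e.g. python-numpy
apt-get install python-numpy=1:1.8.1-1ubuntu1    # pin an older version; if websockify needs a newer one, the chain breaks
apt-mark hold python-numpy                       # keep apt from upgrading it during deployment
# then deploy a new compute node; installing python-nova should fail with
# "you have held broken packages"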

Revision history for this message
JohnsonYi (yichengli) wrote :

I updated the bug description with what I tested on my MOS 8.0 (with MU1); please refer to the updates in the bug description.

description: updated
Revision history for this message
JohnsonYi (yichengli) wrote :

BTW, all tests were done on a bare-metal environment, not VMs.

Revision history for this message
Dmitry Teselkin (teselkin-d) wrote :

I don't see any reason why this bug should be on mos-linux: websockify wasn't built by mos-linux, and python-nova is a mos-packaging area. Given that the bug is for 8.0 updates, I'm reassigning it to mos-maintenance.

Changed in fuel:
assignee: MOS Linux (mos-linux) → MOS Maintenance (mos-maintenance)
Revision history for this message
JohnsonYi (yichengli) wrote :

@Dmitry
This is not simply a bug in a single package like websockify. If something goes wrong with the repo during new-node deployment (say the repo service is down, which I also tested), the new node's deployment fails and the other nodes are impacted, and it takes far too long to bring the controllers back to the "Ready" status. Meanwhile, I don't think that recovery is guaranteed to succeed either. A more atomic design for the new-node deployment process would be a good choice.

Changed in fuel:
assignee: MOS Maintenance (mos-maintenance) → Rodion Tikunov (rtikunov)
Revision history for this message
Rodion Tikunov (rtikunov) wrote :

I can't reproduce it.
If you have a local Ubuntu mirror and a package with unmet dependencies, you can add this package to the packages section of /usr/share/fuel-mirror/ubuntu.yaml and recreate the local repo with the following commands:
fuel-mirror create -G mos -I /usr/share/fuel-mirror/ubuntu.yaml
fuel-mirror create -G ubuntu -I /usr/share/fuel-mirror/ubuntu.yaml
fuel-mirror apply -G mos -I /usr/share/fuel-mirror/ubuntu.yaml
fuel-mirror apply -G ubuntu -I /usr/share/fuel-mirror/ubuntu.yaml
Please try adding websockify to your local repo as described above.
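
For illustration, the edit might look like this (a sketch only; I am assuming the config lists package names under a packages: key, so check your ubuntu.yaml for the exact layout):

packages:
  - websockify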

Changed in fuel:
status: Confirmed → Incomplete
assignee: Rodion Tikunov (rtikunov) → JohnsonYi (yichengli)
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

The bug has been in Incomplete status for a month without a response. Marking as Invalid. Please reopen if you have more data.

Changed in fuel:
status: Incomplete → Invalid
Revision history for this message
JohnsonYi (yichengli) wrote :

A similar issue can be fixed by repairing the dependencies and re-running the puppet scripts with the command "fuel node --node NODE_ID --deploy".
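
For example (a hypothetical recovery sketch; the node ID is taken from the node-9 in the logs above):

apt-get update                  # on the failed node
apt-get -y install python-nova  # verify the dependency chain now resolves
fuel node --node 9 --deploy     # on the Fuel master: re-run deployment for that node only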

BTW, the actions defined in the Fuel puppet scripts always perform strict consistency verification, so don't touch the default settings deployed by them; create new ones instead (networks, VLANs, default flavors, etc.).

Revision history for this message
JohnsonYi (yichengli) wrote :

When one node goes through the puppet jobs and hits an error, the whole cluster may go to the "ERR" status. You have to find and fix the issue, then re-run the puppet jobs to recover the cluster status to "Ready".

Revision history for this message
JohnsonYi (yichengli) wrote :

BTW, even when Fuel goes to the ERR status, the OpenStack cluster is still running healthily; what you have to do is get the puppet jobs to run through cleanly, after which you can deploy new nodes.
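
To confirm that, one could check the services from a controller (a hypothetical check; /root/openrc is the usual Fuel default, not something from this report):

source /root/openrc
nova service-list    # compute and scheduler services should report state "up"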
