deployment fails when a package takes "too long" to download/install

Bug #1640814 reported by Alexei Sheplyakov
This bug affects 1 person
Affects            | Status    | Importance | Assigned to     | Milestone
Fuel for OpenStack | Confirmed | Medium     | Fuel Sustaining |

Bug Description

Steps to reproduce:

1. Install the Fuel master node from the MOS 9.0 ISO
2. Define a cluster with 3 controller, 2 compute, and 3 ceph-osd nodes
3. Hit 'Deploy changes'

Expected result: the deployment succeeds.
Actual result: the deployment fails almost at the very beginning, after installing the fuel-misc and fuel-ha-utils packages. The error message is:

Deployment has failed. All nodes are finished. Failed tasks: Task[fuel_pkgs/1], Task[fuel_pkgs/3], Task[fuel_pkgs/5], Task[fuel_pkgs/4], Task[fuel_pkgs/7], Task[fuel_pkgs/6], Task[fuel_pkgs/8] Stopping the deployment process!

Notes:

Apparently apt was downloading the fuel-ha-utils package from mirror.fuel-infra.org, and this took longer than 10 minutes; astute seems to consider this a fatal error. Moreover, nothing is logged at an appropriate level (error, critical). This is very confusing and should be fixed. Also, assuming that 10 minutes is enough to download any package (along with its dependencies) is very naive, to put it extremely mildly.

Oleksiy Molchanov (omolchanov) wrote :

Marking as Incomplete; please attach a diagnostic snapshot.

Changed in fuel:
status: New → Incomplete
Alexei Sheplyakov (asheplyakov) wrote :

What's a 'diagnostic snapshot', and how do I make one?

Alexei Sheplyakov (asheplyakov) wrote :

2016-11-10 13:04:56 +0000 Puppet (debug): Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n' fuel-ha-utils'
2016-11-10 13:04:56 +0000 Package[fuel-ha-utils](provider=apt_fuel) (debug): Call: install
2016-11-10 13:04:56 +0000 Package[fuel-ha-utils](provider=apt_fuel) (debug): Call: wait_for_lock
apt was trying to download the fuel-ha-utils package from mirror.fuel-infra.org, and it took longer than 10 minutes. Apparently astute considers that a fatal error. Moreover, nothing is logged at an appropriate level (error, critical). This is very confusing and should be fixed. Also, assuming that 10 minutes is enough to download any package (along with its dependencies) is very naive, to put it extremely mildly.

/var/log/remote/10.20.0.4/puppet-apply.log:

2016-11-10T13:03:39.607812+00:00 info: (Class[Osnailyfacter::Fuel_pkgs::Fuel_pkgs]) Starting to evaluate the resource
2016-11-10T13:03:39.608077+00:00 info: (Class[Osnailyfacter::Fuel_pkgs::Fuel_pkgs]) Evaluated in 0.00 seconds
2016-11-10T13:03:39.608290+00:00 debug: Prefetching apt_fuel resources for package
2016-11-10T13:03:39.608439+00:00 debug: Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n''
2016-11-10T13:03:39.621425+00:00 info: (/Package[fuel-misc]) Starting to evaluate the resource
2016-11-10T13:03:39.621882+00:00 debug: Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n' fuel-misc'
2016-11-10T13:03:39.645406+00:00 debug: (Package[fuel-misc](provider=apt_fuel)) Call: install
2016-11-10T13:03:39.645528+00:00 debug: (Package[fuel-misc](provider=apt_fuel)) Call: wait_for_lock
2016-11-10T13:03:39.645859+00:00 debug: (Package[fuel-misc](provider=apt_fuel)) Call: locked?
2016-11-10T13:03:39.646185+00:00 debug: Executing '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install fuel-misc'
2016-11-10T13:04:56.511800+00:00 notice: (/Stage[main]/Osnailyfacter::Fuel_pkgs::Fuel_pkgs/Package[fuel-misc]/ensure) ensure changed 'purged' to 'present'
2016-11-10T13:04:56.511800+00:00 debug: (/Package[fuel-misc]) The container Class[Osnailyfacter::Fuel_pkgs::Fuel_pkgs] will propagate my refresh event
2016-11-10T13:04:56.511820+00:00 info: (/Package[fuel-misc]) Evaluated in 76.88 seconds
2016-11-10T13:04:56.511820+00:00 info: (/Package[fuel-ha-utils]) Starting to evaluate the resource
2016-11-10T13:04:56.511825+00:00 debug: Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n' fuel-ha-utils'
2016-11-10T13:04:56.519760+00:00 debug: (Package[fuel-ha-utils](provider=apt_fuel)) Call: install
2016-11-10T13:04:56.608103+00:00 debug: (Package[fuel-ha-utils](provider=apt_fuel)) Call: wait_for_lock
2016-11-10T13:04:56.608103+00:00 debug: (Package[fuel-ha-utils](provider=apt_fuel)) Call: locked?
2016-11-10T13:04:56.608122+00:00 debug: Executing '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install fuel-ha-utils'
2016-11-10T13:13:53.043323+00:00 notice: (/Stage[main]/Osnailyfacter::Fuel_pkgs::Fuel_pkgs/Package[fuel-ha-utils]/ensure) ensure changed 'purged' to 'present'
2016-11-10T13:13:53.043323+00:00 debug: (/Package[fuel-ha-utils]) The container Class[Osnailyfacter::F...
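The time spent in each apt-get run can be recovered from the timestamps in puppet-apply.log. The Python sketch below is not part of Fuel; it assumes the rsyslog line format shown above (an ISO 8601 timestamp, a space, then the message) and pairs each `Executing '/usr/bin/apt-get ... install <pkg>'` line with the matching `Package[<pkg>]/ensure) ensure changed` notice. The function name `install_durations` is hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, Iterable

# apt-get invocation logged by the apt_fuel package provider
START_RE = re.compile(r"Executing '/usr/bin/apt-get .* install (?P<pkg>\S+)'$")
# notice Puppet logs once the package resource has been brought in sync
DONE_RE = re.compile(r"Package\[(?P<pkg>[^\]]+)\]/ensure\) ensure changed")


def install_durations(lines: Iterable[str]) -> Dict[str, float]:
    """Map package name -> seconds between apt-get start and the ensure notice."""
    started: Dict[str, datetime] = {}
    durations: Dict[str, float] = {}
    for line in lines:
        stamp, _, message = line.rstrip("\n").partition(" ")
        try:
            ts = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # not a timestamped syslog line
        start = START_RE.search(message)
        if start:
            started[start.group("pkg")] = ts
            continue
        done = DONE_RE.search(message)
        if done and done.group("pkg") in started:
            pkg = done.group("pkg")
            durations[pkg] = (ts - started.pop(pkg)).total_seconds()
    return durations
```

Called on the open log file (for example `install_durations(open(path))`), it reports about 76.9 s for fuel-misc and about 536.4 s for fuel-ha-utils on the excerpt above.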


Changed in fuel:
status: Incomplete → New
summary: - deployment fails: Failed tasks: Task[fuel_pkgs/1], Task[fuel_pkgs/3],
- Task[fuel_pkgs/5], ...
+ deployment fails when a package takes "too long" to download/install
description: updated
Changed in fuel:
importance: Undecided → High
Alexei Sheplyakov (asheplyakov) wrote :

Raising the importance to High, since any user can hit this problem if mirror.fuel-infra.org happens to be busy enough.

description: updated
Oleksiy Molchanov (omolchanov) wrote :

@Alexei A diagnostic snapshot can be generated from the Fuel UI menu: SUPPORT -> Download Diagnostic Snapshot.

@Dev-ops team, can you comment here?

Changed in fuel:
assignee: nobody → Fuel DevOps (fuel-devops)
tags: added: area-devops
Andrey Nikitin (heos) wrote :

What kind of problem do you have with mirror.fuel-infra.org?

Changed in fuel:
status: New → Confirmed
milestone: none → 9.2
Alexei Sheplyakov (asheplyakov) wrote :

Andrey,

> What kind of problem do you have with mirror.fuel-infra.org?

None at all. It's a problem in nailgun (or astute?): it assumes that any package, along with its dependencies, can be downloaded and installed within 10 minutes. This assumption is clearly wrong and should be dropped.

Changed in fuel:
assignee: Fuel DevOps (fuel-devops) → Fuel Sustaining (fuel-sustaining-team)
tags: removed: area-devops
Vladimir Kuklin (vkuklin) wrote :

Alexei, fuel pkgs such as fuel-ha-utils are relatively small - no more than a couple of megabytes. If it takes almost 600 seconds, that means that your connection to the repository is about 15 kbytes/s fast. It would also mean that downloads of other packages (which are bigger than fuel-ha-utils) would take even longer, up to hours. For such cases we set a timeout, which signals the failure and stops the deployment.
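The estimate above treats the download as running at a constant average rate, so that data transferred = rate × time. A minimal Python sketch of that arithmetic; the helper names are illustrative, and the constant-rate model is a simplification that ignores apt metadata fetches, dependencies, and unpacking time.

```python
def transferred_bytes(rate_bytes_per_s: float, seconds: float) -> float:
    """Data moved at a constant average rate over the given time window."""
    return rate_bytes_per_s * seconds


def average_rate(size_bytes: float, seconds: float) -> float:
    """Average rate needed to move size_bytes within the given time window."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_bytes / seconds


# 15 KB/s sustained for 600 s, expressed in MB (decimal units)
print(transferred_bytes(15_000, 600) / 1_000_000)
```

At 15 KB/s, a 600 s window corresponds to about 9 MB transferred.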

The timeout for the fuel_pkgs installation task is specified here:

https://github.com/openstack/fuel-library/blob/stable/mitaka/deployment/puppet/osnailyfacter/modular/fuel_pkgs/tasks.yaml#L27

You can change it on the master node at '/etc/puppet/modules/../' for your current case and synchronize the definition to the database, e.g. by running `fuel rel --sync-deployment-tasks --dir /etc/puppet`.

Alternatively, you can create mirrors of the repositories that are slow for you and point Fuel at them, either by hand or with the fuel-mirror, fuel-createmirror, or packetary utilities.

I will mark this bug as invalid, unless it has something to do with repository access speed. In the latter case it should be passed to the fuel infra team.

Changed in fuel:
status: Confirmed → Invalid
Alexei Sheplyakov (asheplyakov) wrote :

> that means that your connection to the repository is about 15 kbytes/s fast

Breaking the deployment just because of a transient networking problem (or an overloaded mirror) is absolutely unacceptable.
A hard-coded download timeout and a clumsy error message are not acceptable either.

> fuel pkgs such as fuel-ha-utils are relatively small - no more than a couple of megabytes.

1) Fuel packages depend on other, not so small, packages
2) puppet tries to upgrade *all* packages, and 10 minutes might not be enough for that

> In the latter case it should be passed to the fuel infra team.

Temporary failures are common in networking (by design); the fuel-infra team can't possibly change that.

Please remove the hard-coded timeout, and improve the error messages instead of playing the finger-pointing game.

Changed in fuel:
status: Invalid → New
Changed in fuel:
status: New → Confirmed
importance: High → Medium
Vladimir Kuklin (vkuklin) wrote :

Alexei, what we see here is not a transient error, but rather a continuous performance degradation of the network you are using. Fuel tolerates transient errors, but not unacceptable performance of the underlying infrastructure. Thus, please ensure that your network is fast enough for post-DSL era before deploying a complex clustered distributed system.

Changed in fuel:
status: Confirmed → Invalid
Alexei Sheplyakov (asheplyakov) wrote :

Breaking the deployment just because of a transient networking problem (or an overloaded mirror) is absolutely unacceptable. Please remove the hard-coded timeout, and improve the error messages.

Changed in fuel:
status: Invalid → New
Alexei Sheplyakov (asheplyakov) wrote :

> please ensure that your network is fast enough for post-DSL era

Improving mirror.fuel-infra.org network link is not something customers can do (or want to do).
Also, please note:

> 2) puppet tries to (dist-)upgrade *all* packages, and 10 minutes might not be enough for that

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 9.2 → 9.3
Stanislaw Bogatkin (sbogatkin) wrote :

Alexei, what exactly do you propose to improve?

>Please remove the hard-coded timeout
And what's next? Which deployment time is okay? Are 10 minutes okay? 100 minutes? A year? With a slow enough network connection it can take ages, and customers would rather see an error than wait too long.

>improve the error messages.
Don't be vague, please. Propose your improvement idea and the dev team will definitely look at it. Now error messages are standardized across whole deployment process.

>Improving mirror.fuel-infra.org network link is not something customers can do (or want to do)
>puppet tries to (dist-)upgrade *all* packages, and 10 minutes might not be enough for that
We have a documentation to create local package mirrors exactly for such cases. Many customers who cannot have a fast Internet connection for some reason create and use these mirrors, and that works completely fine. Why do you want to change this behavior?

Changed in fuel:
status: New → Invalid
Alexei Sheplyakov (asheplyakov) wrote :

> Now error messages are standardized across whole deployment process.

"Failed tasks: Task[fuel_pkgs/1], Task[fuel_pkgs/3], ..." is utter nonsense. It gives not the slightest idea of what exactly is wrong or how to fix and/or work around it.

> We have a documentation to create local package mirrors exactly for such cases.

And how on Earth can a user guess that "Task[fuel_pkgs/x]" stands for "your Internet link is too slow, consider creating a local mirror"?

>> Please remove the hard-coded timeout
> And what's next? Which deployment time is okay?

That depends. A tool (Fuel) must not impose a policy; it may provide a mechanism to abort the deployment if package download/installation takes "too long" (and that "too long" must be configurable).
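One possible reading of this proposal: the deadline is an operator-supplied parameter rather than a constant, and its expiry produces a message that names the package and a likely remedy. The Python sketch below is purely illustrative; `install_with_timeout` and `PackageInstallTimeout` are hypothetical names, not Fuel or astute APIs, and the install command is supplied by the caller rather than taken from Fuel's apt_fuel provider.

```python
import subprocess
from typing import Optional, Sequence


class PackageInstallTimeout(RuntimeError):
    """Hypothetical error raised when an install exceeds the configured limit."""


def install_with_timeout(cmd: Sequence[str], package: str,
                         timeout_s: Optional[float]) -> None:
    """Run an install command; timeout_s=None means wait indefinitely."""
    try:
        # subprocess.run kills the child process if the timeout expires;
        # check=True turns a non-zero exit status into CalledProcessError.
        subprocess.run(list(cmd), check=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        raise PackageInstallTimeout(
            f"Installing {package!r} did not finish within {timeout_s:g} s. "
            "The package mirror may be slow or overloaded; raise the timeout "
            "or point the environment at a local mirror."
        ) from None
```

For example, `install_with_timeout(["apt-get", "-q", "-y", "install", "fuel-ha-utils"], "fuel-ha-utils", timeout_s=3600)` would wait up to an hour, while `timeout_s=None` leaves the policy entirely to the operator.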

Changed in fuel:
status: Invalid → New
Changed in fuel:
status: New → Confirmed