Fuel for OpenStack

[devops] Document and train US team for system tests run recovery

Bug #1322114 reported by Mike Scherbakov on 2014-05-22

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Invalid	Medium	Fuel DevOps	Fuel for OpenStack 8.0

Bug Description

We need detailed step-by-step instructions on what a US engineer should do to recover a system tests run. If that is not possible now, let's create separate tickets to address the technical side of this question.

As we know, a system tests run can take up to 7-8 hours. So if any CI element that prepares the run on multiple servers fails for some reason, no tests will run. We need a recovery plan for such a situation. At the moment, system tests depend on at least the following:
1) ISO build
2) smoke for ISO
3) CI job which fetches ISO on servers

If any of these three fails, no tests currently run. We could create something like:
a) if any of the three things above fails, fuel-core-team receives an email alert
b) a US engineer follows the link, sent in the email alert, to instructions on what to do
c) the US engineer follows simple step-by-step instructions on how to restart the process, possibly using a backup server or similar.
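
The alert flow in (a)-(c) could be sketched roughly as follows. This is only a minimal illustration: the stage names mirror the three dependencies listed above, while `RUNBOOK_URLS`, the placeholder links, and the function names are hypothetical and do not refer to any existing Fuel CI tooling.

```python
from typing import Dict, Optional

# Prerequisite stages, in the order they must succeed before system tests start.
STAGES = ["iso_build", "iso_smoke", "iso_fetch"]

# Hypothetical links to recovery instructions; placeholders, not real pages.
RUNBOOK_URLS = {
    "iso_build": "https://example.invalid/runbooks/iso-build",
    "iso_smoke": "https://example.invalid/runbooks/iso-smoke",
    "iso_fetch": "https://example.invalid/runbooks/iso-fetch",
}


def first_failed_stage(results: Dict[str, bool]) -> Optional[str]:
    """Return the first stage that failed or did not report, or None if all passed."""
    for stage in STAGES:
        if not results.get(stage, False):
            return stage
    return None


def build_alert(results: Dict[str, bool]) -> Optional[str]:
    """Compose the alert text for the first failed stage, or None if nothing failed."""
    failed = first_failed_stage(results)
    if failed is None:
        return None
    return (
        f"System tests will not run: stage '{failed}' failed.\n"
        f"Recovery instructions: {RUNBOOK_URLS[failed]}"
    )
```

A stage that never reported is treated as failed here, on the assumption that a missing result blocks the test run just as a failure does.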

We need the most frequent failure scenarios documented, and the action plan should differ depending on the scenario. Possible failures:
a) The ISO build fails because of a package failure. Roman Vyalov already has a CI job that can restore the packages mirror to some point in the past; it needs to be completed and documented. In this case we should redirect to the instructions that Roman will provide.
b) Something else fails because a devops or build script was updated but has a bug. For cases like this, the ideal variant to me is to have replicated backup builds on other hardware nodes, which do the same work but are updated with a 1-2 day delay, while the main CI jobs stay on master (I'm talking about build scripts, seed client, etc.). We could then simply guide the engineer to try the backup build, which would still build the same ISO but using the older CI instructions.
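
As a rough sketch of how the action plan might branch on the failure scenario, assuming only the two cases described above: the `Failure` enum and `recovery_action` function are hypothetical names used for illustration, and the returned text only paraphrases the proposal, not an existing procedure.

```python
from enum import Enum


class Failure(Enum):
    PACKAGE = "package"  # scenario (a): ISO build broken by a package failure
    TOOLING = "tooling"  # scenario (b): an updated devops or build script has a bug


def recovery_action(failure: Failure) -> str:
    """Map a failure scenario to the recovery step an on-call engineer should take."""
    if failure is Failure.PACKAGE:
        return "Restore the packages mirror to an earlier point in time, then rebuild the ISO."
    # Any other failure falls back to the delayed backup builder.
    return "Rerun the ISO build on the backup builder that lags master by 1-2 days."
```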

I'll leave it to the DevOps team to think further and come up with its own approach, but I think I've described the main idea in enough detail.

Igor Shishkin (teran) on 2014-05-26

tags:

added: docs techtalk

Nastya Urlapova (aurlapova) on 2014-05-26

Changed in fuel:
status:	New → Confirmed

Igor Shishkin (teran) on 2014-07-22

Changed in fuel:
importance:	High → Medium

Igor Shishkin (teran) on 2014-07-28

Changed in fuel:
milestone:	5.1 → 6.0

Igor Shishkin (teran) wrote on 2014-12-04:

I think we already have an approach here that we use with some outside teams: we hold regular meetings and keep information clear between teams.
So I think we should find someone who is ready to contact us, say, once a week, and who has permissions equal to ours on all infrastructure servers.

Dima?

Changed in fuel:
milestone:	6.0 → 6.1

Sergey Kulanov (skulanov) on 2015-04-07

tags:

added: non-release

Nastya Urlapova (aurlapova) on 2015-04-29

Changed in fuel:
milestone:	6.1 → 7.0

Igor Shishkin (teran) on 2015-08-06

Changed in fuel:
milestone:	7.0 → 8.0

Dmitry Pyzhov (dpyzhov) on 2015-10-22

tags:

added: area-devops

Igor Shishkin (teran) wrote on 2015-11-25:

Mike, since our structure has changed somewhat, I don't think this is still relevant.
Please feel free to share your thoughts. Marking as Incomplete for now.

Changed in fuel:
status:	Confirmed → Incomplete

Igor Shishkin (teran) wrote on 2015-12-28:

Marking as Invalid according to our policy.
Please set it back to New if it's still relevant.

Igor Shishkin (teran) on 2015-12-28

Changed in fuel:
status:	Incomplete → Invalid
