tripleo

"fetch the archive" play from tripleo-transfer is not reliable for huge files

Bug #1908425 reported by Jose Luis Franco on 2020-12-16

Affects	Status	Importance	Assigned to	Milestone
tripleo	Triaged	High	Jose Luis Franco	tripleo xena-3

Bug Description

Upstream bug based on https://bugzilla.redhat.com/show_bug.cgi?id=1904681

During an FFU (Queens -> Train) upgrade of a tripleo deployment with a separate Database control role, the customer hit a situation in which command [1] failed because of a timeout. When we analyzed the Python logs, it turned out that the "fetch the archive" play had been initiated but was still in progress after a few hours.

On the director node we saw an active ansible-playbook process that used ~2GB of RAM but wasn't actually doing anything. Our first assumption was that the DB archive was too big, but it was only ~7.2GB and there was plenty of free space on both the controller node and the director node.

It looks like it is known, to some extent, that fetch plays use a different mechanism and create extra load when the "become" parameter is used.
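
A rough illustration of why "become" could matter here, going by the notes in the Ansible fetch module documentation rather than anything confirmed in this report: with become, the file content is also read through the slurp module, which returns it base64-encoded in a single result instead of streaming it. The sketch below only estimates the size of such an encoded payload. The archive size in bytes is an assumption (the "~7.2GB" above read as decimal gigabytes), and base64_encoded_size is a hypothetical helper, not part of Ansible or tripleo.

```python
import base64


def base64_encoded_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of `raw_bytes` bytes.

    Base64 emits 4 output characters for every 3 input bytes,
    rounding the last group up because of '=' padding.
    """
    return 4 * ((raw_bytes + 2) // 3)


# Assumption: "~7.2GB" from the report taken as 7.2 * 10**9 bytes.
ARCHIVE_BYTES = 7_200_000_000

# Sanity check of the formula against the standard library on small inputs.
for n in range(64):
    assert base64_encoded_size(n) == len(base64.b64encode(b"x" * n))

encoded = base64_encoded_size(ARCHIVE_BYTES)
print(f"raw archive:    {ARCHIVE_BYTES / 10**9:.1f} GB")
print(f"base64 payload: {encoded / 10**9:.1f} GB")
print(f"overhead:       {(encoded - ARCHIVE_BYTES) / 10**9:.1f} GB")
```

Under that assumption the encoded payload comes to about 9.6GB, which would have to be held and decoded as one piece. That is only an estimate of the payload size, not a measurement of what the ansible-playbook process on the director node was actually doing.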

[1]
"openstack overcloud external-upgrade run --stack overcloud --tags system_upgrade_transfer_data -y"

Bogdan Dobrelya (bogdando) on 2021-01-11

Changed in tripleo:
importance:	Undecided → High
milestone:	none → wallaby-2

Marios Andreou (marios-b) on 2021-01-29

Changed in tripleo:
milestone:	wallaby-2 → wallaby-3

Marios Andreou (marios-b) on 2021-03-17

Changed in tripleo:
milestone:	wallaby-3 → wallaby-rc1