Fuel for OpenStack

Increased RAM usage leads to floating galera-mysql errors during swarm runs

Bug #1630233 reported by Dmitry Kalashnik on 2016-10-04

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Fix Committed	High	Ivan	Fuel for OpenStack 10.0
Mitaka	Fix Released	High	Ivan	Fuel for OpenStack 9.2
Newton	Fix Committed	High	Ivan	Fuel for OpenStack 10.0

Bug Description

During the 9.1 cycle we faced an intermittent issue with the mysql-galera cluster caused by increased RAM usage.
The issue can appear during deployment, during an OSTF run, etc.

Creating a new ticket to keep all investigation details in one place.

https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive_vlan/84/
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive/86/testReport/(root)/ha_neutron_mysql_termination/ha_neutron_mysql_termination/

Stanislaw Bogatkin (sbogatkin) wrote on 2016-10-05:

We solved this problem for smoke_neutron by increasing the RAM on the target nodes. So I want to ask the QA team whether we want to go this way for other tests and, if not, why.

Changed in fuel:
assignee:	nobody → Fuel CI (fuel-ci)

Stanislaw Bogatkin (sbogatkin) wrote on 2016-10-05:

This question also needs to be addressed to the CI team first, so I am assigning this bug to them.

Roman Vyalov (r0mikiam) wrote on 2016-10-05:

How much memory should the VMs have? Are you proposing to increase memory for only 2 threads in the swarm?

Changed in fuel:
status:	New → Incomplete

Roman Vyalov (r0mikiam) on 2016-10-05

Changed in fuel:
assignee:	Fuel CI (fuel-ci) → Stanislaw Bogatkin (sbogatkin)
status:	Incomplete → New

Alexandra (aallakhverdieva) wrote on 2016-10-05:

https://product-ci.infra.mirantis.net/job/9.x.acceptance.ubuntu.mixed_os_components/14/testReport/(root)/mixed_components_murano_sahara_ceilometer/

Stanislaw Bogatkin (sbogatkin) wrote on 2016-10-05:

Roman, how much do we have now? 2 GB? Can we raise it by one more GB?

Changed in fuel:
assignee:	Stanislaw Bogatkin (sbogatkin) → Fuel CI (fuel-ci)

Roman Vyalov (r0mikiam) wrote on 2016-10-05:

Only for 2 swarm threads?

Roman Vyalov (r0mikiam) wrote on 2016-10-05:

We have 3 GB now.

Stanislaw Bogatkin (sbogatkin) wrote on 2016-10-06:

OK, sounds good. Let's look at the test results.

Roman Vyalov (r0mikiam) wrote on 2016-10-06:

@Stas, we have 3 GB of RAM now; we can increase it to 4 GB if necessary.
Also, should we increase memory only for 2 swarm threads or for all of them?

Sergii Golovatiuk (sgolovatiuk) wrote on 2016-10-06:

#10

@Rvyalov. The problem is not in Galera. It uses around 500-700 MB, which is OK. The problem is in services such as neutron, which uses 100 MB per process, with 16 processes spawned.

Run

ps -C neutron-server -orss= | awk '{ count ++; size += $1 }; END {print "Number of processes =",count; print "Memory usage per process =",size/1024/count, "MB"; print "Total memory usage =", size/1024, "MB"}'

next time to find the culprit.
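
The same per-process accounting can be sketched in Python so that it is easy to repeat for other services. This is a minimal sketch, not part of the original comment: the function names `summarize_rss` and `rss_for_command` are hypothetical, and it assumes a Linux host where `ps -C <name> -o rss=` prints one resident set size in KiB per line.

```python
import subprocess


def summarize_rss(rss_kib_lines):
    """Summarize RSS values given in KiB, one per line, as printed by `ps -o rss=`."""
    values = [int(line) for line in rss_kib_lines if line.strip()]
    count = len(values)
    total_mb = sum(values) / 1024
    # Avoid dividing by zero when no matching process is running.
    per_process_mb = total_mb / count if count else 0.0
    return {"count": count, "per_process_mb": per_process_mb, "total_mb": total_mb}


def rss_for_command(name):
    """Run `ps -C <name> -o rss=` and summarize its output (hypothetical helper)."""
    # ps exits non-zero when nothing matches; the empty output is handled above.
    result = subprocess.run(
        ["ps", "-C", name, "-o", "rss="], capture_output=True, text=True
    )
    return summarize_rss(result.stdout.splitlines())


if __name__ == "__main__":
    for service in ("neutron-server", "mysqld"):
        print(service, rss_for_command(service))
```

The service names in the `__main__` block are only examples; any command name accepted by `ps -C` can be used.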

Changed in fuel:
status:	New → Invalid
status:	Invalid → Confirmed

Sergii Golovatiuk (sgolovatiuk) wrote on 2016-10-06:

#11

It would be nice to increase RAM for all threads that spawn neutron-related jobs.
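
A back-of-the-envelope budget shows why the node size matters here. This is a rough sketch using only the approximate figures quoted in this thread (16 neutron processes at about 100 MB each, Galera at 500-700 MB, nodes with 3 GB or 4 GB); the function name `remaining_mb` is hypothetical, MB is treated as MiB, and every other service on the node is ignored.

```python
def remaining_mb(node_mb, consumers_mb):
    """Return the memory left on a node after subtracting the listed consumers."""
    return node_mb - sum(consumers_mb.values())


# Approximate figures from the thread; these are not measurements.
NEUTRON_MB = 16 * 100  # 16 processes at about 100 MB each

if __name__ == "__main__":
    for node_mb in (3072, 4096):
        for galera_mb in (500, 700):
            left = remaining_mb(
                node_mb, {"neutron-server": NEUTRON_MB, "galera": galera_mb}
            )
            print(f"node={node_mb} MB galera={galera_mb} MB -> {left} MB left")
```

Under these assumptions a 3 GB node keeps well under 1 GB for the operating system and all remaining OpenStack services, while a 4 GB node keeps roughly 1 GB more.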

Roman Vyalov (r0mikiam) on 2016-10-06

Changed in fuel:
status:	Confirmed → New
assignee:	Fuel CI (fuel-ci) → Ivan (iremizov)

Ivan (iremizov) wrote on 2016-10-06:

#12

Adding a global setter for fuel-qa environment settings:
https://review.fuel-infra.org/27237

Roman Vyalov (r0mikiam) on 2016-10-07

Changed in fuel:
status:	New → In Progress

Roman Vyalov (r0mikiam) wrote on 2016-10-19:

#13

https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive_vlan/99/console
SLAVE_NODE_MEMORY=4096

ElenaRossokhina (esolomina) wrote on 2016-10-19:

#14

RAM has been increased, but the following issues still occur from time to time in different test scenarios, for example in the latest 9.x swarm:
https://product-ci.infra.mirantis.net/view/9.x_swarm/job/9.x.system_test.ubuntu.thread_7/99/testReport/(root)/deploy_neutron_tun_ha_nodegroups/deploy_neutron_tun_ha_nodegroups/
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_public/96/testReport/(root)/deploy_neutron_tun_ha_with_public_network/deploy_neutron_tun_ha_with_public_network/

Error Message
Cluster is not deployed: some nodes are in the Error state

Such errors occur because mysql could not start during deployment.
SLAVE_NODE_MEMORY=4096

Alexandra (aallakhverdieva) wrote on 2016-10-20:

#15

https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive/102/testReport/(root)/ha_neutron_delete_vips/

TatyanaGladysheva (tgladysheva) on 2016-11-23

tags:	added: on-verification

TatyanaGladysheva (tgladysheva) wrote on 2016-11-24:

#16

Verified on 9.2 snapshot #549.

Actual results:
SLAVE_NODE_MEMORY=3968
No failures with these symptoms were found.

tags:	removed: on-verification
