Increased RAM usage leads to floating galera-mysql errors during swarm runs

Bug #1630233 reported by Dmitry Kalashnik
Affects              Status          Importance   Assigned to   Milestone
Fuel for OpenStack   Fix Committed   High         Ivan
Mitaka               Fix Released    High         Ivan
Newton               Fix Committed   High         Ivan

Bug Description

During the 9.1 cycle we faced an intermittent issue with the mysql-galera cluster, caused by increased RAM usage.
The issue can appear during deployment, during an OSTF run, etc.

Opening a new ticket to keep all investigation details in one place.

https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive_vlan/84/
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive/86/testReport/(root)/ha_neutron_mysql_termination/ha_neutron_mysql_termination/

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

We solved this problem for smoke_neutron by increasing the RAM of the target nodes. So I want to ask the QA team whether we want to go this way for the other tests and, if not, why.

Changed in fuel:
assignee: nobody → Fuel CI (fuel-ci)
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

This question first needs to be addressed to the CI team as well, so I am assigning this bug to them.

Revision history for this message
Roman Vyalov (r0mikiam) wrote :

How much memory should the VMs have? Are you proposing to increase memory only for 2 threads in the swarm?

Changed in fuel:
status: New → Incomplete
Roman Vyalov (r0mikiam)
Changed in fuel:
assignee: Fuel CI (fuel-ci) → Stanislaw Bogatkin (sbogatkin)
status: Incomplete → New
Revision history for this message
Alexandra (aallakhverdieva) wrote :
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Roman, how much do we have now? 2 GB? Can we raise it by one more GB?

Changed in fuel:
assignee: Stanislaw Bogatkin (sbogatkin) → Fuel CI (fuel-ci)
Revision history for this message
Roman Vyalov (r0mikiam) wrote :

only for 2 swarm threads ?

Revision history for this message
Roman Vyalov (r0mikiam) wrote :

now we have 3 GB

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

OK, sounds good. Let's look at the test results.

Revision history for this message
Roman Vyalov (r0mikiam) wrote :

@Stas we now have 3 GB of RAM; we can increase it to 4 GB if necessary.
Also, should we increase memory only for 2 swarm threads or for all of them?

Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

@Rvyalov. The problem is not in Galera. It uses around 500-700 MB, which is OK. The problem is in services such as neutron, which uses about 100 MB per process with 16 processes spawned.

Run

ps -C neutron-server -orss= | awk '{ count ++; size += $1 }; END {print "Number of processes =",count; print "Memory usage per process =",size/1024/count, "MB"; print "Total memory usage =", size/1024, "MB"}'

next time to find a victim.
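
A minimal Python sketch of the same measurement, for cases where several services need to be compared at once. It assumes a Linux `ps` that reports RSS in KiB, as the one-liner above does; the function names and the example service list are illustrative and are not part of fuel-qa.

```python
import subprocess


def summarize_rss(ps_output):
    """Summarize `ps -o rss=` output (one RSS value in KiB per line)."""
    sizes_kib = [int(value) for value in ps_output.split()]
    count = len(sizes_kib)
    total_mb = sum(sizes_kib) / 1024
    # Guard against division by zero when no process matched.
    per_process_mb = total_mb / count if count else 0.0
    return {
        "count": count,
        "per_process_mb": per_process_mb,
        "total_mb": total_mb,
    }


def rss_for_command(name):
    """Run ps for one command name and summarize its memory usage."""
    # ps exits non-zero when nothing matches, so do not raise on failure.
    result = subprocess.run(
        ["ps", "-C", name, "-o", "rss="],
        capture_output=True, text=True, check=False,
    )
    return summarize_rss(result.stdout)


if __name__ == "__main__":
    # Hypothetical list of services to compare; adjust to the node's role.
    for service in ("neutron-server", "mysqld"):
        print(service, rss_for_command(service))
```

Unlike the awk version, this returns zeros instead of failing when no matching process is running.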

Changed in fuel:
status: New → Invalid
status: Invalid → Confirmed
Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

It would be nice to increase RAM for all threads that spawn neutron-related jobs.

Roman Vyalov (r0mikiam)
Changed in fuel:
status: Confirmed → New
assignee: Fuel CI (fuel-ci) → Ivan (iremizov)
Revision history for this message
Ivan (iremizov) wrote :

Adding a global setter for the fuel-qa environment variables:
https://review.fuel-infra.org/27237

Roman Vyalov (r0mikiam)
Changed in fuel:
status: New → In Progress
Revision history for this message
Roman Vyalov (r0mikiam) wrote :
Revision history for this message
ElenaRossokhina (esolomina) wrote :

RAM has been increased, but the following issues still occur from time to time in different test scenarios, for example in the latest 9.x swarm:
https://product-ci.infra.mirantis.net/view/9.x_swarm/job/9.x.system_test.ubuntu.thread_7/99/testReport/(root)/deploy_neutron_tun_ha_nodegroups/deploy_neutron_tun_ha_nodegroups/
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_public/96/testReport/(root)/deploy_neutron_tun_ha_with_public_network/deploy_neutron_tun_ha_with_public_network/

Error Message
Cluster is not deployed: some nodes are in the Error state

Such errors occur because mysql could not start during deployment.
SLAVE_NODE_MEMORY=4096
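
A rough way to sanity-check a SLAVE_NODE_MEMORY value against the per-service figures quoted earlier in this thread (16 neutron-server processes at about 100 MB each, Galera at up to about 700 MB). This is only a sketch: the helper names are hypothetical, the 3072 MB default is an assumption taken from the "3 GB" mentioned above, and it does not reflect how fuel-qa itself reads the variable.

```python
import os


def node_memory_mb(default_mb=3072):
    """Read SLAVE_NODE_MEMORY (in MB) from the environment, with a fallback."""
    return int(os.environ.get("SLAVE_NODE_MEMORY", default_mb))


def estimated_usage_mb(services):
    """Sum processes * MB-per-process over a {name: (processes, mb_each)} map."""
    return sum(procs * mb_each for procs, mb_each in services.values())


if __name__ == "__main__":
    # Figures from this thread only; other services on the node are not counted.
    services = {"neutron-server": (16, 100), "galera": (1, 700)}
    used = estimated_usage_mb(services)
    total = node_memory_mb()
    print(f"estimated {used} MB of {total} MB, headroom {total - used} MB")
```

The estimate covers only the two services named in the thread, so the real headroom on a controller is smaller than the printed value.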

Revision history for this message
Alexandra (aallakhverdieva) wrote :
tags: added: on-verification
Revision history for this message
TatyanaGladysheva (tgladysheva) wrote :

Verified on 9.2 snapshot #549.

Actual results:
SLAVE_NODE_MEMORY=3968
No failures with these symptoms were found.

tags: removed: on-verification