Ksmd eats I/O on host(lab)

Bug #1491772 reported by Alexander Arzhanov
This bug affects 1 person
Affects            | Status  | Importance | Assigned to        | Milestone
Fuel for OpenStack | Invalid | High       | Alexander Arzhanov |
7.0.x              | Invalid | High       | Alexander Arzhanov |
8.0.x              | Invalid | High       | Alexander Arzhanov |

Bug Description

Problem description
In at least two of our labs we hit a problem with ksmd: it constantly generates high disk I/O on the host (lab).
It is reproduced in such a scenario:
1. Create cluster with vCenter support.
2. Add 3 nodes with Controller roles.
3. Add 2 nodes with compute role.
4. Deploy the cluster.
5. Run network verification.
6. Run OSTF.
7. Create 2 VMs on each hypervisor.
8. Verify that VMs on different hypervisors can communicate with each other.
In some cases we had to leave the env with running instances (5-10 of them) for some time (e.g. overnight) before the problem showed up.
The problem probably also arises in an env without vCenter, but we have not specifically tested that.

How the problem appears on the guest side:
1. Load average spikes are observed on the nodes.
2. If you ping from node-1 to node-2 (for example), you see high latency, and sometimes packets are lost.
3. Galera Cluster temporarily loses nodes (because of item 2).
4. Corosync occasionally loses nodes (because of item 2).
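
To put a number on item 2, the packet-loss figure can be pulled out of the ping summary line. This is only a sketch: the function name packet_loss_percent is hypothetical, and it assumes the summary format printed by iputils ping ("N packets transmitted, M received, X% packet loss, ...").

```shell
# Hypothetical helper: read ping output on stdin and print the
# packet-loss percentage from the iputils summary line.
packet_loss_percent() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Example usage on a guest node (node-2 is the peer from item 2):
#   ping -c 20 node-2 | packet_loss_percent
```

A non-zero value while ksmd is busy on the host, and zero after KSM is stopped, would match the behaviour described here.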

How the problem appears on the host side:
If you run iotop (for example), you can see that ksmd uses anywhere from 40% to 99.9% (usually 99.9%) of I/O on the host (lab).

If you run echo 2 > /sys/kernel/mm/ksm/run (which stops ksmd and unmerges all currently merged pages), the above problems are no longer observed and everything works fine.
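
Before and after writing to the run file, it can help to look at what KSM is doing. The sketch below reads the run file and a few counters from the same sysfs directory. The function names and the KSM_DIR variable are hypothetical; the meaning of the values 0, 1 and 2 follows the kernel's KSM documentation.

```shell
# KSM_DIR is a hypothetical override; by default it is the sysfs path used above.
KSM_DIR="${KSM_DIR:-/sys/kernel/mm/ksm}"

# Translate the numeric content of the "run" file into a readable state.
ksm_state_name() {
    case "$1" in
        0) echo "stopped (merged pages kept)" ;;
        1) echo "running" ;;
        2) echo "stopped (all pages unmerged)" ;;
        *) echo "unknown" ;;
    esac
}

# Print the current KSM state and some sharing counters, if they are readable.
ksm_report() {
    dir="${1:-$KSM_DIR}"
    if [ ! -r "$dir/run" ]; then
        echo "KSM not available under $dir"
        return 1
    fi
    echo "state: $(ksm_state_name "$(cat "$dir/run")")"
    for f in pages_shared pages_sharing full_scans; do
        if [ -r "$dir/$f" ]; then
            echo "$f: $(cat "$dir/$f")"
        fi
    done
    return 0
}

# Example usage on the host:
#   ksm_report
```

Writing to the run file requires root; reading it normally does not.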

Some details about the labs:
Ubuntu 14.04.3 LTS
Kernel: 3.13.0-24
Libvirt 1.2.2

Solution
1. Update the kernel. The problem does not occur with the following kernels (others were not tested):
3.13.0-39
3.13.0-63
2. Updating libvirt might also solve the problem.
3. Do not use KSM.
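
A quick way to check a host against the kernel versions named in this report is sketched below. The function name kernel_ksm_status is hypothetical, and the classification covers only the three versions mentioned here; every other kernel is reported as untested, not as safe.

```shell
# Hypothetical helper: classify a kernel release string (as printed by
# `uname -r`, e.g. 3.13.0-24-generic) against the versions in this report.
kernel_ksm_status() {
    case "$1" in
        3.13.0-24|3.13.0-24-*)
            echo "affected" ;;
        3.13.0-39|3.13.0-39-*|3.13.0-63|3.13.0-63-*)
            echo "reported-ok" ;;
        *)
            echo "untested" ;;
    esac
}

# Example usage on the host:
#   kernel_ksm_status "$(uname -r)"
```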

Changed in fuel:
assignee: nobody → MOS Linux (mos-linux)
Changed in fuel:
importance: Undecided → High
status: New → Confirmed
Pavel Boldin (pboldin) wrote :
Pawel Brzozowski (pbrzozowski) wrote :

tpi97 and tpi102 do not seem to be under Fuel DevOps control.

Igor Shishkin (teran) wrote :

@Alexander, I don't understand how your issue is related to the DevOps team or the servers DevOps manages.

Please describe:
- Where the issue happened
- What you expect the DevOps team to do

Alexander Arzhanov (aarzhanov) wrote :

We fixed this issue on our labs (tpi97, tpi102) simply by upgrading the kernel.
I created this bug as an FYI so that you can check the kernel version on your labs. Most likely you are using something newer than the affected 3.13.0-24 kernel.

Alexander Kislitsky (akislitsky) wrote :

Nothing to do. Moving to Invalid.

Dmitry Pyzhov (dpyzhov)
tags: added: area-partners
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 7.0-updates → 8.0