Deployment silently failed when openvswitch can't start due to lack of memory

Bug #1561481 reported by Maksym Strukov on 2016-03-24
This bug affects 2 people
Affects             Importance  Assigned to
Fuel for OpenStack  High        MOS Linux
Mitaka              High        MOS Linux
Newton              High        MOS Linux

Bug Description

Steps:
1. Create env with Neutron Vlan network
2. Add two slaves: 1 controller and 1 compute
3. Move private network to enp0s5
4. Enable DPDK on enp0s5 for compute node
5. Configure HugePages for compute node: 2m = 5, 1g = 0, dpdk = 0
6. Run deployment

Actual result:
Deployment failed.
The Puppet and Astute logs contain no errors, only one warning in the Astute log about a puppet-agent timeout on the compute node.

The issue is with openvswitch, which can't start due to lack of memory:

EAL: Not enough memory available on socket 0! Requested: 64MB, available: 20MB
PANIC in rte_eal_init():
Cannot init memory
7: [ovs-vswitchd() [0x408aa3]]
6: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fe6a14aeec5]]
5: [ovs-vswitchd() [0x407289]]
4: [ovs-vswitchd() [0x505292]]
3: [/usr/lib/x86_64-linux-gnu/libdpdk.so(rte_eal_init+0x1071) [0x7fe6a23fb201]]
2: [/usr/lib/x86_64-linux-gnu/libdpdk.so(__rte_panic+0xc3) [0x7fe6a23c1bb9]]
1: [/usr/lib/x86_64-linux-gnu/libdpdk.so(rte_dump_stack+0x18) [0x7fe6a2400f08]]
Aborted (core dumped)
* Starting ovs-vswitchd
* Enabling remote OVSDB managers
root@node-2:~# echo $?
0

Puppet freezes while starting openvswitch and doesn't report any error.
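
The transcript above shows the daemon aborting while the start command still exits with 0. A minimal sketch of how that can happen, and of a variant that propagates the failure, is given below. The functions `fake_daemon`, `start_ignoring_status` and `start_checking_status` are hypothetical stand-ins for illustration and are not taken from the openvswitch init scripts.

```shell
#!/bin/sh
# Hypothetical stand-in for a daemon that aborts during startup.
fake_daemon() {
    echo "PANIC in rte_eal_init(): Cannot init memory" >&2
    return 134   # 128 + 6 (SIGABRT), as a shell reports an aborted process
}

# Start routine that loses the failure: its last command succeeds,
# so the caller sees exit status 0.
start_ignoring_status() {
    fake_daemon 2>/dev/null
    echo " * Starting ovs-vswitchd"
}

# Start routine that checks the daemon's status and reports it.
start_checking_status() {
    if ! fake_daemon 2>/dev/null; then
        echo " * Starting ovs-vswitchd: failed"
        return 1
    fi
    echo " * Starting ovs-vswitchd"
}

rc=0; start_ignoring_status >/dev/null || rc=$?
echo "ignoring status: rc=$rc"
rc=0; start_checking_status >/dev/null || rc=$?
echo "checking status: rc=$rc"
```

Only the second variant gives a caller such as Puppet a non-zero status to act on.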

Expected result:
Puppet logs contain an error that points the engineer to the issue with openvswitch and/or the memory misconfiguration.
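
As an illustration of the kind of explicit error message asked for here, the sketch below compares free hugepage memory against a required amount and fails loudly. It is an assumption-laden example, not part of any fix in this report: `hugepages_free_mb` and `check_hugepages` are hypothetical helpers, the 64MB figure is taken from the EAL message above, and the check only looks at the default hugepage size from a meminfo-style file, not at per-socket availability.

```shell
#!/bin/sh
# Report free hugepage memory in MB for the default hugepage size,
# read from a meminfo-style file (defaults to /proc/meminfo).
hugepages_free_mb() {
    awk '/^HugePages_Free:/ { free = $2 }
         /^Hugepagesize:/   { size_kb = $2 }
         END { print int(free * size_kb / 1024) }' "${1:-/proc/meminfo}"
}

# Hypothetical pre-flight check: fail with a clear message when fewer
# than $1 MB of hugepages are free in the file given as $2.
check_hugepages() {
    required_mb=$1
    available_mb=$(hugepages_free_mb "$2")
    if [ "$available_mb" -lt "$required_mb" ]; then
        echo "ERROR: need ${required_mb}MB of hugepages, only ${available_mb}MB free" >&2
        return 1
    fi
}

# Sample data mirroring the reproduction steps: five free 2MB pages.
sample=$(mktemp)
printf 'HugePages_Total:       5\nHugePages_Free:        5\nHugepagesize:       2048 kB\n' >"$sample"
rc=0
check_hugepages 64 "$sample" || rc=$?
echo "check_hugepages rc=$rc"
rm -f "$sample"
```

Passing the file as an argument keeps the sketch runnable on machines without hugepages configured; on a real node the second argument can be omitted to read /proc/meminfo.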

Env:
9.0 custom http://jenkins-product.srt.mirantis.net:8080/view/custom_iso/job/9.0.custom.iso/1205/

Maksym Strukov (unbelll) on 2016-03-24
description: updated
Vladimir Eremin (yottatsa) wrote :

Patch

tags: added: feature-dpdk
Changed in fuel:
milestone: none → 9.0
assignee: nobody → MOS Packaging Team (mos-packaging)
Changed in fuel:
status: New → Confirmed
Changed in fuel:
importance: Medium → Low
Igor Yozhikov (iyozhikov) wrote :

The issue appeared in a system-level application. All driver-related issues are covered by the mos-linux team.

Changed in fuel:
assignee: MOS Packaging Team (mos-packaging) → MOS Linux (mos-linux)

Related fix proposed to branch: master
Change author: Ivan Suzdal <email address hidden>
Review: https://review.fuel-infra.org/18760

Changed in fuel:
status: Confirmed → In Progress

Related fix proposed to branch: master
Change author: Ivan Suzdal <email address hidden>
Review: https://review.fuel-infra.org/18780

Reviewed: https://review.fuel-infra.org/18780
Submitter: Pkgs Jenkins <email address hidden>
Branch: master

Commit: 4f29ff452674ac5562ab26057f26f2b6373f643c
Author: Ivan Suzdal <email address hidden>
Date: Fri Mar 25 16:10:44 2016

Return error if daemon couldn't start/restart

Change-Id: I7721ebc91d77f75b3718a0d740d029a79d4ab2b4
Related-Bug: #1561481

Ivan Suzdal (isuzdal) on 2016-03-31
Changed in fuel:
status: In Progress → Fix Committed
tags: added: verification-needed

Related fix proposed to branch: 9.0
Change author: Ivan Suzdal <email address hidden>
Review: https://review.fuel-infra.org/19039

Reviewed: https://review.fuel-infra.org/19039
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0

Commit: 590c949a1c3870f819320acbe3eef143f02dcd7b
Author: Ivan Suzdal <email address hidden>
Date: Thu Mar 31 12:13:25 2016

Return error if daemon couldn't start/restart

Change-Id: I7721ebc91d77f75b3718a0d740d029a79d4ab2b4
Related-Bug: #1561481
(cherry picked from commit 4f29ff452674ac5562ab26057f26f2b6373f643c)

Change abandoned by Ivan Suzdal <email address hidden> on branch: master
Review: https://review.fuel-infra.org/18760

Ivan Suzdal (isuzdal) on 2016-04-06
Changed in fuel:
assignee: MOS Linux (mos-linux) → Maksym Strukov (unbelll)
Maksym Strukov (unbelll) wrote :

Reproduced on 9.0-mos-364

Steps are the same except HugePages: 2m=1, dpdk=1mb
Deployment failed with message:
"Deployment has failed. All nodes are finished. Failed tasks: Task[netconfig/3], Task[netconfig/2] Stopping the deployment process!"

Puppet log has only:
"2016-05-16 23:01:34 WARNING [31061] Puppet agent 2 didn't respond within the allotted time"

The Puppet task just froze, but /var/log/openvswitch/* on the compute node contains:

2016-05-16T22:05:03Z|00001|dpdk|INFO|No -vhost_sock_dir provided - defaulting to /var/run/openvswitch
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 1 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up physically contiguous memory...
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7fb03fe00000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7fb03fa00000 (size = 0x200000)
EAL: Requesting 1 pages of size 2MB from socket 0
EAL: rte_eal_common_log_init(): cannot create log_history mempool
PANIC in rte_eal_init():
Cannot init logs
7: [ovs-vswitchd() [0x408aa3]]
6: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fb040181ec5]]
5: [ovs-vswitchd() [0x407289]]
4: [ovs-vswitchd() [0x5052c2]]
3: [/usr/lib/x86_64-linux-gnu/libdpdk.so(rte_eal_init+0xf42) [0x7fb0410ce0d2]]
2: [/usr/lib/x86_64-linux-gnu/libdpdk.so(__rte_panic+0xc3) [0x7fb041094bb9]]
1: [/usr/lib/x86_64-linux-gnu/libdpdk.so(rte_dump_stack+0x18) [0x7fb0410d3f08]]
Aborted (core dumped)

Changed in fuel:
status: Fix Committed → Confirmed
Vladimir Eremin (yottatsa) wrote :

Switching to upstart broke it again. Upstart needs to be fixed.

Vladimir Eremin (yottatsa) wrote :

This line is wrong: if set -e is in effect and ovs-ctl fails, the shell exits before printf runs, so the exit code is never captured.
/usr/share/openvswitch/scripts/ovs-lib:
            rc=`{ { "${datadir}/scripts/ovs-ctl" "$@" 2>&1 3>&-; printf $? 1>&3; } 4>&- \

It starts working with this fix:
            rc=`set +e; { { "${datadir}/scripts/ovs-ctl" "$@" 2>&1 3>&-; printf $? 1>&3; } 4>&- \
or:
/etc/init/openvswitch-switch.conf:
  set +e
  if ! "$@"
  then
    exit 1
  fi
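
The effect of set -e on the `cmd; printf $?` pattern can be reproduced in isolation. The sketch below is a simplified illustration, with `false` standing in for a failing ovs-ctl call; it does not reproduce the file-descriptor handling of the real ovs-lib line.

```shell
#!/bin/sh
# Without errexit: the failing command's status reaches printf.
rc_plain=$(sh -c 'false; printf "%s" "$?"')

# With errexit (-e): the shell exits at `false`, so printf never runs
# and nothing is captured. `|| true` keeps this script itself going.
rc_errexit=$(sh -ec 'false; printf "%s" "$?"' || true)

# With the proposed fix: errexit is switched off before the command runs.
rc_fixed=$(sh -ec 'set +e; false; printf "%s" "$?"')

echo "plain='$rc_plain' errexit='$rc_errexit' fixed='$rc_fixed'"
```

This prints `plain='1' errexit='' fixed='1'`: under set -e the captured value is empty, so a caller cannot tell that the command failed.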

Maksym Strukov (unbelll) on 2016-05-17
Changed in fuel:
assignee: Maksym Strukov (unbelll) → nobody
Maksym Strukov (unbelll) on 2016-05-17
Changed in fuel:
assignee: nobody → MOS Linux (mos-linux)

Fix proposed to branch: master
Change author: Ivan Suzdal <email address hidden>
Review: https://review.fuel-infra.org/20774

Vladimir Eremin (yottatsa) wrote :

The bug itself isn't a high-priority one, but it makes the UX awful.

Reviewed: https://review.fuel-infra.org/20774
Submitter: Pkgs Jenkins <email address hidden>
Branch: master

Commit: 1381b358613f0943d328e5387ed7a722bd08e6f5
Author: Ivan Suzdal <email address hidden>
Date: Wed May 18 13:07:06 2016

No more pipeline magic in ovs-lib

Remove pipeline magic in ovs-lib template.
Here is no difference between directly appending to log or piping to tee.
Besides, seems like someone among developers can't make
decision about logging in ovs-lib.

Change-Id: I22afd0e63291082ae11b4f36abf22de8731001d4
Closes-Bug: #1561481
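
For context on why replacing a tee pipeline with a direct append simplifies exit-code handling, here is a small sketch. It assumes the "pipeline magic" refers to piping command output through tee; `failing_cmd` is a hypothetical stand-in for ovs-ctl and the log is a temporary file.

```shell
#!/bin/sh
log=$(mktemp)

# Hypothetical failing command standing in for ovs-ctl.
failing_cmd() { echo "start failed"; return 3; }

# Piping to tee: the pipeline's status is that of tee, the last command,
# so the failure of failing_cmd is masked.
rc_pipe=0
failing_cmd 2>&1 | tee -a "$log" >/dev/null || rc_pipe=$?

# Appending directly: the command's own exit status is preserved.
rc_append=0
failing_cmd >>"$log" 2>&1 || rc_append=$?

# Both forms wrote the same line to the log.
lines=$(wc -l <"$log" | tr -d ' ')
echo "pipe rc=$rc_pipe, append rc=$rc_append, log lines=$lines"
rm -f "$log"
```

The log content is identical either way, but only the direct append hands the failure back without extra file-descriptor tricks.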

Fix proposed to branch: 9.0
Change author: Ivan Suzdal <email address hidden>
Review: https://review.fuel-infra.org/20854

Reviewed: https://review.fuel-infra.org/20854
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0

Commit: 9a6f969434275f8d2ca0cfd73a2bacc6850c35c5
Author: Ivan Suzdal <email address hidden>
Date: Wed May 18 16:38:52 2016

No more pipeline magic in ovs-lib

Remove pipeline magic in ovs-lib template.
Here is no difference between directly appending to log or piping to tee.
Besides, seems like someone among developers can't make
decision about logging in ovs-lib.

Change-Id: I22afd0e63291082ae11b4f36abf22de8731001d4
Closes-Bug: #1561481
(cherry picked from commit 1381b358613f0943d328e5387ed7a722bd08e6f5)

Artem Panchenko (apanchenko-8) wrote :

verified

"status"=>"error",
 "error_type"=>"deploy",
 "error_msg"=>
  "All nodes are finished. Failed tasks: Task[netconfig/4] Stopping the deployment process!"}

(/Stage[main]/L23network::L2/Service[openvswitch-service]) Failed to call refresh: Could not start Service[openvswitch-service]: Execution of '/sbin/start openvswitch-switch' returned 1: start: Job failed to start

iso version::

shotgun2 short-report
cat /etc/fuel_build_id:
 427
cat /etc/fuel_build_number:
 427
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6347.noarch
 fuel-misc-9.0.0-1.mos8415.noarch
 fuel-bootstrap-cli-9.0.0-1.mos284.noarch
 fuel-migrate-9.0.0-1.mos8415.noarch
 rubygem-astute-9.0.0-1.mos747.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8718.noarch
 network-checker-9.0.0-1.mos72.x86_64
 fuel-mirror-9.0.0-1.mos137.noarch
 fuel-openstack-metadata-9.0.0-1.mos8718.noarch
 fuel-notify-9.0.0-1.mos8415.noarch
 nailgun-mcagents-9.0.0-1.mos747.noarch
 python-fuelclient-9.0.0-1.mos319.noarch
 fuelmenu-9.0.0-1.mos270.noarch
 fuel-9.0.0-1.mos6347.noarch
 fuel-utils-9.0.0-1.mos8415.noarch
 fuel-setup-9.0.0-1.mos6347.noarch
 fuel-library9.0-9.0.0-1.mos8415.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-agent-9.0.0-1.mos284.noarch
 fuel-ui-9.0.0-1.mos2710.noarch
 fuel-ostf-9.0.0-1.mos934.noarch
 python-packetary-9.0.0-1.mos137.noarch
 fuel-nailgun-9.0.0-1.mos8718.noarch
