Network checker failed after rebooting the compute node hosting fuel-master

Bug #1533165 reported by Veronica Krayneva
Affects            | Status  | Importance | Assigned to  | Milestone
-------------------|---------|------------|--------------|----------
Mirantis OpenStack | Invalid | High       | Peter Zhurba |
9.x                | Invalid | High       | Peter Zhurba |

Bug Description

Scenario:
1. Deploy cluster with two computes and three controllers
2. Migrate fuel-master
3. Reboot compute with fuel-master via reboot command
4. Wait till compute and fuel-master come up
5. Run OSTF tests
6. Run Network check
7. Check statuses for master’s services

Actual result:
Network checker failed
[root@nailgun ~]# fuel task
id | status | name | cluster | progress | uuid
---|--------|-------------------------|---------|----------|-------------------------------------
6 | ready | deployment | 1 | 100 | 066b4b61-b27f-4d09-98fe-8af69b4e5e32
1 | ready | deploy | 1 | 100 | a7b0dba3-3a29-4080-b875-15963e4fe702
17 | ready | check_dhcp | 1 | 100 | aea61491-4e3b-4f5b-87c0-87e6126bdbeb
18 | error | check_repo_availability | 1 | 100 | 4960ce85-26e0-4d3c-ab67-5883fbc285dd
19 | ready | create_stats_user | 1 | 100 | ebb22607-e7fc-46d6-8b96-127ce33d8a33
16 | error | verify_networks | 1 | 100 | dde7559d-5953-4781-8715-7dddb600aee1
5 | ready | provision | 1 | 100 | 052c4347-0183-43ee-95d1-c202df273384
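The failed tasks can be picked out of the `fuel task` table mechanically. A minimal sketch, assuming the pipe-separated output format shown above; the sample rows are copied from this report, and on a live master node you would pipe the output of `fuel task` itself:

```shell
# Sample rows copied from the `fuel task` output above; on the master
# node, replace the sample with the live command output.
sample='18 | error | check_repo_availability | 1 | 100 | 4960ce85-26e0-4d3c-ab67-5883fbc285dd
16 | error | verify_networks | 1 | 100 | dde7559d-5953-4781-8715-7dddb600aee1
5 | ready | provision | 1 | 100 | 052c4347-0183-43ee-95d1-c202df273384'

# Print the name of every task whose status column reads "error".
printf '%s\n' "$sample" | awk -F'|' '$2 ~ /error/ { gsub(/ /, "", $3); print $3 }'
```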

Also:
root@node-5:~# nova-manage service list
No handlers could be found for logger "oslo_config.cfg"
2016-01-12 11:39:50.103 8299 DEBUG oslo_db.api [req-841473b8-6817-4992-bfe3-cbf80f15147d - - - - -] Loading backend 'sqlalchemy' from 'nova.db.sqlalchemy.api' _load_backend /usr/lib/python2.7/dist-packages/oslo_db/api.py:230
2016-01-12 11:39:50.182 8299 DEBUG oslo_db.sqlalchemy.engines [req-841473b8-6817-4992-bfe3-cbf80f15147d - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py:256
Binary Host Zone Status State Updated_At
nova-cert node-3.test.domain.local internal enabled :-) 2016-01-12 11:39:40
nova-consoleauth node-3.test.domain.local internal enabled :-) 2016-01-12 11:39:41
nova-scheduler node-3.test.domain.local internal enabled :-) 2016-01-12 11:39:40
nova-conductor node-3.test.domain.local internal enabled :-) 2016-01-12 11:39:41
nova-consoleauth node-5.test.domain.local internal enabled :-) 2016-01-12 11:39:43
nova-scheduler node-5.test.domain.local internal enabled :-) 2016-01-12 11:39:43
nova-conductor node-5.test.domain.local internal enabled :-) 2016-01-12 11:39:40
nova-cert node-5.test.domain.local internal enabled :-) 2016-01-12 11:39:43
nova-consoleauth node-4.test.domain.local internal enabled :-) 2016-01-12 11:39:40
nova-scheduler node-4.test.domain.local internal enabled :-) 2016-01-12 11:39:42
nova-conductor node-4.test.domain.local internal enabled :-) 2016-01-12 11:39:41
nova-cert node-4.test.domain.local internal enabled :-) 2016-01-12 11:39:40
2016-01-12 11:39:50.581 8299 DEBUG nova.servicegroup.drivers.db [req-841473b8-6817-4992-bfe3-cbf80f15147d - - - - -] Seems service is down. Last heartbeat was 2016-01-12 10:27:55. Elapsed time is 4315.581825 is_up /usr/lib/python2.7/dist-packages/nova/servicegroup/drivers/db.py:80
nova-compute node-1.test.domain.local nova enabled XXX 2016-01-12 10:27:55
nova-compute node-2.test.domain.local nova enabled :-) 2016-01-12 11:39:04
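The dead service can also be spotted mechanically: nova prints "XXX" in the State column when the last heartbeat is older than its service-down threshold (as the `Seems service is down` debug line above shows). A minimal sketch that flags such rows; the sample lines are copied from the listing above, and on a live controller you would pipe `nova-manage service list` instead:

```shell
# Two sample rows copied from the service listing above; on a controller,
# replace the sample with the live `nova-manage service list` output.
sample='nova-compute node-1.test.domain.local nova enabled XXX 2016-01-12 10:27:55
nova-compute node-2.test.domain.local nova enabled :-) 2016-01-12 11:39:04'

# Columns: Binary Host Zone Status State Updated_At.
# Flag every row whose State column is "XXX" (stale heartbeat).
printf '%s\n' "$sample" | awk '$5 == "XXX" { print $1, "on", $2, "is down, last heartbeat", $6, $7 }'
```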

ISO #417

Changed in mos:
importance: Undecided → High
milestone: none → 8.0
assignee: nobody → Peter Zhurba (pzhurba)
Ilya Kutukov (ikutukov)
Changed in mos:
status: New → Confirmed
Revision history for this message
Peter Zhurba (pzhurba) wrote :

The main issue is a network interface crash on the compute node running the fuel-master. The full log is attached; search for the marker "[ cut here ]".

<6>Jan 12 08:38:38 node-1 kernel: [ 3288.416028] br-fw-admin: port 2(vfm_enp0s3) entered forwarding state
<4>Jan 12 08:40:34 node-1 kernel: [ 3404.998725] ------------[ cut here ]------------
<4>Jan 12 08:40:34 node-1 kernel: [ 3404.998740] WARNING: CPU: 1 PID: 17331 at /build/linux-_xRakU/linux-3.13.0/net/core/dev.c:2228 skb_warn_bad_offload+0xcd/0xda()
<4>Jan 12 08:40:34 node-1 kernel: [ 3404.998743] e1000: caps=(0x0000000200014b89, 0x0000000000000000) len=9854 data_len=9826 gso_size=1480 gso_type=6 ip_summed=3
<4>Jan 12 08:40:34 node-1 kernel: [ 3404.998745] Modules linked in: vhost_net vhost macvtap macvlan xt_mac xt_physdev

........

Revision history for this message
Peter Zhurba (pzhurba) wrote :

It happens during the compute restart when libvirt performs an autostart of the guest.
It looks like a kernel, qemu, or test-environment related issue.

However, adding a delay before the VM start works around it.
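One way to add such a delay is to disable libvirt autostart for the guest (`virsh autostart --disable fuel_master`) and start it from a service that sleeps first. Below is a hypothetical systemd unit sketching this workaround; the unit name, the guest name fuel_master, and the 180-second delay are assumptions for illustration, not something taken from the report:

```ini
# /etc/systemd/system/fuel-master-guest.service (hypothetical unit name)
[Unit]
Description=Delayed start of the fuel_master guest (workaround sketch)
Requires=libvirtd.service
After=libvirtd.service network.target

[Service]
Type=oneshot
# The delay that avoids the interface crash; 180 s is an assumed value.
ExecStartPre=/bin/sleep 180
ExecStart=/usr/bin/virsh start fuel_master
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```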

Revision history for this message
Ksenia Svechnikova (kdemina) wrote :

There is an open bug for nova-compute being stuck in the XXX state after a node reboot: https://bugs.launchpad.net/mos/+bug/1529810

A diagnostic snapshot for this issue is needed.

Revision history for this message
Peter Zhurba (pzhurba) wrote :

The script shown below was used to reproduce the bug.
#!/bin/bash

t_pass(){
# compute node IP
cm=10.109.0.12
# show the tail of the kernel log
ssh $cm dmesg | tail -n 20
# destroy the migrated fuel-master guest
ssh $cm virsh destroy fuel_master
# recreate the overlay image on top of the pristine fuel_master base image
ssh $cm qemu-img create -b /var/lib/nova/fuel_master.img -f qcow2 /var/lib/nova/fuel_master-b.img
# reboot the compute node
ssh $cm reboot
# wait for the bug; according to the logs it appears 2-3 minutes after boot
sleep 360
}

while true ; do t_pass; done

To check for occurrences, the messages log was grepped for "[ cut".

After upgrading the lab host (see the aptitude log), the reproduction frequency decreased: the bug appeared 2 times in 500 tries (see messages2.gzip).

Using virtio instead of e1000 helps 100% of the time. In this case many other kernel warnings disappeared as well.
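Switching the NIC model is a domain-XML edit. A minimal sketch, assuming the guest is named fuel_master: only the sed substitution is demonstrated on an inline sample stanza (an assumed shape of the `<interface>` element), while the virsh commands in the trailing comment show roughly how it would be applied on the host:

```shell
# Sample NIC stanza as libvirt might emit it (assumed shape); the real
# one comes from `virsh dumpxml fuel_master`.
nic="<interface type='bridge'><model type='e1000'/></interface>"

# Swap the emulated e1000 for the paravirtual virtio model.
printf '%s\n' "$nic" | sed "s/model type='e1000'/model type='virtio'/"

# On the host, roughly:
#   virsh dumpxml fuel_master > /tmp/fm.xml
#   sed -i "s/model type='e1000'/model type='virtio'/" /tmp/fm.xml
#   virsh define /tmp/fm.xml   # takes effect on the next guest start
```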

There is no possibility to reproduce the bug on hardware because e1000 is too old.

Also, there is not a single confirmed case on other test environments.

All of the above lets me consider the bug non-reproducible.

Revision history for this message
Ksenia Svechnikova (kdemina) wrote :

The behavior is not reproduced on ISO #442.
OSTF tests and the Network check finished successfully. I will mark the issue as Invalid. Please feel free to reopen it if it reproduces again.

Changed in mos:
status: Confirmed → Invalid