Instance/VM console does not open; Horizon returns a Network Error (tcp_error)

Bug #1797234 reported by Fernando Hernandez Gonzalez
This bug affects 1 person

Affects: StarlingX
Status: Invalid
Importance: Low
Assigned to: Erich Cordoba

Bug Description

Brief Description
-----------------
After the image, flavor, network, and network subnet are created and a virtual machine/instance is launched from them, the instance console cannot be opened; Horizon returns a Network Error (tcp_error).

Severity
--------
Critical: VMs are not able to be used.

Steps to Reproduce
------------------
Create an instance by following these steps on the active controller (a CLI sketch for checking the console follows the steps):
 source /etc/nova/openrc
 wget http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
 openstack flavor create --public --id 1 --ram 512 --vcpus 1 --disk 4 m1.tiny
 openstack image create --file cirros-0.4.0-x86_64-disk.img --disk-format qcow2 --public cirros
 openstack network create --shared net
 openstack subnet create --network net --ip-version 4 --subnet-range 192.168.0.0/24 --dhcp net-subnet1
 openstack server create --flavor m1.tiny --image cirros --nic net-id=uuid vm1
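
The "net-id=uuid" above is a placeholder for the network UUID. A minimal CLI sketch for filling it in and checking the console without Horizon (assuming the same "net" and "vm1" names as above; "openstack console url show" returns the noVNC URL that Horizon embeds):
 # look up the UUID of the shared network created above
 NET_ID=$(openstack network show net -f value -c id)
 openstack server create --flavor m1.tiny --image cirros --nic net-id=$NET_ID vm1
 # fetch the console URL directly; a tcp_error when opening this URL in a
 # browser points at the console proxy or the network rather than Horizon
 openstack console url show vm1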

Expected Behavior
------------------
After we create an instance/VM we should be able to open the prompt console and start working on it.

Actual Behavior
----------------
Prompt console is not able to be opened.

Reproducibility
---------------
This issue is 100% reproducible.

System Configuration
--------------------
Virtual multinode with dedicated Ceph storage: 2 controllers, 2 computes, 3 storage nodes over an IPv4 link [0]

Timestamp/Logs
--------------
Provide a snippet of logs if available and the timestamp when issue was seen.
Please indicate the unique identifier in the logs to highlight the problem
Provide a pointer to the logs for debugging (use attachments in Launchpad or paste.openstack.org)

[0] https://wiki.openstack.org/wiki/StarlingX/Installation_Guide_Virtual_Environment/Dedicated_Storage

description: updated
Revision history for this message
Fernando Hernandez Gonzalez (fhernan2) wrote :

In the virtual multinode setup, going through "cat /var/log/nova/nova-scheduler.log | grep vm1", this is what I found (a follow-up grep sketch is included after the log excerpt):
cat /var/log/nova/nova-scheduler.log | grep <vm_name>
2018-10-10 13:13:53.552 7379 INFO nova.filters [req-8b5076c3-37d2-4673-8857-ae133cad99a2 0c05321bf4e8459a94749b369d86a9ee 01546078c7d14238946429a5481bcf1a - default default] Filters succeeded with 1 out of 2 host(s), uuid=764429c8-f247-4fe4-b164-5481cbbd69b6, id=, name=vm1, flavor=Flavor(created_at=2018-10-09T12:07:30Z,deleted=False,deleted_at=None,disabled=False,ephemeral_gb=0,extra_specs={aggregate_instance_extra_specs:storage='remote'},flavorid='2da3de85-2922-4e8c-b033-bb66ef33bc63',id=4,is_public=True,memory_mb=2048,name='f1.small',projects=<?>,root_gb=20,rxtx_factor=1.0,swap=0,updated_at=None,vcpu_weight=0,vcpus=1), image_props={}, hints={u'provider:physical_network': [u'providernet-a']}
...
2018-10-10 13:13:54.194 7379 INFO nova.scheduler.filter_scheduler [req-8b5076c3-37d2-4673-8857-ae133cad99a2 0c05321bf4e8459a94749b369d86a9ee 01546078c7d14238946429a5481bcf1a - default default] SCHED: PASS. Selected [(compute-0, compute-0, QEMU) ram: 11637MB disk: 586752MB io_ops: 1 closids: 0 instances: 1], uuid=764429c8-f247-4fe4-b164-5481cbbd69b6, name=, display_name=vm1, scheduled=1
...
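
A hedged way to follow the same request beyond the scheduler is to grep the controller's nova logs for the instance UUID from the lines above; the console-proxy log file names below are assumptions and may differ on a given load:
 # follow the instance through all nova services on the controller
 grep -r 764429c8-f247-4fe4-b164-5481cbbd69b6 /var/log/nova/
 # console-specific services, if these log files exist on your load
 sudo grep -iE "error|refused|timed out" /var/log/nova/nova-novncproxy.log /var/log/nova/nova-consoleauth.log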

Revision history for this message
Yosief Gebremariam (ygebrema) wrote :

Following your steps above, I could not reproduce the issue on a duplex two-node system (i.e., two controllers). I was able to access the VM through the Horizon VM console.

openstack server list
+--------------------------------------+--------+--------+----------------------+-------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+--------+--------+----------------------+-------+--------+
| c5a9ed03-4ad0-44bc-b619-f691d5b917d4 | test_1 | ACTIVE | test-net=192.168.0.3 | | small |
+--------------------------------------+--------+--------+----------------------+-------+--------+

openstack subnet list | grep test_subnet
| e291f08f-b743-4cba-a354-c7797a391686 | test_subnet | 9eeedd24-3996-486f-816d-ebca57b03a39 | 192.168.0.0/24 | 192.168.0.2-192.168.0.254 |

Revision history for this message
Fernando Hernandez Gonzalez (fhernan2) wrote :

Yosief, please check the virtual configuration, which says:

Virtual multinode with dedicated Ceph storage: 2 controllers, 2 computes, 3 storage nodes over an IPv4 link [0]

Many Thanks.

Revision history for this message
Fernando Hernandez Gonzalez (fhernan2) wrote :

Some more debugging:
I went to Admin / Compute / Flavors, hit the "Edit Flavor" button, selected Update Metadata, changed the "aggregate_instance_extra_specs:storage" property to remote, and created an instance with this metadata (a CLI sketch of the same change follows below).

Then I powered off all the nodes and restarted them one by one, starting with controller-0, then controller-1, compute-0 and so on... After this I went back to the instance I created and was able to open its console.
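
For reference, the same metadata change can be made from the CLI; a sketch, assuming the flavor in question is the f1.small that appears in the scheduler log above:
 # set the storage extra spec on the flavor (flavor name is an assumption)
 openstack flavor set --property aggregate_instance_extra_specs:storage=remote f1.small
 # verify the property took effect
 openstack flavor show f1.small -c properties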

Revision history for this message
Fernando Hernandez Gonzalez (fhernan2) wrote :
Revision history for this message
Frank Miller (sensfan22) wrote :

Some comments and information:

1. I don't understand what was changed in your Oct 12 comment regarding the flavor metadata. Based on your controller logs, I see that the original VM1 was launched with a flavor that had "aggregate_instance_extra_specs:storage" set to remote. I suspect the power off/power on cycle was the reason you were able to connect to the console on Oct 12.

2. We have a 2+2+2 config (2 controllers + 2 storage + 2 computes) set up on bare metal with the 2018.10 branch load and are able to launch a VM using a "remote" storage flavor and successfully connect to the console. We have to click on "Click here to show only console" to get to the actual VM console screen.

3. I took a look at the wiki you referenced and from a high-level that wiki looks correct. Looking at your controller-0 logs I see you unlocked compute-0 before you unlocked storage-0 but that should be ok.

4. On compute-0, in kern.log, I see bad-checksum UDP logs for the IP address that was assigned to VM1. These logs start a couple of minutes after VM1 is launched on compute-0. You may need to investigate these further (a sketch follows this list), but it looks like some kind of temporary network issue in your virtual environment that was cleared up by the power off/power on cycle you did on Oct 12.
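
A sketch for digging into item 4 on compute-0; the log path, interface name, and VM IP address are placeholders rather than values taken from the attached logs:
 # look for the kernel's bad-checksum complaints around the time VM1 was launched
 grep -i checksum /var/log/kern.log
 # capture live UDP traffic with checksum verification; replace eth0 and the
 # address with the compute data interface and the IP assigned to VM1
 sudo tcpdump -i eth0 -vv udp and host 192.168.0.10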

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Based on the above, we believe that this is some kind of configuration issue. Suggest having someone more familiar with this virtual env investigate further.

Revision history for this message
Bruce Jones (brucej) wrote :

As per the community meeting on 10/17, this is not release blocking. We'd like to continue debugging this to see whether it's a setup/config issue, a documentation issue, or a defect.

Changed in starlingx:
assignee: nobody → Erich Cordoba (ericho)
tags: added: stx.docs
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
Revision history for this message
Elio Martinez (elio1979) wrote :

This problem is present with bare metal (BM) multinode local storage.

Revision history for this message
Erich Cordoba (ericho) wrote :

I tried to reproduce this in a virtual environment but was unable to see the failure. I got 6 Cirros and 3 Ubuntu instances running.

Ghada Khalil (gkhalil)
Changed in starlingx:
status: New → In Progress
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Based on the latest info from Erich, this issue is not reproducible. Therefore, it doesn't gate the stx.2018.10 release. Removing the label.

tags: removed: stx.2018.10
Changed in starlingx:
importance: Medium → Low
Revision history for this message
Ghada Khalil (gkhalil) wrote :

I will leave it to Erich to follow up with the reporter to close this bug report. It should be marked as Invalid.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as Invalid based on Ada's input:

-----------------------------
From: Cabrales, Ada [mailto:<email address hidden>]
Sent: Wednesday, October 24, 2018 10:14 AM
To: Khalil, Ghada; Cordoba Malibran, Erich
Cc: Jones, Bruce E; Hernandez Gonzalez, Fernando
Subject: RE: Update on https://bugs.launchpad.net/starlingx/+bug/1797234

Hello,

   We haven’t been able to reproduce this problem with the latest ISO. Let’s mark it as invalid. If we can reproduce the issue with the ISO that includes the vswitch fix, we will re-open it.

Thanks
A.

Changed in starlingx:
status: In Progress → Invalid