Instance/VM Console is not opened, Horizon is getting back with Network Error (tcp_error)

Bug #1797234 reported by Fernando Hernandez Gonzalez on 2018-10-10
This bug affects 1 person
Affects: StarlingX | Importance: Low | Assigned to: Erich Cordoba

Bug Description

Brief Description
-----------------
After the image, flavor, network, and network subnet are created and a virtual machine/instance is launched from them, the console cannot be opened; Horizon returns a Network Error (tcp_error).

Severity
--------
Critical: VMs cannot be used.

Steps to Reproduce
------------------
Create an instance by running the following steps on the active controller:
 source /etc/nova/openrc
 wget http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
 openstack flavor create --public --id 1 --ram 512 --vcpus 1 --disk 4 m1.tiny
 openstack image create --file cirros-0.4.0-x86_64-disk.img --disk-format qcow2 --public cirros
 openstack network create --shared net
 openstack subnet create --network net --ip-version 4 --subnet-range 192.168.0.0/24 --dhcp net-subnet1
 openstack server create --flavor m1.tiny --image cirros --nic net-id=uuid vm1
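
The `--nic net-id=uuid` argument above uses a literal placeholder; the real network UUID has to be looked up first. A minimal sketch of that lookup (the `is_uuid` sanity-check helper is my addition for illustration, not part of the report):

```shell
# Sketch: resolve the network UUID instead of passing the literal "uuid".
# Real usage (assumes the openstack CLI from the steps above):
#   NET_ID=$(openstack network show net -f value -c id)
#   openstack server create --flavor m1.tiny --image cirros --nic net-id="$NET_ID" vm1
# Quick sanity check that the value is a UUID and not the placeholder:
is_uuid() {
  echo "$1" | grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}
is_uuid "764429c8-f247-4fe4-b164-5481cbbd69b6" && echo "valid UUID"
is_uuid "uuid" || echo "placeholder, not a UUID"
```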

Expected Behavior
------------------
After we create an instance/VM we should be able to open its console and start working in it.

Actual Behavior
----------------
The console cannot be opened; Horizon returns a network error (tcp_error).

Reproducibility
---------------
This issue is 100% reproducible.

System Configuration
--------------------
Virtual Multinode Dedicated Ceph storage: 2 controllers, 2 computes, 3 storage nodes over an IPv4 link [0]

Timestamp/Logs
--------------

[0] https://wiki.openstack.org/wiki/StarlingX/Installation_Guide_Virtual_Environment/Dedicated_Storage

description: updated

In Virtual Multinode, running "cat /var/log/nova/nova-scheduler.log | grep vm1" on the controller returned the following:
2018-10-10 13:13:53.552 7379 INFO nova.filters [req-8b5076c3-37d2-4673-8857-ae133cad99a2 0c05321bf4e8459a94749b369d86a9ee 01546078c7d14238946429a5481bcf1a - default default] Filters succeeded with 1 out of 2 host(s), uuid=764429c8-f247-4fe4-b164-5481cbbd69b6, id=, name=vm1, flavor=Flavor(created_at=2018-10-09T12:07:30Z,deleted=False,deleted_at=None,disabled=False,ephemeral_gb=0,extra_specs={aggregate_instance_extra_specs:storage='remote'},flavorid='2da3de85-2922-4e8c-b033-bb66ef33bc63',id=4,is_public=True,memory_mb=2048,name='f1.small',projects=<?>,root_gb=20,rxtx_factor=1.0,swap=0,updated_at=None,vcpu_weight=0,vcpus=1), image_props={}, hints={u'provider:physical_network': [u'providernet-a']}
...
2018-10-10 13:13:54.194 7379 INFO nova.scheduler.filter_scheduler [req-8b5076c3-37d2-4673-8857-ae133cad99a2 0c05321bf4e8459a94749b369d86a9ee 01546078c7d14238946429a5481bcf1a - default default] SCHED: PASS. Selected [(compute-0, compute-0, QEMU) ram: 11637MB disk: 586752MB io_ops: 1 closids: 0 instances: 1], uuid=764429c8-f247-4fe4-b164-5481cbbd69b6, name=, display_name=vm1, scheduled=1
...
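
The "SCHED: PASS. Selected" line above is the one to key on; a self-contained sketch of pulling the chosen compute host out of it (the sample line is abridged from the log snippet above so the commands run without the real log):

```shell
# Extract the selected compute host from a "SCHED: PASS" scheduler log line.
# Sample line abridged from the nova-scheduler.log snippet in this report.
cat > /tmp/nova-scheduler.sample <<'EOF'
2018-10-10 13:13:54.194 7379 INFO nova.scheduler.filter_scheduler [req-8b5076c3] SCHED: PASS. Selected [(compute-0, compute-0, QEMU) ram: 11637MB], uuid=764429c8-f247-4fe4-b164-5481cbbd69b6, display_name=vm1, scheduled=1
EOF
# Keep only the first element of the "Selected [(host, node, type)" tuple:
grep 'SCHED: PASS' /tmp/nova-scheduler.sample | sed -n 's/.*Selected \[(\([^,]*\),.*/\1/p'
```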

Yosief Gebremariam (ygebrema) wrote :

Following your steps above, I could not reproduce the issue on a Duplex two-node system (i.e. two controllers). I was able to access the VM through the Horizon VM console.

openstack server list
+--------------------------------------+--------+--------+----------------------+-------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+--------+--------+----------------------+-------+--------+
| c5a9ed03-4ad0-44bc-b619-f691d5b917d4 | test_1 | ACTIVE | test-net=192.168.0.3 | | small |
+--------------------------------------+--------+--------+----------------------+-------+--------+

openstack subnet list | grep test_subnet
| e291f08f-b743-4cba-a354-c7797a391686 | test_subnet | 9eeedd24-3996-486f-816d-ebca57b03a39 | 192.168.0.0/24 | 192.168.0.2-192.168.0.254 |

Yosief, please check the virtual configuration under test:

Virtual Multinode Dedicated Ceph storage: 2 controllers, 2 computes, 3 storage nodes over an IPv4 link [0]

Many Thanks.

Some more debug:
I went to Admin / Compute / Flavors, hit the "Edit Flavor" button, selected Update Metadata, set the "aggregate_instance_extra_specs:storage" property to remote, and created an instance with this updated metadata.

Then I powered off all the nodes and restarted them one by one, starting with controller-0, controller-1, compute-0 and so on. After this I went back to the instance I created and I am able to open the console.

Frank Miller (sensfan22) wrote :

Some comments and information:

1. I don't understand what was changed in your Oct 12 comment re the flavors metadata. Based on your controller logs I see that the original VM1 was launched with a flavor that had "aggregate_instance_extra_specs:storage" set to remote. I suspect the power off/power on was the reason you were able to connect to the console on Oct 12.

2. We have a 2+2+2 config (2 controllers + 2 storage + 2 computes) set up on bare metal with the 2018.10 branch load and are able to launch a VM using a "remote" storage flavor and successfully connect to the console. We have to click on "Click here to show only console" to get to the actual VM console screen.

3. I took a look at the wiki you referenced and from a high-level that wiki looks correct. Looking at your controller-0 logs I see you unlocked compute-0 before you unlocked storage-0 but that should be ok.

4. On compute-0 in kern.log I see bad-checksum UDP logs for the IP address that was assigned to VM1. These logs come out a couple of minutes after VM1 is launched on compute-0. You may need to investigate these further, but it looks like some kind of transient network issue in your virtual environment that was cleared up by the power off/power on cycle you did on Oct 12.
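
A sketch of counting those bad-checksum UDP entries on compute-0 (the kern.log line format and the IP value below are assumptions for illustration; the report only says such logs appear for VM1's address):

```shell
# Count bad-checksum UDP messages in kern.log for the VM's IP.
# Both the log line format and the IP are illustrative, not from this report.
VM_IP=192.168.0.3
cat > /tmp/kern.sample <<'EOF'
Oct 10 13:16:02 compute-0 kernel: UDP: bad checksum. From 192.168.0.3:68 to 255.255.255.255:67 ulen 308
Oct 10 13:16:05 compute-0 kernel: UDP: bad checksum. From 192.168.0.3:68 to 255.255.255.255:67 ulen 308
EOF
grep -c "bad checksum.*${VM_IP}" /tmp/kern.sample
```

On a real system the same grep would run against /var/log/kern.log on compute-0.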

Ghada Khalil (gkhalil) wrote :

Based on the above, we believe that this is some kind of configuration issue. Suggest having someone more familiar with this virtual env investigate further.
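
One way to narrow the investigation is to separate a Horizon problem from a console-proxy one by requesting the console URL directly (a sketch: port 6080 is nova-novncproxy's usual default, and the URL string below is an example value, not taken from this report):

```shell
# Real usage would be:
#   openstack console url show vm1 -f value -c url   # get the noVNC URL nova hands out
#   curl -sv "<that URL>" -o /dev/null               # check reachability from the browser host
# Self-contained check of the URL shape such a command typically returns:
URL="http://10.10.10.2:6080/vnc_auto.html?token=abc123"   # example value only
echo "$URL" | grep -Eq ':6080/vnc_auto\.html\?token=' && echo "looks like a noVNC URL"
```

If curl reaches the proxy while Horizon still shows tcp_error, the problem is in front of the proxy (Horizon or the network path to it) rather than in nova.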

Bruce Jones (brucej) wrote :

As per the community meeting on 10/17 this is not release blocking. We'd like to continue the debug of this to see if it's a setup/config issue, a documentation issue or a defect.

Changed in starlingx:
assignee: nobody → Erich Cordoba (ericho)
tags: added: stx.docs
Ghada Khalil (gkhalil) on 2018-10-17
Changed in starlingx:
importance: Undecided → Medium
Elio Martinez (elio1979) wrote :

This problem is also present on a bare-metal (BM) multinode local-storage configuration.

Erich Cordoba (ericho) wrote :

I tried to reproduce this in a virtual environment but was unable to see the failure. I got 6 CirrOS and 3 Ubuntu instances running.

Ghada Khalil (gkhalil) on 2018-10-24
Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

Based on the latest info from Erich, this issue is not reproducible. Therefore, it doesn't gate the stx.2018.10 release. Removing the label.

tags: removed: stx.2018.10
Changed in starlingx:
importance: Medium → Low
Ghada Khalil (gkhalil) wrote :

I will leave it to Erich to follow up with the reporter to close this bug report. It should be marked as Invalid.

Ghada Khalil (gkhalil) wrote :

Marking as Invalid based on Ada's input:

-----------------------------
From: Cabrales, Ada [mailto:<email address hidden>]
Sent: Wednesday, October 24, 2018 10:14 AM
To: Khalil, Ghada; Cordoba Malibran, Erich
Cc: Jones, Bruce E; Hernandez Gonzalez, Fernando
Subject: RE: Update on https://bugs.launchpad.net/starlingx/+bug/1797234

Hello,

   We haven’t been able to reproduce this problem with latest ISO. Let’s mark it as invalid. If we can reproduce the issue with the ISO that includes the vswitch fix, we will re-open it.

Thanks
A.

Changed in starlingx:
status: In Progress → Invalid