/etc/hosts not updated in containers

Bug #1857245 reported by BN
Affects Status Importance Assigned to Milestone
kolla-ansible
Triaged
Medium
Unassigned

Bug Description

**Environment**:
* OS (e.g. from /etc/os-release): ubuntu
* Kernel (e.g. `uname -a`): Linux zhavoronok 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
* Docker version if applicable (e.g. `docker version`): latest
* Kolla-Ansible version (e.g. `git head or tag or stable branch` or pip package version if using release): latest
* Docker image Install type (source/binary): source
* Docker image distribution: stein
* Are you using official images from Docker Hub or self built? official
* If self built - Kolla version and environment used to build: no

kolla_base_distro: "ubuntu"
kolla_install_type: "source"
openstack_release: "stein"
kolla_internal_vip_address: "10.0.225.254"
kolla_external_vip_address: "10.0.225.254"
keepalived_virtual_router_id: "102"
#enable_murano: "yes"
network_interface: "enp2s0f0"
api_interface: "enp2s0f0"
neutron_external_interface: "enp2s0f1"
enable_neutron_provider_networks: "yes"
neutron_tenant_network_types: "vxlan,vlan,flat"
#enable_barbican: "yes"
enable_cinder: "yes"
enable_cinder_backup: "yes"
enable_fluentd: "yes"
enable_masakari: "yes"
enable_horizon: "yes"
#enable_zun: "yes"
#enable_kuryr: "yes"
#enable_etcd: "yes"
#docker_configure_for_zun: "yes"
enable_magnum: "no"
enable_heat: "no"
enable_ceph: "no"
glance_backend_ceph: "yes"
cinder_backend_ceph: "yes"
nova_backend_ceph: "yes"
glance_enable_rolling_upgrade: "no"
barbican_crypto_plugin: "simple_crypto"
barbican_library_path: "/usr/lib/libCryptoki2_64.so"
ironic_dnsmasq_dhcp_range:
tempest_image_id:
tempest_flavor_ref_id:
tempest_public_network_id:
tempest_floating_network_name:
horizon_port: 48000
enable_octavia: "yes"

-----------------------------------------------------------------------
I added a new machine to the stack. It can be seen under Hypervisors, and `ceph osd status` also shows it, but when I try to live migrate an instance to the new host, the migration just stops without any errors; if I do a cold migrate, it gets stuck at Confirm or Revert Resize/Migrate.

libvirt log file: http://paste.openstack.org/show/787839/

What can be done to debug or resolve this? Thank you.

Revision history for this message
BN (zatoichy) wrote :

When I tried to create an instance on the host:

Message
    internal error: process exited while connecting to monitor: Could not access KVM kernel module: Permission denied 2019-12-22T18:13:53.174702Z qemu-system-x86_64: failed to initialize KVM: Permission denied
Code
    500
Details
    Traceback (most recent call last):
      File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 1984, in _do_build_and_run_instance
        filter_properties, request_spec)
      File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 2354, in _build_and_run_instance
        instance_uuid=instance.uuid, reason=six.text_type(e))
    RescheduledException: Build of instance 90cdb778-be49-462a-9dab-8ab2a299f215 was re-scheduled: internal error: process exited while connecting to monitor: Could not access KVM kernel module: Permission denied 2019-12-22T18:13:53.174702Z qemu-system-x86_64: failed to initialize KVM: Permission denied
Created
    Dec. 22, 2019, 6:13 p.m.

Revision history for this message
Viorel-Cosmin Miron (uhl-hosting) wrote :

It seems there are some KVM issues, as per the second log. Can you check that KVM is properly loaded in your kernel?

Revision history for this message
BN (zatoichy) wrote :

kvm_intel 217088 0
kvm 610304 1 kvm_intel
irqbypass 16384 1 kvm
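The `lsmod` output above shows the modules are loaded, but qemu's "Permission denied" error usually points at access to `/dev/kvm` rather than the module itself. A minimal check, sketched here assuming an Intel host (on AMD the module is `kvm_amd`):

```shell
# Is the kvm module loaded at all?
lsmod 2>/dev/null | grep -q '^kvm' && echo "kvm module loaded" || echo "kvm module missing"

# Can the current user actually open /dev/kvm? qemu's
# "Could not access KVM kernel module: Permission denied" means it cannot.
check_rw() { [ -r "$1" ] && [ -w "$1" ]; }
if check_rw /dev/kvm; then
    echo "/dev/kvm is accessible"
else
    echo "/dev/kvm is missing or not readable/writable by this user"
fi
ls -l /dev/kvm 2>/dev/null || true   # inspect owner/group (often root:kvm) and mode
```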

Revision history for this message
BN (zatoichy) wrote :

Okay, I've reinstalled compute/storage on the new node that I added recently, so now I can at least create instances on that new host machine. However, live migration is still not working.

Log from the new machine, when I try to migrate from it to another host:

2019-12-22 22:50:56.677 6 ERROR nova.virt.libvirt.driver [-] [instance: 31ae61c7-46c9-4049-942a-f8f397251332] Live Migration failure: operation failed: guest CPU doesn't match specification: missing features: vmx: libvirtError: operation failed: guest CPU doesn't match specification: missing features: vmx
2019-12-22 22:50:56.915 6 ERROR nova.virt.libvirt.driver [-] [instance: 31ae61c7-46c9-4049-942a-f8f397251332] Migration operation has aborted

Log from the old machine, when I try to migrate to the new machine:

2019-12-22 22:47:54.502 6 INFO nova.compute.manager [-] [instance: dd6c5065-2655-4b29-bb24-2901627ddbc0] Took 3.13 seconds for pre_live_migration on destination host zhavoronok.
2019-12-22 22:47:56.241 6 ERROR nova.virt.libvirt.driver [-] [instance: dd6c5065-2655-4b29-bb24-2901627ddbc0] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://zhavoronok/system: Unable to resolve address 'zhavoronok' service '16509': Name or service not known: libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+tcp://zhavoronok/system: Unable to resolve address 'zhavoronok' service '16509': Name or service not known
2019-12-22 22:47:56.682 6 ERROR nova.virt.libvirt.driver [-] [instance: dd6c5065-2655-4b29-bb24-2901627ddbc0] Migration operation has aborted
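The second failure is plain name resolution: libvirt on the source host cannot resolve the destination hostname from inside its container. A quick way to confirm this, sketched with kolla's default `nova_libvirt` container name and the hostname from the log above:

```shell
# Does the destination hostname resolve inside the source's libvirt container?
docker exec nova_libvirt getent hosts zhavoronok \
    || echo "destination not resolvable inside the container"

# Compare against resolution on the host itself:
getent hosts zhavoronok || echo "not resolvable on the host either"
```

If the host resolves the name but the container does not, the container is holding a stale copy of /etc/hosts.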

Revision history for this message
Dincer Celik (dincercelik) wrote :

Live Migration failure: operation failed: guest CPU doesn't match specification: missing features: vmx: libvirtError: operation failed: guest CPU doesn't match specification: missing features: vmx

I think you are using mixed CPU types on your compute nodes, so you cannot live migrate between them in passthrough CPU mode: the `vmx` feature exists on the source CPU but not on the destination. You may try overriding the CPU mode, see https://docs.openstack.org/nova/stein/configuration/config.html#libvirt.cpu_mode
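In kolla-ansible this can be done with a service config override file. A sketch, assuming the default custom-config path; the `cpu_model` value is only an example and must be a model that every compute node can provide:

```ini
# /etc/kolla/config/nova.conf - merged into nova's config by kolla-ansible
[libvirt]
cpu_mode = custom
# Example value only; pick the lowest common model across your compute nodes:
cpu_model = Haswell-noTSX
```

Apply it with `kolla-ansible reconfigure` (optionally limited with `--tags nova`), then hard-reboot or recreate instances so they pick up the new CPU model.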

Changed in kolla-ansible:
status: New → Invalid
Revision history for this message
Mark Goddard (mgoddard) wrote :

vmx is the Intel hardware virtualisation feature. Has it been enabled in the BIOS on all nodes?

Revision history for this message
BN (zatoichy) wrote :

The issue was solved, but I think it still exists in Stein: once you add a new machine to the cluster, /etc/hosts is not updated inside the containers, so the existing machines were not able to connect to the new machine by hostname. I restarted the containers and that fixed the issue. I am not sure, but I think this was fixed in Train.
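Restarting works because Docker copies the host's /etc/hosts into the container when the container starts. A sketch of that workaround applied on each existing node, assuming every running container on the node is kolla-managed (narrow the filter if it is not):

```shell
# Restart all running containers so Docker re-copies the host's /etc/hosts.
# xargs -r skips the restart entirely if no containers are running.
docker ps -q | xargs -r docker restart
```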

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

Hmm, that was not fixed, because it is Docker that manages /etc/hosts. It does seem to copy it in on container restart; interesting. I run my deployments with DNS, so I never experienced this, but it surely looks like an issue to be aware of when users report problems. We might want to think about a possible workaround: perhaps start mounting /etc/hosts read-only in all containers so that it follows the host's copy. We don't use Docker networking, so it does not matter that Docker could not add new entries that way.
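The proposed workaround amounts to a read-only bind mount of /etc/hosts in every container, sketched here with plain `docker run` (kolla-ansible would instead add the volume to each service's container definition; `nova_libvirt` and the image tag are examples):

```shell
# With the :ro bind mount the container always reads the host's current
# /etc/hosts; Docker cannot edit it, but kolla does not rely on Docker
# networking, so that is acceptable.
docker run --rm -v /etc/hosts:/etc/hosts:ro ubuntu:18.04 getent hosts localhost

# To check whether an existing container's copy has drifted from the host:
docker exec nova_libvirt cat /etc/hosts | cmp -s - /etc/hosts \
    && echo "in sync" || echo "stale copy"
```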

Changed in kolla-ansible:
status: Invalid → Triaged
summary: - New host live migration
+ /etc/hosts not updated in containers
Changed in kolla-ansible:
importance: Undecided → Medium