No valid host found when trying to run coreos template through Heat

Bug #1390246 reported by Devdatta Kulkarni
This bug affects 1 person
Affects: Solum | Status: New | Importance: Undecided | Assigned to: caowei

Bug Description

While building and testing this patch:

https://review.openstack.org/#/c/102646/

I am seeing the following error in solum-deployer:

"stack": {"parent": null, "disable_rollback": true, "description": "Basic app deploy.\n", "links": [{"href": "http://10.0.2.15:8004/v1/970d61363eec48d2864f4fc85f90fd5c/stacks/ex14-e15c3262-d519-4cef-961b-42f59b10ef66/ff1940f9-a813-4ccb-a392-cc997dd40730", "rel": "self"}], "stack_status_reason": "Resource CREATE failed: ResourceInError: Went to status ERROR due to \"Message: No valid host was found. , Code: 500\"", "stack_name": "ex14-e15c3262-d519-4cef-961b-42f59b10ef66", "outputs": [{"output_value": "", "description": "The public IP address of the newly configured Server.", "output_key": "public_ip"}, {"output_value": "http://:5000", "description": "The URL for the Server.", "output_key": "URL"}], "stack_owner": null, "creation_time": "2014-11-06T22:36:01Z", "capabilities": [], "notification_topics": [], "updated_time": null, "timeout_mins": null, "stack_status": "CREATE_FAILED", "parameters": {"OS::stack_id": "ff1940f9-a813-4ccb-a392-cc997dd40730", "OS::stack_name": "ex14-e15c3262-d519-4cef-961b-42f59b10ef66", "key_name": "", "image": "coreos", "du_image": "10.0.2.15:5042/nodeus", "flavor": "m1.small", "port": "5000", "app_name": "ex14"}, "id": "ff1940f9-a813-4ccb-a392-cc997dd40730", "template_description": "Basic app deploy.\n"}}
 log_http_response /opt/stack/python-heatclient/heatclient/common/http.py:133
2014-11-06 22:37:52.864 13986 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 10.0.2.15:5672

I am seeing this in the vagrant environment, which is from here (https://github.com/rackerlabs/vagrant-solum-dev) but modified as follows:

I am using

SOLUM_IMAGE_FORMAT=vm
NOVADOCKER_BRANCH=d1ad84793b7f2182de04df8a5323d6928af672ca
DEVSTACK_BRANCH=ba842f5374f28d1f17bc008349a2d01958dfe82d

In local.conf.vm, I have:

GLANCE_BRANCH=8c161b6a4b0a7617ee224b23ada0a368e97eaae7
NOVA_BRANCH=c5ac21f3dbb4ad59efcb631d91e4e64f77fba43f
HEAT_BRANCH=56811cfb6d00a5c0a80bbe964b178605e7cdd12c
HEAT_REPO=https://github.com/devdattakulkarni/heat.git

I am using a modified version of coreos.yaml (https://review.openstack.org/#/c/102646/10/etc/solum/templates/coreos.yaml). In my modified version, I am not using neutron. Correspondingly, I have commented out the following in local.conf.vm:

#enable_service q-svc
#enable_service q-agt
#enable_service q-dhcp
#enable_service q-l3
#enable_service q-meta
#enable_service neutron

I have also commented out the following line in deployer/handlers/heat.py:

#parameters.update(heat_utils.get_network_parameters(osc))

Apart from this, there were several bugs that I found in
lib/solum
vm-slug/build-app

(Here is a summary of those)
- lib/solum does not install docker and docker-registry when SOLUM_IMAGE_FORMAT is set to 'vm'. The code path is present, so it is not clear why they are not getting installed.

- vm-slug/build-app
  - 'docker ps' needs to be changed to 'sudo docker ps'
  - the line LOG_FILE=$(GET_LOGFILE) needs to be added
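The build-app fixes can be sketched as follows. This is a hedged illustration, not the actual script: GET_LOGFILE is stubbed here, and in vm-slug/build-app it is assumed to be the existing helper that returns the build log path.

```shell
#!/bin/bash
# Hedged sketch of the two vm-slug/build-app fixes described above.
# GET_LOGFILE is stubbed for illustration only.
GET_LOGFILE() { echo "/tmp/build-app.log"; }

LOG_FILE=$(GET_LOGFILE)   # this assignment was missing from the script
echo "logging to ${LOG_FILE}"

# ...and 'docker ps' must be invoked with sudo on the build host:
# sudo docker ps
```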

Finally, on the devstack host machine I had to manually install 'libguestfs-tools' (sudo apt-get install libguestfs-tools). This should be made part of lib/solum.
Moreover, for this patch to work, we need a glance image named 'coreos'. For that I followed the instructions at

https://coreos.com/docs/running-coreos/platforms/openstack/

with the glance image-create command as follows:

glance image-create --name coreos --container-format bare --disk-format qcow2 --file coreos_production_openstack_image.img

Revision history for this message
Devdatta Kulkarni (devdatta-kulkarni) wrote :

nova boot fails with the following stack trace in n-cond:

> sleeping for 60.00 seconds from (pid=26577) _inner /opt/stack/nova/nova/openstack/common/loopingcall.py:132
2014-11-07 01:21:34.482 ERROR nova.scheduler.utils [req-f48a1224-526f-4470-81c6-2e9b342123f5 demo demo] [instance: 80690e5f-d6bc-447a-a602-b1de5ab57902] Error from last host: devstack (node devstack): [u'Traceback (most recent call last):\n', u' File "/opt/stack/nova/nova/compute/manager.py", line 2014, in do_build_and_run_instance\n filter_properties)\n', u' File "/opt/stack/nova/nova/compute/manager.py", line 2149, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance 80690e5f-d6bc-447a-a602-b1de5ab57902 was re-scheduled: internal error: process exited while connecting to monitor: Cannot set up guest memory 'pc.ram': Cannot allocate memory\n\n"]

Revision history for this message
Paul Czarkowski (paulcz) wrote :

That sounds like there is not enough memory on the box, so the scheduler refuses to start a VM. You can remove the 'memory'/'ram' filter from the scheduler filters in nova.conf to force it not to check, or add more memory to your VM.
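Concretely, that would mean dropping RamFilter from the scheduler's filter list in nova.conf. A hedged sketch follows; the option name and the default filter list are from Juno-era nova, so check the defaults for your installed version:

```ini
[DEFAULT]
# Default filter list minus RamFilter, so hosts are not rejected for
# insufficient free RAM (only advisable in dev environments).
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
```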

Why are you skipping neutron? The heat templates expect it to be there to set up the networking correctly.

Revision history for this message
Devdatta Kulkarni (devdatta-kulkarni) wrote :

@paulcz: Thanks for the suggestion on nova. Will try it.

About skipping neutron: I am doing it for two reasons.

1) In the VM flow, I was running into a 'subnet' error (https://github.com/rackerlabs/vagrant-solum-dev/issues/31)

2) I want to test whether the coreos template would work in a situation where neutron might not be available.

So I decided to remove neutron from the mix. I have created a simpler template in which I don't have the public_port and floating_ip resources. I have also removed references to these resources and the corresponding parameters.
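The stripped-down template I am describing looks roughly like this. This is only a sketch, not the actual patched coreos.yaml; the parameter and output names follow the stack dump in the bug description:

```yaml
heat_template_version: 2013-05-23
description: Basic app deploy.

parameters:
  image:
    type: string
    default: coreos
  flavor:
    type: string
    default: m1.small
  key_name:
    type: string
    default: ""

resources:
  server:
    # No public_port / floating_ip resources; nova-network supplies
    # the fixed IP directly.
    type: OS::Nova::Server
    properties:
      image: {get_param: image}
      flavor: {get_param: flavor}
      key_name: {get_param: key_name}

outputs:
  public_ip:
    description: The public IP address of the newly configured Server.
    value: {get_attr: [server, first_address]}
```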

Revision history for this message
Paul Czarkowski (paulcz) wrote :

Anyone wanting to run Solum will have a fairly mature openstack story and it's unlikely they'd be using anything other than neutron. If you switch back to nova-network then it means at some point you'll need to redo the work to get neutron working again.

Certain large public cloud vendors are neutron based, so it would make sense to ensure that neutron support is a first class citizen, especially for the VM use case.

Revision history for this message
Devdatta Kulkarni (devdatta-kulkarni) wrote :

Sure. I am not advocating to remove neutron.

A non-neutron option would be required for infrastructures that don't support neutron yet.

We can support both options (with and without neutron) through configuration.

Revision history for this message
Devdatta Kulkarni (devdatta-kulkarni) wrote :

I was able to spin up a coreos cluster consisting of two nodes on a Vagrant VM with 6GB memory. Here are the commands that I used.

These steps assume that you have a coreos image registered in glance, you have created cloud-config.yaml, and you have generated an ssh keypair.
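For reference, a minimal cloud-config.yaml along the lines of the CoreOS-on-OpenStack docs linked earlier. This is a sketch; the discovery URL is a placeholder you generate yourself (e.g. at https://discovery.etcd.io/new):

```yaml
#cloud-config
coreos:
  etcd:
    # replace <token> with a freshly generated discovery token
    discovery: https://discovery.etcd.io/<token>
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
```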

For steps 1 and 2 below, you will need admin privileges.
On devstack you can gain those by doing 'source devstack/openrc admin'.

1) Create nova key-pair
nova keypair-add --pub-key ./ssh/id_rsa.pub devstack-keypair

2) Create nova network
nova network-create --fixed-range-v4 172.16.0.0/24 coreos-network

3) Boot coreos cluster
nova boot --user-data ./cloud-config.yaml --image 279bb4b9-3c77-40f9-9a92-a083be466363 --flavor m1.small --num-instances 2 --config-drive=true --security-groups default --nic net-id=25dc2d10-d1e4-4dff-9cf7-f1e8b31ba9de --key-name devstack-keypair coreos-vms-2

(Find the ids using the following commands:
glance image-list
nova network-list
nova keypair-list)

4) Find the IP addresses of the spun-up coreos machines
nova list

5) SSH into a machine (the default pre-configured username is 'core')
ssh -i id_rsa core@172.16.0.2

Once logged into the machine, I am seeing a "Failed Units: 2" message on the console. The trace of the login session is below:

vagrant@devstack:~/.ssh$ ssh -i id_rsa core@172.16.0.2
The authenticity of host '172.16.0.2 (172.16.0.2)' can't be established.
ED25519 key fingerprint is 2f:57:8c:16:59:5d:41:14:c9:f4:81:ad:c6:2c:fe:02.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.16.0.2' (ED25519) to the list of known hosts.
CoreOS (alpha)
Failed Units: 2
  user-configdrive.path
  oem-cloudinit.service
core@coreos-vms-2-2095cfd8-8cdf-4a8b-9c8b-73dd2de91d13 ~ $

---

Note that in the above steps I am not using Solum's CoreOS patch yet. I am just doing a plain nova boot. But now that I know that CoreOS can be spun up on devstack, I will go back to Solum's CoreOS patch.

---

Revision history for this message
Devdatta Kulkarni (devdatta-kulkarni) wrote :

Finally successful in deploying CoreOS template!

Here is the screen trace from the solum-deployer screen.

The resource was found at http://10.0.2.15:8004/v1/1cb3982d28cc480cbf72cfdfe90e5e6f/stacks/coreos3-nov11-df718e8a-14c3-4756-95d7-9e9cd2a8a4bd/f5a06c66-3d1d-4ecd-9ef9-d0e3a25c4e29; you should be redirected automatically.
 log_http_response /opt/stack/python-heatclient/heatclient/common/http.py:133
2014-11-11 17:42:58.222 1186 DEBUG heatclient.common.http [-] curl -i -X GET -H 'User-Agent: python-heatclient' -H 'Content-Type: application/json' -H 'X-Auth-Url: http://10.0.2.15:5000/v3' -H 'Accept: application/json' -H 'X-Auth-Token: {SHA1}80d9b287fbcfbaa59e771b156f5c7f1b9ee974ec' http://10.0.2.15:8004/v1/1cb3982d28cc480cbf72cfdfe90e5e6f/stacks/coreos3-nov11-df718e8a-14c3-4756-95d7-9e9cd2a8a4bd/f5a06c66-3d1d-4ecd-9ef9-d0e3a25c4e29 log_curl_request /opt/stack/python-heatclient/heatclient/common/http.py:120
2014-11-11 17:42:58.376 1186 DEBUG heatclient.common.http [-]
HTTP/1.1 200 OK
date: Tue, 11 Nov 2014 17:42:58 GMT
content-length: 1208
content-type: application/json; charset=UTF-8

{"stack": {"parent": null, "disable_rollback": true, "description": "Basic app deploy.\n", "links": [{"href": "http://10.0.2.15:8004/v1/1cb3982d28cc480cbf72cfdfe90e5e6f/stacks/coreos3-nov11-df718e8a-14c3-4756-95d7-9e9cd2a8a4bd/f5a06c66-3d1d-4ecd-9ef9-d0e3a25c4e29", "rel": "self"}], "stack_status_reason": "Stack CREATE completed successfully", "stack_name": "coreos3-nov11-df718e8a-14c3-4756-95d7-9e9cd2a8a4bd", "outputs": [{"output_value": "", "description": "The public IP address of the newly configured Server.", "output_key": "public_ip"}, {"output_value": "http://:5000", "description": "The URL for the Server.", "output_key": "URL"}], "stack_owner": null, "creation_time": "2014-11-11T17:42:47Z", "capabilities": [], "notification_topics": [], "updated_time": null, "timeout_mins": null, "stack_status": "CREATE_COMPLETE", "parameters": {"OS::stack_id": "f5a06c66-3d1d-4ecd-9ef9-d0e3a25c4e29", "OS::stack_name": "coreos3-nov11-df718e8a-14c3-4756-95d7-9e9cd2a8a4bd", "key_name": "", "image": "coreos", "du_image": "10.0.2.15:5042/nodeus", "flavor": "m1.small", "port": "5000", "app_name": "coreos3-nov11"}, "id": "f5a06c66-3d1d-4ecd-9ef9-d0e3a25c4e29", "template_description": "Basic app deploy.\n"}}
 log_http_response /opt/stack/python-heatclient/heatclient/common/http.py:133
2014-11-11 17:42:58.390 1186 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 10.0.2.15:5672

solum assembly list shows the assembly to be in the 'READY' state.

However, it is not clear whether the container is actually up and running.

docker ps shows an empty list (so no container seems to have started up).

docker images shows the following:

REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
10.0.2.15:5042/nodeus latest 798ba8b62ee4 6 minutes ago 855.7 MB

Notice that "10.0.2.15:5042/nodeus" is the same as the value of the "du_image" parameter passed to the heat template.

Some other observations.

1) The created container's image does seem to be getting uploaded to Glance. glance image-list does no...


Revision history for this message
Devdatta Kulkarni (devdatta-kulkarni) wrote :

vagrant@devstack:~$ heat stack-show f5a06c66-3d1d-4ecd-9ef9-d0e3a25c4e29
+----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| capabilities | [] |
| creation_time | 2014-11-11T17:42:47Z |
| description | Basic app deploy. |
| disable_rollback | True |
| id | f5a06c66-3d1d-4ecd-9ef9-d0e3a25c4e29 |
| links | http://10.0.2.15:8004/v1/1cb3982d28cc480cbf72cfdfe90e5e6f/stacks/coreos3-nov11-df718e8a-14c3-4756-95d7-9e9cd2a8a4bd/f5a06c66-3d1d-4ecd-9ef9-d0e3a25c4e29 (self) |
| notification_topics | [] |
| outputs | [ |
| | { |
| | "output_value": "", |
| | "description": "The public IP address of the newly configured Server.", |
| | "output_key": "public_ip" |
| | }, |
| | { ...


Revision history for this message
Devdatta Kulkarni (devdatta-kulkarni) wrote :

Some more success.

I was able to spin up a CoreOS VM and log into it.
Once logged into the VM, though, I am seeing the following error trace. It complains that the Docker image is not found. However, when "docker images" is run on the host, it shows the following:

REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
10.0.2.15:5042/nodeus latest daca6bf0c799 11 minutes ago 855.7 MB

So the container image is available on the docker registry.

core@ex12 ~ $ 172.24.4.3 - - [12/Nov/2014:18:56:41] "GET /v1/_ping HTTP/1.1" 200 4 "-" "Go 1.1 package http"
2014-11-12 18:56:41,300 INFO: 172.24.4.3 - - [12/Nov/2014:18:56:41] "GET /v1/_ping HTTP/1.1" 200 4 "-" "Go 1.1 package http"
2014-11-12 18:56:41,400 DEBUG: api_error: images not found
172.24.4.3 - - [12/Nov/2014:18:56:41] "GET /v1/repositories/nodeus/images HTTP/1.1" 404 29 "-" "docker/1.3.0 go/go1.3.2 git-commit/c78088f kernel/3.17.2+ os/linux arch/amd64"
2014-11-12 18:56:41,402 INFO: 172.24.4.3 - - [12/Nov/2014:18:56:41] "GET /v1/repositories/nodeus/images HTTP/1.1" 404 29 "-" "docker/1.3.0 go/go1.3.2 git-commit/c78088f kernel/3.17.2+ os/linux arch/amd64"
172.24.4.3 - - [12/Nov/2014:18:56:47] "GET /v1/_ping HTTP/1.1" 200 4 "-" "Go 1.1 package http"
2014-11-12 18:56:47,269 INFO: 172.24.4.3 - - [12/Nov/2014:18:56:47] "GET /v1/_ping HTTP/1.1" 200 4 "-" "Go 1.1 package http"
2014-11-12 18:56:47,365 DEBUG: api_error: images not found
172.24.4.3 - - [12/Nov/2014:18:56:47] "GET /v1/repositories/nodeus/images HTTP/1.1" 404 29 "-" "docker/1.3.0 go/go1.3.2 git-commit/c78088f kernel/3.17.2+ os/linux arch/amd64"
2014-11-12 18:56:47,366 INFO: 172.24.4.3 - - [12/Nov/2014:18:56:47] "GET /v1/repositories/nodeus/images HTTP/1.1" 404 29 "-" "docker/1.3.0 go/go1.3.2 git-commit/c78088f kernel/3.17.2+ os/linux arch/amd64"
core@ex12 ~ $ 172.24.4.3 - - [12/Nov/2014:18:56:53] "GET /v1/_ping HTTP/1.1" 200 4 "-" "Go 1.1 package http"
2014-11-12 18:56:53,258 INFO: 172.24.4.3 - - [12/Nov/2014:18:56:53] "GET /v1/_ping HTTP/1.1" 200 4 "-" "Go 1.1 package http"
...


Changed in solum:
assignee: nobody → Devdatta (devdatta-kulkarni)
Revision history for this message
Devdatta Kulkarni (devdatta-kulkarni) wrote :

If the VM cannot be pinged, this seems to work:

iptables -A POSTROUTING -t nat --src 10.0.0.0/24 -j MASQUERADE

https://ask.openstack.org/en/question/9340/cant-ping-from-vm-to-controller-node-ubuntuhavananova-networking/

Revision history for this message
Adrian Otto (aotto) wrote :

It's not clear to me if we presently have this problem or not. Please confirm.

Changed in solum:
status: New → Incomplete
Revision history for this message
Devdatta Kulkarni (devdatta-kulkarni) wrote :

I haven't gotten around to testing this, but most likely this problem still exists.

Changed in solum:
status: Incomplete → New
tags: added: heat solum-infra
Changed in solum:
assignee: Devdatta Kulkarni (devdatta-kulkarni) → caowei (caowei-e)
Revision history for this message
caowei (caowei-e) wrote : Re: [Bug 1390246] Re: No valid host found when trying to run coreos template through Heat

OK
----------------------------------------------------------------
From: Devdatta Kulkarni <email address hidden>
Sent: Thursday, June 16, 2016, 01:19
To: caowei_e <email address hidden>
Subject: [Bug 1390246] Re: No valid host found when trying to run coreos template through Heat
