Deployment hangs. "/etc/nailgun-agent/config.yaml: No such file or directory" from nodes

Bug #1477932 reported by Leontii Istomin
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Invalid | High | Vladimir Sharshov |
7.0.x | Invalid | High | Fuel Python (Deprecated) |

Bug Description

Deployment just hangs. Even the provisioning step can't be completed:
[root@fuel ~]# fuel task
DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
id | status | name | cluster | progress | uuid
---|---------|------------------------------------|---------|----------|-------------------------------------
18 | ready | dump | None | 100 | c1786ba8-567c-4251-9658-27b3add6ca99
22 | ready | check_dhcp | 6 | 100 | c7d7fe64-a4e7-4968-8833-d092fb3a5408
24 | ready | check_repo_availability_with_setup | 6 | 100 | 7dbb8232-9400-4644-b958-750271edb4b1
23 | ready | check_repo_availability | 6 | 100 | 6f604e43-f151-4fe8-800e-eb6fb94608a1
21 | ready | verify_networks | 6 | 100 | 1c342b53-2538-44b3-a893-45dc33af639d
25 | running | deploy | 6 | 0 | 6ba39a9c-4c8c-4657-9a69-6dbd90bfbbe2
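
A minimal sketch of how the table above could be checked for a stalled task, assuming the pipe-separated layout shown in this report. The function names `parse_task_table` and `stalled_tasks` are hypothetical helpers, not part of the Fuel client, and "running with 0 progress" is only a heuristic, since a task that has just started looks the same.

```python
def parse_task_table(text):
    """Parse pipe-separated `fuel task` output into a list of dicts."""
    rows = [line for line in text.splitlines() if "|" in line]
    header = [cell.strip() for cell in rows[0].split("|")]
    tasks = []
    for line in rows[1:]:
        cells = [cell.strip() for cell in line.split("|")]
        # Skip the dashed separator row under the header.
        if all(set(cell) <= {"-"} for cell in cells):
            continue
        tasks.append(dict(zip(header, cells)))
    return tasks


def stalled_tasks(tasks):
    """Return tasks that are still running but report 0 progress."""
    return [t for t in tasks if t["status"] == "running" and t["progress"] == "0"]


# Two rows copied from the output in this report.
SAMPLE = """\
id | status | name | cluster | progress | uuid
---|---------|------------------------------------|---------|----------|-------------------------------------
22 | ready | check_dhcp | 6 | 100 | c7d7fe64-a4e7-4968-8833-d092fb3a5408
25 | running | deploy | 6 | 0 | 6ba39a9c-4c8c-4657-9a69-6dbd90bfbbe2
"""

for task in stalled_tasks(parse_task_table(SAMPLE)):
    print(task["id"], task["name"], task["uuid"])
```

The uuid printed for the stalled task is the value to search for in the Astute and Nailgun logs.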

Cluster configuration:
Baremetal,Ubuntu,IBP,HA,Neutron-vlan,Ceph-all,Nova-debug,Nova-quotas,Sahara,Ceilometer,7.0-73
Controllers:3 Computes+Ceph:21

api: '1.0'
astute_sha: b1f37a988e097175cbbd14338286017b46b584c3
auth_required: true
build_id: 2015-07-22_23-50-18
build_number: '73'
feature_groups:
- mirantis
fuel-agent_sha: bc25d3b728e823e6154bac0442f6b88747ac48e1
fuel-library_sha: c174f646b40056668fad1fd62ee4bbb4d7902779
fuel-ostf_sha: 1672bf8908f8f506df5b2e63bd13b56aabbaf266
fuelmain_sha: 68871248453b432ecca0cca5a43ef0aad6079c39
nailgun_sha: 52a413e672b07396245bad16335870fffd8b7546
openstack_version: 2015.1.0-7.0
production: docker
python-fuelclient_sha: 7a29d1ac9b0d39290a45b4150b9157302ae0bf57
release: '7.0'

Diagnostic Snapshot: http://mos-scale-share.mirantis.com/fuel-snapshot-2015-07-24_10-23-51.tar.xz

Leontii Istomin (listomin) wrote :

I was able to successfully deploy the following configurations with this build:
1. Baremetal,Ubuntu,IBP, Neutron-gre,Ceph-all,Nova-debug,nova-quotas,7.0-73
    Controllers:3 Computes+Ceph:3
2. Baremetal,Ubuntu,IBP,HA, Neutron-tun,Ceph-all,Nova-debug,Nova-quotas,Sahara,Ceilometer,7.0-73
    Controllers:3 Computes+Ceph:47

description: updated
Changed in fuel:
milestone: none → 7.0
importance: Undecided → High
assignee: nobody → Fuel Python Team (fuel-python)
Changed in fuel:
status: New → Confirmed
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Vladimir Sharshov (vsharshov)
Leontii Istomin (listomin) wrote :

Redeployed the Fuel node and tried again. Can't reproduce the bug.

Vladimir Sharshov (vsharshov) wrote :

Nailgun-agent always shows this message about /etc/nailgun-agent/config.yaml during the bootstrap stage, because we configure these parameters another way.

I've checked the bootstrap messages from a local installation and see the same thing:

I, [2015-07-24T12:05:33.833135 #2192] INFO -- : Could not get url from configuration file: No such file or directory - /etc/nailgun-agent/config.yaml, trying other ways..

It does not affect deployment. We should look for the cause elsewhere.
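
The behaviour described above (a missing config file is logged at INFO level and the agent then tries other sources) can be sketched roughly as follows. This is an illustration only, not the agent's actual code: the function names are hypothetical, and reading a `url=` parameter from the kernel command line is an assumption about what "other ways" means, since the thread does not say.

```python
import re


def url_from_config_text(text):
    """Return the value of a top-level `url:` key, or None if absent."""
    match = re.search(r"^url:\s*(\S+)", text, flags=re.MULTILINE)
    return match.group(1) if match else None


def url_from_cmdline(cmdline):
    """Return the value of a `url=` kernel parameter, or None if absent.

    Assumption: the fallback source is the kernel command line.
    """
    match = re.search(r"(?:^|\s)url=(\S+)", cmdline)
    return match.group(1) if match else None


def resolve_url(config_path, cmdline):
    """Prefer the config file; fall back to the kernel command line."""
    try:
        with open(config_path) as handle:
            url = url_from_config_text(handle.read())
            if url:
                return url
    except OSError as error:
        # Informational only: per the comment above, the file is normally
        # absent during the bootstrap stage.
        print(f"Could not get url from configuration file: {error}, trying other ways..")
    return url_from_cmdline(cmdline)
```

In this sketch the missing file never raises to the caller, which matches the point made above: the message alone does not indicate a failed deployment.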

Vladimir Sharshov (vsharshov) wrote :

The Astute log does not contain any information about the problematic deployment task. More interestingly, the first task logged was a cluster removal.

The Nailgun log also does not contain any information about such tasks, but '6ba39a9c-4c8c-4657-9a69-6dbd90bfbbe2' is present in the database.

It looks like an old DB dump was restored onto a new master node installation.

Changed in fuel:
status: Confirmed → Incomplete
Leontii Istomin (listomin) wrote :

I didn't perform any actions like restoring a database.

Timur Nurlygayanov (tnurlygayanov) wrote :

It looks like this issue is not reproducible. Leontii, could you please try to reproduce it and close the bug if it no longer reproduces? I will close the bug in 3 weeks if there are no updates (because the QA team can't reproduce the issue).

Thank you!

Leontii Istomin (listomin) wrote :

It hasn't been reproduced with the 7.0-98 build.

Changed in fuel:
status: Incomplete → Invalid
Alex Schultz (alex-schultz) wrote :

Reported on VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "301"
  build_id: "301"
  nailgun_sha: "4162b0c15adb425b37608c787944d1983f543aa8"
  python-fuelclient_sha: "486bde57cda1badb68f915f66c61b544108606f3"
  fuel-agent_sha: "50e90af6e3d560e9085ff71d2950cfbcca91af67"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "6c5b73f93e24cc781c809db9159927655ced5012"
  fuel-library_sha: "5d50055aeca1dd0dc53b43825dc4c8f7780be9dd"
  fuel-ostf_sha: "2cd967dccd66cfc3a0abd6af9f31e5b4d150a11c"
  fuelmain_sha: "a65d453215edb0284a2e4761be7a156bb5627677"

Changed in fuel:
status: Invalid → New
Alex Schultz (alex-schultz) wrote :
tags: added: customer-found
Sundar Nadathur (ns1-sundar) wrote :

I am hitting this issue with Fuel 7.0. I worked around it by manually placing a file at /etc/nailgun-agent/config.yaml with the following contents:

---
url: https://9.0.1.2:8443/api

The logs in /var/log/messages show that the URL got correctly picked up. But now we are hitting another error. This log repeats in /var/log/messages:

      /usr/bin/nailgun-agent:143:in 'initialize': undefined method '[]' for nil:NilClass (NoMethodError)

Igor Shishkin (teran)
Changed in fuel:
milestone: 7.0 → 8.0
Changed in fuel:
status: New → Confirmed
Vladimir Sharshov (vsharshov) wrote :

Guys, can you provide steps to reproduce?

Please note that the original problem is "Deployment just hangs. Even the provisioning step can't be completed".

Changed in fuel:
status: Confirmed → Incomplete
description: updated
Vitaly Sedelnik (vsedelnik) wrote :

Removed from the 7.0 MU 1 scope, as it is Incomplete for 8.0 and the ETA for the fix and backport is unclear.

Dmitry Pyzhov (dpyzhov)
tags: added: area-python
Dan (enyalius) wrote :

I have experienced the above problem as well. One of my colleagues has had no problems at all with Fuel, while I have been getting this error for the past week and have tried everything I could to find a way around it. I decided to compare the versions we used (fuel --fuel-version). Aside from different hardware, this is the only difference between the two Fuel installs.

(Working Version)
api: '1.0'
astute_sha: 6c5b73f93e24cc781c809db9159927655ced5012
auth_required: true
build_id: '167'
build_number: '167'
feature_groups:
- experimental
fuel-agent_sha: 50e90af6e3d560e9085ff71d2950cfbcca91af67
fuel-library_sha: 5d50055aeca1dd0dc53b43825dc4c8f7780be9dd
fuel-nailgun-agent_sha: d7027952870a35db8dc52f185bb1158cdd3d1ebd
fuel-ostf_sha: 2cd967dccd66cfc3a0abd6af9f31e5b4d150a11c
fuelmain_sha: a65d453215edb0284a2e4761be7a156bb5627677
nailgun_sha: 4162b0c15adb425b37608c787944d1983f543aa8
openstack_version: 2015.1.0-7.0
production: docker
python-fuelclient_sha: 486bde57cda1badb68f915f66c61b544108606f3
release: '7.0'
release_versions:
  2015.1.0-7.0:
    VERSION:
      api: '1.0'
      astute_sha: 6c5b73f93e24cc781c809db9159927655ced5012
      build_id: '167'
      build_number: '167'
      feature_groups:
      - experimental
      fuel-agent_sha: 50e90af6e3d560e9085ff71d2950cfbcca91af67
      fuel-library_sha: 5d50055aeca1dd0dc53b43825dc4c8f7780be9dd
      fuel-nailgun-agent_sha: d7027952870a35db8dc52f185bb1158cdd3d1ebd
      fuel-ostf_sha: 2cd967dccd66cfc3a0abd6af9f31e5b4d150a11c
      fuelmain_sha: a65d453215edb0284a2e4761be7a156bb5627677
      nailgun_sha: 4162b0c15adb425b37608c787944d1983f543aa8
      openstack_version: 2015.1.0-7.0
      production: docker
      python-fuelclient_sha: 486bde57cda1badb68f915f66c61b544108606f3
      release: '7.0'

My version is virtually identical, with the exception of build_id: '165' and build_number: '165'.
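
A small sketch of the comparison described above, assuming both version dumps have already been loaded into flat dictionaries. `diff_versions` is a hypothetical helper, and only a few of the keys from the dump are shown.

```python
def diff_versions(left, right):
    """Return {key: (left_value, right_value)} for keys whose values differ."""
    keys = sorted(set(left) | set(right))
    return {
        key: (left.get(key), right.get(key))
        for key in keys
        if left.get(key) != right.get(key)
    }


# Values taken from the working version above; the failing install differs
# only in the two build fields.
working = {
    "build_id": "167",
    "build_number": "167",
    "release": "7.0",
    "nailgun_sha": "4162b0c15adb425b37608c787944d1983f543aa8",
}
failing = dict(working, build_id="165", build_number="165")

for key, (good, bad) in diff_versions(working, failing).items():
    print(f"{key}: working={good} failing={bad}")
```

Since every component SHA is the same in both installs, a difference in build numbers alone is weak evidence that the build is the cause.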

Hopefully this helps you to reproduce the problem and come up with a fix for it.

Many thanks.

Dan (enyalius) wrote :

As additional information: I have now tried build 195 from https://ci.fuel-infra.org/ (the fuel-infra.org page does not seem to be updated). I got the same problem with build 195. The only thing I can think of is that either I am missing a key step in setting up nailgun that causes this error, or the hardware I am using is not compatible.

My setup involves installing the ISO on a VirtualBox image on a laptop. Once the install is done, I configure Fuel using /usr/local/sbin/bootstrap_admin_node.sh. I then have two networks: eth0 is the statically assigned network that connects to my hardware, and eth1 is my internet connection. Once I have finished configuration, I run fuel-createmirror to prepare for the reduced footprint build as per https://docs.fuel-infra.org/openstack/fuel/fuel-7.0/operations.html#using-the-reduced-footprint-feature

Once that is completed, I log in to the web portal and create the environment using the following options: Kilo, Ubuntu, and KVM (everything else is default). Once the environment is created, I add my nodes, and this is when I see the error mentioned above: "Could not get url from configuration file: No such file or directory - /etc/nailgun-agent/config.yaml, trying other ways.."

Dan (enyalius) wrote :

I believe I found what was causing the problem for me: the network setup in the web interface. The default settings used a 172 address range, which is fine, but it was attempting to use that range untagged over the existing network interface. I had to add a VLAN tag to fix the problem.

I noticed the following warning in the logs under Fuel Master > WebBackend > Warning: "[7f5d9ed52740] (manager) Checking networks failed: Some untagged networks are assigned to the same physical interface. You should assign them to different physical interfaces. Affected: "admin (PXE)", "public" networks at node"

You may want to check to see if this is the same problem that you experienced. Hope this helps.
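
The rule behind the warning quoted above can be sketched as a simple check: no physical interface may carry more than one untagged network. This is an illustration under assumptions, not Nailgun's implementation. The function name, the tuple layout, and VLAN ID 100 are hypothetical, and eth0 is used only because it is the hardware-facing interface mentioned earlier.

```python
from collections import defaultdict


def untagged_conflicts(assignments):
    """Map each interface to its untagged networks when it carries more than one.

    `assignments` is a list of (network, interface, vlan) tuples; vlan is None
    for an untagged network.
    """
    untagged = defaultdict(list)
    for network, interface, vlan in assignments:
        if vlan is None:
            untagged[interface].append(network)
    return {iface: nets for iface, nets in untagged.items() if len(nets) > 1}


# Default layout: both networks untagged on the same interface.
before = [("admin (PXE)", "eth0", None), ("public", "eth0", None)]
# After adding a VLAN tag (hypothetical ID 100) to the public network.
after = [("admin (PXE)", "eth0", None), ("public", "eth0", 100)]

print(untagged_conflicts(before))
print(untagged_conflicts(after))
```

Tagging one of the two networks is enough to clear the conflict in this model, which matches the fix described above.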

Artem Roma (aroma-x) wrote :

No updates for more than 3 weeks, moving to Invalid.

Changed in fuel:
status: Incomplete → Invalid