system dns-modify shortly after controller-0 unlock fails silently

Bug #1812269 reported by Michel Thebeau [WIND]
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
StarlingX
Fix Released
Medium
Ran An

Bug Description

Title
-----
system dns-modify shortly after controller-0 unlock fails silently

Brief Description
-----------------
After installing and unlocking controller-0 for the first time, adding a DNS nameserver soon after the reboot may pass, but the /etc/resolv.conf file is not updated. The command example is:
  system dns-modify nameservers=8.8.8.8

The 250.001 config out-of-date alarm may persist, however an unrelated request may clear the alarm. Though the alarm clears, /etc/resolv.conf remains not updated.

Severity
--------
Minor; this workaround corrects the condition:

# system dns-modify nameservers=
# # wait for 250.001 alarm to clear
# system dns-modify nameservers=8.8.8.8

Steps to Reproduce
------------------
One needs to login quickly and run the following until the system provides credential and until the system accepts the command:

source /etc/nova/openrc
system dns-modify nameservers=8.8.8.8

Expected Behavior
------------------
This command either works (updating /etc/resolv.conf), or is explicitely blocked until the system is ready:
  system dns-modify nameservers=8.8.8.8

Actual Behavior
----------------
The command is allowed, the "system dns-show" command lists the revision, but /etc/resolv.conf is not updated. The 250.001 config out-of-date alarm may either persist or clear due to a different request.

Reproducibility
---------------
Intermittent - it is racey.

System Configuration
--------------------
any probably; standard controller and simplex were observered

Branch/Pull Time/Commit
-----------------------

2019-01-16_20-18-01
starlingx/master

Timestamp/Logs
--------------

PS: Collect content is available upon request.

# system dns-modify nameservers=8.8.8.8
+--------------+--------------------------------------+
| Property | Value |
+--------------+--------------------------------------+
| uuid | 694f6811-920a-4ada-95cd-e2dc62cc8060 |
| nameservers | 8.8.8.8 |
| isystem_uuid | 2fdf7a36-0258-49a0-871c-8d65c12d35a4 |
| created_at | 2019-01-17T17:35:35.862460+00:00 |
| updated_at | 2019-01-17T21:20:12.015650+00:00 |
+--------------+--------------------------------------+

# system show
...
| nameservers | 8.8.8.8 |

# cat /etc/resolv.conf
nameserver 192.168.204.2

# grplist="$( echo "8\.8\.8\.8
config out of date
Raise system config alarm
Clear.*alarm
config_update_hosts
14a7482e-7928-4fe1-976d-13d5a4c762fd
req-a19bf780-af0b-4167-b18f-9dc26af14f19" | sed ':a;N;$!ba;s/\n/\\\|/g' )"

# grep "$grplist" sysinv.log

2019-01-17 19:16:25.864 7253 WARNING sysinv.api.controllers.v1.dns [-] dns {'uuid': u'694f6811-920a-4ada-95cd-e2dc62cc8060', 'nameservers': u'8.8.8.8', 'created_at': datetime.datetime(2019, 1, 17, 17, 35, 35, 862460, tzinfo=<iso8601.Utc>), 'updated_at': None, 'action': None, 'isystem_uuid': u'2fdf7a36-0258-49a0-871c-8d65c12d35a4', 'id': 1, 'forisystemid': 1}
2019-01-17 19:16:26.710 3994 WARNING sysinv.conductor.manager [req-a19bf780-af0b-4167-b18f-9dc26af14f19 admin admin] controller-0: iconfig out of date: target 14a7482e-7928-4fe1-976d-13d5a4c762fd, applied 55302d77-c9e7-4f90-acc3-ab5f14139060
2019-01-17 19:16:26.711 3994 WARNING sysinv.conductor.manager [req-a19bf780-af0b-4167-b18f-9dc26af14f19 admin admin] SYS_I Raise system config alarm: host controller-0 config applied: 55302d77-c9e7-4f90-acc3-ab5f14139060 vs. target: 14a7482e-7928-4fe1-976d-13d5a4c762fd.
2019-01-17 19:16:26.892 3994 INFO sysinv.conductor.manager [req-a19bf780-af0b-4167-b18f-9dc26af14f19 admin admin] _config_update_hosts config_uuid=14a7482e-7928-4fe1-976d-13d5a4c762fd
2019-01-17 19:16:32.044 3994 INFO sysinv.openstack.common.rpc.common [req-a19bf780-af0b-4167-b18f-9dc26af14f19 admin admin] Connected to AMQP server on 192.168.204.2:5672
2019-01-17 19:17:22.886 3994 WARNING sysinv.conductor.manager [req-fb19c223-2b5e-4ffb-b175-228d1b808df8 None None] controller-0: iconfig out of date: target 14a7482e-7928-4fe1-976d-13d5a4c762fd, applied 55302d77-c9e7-4f90-acc3-ab5f14139060
2019-01-17 19:17:22.886 3994 WARNING sysinv.conductor.manager [req-fb19c223-2b5e-4ffb-b175-228d1b808df8 None None] SYS_I Raise system config alarm: host controller-0 config applied: 55302d77-c9e7-4f90-acc3-ab5f14139060 vs. target: 14a7482e-7928-4fe1-976d-13d5a4c762fd.
2019-01-17 20:05:42.912 3994 WARNING sysinv.conductor.manager [req-9f36329d-9f5d-43d1-b2fb-4d9a8a291b14 admin admin] controller-0: iconfig out of date: target 498b290f-1017-4a60-bc6c-ddea21f6350a, applied 55302d77-c9e7-4f90-acc3-ab5f14139060
2019-01-17 20:05:42.912 3994 WARNING sysinv.conductor.manager [req-9f36329d-9f5d-43d1-b2fb-4d9a8a291b14 admin admin] SYS_I Raise system config alarm: host controller-0 config applied: 55302d77-c9e7-4f90-acc3-ab5f14139060 vs. target: 498b290f-1017-4a60-bc6c-ddea21f6350a.
2019-01-17 20:05:42.980 3994 INFO sysinv.conductor.manager [req-9f36329d-9f5d-43d1-b2fb-4d9a8a291b14 admin admin] _config_update_hosts config_uuid=498b290f-1017-4a60-bc6c-ddea21f6350a
2019-01-17 20:05:46.069 3994 WARNING sysinv.conductor.manager [req-9f36329d-9f5d-43d1-b2fb-4d9a8a291b14 admin admin] controller-0: iconfig out of date: target 3643e8b3-610c-4338-bc5c-190c92d947ae, applied 55302d77-c9e7-4f90-acc3-ab5f14139060
2019-01-17 20:05:46.069 3994 WARNING sysinv.conductor.manager [req-9f36329d-9f5d-43d1-b2fb-4d9a8a291b14 admin admin] SYS_I Raise system config alarm: host controller-0 config applied: 55302d77-c9e7-4f90-acc3-ab5f14139060 vs. target: 3643e8b3-610c-4338-bc5c-190c92d947ae.
2019-01-17 20:05:46.126 3994 INFO sysinv.conductor.manager [req-9f36329d-9f5d-43d1-b2fb-4d9a8a291b14 admin admin] _config_update_hosts config_uuid=3643e8b3-610c-4338-bc5c-190c92d947ae
2019-01-17 20:05:46.201 3994 INFO sysinv.conductor.manager [req-9f36329d-9f5d-43d1-b2fb-4d9a8a291b14 admin admin] SYS_I Clear system config alarm: controller-0
2019-01-17 20:11:50.250 3994 INFO sysinv.conductor.manager [req-9f36329d-9f5d-43d1-b2fb-4d9a8a291b14 admin admin] SYS_I Clear system config alarm: controller-0
2019-01-17 20:12:23.842 3994 INFO sysinv.conductor.manager [-] SYS_I Clear system config alarm: controller-0

Ken Young (kenyis)
Changed in starlingx:
assignee: nobody → Cindy Xie (xxie1)
tags: added: six.config
tags: added: stx.config
removed: six.config
Changed in starlingx:
importance: Undecided → Medium
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating - timing issue; would be nice to address as a robustness item in the upcoming release.

Changed in starlingx:
status: New → Triaged
tags: added: stx.2019.03
Ken Young (kenyis)
tags: added: stx.2019.05
removed: stx.2019.03
Ran An (an.ran)
Changed in starlingx:
assignee: Cindy Xie (xxie1) → Ran An (an.ran)
Revision history for this message
Ran An (an.ran) wrote :

I reproduced this issue and find condition "adding a DNS nameserver soon after the reboot" seems to be not necessary.

My questions: could you tell more about the expected behavior?
in my env "nameserver 192.168.204.2" is always the first line in file /etc/resolv.conf no matter what I do. I can't clear it by command "system dns-modify nameservers=". is it by design?

Revision history for this message
Michel Thebeau [WIND] (mthebeau) wrote :

Hi Ran, 192.168.204.2 is the active controller on the management network. It is internal to the cluster and is the internal DNS server. It should remain first, by design yes. If you nslookup any node's name it is served from that internal DNS server. Check "mgmt" within /etc/dnsmasq.conf

And you should be able to do something like this:
$ nslookup compute-0
Server: 192.168.204.2
Address: 192.168.204.2#53

Name: compute-0
Address: 192.168.204.229

Could you elaborate on "seems to be not necessary" in reference to "adding a DNS nameserver soon after the reboot"? When we add a dns server using system dns-modify command, we intend to reference a DNS server external to the cluster. The use of dns-modify command is applicable to the OAM network.

I found these use-cases:
  "a DNS service should be in place to ensure that links to external references in the current and future versions of the Web administration interface work as expected."

  "Before you can use fully qualified domain names (FQDN)" for ntp configuration.

Revision history for this message
Ran An (an.ran) wrote :

Hi Michel, sorry to confuse you. My point is "soon after the reboot" is not a necessary condition, as I first reproduced this issue on an env which had been unlocked for a whole day and some other tests had been done before.

Revision history for this message
Ran An (an.ran) wrote :

by analysis the sysinv.log I found each time "the /etc/resolv.conf file is not updated", there would lack following log:

sysinv.agent.manager [req-**** admin admin] Agent config applied **** <[config_uuid of the unenforced command]>

and it would appear when next "dns-modify" command was inputted. As the command inputted second time worked, the alarm was clear.

INFO sysinv.agent.manager [req-**** admin admin] Agent config applied 07d51e74-7f32-4c81-9f38-909253257c1d
WARNING sysinv.conductor.manager [req-**** admin admin] SYS_I Raise system config alarm: host controller-0 config applied: 07d51e74-7f32-4c81-9f38-909253257c1d vs. target: 7449f769-008d-46af-ae57-2548f56872be.
INFO sysinv.agent.manager [req-**** admin admin] Agent config applied 7449f769-008d-46af-ae57-2548f56872be
INFO sysinv.conductor.manager [req-**** admin admin] SYS_I Clear system config alarm: controller-0

I'll investigate related codes deeply this weekend.
And any ideas may helpful are welcomed. Thanks

Revision history for this message
Michel Thebeau [WIND] (mthebeau) wrote :

Thanks Ran; the revision that the condition is reproducible later on is interesting. I'll share that internally at WR and reference this launchpad in case it leads to more clues.

Revision history for this message
Ran An (an.ran) wrote :
Revision history for this message
Ran An (an.ran) wrote :

Hi, by code investigating and some tests. I found the command "dns-modify" will response after sysinv-db was updated, and file "/etc/resolv" will be updated asynchronously by "agent-manager" runs in another process called "sysinv-agent".

there are two cases that would lead behaviors shown in this issue.

Case 1. "agent-manager" deals with request "iconfig_update_file" from "agent-rpcapi" before it updated its attr "_ihost_personality" from None to controller, which is updated by a periodic_task.
this case may happen in early life of process "sysinv-agent". And it will lead dns-modify failed silently.

Case 2. the "sysinv-agent" process is busy and do not deal with requst "iconfig_update_file" from "agent-rpcapi" in time.
in this case, if we wait for a while (5~10 min), we could find file /etc/resolv.conf was updated.

ps: in both cases, the alarm(250.001) behaviors are correct.

Changed in starlingx:
status: Triaged → Confirmed
Revision history for this message
Ran An (an.ran) wrote :

For the first case, we could
    a)each time "agent-manager" deals with request "iconfig_update_file", make "agent-manager" check its attr "_ihost_personality" first. If the value is None, update it immediately.
 or b)response dns-modify fail and let the user try it later.

For the second case, we could add some notes to command response to the user like: "please check /etc/resolv.conf a few minutes later if it is not changed currently."

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/637459

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/637459
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=53b9e4661561c85aabe802d098e79c1c099e6bec
Submitter: Zuul
Branch: master

commit 53b9e4661561c85aabe802d098e79c1c099e6bec
Author: SidneyAn <email address hidden>
Date: Mon Feb 18 22:03:57 2019 +0800

    retry func iconfig_update_file when host personality is None

    when we run "system dns-modify" command, the command will response after
    sysinv-db was updated, and file "/etc/resolv.conf" will be updated
    asynchronously by another process "sysinv-agent". Once the attr
    "_ihost_personality" of agent is None(initial value), it will not update
    file "/etc/resolv.conf" and will not inform sysinv client also,
    which will lead command dns-modify failed silently.

    This patch will retry function iconfig_update_file by which sysinv-agent
    update file "/etc/resolv.conf" when attr "_ihost_personality" is None.

    Closes-bug: 1812269

    Change-Id: I3a0437750a53607c04932c1b9b818e83903bb28b
    Signed-off-by: SidneyAn <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (f/stein)

Fix proposed to branch: f/stein
Review: https://review.openstack.org/638470

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (f/stein)
Download full text (4.1 KiB)

Reviewed: https://review.openstack.org/638470
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=cb45f9b3bdf68cbd6d6d21ccbe31d279dd4d05d1
Submitter: Zuul
Branch: f/stein

commit 28766a8d43f579fb027f4152c3f6586418e1eb9d
Author: Irina Mihai <email address hidden>
Date: Wed Feb 20 21:11:56 2019 +0000

    Prevent download and creation of default Cirros glance image

    - downloading the Cirros image fails in glance-bootstrap if
      the hardcoded requested image is not found
    - to workaround this issue, we disable the download and creation
      of the Cirros image in glance-bootstrap through the overrides
      -> this has no other impact as the image can be created after
         the chart's installation using "openstack image create"

    Change-Id: I418eb236f5eceb0124eb73787fe12e2f0aa2d9e1
    Closes-Bug: 1814142
    Signed-off-by: Irina Mihai <email address hidden>

commit 53b9e4661561c85aabe802d098e79c1c099e6bec
Author: SidneyAn <email address hidden>
Date: Mon Feb 18 22:03:57 2019 +0800

    retry func iconfig_update_file when host personality is None

    when we run "system dns-modify" command, the command will response after
    sysinv-db was updated, and file "/etc/resolv.conf" will be updated
    asynchronously by another process "sysinv-agent". Once the attr
    "_ihost_personality" of agent is None(initial value), it will not update
    file "/etc/resolv.conf" and will not inform sysinv client also,
    which will lead command dns-modify failed silently.

    This patch will retry function iconfig_update_file by which sysinv-agent
    update file "/etc/resolv.conf" when attr "_ihost_personality" is None.

    Closes-bug: 1812269

    Change-Id: I3a0437750a53607c04932c1b9b818e83903bb28b
    Signed-off-by: SidneyAn <email address hidden>

commit 0ce137a99a5fe04490dc23d2574beb6b1adbf343
Author: Kristine Bujold <email address hidden>
Date: Mon Feb 18 12:56:25 2019 -0500

    Move gnocchi and ceilometer static configs

    Move all gnocchi and ceilometer static configurations from the
    overrides to the Armada manifest.

    This is being done so we have a consistent way of managing
    containerized openstack configurations. Static configurations will
    be located in the Armada manifest and dynamic configuration will be
    located in the overrides files.

    Story: 2003909
    Task: 29535

    Change-Id: Ieab861cb1751146b70f722e70b8f89d81c0ed9a5
    Signed-off-by: Kristine Bujold <email address hidden>

commit 99e86fc151d326f4c26f9005d8cf84028078261c
Author: Kristine Bujold <email address hidden>
Date: Fri Feb 15 15:35:40 2019 -0500

    Move heat static configs to Armada manifest

    Move all heat static configurations from the overrides to the
    Armada manifest.

    This is being done so we have a consistent way of managing
    containerized openstack configurations. Static configurations will
    be located in the Armada manifest and dynamic configuration will be
    located in the overrides files.

    Story: 2003909
    Task: 29455

    Change-Id: Ie35b1696b9fce0458db724fc8163d5d181e0768a
    Sig...

Read more...

tags: added: in-f-stein
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.