Comment 5 for bug 1879577

Revision history for this message
Harald Jensås (harald-jensas) wrote :

An observation, introspection starts and nodes seem to boot. Then after 20 minutes introspection is aborted, and retried, probably a timeout? Then we get the errors setting boot device PXE when trying to start introspection again.

*** Introspection start ***

2020-05-19 16:07:22.133 7 INFO ironic_inspector.introspect [-] [node: 5d7a147c-ecce-4262-990e-95b8106281b4 state starting] Introspection started successfully

*** we see successfull DHCP for all the nodes ***

May 19 16:07:34 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:3a:d2:42
May 19 16:07:34 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.100 fa:16:3e:3a:d2:42
May 19 16:07:34 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:3a:d2:42
May 19 16:07:34 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.100 fa:16:3e:3a:d2:42
May 19 16:07:34 dnsmasq-dhcp[8]: DHCPREQUEST(br-ctlplane) 192.168.24.100 fa:16:3e:3a:d2:42
May 19 16:07:34 dnsmasq-dhcp[8]: DHCPACK(br-ctlplane) 192.168.24.100 fa:16:3e:3a:d2:42
May 19 16:08:42 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:88:73:76
May 19 16:08:42 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.101 fa:16:3e:88:73:76
May 19 16:08:42 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:88:73:76
May 19 16:08:42 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.101 fa:16:3e:88:73:76
May 19 16:08:42 dnsmasq-dhcp[8]: DHCPREQUEST(br-ctlplane) 192.168.24.101 fa:16:3e:88:73:76
May 19 16:08:42 dnsmasq-dhcp[8]: DHCPACK(br-ctlplane) 192.168.24.101 fa:16:3e:88:73:76
May 19 16:09:38 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:25:97:6d
May 19 16:09:38 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.102 fa:16:3e:25:97:6d
May 19 16:09:38 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:25:97:6d
May 19 16:09:38 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.102 fa:16:3e:25:97:6d
May 19 16:09:38 dnsmasq-dhcp[8]: DHCPREQUEST(br-ctlplane) 192.168.24.102 fa:16:3e:25:97:6d
May 19 16:09:38 dnsmasq-dhcp[8]: DHCPACK(br-ctlplane) 192.168.24.102 fa:16:3e:25:97:6d
May 19 16:14:13 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:75:51:8a
May 19 16:14:13 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.103 fa:16:3e:75:51:8a
May 19 16:14:13 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:75:51:8a
May 19 16:14:13 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.103 fa:16:3e:75:51:8a
May 19 16:14:13 dnsmasq-dhcp[8]: DHCPREQUEST(br-ctlplane) 192.168.24.103 fa:16:3e:75:51:8a
May 19 16:14:13 dnsmasq-dhcp[8]: DHCPACK(br-ctlplane) 192.168.24.103 fa:16:3e:75:51:8a
May 19 16:20:38 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:3a:d2:42
May 19 16:20:38 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.104 fa:16:3e:3a:d2:42
May 19 16:20:38 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:3a:d2:42
May 19 16:20:38 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.104 fa:16:3e:3a:d2:42
May 19 16:20:38 dnsmasq-dhcp[8]: DHCPREQUEST(br-ctlplane) 192.168.24.104 fa:16:3e:3a:d2:42
May 19 16:20:38 dnsmasq-dhcp[8]: DHCPACK(br-ctlplane) 192.168.24.104 fa:16:3e:3a:d2:42

*** Introspection is aborted 20 minutes later ***

2020-05-19 16:27:21.552 7 DEBUG ironic_inspector.introspect [-] Aborting introspection for node 5d7a147c-ecce-4262-990e-95b8106281b4 abort /usr/lib/python3.6/site-packages/ironic_inspector/introspect.py:152

2020-05-19 16:27:21.750 7 DEBUG ironic_inspector.node_cache [-] [node: 5d7a147c-ecce-4262-990e-95b8106281b4 state error] Committing fields: {'finished_at': datetime.datetime(2020, 5, 19, 16, 27, 21, 734366), 'error': 'Canceled by operator'} _commit /usr/lib/python3.6/site-packages/ironic_inspector/node_cache.py:150

*** It tries to start introspection again, and we get error ***
2020-05-19 16:27:24.358 7 DEBUG ironic_inspector.node_cache [-] [node: 5d7a147c-ecce-4262-990e-95b8106281b4 state error] Executing fsm(error).process_event(start) fsm_event /usr/lib/python3.6/site-packages/ironic_inspector/node_cache.py:207

2020-05-19 16:27:26.955 7 ERROR ironic_inspector.utils [-] [node: 5d7a147c-ecce-4262-990e-95b8106281b4 state starting] Failed to set boot device to PXE: Failed to set boot device for node 5d7a147c-ecce-4262-990e-95b8106281b4: Client Error for url: http://192.168.24.3:6385/v1/nodes/5d7a147c-ecce-4262-990e-95b8106281b4

var/log/containers/ironic/app.log.txt.gz have a Traceback:

2020-05-19 16:27:26.951 31 DEBUG wsme.api [req-d669aeaa-cbb7-4746-a63b-c69f289aa9b3 59d51e34181b4e3d8d06b4a4d5ddbb03 2ac7a7523af84e0e93ff48f6192f7ec0 - default default] Client-side error: Node 5d7a147c-ecce-4262-990e-95b8106281b4 is locked by host undercloud.localdomain, please retry after the current operation is completed.
Traceback (most recent call last):

  File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 241, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python3.6/site-packages/ironic/conductor/manager.py", line 2721, in set_boot_device
    purpose='setting boot device') as task:

  File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 161, in acquire
    return TaskManager(context, *args, **kwargs)

  File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 243, in __init__
    self.release_resources()

  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()

  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value

  File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 224, in __init__
    self._lock()

  File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 280, in _lock
    reserve_node()

  File "/usr/lib/python3.6/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python3.6/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python3.6/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value

  File "/usr/lib/python3.6/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 273, in reserve_node
    self.node_id)

  File "/usr/lib/python3.6/site-packages/ironic/objects/node.py", line 346, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/usr/lib/python3.6/site-packages/oslo_db/api.py", line 154, in wrapper
    ectxt.value = e.inner_exc

  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()

  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value

  File "/usr/lib/python3.6/site-packages/oslo_db/api.py", line 142, in wrapper
    return f(*args, **kwargs)

  File "/usr/lib/python3.6/site-packages/ironic/db/sqlalchemy/api.py", line 461, in reserve_node
    host=node['reservation'])