Juno release - Openstack
Ironic Juno - 2014.2 tar ball
Env:
one controller node - all openstack controller services and ironic-api service
one network node
one compute node - nova-compute service and ironic-conductor service with tftp service
2 ironic conductor nodes - run only ironc-conductor services with tftp services
[ totally one nova-compute service and 3 ironic-conductor services]
1) Launched an instance on a node
2) got scheduled to one conductor
3) the conductor deployed deploy ramdisk/kernel and copied the user image perfectly
4) the conductor finished deploying saying "Deploying to node ... done.
5) the node got rebooted.
6) Now, another conductor reserved it and started sending it deploy ramdisk/kernel and fails to do tgt because it was not able to get the token (token not found server error for tgt). But, as the node got second rebooted, nova instance was set to Active state. But, ssh will not work. [it gets stuck at the where it tries to get authorized from keystone but does not send enough info and results in authentication required error ]
The same behavior can be observed sometimes(inconsistently) if a successfully launched node is either rebooted from its command line or soft rebooted from nova command. The node becomes unreachable.
Another interesting observation is the server-ip-address and tftp-server keeps changing in the neutron dnsmasq's opts file. Opts file details are provided at the end of this report. The same port f6f5db91-a621-443b-b307-915eb00df8fa gets tftp-serve and server-ip-address as 10.10.44.17 and 10.10.44.18 (these are where ironic-conductors are running with their tftp service)
Looks like there is a race among multiple conductor services.
ironic-conductor log - that successfully launched till second reboot:
2014-11-28 18:05:09.145 4178 DEBUG ironic.common.utils [-] Execution completed, command line is "parted -a optimal -s /dev/disk/by-path/ip-10.10.44.52:3260-iscsi-iqn-14131dce-a5dd-489e-84f6-f9b1b8b56800-lun-1 -- unit MiB mklabel msdos mkpart primary 1 61441" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:84
2014-11-28 18:05:09.153 4178 DEBUG ironic.common.utils [-] Command stdout is: "" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:85
2014-11-28 18:05:09.157 4178 DEBUG ironic.common.utils [-] Command stderr is: "" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:86
2014-11-28 18:05:09.622 4178 DEBUG ironic.common.utils [-] Execution completed, command line is "fuser /dev/disk/by-path/ip-10.10.44.52:3260-iscsi-iqn-14131dce-a5dd-489e-84f6-f9b1b8b56800-lun-1" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:84
2014-11-28 18:05:09.630 4178 DEBUG ironic.common.utils [-] Command stdout is: "" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:85
2014-11-28 18:05:09.633 4178 DEBUG ironic.common.utils [-] Command stderr is: "" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:86
2014-11-28 18:09:33.690 4178 DEBUG ironic.common.utils [-] Execution completed, command line is "dd if=/var/lib/ironic/images/14131dce-a5dd-489e-84f6-f9b1b8b56800/disk of=/dev/disk/by-path/ip-10.10.44.52:3260-iscsi-iqn-14131dce-a5dd-489e-84f6-f9b1b8b56800-lun-1-part1 bs=1M oflag=direct" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:84
2014-11-28 18:09:33.700 4178 DEBUG ironic.common.utils [-] Command stdout is: "" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:85
2014-11-28 18:09:33.701 4178 DEBUG ironic.common.utils [-] Command stderr is: "1375+1 records in
1375+1 records out
1441923072 bytes (1.4 GB) copied, 263.634 s, 5.5 MB/s
" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:86
2014-11-28 18:09:33.846 4178 DEBUG ironic.common.utils [-] Execution completed, command line is "blkid -s UUID -o value /dev/disk/by-path/ip-10.10.44.52:3260-iscsi-iqn-14131dce-a5dd-489e-84f6-f9b1b8b56800-lun-1-part1" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:84
2014-11-28 18:09:33.847 4178 DEBUG ironic.common.utils [-] Command stdout is: "1a2c7ada-adc0-4765-b205-ae525d501fad
" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:85
2014-11-28 18:09:33.847 4178 DEBUG ironic.common.utils [-] Command stderr is: "" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:86
2014-11-28 18:09:34.437 4178 DEBUG ironic.common.utils [-] Execution completed, command line is "iscsiadm -m node -p 10.10.44.52:3260 -T iqn-14131dce-a5dd-489e-84f6-f9b1b8b56800 --logout" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:84
2014-11-28 18:09:34.438 4178 DEBUG ironic.common.utils [-] Command stdout is: "Logging out of session [sid: 8, target: iqn-14131dce-a5dd-489e-84f6-f9b1b8b56800, portal: 10.10.44.52,3260]
Logout of [sid: 8, target: iqn-14131dce-a5dd-489e-84f6-f9b1b8b56800, portal: 10.10.44.52,3260] successful.
" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:85
2014-11-28 18:09:34.439 4178 DEBUG ironic.common.utils [-] Command stderr is: "" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:86
2014-11-28 18:09:34.502 4178 DEBUG ironic.common.utils [-] Execution completed, command line is "iscsiadm -m node -p 10.10.44.52:3260 -T iqn-14131dce-a5dd-489e-84f6-f9b1b8b56800 -o delete" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:84
2014-11-28 18:09:34.503 4178 DEBUG ironic.common.utils [-] Command stdout is: "" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:85
2014-11-28 18:09:34.503 4178 DEBUG ironic.common.utils [-] Command stderr is: "" execute /usr/local/lib/python2.7/dist-packages/ironic/common/utils.py:86
2014-11-28 18:09:34.505 4178 DEBUG ironic.drivers.modules.image_cache [-] Starting clean up for master image cache /var/lib/ironic/master_images clean_up /usr/local/lib/python2.7/dist-packages/ironic/drivers/modules/image_cache.py:169
2014-11-28 18:09:37.509 4178 INFO ironic.drivers.modules.pxe [-] Deployment to node 14131dce-a5dd-489e-84f6-f9b1b8b56800 done
ironic-conductor 2 log that took over the same node right after the above conductor rebooted it:
2014-11-28 18:10:41.342 4409 DEBUG ironic.conductor.manager [-] Conductor trustycompute2.example.com taking over node 14131dce-a5dd-489e-84f6-f9b1b8b56800 _do_takeover /usr/local/lib/python2.7/dist-packages/ironic/conductor/manager.py:878
2014-11-28 18:10:41.356 4409 DEBUG ironic.common.pxe_utils [-] Building PXE config for node 14131dce-a5dd-489e-84f6-f9b1b8b56800 create_pxe_config /usr/local/lib/python2.7/dist-packages/ironic/common/pxe_utils.py:182
2014-11-28 18:10:41.366 4409 DEBUG ironic.drivers.modules.pxe [-] Fetching kernel and ramdisk for node 14131dce-a5dd-489e-84f6-f9b1b8b56800 _cache_ramdisk_kernel /usr/local/lib/python2.7/dist-packages/ironic/drivers/modules/pxe.py:212
2014-11-28 18:10:41.367 4409 DEBUG ironic.common.glance_service.base_image_service [-] Getting image metadata from glance. Image: e4e64019-0827-4ee1-a66f-b04e7b84f357 _show /usr/local/lib/python2.7/dist-packages/ironic/common/glance_service/base_image_service.py:186
2014-11-28 18:10:41.370 4409 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.198.10
2014-11-28 18:10:41.372 4409 DEBUG urllib3.connectionpool [-] Setting read timeout to <object object at 0x7fbbda5db0c0> _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:375
2014-11-28 18:10:41.493 4409 DEBUG urllib3.connectionpool [-] "HEAD /v1/images/e4e64019-0827-4ee1-a66f-b04e7b84f357 HTTP/1.1" 200 0 _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:415
2014-11-28 18:10:41.497 4409 DEBUG urllib3.connectionpool [-] Setting read timeout to <object object at 0x7fbbda5db0c0> _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:375
2014-11-28 18:10:41.549 4409 DEBUG urllib3.connectionpool [-] "HEAD /v1/images/e4e64019-0827-4ee1-a66f-b04e7b84f357 HTTP/1.1" 200 0 _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:415
2014-11-28 18:10:41.552 4409 DEBUG ironic.common.glance_service.base_image_service [-] Getting image metadata from glance. Image: 81fcbc70-457d-46df-9f4f-5bf32483a36d _show /usr/local/lib/python2.7/dist-packages/ironic/common/glance_service/base_image_service.py:186
2014-11-28 18:10:41.554 4409 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.198.10
2014-11-28 18:10:41.557 4409 DEBUG urllib3.connectionpool [-] Setting read timeout to <object object at 0x7fbbda5db0c0> _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:375
2014-11-28 18:10:41.620 4409 DEBUG urllib3.connectionpool [-] "HEAD /v1/images/81fcbc70-457d-46df-9f4f-5bf32483a36d HTTP/1.1" 200 0 _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:415
2014-11-28 18:10:41.624 4409 DEBUG urllib3.connectionpool [-] Setting read timeout to <object object at 0x7fbbda5db0c0> _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:375
2014-11-28 18:10:41.650 4409 DEBUG urllib3.connectionpool [-] "HEAD /v1/images/81fcbc70-457d-46df-9f4f-5bf32483a36d HTTP/1.1" 200 0 _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:415
2014-11-28 18:10:41.654 4409 DEBUG ironic.common.glance_service.base_image_service [-] Getting image metadata from glance. Image: 6b180bf0-aac6-4fa5-bdd7-86ac96e077e8 _show /usr/local/lib/python2.7/dist-packages/ironic/common/glance_service/base_image_service.py:186
Network node:
neutron dnsmasq's options file:
cat /var/lib/neutron/dhcp/3e1b6943-8639-48f3-ae16-0828a7daa9cb/opts
tag:tag0,option:classless-static-route,169.254.169.254/32,10.10.44.31,0.0.0.0/0,10.10.44.15
tag:tag0,249,169.254.169.254/32,10.10.44.31,0.0.0.0/0,10.10.44.15
tag:tag0,option:router,10.10.44.15
tag:b19eea1a-6182-4c8f-ab7d-598b76f61106,option:bootfile-name,pxelinux.0
tag:b19eea1a-6182-4c8f-ab7d-598b76f61106,option:server-ip-address,10.10.44.17
tag:b19eea1a-6182-4c8f-ab7d-598b76f61106,option:tftp-server,10.10.44.17
tag:6b896b2f-2c44-46b0-a4d5-b16de2375f77,option:tftp-server,10.10.44.17
tag:6b896b2f-2c44-46b0-a4d5-b16de2375f77,option:bootfile-name,pxelinux.0
tag:6b896b2f-2c44-46b0-a4d5-b16de2375f77,option:server-ip-address,10.10.44.17
tag:f6f5db91-a621-443b-b307-915eb00df8fa,option:bootfile-name,pxelinux.0
tag:f6f5db91-a621-443b-b307-915eb00df8fa,option:server-ip-address,10.10.44.18
tag:f6f5db91-a621-443b-b307-915eb00df8fa,option:tftp-server,10.10.44.18
tag:tag0,option:dns-server,10.10.44.31
cat /var/lib/neutron/dhcp/3e1b6943-8639-48f3-ae16-0828a7daa9cb/opts
tag:tag0,option:classless-static-route,169.254.169.254/32,10.10.44.31,0.0.0.0/0,10.10.44.15
tag:tag0,249,169.254.169.254/32,10.10.44.31,0.0.0.0/0,10.10.44.15
tag:tag0,option:router,10.10.44.15
tag:b19eea1a-6182-4c8f-ab7d-598b76f61106,option:bootfile-name,pxelinux.0
tag:b19eea1a-6182-4c8f-ab7d-598b76f61106,option:server-ip-address,10.10.44.17
tag:b19eea1a-6182-4c8f-ab7d-598b76f61106,option:tftp-server,10.10.44.17
tag:6b896b2f-2c44-46b0-a4d5-b16de2375f77,option:tftp-server,10.10.44.17
tag:6b896b2f-2c44-46b0-a4d5-b16de2375f77,option:bootfile-name,pxelinux.0
tag:6b896b2f-2c44-46b0-a4d5-b16de2375f77,option:server-ip-address,10.10.44.17
tag:f6f5db91-a621-443b-b307-915eb00df8fa,option:bootfile-name,pxelinux.0
tag:f6f5db91-a621-443b-b307-915eb00df8fa,option:server-ip-address,10.10.44.17
tag:f6f5db91-a621-443b-b307-915eb00df8fa,option:tftp-server,10.10.44.17
tag:tag0,option:dns-server,10.10.44.31
Is this still an active bug?