Activity log for bug #1386861

Date Who What changed Old value New value Message
2014-10-28 19:53:41 Aleksandr Shaposhnikov bug added bug
2014-10-28 20:24:25 Łukasz Oleś fuel: milestone 6.0
2014-10-28 20:24:34 Łukasz Oleś fuel: importance Undecided High
2014-10-28 20:24:44 Łukasz Oleś fuel: assignee Fuel Library Team (fuel-library)
2014-10-28 20:28:58 Roman Alekseenkov fuel: importance High Critical
2014-10-28 20:38:47 Aleksandr Shaposhnikov description
Old value: Provisioning almost failed (we did manual steps to reboot two nodes again) because 2 nodes weren't able to obtain an IP via PXE (suggestion) because they booted straight to the previous OS. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received and the node was forcibly rebooted by a command from the masternode. It looks like this code didn't work now, because the node didn't get an IP address during PXE, moved on to the next card and still couldn't get one (we have two attempts because we have two NICs tuned the same way), and after that booted from the HDD (SSD). There are two issues that this bug reveals: 1. The code responsible for wiping out the HDD before provisioning doesn't work. 2. The Fuel/MOS masternode is not able to provide nodes with an IP address in time. Suggested solutions (it would be nice to have them all implemented in some way): 1. Wipe out the disk space. In our situation the boot order is looped, so the node will finally boot from the NIC. 2. Optimize TFTP/PXE/DHCP traffic priority on the masternode. Probably do traffic shaping/prioritizing: DHCP is high priority; DNS and TFTP are medium priority; HTTP and SYSLOG are low priority.
New value: Provisioning almost failed (we did manual steps to reboot two nodes again) because 2 nodes weren't able to obtain an IP via PXE (suggestion) because they booted straight to the previous OS. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received and the node was forcibly rebooted by a command from the masternode. It looks like this code didn't work now, because the node didn't get an IP address during PXE, moved on to the next card and still couldn't get one (we have two attempts because we have two NICs tuned the same way), and after that booted from the HDD (SSD). There are two issues that this bug reveals: 1. The code responsible for wiping out the HDD before provisioning doesn't work. 2. The Fuel/MOS masternode is not able to provide nodes with an IP address in time. Suggested solutions (it would be nice to have them all implemented in some way): 1. Wipe out the disk space. In our situation the boot order is looped, so the node will finally boot from the NIC. 2. Optimize TFTP/PXE/DHCP traffic priority on the masternode. Probably do traffic shaping/prioritizing: DHCP is high priority; DNS and TFTP are medium priority; HTTP and SYSLOG are low priority. 3. Enable the gPXE loader to fetch files using HTTP/TCP instead of TFTP/UDP.
2014-10-28 20:54:20 Aleksandr Shaposhnikov description
Old value: Provisioning almost failed (we did manual steps to reboot two nodes again) because 2 nodes weren't able to obtain an IP via PXE (suggestion) because they booted straight to the previous OS. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received and the node was forcibly rebooted by a command from the masternode. It looks like this code didn't work now, because the node didn't get an IP address during PXE, moved on to the next card and still couldn't get one (we have two attempts because we have two NICs tuned the same way), and after that booted from the HDD (SSD). There are two issues that this bug reveals: 1. The code responsible for wiping out the HDD before provisioning doesn't work. 2. The Fuel/MOS masternode is not able to provide nodes with an IP address in time. Suggested solutions (it would be nice to have them all implemented in some way): 1. Wipe out the disk space. In our situation the boot order is looped, so the node will finally boot from the NIC. 2. Optimize TFTP/PXE/DHCP traffic priority on the masternode. Probably do traffic shaping/prioritizing: DHCP is high priority; DNS and TFTP are medium priority; HTTP and SYSLOG are low priority. 3. Enable the gPXE loader to fetch files using HTTP/TCP instead of TFTP/UDP.
New value: Provisioning almost failed (we did manual steps to reboot two nodes again) because 2 nodes weren't able to obtain an IP via PXE (suggestion) because they booted straight to the previous OS. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received and the node was forcibly rebooted by a command from the masternode. It looks like this code didn't work now, because the node didn't get an IP address during PXE, moved on to the next card and still couldn't get one (we have two attempts because we have two NICs tuned the same way), and after that booted from the HDD (SSD). There are two issues that this bug reveals: 1. The code responsible for wiping out the HDD before provisioning doesn't work. 2. The Fuel/MOS masternode is not able to provide nodes with an IP address in time. Suggested solutions (it would be nice to have them all implemented in some way): 1. Wipe out the disk space. In our situation the boot order is looped, so the node will finally boot from the NIC. 2. Optimize TFTP/PXE/DHCP traffic priority on the masternode. Probably do traffic shaping/prioritizing: DHCP is high priority; DNS and TFTP are medium priority; HTTP and SYSLOG are low priority. 3. Enable the gPXE loader to fetch files using HTTP/TCP instead of TFTP/UDP. 4. Implement https://blueprints.launchpad.net/fuel/+spec/continue-deployment to be able to continue deployment even if some compute nodes failed. 5. (Depends on 4) Increase the delay between the cobbler commands that restart nodes, to avoid all the nodes booting at the same time. Suggested delay: 3-5 seconds between nodes. Increasing it a lot will increase the overall time, so let's try to keep provisioning time reasonable. Best approach: boot up the controllers with a delay of 10 seconds, and once they have started to provision, reboot the computes one by one, because they wouldn't be deployed/installed until the controllers are ready. Without the fourth item this wouldn't work at all.
2014-10-28 21:10:30 Aleksandr Shaposhnikov description
Old value: Provisioning almost failed (we did manual steps to reboot two nodes again) because 2 nodes weren't able to obtain an IP via PXE (suggestion) because they booted straight to the previous OS. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received and the node was forcibly rebooted by a command from the masternode. It looks like this code didn't work now, because the node didn't get an IP address during PXE, moved on to the next card and still couldn't get one (we have two attempts because we have two NICs tuned the same way), and after that booted from the HDD (SSD). There are two issues that this bug reveals: 1. The code responsible for wiping out the HDD before provisioning doesn't work. 2. The Fuel/MOS masternode is not able to provide nodes with an IP address in time. Suggested solutions (it would be nice to have them all implemented in some way): 1. Wipe out the disk space. In our situation the boot order is looped, so the node will finally boot from the NIC. 2. Optimize TFTP/PXE/DHCP traffic priority on the masternode. Probably do traffic shaping/prioritizing: DHCP is high priority; DNS and TFTP are medium priority; HTTP and SYSLOG are low priority. 3. Enable the gPXE loader to fetch files using HTTP/TCP instead of TFTP/UDP. 4. Implement https://blueprints.launchpad.net/fuel/+spec/continue-deployment to be able to continue deployment even if some compute nodes failed. 5. (Depends on 4) Increase the delay between the cobbler commands that restart nodes, to avoid all the nodes booting at the same time. Suggested delay: 3-5 seconds between nodes. Increasing it a lot will increase the overall time, so let's try to keep provisioning time reasonable. Best approach: boot up the controllers with a delay of 10 seconds, and once they have started to provision, reboot the computes one by one, because they wouldn't be deployed/installed until the controllers are ready. Without the fourth item this wouldn't work at all.
New value: Provisioning almost failed (we did manual steps to reboot two nodes again) because 2 nodes weren't able to obtain an IP via PXE (suggestion) because they booted straight to the previous OS. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received and the node was forcibly rebooted by a command from the masternode. It looks like this code didn't work now, because the node didn't get an IP address during PXE, moved on to the next card and still couldn't get one (we have two attempts because we have two NICs tuned the same way), and after that booted from the HDD (SSD). There are two issues that this bug reveals: 1. The code responsible for wiping out the HDD before provisioning doesn't work. 2. The Fuel/MOS masternode is not able to provide nodes with an IP address in time. Suggested solutions (it would be nice to have them all implemented in some way): 1. Wipe out the disk space. In our situation the boot order is looped, so the node will finally boot from the NIC. 2. Optimize TFTP/PXE/DHCP traffic priority on the masternode. Probably do traffic shaping/prioritizing: DHCP is high priority; DNS and TFTP are medium priority; HTTP and SYSLOG are low priority. 3. Enable the gPXE loader to fetch files using HTTP/TCP instead of TFTP/UDP. 4. Implement https://blueprints.launchpad.net/fuel/+spec/continue-deployment to be able to continue deployment even if some compute nodes failed. 5. (Depends on 4) Increase the delay between the cobbler commands that restart nodes, to avoid all the nodes booting at the same time. Suggested delay: 3-5 seconds between nodes. Increasing it a lot will increase the overall time, so let's try to keep provisioning time reasonable. Best approach: boot up the controllers with a delay of 10 seconds, and once they have started to provision, reboot the computes one by one, because they wouldn't be deployed/installed until the controllers are ready. Without the fourth item this wouldn't work at all. P.S. 47 nodes could go without any problems; with 50, problems happen. But that is for Ubuntu and provisioning; CentOS looks better. It looks like the bottleneck is somewhere on the masternode, because Ubuntu and provisioning have pretty big initrds but CentOS doesn't.
2014-10-29 12:28:09 Bogdan Dobrelya fuel: status New Triaged
2014-10-30 15:14:11 Łukasz Oleś fuel: status Triaged In Progress
2014-10-30 15:14:14 Łukasz Oleś fuel: assignee Fuel Library Team (fuel-library) Łukasz Oleś (loles)
2014-11-27 12:34:40 Tomasz 'Zen' Napierala fuel: milestone 6.0 6.1
2014-11-27 12:34:45 Tomasz 'Zen' Napierala fuel: importance Critical Medium
2015-01-20 14:10:51 Łukasz Oleś fuel: assignee Łukasz Oleś (loles)
2015-01-21 09:25:23 Vladimir Kuklin fuel: assignee Fuel Library Team (fuel-library)
2015-02-06 09:42:27 Stanislaw Bogatkin fuel: status In Progress Triaged
2015-03-31 17:09:35 Vladimir Kuklin fuel: assignee Fuel Library Team (fuel-library) Tomasz 'Zen' Napierala (tzn)
2015-04-01 10:20:25 Vladimir Kuklin nominated for series fuel/7.0.x
2015-04-01 10:20:25 Vladimir Kuklin bug task added fuel/7.0.x
2015-04-01 10:20:25 Vladimir Kuklin nominated for series fuel/6.1.x
2015-04-01 10:20:25 Vladimir Kuklin bug task added fuel/6.1.x
2015-04-01 10:20:35 Vladimir Kuklin fuel/6.1.x: status Triaged Won't Fix
2015-04-01 10:20:52 Vladimir Kuklin fuel/7.0.x: milestone 7.0
2015-04-07 12:12:23 Vladimir Kuklin fuel/7.0.x: assignee Fuel Library Team (fuel-library)
2015-04-07 12:12:25 Vladimir Kuklin fuel/7.0.x: importance Undecided Critical
2015-04-07 12:12:26 Vladimir Kuklin fuel/7.0.x: importance Critical Medium
2015-04-07 12:12:36 Vladimir Kuklin fuel/7.0.x: status New Confirmed
2015-04-14 12:07:21 Tomasz 'Zen' Napierala fuel: assignee Tomasz 'Zen' Napierala (tzn) Fuel Library Team (fuel-library)
2015-04-14 12:08:42 Tomasz 'Zen' Napierala fuel: status Triaged Won't Fix
2015-04-14 12:08:59 Tomasz 'Zen' Napierala fuel/6.1.x: assignee Tomasz 'Zen' Napierala (tzn) Fuel Library Team (fuel-library)
2015-04-28 10:03:05 OpenStack Infra fuel: status Won't Fix In Progress
2015-04-28 10:03:05 OpenStack Infra fuel: assignee Fuel Library Team (fuel-library) Alexander Evseev (aevseev-h)
2015-04-29 18:33:18 Fuel Devops McRobotson fuel: milestone 6.1 7.0
2015-07-30 11:07:20 Sergii Golovatiuk fuel/7.0.x: importance Medium Wishlist
2015-07-30 11:07:23 Sergii Golovatiuk fuel/6.1.x: importance Medium Wishlist
2015-07-30 11:08:53 Vladimir Kuklin fuel/7.0.x: importance Wishlist Medium
2015-07-30 11:08:55 Vladimir Kuklin fuel/6.1.x: importance Wishlist Medium
2015-08-03 11:36:38 Vladimir Kuklin fuel/7.0.x: status Confirmed Incomplete
2015-08-18 08:22:28 Dina Belova fuel/7.0.x: status Incomplete Invalid
2015-12-01 20:48:12 Sergii Golovatiuk nominated for series fuel/8.0.x
2015-12-01 20:48:12 Sergii Golovatiuk bug task added fuel/8.0.x
2015-12-01 20:49:46 Sergii Golovatiuk fuel/8.0.x: status Invalid Confirmed
2015-12-01 20:49:58 Sergii Golovatiuk fuel/8.0.x: status Confirmed Invalid