Fuel for OpenStack

Activity log for bug #1386861

Date	Who	What changed	Old value	New value	Message
2014-10-28 19:53:41	Aleksandr Shaposhnikov	bug			added bug
2014-10-28 20:24:25	Łukasz Oleś	fuel: milestone		6.0
2014-10-28 20:24:34	Łukasz Oleś	fuel: importance	Undecided	High
2014-10-28 20:24:44	Łukasz Oleś	fuel: assignee		Fuel Library Team (fuel-library)
2014-10-28 20:28:58	Roman Alekseenkov	fuel: importance	High	Critical
2014-10-28 20:38:47	Aleksandr Shaposhnikov	description	Original description (issues 1-2, suggested solutions 1-2)	Added suggested solution 3 (enable the gPXE loader to fetch files over HTTP/TCP instead of TFTP/UDP)
2014-10-28 20:54:20	Aleksandr Shaposhnikov	description	Description with suggested solutions 1-3	Added suggested solutions 4 (implement https://blueprints.launchpad.net/fuel/+spec/continue-deployment so deployment can continue even if some compute nodes fail) and 5 (stagger node reboots; depends on 4)
2014-10-28 21:10:30	Aleksandr Shaposhnikov	description	Description with suggested solutions 1-5	
Provisioning almost failed (we had to manually reboot two nodes again). Two nodes booted straight into the previous OS, presumably because they weren't able to obtain an IP address via PXE. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received, and the node was then forcibly rebooted by a command from the master node. It looks like this no longer works. The node didn't get an IP address during PXE, moved on to the next NIC and still couldn't get one (we get two attempts because we have two NICs configured the same way), and then booted from the HDD (SSD).
This bug reveals two issues:
1. The code responsible for wiping the HDD before provisioning doesn't work.
2. The Fuel/MOS master node is not able to provide nodes with IP addresses in time.
Suggested solutions (it would be nice to have them all implemented in some way):
1. Wipe out the disk space. In our situation the boot order is looped, so the node will eventually boot from the NIC.
2. Optimize TFTP/PXE/DHCP traffic priority on the master node, probably with traffic shaping/prioritizing: DHCP high priority; DNS and TFTP medium priority; HTTP and SYSLOG low priority.
3. Enable the gPXE loader to fetch files over HTTP/TCP instead of TFTP/UDP.
4. Implement https://blueprints.launchpad.net/fuel/+spec/continue-deployment so that deployment can continue even if some compute nodes fail.
5. (Depends on 4) Increase the delay between the cobbler commands that restart nodes, to avoid all nodes booting at the same time. Suggested delay: 3-5 seconds between nodes. Increasing it a lot would increase the overall time, so let's try to keep provisioning time reasonable. Best approach: boot up the controllers with a delay of 10 seconds, and once they have started provisioning, reboot the computes one by one. The computes wouldn't be deployed/installed until the controllers are ready anyway. Without option 4 this wouldn't work at all.
P.S. 47 nodes could go without any problems; with 50, problems happen. That is for Ubuntu and provisioning; CentOS looks better. It looks like the bottleneck is somewhere on the master node, because Ubuntu and provisioning have pretty big initrds but CentOS doesn't.
2014-10-29 12:28:09	Bogdan Dobrelya	fuel: status	New	Triaged
2014-10-30 15:14:11	Łukasz Oleś	fuel: status	Triaged	In Progress
2014-10-30 15:14:14	Łukasz Oleś	fuel: assignee	Fuel Library Team (fuel-library)	Łukasz Oleś (loles)
2014-11-27 12:34:40	Tomasz 'Zen' Napierala	fuel: milestone	6.0	6.1
2014-11-27 12:34:45	Tomasz 'Zen' Napierala	fuel: importance	Critical	Medium
2015-01-20 14:10:51	Łukasz Oleś	fuel: assignee	Łukasz Oleś (loles)
2015-01-21 09:25:23	Vladimir Kuklin	fuel: assignee		Fuel Library Team (fuel-library)
2015-02-06 09:42:27	Stanislaw Bogatkin	fuel: status	In Progress	Triaged
2015-03-31 17:09:35	Vladimir Kuklin	fuel: assignee	Fuel Library Team (fuel-library)	Tomasz 'Zen' Napierala (tzn)
2015-04-01 10:20:25	Vladimir Kuklin	nominated for series		fuel/7.0.x
2015-04-01 10:20:25	Vladimir Kuklin	bug task added		fuel/7.0.x
2015-04-01 10:20:25	Vladimir Kuklin	nominated for series		fuel/6.1.x
2015-04-01 10:20:25	Vladimir Kuklin	bug task added		fuel/6.1.x
2015-04-01 10:20:35	Vladimir Kuklin	fuel/6.1.x: status	Triaged	Won't Fix
2015-04-01 10:20:52	Vladimir Kuklin	fuel/7.0.x: milestone		7.0
2015-04-07 12:12:23	Vladimir Kuklin	fuel/7.0.x: assignee		Fuel Library Team (fuel-library)
2015-04-07 12:12:25	Vladimir Kuklin	fuel/7.0.x: importance	Undecided	Critical
2015-04-07 12:12:26	Vladimir Kuklin	fuel/7.0.x: importance	Critical	Medium
2015-04-07 12:12:36	Vladimir Kuklin	fuel/7.0.x: status	New	Confirmed
2015-04-14 12:07:21	Tomasz 'Zen' Napierala	fuel: assignee	Tomasz 'Zen' Napierala (tzn)	Fuel Library Team (fuel-library)
2015-04-14 12:08:42	Tomasz 'Zen' Napierala	fuel: status	Triaged	Won't Fix
2015-04-14 12:08:59	Tomasz 'Zen' Napierala	fuel/6.1.x: assignee	Tomasz 'Zen' Napierala (tzn)	Fuel Library Team (fuel-library)
2015-04-28 10:03:05	OpenStack Infra	fuel: status	Won't Fix	In Progress
2015-04-28 10:03:05	OpenStack Infra	fuel: assignee	Fuel Library Team (fuel-library)	Alexander Evseev (aevseev-h)
2015-04-29 18:33:18	Fuel Devops McRobotson	fuel: milestone	6.1	7.0
2015-07-30 11:07:20	Sergii Golovatiuk	fuel/7.0.x: importance	Medium	Wishlist
2015-07-30 11:07:23	Sergii Golovatiuk	fuel/6.1.x: importance	Medium	Wishlist
2015-07-30 11:08:53	Vladimir Kuklin	fuel/7.0.x: importance	Wishlist	Medium
2015-07-30 11:08:55	Vladimir Kuklin	fuel/6.1.x: importance	Wishlist	Medium
2015-08-03 11:36:38	Vladimir Kuklin	fuel/7.0.x: status	Confirmed	Incomplete
2015-08-18 08:22:28	Dina Belova	fuel/7.0.x: status	Incomplete	Invalid
2015-12-01 20:48:12	Sergii Golovatiuk	nominated for series		fuel/8.0.x
2015-12-01 20:48:12	Sergii Golovatiuk	bug task added		fuel/8.0.x
2015-12-01 20:49:46	Sergii Golovatiuk	fuel/8.0.x: status	Invalid	Confirmed
2015-12-01 20:49:58	Sergii Golovatiuk	fuel/8.0.x: status	Confirmed	Invalid
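Suggested solution 5 in the description amounts to a staggered reboot schedule: controllers about 10 seconds apart, then computes one by one at 3-5 second intervals. Below is a minimal Python sketch of that schedule. It assumes the master node can reboot a single node through some caller-supplied `reboot` callable. That callable is hypothetical, since this log does not say how Fuel or cobbler expose a per-node reboot. The names `RebootStep`, `staggered_schedule` and `run_schedule` are illustrative. The log gives no signal for "once the controllers have started provisioning", so the sketch assumes computes start one controller delay after the last controller unless an explicit offset is passed.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class RebootStep:
    offset_s: float  # seconds after the first reboot command is issued
    node: str        # hypothetical node identifier
    role: str        # "controller" or "compute"


def staggered_schedule(
    controllers: Sequence[str],
    computes: Sequence[str],
    controller_delay_s: float = 10.0,
    compute_delay_s: float = 4.0,  # assumption: middle of the suggested 3-5 s
    computes_start_s: Optional[float] = None,
) -> List[RebootStep]:
    """Space controllers controller_delay_s apart, then computes one by one.

    Assumption: without a signal that the controllers have started
    provisioning, computes begin one controller delay after the last
    controller unless the caller passes computes_start_s explicitly.
    """
    if controller_delay_s < 0 or compute_delay_s < 0:
        raise ValueError("delays must be non-negative")
    steps = [
        RebootStep(i * controller_delay_s, node, "controller")
        for i, node in enumerate(controllers)
    ]
    if computes_start_s is None:
        computes_start_s = len(controllers) * controller_delay_s
    steps += [
        RebootStep(computes_start_s + j * compute_delay_s, node, "compute")
        for j, node in enumerate(computes)
    ]
    return steps


def run_schedule(
    steps: Sequence[RebootStep],
    reboot: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Issue reboots in offset order, sleeping only for the gap between steps."""
    elapsed = 0.0
    for step in sorted(steps, key=lambda s: s.offset_s):
        wait = step.offset_s - elapsed
        if wait > 0:
            sleep(wait)
            elapsed = step.offset_s
        # Hypothetical hook, e.g. a wrapper around the master node's reboot command.
        reboot(step.node)
```

Consider a hypothetical split of 3 controllers and 47 computes, which matches the 50-node case from the P.S., with 4 second spacing. The last compute reboot is issued 30 + 46 × 4 = 214 seconds after the first. Computing the offsets separately from sleeping makes the trade-off the description mentions (per-node delay versus overall provisioning time) visible before anything is rebooted. The injectable `sleep` also lets the schedule be exercised without waiting.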