2014-10-28 19:53:41 |
Aleksandr Shaposhnikov |
bug |
|
|
added bug |
2014-10-28 20:24:25 |
Łukasz Oleś |
fuel: milestone |
|
6.0 |
|
2014-10-28 20:24:34 |
Łukasz Oleś |
fuel: importance |
Undecided |
High |
|
2014-10-28 20:24:44 |
Łukasz Oleś |
fuel: assignee |
|
Fuel Library Team (fuel-library) |
|
2014-10-28 20:28:58 |
Roman Alekseenkov |
fuel: importance |
High |
Critical |
|
2014-10-28 20:38:47 |
Aleksandr Shaposhnikov |
description |
Provisioning almost failed (we had to reboot two nodes manually) because two nodes booted straight into the previous OS; our guess is that they were unable to obtain an IP address via PXE. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received, and the node was then forcibly rebooted by a command from the master node. It looks like this code did not work this time: the node did not get an IP address during PXE, moved on to the next NIC, failed again (we get two attempts because both NICs are configured the same way), and then booted from the HDD (SSD).
There are two issues that this bug reveals:
1. The code responsible for wiping the HDD before provisioning does not work.
2. The Fuel/MOS master node is not able to provide nodes with IP addresses in time.
Suggested solutions (it would be nice to have all of them implemented in some way):
1. Wipe the disk. In our setup the boot order loops, so the node will eventually boot from the NIC.
2. Optimize TFTP/PXE/DHCP traffic priority on the master node, probably with traffic shaping/prioritization: DHCP at high priority; DNS and TFTP at medium priority; HTTP and syslog at low priority. |
Provisioning almost failed (we had to reboot two nodes manually) because two nodes booted straight into the previous OS; our guess is that they were unable to obtain an IP address via PXE. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received, and the node was then forcibly rebooted by a command from the master node. It looks like this code did not work this time: the node did not get an IP address during PXE, moved on to the next NIC, failed again (we get two attempts because both NICs are configured the same way), and then booted from the HDD (SSD).
There are two issues that this bug reveals:
1. The code responsible for wiping the HDD before provisioning does not work.
2. The Fuel/MOS master node is not able to provide nodes with IP addresses in time.
Suggested solutions (it would be nice to have all of them implemented in some way):
1. Wipe the disk. In our setup the boot order loops, so the node will eventually boot from the NIC.
2. Optimize TFTP/PXE/DHCP traffic priority on the master node, probably with traffic shaping/prioritization: DHCP at high priority; DNS and TFTP at medium priority; HTTP and syslog at low priority.
3. Enable the gPXE loader to fetch files over HTTP/TCP instead of TFTP/UDP. |
|
2014-10-28 20:54:20 |
Aleksandr Shaposhnikov |
description |
Provisioning almost failed (we had to reboot two nodes manually) because two nodes booted straight into the previous OS; our guess is that they were unable to obtain an IP address via PXE. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received, and the node was then forcibly rebooted by a command from the master node. It looks like this code did not work this time: the node did not get an IP address during PXE, moved on to the next NIC, failed again (we get two attempts because both NICs are configured the same way), and then booted from the HDD (SSD).
There are two issues that this bug reveals:
1. The code responsible for wiping the HDD before provisioning does not work.
2. The Fuel/MOS master node is not able to provide nodes with IP addresses in time.
Suggested solutions (it would be nice to have all of them implemented in some way):
1. Wipe the disk. In our setup the boot order loops, so the node will eventually boot from the NIC.
2. Optimize TFTP/PXE/DHCP traffic priority on the master node, probably with traffic shaping/prioritization: DHCP at high priority; DNS and TFTP at medium priority; HTTP and syslog at low priority.
3. Enable the gPXE loader to fetch files over HTTP/TCP instead of TFTP/UDP. |
Provisioning almost failed (we had to reboot two nodes manually) because two nodes booted straight into the previous OS; our guess is that they were unable to obtain an IP address via PXE. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received, and the node was then forcibly rebooted by a command from the master node. It looks like this code did not work this time: the node did not get an IP address during PXE, moved on to the next NIC, failed again (we get two attempts because both NICs are configured the same way), and then booted from the HDD (SSD).
There are two issues that this bug reveals:
1. The code responsible for wiping the HDD before provisioning does not work.
2. The Fuel/MOS master node is not able to provide nodes with IP addresses in time.
Suggested solutions (it would be nice to have all of them implemented in some way):
1. Wipe the disk. In our setup the boot order loops, so the node will eventually boot from the NIC.
2. Optimize TFTP/PXE/DHCP traffic priority on the master node, probably with traffic shaping/prioritization: DHCP at high priority; DNS and TFTP at medium priority; HTTP and syslog at low priority.
3. Enable the gPXE loader to fetch files over HTTP/TCP instead of TFTP/UDP.
4. Implement https://blueprints.launchpad.net/fuel/+spec/continue-deployment so that deployment can continue even if some compute nodes fail.
5. (Depends on 4) Increase the delay between the Cobbler commands that restart nodes, to avoid all nodes booting at the same time. Suggested delay: 3-5 seconds between nodes. A much larger delay would increase the overall time, so let's try to keep provisioning time reasonable. Best approach: boot the controllers 10 seconds apart, and once they have started provisioning, reboot the computes one by one, because the computes will not be deployed/installed until the controllers are ready. Without solution 4 this would not work at all. |
|
2014-10-28 21:10:30 |
Aleksandr Shaposhnikov |
description |
Provisioning almost failed (we had to reboot two nodes manually) because two nodes booted straight into the previous OS; our guess is that they were unable to obtain an IP address via PXE. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received, and the node was then forcibly rebooted by a command from the master node. It looks like this code did not work this time: the node did not get an IP address during PXE, moved on to the next NIC, failed again (we get two attempts because both NICs are configured the same way), and then booted from the HDD (SSD).
There are two issues that this bug reveals:
1. The code responsible for wiping the HDD before provisioning does not work.
2. The Fuel/MOS master node is not able to provide nodes with IP addresses in time.
Suggested solutions (it would be nice to have all of them implemented in some way):
1. Wipe the disk. In our setup the boot order loops, so the node will eventually boot from the NIC.
2. Optimize TFTP/PXE/DHCP traffic priority on the master node, probably with traffic shaping/prioritization: DHCP at high priority; DNS and TFTP at medium priority; HTTP and syslog at low priority.
3. Enable the gPXE loader to fetch files over HTTP/TCP instead of TFTP/UDP.
4. Implement https://blueprints.launchpad.net/fuel/+spec/continue-deployment so that deployment can continue even if some compute nodes fail.
5. (Depends on 4) Increase the delay between the Cobbler commands that restart nodes, to avoid all nodes booting at the same time. Suggested delay: 3-5 seconds between nodes. A much larger delay would increase the overall time, so let's try to keep provisioning time reasonable. Best approach: boot the controllers 10 seconds apart, and once they have started provisioning, reboot the computes one by one, because the computes will not be deployed/installed until the controllers are ready. Without solution 4 this would not work at all. |
Provisioning almost failed (we had to reboot two nodes manually) because two nodes booted straight into the previous OS; our guess is that they were unable to obtain an IP address via PXE. As far as I know, Fuel previously erased the first sectors of the HDDs when the "deploy" command was received, and the node was then forcibly rebooted by a command from the master node. It looks like this code did not work this time: the node did not get an IP address during PXE, moved on to the next NIC, failed again (we get two attempts because both NICs are configured the same way), and then booted from the HDD (SSD).
There are two issues that this bug reveals:
1. The code responsible for wiping the HDD before provisioning does not work.
2. The Fuel/MOS master node is not able to provide nodes with IP addresses in time.
Suggested solutions (it would be nice to have all of them implemented in some way):
1. Wipe the disk. In our setup the boot order loops, so the node will eventually boot from the NIC.
2. Optimize TFTP/PXE/DHCP traffic priority on the master node, probably with traffic shaping/prioritization: DHCP at high priority; DNS and TFTP at medium priority; HTTP and syslog at low priority.
3. Enable the gPXE loader to fetch files over HTTP/TCP instead of TFTP/UDP.
4. Implement https://blueprints.launchpad.net/fuel/+spec/continue-deployment so that deployment can continue even if some compute nodes fail.
5. (Depends on 4) Increase the delay between the Cobbler commands that restart nodes, to avoid all nodes booting at the same time. Suggested delay: 3-5 seconds between nodes. A much larger delay would increase the overall time, so let's try to keep provisioning time reasonable. Best approach: boot the controllers 10 seconds apart, and once they have started provisioning, reboot the computes one by one, because the computes will not be deployed/installed until the controllers are ready. Without solution 4 this would not work at all.
P.S. 47 nodes deploy without any problems; with 50 nodes, problems appear. This applies to Ubuntu and provisioning; CentOS looks better. The bottleneck seems to be somewhere on the master node, because Ubuntu and provisioning use fairly large initrds, while CentOS does not. |
|
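The staggered restart proposed in suggestion 5 of the description above can be sketched in Python. This is only an illustration under stated assumptions: the function name `build_reboot_schedule`, the node names, and the fixed 4-second compute gap (picked from the suggested 3-5 second range) are hypothetical and are not part of Fuel or Cobbler. The sketch also starts the computes one gap after the last controller instead of detecting when the controllers have actually begun provisioning.

```python
from typing import List, Tuple


def build_reboot_schedule(
    controllers: List[str],
    computes: List[str],
    controller_gap: float = 10.0,  # seconds between controller reboots (from the bug text)
    compute_gap: float = 4.0,      # assumed value inside the suggested 3-5 second range
) -> List[Tuple[float, str]]:
    """Return (offset_seconds, node_name) pairs for staggered reboots.

    Controllers are scheduled first, controller_gap seconds apart.
    Computes follow one by one, compute_gap seconds apart, so that
    the master node never has to serve every PXE client at once.
    """
    schedule: List[Tuple[float, str]] = []
    offset = 0.0
    for name in controllers:
        schedule.append((offset, name))
        offset += controller_gap
    for name in computes:
        schedule.append((offset, name))
        offset += compute_gap
    return schedule
```

A caller would iterate over the returned pairs, wait until each offset has elapsed, and then issue whatever restart command the deployment uses for that node. Keeping the schedule as plain data makes it easy to check how much the chosen gaps add to the total provisioning time before anything is rebooted.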
2014-10-29 12:28:09 |
Bogdan Dobrelya |
fuel: status |
New |
Triaged |
|
2014-10-30 15:14:11 |
Łukasz Oleś |
fuel: status |
Triaged |
In Progress |
|
2014-10-30 15:14:14 |
Łukasz Oleś |
fuel: assignee |
Fuel Library Team (fuel-library) |
Łukasz Oleś (loles) |
|
2014-11-27 12:34:40 |
Tomasz 'Zen' Napierala |
fuel: milestone |
6.0 |
6.1 |
|
2014-11-27 12:34:45 |
Tomasz 'Zen' Napierala |
fuel: importance |
Critical |
Medium |
|
2015-01-20 14:10:51 |
Łukasz Oleś |
fuel: assignee |
Łukasz Oleś (loles) |
|
|
2015-01-21 09:25:23 |
Vladimir Kuklin |
fuel: assignee |
|
Fuel Library Team (fuel-library) |
|
2015-02-06 09:42:27 |
Stanislaw Bogatkin |
fuel: status |
In Progress |
Triaged |
|
2015-03-31 17:09:35 |
Vladimir Kuklin |
fuel: assignee |
Fuel Library Team (fuel-library) |
Tomasz 'Zen' Napierala (tzn) |
|
2015-04-01 10:20:25 |
Vladimir Kuklin |
nominated for series |
|
fuel/7.0.x |
|
2015-04-01 10:20:25 |
Vladimir Kuklin |
bug task added |
|
fuel/7.0.x |
|
2015-04-01 10:20:25 |
Vladimir Kuklin |
nominated for series |
|
fuel/6.1.x |
|
2015-04-01 10:20:25 |
Vladimir Kuklin |
bug task added |
|
fuel/6.1.x |
|
2015-04-01 10:20:35 |
Vladimir Kuklin |
fuel/6.1.x: status |
Triaged |
Won't Fix |
|
2015-04-01 10:20:52 |
Vladimir Kuklin |
fuel/7.0.x: milestone |
|
7.0 |
|
2015-04-07 12:12:23 |
Vladimir Kuklin |
fuel/7.0.x: assignee |
|
Fuel Library Team (fuel-library) |
|
2015-04-07 12:12:25 |
Vladimir Kuklin |
fuel/7.0.x: importance |
Undecided |
Critical |
|
2015-04-07 12:12:26 |
Vladimir Kuklin |
fuel/7.0.x: importance |
Critical |
Medium |
|
2015-04-07 12:12:36 |
Vladimir Kuklin |
fuel/7.0.x: status |
New |
Confirmed |
|
2015-04-14 12:07:21 |
Tomasz 'Zen' Napierala |
fuel: assignee |
Tomasz 'Zen' Napierala (tzn) |
Fuel Library Team (fuel-library) |
|
2015-04-14 12:08:42 |
Tomasz 'Zen' Napierala |
fuel: status |
Triaged |
Won't Fix |
|
2015-04-14 12:08:59 |
Tomasz 'Zen' Napierala |
fuel/6.1.x: assignee |
Tomasz 'Zen' Napierala (tzn) |
Fuel Library Team (fuel-library) |
|
2015-04-28 10:03:05 |
OpenStack Infra |
fuel: status |
Won't Fix |
In Progress |
|
2015-04-28 10:03:05 |
OpenStack Infra |
fuel: assignee |
Fuel Library Team (fuel-library) |
Alexander Evseev (aevseev-h) |
|
2015-04-29 18:33:18 |
Fuel Devops McRobotson |
fuel: milestone |
6.1 |
7.0 |
|
2015-07-30 11:07:20 |
Sergii Golovatiuk |
fuel/7.0.x: importance |
Medium |
Wishlist |
|
2015-07-30 11:07:23 |
Sergii Golovatiuk |
fuel/6.1.x: importance |
Medium |
Wishlist |
|
2015-07-30 11:08:53 |
Vladimir Kuklin |
fuel/7.0.x: importance |
Wishlist |
Medium |
|
2015-07-30 11:08:55 |
Vladimir Kuklin |
fuel/6.1.x: importance |
Wishlist |
Medium |
|
2015-08-03 11:36:38 |
Vladimir Kuklin |
fuel/7.0.x: status |
Confirmed |
Incomplete |
|
2015-08-18 08:22:28 |
Dina Belova |
fuel/7.0.x: status |
Incomplete |
Invalid |
|
2015-12-01 20:48:12 |
Sergii Golovatiuk |
nominated for series |
|
fuel/8.0.x |
|
2015-12-01 20:48:12 |
Sergii Golovatiuk |
bug task added |
|
fuel/8.0.x |
|
2015-12-01 20:49:46 |
Sergii Golovatiuk |
fuel/8.0.x: status |
Invalid |
Confirmed |
|
2015-12-01 20:49:58 |
Sergii Golovatiuk |
fuel/8.0.x: status |
Confirmed |
Invalid |
|