SM: servers re-image in a loop for ESX ISO when a cluster is reimaged

Bug #1461791 reported by Bharat Kumar
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)

Series  Status     Importance  Assigned to      Milestone
R2.20   Won't Fix  Medium      prasad miriyala
Trunk   New        High        prasad miriyala

Bug Description

When the servers in a cluster are reimaged with an ESX ISO image, they reimage in a loop. Only one server in the cluster is reimaged successfully; all the other servers in that cluster keep reimaging in a loop.

For the looping servers, netboot is always enabled in the cobbler profile, and a "reimage successful" status mail is sent every time one of the unsuccessful servers reimages.

Tested with R2.20 Build #40

Logs:
=====
root@nodec32:~/images# cobbler system report --name=nodec50
Name : nodec50
TFTP Boot Files : {}
Comment :
Enable gPXE? : 0
Fetchable Files : {}
Gateway :
Hostname : nodec50
Image :
IPv6 Autoconfiguration : False
IPv6 Default Device :
Kernel Options : {'system_name': 'nodec50', 'system_domain': 'englab.juniper.net', 'ip_address': '10.204.221.3', 'server': '10.204.217.17'}
Kernel Options (Post Install) : {}
Kickstart : <<inherit>>
Kickstart Metadata : {'system_name': 'nodec50', 'esx_nicname': 'vmnic0', 'device_cfg': 'http://10.204.217.17/contrail/config_file/nodec50.sh', 'system_domain': 'englab.juniper.net', 'passwd': '$1$ueJTahJl$erdQZWKkNuli3Mks9rpRD.', 'partition': '/dev/sd?', 'server_license': '', 'ip_address': '10.204.221.3'}
LDAP Enabled : False
LDAP Management Type : authconfig
Management Classes : <<inherit>>
Management Parameters : <<inherit>>
Monit Enabled : False
Name Servers : []
Name Servers Search Path : []
Netboot Enabled : True <<<<<<<<<<<<<<<<< It's always true
Owners : ['admin']
Power Management Address : 10.207.25.144
Power Management ID :
Power Management Password : ADMIN
Power Management Type : ipmilan
Power Management Username : ADMIN
Profile : esx
Proxy : <<inherit>>
Red Hat Management Key : <<inherit>>
Red Hat Management Server : <<inherit>>
Repos Enabled : False
Server Override : <<inherit>>
Status : production
Template Files : {}
Virt Auto Boot : <<inherit>>
Virt CPUs : <<inherit>>
Virt Disk Driver Type : <<inherit>>
Virt File Size(GB) : <<inherit>>
Virt Path : <<inherit>>
Virt PXE Boot : 0
Virt RAM (MB) : <<inherit>>
Virt Type : <<inherit>>
Interface ===== : eth0
Bonding Opts :
Bridge Opts :
CNAMES : []
DHCP Tag :
DNS Name : nodec50.englab.juniper.net
Per-Interface Gateway :
Master Interface :
Interface Type :
IP Address : 10.204.221.3
IPv6 Address :
IPv6 Default Gateway :
IPv6 MTU :
IPv6 Prefix :
IPv6 Secondaries : []
IPv6 Static Routes : []
MAC Address : 00:25:90:C4:83:90
Management Interface : False
MTU :
Subnet Mask :
Static : False
Static Routes : []
Virt Bridge :

root@nodec32:~/images# vim /var/lib/tftpboot/pxelinux.cfg/*90
root@nodec32:~/images# cat /var/lib/tftpboot/pxelinux.cfg/*90
default linux
prompt 0
timeout 1
label linux
kernel /images/esx/mboot.c32
ipappend 2
append -c /images/esx/cobbler-boot.cfg
root@nodec32:~/images#
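
To check the Netboot flag for several servers without reading each report by eye, the report text can be parsed. This is a minimal sketch, assuming the `Key : Value` layout shown in the log above; the function names and the shortened `SAMPLE` text are illustrative and not part of SM or cobbler.

```python
# Shortened copy of the report above, used only to exercise the parser.
SAMPLE = """\
Name : nodec50
Comment :
Kernel Options : {'system_name': 'nodec50', 'ip_address': '10.204.221.3'}
Netboot Enabled : True <<<<<<<<<<<<<<<<< Its always true
Profile : esx
"""


def parse_cobbler_report(text):
    """Turn 'Key : Value' lines of a cobbler system report into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
        elif line.rstrip().endswith(" :"):
            # Field with an empty value, e.g. "Comment :".
            fields[line.rstrip()[:-2].strip()] = ""
    return fields


def netboot_enabled(text):
    """Return True if the report says Netboot Enabled is True."""
    value = parse_cobbler_report(text).get("Netboot Enabled", "")
    # Ignore anything after the first token (such as a pasted annotation).
    return bool(value) and value.split()[0] == "True"


print(netboot_enabled(SAMPLE))
```

A server that still reports `True` here after its install has finished is a candidate for the loop described in this bug.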

information type: Private → Public
Bharat Kumar (pbharat) wrote :

If the servers are reimaged one at a time, each after the previous reimage has completed, the reimage succeeds. The issue is seen only when a whole cluster is reimaged.
The same issue is also seen with R2.1.

prasad miriyala (pmiriyala) wrote :

This is a cobbler issue that occurs when we issue multiple reimages for ESXi. Cobbler picks the hostname of the last issued reimage and sends the post-installation triggers for it. As part of the post-installation trigger, cobbler turns off the Netboot flag.
As a result, the Netboot flag is turned off only for the last target and stays on for all the others. Every target except the last one gets into a reimage loop.

Workaround:
Issue the ESXi reimages one after the other, in sequence.
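
The behaviour described in this comment can be sketched with a toy model. `FakeCobbler` and its methods are hypothetical and only mimic the description (the post-installation trigger is always applied to the last issued target); they are not the real cobbler API. The host names `nodec51` and `nodec52` are made up for illustration.

```python
class FakeCobbler:
    """Toy model of the described behaviour; not the real cobbler API."""

    def __init__(self, hosts):
        self.netboot = {host: False for host in hosts}
        self.last_target = None

    def issue_reimage(self, host):
        # Reimage enables netboot and remembers only the latest target.
        self.netboot[host] = True
        self.last_target = host

    def post_install_trigger(self):
        # The trigger clears netboot for the last issued target,
        # whichever server actually finished installing.
        self.netboot[self.last_target] = False

    def looping_hosts(self):
        return sorted(h for h, on in self.netboot.items() if on)


def reimage_cluster(hosts):
    """Issue all reimages first, then let every install finish."""
    cobbler = FakeCobbler(hosts)
    for host in hosts:
        cobbler.issue_reimage(host)
    for _ in hosts:
        cobbler.post_install_trigger()
    return cobbler.looping_hosts()


def reimage_in_sequence(hosts):
    """Workaround: wait for each install to finish before the next one."""
    cobbler = FakeCobbler(hosts)
    for host in hosts:
        cobbler.issue_reimage(host)
        cobbler.post_install_trigger()
    return cobbler.looping_hosts()


HOSTS = ["nodec50", "nodec51", "nodec52"]
print(reimage_cluster(HOSTS))      # ['nodec50', 'nodec51']
print(reimage_in_sequence(HOSTS))  # []
```

In the model, the cluster-wide reimage leaves netboot enabled on every host but the last, which matches the loop seen above, while the sequential order leaves none.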

tags: added: quench
prasad miriyala (pmiriyala) wrote :

When we create an image, SM creates a distro and a profile in cobbler corresponding to the image id. The profile is associated with the distro.
Reimage creates a system and associates it with the profile. The profile corresponds to the distro.
Distro->Profile->System 1
Distro->Profile->System n
Kernel data, kickstart metadata, etc. are present in the distro, profile and system. Data is taken from the lowest level (system) first and, if not found there, from the higher levels (profile, then distro).
System configuration should take precedence for kernel metadata or any other kernel options.
Typically, system configuration contains specifics about that system, e.g. system name, IP address, etc. It looks like ESXi works only with profile data, not with system data.
As a hack to satisfy this, we update the profile data on each reimage. Because of that, only one reimage works at a time.
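
The lookup order and the effect of the profile update can be sketched as follows. This is an illustrative model only, not SM or cobbler code: plain dicts stand in for the distro, profile and system records, `nodec51` and its IP address are made up, and placing the kernel path in the distro is an assumption.

```python
from collections import ChainMap

distro = {"kernel": "/images/esx/mboot.c32"}  # placement is an assumption
profile = {"esx_nicname": "vmnic0"}
systems = {
    "nodec50": {"system_name": "nodec50", "ip_address": "10.204.221.3"},
    # Hypothetical second server sharing the same profile.
    "nodec51": {"system_name": "nodec51", "ip_address": "10.204.221.4"},
}


def resolve(system_name):
    """Expected lookup: system first, then profile, then distro."""
    return ChainMap(systems[system_name], profile, distro)


def esxi_view():
    """What ESXi appears to use: profile and distro data only."""
    return ChainMap(profile, distro)


def reimage(system_name):
    """The hack: copy the system's specifics into the shared profile."""
    profile.update(systems[system_name])


reimage("nodec50")
reimage("nodec51")

# System-level lookup still distinguishes the two servers...
print(resolve("nodec50")["system_name"])  # nodec50
# ...but the shared profile only holds the last reimage's values.
print(esxi_view()["system_name"])         # nodec51
print(esxi_view()["ip_address"])          # 10.204.221.4
```

Because both systems point at one profile, the second update overwrites the first, which is why only one reimage per profile can work at a time.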

A workaround:
Create multiple images, say esx5.5-s1, esx5.5-s2… esx5.5-sn; this is a one-time job.
To reimage s1 to sn, reimage s1 with esx5.5-s1, s2 with esx5.5-s2 and so on…
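
The pairing in this workaround can be written down as a small helper. This is a sketch only: `per_server_images` is a hypothetical name, and it just builds the server-to-image plan using the naming from the comment above; it does not call SM.

```python
def per_server_images(base_image, servers):
    """Pair each server with its own image id: <base>-s1, <base>-s2, ..."""
    return {
        server: f"{base_image}-s{index}"
        for index, server in enumerate(servers, start=1)
    }


plan = per_server_images("esx5.5", ["s1", "s2", "s3"])
for server, image in plan.items():
    print(server, "->", image)

# No two servers share an image, so no two reimages share a profile.
assert len(set(plan.values())) == len(plan)
```

Since each image has its own distro and profile, the per-reimage profile update for one server no longer overwrites the data of another.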

tags: added: releasenote
OpenContrail Admin (ci-admin-f) wrote : [Bug update]

bug update...

no longer affects: juniperopenstack/r3.0
no longer affects: juniperopenstack/r3.1
tags: removed: releasenote