Comment 1 for bug 2015870

TCSECP (tcsecp) wrote :

Hi Team, below are the details of the autoscaling issue in Magnum.
Issue Summary:
The nodes scale down, but scaling up fails with the error shown below.

Template:

openstack coe cluster template create k8s--calico-cinder-auto-health-largef_min4_max5_lb_21_1 \
  --image fedora-coreos-32 \
  --keypair k8s \
  --external-network Magnum-Test \
  --master-lb-enabled \
  --dns-nameserver 8.8.8.8 \
  --master-flavor g1t1.large \
  --flavor g1t1.large \
  --network-driver calico \
  --coe kubernetes \
  --label container_infra_prefix="tcsmagnum.tcsecp.com/tcsmagnum/" \
  --label 'docker_volume_type=az1-stable2' \
  --label 'boot_volume_size=40' \
  --label boot_volume_type=az1-stable2 \
  --docker-volume-size 20 \
  --docker-storage-driver overlay2 \
  --label kube_tag=v1.21.1 \
  --label calico_ipv4pool=10.100.0.0/24 \
  --label flannel_network_subnetlen=28 \
  --label flannel_backend=host-gw \
  --fixed-network 532ebede-e9d0-4ec4-8bf1-abab1e8d786f \
  --fixed-subnet eebe853c-70bb-48f6-8edc-6bb8f92b181e \
  --label metrics_server_enabled=true \
  --label monitoring_enabled=true \
  --label prometheus_adapter_enabled=true \
  --label cinder_csi_enabled=true \
  --label grafana_admin_passwd=linux \
  --volume-driver cinder \
  --label 'auto_healing_enabled=True' \
  --label 'auto_healing_controller=magnum-auto-healer' \
  --label 'auto_scaling_enabled=True' \
  --label 'min_node_count=1' \
  --label 'max_node_count=6' \
  --label 'health_status=True' \
  --label 'health_status_reason=True'
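
To make the autoscaling-related labels in the template command above easier to review, here is a minimal Python sketch that pulls the `--label key=value` pairs out of a command string. It is only an illustration: `parse_labels` and the `example-template` name are hypothetical, and the command in the sketch is an abbreviated copy that keeps just the autoscaling labels.

```python
import shlex


def parse_labels(command: str) -> dict:
    """Collect every `--label key=value` pair from a CLI command string."""
    tokens = shlex.split(command)  # handles both quoted and unquoted labels
    labels = {}
    for i, token in enumerate(tokens[:-1]):
        if token == "--label":
            key, _, value = tokens[i + 1].partition("=")
            labels[key] = value
    return labels


# Abbreviated, hypothetical copy of the template command (autoscaling labels only).
command = (
    "openstack coe cluster template create example-template "
    "--label 'auto_scaling_enabled=True' "
    "--label 'min_node_count=1' --label 'max_node_count=6'"
)

labels = parse_labels(command)
for name in ("auto_scaling_enabled", "min_node_count", "max_node_count"):
    print(name, "=", labels.get(name))
```

Running the same function over the full command would list all labels, which may help when comparing the values that were set with the values that were intended.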

Autoscaler pod logs:

I0423 19:39:25.958460 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 19:39:25.958687 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 9.151µs
I0423 19:41:25.958877 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 19:41:25.958919 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 9.119µs
I0423 19:43:25.959125 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 19:43:25.959655 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 9.968µs
I0423 19:45:25.959969 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 19:45:25.960640 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 12.974µs
I0423 19:47:25.961019 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 19:47:25.961073 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 10.614µs
I0423 19:49:25.961326 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 19:49:25.961711 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 29.921µs
I0423 19:50:39.599246 1 scale_down.go:638] Can't retrieve node maynew-mln6rohb3yuf-node-1 from snapshot, removing from unremovable map, err: node not found
I0423 19:51:25.961994 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 19:51:25.962051 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 9.99µs
I0423 19:53:25.962300 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 19:53:25.962613 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 8.136µs
I0423 19:55:25.962822 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 19:55:25.962863 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 9.68µs
I0423 19:57:25.963197 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 19:57:25.963441 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 10.752µs
I0423 19:59:25.963650 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 19:59:25.963852 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 21.935µs
I0423 20:01:25.964029 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 20:01:25.964067 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 9.843µs
I0423 20:03:25.964256 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 20:03:25.964430 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 7.094µs
I0423 20:05:25.964730 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 20:05:25.964921 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 17.052µs
I0423 20:05:40.544105 1 static_autoscaler.go:559] Decreasing size of default-worker, expected=4 current=3 delta=-1
I0423 20:05:40.544181 1 magnum_nodegroup.go:255] Decreasing target size by -1, 4->3
I0423 20:05:43.051101 1 static_autoscaler.go:342] Some node group target size was fixed, skipping the iteration
I0423 20:06:55.746258 1 scale_down.go:638] Can't retrieve node maynew-mln6rohb3yuf-node-3 from snapshot, removing from unremovable map, err: node not found
I0423 20:07:25.965144 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 20:07:25.965332 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 7.888µs
I0423 20:09:25.965569 1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0423 20:09:25.965605 1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 7.894µs

Couldn't find template for node group default-worker
E0423 19:13:14.998665 1 static_autoscaler.go:415] Failed to scale up: Could not compute total resources: No node info for: default-worker
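
Since most of the log above is Info-level cache refresh noise, a small filter can help surface the failing entry. The following is a rough Python sketch, assuming the lines follow the usual klog layout (severity letter, date, time, thread id, `file:line]`, message); the regular expression and the `non_info_entries` helper are illustrative and not part of the autoscaler.

```python
import re

# Assumed klog layout: Lmmdd hh:mm:ss.uuuuuu threadid file:line] message
KLOG = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+) (?P<file>[^\s:]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def non_info_entries(lines):
    """Return parsed entries whose severity is W, E or F (anything but Info)."""
    entries = []
    for raw in lines:
        match = KLOG.match(raw.strip())
        if match and match.group("severity") != "I":
            entries.append(match.groupdict())
    return entries


# Two lines copied from the log above.
sample = [
    "I0423 20:05:40.544181 1 magnum_nodegroup.go:255] Decreasing target size by -1, 4->3",
    "E0423 19:13:14.998665 1 static_autoscaler.go:415] Failed to scale up: "
    "Could not compute total resources: No node info for: default-worker",
]

for entry in non_info_entries(sample):
    print(entry["time"], entry["file"], entry["message"])
```

Lines that do not match the assumed layout, such as the bare "Couldn't find template" message, are skipped by this sketch and would need to be checked separately.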

Regards,
Sriramu Desingh.