After this fix, I'm now running into a scenario where grafana stays in a blocked state indefinitely.
I've deployed prometheus-21 and grafana-39 with my Kubernetes deployment. After things settle, I change the web-listen-port to 80 and see:
grafana/0* blocked idle 4 10.5.1.77 3000/tcp Exception reaching prometheus API whilst updating dashboards
I took a look at the grafana logs, and it appears to be getting the updated port (the datasource URL ends in :80):
2021-02-25 00:59:06 INFO juju-log Invoking reactive handler: reactive/grafana.py:578:wipe_nrpe_checks
2021-02-25 00:59:06 INFO juju-log Invoking reactive handler: reactive/grafana.py:596:configure_sources
2021-02-25 00:59:06 INFO juju-log Found datasource: {'service_name': 'prometheus', 'type': 'prometheus', 'url': 'http://10.5.2.227:80', 'description': 'Juju generated source'}
2021-02-25 00:59:06 INFO juju-log Datasource already exist, updating: prometheus - Juju generated source
2021-02-25 00:59:06 INFO juju-log Checking Dashboard Template: CephCluster.json.j2
2021-02-25 00:59:06 DEBUG juju-log Skipping Dashboard Template: CephCluster.json.j2 missing 31 metrics.Missing: ceph_client_io_read_ops, ceph_osds, ceph_osds_down, ceph_osd_perf_apply_latency_seconds, ceph_cluster_used_bytes, ceph_cluster_capacity_bytes, ceph_osd_perf_commit_latency_seconds, ceph_misplaced_objects, ceph_monitor_quorum_count, ceph_stale_pgs, ceph_undersized_pgs, ceph_degraded_pgs, ceph_osd_up, ceph_stuck_stale_pgs, ceph_client_io_write_bytes, ceph_degraded_objects, ceph_pool_available_bytes, ceph_unclean_pgs, ceph_client_io_write_ops, ceph_health_status, ceph_recovery_io_bytes, ceph_osds_in, ceph_recovery_io_keys, ceph_recovery_io_objects, ceph_cluster_available_bytes, ceph_cluster_objects, ceph_stuck_unclean_pgs, ceph_stuck_degraded_pgs, ceph_osd_pgs, ceph_client_io_read_bytes, ceph_stuck_undersized_pgs
2021-02-25 00:59:06 INFO juju-log Checking Dashboard Template: Swift.json.j2
2021-02-25 00:59:06 DEBUG juju-log Skipping Dashboard Template: Swift.json.j2 missing 13 metrics.Missing: exec_swiftparts_object_handoff, exec_swiftparts_account_handoff, exec_swiftparts_object_primary, object_server_async_pendings, swift_disk_usage_bytes, exec_swiftparts_container_misplaced, swift_replication_stats, swift_replication_duration_seconds, exec_swiftparts_account_misplaced, exec_swiftparts_account_primary, exec_swiftparts_container_primary, exec_swiftparts_container_handoff, exec_swiftparts_object_misplaced
2021-02-25 00:59:06 INFO juju-log Checking Dashboard Template: OpenStackCloud.json.j2
2021-02-25 00:59:06 DEBUG juju-log Skipping Dashboard Template: OpenStackCloud.json.j2 missing 16 metrics.Missing: nova_resources_ram_mbs, hypervisor_disk_gbs_total, nova_resources_disk_gbs, hypervisor_vcpus_used, hypervisor_disk_gbs_used, hypervisor_memory_mbs_used, neutron_net_size, nova_resources_vcpus, nova_instances, hypervisor_memory_mbs_total, hypervisor_running_vms, openstack_allocation_ratio, openstack_exporter_cache_age_seconds, hypervisor_vcpus_total, openstack_exporter_cache_refresh_duration_seconds, hypervisor_schedulable_instances
2021-02-25 00:59:06 INFO juju-log Checking Dashboard Template: CephOSD.json.j2
2021-02-25 00:59:06 DEBUG juju-log Skipping Dashboard Template: CephOSD.json.j2 missing 10 metrics.Missing: ceph_osd_used_bytes, ceph_osd_in, ceph_osds, ceph_osd_perf_apply_latency_seconds, ceph_osd_perf_commit_latency_seconds, ceph_osd_avail_bytes, ceph_osd_variance, ceph_osd_up, ceph_osd_pgs, ceph_osd_utilization
2021-02-25 00:59:06 INFO juju-log Checking Dashboard Template: RabbitMQ.json.j2
2021-02-25 00:59:06 DEBUG juju-log Skipping Dashboard Template: RabbitMQ.json.j2 missing 18 metrics.Missing: rabbitmq_node_fd_total, rabbitmq_overview_messages_acked, rabbitmq_overview_exchanges, rabbitmq_overview_channels, rabbitmq_node_sockets_used, rabbitmq_overview_messages_ready, rabbitmq_overview_messages_published, rabbitmq_node_fd_used, rabbitmq_node_sockets_total, rabbitmq_overview_consumers, rabbitmq_overview_messages_unacked, rabbitmq_overview_connections, rabbitmq_node_mem_limit, rabbitmq_node_mem_used, rabbitmq_node_proc_total, rabbitmq_node_proc_used, rabbitmq_overview_queues, rabbitmq_overview_messages_delivered
2021-02-25 00:59:06 INFO juju-log Checking Dashboard Template: CephPools.json.j2
2021-02-25 00:59:06 DEBUG juju-log Skipping Dashboard Template: CephPools.json.j2 missing 9 metrics.Missing: ceph_pool_raw_used_bytes, ceph_pool_read_total, ceph_pool_read_bytes_total, ceph_pool_used_bytes, ceph_pool_objects_total, ceph_pool_available_bytes, ceph_pool_dirty_objects_total, ceph_pool_write_total, ceph_pool_write_bytes_total
2021-02-25 00:59:06 INFO juju-log Invoking reactive handler: reactive/grafana.py:1211:import_dashboards
2021-02-25 00:59:06 INFO juju-log import_dashboards: telegraf, digest None, is_new: False
2021-02-25 00:59:06 INFO juju-log import_dashboards: kubernetes, digest None, is_new: False
2021-02-25 00:59:06 INFO juju-log import_dashboards: prometheus, digest None, is_new: False
2021-02-25 00:59:06 INFO juju-log Invoking reactive handler: hooks/relations/http/provides.py:15:broken:website
2021-02-25 00:59:06 DEBUG update-status UPDATE DATA_SOURCE SET basic_auth_user = ?, basic_auth_password = ?, basic_auth = 0 ('', '')
But when I look at the db, the data source still points at the old port (9090), so something is preventing it from being properly updated:
sqlite> SELECT * FROM data_source;
1|1|0|prometheus|prometheus - Juju generated source|proxy|http://10.5.2.227:9090||||0|||0|{}|2021-02-25 00:06:19|2021-02-25 00:06:19|0|{}|0|2459839366
sqlite> .quit
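For anyone who wants to repeat this check without the sqlite shell, here is a minimal sketch of the comparison I did by hand. The helper name stale_datasources is hypothetical, and the column names (type, name, url) are my assumption based on the SELECT output above. The demo uses an in-memory table as a stand-in; the real grafana db path depends on the deployment and is not shown here.

```python
import sqlite3


def stale_datasources(conn, expected_url, ds_type="prometheus"):
    """Return (name, url) pairs of the given type whose url differs from expected_url.

    Assumes a data_source table with type, name and url columns
    (inferred from the SELECT output in this comment, not verified).
    """
    rows = conn.execute(
        "SELECT name, url FROM data_source WHERE type = ?", (ds_type,)
    ).fetchall()
    return [(name, url) for name, url in rows if url != expected_url]


if __name__ == "__main__":
    # In-memory stand-in for the grafana db, seeded with the row seen above.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_source "
        "(id INTEGER PRIMARY KEY, type TEXT, name TEXT, url TEXT)"
    )
    conn.execute(
        "INSERT INTO data_source (type, name, url) VALUES (?, ?, ?)",
        ("prometheus", "prometheus - Juju generated source", "http://10.5.2.227:9090"),
    )
    # The charm log reports port 80, so compare against that URL.
    for name, url in stale_datasources(conn, "http://10.5.2.227:80"):
        print(f"stale datasource: {name} -> {url}")
```

Pointing sqlite3.connect at the actual grafana db file instead of ":memory:" should give the same comparison against the live data, assuming the schema matches.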