Activity log for bug #1896630

Date Who What changed Old value New value Message
2020-09-22 14:54:36 Drew Freiberger bug added bug
2020-09-22 15:10:29 Vladimir Grevtsev bug added subscriber Vladimir Grevtsev
2020-09-22 15:24:58 Drew Freiberger bug added subscriber Canonical IS BootStack
2020-09-22 15:25:04 Drew Freiberger bug added subscriber Canonical Field Critical
2020-09-22 15:30:15 Frode Nordahl bug task added charm-layer-ovn
2020-09-22 15:31:05 Frode Nordahl bug task deleted charm-neutron-openvswitch
2020-09-22 17:57:39 Drew Freiberger bug added subscriber Canonical Field High
2020-09-22 17:57:41 Drew Freiberger removed subscriber Canonical Field Critical
2020-10-30 12:07:41 Edward Hope-Morley bug added subscriber Edward Hope-Morley
2020-11-18 12:46:35 Nobuto Murata bug added subscriber Nobuto Murata
2020-12-10 16:13:51 Frode Nordahl tags ps5
2021-01-13 21:36:40 Billy Olsen charm-layer-ovn: status New Triaged
2021-01-13 21:36:44 Billy Olsen charm-layer-ovn: importance Undecided High
2021-01-14 11:21:10 Frode Nordahl bug task added juju
2021-02-20 14:23:50 John A Meinel juju: importance Undecided Medium
2021-02-20 14:23:50 John A Meinel juju: status New Triaged
2021-05-01 15:36:26 Billy Olsen charm-layer-ovn: status Triaged Invalid
2021-07-06 08:49:15 Frode Nordahl summary ovn-chassis subordinate to octavia registered with shortname shows down Need for managing /etc/hosts for containers
2021-07-06 08:50:08 Frode Nordahl description

Old value:

On a juju 2.7.8, latest charms (20.08), I have a dead ovn-controller agent on one of the octavia units.

$ openstack network agent list|grep lxd
| juju-a9d6f4-21-lxd-9.maas | OVN Controller agent | juju-a9d6f4-21-lxd-9 | | XXX | UP | ovn-controller |
| juju-a9d6f4-25-lxd-10.maas | OVN Controller agent | juju-a9d6f4-25-lxd-10.maas | | :-) | UP | ovn-controller |
| juju-a9d6f4-23-lxd-10.maas | OVN Controller agent | juju-a9d6f4-23-lxd-10.maas | | :-) | UP | ovn-controller |

Two of the three ovn-controller agents on octavia units are registered with host=$fqdn, and the down controller is registered with a shortname.

`hostname -f` shows the full fqdn on the down unit
/etc/openvswitch/system-id.conf lists the short hostname only
`ovs-vsctl list open_vswitch` lists both the hostname and the system-id as shortname

seeing a lot of errors in /var/log/ovn/ovn-controller.log along the lines of:

2020-09-22T14:22:39.500Z|04678|binding|INFO|Changing chassis for lport 529233fc-f9c4-40b1-8c6a-f2e906a2498d from juju-a9d6f4-21-lxd-9.maas to juju-a9d6f4-21-lxd-9.
2020-09-22T06:25:01.829Z|857112|main|INFO|OVNSB commit failed, force recompute next time.

restart of ovn-controller shows the following in the log:

2020-09-22T14:22:30.498Z|00001|vlog|INFO|opened log file /var/log/ovn/ovn-controller.log
2020-09-22T14:22:30.500Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2020-09-22T14:22:30.500Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2020-09-22T14:22:30.502Z|00004|main|INFO|OVS IDL reconnected, force recompute.
2020-09-22T14:22:30.504Z|00005|reconnect|INFO|ssl:10.35.61.157:6642: connecting...
2020-09-22T14:22:30.504Z|00006|main|INFO|OVNSB IDL reconnected, force recompute.
2020-09-22T14:22:30.508Z|00007|reconnect|INFO|ssl:10.35.61.157:6642: connected
2020-09-22T14:22:30.514Z|00008|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2020-09-22T14:22:30.514Z|00009|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2020-09-22T14:22:30.514Z|00010|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2020-09-22T14:22:30.515Z|00011|ovsdb_idl|WARN|transaction error: {"details":"RBAC rules for client \"juju-a9d6f4-21-lxd-9\" role \"ovn-controller\" prohibit modification of table \"Chassis\".","error":"permission error"}
2020-09-22T14:22:30.515Z|00012|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.515Z|00001|pinctrl(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2020-09-22T14:22:30.515Z|00002|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2020-09-22T14:22:30.516Z|00013|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"10.35.82.18\") for index on columns \"type\" and \"ip\". First row, with UUID 86556077-6325-4cb6-9bbd-c5979ae15d2c, was inserted by this transaction. Second row, with UUID 3345a08e-534b-4ccf-a7b6-2d6d00706422, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
2020-09-22T14:22:30.516Z|00014|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.516Z|00015|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"10.35.82.18\") for index on columns \"type\" and \"ip\". First row, with UUID 3345a08e-534b-4ccf-a7b6-2d6d00706422, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 916635aa-e98c-4f23-8ac8-1e3f381151c6, was inserted by this transaction.","error":"constraint violation"}
2020-09-22T14:22:30.516Z|00016|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.516Z|00017|binding|INFO|Changing chassis for lport 529233fc-f9c4-40b1-8c6a-f2e906a2498d from juju-a9d6f4-21-lxd-9.maas to juju-a9d6f4-21-lxd-9.
2020-09-22T14:22:30.516Z|00018|binding|INFO|529233fc-f9c4-40b1-8c6a-f2e906a2498d: Claiming fa:16:3e:e4:70:66 fc00:2d33:a2bc:84d4:f816:3eff:fee4:7066
2020-09-22T14:22:30.517Z|00019|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"10.35.82.18\") for index on columns \"type\" and \"ip\". First row, with UUID 3345a08e-534b-4ccf-a7b6-2d6d00706422, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 6219b9c9-fc57-4caa-8f75-46ead7584901, was inserted by this transaction.","error":"constraint violation"}
2020-09-22T14:22:30.517Z|00020|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.518Z|00021|binding|INFO|Changing chassis for lport 529233fc-f9c4-40b1-8c6a-f2e906a2498d from juju-a9d6f4-21-lxd-9.maas to juju-a9d6f4-21-lxd-9.
2020-09-22T14:22:30.518Z|00022|binding|INFO|529233fc-f9c4-40b1-8c6a-f2e906a2498d: Claiming fa:16:3e:e4:70:66 fc00:2d33:a2bc:84d4:f816:3eff:fee4:7066
2020-09-22T14:22:30.521Z|00023|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"10.35.82.18\") for index on columns \"type\" and \"ip\". First row, with UUID 3345a08e-534b-4ccf-a7b6-2d6d00706422, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 5f2ca07b-859f-4013-9e49-5fd00a1909e9, was inserted by this transaction.","error":"constraint violation"}
2020-09-22T14:22:30.521Z|00024|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.521Z|00003|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected

Relation info being provided from octavia-ovn-chassis to octavia on that unit shows chassis-name as the short hostname, but on other octavia units, the chassis-name provided from ovn-chassis to octavia is the fqdn.

$ sudo juju-run octavia/0 -r 139 --remote-unit octavia-ovn-chassis/1 'relation-get'
chassis-name: '"juju-a9d6f4-21-lxd-9"'
egress-subnets: 10.35.61.179/32
ingress-address: 10.35.61.179
ovn-configured: "true"
private-address: 10.35.61.179

$ sudo juju-run octavia/1 -r 139 --remote-unit octavia-ovn-chassis/2 'relation-get'
chassis-name: '"juju-a9d6f4-23-lxd-10.maas"'
egress-subnets: 10.35.61.191/32
ingress-address: 10.35.61.191
ovn-configured: "true"
private-address: 10.35.61.191

It appears from a brief read-through of the ovn-chassis charm that the hostname is queried from the ovsdb and then system-id is set from that hostname. Is it possible that there's a race between the system being able to query its fqdn from DNS during deployment and the hostname ovs sees when it initializes the database on install?

Some potentially relevant code snippets:

        # The local ``ovn-controller`` process will retrieve information about
        # how to connect to OVN from the local Open vSwitch database.
        self.run('ovs-vsctl',
                 'set', 'open', '.',
                 'external-ids:ovn-encap-type=geneve', '--',
                 'set', 'open', '.',
                 'external-ids:ovn-encap-ip={}'
                 .format(self.get_data_ip()), '--',
                 'set', 'open', '.',
                 'external-ids:system-id={}'
                 .format(self.get_ovs_hostname()))

*snip*

    def get_ovs_hostname():
        for row in ch_ovsdb.SimpleOVSDB('ovs-vsctl').open_vswitch:
            return row['external_ids']['hostname']

New value:

When deploying on metal with MAAS, MAAS will add the FQDN to the localhost record in /etc/hosts so that issuing the `hostname -f` command will always succeed regardless of availability of the network. When deploying on the other provider combinations it is Juju that does the host initialization and Juju does not add the FQDN to the localhost record in /etc/hosts.

[Original description]

On a juju 2.7.8, latest charms (20.08), I have a dead ovn-controller agent on one of the octavia units.

$ openstack network agent list|grep lxd
| juju-a9d6f4-21-lxd-9.maas | OVN Controller agent | juju-a9d6f4-21-lxd-9 | | XXX | UP | ovn-controller |
| juju-a9d6f4-25-lxd-10.maas | OVN Controller agent | juju-a9d6f4-25-lxd-10.maas | | :-) | UP | ovn-controller |
| juju-a9d6f4-23-lxd-10.maas | OVN Controller agent | juju-a9d6f4-23-lxd-10.maas | | :-) | UP | ovn-controller |

Two of the three ovn-controller agents on octavia units are registered with host=$fqdn, and the down controller is registered with a shortname.

`hostname -f` shows the full fqdn on the down unit
/etc/openvswitch/system-id.conf lists the short hostname only
`ovs-vsctl list open_vswitch` lists both the hostname and the system-id as shortname

seeing a lot of errors in /var/log/ovn/ovn-controller.log along the lines of:

2020-09-22T14:22:39.500Z|04678|binding|INFO|Changing chassis for lport 529233fc-f9c4-40b1-8c6a-f2e906a2498d from juju-a9d6f4-21-lxd-9.maas to juju-a9d6f4-21-lxd-9.
2020-09-22T06:25:01.829Z|857112|main|INFO|OVNSB commit failed, force recompute next time.

restart of ovn-controller shows the following in the log:

2020-09-22T14:22:30.498Z|00001|vlog|INFO|opened log file /var/log/ovn/ovn-controller.log
2020-09-22T14:22:30.500Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2020-09-22T14:22:30.500Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2020-09-22T14:22:30.502Z|00004|main|INFO|OVS IDL reconnected, force recompute.
2020-09-22T14:22:30.504Z|00005|reconnect|INFO|ssl:10.35.61.157:6642: connecting...
2020-09-22T14:22:30.504Z|00006|main|INFO|OVNSB IDL reconnected, force recompute.
2020-09-22T14:22:30.508Z|00007|reconnect|INFO|ssl:10.35.61.157:6642: connected
2020-09-22T14:22:30.514Z|00008|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2020-09-22T14:22:30.514Z|00009|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2020-09-22T14:22:30.514Z|00010|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2020-09-22T14:22:30.515Z|00011|ovsdb_idl|WARN|transaction error: {"details":"RBAC rules for client \"juju-a9d6f4-21-lxd-9\" role \"ovn-controller\" prohibit modification of table \"Chassis\".","error":"permission error"}
2020-09-22T14:22:30.515Z|00012|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.515Z|00001|pinctrl(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2020-09-22T14:22:30.515Z|00002|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2020-09-22T14:22:30.516Z|00013|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"10.35.82.18\") for index on columns \"type\" and \"ip\". First row, with UUID 86556077-6325-4cb6-9bbd-c5979ae15d2c, was inserted by this transaction. Second row, with UUID 3345a08e-534b-4ccf-a7b6-2d6d00706422, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
2020-09-22T14:22:30.516Z|00014|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.516Z|00015|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"10.35.82.18\") for index on columns \"type\" and \"ip\". First row, with UUID 3345a08e-534b-4ccf-a7b6-2d6d00706422, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 916635aa-e98c-4f23-8ac8-1e3f381151c6, was inserted by this transaction.","error":"constraint violation"}
2020-09-22T14:22:30.516Z|00016|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.516Z|00017|binding|INFO|Changing chassis for lport 529233fc-f9c4-40b1-8c6a-f2e906a2498d from juju-a9d6f4-21-lxd-9.maas to juju-a9d6f4-21-lxd-9.
2020-09-22T14:22:30.516Z|00018|binding|INFO|529233fc-f9c4-40b1-8c6a-f2e906a2498d: Claiming fa:16:3e:e4:70:66 fc00:2d33:a2bc:84d4:f816:3eff:fee4:7066
2020-09-22T14:22:30.517Z|00019|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"10.35.82.18\") for index on columns \"type\" and \"ip\". First row, with UUID 3345a08e-534b-4ccf-a7b6-2d6d00706422, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 6219b9c9-fc57-4caa-8f75-46ead7584901, was inserted by this transaction.","error":"constraint violation"}
2020-09-22T14:22:30.517Z|00020|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.518Z|00021|binding|INFO|Changing chassis for lport 529233fc-f9c4-40b1-8c6a-f2e906a2498d from juju-a9d6f4-21-lxd-9.maas to juju-a9d6f4-21-lxd-9.
2020-09-22T14:22:30.518Z|00022|binding|INFO|529233fc-f9c4-40b1-8c6a-f2e906a2498d: Claiming fa:16:3e:e4:70:66 fc00:2d33:a2bc:84d4:f816:3eff:fee4:7066
2020-09-22T14:22:30.521Z|00023|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Encap\" table to have identical values (geneve and \"10.35.82.18\") for index on columns \"type\" and \"ip\". First row, with UUID 3345a08e-534b-4ccf-a7b6-2d6d00706422, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 5f2ca07b-859f-4013-9e49-5fd00a1909e9, was inserted by this transaction.","error":"constraint violation"}
2020-09-22T14:22:30.521Z|00024|main|INFO|OVNSB commit failed, force recompute next time.
2020-09-22T14:22:30.521Z|00003|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected

Relation info being provided from octavia-ovn-chassis to octavia on that unit shows chassis-name as the short hostname, but on other octavia units, the chassis-name provided from ovn-chassis to octavia is the fqdn.

$ sudo juju-run octavia/0 -r 139 --remote-unit octavia-ovn-chassis/1 'relation-get'
chassis-name: '"juju-a9d6f4-21-lxd-9"'
egress-subnets: 10.35.61.179/32
ingress-address: 10.35.61.179
ovn-configured: "true"
private-address: 10.35.61.179

$ sudo juju-run octavia/1 -r 139 --remote-unit octavia-ovn-chassis/2 'relation-get'
chassis-name: '"juju-a9d6f4-23-lxd-10.maas"'
egress-subnets: 10.35.61.191/32
ingress-address: 10.35.61.191
ovn-configured: "true"
private-address: 10.35.61.191

It appears from a brief read-through of the ovn-chassis charm that the hostname is queried from the ovsdb and then system-id is set from that hostname. Is it possible that there's a race between the system being able to query its fqdn from DNS during deployment and the hostname ovs sees when it initializes the database on install?

Some potentially relevant code snippets:

        # The local ``ovn-controller`` process will retrieve information about
        # how to connect to OVN from the local Open vSwitch database.
        self.run('ovs-vsctl',
                 'set', 'open', '.',
                 'external-ids:ovn-encap-type=geneve', '--',
                 'set', 'open', '.',
                 'external-ids:ovn-encap-ip={}'
                 .format(self.get_data_ip()), '--',
                 'set', 'open', '.',
                 'external-ids:system-id={}'
                 .format(self.get_ovs_hostname()))

*snip*

    def get_ovs_hostname():
        for row in ch_ovsdb.SimpleOVSDB('ovs-vsctl').open_vswitch:
            return row['external_ids']['hostname']
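
Editorial sketch (not part of the bug record or of any charm): the two symptoms described in the entry above can be checked mechanically. The helpers below are hypothetical names introduced only for illustration. The first scans pipe-delimited rows in the layout quoted from `openstack network agent list`, assuming the third column is the host, and reports hosts registered without a domain part. The second looks at the text of an /etc/hosts file and reports whether any record lists an FQDN next to the short hostname. It accepts any record rather than only the localhost one, which is a simplification, and the sample addresses and hosts content are made up.

```python
from typing import List


def is_short_name(name: str) -> bool:
    """Return True when the name carries no domain part (contains no dot)."""
    return "." not in name.strip(".")


def short_name_hosts(agent_rows: List[str]) -> List[str]:
    """Return the host cell of every row whose host is a short name.

    Assumes the column order ID | Agent Type | Host | ... as in the
    listing quoted in the bug description.
    """
    hosts = []
    for row in agent_rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        if len(cells) >= 3 and is_short_name(cells[2]):
            hosts.append(cells[2])
    return hosts


def hosts_file_has_fqdn(hosts_text: str, short_name: str) -> bool:
    """Return True if some hosts record lists short_name and an FQDN for it."""
    prefix = short_name + "."
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        names = fields[1:]  # fields[0], if present, is the address
        if short_name in names and any(n.startswith(prefix) for n in names):
            return True
    return False


if __name__ == "__main__":
    rows = [
        "| juju-a9d6f4-21-lxd-9.maas | OVN Controller agent | juju-a9d6f4-21-lxd-9 | | XXX | UP | ovn-controller |",
        "| juju-a9d6f4-25-lxd-10.maas | OVN Controller agent | juju-a9d6f4-25-lxd-10.maas | | :-) | UP | ovn-controller |",
    ]
    print(short_name_hosts(rows))

    # Hypothetical hosts content: only the short name is present.
    sample = "127.0.0.1 localhost\n127.0.1.1 juju-a9d6f4-21-lxd-9\n"
    print(hosts_file_has_fqdn(sample, "juju-a9d6f4-21-lxd-9"))
```

A check like this only flags the inconsistency; it does not say whether the fix belongs in Juju, the charm, or the host image.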
2022-07-28 15:36:10 Felipe Reyes bug task added charm-nova-compute
2022-11-03 15:51:20 Canonical Juju QA Bot juju: importance Medium Low
2022-11-03 15:51:22 Canonical Juju QA Bot tags ps5 expirebugs-bot ps5
2023-02-15 13:23:07 Felipe Reyes charm-nova-compute: status New Confirmed
2023-02-15 14:42:45 Felipe Reyes charm-nova-compute: assignee Felipe Reyes (freyes)
2023-02-15 15:43:31 OpenStack Infra charm-nova-compute: status Confirmed In Progress
2023-02-21 13:17:07 OpenStack Infra charm-nova-compute: status In Progress Fix Committed
2023-02-24 15:05:56 Felipe Reyes bug task added charm-guide
2023-02-24 15:06:04 Felipe Reyes charm-guide: assignee Felipe Reyes (freyes)
2023-03-17 14:26:20 OpenStack Infra charm-guide: status New In Progress
2023-03-18 02:33:35 OpenStack Infra charm-guide: status In Progress Fix Released