No more IP addresses available on network error during migrate-ovn-db action, and other data migration problems...

Bug #1905554 reported by Jake Hill
This bug affects 1 person

Affects (Status / Importance / Assigned to):
  OpenStack Charms Deployment Guide: Fix Released / High / Frode Nordahl
  OpenStack Neutron API OVN Plugin Charm: New / Undecided / Unassigned

Bug Description

Following instructions at https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-ovn.html#migration-from-neutron-ml2-ovs-to-ml2-ovn

The migrate-ovn-db action fails with the error "No more IP addresses available on network e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575." I think this happens part-way through the migration, so it is not clear how to proceed.

$ juju run-action --wait neutron-api-plugin-ovn/0 migrate-ovn-db i-really-mean-it=true
unit-neutron-api-plugin-ovn-0:
  UnitId: neutron-api-plugin-ovn/0
  id: "39990"
  message: Execution failed, please investigate output.
  results:
    Stderr: |+
      migrate-ovn-db: OUTPUT FROM SYNC ON STDERR:
      /usr/lib/python3/dist-packages/pymysql/cursors.py:170: Warning: (3719, "'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.")
        result = self._query(query)

    Stdout: "migrate-ovn-db: OUTPUT FROM SYNC ON STDOUT:\n2020-11-25 11:06:17.760
      413761 INFO neutron.cmd.ovn.neutron_ovn_db_sync_util [-] Started Neutron OVN
      db sync\e[00m\n

[snip]

      11:06:56.892 413761 WARNING neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovn_db_sync
      [req-91be95af-c8c7-41cc-a372-5cd7a0241373 - - - - -] Network found in Neutron
      but not in OVN DB, network_id=e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575\e[00m\n2020-11-25
      11:06:57.026 413761 WARNING neutron.db.ovn_revision_numbers_db [req-91be95af-c8c7-41cc-a372-5cd7a0241373
      - - - - -] No revision row found for e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575 (type:
      networks) when bumping the revision number. Creating one.\e[00m\n2020-11-25
      11:06:57.227 413761 INFO neutron.db.ovn_revision_numbers_db [req-91be95af-c8c7-41cc-a372-5cd7a0241373
      - - - - -] Successfully bumped revision number for resource e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575
      (type: networks) to 10\e[00m\n2020-11-25 11:06:57.565 413761 CRITICAL neutron_ovn_db_sync_util
      [req-91be95af-c8c7-41cc-a372-5cd7a0241373 - - - - -] Unhandled error: neutron_lib.exceptions.IpAddressGenerationFailure:
      No more IP addresses available on network e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575.\n2020-11-25
      11:06:57.565 413761 ERROR neutron_ovn_db_sync_util Traceback (most recent call
      last):\n2020-11-25 11:06:57.565 413761 ERROR neutron_ovn_db_sync_util File
      \"/usr/lib/python3/dist-packages/neutron/db/ipam_pluggable_backend.py\", line
      138, in _ipam_allocate_ips\n2020-11-25 11:06:57.565 413761 ERROR neutron_ovn_db_sync_util
      \ ip_address, subnet_id = ipam_allocator.allocate(ip_request)\n2020-11-25
      11:06:57.565 413761 ERROR neutron_ovn_db_sync_util File \"/usr/lib/python3/dist-packages/neutron/ipam/subnet_alloc.py\",
      line 240, in allocate\n2020-11-25 11:06:57.565 413761 ERROR neutron_ovn_db_sync_util
      \ raise ipam_exc.IpAddressGenerationFailureAllSubnets()\n2020-11-25 11:06:57.565
      413761 ERROR neutron_ovn_db_sync_util neutron.ipam.exceptions.IpAddressGenerationFailureAllSubnets:
      No more IP addresses available.\n2020-11-25 11:06:57.565 413761 ERROR neutron_ovn_db_sync_util
      \n2020-11-25 11:06:57.565 413761 ERROR neutron_ovn_db_sync_util During handling
      of the above exception, another exception occurred:\n2020-11-25 11:06:57.565
      413761 ERROR neutron_ovn_db_sync_util \n

Revision history for this message
Frode Nordahl (fnordahl) wrote :

Thank you for the bug.

Could you attach the complete output from the action, the charm log and the neutron-server log?

I wonder if the sync util is attempting to allocate a port for the OVN metadata service and that the subnet is indeed depleted of free IP addresses.

Would you happen to have any surplus ports that could be removed to free up an IP address in the network/subnet in question? Or would it be possible to extend one of the full subnets?

You should be able to interact with the Neutron API again by reverting the changes in steps 8 and 9. If you have followed the guide, that should be safe to do at this point in time.

If you are successful in freeing an address or extending the subnet, you can repeat steps 8, 9 and 10.
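As a rough sketch of that free-address check (the pool boundaries and used-IP list below are made-up inputs; in practice they would come from something like `openstack subnet show` and `openstack port list --network <id>`, so treat the names as assumptions), counting headroom in an allocation pool looks like:

```python
import ipaddress

def free_addresses(pool_start, pool_end, used_ips):
    """Count addresses in the pool [pool_start, pool_end] that are not
    already allocated to a port. `used_ips` is a hypothetical list of
    fixed IPs gathered from the Neutron port list for the network."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    used = {ipaddress.ip_address(ip) for ip in used_ips}
    total = int(end) - int(start) + 1
    # Only count used addresses that actually fall inside the pool.
    return total - sum(1 for ip in used if start <= ip <= end)

# A depleted pool, as in this bug: every address already has a port,
# so the sync's attempt to allocate one more fails.
print(free_addresses("192.0.2.10", "192.0.2.12",
                     ["192.0.2.10", "192.0.2.11", "192.0.2.12"]))  # → 0
```

If this returns 0 for the subnet in question, either delete a surplus port or extend the allocation pool before retrying the sync.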

Revision history for this message
Jake Hill (routergod) wrote :

Thank you for looking at this. Attached as requested (I pruned the neutron-server.log a bit but not the other one, sorry). Thanks also for the tip, I will investigate the offending network.

Revision history for this message
Jake Hill (routergod) wrote :

Following your advice I have determined that the network in question, e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575, is an external network on which there are no floating IPs. Or at least, no spare ones. This is actually by design in my case.

I can add IPs to aid the migration. Am I reading here correctly that external gateway router SNAT IPs change as part of this migration?

Revision history for this message
Frode Nordahl (fnordahl) wrote :

Thank you for adding the requested artifacts. The sync operation is indeed failing because OVN attempts to create a new port in that network and is unable to get an IP address for it.

You are correct in asserting that this also means OVN will use a different external IP address for its router in this network, which in turn means a change of external IP for instances that do not have FIPs.

Checking for enough free IP addresses in subnets should definitely be mentioned in the documentation, and should perhaps also be a check in Neutron or the charms.

I'll start by doing a documentation update, thank you for bringing this issue to our attention and I hope you are able to continue your migration.
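A pre-migration check along those lines could be sketched as follows. The input mimics what `openstack ip availability list -f json` reports per network; the key names here are assumptions rather than verified field names:

```python
def networks_without_headroom(ip_availability, reserve=1):
    """Flag networks whose free-address count is below `reserve`.
    OVN needs at least one free IP per network for its metadata port,
    so reserve=1 is the minimum sensible threshold."""
    return [net["network_id"]
            for net in ip_availability
            if net["total_ips"] - net["used_ips"] < reserve]

# Hypothetical records shaped like the ip-availability output.
nets = [
    {"network_id": "e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575",
     "total_ips": 16, "used_ips": 16},   # depleted, as in this bug
    {"network_id": "example-net",
     "total_ips": 256, "used_ips": 40},  # plenty of headroom
]
print(networks_without_headroom(nets))
# → ['e50ff198-0fb5-4fd4-9b6c-d2f4b0a3f575']
```

Running something like this across all networks before step 10 would surface the failure before the sync aborts half-way.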

Changed in charm-deployment-guide:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Frode Nordahl (fnordahl)
Revision history for this message
Jake Hill (routergod) wrote :

Thanks for the confirmation. Yes, I managed to crunch through most of the migration, but there were some further data and config issues that did not seem to be covered in the documentation.

One is related to floating IP port forwarding configurations, where a floating IP has a router assignment but no port or fixed-IP information. These result in errors like:

2020-11-25 16:18:16 DEBUG migrate-ovn-db 2020-11-25 16:18:15.733 562121 CRITICAL neutron_ovn_db_sync_util [req-a5010a3e-9193-4328-83e2-0a2e7c2225b4 - - - - -] Unhandled error: neutron_lib.exceptions.PortNotFound: Port None could not be found.

I just deleted these floating IPs; the tenant will have to re-establish the port forwards.
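For anyone hitting the same thing, here is a sketch of how one might spot these dangling floating IPs before the sync trips over them. The record keys are assumptions based on the shape of the Neutron floatingip resource, not verified against any particular release:

```python
def dangling_floating_ips(fips):
    """Return IDs of floating IPs that have a router assignment but no
    associated port, the shape that produced the
    `PortNotFound: Port None could not be found` error here."""
    return [f["id"] for f in fips
            if f.get("router_id") and not f.get("port_id")]

# Hypothetical floatingip records.
fips = [
    {"id": "f1", "router_id": "r1", "port_id": None},   # dangling
    {"id": "f2", "router_id": "r1", "port_id": "p1"},   # healthy
    {"id": "f3", "router_id": None, "port_id": None},   # unattached, fine
]
print(dangling_floating_ips(fips))  # → ['f1']
```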

Then I had a small number of errors for networks which apparently had no metadata ports:

Sync for Northbound db started with mode : repair
2020-11-25 16:45:51.357 579092 ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovn_client [req-251e5041-9285-4fbc-b8e1-a6527fba8365 - - - - -] Metadata port couldn't be found for network 034952cd-ad28-41e9-9c2e-69aacd25bb1e
2020-11-25 16:45:52.029 579092 ERROR neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovn_client [req-251e5041-9285-4fbc-b8e1-a6527fba8365 - - - - -] Metadata port couldn't be found for network 2a8367d3-9f34-43c6-9fc4-84c72afd0af4

No clue about this, but post-migration I have two hypervisors apparently lacking the neutron-ovn-metadata-agent package. See https://discourse.charmhub.io/t/critical-error-during-ovn-migration/3881

The final problem was when I came to run the neutron-openvswitch cleanup action. Here I encountered the following error:

  message: 'Action "cleanup" failed: "Action requires configuration option `firewall-driver`
    to be set to "openvswitch" for succesfull operation."'

It seemed a bit late to be making this config change but I did so anyway. I wondered if this was related to the port=None issue?

summary: No more IP addresses available on network error during migrate-ovn-db
- action
+ action, and other data migration problems...
Revision history for this message
Frode Nordahl (fnordahl) wrote :

We have indeed managed to omit the step about changing the firewall driver from the documentation; that is unfortunate.

The primary reason for changing the firewall driver is to have Neutron remove all the iptables rules from the hypervisors, as the Neutron cleanup scripts do not support removing them. Depending on when in the migration you changed the driver, Neutron may or may not have completed the removal.

So for your deployment you may want to audit and remove any leftover iptables rules from the hypervisors, to avoid them getting in the way of future changes to security groups.

With OVN security groups are programmed into the Open vSwitch flow tables directly and there is no need for iptables rules on hypervisors anymore.
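A sketch of how one might audit for such leftovers: it simply filters `iptables-save` output on the stock `neutron-` chain-name prefix, which is an assumption about the deployment (custom chain prefixes would need adjusting):

```python
def leftover_neutron_rules(iptables_save_text):
    """Pick out lines that reference Neutron-managed chains from
    `iptables-save` output, to see what is left to remove after
    switching the firewall driver."""
    return [line for line in iptables_save_text.splitlines()
            if "neutron-" in line]

# Hypothetical iptables-save excerpt from a hypervisor.
sample = """\
:INPUT ACCEPT [0:0]
:neutron-openvswi-FORWARD - [0:0]
-A FORWARD -j neutron-openvswi-FORWARD
-A INPUT -i lo -j ACCEPT
"""
for line in leftover_neutron_rules(sample):
    print(line)
```

An empty result would suggest Neutron finished the removal before the agents were taken down; anything printed is a candidate for manual cleanup.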

Revision history for this message
Frode Nordahl (fnordahl) wrote :
Changed in charm-deployment-guide:
status: Triaged → In Progress
Frode Nordahl (fnordahl)
Changed in charm-deployment-guide:
status: In Progress → Fix Committed
Frode Nordahl (fnordahl)
Changed in charm-deployment-guide:
status: Fix Committed → Fix Released