Neutron DHCP agent not able to provide DHCP IP to VM

Bug #1561540 reported by Dave Urschatz
This bug affects 1 person
Affects    Status    Importance  Assigned to  Milestone
OPNFV      Invalid   Undecided   Unassigned
neutron    Invalid   High        Unassigned

Bug Description

VMs are not getting an IP address from DHCP. Occasionally a VM may get one.

Pre-conditions: OpenStack (Liberty, HA, OpenDaylight) built via OPNFV JOID (Canonical) deployment automation that leverages MAAS/Juju.

Who am I: a JOID user. This is my first reported Neutron bug.

ubuntu@juma:~$ neutron --version
3.1.0

Perceived severity: this is a blocker for my project.

Errors seen in dhcp-agent.log:
2016-03-24 02:04:52.245 23148 ERROR oslo.messaging._drivers.impl_rabbit [req-e5037584-e5c6-4dc5-961b-6300b372f60b - - - - -] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 2 seconds.
2016-03-24 02:15:34.895 6657 ERROR neutron.agent.dhcp.agent message = self.waiters.get(msg_id, timeout=timeout)
2016-03-24 02:15:34.895 6657 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 223, in get
2016-03-24 02:15:34.895 6657 ERROR neutron.agent.dhcp.agent 'to message ID %s' % msg_id)
2016-03-24 02:15:34.895 6657 ERROR neutron.agent.dhcp.agent MessagingTimeout: Timed out waiting for a reply to message ID 80fcb2ba9c444b1592ef9e7955cea0ba
2016-03-24 02:15:34.895 6657 ERROR neutron.agent.dhcp.agent
2016-03-24 02:15:34.896 6657 WARNING oslo.service.loopingcall [req-1009548a-c589-43b4-8a25-cc6d265d9dea - - - - -] Function 'neutron.agent.dhcp.agent.DhcpAgentWithStateReport._report_state' run outlasted interval by 30.01 sec
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent [req-45baaca9-a59d-4b2e-9b28-b0761996e6a9 - - - - -] Failed reporting state!
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/dhcp/agent.py", line 571, in _report_state
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent ctx, self.agent_state, True)
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/rpc.py", line 86, in report_state
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent return method(context, 'report_state', **kwargs)
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent retry=self.retry)
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent timeout=timeout, retry=retry)
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 431, in send
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent retry=retry)
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 422, in _send
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent raise result
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent RemoteError: Remote error: DBError (pymysql.err.InternalError) (1054, u"Unknown column 'agents.load' in 'field list'") [SQL: u'SELECT agents.id AS agents_id, agents.agent_type AS agents_agent_type, agents.`binary` AS agents_binary, agents.topic AS agents_topic, agents.host AS agents_host, agents.admin_state_up AS agents_admin_state_up, agents.created_at AS agents_created_at, agents.started_at AS agents_started_at, agents.heartbeat_timestamp AS agents_heartbeat_timestamp, agents.description AS agents_description, agents.configurations AS agents_configurations, agents.`load` AS agents_load \nFROM agents \nWHERE agents.agent_type = %s AND agents.host = %s'] [parameters: (u'DHCP agent', u'node4-control')]
2016-03-24 02:15:34.969 6657 ERROR neutron.agent.dhcp.agent [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply\n executor_callback))\n', u' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch\n executor_callback)\n', u' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 129, in _do_dispatch\n result = func(ctxt, **new_args)\n', u' File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 317, in report_state\n return self.plugin.create_or_update_agent(context, agent_state)\n', u' File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 265, in create_or_update_agent\n return self._create_or_update_agent(context, agent)\n', u' File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 238, in _create_or_update_agent\n context, agent_state[\'agent_type\'], agent_state[\'host\'])\n', u' File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 198, in _get_agent_by_type_and_host\n Agent.host == host).one()\n', u' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2473, in one\n ret = list(self)\n', u' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2516, in __iter__\n return self._execute_and_instances(context)\n', u' File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2531, in _execute_and_instances\n result = conn.execute(querycontext.statement, self._params)\n', u' File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 914, in execute\n return meth(self, multiparams, params)\n', u' File "/usr/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection\n return connection._execute_clauseelement(self, multiparams, params)\n', u' File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement\n compiled_sql, distilled_params\n', u' File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context\n context)\n', u' File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1337, in _handle_dbapi_exception\n util.raise_from_cause(newraise, exc_info)\n', u' File "/usr/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause\n reraise(type(exception), exception, tb=exc_tb)\n', u' File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context\n context)\n', u' File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 450, in do_execute\n cursor.execute(statement, parameters)\n', u' File "/usr/lib/python2.7/dist-packages/pymysql/cursors.py", line 132, in execute\n result = self._query(query)\n', u' File "/usr/lib/python2.7/dist-packages/pymysql/cursors.py", line 271, in _query\n conn.query(q)\n', u' File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 726, in query\n self._affected_rows = self._read_query_result(unbuffered=unbuffered)\n', u' File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 861, in _read_query_result\n result.read()\n', u' File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 1064, in read\n first_packet = self.connection._read_packet()\n', u' File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 826, in _read_packet\n packet.check_error()\n', u' File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 370, in check_error\n raise_mysql_exception(self._data)\n', u' File "/usr/lib/python2.7/dist-packages/pymysql/err.py", line 116, in raise_mysql_exception\n _check_mysql_exception(errinfo)\n', u' File "/usr/lib/python2.7/dist-packages/pymysql/err.py", line 112, in _check_mysql_exception\n raise InternalError(errno, errorvalue)\n', u'DBError: (pymysql.err.InternalError) (1054, u"Unknown column \'agents.load\' in \'field list\'") [SQL: u\'SELECT agents.id AS agents_id, agents.agent_type AS agents_agent_type, agents.`binary` AS agents_binary, agents.topic AS agents_topic, agents.host AS agents_host, agents.admin_state_up AS agents_admin_state_up, agents.created_at AS agents_created_at, agents.started_at AS agents_started_at, agents.heartbeat_timestamp AS agents_heartbeat_timestamp, agents.description AS agents_description, agents.configurations AS agents_configurations, agents.`load` AS agents_load \\nFROM agents \\nWHERE agents.agent_type = %s AND agents.host = %s\'] [parameters: (u\'DHCP agent\', u\'node4-control\')]\n'].

Full logs are at:
ubuntu@node4-control:/var/log/neutron$ pastebinit < dhcp-agent.log
http://paste.ubuntu.com/15487135/
ubuntu@node4-control:/var/log/neutron$ pastebinit < l3-agent.log
http://paste.ubuntu.com/15487137/
ubuntu@node4-control:/var/log/neutron$ pastebinit < neutron-lbaas-agent.log
http://paste.ubuntu.com/15487138/
ubuntu@node4-control:/var/log/neutron$ pastebinit < neutron-metering-agent.log
http://paste.ubuntu.com/15487143/
ubuntu@node4-control:/var/log/neutron$ pastebinit < neutron-ns-metadata-proxy-7feece98-fdd4-4b44-b4b8-0a6b5bdfd08d.log
http://paste.ubuntu.com/15487146/

Revision history for this message
Assaf Muller (amuller) wrote :

"Unknown column 'agents.load'"

Ah-ha. Did you run 'neutron-db-manage upgrade heads'?

Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

ubuntu@node4-control:~$ sudo neutron-db-manage upgrade heads
Traceback (most recent call last):
  File "/usr/bin/neutron-db-manage", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/dist-packages/neutron/db/migration/cli.py", line 565, in main
    CONF.command.func(config, CONF.command.name)
  File "/usr/lib/python2.7/dist-packages/neutron/db/migration/cli.py", line 194, in do_upgrade
    run_sanity_checks(config, revision)
  File "/usr/lib/python2.7/dist-packages/neutron/db/migration/cli.py", line 552, in run_sanity_checks
    script_dir.run_env()
  File "/usr/lib/python2.7/dist-packages/alembic/script/base.py", line 397, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/usr/lib/python2.7/dist-packages/alembic/util/pyfiles.py", line 81, in load_python_file
    module = load_module_py(module_id, path)
  File "/usr/lib/python2.7/dist-packages/alembic/util/compat.py", line 79, in load_module_py
    mod = imp.load_source(module_id, path, fp)
  File "/usr/lib/python2.7/dist-packages/neutron/db/migration/alembic_migrations/env.py", line 135, in <module>
    run_migrations_online()
  File "/usr/lib/python2.7/dist-packages/neutron/db/migration/alembic_migrations/env.py", line 114, in run_migrations_online
    engine = session.create_engine(neutron_config.database.connection)
  File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py", line 112, in create_engine
    url = sqlalchemy.engine.url.make_url(sql_connection)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/url.py", line 186, in make_url
    return _parse_rfc1738_args(name_or_url)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/url.py", line 235, in _parse_rfc1738_args
    "Could not parse rfc1738 URL from string '%s'" % name)
sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string ''
ubuntu@node4-control:~$

Revision history for this message
Assaf Muller (amuller) wrote :

It looks like the database connection in neutron.conf is incorrect.
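
For reference, a working value is an rfc1738-style URL in the [database] section of the neutron.conf that neutron-db-manage reads; the empty string in the traceback above means no such option was found. A minimal sketch, with placeholder host and credentials rather than values from this deployment:

[database]
connection = mysql://neutron:<password>@<db-host>/neutron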

Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

Will look at it shortly.

Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

Thanks Assaf.

I had run the "neutron-db-manage upgrade heads" command from a system hosting the neutron-gateway service. There was no database connection defined in its neutron.conf.

However, in this JOID HA OpenStack deployment there are 3 neutron-api services running on separate lxc containers which do have database configurations in neutron.conf. For this test deployment they look like:
...
[database]
connection = mysql://neutron:LRkn3YLzqPJNXWt8X7gpVPNRBZcS4xxT@10.120.0.97/neutron
max_pool_size = 96
...

When I ran the command on all 3 units, I got the following (output from one unit shown):

ubuntu@juju-machine-1-lxc-5:~$ sudo neutron-db-manage upgrade heads
sudo: unable to resolve host juju-machine-1-lxc-5
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
  Running upgrade for neutron ...
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
  OK
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
  Running upgrade for neutron-fwaas ...
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
  OK
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
  Running upgrade for neutron-lbaas ...
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
  OK
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
  Running upgrade for neutron-vpnaas ...
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
  OK
ubuntu@juju-machine-1-lxc-5:~$

Revision history for this message
Henry Gessau (gessau) wrote :

Your database is up to date with the server. What is your "neutron-server --version"?

Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

It is:
ubuntu@juju-machine-3-lxc-8:~$ neutron-server --version
neutron-server 7.0.3

Revision history for this message
Henry Gessau (gessau) wrote :

Is the column in the database?
mysql -e "show columns from agents from neutron;"

Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

mysql> show columns from agents from neutron;
+---------------------+---------------+------+-----+---------+-------+
| Field               | Type          | Null | Key | Default | Extra |
+---------------------+---------------+------+-----+---------+-------+
| id                  | varchar(36)   | NO   | PRI | NULL    |       |
| agent_type          | varchar(255)  | NO   | MUL | NULL    |       |
| binary              | varchar(255)  | NO   |     | NULL    |       |
| topic               | varchar(255)  | NO   |     | NULL    |       |
| host                | varchar(255)  | NO   |     | NULL    |       |
| admin_state_up      | tinyint(1)    | NO   |     | 1       |       |
| created_at          | datetime      | NO   |     | NULL    |       |
| started_at          | datetime      | NO   |     | NULL    |       |
| heartbeat_timestamp | datetime      | NO   |     | NULL    |       |
| description         | varchar(255)  | YES  |     | NULL    |       |
| configurations      | varchar(4095) | NO   |     | NULL    |       |
| load                | int(11)       | NO   |     | 0       |       |
+---------------------+---------------+------+-----+---------+-------+
12 rows in set (0.00 sec)

mysql>

Revision history for this message
Henry Gessau (gessau) wrote :

Well the agents.load column is there.
Are you still seeing the problem?
What process is pid 6657 from your traceback?

Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

ubuntu@node4-control:~$ ps -aux |grep 6657
ubuntu 5717 0.0 0.0 10460 936 pts/73 S+ 00:17 0:00 grep --color=auto 6657
neutron 6657 2.0 0.1 131456 49632 ? Ss Mar24 85:06 /usr/bin/python /usr/bin/neutron-dhcp-agent --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/dhcp_agent.ini --log-file=/var/log/neutron/dhcp-agent.log

Revision history for this message
Henry Gessau (gessau) wrote :

Thanks. Agents should not be doing DB transactions. This is a serious bug.

Changed in neutron:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Henry Gessau (gessau) wrote :

Since an agent version may be out of sync with the server version during an upgrade, no agent is allowed to access the neutron DB directly.

In this case the culprit is:
neutron/agent/dhcp.agent/DhcpAgentWithStateReport._report_state()

Revision history for this message
Henry Gessau (gessau) wrote :

neutron/agent/dhcp/agent.py: DhcpAgentWithStateReport._report_state()

tags: added: l3-ipam-dhcp
Revision history for this message
Henry Gessau (gessau) wrote :

Sorry, after looking at the log more carefully, I see that the DB transaction was done by the RPC server (which is the neutron service). You say you have three instances of the neutron service running. It looks like one of them was running the upgraded code before the DB was upgraded. Can you confirm the exact steps you did when upgrading the service instances?

Changed in neutron:
status: Confirmed → Incomplete
Henry Gessau (gessau)
tags: removed: l3-ipam-dhcp
Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

Thanks Henry. The deployment of this OpenStack is completely controlled by JOID, which leverages MAAS/Juju and the OpenStack charms. I will engage the JOID team to get an answer on the exact steps used to upgrade the service instances.

Also, the following Juju status shows that 3 instances of neutron-api and 1 instance of neutron-gateway are running.

ID WORKLOAD-STATE AGENT-STATE VERSION MACHINE PORTS PUBLIC-ADDRESS MESSAGE
neutron-api/0 active idle 1.25.3 1/lxc/5 9696/tcp 10.120.0.99 Unit is ready
  hacluster-neutron/0 active idle 1.25.3 10.120.0.99 Unit is ready and clustered
  neutron-api-odl/0 unknown idle 1.25.3 10.120.0.99
neutron-api/1 active idle 1.25.3 2/lxc/8 9696/tcp 10.120.0.108 Unit is ready
  hacluster-neutron/2 active idle 1.25.3 10.120.0.108 Unit is ready and clustered
  neutron-api-odl/2 unknown idle 1.25.3 10.120.0.108
neutron-api/2 active idle 1.25.3 3/lxc/8 9696/tcp 10.120.0.110 Unit is ready
  hacluster-neutron/1 active idle 1.25.3 10.120.0.110 Unit is ready and clustered
  neutron-api-odl/1 unknown idle 1.25.3 10.120.0.110
neutron-gateway/0 active idle 1.25.3 1 node4-control.maas Unit is ready
  openvswitch-odl/1 active idle 1.25.3 node4-control.maas Open vSwitch configured and ready

Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

Hi Henry. You had asked earlier if I'm still seeing the problem. After completely removing and re-creating the tenant router, network and VMs, the problem still exists. The VM console log shows:
---
ci-info: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Route info failed!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2016-03-28 12:49:27,790 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by <class 'socket.error'>: [Errno 101] Network is unreachable)]
---

Still working on getting you the upgrade steps...

Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

Here is the best explanation I have of "the exact steps taken when upgrading the service instances". It comes from Liam Young of the https://launchpad.net/~openstack-charmers team.

--- Begin Liam's e-mail to me snip:
Upgrading an OpenStack deployment is always a non-trivial process. The OpenStack charms automate a lot of it, but always plan and test your upgrade before upgrading production OpenStack environments. To do the upgrade, the charms run:

juju set <charm-name> openstack-origin=<new openstack origin ie cloud:trusty-mitaka>

In the case of neutron the neutron-api charm handles the neutron db migration (assuming this deployment is using a more recent version of Openstack than Juno). The charm queries juju to find out who is the leader and the leader then runs the db migration by running neutron-db-manage stamp then neutron-db-manage upgrade head. See do_openstack_upgrade in https://github.com/openstack/charm-neutron-api/blob/master/hooks/neutron_api_utils.py. You should be able to see which unit ran the migration and see the output of the migration by looking at /var/log/juju/unit-neutron-api*log on each of the neutron-api units.

I wouldn't recommend doing the upgrade by manually running neutron-db-manage. The charms have been designed to do the upgrade for you.
--- END e-mail snip
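
A quick way to confirm which unit actually ran the migration is to search the unit logs for the message quoted below (a sketch; the grep pattern assumes the charm logs the same message on every deployment, and the command has to be run on each neutron-api unit):

grep -l "Migrating the neutron database" /var/log/juju/unit-neutron-api-*.log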

Some log entries taken from the Juju unit logs:

/var/log/juju/unit-neutron-api-0.log:
2016-03-24 02:12:19 INFO juju-log shared-db:47: Migrating the neutron database.
2016-03-24 02:12:19 INFO shared-db-relation-changed INFO [alembic.runtime.migration] Context impl MySQLImpl.
2016-03-24 02:12:19 INFO shared-db-relation-changed INFO [alembic.runtime.migration] Will assume non-transactional DDL.
2016-03-24 02:12:21 INFO shared-db-relation-changed INFO [alembic.runtime.migration] Context impl MySQLImpl.
2016-03-24 02:12:21 INFO shared-db-relation-changed INFO [alembic.runtime.migration] Will assume non-transactional DDL.
2016-03-24 02:12:21 INFO shared-db-relation-changed INFO [alembic.runtime.migration] Running upgrade -> juno, juno_initial
2016-03-24 02:12:48 DEBUG juju.worker.leadership tracker.go:138 neutron-api/0 renewing lease for neutron-api leadership

/var/log/juju/unit-neutron-api-1.log
2016-03-24 02:10:11 INFO juju-log shared-db:47: Not running neutron database migration, not leader

/var/log/juju/unit-neutron-api-2.log
2016-03-24 02:09:21 INFO juju-log shared-db:47: Not running neutron database migration, not leader

Full logs from the instances:

instance: neutron-api/0
root@juju-machine-1-lxc-5:/var/log/juju# pastebinit < /var/log/juju/unit-neutron-api-0.log
http://paste.ubuntu.com/15551961/
root@juju-machine-1-lxc-5:/var/log/juju# pastebinit < /var/log/juju/unit-neutron-api-odl-0.log
http://paste.ubuntu.com/15551978/

instance: neutron-api/1
root@juju-machine-2-lxc-8:/home/ubuntu# pastebinit < /var/log/juju/unit-neutron-api-1.log
http://paste.ubuntu.com/15551993/
root@juju-machine-2-lxc-8:/home/ubuntu# pastebinit < /var/log/juju/unit-neutron-api-odl-2.log
http://paste.ubuntu.com/15551997/

instance: neutron-api/2
root@juju-machine-3-lxc-8:/home/ubuntu# pastebinit < /var/log/juju/unit-neutron-api-2.log
http://paste.ubuntu.com/155...


Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

Heads up: Last chance to get any logs from the offending deployment.

I noticed in one of the log lines above that the DB is upgrading to Juno when we are actually deploying Liberty.

2016-03-24 02:12:21 INFO shared-db-relation-changed INFO [alembic.runtime.migration] Running upgrade -> juno, juno_initial

This "upgrade to Juno, not Liberty" issue is also seen on both the Orange and CENGN deployments, but only the DHCP issue is seen on CENGN. I will be changing some systems and re-deploying in a few hours.

Please let me know if you need any further logs checked. If not we can still check on the next deployment.

Revision history for this message
Henry Gessau (gessau) wrote :

There are many moving parts in this bug report and I can only focus on the one detail that I know something about.

"Unknown column 'agents.load' "

This means that (at least) one of the neutron servers was running a new version of the code before the database was upgraded.

The upgrade procedure must be:

1) Install new version
2) Stop all instances of neutron-server
3) neutron-db-manage upgrade heads
4) Start neutron-server(s)
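
As a command-level sketch of steps 2-4 (Ubuntu/upstart service names assumed; run the db-manage step once, from a node whose neutron.conf carries the [database] connection):

sudo service neutron-server stop       # step 2, on every server node
sudo neutron-db-manage upgrade heads   # step 3, run once
sudo service neutron-server start      # step 4, on every server node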

Note: if you are upgrading from Kilo to Liberty you can reduce server downtime as follows:

0) Kilo version is running
1) Install Liberty version
2) neutron-db-manage upgrade --expand
3) Restart neutron-server(s)
4) Wait for a maintenance window
5) Stop all instances of neutron-server
6) neutron-db-manage upgrade --contract
7) Start neutron-server(s)

Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

Thanks Henry. Agreed that there are many moving parts, so I'm trying to coordinate as best I can.

I just sent your steps to Liam Young of the https://launchpad.net/~openstack-charmers team, as they own the automation that should be doing those steps.

Revision history for this message
Liam Young (gnuoy) wrote :

As I understand it this is a fresh install of Liberty and not an upgrade to Liberty. The messages about upgrading from Juno seem to be normal when initialising the database with Liberty Neutron.

During a normal Juju install of OpenStack with multiple neutron-servers, Juju elects one of the units to manage the DB schema. Prior to the schema being initialised, neutron-server will keep restarting (it starts, sees missing DB information, dies, and upstart restarts it). As soon as the leader has initialised the database, all neutron-server processes will come up and stay up. This means that on a normal deploy /var/log/neutron/neutron-server.log contains schema exceptions until the database init has happened.
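
One way to see which unit Juju elected as leader (an assumption on my part: that the is-leader hook tool is reachable through juju run on this Juju 1.25 deployment) is to ask each neutron-api unit in turn; exactly one should print True:

juju run --unit neutron-api/0 is-leader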

If you hit this again, please provide /var/log/neutron/neutron-server.log and /var/log/juju* from each neutron-server (neutron-api charm) unit.

Thanks
Liam

Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

Thanks Liam for this much appreciated explanation. It sounds like the DB upgrade issue seen here is not likely the culprit of this bug.

This leaves the bug status, from a Neutron user's perspective (my chair), as:
- I'm still having intermittent events where VMs are not getting addresses from DHCP on private tenant networks.
- I will keep digging on my side. (Any debugging suggestions are welcome!)

Revision history for this message
Dave Urschatz (dave-urschatz) wrote :

This issue is invalid since the root cause is not 'neutron'. It was determined to be a combination of:
- An underlay network physical interface issue
- Incorrect mapping of bare-metal physical network interfaces to kernel names; the names were inconsistent across nodes due to variations in hardware (a quick check is sketched below).
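
A quick check for such inconsistencies (a sketch, nothing deployment-specific assumed) is to list the kernel interface names on each bare-metal node and compare them against the interface-to-network mapping the deployment expects:

ubuntu@node4-control:~$ ls /sys/class/net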

Changed in neutron:
status: Incomplete → Invalid
Changed in opnfv:
status: New → Invalid