community 10.0 fails to deploy neutron L3 HA

Bug #1664974 reported by Luca Cervigni
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Incomplete
Importance: High
Assigned to: Fuel Sustaining

Bug Description

Description:
Deploying a cluster selecting the neutron L3 HA option fails with
"All nodes are finished. Failed tasks: Task[openstack-network-networks/103] Stopping the deployment process!"

Steps to reproduce:
3 controllers + mongo
3 ceph
1 compute

Click deploy

Expected result: Environment deployed with neutron L3 HA
Actual result: Deploy fails with [openstack-network-networks/103] Stopping the deployment process
Reproducibility: always
Workaround: none that I know of, but deselecting the L3 HA option makes everything work (even when using L2 + DVR instead)

Impact: Environment not deployed
Fuel community 10.0 - October 2016 iso from fuel-infra website.
The snapshot is more than 6GB. If you can tell me where to investigate I can provide logs.

Tags: area-neutron
Luca Cervigni (cervigni) wrote :

Judging from the Puppet logs, there seem to be dependency failures. This is very odd, because I apparently get the same error when trying to use L2 + DVR. One week ago I had a deployment with that enabled and it deployed successfully. Could you check this out?

2017-02-16 15:52:42 WARNING (/Stage[main]/Openstack_tasks::Openstack_network::Networks/Neutron_subnet[admin_internal_net__subnet]) Skipping because of failed dependencies

2017-02-16 15:52:29 ERR Not managing Neutron_network[admin_internal_net] due to earlier Neutron API failures.
2017-02-16 15:52:29 ERR (/Stage[main]/Openstack_tasks::Openstack_network::Networks/Neutron_network[admin_floating_net]/ensure) change from absent to present failed: Not managing Neutron_network[admin_floating_net] due to earlier Neutron API failures.
2017-02-16 15:52:29 ERR /usr/bin/puppet:8:in `<main>'

MORE LOGS: http://pastebin.com/gp3DMXFq

Snapshot coming

Luca Cervigni (cervigni) wrote :

Snapshot
https://1drv.ms/u/s!AvlKFvlmzy2kgbdbuByHzjW9reTQuQ

Environment: alpha
Workflow: Deployment
Started: 20:56:08 16/02/2017
Status: Error
Finished: 21:33:58 16/02/2017
Message: All nodes are finished. Failed tasks: Task[openstack-network-networks/28] Stopping the deployment process!

Gregory Orange (gregoryo2017) wrote :

Further detail from this same environment, here is the error message from the Fuel Dashboard:

Error
All nodes are finished. Failed tasks: Task[openstack-network-networks/42] Stopping the deployment process!

This appears to match the node id, because "fuel2 node list -e $env" yields:
| 42 | Untitled (xx:xx) | error | ubuntu
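To make the comparison between the task suffix and the node id less manual, here is a minimal Python sketch that pulls the task name and numeric suffix out of the dashboard message. The pattern and the helper name `failed_tasks` are assumptions made for illustration; treating the suffix as a node id is this comment's inference, not something the message itself states.

```python
import re

# Matches entries such as "Task[openstack-network-networks/42]".
# The pattern is an assumption based on the messages quoted in this report.
TASK_RE = re.compile(r"Task\[(?P<task>[^/\]]+)/(?P<suffix>\d+)\]")


def failed_tasks(message):
    """Return (task_name, numeric_suffix) pairs found in a Fuel error message."""
    return [(m.group("task"), int(m.group("suffix")))
            for m in TASK_RE.finditer(message)]


message = ("All nodes are finished. Failed tasks: "
           "Task[openstack-network-networks/42] Stopping the deployment process!")
print(failed_tasks(message))
```

The numeric suffix can then be checked against the first column of the "fuel2 node list -e $env" output.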

Looking at node-42:/var/log/puppet.log :

2017-02-20 06:39:00 +0000 /Stage[main]/Openstack_tasks::Openstack_network::Plugins::Ml2/Service[neutron-server] (notice): Triggered 'refresh' from 1 events
2017-02-20 06:42:20 +0000 /Stage[main]/Openstack_tasks::Openstack_network::Plugins::Ml2/Exec[waiting-for-neutron-api]/returns (notice): <html><body><h1>503 Service Unavailable</h1>
2017-02-20 06:42:20 +0000 /Stage[main]/Openstack_tasks::Openstack_network::Plugins::Ml2/Exec[waiting-for-neutron-api]/returns (notice): No server is available to handle this request.
2017-02-20 06:42:20 +0000 /Stage[main]/Openstack_tasks::Openstack_network::Plugins::Ml2/Exec[waiting-for-neutron-api]/returns (notice): </body></html>
2017-02-20 06:42:20 +0000 /Stage[main]/Openstack_tasks::Openstack_network::Plugins::Ml2/Exec[waiting-for-neutron-api] (err): Failed to call refresh: neutron net-list --http-timeout=4 2>&1 > /dev/null returned 1 instead of one of [0]
2017-02-20 06:42:20 +0000 /Stage[main]/Openstack_tasks::Openstack_network::Plugins::Ml2/Exec[waiting-for-neutron-api] (err): neutron net-list --http-timeout=4 2>&1 > /dev/null returned 1 instead of one of [0]

Looking at node-42:/etc/puppet/modules/openstack_tasks/manifests/openstack_network/plugins/ml2.pp I see the 'waiting-for-neutron-api' exec subscribes to the neutron-server service, and retries 30 times at 4 seconds each - total of two minutes. That fits with the timing above.
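The retry behaviour described above can be sketched in Python. This is an illustration of the "try N times, sleep between failures" idea only, not Puppet's implementation; `wait_until_ready`, `FakeClock`, `simulate` and the 300-second startup time are all hypothetical. A simulated clock is used so the example runs instantly.

```python
import time


def wait_until_ready(check, tries=30, try_sleep=4, sleep=time.sleep):
    """Call check() up to `tries` times, pausing `try_sleep` seconds after each failure.

    Returns the 1-based attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, tries + 1):
        if check():
            return attempt
        if attempt < tries:
            sleep(try_sleep)  # no pause after the final attempt in this sketch
    return None


class FakeClock:
    """Simulated clock so the demo does not really sleep."""

    def __init__(self):
        self.now = 0

    def sleep(self, seconds):
        self.now += seconds


def simulate(ready_at, tries, try_sleep):
    """Simulate a service that starts answering `ready_at` seconds after start."""
    clock = FakeClock()
    return wait_until_ready(lambda: clock.now >= ready_at,
                            tries=tries, try_sleep=try_sleep, sleep=clock.sleep)


# Hypothetical service that needs 300 seconds to come up.
print(simulate(ready_at=300, tries=30, try_sleep=4))   # None: last check happens at 116 s
print(simulate(ready_at=300, tries=30, try_sleep=20))  # 16: the check at 300 s succeeds
```

In a real check, `check` would run something like the `neutron net-list --http-timeout=4` command from the log and report whether it exited with 0.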

Running the command manually on node-42 succeeds:

root@node-42:~# . openrc
root@node-42:~# neutron net-list --http-timeout=4 && echo yes || echo no

yes

So, it looks to me like the neutron-server service isn't running within two minutes of being started by the Puppet run. Now, I'll try to work out how to tweak Fuel to increase that time, to test.

Changed in fuel:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
milestone: none → 10.1
Revision history for this message

Gregory Orange (gregoryo2017) wrote :

I edited /etc/puppet/newton-10.0/modules/openstack_tasks/manifests/openstack_network/plugins/ml2.pp on the Fuel master and increased the 'waiting-for-neutron-api' exec parameter 'try_sleep' to 20, redeployed (through the Fuel web UI) and it worked. That seems to show that the Neutron API is not ready within two minutes (default is 4 seconds wait, 30 tries). Should it simply be increased, or is it a problem that the API is taking too long to become ready?
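For reference, the numbers in this thread can be put side by side with a short Python calculation. The helper `sleep_budget` is hypothetical, and it counts only the time spent sleeping between tries; each attempt also spends time running the command itself, so the real wait is somewhat longer. That may be why the log excerpt from node-42 shows a longer window than the nominal two minutes.

```python
from datetime import datetime


def sleep_budget(tries, try_sleep):
    """Seconds spent sleeping between tries (command run time not included)."""
    return tries * try_sleep


default_budget = sleep_budget(tries=30, try_sleep=4)   # settings quoted as the default
raised_budget = sleep_budget(tries=30, try_sleep=20)   # settings after the edit

# Timestamps taken from the puppet.log excerpt on node-42.
fmt = "%Y-%m-%d %H:%M:%S %z"
service_refresh = datetime.strptime("2017-02-20 06:39:00 +0000", fmt)
exec_failed = datetime.strptime("2017-02-20 06:42:20 +0000", fmt)
observed = (exec_failed - service_refresh).total_seconds()

print(default_budget)  # 120
print(raised_budget)   # 600
print(observed)        # 200.0
```

With 'try_sleep' at 20 the sleep budget is ten minutes, which is in the same range as the neutron-server start time reported later in this thread.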

Oleksiy Molchanov (omolchanov) wrote :

Very strange, it took 10 minutes to start neutron-server

https://paste.mirantis.net/show/9615/

Changed in fuel:
milestone: 10.1 → 10.x-updates
importance: Undecided → High
status: New → Confirmed
tags: added: area-library
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → MOS Neutron (mos-neutron)
tags: added: area-neutron
removed: area-library
Ann Taraday (akamyshnikova) wrote :

I investigated the neutron logs and found https://paste.mirantis.net/show/10539/. It seems that MySQL was having problems at that time; see daemon.log and mysql.log:
https://paste.mirantis.net/show/10538/
https://paste.mirantis.net/show/10537/

So I believe the slow start of neutron-server was also caused by the problems with MySQL.

Changed in fuel:
assignee: MOS Neutron (mos-neutron) → nobody
Changed in fuel:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
Michael Polenchuk (mpolenchuk) wrote :

Not reproduced against the latest F10 ISO.

Changed in fuel:
status: Confirmed → Incomplete