Many Tracebacks for Ubuntu: Failed rescheduling router <...>: no eligible l3 agent found
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Invalid | High | MOS Neutron |
Bug Description
"build_id": "2015-03-
"ostf_sha": "b9a090c71682fb
"build_number": "210",
"release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-03-
"auth_required": true,
"api": "1.0",
"nailgun_sha": "1d2bd383caecc5
"production": "docker",
"python-
"astute_sha": "4a117a1ca6bdcc
"feature_groups": ["mirantis"],
"release": "6.1",
"fuelmain_sha": "f3d6353c08d8eb
"fuellib_sha": "7764225db5bc65
1. Create new environment (Ubuntu)
2. Choose Neutron, Vlan
3. Add 1 controller+mongo, 1 compute, 2 mongo
4. Start deployment. It was successful
5. Start OSTF tests. All passed except the test "Check network connectivity from instance via floating IP", which failed on the step "Check connectivity to the floating IP using ping command".
6. There are many errors in the neutron-server log on node-5:
2015-03-20 13:20:19 ERR
neutron.
2015-03-20 13:20:19.591 30741 TRACE neutron.
2015-03-20 13:20:19.591 30741 TRACE neutron.
2015-03-20 13:20:19.591 30741 TRACE neutron.
2015-03-20 13:20:19.591 30741 TRACE neutron.
2015-03-20 13:20:19.591 30741 TRACE neutron.
2015-03-20 13:20:19.591 30741 TRACE neutron.
2015-03-20 13:20:19.591 30741 TRACE neutron.
In the same test case on CentOS there are no errors in this log (node-11).
Logs are here: https:/
Changed in fuel:
status: New → Confirmed
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → MOS Neutron (mos-neutron)
I've found a lot of timeout errors in lrmd.log (from node-5, primary controller)
1689:2015-03-20T11:38:13.806657+00:00 warning: warning: child_timeout_callback: p_neutron-l3-agent_monitor_20000 process (PID 28131) timed out
1690:2015-03-20T11:38:13.807663+00:00 warning: warning: operation_finished: p_neutron-l3-agent_monitor_20000:28131 - timed out after 10000ms
1730:2015-03-20T11:39:34.548347+00:00 warning: warning: operation_finished: p_ceilometer-alarm-evaluator_monitor_20000:28731 - timed out after 30000ms
1731:2015-03-20T11:39:35.420902+00:00 warning: warning: child_timeout_callback: p_heat-engine_monitor_20000 process (PID 28751) timed out
This means the environment was running extremely slowly, and the monitor operations of many Pacemaker resources could not complete in time. The Neutron agents were restarted several times.
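To see which resources are affected most, the monitor timeouts in lrmd.log can be tallied per resource. The following is only a rough sketch: the regular expression is inferred from the few lines quoted above, and the function name `count_monitor_timeouts` is made up for this example.

```python
import re
from collections import Counter

# Pattern inferred from the "operation_finished ... timed out after Nms"
# lines quoted in this report; other lrmd versions may log differently.
TIMEOUT_RE = re.compile(
    r"operation_finished:\s+(?P<resource>\S+)_monitor_(?P<interval>\d+):(?P<pid>\d+)"
    r"\s+-\s+timed out after (?P<timeout_ms>\d+)ms"
)


def count_monitor_timeouts(lines):
    """Return a Counter mapping resource name to number of monitor timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("resource")] += 1
    return counts


# Sample input taken from the lrmd.log excerpt in this report.
sample = [
    "2015-03-20T11:38:13.806657+00:00 warning: warning: child_timeout_callback: "
    "p_neutron-l3-agent_monitor_20000 process (PID 28131) timed out",
    "2015-03-20T11:38:13.807663+00:00 warning: warning: operation_finished: "
    "p_neutron-l3-agent_monitor_20000:28131 - timed out after 10000ms",
    "2015-03-20T11:39:34.548347+00:00 warning: warning: operation_finished: "
    "p_ceilometer-alarm-evaluator_monitor_20000:28731 - timed out after 30000ms",
]

for resource, count in count_monitor_timeouts(sample).most_common():
    print(resource, count)
```

Only the `operation_finished` lines are counted, so a timeout that also produces a `child_timeout_callback` line is not counted twice.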
There was only one controller, so when Neutron stopped receiving state reports from the L3 agent, it had no other agent to reschedule the router to and could not restore connectivity for the instances.
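The reasoning can be illustrated with a toy model. This is not Neutron's scheduler code: the function `alive_l3_agents`, the dictionary layout, and the 75-second threshold are all assumptions chosen for the example.

```python
# Toy model only, NOT Neutron's implementation. An agent counts as alive
# if its last state report is recent enough; a router can only be
# rescheduled to an alive agent.
DOWN_AFTER = 75  # seconds; illustrative threshold, an assumption


def alive_l3_agents(agents, now, down_after=DOWN_AFTER):
    """Return the agents whose last state report is at most down_after seconds old."""
    return [a for a in agents if now - a["last_report"] <= down_after]


now = 1000

# One controller: its only L3 agent has missed its state reports.
single = [{"host": "node-5", "last_report": now - 120}]
print(alive_l3_agents(single, now))  # empty list: no eligible l3 agent

# Hypothetical second controller with a healthy agent (not part of this deployment).
multi = single + [{"host": "node-6", "last_report": now - 5}]
print([a["host"] for a in alive_l3_agents(multi, now)])
```

With a single controller the candidate list is empty as soon as that one agent stops reporting, which matches the "no eligible l3 agent found" message in the bug title.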
Also from pacemakerd.log:
2015-03-20T11:40:07.896950+00:00 err: error: child_waitpid: Managed process 23705 (lrmd) dumped core
2015-03-20T11:40:07.896950+00:00 notice: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=23705, core=1)
2015-03-20T11:40:07.896950+00:00 notice: notice: pcmk_process_exit: Respawning failed child process: lrmd
2015-03-20T11:40:07.929236+00:00 err: error: pcmk_child_exit: Child process crmd (23708) exited: Generic Pacemaker error (201)
2015-03-20T11:40:07.932405+00:00 notice: notice: pcmk_process_exit: Respawning failed child process: crmd
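A similar sketch can summarize which Pacemaker child processes crashed and were respawned. The patterns below are inferred from the five pacemakerd.log lines quoted above and nothing else; `summarize_child_failures` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Patterns inferred from the pacemakerd.log excerpt in this report.
SIGNAL_RE = re.compile(
    r"pcmk_child_exit: Child process (?P<name>\S+) terminated with signal (?P<signal>\d+)"
)
ERROR_EXIT_RE = re.compile(
    r"pcmk_child_exit: Child process (?P<name>\S+) \((?P<pid>\d+)\) exited: (?P<reason>.+)$"
)
RESPAWN_RE = re.compile(
    r"pcmk_process_exit: Respawning failed child process: (?P<name>\S+)"
)


def summarize_child_failures(lines):
    """Group signals, error exits and respawns by child process name."""
    summary = defaultdict(lambda: {"signals": [], "errors": [], "respawns": 0})
    for line in lines:
        line = line.strip()
        match = SIGNAL_RE.search(line)
        if match:
            summary[match.group("name")]["signals"].append(int(match.group("signal")))
            continue
        match = ERROR_EXIT_RE.search(line)
        if match:
            summary[match.group("name")]["errors"].append(match.group("reason"))
            continue
        match = RESPAWN_RE.search(line)
        if match:
            summary[match.group("name")]["respawns"] += 1
    return dict(summary)


# Sample input taken from the pacemakerd.log excerpt in this report.
sample_log = [
    "2015-03-20T11:40:07.896950+00:00 notice: notice: pcmk_child_exit: "
    "Child process lrmd terminated with signal 11 (pid=23705, core=1)",
    "2015-03-20T11:40:07.896950+00:00 notice: notice: pcmk_process_exit: "
    "Respawning failed child process: lrmd",
    "2015-03-20T11:40:07.929236+00:00 err: error: pcmk_child_exit: "
    "Child process crmd (23708) exited: Generic Pacemaker error (201)",
    "2015-03-20T11:40:07.932405+00:00 notice: notice: pcmk_process_exit: "
    "Respawning failed child process: crmd",
]

for name, info in summarize_child_failures(sample_log).items():
    print(name, info)
```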
Please also specify the characteristics of the cluster nodes (VMs).