Fixed interval looping call 'nova.servicegroup.drivers.db.DbDriver._report_state' fails when root is filled on the primary controller

Bug #1515032 reported by Artem Hrechanychenko
Affects: Fuel for OpenStack
Status: Invalid
Importance: Undecided
Assigned to: Fuel Library (Deprecated)

Bug Description

{"release_versions": {"2015.1.0-8.0": {"VERSION": {"fuel-upgrade_sha": "1e894e26d4e1423a9b0d66abd6a79505f4175ff6", "fuel-createmirror_sha": "d3949f3094248b9686a4c20caea5ae13c9738a8d", "shotgun_sha": "25dd78a3118267e3616df0727ce746e7dead2d67", "api": "1.0", "fuelmain_sha": "3b64b95f6b513b0d46b57525a858eea7a9809da3", "astute_sha": "959b06c5ef8143125efd1727d350c050a922eb12", "fuel-ostf_sha": "bc0844c493490ddbe9ca1bb5d9f88c068772c7e3", "openstack_version": "2015.1.0-8.0", "build_id": "141", "fuelmenu_sha": "a133a8ae1621f7700a8123295681b8d3c98f74a4", "build_number": "141", "fuel-library_sha": "0db9fb73a6ebc25098be59151457b24018e967b6", "fuel-nailgun-agent_sha": "00b4b11553c250f22c0079fb74c8b782dcb7b740", "fuel-nailgun_sha": "050633358cacb4023e84fda0367e68858051f702", "fuel-agent_sha": "7027b9c31f1986c3904877a2133b1fcb77094beb", "production": "docker", "network-checker_sha": "a57e1d69acb5e765eb22cab0251c589cd76f51da", "feature_groups": ["mirantis"], "release": "8.0", "python-fuelclient_sha": "ee31f93f047a54b3bd2ab5d281c214bbd53f9156"}}}, "fuel-upgrade_sha": "1e894e26d4e1423a9b0d66abd6a79505f4175ff6", "fuel-createmirror_sha": "d3949f3094248b9686a4c20caea5ae13c9738a8d", "shotgun_sha": "25dd78a3118267e3616df0727ce746e7dead2d67", "api": "1.0", "fuelmain_sha": "3b64b95f6b513b0d46b57525a858eea7a9809da3", "astute_sha": "959b06c5ef8143125efd1727d350c050a922eb12", "fuel-ostf_sha": "bc0844c493490ddbe9ca1bb5d9f88c068772c7e3", "openstack_version": "2015.1.0-8.0", "build_id": "141", "fuelmenu_sha": "a133a8ae1621f7700a8123295681b8d3c98f74a4", "build_number": "141", "auth_required": true, "fuel-library_sha": "0db9fb73a6ebc25098be59151457b24018e967b6", "fuel-nailgun-agent_sha": "00b4b11553c250f22c0079fb74c8b782dcb7b740", "fuel-nailgun_sha": "050633358cacb4023e84fda0367e68858051f702", "fuel-agent_sha": "7027b9c31f1986c3904877a2133b1fcb77094beb", "production": "docker", "network-checker_sha": "a57e1d69acb5e765eb22cab0251c589cd76f51da", "feature_groups": ["mirantis"], "release": "8.0", "python-fuelclient_sha": "ee31f93f047a54b3bd2ab5d281c214bbd53f9156"}

Steps to reproduce:

1) Deploy 3 controllers, 2 computes, 1 cinder + Neutron VLAN.
2) Fill the root partition on the primary controller so that free space drops below the 512 MB threshold of the corosync disk monitor:
    Free space in root on primary controller: 12288 MB
    Space needed to fill on root: 11775 MB
    fallocate -l 11775M /root/bigfile
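
For reference, the fill size can be derived from the current free space and the monitor threshold; a minimal sketch, assuming the 512 MB corosync SysInfo threshold from the step above and a GNU coreutils df:

    # Fill the root partition until free space drops just below the
    # 512 MB corosync SysInfo disk monitor threshold.
    THRESHOLD_MB=512
    FREE_MB=$(df -BM --output=avail / | tail -1 | tr -d 'M ')
    FILL_MB=$((FREE_MB - THRESHOLD_MB + 1))   # leave just under the threshold
    fallocate -l "${FILL_MB}M" /root/bigfile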

3) Check for stopped resources and resources migrating off the primary controller (a verification sketch follows the status listing below):
   root@node-1:~# pcs status xml |grep "resources_running"
        <node name="node-1.test.domain.local" id="1" online="true" standby="false" standby_onfail="false" maintenance="false" pending="false" unclean="false" shutdown="false" expected_up="true" is_dc="true" resources_running="0" type="member" />

sysinfo_node-1.test.domain.local (ocf::pacemaker:SysInfo): Stopped
 Clone Set: clone_p_vrouter [p_vrouter]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 vip__management (ocf::fuel:ns_IPaddr2): Started node-2.test.domain.local
 vip__vrouter_pub (ocf::fuel:ns_IPaddr2): Started node-2.test.domain.local
 vip__vrouter (ocf::fuel:ns_IPaddr2): Started node-2.test.domain.local
 vip__public (ocf::fuel:ns_IPaddr2): Started node-2.test.domain.local
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 Master/Slave Set: master_p_conntrackd [p_conntrackd]
     Masters: [ node-2.test.domain.local ]
     Slaves: [ node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Masters: [ node-2.test.domain.local ]
     Slaves: [ node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 Clone Set: clone_p_dns [p_dns]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 Clone Set: clone_p_heat-engine [p_heat-engine]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 Clone Set: clone_p_neutron-plugin-openvswitch-agent [p_neutron-plugin-openvswitch-agent]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 Clone Set: clone_p_neutron-dhcp-agent [p_neutron-dhcp-agent]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 sysinfo_node-2.test.domain.local (ocf::pacemaker:SysInfo): Started node-2.test.domain.local
 sysinfo_node-3.test.domain.local (ocf::pacemaker:SysInfo): Started node-3.test.domain.local
 Clone Set: clone_ping_vip__public [ping_vip__public]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
 Clone Set: clone_p_ntp [p_ntp]
     Started: [ node-2.test.domain.local node-3.test.domain.local ]
     Stopped: [ node-1.test.domain.local ]
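
To confirm the failover at a glance, the running-resource count and per-node placement can be checked directly; a minimal sketch using standard Pacemaker/pcs tooling:

    # node-1 should report resources_running="0" once everything has migrated
    pcs status xml | grep resources_running
    # one-shot cluster status grouped per node
    crm_mon -1 --group-by-node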

4) Run OSTF sanity, smoke, and HA test sets.

Expected results:
Only OSTF HA test should fail

Actual results:

 "Check that required services are running
Some nova services have not been started.. Please refer to OpenStack logs for more details."

from OSTF logs:
...

fuel_health.tests.sanity.test_sanity_infrastructure: DEBUG: No handlers could be found for logger "oslo_config.cfg"
Binary            Host                      Zone      Status   State  Updated_At
nova-scheduler    node-1.test.domain.local  internal  enabled  XXX    2015-11-10 21:22:45
nova-cert         node-2.test.domain.local  internal  enabled  XXX    2015-11-10 21:20:50
nova-consoleauth  node-2.test.domain.local  internal  enabled  XXX    2015-11-10 21:21:53

File "/usr/lib/python2.6/site-packages/fuel_health/tests/sanity/test_sanity_infrastructure.py", line 75, in test_001_services_state u'XXX' not in output, 'Step 2 failed: Some nova services '

From node-1 nova-scheduler logs:

2015-11-10 21:24:33 ERR server on 10.109.7.3:5673 is unreachable: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'. Trying again in 5 seconds.
2015-11-10 21:23:46 ERR Fixed interval looping call 'nova.servicegroup.drivers.db.DbDriver._report_state' failed

From node-2 nova-cert logs:

Fixed interval looping call 'nova.servicegroup.drivers.db.DbDriver._report_state' failed

From node-2 nova-consoleauth logs:

2015-11-10 21:24:32 ERR server on 10.109.7.3:5673 is unreachable: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'. Trying again in 5 seconds.
2015-11-10 21:22:24 ERR Fixed interval looping call 'nova.servicegroup.drivers.db.DbDriver._report_state' failed
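
The CONNECTION_FORCED errors indicate that an AMQP broker closed its connections during the failover. Once the surviving brokers settle, cluster membership can be confirmed; a minimal sketch using the standard rabbitmqctl tooling:

    # Verify the RabbitMQ cluster re-formed after the degraded node dropped out
    rabbitmqctl cluster_status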

Tags: area-library
Artem Hrechanychenko (agrechanichenko) wrote:

link to snapshot

tags: added: area-library
Artem Roma (aroma-x)
Changed in fuel:
status: New → Confirmed
Changed in fuel:
status: Confirmed → In Progress
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
Kyrylo Galanov (kgalanov) wrote:

It takes up to two minutes to restore a connection to the database. In the worst-case scenario, a test can fail before the connection is restored.

---
2015-11-12 15:53:16.247 17115 TRACE nova.servicegroup.drivers.db dbapi_connection = rec.get_connection()
2015-11-12 15:53:16.247 17115 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 525, in get_connection
2015-11-12 15:53:16.247 17115 TRACE nova.servicegroup.drivers.db self.connection = self.__connect()
2015-11-12 15:53:16.247 17115 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 537, in __connect
2015-11-12 15:53:16.247 17115 TRACE nova.servicegroup.drivers.db connection = self.__pool._creator()
2015-11-12 15:53:16.247 17115 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/compat/handle_error.py", line 200, in connect
2015-11-12 15:53:16.247 17115 TRACE nova.servicegroup.drivers.db raise original_exception
2015-11-12 15:53:16.247 17115 TRACE nova.servicegroup.drivers.db DBConnectionError: (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None
2015-11-12 15:53:16.247 17115 TRACE nova.servicegroup.drivers.db
2015-11-12 15:53:26.872 17115 INFO nova.scheduler.host_manager [req-79360310-a2a9-4ada-bf84-e058145484f2 - - - - -] Successfully synced instances from host 'node-4.test.domain.local'.
2015-11-12 15:54:13.195 17115 ERROR nova.servicegroup.drivers.db [req-a3fd5579-559e-4eb7-b56b-364b246f5910 - - - - -] Recovered model server connection!
2015-11-12 15:55:13.181 17115 ERROR sqlalchemy.pool.QueuePool [req-a3fd5579-559e-4eb7-b56b-364b246f5910 - - - - -] Exception closing connection <_mysql.connection closed at 2ab4190>
2015-11-12 15:55:13.181 17115 TRACE sqlalchemy.pool.QueuePool Traceback (most recent call last):
2015-11-12 15:55:13.181 17115 TRACE sqlalchemy.pool.QueuePool File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 250, in _close_connection
2015-11-12 15:55:13.181 17115 TRACE sqlalchemy.pool.QueuePool self._dialect.do_close(connection)
2015-11-12 15:55:13.181 17115 TRACE sqlalchemy.pool.QueuePool File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 412, in do_close
2015-11-12 15:55:13.181 17115 TRACE sqlalchemy.pool.QueuePool dbapi_connection.close()
2015-11-12 15:55:13.181 17115 TRACE sqlalchemy.pool.QueuePool ProgrammingError: closing a closed connection
2015-11-12 15:55:13.181 17115 TRACE sqlalchemy.pool.QueuePool
2015-11-12 15:55:43.456 17115 INFO nova.scheduler.host_manager [req-9d41807c-6267-43e0-94a8-4988463b6561 - - - - -] Successfully synced instances from host 'node-4.test.domain.local'.
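
The length of that reconnect window can be read from the scheduler log by pairing the first failure with the recovery message; a minimal sketch, assuming the usual /var/log/nova/nova-scheduler.log path:

    # Bracket the outage: first _report_state failure vs. the recovery message
    grep -E "Fixed interval looping call|Recovered model server connection" \
        /var/log/nova/nova-scheduler.log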

Matthew Mosesohn (raytrac3r) wrote:

It seems we should wait ~5 minutes after filling up the disk on the controller before testing, so that all services have settled.
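
That settle period could be implemented as a poll rather than a fixed sleep; a minimal sketch, assuming it runs on a controller and keeping the ~5-minute ceiling suggested above:

    # Wait up to ~5 minutes for all nova services to report a healthy state
    for i in $(seq 1 30); do
        nova-manage service list 2>/dev/null | grep -q XXX || break
        sleep 10
    done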

Changed in fuel:
assignee: Kyrylo Galanov (kgalanov) → MOS Nova (mos-nova)
Changed in fuel:
assignee: MOS Nova (mos-nova) → Kyrylo Galanov (kgalanov)
Roman Podoliaka (rpodolyaka) wrote:

As discussed with Kyrylo on Slack, this is an expected situation: it takes time for Galera to re-assemble the cluster, and during that time it is effectively down and responds with error 2013 to incoming connections. OpenStack services have no choice but to keep retrying; eventually they reconnect to Galera and resume normal operation.
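
Reassembly can be watched from any surviving controller; a minimal sketch, assuming default client credentials on the node (the wsrep_* variables are standard Galera status counters):

    # "Primary" status and the expected cluster size mean reassembly is done
    mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status'; SHOW STATUS LIKE 'wsrep_cluster_size';"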

Changed in fuel:
importance: High → Undecided
status: In Progress → Incomplete
Changed in fuel:
assignee: Kyrylo Galanov (kgalanov) → Fuel Library Team (fuel-library)
Bogdan Dobrelya (bogdando) wrote:

Closing, as this issue has remained Incomplete for more than a month.

Changed in fuel:
status: Incomplete → Invalid