Ceilometer communication error, related OSTF tests fail

Bug #1350429 reported by Aleksey Kasatkin
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Ivan Berezovskiy
Milestone: 5.1

Bug Description

"build_id": "2014-07-30_10-31-34", "ostf_sha": "9c0454b2197756051fc9cee3cfd856cf2a4f0875", "build_number": "374", "auth_required": true, "api": "1.0", "nailgun_sha": "8cf375f7687d7d0797e7f085a909df8087fc82a6", "production": "docker", "fuelmain_sha": "60669ce22d201b2cbcfd147286230e4eaed2850b", "astute_sha": "b16efcec6b4af1fb8669055c053fbabe188afa67", "feature_groups": ["mirantis"], "release": "5.1", "fuellib_sha": "8980a11e23242e0b0f331a8b899e18ac89f441be"

1. Create new environment (Ubuntu, simple mode)
2. Choose Ceilometer
3. Add controller+mongo, compute
4. Start deployment. It's successful.
5. Start OSTF tests. All ceilometer-related tests failed. Other tests passed.

Trying to use ceilometer from the CLI:

$ ceilometer meter-list
Error communicating with http://172.16.0.2:8777 [Errno 111] Connection refused

There are many messages and traces in the ceilometer logs about the mongo and AMQP connections:

2014-07-30 14:54:51.627 27450 WARNING ceilometer.storage.pymongo_base [-] Unable to connect to the database server: could not connect to 192.168.0.1:27017: [Errno 111] ECONNREFUSED. Trying again in 10 seconds.

2014-07-30 16:03:56.566 27488 ERROR ceilometer.openstack.common.rpc.common [-] AMQP server on 192.168.0.1:5672 is unreachable: Socket closed. Trying again in 17 seconds.
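
For reference, both backends named in these logs can be probed directly from the affected node. A minimal sketch, assuming the 192.168.0.1 management address from the log lines above and that nc and the mongo shell are installed; on a healthy controller both should succeed:

$ mongo --host 192.168.0.1 --port 27017 --eval 'printjson(db.runCommand({ping: 1}))'
$ nc -zv 192.168.0.1 5672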

netstat shows that nothing is listening on port 8777. The endpoint URLs in the keystone DB appear to be correct.
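
The two checks behind that observation can be reproduced as follows (a sketch; 8777 is the default ceilometer-api port, and the keystone call assumes admin credentials are sourced):

$ netstat -tlnp | grep 8777            # no output means no listener on the API port
$ keystone endpoint-list | grep 8777   # endpoint URLs registered in keystone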

Revision history for this message
Aleksey Kasatkin (alekseyk-ru) wrote :
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → MOS Ceilometer (mos-ceilometer)
milestone: none → 5.1
Revision history for this message
Ivan Berezovskiy (iberezovskiy) wrote :

The problems in your logs are with mongo:
2014-07-30T15:52:59.690065+01:00 warning: Wed Jul 30 14:52:59.693 [conn14] warning: No such role, "dbOwner", in database ceilometer. No privileges will be acquired from this role

then:
2014-07-30T15:53:00.579003+01:00 err: Wed Jul 30 14:53:00.579 [conn14] ERROR: Uncaught std::exception: boost::filesystem::create_directory: Input/output error: "/var/lib/mongo/_tmp/esort.1406731979.6", terminating

2014-07-30T15:53:00.579530+01:00 info: dbexit:

Mongo didn't start, so ceilometer db-sync failed and, as a result, ceilometer-api couldn't work.
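
Given the boost::filesystem I/O error above, a quick disk sanity check on the failing mongo node could look like this (a sketch; the path comes from the traceback, and the commands need root):

$ df -h /var/lib/mongo                 # volume mounted and not full?
$ dmesg | grep -i 'i/o error'          # kernel-level disk errors
$ touch /var/lib/mongo/test && rm /var/lib/mongo/test && echo OK   # basic write test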

I can't reproduce this problem. I deployed the environment you've described on Nova network. Should I try Neutron?

How much RAM did you use for the controller? Maybe it isn't enough for the combined mongo+controller role?

Dina Belova (dbelova)
Changed in fuel:
assignee: MOS Ceilometer (mos-ceilometer) → Ivan Berezovskiy (iberezovskiy)
Revision history for this message
Ivan Berezovskiy (iberezovskiy) wrote :

I also can't reproduce it on an environment with Neutron GRE or VLAN.

Revision history for this message
Aleksey Kasatkin (alekseyk-ru) wrote :

2 GB RAM on each node.

Revision history for this message
Aleksey Kasatkin (alekseyk-ru) wrote :

I cannot reproduce it when the controller is separated from mongo.

Revision history for this message
Aleksey Kasatkin (alekseyk-ru) wrote :

But it is consistently reproduced in the original configuration.

Changed in fuel:
status: New → Confirmed
Revision history for this message
Aleksey Kasatkin (alekseyk-ru) wrote :

Reproduced on #376.

Revision history for this message
Aleksey Kasatkin (alekseyk-ru) wrote :

It was a problem with the disk of one particular node. Everything is OK on another controller+mongo node. Ivan Berezovskiy couldn't reproduce the issue on another environment.

Changed in fuel:
status: Confirmed → Invalid
Revision history for this message
Kirill Omelchenko (komelchenko) wrote :

Reproduced on:
5.1 build 6
http://paste.openstack.org/show/112151/

Steps:
1. Create new cluster: Simple, NeutronGRE, LVM Storage
2. Add 1x controller, 1x compute, 1x storage, 2x mongodb
3. Configure storage for the mongodb nodes as shown here: http://i.imgur.com/Gr0dUXB.png
4. Deploy the cluster. It finishes successfully.
5. Start OSTF

Expected: all OSTF tests pass.

Actual:

'List ceilometer availability' fails at the step 'Request the list of meters'.

A manual run of the command on the controller results in an error: http://paste.openstack.org/show/112148/

Changed in fuel:
status: Invalid → Confirmed
Revision history for this message
Ivan Berezovskiy (iberezovskiy) wrote :

There are some errors in the puppet logs:

2014-09-16T09:25:01.322284+01:00 debug: (Exec[ceilometer-dbsync](provider=posix)) Executing 'ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf'
2014-09-16T09:25:01.326497+01:00 debug: Executing 'ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf'
2014-09-16T09:30:01.330159+01:00 err: (/Stage[main]/Ceilometer::Db/Exec[ceilometer-dbsync]) Failed to call refresh: Command exceeded timeout
2014-09-16T09:30:01.350562+01:00 err: (/Stage[main]/Ceilometer::Db/Exec[ceilometer-dbsync]) Command exceeded timeout

/etc/ceilometer/ceilometer.conf:
connection=mongodb://ceilometer:CLMFGEzI@192.168.0.5,192.168.0.6/ceilometer
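
To rule out the tool itself, the command from the puppet log above can be rerun by hand on the controller; with the mongo nodes unreachable it simply hangs until the timeout:

$ ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf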

Pinging the mongo nodes from the controller:
ping 192.168.0.5
PING 192.168.0.5 (192.168.0.5) 56(84) bytes of data.
From 192.168.0.2 icmp_seq=1 Destination Host Unreachable
From 192.168.0.2 icmp_seq=2 Destination Host Unreachable
From 192.168.0.2 icmp_seq=3 Destination Host Unreachable

ping 192.168.0.6
PING 192.168.0.6 (192.168.0.6) 56(84) bytes of data.
From 192.168.0.2 icmp_seq=1 Destination Host Unreachable
From 192.168.0.2 icmp_seq=2 Destination Host Unreachable
From 192.168.0.2 icmp_seq=3 Destination Host Unreachable

It's a connectivity problem between the controller and the mongo nodes; I don't think it's a ceilometer issue.
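
Given the unreachable pings above, a few standard checks on a mongo node can localize the network problem (a sketch; interface names depend on the actual NIC setup):

$ ip addr show                 # does any interface carry a 192.168.0.x address?
$ ip route get 192.168.0.2     # which interface would traffic to the controller use?
$ arp -n | grep 192.168.0.     # has the controller's MAC been resolved?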

Revision history for this message
Kirill Omelchenko (komelchenko) wrote :

It was my mistake: I had a different NIC setup for the mongodb nodes compared to the other nodes.

Changed in fuel:
status: Confirmed → Invalid