ceilometer-api fails to start on xenial-newton (maas + lxd)

Bug #1632909 reported by Ryan Beisner
This bug affects 1 person
Affects                              Status     Importance  Assigned to  Milestone
Canonical Juju                       Expired    Medium      Unassigned
OpenStack AODH Charm                 Invalid    Low         Unassigned
OpenStack Ceilometer Charm           Invalid    Low         Unassigned
juju-core                            Won't Fix  Undecided   Unassigned
ceilometer (Juju Charms Collection)  Invalid    Low         Unassigned

Bug Description

ceilometer-api fails to start on xenial-newton (bare metal tests):

ceilometer/0 blocked idle 1.25.6.1 4/lxc/0 8777/tcp 10.245.169.194 Services not running that should be: ceilometer-api

# juju status:
http://paste.ubuntu.com/23315659/

# system service status:
ubuntu@juju-machine-4-lxc-0:/etc/init$ sudo service ceilometer-api status
sudo: unable to resolve host juju-machine-4-lxc-0
● ceilometer-api.service - Ceilometer API
   Loaded: loaded (/lib/systemd/system/ceilometer-api.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/ceilometer-api.service.d
           └─override.conf
   Active: inactive (dead) (Result: exit-code) since Thu 2016-10-13 01:26:18 UTC; 37min ago
  Process: 21012 ExecStart=/usr/bin/ceilometer-api --port ${PORT} -- --config-file=/etc/ceilometer/ceilometer.conf --log-file=/var/log/ceilometer/ceilometer-api.log (code=exited, status=1/FAILURE)
  Process: 21011 ExecStartPre=/bin/chown ceilometer:adm /var/log/ceilometer (code=exited, status=0/SUCCESS)
  Process: 21010 ExecStartPre=/bin/chown ceilometer:ceilometer /var/lock/ceilometer /var/lib/ceilometer (code=exited, status=0/SUCCESS)
  Process: 21009 ExecStartPre=/bin/mkdir -p /var/lock/ceilometer /var/log/ceilometer /var/lib/ceilometer (code=exited, status=0/SUCCESS)
 Main PID: 21012 (code=exited, status=1/FAILURE)

Oct 13 01:26:18 juju-machine-4-lxc-0 systemd[1]: ceilometer-api.service: Unit entered failed state.
Oct 13 01:26:18 juju-machine-4-lxc-0 systemd[1]: ceilometer-api.service: Failed with result 'exit-code'.
Oct 13 01:26:18 juju-machine-4-lxc-0 systemd[1]: ceilometer-api.service: Service hold-off time over, scheduling restart.
Oct 13 01:26:18 juju-machine-4-lxc-0 systemd[1]: Stopped Ceilometer API.
Oct 13 01:26:18 juju-machine-4-lxc-0 systemd[1]: ceilometer-api.service: Start request repeated too quickly.
Oct 13 01:26:18 juju-machine-4-lxc-0 systemd[1]: Failed to start Ceilometer API.
Warning: ceilometer-api.service changed on disk. Run 'systemctl daemon-reload' to reload units.

Ryan Beisner (1chb1n)
summary: - ceilometer-api fails to start on xenial-newton
+ ceilometer-api fails to start on xenial-newton (lxc on metal)
James Page (james-page) wrote : Re: ceilometer-api fails to start on xenial-newton (lxc on metal)

From ceilometer log output (in syslog):

Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.163 21448 ERROR ceilometer.agent.manager [-] Unable to discover resources: 'NoneType' object has no attribute 'get_access'
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.163 21448 ERROR ceilometer.agent.manager Traceback (most recent call last):
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.163 21448 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/ceilometer/agent/manager.py", line 491, in discover
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.163 21448 ERROR ceilometer.agent.manager self.keystone).get_endpoints(
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.163 21448 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/ceilometer/keystone_client.py", line 41, in get_service_catalog
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.163 21448 ERROR ceilometer.agent.manager return client.session.auth.get_access(client.session).service_catalog
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.163 21448 ERROR ceilometer.agent.manager AttributeError: 'NoneType' object has no attribute 'get_access'
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.163 21448 ERROR ceilometer.agent.manager
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.164 21448 ERROR ceilometer.agent.manager [-] Unable to discover resources: 'NoneType' object has no attribute 'get_access'
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.164 21448 ERROR ceilometer.agent.manager Traceback (most recent call last):
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.164 21448 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/ceilometer/agent/manager.py", line 491, in discover
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.164 21448 ERROR ceilometer.agent.manager self.keystone).get_endpoints(
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.164 21448 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/ceilometer/keystone_client.py", line 41, in get_service_catalog
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.164 21448 ERROR ceilometer.agent.manager return client.session.auth.get_access(client.session).service_catalog
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.164 21448 ERROR ceilometer.agent.manager AttributeError: 'NoneType' object has no attribute 'get_access'
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31.164 21448 ERROR ceilometer.agent.manager
Oct 6 12:56:31 juju-machine-4-lxc-0 ceilometer-agent-central[21435]: 2016-10-06 12:56:31...

James Page (james-page) wrote :

This looks wrong:

[service_credentials]
os_auth_url = http://10.245.169.120:5000/v2.0
os_tenant_name = services
os_username = ceilometer
os_password = kmMx7rp27XRFnrhFp2MPVBZ5bGZhgSrd6wrSSWYbyzX97VtmR4WSw5bjrcb8RLjj

James Page (james-page) wrote :

With the latest master branch charm:

[service_credentials]
auth_url = http://10.5.21.55:5000
project_name = services
username = ceilometer
password = zXMcH7kPzL8YGjFxsp8SPKckxhTCbkWcVCCJ5d4fq7KZWdtHJp6jCCcGh5bnC8Ys
project_domain_name = default
user_domain_name = default
auth_type = password

is what I see testing using newton
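
For anyone comparing a deployed unit against the two renderings above, here is a minimal Python sketch that flags old-style options in [service_credentials]. It assumes, based only on the contrast in this thread, that keys starting with os_ are the old form; the function name find_legacy_credentials is hypothetical and is not part of the charm or of ceilometer.

```python
import configparser

# Assumption drawn from this thread only: the rendering flagged as wrong
# uses os_-prefixed keys, while the master-branch charm renders auth_url,
# project_name, username, and so on.
LEGACY_PREFIX = "os_"


def find_legacy_credentials(conf_text, section="service_credentials"):
    """Return the sorted os_-prefixed option names found in `section`."""
    # interpolation=None keeps '%' characters in passwords from being
    # treated as interpolation markers; strict=False tolerates duplicates.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section(section):
        return []
    return sorted(
        key for key in parser.options(section) if key.startswith(LEGACY_PREFIX)
    )
```

Passing the contents of /etc/ceilometer/ceilometer.conf to this function would list any os_* options still present; an empty list suggests the file matches the newer rendering.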

Ryan Beisner (1chb1n) wrote :

# mojo deploy output shows charm store retrieval
00:12:58.903 2016-10-13 10:29:38 [INFO] Downloading ~openstack-charmers-next/ceilometer-agent from charm store to ceilometer-agent
00:12:58.906 2016-10-13 10:29:38 [INFO] Downloading ~openstack-charmers-next/ceilometer from charm store to ceilometer
00:12:58.908 2016-10-13 10:29:38 [INFO] Downloading ~openstack-charmers-next/aodh from charm store to aodh
00:13:00.229 2016-10-13 10:29:39 [INFO] ceph-osd is cs:~openstack-charmers-next/ceph-osd-252
00:13:00.299 2016-10-13 10:29:39 [INFO] ntp is cs:trusty/ntp-15
00:13:00.322 2016-10-13 10:29:39 [INFO] ceilometer-agent is cs:~openstack-charmers-next/ceilometer-agent-230
00:13:00.678 2016-10-13 10:29:40 [INFO] mongodb is cs:trusty/mongodb-37
00:13:01.054 2016-10-13 10:29:40 [INFO] neutron-openvswitch is cs:~openstack-charmers-next/neutron-openvswitch-247
00:13:01.111 2016-10-13 10:29:40 [INFO] ceilometer is cs:~openstack-charmers-next/ceilometer-239
00:13:01.113 2016-10-13 10:29:40 [INFO] cinder-ceph is cs:~openstack-charmers-next/cinder-ceph-226

# repo-info from mojo charm collect dir shows the retrieved charm commit f289429
http://paste.ubuntu.com/23317381/

==> ./ceilometer/repo-info <==
commit-sha-1: f289429ed66138f1786bb89d02cf6a4aec476241
commit-short: f289429
branch: HEAD
remote: https://github.com/openstack/charm-ceilometer
info-generated: Wed Oct 12 07:16:14 UTC 2016
note: This file should exist only in a built or published charm artifact (not in the charm source code tree).

# repo-info on disk from the deployed unit shows charm commit f289429

jenkins@juju-osci1-machine-4:~$ juju ssh ceilometer/0 "cat /var/lib/juju/agents/unit-ceilometer-0/charm/repo-info"
Warning: Permanently added '10.245.168.11' (ECDSA) to the list of known hosts.
Warning: Permanently added '10.245.170.170' (ECDSA) to the list of known hosts.
commit-sha-1: f289429ed66138f1786bb89d02cf6a4aec476241
commit-short: f289429
branch: HEAD
remote: https://github.com/openstack/charm-ceilometer
info-generated: Wed Oct 12 07:16:14 UTC 2016
note: This file should exist only in a built or published charm artifact (not in the charm source code tree).
Connection to 10.245.170.170 closed.

# git master shows current commit level f289429

'Stop/Start ceilometer-api if Apache has changed'
https://github.com/openstack/charm-ceilometer/commit/f289429ed66138f1786bb89d02cf6a4aec476241

Ryan Beisner (1chb1n)
tags: added: maas-provider
Ryan Beisner (1chb1n) wrote :

This is ultimately caused by LXD containers deployed with the MAAS provider not being able to resolve one another's hostnames. Juju 1.25.6 + MAAS 1.9.4

summary: - ceilometer-api fails to start on xenial-newton (lxc on metal)
+ ceilometer-api fails to start on xenial-newton (maas + lxd)
Ryan Beisner (1chb1n) wrote :

<jamespage> Oct 13 10:54:41 juju-machine-4-lxc-0 ceilometer-api[18098]: my_ip = socket.gethostbyname(socket.gethostname())
<jamespage> Oct 13 10:54:41 juju-machine-4-lxc-0 ceilometer-api[18098]: socket.gaierror: [Errno -2] Name or service not known
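
The failing call in that traceback can be checked from inside a container with a small Python sketch. It wraps the same socket.gethostbyname() lookup and reports None instead of raising; the helper names resolve and check_hosts are hypothetical, and the list of peer hostnames to check is something you would supply yourself.

```python
import socket


def resolve(hostname):
    """Return the IPv4 address for `hostname`, or None if lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        # Same exception class as in the ceilometer-api log above.
        return None


def check_hosts(hostnames):
    """Map each hostname to its resolved address (None when unresolved)."""
    return {name: resolve(name) for name in hostnames}
```

Running check_hosts([socket.gethostname()]) on the affected unit should show None for the container's own name if it is hitting the same resolution failure.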

Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.1.0
Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 1.25.7
Changed in juju-core:
milestone: 1.25.7 → 1.25.8
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25.8 → 1.25.9
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25.9 → 1.25.10
James Page (james-page)
Changed in charm-aodh:
status: New → Triaged
importance: Undecided → High
Changed in ceilometer (Juju Charms Collection):
status: New → Triaged
importance: Undecided → High
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25.10 → none
Changed in juju-core:
milestone: none → 1.25.11
Anastasia (anastasia-macmood) wrote :

Juju 1.25 on xenial does not support LXD or the newer LXC shipped there. LXD work is in 2.0 only.

Marking this as "Won't Fix".

Changed in juju-core:
status: Triaged → Won't Fix
importance: Critical → Undecided
milestone: 1.25.11 → none
Anastasia (anastasia-macmood) wrote :

@Ryan,
Do you still see this failure with Juju 2.1?

Changed in juju:
status: Triaged → Incomplete
milestone: 2.1.0 → none
Ryan Beisner (1chb1n) wrote :

This is still impacting Xenial-Newton Ceilometer on metal mojo spec testing.

Changed in juju:
status: Incomplete → Triaged
milestone: none → 2.2.0
Ryan Beisner (1chb1n) wrote :

xenial-newton ceilometer-api sees this fairly often. I do not observe it on any other combo in the OpenStack bare metal mojo spec tests.

James Page (james-page)
Changed in ceilometer (Juju Charms Collection):
importance: High → Low
Changed in charm-aodh:
importance: High → Low
James Page (james-page)
Changed in charm-ceilometer:
importance: Undecided → Low
status: New → Triaged
Changed in ceilometer (Juju Charms Collection):
status: Triaged → Invalid
David Ames (thedac)
Changed in charm-aodh:
status: Triaged → Invalid
Changed in charm-ceilometer:
status: Triaged → Invalid
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.2-beta1 → 2.2-beta2
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.2-beta2 → 2.2-beta3
Changed in juju:
milestone: 2.2-beta3 → 2.2-beta4
Changed in juju:
milestone: 2.2-beta4 → 2.2-rc1
Tim Penhey (thumper) wrote :

@Ryan we don't have anyone working on this just now, and I'm going to stop pretending that someone will just pick this up and work on it.

If this is still a blocking issue, we should get it onto the stakeholders kanban board to prioritise with other blocking issues.

Changed in juju:
milestone: 2.2-rc1 → none
importance: High → Medium
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 5 years, so we're marking it Expired. If you believe this is incorrect, please update the status.

Changed in juju:
status: Triaged → Expired
tags: added: expirebugs-bot