KeyError: service was deployed but then not found

Bug #1519796 reported by Andreas Hasenack
12
This bug affects 3 people
Affects Status Importance Assigned to Milestone
juju-deployer
Incomplete
Low
Unassigned

Bug Description

juju-deployer 0.6.0-1 from ppa:juju/stable

Got this in a jenkins run that deployed landscape:
09:27:53 2015-11-25 09:27:53 Starting deployment of scapestack-maas
09:28:02 2015-11-25 09:28:02 Deploying services...
09:28:02 2015-11-25 09:28:02 Deploying service haproxy using cs:trusty/haproxy-10
09:28:21 2015-11-25 09:28:21 Deploying service landscape-server using local:trusty/landscape-server
09:28:33 2015-11-25 09:28:33 Deploying service postgresql using cs:trusty/postgresql-27
09:28:46 2015-11-25 09:28:46 Deploying service rabbitmq-server using cs:trusty/rabbitmq-server-26
09:30:07 2015-11-25 09:30:07 Adding 1 more units to haproxy
09:31:17 2015-11-25 09:31:17 Adding 1 more units to landscape-server
09:32:26 2015-11-25 09:32:26 Adding 1 more units to postgresql
09:33:34 2015-11-25 09:33:34 Adding 1 more units to rabbitmq-server
09:33:39 2015-11-25 09:33:39 Adding relations...
09:33:39 Traceback (most recent call last):
09:33:39 File "/usr/bin/juju-deployer", line 9, in <module>
09:33:39 load_entry_point('juju-deployer==0.6.0', 'console_scripts', 'juju-deployer')()
09:33:39 File "/usr/lib/python2.7/dist-packages/deployer/cli.py", line 135, in main
09:33:39 run()
09:33:39 File "/usr/lib/python2.7/dist-packages/deployer/cli.py", line 234, in run
09:33:39 importer.Importer(env, deployment, options).run()
09:33:39 File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 325, in run
09:33:39 rels_created = self.add_relations()
09:33:39 File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 232, in add_relations
09:33:39 if self._rel_exists(status, end_a, end_b):
09:33:39 File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 252, in _rel_exists
09:33:39 rels_svc_a = status['services'][name_a].get('relations', {})
09:33:39 KeyError: 'landscape-server'

The service was deployed, but then it wasn't found.

sparkiegeek hinted that https://code.launchpad.net/~frankban/juju-deployer/unit-placement-fix/+merge/267187 might be related, around line 22 of that diff.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

The deployer file uses placement to put all services in containers on the bootstrap node.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

This is the bundle that was used

Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Revision history for this message
Francesco Banconi (frankban) wrote :

After a quick look this does not seem related to status not refreshed or service not yet included in the mega-watcher data.
A time.sleep(5.1) is still there between services deployments and the add_relations call. As the log demonstrates the landscape-server has been added at 09:28:21 and adding relations starts at 09:33:39. Note that add_relations calls `self.env.status()` to refresh the status data taken from the mega-watcher, so if I am not missing something it's unlikely a bug due to lack of sleep.
It could be a race on which the service has been removed by some other process while the deployer was executing.
Waiting for instructions about how to dupe, I'll mark this as incomplete.

Changed in juju-deployer:
status: New → Incomplete
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

It happened again on a CI run:
(...)
04:37:55 2016-01-08 04:37:55 Deploying services...
04:37:55 2016-01-08 04:37:55 Deploying service haproxy using cs:trusty/haproxy-10
04:38:22 2016-01-08 04:38:22 Deploying service landscape-server using local:trusty/landscape-server
04:38:53 2016-01-08 04:38:53 Deploying service postgresql using cs:trusty/postgresql-27
04:39:18 2016-01-08 04:39:17 Deploying service rabbitmq-server using cs:trusty/rabbitmq-server-26
04:40:54 2016-01-08 04:40:54 Adding 1 more units to haproxy
04:42:01 2016-01-08 04:42:01 Adding 1 more units to landscape-server
04:43:16 2016-01-08 04:43:16 Adding 1 more units to postgresql
04:44:25 2016-01-08 04:44:25 Adding 1 more units to rabbitmq-server
04:44:30 2016-01-08 04:44:30 Adding relations...
04:44:30 Traceback (most recent call last):
04:44:30 File "/usr/bin/juju-deployer", line 9, in <module>
04:44:30 load_entry_point('juju-deployer==0.6.1', 'console_scripts', 'juju-deployer')()
04:44:30 File "/usr/lib/python2.7/dist-packages/deployer/cli.py", line 135, in main
04:44:30 run()
04:44:30 File "/usr/lib/python2.7/dist-packages/deployer/cli.py", line 234, in run
04:44:30 importer.Importer(env, deployment, options).run()
04:44:30 File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 325, in run
04:44:30 rels_created = self.add_relations()
04:44:30 File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 232, in add_relations
04:44:30 if self._rel_exists(status, end_a, end_b):
04:44:30 File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 252, in _rel_exists
04:44:30 rels_svc_a = status['services'][name_a].get('relations', {})
04:44:30 KeyError: 'landscape-server'

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

This last run was with:
juju-core 1.25.0-0ubuntu1~14.04.1~juju1
juju-deployer 0.6.1-1~ubuntu14.04.1~ppa1

Revision history for this message
Stuart Bishop (stub) wrote :

I suspect this is Bug #1489237 in another form.

Revision history for this message
Francis Ginther (fginther) wrote :

Another case of this:
https://ci.lscape.net/job/landscape-system-tests/1458/console

This run was with
juju-core 1.25.3-trusty-amd64

[log snippet]
16:10:51 2016-03-19 16:10:51 Deploying services...
16:10:51 2016-03-19 16:10:51 Deploying service haproxy using cs:trusty/haproxy-10
16:11:08 2016-03-19 16:11:08 Deploying service landscape-server using local:trusty/landscape-server
16:11:20 2016-03-19 16:11:20 Deploying service postgresql using cs:trusty/postgresql-27
16:11:31 2016-03-19 16:11:31 Deploying service rabbitmq-server using cs:trusty/rabbitmq-server-26
16:12:54 2016-03-19 16:12:54 Adding 1 more units to haproxy
16:14:02 2016-03-19 16:14:02 Adding 1 more units to landscape-server
16:15:11 2016-03-19 16:15:11 Adding 1 more units to postgresql
16:16:20 2016-03-19 16:16:20 Adding 1 more units to rabbitmq-server
16:16:25 2016-03-19 16:16:25 Adding relations...
16:16:26 Traceback (most recent call last):
16:16:26 File "/usr/bin/juju-deployer", line 9, in <module>
16:16:26 load_entry_point('juju-deployer==0.6.4', 'console_scripts', 'juju-deployer')()
16:16:26 File "/usr/lib/python2.7/dist-packages/deployer/cli.py", line 135, in main
16:16:26 run()
16:16:26 File "/usr/lib/python2.7/dist-packages/deployer/cli.py", line 234, in run
16:16:26 importer.Importer(env, deployment, options).run()
16:16:26 File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 325, in run
16:16:26 rels_created = self.add_relations()
16:16:26 File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 232, in add_relations
16:16:26 if self._rel_exists(status, end_a, end_b):
16:16:26 File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 252, in _rel_exists
16:16:26 rels_svc_a = status['services'][name_a].get('relations', {})
16:16:26 KeyError: 'landscape-server'

Revision history for this message
Ursula Junque (ursinha) wrote :

Another case of this failure: https://ci.lscape.net/job/landscape-system-tests/1704/console. Juju version is 1.25.3-trusty-amd64.

06:21:12 2016-04-04 06:21:12 Starting deployment of maas
06:21:23 2016-04-04 06:21:23 Deploying services...
06:21:24 2016-04-04 06:21:23 Deploying service haproxy using cs:trusty/haproxy-10
06:21:42 2016-04-04 06:21:42 Deploying service landscape-server using local:trusty/landscape-server
06:21:54 2016-04-04 06:21:54 Deploying service postgresql using cs:trusty/postgresql-27
06:22:07 2016-04-04 06:22:07 Deploying service rabbitmq-server using cs:trusty/rabbitmq-server-26
06:23:29 2016-04-04 06:23:29 Adding 1 more units to haproxy
06:24:39 2016-04-04 06:24:39 Adding 1 more units to landscape-server
06:25:43 2016-04-04 06:25:43 Adding 1 more units to postgresql
06:26:52 2016-04-04 06:26:52 Adding 1 more units to rabbitmq-server
06:26:57 2016-04-04 06:26:57 Adding relations...
06:26:57 Traceback (most recent call last):
06:26:57 File "/usr/bin/juju-deployer", line 9, in <module>
06:26:57 load_entry_point('juju-deployer==0.6.4', 'console_scripts', 'juju-deployer')()
06:26:57 File "/usr/lib/python2.7/dist-packages/deployer/cli.py", line 135, in main
06:26:57 run()
06:26:57 File "/usr/lib/python2.7/dist-packages/deployer/cli.py", line 234, in run
06:26:57 importer.Importer(env, deployment, options).run()
06:26:57 File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 325, in run
06:26:57 rels_created = self.add_relations()
06:26:57 File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 232, in add_relations
06:26:57 if self._rel_exists(status, end_a, end_b):
06:26:57 File "/usr/lib/python2.7/dist-packages/deployer/action/importer.py", line 252, in _rel_exists
06:26:57 rels_svc_a = status['services'][name_a].get('relations', {})
06:26:57 KeyError: 'landscape-server'

Revision history for this message
Ursula Junque (ursinha) wrote :

And another one: https://ci.lscape.net/job/landscape-system-tests/1702/console.

Would you have any tips for us to help diagnosing the issue?

Tom Haddon (mthaddon)
Changed in juju-deployer:
importance: Undecided → Low
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.