Upgrades, puppet in keystone container fails after upgrade

Bug #1353574 reported by Evgeniy L
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Matthew Mosesohn

Bug Description

1. Install 5.0
2. Run the upgrade

Expected result:
There are no errors during the puppet run in the keystone container.

Actual result:
Sometimes there is an error in the puppet logs; as a result keystone is left half-configured and the master node doesn't work.

Error: /Stage[main]/Main/Keystone_tenant[admin]: Could not evaluate: Execution of '/usr/bin/keystone --os-token 98f1ebf9-d87a-41d9-a40b-90275817ddff --os-endpoint http://127.0.0.1:35357/v2.0/ tenant-list' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500)

Upgrade error:
2014-08-06 14:41:38 INFO 15801 (health_checker) Checkers which are still fail ['integration_postgres_nailgun_nginx']

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "5.1"
  api: "1.0"
  build_number: "407"
  build_id: "2014-08-06_16-16-04"
  astute_sha: "b52910642d6de941444901b0f20e95ebbcb2b2e9"
  fuellib_sha: "513ec5cdcdef74c7419d5bae967b9edc7da8dbd7"
  ostf_sha: "be71965998364bf8e6415bd38b75c84b63aab867"
  nailgun_sha: "75c834f6b99b495c5a71b9e4609a3ab8b6140dbb"
  fuelmain_sha: "88afef2ef0aec4493a17583443ca93ba44e0f20a"

Tags: upgrade
Bogdan Dobrelya (bogdando) wrote :

The error in the logs is evidently caused by an unexpected DB schema for the user table:
2014-08-06 23:29:01.350 507 TRACE keystone.common.wsgi ProgrammingError: (ProgrammingError) column user.domain_id does not exist
2014-08-06 23:29:01.350 507 TRACE keystone.common.wsgi LINE 1: ...T "user".id AS user_id, "user".name AS user_name, "user".dom...
2014-08-06 23:29:01.350 507 TRACE keystone.common.wsgi ^
2014-08-06 23:29:01.350 507 TRACE keystone.common.wsgi 'SELECT "user".id AS user_id, "user".name AS user_name, "user".domain_id AS user_domain_id, "user".password AS user_password, "user".enabled AS user_enabled, "user".extra AS user_extra, "user".default_project_id AS user_default_project_id \nFROM "user" \nWHERE "user".name = %(name_1)s AND "user".domain_id = %(domain_id_1)s' {'domain_id_1': 'default', 'name_1': u'admin'}
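
The trace above says the query expected a `domain_id` column that the `user` table did not have. A minimal sketch of how one might check for such a column before running dependent steps is shown below. It uses an in-memory SQLite database as a stand-in so it runs anywhere; the real deployment uses PostgreSQL, where the equivalent check would query `information_schema.columns`. The helper name `column_exists` is hypothetical.

```python
import sqlite3


def column_exists(conn, table, column):
    """Return True if `table` has a column named `column` (SQLite stand-in)."""
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    # The table name is interpolated directly, so pass only trusted names here.
    rows = conn.execute('PRAGMA table_info("{}")'.format(table)).fetchall()
    return any(row[1] == column for row in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Pre-migration shape of the table: no domain_id column yet.
    conn.execute('CREATE TABLE "user" (id TEXT PRIMARY KEY, name TEXT)')
    print(column_exists(conn, "user", "domain_id"))
    # Simulate the schema change that a successful migration would apply.
    conn.execute('ALTER TABLE "user" ADD COLUMN domain_id TEXT')
    print(column_exists(conn, "user", "domain_id"))
```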

Evgeniy L (rustyrobot) wrote :

Yes, it happened because db_sync failed:

Error: /Stage[main]/Keystone/Exec[keystone-manage db_sync]: Failed to call refresh: keystone-manage db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Keystone/Exec[keystone-manage db_sync]: keystone-manage db_sync returned 1 instead of one of [0]

As a result we get a half-configured keystone container.
A puppet rerun afterwards worked fine.
What we can do here:
1. add retries for the db_sync run
2. add retries for the puppet run
3. exit from start.sh if the puppet run fails, so that supervisor restarts it
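
Options 1 and 2 above could be sketched roughly as follows. This is an illustration only, not the fix that was merged: the helper name `run_with_retries`, the attempt count, and the delay are all assumptions. The demo runs the Python interpreter as a stand-in command so the sketch works outside the container.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, ok_codes=(0,), attempts=3, delay=5.0):
    """Run cmd until its exit code is in ok_codes; return True on success."""
    for attempt in range(1, attempts + 1):
        code = subprocess.call(cmd)
        if code in ok_codes:
            return True
        print("attempt %d/%d failed with exit code %d" % (attempt, attempts, code),
              file=sys.stderr)
        # Pause before the next try, but not after the final attempt.
        if attempt < attempts:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. In the container this would
    # be something like ["keystone-manage", "db_sync"].
    demo = [sys.executable, "-c", "raise SystemExit(0)"]
    print(run_with_retries(demo, attempts=3, delay=0))
```

For the puppet run, note that `puppet apply` reports failed resources through its exit code only when `--detailed-exitcodes` is passed (2 means changes were applied, 4 and 6 indicate failures), so a retry wrapper would likely need that flag together with `ok_codes=(0, 2)`.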

Changed in fuel:
status: New → Triaged
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Tomasz 'Zen' Napierala (tzn)
status: Triaged → In Progress
Alexander Kislitsky (akislitsky) wrote :

We also sometimes see this error:

Info: /Stage[main]/Keystone/Service[keystone]: Unscheduling refresh on Service[keystone]
Error: /Stage[main]/Main/Keystone_tenant[admin]: Could not evaluate: Execution of '/usr/bin/keystone --os-token 9f1d23a3-930d-4e2c-9d0a-740929e84dcd --os-endpoint http://127.0.0.1:35357/v2.0/ tenant-list' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500)

Notice: /Stage[main]/Main/Keystone_role[admin]/ensure: created
Notice: /Stage[main]/Main/Keystone_user[admin]: Dependency Keystone_tenant[admin] has failures: true
Warning: /Stage[main]/Main/Keystone_user[admin]: Skipping because of failed dependencies
Notice: /Stage[main]/Main/Keystone_user_role[admin@admin]: Dependency Keystone_tenant[admin] has failures: true
Warning: /Stage[main]/Main/Keystone_user_role[admin@admin]: Skipping because of failed dependencies
Notice: Finished catalog run in 17.27 seconds
Stopping keystone: [ OK ]

Full log at: http://paste.openstack.org/show/92103/

Łukasz Oleś (loles)
Changed in fuel:
assignee: Tomasz 'Zen' Napierala (tzn) → Łukasz Oleś (loles)
Łukasz Oleś (loles) wrote :

I cannot reproduce it. I tried for 2 days and it didn't happen.

I could run puppet apply twice in https://github.com/stackforge/fuel-main/blob/master/docker/keystone/start.sh#L8 but I don't see any reason to do so.

Changed in fuel:
status: In Progress → Incomplete
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: Incomplete → New
assignee: Łukasz Oleś (loles) → Evgeniy L (rustyrobot)
Evgeniy L (rustyrobot) wrote :

I'll try to reproduce it. It does not happen often: approximately 1 in 25 upgrade runs.

Changed in fuel:
status: New → Confirmed
Evgeniy L (rustyrobot) wrote :

Moved to 6.0 because it happens very rarely. It should be moved back to 5.1 if we can reproduce it and the fix turns out to be easy.

Changed in fuel:
importance: High → Medium
milestone: 5.1 → 6.0
assignee: Evgeniy L (rustyrobot) → Fuel Python Team (fuel-python)
Evgeniy L (rustyrobot) wrote :

I got this error again:

Error: /Stage[main]/Main/Keystone_tenant[admin]: Could not evaluate: Execution of '/usr/bin/keystone --os-token 41ac89ac-d860-4eb3-a740-857e3f8d2289 --os-endpoint http://127.0.0.1:35357/v2.0/ tenant-list' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500)

I then ran the same command inside the container and it worked fine:

[root@455aa1f52675 /]# /usr/bin/keystone --os-token 41ac89ac-d860-4eb3-a740-857e3f8d2289 --os-endpoint http://127.0.0.1:35357/v2.0/ tenant-list
+----------------------------------+-------+---------+
| id | name | enabled |
+----------------------------------+-------+---------+
| c8110bb1302a42ecbcf85839e0640ca2 | admin | True |
+----------------------------------+-------+---------+

Changed in fuel:
importance: Medium → High
Evgeniy L (rustyrobot) wrote :

Increased the priority because I reproduced it on the third upgrade run.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/116605

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Matthew Mosesohn (raytrac3r)
status: Confirmed → In Progress
Evgeniy L (rustyrobot)
Changed in fuel:
milestone: 6.0 → 5.1
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Matthew Mosesohn (<email address hidden>) on branch: master
Review: https://review.openstack.org/116605
Reason: going to try bash -xe in keystone instead

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-main (master)

Fix proposed to branch: master
Review: https://review.openstack.org/116613

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-main (master)

Reviewed: https://review.openstack.org/116613
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=74ad3dd68020aac3042f62c59c137498474ecbee
Submitter: Jenkins
Branch: master

commit 74ad3dd68020aac3042f62c59c137498474ecbee
Author: Matthew Mosesohn <email address hidden>
Date: Mon Aug 25 17:27:28 2014 +0400

    Exit keystone container if puppet fails

    Change-Id: Iaa2dbf90b2c2e560a7cece61c252abc7f039673b
    Closes-Bug: #1353574
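
The commit subject describes a fail-fast behavior: if a setup step fails, the container's entry point stops instead of carrying on half-configured, and the process supervisor can then restart it. The actual change is in the review linked above and, per the abandon reason earlier in this thread, was expected to rely on `bash -xe`. The sketch below only illustrates the same idea in Python; the name `run_steps` and the stand-in commands are assumptions.

```python
import subprocess
import sys


def run_steps(steps):
    """Run commands in order; stop at the first failure and return its exit code."""
    for cmd in steps:
        code = subprocess.call(cmd)
        if code != 0:
            print("step failed (exit %d): %s" % (code, " ".join(cmd)),
                  file=sys.stderr)
            return code
    return 0


if __name__ == "__main__":
    # Stand-in steps: the second one fails, so the third never runs.
    ok = [sys.executable, "-c", "raise SystemExit(0)"]
    bad = [sys.executable, "-c", "raise SystemExit(4)"]
    status = run_steps([ok, bad, ok])
    # A real entry point would pass this status to sys.exit() so that the
    # supervisor sees the failure and restarts the container process.
    print(status)
```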

Changed in fuel:
status: In Progress → Fix Committed
Andrey Sledzinskiy (asledzinskiy) wrote :

verified on fuel-5.1-upgrade-11-2014-09-17_21-40-34.tar.lrz

Changed in fuel:
status: Fix Committed → Fix Released