Bug #1319076 “supervisorctl restarts docker containers” : Bugs : Fuel for OpenStack

Sergii Golovatiuk (sgolovatiuk) on 2014-05-13

Changed in fuel:
assignee:	nobody → Sergii Golovatiuk (sgolovatiuk)
assignee:	Sergii Golovatiuk (sgolovatiuk) → Alexander Didenko (adidenko)
importance:	Undecided → High

Tatyanka (tatyana-leontovich) on 2014-05-13

Changed in fuel:
milestone:	none → 5.0

Sergii Golovatiuk (sgolovatiuk) on 2014-05-13

description:

updated

Aleksandr Didenko (adidenko) wrote on 2014-05-13:

#1

This issue occurs only when we run the system tests with --group=setup, like this:

sh "utils/jenkins/system_tests.sh" -t test -w $(pwd) -j "fuelweb_test" -i "$ISO_PATH" -o --group=setup

After the Fuel virtual server starts, we get a broken cobbler container, which leads to "Unable to find boot device" on the OpenStack virtual nodes.

We can see the following in /var/log/docker-cobbler.log:

dnsmasq-dhcp: read /etc/ethers - 0 addresses
Traceback (most recent call last):
  File "/usr/bin/cobblerd", line 76, in main
    api = cobbler_api.BootAPI(is_cobblerd=True)
  File "/usr/lib/python2.6/site-packages/cobbler/api.py", line 130, in __init__
    self.deserialize()
  File "/usr/lib/python2.6/site-packages/cobbler/api.py", line 898, in deserialize
    return self._config.deserialize()
  File "/usr/lib/python2.6/site-packages/cobbler/config.py", line 266, in deserialize
    raise CX("serializer: error loading collection %s. Check /etc/cobbler/modules.conf" % item.collection_type())
CX: 'serializer: error loading collection profile. Check /etc/cobbler/modules.conf'

But we can't reproduce it if we deploy Fuel in a normal (manual) way.

This may be related to the snapshot/revert mechanism we use in our system tests. For example: the nailgun container is ready and "curl http://127.0.0.1:8000/api/version" works fine, so the system test scripts see "Fuel node deployment complete!" and snapshot the Fuel VM. But the cobbler container may still be busy, and we catch it in the middle of something, so after the revert it gets completely messed up. The following commands fix this:

supervisorctl stop docker-cobbler
docker ps -a | grep cobbler | awk '{print $1}' | xargs docker rm -f
supervisorctl start docker-cobbler

So, maybe we should use "dockerctl check" in /usr/local/sbin/bootstrap_admin_node.sh instead of "curl http://127.0.0.1:8000/api/version" in order to determine if Fuel node deployment is complete?

Please also note that "dockerctl check" currently returns exit code 0 even if some containers are in the "ERROR" state. So we should either make it exit with code 1 in that case, or grep the output of "dockerctl check".
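A minimal sketch of the second option (grepping the output), assuming, as described above, that "dockerctl check" can exit 0 while printing the word ERROR for a failed container. The exact output format is an assumption, and containers_ok / wait_for_containers are hypothetical helper names, not part of bootstrap_admin_node.sh:

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. Assumes "dockerctl check"
# prints "ERROR" for any container in the ERROR state, even when it exits 0.

# Reads check output on stdin; succeeds only if no line mentions ERROR.
containers_ok() {
  ! grep -q 'ERROR'
}

# Polls "dockerctl check" up to $1 times, $2 seconds apart. An attempt counts
# only if the command itself succeeds AND its output contains no ERROR,
# so a missing or crashing dockerctl is not mistaken for "all ready".
wait_for_containers() {
  attempts=$1
  delay=$2
  n=1
  while :; do
    if out=$(dockerctl check 2>&1) && printf '%s\n' "$out" | containers_ok; then
      return 0
    fi
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example use in place of the curl probe (not executed here):
#   wait_for_containers 60 10 || { echo "containers not ready" >&2; exit 1; }
```

Checking the exit status as well as the output covers both fixes proposed above: it keeps working if "dockerctl check" is later changed to exit 1 on ERROR.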

Changed in fuel:
assignee:	Alexander Didenko (adidenko) → Matthew Mosesohn (raytrac3r)
importance:	High → Medium
status:	New → Triaged

Aleksandr Didenko (adidenko) on 2014-05-14

Changed in fuel:
importance:	Medium → High
assignee:	Matthew Mosesohn (raytrac3r) → Alexander Didenko (adidenko)
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2014-05-14: Fix proposed to fuel-main (master)

#2

Fix proposed to branch: master
Review: https://review.openstack.org/93547

OpenStack Infra (hudson-openstack) wrote on 2014-05-14: Fix merged to fuel-main (master)

#3

Reviewed: https://review.openstack.org/93547
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=f9b9fc1d30f08e0cee5b2eae839360b13f3f81c2
Submitter: Jenkins
Branch: master

commit f9b9fc1d30f08e0cee5b2eae839360b13f3f81c2
Author: Aleksandr Didenko <email address hidden>
Date: Wed May 14 13:22:43 2014 +0300

Refactor fuel node readiness check

    Use "dockerctl check" instead of checking Fuel web UI for fuel
    node readiness check. Using "dockerctl check" we'll make sure ALL
    docker containers are ready.

Change-Id: I2649b45e4764217eabf2bc1f47a8602770b9dd14
Closes-bug: #1319076

Changed in fuel:
status:	In Progress → Fix Committed

Andrey Sledzinskiy (asledzinskiy) wrote on 2014-05-23:

#4

Issue is still reproduced on {"build_id": "2014-05-23_03-53-39", "mirantis": "yes", "build_number": "19", "ostf_sha": "5c479f04c35127576d35526650ec83b104f9a33d", "nailgun_sha": "bd09f89ef56176f64ad5decd4128933c96cb20f4", "production": "docker", "api": "1.0", "fuelmain_sha": "db2d153e62cb2b3034d33359d7e3db9d4742c811", "astute_sha": "9a0d86918724c1153b5f70bdae008dea8572fd3e", "release": "5.0", "fuellib_sha": "2ed4fbe1e04b85e83f1010ca23be7f5da34bd492"}

To reproduce, just run: sh "utils/jenkins/system_tests.sh" -t test -w $(pwd) -j "fuelweb_test" -i "$ISO_PATH" -o --group=setup

Changed in fuel:
status:	Fix Committed → Confirmed

Andrey Sledzinskiy (asledzinskiy) wrote on 2014-05-23:

#5

logs.tar.gz (3.2 MiB, application/x-tar)

Vladimir Kuklin (vkuklin) on 2014-05-23

Changed in fuel:
importance:	High → Critical

Matthew Mosesohn (raytrac3r) wrote on 2014-05-23:

#6

supervisorctl restarting containers is expected behavior. If you must stop a container for any reason, you must do it this way:
supervisorctl stop cobbler (only stops restarting it - does NOT actually stop it)
dockerctl stop cobbler

If you have a specific automated test that works in a peculiar way, I can help massage it into place.
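The order matters: as explained above, "supervisorctl stop" only stops supervisord from restarting the container, so it has to come before "dockerctl stop", otherwise the container would just be started again. A minimal sketch wrapping the two steps, where stop_container is a hypothetical helper and the supervisord program name is passed explicitly, because this thread uses both "cobbler" and "docker-cobbler" and the correct name is an assumption to verify on the master node:

```shell
#!/bin/sh
# Sketch of the stop procedure described above; stop_container is a
# hypothetical helper.
# $1 = supervisord program name (e.g. "cobbler" or "docker-cobbler";
#      verify with "supervisorctl status")
# $2 = container name as understood by dockerctl
stop_container() {
  program=$1
  container=$2
  # Step 1: stop supervisord from restarting the container.
  # This does NOT stop the container itself.
  supervisorctl stop "$program" || return 1
  # Step 2: actually stop the container.
  dockerctl stop "$container"
}

# Example (not executed here):
#   stop_container docker-cobbler cobbler
```

If the first step fails, the helper returns without touching the container, so supervisord is never left racing against a manual stop.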

Aleksandr Didenko (adidenko) wrote on 2014-05-23:

#7

I've tried to reproduce Andrey's issue by following his instructions: run --group=setup and then start the Fuel node. It works fine; the cobbler container starts with no issues.

Here is the problem part from Andrey's logs:

Info: cobbler_digest_user: user cobbler already exists
Notice: /Stage[main]/Cobbler::Server/Service[httpd]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Cobbler::Server/Service[httpd]: Unscheduling refresh on Service[httpd]
Notice: /Stage[main]/Cobbler::Server/Service[cobblerd]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Cobbler::Server/Service[cobblerd]: Scheduling refresh of Exec[cobbler_sync]
Info: /Stage[main]/Cobbler::Server/Service[cobblerd]: Unscheduling refresh on Service[cobblerd]
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: cobblerd does not appear to be running/accessible
Error: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]: Failed to call refresh: cobbler sync returned 155 instead of one of [0]
Error: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]: cobbler sync returned 155 instead of one of [0]
Notice: /Stage[main]/Cobbler::Server/Service[xinetd]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Cobbler::Server/Service[xinetd]: Unscheduling refresh on Service[xinetd]
Info: cobbler_distro: checking if distro exists: bootstrap

It looks like the cobblerd service failed to start:
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: cobblerd does not appear to be running/accessible

Puppet, however, reports a successful start:
Notice: /Stage[main]/Cobbler::Server/Service[cobblerd]/ensure: ensure changed 'stopped' to 'running'
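One plausible reading of this log is that "cobbler sync" ran before cobblerd was accepting connections, even though the service had just been reported as started. The fix merged for this bug took a different route (installing cobbler settings before the service starts); purely as an illustration of tolerating such a startup gap, a retry around "cobbler sync" could be sketched like this, where retry is a hypothetical helper and the attempt count and delay are arbitrary:

```shell
#!/bin/sh
# Hypothetical helper: run "$@" up to $1 times, sleeping $2 seconds between
# attempts; succeed as soon as one attempt succeeds, fail after the last one.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (not executed here; 10 attempts, 3 s apart are arbitrary values):
#   retry 10 3 cobbler sync
```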

Matthew Mosesohn (raytrac3r) wrote on 2014-05-23:

#8

Andrey,

What exactly broke? According to your logs, 7 nodes were deployed. I don't see where the specific failure is.

Changed in fuel:
status:	Confirmed → Incomplete

OpenStack Infra (hudson-openstack) wrote on 2014-05-23: Fix proposed to fuel-library (master)

#9

Fix proposed to branch: master
Review: https://review.openstack.org/95186

Changed in fuel:
assignee:	Alexander Didenko (adidenko) → Matthew Mosesohn (raytrac3r)
status:	Incomplete → In Progress

OpenStack Infra (hudson-openstack) wrote on 2014-05-23: Fix proposed to fuel-library (stable/5.0)

#10

Fix proposed to branch: stable/5.0
Review: https://review.openstack.org/95201

OpenStack Infra (hudson-openstack) wrote on 2014-05-23: Fix merged to fuel-library (master)

#11

Reviewed: https://review.openstack.org/95186
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=a2081ba6e173975b10489dca09f8c8d85ffd7b10
Submitter: Jenkins
Branch: master

commit a2081ba6e173975b10489dca09f8c8d85ffd7b10
Author: Matthew Mosesohn <email address hidden>
Date: Fri May 23 19:09:50 2014 +0400

Install cobbler settings before service start

Change-Id: I14ce54625d26e6bbeceaa0c78f70f01561ac5718
Closes-Bug: #1319076

Changed in fuel:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2014-05-23: Fix merged to fuel-library (stable/5.0)

#12

Reviewed: https://review.openstack.org/95201
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=eb90bae5159aa5fad25bf7ee1019ed5dc2f59ccd
Submitter: Jenkins
Branch: stable/5.0

commit eb90bae5159aa5fad25bf7ee1019ed5dc2f59ccd
Author: Matthew Mosesohn <email address hidden>
Date: Fri May 23 19:09:50 2014 +0400

Install cobbler settings before service start

Change-Id: I14ce54625d26e6bbeceaa0c78f70f01561ac5718
Closes-Bug: #1319076

Aleksandr Didenko (adidenko) wrote on 2014-05-26:

#13

After some research into the cobbler error message and the code that raises it, it looks like we sometimes have empty JSON file(s) when the "cobbler" service starts. Possible empty JSON files are:

/var/lib/cobbler/config/distros.d/centos-x86_64.json
/var/lib/cobbler/config/distros.d/ubuntu_1204_x86_64.json
/var/lib/cobbler/config/profiles.d/bootstrap.json
/var/lib/cobbler/config/profiles.d/centos-x86_64.json
/var/lib/cobbler/config/profiles.d/ubuntu_1204_x86_64.json

We create those files after "service{cobbler}" in our manifests, so I'm not sure exactly how this happens.
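To confirm this theory on an affected master, one could list zero-byte JSON files under the cobbler collection directories. A minimal sketch, assuming the layout shown above; find_empty_collections is a hypothetical helper, the root directory is a parameter so it can be pointed at wherever the cobbler config lives (for example inside the cobbler container), and -empty is a GNU/BSD find extension rather than POSIX:

```shell
#!/bin/sh
# Sketch: find_empty_collections is a hypothetical helper.
# $1 = cobbler config root (defaults to the path quoted above).
# Returns 0 if no *.json file under the root is empty, 1 (after printing the
# empty files) if any are, and 2 if the root directory does not exist.
find_empty_collections() {
  root=${1:-/var/lib/cobbler/config}
  [ -d "$root" ] || return 2
  # -empty matches zero-byte regular files (GNU/BSD find extension).
  empty=$(find "$root" -type f -name '*.json' -empty)
  [ -z "$empty" ] && return 0
  printf '%s\n' "$empty"
  return 1
}

# Example (not executed here):
#   find_empty_collections || echo "empty cobbler collection files found" >&2
```

Distinguishing a missing directory (2) from "no empty files" (0) avoids reporting a clean result when the path is simply wrong.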

Dennis Dmitriev (ddmitriev) wrote on 2014-07-17:

#14

It no longer reproduces on 5.1:
{"build_id": "2014-07-16_23-15-49", "ostf_sha": "9863db951a6e159f4fa6e6861c8331e1af069cf8", "build_number": "326", "auth_required": false, "api": "1.0", "nailgun_sha": "2dd42e8cbd82efea996ec85736148ff4a55a4631", "production": "docker", "fuelmain_sha": "37b50682c58444d33bf5fbe17814591af2109e7e", "astute_sha": "d90cad0130da014eded5c21fa5f31054ce999dac", "feature_groups": ["mirantis"], "release": "5.1", "fuellib_sha": "9f77d5a0e030b10f8c78018edb1f65bbd6d875ab"}

[root@nailgun ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS ...
...
5b8e86a31e8d fuel/cobbler_5.1:latest /bin/sh -c /usr/loca 41 minutes ago Up 41 minutes
...

Dennis Dmitriev (ddmitriev) wrote on 2014-07-17:

#15

5.0.1 is working fine too.
{"build_id": "2014-07-15_09-57-01", "mirantis": "yes", "build_number": "125", "ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f", "nailgun_sha": "3b347e9c6a08bb30d85a122398eacd605fb62305", "production": "docker", "api": "1.0", "fuelmain_sha": "019cb348b50e71bf1c522bea649b3cdb0bcca28a", "astute_sha": "5df009e8eab611750309a4c5b5c9b0f7b9d85806", "release": "5.0.1", "fuellib_sha": "2d1e1369c13bc9771e9473086cb064d257a21fc2"}

[root@nailgun ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS
cf650f3cd4c4 fuel/cobbler_5.0.1:latest /bin/sh -c /usr/loca 15 minutes ago Up 15 minutes

Dennis Dmitriev (ddmitriev) wrote on 2014-07-17:

#16

fixed

Dmitry Pyzhov (dpyzhov) on 2014-08-12

no longer affects:	fuel/5.1.x
Changed in fuel:
milestone:	5.0 → 5.1

Affects		Status	Importance	Assigned to	Milestone
	Fuel for OpenStack	Fix Released	Critical	Matthew Mosesohn	Fuel for OpenStack 5.1
	5.0.x	Fix Released	Critical	Matthew Mosesohn	Fuel for OpenStack 5.0
