supervisorctl restarts docker containers

Bug #1319076 reported by Sergii Golovatiuk
This bug affects 1 person
Affects             Status        Importance  Assigned to       Milestone
Fuel for OpenStack  Fix Released  Critical    Matthew Mosesohn
5.0.x               Fix Released  Critical    Matthew Mosesohn

Bug Description

When an environment is deployed with

sh "utils/jenkins/system_tests.sh" -t test -w $(pwd) -j "fuelweb_test" -i "$ISO_PATH" -o --group=setup

--group=setup causes a side effect on the master node: it completely breaks the cobbler container, and supervisorctl constantly tries to restart it.

How to reproduce
Set the group to "setup" and deploy the environment, then run:

docker ps -a

The cobbler container will be listed either as stopped or as running for only 1-2 minutes.

Also, if you open a VNC session to any slave node, you will see that it can't get an IP address and reports "no boot media found".

Changed in fuel:
assignee: nobody → Sergii Golovatiuk (sgolovatiuk)
assignee: Sergii Golovatiuk (sgolovatiuk) → Alexander Didenko (adidenko)
importance: Undecided → High
Changed in fuel:
milestone: none → 5.0
description: updated
Revision history for this message
Aleksandr Didenko (adidenko) wrote :

This issue occurs only when we run systests with --group=setup. Like this:

 sh "utils/jenkins/system_tests.sh" -t test -w $(pwd) -j "fuelweb_test" -i "$ISO_PATH" -o --group=setup

After starting the Fuel virtual server we get a broken cobbler container, which leads to "Unable to find boot device" on the OpenStack virtual nodes.

We can see the following in /var/log/docker-cobbler.log:

dnsmasq-dhcp: read /etc/ethers - 0 addresses
Traceback (most recent call last):
  File "/usr/bin/cobblerd", line 76, in main
    api = cobbler_api.BootAPI(is_cobblerd=True)
  File "/usr/lib/python2.6/site-packages/cobbler/api.py", line 130, in __init__
    self.deserialize()
  File "/usr/lib/python2.6/site-packages/cobbler/api.py", line 898, in deserialize
    return self._config.deserialize()
  File "/usr/lib/python2.6/site-packages/cobbler/config.py", line 266, in deserialize
    raise CX("serializer: error loading collection %s. Check /etc/cobbler/modules.conf" % item.collection_type())
CX: 'serializer: error loading collection profile. Check /etc/cobbler/modules.conf'

But we can't reproduce it if we deploy Fuel in a normal (manual) way.

This may be related to the snapshot/revert mechanism we use in our system tests. For example: the nailgun container is ready and "curl http://127.0.0.1:8000/api/version" works fine, so the system test scripts see "Fuel node deployment complete!" and snapshot the Fuel VM. But the cobbler container may still be starting up, and the snapshot catches it in the middle of something. So after a revert it gets messed up completely. The following commands fix this:

supervisorctl stop docker-cobbler
docker ps -a | grep cobbler | awk '{print $1}' | xargs docker rm -f
supervisorctl start docker-cobbler

So, maybe we should use "dockerctl check" in /usr/local/sbin/bootstrap_admin_node.sh instead of "curl http://127.0.0.1:8000/api/version" in order to determine if Fuel node deployment is complete?
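A minimal sketch of what such a readiness wait could look like, assuming the bootstrap script simply polls a command until it succeeds. The function name wait_until and the retry/delay values are hypothetical; this is not the code in bootstrap_admin_node.sh.

```shell
#!/bin/sh
# wait_until TRIES DELAY CMD [ARGS...]
# Re-run CMD until it exits 0 or TRIES attempts have been used up.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Sleep only if another attempt is still to come.
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Hypothetical use (60 tries, 10 s apart, are placeholder values):
# wait_until 60 10 dockerctl check && echo "Fuel node deployment complete!"
```

This only helps if the polled command really fails while a container is not ready, which is the exit-code concern raised in the next paragraph.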

Please also note that currently "dockerctl check" gives exit code 0 even if some containers are in "ERROR" state. So we should either make it "exit 1", or grep output from "dockerctl check"
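The "grep output" option could be sketched roughly as below. It assumes that a failed container shows up as the literal word ERROR somewhere in the command's output; the exact output format of "dockerctl check" is not shown here, so treat the pattern as an assumption. The wrapper name check_ok is hypothetical.

```shell
#!/bin/sh
# check_ok CMD [ARGS...]
# Succeeds only if CMD exits 0 AND its output contains no "ERROR".
# The output of CMD is passed through unchanged so it still ends up in logs.
check_ok() {
    if output=$("$@" 2>&1); then
        status=0
    else
        status=$?
    fi
    printf '%s\n' "$output"
    if [ "$status" -ne 0 ]; then
        return "$status"
    fi
    # Assumption: an unhealthy container is reported with the word ERROR.
    if printf '%s\n' "$output" | grep -q 'ERROR'; then
        return 1
    fi
    return 0
}

# Hypothetical use:
# check_ok dockerctl check || echo "some containers are not ready" >&2
```

Making "dockerctl check" itself exit non-zero would be the cleaner fix; a wrapper like this is only a stopgap for callers that cannot change the tool.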

Changed in fuel:
assignee: Alexander Didenko (adidenko) → Matthew Mosesohn (raytrac3r)
importance: High → Medium
status: New → Triaged
Changed in fuel:
importance: Medium → High
assignee: Matthew Mosesohn (raytrac3r) → Alexander Didenko (adidenko)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-main (master)

Fix proposed to branch: master
Review: https://review.openstack.org/93547

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-main (master)

Reviewed: https://review.openstack.org/93547
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=f9b9fc1d30f08e0cee5b2eae839360b13f3f81c2
Submitter: Jenkins
Branch: master

commit f9b9fc1d30f08e0cee5b2eae839360b13f3f81c2
Author: Aleksandr Didenko <email address hidden>
Date: Wed May 14 13:22:43 2014 +0300

    Refactor fuel node readiness check

    Use "dockerctl check" instead of checking Fuel web UI for fuel
    node rediness check. Using "dockerctl check" we'll make sure ALL
    docker containers are ready.

    Change-Id: I2649b45e4764217eabf2bc1f47a8602770b9dd14
    Closes-bug: #1319076

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

Issue is still reproduced on {"build_id": "2014-05-23_03-53-39", "mirantis": "yes", "build_number": "19", "ostf_sha": "5c479f04c35127576d35526650ec83b104f9a33d", "nailgun_sha": "bd09f89ef56176f64ad5decd4128933c96cb20f4", "production": "docker", "api": "1.0", "fuelmain_sha": "db2d153e62cb2b3034d33359d7e3db9d4742c811", "astute_sha": "9a0d86918724c1153b5f70bdae008dea8572fd3e", "release": "5.0", "fuellib_sha": "2ed4fbe1e04b85e83f1010ca23be7f5da34bd492"}

just run sh "utils/jenkins/system_tests.sh" -t test -w $(pwd) -j "fuelweb_test" -i "$ISO_PATH" -o --group=setup

Changed in fuel:
status: Fix Committed → Confirmed
Changed in fuel:
importance: High → Critical
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

supervisorctl restarting containers is expected behavior. If you must stop a container for any reason, you must do it this way:
supervisorctl stop cobbler (only stops restarting it - does NOT actually stop it)
dockerctl stop cobbler

If you have a specific automated test that works in a peculiar way, I can help massage it into place.

Revision history for this message
Aleksandr Didenko (adidenko) wrote :

I've tried to reproduce the issue Andrey had by following his instructions: run with --group=setup and then start the Fuel node. But it works fine; the cobbler container starts with no issues.

Here is the problem part from Andrey's logs:

Info: cobbler_digest_user: user cobbler already exists
Notice: /Stage[main]/Cobbler::Server/Service[httpd]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Cobbler::Server/Service[httpd]: Unscheduling refresh on Service[httpd]
Notice: /Stage[main]/Cobbler::Server/Service[cobblerd]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Cobbler::Server/Service[cobblerd]: Scheduling refresh of Exec[cobbler_sync]
Info: /Stage[main]/Cobbler::Server/Service[cobblerd]: Unscheduling refresh on Service[cobblerd]
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: cobblerd does not appear to be running/accessible
Error: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]: Failed to call refresh: cobbler sync returned 155 instead of one of [0]
Error: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]: cobbler sync returned 155 instead of one of [0]
Notice: /Stage[main]/Cobbler::Server/Service[xinetd]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Cobbler::Server/Service[xinetd]: Unscheduling refresh on Service[xinetd]
Info: cobbler_distro: checking if distro exists: bootstrap

It looks like the cobblerd service failed to start:
Notice: /Stage[main]/Cobbler::Server/Exec[cobbler_sync]/returns: cobblerd does not appear to be running/accessible

Although puppet reports a successful start:
Notice: /Stage[main]/Cobbler::Server/Service[cobblerd]/ensure: ensure changed 'stopped' to 'running'

Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

Andrey,

What exactly broke? Your logs deployed 7 nodes. I don't see where the specific failure is.

Changed in fuel:
status: Confirmed → Incomplete
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/95186

Changed in fuel:
assignee: Alexander Didenko (adidenko) → Matthew Mosesohn (raytrac3r)
status: Incomplete → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/5.0)

Fix proposed to branch: stable/5.0
Review: https://review.openstack.org/95201

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/95186
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=a2081ba6e173975b10489dca09f8c8d85ffd7b10
Submitter: Jenkins
Branch: master

commit a2081ba6e173975b10489dca09f8c8d85ffd7b10
Author: Matthew Mosesohn <email address hidden>
Date: Fri May 23 19:09:50 2014 +0400

    Install cobbler settings before service start

    Change-Id: I14ce54625d26e6bbeceaa0c78f70f01561ac5718
    Closes-Bug: #1319076

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/5.0)

Reviewed: https://review.openstack.org/95201
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=eb90bae5159aa5fad25bf7ee1019ed5dc2f59ccd
Submitter: Jenkins
Branch: stable/5.0

commit eb90bae5159aa5fad25bf7ee1019ed5dc2f59ccd
Author: Matthew Mosesohn <email address hidden>
Date: Fri May 23 19:09:50 2014 +0400

    Install cobbler settings before service start

    Change-Id: I14ce54625d26e6bbeceaa0c78f70f01561ac5718
    Closes-Bug: #1319076

Revision history for this message
Aleksandr Didenko (adidenko) wrote :

After some research into the cobbler error message and its code, it looks like we may sometimes have empty JSON file(s) when the "cobbler" service starts. The files that may be empty are:

/var/lib/cobbler/config/distros.d/centos-x86_64.json
/var/lib/cobbler/config/distros.d/ubuntu_1204_x86_64.json
/var/lib/cobbler/config/profiles.d/bootstrap.json
/var/lib/cobbler/config/profiles.d/centos-x86_64.json
/var/lib/cobbler/config/profiles.d/ubuntu_1204_x86_64.json

We create those files after "service{cobbler}" in our manifests, so I'm not sure how exactly this is happening.
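To check for this condition on an affected node, something like the following sketch could be used. It assumes "empty" means zero-length; a file holding only whitespace or truncated JSON would not be caught. The function name list_empty_json is hypothetical.

```shell
#!/bin/sh
# list_empty_json DIR
# Print every zero-length *.json file below DIR, one path per line.
list_empty_json() {
    find "$1" -type f -name '*.json' -size 0
}

# Hypothetical use before starting the service:
# if [ -n "$(list_empty_json /var/lib/cobbler/config)" ]; then
#     echo "empty cobbler JSON file(s) found" >&2
# fi
```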

Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

It is not reproduced on 5.1 now:
{"build_id": "2014-07-16_23-15-49", "ostf_sha": "9863db951a6e159f4fa6e6861c8331e1af069cf8", "build_number": "326", "auth_required": false, "api": "1.0", "nailgun_sha": "2dd42e8cbd82efea996ec85736148ff4a55a4631", "production": "docker", "fuelmain_sha": "37b50682c58444d33bf5fbe17814591af2109e7e", "astute_sha": "d90cad0130da014eded5c21fa5f31054ce999dac", "feature_groups": ["mirantis"], "release": "5.1", "fuellib_sha": "9f77d5a0e030b10f8c78018edb1f65bbd6d875ab"}

[root@nailgun ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS ...
...
5b8e86a31e8d fuel/cobbler_5.1:latest /bin/sh -c /usr/loca 41 minutes ago Up 41 minutes
...

Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

5.0.1 is working fine too.
{"build_id": "2014-07-15_09-57-01", "mirantis": "yes", "build_number": "125", "ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f", "nailgun_sha": "3b347e9c6a08bb30d85a122398eacd605fb62305", "production": "docker", "api": "1.0", "fuelmain_sha": "019cb348b50e71bf1c522bea649b3cdb0bcca28a", "astute_sha": "5df009e8eab611750309a4c5b5c9b0f7b9d85806", "release": "5.0.1", "fuellib_sha": "2d1e1369c13bc9771e9473086cb064d257a21fc2"}

[root@nailgun ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS
cf650f3cd4c4 fuel/cobbler_5.0.1:latest /bin/sh -c /usr/loca 15 minutes ago Up 15 minutes

Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

fixed

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/5.1.x
Changed in fuel:
milestone: 5.0 → 5.1