Fuel fails to start many services on Fuel node reboot

Bug #1518825 reported by Sam Stoelinga
This bug affects 1 person
Affects             Status     Importance  Assigned to       Milestone
Fuel for OpenStack  Invalid    Medium      Matthew Mosesohn
7.0.x               Won't Fix  Medium      Matthew Mosesohn
8.0.x               Invalid    Medium      Matthew Mosesohn
Mitaka              Invalid    Medium      Matthew Mosesohn

Bug Description

After I rebooted the Fuel master node I wasn't able to access the web interface. The root cause seems to be that Postgres and several other containers, including nginx and nailgun, fail to start because their log files or log directories are missing or have the wrong permissions.

Version found: MOS 7.0 official release

*Postgres container*
The log file in /var/lib/fuel/container_data/7.0/postgres/9.3/pgstartup.log shows that it's a permission issue:

< 2015-11-23 03:05:20.265 UTC >FATAL: could not open log file "/var/log/pgsql": Permission denied

Postgres fix: https://review.openstack.org/248520

*nginx*
Had to recreate /var/log/nginx in the container.

*Nailgun*
After fixing the Postgres issue there are still issues with the log and permissions of the nailgun container. See log:
Info: Applying configuration version '1448250752'
Notice: /Stage[main]/Main/Service[crond]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Main/Service[crond]: Unscheduling refresh on Service[crond]
Notice: /Stage[main]/Nailgun::Venv/Exec[nailgun_syncdb]/returns: executed successfully
Notice: /Stage[main]/Nailgun::Venv/Exec[nailgun_upload_fixtures]/returns: executed successfully
Notice: Finished catalog run in 3.72 seconds
Stopping supervisord: ERROR: unix:///var/run/supervisor.sock refused connection (already shut down?)
Waiting roughly 60 seconds for /var/run/supervisord.pid to be removed after child processes exit

Supervisord still working on shutting down. We've waited roughly 60 seconds, we'll let it do its thing from here
Error: The directory named as part of the path /var/log/supervisor/supervisord.log does not exist.
For help, use /usr/bin/supervisord -h

*Cobbler*:
The Cobbler container fails to start httpd because the log directory does not exist:
service httpd start
Starting httpd: (2)No such file or directory: httpd: could not open error log file /etc/httpd/logs/error_log.
Unable to open logs
                                                           [FAILED]

Steps to reproduce:
1. Deploy Fuel 7.0 from the ISO downloaded from the Mirantis website
2. Reboot the Fuel master over SSH by executing: reboot

Current result:
Fuel master is not usable

Workaround:
#postgres
touch /var/log/docker-logs/pgsql
chown 26:26 /var/log/docker-logs/pgsql

#nginx
mkdir /var/log/docker-logs/nginx

#nailgun
mkdir /var/log/docker-logs/supervisor
mkdir /var/log/docker-logs/nailgun

#cobbler
mkdir /var/log/docker-logs/httpd
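
The workaround above can also be applied in one pass. The following is a minimal sketch, not a tested fix: the function name restore_fuel_log_paths is made up for illustration, while the directory names and the 26:26 owner are taken from the commands above. It uses mkdir -p instead of plain mkdir so that it can be re-run safely, and it skips the chown when not run as root.

```shell
# Sketch only: recreate the log paths listed in the workaround above.
# restore_fuel_log_paths is a hypothetical helper name, not part of Fuel.
# Argument 1: the log root, e.g. /var/log/docker-logs
restore_fuel_log_paths() {
    root="$1"

    # Log directories for nginx, nailgun (supervisor, nailgun) and cobbler (httpd).
    for dir in nginx supervisor nailgun httpd; do
        mkdir -p "$root/$dir" || return 1
    done

    # Postgres log file, owned by uid:gid 26:26 as in the workaround.
    touch "$root/pgsql" || return 1
    if [ "$(id -u)" -eq 0 ]; then
        chown 26:26 "$root/pgsql" || return 1
    else
        echo "not root: skipping chown 26:26 on $root/pgsql" >&2
    fi
}
```

Usage, assuming the same log root as in the workaround: restore_fuel_log_paths /var/log/docker-logs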

description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/248520

Changed in fuel:
assignee: nobody → Sam Stoelinga (sammiestoel)
status: New → In Progress
description: updated
summary: - Postgres fails to start on Fuel node reboot
+ Fuel fails to start many services on Fuel node reboot
description: updated
Maciej Relewicz (rlu)
Changed in fuel:
importance: Undecided → Medium
milestone: none → 7.0-updates
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 7.0-updates → 9.0
tags: added: area-library
Changed in fuel:
assignee: Sam Stoelinga (sammiestoel) → Matthew Mosesohn (raytrac3r)
tags: added: keep-in-9.0
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Fuel DevOps Robot (<email address hidden>) on branch: master
Review: https://review.openstack.org/248520
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Matthew Mosesohn (raytrac3r) wrote :

This exact permission error doesn't appear for me in Fuel 7.0. There is another error where the keystone container starts too soon, before postgres is ready. This isn't a 'high' bug, however, because all the services will come up within 8 minutes without needing any manual intervention.
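
If the services do recover on their own as described, one option is to poll after a reboot rather than intervene. The following is a minimal sketch of a generic retry helper; the name wait_until and its arguments are illustrative assumptions, not part of Fuel.

```shell
# Sketch only: retry a command until it succeeds or a timeout elapses.
# wait_until is a hypothetical helper name.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
    timeout="$1"
    interval="$2"
    shift 2
    elapsed=0

    # Re-run the command until it exits 0.
    while ! "$@"; do
        # Give up once the accumulated sleep time reaches the timeout.
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}
```

For example, wait_until 480 10 curl -fsS -o /dev/null "$FUEL_UI_URL" would retry every 10 seconds for up to 8 minutes, where FUEL_UI_URL is a placeholder for the address of the Fuel web interface, which this report does not state.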

Ali Jabbar (jabbar-ali) wrote :

I am facing the same issue. Is there any workaround for this problem?
