nova-network failed to start after installing on CentOS compute node

Bug #1439996 reported by Stanislav Makar
This bug affects 1 person
Affects             Status   Importance  Assigned to   Milestone
Fuel for OpenStack  Invalid  Critical    Fuel QA Team
6.0.x               Invalid  Critical    Fuel DevOps

Bug Description

http://jenkins-product.srt.mirantis.net:8080/view/6.1/job/6.1.centos.smoke_nova/188/

 Scenario:
            1. Create cluster in HA mode with 1 controller
            2. Add 1 node with controller role
            3. Add 1 node with compute role
            4. Deploy the cluster
            5. Validate cluster was set up correctly, there are no dead
            services, there are no errors in logs
            6. Verify networks
            7. Verify network configuration on controller
            8. Run OSTF

On the compute node:
service openstack-nova-network status
openstack-nova-network dead but pid file exists

After starting it manually, the service works.

Stanislav Makar (smakar)
Changed in fuel:
assignee: nobody → Stanislav Makar (smakar)
summary: - AssertionError: Expected service count is 6, but get 5 count, actual
- list [<Service: nova-consoleauth>, <Service: nova-scheduler>, <Service:
- nova-conductor>, <Service: nova-cert>, <Service: nova-compute>]
+ Expected service count is 6, but get 5 count, actual list [<Service:
+ nova-consoleauth>, <Service: nova-scheduler>, <Service: nova-conductor>,
+ <Service: nova-cert>, <Service: nova-compute>]
description: updated
Changed in fuel:
importance: High → Critical
Stanislav Makar (smakar) wrote : Re: Expected service count is 6, but get 5 count, actual list [<Service: nova-consoleauth>, <Service: nova-scheduler>, <Service: nova-conductor>, <Service: nova-cert>, <Service: nova-compute>]
Changed in fuel:
status: Confirmed → Incomplete
status: Incomplete → Invalid
Dennis Dmitriev (ddmitriev) wrote :

Reproduced on CI: http://jenkins-product.srt.mirantis.net:8080/view/6.1/job/6.1.centos.smoke_nova/193/

On compute:
[root@node-2 ~]# /etc/init.d/openstack-nova-network status
openstack-nova-network dead but pid file exists
[root@node-2 ~]#

After manual restart it works:
[root@node-2 ~]# /etc/init.d/openstack-nova-network restart
Stopping openstack-nova-network: [FAILED]
Starting openstack-nova-network: [ OK ]
[root@node-2 ~]# /etc/init.d/openstack-nova-network status
openstack-nova-network (pid 5461) is running...
[root@node-2 ~]#
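
The status "dead but pid file exists" means a pid file is present but no process with that pid is alive. Below is a minimal Python sketch of such a check, assuming a POSIX host. The function name `pid_file_state` and the temporary pid file are illustrative only and are not taken from the init script.

```python
import os
import subprocess
import sys
import tempfile


def pid_file_state(pid_path):
    """Classify a pid file as 'missing', 'running', or 'stale' (hypothetical helper)."""
    try:
        with open(pid_path) as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        # The file exists but does not contain a number.
        return "stale"
    if pid <= 0:
        return "stale"
    try:
        # Signal 0 performs the existence check without delivering a signal.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "stale"
    except PermissionError:
        # The process exists but belongs to another user.
        return "running"
    return "running"


with tempfile.TemporaryDirectory() as tmp:
    pid_path = os.path.join(tmp, "example.pid")  # illustrative path
    missing = pid_file_state(pid_path)

    with open(pid_path, "w") as handle:
        handle.write(str(os.getpid()))
    running = pid_file_state(pid_path)

    # A child that has already exited leaves a pid that no longer exists.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    with open(pid_path, "w") as handle:
        handle.write(str(child.pid))
    stale = pid_file_state(pid_path)

print(missing, running, stale)
```

The "stale" case corresponds to what the compute node reported: the service had exited at some point after writing its pid file.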

Changed in fuel:
status: Invalid → Confirmed
Dennis Dmitriev (ddmitriev) wrote :

Traceback from /var/log/nova/network.log on the compute node:

http://paste.openstack.org/show/198213/

summary: - Expected service count is 6, but get 5 count, actual list [<Service:
- nova-consoleauth>, <Service: nova-scheduler>, <Service: nova-conductor>,
- <Service: nova-cert>, <Service: nova-compute>]
+ nova-network failed to start after installing on CentOS compute node
Stanislav Makar (smakar) wrote :

I could not find the logs shown at http://paste.openstack.org/show/198213/ in the diagnostic snapshot;
there is not even a file named /var/log/nova/network.log.

The last 6 jobs do not show this error: http://jenkins-product.srt.mirantis.net:8080/view/6.1/job/6.1.centos.smoke_nova/

The root cause could be that we start the service first and only then run networking-refresh. I will change the order.

2015-04-04T04:54:45.434284+00:00 debug: Executing '/sbin/service openstack-nova-network status'
2015-04-04T04:54:45.539842+00:00 debug: Executing '/sbin/chkconfig openstack-nova-network'
2015-04-04T04:54:45.641858+00:00 debug: Executing '/sbin/service openstack-nova-network start'
2015-04-04T04:54:45.759137+00:00 debug: Executing '/sbin/chkconfig openstack-nova-network'
2015-04-04T04:54:45.967288+00:00 debug: Executing '/sbin/chkconfig openstack-nova-network on'
2015-04-04T04:54:46.247326+00:00 notice: (/Stage[main]/Nova::Network/Nova::Generic_service[network]/Service[nova-network]/ensure) ensure changed 'stopped' to 'running'
2015-04-04T04:54:46.247651+00:00 debug: (/Stage[main]/Nova::Network/Nova::Generic_service[network]/Service[nova-network]) The container Nova::Generic_service[network] will propagate my refresh event
2015-04-04T04:54:46.247651+00:00 info: (/Stage[main]/Nova::Network/Nova::Generic_service[network]/Service[nova-network]) Unscheduling refresh on Service[nova-network]
2015-04-04T04:54:46.247651+00:00 info: (/Stage[main]/Nova::Network/Nova::Generic_service[network]/Service[nova-network]) Evaluated in 0.82 seconds
2015-04-04T04:54:46.247651+00:00 info: (Nova::Generic_service[network]) Starting to evaluate the resource
2015-04-04T04:54:46.250527+00:00 debug: (Nova::Generic_service[network]) The container Class[Nova::Network] will propagate my refresh event
2015-04-04T04:54:46.254997+00:00 info: (Nova::Generic_service[network]) Evaluated in 0.01 seconds
2015-04-04T04:54:46.255803+00:00 info: (/Stage[main]/Main/Exec[networking-refresh]) Starting to evaluate the resource
2015-04-04T04:54:46.257191+00:00 info: (/Stage[main]/Main/Exec[networking-refresh]) Evaluated in 0.00 seconds
2015-04-04T04:54:46.259050+00:00 info: (Class[Nova::Network]) Starting to evaluate the resource
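
The ordering in this excerpt can be checked mechanically. Below is a small Python sketch that compares the timestamp of the service start with the first evaluation of Exec[networking-refresh]. The two sample lines are copied from the log above; the helper name `first_timestamp` is illustrative.

```python
from datetime import datetime

# Two lines copied from the Puppet log excerpt in this report.
LOG_LINES = [
    "2015-04-04T04:54:45.641858+00:00 debug: Executing "
    "'/sbin/service openstack-nova-network start'",
    "2015-04-04T04:54:46.255803+00:00 info: "
    "(/Stage[main]/Main/Exec[networking-refresh]) Starting to evaluate the resource",
]


def first_timestamp(lines, marker):
    """Return the timestamp of the first line containing marker, or None."""
    for line in lines:
        if marker in line:
            stamp = line.split(" ", 1)[0]
            return datetime.fromisoformat(stamp)
    return None


service_start = first_timestamp(LOG_LINES, "service openstack-nova-network start")
refresh_start = first_timestamp(LOG_LINES, "Exec[networking-refresh]) Starting to evaluate")

started_before_refresh = service_start < refresh_start
gap_seconds = (refresh_start - service_start).total_seconds()

print("service started before networking-refresh:", started_before_refresh)
print("gap in seconds:", gap_seconds)
```

This only confirms the order of events in the log. It does not by itself prove that the order caused the failure.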

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/171123

Changed in fuel:
status: Confirmed → In Progress
Stanislav Makar (smakar) wrote :

Now we need QA help :)

A custom ISO is being built here: http://jenkins-product.srt.mirantis.net:8080/view/custom_iso/job/custom_6.1_iso/938/

It would be good to run the test 10 times without this patch
and 10 times with this custom ISO,
then compare the results to find out whether this patch fixes the bug.
Thanks

Changed in fuel:
assignee: Stanislav Makar (smakar) → Fuel QA Team (fuel-qa)
Changed in fuel:
status: In Progress → Triaged
Dennis Dmitriev (ddmitriev) wrote :

I've completed 10 runs with this custom ISO and 10 runs with the master ISO (fuel-6.1-298):

http://jenkins-product.srt.mirantis.net:8080/view/All/job/custom_manual_test/

builds #21 .. #31 - custom ISO
builds #32 .. #42 - master ISO #298

All passed successfully.

But http://jenkins-product.srt.mirantis.net:8080/view/6.1/job/6.1.centos.smoke_nova/220/ failed yesterday with the same error.

Tanya Leontovich noticed that this issue reproduces mainly on the host mc0n4-msk.msk.mirantis.net, so it may be an issue with the host.
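
Ten clean runs on each ISO cannot separate the two builds if the failure is intermittent. Here is a rough Python sketch of the arithmetic, assuming independent runs and a hypothetical per-run failure rate. The report gives no measured rate, so the values below are illustrative only.

```python
def prob_all_pass(failure_rate, runs):
    """Probability that every one of `runs` independent runs passes."""
    return (1.0 - failure_rate) ** runs


# Hypothetical per-run failure rates; none of these is measured in this report.
for rate in (0.05, 0.10, 0.20):
    print(f"failure rate {rate:.2f}: P(10 passes in a row) = {prob_all_pass(rate, 10):.3f}")
```

Under these assumptions an unpatched build that fails in 1 run out of 10 would still pass 10 runs in a row roughly a third of the time, so a much larger number of runs would be needed to show a difference.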

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/171123
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=f985d3aca79f0d4dea60119e940c92557b90132b
Submitter: Jenkins
Branch: master

commit f985d3aca79f0d4dea60119e940c92557b90132b
Author: Stanislav Makar <email address hidden>
Date: Tue Apr 7 10:15:55 2015 +0000

    Fix floating nova-network service start problem

    * Add ordering during nova network deployment.
    * Make stub for networking-refresh resource due such stuff is doing by
      l23network module.

    Change-Id: Ie42fcb5a952d97d2a2e459eaa771bcf1750315a7
    Closes-bug: #1439996
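
The effect of an explicit ordering can be sketched outside Puppet. The Python example below (3.9 or later, for `graphlib`) declares that the service depends on the network refresh and derives the apply order from that one edge. The resource names are taken from the log in this report; the dependency edge is an assumption for illustration and is not the content of the merged patch.

```python
from graphlib import TopologicalSorter

# Illustrative dependency graph: each node is added with the nodes it depends on.
sorter = TopologicalSorter()
sorter.add("Exec[networking-refresh]")
# Assumed edge: the service must be applied after the network refresh.
sorter.add("Service[nova-network]", "Exec[networking-refresh]")

apply_order = list(sorter.static_order())
print(apply_order)
```

Without a declared edge the relative order of two resources is not guaranteed by the dependency graph, which is consistent with a start problem that appears only on some runs.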

Changed in fuel:
status: Triaged → Fix Committed
Dennis Dmitriev (ddmitriev) wrote :
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Fuel DevOps (fuel-devops)
Aleksandra Fedorova (bookwar) wrote :

Denis, afaik we provided you with access to Zabbix data. Is there anything else we need to do about this bug?

Changed in fuel:
assignee: Fuel DevOps (fuel-devops) → Dennis Dmitriev (ddmitriev)
Dennis Dmitriev (ddmitriev) wrote :

According to Zabbix, the servers were not under load at the time the CI tests failed.
The bug has not reproduced on mc0n4 since Apr 09; srv14 is not being used in CI at the moment.

Changed in fuel:
status: Fix Committed → Incomplete
assignee: Dennis Dmitriev (ddmitriev) → Fuel QA Team (fuel-qa)
Tatyanka (tatyana-leontovich) wrote :

The issue has not been reproduced for more than a month, whether manually, on BVT, or on swarm, so I am moving it to Invalid status.

Changed in fuel:
status: Incomplete → Invalid