nova-scheduler received request from nova-conductor only after ~40 minutes

Bug #1572968 reported by Leontii Istomin
This bug affects 1 person
Affects: Mirantis OpenStack (status tracked in 10.0.x)
10.0.x series: Status: Invalid, Importance: High, Assigned to: Leontii Istomin

Bug Description

Detailed bug description:
During the boot_attach_and_delete_server_with_secgroups Rally scenario (http://paste.openstack.org/show/494970/) we got the following error: http://paste.openstack.org/show/494926/
UUID of the instance: http://paste.openstack.org/show/494973/
We found that nova-conductor waited 60 seconds for a response from nova-scheduler: http://paste.openstack.org/show/494957/
But nova-scheduler received the request only after ~40 minutes and then processed it within 5 seconds: http://paste.openstack.org/show/494968/
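For context, a minimal sketch of the call pattern involved (illustrative only; the topic, version, and method arguments below are assumptions, not the exact nova internals): nova-conductor makes a synchronous RPC call to the scheduler with a ~60-second timeout, so a request that only reaches nova-scheduler 40 minutes later times out on the conductor side even though the scheduler eventually handles it quickly.

    # Illustrative sketch only, assuming oslo.messaging; the topic, version and
    # call arguments are placeholders rather than the exact nova RPC API.
    from oslo_config import cfg
    import oslo_messaging

    transport = oslo_messaging.get_transport(cfg.CONF)
    target = oslo_messaging.Target(topic='scheduler', version='4.0')
    client = oslo_messaging.RPCClient(transport, target, timeout=60)

    try:
        # Synchronous call: the caller blocks waiting for the scheduler reply.
        client.call({}, 'select_destinations', spec_obj=None)
    except oslo_messaging.MessagingTimeout:
        # This is what the Rally run hit: the reply did not arrive within 60 s,
        # even though the scheduler processed the request ~40 minutes later.
        pass
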
Steps to reproduce:
1. deploy Fuel 9.0-217
2. apply the fixes required for successful deployment due to the following bugs:
https://bugs.launchpad.net/fuel/+bug/1543233 (regenerate the repo and build the bootstrap image)
https://bugs.launchpad.net/fuel/+bug/1566974
https://bugs.launchpad.net/fuel/+bug/1570509
https://bugs.launchpad.net/fuel/+bug/1569859
restart nailgun and receiverd
3. boot 201 nodes into bootstrap
4. deploy an env with 3 controllers, 20 computes+Ceph, 168 computes
5. replace rabbitmq with zeromq (http://perestroika-repo-tst.infra.mirantis.net/review/CR-19836/mos-repos/ubuntu/9.0/pool/main/p/python-oslo.messaging/); see the transport sketch after this list
6. run the Rally tests
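A hedged sketch of what selecting the zmq driver could look like through the oslo.messaging API (the transport URL is a placeholder; in practice the deployment swaps the python-oslo.messaging package from the CR above and adjusts nova.conf, and option names should be checked against that build):

    # Hedged sketch: select the zmq transport instead of rabbit. The URL below
    # is illustrative; matchmaker and other driver options may also be needed
    # depending on the packaged driver.
    from oslo_config import cfg
    import oslo_messaging

    transport = oslo_messaging.get_transport(cfg.CONF, url='zmq://')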

Expected results:
All VMs boot successfully during the test.
Actual result:
Some instances failed to boot (43 out of 1000).
Reproducibility:
Haven't tried to reproduce yet.
Workaround:
Not found yet.
Impact:
Some instances cannot be started.
Description of the environment:
- Operating system: Ubuntu
- Versions of components: MOS 9.0
- Reference architecture: 3 controllers, 20 computes+Ceph, 168 computes
- Network model: VXLAN+DVR
- Related projects installed: RabbitMQ replaced with ZeroMQ
Additional information:
Diagnostic snapshot: http://mos-scale-share.mirantis.com/fuel-snapshot-2016-04-21_07-55-59.tar.gz

description: updated
Dina Belova (dbelova)
tags: added: area-nova
Changed in mos:
assignee: nobody → MOS Nova (mos-nova)
milestone: none → 9.0
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Leontii Istomin (listomin) wrote :

We need to reproduce the issue; we can use a 20-node environment. A new oslo.messaging package with the ZeroMQ driver with timestamping is ready: https://review.fuel-infra.org/#/c/19937/ (http://perestroika-repo-tst.infra.mirantis.net/review/CR-19937/mos-repos/ubuntu/9.0/). We need to go through the message chain and find where the message gets stuck.
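For illustration, the kind of check the timestamped driver enables could look like this (hedged sketch: the 'sent_at' field name and the message shape are assumptions, not the actual format produced by the patched driver):

    # Hypothetical helper: compute how long a message sat in the chain, given a
    # send-time stamp added by the patched zmq driver. A lag above 60 s means
    # nova-conductor has already timed out by the time the message arrives.
    import time

    def message_lag_seconds(message):
        sent_at = message.get('sent_at')  # assumed field name, see caveat above
        if sent_at is None:
            return None
        return time.time() - sent_at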

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Leontiy, we simply don't have the hardware to try to reproduce this. I suggest you keep us in the loop and ping us in Slack when there is an environment with a repro ready, or better yet, when you give this another try, so that we can monitor the booting of VMs live.

Another point is that this probably should not be High, as we are using an oslo.messaging driver whose stability is an open question...

Changed in mos:
assignee: MOS Nova (mos-nova) → Leontiy Istomin (listomin)
status: Confirmed → Incomplete
Revision history for this message
Dina Belova (dbelova) wrote :

We're still going to reproduce the bug at the end of this week or the beginning of the next. Let's keep it in the Incomplete state for a while.

Revision history for this message
Dina Belova (dbelova) wrote :

More than a month in the Incomplete state with no successful reproduction. If seen again, the bug will be reopened.

Changed in mos:
status: Incomplete → Invalid