test_subnet_details failed due to MismatchError

Bug #1412325 reported by Ken'ichi Ohmichi
20
This bug affects 3 people
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Undecided
Unassigned
tempest
Fix Released
Undecided
Yair Fried

Bug Description

A new test test_subnet_details failed on the gate for Nova like the following:

http://logs.openstack.org/92/144092/3/check/check-tempest-dsvm-neutron-full/6bcefe8/logs/testr_results.html.gz

Traceback (most recent call last):
  File "tempest/test.py", line 112, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "tempest/scenario/test_network_basic_ops.py", line 486, in test_subnet_details
    self._check_dns_server(ssh_client, [alt_dns_server])
  File "tempest/scenario/test_network_basic_ops.py", line 437, in _check_dns_server
    trgt_serv=dns_servers))
  File "/usr/local/lib/python2.7/dist-packages/testtools/testcase.py", line 348, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/usr/local/lib/python2.7/dist-packages/testtools/testcase.py", line 433, in assertThat
    raise mismatch_error
MismatchError: set(['9.8.7.6']) != set(['1.2.3.4']): Looking for servers: ['9.8.7.6']. Retrieved DNS nameservers: ['1.2.3.4'] From host: 172.24.4.98.

description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/148197

Changed in tempest:
assignee: nobody → Ken'ichi Ohmichi (oomichi)
status: New → In Progress
Revision history for this message
Yair Fried (yfried) wrote :

IMO this is actually a Neutron bug and Tempest reveals a race.
We should add a waiter for the update to verify it's really a race.

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

A waiter might fix the problem.
The issue does not occur always - which means that cirros surely updated dns servers through DHCP (it would have been weird if it didn't but you can never know).

However, once a subnet is updated on the neutron server there is a delay while the DHCP agent receives the update and reloads the corresponding dnsmasq process. The waiter should take into account this delay.

Subnet update event on neutron-server:
http://logs.openstack.org/92/144092/3/check/check-tempest-dsvm-neutron-full/6bcefe8/logs/screen-q-svc.txt.gz#_2015-01-19_06_53_38_409

Kill -HUP for dnsmasq on DHCP agent
http://logs.openstack.org/92/144092/3/check/check-tempest-dsvm-neutron-full/6bcefe8/logs/screen-q-dhcp.txt.gz#_2015-01-19_06_53_39_822

The delay is about 1.5 seconds. On top of this, one should add something for the DHCP protocol to complete on lease renewal.

However, it seems the lease renewal (6:53:38.410) occurs before the DHCP server is updated (6:53:39.822)
http://logs.openstack.org/92/144092/3/check/check-tempest-dsvm-neutron-full/6bcefe8/console.html#_2015-01-19_07_00_31_210

I am not sure there is anything broken in neutron about this (if not that we're lacking feedback on subnet's DHCP status).
If I were to fix it I would add a waiter before renewing the lease (5 seconds), and another before checking the ssh client (2 seconds). If you want you might add retry logic, but that should not be necessary. Even with high load I've rarely seen DHCP updates be process in over 5 seconds.

Changed in tempest:
assignee: Ken'ichi Ohmichi (oomichi) → Yair Fried (yfried)
Yair Fried (yfried)
Changed in tempest:
assignee: Yair Fried (yfried) → nobody
Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

Saw this just now[1]

2015-01-19 13:45:31.130 | ==============================
2015-01-19 13:45:31.130 | Failed 1 tests - output below:
2015-01-19 13:45:31.130 | ==============================
2015-01-19 13:45:31.130 |
2015-01-19 13:45:31.131 | tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_subnet_details[compute,gate,network,smoke]
2015-01-19 13:45:31.131 | -----------------------------------------------------------------------------------------------------------
2015-01-19 13:45:31.131 |
[. . .]
2015-01-19 13:45:31.440 | Traceback (most recent call last):
2015-01-19 13:45:31.440 | File "tempest/test.py", line 112, in wrapper
2015-01-19 13:45:31.440 | return f(self, *func_args, **func_kwargs)
2015-01-19 13:45:31.440 | File "tempest/scenario/test_network_basic_ops.py", line 486, in test_subnet_details
2015-01-19 13:45:31.440 | self._check_dns_server(ssh_client, [alt_dns_server])
2015-01-19 13:45:31.440 | File "tempest/scenario/test_network_basic_ops.py", line 437, in _check_dns_server
2015-01-19 13:45:31.440 | trgt_serv=dns_servers))
2015-01-19 13:45:31.440 | File "/usr/local/lib/python2.7/dist-packages/testtools/testcase.py", line 348, in assertEqual
2015-01-19 13:45:31.440 | self.assertThat(observed, matcher, message)
2015-01-19 13:45:31.441 | File "/usr/local/lib/python2.7/dist-packages/testtools/testcase.py", line 433, in assertThat
2015-01-19 13:45:31.441 | raise mismatch_error
2015-01-19 13:45:31.441 | MismatchError: set(['9.8.7.6']) != set(['1.2.3.4']): Looking for servers: ['9.8.7.6']. Retrieved DNS nameservers: ['1.2.3.4'] From host: 172.24.4.75.
2015-01-19 13:45:31.441 |
2015-01-19 13:45:31.441 |
[. . .]

[1] http://logs.openstack.org/44/148244/1/check/check-tempest-dsvm-neutron-full/07a3134/console.html

Revision history for this message
John Schwarz (jschwarz) wrote :
Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

Elastic Recheck query: https://review.openstack.org/#/c/148272/ -- Add query for DNS MisMatchError - bug 1412325

Changed in tempest:
assignee: nobody → Yair Fried (yfried)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tempest (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/148391

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/148723

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tempest (master)

Reviewed: https://review.openstack.org/148723
Committed: https://git.openstack.org/cgit/openstack/tempest/commit/?id=81cf3a42386c211b08d992d8de171dfb0ca9f772
Submitter: Jenkins
Branch: master

commit 81cf3a42386c211b08d992d8de171dfb0ca9f772
Author: Adam Gandelman <email address hidden>
Date: Tue Jan 20 13:16:39 2015 -0800

    Skip test_subnet_details

    Rather than a full revert, skip test_subnet_details until timeouts/delays
    have been added and the test is stabilized.

    Change-Id: Iac6603ae3efc8588756c02ae2c1ab03a9df615e9
    Related-bug: #1412325

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tempest (master)

Change abandoned by Ken'ichi Ohmichi (<email address hidden>) on branch: master
Review: https://review.openstack.org/148197
Reason: The skip patch is merged, and this patch is not necessary right now.

Feel free to use this code with the other commit if necessary.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Joe Gordon (<email address hidden>) on branch: master
Review: https://review.openstack.org/148391
Reason: https://review.openstack.org/#/c/148723/ merged instead

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/148815

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master)

Reviewed: https://review.openstack.org/148815
Committed: https://git.openstack.org/cgit/openstack/tempest/commit/?id=bb0ea39fd19d2675dae738620f5dcc44498633e4
Submitter: Jenkins
Branch: master

commit bb0ea39fd19d2675dae738620f5dcc44498633e4
Author: Yair Fried <email address hidden>
Date: Mon Jan 19 07:26:08 2015 +0000

    Unskip and Add timeout method to dns check scenario

    In test_subnet_details, Tempest checks the effect of alternative dns
    server which is updated just before the check.
    However, sometimes the check failed because the update could not make
    it before the check. This reason is not concrete yet, but this problem
    happens three times in 60 minutes now.
    So we need to solve this problem soon for smooth development, and this
    patch adds timeout method for alternative dns part.

    Change-Id: I0cb98ff401b526c24388ea440e5629a8ebc925eb
    Closes-Bug: #1412325

Changed in tempest:
status: In Progress → Fix Released
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

This bug is > 365 days without activity. We are unsetting assignee and milestone and setting status to Incomplete in order to allow its expiry in 60 days.

If the bug is still valid, then update the bug status.

Changed in neutron:
status: New → Incomplete
Changed in neutron:
status: Incomplete → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.