Deployment failed with error: <class 'cobbler.cexceptions.CX'>:'MAC address duplicated: 0c:c4:7a:14:25:36'

Bug #1491725 reported by Timur Nurlygayanov on 2015-09-03
This bug affects 5 people
Affects            | Status | Importance | Assigned to         | Milestone
-------------------|--------|------------|---------------------|----------
Fuel for OpenStack |        | High       | Vladimir Kozhukalov |
7.0.x              |        | High       | Vladimir Kozhukalov |

Bug Description

Note: this bug is hard to reproduce, but I hit it twice in the scale lab.

Steps To Reproduce:
1. Create a cluster with 21 hardware servers with 4 Ethernet interfaces and deploy it. The deployment passes.
2. Delete it and create a new cluster with the same servers. Run network verification and check that it passes. Deploy the cluster.

Observed Result:
Deployment failed with the error:
<class 'cobbler.cexceptions.CX'>:'MAC address duplicated: 0c:c4:7a:14:25:36'

Please see the attached diagnostic snapshot for more details:
https://drive.google.com/file/d/0Byup6hoNUUUeeXctZHFFTEhITW8/view?usp=sharing

Here are some errors from the Nailgun and Astute logs:

_____________
[root@fuel ~]# tail -n 200 -f /var/log/docker-nailgun.log
2015-09-03 08:05:08,835 DEBG 'statsenderd' stdout output:
2015-09-03 08:05:08.834 ERROR [7f7ff36de700] (statsenderd) Collector ping failed: HTTPError

2015-09-03 08:09:49,046 DEBG 'statsenderd' stdout output:
2015-09-03 08:09:49.045 ERROR [7f7ff36de700] (statsenderd) Collector ping failed: HTTPError
____________
[root@fuel ~]# tail -n 200 -f /var/log/docker-astute.log
fuel-core-7.0-astute
fuel-core-7.0-astute
fuel-core-7.0-astute
fuel-core-7.0-astute
2015-08-28 11:49:29,262 DEBG 'astute' stdout output:
[amqp] Detected missing server heartbeats
____________

Reproduced on MOS 7.0 ISO #219:
{"build_id": "2015-08-23_15-01-12", "build_number": "219", "release_versions": {"2015.1.0-7.0": {"VERSION": {"build_id": "2015-08-23_15-01-12", "build_number": "219", "api": "1.0", "fuel-library_sha": "3a3ea6d9849bc1ba35c1bd882f0a0678b20d2e51", "nailgun_sha": "7790ce872512ecdf21689e6a5f970dd7119febdb", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "4c2ab9d6c623d345086c6e2874d1df81fd96a942", "production": "docker", "python-fuelclient_sha": "fc7b63aa6900fe3b2c183108ba6a13e868bc0472", "astute_sha": "53c86cba593ddbac776ce5a3360240274c20738c", "fuel-ostf_sha": "16839cbf471b7142b04c0d2c2d94786bc486fefe", "release": "7.0", "fuelmain_sha": "a494e6628319abfef57e1754f6453cf8f1a4bc65"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "3a3ea6d9849bc1ba35c1bd882f0a0678b20d2e51", "nailgun_sha": "7790ce872512ecdf21689e6a5f970dd7119febdb", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "4c2ab9d6c623d345086c6e2874d1df81fd96a942", "production": "docker", "python-fuelclient_sha": "fc7b63aa6900fe3b2c183108ba6a13e868bc0472", "astute_sha": "53c86cba593ddbac776ce5a3360240274c20738c", "fuel-ostf_sha": "16839cbf471b7142b04c0d2c2d94786bc486fefe", "release": "7.0", "fuelmain_sha": "a494e6628319abfef57e1754f6453cf8f1a4bc65"}

Changed in fuel:
importance: Undecided → Critical
assignee: nobody → Fuel Python Team (fuel-python)
milestone: none → 7.0
status: New → Confirmed
description: updated

From the fuel-dev chat (this may help):

Timur Nurlygayanov [8:55 PM]
hi there
Timur Nurlygayanov [8:55 PM]
I have strange error: <class 'cobbler.cexceptions.CX'>:'MAC address duplicated: 0c:c4:7a:14:25:36'
Timur Nurlygayanov [8:56 PM]
and I can't determine the duplicated mac addresses from master node
Timur Nurlygayanov [9:11 PM]
I can provide the environment
Ryan Moe [10:35 PM]
I ran into that same problem yesterday afternoon on a 7.0 ISO
Ryan Moe [10:35 PM]
it turned out I had a duplicated cobbler profile
Ryan Moe [10:36 PM]
one of my nodes (the one with the mac address from the error message) had 2 cobbler profiles with different node IDs
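
A minimal sketch of how such a duplicated profile could be spotted once the system names and their MAC addresses have been collected from the master node. The inventory below is hypothetical (only the MAC address comes from the error message), and `find_duplicate_macs` is an illustrative helper, not part of Cobbler or Fuel.

```python
from collections import defaultdict


def find_duplicate_macs(systems):
    """Return {mac: sorted system names} for every MAC owned by more than one system."""
    owners = defaultdict(set)
    for name, macs in systems.items():
        for mac in macs:
            # Compare MACs case-insensitively.
            owners[mac.lower()].add(name)
    return {mac: sorted(names) for mac, names in owners.items() if len(names) > 1}


# Hypothetical inventory: two profiles with different node IDs share one MAC.
systems = {
    "node-3": ["0c:c4:7a:14:25:36"],
    "node-24": ["0C:C4:7A:14:25:36", "0c:c4:7a:14:25:37"],
    "node-25": ["0c:c4:7a:14:25:40"],
}

duplicates = find_duplicate_macs(systems)
print(duplicates)
```

Any MAC that appears in the result belongs to at least two systems, which is the condition Ryan describes.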

description: updated

I had the same problem, which occurred after an upgrade: https://bugs.launchpad.net/fuel/+bug/1491516

@Andrey, thank you for the confirmation. It does not look like exactly the same issue, but the root cause is probably the same. Let's not mark these bugs as duplicates, because the steps to reproduce differ.

The priority of this issue is Critical because it blocks deployment of OpenStack clusters. The priority can be lowered to High only if we find that it is very hard to fix in MOS 7.0. The issue does not reproduce in 100% of cases, so it is a blocker, but not for 100% of users.

It also looks like the upgrade case mentioned by Andrey reproduces in 100% of cases, which means we have hidden issues in the product that will affect many users.

Important:
If a user hits the issue, there is no way to work around it; it looks like the master node has to be redeployed to fix it.

from cobbler log:
Wed Sep 2 15:17:31 2015 - INFO | Exception occured: <class 'cobbler.cexceptions.CX'>
Wed Sep 2 15:17:31 2015 - INFO | Exception value: 'MAC address duplicated: 0c:c4:7a:14:25:36'
Wed Sep 2 15:17:31 2015 - INFO | Exception Info:
  File "/usr/lib/python2.6/site-packages/cobbler/remote.py", line 2058, in _dispatch
    return method_handle(*params)
  File "/usr/lib/python2.6/site-packages/cobbler/remote.py", line 838, in modify_system
    return self.modify_item("system",object_id,attribute,arg,token)
  File "/usr/lib/python2.6/site-packages/cobbler/remote.py", line 831, in modify_item
    return method(arg)
  File "/usr/lib/python2.6/site-packages/cobbler/item_system.py", line 695, in modify_interface
    if field == "macaddress" : self.set_mac_address(value, interface)
  File "/usr/lib/python2.6/site-packages/cobbler/item_system.py", line 384, in set_mac_address
    raise CX("MAC address duplicated: %s" % address)
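
The traceback shows the error being raised from `set_mac_address` when an interface is modified. The following is a simplified model of that kind of uniqueness check, not Cobbler's actual code: `SystemRegistry`, the local `CX` class, and the system names are assumptions made for illustration.

```python
class CX(Exception):
    """Stand-in for cobbler.cexceptions.CX (illustration only)."""


class SystemRegistry:
    """Toy registry that refuses a MAC already owned by another system."""

    def __init__(self):
        self._macs = {}  # system name -> {interface name: MAC address}

    def set_mac_address(self, system, interface, address):
        address = address.lower()
        for other, interfaces in self._macs.items():
            # A stale record left from an earlier deployment triggers this branch.
            if other != system and address in interfaces.values():
                raise CX("MAC address duplicated: %s" % address)
        self._macs.setdefault(system, {})[interface] = address


registry = SystemRegistry()
registry.set_mac_address("node-3", "eth0", "0c:c4:7a:14:25:36")  # old record
error = None
try:
    # The same hardware is registered again under a new node ID.
    registry.set_mac_address("node-24", "eth0", "0c:c4:7a:14:25:36")
except CX as exc:
    error = str(exc)
print(error)
```

Under this model, the second registration fails for as long as the old record is present, which matches the observation that new environments on the same servers keep failing.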

Andrew Maksimov (maximov) wrote :

Timur, can you provide some insight into how often this issue reproduces? From the bug description I understand that it is not very often, right?
Also, after a deployment has failed, can the user destroy the cluster and redeploy it? Will that work?

Yes, it does not happen in 100% of cases (probably in 10-20% of deployments), but the "delete the environment and create a new one" approach does not work: new environments fail too.

Ivan Kliuk (ivankliuk) on 2015-09-03
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Vladimir Kozhukalov (kozhukalov)

Folks, we hit the same issue.
I don't know how to reproduce it, but it is related to the Cobbler systems and Fuel nodes:
 id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---|----------|------------------|---------|---------------|-------------------|-------|-------------------|--------|---------
18 | discover | Untitled (42:94) | 5 | 192.168.5.110 | ec:f4:bb:cd:42:94 | | cinder, compute | True | 5
19 | discover | Untitled (43:00) | 5 | 192.168.5.112 | ec:f4:bb:cd:43:00 | | cinder, compute | True | 5
16 | discover | Untitled (45:54) | 5 | 192.168.5.113 | ec:f4:bb:cd:45:54 | | controller, mongo | True | 5
17 | discover | Untitled (41:20) | 5 | 192.168.5.111 | ec:f4:bb:cd:41:20 | | controller, mongo | True | 5
12 | discover | Untitled (45:4c) | 5 | 192.168.5.114 | ec:f4:bb:cd:45:4c | | controller, mongo | True | 5
[root@fueldc209hw bootstrap]# cobbler system list
 default
 node-11
 node-12
 node-13
 node-14
 node-15
 node-16
 node-17
 node-18
 node-19
##
But the node count was always the same: 5.
I guess it is related to some action that deletes nodes: after a node is re-discovered, its ID is incremented by 1.
Workaround:
cobbler system remove --name ${old_node-id}
cobbler sync
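
A rough sketch of how the stale names for this workaround could be derived from the two listings above. It assumes Cobbler systems are named `node-<id>` after the Fuel node ID, as the output suggests; `stale_cobbler_systems` is a hypothetical helper, and the script only prints the commands rather than running them.

```python
import re


def stale_cobbler_systems(cobbler_list_output, fuel_node_ids):
    """Return Cobbler system names of the form node-<id> whose id is unknown to Fuel."""
    known = {int(node_id) for node_id in fuel_node_ids}
    stale = []
    for line in cobbler_list_output.splitlines():
        match = re.fullmatch(r"node-(\d+)", line.strip())
        # Entries such as "default" do not match the pattern and are skipped.
        if match and int(match.group(1)) not in known:
            stale.append(match.group(0))
    return stale


# Output of `cobbler system list` and the node IDs from the comment above.
cobbler_output = """
 default
 node-11
 node-12
 node-13
 node-14
 node-15
 node-16
 node-17
 node-18
 node-19
"""
fuel_ids = [18, 19, 16, 17, 12]

stale = stale_cobbler_systems(cobbler_output, fuel_ids)
for name in stale:
    print("cobbler system remove --name %s" % name)
print("cobbler sync")
```

Review the printed names before removing anything, since the naming assumption may not hold on every installation.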

Ivan Ponomarev (ivanzipfer) wrote :

I hit the same issue, but when I checked the nailgun system list I found no garbage there: only the default system was present.

Fix proposed to branch: master
Review: https://review.openstack.org/220191

Changed in fuel:
status: Confirmed → In Progress
Dmitry Pyzhov (dpyzhov) on 2015-09-04
Changed in fuel:
importance: Critical → High

Dmitry, please describe why the priority was changed.

Dmitry Pyzhov (dpyzhov) wrote :

Only some of the nodes cannot be deployed, and there is a workaround. Not every possible configuration is affected. I agree that the issue is painful, but it has High priority according to our criteria: https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Confirm_and_triage_bugs

Reviewed: https://review.openstack.org/220191
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=8283dc2932c24caab852ae9de15f94605cc350c6
Submitter: Jenkins
Branch: master

commit 8283dc2932c24caab852ae9de15f94605cc350c6
Author: Vladimir Kozhukalov <email address hidden>
Date: Thu Sep 3 17:58:23 2015 +0300

    Remove nodes from Cobbler by MACs before provisioning

    The issue is that when we massively remove nodes from
    Cobbler some of them can stay there. It is because
    Cobbler is not intended to be scalable. It stores
    all systems in plain text files and manipulates these
    data quite slowly. So, we need to make sure that
    there are no nodes in the Cobbler with the same MAC
    addresses, otherwise Cobbler throws MAC address
    duplication error.

    Change-Id: I822a086c364fc026cd07b2fad4e1810c274d6357
    Closes-Bug: #1491725
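
The idea in this commit message can be illustrated with a short sketch. The actual change lives in fuel-astute and is not reproduced here; the Python below works on an in-memory dictionary standing in for Cobbler's system store, and `remove_systems_by_macs`, the leftover `node-11` record, and the `aa:bb:cc:dd:ee:ff` address are hypothetical.

```python
def remove_systems_by_macs(cobbler_systems, macs_to_provision):
    """Delete every system that owns one of the given MACs; return the removed names."""
    wanted = {mac.lower() for mac in macs_to_provision}
    removed = []
    for name in list(cobbler_systems):  # copy the keys, the dict is mutated below
        macs = {mac.lower() for mac in cobbler_systems[name]}
        if macs & wanted:
            del cobbler_systems[name]
            removed.append(name)
    return sorted(removed)


# Stand-in for Cobbler's system store: system name -> list of MAC addresses.
cobbler_systems = {
    "node-12": ["ec:f4:bb:cd:45:4c"],
    "node-11": ["ec:f4:bb:cd:42:94"],  # hypothetical leftover from a deleted cluster
    "node-13": ["aa:bb:cc:dd:ee:ff"],  # unrelated system, must survive
}

removed = remove_systems_by_macs(
    cobbler_systems,
    ["ec:f4:bb:cd:42:94", "ec:f4:bb:cd:45:4c"],
)
print(removed, sorted(cobbler_systems))
```

Matching on MAC addresses rather than on system names means a leftover record is cleaned up even when the re-discovered node has received a new ID.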

Changed in fuel:
status: In Progress → Fix Committed

Verified on build 287
Fix released

Changed in fuel:
status: Fix Committed → Fix Released
Ksenia Svechnikova (kdemina) wrote :

Issue was reproduced on ISO#286

Steps:

1. Deploy cluster, deployment failed
2. Reset cluster
3. Start deploy

Got the error: <class 'cobbler.cexceptions.CX'>:'MAC address duplicated: ec:f4:bb:cd:42:96'

@Ksenia

Could you please check the issue on the latest MOS 7.0 images, #287 or later, and change the status to Confirmed if it is reproduced?

Thank you!

Ksenia Svechnikova (kdemina) wrote :

Verified the behavior on ISO #288. The issue was not reproduced.

tags: added: release-notes-done
Dennis Dmitriev (ddmitriev) wrote :

Reproduced the same issue on ISO #288 after resetting a cluster.

Changed in fuel:
status: Fix Released → Confirmed

Does #288 contain this fix https://review.openstack.org/#/c/222860/?

Alexander Gordeev (a-gordeev) wrote :

Vladimir, nope, it can't contain the fix for https://bugs.launchpad.net/fuel/+bug/1494446

Moreover, the report for 1494446 starts with 'ISO 288 (RC2)'.

Andrew Maksimov (maximov) wrote :

Guys, please retest it against RC3, because we fixed a related bug in RC3 (https://bugs.launchpad.net/fuel/+bug/1494446).

Changed in fuel:
milestone: 7.0 → 8.0
status: Confirmed → Fix Committed
tags: added: on verification

I did about 10 cluster deletes and resets in a row and the issue didn't show up.

"build_id": "298",
"build_number": "298",
"release_versions": {
    "2015.1.0-7.0": {
        "VERSION": {
            "build_id": "298",
            "build_number": "298",
            "api": "1.0",
            "fuel-library_sha": "0623b4daad438ceeb5dc41b10cdd3011795fff7e",
            "nailgun_sha": "d590b26dbb09785b8a8b3651b0ef69746fcf9991",
            "feature_groups": [
                "mirantis"
            ],
            "fuel-nailgun-agent_sha": "d7027952870a35db8dc52f185bb1158cdd3d1ebd",
            "openstack_version": "2015.1.0-7.0",
            "fuel-agent_sha": "082a47bf014002e515001be05f99040437281a2d",
            "production": "docker",
            "python-fuelclient_sha": "486bde57cda1badb68f915f66c61b544108606f3",
            "astute_sha": "6c5b73f93e24cc781c809db9159927655ced5012",
            "fuel-ostf_sha": "1f08e6e71021179b9881a824d9c999957fcc7045",
            "release": "7.0",
            "fuelmain_sha": "6b83d6a6a75bf7bca3177fcf63b2eebbf1ad0a85"
        }
    }
}

tags: removed: on verification

Reviewed: https://review.openstack.org/222658
Committed: https://git.openstack.org/cgit/stackforge/fuel-docs/commit/?id=18e039a858e9e7d8846f55dc0bf5ff193d1d8ac2
Submitter: Jenkins
Branch: master

commit 18e039a858e9e7d8846f55dc0bf5ff193d1d8ac2
Author: Alexander Adamov <email address hidden>
Date: Fri Sep 11 18:02:12 2015 +0300

    [RN 7.0]Fuel install&deploy issues

    Adds resolved and known issues:
    LP1491725, LP1437410,
    LP1477903

    Change-Id: I87fcb333d632de6faa7071713f88e8519bccf8d7
    Related-Bug: #1491725
    Related-Bug: #1437410
    Related-Bug: #1477903

tags: added: on-verification
Sergey Novikov (snovikov) wrote :

Verified on

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  openstack_version: "2015.1.0-8.0"
  api: "1.0"
  build_number: "138"
  build_id: "138"
  fuel-nailgun_sha: "3a745ee87e659b3ba239bbede21e491292646acb"
  python-fuelclient_sha: "769df968e19d95a4ab4f12b1d2c76d385cf3168c"
  fuel-agent_sha: "84335446172cc6a699252c184076a519ac791ca1"
  fuel-nailgun-agent_sha: "d66f188a1832a9c23b04884a14ef00fc5605ec6d"
  astute_sha: "e99368bd77496870592781f4ba4fb0caacb9f3a7"
  fuel-library_sha: "80c2dcf3e298e576dd50111825041466b0e38d3f"
  fuel-ostf_sha: "983d0e6fe64397d6ff3bd72311c26c44b02de3e8"
  fuel-createmirror_sha: "df6a93f7e2819d3dfa600052b0f901d9594eb0db"
  fuelmain_sha: "4c58b6503fc780be117777182165fd7b037b1a96"

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification
Dmitry Pyzhov (dpyzhov) on 2015-10-21
tags: added: area-python