Improve ipaddr2 logs to debug RTNETLINK errors

Bug #2002346 reported by Fabiano Correa Mercer
This bug affects 1 person
Affects     Status        Importance  Assigned to            Milestone
StarlingX   Fix Released  Low         Fabiano Correa Mercer

Bug Description

Brief Description
-----------------
In rare situations, add_interface may fail with an RTNETLINK error.

Add logs to help the investigation if this error occurs again:
log the device link status and all IP addresses configured on the
specific device, and log each IP address deletion from the device to
confirm that the ipaddr2 start/stop sequence was executed.
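
The kind of record these logs are meant to capture can be sketched in Python. This is a minimal illustration, not the code from the merged change (the IPaddr2 resource agent itself is a shell script). It assumes the one-line output of `ip -o link show dev DEV` and `ip -o addr show dev DEV` from iproute2. The helper names (link_state, configured_addresses, debug_summary) and the sample output are hypothetical.

```python
import re

# Hypothetical sample output for device ens6 (made up for illustration).
SAMPLE_LINK = (
    "2: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP "
    "mode DEFAULT group default qlen 1000\\    link/ether 0a:1b:2c:3d:4e:5f "
    "brd ff:ff:ff:ff:ff:ff"
)
SAMPLE_ADDRS = (
    "2: ens6    inet6 2620:10a:a001:ac12::36e2/123 scope global \\"
    "       valid_lft forever preferred_lft forever\n"
    "2: ens6    inet6 fe80::81b:2cff:fe3d:4e5f/64 scope link \\"
    "       valid_lft forever preferred_lft forever\n"
)


def link_state(link_line: str) -> str:
    """Hypothetical helper: return the word after "state" in `ip -o link` output."""
    match = re.search(r"\bstate (\S+)", link_line)
    return match.group(1) if match else "UNKNOWN"


def configured_addresses(addr_output: str) -> list:
    """Hypothetical helper: return every addr/prefix in `ip -o addr` output, in order."""
    addresses = []
    for line in addr_output.splitlines():
        fields = line.split()
        # The token after the family keyword ("inet" or "inet6") is addr/prefix.
        for family in ("inet", "inet6"):
            if family in fields:
                addresses.append(fields[fields.index(family) + 1])
    return addresses


def debug_summary(device: str, link_line: str, addr_output: str) -> str:
    """Hypothetical helper: one log line with link state and all configured addresses."""
    addresses = configured_addresses(addr_output) or ["none"]
    return f"{device}: link state {link_state(link_line)}, addresses {', '.join(addresses)}"


print(debug_summary("ens6", SAMPLE_LINK, SAMPLE_ADDRS))
# ens6: link state UP, addresses 2620:10a:a001:ac12::36e2/123, fe80::81b:2cff:fe3d:4e5f/64
```

The sketch writes the link state and the full address list on a single line. That keeps both facts tied to the same timestamp, which matters for a failure seen only once in about 2000 deployments.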

Severity
--------
Minor.

Steps to Reproduce
------------------
Deploy 250 subclouds in parallel

Expected Behavior
------------------
subcloud439 online/unlocked/operational after the deployment.

Actual Behavior
----------------
subcloud439 offline/non-operational after the deployment.

Reproducibility
---------------
1 out of 2000

System Configuration
--------------------
Distributed Cloud - IPv6

Branch/Pull Time/Commit
-----------------------
N/A

Last Pass
---------
2022-11-29_22-00-05

Timestamp/Logs
--------------
2022-12-12T18:45:33.533 ip-10-229-177-192 sysinv-api[53042]: info Dec 12 18:45:33 ERROR: Unable to run system show, trying direct request on sysinv-api URL (sysinv-api)
2022-12-12T18:45:33.677 ip-10-229-177-192 sysinv-api[53042]: info Dec 12 18:45:33 ERROR: Unable to communicate with the System Inventory Service (sysinv-api)

daemon-cfg.log:

2022-12-12T20:35:48.166 controller-0 OCF_IPaddr2(management-ip)[440833]: info INFO: IP status = no, IP_CIP=
...
2022-12-12T20:35:48.553 controller-0 OCF_IPaddr2(management-ip)[441642]: info INFO: Adding inet6 address 2620:10a:a001:ac12::36e2/123 to device ens6 (with preferred_lft forever)
2022-12-12T20:35:48.559 controller-0 OCF_IPaddr2(management-ip)[441642]: err ERROR: RTNETLINK answers: File exists
2022-12-12T20:35:48.563 controller-0 OCF_IPaddr2(management-ip)[441642]: err ERROR: Failed to add 2620:10a:a001:ac12::36e2
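
"RTNETLINK answers: File exists" is the EEXIST error: the address being added is already assigned to the device. In the excerpt above, the agent logged "IP status = no" at 20:35:48.166, and the add at 20:35:48.559 was rejected. The new address logs are meant to expose this kind of discrepancy.

The following is a minimal Python sketch of a pre-check, assuming `ip -o addr show dev DEV` output. The function name and sample data are hypothetical. Comparing the address while ignoring the prefix length is a simplifying assumption of the sketch.

```python
import ipaddress

# Hypothetical sample of `ip -o addr show dev ens6` output (made up for illustration).
SAMPLE_ADDRS = (
    "2: ens6    inet6 2620:10a:a001:ac12::36e2/123 scope global \\"
    "       valid_lft forever preferred_lft forever\n"
    "2: ens6    inet6 fe80::81b:2cff:fe3d:4e5f/64 scope link \\"
    "       valid_lft forever preferred_lft forever\n"
)


def address_present(wanted: str, addr_output: str) -> bool:
    """Hypothetical pre-check: is the address in `wanted` (addr/prefix) already on the device?

    Only the address is compared; the prefix length is ignored (a simplifying
    assumption). Parsing through `ipaddress` makes equivalent IPv6 spellings,
    such as leading zeros or "::" compression, compare equal.
    """
    target = ipaddress.ip_interface(wanted).ip
    for line in addr_output.splitlines():
        fields = line.split()
        for family in ("inet", "inet6"):
            if family in fields:
                configured = ipaddress.ip_interface(fields[fields.index(family) + 1])
                if configured.ip == target:
                    return True
    return False


# Run the check before `ip addr add` so a duplicate is logged rather than surfacing only as EEXIST.
print(address_present("2620:10a:a001:ac12::36e2/123", SAMPLE_ADDRS))  # True
```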

Test Activity
-------------
Scalability Testing

Workaround
----------
Reboot the subcloud

Changed in starlingx:
assignee: nobody → Fabiano Correa Mercer (fcorream)
OpenStack Infra (hudson-openstack) wrote: Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/869610

Changed in starlingx:
status: New → In Progress
summary: - AWS subcloud not operational after the initial deployment
+ Improve ipaddr2 logs to debug RTNETLINK errors
description: updated
OpenStack Infra (hudson-openstack) wrote: Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/869610
Committed: https://opendev.org/starlingx/integ/commit/2bea64e066cdf8abfe3d8ef00f8041d91b9a80b4
Submitter: "Zuul (22348)"
Branch: master

commit 2bea64e066cdf8abfe3d8ef00f8041d91b9a80b4
Author: Fabiano Mercer <email address hidden>
Date: Mon Jan 9 18:13:26 2023 -0300

    Add logs to debug RTNETLINK errors

    Problem: in a rare situation the add_interface may
    fail with RTNETLINK error.
    Add logs to help the investigation to check the
    device link status and IP address configured.

    Test plan ( Debian only )
    PASS Fresh install of AIO-SX
    PASS Fresh install of AIO-DX

    Closes-Bug: #2002346

    Signed-off-by: Fabiano Mercer <email address hidden>
    Change-Id: Ice92d54cf87c0b58ff0d1917b2c4b61a277fb961

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
tags: added: stx.8.0 stx.networking