ubuntu_fan_smoke_test failed with enable disable fan test on GKE 4.15 with g1-small and n1-highmem-16 / B-gcp-5.3

Bug #1840904 reported by Po-Hsu Lin
This bug affects 1 person
Affects                         Status   Importance  Assigned to  Milestone
ubuntu-kernel-tests             Triaged  Medium      Unassigned
linux-signed-gke-4.15 (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

This test has passed on other nodes; only these two (g1-small and n1-highmem-16) are failing.
Test failed with:
    /usr/sbin/fanatic: bogus: unknown underlay network format

Reproduce rate: 3/3 on these two instances.

However, when I ran the test manually on g1-small, the first attempt failed with exactly this error, but an immediate retry passed.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-1041-gke 4.15.0-1041.43
ProcVersionSignature: Ubuntu 4.15.0-1041.43-gke 4.15.18
Uname: Linux 4.15.0-1041-gke x86_64
ApportVersion: 2.20.9-0ubuntu7.7
Architecture: amd64
Date: Wed Aug 21 09:52:32 2019
SourcePackage: linux-signed-gke-4.15
UpgradeStatus: No upgrade log present (probably fresh install)

Po-Hsu Lin (cypressyew) wrote :
tags: added: gke sru-20190812
Po-Hsu Lin (cypressyew)
tags: added: ubuntu-fan-smoke-test
Po-Hsu Lin (cypressyew) wrote :

Issue found on GCP 5.3 Eoan (5.3.0-1010) as well.
Failing on all the instances

tags: added: 5.3 eoan gcp sru-20191202
Sean Feole (sfeole) wrote :

linux-gcp 5.3.0-1012.13

02/08 05:28:49 DEBUG| utils:0153| [stdout]
02/08 05:28:49 DEBUG| utils:0153| [stdout] Testing Fan Networking (pre-0.13.0 API)
02/08 05:28:54 DEBUG| utils:0153| [stdout] docker pull ubuntu: PASSED
02/08 05:28:54 DEBUG| utils:0153| [stdout] enable disable fan test: FAILED (fanatic enable-fan returned 1)
02/08 05:28:54 ERROR| utils:0153| [stderr] /usr/sbin/fanatic: bogus: unknown underlay network format
02/08 05:28:54 ERROR| test:0414| Exception escaping from test:
Traceback (most recent call last):
  File "/home/jenkins/autotest/client/shared/test.py", line 411, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/home/jenkins/autotest/client/shared/test.py", line 823, in _call_test_function
    return func(*args, **dargs)
  File "/home/jenkins/autotest/client/shared/test.py", line 291, in execute
    postprocess_profiled_run, args, dargs)
  File "/home/jenkins/autotest/client/shared/test.py", line 212, in _call_run_once
    self.run_once(*args, **dargs)
  File "/home/jenkins/autotest/client/tests/ubuntu_fan_smoke_test/ubuntu_fan_smoke_test.py", line 49, in run_once
    self.results = utils.system_output(cmd, retain_output=True)
  File "/home/jenkins/autotest/client/shared/utils.py", line 1267, in system_output
    verbose=verbose, args=args).stdout
  File "/home/jenkins/autotest/client/shared/utils.py", line 918, in run
    "Command returned non-zero exit status")
CmdError: Command <./ubuntu_fan_smoke_test.sh bogus> failed, rc=1, Command returned non-zero exit status
* Command:
    ./ubuntu_fan_smoke_test.sh bogus
Exit status: 1
Duration: 10.0694971085

stdout:

Testing Fan Networking (pre-0.13.0 API)
docker pull ubuntu: PASSED
enable disable fan test: FAILED (fanatic enable-fan returned 1)
stderr:
/usr/sbin/fanatic: bogus: unknown underlay network format

tags: added: sru-20200127
Sean Feole (sfeole) wrote :

Also seen on gcp : 5.3.0-1013.14 : amd64

Across all instances

02/18 20:02:18 DEBUG| utils:0153| [stdout] enable disable fan test: FAILED (fanatic enable-fan returned 1)
02/18 20:02:18 ERROR| utils:0153| [stderr] /usr/sbin/fanatic: bogus: unknown underlay network format
02/18 20:02:18 ERROR| test:0414| Exception escaping from test:
Traceback (most recent call last):
  File "/home/jenkins/autotest/client/shared/test.py", line 411, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/home/jenkins/autotest/client/shared/test.py", line 823, in _call_test_function
    return func(*args, **dargs)
  File "/home/jenkins/autotest/client/shared/test.py", line 291, in execute
    postprocess_profiled_run, args, dargs)
  File "/home/jenkins/autotest/client/shared/test.py", line 212, in _call_run_once
    self.run_once(*args, **dargs)
  File "/home/jenkins/autotest/client/tests/ubuntu_fan_smoke_test/ubuntu_fan_smoke_test.py", line 49, in run_once
    self.results = utils.system_output(cmd, retain_output=True)
  File "/home/jenkins/autotest/client/shared/utils.py", line 1267, in system_output
    verbose=verbose, args=args).stdout
  File "/home/jenkins/autotest/client/shared/utils.py", line 918, in run
    "Command returned non-zero exit status")
CmdError: Command <./ubuntu_fan_smoke_test.sh bogus> failed, rc=1, Command returned non-zero exit status
* Command:
    ./ubuntu_fan_smoke_test.sh bogus
Exit status: 1
Duration: 10.0405759811

Po-Hsu Lin (cypressyew) wrote :

This issue can be found on B-gcp 5.3 as well.

tags: added: sru-20200217
summary: - ubuntu_fan_smoke_test failed on GKE 4.15 with g1-small and n1-highmem-16
+ ubuntu_fan_smoke_test failed with enable disable fan test on GKE 4.15
+ with g1-small and n1-highmem-16 / B-gcp-5.3
Sean Feole (sfeole)
tags: added: sru-20200316
Changed in ubuntu-kernel-tests:
status: New → Triaged
Changed in linux-signed-gke-4.15 (Ubuntu):
status: New → Triaged
Po-Hsu Lin (cypressyew) wrote :

In this round it is failing on all instances with 4.15.0-1057.60 GKE:
f1-micro
g1-small
n1-highcpu-16
n1-highcpu-32
n1-highcpu-4
n1-highmem-16
n1-highmem-8
n1-standard-2
n1-standard-64
n1-standard-8

Sean Feole (sfeole)
tags: added: sru-20200406
Sean Feole (sfeole) wrote :

Looking at the test, it appears that several factors are hard-coded into it.

    def determine_underlay(self):
        underlay = 'bogus'
        cmd = 'ip address'
        output = utils.system_output(cmd, retain_output=False)
        for line in output.split('\n'):
            m = re.search('inet (\d+\.\d+)\.\d+\.\d+\/\d+ brd \d+\.\d+\.\d+\.\d+ scope', line)
            if m:
                underlay = '%s.0.0/16' % m.group(1)
                break
        return underlay

This may be the overall problem, and it is possibly not the right approach. The underlay network should probably be the default routable network, and not every network is a /16. I will take a look to see if I can refactor this a bit.
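
The thread does not say why the regex misses on these instances. As a minimal sketch, assuming the address line simply has no `brd` field, the following reproduces how the function falls through to `bogus`. The two sample lines are hypothetical and were not captured from the failing instances.

```python
import re

# Same pattern as the test's determine_underlay(), written as a raw string.
UNDERLAY_RE = re.compile(
    r'inet (\d+\.\d+)\.\d+\.\d+\/\d+ brd \d+\.\d+\.\d+\.\d+ scope')


def determine_underlay(ip_address_output):
    """Return '<a>.<b>.0.0/16' for the first matching line, else 'bogus'."""
    for line in ip_address_output.split('\n'):
        m = UNDERLAY_RE.search(line)
        if m:
            return '%s.0.0/16' % m.group(1)
    return 'bogus'


# Hypothetical sample lines, not taken from the failing instances.
WITH_BRD = '    inet 192.168.1.20/24 brd 192.168.1.255 scope global eth0'
WITHOUT_BRD = '    inet 10.240.0.5/32 scope global dynamic ens4'

print(determine_underlay(WITH_BRD))     # 192.168.0.0/16
print(determine_underlay(WITHOUT_BRD))  # bogus
```

Any line that lacks the `brd <address>` part never matches, so the initial `bogus` value is passed to the shell script unchanged, which matches the `fanatic: bogus: unknown underlay network format` error.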

Sean Feole (sfeole) wrote :

After further review of the test, there appear to be two scripts which do the bulk of the work:
smoke_test_old.sh and smoke_test-0.13.0.sh

In all cases, smoke_test_old.sh is the one executed across all of our tests, because of this if statement:

if dpkg --compare-versions $FAN_VERSION lt 0.13; then
        echo "Testing Fan Networking (pre-0.13.0 API)"
        $RUN_DIR/smoke_test_old.sh "$@"
        RC=$?
else
        echo "Testing Fan Networking (0.13.0+ API)"
        $RUN_DIR/smoke_test-0.13.0.sh "$@"
        RC=$?
fi

The latest ubuntu-fan is 0.12.13, which will always be less than 0.13, so this in itself is broken. Looking at the two scripts, the newer one appears to be more flexible. However, it needs to be rewritten to accommodate some changes in fanctl's output.
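
To illustrate why the first branch is always taken, here is a simplified sketch of the comparison. It is a stand-in for `dpkg --compare-versions`, not its real algorithm, and only handles purely numeric dotted versions; the helper names are hypothetical.

```python
def version_tuple(version):
    """Split a purely numeric dotted version into a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))


def is_pre_0_13(fan_version):
    # Simplified stand-in for `dpkg --compare-versions $FAN_VERSION lt 0.13`;
    # it ignores epochs, revisions and non-numeric parts.
    return version_tuple(fan_version) < version_tuple('0.13')


print(is_pre_0_13('0.12.13'))  # True: the old-API script is selected
print(is_pre_0_13('0.13.0'))   # False
```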

The older script appears to test just the basics:

- Fanup/FanDown
- Fan + Docker

That's it. We don't include any of the testing for:

- Fan + LXD
- Fan + Ping Remote Host

The remote host could be anything, but when the labs are firewalled off from each other, this test becomes quite a headache to run. I'm assuming it would be riddled with if statements to identify whether we are in the cloud or a metal lab.

To fix this "FOR NOW" so stuff just works, the function in comment #7 needs to be changed to something like this:

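    def determine_underlay(self):
        # Assumes the first line of `ip route` is the default route:
        # "default via <gateway> dev ...", so field 2 is the gateway.
        cmd = 'ip route'
        output = utils.system_output(cmd, retain_output=False)
        m = output.split(" ")[2]
        # Replace the last character of the gateway with 0 and assume a /16.
        underlay = m[:-1]+'0'+'/16'
        return underlay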

This way, we use the subnet of whatever our default route is, as opposed to the previous method, which used a regex to match strings and in some cases would fail.
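
The string slicing above only zeroes the last character of the gateway address, so it depends on the gateway looking like `a.b.0.N` with a single-digit `N`. A slightly more defensive variant, offered as a sketch only, could look for the `default via` line explicitly and let the standard `ipaddress` module mask the address. This assumes Python 3; the function name and the sample `ip route` output are hypothetical.

```python
import ipaddress


def underlay_from_default_route(ip_route_output, prefix_len=16):
    """Return the /prefix_len network containing the default gateway."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        # Expect a line such as: default via <gateway> dev <iface> ...
        if len(fields) >= 3 and fields[0] == 'default' and fields[1] == 'via':
            gateway = fields[2]
            network = ipaddress.ip_network(
                '%s/%d' % (gateway, prefix_len), strict=False)
            return str(network)
    return 'bogus'  # same fallback value the existing test uses


# Hypothetical `ip route` output, for illustration only.
SAMPLE = (
    'default via 10.240.0.1 dev ens4 proto dhcp src 10.240.0.5 metric 100\n'
    '10.240.0.1 dev ens4 proto dhcp scope link src 10.240.0.5 metric 100\n'
)

print(underlay_from_default_route(SAMPLE))  # 10.240.0.0/16
```

The /16 prefix is still an assumption carried over from the original test; this sketch only makes the masking explicit rather than deriving the real prefix length.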

The modified determine_underlay works in the cloud, as tested on Google:

18:43:33 INFO | Writing results to /home/jenkins/autotest/client/results/default
18:43:33 DEBUG| Initializing the state engine
18:43:33 DEBUG| Persistent state client.steps now set to []
18:43:33 DEBUG| Persistent option harness now set to None
18:43:33 DEBUG| Persistent option harness_args now set to None
18:43:33 DEBUG| Selected harness: standalone
18:43:33 INFO | START ---- ---- timestamp=1587062613 localtime=Apr 16 18:43:33
18:43:33 DEBUG| Persistent state client._record_indent now set to 1
18:43:33 INFO | START ubuntu_fan_smoke_test.fan-smoke-test ubuntu_fan_smoke_test.fan-smoke-test timestamp=1587062613 localtime=Apr 16 18:43:33
18:43:33 DEBUG| Persistent state client._record_indent now set to 2
18:43:33 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_fan_smoke_test.fan-smoke-test', 'ubuntu_fan_smoke_test.fan-smoke-test')
18:43:33 DEBUG| Running 'ip route'
18:43:33 DEBUG| Running './ubuntu_fan_smoke_test.sh 10.240.0.0/16'
18:43:33 ERROR| [stderr] + awk /Installed:/{print $2}
18:43:33 ERROR| [stderr] + apt-cache policy ubuntu-fan
18:43:33 ERROR| [stderr] + FAN_VERSION=0.12.13
18:43:33 ERROR| [stderr] + dirname ./ubuntu_fan_smoke_test.sh
18:43:33 ERROR| [stderr] + RUN_DIR=.
18:43:33 ERROR| [stderr] + http_proxy=
18:43:33 ERROR| [stderr] + https_proxy=
18:43:33 ERROR| [stderr] + nc -w 2 squid.internal 3128
18:43:33 ERROR| [stderr] + echo
18:43:33 ERROR| [stderr] + nc -w 2 91.189.89.216 3128
18:43:33 ERROR| [stderr] ...


Changed in linux-signed-gke-4.15 (Ubuntu):
status: Triaged → Invalid
Changed in ubuntu-kernel-tests:
assignee: nobody → Sean Feole (sfeole)
importance: Undecided → Medium
status: Triaged → In Progress
Po-Hsu Lin (cypressyew) wrote :

Affecting B-5.4 GCP

tags: added: kqa-blocker sru-20200518
Sean Feole (sfeole) wrote : Re: [Bug 1840904] Re: ubuntu_fan_smoke_test failed with enable disable fan test on GKE 4.15 with g1-small and n1-highmem-16 / B-gcp-5.3

This test needs to be rewritten; I'll try to prioritize it this week.

On Mon, Jun 8, 2020 at 11:56 AM Po-Hsu Lin <email address hidden>
wrote:

> Affecting B-5.4 GCP

Sean Feole (sfeole)
Changed in ubuntu-kernel-tests:
assignee: Sean Feole (sfeole) → nobody
status: In Progress → Triaged
Po-Hsu Lin (cypressyew)
tags: added: sru-20200831 sru-20200921
tags: added: 4.15
tags: added: groovy
tags: added: sru-20201109
Kelsey Steele (kelsey-steele) wrote :

spotted on groovy/gcp 5.8.0-1015.15

tags: added: 5.8 sru-20201130
tags: added: sru-20210104
Ian May (ian-may)
tags: added: sru-20210125
Ian May (ian-may) wrote :

found on bionic/gcp 5.4.0-1038.41~18.04.1

tags: added: sru-20210222
tags: added: 5.4
Ian May (ian-may) wrote :

found on b/gcp-5.4 (5.4.0-1041.44~18.04.1)

tags: added: sru-20210315