Timeout being hit when using remote and testruns involve lengthy tests that make machine go silent
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Next Generation Checkbox (CLI) | Invalid | High | Unassigned |
Bug Description
UPDATE: This happens any time a lengthy test run makes a machine go silent for too long. I have some pretty large machines that can go silent for a very long time during network, memory or CPU testing. I am now seeing this affect some machines during CPU stress testing as well as during network tests.
Unfortunately, this is still blocking me from being able to do full regression testing.
Original description:
I think I'm hitting a timeout with checkbox-remote in the polling it does when a target disappears. The scenario is that remote is talking to the target via NIC-1. During testing of NIC-2, NIC-3 and NIC-4, the system may disappear from the network for several hours (the network testing is 1 hour per port minimum), so it's possible that NIC-1 will disappear for at least 3 hours in this scenario.
So what appears to be happening is that the test is initiated by checkbox-remote, the network test fires, and because NIC-1 is gone for longer than $TIMEOUT, checkbox stops polling, reports the connection as lost, and kills the session.
We need to be able to either alter this timeout, or disable it entirely, something like:
checkbox-cli master --polling-
so that the master will wait a full 5 hours before declaring the session dead. Depending on the network config of the SUT, this is an entirely common possibility, and in some cases, it's possible that the network test could cause the system to disappear for 12 hours or longer (given a large enough number of multi-port NICs installed).
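To illustrate the behaviour being requested, here is a minimal Python sketch of a polling loop with an adjustable deadline. It is not checkbox's actual code: `wait_for_target`, `is_reachable`, `timeout` and `interval` are all hypothetical names, and the injected `clock` and `sleep` parameters exist only so the loop can be exercised without really waiting for hours.

```python
import time
from typing import Callable, Optional


def wait_for_target(
    is_reachable: Callable[[], bool],
    timeout: Optional[float],
    interval: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the target answers again or the deadline passes.

    Hypothetical helper, not part of checkbox-ng.
    timeout is in seconds; timeout=None disables the deadline entirely.
    Returns True if the target came back, False if we gave up.
    """
    start = clock()
    while True:
        if is_reachable():
            return True
        # Only give up when a deadline is configured and has elapsed.
        if timeout is not None and clock() - start >= timeout:
            return False
        sleep(interval)
```

With something like this, a 5 hour wait would be `wait_for_target(probe, timeout=5 * 3600)`, and passing `timeout=None` would keep the session alive until the SUT reappears, however long the network test takes.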
summary: |
- Timeout being hit when using remote and target machine disappears for too long
+ Timeout being hit when using remote and target machine disappears for network testing
Changed in checkbox-ng: | |
milestone: | none → 1.11.0 |
Changed in checkbox-ng: | |
status: | New → Fix Released |
Changed in checkbox-ng: | |
milestone: | 1.11.0 → 1.12.0 |
status: | Fix Released → New |
summary: |
- Timeout being hit when using remote and target machine disappears for network testing
+ Timeout being hit when using remote and testruns involve lengthy tests that make machine go silent
description: | updated |
tags: | added: blocks-hwcert-server |
Changed in checkbox-ng: | |
importance: | Medium → High |
Changed in checkbox-ng: | |
milestone: | 1.12.0 → 1.13.0 |
Changed in checkbox-ng: | |
milestone: | 1.13.0 → none |
Changed in checkbox-ng: | |
status: | New → Incomplete |
Soooo... any chance this can be fixed soon?