Fast Models [7.1.42 (May 25 2012)] Does not open userNetPorts while socket is in TIME_WAIT state

Bug #1034809 reported by Alexander Sack
This bug affects 1 person
Affects             Status     Importance  Assigned to  Milestone
LAVA Dispatcher     Won't Fix  High        Unassigned
Linaro Fast Models  New        Undecided   Unassigned

Bug Description

I use -C motherboard.hostbridge.userNetPorts='5555=5555' ... When everything works, I can start the model and about half a second later port 5555 shows up as LISTEN in lsof.

However, more often than not this doesn't work, and the Fast Model never opens port 5555.

We rely on this feature in LAVA, and we see a lot of flakiness in many places; this could be triggering a large part of it.
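The manual lsof check described above could be automated along these lines. This is only a sketch, assuming a Linux host where /proc/net/tcp is readable; the helper names `listening_ports` and `wait_for_listen` are made up for illustration and are not part of LAVA or Fast Models.

```python
import time

LISTEN = "0A"  # TCP state code for LISTEN in /proc/net/tcp


def listening_ports(proc_net_tcp_text):
    """Return the set of local ports in LISTEN state from /proc/net/tcp text."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 4 and fields[3] == LISTEN:
            # fields[1] looks like "0100007F:15B3" (hex address, hex port)
            ports.add(int(fields[1].rsplit(":", 1)[1], 16))
    return ports


def wait_for_listen(port, timeout=5.0, interval=0.1):
    """Poll until `port` is in LISTEN state; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for path in ("/proc/net/tcp", "/proc/net/tcp6"):
            try:
                with open(path) as f:
                    if port in listening_ports(f.read()):
                        return True
            except OSError:
                continue  # file missing or unreadable on this host
        time.sleep(interval)
    return False
```

A caller would start the model and then run something like `wait_for_listen(5555)`, treating a False result as the failure described in this report.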

dmart wondered on IRC:
11:13 < dmart> Can the port be customised? Is this a TIME_WAIT problem?
11:14 < asac> dmart: so i use
11:14 < asac> -C motherboard.hostbridge.userNetPorts='5555=5555'
11:14 < asac> and so on
11:14 < asac> dmart: now if i start the fast model, if things go well i see a LISTEN 5555 on the host machine right away
11:15 < asac> dmart: but in 8 out of 10 runs the port is not opened at all
11:15 < asac> dmart: in cases where it works it seems to be not coupled with the actual target system opening a port
11:15 < asac> it just opens the port right away (which makes sense)
11:15 < asac> so yeah... short: hostbridge.userNetPorts not created in 8 out of 10 cases

Note that the telnet ports for serial are always opened properly.

Peter Maydell pointed out:
11:28 < pm215> it would be good to be able to make the model use the socket option that allows rebinding

We also observed some odd behaviour where network traffic through port 5555 stalls from time to time, and opening a telnet connection to one of the serial sockets makes the traffic resume. We have no further details on this observation yet; it might just indicate a wider issue on the userport traffic side.


Alexander Sack (asac)
description: updated
description: updated
Alexander Sack (asac)
summary: - Fast Models [7.1.42 (May 25 2012)] UserPort Networking is sometimes
- flaky
+ Fast Models [7.1.42 (May 25 2012)] UserPort Networking feels flaky
Alexander Sack (asac) wrote : Re: Fast Models [7.1.42 (May 25 2012)] UserPort Networking feels flaky

OK, from my experiments I can say that TIME_WAIT might indeed have an influence on this. I could not reproduce the missing socket on port 5555 when I waited for the TIME_WAIT socket to go away first.

The Fast Model should probably set SO_REUSEADDR to fix this.
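For reference, this is roughly what setting SO_REUSEADDR on a listening socket looks like. It is a generic sketch in Python, not Fast Models code; how the model actually opens its userNetPorts socket is not known from this report, and `open_listener` is a hypothetical helper name.

```python
import socket


def open_listener(port, host="127.0.0.1", reuse=True):
    """Open a TCP listening socket, optionally with SO_REUSEADDR set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse:
            # Must be set before bind(); allows binding while an old
            # connection on the same port is still in TIME_WAIT.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock
```

Without the option, a bind() shortly after the previous run was killed can fail with "Address already in use", which would match the symptom of the port never appearing.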

We can probably work around this in LAVA by waiting for the TIME_WAIT socket to expire.
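A possible shape for that workaround, as a sketch only: it assumes a Linux host and reads /proc/net/tcp, where state code 06 means TIME_WAIT. The function names, the 120 second default timeout and the polling interval are illustrative choices, not what LAVA actually does.

```python
import time

TIME_WAIT = "06"  # TCP state code for TIME_WAIT in /proc/net/tcp


def ports_in_time_wait(proc_net_tcp_text):
    """Return the set of local ports that have a socket in TIME_WAIT."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 4 and fields[3] == TIME_WAIT:
            ports.add(int(fields[1].rsplit(":", 1)[1], 16))
    return ports


def wait_for_time_wait_to_clear(port, timeout=120.0, interval=1.0,
                                paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Block until no socket on `port` is in TIME_WAIT; False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        busy = False
        for path in paths:
            try:
                with open(path) as f:
                    if port in ports_in_time_wait(f.read()):
                        busy = True
            except OSError:
                pass  # file missing or unreadable on this host
        if not busy:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The dispatcher would call `wait_for_time_wait_to_clear(5555)` before launching the model and only start it once the call returns True.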

Changed in lava-dispatcher:
importance: Undecided → Critical
status: New → Triaged
Alexander Sack (asac) wrote :

I guess High is the right importance for the LAVA Fast Model project. We should land a workaround first, and then improve the way we shut down Fast Models. At the moment we just kill them blindly, whereas we should properly release other resources, such as the adb connection (through adb disconnect), before terminating the model.
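The shutdown order suggested above might look something like the following sketch. It assumes the model was started with subprocess.Popen and that the adb connection was made to localhost:5555; the target, the grace period and the `shutdown_model` name are assumptions for illustration, not existing dispatcher code.

```python
import subprocess


def shutdown_model(model_proc, adb_target="localhost:5555", grace=10.0,
                   disconnect_cmd=None):
    """Release the adb connection, then stop the model process politely."""
    if disconnect_cmd is None:
        # Assumed target; adjust to whatever `adb connect` was given.
        disconnect_cmd = ["adb", "disconnect", adb_target]
    try:
        subprocess.run(disconnect_cmd, timeout=grace, check=False)
    except (OSError, subprocess.TimeoutExpired):
        pass  # adb missing or hung; carry on and stop the model anyway

    model_proc.terminate()  # SIGTERM first, so the model can clean up
    try:
        model_proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        model_proc.kill()  # last resort, same as the current blind kill
        model_proc.wait()
    return model_proc.returncode
```

Whether closing the adb connection first actually avoids the leftover TIME_WAIT socket has not been verified here; the sketch only shows the ordering.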

Changed in lava-dispatcher:
importance: Critical → High
summary: - Fast Models [7.1.42 (May 25 2012)] UserPort Networking feels flaky
+ Fast Models [7.1.42 (May 25 2012)] Does not open userNetPorts when
+ socket is in TIME_WAIT state
summary: - Fast Models [7.1.42 (May 25 2012)] Does not open userNetPorts when
+ Fast Models [7.1.42 (May 25 2012)] Does not open userNetPorts while
socket is in TIME_WAIT state
Alan Bennett (akbennett)
Changed in lava-dispatcher:
status: Triaged → Won't Fix