Bug #638955 “emulated netcards don't work with recent sunos kern...” : Bugs : QEMU

daniel pecka (dpecka) wrote on 2010-09-17:

#1

reproduced with latest vanilla qemu-kvm ..

i've just built it without any optimizations, like this: `./configure --prefix=$HOME/chroot/opt/qemu-kvm-0.13rc1; make`

(qemu) info version
info version
0.12.91 (qemu-kvm-0.13.0-rc1)

it acts just the same .. first i'm trying to hunt down what has happened in the sunos kernel .. well, i hope that we'll be able to fix it as soon as possible, because it's just very miserable that we're unable to use the best (in my opinion) virtualization platform ..

regards, daniel

daniel pecka (dpecka) wrote on 2010-09-17:

#2

here it is (1.6 KiB, application/zip)

added the output from `kstat -p e1000*` ..

call for more info if needed ..
regards by daniel

ps. summary: everything seems fine (link statistics and so on) but receiving just doesn't work .. transmitting works

Edgar E. Iglesias (edgar-iglesias) wrote on 2010-09-18: Re: [Qemu-devel] [PATCH] e1000: Pad short frames to minimum size (60 bytes)

#3

On Sat, Sep 18, 2010 at 09:43:45PM +0100, Stefan Hajnoczi wrote:
> The OpenIndiana (Solaris) e1000g driver drops frames that are too long
> or too short. It expects to receive frames of at least the Ethernet
> minimum size. ARP requests in particular are small and will be dropped
> if they are not padded appropriately, preventing a Solaris VM from
> becoming visible on the network.
>
> Signed-off-by: Stefan Hajnoczi <email address hidden>
> ---
> hw/e1000.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/hw/e1000.c b/hw/e1000.c
> index 7d7d140..bc983f9 100644
> --- a/hw/e1000.c
> +++ b/hw/e1000.c
> @@ -55,6 +55,7 @@ static int debugflags = DBGBIT(TXERR) | DBGBIT(GENERAL);
>
>  #define IOPORT_SIZE 0x40
>  #define PNPMMIO_SIZE 0x20000
> +#define MIN_BUF_SIZE 60
>
>  /*
>   * HW models:
> @@ -635,10 +636,19 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
>      uint32_t rdh_start;
>      uint16_t vlan_special = 0;
>      uint8_t vlan_status = 0, vlan_offset = 0;
> +    uint8_t min_buf[MIN_BUF_SIZE];
>
>      if (!(s->mac_reg[RCTL] & E1000_RCTL_EN))
>          return -1;
>
> +    /* Pad to minimum Ethernet frame length */
> +    if (size < sizeof(min_buf)) {
> +        memcpy(min_buf, buf, size);
> +        memset(&min_buf[size], 0, sizeof(min_buf) - size);
> +        buf = min_buf;
> +        size = sizeof(min_buf);
> +    }
> +

Hi,

This doesn't look right. AFAIK, MAC's dont pad on receive.

IMO this kind of padding should somehow be done by the bridge that forwards
packets into the qemu vlan (e.g slirp or the generic tap bridge).

Cheers
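
For reference, the padding step in the quoted patch can be tried in isolation. The following is a minimal sketch, not QEMU code: `pad_short_frame` and `MIN_FRAME_NO_FCS` are hypothetical names chosen for illustration, and the value 60 is taken from `MIN_BUF_SIZE` in the patch. It shows how a small ARP frame (14-byte Ethernet header plus 28-byte ARP payload, 42 bytes in total) would reach the guest driver as a 60-byte frame once padded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimum Ethernet frame length without the 4-byte FCS
 * (same value as MIN_BUF_SIZE in the quoted patch). */
#define MIN_FRAME_NO_FCS 60

/*
 * Hypothetical helper, not a QEMU function.
 * If the frame is shorter than MIN_FRAME_NO_FCS, copy it into `scratch`,
 * zero-fill the tail, point *out at scratch and return the padded length.
 * Otherwise leave the frame alone and return its original size.
 */
static size_t pad_short_frame(const uint8_t *buf, size_t size,
                              uint8_t scratch[MIN_FRAME_NO_FCS],
                              const uint8_t **out)
{
    assert(buf != NULL && scratch != NULL && out != NULL);

    if (size < MIN_FRAME_NO_FCS) {
        memcpy(scratch, buf, size);
        memset(scratch + size, 0, MIN_FRAME_NO_FCS - size);
        *out = scratch;
        return MIN_FRAME_NO_FCS;
    }

    *out = buf;
    return size;
}
```

Frames that already meet the minimum are passed through without copying, so only short frames pay for the extra `memcpy`.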

Edgar E. Iglesias (edgar-iglesias) wrote on 2010-09-19:

#4

On Sun, Sep 19, 2010 at 01:18:01PM +0200, Michael S. Tsirkin wrote:
> On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
> > On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> > <email address hidden> wrote:
> > > This doesn't look right. AFAIK, MAC's dont pad on receive.
> >
> > I agree. NICs that do padding will do it on transmit, not receive.
> > Anything coming in on the wire should already have the minimum length.
> >
> > In QEMU that isn't true today and that's why rtl8139, pcnet, and
> > ne2000 already do this same padding. This patch is the smallest
> > change to cover e1000.
> >
> > > IMO this kind of padding should somehow be done by the bridge that forwards
> > > packets into the qemu vlan (e.g slirp or the generic tap bridge).
> >
> > That should work and we can then drop the padding code from existing
> > NICs. I'll take a look.
> >
> > Stefan
>
> Not all nic devices have to be emulate ethernet, so not all devices want
> the padding, e.g. virtio does not.

Right, ethernet behaviour should obviously not be applied unconditionally for
all net devices.

> It's also easy to imagine an
> ethernet device that strips the padding: would be silly to add it
> just to have it stripped.

I don't believe that is possible. The FCS comes last, so an ethernet MAC
would have to do really silly things to differentiate between padding and
real payload.

> If we really want to do this generically, we could implement a function dealing
> with the padding, and call it from relevant devices.

Another way is to have network devices register their link types so that the
generic bridge can apply whatever link specific fixups that may be needed.

I would prefer to have the padding of bridged frames decoupled from the
device models, but I can't say I feel very strongly about this.

Cheers
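
The link-type idea could look roughly like the sketch below. It is only an illustration under assumptions: `LinkType`, `LINK_RAW`, `LINK_ETHERNET` and `bridge_fixup` are hypothetical names, not existing QEMU interfaces, and 60 bytes is assumed as the minimum frame length without FCS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical link types a device model could register with the bridge. */
typedef enum {
    LINK_RAW,      /* deliver frames untouched (e.g. a paravirtual NIC) */
    LINK_ETHERNET  /* enforce the Ethernet minimum frame length */
} LinkType;

#define ETH_MIN_NO_FCS 60  /* assumed minimum, FCS not included */

/*
 * Hypothetical bridge-side fixup. `buf` holds `len` valid bytes and has
 * room for `cap` bytes in total. Returns the length to deliver.
 */
static size_t bridge_fixup(LinkType type, uint8_t *buf, size_t len, size_t cap)
{
    assert(buf != NULL && len <= cap);

    if (type == LINK_ETHERNET && len < ETH_MIN_NO_FCS && cap >= ETH_MIN_NO_FCS) {
        /* Zero-pad in place up to the minimum length. */
        memset(buf + len, 0, ETH_MIN_NO_FCS - len);
        return ETH_MIN_NO_FCS;
    }

    /* Non-Ethernet links, or frames that are already long enough. */
    return len;
}
```

With this shape the device models stay free of padding code, and a link that does not want Ethernet behaviour simply registers as `LINK_RAW`.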

daniel pecka (dpecka) wrote on 2010-09-20:

#5

well, feel free to request whatever information you need or consider helpful ..

just for your information: after a ping via the e1000 adapter i can see an `arp -n` entry in the target system, and icmp packets are delivered ok. i'd like to presume that it's only some small issue, because the e1000 driver is really the one that the sunos kernel handles best (although we have the issue with receiving) .. all the others work like trash (no statistics, no available modes, ..)

but as i said, i have nothing in the logs indicating a problem; i've already attached the kernel statistics for this driver here ..

regards, daniel

Edgar E. Iglesias (edgar-iglesias) wrote on 2010-09-20:

#6

On Mon, Sep 20, 2010 at 10:42:31AM +0200, Kevin Wolf wrote:
> Am 18.09.2010 23:12, schrieb Stefan Hajnoczi:
> > On Sat, Sep 18, 2010 at 9:57 PM, Hervé Poussineau <email address hidden> wrote:
> >> Another patch creating ARP replies at least 64 bytes long has been
> >> committed:
> >> http://git.savannah.gnu.org/cgit/qemu.git/commit/?id=dbf3c4b4baceb91eb64d09f787cbe92d65188813
> >>
> >> Does it fix your issue?
> >
> > No I don't think so. This is an e1000 issue, it will happen if you
> > use tap networking too. The commit you linked to only affects slirp
> > and pads its ARP code.
> >
> > I think there are two places where the minimum frame length can be enforced:
> > 1. The NIC emulation code. This is currently how rtl8139, pcnet, and
> > ne2000 do it. My patch adds the same for e1000.
> > 2. The net layer. If we're emulating Ethernet then it would be
> > possible to pad to minimum frame length in common networking code
> > (net.c).
>
> 3. The sender. I think it should be the sender's decision which packet
> he sends and there's no reason to manipulate it on its way to the guest.
> If the sender sends too short packets, this is where the bug is.

Yes, but when using tap, the ethernet sender is QEMU itself. Tap doesn't
have the same requirements as ethernet so the original sender has no
reason to pad.

Internally in QEMU, there is code that picks up tap packets and
forwards them to the emulated ethernet links; this is where padding
should be done IMO, not in the device models' receive path.

The bridge that forwards frames from tap into emulated links must
also handle different kinds of link types, as all emulated network
devices are not necessarily ethernet.

Cheers

Edgar E. Iglesias (edgar-iglesias) wrote on 2010-09-20:

#7

On Mon, Sep 20, 2010 at 10:50:40AM +0200, Kevin Wolf wrote:
> Am 19.09.2010 08:36, schrieb Stefan Hajnoczi:
> > On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> > <email address hidden> wrote:
> >> This doesn't look right. AFAIK, MAC's dont pad on receive.
> >
> > I agree. NICs that do padding will do it on transmit, not receive.
> > Anything coming in on the wire should already have the minimum length.
> >
> > In QEMU that isn't true today and that's why rtl8139, pcnet, and
> > ne2000 already do this same padding. This patch is the smallest
> > change to cover e1000.
>
> What's the reason that it isn't true in QEMU today? Shouldn't we fix
> these problems rather than making device emulations incorrect to
> compensate for it?

Yes we should, I agree.

Cheers

Stefan Hajnoczi (stefanha) wrote on 2010-09-20:

#8

Daniel,
Does the following qemu.git patch solve the problem?
http://patchwork.ozlabs.org/patch/65137/raw/

Sorry about the partially mirrored mailing list thread. I expected Launchpad to show the entire discussion but it seems to whitelist only registered users' emails.

Stefan

Anthony Liguori (anthony-codemonkey) wrote on 2010-09-20:

#9

0001-tap-make-sure-packets-are-at-least-40-bytes-long.patch (832 bytes, text/x-patch; name="0001-tap-make-sure-packets-are-at-least-40-bytes-long.patch")

On 09/20/2010 05:42 AM, Michael S. Tsirkin wrote:
> On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
>
>> On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
>> <email address hidden> wrote:
>>
>>> This doesn't look right. AFAIK, MAC's dont pad on receive.
>>>
>> I agree. NICs that do padding will do it on transmit, not receive.
>> Anything coming in on the wire should already have the minimum length.
>>
> QEMU never gets access to the wire.
> Our APIs do not really pass complete ethernet packets:
> we forward packets without checksum and padding.
>
> I think it makes complete sense to keep this and
> handle padding in devices because we
> have devices that pass the frame to guest without padding and checksum.
> It should be easy to replace padding code in devices that
> need it with some kind of macro.
>

Would this not also address the problem? It sounds like the root cause
is the tap code, not the devices..

Regards,

Anthony Liguori

>
>> In QEMU that isn't true today and that's why rtl8139, pcnet, and
>> ne2000 already do this same padding. This patch is the smallest
>> change to cover e1000.
>>
>>
>>> IMO this kind of padding should somehow be done by the bridge that forwards
>>> packets into the qemu vlan (e.g slirp or the generic tap bridge).
>>>
>> That should work and we can then drop the padding code from existing
>> NICs. I'll take a look.
>>
>> Stefan
>>
>

Edgar E. Iglesias (edgar-iglesias) wrote on 2010-09-20:

#10

On Mon, Sep 20, 2010 at 03:31:32PM -0500, Anthony Liguori wrote:
> On 09/20/2010 05:42 AM, Michael S. Tsirkin wrote:
> > On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
> >
> >> On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> >> <email address hidden> wrote:
> >>
> >>> This doesn't look right. AFAIK, MAC's dont pad on receive.
> >>>
> >> I agree. NICs that do padding will do it on transmit, not receive.
> >> Anything coming in on the wire should already have the minimum length.
> >>
> > QEMU never gets access to the wire.
> > Our APIs do not really pass complete ethernet packets:
> > we forward packets without checksum and padding.
> >
> > I think it makes complete sense to keep this and
> > handle padding in devices because we
> > have devices that pass the frame to guest without padding and checksum.
> > It should be easy to replace padding code in devices that
> > need it with some kind of macro.
> >
>
> Would this not also address the problem? It sounds like the root cause
> is the tap code, not the devices..
>
> Regards,
>
> Anthony Liguori
>
> >
> >> In QEMU that isn't true today and that's why rtl8139, pcnet, and
> >> ne2000 already do this same padding. This patch is the smallest
> >> change to cover e1000.
> >>
> >>
> >>> IMO this kind of padding should somehow be done by the bridge that forwards
> >>> packets into the qemu vlan (e.g slirp or the generic tap bridge).
> >>>
> >> That should work and we can then drop the padding code from existing
> >> NICs. I'll take a look.
> >>
> >> Stefan
> >>
> >
>

> From f77c3143f3fbefdfa2f0cc873c2665b5aa78e8c9 Mon Sep 17 00:00:00 2001
> From: Anthony Liguori <email address hidden>
> Date: Mon, 20 Sep 2010 15:29:31 -0500
> Subject: [PATCH] tap: make sure packets are at least 40 bytes long
>
> This is required by ethernet drivers but not enforced in the Linux tap code so
> we need to fix it up ourselves.

This enforces ethernet semantics on the internal links (which is probably
not good), but it's IMO much better than changing the devices. It also
moves the workaround closer to the root of the problem. IMO, it's a step
in the right direction.

Acked-by: Edgar E. Iglesias <email address hidden>

> Signed-off-by: Anthony Liguori <email address hidden>
>
> diff --git a/net/tap.c b/net/tap.c
> index 4afb314..822241a 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -179,7 +179,13 @@ static int tap_can_send(void *opaque)
>  #ifndef __sun__
>  ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen)
>  {
> -    return read(tapfd, buf, maxlen);
> +    ssize_t len;
> +
> +    len = read(tapfd, buf, maxlen);
> +    if (len > 0) {
> +        len = MAX(MIN(maxlen, 40), len);
> +    }
> +    return len;
>  }
>  #endif
>
> --
> 1.7.0.4
>
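
A variant of the quoted change that also clears the added bytes might look like the sketch below. As quoted, the patch only raises the returned length and does not write anything into the extra bytes, so they would hold whatever was in the buffer before. `tap_read_padded` and `TAP_MIN_LEN` are hypothetical names, and the 60-byte minimum is an assumption (the quoted patch uses 40).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TAP_MIN_LEN 60  /* assumption: Ethernet minimum without FCS */

/* Hypothetical variant of tap_read_packet() from the quoted patch. */
static ssize_t tap_read_padded(int fd, uint8_t *buf, int maxlen)
{
    ssize_t len;

    assert(buf != NULL && maxlen > 0);

    len = read(fd, buf, (size_t)maxlen);
    if (len > 0 && len < TAP_MIN_LEN && maxlen >= TAP_MIN_LEN) {
        /* Zero the tail so the guest never sees stale buffer contents. */
        memset(buf + len, 0, (size_t)(TAP_MIN_LEN - len));
        len = TAP_MIN_LEN;
    }
    return len;
}
```

Errors and end-of-file (`len <= 0`) are returned unchanged, as in the quoted patch.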

Edgar E. Iglesias (edgar-iglesias) wrote on 2010-09-20:

#11

On Mon, Sep 20, 2010 at 03:31:32PM -0500, Anthony Liguori wrote:
> On 09/20/2010 05:42 AM, Michael S. Tsirkin wrote:
> > On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
> >
> >> On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> >> <email address hidden> wrote:
> >>
> >>> This doesn't look right. AFAIK, MAC's dont pad on receive.
> >>>
> >> I agree. NICs that do padding will do it on transmit, not receive.
> >> Anything coming in on the wire should already have the minimum length.
> >>
> > QEMU never gets access to the wire.
> > Our APIs do not really pass complete ethernet packets:
> > we forward packets without checksum and padding.
> >
> > I think it makes complete sense to keep this and
> > handle padding in devices because we
> > have devices that pass the frame to guest without padding and checksum.
> > It should be easy to replace padding code in devices that
> > need it with some kind of macro.
> >
>
> Would this not also address the problem? It sounds like the root cause
> is the tap code, not the devices..
>
> Regards,
>
> Anthony Liguori
>
> >
> >> In QEMU that isn't true today and that's why rtl8139, pcnet, and
> >> ne2000 already do this same padding. This patch is the smallest
> >> change to cover e1000.
> >>
> >>
> >>> IMO this kind of padding should somehow be done by the bridge that forwards
> >>> packets into the qemu vlan (e.g slirp or the generic tap bridge).
> >>>
> >> That should work and we can then drop the padding code from existing
> >> NICs. I'll take a look.
> >>
> >> Stefan
> >>
> >
>

> From f77c3143f3fbefdfa2f0cc873c2665b5aa78e8c9 Mon Sep 17 00:00:00 2001
> From: Anthony Liguori <email address hidden>
> Date: Mon, 20 Sep 2010 15:29:31 -0500
> Subject: [PATCH] tap: make sure packets are at least 40 bytes long
>
> This is required by ethernet drivers but not enforced in the Linux tap code so
> we need to fix it up ourselves.
>
> Signed-off-by: Anthony Liguori <email address hidden>
>
> diff --git a/net/tap.c b/net/tap.c
> index 4afb314..822241a 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -179,7 +179,13 @@ static int tap_can_send(void *opaque)
>  #ifndef __sun__
>  ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen)
>  {
> -    return read(tapfd, buf, maxlen);
> +    ssize_t len;
> +
> +    len = read(tapfd, buf, maxlen);
> +    if (len > 0) {
> +        len = MAX(MIN(maxlen, 40), len);

A small detail :)
40 -> 64 (including a dummy FCS).

> +    }
> +    return len;
>  }
>  #endif
>
> --
> 1.7.0.4
>

Anthony Liguori (anthony-codemonkey) wrote on 2010-09-20:

#12

On 09/20/2010 03:44 PM, Michael S. Tsirkin wrote:
>>> From f77c3143f3fbefdfa2f0cc873c2665b5aa78e8c9 Mon Sep 17 00:00:00 2001
>>> From: Anthony Liguori<email address hidden>
>>> Date: Mon, 20 Sep 2010 15:29:31 -0500
>>> Subject: [PATCH] tap: make sure packets are at least 40 bytes long
>>>
>>> This is required by ethernet drivers but not enforced in the Linux tap code so
>>> we need to fix it up ourselves.
>>>
>>
>> This enforces ethernet semantics on the internal links (which is probably
>> not good),
>>
> Plus plus ungood.
> When we do add e.g. ipoib support, we'll have to go and hunt these bugs down again.
> Also will make it impossible to implement any devices that pass in guest buffers
> without FCS and padding.
>

That's actually a good point which strongly is in favor of making the
devices do the padding themselves.

Regards,

Anthony Liguori

Edgar E. Iglesias (edgar-iglesias) wrote on 2010-09-20:

#13

On Mon, Sep 20, 2010 at 10:44:34PM +0200, Michael S. Tsirkin wrote:
> On Mon, Sep 20, 2010 at 10:40:35PM +0200, Edgar E. Iglesias wrote:
> > On Mon, Sep 20, 2010 at 03:31:32PM -0500, Anthony Liguori wrote:
> > > On 09/20/2010 05:42 AM, Michael S. Tsirkin wrote:
> > > > On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
> > > >
> > > >> On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> > > >> <email address hidden> wrote:
> > > >>
> > > >>> This doesn't look right. AFAIK, MAC's dont pad on receive.
> > > >>>
> > > >> I agree. NICs that do padding will do it on transmit, not receive.
> > > >> Anything coming in on the wire should already have the minimum length.
> > > >>
> > > > QEMU never gets access to the wire.
> > > > Our APIs do not really pass complete ethernet packets:
> > > > we forward packets without checksum and padding.
> > > >
> > > > I think it makes complete sense to keep this and
> > > > handle padding in devices because we
> > > > have devices that pass the frame to guest without padding and checksum.
> > > > It should be easy to replace padding code in devices that
> > > > need it with some kind of macro.
> > > >
> > >
> > > Would this not also address the problem? It sounds like the root cause
> > > is the tap code, not the devices..
> > >
> > > Regards,
> > >
> > > Anthony Liguori
> > >
> > > >
> > > >> In QEMU that isn't true today and that's why rtl8139, pcnet, and
> > > >> ne2000 already do this same padding. This patch is the smallest
> > > >> change to cover e1000.
> > > >>
> > > >>
> > > >>> IMO this kind of padding should somehow be done by the bridge that forwards
> > > >>> packets into the qemu vlan (e.g slirp or the generic tap bridge).
> > > >>>
> > > >> That should work and we can then drop the padding code from existing
> > > >> NICs. I'll take a look.
> > > >>
> > > >> Stefan
> > > >>
> > > >
> > >
> >
> > > From f77c3143f3fbefdfa2f0cc873c2665b5aa78e8c9 Mon Sep 17 00:00:00 2001
> > > From: Anthony Liguori <email address hidden>
> > > Date: Mon, 20 Sep 2010 15:29:31 -0500
> > > Subject: [PATCH] tap: make sure packets are at least 40 bytes long
> > >
> > > This is required by ethernet drivers but not enforced in the Linux tap code so
> > > we need to fix it up ourselves.
> >
> >
> > This enforces ethernet semantics on the internal links (which is probably
> > not good),
>
> Plus plus ungood.
> When we do add e.g. ipoib support, we'll have to go and hunt these bugs down again.
> Also will make it impossible to implement any devices that pass in guest buffers
> without FCS and padding.

If we don't remove the padding from the device models' rx paths, we
will continue with code that relies on it, and it is IMO wrong.
Ethernet MACs don't pad or append a checksum on receive.

I agree with you that it's not great that the internal link
protocol has to be strictly ethernet, but it seems to me that
this is the reality today, with or without Anthony's patch.
slirp and tap both require ethernet semantics (except possibly
padding and FCS). The addressing and packet headers are ethernet.

In the long run, I'd rather see a more flexible in...

On Tue, Sep 21, 2010 at 11:17:07AM +0200, Michael S. Tsirkin wrote:
> On Mon, Sep 20, 2010 at 10:51:36PM +0200, Edgar E. Iglesias wrote:
> > On Mon, Sep 20, 2010 at 03:31:32PM -0500, Anthony Liguori wrote:
> > > On 09/20/2010 05:42 AM, Michael S. Tsirkin wrote:
> > > > On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
> > > >    
> > > >> On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> > > >> <edgar.iglesias@gmail.com>  wrote:
> > > >>      
> > > >>> This doesn't look right. AFAIK, MAC's dont pad on receive.
> > > >>>        
> > > >> I agree.  NICs that do padding will do it on transmit, not receive.
> > > >> Anything coming in on the wire should already have the minimum length.
> > > >>      
> > > > QEMU never gets access to the wire.
> > > > Our APIs do not really pass complete ethernet packets:
> > > > we forward packets without checksum and padding.
> > > >
> > > > I think it makes complete sense to keep this and
> > > > handle padding in devices because we
> > > > have devices that pass the frame to guest without padding and checksum.
> > > > It should be easy to replace padding code in devices that
> > > > need it with some kind of macro.
> > > >    
> > > 
> > > Would this not also address the problem?  It sounds like the root cause 
> > > is the tap code, not the devices..
> > > 
> > > Regards,
> > > 
> > > Anthony Liguori
> > > 
> > > >    
> > > >> In QEMU that isn't true today and that's why rtl8139, pcnet, and
> > > >> ne2000 already do this same padding.  This patch is the smallest
> > > >> change to cover e1000.
> > > >>
> > > >>      
> > > >>> IMO this kind of padding should somehow be done by the bridge that forwards
> > > >>> packets into the qemu vlan (e.g slirp or the generic tap bridge).
> > > >>>        
> > > >> That should work and we can then drop the padding code from existing
> > > >> NICs.  I'll take a look.
> > > >>
> > > >> Stefan
> > > >>      
> > > >    
> > > 
> > 
> > > From f77c3143f3fbefdfa2f0cc873c2665b5aa78e8c9 Mon Sep 17 00:00:00 2001
> > > From: Anthony Liguori <aliguori@us.ibm.com>
> > > Date: Mon, 20 Sep 2010 15:29:31 -0500
> > > Subject: [PATCH] tap: make sure packets are at least 40 bytes long
> > > 
> > > This is required by ethernet drivers but not enforced in the Linux tap code so
> > > we need to fix it up ourselves.
> > > 
> > > Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> > > 
> > > diff --git a/net/tap.c b/net/tap.c
> > > index 4afb314..822241a 100644
> > > --- a/net/tap.c
> > > +++ b/net/tap.c
> > > @@ -179,7 +179,13 @@ static int tap_can_send(void *opaque)
> > >  #ifndef __sun__
> > >  ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen)
> > >  {
> > > -    return read(tapfd, buf, maxlen);
> > > +    ssize_t len;
> > > +
> > > +    len = read(tapfd, buf, maxlen);
> > > +    if (len > 0) {
> > > +        len = MAX(MIN(maxlen, 40), len);
> > 
> > 
> > A small detail :)
> > 40 -> 64 (including a dummy FCS).
> 
> I don't think so: e1000 at least has code to tack the FCS on,
> so we'll end up with a 68 bytes.

And at the moment e1000 also has padding, both padding
and FCS appending should go away from ethernet models before
this goes in.

Anyway, if you guys maintaining the networking parts are in
agreement that padding and FCS appending should be done in
the device models (at least for the moment), I'll accept
that and back-off. In that case, I think your suggestion
of hiding things behind some kind of generic macro or
function would be good. At least it will clarify things.

Cheers
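
The 60, 64 and 68 byte figures in this exchange follow from the standard Ethernet frame layout. A small sketch of the arithmetic, with hypothetical constant and function names (nothing here is QEMU API):

```c
#include <assert.h>

enum {
    ETH_HEADER_LEN  = 14,  /* destination MAC + source MAC + EtherType */
    ETH_MIN_PAYLOAD = 46,  /* minimum payload, padding included */
    ETH_FCS_LEN     = 4,   /* frame check sequence (CRC-32) */

    ETH_MIN_NO_FCS   = ETH_HEADER_LEN + ETH_MIN_PAYLOAD,  /* 60 */
    ETH_MIN_WITH_FCS = ETH_MIN_NO_FCS + ETH_FCS_LEN       /* 64 */
};

/* Compile-time checks of the two minimum sizes discussed above. */
static_assert(ETH_MIN_NO_FCS == 60, "minimum frame without FCS is 60 bytes");
static_assert(ETH_MIN_WITH_FCS == 64, "minimum frame with FCS is 64 bytes");

/*
 * Hypothetical helper: the length the guest ends up with when a device
 * model appends its own FCS to a frame that an earlier layer already
 * padded to `padded_len`.
 */
static int len_after_device_fcs(int padded_len)
{
    return padded_len + ETH_FCS_LEN;
}
```

Padding to 64 in the tap layer and then letting the device append an FCS gives 68 bytes, which is the double counting objected to above; padding to 60 and appending the FCS in one place gives the expected 64.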

daniel pecka (dpecka) wrote on 2010-10-02:

#17

well, i did some more investigation and here are the results ..

this patch http://patchwork.ozlabs.org/patch/65137/raw/ solves the problem partially .. NICs are working with it, but after a deeper look, the connection is lost when the netstack is flooded with higher traffic ..

i can connect with ssh|telnet from the qemu-kvm host to the sunos machines, but when i type dmesg for example (or anything else that generates higher traffic for a moment), the connection freezes ..

when i bind both tap ifaces under one bridge and access each machine via its /dev/console, the connection to the neighboring guest seems to work as expected, so this issue only affects the connection between the kvm host and the guests ..

sorry for my very plain description of the problem, but it's again easy to reproduce ..

so once more in short:

two machines with following settings:
-net nic,model=e1000,macaddr="00:50:56:ba:5e:74",vlan=1 \
-net tap,ifname=oi0,script=no,vlan=1 & ## openindiana

-net nic,model=e1000,macaddr="00:50:56:ba:6e:74",vlan=1 \
-net tap,ifname=solaris0,script=no,vlan=1 & ## solaris

1) ping over the address directly assigned on oi0|solaris0 works; the connection is lost when higher traffic is invoked, e.g. ssh|telnet in there and then type the dmesg command or whatever else floods /dev/stdin and thereby causes higher traffic

2) when a bridge is created (brctl addbr br0; brctl addif br0 oi0 solaris0) and an address is assigned to it, it behaves the same way, with the exception that when /dev/console on each of them is used to connect to the second machine, the netstack seems to work okay there ..

regards, daniel

Stefan Hajnoczi (stefanha) wrote on 2010-10-12: Re: [Bug 638955] Re: emulated netcards don't work with recent sunos kernel

#18

On Sat, Oct 2, 2010 at 8:23 PM, daniel pecka <email address hidden> wrote:
> well, i did some more investigation and here are the results ..
>
> this patch http://patchwork.ozlabs.org/patch/65137/raw/ solves the problem
> partially .. NICs are working with it, but after a deeper look, the
> connection is lost when the netstack is flooded with higher traffic ..

I haven't looked more into this but noticed an e1000 patch from
Anthony Perard which may improve the Solaris experience:
http://patchwork.ozlabs.org/patch/67594/

Stefan

daniel pecka (dpecka) wrote on 2011-01-03:

#19

is this issue dead ?? can i do something to help fix it?

regards, daniel

Stefan Hajnoczi (stefanha) wrote on 2011-01-04:

#20

On Mon, Jan 3, 2011 at 1:40 PM, daniel pecka <email address hidden> wrote:
> is this issue dead ?? can i do something to help fix it?

I believe no one has investigated this issue since my last comment.
Someone with time and interest in Solaris needs to step up to debug
this problem.

DTrace inside the guest and QEMU tracing (see docs/tracing.txt) are
good tools for figuring out what is going on in the Solaris device
driver and QEMU's hardware emulation, respectively.

If you know a previous QEMU version where a network device works under
Solaris you could use git-bisect(1) to find the commit that broke
Solaris. From what you've said though, it seems the issue is with new
Solaris kernels rather than changes in QEMU.

Stefan

daniel pecka (dpecka) wrote on 2011-01-04:

#21

okay Stefan ..

thanks, i poked several people and am trying to learn how the netstack works .. i have no experience with programming drivers .. i hope that we'll fix it soon because it's very bad that we're unable to use kvm|qemu

regards, daniel

Stefan Weil (ubuntu-weilnetz) wrote on 2011-01-04:

#22

Hi Daniel,

I just tried a newer version of the indiana iso image
(http://dlc-origin.openindiana.org/isos/148/oi-dev-148-x86.iso) with
latest qemu (not qemu-kvm) on a debian amd64 linux host, and I had no problems
with networking (ssh from qemu's emulated indiana host to physical linux host).

Tested with e1000 and i82559c, both work.

Does the error only occur with the older iso image?
Or is it caused by qemu-kvm?

Regards,
Stefan

Daniel Kvasnicka (daniel-kvasnicka-jr) wrote on 2011-01-29:

#23

I can confirm this. I just spent hours studying my network configuration in OpenIndiana b148 running in QEMU/KVM and trying to figure out what's wrong... Everything looks OK and the network is up, but I can't even ping the gateway.
Please fix this soon!

geppz (no-carrier) wrote on 2011-02-28:

#24

Hi all,
I can confirm this bug.
On the latest openindiana-148 with qemu-kvm 0.13.0 you cannot even ping the virtualization host.
With qemu-kvm-0.14.0 (just released!) you CAN ping the host: this is already an improvement.
HOWEVER
the biggest bug is still there: if you log in to the openindiana machine via ssh and run "dmesg" or "netstat" or some other command which outputs a lot of text, the tcp socket will hang forever (well, say it hangs about once every 3 attempts).

Using tcpdump -e from within the guest, I have found that the problem occurs when a big enough packet is sent.
I tried a few times with dmesg, and as soon as the tcp packet reaches the following length:

18:38:28.340097 52:54:69:b5:89:11 (oui Unknown) > 00:19:b9:81:2c:52 (oui Unknown), ethertype IPv4 (0x0800), length 1514: 192.168.7.38.ssh > 192.168.7.52.59008: Flags [.], ack 2824, win 64436, options [nop,nop,TS val 27488132 ecr 6063255], length 1448

it cannot get through. Then the IP stack tries and retries to send the same identical packet, but there will never be any reply from the other side. Finally the socket is torn down.
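
As a rough check of the numbers in the capture above: the reported frame length of 1514 bytes is consistent with a full 1500-byte IP packet plus a 14-byte Ethernet header. The sketch below is only arithmetic. It assumes a 20-byte IPv4 header with no IP options and 12 bytes of TCP options (two NOPs plus a 10-byte timestamp), which matches the options printed by tcpdump; the function name is made up for illustration.

```python
ETH_HEADER = 14      # destination MAC + source MAC + ethertype
IPV4_HEADER = 20     # assumption: no IP options present
TCP_HEADER = 20      # fixed part of the TCP header
TCP_OPTIONS = 12     # nop, nop, timestamp (1 + 1 + 10 bytes)


def frame_sizes(tcp_payload, tcp_options=TCP_OPTIONS):
    """Return (IP packet length, Ethernet frame length) in bytes.

    The frame length excludes the 4-byte FCS, as tcpdump -e reports it.
    """
    ip_len = IPV4_HEADER + TCP_HEADER + tcp_options + tcp_payload
    return ip_len, ETH_HEADER + ip_len


ip_len, frame_len = frame_sizes(1448)
print(ip_len, frame_len)
```

For the 1448-byte segment this gives a 1500-byte IP packet and a 1514-byte frame, so the stuck packet is exactly a full-MTU frame. That alone does not say whether the bridge or the NIC emulation is at fault.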

I have bridged networking for the VM. My bridge is a normal linux bridge br0 with MTU 1500.
Has MTU anything to do with all this?
Is it a linux-bridge bug or a qemu-kvm bug?

Please fix this, solaris is important for its ZFS.
Thank you

Stefan Hajnoczi (stefanha) wrote on 2011-03-01:

#25

On Mon, Feb 28, 2011 at 7:06 PM, geppz <email address hidden> wrote:
> Using tcpdump -e from within the guest, I have found that the problem occurs when a big enough packet is sent.
> I tried a few times with dmesg, and as soon as the tcp packet reaches the following length:
>
> 18:38:28.340097 52:54:69:b5:89:11 (oui Unknown) > 00:19:b9:81:2c:52 (oui
> Unknown), ethertype IPv4 (0x0800), length 1514: 192.168.7.38.ssh >
> 192.168.7.52.59008: Flags [.], ack 2824, win 64436, options [nop,nop,TS
> val 27488132 ecr 6063255], length 1448
>
> it cannot get through. Then the IP stack tries and retries to send the
> same identical packet, but there will never be any reply from the other
> side. Finally the socket is torn down.
>
> I have bridged networking for the VM. My bridge is a normal linux bridge br0 with MTU 1500.
> Has MTU anything to do with all this?
> Is it a linux-bridge bug or a qemu-kvm bug?

Excellent, thanks for posting these details. The bug is probably in
the NIC hardware emulation and I think we can track this one down
fairly easily.

Can you please post your qemu-kvm command-line including the NIC model
that you are using?

Stefan

geppz (no-carrier) wrote on 2011-03-01:

#26

Emulated NIC is e1000.

I found out that if one reduces the MTU on the client, e.g. "ifconfig eth0 mtu 300", ssh seems to hang much more rarely (but it still hangs, even at 300).
Reducing it on the virtualization host bridge is not enough, though (unless you are initiating ssh from the virtualization host itself).
To trigger the hang, do:
while true ; do dmesg ; done
The higher the allowed MTU, the quicker the hang, e.g. MTU 500 hangs within one minute. 1500 hangs instantly.

The command line is the following. Excuse the length... it's generated by libvirt

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/local/qemu-kvm-0.14.0/bin/qemu-system-x86_64 -S -M pc-0.14 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -name openindiana1 -uuid ed0b8483-d186-1f39-39ef-97194a1f02bf -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/openindiana1.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -no-acpi -boot c -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/dev/mapper/datavg1-openindiana1,if=none,id=drive-ide0-0-0,boot=on,format=raw,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=54,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:69:b5:89:11,bus=pci.0,addr=0x3 -usb -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

I'm available to try patches for a while if somebody can spot the problem... the host is still not in production.

Thanks for your work

Stefan Hajnoczi (stefanha) wrote on 2011-03-05:

#27

I was able to reproduce this problem with qemu.git running OpenIndiana 148 with tap and bridge on the host. I did not see the issue with the userspace network stack. It seems to manifest itself as a checksum error in transmitted packets.
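
For anyone wanting to check a captured segment by hand, the TCP checksum can be recomputed offline. The following is a minimal sketch of the standard Internet checksum (RFC 1071) applied to a TCP segment together with its IPv4 pseudo-header. It is not taken from QEMU or from the Solaris driver, and the function names are hypothetical.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum_ok(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """Check a TCP segment (header + payload) carried over IPv4.

    src_ip and dst_ip are the 4-byte addresses from the IP header.
    A segment with a correct checksum field sums to zero here.
    """
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    return internet_checksum(pseudo + segment) == 0
```

The segment bytes would have to come from a raw capture (for example a pcap file written with tcpdump -w), not from the one-line text output.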

Here is the host tcpdump during a TCP stall with mtu 1500:

19:47:54.601950 IP 192.168.122.33.22 > 192.168.122.1.40611: Flags [P.], seq 6949:7509, ack 3545, win 64436, options [nop,nop,TS val 24455 ecr 111832709], length 560
19:47:54.601966 IP 192.168.122.1.40611 > 192.168.122.33.22: Flags [.], ack 7509, win 163, options [nop,nop,TS val 111832710 ecr 24455], length 0
19:47:54.602312 IP 192.168.122.33.22 > 192.168.122.1.40611: Flags [P.], seq 7509:8069, ack 3545, win 64436, options [nop,nop,TS val 24455 ecr 111832709], length 560
19:47:54.602325 IP 192.168.122.1.40611 > 192.168.122.33.22: Flags [.], ack 8069, win 171, options [nop,nop,TS val 111832710 ecr 24455], length 0

Everything went fine up to here but now the stall shows up...

19:47:54.602594 IP 192.168.122.33.22 > 192.168.122.1.40611: Flags [P.], seq 8069:8629, ack 3545, win 64436, options [nop,nop,TS val 24455 ecr 111832709], length 560
19:47:54.602831 IP 192.168.122.33.22 > 192.168.122.1.40611: Flags [P.], seq 8629:9189, ack 3545, win 64436, options [nop,nop,TS val 24455 ecr 111832709], length 560
19:47:54.602847 IP 192.168.122.1.40611 > 192.168.122.33.22: Flags [.], ack 8069, win 171, options [nop,nop,TS val 111832710 ecr 24455,nop,nop,sack 1 {8629:9189}], length 0

Notice that only seq up to 8069 was acked by the host and this is a duplicate ack. I think it's prodding the guest to transmit from 8069 again.
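
Duplicate ACKs like the one above can also be picked out of a text capture mechanically. This is a small sketch, assuming tcpdump's default one-line output as shown in this comment; the regular expression and the function name are illustrative only, and the sample lines are copied from the capture above.

```python
import re

# Matches pure ACKs only: "Flags [.], ack N, ... length 0".
# Data segments print "seq" right after the flags, so they do not match.
PURE_ACK = re.compile(r"IP (\S+) > \S+ Flags \[\.\], ack (\d+),.*length 0$")


def find_duplicate_acks(lines):
    """Return {(sender, ack_number): count} for pure ACKs seen more than once."""
    counts = {}
    for line in lines:
        m = PURE_ACK.search(line.strip())
        if m:
            key = (m.group(1), int(m.group(2)))
            counts[key] = counts.get(key, 0) + 1
    return {key: n for key, n in counts.items() if n > 1}


SAMPLE = [
    "19:47:54.601966 IP 192.168.122.1.40611 > 192.168.122.33.22: Flags [.], ack 7509, win 163, options [nop,nop,TS val 111832710 ecr 24455], length 0",
    "19:47:54.602325 IP 192.168.122.1.40611 > 192.168.122.33.22: Flags [.], ack 8069, win 171, options [nop,nop,TS val 111832710 ecr 24455], length 0",
    "19:47:54.602847 IP 192.168.122.1.40611 > 192.168.122.33.22: Flags [.], ack 8069, win 171, options [nop,nop,TS val 111832710 ecr 24455,nop,nop,sack 1 {8629:9189}], length 0",
]

print(find_duplicate_acks(SAMPLE))
```

On these three lines it reports ack 8069 from the host twice, which is the point where the host stops advancing and starts asking for the missing segment again.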

19:47:54.603447 IP 192.168.122.33.22 > 192.168.122.1.40611: Flags [P.], seq 9189:9749, ack 3545, win 64436, options [nop,nop,TS val 24456 ecr 111832710], length 560
19:47:54.603459 IP 192.168.122.1.40611 > 192.168.122.33.22: Flags [.], ack 8069, win 171, options [nop,nop,TS val 111832710 ecr 24455,nop,nop,sack 1 {8629:9749}], length 0
19:47:54.603734 IP 192.168.122.33.22 > 192.168.122.1.40611: Flags [P.], seq 9749:10309, ack 3545, win 64436, options [nop,nop,TS val 24456 ecr 111832710], length 560
19:47:54.603751 IP 192.168.122.1.40611 > 192.168.122.33.22: Flags [.], ack 8069, win 171, options [nop,nop,TS val 111832710 ecr 24455,nop,nop,sack 1 {8629:10309}], length 0
19:47:54.603882 IP 192.168.122.33.22 > 192.168.122.1.40611: Flags [P.], seq 8069:8629, ack 3545, win 64436, options [nop,nop,TS val 24456 ecr 111832710], length 560
19:47:55.021608 IP 192.168.122.33.22 > 192.168.122.1.40611: Flags [.], seq 8069:9517, ack 3545, win 64436, options [nop,nop,TS val 24498 ecr 111832710], length 1448
19:47:55.578667 STP 802.1d, Config, Flags [none], bridge-id 8000.da:7b:46:27:8c:aa.8001, length 35
19:47:55.851350 IP 192.168.122.33.22 > 192.168.122.1.40611: Flags [.], seq 8069:9517, ack 3545, win 64436, options [nop,nop,TS val 24581 ecr 111832710], length 1448
19:47:57.577496 STP 802.1d, Config, Flags [none], bridge-id 8000.da:7b:46:27:8c:aa.8001, length 35
19:47:57.625504 IP 192.168.122.33.22 > 192.168.122.1.40611: Flags [.], seq 8069:9517, ack 3545, win 64436, options [nop,nop,TS val 24745 ecr 111832710], length 1448

Resends and more duplicate acks up ...

Changed in qemu:
status:	New → Fix Released