netcat-openbsd exits too soon

Bug #544935 reported by Moritz Grimm
This bug affects 23 people
Affects                   Status         Importance   Assigned to        Milestone
netcat-openbsd (Ubuntu)   Fix Released   Wishlist     Canonical Server
Natty                     Fix Released   Wishlist     Canonical Server

Bug Description

netcat-openbsd's -q setting defaults to 0. This means that as soon as stdin is closed, it will terminate.

For instance, this manifests itself as no output when nc is used like this:

$ printf 'GET / HTTP/1.0\r\n\r\n' | nc www.google.com 80

because printf terminates, causing nc to terminate before Google responds.

However, running nc without the pipe and typing the GET request works.

netcat-traditional defaults to not quitting on stdin closing (ie a default value of -1).
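
To make the timing issue concrete, here is a minimal Python sketch (not netcat's code) that reproduces it on localhost. The helpers `slow_server` and `run_client`, the 0.2 second delay, and the canned response are all invented for illustration. `quit_on_stdin_eof=True` stands in for a default of -q 0, and `False` for reading until the network side closes.

```python
import socket
import threading
import time

REQUEST = b"GET / HTTP/1.0\r\n\r\n"
RESPONSE = b"HTTP/1.0 200 OK\r\n\r\nhello\n"  # canned reply, illustration only


def slow_server(listener, delay):
    """Accept one client, read its request, and reply after `delay` seconds."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(1024)
        time.sleep(delay)
        try:
            conn.sendall(RESPONSE)
        except OSError:
            pass  # the client has already gone away


def run_client(quit_on_stdin_eof, delay=0.2):
    """Send REQUEST and return whatever reply the client saw."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    server = threading.Thread(target=slow_server, args=(listener, delay))
    server.start()

    received = b""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(REQUEST)  # "stdin" is exhausted after this
        if not quit_on_stdin_eof:
            # Keep reading until the network side closes.
            while chunk := sock.recv(1024):
                received += chunk
        # Otherwise leave at once, as with a default of -q 0.
    server.join()
    listener.close()
    return received
```

Under these assumptions, `run_client(True)` returns nothing because the client is gone before the server answers, while `run_client(False)` returns the full response.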

The original netcat from OpenBSD does not have a -q option if I (Soren) remember correctly and behaves similarly to netcat-traditional.

netcat-openbsd in Debian is not affected either, since this is caused by a patch carried by Ubuntu (which I honestly forget whether I wrote or took from Fedora (it's been more than two years)).

This default value, FWIW, was chosen because libvirt (at least at the time) would do something like "ssh somehost nc -U /var/lib/libvirt/sock" (not passing a -q option) when connecting to remote hosts over ssh. Terminating the ssh session left the stale nc around (because it did not care that stdin had been closed), making it impossible for others to connect to the same libvirt instance. Patching libvirt to pass -q was not sufficient, because others could be using a stock libvirt and inadvertently cause a Denial of Service of libvirt (because their nc process would stick around indefinitely).

Tags: glucid
Stefan Haller (haliner) wrote :

I can confirm this bug.

This bug affects Lucid and also Karmic (and most likely Jaunty and Intrepid, because the package version is the same).

Changed in netcat-openbsd (Ubuntu):
status: New → Confirmed
Stefan Haller (haliner) wrote :

I’ve found out something about this bug:

If you use the parameter “-q 0”, netcat won’t print the result. This is a bug.
But if you use something else, like “-q 1”, the output will be correct.

An Ubuntu-specific patch changes the default value of the -q parameter to the “bad” value 0. That’s why the problem doesn’t occur on a Debian system. But if you use “-q 0” (Ubuntu’s default), the bug also affects other distros.

Lars Volker (lv) wrote :

Oh come on. They managed to break "nc"? Well done!

Lars Volker (lv) wrote :

Oh, and to contribute to the solution: As a workaround, you can add

alias nc='nc -q 1' to your ~/.bash_aliases to revert to the old behaviour (but only for your user).

Stefan Krah (stefan-usenet) wrote :

Confirmed on Intrepid. nc.traditional is not affected.

It would be _very_ nice to give this bug a high priority. I spent several hours
searching for bugs in Twisted and asyncore.py because I simply did not
expect a broken netcat.

Soren Hansen (soren) wrote :

   -q secs quit after EOF on stdin and delay of secs (-1 to not quit)

Default value is 0, i.e. as soon as stdin EOF's, it terminates. When you're using printf, it EOF's before the remote server manages to send back a response. netcat-traditional happens to default to -1. Each of -1, 0, and X>0 as a default value has its own idiosyncrasies. You just happen to have hit one of the idiosyncrasies that come with a default of 0.
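
As a rough illustration of the three cases, the -q semantics quoted above can be written as a tiny timing model in Python. This is a simplification under stated assumptions, not netcat's actual logic: the function name `sees_reply` and its parameters are hypothetical, and it ignores everything except the two timestamps.

```python
def sees_reply(q, stdin_eof_at, reply_at):
    """Return True if nc would still be running when the reply arrives.

    q            -- value of -q: negative means never quit on stdin EOF,
                    otherwise quit q seconds after stdin EOF
    stdin_eof_at -- time in seconds at which stdin reaches EOF
    reply_at     -- time in seconds at which the server's reply arrives
    """
    if q < 0:
        # Only the network side closing ends the session.
        return True
    # Simplified: the reply must arrive before the quit timer fires.
    return reply_at < stdin_eof_at + q
```

With stdin hitting EOF at time 0 and a reply arriving 50 ms later, the model loses the reply for q=0 and keeps it for q=1 or q=-1, which matches the printf example in the description.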

If you depend on a specific behaviour, you should be specifically asking for said behaviour rather than relying on default values. Nevertheless, Hardy, Intrepid, Jaunty, Karmic, and now Lucid have all been released with this default behaviour. We are not going to issue an SRU that changes this behaviour, since people may have scripts that rely on it, and not breaking existing setups is always going to be more important than catering for new uses.

Whether this is even a bug is debatable. The docs could certainly point out that the default is 0, but as long as they're not lying and saying it's some other value, I consider it a wishlist request.

Oh, and Lars.. Comments such as #3 are not going to get you anything other than a rude gesture. It'd be helpful if you'd keep such comments to yourself.

Soren Hansen (soren)
summary: - netcat-openbsd stdout broken on Ubuntu
+ netcat-openbsd exits too soon
Soren Hansen (soren) wrote :

I've updated the description. Feel free to adjust if you feel I'm misrepresenting the issue.

description: updated
Changed in netcat-openbsd (Ubuntu):
importance: Undecided → Wishlist
Stefan Krah (stefan-usenet) wrote :

Soren,

thank you for considering the issue. However, I still think this is a bug:

  1) The program is called nc.openbsd and nc on OpenBSD does not
      have this behavior.

  2) I'm not aware of any other nc that has this behavior. One would also
      not ship an echo where -n is the default.

  3) Ubuntu's manual page gives the following example. Note that no -q is specified:

          echo -ne "GET / HTTP/1.0\r\n\r\n" | nc www.google.com 80

Point 3), which comes directly from the OpenBSD documentation, makes it
clear that a default of -q 0 was never intended.

I think the priority should be raised to "high" (or equivalent).

Let me expand on my previous posting. I tested a Twisted app on localhost.
Because the network is fast enough, this issue appeared only in about 2% of
all cases. Since I'm not aware of any nc with this default behavior, I assumed
random failures in the server.

I don't think Ubuntu should change the standard behavior of widespread Unix tools.

Stefan Krah (stefan-usenet) wrote :

Soren,

netcat in the most recent Fedora does not have this behavior.

"If you depend on a specific behaviour, you should be specifically asking for
 said behaviour rather than relying on default values."

ISTM that this advice is meant for libvirt. - Seriously, this does not apply when
every nc other than Ubuntu nc has the same default.

Moritz Grimm (mgrimm-astaro) wrote :

Hello Soren,

thank you for your explanation.

I am in complete agreement with Stefan; it is a bad idea to change the default behavior of a de facto standard utility as a workaround for a bug in a different (and apparently Linux-only) product, regardless of how long it's been like that on Ubuntu.

On a side-note, the -q option is not part of netcat on OpenBSD. It's a Debian extension. People relying on netcat-openbsd to behave like netcat on OpenBSD do not know about -q.

It should be reclassified as a bug.

Regards,

Moritz

Cliff Frey (clifffrey) wrote :

I also consider this to be a real bug. It breaks many many tests that I have. Debian had exactly this bug, and they fixed it.

The good news is that the bug has been fixed in Debian:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502188

It was fixed with a new version of quit-timer.patch
http://patch-tracker.debian.org/patch/series/view/netcat-openbsd/1.89-4/quit-timer.patch

possibly related:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/201340

James Cuzella (trinitronx) wrote :

Why does Ubuntu constantly have to mess with things that had already been working and tested before, and then subsequently release them into the repositories so it can break things for everyone?

This isn't just a problem with their version of netcat. I'm also talking about pulseaudio, upstart, grub2, etc...

Please fix!

Marc MacIntyre (marcmac) wrote :

+1 for this as a bug; another issue that I haven't seen mentioned is that this breaks scripts that are written for multiple Linux distros, many of which don't even support the -q flag. Using -i 1 is a portable workaround, but it adds an unwanted delay.

C Meyer (c-meyer) wrote :

I was wondering for a few hours why I could not see the data after piping a request to the server until I found this. I mean there is an easy workaround, but the tutorials found on the internet are not working in ubuntu due to this bug. For a newbie this is quite confusing. Would be better to have the default behaviour.

KennoVO (kenno-xs4all) wrote :

I just wasted a full, extremely frustrating working day trying to find why a complicated script involving in-house software and a tangle of named pipes and background commands stopped working. Now it's after midnight, I'm still at work, and I found out it's this. I'm right now very tempted to give up on ubuntu.

Even worse, the -q workaround doesn't work for me; I had -q in my scripts all the time just for good measure. My server-side application is a sponge. Like the unix "sort" command, it first soaks up all the input, and only after it receives an EOF through the input channel does it process the input and start producing output. Traditional netcat behavior with -q x would be (from the client-side point of view):
- upon receipt of an EOF on stdin, close the output stream, putting the TCP connection in half-close state (this effectively transmits the EOF to the server, triggering my server-side application to start processing)
- wait x seconds (while data from the server-side application flows in)
- quit (in my case, it actually quits before the x seconds are over because the server closes its side of the connection when it finishes processing)

Ubuntu's new borken netcat does the following instead:
- receive an EOF on stdin but don't let the remote host know (I speculate this is a "feature" of BSD netcat but it might also be another ubuntu bug)
- wait x seconds
- quit (only now an EOF is detected at the server side, but it's too late to send data back)

As you can see, my scripts are now broken beyond repair. My only solution is installing netcat-traditional.
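
The difference between the two sequences above can be sketched with plain sockets in Python. This is an illustration under stated assumptions, not code from either netcat: `sponge_server` and `talk_to_sponge` are hypothetical helpers, the server just sorts the lines it receives, and the 0.3 second wait stands in for "-q x".

```python
import socket
import threading


def sponge_server(listener):
    """Read until the client signals EOF, then send back the sorted lines."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        while chunk := conn.recv(1024):
            data += chunk
        lines = sorted(data.splitlines())
        try:
            conn.sendall(b"\n".join(lines) + b"\n")
        except OSError:
            pass  # the client has already closed the connection


def talk_to_sponge(payload, half_close, wait=0.3):
    """Send payload to a sponge server and return the reply the client saw."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    server = threading.Thread(target=sponge_server, args=(listener,))
    server.start()

    received = b""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(payload)
        if half_close:
            # Half-close: the server sees EOF, but we can still read.
            sock.shutdown(socket.SHUT_WR)
            while chunk := sock.recv(1024):
                received += chunk
        else:
            # No EOF is passed on: wait a while, then give up and quit.
            sock.settimeout(wait)
            try:
                received = sock.recv(1024)
            except socket.timeout:
                pass
    server.join()
    listener.close()
    return received
```

In this sketch, `talk_to_sponge(b"b\na\n", half_close=True)` gets the sorted lines back, while `half_close=False` returns nothing: the server only learns about the EOF once the client has quit.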

One of the great features of the traditional netcat was that it could function as a two-way unix pipe for TCP (hint: net-cat). Swallowing the EOF breaks this, as illustrated by the above example. I don't know what people were thinking when they changed this, but they surely haven't been reading the traditional netcat man page:
       You may be asking "why not just use telnet to connect to arbitrary ports?" Valid question, and
       here are some reasons. Telnet has the "standard input EOF" problem, so one must introduce
       calculated delays in driving scripts to allow network output to finish. This is the main
       reason netcat stays running until the *network* side closes.
Further discussions about this issue:
http://oskt.secure-endpoints.com/man/knc.1.pdf
http://www.eggheadcafe.com/software/aspnet/36139087/example-code-for-using-gawks--operators.aspx
nc6 seems to have the best solution: by default, it works like the "BSD-style" netcat, but it has the --half-close flag which makes it work like traditional netcat (and saves the day for me).
http://linux.die.net/man/1/nc6

Stefan Krah (stefan-usenet) wrote :

Several people in this thread have indicated that they have wasted hours if not days to track down seemingly random failures.

The Ubuntu manpage guarantees this to work:

echo -n "GET / HTTP/1.0\r\n\r\n" | nc host.example.com 80

The original OpenBSD nc does not even have the -q option:

http://www.openbsd.org/cgi-bin/man.cgi?query=nc&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html

In short, there is no good way to debug the issue quickly, especially if the failures are sporadic (like on localhost). Given that this bug was introduced by Ubuntu, the importance should be raised to "high".

Joel Ebel (jbebel)
tags: added: glucid
Robbie Williamson (robbiew) wrote :

I'd like us to consider fixing this for Natty. I understand changing it will break users who have adapted to it, but this wouldn't be the first time functionality of an application has changed between releases. I do not, however, support the idea of changing functionality in a stable release, i.e. Lucid or Maverick, as that is a huge violation of our SRU policy.

Changed in netcat-openbsd (Ubuntu Natty):
assignee: nobody → Canonical Server Team (canonical-server)
Chuck Short (zulcss) wrote :

This should be fixed in natty now.

chuck

Changed in netcat-openbsd (Ubuntu Natty):
status: Confirmed → Fix Released
Marco Di Bartolomeo (dukez) wrote :

I have spent several hours trying to debug a C concurrent server that wasn't delivering data to netcat clients. Finally I have found this web page, from which I have understood that my application wasn't bugged at all, it was netcat... Someone you usually trust :(
Thank you very much for having fixed this in Natty.

The Gavitron (me-gavitron) wrote :

We encountered this bug as we were upgrading our production environment to Lucid. I'm a little disappointed that this fix won't be applied to lucid, as our production environment is now going to have to be modified to work around this bug until the next LTS release, by which point we will have possibly forgotten all about the change in the first place.

While I realise this isn't a severe regression, I think the SRU policy at https://wiki.ubuntu.com/StableReleaseUpdates could be applied to this scenario as a bug which "(1) [has] an obviously safe patch and (2) affect[s] an application rather than critical infrastructure packages"

I would put this on par with removing the -j option from tar, then postponing a fix until the next LTS.
