Allow binary dumping to stdout

Bug #192174 reported by Matteo Croce
Affects: tcpflow (Ubuntu)
Status: Confirmed
Importance: Wishlist
Assigned to: Unassigned
Nominated for Dapper by Nwallins
Nominated for Feisty by Nwallins
Declined for Gutsy by Luca Falavigna
Nominated for Hardy by Nwallins
Nominated for Intrepid by Nwallins

Bug Description

Binary package hint: tcpflow

This patch adds the ability to dump binary data to stdout, so it can be processed by another tool, e.g.:
tcpflow -B tcp port 80 | mytool

Matteo Croce (teknoraver) wrote :

Don't print LF when binary dumping

Murat Gunes (mgunes)
Changed in tcpflow:
importance: Undecided → Wishlist

Nwallins (rick-hull) wrote :

Matteo,

I have been manually patching tcpflow for this feature on my own.

My way was much cruder: I have been using the -C flag, with its automatic invocation of -s removed and the newlines totally disabled by commenting out the putchar('\n') in tcpip.c

I strongly support the addition of this flag and behavior to the main package.

Purely as an addendum, I have attached the README I created to remind me how to patch tcpflow on a new platform. Again, your diff looks much cleaner.

Regards,
Rick

description: updated

Nwallins (rick-hull) wrote :

I nominated this for release. I am fairly new to Launchpad, so my apologies if this was inappropriate.

binary.koala (binary-koala) wrote :

there seems to be a slight logical mistake in Matteo's patch.
Unless it is a 'feature' to have '\n' after each packet, the very last lines of the patch should read:

<code>
-  putchar('\n');
+  if(strip_nonprint)
+    putchar('\n');
   fflush(stdout);
</code>

i.e. the if(strip_nonprint) condition in the patch needs to be inverted.
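
For illustration only, the behaviour under discussion could be sketched roughly as follows. This is not Matteo's patch and not tcpflow's actual code: `print_packet_to`, its `out` and `flow_name` parameters, and the `binary_dump` flag are hypothetical names. In binary mode the payload is written untouched, with no flow-name prefix and no trailing newline; in text mode each packet keeps its label and its newline.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical variant of print_packet(); the names `out`, `flow_name`
 * and `binary_dump` are assumptions made for this sketch. */
void print_packet_to(FILE *out, const char *flow_name,
                     const unsigned char *data, size_t length,
                     int binary_dump)
{
    if (!binary_dump)
        fprintf(out, "%s: ", flow_name);  /* text mode: label the packet */

    if (length > 0)
        fwrite(data, 1, length, out);     /* payload bytes, unmodified */

    if (!binary_dump)
        fputc('\n', out);                 /* text mode only: separator */

    fflush(out);
}
```

Taking the output stream as a parameter is a choice made here so that the function can be exercised against a temporary file; the original writes to stdout directly.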

Nwallins (rick-hull) wrote : Re: [Bug 192174] Re: Allow binary dumping to stdout

makes sense. I don't think I ever actually tested Matteo's patch.

I still support this feature and behavior.

- Rick

binary.koala (binary-koala) wrote :

another thing,
with Matteo's patch applied, -B doesn't seem to output sequential (i.e. consistent) streams.
data from a new stream is printed as soon as it is put together, regardless of whether another stream is being printed at that moment, so the streams end up mixed together... which defeats the whole purpose of tcpflow :(
this, of course, doesn't happen if you save the streams into files and then 'cat' them together.

to reproduce (requires tcpflow patched with http://launchpadlibrarian.net/11992351/20_stdout-dump.diff):

cd /tmp
mkdir dump; cd dump
# run two _parallel_ tcpflow processes in background
sudo su
tcpflow -i ethX 'port 80' &
tcpflow -i ethX -B > ../stdout.dump &

# run two parallel downloads
wget http://upload.wikimedia.org/wikipedia/commons/8/8a/Ptolemy_Cosmographia_Sarmatia%2BRha-river.jpg -O /dev/null & wget http://upload.wikimedia.org/wikipedia/commons/0/09/Skeleton_of_boiled_woman.jpg -O /dev/null

#when completed - stop tcpflow
killall tcpflow
cd ..

now edit stdout.dump and remove two HTTP GET headers and compare it with appropriate dump file from the first tcpflow instance, e.g:

hexdiff dump/remote.server.00080-local.ip.12345 stdout.dump

somewhere in the middle of 'stdout.dump' you will notice new HTTP header injected in the middle of binary JPEG data.

any ideas?

Nwallins (rick-hull) wrote :

Did you take a look at what I did to get tcpflow to behave as desired?

http://launchpadlibrarian.net/18406455/tcpflow_patch_readme.txt

I am not much of a C hacker. Maybe it's better to start from scratch to get
the desired behavior, rather than starting w/ Matteo's patch?

- Rick

binary.koala (binary-koala) wrote :

yes, i did look at your patch too, but i don't see how it would solve the 'stream mixing' bug.

i'm not a C hacker either, but i'm desperately trying to marry tcpflow with foremost using a pipe :)
i did try tcpxtract, but it looks dated and has a problem with producing broken binaries.

what do you mean by starting from scratch, new sets of patches or a completely new app?

Danja

Nwallins (rick-hull) wrote :

Danja,

Ok, I never noticed the stream mixing bug. Or, hm, maybe I did: I think I
would get anomalous results every now and then. I didn't get a chance to
really evaluate your analysis on that.

And yes, I definitely like to pipe tcpflow to a binary decoder for doing
traffic analysis. I haven't heard of tcpxtract.

So I think we are definitely on the exact same page as to how we want
tcpflow to behave.

As far as starting from scratch, I mean starting from the latest tcpflow
upstream source, before applying Matteo's or anyone else's patch. If that
software contains your stream mixing bug, then obviously that will need to
be worked out in any case. I take it you don't see Matteo's patch as
introducing the stream mixing bug.

In any case, I don't think I can help much with the actual development. But
I can provide moral and technical support by demonstrating that this is a
valid use case, doing testing, etc.

- Rick

binary.koala (binary-koala) wrote :

it indeed looks like we need similar functionality, and probably not only us.
i looked into the source of tcpip.c and it is clear that the print_packet() function takes no care of stream ordering:

void print_packet(flow_t flow, const u_char *data, u_int32_t length)
{
  printf("%s: ", flow_filename(flow));
  fwrite(data, length, 1, stdout);
  putchar('\n');
  fflush(stdout);
}

we would need some sort of locking/buffering mechanism here that would buffer the streams and print them out sequentially.

as i cannot rewrite it myself, i guess for now i will use dump files with iwatch to run foremost against every new session file that i get from tcpflow.
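
One way the buffering idea above could look, as a rough sketch only and not code from tcpflow: each flow's payload is accumulated in memory and written out in one piece when the flow ends, so bytes from different flows never interleave in the output. The numeric flow id, the fixed-size table, and the names `flow_append` and `flow_close` are all assumptions made for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FLOWS 64                /* arbitrary limit for the sketch */

struct flow_buf {
    int in_use;
    unsigned id;                    /* hypothetical flow identifier */
    unsigned char *data;
    size_t len, cap;
};

static struct flow_buf flows[MAX_FLOWS];

/* Find the buffer for `id`; optionally claim a free slot for it. */
static struct flow_buf *flow_lookup(unsigned id, int create)
{
    struct flow_buf *free_slot = NULL;
    for (size_t i = 0; i < MAX_FLOWS; i++) {
        if (flows[i].in_use && flows[i].id == id)
            return &flows[i];
        if (!flows[i].in_use && free_slot == NULL)
            free_slot = &flows[i];
    }
    if (!create || free_slot == NULL)
        return NULL;
    memset(free_slot, 0, sizeof *free_slot);
    free_slot->in_use = 1;
    free_slot->id = id;
    return free_slot;
}

/* Append one packet's payload to its flow; 0 on success, -1 on failure. */
int flow_append(unsigned id, const unsigned char *data, size_t len)
{
    struct flow_buf *f = flow_lookup(id, 1);
    if (f == NULL)
        return -1;
    if (len == 0)
        return 0;
    if (f->len + len > f->cap) {
        size_t cap = f->cap ? f->cap : 4096;
        while (cap < f->len + len)
            cap *= 2;
        unsigned char *p = realloc(f->data, cap);
        if (p == NULL)
            return -1;
        f->data = p;
        f->cap = cap;
    }
    memcpy(f->data + f->len, data, len);
    f->len += len;
    return 0;
}

/* Write the whole stream contiguously, then release its buffer. */
int flow_close(unsigned id, FILE *out)
{
    struct flow_buf *f = flow_lookup(id, 0);
    if (f == NULL)
        return -1;
    size_t written = f->len ? fwrite(f->data, 1, f->len, out) : 0;
    int ok = (written == f->len) && (fflush(out) == 0);
    free(f->data);
    memset(f, 0, sizeof *f);
    return ok ? 0 : -1;
}
```

The trade-off is that nothing is emitted until a flow ends and each whole stream sits in memory, which may not suit long-lived or very large connections.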

Simson Garfinkel (simsong) wrote :

Hi. I've taken over the maintenance of tcpflow. The new version has support for IPv6 and VLANs. The next version will output in DFXML and be significantly faster and more scalable.

I am happy to implement the binary output, but I do not understand its purpose. What is the format of the binary output? What is the advantage of binary over XML?

Regarding the issue of simultaneous outputs to the same file: that can happen when writing to the same file, but not to stdout. If it is useful to have output to the same file, I can implement locking. However, there is no easy way to detect that multiple processes are outputting to the same file, so I will need to implement this as a flag to avoid the overhead.

Any thoughts?
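
As a rough sketch of what such opt-in locking might look like (an illustration, not tcpflow's implementation): each writer takes an advisory exclusive lock with flock() around its write, so appends from cooperating processes do not interleave. The helper name `locked_append` is hypothetical, and the file is assumed to be a regular file that every writer opens with O_APPEND.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of tcpflow. `fd` is assumed to refer to
 * a regular file opened with O_APPEND by every cooperating process. */
int locked_append(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    int rc = 0;

    if (flock(fd, LOCK_EX) != 0)    /* block until we own the lock */
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            rc = -1;
            break;
        }
        p += n;                     /* handle short writes */
        len -= (size_t)n;
    }

    flock(fd, LOCK_UN);
    return rc;
}
```

The lock is advisory, so it only helps if every writer uses the same helper; the lock and unlock calls on each write are the kind of overhead that argues for putting this behind a flag.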

Nwallins (rick-hull) wrote :

Hi, thanks for following up. What I was looking for is a raw dump of the
TCP stream without any byte or character substitution. In this way, I
could pipe the tcpflow output to a decoder, in order to get a human
readable display of a proprietary binary message format.

Does that make sense?

I don't (try to) use tcpflow for this any more, but I still feel strongly
that it should support this mode of operation.

Thanks,
Rick

Changed in tcpflow (Ubuntu):
status: New → Confirmed