fmt -f <unlimited> should be supported

Bug #868747 reported by jimav
This bug affects 1 person
Affects: coreutils (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Strictly speaking this is an enhancement request.

fmt imposes an artificial limit on the maximum output line length controlled by the -f option, which prevents using this tool to "join" together all lines in each paragraph (for any paragraph size). This operation is necessary to prepare plain-text for import into a word processor such as LibreOffice, where "manual line breaks" (i.e. newlines in the middle of paragraphs) are undesirable. If fmt could be run with an effectively unlimited max line length, then it could be used for this purpose.

Ideally there would be a way to specify an explicitly unlimited output line length (say, -f -1).

SUMMARY:
  'fmt -f 9999 file.txt' gives the error "invalid width". It should accept an arbitrarily large value (up to the maximum 32-bit integer).
  Ideally, an option would allow specifying an explicitly unlimited (or maximum) output line length.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: coreutils 8.5-1ubuntu3
ProcVersionSignature: Ubuntu 2.6.35-30.59-generic 2.6.35.13
Uname: Linux 2.6.35-30-generic x86_64
Architecture: amd64
Date: Wed Oct 5 14:45:27 2011
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release Candidate amd64 (20100928)
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.utf8
 SHELL=/bin/bash
SourcePackage: coreutils

Jim Meyering (meyering) wrote : Re: [Bug 868747] Re: fmt -f <unlimited> should be supported

jimav wrote:
> Strictly speaking this is an enhancement request.
>
> fmt imposes an artificial limit on the maximum output line length
> controlled by the -f option, which prevents using this tool to "join"

You meant -w, not -f, throughout.

Thanks for the suggestion. Note that the code has this:

    /* Size of paragraph buffer, in words and characters. Longer paragraphs
       are handled neatly (cf. flush_paragraph()), so long as these values
       are considerably greater than required by the width. These values
       cannot be extended indefinitely: doing so would run into size limits
       and/or cause more overflows in cost calculations. FIXME: Remove these
       arbitrary limits. */

    #define MAXWORDS 1000
    #define MAXCHARS 5000

where MAXCHARS/2 specifies the largest width.
I.e., fmt -w 2500 works, but not 2501.
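
The arithmetic behind that limit can be sketched as follows. This is only a model of the bound described above, not the actual fmt source; the helper name width_is_accepted is hypothetical, and only the upper bound is modelled.

```python
# Values taken from the #define lines quoted above.
MAXWORDS = 1000
MAXCHARS = 5000

# Per the explanation above, the largest accepted width is MAXCHARS/2.
MAX_WIDTH = MAXCHARS // 2  # 2500


def width_is_accepted(width: int) -> bool:
    """Hypothetical helper: True if `width` is within the described upper bound."""
    return width <= MAX_WIDTH
```

Under this model, a width of 2500 passes, while 2501 and the reporter's 9999 are both rejected, which matches the "invalid width" error in the report.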

We agree that there should not be such a limit.
But the internals of fmt are not pretty -- significantly less
so than most other parts of the coreutils, and as the comment says,
we cannot easily increase those limits arbitrarily.

In the meantime, what can you do if you want truly unlimited-length
paragraphs? It's not trivial, since you want to retain paragraph delimiters.
This perl command should do the trick.
It processes your input a paragraph at a time, replacing each newline
(and any whitespace before/after it) with a single space:

    perl -00ple 's/\s*\n\s*/ /g'

E.g., given this input,

    1
    2
    3
    4

    1
    2
    3
    4
    5

it prints this:

    $ (seq 4; echo; seq 5) | perl -00ple 's/\s*\n\s*/ /g'
    1 2 3 4

    1 2 3 4 5

It doesn't preserve indentation, but if you're just going to
paste it into LibreOffice, that should be fine.
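
For anyone without perl at hand, roughly the same paragraph-at-a-time joining can be sketched in Python. This is an approximation of the one-liner above, not a replacement for fmt; the function name join_paragraphs is hypothetical, and paragraphs are assumed to be separated by one or more completely empty lines, as in perl's -00 mode.

```python
import re


def join_paragraphs(text: str) -> str:
    """Join the lines of each paragraph into one line, keeping paragraph breaks."""
    # Drop leading/trailing newlines so they do not produce empty paragraphs.
    body = text.strip("\n")
    if not body:
        return ""
    # A run of two or more newlines separates paragraphs.
    paragraphs = re.split(r"\n{2,}", body)
    # Within a paragraph, replace each newline (and surrounding whitespace)
    # with a single space, mirroring s/\s*\n\s*/ /g.
    joined = [re.sub(r"\s*\n\s*", " ", p) for p in paragraphs]
    # Emit exactly one empty line between paragraphs.
    return "\n\n".join(joined) + "\n"
```

Calling join_paragraphs on the text of a file (for example, the result of open("file.txt").read()) returns the joined text. Like the perl command, it does not preserve the indentation of continuation lines.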

I've Cc'd the upstream bug-tracker, so we'll have a bug number there, too.

jimav (james-avera) wrote :

Hi,
Thanks for the substantive reply, and the Perl -00 trick.
-Jim

