generate_fdf extracts fields in UTF-16 format
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
pdftk (Debian) | Fix Released | Unknown | |
pdftk (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
Binary package hint: pdftk
The generate_fdf tool outputs field names and field values in what appears to be UTF-16 format. To verify:
$ wget http://
$ pdftk Project2.pdf generate_fdf output Project2.fdf
$ less Project2.fdf
(less will warn that the file "may be a binary file".) The field names ("Text1", "Text2", and so on) are self-contained UTF-16 strings, each beginning with its own byte order mark (FE FF). The field values consist of nothing but a bare BOM, i.e. an empty UTF-16 string.
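To make this byte layout concrete, the following Python sketch builds the same sequences. It assumes big-endian UTF-16, which is what the FE FF mark signals, and it matches how PDF encodes Unicode text strings. The helper `pdf_text_string` is hypothetical and is not part of pdftk.

```python
# Sketch: reproduce the layout seen in the generated FDF, i.e. the FE FF byte
# order mark followed by the text encoded as big-endian UTF-16.
def pdf_text_string(s: str) -> bytes:  # hypothetical helper, not a pdftk API
    return b"\xfe\xff" + s.encode("utf-16-be")

name = pdf_text_string("Text1")
empty = pdf_text_string("")

print(name)           # b'\xfe\xff\x00T\x00e\x00x\x00t\x001'
print(empty)          # b'\xfe\xff'  (an empty value is just the BOM)
print(name.hex(" "))  # fe ff 00 54 00 65 00 78 00 74 00 31
```

Each ASCII character is preceded by a NUL high byte. Those NULs, together with the FE FF marks, are what cause less to treat the file as binary.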
This makes the fields very difficult to edit by hand. It also appears to be unnecessary: when the FDF file is merged back in with fill_form, entering plain ASCII text in the fields produces the same result as entering UTF-16 text.
I am running pdftk 1.40-2ubuntu3 on Ubuntu Dapper.
Description: updated
Changed in pdftk: status Unknown → Confirmed
Changed in pdftk: status New → Confirmed
Changed in pdftk (Debian): status Confirmed → Fix Released
The following workaround converts the fields in a generated FDF file to plain ASCII, assuming they are convertible, by filtering out the BOMs and the embedded NUL bytes. It works because ASCII text encoded as UTF-16 is just the original characters with a NUL byte placed before each one (big-endian) or after each one (little-endian).
I doubt it will work if the field names contain anything other than ASCII.
$ LC_ALL=C sed -e 's/\x00//g' -e 's/\xFE\xFF//g' Project2.fdf | less
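The sed filter removes every NUL byte and every FE FF pair anywhere in the file, so non-ASCII text gets mangled instead of rejected. For example, é (U+00E9) is 00 E9 in UTF-16BE and comes out as a lone E9 byte. Below is a more careful alternative, written as a Python sketch under stated assumptions. The script name `fdf_ascii.py` and all of its functions are hypothetical and are not part of pdftk. The sketch assumes the FDF contains only dictionaries, `%` comments, and PDF literal strings `( ... )`, with no binary streams. It unescapes each literal string, including `\(`, `\)`, `\\` and octal escapes such as `\376\377`. It rewrites a string as plain text only when the string starts with FE FF and decodes to pure ASCII. Everything else is copied through unchanged.

```python
# Hypothetical script fdf_ascii.py: a sketch, not part of pdftk.
# Assumes the FDF has no binary streams; only literal strings are examined.
from __future__ import annotations

import sys

# Single-character escapes defined for PDF literal strings.
ESCAPES = {ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t",
           ord("b"): b"\b", ord("f"): b"\f"}


def read_literal(data: bytes, i: int) -> tuple[bytes, int]:
    """Unescape a literal string whose opening '(' is at data[i - 1].
    Returns (raw bytes, index just past the closing ')')."""
    out = bytearray()
    depth = 1
    while True:
        c = data[i]  # raises IndexError on an unterminated string
        i += 1
        if c == 0x5C:  # backslash
            e = data[i]
            if e in ESCAPES:
                out += ESCAPES[e]
                i += 1
            elif 0x30 <= e <= 0x37:  # \ddd: up to three octal digits
                j = i
                while j < len(data) and j - i < 3 and 0x30 <= data[j] <= 0x37:
                    j += 1
                out.append(int(data[i:j], 8) & 0xFF)
                i = j
            elif e in (0x0D, 0x0A):  # backslash-newline is a line continuation
                i += 2 if data[i:i + 2] == b"\r\n" else 1
            else:  # \( \) \\ (and unknown escapes) stand for the byte itself
                out.append(e)
                i += 1
        elif c == 0x28:  # nested, balanced '('
            depth += 1
            out.append(c)
        elif c == 0x29:  # ')'
            depth -= 1
            if depth == 0:
                return bytes(out), i
            out.append(c)
        else:
            out.append(c)


def to_ascii(raw: bytes) -> bytes | None:
    """Return ASCII bytes if raw is a BOM-prefixed UTF-16BE string of pure ASCII."""
    if not raw.startswith(b"\xfe\xff"):
        return None
    try:
        return raw[2:].decode("utf-16-be").encode("ascii")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None  # odd byte count or non-ASCII text: leave it as UTF-16


def escape_literal(text: bytes) -> bytes:
    """Re-wrap bytes as a PDF literal string, escaping backslash and parens."""
    for ch in (b"\\", b"(", b")"):  # backslash first so new escapes survive
        text = text.replace(ch, b"\\" + ch)
    return b"(" + text + b")"


def convert(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        if c == 0x25:  # '%' comment (e.g. the binary header line): copy to end of line
            j = i
            while j < len(data) and data[j] not in (0x0A, 0x0D):
                j += 1
            out += data[i:j]
            i = j
        elif c == 0x28:  # '(' opens a literal string
            raw, j = read_literal(data, i + 1)
            ascii_text = to_ascii(raw)
            out += data[i:j] if ascii_text is None else escape_literal(ascii_text)
            i = j
        else:
            out.append(c)
            i += 1
    return bytes(out)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 fdf_ascii.py input.fdf output.fdf
    with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
        dst.write(convert(src.read()))
```

Usage, under the same assumptions: `$ python3 fdf_ascii.py Project2.fdf Project2-ascii.fdf`. A field containing é is left in UTF-16 rather than being truncated to a stray byte. That costs hand-editability for those fields only.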