generate_fdf followed by fill_form changes checkbox fields

Bug #192410 reported by Adam Buchbinder
Affects: pdftk (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Binary package hint: pdftk

The generate_fdf command appears not to properly extract checkbox fields, as merging the extracted FDF back in with fill_form changes the values in the PDF. Extracting and then merging the FDF information without editing it should result in an unchanged document. To verify:

$ wget http://www.irs.gov/pub/irs-pdf/f1040.pdf
$ pdftk f1040.pdf generate_fdf output f1040.fdf
$ pdftk f1040.pdf fill_form f1040.fdf output f1040-fill.pdf
$ evince f1040{,-fill}.pdf

Checkboxes (some of them actually radio buttons, some of them plain checkboxes) in f1040-fill.pdf will be checked; the corresponding fields in f1040.pdf are not marked.

I don't know exactly which stage of the process isn't working, but my best guess is that checkbox and radio button fields aren't being extracted properly. In the file, fields are labeled either (with numbers in place of #s) f#_#(0) or c#_#(0). Putting values in the f-fields causes them to show up in the form, but putting values in the c-fields does not. Additionally, while the f-field sections in the FDF look like this (all FDF excerpts were manually converted to ASCII; see bug 192398):

<<
/V ()
/T (f2_20\(0\))
>>

the c-fields look like this:

<<
/V /
/T (c2_13\(0\))
>>

I don't think this is a valid field entry, though I'm not familiar with FDF syntax.
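A quick way to list the suspicious entries is to scan the FDF text for fields whose /V is a bare "/" (an empty name). The following is only a sketch: it assumes the FDF has already been converted to ASCII as in the excerpts above, that each field is a flat dictionary with /V followed by /T, and that escapes in field names are limited to backslash-escaped punctuation. The function name find_empty_name_values is made up for illustration and is not part of pdftk.

```python
import re

# Matches one flat field dictionary shaped like the excerpts:
# "<<", a /V entry (string or name), a /T string, ">>".
# Assumption: nested dictionaries such as /Kids are not handled.
FIELD_RE = re.compile(
    r"<<\s*/V\s*(?P<value>\(.*?\)|/[^\s/<>()\[\]]*)"
    r"\s*/T\s*\((?P<name>(?:\\.|[^\\)])*)\)\s*>>",
    re.DOTALL,
)


def find_empty_name_values(fdf_text):
    """Return the /T names of fields whose /V is a bare '/' (an empty name)."""
    empty = []
    for match in FIELD_RE.finditer(fdf_text):
        if match.group("value") == "/":
            # Drop the backslash in front of escaped characters, e.g. \( -> (
            empty.append(re.sub(r"\\(.)", r"\1", match.group("name")))
    return empty


# The two excerpts from this report, pasted back to back.
SAMPLE = r"""<<
/V ()
/T (f2_20\(0\))
>>
<<
/V /
/T (c2_13\(0\))
>>
"""

print(find_empty_name_values(SAMPLE))  # prints ['c2_13(0)']
```

Run against the full ASCII-converted f1040.fdf, this should list the c-fields if they all share the empty-name pattern; I have only checked it against the two excerpts shown here.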

I am running pdftk 1.40-2ubuntu3 on Ubuntu Dapper.

Adam Buchbinder (adam-buchbinder) wrote :

Consulting the PDF Reference (looking at version 1.6, available at http://partners.adobe.com/public/developer/en/pdf/PDFReference16.pdf ), it seems that the problem is that the FDF being generated is incomplete.

The spec requires the FT (field type) key only in the PDF itself (p. 637); the FDF field dictionary (p. 677) does not include that key. This is why I'm guessing as to which fields are button-type and which are text-type. It might be helpful, though, for the generated FDF to include a comment above fields that are not of type "Tx" (that is, "Btn", "Ch" or "Sig"), which would help the user to interpret the value. Radio buttons and checkboxes are both "Btn" fields; which of the two a field is depends on the value of the Ff key, which (I think) is allowed but not required to be exported.

The V (value) key for "Btn" fields is "a name object representing the check box’s appearance state, which is used to select the appropriate appearance from the appearance dictionary" (p. 648). I don't know how much other material that pulls into the FDF file; maybe it just references a dictionary kept in the PDF itself? The names look partly predefined ("appearance for the off state is optional but, if present, must be stored in the appearance dictionary under the name Off. The recommended (but not required) name for the on state is Yes"): the off state is always named Off, but the name for the "on" value can be changed.

In any case, it appears that the value for the checkbox fields, if not the radio fields, should read "/V /Yes" or "/V /Off", which suggests that most probably the field values aren't being properly exported--that, or (somewhat less likely) there's something fishy with the test PDF itself which causes the button fields not to export properly.
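If the empty "/V /" entries really are unchecked boxes that lost their state name, one possible stopgap is to rewrite them to "/V /Off" before running fill_form. This is an untested sketch, not a fix for pdftk itself: it assumes an ASCII-converted FDF with one key per line, and it assumes that every empty value stands for the off state, which holds for the blank f1040.pdf above but not for a form that already has boxes checked. The function name replace_empty_values is hypothetical.

```python
import re

# A /V line whose value is a bare "/" with nothing after it.
# The lookahead tolerates trailing spaces and CRLF line endings.
EMPTY_NAME_VALUE = re.compile(r"^([ \t]*/V[ \t]*)/(?=[ \t]*\r?$)", re.MULTILINE)


def replace_empty_values(fdf_text, state="Off"):
    """Rewrite every empty-name '/V /' line to '/V /<state>'."""
    return EMPTY_NAME_VALUE.sub(lambda m: m.group(1) + "/" + state, fdf_text)


# The c-field excerpt from this report.
broken = "<<\n/V /\n/T (c2_13\\(0\\))\n>>\n"
print(replace_empty_values(broken))
```

Text fields ("/V ()") and values that already carry a name (for example "/V /Yes") are left untouched, so the rewrite only affects the entries that look malformed.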

Adam Buchbinder (adam-buchbinder) wrote :

I've mailed the upstream author; there's no upstream bugtracker.

Changed in pdftk:
status: New → Confirmed