Imitating PDF Forms in HTML

Recently a customer asked for a feature on their eCommerce web site: they wanted to display existing PDF forms in the browser, so that their users can complete them online, send the form data to the server and receive a filled-in PDF.

Of course, most browsers can display PDFs directly and even let users fill out embedded forms. But I did not know of a way to get the form data back to the server.

I worked with PDFs and AcroForms some years ago, so I came up with a simple solution based on the neat Apache PDFBox library that actually works pretty well.

PDF form to HTML form

The basic idea is illustrated by this figure:

pdf_form_process.png

  1. Load and analyze the PDF containing the AcroForm.
  2. Create an image of the PDF (JPEG/PNG) and display this image inside an HTML
    <form> tag.
  3. Read the PDF form’s metadata and create corresponding <input> elements.
  4. Use JavaScript to place the fields on the image at exactly the same positions as in the PDF.

Analyze the form and retrieve field data

With PDFBox (2.x), a PDF file can be loaded with PDDocument.load(file); and the form fields looked up via document.getDocumentCatalog().getAcroForm().getFields(). Each field has one (or sometimes more) associated widgets, which together with the field provide the necessary metadata such as the field’s name, its type and its exact position inside the PDF. The position is stored in absolute values (PDF points, 1/72 inch) measured from the bottom-left corner of the page, and should be translated into a position relative to the page size.
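To make the translation concrete, here is a minimal sketch of the arithmetic in plain Java, without PDFBox. RelativeBox and fromPdfRect are hypothetical names used only for illustration, and the sketch assumes that the page’s lower-left corner is at (0, 0).

```java
// Hypothetical helper (not part of PDFBox): converts a widget rectangle given
// in PDF points, with the origin at the bottom-left corner of the page, into
// fractions of the page size measured from the top-left corner.
final class RelativeBox {
    final double left;
    final double top;
    final double width;
    final double height;

    private RelativeBox(double left, double top, double width, double height) {
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
    }

    // (llx, lly) is the lower-left corner and (urx, ury) the upper-right corner
    // of the widget. Assumes the page itself starts at (0, 0).
    static RelativeBox fromPdfRect(double llx, double lly, double urx, double ury,
                                   double pageWidth, double pageHeight) {
        return new RelativeBox(
                llx / pageWidth,                  // distance from the left edge
                (pageHeight - ury) / pageHeight,  // flip the y axis: distance from the top edge
                (urx - llx) / pageWidth,
                (ury - lly) / pageHeight);
    }
}
```

For a page of 600 x 800 pt, a widget with the corners (60, 700) and (300, 720) ends up at left 0.1, top 0.1, width 0.4 and height 0.025. Because these values are fractions, they stay valid whatever size the page image is displayed at.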

Converting the PDF to an image is very simple with PDFBox: new PDFRenderer(document).renderImageWithDPI(page, 150, ImageType.RGB);. This creates a java.awt.image.BufferedImage, which can easily be written out as a JPEG or PNG later.
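Writing the BufferedImage out as a PNG can be done with the JDK’s own ImageIO. The following is only a sketch: the class name PageImages and the method toPng are made up for this example and do not appear in the sample project.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper: encodes a rendered page image as PNG bytes.
final class PageImages {

    static byte[] toPng(BufferedImage image) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // ImageIO.write returns false if no writer for the format is available
            if (!ImageIO.write(image, "png", out)) {
                throw new IllegalStateException("No PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

The resulting byte array can be written to a file or returned as the body of an HTTP response.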

Here is the full (slightly simplified) code for retrieving the form metadata and an image:

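import java.awt.image.BufferedImage;
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

public class PdfFormAnalyzer implements Closeable {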

  private final PDDocument document;

  public PdfFormAnalyzer(File file) throws IOException {
      this.document = PDDocument.load(file);
  }

  public List<FormField> getFormFields() {
      final PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
      return acroForm.getFields().stream()
              .map(this::toFormField)
              .collect(Collectors.toList());
  }

  public BufferedImage getImageOfPage(int page) throws IOException {
       PDFRenderer pdfRenderer = new PDFRenderer(document);
       return pdfRenderer.renderImageWithDPI(page, 150, ImageType.RGB);
  }

  ...

  private FormField toFormField(PDField pdField) {
      PDDocumentCatalog catalog = document.getDocumentCatalog();
      final FormField result = new FormField(pdField.getFullyQualifiedName());
      withWidget(pdField, (widget) -> {
          final float pheight = widget.getPage().getMediaBox().getHeight();
          final float pwidth = widget.getPage().getMediaBox().getWidth();
          result.setPage(catalog.getPages().indexOf(widget.getPage()));
          result.setType(pdField.getFieldType());
          // PDF coordinates start at the bottom-left corner of the page; store
          // fractions of the page size measured from the top-left corner instead.
          result.setLeft(widget.getRectangle().getLowerLeftX() / pwidth);
          result.setTop((pheight - widget.getRectangle().getUpperRightY()) / pheight);
          result.setHeight(widget.getRectangle().getHeight() / pheight);
          result.setWidth(widget.getRectangle().getWidth() / pwidth);
      });
      return result;
  }

  private void withWidget(PDField pdField, Consumer<PDAnnotationWidget> f) {
      // Only the first widget of a field is considered
      final List<PDAnnotationWidget> widgets = pdField.getWidgets();
      if (widgets != null && !widgets.isEmpty()) {
          f.accept(widgets.get(0));
      }
  }

  public void close() throws IOException {
      document.close();       
  }

}

HTTP Backend – Using JAX-RS

To give the browser access to the metadata (as JSON) and the image, a single JAX-RS resource will do:

@Path("/forms")
public class FormsRessource {

    @GET
    @Path("{id}/field-info")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getFieldInfo(@PathParam("id") String pdfId) {
        return withPdf(pdfId, (pdf) -> Response.ok(pdf.getFormFields()).build());
    }

    @GET
    @Path("{id}/image/{page}")
    @Produces("image/png")
    public Response getImage(
            @PathParam("id") String pdfId,
            @PathParam("page") int page) {
        return withPdf(pdfId, (pdf) -> Response.ok(pdf.getImageOfPage(page)).build());
    }

    ....

    @FunctionalInterface
    private interface PdfFunction<T, R> {
        R apply(T t) throws IOException;
    }

    private Response withPdf(String pdfId, PdfFunction<PdfFormAnalyzer, Response> f) {
        final File pdfFile = new File("./content/", pdfId);
        if (!pdfFile.exists()) {
            return Response.status(Response.Status.NOT_FOUND).build();
        } else {
            try (final PdfFormAnalyzer analyzer = new PdfFormAnalyzer(pdfFile)) {
                return f.apply(analyzer);
            } catch (IOException e) {
                return Response.serverError().entity(e.getMessage()).build();
            }
        }
    }       

}

So, given a PDF named ‘f1.pdf’ in the ‘./content’ directory, the form metadata can be fetched with GET /forms/f1.pdf/field-info, and a PNG image of the PDF’s first page with GET /forms/f1.pdf/image/0.

Imitating the PDF form

Finally we need a simple HTML page that displays the image and places the HTML input fields accordingly.

Here is the minimal page:

Screen Shot 2018-08-20 at 14.09.59.png

It just displays an empty HTML form with the image of the PDF form in the background. Now we add the form elements with some inline JavaScript using jQuery.

First the JSON form info is loaded with jQuery:

    $.getJSON("../app/forms/f1.pdf/field-info", function(data) {
      $(".form-background").load(function() {

        populateForm(data);

      }).each(function() {
        if(this.complete) $(this).load();
      });          
    });

The call to populateForm() is wrapped in a load() handler on the image, so that we wait until the image has really loaded and its final dimensions are known. Note that this load() event shorthand only exists in jQuery versions before 3.0.

The populate function then determines the offset and dimensions of the image and converts the fields’ relative positions into absolute pixel positions.

  function populateForm(data) {
    data.filter(function(e) {return e.page === 0; }).forEach(function(f) {
      var editor = $(".editor-main");
      var bgform = $(".form-background");

      var offset_x = bgform.offset().left;
      var offset_y = bgform.offset().top;
      var qw = bgform.width();
      var qh = bgform.height();

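      // Quote the name: fully qualified PDF field names may contain dots
      var ff = $(".form-field[name='" + f.name + "']");
      if (ff.length === 0) {
        // The markup below is reconstructed (assumption): button fields are
        // rendered as checkboxes, everything else as text inputs.
        if (f.type === "Btn") {
          ff = $('<input type="checkbox" class="form-field"/>');
        } else {
          ff = $('<input type="text" class="form-field"/>');
        }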
        editor.append(ff);
      }

      ff.attr('name', f.name);
      ff.css('width', f.width * qw);
      ff.css('height', f.height * qh);
      ff.css('left', offset_x + f.left * qw);
      ff.css('top', offset_y + f.top * qh);

    })
  }

To adapt the fields’ positions when the browser is resized, another handler should be registered (inside the getJSON callback, where data is in scope):

  $( window ).resize(function() {
    populateForm(data);
  });

Finally, a little CSS is needed so that the input fields can be positioned absolutely:

input.form-field {
    position: absolute;
    z-index: 1;
    border: 0;
    padding: 1px 2px 1px 5px;
    background-color: rgba(220, 220, 255, 0.3);
}

The result will look like this:

pdf_form_sample_web_view.png

On the left you see the original PDF and on the right the “image + HTML form” imitation.

Submit form and handle data

The last step now is to receive the input data, create the final PDF and store or forward the form data.

Actually we only need one more method in our JAX-RS resource to process the HTML form when it is submitted with all the fields. The form was defined as:

 

So the endpoint must accept a POST request with a URL-encoded form body:

@POST
@Path("{id}/process")
@Consumes("application/x-www-form-urlencoded")
@Produces("application/pdf")
public Response process(
        @PathParam("id") String pdfId,
        MultivaluedMap<String, String> formParams) {
    return withPdf(pdfId, (pdf) -> {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        pdf.fillPdf(flatten(formParams), out);
        return Response.ok(out.toByteArray()).build();
    });
}
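The flatten() helper used above is not shown in this post. A minimal sketch of what it could look like follows, assuming that only the first submitted value per field name is of interest. The class name FormParams is made up for this example. Since MultivaluedMap<String, String> extends Map<String, List<String>>, the form parameters can be passed in directly.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reduces multi-valued form parameters to one value per name.
final class FormParams {

    static Map<String, String> flatten(Map<String, List<String>> params) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : params.entrySet()) {
            List<String> values = entry.getValue();
            // Assumption: keep the first value, skip names without any value
            if (values != null && !values.isEmpty()) {
                result.put(entry.getKey(), values.get(0));
            }
        }
        return result;
    }
}
```

A browser leaves unchecked checkboxes out of the request entirely, so they simply do not appear in the resulting map.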

The fillPdf() method expects the parameters and an output stream to write the resulting PDF to. The corresponding code looks like this:

public void fillPdf(Map<String, String> params, OutputStream out) throws IOException {
    final PDAcroForm form = document.getDocumentCatalog().getAcroForm();
    for (PDField field : form.getFields()) {
        final String value = params.get(field.getFullyQualifiedName());
        if (field instanceof PDCheckBox) {
            PDCheckBox cb = (PDCheckBox) field;
            if ("on".equals(value)) {
                cb.check();
            }
        } else if (value != null && !value.isEmpty()) {
            field.setValue(value);
        }
    }
    form.flatten();
    document.save(out);
}

Besides generating the PDF, the process handler could also store the submitted data, email it, or do whatever else is needed.

The final PDF with the filled-in form looks like this:

pdf_form_result.png

You can get the full sample code at GitHub.
