• pdf-to-hocr: fix ocr_line bug and add scaler · a7d074de
    Merlijn Wajer authored
    This commit fixes a bug where ocr_line elements had no title for
    some PDFs (such as ones created by OCRmyPDF).
    This commit also makes the PDF Metadata JSON a required input for
    generating the hOCR files. The information it contains is used to
    estimate the DPI and to scale the hOCR coordinates.
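
The DPI estimate and coordinate scaling described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names `estimate_dpi` and `scale_bbox` are hypothetical, and it assumes the metadata supplies a page width in PDF points (the default PDF unit, 72 per inch) together with the pixel width of the page image.

```python
POINTS_PER_INCH = 72.0  # default PDF user-space units per inch


def estimate_dpi(image_width_px, page_width_pt):
    """Estimate DPI from an image width in pixels and a page width in points.

    Hypothetical helper: the real tool may derive DPI differently.
    """
    if page_width_pt <= 0:
        raise ValueError("page width must be positive")
    return image_width_px * POINTS_PER_INCH / page_width_pt


def scale_bbox(bbox_pt, dpi):
    """Scale an (x0, y0, x1, y1) box from PDF points to pixels at `dpi`."""
    return tuple(int(round(v * dpi / POINTS_PER_INCH)) for v in bbox_pt)


# Example: a US Letter page (612 pt wide) rendered 2550 px wide.
dpi = estimate_dpi(2550, 612)
bbox_px = scale_bbox((72, 144, 216, 288), dpi)
```

Rounding to whole pixels is a choice made here because hOCR bounding boxes are normally written as integers. The sketch also leaves the y-axis alone, so a real converter would still need to account for PDF coordinates starting at the bottom-left of the page.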