pdf-to-hocr: fix ocr_line bug and add scaler
This commit fixes a bug where ocr_line elements had no title attribute for some PDFs (such as those created by OCRmyPDF). It also makes the PDF Metadata JSON a required input for generating the hOCR files; the information it contains is used to estimate the DPI and to scale the hOCR coordinates.
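
For illustration only, the DPI estimate and coordinate scaling could look roughly like the sketch below. It is not the code in this commit. The function names and the inputs (image width in pixels, page width in PDF points) are hypothetical stand-ins for whatever the PDF Metadata JSON actually provides. It relies on the PDF default of 72 points per inch and assumes the coordinates already use a top-left origin.

```python
PDF_POINTS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch


def estimate_dpi(image_width_px, page_width_pt):
    """Estimate DPI from a page image width (pixels) and page width (points)."""
    page_width_in = page_width_pt / PDF_POINTS_PER_INCH
    return image_width_px / page_width_in


def scale_bbox(bbox_pt, dpi):
    """Scale an (x0, y0, x1, y1) box from PDF points to pixels at the given DPI."""
    scale = dpi / PDF_POINTS_PER_INCH
    return tuple(int(round(v * scale)) for v in bbox_pt)


def line_title(bbox_px):
    """Build the hOCR title value carrying the bbox of an ocr_line element."""
    return "bbox %d %d %d %d" % tuple(bbox_px)


# Hypothetical values: a US Letter page (612 pt wide) rendered 2550 px wide.
dpi = estimate_dpi(2550, 612)
title = line_title(scale_bbox((72, 36, 144, 108), dpi))
```

With a title built this way, every ocr_line element carries at least a bbox property, which is the information that was missing in the affected PDFs.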