Fix up and validate hOCR output
hocr-combine seems to turn the XHTML that Tesseract outputs into HTML with invalid XML tags, which is not OK.
We want to make sure the output is XHTML-compliant, and also validate the result with xmllint.
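A minimal sketch of the kind of check meant here. It assumes Python's standard-library xml.etree.ElementTree as a stand-in for xmllint, which would run a comparable well-formedness check from the command line with xmllint --noout file.hocr. The function name check_xhtml_wellformed is hypothetical. It only tests that the file parses as XML and that the root is an XHTML-namespaced html element. It does not perform full DTD validation.

```python
import xml.etree.ElementTree as ET

# Namespace that Tesseract's XHTML hOCR output declares on its <html> root
XHTML_NS = "http://www.w3.org/1999/xhtml"


def check_xhtml_wellformed(data: bytes) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means it looks OK).

    Takes raw bytes (e.g. read from the file in binary mode) so the parser
    can honor the XML declaration's encoding.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        # HTML-style output (e.g. an unclosed <meta>) fails here
        return [f"not well-formed XML: {exc}"]

    problems = []
    # ElementTree reports namespaced tags as "{namespace}localname"
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append(f"root element is {root.tag!r}, expected XHTML <html>")
    return problems
```

For example, a merged file whose head contains an HTML-style <meta charset="utf-8"> without a closing slash is reported as not well-formed, which is the kind of breakage described above.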