    hocr: allow parsing more hOCR documents · 6cdb14db
    Merlijn Wajer authored
    It looks like some hOCR documents do not declare the XHTML namespace
    and use ocrx_block instead of ocr_par. The code will still assume that
    the direct children of these nodes are lines, though.
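
The behaviour described above can be illustrated with a minimal sketch. This is not the code from this commit: the names `PARAGRAPH_CLASSES`, `local_name` and `iter_paragraph_lines` are hypothetical, and the sketch assumes the hOCR input is well-formed XML that the standard library's `xml.etree.ElementTree` can parse. It matches elements by their `class` attribute, so it works whether or not the XHTML namespace is declared, and it treats every direct child of an `ocr_par` or `ocrx_block` node as a line.

```python
import xml.etree.ElementTree as ET

# Classes treated as paragraph-level containers (assumption for this sketch).
PARAGRAPH_CLASSES = {"ocr_par", "ocrx_block"}


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_paragraph_lines(hocr_text):
    """Yield (container_class, child_element) pairs.

    Containers are found by class attribute rather than by qualified tag
    name, so documents with and without the XHTML namespace are handled
    the same way. Every direct child of a container is assumed to be a
    line; no check is made that it really is one.
    """
    root = ET.fromstring(hocr_text)
    for elem in root.iter():
        matched = set(elem.get("class", "").split()) & PARAGRAPH_CLASSES
        if not matched:
            continue
        container_class = sorted(matched)[0]
        for child in elem:
            yield container_class, child


if __name__ == "__main__":
    sample = (
        '<html><body><div class="ocrx_block">'
        '<span class="ocr_line">first line</span>'
        '<span class="ocr_line">second line</span>'
        "</div></body></html>"
    )
    for container_class, line in iter_paragraph_lines(sample):
        print(container_class, local_name(line.tag), line.text)
```

Because the direct children are taken as lines without inspection, a document that nests `ocr_par` inside `ocrx_block` would have its paragraphs reported as lines of the block; the sketch shares that limitation with the assumption stated in the commit message.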