Explore projects
Tesserotate (aka tesseract-baselines-rotate)
Calculates the rotation needed to bring images to their proper orientation with Tesseract, using the baselines found by its OSD (orientation and script detection).
Tesseract deriver module used to OCR items with tesseract. Outputs hOCR and various metadata keys.
Temporary repository for implementations of https://arxiv.org/pdf/1905.13038.pdf
Will contain at least a Cython implementation and a C implementation (using Leptonica to read images).
Extract structured metadata and content from article PDFs; use this to match against databases of known identifiers.
Automatic extraction of data (e.g. content, title, etc.) from archived news pages.