Explore GitLab
Discover projects, groups and snippets. Share your projects with others.
-
Lightweight, JS-only, slimmed-down prototype of the archive.org website
Topics: javascript web components rendertron (+ 2 more)
-
Second version of the Wayback Machine search interface
-
A repository of reusable scripts to do useful tasks and queries.
-
WARC writing with Colly
-
WARC-ify content stored in directories
-
WIP: ufiche title-bar image preparation (for OCR)
-
Reusable Docker image consisting of just an Ubuntu baseline plus jake's ia tool
-
pointers to slides/talks and projects
-
Tesserotate (aka tesseract-baselines-rotate)
Calculates the rotation needed to bring images to their proper orientation with Tesseract, using the text baselines found by its OSD (orientation and script detection).
-
Tesseract deriver module used to OCR items. Outputs hOCR and various metadata keys.
-
westmere tensorflow builds