Explore GitLab
Discover projects, groups and snippets. Share your projects with others.
-
Extract structured metadata and content from article PDFs; use this to match against databases of known identifiers.
-
CORS gateway, created in desperation at the absence of such ...
-
Repository for work to recreate IMR/IMF's archive entry pages, which mostly comprise lists of seeds and full-text search results.
-
Tesseract deriver module used to OCR items with tesseract. Outputs hOCR and various metadata keys.
-
Pointers to slides/talks and projects
-
Go client for Crawl HQ v3
-
Holds Seeder and Tracker that improve seeding of IA resources
-
Deno prototype of a minimal archive.org website using Search, MDAPI, and other REST APIs
-
Lightweight, JS-only, slimmed-down archive.org website prototype
javascript web components rendertron + 2 more
-
A Video Editor Powered by the Web
Popcorn Editor is a fork of the https://popcorn.webmaker.org website with the server backend removed, made to be embedded in almost any website.
video editin... video clipping nginx + 3 more
-
Shared code for Python deriver modules
-
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.