Explore projects
This repository contains utilities for working with the 78rpm collection.
Merlijn Wajer / archive-pdf-tools
GNU Affero General Public License v3.0
IA serverless deriving of .mp4 from Popcorn Projects. Logical mirror pull from https://github.com/Laurian/popcorn-exporter + .gitlab-ci.yml
Tesserotate (aka tesseract-baselines-rotate)
Calculates the rotation needed to bring images to their proper orientation with Tesseract, using the baselines as found by the OSD
Create 'perfect' CDs from CD items or CDs from LP items, or segment LP side recordings into separate tracks
archivecd / morituri
GNU General Public License v3.0 only
Merlijn Wajer / archive-hocr-tools
GNU Affero General Public License v3.0
Aram Verstegen / archive-hocr-tools
GNU Affero General Public License v3.0
This repository contains scripts for retrieving metadata from external sources for the acdc collection.
Automatic extraction of data (e.g. content, title, etc.) from archived news pages
See https://git.archive.org/www/tesseract/ instead
Extract structured metadata and content from article PDFs; use this to match against databases of known identifiers.
BibTeX-to-web for Web Archiving and Internet Archive stuff (based on anonbib): https://bnewbold.the-nsa.org/pub/archivebib/
Merlijn Wajer / dhSegment
GNU General Public License v3.0 only