Explore projects
This repository contains scripts for retrieving metadata from external sources for the acdc collection.
adam / heritrix3
Apache License 2.0
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Automatic extraction of data (e.g. content, title, etc.) from archived news pages
25th Anniversary "WayForwardMachine" promotional column for Wayback Machine home page.
I forked this from https://github.com/steves/PHP-Profiler for use in an experiment
www / offshoot-ssr
GNU Affero General Public License v3.0
See https://git.archive.org/www/tesseract/ instead
ia / Sshrc
MIT License
Extract structured metadata and content from article PDFs; use this to match against databases of known identifiers.
BibTeX-to-web for Web Archiving and Internet Archive stuff (based on anonbib): https://bnewbold.the-nsa.org/pub/archivebib/
Merlijn Wajer / dhSegment
GNU General Public License v3.0 only
ansible-roles-contrib / ansible-role-nullmailer
GNU General Public License v3.0 only
ansible-roles-contrib / docker.ubuntu
Apache License 2.0