Merlijn Wajer authored
Thanks to Aram Verstegen, still work in progress.

commit 6f6a91929eae49f7fb81813dd6eee2f8ead0e8d8
Author: Merlijn Wajer <merlijn@wizzup.org>
Date:   Thu Jan 20 18:07:26 2022 +0100

    hocr-to-epub: remove epub verify

    Depends on deprecated code

commit c65a544bcdd18b616ee342ad5b5c370b3454c206
Author: Aram Verstegen <aram@factorit.nl>
Date:   Thu Nov 4 19:13:05 2021 +0100

    Don't abort for low confidence documents.
    Allow all file paths to be specified externally

commit c8bd25fa3d672841bb8396458810eb1a09ca5e48
Author: Aram Verstegen <aram@factorit.nl>
Date:   Tue Sep 28 00:37:14 2021 +0200

    Trying to improve dehyphenation

commit f603149b00081b4b21ffc6f50b4ed3f722d9d399
Author: Aram Verstegen <aram@factorit.nl>
Date:   Tue Sep 28 00:02:35 2021 +0200

    Use fast storage if available

commit cd1fed48ff9d620da997b52abcef8c243310a864
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 23:22:16 2021 +0200

    Only add textual metadata tag when text is present

commit 406e780aa069e56dc03811208e3b74931aadc7a4
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 23:19:36 2021 +0200

    Use WORKING_DIR constant for imagestack basenames

commit ddf098c921cc15651055cb3bf3a13b1c9826cc20
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 23:15:21 2021 +0200

    Avoid divide by zero

commit 7a99123b4db07bd9f52467c67f3eb0880b4b81e2
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 23:06:06 2021 +0200

    Added comments

commit 9f57ed9b352c653e8468ac71fb35f582d299d205
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 23:03:44 2021 +0200

    Forgot a word

commit 800daa28389e4c2e8adbf00d84351f6de23235d6
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 23:00:46 2021 +0200

    Keep the decoded temp files around to speed up cropping multiple images from one page.
    Keep track of all the temporary files and delete them in the destructor.
    Tried to improve stylesheet

commit aa11017506896ce084d15a737c30af6d111436f8
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 21:05:05 2021 +0200

    Added jp2000 to TIFF conversion using kakadu

commit 139e5fb3d29bd3a3006f438fed343eede451fe7c
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 20:11:05 2021 +0200

    Cleanup

commit 0bf4cd8645fea43c76e8e8f0a5e988844063dbe0
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 20:08:16 2021 +0200

    Cleanup

commit 07805eb01feb04da3926c3e896bf00e98c6ab20a
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 20:07:57 2021 +0200

    Show warning instead of omitting pages, try to clean up hyphenation

commit 5f068bfd73de26f468bed5621c7605e5a90e4410
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 17:24:08 2021 +0200

    Removed shortcut for debugging

commit eb649fc0918a13087cd5d6b35cb0a6f3df1d354c
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 17:23:39 2021 +0200

    Cleaned up accessibility summary spacing

commit 9f9daea19ffb77f08eaf6de57d3959757faefbd2
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 17:21:57 2021 +0200

    Fixed usage of iso639 module

commit 4d0d8e487b13c0b575d8c8696dc9f21550bca150
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 27 17:16:34 2021 +0200

    Fixed (accessibility) metadata

commit ef9cd514f71d4fee22ac7207e0eef8ba8c9c39fe
Author: Aram Verstegen <aram@factorit.nl>
Date:   Tue Sep 7 12:21:39 2021 +0200

    Skipping pages based on scandata.xml info

commit 7d62725ed6b9c2c373002110cb3778d2d04d2e2c
Author: Aram Verstegen <aram@factorit.nl>
Date:   Tue Sep 7 10:52:59 2021 +0200

    Adding the cropped images in the epub file

commit a0d80ea9e641b96a448214b53eeb8f5316d962a5
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 6 22:51:13 2021 +0200

    Organisation and comments

commit 94c6f5f1a34919f0d37992b5d96c6b1d9ca1ae0f
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 6 22:37:27 2021 +0200

    Comments and naming

commit fdf363e41b500aa7f4693e41b2f6028532090d0e
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 6 21:38:51 2021 +0200

    Cleanup

commit 48b363550fe7064daafd3a7c51430e8690967dcb
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 6 21:25:55 2021 +0200

    Make minimum_page_area_pct actually work as a percentage

commit 3e3c53ad163f3de1e5a7d26fdb58cd5f13580d51
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 6 21:01:30 2021 +0200

    Fixed photo box cleaning logic

commit 1394a947461274715687dd5010411940f30c7083
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 6 18:33:04 2021 +0200

    Starting with Image Stack (zip file) parsing

commit 587328d5724f11b305ca065c9b0ff33091f7b505
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 6 18:32:39 2021 +0200

    Removed recursive requirement

commit 4493c39c063b48323173f6947ad8748225ebf565
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Sep 6 18:32:22 2021 +0200

    Added hocr_page_to_photo_data function

commit 4e98dbb1c53641bea647f977f9bddf4dfeb6ff90
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Aug 30 17:28:25 2021 +0200

    Take care of metadata provided as lists.
    Track average OCR word confidence scores

commit c0d074a3833edde98834ee76085bee78751ffc4c
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Aug 23 19:49:36 2021 +0200

    Don't create useless TOCs

commit 171507455f3c5202ed1f73a8d50ea7e054acf4cc
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Aug 23 19:17:49 2021 +0200

    Updated requirements.txt with pinned versions for dependencies

commit 9107a7d6e124278235b44cf93c2b9d41aef7a326
Author: Aram Verstegen <aram@factorit.nl>
Date:   Mon Aug 23 18:56:07 2021 +0200

    Put code into a class.
    Added metadata parsing and first steps toward verification

commit 4d8f5a0277257851ec534c5bbc678ee6fad64173
Author: Aram Verstegen <aram@factorit.nl>
Date:   Fri Jul 23 13:57:21 2021 +0200

    Initial PoC for hOCR to EPUB conversion
f467c624